Clustering and OTU table generation

sonumu...@gmail.com

May 16, 2017, 9:48:51 AM

to VSEARCH Forum

Dear VSEARCH experts,

I have a question about clustering and OTU table generation using VSEARCH.

Using the procedure described below, I generated a UC mapping file (see the data below). I would now like to convert it into an OTU table with the original abundances. I tried several options (uc2otutable.py, map.pl, etc.) but was unable to produce an OTU table.

Could you please let me know which function or procedure in vsearch, or which alternative, I can use to make the final OTU table? I have heard that a pivot table also works, but I am not sure how to do it.

Also, please let me know if there is anything wrong with the clustering procedure I used here.

Looking forward to hearing from you!

Regards

Sunil

20. Dereplication using VSEARCH and removing global singletons

vsearch -derep_fulllength clean.fasta -output clean_derep.fasta -sizeout --minuniquesize 2

Writing output file 100%

245823 uniques written, 933142 clusters (global singletons) discarded (79.1%)

21. Clustering

module load vsearch/2.0.3

vsearch --cluster_size clean_derep.fasta --id 0.97 --sizein --sizeout --sizeorder --relabel OTU_ --centroids otus97_vsearch_repset.fasta --uc clean_derep_cluster_uc

Reading file clean_derep.fasta 100%

48008630 nt in 245823 seqs, min 100, max 297, avg 195

Masking 100%

Sorting by abundance 100%

Counting unique k-mers 100%

Clustering 100%

Sorting clusters 100%

Writing clusters 100%

Clusters: 10547 Size min 2, max 3470234, avg 23.3

Singletons: 0, 0.0% of seqs, 0.0% of clusters

21. Making the OTU table

vsearch -usearch_global clean_derep.fasta -db otus97_vsearch_repset.fasta -strand plus -id 0.97 -uc otu_table_mapping.uc

The UC mapping looks like this:

H 5 145 100.0 + 0 0 = A10_226;size=734721; OTU_6;size=917068;

H 145 146 99.3 + 0 0 146M A10_89;size=330416; OTU_146;size=37375;

H 4 192 100.0 + 0 0 = A10_539;size=762083; OTU_5;size=929140;

H 8 159 100.0 + 0 0 = A10_12;size=401560; OTU_9;size=458630;

H 2 213 100.0 + 0 0 = A10_34;size=1644162; OTU_3;size=2322780;

H 1 190 100.0 + 0 0 = A10_581;size=1975476; OTU_2;size=3250102;

H 4469 190 97.4 + 0 0 189MD A10_712;size=397379; OTU_4470;size=372;

H 7 214 100.0 + 0 0 = A10_171;size=416077; OTU_8;size=536604;

H 0 186 100.0 + 0 0 = A10_21;size=2391901; OTU_1;size=2852990;

H 3 191 100.0 + 0 0 = A10_820;size=1641279; OTU_4;size=3470234;

H 3 191 99.5 + 0 0 191M A10_936;size=287918; OTU_4;size=3470234;

H 8568 191 97.4 + 0 0 141M5I50M A10_598;size=223423; OTU_8569;size=10;

H 10 198 100.0 + 0 0 = A10_124;size=218744; OTU_11;size=278533;

H 12 146 100.0 + 0 0 = A10_88;size=156791; OTU_13;size=248216;

H 11 148 100.0 + 0 0 = A10_3;size=172717; OTU_12;size=337548;

H 13 144 100.0 + 0 0 = A10_139;size=145594; OTU_14;size=268905;

H 9 267 100.0 + 0 0 = A10_1325;size=389839; OTU_10;size=2642492;

H 3 191 99.5 + 0 0 191M A10_26295;size=147308; OTU_4;size=3470234;

H 9 281 100.0 + 0 0 14D267M A10_6781756;size=313058; OTU_10;size=2642492;

H 2746 267 98.5 + 0 0 14I78MD12MI176M A10_1224;size=200518; OTU_2747;size=46973;

H 3 191 99.5 + 0 0 191M A10_464;size=136128; OTU_4;size=3470234;

H 1 190 99.5 + 0 0 190M A10_685;size=134315; OTU_2;size=3250102;

H 15 147 100.0 + 0 0 = A10_3906;size=95598; OTU_16;size=133092;

H 3 191 99.5 + 0 0 191M A10_394;size=103326; OTU_4;size=3470234;

H 19 145 100.0 + 0 0 = A10_39;size=64603; OTU_20;size=97331;

H 9600 191 98.4 + 0 0 2D189M A10_747;size=96852; OTU_9601;size=156;

H 17 145 100.0 + 0 0 = A10_2947;size=71629; OTU_18;size=135810;

H 2 215 99.1 + 0 0 18M2D195M A10_122;size=125897; OTU_3;size=2322780;

H 2677 190 97.9 + 0 0 179MI11M A10_1919;size=76653; OTU_2678;size=1570;

H 108 148 99.3 + 0 0 139MI9M A10_261;size=89155; OTU_109;size=22316;

H 16 142 100.0 + 0 0 = A10_567;size=87479; OTU_17;size=100297;

H 2 214 99.5 + 0 0 18MD195M A10_431;size=90304; OTU_3;size=2322780;

H 2746 281 98.6 + 0 0 92MD12MI176M A10_6781947;size=156876; OTU_2747;size=46973;

H 14 192 100.0 + 0 0 = A10_258;size=125837; OTU_15;size=422795;

H 18 161 100.0 + 0 0 = A10_1677;size=66888; OTU_19;size=83076;

H 6 146 100.0 + 0 0 = A10_13;size=658161; OTU_7;size=1285564;

H 3832 145 97.2 + 0 0 145M4I A10_708;size=85814; OTU_3833;size=103;

H 11 147 99.3 + 0 0 138MI9M A10_108;size=75713; OTU_12;size=337548;

H 5183 145 97.2 + 0 0 142M3D A10_6;size=67760; OTU_5184;size=115;

H 20 139 100.0 + 0 0 = A10_816;size=60761; OTU_21;size=79621;

H 22 103 100.0 + 0 0 = A10_423;size=54082; OTU_23;size=57714;

H 23 144 100.0 + 0 0 = A10_416;size=53035; OTU_24;size=70642;

H 7336 191 97.9 + 0 0 191M A10_10991;size=43215; OTU_7337;size=142;

H 1 190 98.9 + 0 0 190M A10_2144;size=41371; OTU_2;size=3250102;

H 24 173 100.0 + 0 0 = A10_44;size=51278; OTU_25;size=62582;

H 27 148 100.0 + 0 0 = A10_2368;size=45170; OTU_28;size=76682;

H 347 144 97.9 + 0 0 I144M A10_1264;size=33918; OTU_348;size=4456;

H 28 158 100.0 + 0 0 = A10_5478;size=44799; OTU_29;size=80258;

H 7547 191 97.9 + 0 0 172MI19M A10_7590;size=32535; OTU_7548;size=276;

H 3 191 99.0 + 0 0 191M A10_447;size=38856; OTU_4;size=3470234;

H 31 146 100.0 + 0 0 = A10_192;size=32419; OTU_32;size=101444;

H 7080 146 97.3 + 0 0 45I146M A10_294;size=29998; OTU_7081;size=44;

H 33 145 100.0 + 0 0 = A10_1042;size=29135; OTU_34;size=38026;

H 34 146 100.0 + 0 0 = A10_3194;size=26760; OTU_35;size=72476;

H 9600 191 97.9 + 0 0 2D189M A10_5466;size=22991; OTU_9601;size=156;

H 6671 186 98.3 + 0 0 7D179M A10_117;size=25941; OTU_6672;size=205;

H 12 146 99.3 + 0 0 146M A10_52;size=28543; OTU_13;size=248216;

Torbjørn Rognes

May 18, 2017, 3:58:25 AM

to VSEARCH Forum

Hi

You can use the otutabout, mothur_shared_out or biomout option with the usearch_global command to produce an OTU table in one of several formats. The query sequence headers must contain sample labels and the database sequence headers must contain OTU labels. Please see the manual for details.

- Torbjørn

sonumu...@gmail.com

May 18, 2017, 5:15:33 AM

to VSEARCH Forum

Thank you for your reply.

I am using vsearch/2.0.3, and it reports the --biomout and --otutabout options as unrecognised.

My main question was how to assign the original abundance data (which was reduced during dereplication) back to the OTU table. Using the steps above (20, 21) I managed to produce an OTU table, but its abundance information is tied only to the dereplicated file, not to the original FASTA file.

Looking forward to hearing from you!

Regards

Sunil

Torbjørn Rognes

May 18, 2017, 5:26:20 AM

to VSEARCH Forum

You need to use vsearch version 2.2.0 or later for the biomout, otutabout or mothur_shared_out options. These options will take the original abundances into account when producing the OTU table.

Make sure you use the "sizein" and "sizeout" options at all steps to propagate abundance information all the way.
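To illustrate why the size annotations matter, here is a minimal Python sketch of full-length dereplication that reads and writes `;size=N;` annotations. It is not vsearch's implementation: the names `read_size` and `dereplicate` and the `min_unique_size` parameter are hypothetical stand-ins for the behaviour of --sizein, --sizeout and --minuniquesize, and details such as case handling are simplified.

```python
import re

# Matches a "size=N" annotation at the start of a header or after a semicolon.
SIZE_RE = re.compile(r"(?:^|;)size=(\d+)")


def read_size(header):
    """Return the abundance stored in a header, or 1 if there is none."""
    match = SIZE_RE.search(header)
    return int(match.group(1)) if match else 1


def dereplicate(records, min_unique_size=1):
    """Merge identical sequences and sum their abundances.

    records: iterable of (header, sequence) pairs.
    Returns (header, sequence) pairs annotated with ";size=N;",
    sorted by decreasing abundance.
    """
    totals = {}       # sequence -> summed abundance
    first_label = {}  # sequence -> label of the first read seen
    for header, seq in records:
        key = seq.upper()
        totals[key] = totals.get(key, 0) + read_size(header)
        first_label.setdefault(key, header.split(";", 1)[0])

    kept = [(first_label[k], k, n) for k, n in totals.items()
            if n >= min_unique_size]
    kept.sort(key=lambda item: -item[2])  # stable: ties keep input order
    return [(f"{label};size={n};", seq) for label, seq, n in kept]


if __name__ == "__main__":
    reads = [("A10_1", "ACGT"), ("B2_7", "acgt"),
             ("A10_2", "TTTT"), ("B2_9;size=3;", "GGGG")]
    for header, seq in dereplicate(reads, min_unique_size=2):
        print(f">{header}\n{seq}")
```

Note that each unique sequence keeps only the label of its first read. After dereplicating a pooled file, the sample prefix in a header therefore no longer describes every read counted in its `size` value.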

Scripts exist that convert uc files into OTU tables, but they are not included with vsearch.
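As an illustration of what such a script might do, the following Python sketch sums the `size=` annotation of every hit (`H`) record per OTU and per sample. It assumes the ten-column uc layout shown earlier in the thread, labels that contain no whitespace, and sample names that end at the first underscore (as in `A10_3`). The names `uc_to_otu_table` and `format_table` are hypothetical and not part of vsearch.

```python
import re
from collections import defaultdict

# Matches a "size=N" annotation at the start of a label or after a semicolon.
SIZE_RE = re.compile(r"(?:^|;)size=(\d+)")


def uc_to_otu_table(lines):
    """Build {otu: {sample: abundance}} from the hit records of a uc file."""
    table = defaultdict(lambda: defaultdict(int))
    for line in lines:
        fields = line.split()
        # Keep only hit records; a full uc record has ten columns.
        if len(fields) < 10 or fields[0] != "H":
            continue
        query, target = fields[8], fields[9]
        sample = query.split("_", 1)[0]      # assumed: sample precedes "_"
        match = SIZE_RE.search(query)
        size = int(match.group(1)) if match else 1
        otu = target.split(";", 1)[0]        # drop the ";size=...;" suffix
        table[otu][sample] += size
    return table


def format_table(table):
    """Render the table as tab-separated text with one row per OTU."""
    samples = sorted({s for counts in table.values() for s in counts})
    rows = ["\t".join(["#OTU ID"] + samples)]
    for otu in sorted(table):
        counts = [str(table[otu].get(s, 0)) for s in samples]
        rows.append("\t".join([otu] + counts))
    return "\n".join(rows)


if __name__ == "__main__":
    example = [
        "H 3 191 100.0 + 0 0 = A10_820;size=1641279; OTU_4;size=3470234;",
        "H 3 191 99.5 + 0 0 191M A10_936;size=287918; OTU_4;size=3470234;",
        "H 5 145 100.0 + 0 0 = A10_226;size=734721; OTU_6;size=917068;",
    ]
    print(format_table(uc_to_otu_table(example)))
```

The per-sample counts are only meaningful if every query record belongs to a single sample, for example when the mapped queries are the original reads or reads dereplicated within each sample rather than across the pooled data.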

- Torbjørn

sonumu...@gmail.com

May 18, 2017, 1:25:28 PM

to VSEARCH Forum

I managed to use the --otutabout option with the latest vsearch release, but the result still looks strange.

My original FASTA sequence file looks like this (sample name before the underscore and sequence number after it).

Q1. Is this header format compatible with vsearch?

>A10_3