Dear VSEARCH experts,
I have a question about clustering and OTU table generation using VSEARCH.
Using the procedure described below, I generated a UC mapping file (see the data below). Now I would like to convert it into an OTU table with the original abundances. I tried several options (uc2otutable.py, map.pl, etc.) but was unable to produce an OTU table.
Could you please let me know which function or procedure in VSEARCH, or which alternative, I can use to make the final OTU table? I have heard that a pivot table also works, but I am not sure how to do it.
Also, please let me know if there is something wrong in the procedure I used here for clustering.
Looking forward to hearing from you!
Regards
Sunil
20. Dereplication using VSEARCH and removing global singletons
vsearch -derep_fulllength clean.fasta -output clean_derep.fna -sizeout --minuniquesize 2
Writing output file 100%
245823 uniques written, 933142 clusters (global singletons) discarded (79.1%)
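To make clear what this step does to the read labels, here is a minimal Python sketch of full-length dereplication with a minimum abundance of 2. It is a toy model, not the VSEARCH implementation; the read names, the sequences and the `dereplicate` helper are made up for illustration, and it assumes labels of the form sample_readnumber.

```python
from collections import OrderedDict


def dereplicate(records, min_unique_size=2):
    """Collapse identical sequences, keeping the first label seen and the total count."""
    groups = OrderedDict()  # sequence -> [first_label, count]
    for label, seq in records:
        key = seq.upper()
        if key in groups:
            groups[key][1] += 1
        else:
            groups[key] = [label, 1]
    # Drop uniques below the abundance threshold, then sort by decreasing abundance.
    kept = [(label, seq, n) for seq, (label, n) in groups.items() if n >= min_unique_size]
    kept.sort(key=lambda item: -item[2])
    return [(f"{label};size={n};", seq) for label, seq, n in kept]


# Hypothetical reads from four samples (A10, A16, A1, A20).
reads = [
    ("A10_1", "ACGTACGT"),
    ("A16_1", "ACGTACGT"),
    ("A1_1", "ACGTACGT"),
    ("A20_1", "TTTTCCCC"),  # seen only once, so it is a global singleton
]
uniques = dereplicate(reads)
print(uniques)
```

In this toy example the three identical reads collapse into one record labelled with the first read only, so the A16 and A1 origins are no longer visible in the label.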
21. Clustering
module load vsearch/2.0.3
vsearch --cluster_size clean_derep.fasta --id 0.97 --sizein --sizeout --sizeorder --relabel OTU_ --centroids otus97_vsearch_repset.fasta --uc clean_derep_cluster_uc
Reading file clean_derep.fasta 100%
48008630 nt in 245823 seqs, min 100, max 297, avg 195
Masking 100%
Sorting by abundance 100%
Counting unique k-mers 100%
Clustering 100%
Sorting clusters 100%
Writing clusters 100%
Clusters: 10547 Size min 2, max 3470234, avg 23.3
Singletons: 0, 0.0% of seqs, 0.0% of clusters
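As a quick sanity check on the two logs above, the percentages can be recomputed from the printed numbers alone. This is only arithmetic on the figures shown; the variable names are mine, and the interpretation of "avg" is an assumption.

```python
uniques_written = 245823       # from the dereplication log
singletons_discarded = 933142  # from the dereplication log
clusters = 10547               # from the clustering log

# Share of unique sequences that were discarded as global singletons.
discarded_pct = 100 * singletons_discarded / (uniques_written + singletons_discarded)

# Number of unique input sequences per cluster.
avg_uniques_per_cluster = uniques_written / clusters

print(f"{discarded_pct:.1f}% of uniques discarded")
print(f"{avg_uniques_per_cluster:.1f} uniques per cluster")
```

Both values agree with the logs (79.1% and 23.3), so the reported "avg 23.3" appears to count unique sequences per cluster, while "max 3470234" matches the abundance of OTU_4.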
22. Making the OTU table
vsearch -usearch_global clean_derep.fasta -db otus97_vsearch_repset.fasta -strand plus -id 0.97 -uc otu_table_mapping.uc
The UC mapping looks like this:
H 5 145 100.0 + 0 0 = A10_226;size=734721; OTU_6;size=917068;
H 145 146 99.3 + 0 0 146M A10_89;size=330416; OTU_146;size=37375;
H 4 192 100.0 + 0 0 = A10_539;size=762083; OTU_5;size=929140;
H 8 159 100.0 + 0 0 = A10_12;size=401560; OTU_9;size=458630;
H 2 213 100.0 + 0 0 = A10_34;size=1644162; OTU_3;size=2322780;
H 1 190 100.0 + 0 0 = A10_581;size=1975476; OTU_2;size=3250102;
H 4469 190 97.4 + 0 0 189MD A10_712;size=397379; OTU_4470;size=372;
H 7 214 100.0 + 0 0 = A10_171;size=416077; OTU_8;size=536604;
H 0 186 100.0 + 0 0 = A10_21;size=2391901; OTU_1;size=2852990;
H 3 191 100.0 + 0 0 = A10_820;size=1641279; OTU_4;size=3470234;
H 3 191 99.5 + 0 0 191M A10_936;size=287918; OTU_4;size=3470234;
H 8568 191 97.4 + 0 0 141M5I50M A10_598;size=223423; OTU_8569;size=10;
H 10 198 100.0 + 0 0 = A10_124;size=218744; OTU_11;size=278533;
H 12 146 100.0 + 0 0 = A10_88;size=156791; OTU_13;size=248216;
H 11 148 100.0 + 0 0 = A10_3;size=172717; OTU_12;size=337548;
H 13 144 100.0 + 0 0 = A10_139;size=145594; OTU_14;size=268905;
H 9 267 100.0 + 0 0 = A10_1325;size=389839; OTU_10;size=2642492;
H 3 191 99.5 + 0 0 191M A10_26295;size=147308; OTU_4;size=3470234;
H 9 281 100.0 + 0 0 14D267M A10_6781756;size=313058; OTU_10;size=2642492;
H 2746 267 98.5 + 0 0 14I78MD12MI176M A10_1224;size=200518; OTU_2747;size=46973;
H 3 191 99.5 + 0 0 191M A10_464;size=136128; OTU_4;size=3470234;
H 1 190 99.5 + 0 0 190M A10_685;size=134315; OTU_2;size=3250102;
H 15 147 100.0 + 0 0 = A10_3906;size=95598; OTU_16;size=133092;
H 3 191 99.5 + 0 0 191M A10_394;size=103326; OTU_4;size=3470234;
H 19 145 100.0 + 0 0 = A10_39;size=64603; OTU_20;size=97331;
H 9600 191 98.4 + 0 0 2D189M A10_747;size=96852; OTU_9601;size=156;
H 17 145 100.0 + 0 0 = A10_2947;size=71629; OTU_18;size=135810;
H 2 215 99.1 + 0 0 18M2D195M A10_122;size=125897; OTU_3;size=2322780;
H 2677 190 97.9 + 0 0 179MI11M A10_1919;size=76653; OTU_2678;size=1570;
H 108 148 99.3 + 0 0 139MI9M A10_261;size=89155; OTU_109;size=22316;
H 16 142 100.0 + 0 0 = A10_567;size=87479; OTU_17;size=100297;
H 2 214 99.5 + 0 0 18MD195M A10_431;size=90304; OTU_3;size=2322780;
H 2746 281 98.6 + 0 0 92MD12MI176M A10_6781947;size=156876; OTU_2747;size=46973;
H 14 192 100.0 + 0 0 = A10_258;size=125837; OTU_15;size=422795;
H 18 161 100.0 + 0 0 = A10_1677;size=66888; OTU_19;size=83076;
H 6 146 100.0 + 0 0 = A10_13;size=658161; OTU_7;size=1285564;
H 3832 145 97.2 + 0 0 145M4I A10_708;size=85814; OTU_3833;size=103;
H 11 147 99.3 + 0 0 138MI9M A10_108;size=75713; OTU_12;size=337548;
H 5183 145 97.2 + 0 0 142M3D A10_6;size=67760; OTU_5184;size=115;
H 20 139 100.0 + 0 0 = A10_816;size=60761; OTU_21;size=79621;
H 22 103 100.0 + 0 0 = A10_423;size=54082; OTU_23;size=57714;
H 23 144 100.0 + 0 0 = A10_416;size=53035; OTU_24;size=70642;
H 7336 191 97.9 + 0 0 191M A10_10991;size=43215; OTU_7337;size=142;
H 1 190 98.9 + 0 0 190M A10_2144;size=41371; OTU_2;size=3250102;
H 24 173 100.0 + 0 0 = A10_44;size=51278; OTU_25;size=62582;
H 27 148 100.0 + 0 0 = A10_2368;size=45170; OTU_28;size=76682;
H 347 144 97.9 + 0 0 I144M A10_1264;size=33918; OTU_348;size=4456;
H 28 158 100.0 + 0 0 = A10_5478;size=44799; OTU_29;size=80258;
H 7547 191 97.9 + 0 0 172MI19M A10_7590;size=32535; OTU_7548;size=276;
H 3 191 99.0 + 0 0 191M A10_447;size=38856; OTU_4;size=3470234;
H 31 146 100.0 + 0 0 = A10_192;size=32419; OTU_32;size=101444;
H 7080 146 97.3 + 0 0 45I146M A10_294;size=29998; OTU_7081;size=44;
H 33 145 100.0 + 0 0 = A10_1042;size=29135; OTU_34;size=38026;
H 34 146 100.0 + 0 0 = A10_3194;size=26760; OTU_35;size=72476;
H 9600 191 97.9 + 0 0 2D189M A10_5466;size=22991; OTU_9601;size=156;
H 6671 186 98.3 + 0 0 7D179M A10_117;size=25941; OTU_6672;size=205;
H 12 146 99.3 + 0 0 146M A10_52;size=28543; OTU_13;size=248216;
But these reads do not all come from the same sample. They also belong to other samples, for example A16, A1, A20, A21, A23, etc.
How will this sample information be transferred to the next step?
I guess this should be the original FASTA file that was used as input for dereplication.
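The pivot-table idea can be sketched in Python as below: read the hit (H) records of a UC file and sum the query abundance per OTU and sample. This is only a rough sketch under stated assumptions, not the method used by uc2otutable.py or by VSEARCH. It assumes that query labels look like sample_readnumber, that labels contain no whitespace, and that a query without a ;size= annotation counts as one read. The function names and the example records are hypothetical.

```python
import re
from collections import defaultdict

SIZE_RE = re.compile(r";size=(\d+)")


def parse_label(label):
    """Split 'A10_226;size=734721;' into ('A10_226', 734721); size defaults to 1."""
    match = SIZE_RE.search(label)
    size = int(match.group(1)) if match else 1
    return label.split(";", 1)[0], size


def uc_to_otu_table(uc_lines):
    """Sum query abundances per (OTU, sample) over the hit ('H') records."""
    counts = defaultdict(lambda: defaultdict(int))
    for line in uc_lines:
        fields = line.split()
        if len(fields) != 10 or fields[0] != "H":
            continue  # skip 'N' (no hit) records and malformed lines
        read_name, size = parse_label(fields[8])
        otu, _ = parse_label(fields[9])
        sample = read_name.rsplit("_", 1)[0]  # assumed '<sample>_<read number>'
        counts[otu][sample] += size
    return {otu: dict(per_sample) for otu, per_sample in counts.items()}


def format_otu_table(table):
    """Return the table as tab-separated lines, one row per OTU (sorted by name)."""
    samples = sorted({s for row in table.values() for s in row})
    lines = ["\t".join(["#OTU ID"] + samples)]
    for otu in sorted(table):
        row = [str(table[otu].get(s, 0)) for s in samples]
        lines.append("\t".join([otu] + row))
    return lines


# Hypothetical UC records, as if the original (non-dereplicated) reads were mapped.
uc_lines = [
    "H\t3\t191\t100.0\t+\t0\t0\t=\tA10_820\tOTU_4;size=3470234;",
    "H\t3\t191\t99.5\t+\t0\t0\t191M\tA16_17\tOTU_4;size=3470234;",
    "H\t3\t191\t99.5\t+\t0\t0\t191M\tA16_18\tOTU_4;size=3470234;",
    "H\t1\t190\t100.0\t+\t0\t0\t=\tA1_5\tOTU_2;size=3250102;",
    "N\t*\t*\t*\t.\t*\t*\t*\tA20_9\t*",
]
table = uc_to_otu_table(uc_lines)
print("\n".join(format_otu_table(table)))
```

If the same function is run on the dereplicated mapping shown above, every count lands in the A10 column, because each dereplicated label carries a single sample name together with the pooled size. Mapping the original per-sample reads avoids this.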
Q1. I am getting many false-positive results in the chimera analysis. I checked the chimeras detected by vsearch, but they look fine in a BLAST analysis. Should I change something in the chimera analysis?
Q2. The --otutabout option is not working well.
Q3. Alternatively, I used a Python script (from usearch) and the QIIME workflow to generate the OTU table.