otu table wrong order OTU numbers


Irene Adamo

Nov 17, 2017, 11:36:45 AM
to VSEARCH Forum
Hi everyone,
I followed the vsearch pipeline available on GitHub, and for the clustering step I used this command:

vsearch --cluster_size CR.nonchimera.fasta --id 0.97 --strand plus --sizein --sizeout --fasta_width 0 --uc all.clustered.uc --relabel OTU_ --centroids all.otus.fasta --otutabout all.otutab.txt


In the resulting OTU table, the OTUs are listed in this order:

#OTU ID
OTU_1
OTU_10
OTU_100
OTU_1000
OTU_1001
OTU_1002
OTU_1003
OTU_1004
OTU_1005
OTU_1006
OTU_1007
OTU_1008
OTU_1009
OTU_101
Since I have almost 4000 OTUs, it is really difficult to put them in the right order by hand. In Excel, sorting from smallest to largest does not work.
Is there a way to sort the OTUs so that I get
OTU_1
OTU_2
OTU_3 and so on?
Thanks for any help.

Colin Brislawn

Nov 17, 2017, 4:56:32 PM
to VSEARCH Forum
Hello Irene,

Yes, that 'alphabetic' sort order is frustrating.
Which file are you looking at that shows this order? The .uc file should be in correct sorted order, but I'm not sure about the other files.

I'm not sure if vsearch has a setting for this, but I know how to do it in Excel, if you are looking for a more manual solution.

Colin

Irene Adamo

Nov 17, 2017, 5:35:44 PM
to VSEARCH Forum
Hi Colin, thanks for your answer.
I am looking at the txt file generated by the --otutabout option. Could you tell me how to do it in Excel?
I also tried to use the uc file with the Python script create_otu_table_from_uc_file.py, but I get this error:

Error in uc file formating. Check for spaces in sample IDs and to make sure there is a semicolon after sample IDs.
First line with issue:
S 0 200 * * * * * CR16_seq.fa1;size=1056; *
0.2%

I think there is something wrong, either with the text editor I use or with the uc file.
Thanks again for your help!

Colin Brislawn

Nov 17, 2017, 7:01:40 PM
to VSEARCH Forum
Glad to help!

OK, so --otutabout makes a flat text file, and Excel will automatically recognize the rows and columns of this file once it's opened. Our goal here is to make a new column that Excel will recognize as a number, so that it will do a numeric sort. Here is one way to make this new column:

In Excel, copy the column with OTU_1, OTU_10, OTU_11, OTU_12, etc into a new column at the very right of the sheet. 
Select your new column, then choose Data > Text to columns...
In the window that pops up, choose "Delimited" and click next 
Under Delimiters, choose "Other" then enter an underscore _ into the box
Click next, then Finish

You should now have split your new column into TWO new columns: 1) a column full of OTU and 2) a column full of 1, 10, 11, 12, etc. The second one is the column that we can use to sort our sheet!
Select the full sheet, then sort by your final column, making sure to check "My list has headers" in the sort window.
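
The same split-on-underscore idea can be sketched on the command line for anyone who prefers to skip the spreadsheet. This is only a minimal sketch: the function name sort_otus_numeric is made up for illustration, and it assumes the first line of the table is the "#OTU ID" header and that every OTU label has the form OTU_<number>.

```shell
# sort_otus_numeric: hypothetical helper, not part of vsearch.
# Assumes line 1 is the "#OTU ID" header and labels look like OTU_<number>.
sort_otus_numeric() {
    # Print the header line unchanged.
    head -n 1 "$1"
    # Split the remaining lines on "_" and sort numerically on the part
    # that follows the first underscore (the OTU number).
    tail -n +2 "$1" | LC_ALL=C sort -t '_' -k2,2n
}
```

For example, sort_otus_numeric all.otutab.txt > all.otutab.sorted.txt would write a sorted copy and leave the original file untouched.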

I hope that helps!
Colin 

Irene Adamo

Nov 18, 2017, 7:59:47 AM
to VSEARCH Forum
It worked perfectly! Thanks a lot!

Colin Brislawn

Nov 18, 2017, 12:44:47 PM
to VSEARCH Forum
OK great! Have a good weekend.

Colin

Frédéric Mahé

Nov 20, 2017, 6:31:00 AM
to VSEARCH Forum
Hello,

For a command-line solution, you can use the "version sort" option of the sort command:

printf 'OTU_1\nOTU_10\nOTU_2\nOTU_100\n' | sort -k1,1V

The sort command can operate on columns individually (here it targets column 1). You can even sort the output of the --otutabout option before writing it to your hard drive (this uses process substitution, so it needs a shell such as bash):

vsearch ... --otutabout >(sort -k1,1V > all.otutab.txt)

but that sorting is probably something that could be corrected in vsearch itself, especially when using the relabel+ticker option.
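
For a table that has already been written to disk, the version sort above can be applied to the data rows only, so that the "#OTU ID" header line is not sorted along with them and stays at the top. The following is a minimal sketch, not something vsearch provides: the function name sort_otutab is hypothetical, and it assumes GNU sort (for the V modifier), a tab-separated table, and a header on the first line.

```shell
# sort_otutab: hypothetical helper, not part of vsearch.
# Assumes GNU sort, a tab-separated table, and a header on line 1.
sort_otutab() {
    # Print the header line unchanged.
    head -n 1 "$1"
    # Version-sort the remaining rows on the first tab-separated column.
    tail -n +2 "$1" | sort -t "$(printf '\t')" -k1,1V
}
```

Usage would look like sort_otutab all.otutab.txt > all.otutab.sorted.txt, which keeps the original table as it was.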