OTU table: OTU numbers in the wrong order

Irene Adamo

Nov 17, 2017, 11:36:45 AM

to VSEARCH Forum

Hi everyone,

I followed the vsearch pipeline on GitHub, and for the clustering step I used this command:

vsearch --cluster_size CR.nonchimera.fasta --id 0.97 --strand plus --sizein --sizeout --fasta_width 0 --uc all.clustered.uc --relabel OTU_ --centroids all.otus.fasta --otutabout all.otutab.txt

The resulting OTU table (all.otutab.txt) lists the OTUs in this order:

#OTU ID
OTU_1
OTU_10
OTU_100
OTU_1000
OTU_1001
OTU_1002
OTU_1003
OTU_1004
OTU_1005
OTU_1006
OTU_1007
OTU_1008
OTU_1009
OTU_101

Since I have almost 4000 OTUs, putting them in the right order by hand is really difficult. In Excel, the option to sort from the smallest number to the largest does not work either, because the IDs are treated as text.

Is there a way to sort the OTUs so that I get

OTU_1
OTU_2
OTU_3

and so on?

Thanks for any help!

Colin Brislawn

Nov 17, 2017, 4:56:32 PM

to VSEARCH Forum

Hello Irene,

Yes, that 'alphabetic' sort order is frustrating.
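For reference, the 'alphabetic' order compares the IDs as text, one character at a time: OTU_10 and OTU_100 both begin with OTU_1, and '1' comes before '2', so all of them are listed before OTU_2. A short sketch reproducing that order (alphabetic_order is just an illustrative name; LC_ALL=C forces a plain byte-wise comparison so the result does not depend on the locale):

```shell
#!/bin/bash
# Reproduce the character-by-character ("alphabetic") order that gives
# OTU_1, OTU_10, OTU_100, ... instead of OTU_1, OTU_2, OTU_3, ...
# LC_ALL=C makes sort compare raw bytes, independent of locale settings.
alphabetic_order() {
    LC_ALL=C sort
}

# '1' < '2', so every label starting with "OTU_1" comes before "OTU_2".
printf 'OTU_2\nOTU_10\nOTU_1\nOTU_100\n' | alphabetic_order
```

This prints OTU_1, OTU_10, OTU_100, OTU_2, the same kind of order as in the table above.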

Which file are you looking at that shows this order? The .uc file should be in the correct sorted order, but I'm not sure about the other files.

I'm not sure if vsearch has a setting for this, but I know how to do it in Excel, if you are looking for a more manual solution.

Colin

Irene Adamo

Nov 17, 2017, 5:35:44 PM

to VSEARCH Forum

Hi Colin, thanks for your answer.
I am looking at the txt file generated by the --otutabout option. Could you tell me how to do it in Excel?
I also tried to build the table from the .uc file with the Python script create_otu_table_from_uc_file.py, but it returns this error:

Error in uc file formating. Check for spaces in sample IDs and to make sure there is a semicolon after sample IDs.
First line with issue:
S 0 200 * * * * * CR16_seq.fa1;size=1056; *
0.2%

I think the problem comes either from the text editor I use or from the .uc file itself.
Thanks again for your help!
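Whether a text editor has damaged the file can be checked from the command line. The sketch below assumes that every record in a vsearch .uc file should have 10 tab-separated fields; uc_field_counts is a hypothetical helper name, not part of vsearch or of the Python script. If every line reports 10 fields, the tabs are probably intact and the problem more likely lies in how the script parses the sequence labels.

```shell
#!/bin/bash
# Count how many tab-separated fields each line of a .uc file has.
# UC records are expected to have 10 tab-separated fields; other counts
# suggest that tabs were replaced, e.g. by an editor converting them to spaces.
# Output: one line per distinct field count, prefixed by how many lines had it.
uc_field_counts() {
    awk -F '\t' '{ print NF }' "$1" | sort -n | uniq -c
}

# Usage (file name taken from the clustering command in this thread):
# uc_field_counts all.clustered.uc
```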

Colin Brislawn

Nov 17, 2017, 7:01:40 PM

to VSEARCH Forum

Glad to help!

OK, so --otutabout writes a tab-separated text file, and Excel will automatically recognize its rows and columns when you open it. Our goal here is to make a new column that Excel will recognize as numbers, so that it does a numeric sort. Here is one way to make this new column:

In Excel, copy the column with OTU_1, OTU_10, OTU_11, OTU_12, etc. into a new column at the very right of the sheet.

Select your new column, then choose Data > Text to Columns...

In the window that pops up, choose "Delimited" and click next

Under Delimiters, choose "Other" then enter an underscore _ into the box

Click next, then Finish

Your new column should now be split into TWO columns: 1) one full of OTU, and 2) one full of 1, 10, 11, 12, etc. This second one is the column we can use to sort the sheet!

Select the full sheet. Then sort by your final column, making sure to check "My list has headers" in the sort window.
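The same idea as the Excel steps (split the ID on the underscore, then sort numerically on the number) can be done on the command line. This is a minimal sketch, assuming the OTU table has a single header line and that the underscore in OTU_ is the only one on each data row; the function name sort_otutab_numeric and the output file name are illustrative, not part of vsearch.

```shell
#!/bin/bash
# Sort an OTU table numerically by the number after "OTU_",
# mirroring the Excel steps: split the ID on "_" and sort on the
# numeric part. The header line is printed first and left unsorted.
sort_otutab_numeric() {
    head -n 1 "$1"
    # -t _   : split fields on the underscore
    # -k2,2n : sort numerically on the second field (the OTU number);
    #          numeric parsing stops at the tab that follows the number
    tail -n +2 "$1" | sort -t _ -k2,2n
}

# Usage (input file name from the clustering command in this thread):
# sort_otutab_numeric all.otutab.txt > all.otutab.sorted.txt
```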

I hope that helps!

Colin

Irene Adamo

Nov 18, 2017, 7:59:47 AM

to VSEARCH Forum

It worked perfectly!!!! Thanks a lot!

Colin Brislawn

Nov 18, 2017, 12:44:47 PM

to VSEARCH Forum

OK great! Have a good weekend.

Colin

Frédéric Mahé

Nov 20, 2017, 6:31:00 AM

to VSEARCH Forum

Hello,

For a command-line solution, you can use the "version sort" option of the sort command:

printf 'OTU_1\nOTU_10\nOTU_2\nOTU_100\n' | sort -k1,1V

The sort command can operate on individual columns (here it targets column 1). Using bash process substitution, you can even sort the output of the --otutabout option before it is written to your hard drive:

vsearch ... --otutabout >(sort -k1,1V > all.otutab.txt)

That said, this ordering is probably something that could be corrected in vsearch itself, especially when the --relabel option is used to number the OTUs with a ticker.
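One detail to watch with the whole table: the first line is the "#OTU ID" header, and with a plain `sort -k1,1V` it may not stay on top, because GNU version sort can rank characters such as '#' after letters. The sketch below, assuming bash and GNU sort (for -V), sets the header aside before version-sorting the rest; version_sort_otutab is a hypothetical helper name.

```shell
#!/bin/bash
# Version-sort the rows of an OTU table read from standard input while
# keeping the first (header) line on top.
version_sort_otutab() {
    local header
    IFS= read -r header      # consume the header line and keep it as-is
    printf '%s\n' "$header"
    sort -k1,1V              # version-sort the remaining lines on column 1
}
```

It can be combined with the process substitution shown earlier, e.g. `vsearch ... --otutabout >(version_sort_otutab > all.otutab.txt)`, since the function is available inside the subshell bash starts for `>(...)`.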
