map taxid to clade (species or genus) name

Jamie Kwok

Mar 19, 2017, 1:21:20 AM
to MetaPhlAn-users
Hello,

I successfully ran MetaPhlAn2 on my samples. From merged_abundance_table.txt, I tried to extract each sample's composition by keeping only the rows whose first column ends with s__<species name here>. I then copied these rows into Excel and summed the percentages in each column (one column per sample) to check whether they add up to 100% (meaning all reads can be assigned at the species level). Half of the samples do add up to 100%, but the other half come to less than 100%, ranging from 90% to 99.9%. To find the missing 1-10%, I ran the following command on a sample whose species rows didn't add up to 100%:
$ biom summarize-table -i sampleA.biom -o table_summary.txt --observations
And got this result:
...
Observation Metadata Categories: taxonomy

Counts/sample detail:
155: 0.88098
153: 9.42803
8623: 27.83432
10532: 27.91643
26344: 33.94024

I checked that these percentages (0.88 + 9.42 + ... + 33.9) really do add up to 100%! So I wanted to know the clade names of these taxids. I tried mapping them with NCBI Taxonomy, but not all of them return a result:

code  taxid  primary taxid  taxname
1     155    155            Spirochaeta isoacid
4     153
1     8623   8623           Dendroaspis jamesoni
1     10532  10532          Simian adenovirus 7
4     26344

My question is, is there any script in MetaPhlAn2 (or Biobakery for that matter) that can easily map these taxids to clade names? Thank you very much for your help.

Nicola Segata

Mar 19, 2017, 6:19:31 PM
to Jamie Kwok, MetaPhlAn-users
Hi Jamie,
We do not have NCBI tax IDs in MetaPhlAn. However, the missing percentage is due to the presence of "unclassified" taxa at taxonomic levels higher than species. So instead of selecting only the entries ending with s__<species name here>, you should select the entries ending with either s__<species name here> or <name>_unclassified.

Best
Nicola
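
The selection rule described above can be sketched in Python as follows. This is only an illustration, not a MetaPhlAn script: the function names `is_leaf_row` and `column_sums`, the example table, and its clade names are all made up, and the table is assumed to be tab-separated with the clade name in the first column and one column per sample. Skipping strain-level (`t__`) rows is an extra assumption made here so that strains are not counted on top of their species; the real table layout may differ.

```python
import csv
import io


def is_leaf_row(clade: str) -> bool:
    """Return True for rows selected by the rule: the last field of the
    clade name is a species (s__...) or ends with _unclassified."""
    last = clade.split("|")[-1]
    # Assumption (not from the thread): strain rows repeat the abundance
    # of their species, so they are skipped to avoid double counting.
    if last.startswith("t__"):
        return False
    return last.startswith("s__") or last.endswith("_unclassified")


def column_sums(table_text: str) -> dict:
    """Sum the selected rows per sample column of a tab-separated table."""
    reader = csv.reader(io.StringIO(table_text), delimiter="\t")
    header = next(reader)
    samples = header[1:]
    totals = {sample: 0.0 for sample in samples}
    for row in reader:
        if not row:
            continue  # ignore blank lines
        if is_leaf_row(row[0]):
            for sample, value in zip(samples, row[1:]):
                totals[sample] += float(value)
    return totals


# Hypothetical table; clade names and abundances are invented.
EXAMPLE_TABLE = "\n".join([
    "ID\tsampleA\tsampleB",
    "k__Bacteria\t100.0\t100.0",
    "k__Bacteria|g__GenusX\t100.0\t100.0",
    "k__Bacteria|g__GenusX|s__GenusX_alpha\t60.0\t90.0",
    "k__Bacteria|g__GenusX|s__GenusX_alpha|t__strain_1\t60.0\t90.0",
    "k__Bacteria|g__GenusX|s__GenusX_beta\t40.0\t0.0",
    "k__Bacteria|g__GenusX|GenusX_unclassified\t0.0\t10.0",
])

totals = column_sums(EXAMPLE_TABLE)
print(totals)
```

In this made-up table, the species rows alone would give 90% for sampleB; adding the `_unclassified` row accounts for the remainder.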

--
You received this message because you are subscribed to the Google Groups "MetaPhlAn-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to metaphlan-use...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Jamie Kwok

Mar 20, 2017, 12:04:03 AM
to Nicola Segata, MetaPhlAn-users
Hi Nicola,

Thank you very much for your reply! I can get 100% for all samples now.
If the "taxid" (155, 153, 8623 as seen in example) is not from NCBI, which file in MetaPhlAn2 should I look at to know what they represent?

Best regards,
Jamie
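
One way to inspect what those IDs stand for is to read the metadata stored next to each observation ID in the BIOM file itself, since the summary output earlier in the thread lists `taxonomy` as an observation metadata category. The sketch below assumes the file is in the BIOM 1.0 JSON layout (a `rows` list whose entries carry an `id` and a `metadata` object); whether sampleA.biom uses that layout, and what the IDs mean in MetaPhlAn2, is not confirmed in this thread. The function name `observation_names` and the example document are hypothetical.

```python
import json


def observation_names(biom_doc: dict) -> dict:
    """Map each observation ID to its taxonomy metadata, if present."""
    names = {}
    for row in biom_doc.get("rows", []):
        metadata = row.get("metadata") or {}  # metadata may be null
        taxonomy = metadata.get("taxonomy")
        if isinstance(taxonomy, list):
            # Taxonomy is commonly stored as a list of ranks; join it
            # into a single pipe-separated clade string.
            taxonomy = "|".join(taxonomy)
        names[row["id"]] = taxonomy
    return names


# Hypothetical BIOM 1.0 style fragment; IDs and names are invented.
EXAMPLE_BIOM = """
{
  "rows": [
    {"id": "id_1",
     "metadata": {"taxonomy": ["k__Bacteria", "g__GenusX", "s__GenusX_alpha"]}},
    {"id": "id_2", "metadata": null}
  ]
}
"""

names = observation_names(json.loads(EXAMPLE_BIOM))
for obs_id, taxonomy in names.items():
    print(obs_id, taxonomy)
```

For a real file in this layout, the document would be loaded with `json.load` on the opened file instead of the inline string.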
