I successfully ran MetaPhlAn2 on my samples. From merged_abundance_table.txt, I tried to extract each sample's composition by keeping only the rows whose first column ends with s__<species name here>. I then copied these rows into Excel and summed the percentages by column (per sample) to check whether they add up to 100% (meaning all reads can be assigned at the species level). Half of the samples do add up to 100%, but the other half come to less than 100%, ranging from 90% to 99.9%. To find the missing 0.1-10%, I ran the following command on one of the samples that didn't add up to 100%:
$ biom summarize-table -i sampleA.biom -o table_summary.txt --observations
And got this result:
...
Observation Metadata Categories: taxonomy
Counts/sample detail:
155: 0.88098
153: 9.42803
8623: 27.83432
10532: 27.91643
26344: 33.94024
I checked that these percentages (0.88 + 9.42 + ... + 33.9) really do add up to 100%! So I wanted to know the clade names of these taxids. I tried mapping them to NCBI Taxonomy, but not all of them returned a result:
code  taxid  primary taxid  taxname
1     155    155            Spirochaeta isoacid
4     153
1     8623   8623           Dendroaspis jamesoni
1     10532  10532          Simian adenovirus 7
4     26344
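Since the summary above lists taxonomy under "Observation Metadata Categories", the clade names may already be stored next to each ID inside the .biom file itself. Below is a minimal sketch of how that metadata could be read, assuming the file is in BIOM 1.0 JSON format (I have not confirmed this for MetaPhlAn2 output). The function name taxonomy_by_observation and the example table are made up for illustration and are not part of MetaPhlAn2 or biom-format.

```python
import json


def taxonomy_by_observation(biom_json_text):
    """Map each observation ID to its taxonomy metadata (None if absent).

    Assumes BIOM 1.0 JSON, where every entry in "rows" has an "id"
    and an optional "metadata" object.
    """
    table = json.loads(biom_json_text)
    names = {}
    for row in table["rows"]:
        metadata = row.get("metadata") or {}
        taxonomy = metadata.get("taxonomy")
        # Taxonomy is often stored as a list of ranks; join it into one lineage.
        if isinstance(taxonomy, list):
            taxonomy = "|".join(taxonomy)
        names[str(row["id"])] = taxonomy
    return names


# Made-up table for illustration only (not real MetaPhlAn2 output).
example = json.dumps({
    "rows": [
        {"id": "111", "metadata": {"taxonomy": ["k__Bacteria", "p__Firmicutes"]}},
        {"id": "222", "metadata": None},
    ],
})
lookup = taxonomy_by_observation(example)
for observation_id, lineage in lookup.items():
    print(observation_id, lineage)
```

For a real file, the text would come from something like open("sampleA.biom").read(). If the file turns out to be HDF5 (BIOM 2.x) rather than JSON, json.loads will fail, and the file would first need converting, for example with biom convert and its --to-json option.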
My question is, is there any script in MetaPhlAn2 (or Biobakery for that matter) that can easily map these taxids to clade names? Thank you very much for your help.
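For reference, the Excel check described at the top of this post could also be scripted. This is only a sketch under a few assumptions: the merged table is tab-separated, its first line is the header with the sample names, and clade levels are separated by "|". The function name species_totals and the example data are hypothetical.

```python
import csv
import io


def species_totals(table_text):
    """Sum species-level relative abundances for every sample column."""
    reader = csv.reader(io.StringIO(table_text), delimiter="\t")
    header = next(reader)          # assumed: first line holds the sample names
    samples = header[1:]
    totals = dict.fromkeys(samples, 0.0)
    for row in reader:
        if not row or row[0].startswith("#"):
            continue               # skip blank and comment lines
        last_level = row[0].split("|")[-1]
        # Keep rows that end at species level; strain rows (t__) are skipped.
        if last_level.startswith("s__"):
            for sample, value in zip(samples, row[1:]):
                totals[sample] += float(value)
    return totals


# Made-up numbers for illustration only.
example = (
    "ID\tsampleA\tsampleB\n"
    "k__Bacteria\t100.0\t95.0\n"
    "k__Bacteria|p__X|s__one\t60.0\t50.0\n"
    "k__Bacteria|p__X|s__one|t__strain\t60.0\t50.0\n"
    "k__Bacteria|p__Y|s__two\t40.0\t45.0\n"
)
totals = species_totals(example)
for sample, total in totals.items():
    print(sample, round(total, 5))
```

Samples whose total comes out below 100 are the ones where part of the abundance is not assigned at the species level.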
--
You received this message because you are subscribed to the Google Groups "MetaPhlAn-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to metaphlan-use...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.