Comparison of MetaPhlAn 1 and MetaPhlAn 2

JC Grenier

Apr 15, 2014, 12:37:11 PM
to metaphl...@googlegroups.com
Hi Nicola, I'm currently comparing some of my results obtained with MetaPhlAn 1 and trying to correlate them with the results provided by MetaPhlAn 2.

What exactly is different between the two MetaPhlAn (MPA) reference sets? Have you removed a lot of species, or are they all still supposed to be there?

When I look at the MetaPhlAn 1 results, I see a lot of unclassified species (Bifidobacterium_unclassified, for example), while MetaPhlAn 2 seems to report somewhat fewer of them, but the results are quite different for some bacterial species.

For example, with MetaPhlAn 1 I get an average relative abundance of ~15 for Bifidobacterium_unclassified, which is by far my top hit (I get 2.50 for Bifidobacterium_adolescentis).

With MetaPhlAn 2, I no longer get Bifidobacterium_unclassified, but I still have Bifidobacterium_adolescentis, with an average abundance of 2.13 across my samples. There is certainly more information in the database now, which will affect the relative abundances, but how can I verify that this version works well for my data?

Thanks a lot for your help.

Nicola Segata

Apr 18, 2014, 5:35:31 AM
to metaphl...@googlegroups.com
Hi JC,
 thanks for getting in touch. MetaPhlAn 2 is based on ~15k species, many more than were present in MetaPhlAn 1.

Unclassified species are a bit tricky because they somehow reflect missing information in the set of reference genomes. I can take a look at the data if you want, but my first guess would be that the genus-level Bifidobacterium markers identified by MetaPhlAn 1 are not specific enough to be retained as markers once the many new Bifidobacterium genomes included in MetaPhlAn 2 are taken into account. Do you see an increased abundance of other Actinobacteria species that are potentially close to Bifidobacterium?

As I said, I can take a look at the data if you like; let me know if you have other comments or questions.
Thanks
Nicola

JC Grenier

Apr 23, 2014, 11:19:22 AM
to metaphl...@googlegroups.com
Thanks for your answer, Nicola! We will try to figure it out on our own. Do you happen to have any statistics on how many species per domain you added to the new database, and how many you removed? That might give us a good idea of what happened with our dataset!

Thanks again!

JC Grenier

Apr 23, 2014, 12:19:48 PM
to metaphl...@googlegroups.com
Hi Nicola, I have another question about that comparison. In MetaPhlAn 2, there are "t__" annotations. What exactly do they correspond to?

Thanks a lot.

JC Grenier

Apr 23, 2014, 1:19:35 PM
to metaphl...@googlegroups.com
Here's a more complete question, actually coming from my PI:

Dear Nicola,

Thanks so much for your reply. There are still a couple of things I am not sure I fully understand. In MetaPhlAn 2, I have a lot of classifications starting with “t__***”. These always seem to refer to a species name ending in “_unclassified”. For example, I have “…|s__Salinispora_tropica” and, in the next row, “…|s__Salinispora_tropica|t__Salinispora_tropica_unclassified”. The percentages assigned to both are always identical. What exactly does “t__” mean? Can I simply remove these rows?


Second, I am still having a hard time understanding why some species previously identified by MetaPhlAn 1 seem to have “disappeared”. For example, using MetaPhlAn 1 I detected 11 Bifidobacterium species (s__Bifidobacterium_adolescentis, s__Bifidobacterium_angulatum, s__Bifidobacterium_animalis, s__Bifidobacterium_bifidum, s__Bifidobacterium_breve, s__Bifidobacterium_catenulatum, s__Bifidobacterium_dentium, s__Bifidobacterium_gallicum, s__Bifidobacterium_longum, s__Bifidobacterium_pseudocatenulatum, s__Bifidobacterium_unclassified). Now, with the new version, I only identify 6 different species (s__Bifidobacterium_adolescentis, s__Bifidobacterium_animalis, s__Bifidobacterium_bifidum, s__Bifidobacterium_breve, s__Bifidobacterium_longum, s__Bifidobacterium_dentium), whereas I was expecting to identify at least as many, or more, given that the previously unclassified ones could, in principle, be better resolved in the new version.
Does this mean that some of the previous markers were no longer 100% specific once the number of species being interrogated increased?
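A minimal Python sketch that lists exactly which Bifidobacterium calls changed between the two runs, using only the species named in this message (with the "s__" prefix added where it was missing). It is a plain set comparison, nothing MetaPhlAn-specific:

```python
# Bifidobacterium species detected in the PI's samples, as listed in the message.
detected_v1 = {
    "s__Bifidobacterium_adolescentis",
    "s__Bifidobacterium_angulatum",
    "s__Bifidobacterium_animalis",
    "s__Bifidobacterium_bifidum",
    "s__Bifidobacterium_breve",
    "s__Bifidobacterium_catenulatum",
    "s__Bifidobacterium_dentium",
    "s__Bifidobacterium_gallicum",
    "s__Bifidobacterium_longum",
    "s__Bifidobacterium_pseudocatenulatum",
    "s__Bifidobacterium_unclassified",
}
detected_v2 = {
    "s__Bifidobacterium_adolescentis",
    "s__Bifidobacterium_animalis",
    "s__Bifidobacterium_bifidum",
    "s__Bifidobacterium_breve",
    "s__Bifidobacterium_longum",
    "s__Bifidobacterium_dentium",
}

# Set differences: calls made only by one version of MetaPhlAn.
disappeared = sorted(detected_v1 - detected_v2)
newly_detected = sorted(detected_v2 - detected_v1)

print(f"Detected only with MetaPhlAn 1 ({len(disappeared)}):")
for name in disappeared:
    print("  " + name)
print(f"Detected only with MetaPhlAn 2 ({len(newly_detected)}):")
for name in newly_detected:
    print("  " + name)
```

On these lists, every species found by MetaPhlAn 2 was also found by MetaPhlAn 1; the five missing calls are angulatum, catenulatum, gallicum, pseudocatenulatum and the unclassified bin.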

Thanks a lot for your help. I am happy to share some of my files if you would like to (and have the time to) look into it on your end.

Best

Nicola Segata

Apr 24, 2014, 10:15:40 AM
to metaphl...@googlegroups.com
Hi JC,
 regarding t__*, it is an attempt to identify the specific strain. This is of course possible only in those rare cases in which a previously sequenced strain is present in the sample, so most of the time these entries are "unclassified". It is perfectly fine to just filter them out (e.g. with "grep -v t__ result_file_name.txt") and focus on the species and higher taxonomic levels.
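For anyone who prefers to do this filtering inside a script, here is a minimal Python sketch equivalent in spirit to the grep command. It assumes the layout visible in this thread: one clade per line, the clade name first (levels joined by "|"), then a tab and the relative abundance. `drop_strain_rows` and `filter_profile` are hypothetical helper names, not part of MetaPhlAn.

```python
from typing import Iterable, Iterator


def drop_strain_rows(lines: Iterable[str]) -> Iterator[str]:
    """Yield profile lines whose clade name has no strain-level ("t__") entry.

    Assumes each line looks like "<clade name>\t<relative abundance>".
    Header or comment lines have no "|t__" in their first field, so they are kept.
    """
    for line in lines:
        clade = line.split("\t", 1)[0]  # only inspect the clade-name column
        if "|t__" not in clade:
            yield line


def filter_profile(in_path: str, out_path: str) -> None:
    """Write a copy of a profile file without the t__ rows (hypothetical helper)."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(drop_strain_rows(src))


# Toy input: truncated clade names and made-up abundances, for illustration only.
example = [
    "g__Salinispora\t1.5\n",
    "g__Salinispora|s__Salinispora_tropica\t1.5\n",
    "g__Salinispora|s__Salinispora_tropica|t__Salinispora_tropica_unclassified\t1.5\n",
]
kept = list(drop_strain_rows(example))
print("".join(kept), end="")
```

Unlike `grep -v t__`, which drops any line containing "t__" anywhere, this version only checks the clade-name column, so it cannot accidentally discard a row because of text elsewhere on the line.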

As for your second question: yes, it is possible that some marker genes are no longer good markers. This can happen when a species A was present in version 1 and a new, very close species B is added to the database: a different set of markers may then be selected to provide the additional discrimination needed.
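The mechanism described above can be illustrated with a deliberately simplified toy model in Python. This is not MetaPhlAn's actual marker-selection procedure, which is more involved; it only assumes the rule that a gene counts as a marker for a species when no other species in the database carries it. All species and gene names are hypothetical.

```python
from typing import Dict, Set


def unique_markers(genomes: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Toy rule: a gene is a marker for a species only if no other species has it."""
    markers = {}
    for species, genes in genomes.items():
        other_genes: Set[str] = set()
        for other, other_set in genomes.items():
            if other != species:
                other_genes |= other_set
        markers[species] = genes - other_genes
    return markers


# Hypothetical "version 1" database: species_A and an unrelated species_C.
v1_db = {
    "species_A": {"gene1", "gene2", "gene3", "gene4"},
    "species_C": {"gene5", "gene6"},
}
# Hypothetical "version 2" adds species_B, a close relative sharing gene2 and gene3.
v2_db = dict(v1_db, species_B={"gene2", "gene3", "gene7"})

v1_markers = unique_markers(v1_db)
v2_markers = unique_markers(v2_db)
print("species_A markers in v1:", sorted(v1_markers["species_A"]))
print("species_A markers in v2:", sorted(v2_markers["species_A"]))
```

In this toy setting, adding species_B leaves species_A with only two of its four original markers, so reads from gene2 and gene3 no longer count as evidence for species_A.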

These are the Bifidobacterium species included in version 2 (i.e. those available in NCBI as of December 2013):
s__Bifidobacterium_adolescentis
s__Bifidobacterium_angulatum
s__Bifidobacterium_animalis
s__Bifidobacterium_asteroides
s__Bifidobacterium_bifidum
s__Bifidobacterium_breve
s__Bifidobacterium_catenulatum
s__Bifidobacterium_dentium
s__Bifidobacterium_gallicum
s__Bifidobacterium_longum
s__Bifidobacterium_magnum
s__Bifidobacterium_minimum
s__Bifidobacterium_pseudocatenulatum
s__Bifidobacterium_pseudolongum
s__Bifidobacterium_sp_12_1_47BFAA
s__Bifidobacterium_thermophilum
and these are the species in version 1:
s__Bifidobacterium_adolescentis
s__Bifidobacterium_angulatum
s__Bifidobacterium_animalis
s__Bifidobacterium_bifidum
s__Bifidobacterium_breve
s__Bifidobacterium_catenulatum
s__Bifidobacterium_dentium
s__Bifidobacterium_gallicum
s__Bifidobacterium_longum
s__Bifidobacterium_pseudocatenulatum

Note also that for some species many new genomes are available in version 2, which further improves the specificity of the markers.
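On the earlier question about how many species were added or removed, the two lists above can at least be compared for this genus. A minimal Python sketch using only the names given in this reply (no other database statistics are assumed):

```python
# Bifidobacterium species in each MetaPhlAn database version, copied from the reply.
V1 = """
s__Bifidobacterium_adolescentis
s__Bifidobacterium_angulatum
s__Bifidobacterium_animalis
s__Bifidobacterium_bifidum
s__Bifidobacterium_breve
s__Bifidobacterium_catenulatum
s__Bifidobacterium_dentium
s__Bifidobacterium_gallicum
s__Bifidobacterium_longum
s__Bifidobacterium_pseudocatenulatum
""".split()

V2 = """
s__Bifidobacterium_adolescentis
s__Bifidobacterium_angulatum
s__Bifidobacterium_animalis
s__Bifidobacterium_asteroides
s__Bifidobacterium_bifidum
s__Bifidobacterium_breve
s__Bifidobacterium_catenulatum
s__Bifidobacterium_dentium
s__Bifidobacterium_gallicum
s__Bifidobacterium_longum
s__Bifidobacterium_magnum
s__Bifidobacterium_minimum
s__Bifidobacterium_pseudocatenulatum
s__Bifidobacterium_pseudolongum
s__Bifidobacterium_sp_12_1_47BFAA
s__Bifidobacterium_thermophilum
""".split()

added = sorted(set(V2) - set(V1))    # in version 2 only
removed = sorted(set(V1) - set(V2))  # in version 1 only

print(f"{len(V1)} species in version 1, {len(V2)} in version 2")
print("Added in version 2:", ", ".join(added))
print("Removed in version 2:", ", ".join(removed) or "none")
```

On these lists nothing is removed in version 2, and the Bifidobacterium species from the PI's message that were no longer detected (angulatum, catenulatum, gallicum and pseudocatenulatum) are all still in the version 2 database. That is consistent with the explanation that the markers changed rather than the species being dropped.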

I hope this helps
Nicola

JC Grenier

Apr 25, 2014, 11:22:01 AM
to metaphl...@googlegroups.com
Thanks, Nicola, that helps us a lot.

Have a nice day!