MetaPhlan2 taxonomies


rodionovdm...@gmail.com

Apr 2, 2019, 2:02:20 PM
to MetaPhlAn-users
Hello,

While using the MetaPhlAn2 taxonomic profile output, we noticed that the taxonomy strings contain outdated taxonomic names (please see the examples below).

What is the source and version of the taxonomy strings in MetaPhlAn2 outputs?

Is there a way to get NCBI TaxIDs or updated taxonomy strings for output taxonomies?

Thank you,
Dmitry

k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomycetales|f__Corynebacteriaceae|g__Corynebacterium
Problem: o__Actinomycetales should be o__Corynebacteriales

k__Bacteria|p__Verrucomicrobia|c__Verrucomicrobiae|o__Verrucomicrobiales|f__Verrucomicrobiaceae|g__Akkermansia|s__Akkermansia_muciniphila
Problem: f__Verrucomicrobiaceae should be f__Akkermansiaceae

k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Coriobacteriales|f__Coriobacteriaceae|g__Collinsella|s__Collinsella_aerofaciens
Problem: c__Actinobacteria should be c__Coriobacteriia

k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Lachnospiraceae|g__Blautia|s__Ruminococcus_torques
Problem: s__Ruminococcus_torques should be s__[Ruminococcus]_torques
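
For readers who want to compare individual ranks against another taxonomy, the lineage strings above can be split on `|` and on the `__` separator. The following is only a minimal sketch: the function name `parse_lineage` and the rank table are illustrative assumptions, and only the prefixes that appear in the examples above are handled.

```python
# Rank prefixes as they appear in the lineage strings quoted in this post.
# This table is an assumption for illustration; extend it if your output
# contains other prefixes.
RANK_PREFIXES = {
    "k": "kingdom",
    "p": "phylum",
    "c": "class",
    "o": "order",
    "f": "family",
    "g": "genus",
    "s": "species",
}


def parse_lineage(lineage):
    """Split 'k__Bacteria|p__...' into a dict mapping rank name -> clade name.

    Hypothetical helper, not part of MetaPhlAn2. Insertion order follows
    the order of the fields in the input string.
    """
    ranks = {}
    for field in lineage.strip().split("|"):
        prefix, sep, name = field.partition("__")
        if not sep or prefix not in RANK_PREFIXES:
            raise ValueError(f"unexpected lineage field: {field!r}")
        ranks[RANK_PREFIXES[prefix]] = name
    return ranks


if __name__ == "__main__":
    example = (
        "k__Bacteria|p__Verrucomicrobia|c__Verrucomicrobiae|"
        "o__Verrucomicrobiales|f__Verrucomicrobiaceae|g__Akkermansia|"
        "s__Akkermansia_muciniphila"
    )
    for rank, name in parse_lineage(example).items():
        print(rank, name, sep="\t")
```

Raising an error on an unknown prefix is a deliberate choice in this sketch, so that unexpected fields are noticed rather than silently dropped.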

Francesco Beghini

Apr 5, 2019, 11:23:31 AM
to rodionovdm...@gmail.com, MetaPhlAn-users
Hi Dmitry,
The current MetaPhlAn2 database was created in 2015, starting from the genomes then available in NCBI. Profiles generated by MetaPhlAn2 therefore use the 2015 taxonomy of these bacteria.
You can update the taxonomy strings manually by editing the database by hand (https://bitbucket.org/biobakery/metaphlan2/src/default/#markdown-header-customizing-the-database).
Unfortunately, the current MetaPhlAn2 database does not report NCBI taxonomy IDs.
We are in the process of releasing a new version of the database with new and updated markers, an updated taxonomy, and embedded NCBI taxonomy IDs.
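
As an alternative to editing the database itself, the lineage strings could also be rewritten after profiling. The sketch below is not the database customization procedure linked above; it is a post-processing approach under stated assumptions: the profile is tab-separated with the lineage in the first column and comment lines starting with `#`, and the rules are hypothetical ones built only from the four examples in the original question. The names `RULES`, `update_lineage`, and `update_profile_line` are illustrative.

```python
# Hypothetical rules taken from the four examples in the original question:
# (field that must be present in the lineage, outdated field, replacement).
# The "must be present" field keeps a rename from touching unrelated clades
# that share the same outdated parent name.
RULES = [
    ("g__Corynebacterium", "o__Actinomycetales", "o__Corynebacteriales"),
    ("g__Akkermansia", "f__Verrucomicrobiaceae", "f__Akkermansiaceae"),
    ("o__Coriobacteriales", "c__Actinobacteria", "c__Coriobacteriia"),
    ("s__Ruminococcus_torques", "s__Ruminococcus_torques",
     "s__[Ruminococcus]_torques"),
]


def update_lineage(lineage, rules=RULES):
    """Apply every matching rule to a '|'-separated lineage string."""
    fields = lineage.split("|")
    for required, old, new in rules:
        if required in fields:
            fields = [new if field == old else field for field in fields]
    return "|".join(fields)


def update_profile_line(line, rules=RULES):
    """Rewrite the first tab-separated column; pass '#' lines through.

    Assumes the lineage is in the first column of a tab-separated profile.
    """
    if line.startswith("#"):
        return line
    lineage, sep, rest = line.partition("\t")
    return update_lineage(lineage, rules) + sep + rest


if __name__ == "__main__":
    print(update_lineage(
        "k__Bacteria|p__Actinobacteria|c__Actinobacteria|"
        "o__Coriobacteriales|f__Coriobacteriaceae|g__Collinsella|"
        "s__Collinsella_aerofaciens"
    ))
```

Note that this only renames fields in rows that contain the required field. Rows aggregated at a higher rank (for example, a row ending at `o__Actinomycetales`) are left unchanged and their abundances are not re-summed, so rank-level totals would need separate handling.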

Best,
Francesco
Francesco Beghini
PhD Student - Laboratory of Computational Metagenomics
Department of Cellular, Computational and Integrative Biology - CIBIO
University of Trento



Andrew

May 13, 2019, 9:09:16 AM
to MetaPhlAn-users
Just to add, this would be exceedingly helpful to me, as it is otherwise very difficult to make comparisons with other tools.

Best wishes,

Andrew

Mipmipmipmipmip

May 16, 2019, 4:38:51 AM
to MetaPhlAn-users
Hi Francesco,

From the latest commit, it appears that support for viral identification has been removed from MetaPhlAn2. Is that correct? Do you know why it was removed?

https://bitbucket.org/biobakery/metaphlan2/commits/e09d9c86d5b443bde3fe552dda267ada232c2248#chg-README.md

Thank you, Pim

>>>
-MetaPhlAn is a computational tool for profiling the composition of microbial communities (Bacteria, Archaea, Eukaryotes and Viruses) from metagenomic shotgun sequencing data (i.e. not 16S) with species-level. With the newly added StrainPhlAn module, it is now possible to perform accurate strain-level microbial profiling.
+MetaPhlAn is a computational tool for profiling the composition of microbial communities (Bacteria, Archaea and Eukaryotes) from metagenomic shotgun sequencing data (i.e. not 16S) with species-level. With the newly added StrainPhlAn module, it is now possible to perform accurate strain-level microbial profiling.

-MetaPhlAn 2 relies on ~1M unique clade-specific marker genes ([the marker information file `mpa_v20_m200_marker_info.txt.bz2` can be found in the Download page here](https://bitbucket.org/biobakery/metaphlan2/downloads/mpa_v20_m200_marker_info.txt.bz2)) identified from ~17,000 reference genomes (~13,500 bacterial and archaeal, ~3,500 viral, and ~110 eukaryotic), allowing:
+MetaPhlAn 2 relies on ~1M unique clade-specific marker genes ([the latest marker information file `mpa_v25_CHOCOPhlAn_201901_marker_info.txt.bz2` can be found in the Download page here](https://bitbucket.org/biobakery/metaphlan2/downloads/mpa_v25_CHOCOPhlAn_201901_marker_info.txt.bz2)) identified from ~100,000 reference genomes (~99,500 bacterial and archaeal and ~500 eukaryotic), allowing:
<<<


On Friday, April 5, 2019 at 5:23:31 PM UTC+2, Francesco Beghini wrote:
> Hi Dmitry,
...


> We are in the process of releasing a new version of the database with new and updated markers, taxonomy and embedded NCBI taxid.

...
> Best,
> Francesco

Francesco Beghini

Jul 5, 2019, 5:50:10 AM
to Mipmipmipmipmip, MetaPhlAn-users
Hi Pim,
We have chosen to remove viral identification because the pipeline we use to find marker genes is not suitable for viral sequences. We plan to reintegrate viral profiling in a future version.

Best,
Francesco



Nick

Sep 5, 2019, 4:58:10 AM
to MetaPhlAn-users
Hi Francesco,

In what way is it unsuitable? Is the false positive/negative rate higher for identifying viruses? Or are the references wrong?

I want to use the newest database, but I'd also like to look at the viruses in my sample. Do you have any suggestions?

Regards,
Nick