Updates to the metaphlan database and custom databases

kat...@secondgenome.com

Feb 2, 2018, 5:01:22 PM
to MetaPhlAn-users
Hi,
I'm wondering how often the metaphlan database is updated or if there is a planned update to be released. I see there is quite a bit of discussion about building an updated or custom database [1][2][3][4] and it seems like there is a pipeline in the works. Is there a planned release for this? If not, would it be possible to open source the existing code?

Thanks,
Kathryn


[1] https://groups.google.com/forum/#!searchin/metaphlan-users/database|sort:date/metaphlan-users/kOwvt8b79xo/yvpXTMpYAQAJ
[2] https://groups.google.com/forum/#!searchin/metaphlan-users/database|sort:date/metaphlan-users/vXQq4Jzga2Q/d4c6ztPLBQAJ
[3] https://groups.google.com/forum/#!searchin/metaphlan-users/custom$20database|sort:date/metaphlan-users/EEcOIkQV_T8/DWx2UXZNBAAJ
[4] https://groups.google.com/forum/#!searchin/metaphlan-users/custom$20database|sort:date/metaphlan-users/Fx3WUZaTmMk/95ul4yCkAgAJ

Francesco Asnicar

Feb 5, 2018, 5:09:36 PM
to kat...@secondgenome.com, MetaPhlAn-users
Hello Kathryn,

Many thanks for your questions. Our recent updates to MetaPhlAn2 work in exactly this direction: to make it easier for us, and for others, to provide custom databases of markers for use in MetaPhlAn2 analyses.

MetaPhlAn2 is released open-source and you can freely view/download/edit the code, have a look at the Bitbucket repository: https://bitbucket.org/biobakery/metaphlan2

There is currently a way for users to provide their own database of markers; however, it is not as automatic (or easy) as we would like it to be. The readme file has a section ("Customizing the database") explaining how you can customize the markers database: https://bitbucket.org/biobakery/metaphlan2#markdown-header-customizing-the-database
Note that the last step of that tutorial asks you to specify the "--mpa_pkl" parameter; with the recent updates you should use the "--index" parameter instead.
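For readers following that readme section, below is a minimal Python sketch of the pickle-editing step only. It assumes, as the field descriptions later in this thread suggest, that the database file is a bz2-compressed pickle holding a dict whose "markers" key maps marker names to per-marker dicts; the helper names (load_db, save_db, add_markers) are hypothetical, not part of MetaPhlAn2. The marker sequences themselves also have to go into the Bowtie2 index, which is not shown here.

```python
import bz2
import pickle


def load_db(path):
    """Load a bz2-compressed pickle (layout assumed to be MetaPhlAn2-style)."""
    with bz2.BZ2File(path, "rb") as fh:
        return pickle.load(fh)


def save_db(db, path):
    """Write the database back as a bz2-compressed pickle.

    Protocol 2 is used so the file stays readable from Python 2,
    which MetaPhlAn2 ran on at the time (an assumption about your setup).
    """
    with bz2.BZ2File(path, "wb") as fh:
        pickle.dump(db, fh, protocol=2)


def add_markers(db, new_markers):
    """Merge new marker entries into db["markers"] without overwriting.

    Assumes db["markers"] maps marker name -> dict of fields
    ("clade", "ext", "len", "score", "taxon"); this is a hypothetical layout.
    """
    markers = db.setdefault("markers", {})
    clashes = sorted(set(new_markers) & set(markers))
    if clashes:
        # Refuse silently replacing an existing marker definition.
        raise ValueError("markers already present: %s" % ", ".join(clashes))
    markers.update(new_markers)
    return db
```

Typical use would be: load a copy of the database with load_db, call add_markers with your new entries, then save_db to a new file name, so the original database is left untouched.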

I hope this answers your questions. Please let me know if you have any other questions or doubts about MetaPhlAn2.

Bests,
Francesco

kat...@secondgenome.com

Feb 16, 2018, 1:02:43 PM
to MetaPhlAn-users
Hi Francesco,

Thank you for the reply. I now understand how to add markers to the database, but I'm interested in making marker files for genomes that are not in the MetaPhlAn database. Is this currently possible? Are there instructions for creating these marker files?

Thank you,
Kathryn

Nicholas Youngblut

Jun 22, 2018, 10:56:23 AM
to MetaPhlAn-users
Regarding editing the taxonomy file, I'm confused by the 'ext' and 'score' fields as they are described in the docs. The description of the 'ext' field says "external genome", but what is an "external" genome? Does that mean "external to the clade of interest", or is it simply a list of all genomes (in or out of the clade) in which the marker is found? And for the 'score' field, is this a BLAST score or something else?

Nicholas Youngblut

Jun 23, 2018, 4:19:09 AM
to MetaPhlAn-users
In addition to "ext" and "score", I'm confused by "clade" and "taxon". For constructing `mpa_v*_m200`, is "clade" the genus and "taxon" the species? Was a different definition used for these terms?

Francesco Asnicar

Jun 23, 2018, 7:30:48 AM
to Nicholas Youngblut, MetaPhlAn-users
Hi,

Here are the instructions for customizing the MetaPhlAn database: https://bitbucket.org/biobakery/metaphlan2/src/default/#markdown-header-customizing-the-database
Below are the answers to your specific questions about the fields of the pickle file.

"ext"
is the set of external genomes in which the marker is also found, where "external" means that these genomes belong to a different clade.

"score"
I have to be honest and say that I don't really know the details of how the score is computed. However, the score should represent how "good" the marker is w.r.t. the other markers for the same clade: it is a number that should reflect how "core" the marker is, while also taking into account its length and possibly other information. So, if you're defining a new set of markers for a new clade not yet present in the MetaPhlAn database, you can just assign the same fixed number to all the markers in your new database.

"clade"
is the name used in the MetaPhlAn output and represents the taxonomic level to which the marker corresponds.

"taxon"
is the full taxonomy of the marker.
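To make these four definitions concrete, here is a hedged Python sketch of a single marker entry for a new clade. It assumes the entry is a plain dict with "clade", "ext", "len", "score" and "taxon" keys (the "len" key and the "|"-separated lineage format are assumptions about the file, not stated above). It derives "clade" as the last level of "taxon", computes "ext" as the genomes carrying the marker that belong to a different clade, and uses one fixed placeholder score, as suggested above for a clade not yet in the database. The function name and all genome/taxon names are hypothetical.

```python
def make_marker_entry(taxon, length, hit_genomes, genome_clade, score=1.0):
    """Build one marker record (hypothetical helper, assumed field layout).

    taxon        -- full lineage, e.g. "k__Bacteria|g__Foo|s__Foo_bar"
    length       -- marker sequence length in nucleotides
    hit_genomes  -- ids of genomes in which the marker sequence was found
    genome_clade -- mapping genome id -> clade name of that genome
    score        -- fixed placeholder score shared by all new markers
    """
    # "clade": the taxonomic level the marker corresponds to, taken here
    # as the last component of the full lineage (the name reported in output).
    clade = taxon.split("|")[-1]
    # "ext": genomes carrying the marker that belong to a *different* clade.
    # Indexing (not .get) makes an unmapped genome raise KeyError instead of
    # being silently counted as external.
    ext = {g for g in hit_genomes if genome_clade[g] != clade}
    return {"clade": clade, "ext": ext, "len": length,
            "score": score, "taxon": taxon}
```

For example, with genomes G1 and G2 assigned to s__Foo_bar and G3 to s__Baz_qux, a marker of s__Foo_bar found in all three genomes gets clade "s__Foo_bar" and ext {"G3"}.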


I hope these details are clear enough and help you in defining your custom markers database.

Many thanks,
Francesco

Nicholas Youngblut

Jun 23, 2018, 10:01:32 AM
to MetaPhlAn-users
Thank you so much for the quick and clear answers! That helps quite a bit. However, I'm concerned that I won't be able to "correctly" add custom markers to the current metaphlan2 database without the code that was used to generate the original marker set. For instance, the original metaphlan paper includes the Methods section "Identification of clade-specific core genes", which outlines a rather complicated algorithm that I believe was the original method for determining the metaphlan marker genes. The procedure seems to have become even more complicated with metaphlan2 and the introduction of "quasi-marker" genes.

Without the original code showing how the markers were created (or a new tool that can do the job), a researcher must recreate this algorithm with all of its intricacies (e.g., determining the "score" correctly) in the same manner as for all of the other markers in the existing database. Otherwise, the added markers will be biased in their selection and in their "score". So it seems to be a big challenge for a researcher to "correctly" add new markers to the marker database without wrongly biasing the database toward or against the newly added markers.