Which database should I download for metaphlan2?


Nick

Sep 4, 2019, 11:07:50 PM
to MetaPhlAn-users
On the metaphlan2 main page description: 
"...the marker information file mpa_v20_m200_marker_info.txt.bz2 can be found in the Download page here"

However, on the download page, the latest database is: 
mpa_v29_CHOCOPhlAn_201901_marker_info.txt.bz2
But then I see that "ChocoPhlAn is a proprietary database used with HUMAnN2." So why is this database being used with metaphlan2?


Which one should I use? What's the difference between the two?


Francesco Beghini

Sep 9, 2019, 5:37:53 PM
to Nick, MetaPhlAn-users
Hi Nick,
the mpa_v20 database is, for now, the only database compatible with HUMAnN2. We have also provided an updated, but not final, version of the database called mpa_v29, which currently works only with MetaPhlAn2. mpa_v29 contains a new set of markers extracted from an expanded set of reference genomes.
As mentioned, the database is not final: we are still refining it and running further benchmarks.

Francesco

Francesco Beghini
PhD Student - Laboratory of Computational Metagenomics
Department of Cellular, Computational and Integrative Biology - CIBIO
University of Trento


--
You received this message because you are subscribed to the Google Groups "MetaPhlAn-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to metaphlan-use...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/metaphlan-users/fcb25dec-e7ef-4c5a-88b0-9c285ab2fe31%40googlegroups.com.

Monica Ticlla

Nov 8, 2019, 12:31:14 PM
to MetaPhlAn-users
Hi Francesco,

A related question, or rather an issue:

I ran metaphlan2.py with mpa_v20 successfully, but when I asked metaphlan2.py to use mpa_v29 by passing --index v294_CHOCOPhlAn_201901, I got the following error:

Help message for read_fastx.py
Traceback (most recent call last):
  File "/scif/apps/MetaPhlAn2/metaphlan2/metaphlan2.py", line 1565, in <module>
    metaphlan2()
  File "/scif/apps/MetaPhlAn2/metaphlan2/metaphlan2.py", line 1457, in metaphlan2
    tree = TaxTree( mpa_pkl, ignore_markers )
  File "/scif/apps/MetaPhlAn2/metaphlan2/metaphlan2.py", line 1134, in __init__
    add_lens( self.root )
  File "/scif/apps/MetaPhlAn2/metaphlan2/metaphlan2.py", line 1131, in add_lens
    lens.append( add_lens( c ) )
  File "/scif/apps/MetaPhlAn2/metaphlan2/metaphlan2.py", line 1131, in add_lens
    lens.append( add_lens( c ) )
  File "/scif/apps/MetaPhlAn2/metaphlan2/metaphlan2.py", line 1131, in add_lens
    lens.append( add_lens( c ) )
  [Previous line repeated 4 more times]
  File "/scif/apps/MetaPhlAn2/metaphlan2/metaphlan2.py", line 1132, in add_lens
    node.glen = sum(lens) / len(lens)
TypeError: unsupported operand type(s) for +: 'int' and 'tuple'
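
The traceback above ends in sum(lens) failing on an int and a tuple. As a minimal sketch of how that can happen (this is not the MetaPhlAn2 source; the Node class, the build helper, and the assumption that one leaf carries a tuple instead of a plain genome length are all hypothetical), the same TypeError appears as soon as a single tuple enters the list being averaged:

```python
class Node:
    """Hypothetical stand-in for a taxonomy tree node; not MetaPhlAn2's class."""

    def __init__(self, glen=None):
        self.children = []
        self.glen = glen


def add_lens(node):
    # Leaves return their stored value; internal nodes average their children,
    # mirroring the two statements shown in the traceback.
    if not node.children:
        return node.glen
    lens = []
    for c in node.children:
        lens.append(add_lens(c))
    node.glen = sum(lens) / len(lens)
    return node.glen


def build(leaf_values):
    # Hypothetical helper: one root with one leaf per value.
    root = Node()
    root.children = [Node(v) for v in leaf_values]
    return root


# Plain integer lengths average without trouble.
ok = add_lens(build([549, 371]))

# ASSUMPTION for illustration: a leaf holds a tuple rather than an int.
try:
    add_lens(build([549, (1774, 371)]))
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(ok)
print(error_message)
```

If this is what is going on, the database file would be storing values in a shape the installed script does not expect, which would point to a version mismatch rather than a problem with the reads.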


I checked the bowtie2out and samout files and both were created. Here are the first five lines of each file:

D00535:105:CD1MBANXX:3:1102:15530:83073_1:N:0:AGGCAGAA+GAGCCTTA	329__B2UJE8__OR214_02957
D00535:105:CD1MBANXX:3:1307:11308:81771_1:N:0:AGGCAGAA+GAGCCTTA	329__B2UJE8__OR214_02957
D00535:105:CD1MBANXX:3:1101:3257:2186_1:N:0:AGGCAGAA+GAGCCTTA	329__A0A0B1ZCK1__BEK68_22290
D00535:105:CD1MBANXX:3:1101:9153:30973_1:N:0:AGGCAGAA+GAGCCTTA	305__F6G7E3__KR96_18365
D00535:105:CD1MBANXX:3:1205:13241:66934_1:N:0:AGGCAGAA+GAGCCTTA	305__F6G7E3__KR96_18365

@HD	VN:1.0	SO:unsorted
@SQ	SN:1774__A0A145SSN4__BKG84_01780	LN:549
@SQ	SN:1774__A0A1S1KP69__BKG84_29780	LN:371
@SQ	SN:1774__A0A0E3TQG9__BKG84_03055	LN:489
@SQ	SN:1774__A0A0E3XU86__BKG84_13495	LN:480

but the output with the relative abundances is empty.
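
One quick way to confirm that the mapping step produced hits, independently of the profiling step, is to tally reads per marker in the bowtie2out file. The sketch below assumes the two-column, tab-separated layout visible in the lines quoted above (read ID, then marker name); count_marker_hits is a hypothetical helper, not part of MetaPhlAn2, and the sample data are just the five lines from this post:

```python
from collections import Counter

# The five bowtie2out lines quoted in the post: read ID, a tab, then the marker name.
SAMPLE = (
    "D00535:105:CD1MBANXX:3:1102:15530:83073_1:N:0:AGGCAGAA+GAGCCTTA\t329__B2UJE8__OR214_02957\n"
    "D00535:105:CD1MBANXX:3:1307:11308:81771_1:N:0:AGGCAGAA+GAGCCTTA\t329__B2UJE8__OR214_02957\n"
    "D00535:105:CD1MBANXX:3:1101:3257:2186_1:N:0:AGGCAGAA+GAGCCTTA\t329__A0A0B1ZCK1__BEK68_22290\n"
    "D00535:105:CD1MBANXX:3:1101:9153:30973_1:N:0:AGGCAGAA+GAGCCTTA\t305__F6G7E3__KR96_18365\n"
    "D00535:105:CD1MBANXX:3:1205:13241:66934_1:N:0:AGGCAGAA+GAGCCTTA\t305__F6G7E3__KR96_18365\n"
)


def count_marker_hits(lines):
    """Count how many reads map to each marker (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Split on the last tab so the marker name is always the final field.
        _read_id, marker = line.rsplit("\t", 1)
        counts[marker] += 1
    return counts


hits = count_marker_hits(SAMPLE.splitlines())
for marker, n in hits.most_common():
    print(marker, n)
```

For a real run, the same function can be fed an open file handle instead of SAMPLE.splitlines(). Non-zero counts here alongside an empty abundance table suggest the failure is downstream of the alignment.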

Any clue why this is happening?

Thanks,

Monica

Francesco Beghini

Nov 15, 2019, 10:15:17 AM
to Monica Ticlla, MetaPhlAn-users
Hi Monica,
from the error, it seems that you are trying to use the latest database with MetaPhlAn 2.7 or older. mpa_v294_CHOCOPhlAn_201901 is only compatible with MetaPhlAn2 2.9. You can get this version from here https://bitbucket.org/biobakery/metaphlan2/src/2.9/
However, this version is not final and is still in testing.
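
The compatibility rule stated in this reply can be written down as a small pre-flight check to run before launching a job. This is only a sketch under stated assumptions: index_supported is a hypothetical helper, not a MetaPhlAn2 function; it assumes that the index name passed on the command line (v294_CHOCOPhlAn_201901) and the database name used here (mpa_v294_CHOCOPhlAn_201901) refer to the same database; and it encodes only the one rule given in this thread, returning None for anything the thread does not cover:

```python
def index_supported(index_name, metaphlan_version):
    """Return True/False for mpa_v29x indexes, None when the thread gives no rule.

    Hypothetical helper encoding a single statement from this thread:
    the mpa_v29x databases are only compatible with MetaPhlAn2 2.9.
    """
    # Compare on major.minor only, e.g. "2.7.7" -> (2, 7).
    major_minor = tuple(int(part) for part in metaphlan_version.split(".")[:2])

    # ASSUMPTION: "v294_..." and "mpa_v294_..." name the same database.
    name = index_name if index_name.startswith("mpa_") else "mpa_" + index_name

    if name.startswith("mpa_v29"):
        return major_minor == (2, 9)
    return None  # no compatibility statement for this index in the thread


print(index_supported("v294_CHOCOPhlAn_201901", "2.7.7"))
print(index_supported("mpa_v294_CHOCOPhlAn_201901", "2.9"))
print(index_supported("mpa_v20_m200", "2.7"))
```

Returning None rather than True for other indexes keeps the check from claiming more than was actually said here.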

Best,

Francesco Beghini
PhD Student - Laboratory of Computational Metagenomics
Department of Cellular, Computational and Integrative Biology - CIBIO
University of Trento
