ChocoPhlAn database doesn't include virus pangenomes?


jerome.yu...@gmail.com

Oct 13, 2016, 1:02:23 AM
to HUMAnN Users
Hi, dear HUMAnN users and developers,

Recently, I used HUMAnN2 to analyze a metagenomic sample.

Before this, using MetaPhlAn2, I found some virus species that were highly abundant in my sample. I then ran HUMAnN2, hoping to get gene family information about these viruses.

The result was disappointing: nothing about viruses. Out of curiosity, I checked the ChocoPhlAn database and found that it contains only 4187 species, with no viral data. This is interesting, because the marker gene set used by MetaPhlAn2 is drawn from the ChocoPhlAn core gene database. The viral genes can be found in the marker gene set, but they do not exist in ChocoPhlAn.

I need some help with this. Maybe the viral core genes in ChocoPhlAn are in some other place that I failed to find.

Thank you in advance for your consideration and help~

Cheers,
Yunlong.

Eric Franzosa

Oct 13, 2016, 10:25:36 AM
to humann...@googlegroups.com
Hi Yunlong,

Your finding is correct: Viruses are not included with the version of ChocoPhlAn bundled with HUMAnN2, although they _are_ detectable by MetaPhlAn2. Notably, MetaPhlAn2's process for identifying viruses is somewhat different than that employed for cellular microbes (as outlined in Supplementary Note 2 of the MetaPhlAn2 paper).

Constructing/adding viral pangenomes to ChocoPhlAn is definitely on our radar for the next major update to the database. It's also something we could potentially add near-term as a supplement to the existing ChocoPhlAn - I will need to look into it.

Thanks,
Eric


jerome.yu...@gmail.com

Oct 13, 2016, 7:47:10 PM
to HUMAnN Users
On Thursday, October 13, 2016 at 11:25:36 PM UTC+9, Eric Franzosa wrote:
Hi, Eric,

Thank you for your prompt reply~

In addition, I found that the bacteria list in MetaPhlAn2 differs from the one in HUMAnN2; the former is much longer. I think that is why some of my bacteria, which were abundant in the MetaPhlAn2 prescreen step, were always missing. Do you have any comment on this issue?

If possible, I would like your advice on viral metagenomic analysis as a complement to HUMAnN2. Is there any other HUMAnN2-like tool for viruses?

I appreciate your guidance and look forward to your reply~

Thanks a lot,
Yunlong

Eric Franzosa

Oct 15, 2016, 1:41:20 PM
to humann...@googlegroups.com
Hi Yunlong,

HUMAnN2's pangenome search focuses on species with relative abundance above a threshold (0.01% by default). Species below this threshold would not be included in the pangenome search. Are you saying that your sample contained a much more abundant species whose pangenome was not profiled? If so, can you reply with the corresponding line from your MetaPhlAn2 output?
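To check which species would pass this kind of abundance filter, something like the following could be run on a MetaPhlAn2 profile. This is only a rough sketch, not HUMAnN2's actual code: the function name `species_above_threshold` is made up for illustration, and it assumes the usual tab-separated profile layout (clade path in the first column, relative abundance in percent in the second, header lines starting with `#`). Whether the real cutoff is strict or inclusive at exactly 0.01% is also an assumption here.

```python
def species_above_threshold(profile_lines, threshold=0.01):
    """Return {species: abundance} for species-level clades above `threshold` (percent).

    Hypothetical helper for illustration; assumes a tab-separated
    MetaPhlAn2-style profile with '#' header lines.
    """
    selected = {}
    for line in profile_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # not a clade/abundance row
        clade, abundance = fields[0], float(fields[1])
        last_level = clade.split("|")[-1]
        if not last_level.startswith("s__"):
            continue  # keep species-level rows only (skip k__, g__, t__, ...)
        if abundance > threshold:
            selected[last_level] = abundance
    return selected


# Made-up example rows, not real data
example_profile = [
    "#SampleID\tMetaphlan2_Analysis",
    "k__Bacteria\t99.5",
    "k__Bacteria|g__Pseudomonas|s__Pseudomonas_unclassified\t12.3",
    "k__Bacteria|g__Escherichia|s__Escherichia_coli\t0.005",
    "k__Viruses|g__Examplevirus|s__Example_virus\t0.5",
]
passing = species_above_threshold(example_profile)
```

Note that passing this filter does not guarantee a pangenome search: a species also needs a pangenome in ChocoPhlAn, which viruses currently do not have.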

We are looking into adding the viral pangenomes to HUMAnN2. Because they tend to contain only a few genes, and likely no metabolic pathways, I do not expect them to add much to HUMAnN2's main outputs (but good to have them for consistency!). Because viral genomes won't have as much gene content-level plasticity as genomes of cellular microbes, you could probably get a good sense of the genes a given virus is contributing by looking up the virus itself in UniProt.

Thanks,
Eric


Yunlong JIA

Oct 16, 2016, 10:38:39 PM
to HUMAnN Users
I rechecked the list in the .log file. Only g__Pseudomonas.s__Pseudomonas_unclassified was excluded from the custom ChocoPhlAn database. Since it is unclassified, it was not included, even though it was abundant enough. I was mistaken about this.

Thank you for all your responsive and friendly help. My problems have been solved~
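For anyone checking the same thing, a small sketch like this can separate the "unclassified" entries from the named species in a list of species names taken from the log. The function name `split_unclassified` is hypothetical, and the rule (a name ending in `_unclassified` has no pangenome to search) is my reading of what happened in my own run, not a documented HUMAnN2 behavior.

```python
def split_unclassified(species_names):
    """Split names into (named, unclassified) lists.

    Hypothetical helper: treats any name ending in '_unclassified'
    as a clade without a species-level pangenome.
    """
    named, unclassified = [], []
    for name in species_names:
        if name.endswith("_unclassified"):
            unclassified.append(name)
        else:
            named.append(name)
    return named, unclassified


# Example names in the 'g__Genus.s__Species' form seen in the log;
# the second entry is made up for illustration.
names = [
    "g__Pseudomonas.s__Pseudomonas_unclassified",
    "g__Escherichia.s__Escherichia_coli",
]
named, unclassified = split_unclassified(names)
```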

Thanks,
Yunlong 


On Sunday, October 16, 2016 at 2:41:20 AM UTC+9, Eric Franzosa wrote: