metaphlan strainer VS humann2's strain_profiler.py VS panphlan


Ming Liao

May 3, 2016, 12:36:27 PM
to MetaPhlAn-users
Hello, all

I have tried PanPhlAn before; it can also compare the studied samples against reference genomes at the strain level. The strainer in MetaPhlAn seems to do the same thing. Does the MetaPhlAn strainer have any advantages over PanPhlAn?

Moreover, HUMAnN2 has strain_profiler.py, which can provide the gene-family abundance for each species. I think HUMAnN2 focuses on functional annotation of the species' genes, which would provide more evidence for either metaphlan_strainer or panphlan_strainer. So I am wondering how to compare the results from metaphlan_strainer and panphlan_strainer.


Thanks,

Ming

Duy Tin Truong

May 3, 2016, 1:16:50 PM
to Ming Liao, MetaPhlAn-users
Hi Ming,

metaphlan_strainer provides the multiple sequence alignment of all strains reconstructed directly from metagenomic samples. 
Meanwhile, panphlan produces the gene family presence/absence in the pan genome of all samples.

Cheers,
Tin


Nicola Segata

May 3, 2016, 1:57:41 PM
to Duy Tin Truong, Ming Liao, MetaPhlAn-users
Hi Ming,
 as Tin said, PanPhlAn and MetaPhlAn strainer are based on two different principles. PanPhlAn profiles strains by looking at which genes they have, whereas MetaPhlAn strainer looks at the sequence variations within the marker genes of the species. These two aspects have a different biological meaning and can be complementary.
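
To make the distinction concrete, here is a toy sketch (not what either tool actually computes). It assumes two hypothetical strains described by a set of gene-family IDs and one aligned marker sequence, and uses Jaccard distance and the fraction of mismatching positions as stand-in measures; all names and values are made up for illustration.

```python
def gene_content_distance(genes_a, genes_b):
    """Jaccard distance between two sets of gene families (illustrative measure)."""
    union = genes_a | genes_b
    if not union:
        return 0.0
    return 1.0 - len(genes_a & genes_b) / len(union)


def marker_snv_distance(seq_a, seq_b):
    """Fraction of differing positions between two aligned marker sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        return 0.0
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


# Hypothetical strains: identical gene content, different marker sequence.
strain_1 = {"genes": {"g1", "g2", "g3"}, "marker": "ACGTACGTAC"}
strain_2 = {"genes": {"g1", "g2", "g3"}, "marker": "ACGTTCGTAA"}

gene_dist = gene_content_distance(strain_1["genes"], strain_2["genes"])
snv_dist = marker_snv_distance(strain_1["marker"], strain_2["marker"])
print("gene-content distance:", gene_dist)
print("marker SNV distance:", snv_dist)
```

In this toy case the two strains look identical by gene content but differ in their marker sequences, which is why the two views can complement each other.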

cheers
Nicola

Ming Liao

May 4, 2016, 11:34:41 AM
to MetaPhlAn-users, duytin...@gmail.com, liaom...@gmail.com, nicola...@unitn.it
Thanks, Nicola

I think you made the difference between PanPhlAn and MetaPhlAn very clear. What about humann2?
I have tried PanPhlAn before. It can output the gene sequences for a specific species, which can be mapped to KEGG to get the gene names. In the end, I know which genes are absent or present across all the samples.

Meanwhile, humann2's strain_profiler can do the same job. It outputs UniRef IDs, which can also be mapped to KEGG to get the gene names. In the end, I know the difference in gene abundance between cases and controls.

In terms of humann2's output from strain_profiler, I can re-code the gene abundance in the following way:

gene abundance > 0: the gene is present
gene abundance = 0: the gene is absent

So is this coding considered equivalent to the way PanPhlAn codes presence and absence?

I am wondering how to use these two similar outputs from humann2's strain_profiler and PanPhlAn. Thanks

Best,


Ming

Matthias Scholz

May 4, 2016, 12:32:49 PM
to Ming Liao, MetaPhlAn-users, Duy Tin Truong, Nicola Segata
Hi Ming,

Exactly zero is not a good threshold for considering a gene absent. Absent genes show abundance values that are very low but not always exactly zero, because reads can map incorrectly when a short sequence region is similar to other genes. PanPhlAn instead uses a threshold relative to the expected strain abundance, which is identified from the plateau in the coverage curves. "If a sample passes the strain detection filter criteria, gene family coverage values are converted into final presence/absence profiles based on a coverage depth threshold >0.5× times the mean coverage." (Online Methods)
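
As a rough sketch of the difference between the two codings (this is not PanPhlAn's implementation): the function names, the `strain_coverage` parameter, and the coverage values below are all hypothetical. The sketch assumes the expected strain coverage is already known and passed in by the caller; it does not reproduce the plateau detection or the strain detection filter, and it assumes the input values are coverage-like, which may not hold directly for HUMAnN2 abundances.

```python
def naive_presence(abundance):
    """Naive coding: any non-zero abundance counts as present."""
    return {gene: value > 0 for gene, value in abundance.items()}


def thresholded_presence(coverage, strain_coverage, min_fraction=0.5):
    """Call a gene family present only if its coverage exceeds
    min_fraction * strain_coverage (strain_coverage is supplied by the caller)."""
    if strain_coverage <= 0:
        raise ValueError("strain_coverage must be positive")
    cutoff = min_fraction * strain_coverage
    return {gene: value > cutoff for gene, value in coverage.items()}


# Made-up coverage values for illustration only.
coverage = {"geneA": 10.2, "geneB": 9.1, "geneC": 0.3, "geneD": 0.0}

naive = naive_presence(coverage)
thresholded = thresholded_presence(coverage, strain_coverage=10.0)
print("naive:      ", naive)
print("thresholded:", thresholded)
```

Here geneC, with a small non-zero coverage of the kind that spurious read mappings can produce, is called present by the naive coding but absent once the threshold is tied to the strain coverage.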

We are currently working on a PanPhlAn feature to convert HUMAnN2 gene abundances into presence/absence profiles.

cheers,
  Matthias


