EPA-ng use case?


Narfusala

Aug 26, 2023, 9:23:48 AM
to Phylogenetic Placement
Good day,

My name is Lesley.
We recently submitted a paper in which we presented some heavily degraded bacterial genome captures (with metagenomic background noise), and one of the reviewers suggested using EPA-ng for its taxonomic placement. However, after a couple of days of reading, I'm not sure EPA-ng is intended for capture/shotgun data (especially when there is a metagenomic aspect to it).

From reading the GitHub page and the manual, the example of how to generate the query MSA talks about using .fasta files, which suggests that the input should already be assembled regions (genes, 16S, 18S, etc.) and not just shotgun/whole-genome data. I think it is also mentioned a couple of times that the method is mostly intended for specific regions.

Additionally, the genome is ~4 Mb, which kills the EPA software as soon as you start it up, so whole-genome alignments are too much. I also don't think it makes much sense to generate SNP data for heavily degraded samples, since you'll probably get a lot of N's and maybe background noise in your data. Plus, it defeats the purpose of using software to place the results on a phylogenetic tree, since you can already do so by just running the phylogenetic tree software.

Am I right in assuming that EPA-ng is not intended for this type of data, but is instead mostly intended for sequenced amplicon data or assembly data, for example?

Thank you in advance for any help you can offer on this topic.

Cheers,
Lesley

P.S. I cross-posted this on Gitter as well, but I don't see any activity there at all, so I'm not sure anyone actually uses it.

Alexandros Stamatakis

Aug 28, 2023, 2:48:55 AM
to phylogeneti...@googlegroups.com
Dear Lesley,

That is correct. There has been one attempt to do this with metagenomic
data:

https://www.nature.com/articles/nmeth.2693

but I was personally not too happy with what we did there placement-wise.

It is really difficult to extend placement beyond the single-gene case
due to gene tree/species tree discordance; we have not yet found a good
solution for this.

You may also want to have a look at the placement review paper we
recently published:

https://www.frontiersin.org/articles/10.3389/fbinf.2022.871393/full

Hope this helps,

Alexis


--
Alexandros (Alexis) Stamatakis

ERA Chair, Institute of Computer Science, Foundation for Research and
Technology - Hellas
Research Group Leader, Heidelberg Institute for Theoretical Studies
Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology

www.biocomp.gr (Crete lab)
www.exelixis-lab.org (Heidelberg lab)