Shouldn't all gene caller IDs from a GC have positive hits after an anvi-estimate-metabolism run?

Guillaume Cailleau

Nov 23, 2022, 4:26:56 AM
to Anvi'o
My question relates to my conceptual understanding of how anvi'o works (I am quite new to it). Since no question is a bad question, here is my rationale.

My database includes 3 whole genomes that I am comparing. In a bin exclusive to 2 of these genomes, I expected to find the same functions, and therefore the same potential metabolic capabilities.

I dug into the "module.txt" outputs and used the genes FASTA file (output of anvi-get-sequences-for-gene-clusters) as a "Rosetta stone" (the correspondence between GC names and gene caller IDs).

I picked some GCs in the anvi'o interactive interface.
I checked that their gene caller IDs were present in my "Rosetta stone", to be sure that the genomes in my bin of interest could yield hits for metabolism.
Most of the results are consistent, but sometimes there is a discrepancy: for a given GC, one genome has a hit (e.g. "Leucine degradation, leucine => acetoacetate + acetyl-CoA") while the other genome does not.
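
For anyone who wants to automate this check, here is a minimal Python sketch of the "Rosetta stone" comparison described above. It is only an illustration under assumptions: it assumes the FASTA deflines carry pipe-separated `key:value` fields named `gene_cluster`, `genome_name` and `gene_callers_id` (please verify against your own file), and it assumes you have already collected, from the metabolism output, the set of (genome, gene caller ID) pairs that received a hit. The function names are hypothetical and not part of anvi'o. Gene caller IDs are paired with the genome name because the IDs are only unique within a single contigs database.

```python
from collections import defaultdict


def parse_defline(defline):
    """Split '>id|key:value|key:value' into a dict (assumed defline layout)."""
    fields = defline.lstrip(">").strip().split("|")
    return dict(f.split(":", 1) for f in fields[1:] if ":" in f)


def build_rosetta_stone(fasta_lines):
    """Map each gene cluster name to its set of (genome_name, gene_callers_id)."""
    clusters = defaultdict(set)
    for line in fasta_lines:
        if line.startswith(">"):
            info = parse_defline(line)
            gene = (info["genome_name"], int(info["gene_callers_id"]))
            clusters[info["gene_cluster"]].add(gene)
    return clusters


def partially_annotated_clusters(clusters, annotated):
    """Return GCs in which some, but not all, genes have a metabolism hit."""
    result = {}
    for gc, genes in clusters.items():
        hits = genes & annotated
        if hits and hits != genes:
            result[gc] = (sorted(hits), sorted(genes - hits))
    return result


# Toy data (made up for illustration, not real anvi'o output).
fasta_lines = [
    ">0|gene_cluster:GC_00000001|genome_name:genome_A|gene_callers_id:10", "ATG",
    ">1|gene_cluster:GC_00000001|genome_name:genome_B|gene_callers_id:20", "ATG",
    ">2|gene_cluster:GC_00000002|genome_name:genome_A|gene_callers_id:11", "ATG",
    ">3|gene_cluster:GC_00000002|genome_name:genome_B|gene_callers_id:21", "ATG",
]
# Genes that showed up with a hit in the metabolism output (hypothetical).
annotated = {("genome_A", 10), ("genome_A", 11), ("genome_B", 21)}

result = partially_annotated_clusters(build_rosetta_stone(fasta_lines), annotated)
for gc, (with_hit, without_hit) in result.items():
    print(gc, "hit:", with_hit, "no hit:", without_hit)
```

With the toy data, only GC_00000001 is reported, since genome_A's gene has a hit and genome_B's gene does not. These are the cases worth inspecting by hand.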

Could this be due to sequence variability? Could the clustering tolerate it (and group the genes together) while anvi-estimate-metabolism does not?

PS: this question is not about whether a module is complete or not, because I checked the whole output.

I hope my question makes sense to you.
Cheers
Guillaume

A. Murat Eren (Meren)

Nov 23, 2022, 4:35:09 AM
to an...@googlegroups.com
Hey Guillaume,

You're wondering why some genes in the same gene cluster are missing functional annotations that are shared by other genes in the same GC.

This is unfortunately due to variations in gene sequences confusing HMMs in edge cases. The pangenome is extremely powerful for recovering those associations, but even then I find it confusing that sequence features sufficient to bring some genes together are not equally accessible to an HMM, since the latter offers a more comprehensive model for recognizing similarities than the conventional alignments that BLAST or DIAMOND use.

In summary, it is about the sensitivity of HMMs from KEGG :/ I hope this helps.


Best wishes,
--
A. Murat Eren (Meren) | he/him


--
Anvi'o Paper: https://peerj.com/articles/1319/
Project Page: http://merenlab.org/projects/anvio/
Code Repository: https://github.com/meren/anvio

Guillaume Cailleau

Nov 23, 2022, 4:56:32 AM
to Anvi'o
Hi Meren,
Thank you for these explanations, which confirm my guess but with more accurate wording.
Many thanks for this nice tool you (all) are developing...
Best regards,
Guillaume