Guillaume Cailleau
Nov 23, 2022, 4:26:56 AM
to Anvi'o
My question concerns my conceptual understanding of how anvi'o works (I am quite new to it). Since no question is a bad question, here is my rationale.
My database includes 3 whole genomes that I am comparing. In a bin exclusive to 2 of these genomes, I was expecting to find the same functions, and therefore the same potential metabolic capabilities.
I dug into the "module.txt" outputs and used the genes FASTA file (the output of anvi-get-sequences-for-gene-clusters) as a "Rosetta stone" that maps gene cluster (GC) names to gene caller IDs.
I targeted some GCs using the anvi'o interactive interface.
I checked for the presence of their gene caller IDs in my "Rosetta stone", to be sure that my genomes for the bin of interest could provide further hits for metabolism.
Most of the results are consistent, but sometimes a discrepancy occurs: for a given GC, one genome has a hit (e.g. "Leucine degradation, leucine => acetoacetate + acetyl-CoA") while the other genome does not.
Could this be due to sequence variability? Could the clustering tolerate that variability (and group the genes together) while anvi-estimate-metabolism does not?
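To make the cross-check concrete, here is a minimal sketch of the comparison I am doing by hand. It assumes the GC membership and the module hits have already been parsed into plain tuples; the function name `find_discrepancies` and the tuple layouts are hypothetical and do not reflect the actual column layout of any anvi'o output file.

```python
from collections import defaultdict


def find_discrepancies(gc_members, module_hits):
    """Report gene clusters where a module hit exists in some genomes only.

    gc_members:  iterable of (gene_cluster, genome, gene_caller_id)   [assumed layout]
    module_hits: iterable of (genome, gene_caller_id, module_name)    [assumed layout]
    """
    # Gene caller IDs are only unique within a genome, so key on both.
    hits = defaultdict(set)
    for genome, gene_caller_id, module in module_hits:
        hits[(genome, gene_caller_id)].add(module)

    genomes_in_gc = defaultdict(set)
    modules_by_gc_genome = defaultdict(set)
    for gc, genome, gene_caller_id in gc_members:
        genomes_in_gc[gc].add(genome)
        modules_by_gc_genome[(gc, genome)] |= hits.get((genome, gene_caller_id), set())

    report = {}
    for gc, genomes in genomes_in_gc.items():
        # Every module seen for this GC in at least one genome.
        all_modules = set().union(*(modules_by_gc_genome[(gc, g)] for g in genomes))
        for module in all_modules:
            with_hit = sorted(g for g in genomes if module in modules_by_gc_genome[(gc, g)])
            without_hit = sorted(g for g in genomes if module not in modules_by_gc_genome[(gc, g)])
            if without_hit:
                report.setdefault(gc, {})[module] = {
                    "with_hit": with_hit,
                    "without_hit": without_hit,
                }
    return report


# Made-up example data: two GCs shared by genomes "A" and "B".
example_gc_members = [
    ("GC_0001", "A", 10), ("GC_0001", "B", 22),
    ("GC_0002", "A", 11), ("GC_0002", "B", 23),
]
example_module_hits = [
    ("A", 10, "Leucine degradation"),
    ("A", 11, "Glycolysis"), ("B", 23, "Glycolysis"),
]
example_report = find_discrepancies(example_gc_members, example_module_hits)
```

With this made-up input, only GC_0001 is reported, because its member from genome "A" carries a module hit and its member from genome "B" does not. That is the kind of case I am asking about.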
PS: this question is not about whether a module is complete or not, because I checked the whole output.
I hope my question makes sense to you.
Cheers
Guillaume