Genes missing from pangenome analysis


Emily St. John

Jun 30, 2022, 4:58:44 PM
to Anvi'o
Hi Anvi'o developers, 
A quick question for you. I've generated a pangenome from 8 genomes, with a total of 14,462 non-partial genes (command "anvi-pan-genome -g pan-GENOMES.db -n pangenome --exclude-partial-gene-calls"). The pangenome run logs reflect the correct number of amino acid sequences. However, when I add a default collection and run anvi-summarize (command: anvi-summarize -p pangenome/pangenome-PAN.db -g pan-GENOMES.db --collection-name default -o summarize), I only recover 14,451 genes. One of these is marked in the logs as non-coding, but I cannot determine a reason the other 10 were excluded. Any thoughts on why this may be happening? I'm running conda-installed Anvi'o 7.1 on macOS Monterey.
Thanks so much!
Emily
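
One way to pin down exactly which gene calls dropped out is to compare the (genome, gene caller ID) pairs that went into the pangenome with those in the tab-delimited summary written by anvi-summarize. The following is a minimal sketch, not an anvi'o tool: the function name `missing_gene_calls` is hypothetical, the column names `genome_name` and `gene_callers_id` are assumptions about the summary table and should be checked against the actual file header, and the toy in-memory table stands in for a real file.

```python
import csv
import io


def missing_gene_calls(expected, summary_tsv,
                       genome_col="genome_name", gene_col="gene_callers_id"):
    """Return (genome, gene_callers_id) pairs in `expected` that are
    absent from the tab-delimited summary table.

    The column names are assumptions; adjust them to the real header.
    """
    reader = csv.DictReader(summary_tsv, delimiter="\t")
    recovered = {(row[genome_col], int(row[gene_col])) for row in reader}
    return sorted(set(expected) - recovered)


# Toy data standing in for the expected gene calls and the summary file.
expected = [("G1", 0), ("G1", 1), ("G2", 0), ("G2", 5)]
summary = io.StringIO(
    "gene_cluster_id\tgenome_name\tgene_callers_id\n"
    "GC_00000001\tG1\t0\n"
    "GC_00000001\tG2\t0\n"
    "GC_00000002\tG1\t1\n"
)

missing = missing_gene_calls(expected, summary)
print(missing)
```

With a real run, `summary` would be an open file handle on the summary table, and `expected` would be built from the external gene calls files after dropping partial and non-coding calls.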

Florian Trigodet

Jun 30, 2022, 6:15:30 PM
to an...@googlegroups.com
Hi Emily,

Non-coding genes include, for instance, ribosomal RNAs and transfer RNAs.
They are not included in a pangenome analysis because the gene clusters are based on amino acid sequences.
Do you think it could be rRNAs or tRNAs you're missing here?

Best,
Florian
--
Anvi'o Paper: https://peerj.com/articles/1319/
Project Page: http://merenlab.org/projects/anvio/
Code Repository: https://github.com/meren/anvio
---
You received this message because you are subscribed to the Google Groups "Anvi'o" group.
To unsubscribe from this group and stop receiving emails from it, send an email to anvio+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/anvio/8dec0296-5a52-4538-a014-191d188a6825n%40googlegroups.com.

Emily St. John

Jun 30, 2022, 6:28:17 PM
to an...@googlegroups.com
Hi Florian, 

Unfortunately, these are coding proteins that are missing. I've checked and all are present in the external-gene-calls files generated from RefSeq, marked as complete and coding, and have amino acid sequences provided. I'm happy to send some specific accession numbers if that would be helpful. 

Thanks so much!
Best,
Emily

Florian Trigodet

Jun 30, 2022, 7:11:27 PM
to an...@googlegroups.com
Hi Emily,

I carefully checked a pangenome I recently made and noticed that one coding gene was also missing from the final summary.
After inspection, it turned out to be a very small gene (21 amino acids). Have you observed something similar with the missing genes?
I wonder if there is a step in the pangenome analysis that discards genes that are too small?
Very intriguing.

Florian

Emily St. John

Jun 30, 2022, 7:24:21 PM
to an...@googlegroups.com
Hi Florian, 

Interesting! My genes are mostly pretty short, ~49-55 amino acids, but two are closer to 100 amino acids. However, I have shorter genes that made it into the pangenome without any issues. I did notice that several (but again, not all) of the genes that are missing are highly repetitive sequences... not sure if that's playing a part? 

Best,
Emily 
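
To check whether the missing genes really are unusually repetitive, one could score each amino acid sequence with a simple complexity measure. The sketch below uses Shannon entropy of the residue composition as a crude proxy. This is an assumption chosen for illustration and is not the masking algorithm the search tool itself applies; the function names, the threshold of 2.5 bits, and the toy sequences are all hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(seq):
    """Shannon entropy (bits per residue) of the residue composition."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def flag_low_complexity(seqs, threshold=2.5):
    """Return names of sequences whose entropy falls below `threshold`.

    The threshold is an arbitrary illustrative value, not a calibrated cutoff.
    """
    return [name for name, seq in seqs.items()
            if seq and shannon_entropy(seq) < threshold]


# Toy sequences: one highly repetitive, one using 20 distinct residues.
seqs = {
    "repeat": "M" + "KA" * 9 + "K",
    "diverse": "MSTNPKLQWERYHGDVCAIF",
}

flagged = flag_low_complexity(seqs)
print(flagged)
```

Comparing the scores of the missing genes with those of similarly short genes that did make it into the pangenome would show whether repetitiveness, rather than length, separates the two groups.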




Florian Trigodet

Jul 8, 2022, 11:58:05 AM
to an...@googlegroups.com
Hi Emily,

We have found that DIAMOND uses a mask for low-complexity sequences (including repeats) and simply discards them without giving the anvi'o user any information.
Hopefully, we can change the default DIAMOND parameters, and the missing genes should be back on the menu!
The associated github issue:
https://github.com/merenlab/anvio/issues/1955

Best,
Florian
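
For anyone running DIAMOND directly, low-complexity masking can be switched off with its `--masking 0` option. The sketch below only assembles the argument list for such a search and does not run anything. It is not how anvi'o invokes DIAMOND internally; the helper name `diamond_blastp_args` and the file names are hypothetical, and whether and how anvi-pan-genome exposes this setting is tracked in the GitHub issue linked above.

```python
def diamond_blastp_args(query_fasta, db_path, out_path, mask_low_complexity=False):
    """Build an argument list for `diamond blastp`.

    When `mask_low_complexity` is False, `--masking 0` is added so that
    low-complexity sequences are not masked before the search.
    """
    args = ["diamond", "blastp",
            "--query", query_fasta,
            "--db", db_path,
            "--out", out_path]
    if not mask_low_complexity:
        args += ["--masking", "0"]
    return args


# Hypothetical file names, for illustration only.
args = diamond_blastp_args("genes.faa", "genes.dmnd", "hits.tsv")
print(" ".join(args))
```

The resulting list could be passed to `subprocess.run(args, check=True)` on a system where DIAMOND is installed.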