Pangenomics error


Claudia Petrillo

Nov 30, 2020, 10:04:01 AM
to Anvi'o

Hello Anvi'o team,

I tried to run a pangenomics workflow with the command below:

$ anvi-pan-genome -g BACILLI-GENOMES.db --project-name new_bacilli_genomes --output-dir test --num-threads 4 --minbit 0.5 --mcl-inflation 10 --use-ncbi-blast

and got the following error message:

It seems you have 29,331 gene clusters in your pangenome. This exceeds the soft limit of 20,000 for anvi'o to attempt to create a hierarchical clustering of your gene clusters (which becomes the center tree in all anvi'o displays). If you want a hierarchical clustering to be done anyway, please see the flag `--enforce-hierarchical-clustering`.

This is the version used:

$ anvi-self-test --version

Anvi'o version ...............................: esther (v6.2)
Profile DB version ...........................: 31
Contigs DB version ...........................: 14
Pan DB version ...............................: 13
Genome data storage version ..................: 6
Auxiliary data storage version ...............: 2
Structure DB version .........................: 1

I ran the pangenomics analysis with 98 Bacillus strains, which should be very closely related. Even so, anvi'o always finds more than 20,000 gene clusters and therefore does not include the hierarchical tree in the center. When I run the analysis with only ten of the 98 strains, the tree appears. I tried different combinations of strains to see whether the problem lies in some of the sequences, but whenever I run the workflow with just a few strains, it works (the tree is there).

Can you please help me figure this out?

Thank you in advance.


A. Murat Eren

Nov 30, 2020, 11:17:08 AM
to Anvi'o
Hi Claudia,

Given the way anvi'o generates and visualizes pangenomes, ~30,000 gene clusters is a bit too much to work with. I would rerun anvi-pan-genome with this parameter added to your command so that singletons are removed from the analysis:

--min-occurrence 2
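
For reference, here is a sketch of the command from the original post with that parameter appended. Every flag is copied from this thread; the `DRY_RUN` variable is a hypothetical convenience added here (it is not an anvi'o feature), so that by default the command is only printed for review rather than launched.

```shell
# Sketch only: flags are copied from the thread, nothing else is added to the call.
# DRY_RUN is a hypothetical switch for this example; set DRY_RUN=0 to really run it.
DRY_RUN="${DRY_RUN:-1}"

# Store the full command as positional parameters so it can be printed or executed.
set -- anvi-pan-genome \
    -g BACILLI-GENOMES.db \
    --project-name new_bacilli_genomes \
    --output-dir test \
    --num-threads 4 \
    --minbit 0.5 \
    --mcl-inflation 10 \
    --use-ncbi-blast \
    --min-occurrence 2

if [ "$DRY_RUN" = "1" ]; then
    # Print the command on one line without running it.
    echo "$*"
else
    "$@"
fi
```

With `--min-occurrence 2`, gene clusters that occur in only one genome are left out, which should lower the gene cluster count; whether it drops below the 20,000 soft limit depends on the data.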

Best wishes,
--

A. Murat Eren (Meren) | he/him
http://merenlab.org :: twitter



Claudia Petrillo

Dec 3, 2020, 12:45:08 PM
to an...@googlegroups.com
Hello Meren,

Thanks for the advice.
I ran the analysis with this command:

anvi-pan-genome -g BACILLI-GENOMES.db --project-name new_bacilli_genomes --output-dir test  --num-threads 4  --minbit 0.5  --mcl-inflation 10  --use-ncbi-blast --min-occurrence 2 --enforce-hierarchical-clustering

This is the result:
[Attached image: image.png]

Do you think this is good, or do you notice anything weird?
Is it normal to have so many clusters or did I make a mistake? 

Thanks for your help

A. Murat Eren

Dec 3, 2020, 3:35:27 PM
to Anvi'o
It looks good to me, although I think you should go to the Layers tab and select gene cluster frequencies as the 'order' to organize your genomes, so that the patterns are more obvious.
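
To reach the Layers tab, the pangenome has to be open in the interactive interface. A minimal sketch, assuming the run from this thread wrote its pan database to `test/new_bacilli_genomes-PAN.db` (that file name is inferred from `--project-name` and `--output-dir` and is not stated anywhere in the thread):

```shell
# Assumed paths: the genomes storage named in the thread, and a pan database
# named <project-name>-PAN.db inside the output directory (an assumption).
GENOMES_DB="BACILLI-GENOMES.db"
PAN_DB="test/new_bacilli_genomes-PAN.db"

if [ -f "$PAN_DB" ] && [ -f "$GENOMES_DB" ]; then
    # Open the pangenome in the anvi'o interactive interface.
    anvi-display-pan -p "$PAN_DB" -g "$GENOMES_DB"
else
    # Do not guess further; ask the user to point at the right files.
    echo "Expected $PAN_DB and $GENOMES_DB; adjust the paths to match your run." >&2
fi
```

The ordering itself is then chosen inside the interface, under the Layers tab.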


Best wishes,
--

A. Murat Eren (Meren) | he/him
http://merenlab.org :: twitter
