Using external gene calls in anvi-gen-genomes-storage / pagenomics

221 views
Skip to first unread message

Jessica Jarett

unread,
Nov 8, 2017, 9:44:51 PM11/8/17
to Anvi'o
Hey all,

I have some genomes from IMG that I want to look at with the pangenome workflow. As I'd like the additional information available in IMG, I am using external gene calls and importing pfam annotations from IMG. Contigs dbs and profiles are fine, gene calls and annotations are visible when inspecting splits, etc. But anvi-gen-genomes-storage refuses to use these gene calls (with or without --gene-caller prodigal in the command). Does anvio expect the "source" column in the --external-gene-calls file to say exactly "prodigal" in order for these to be compatible? Is there some other secret to make this work or am I trying to do something which is not possible? Thanks!

Cheers
Jess


122:anvio_IMG_nano_SAGs jkjarett$ anvi-gen-genomes-storage -i Nano_bins_list_2017111.txt -o Nano_bins-GENOMES.h5 --gene-caller prodigal

WARNING
===============================================
Good news! Anvi'o found all these functions that are common to all of your
genomes and will use them for downstream analyses and is very proud of you:
'COG_FUNCTION, Pfam, COG_CATEGORY'.

Internal genomes .............................: 22 have been initialized.                                                                                                                                        
External genomes .............................: 0 found.                                                                                                                                                         

WARNING
===============================================
PLEASE READ CAREFULLY. Some of your genomes had gene calls identified by gene
callers other than the gene caller anvi'o used (which should be 'prodigal'
unless you specified another one). As a result, the following genomes contained
gene calls coming from other gene callers that did not get processed. This may
be exactly what you expected to happen, but if was not, you may need to use the
`--gene-caller` flag to make sure anvi'o is using the gene caller it should be
using. Here is the list: AB_777_F03_Nano (523 gene calls by "img_core_v400", 2
gene calls by "Ribosomal_RNAs"), AB_777_O03_Nano (654 gene calls by
"img_core_v400", 1 gene calls by "Ribosomal_RNAs"), AD_903_B02_Nano (167 gene
calls by "Prodigal V2.6.3 February, 2016 + prokaryotic GeneMark.hmm version
2.8", 15 gene calls by "Prodigal V2.6.3 February, 2016", 2 gene calls by
"prokaryotic GeneMark.hmm version 2.8"), ....


122:anvio_IMG_nano_SAGs jkjarett$ anvi-interactive --version
Anvi'o version ...............................: 3
Profile DB version ...........................: 20
Contigs DB version ...........................: 9
Pan DB version ...............................: 5
Samples information DB version ...............: 2
Genome data storage version ..................: 1
Auxiliary data storage version ...............: 4
Anvi'server users data storage version .......: 1

A. Murat Eren

unread,
Nov 11, 2017, 4:40:52 PM11/11/17
to Anvi'o
Hi Jess,

Sorry for the late response. Looking at the warning message:

WARNING
===============================================
PLEASE READ CAREFULLY. Some of your genomes had gene calls identified by gene
callers other than the gene caller anvi'o used (which should be 'prodigal'
unless you specified another one). As a result, the following genomes contained
gene calls coming from other gene callers that did not get processed. This may
be exactly what you expected to happen, but if was not, you may need to use the
`--gene-caller` flag to make sure anvi'o is using the gene caller it should be
using. Here is the list: AB_777_F03_Nano (523 gene calls by "img_core_v400", 2
gene calls by "Ribosomal_RNAs"), AB_777_O03_Nano (654 gene calls by
"img_core_v400", 1 gene calls by "Ribosomal_RNAs"), AD_903_B02_Nano (167 gene
calls by "Prodigal V2.6.3 February, 2016 + prokaryotic GeneMark.hmm version
2.8", 15 gene calls by "Prodigal V2.6.3 February, 2016", 2 gene calls by
"prokaryotic GeneMark.hmm version 2.8"), ...

I have the feeling that you should use this command instead:

anvi-gen-genomes-storage -i Nano_bins_list_2017111.txt -o Nano_bins-GENOMES.h5 --gene-caller img_core_v400

Please let us know if this works!


Best,

--

A. Murat Eren (meren)
http://merenlab.org :: twitter :: gpg

--
Anvi'o Paper: https://peerj.com/articles/1319/
Project Page: http://merenlab.org/projects/anvio/
Code Repository: https://github.com/meren/anvio
---
You received this message because you are subscribed to the Google Groups "Anvi'o" group.
To unsubscribe from this group and stop receiving emails from it, send an email to anvio+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/anvio/a208abc7-66bc-4000-8d7d-46f8f5722bd6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Jessica Jarett

unread,
Nov 13, 2017, 2:13:02 PM11/13/17
to Anvi'o
Hi Meren,

Good thinking, this works, sort of. I have about five different gene callers listed for these genomes though. This command works for the 2 genomes which have that gene caller, but not for any of the others. I tried a couple of different ways to specify multiple gene callers but that doesn't seem to be supported. I guess I could go back and just replace this field with a dummy value like "prodigal" for all my genomes, but then I would have to re-do all these profiles (again...), right? Can I request that multiple gene callers be allowed here, or that I can tell anvio to not care how my genes were called (and probably get a sassy warning in response)? 
To unsubscribe from this group and stop receiving emails from it, send an email to anvio+un...@googlegroups.com.

A. Murat Eren

unread,
Nov 13, 2017, 2:21:11 PM11/13/17
to Anvi'o
Hi Jess,

On Mon, Nov 13, 2017 at 1:13 PM, Jessica Jarett <jkja...@lbl.gov> wrote:
I guess I could go back and just replace this field with a dummy value like "prodigal" for all my genomes, but then I would have to re-do all these profiles (again...), right?

​You could do that, and you wouldn't have to re-profile anything :) ​Anvi'o is (sometimes) smart like that.

 
Can I request that multiple gene callers be allowed here, or that I can tell anvio to not care how my genes were called (and probably get a sassy warning in response)? 

​We can discuss more, but I think allowing multiple gene callers for pangenomic analyses is not a very appropriate thing to do (since the same ORF may be identified in multiple different gene callers).​ But we can give the opportunity to user in case they just want to watch the world burn.


​Best,​

Jessica Jarett

unread,
Nov 13, 2017, 2:44:07 PM11/13/17
to Anvi'o
Ok, is there a command I should re-run once I modify these files? Originally I used these gene calls in anvi-gen-contigs-database, do I just run that again with the new "gene caller" listed?

In principle I agree with you re: multiple gene callers for pangenomes. In this specific case I would argue that it is ok since all of these gene calls and annotations are coming from IMG. So, although multiple methods have been used they have been unified such that if multiple gene callers find the same ORF, they are all listed together for that entry, rather than being duplicated. So some gene calls are like this: "Prodigal V2.6.3 February, 2016 + prokaryotic GeneMark.hmm version 2.8", others are "Prodigal V2.6.3 February, 2016" , etc. Anvio adds the ribosomal RNA gene calls on top, but I took out tRNAs and rRNAs from my gene calls beforehand so those should also not be duplicated. What do you think, is duplication of ORFs the primary problem or do you think there remains serious problems with using different gene callers in different genomes? 

A. Murat Eren

unread,
Nov 13, 2017, 3:04:28 PM11/13/17
to Anvi'o

--

A. Murat Eren (meren)
http://merenlab.org :: twitter :: gpg

On Mon, Nov 13, 2017 at 1:44 PM, Jessica Jarett <jkja...@lbl.gov> wrote:
Ok, is there a command I should re-run once I modify these files?

You can use this sqlite command if you want to feel particularly hackery today:

sqlite3 CONTIGS.db 'update genes_in_contigs set source = "prodigal", version="v2.6.2";'

​and it should fix everything :)​

In principle I agree with you re: multiple gene callers for pangenomes. In this specific case I would argue that it is ok

​Yes, in this particular case it would certainly be OK, but ​if we make this available to everyone, it may be too tempting to not do it for some of us even when it would certainly not be OK :) I am casting "pessimism" while thinking long term here.


Best,

Reply all
Reply to author
Forward
0 new messages