Gene Caller IDs not found

Andrea Cast

Apr 28, 2022, 6:17:46 AM
to an...@googlegroups.com
Hi anvi'o team,
I'm having trouble getting sequences from gene calls because anvi'o gives me the following error:

Config Error: The gene calls you provided do not look like gene callers anvi'o is used to working with :/ Here is one of them: '152855' (<class 'str'>). 

The gene caller ID reported in the error changes if I remove the 'problem' ID from the list. Here's what I did:
anvi-export-gene-calls -c ${anvi_metagen}make_contig_db_05/contigs.db \
-o ${anvi_metagen}exported_gene_calls_14/output_export_gene_calls.txt \
--gene-caller prodigal --skip-sequence-reporting

### in R #####
all_gene_calls = readr::read_delim(paste0(anvi_metagen,'exported_gene_calls_14/output_export_gene_calls.txt'))
cyano_genes = read.csv(paste0(anvi_metagen,'extracted_gene_calls_15/cyano_gene_calls.csv'), header = F) %>% distinct()
noncyano_genes = all_gene_calls %>% select(gene_callers_id) %>% filter(!gene_callers_id %in% cyano_genes$V1)
write.table(noncyano_genes, row.names = F, col.names = F, paste0(anvi_metagen,'extracted_gene_calls_15/NONcyano_gene_calls.txt'))
##############
anvi-get-sequences-for-gene-calls -c ${anvi_metagen}make_contig_db_05/contigs.db \
--get-aa-sequences \
--gene-caller-ids ${anvi_metagen}extracted_gene_calls_15/NONcyano_gene_calls.txt \
-o ${anvi_metagen}interproscan_13/NONcyano_amino-acid-sequences.fa
The gene caller IDs that aren't found don't seem to follow a pattern. I've been able to trace some of them back to different bins, and their lengths are not 0. If I run anvi-get-sequences-for-gene-calls on only a subset (like the first 50 rows), it works fine, so I don't think it's the file formatting.
Lastly, I've checked that the file paths actually match. I'm running this on the anvi-dev branch, so everything should be up to date. Any ideas as to why this could be failing?
Thank you,
Andrea
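
One way to test the file-formatting assumption on the whole file, rather than on the first 50 rows, is to scan the ID list for any line that is not a bare integer. The sketch below is only illustrative and is not part of anvi'o; `find_non_integer_ids` is a hypothetical helper name. One possible culprit, which is an assumption and not something confirmed in this thread, is that R's `write.table` can write large round numeric values in scientific notation (for example `1e+05`), which would only show up further down the file.

```python
import re
from pathlib import Path

# A usable gene caller ID line contains digits only.
INTEGER_LINE = re.compile(r"\d+")


def find_non_integer_ids(path):
    """Return (line_number, text) for every non-empty line that is not a bare integer.

    Hypothetical helper for illustration; it only inspects the text file and
    does not call anvi'o.
    """
    bad = []
    lines = Path(path).read_text().splitlines()
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        # Skip blank lines; flag quoted values, scientific notation, etc.
        if text and not INTEGER_LINE.fullmatch(text):
            bad.append((number, text))
    return bad
```

Calling `find_non_integer_ids` on `NONcyano_gene_calls.txt` and printing the result would show any offending lines. If scientific notation does turn up, converting the column with `as.integer()` before `write.table`, or setting `options(scipen = 999)`, are two R-side options worth trying.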

A. Murat Eren (Meren)

May 4, 2022, 9:07:35 AM
to Anvi'o
Hi Andrea,

Sorry for the late response! I just discovered this email in my spam folder :( This makes me realize that perhaps we should retire the Google Group and carry on only with the anvi'o Slack (which is quite active and easy to keep an eye on).

I'm not sure I fully understand the problem, but it seems to me that the missing gene caller IDs are likely associated with gene calls added by HMM profiles (so their source will not be 'prodigal', but something else such as 'Transfer_RNAs'). What is also concerning is that anvi'o says your gene caller ID is a string. It should always be an integer. So this may actually be due to a bug.
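
The source hypothesis above can be checked without anvi'o itself by tallying which gene caller each ID in the list belongs to. This is a minimal sketch under stated assumptions: it expects a tab-delimited gene calls table that has a `gene_callers_id` column (as used in the R code earlier in the thread) and a `source` column (the column name is an assumption here), exported without restricting to a single gene caller. `sources_for_ids` is a hypothetical helper name.

```python
import csv
from collections import Counter


def sources_for_ids(gene_calls_tsv, wanted_ids):
    """Count the `source` of each wanted ID in a gene calls table.

    IDs that never appear in the table are counted under 'NOT_IN_TABLE'.
    Column names are assumptions about the exported file's header.
    """
    wanted = {str(i) for i in wanted_ids}
    counts = Counter()
    seen = set()
    with open(gene_calls_tsv, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            gene_id = row["gene_callers_id"]
            if gene_id in wanted:
                counts[row["source"]] += 1
                seen.add(gene_id)
    # Anything requested but never seen is missing from the table entirely.
    counts["NOT_IN_TABLE"] = len(wanted - seen)
    return counts
```

If every ID comes back as 'prodigal' and none as 'NOT_IN_TABLE', the source explanation becomes less likely and the string-versus-integer issue is the more promising lead.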

It would help me find out whether this is a bug if you could privately send me the contigs-db that produces this error, along with the exact command line I can use to reproduce it.

That is, of course, if you are still working on this after 6 days :/


Best wishes,
--

A. Murat Eren (Meren) | he/him


--
Anvi'o Paper: https://peerj.com/articles/1319/
Project Page: http://merenlab.org/projects/anvio/
Code Repository: https://github.com/meren/anvio