Hi anvi'o team,
I'm having trouble getting sequences for gene calls, because anvi'o gives me the following error:
Config Error: The gene calls you provided do not look like gene callers anvi'o is used to working with :/ Here is one of them: '152855' (<class 'str'>).
The gene caller ID reported in the error changes if I remove the 'problem' ID from the file. Here's what I did:
anvi-export-gene-calls -c ${anvi_metagen}make_contig_db_05/contigs.db \
-o ${anvi_metagen}exported_gene_calls_14/output_export_gene_calls.txt \
--gene-caller prodigal --skip-sequence-reporting
### in R #####
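library(dplyr)  # provides %>%, distinct(), select(), filter()
# All prodigal gene calls exported from the contigs database
all_gene_calls = readr::read_delim(paste0(anvi_metagen, 'exported_gene_calls_14/output_export_gene_calls.txt'))
# Gene caller IDs already assigned to cyanobacteria (one ID per row, no header)
cyano_genes = read.csv(paste0(anvi_metagen, 'extracted_gene_calls_15/cyano_gene_calls.csv'), header = FALSE) %>% distinct()
# Keep only the IDs that are not in the cyano list
noncyano_genes = all_gene_calls %>% select(gene_callers_id) %>% filter(!gene_callers_id %in% cyano_genes$V1)
write.table(noncyano_genes, file = paste0(anvi_metagen, 'extracted_gene_calls_15/NONcyano_gene_calls.txt'), row.names = FALSE, col.names = FALSE)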
##############
anvi-get-sequences-for-gene-calls -c ${anvi_metagen}make_contig_db_05/contigs.db \
--get-aa-sequences \
--gene-caller-ids ${anvi_metagen}extracted_gene_calls_15/NONcyano_gene_calls.txt \
-o ${anvi_metagen}interproscan_13/NONcyano_amino-acid-sequences.fa
The gene caller IDs reported in the error don't seem to follow a pattern. I've been able to find some of them in different bins, and their sequences are not length 0. If I run anvi-get-sequences-for-gene-calls on only a subset of the file (like the first 50 rows), it works fine, so I don't think the problem is the file formatting.
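For reference, one way to double-check the formatting of the whole ID file, rather than only the first rows, is to list every line that is not a plain integer. This is only a minimal sketch: `find_bad_ids` is a hypothetical helper name (not an anvi'o command), and the sample file and its values are made up for illustration.

```shell
# Print every line (prefixed with its line number) that is not a plain
# non-negative integer. find_bad_ids is a hypothetical helper, not part of anvi'o.
find_bad_ids() {
    grep -vnE '^[0-9]+$' "$1"
}

# Made-up example file: line 2 is in scientific notation and line 3 is quoted,
# so both should be reported; lines 1 and 4 are plain integers.
ids_file=$(mktemp)
printf '152855\n1e+05\n"42"\n7\n' > "$ids_file"

find_bad_ids "$ids_file"
```

On the real data, the function would be pointed at NONcyano_gene_calls.txt instead of the temporary file. If it prints nothing, every line is a plain integer.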
Lastly, I've checked that the file paths actually match. I'm running this on the anvi-dev branch, so everything should be up to date. Any ideas as to why this could be failing?
Thank you,
Andrea