Hi Anvio Team,
I am going through the snakemake workflows tutorial. Being but a novice in bioinformatics, I have encountered some errors that are beyond my troubleshooting skills.
My first errors were fixable: anvi'o complained that the config-contigs.json file included in the mock data was missing a "workflow_name" parameter, so I added it (set to "workflow_name": "contigs") and then ran anvi-migrate to update the file to the correct version.
Then, when I try to run the contigs workflow with anvi-run-workflow using the above-mentioned config file, I get this:
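For reference, the config edit described above can be sketched in Python. This is only an illustration: the helper name add_workflow_name is hypothetical and not part of anvi'o, and it assumes the config is a plain JSON object with "workflow_name" as a top-level key, as in the fix described above.

```python
import json
from pathlib import Path


def add_workflow_name(config_path, workflow_name="contigs"):
    """Add a top-level "workflow_name" key to a JSON config if it is missing.

    Hypothetical helper for illustration only; not an anvi'o function.
    """
    path = Path(config_path)
    config = json.loads(path.read_text())
    # setdefault leaves an existing "workflow_name" value untouched
    config.setdefault("workflow_name", workflow_name)
    path.write_text(json.dumps(config, indent=4) + "\n")
    return config


# Example call (assumes config-contigs.json is in the current directory):
# add_workflow_name("config-contigs.json")
```

The file would still need to go through anvi-migrate afterwards, since this only adds the missing key and does not change the config version.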
anvi-run-workflow -w contigs -c config-contigs.json
WARNING
===============================================
If you publish results from this workflow, please do not forget to cite
snakemake (doi:10.1093/bioinformatics/bts480)
WARNING
===============================================
We are initiating parameters for the contigs workflow
WARNING
===============================================
We are initiating parameters for the contigs workflow
Shell programs for the workflow
===============================================
Needed .......................................: gunzip, anvi-script-reformat-fasta, anvi-script-reformat-fasta, anvi-gen-contigs-database, anvi-import-functions, anvi-get-sequences-for-gene-calls, centrifuge, anvi-import-taxonomy-for-genes, anvi-run-hmms, anvi-run-pfams, anvi-run-ncbi-cogs, anvi-run-scg-taxonomy, anvi-scan-trnas, anvi-get-sequences-for-gene-calls
Missing ......................................: None
Building DAG of jobs...
InputFunctionException in line 178 of /home/dennistcc/miniconda3/envs/anvio-6.2/lib/python3.6/site-packages/anvio/workflows/contigs/Snakefile:
KeyError: '{group}'
Wildcards:
group={group}
I have done the one-time runs of anvi-setup-ncbi-cogs and anvi-setup-scg-databases. One weird thing with anvi-setup-scg-databases is that it says the database is successfully downloaded, but it still ends with this error message:
(anvio-6.2) anvi-setup-scg-databases --reset
WARNING
===============================================
The existing directory for SCG taxonomy data dir has been removed. Just so you
know.
WARNING
===============================================
Please remember that the data anvi'o uses for SCG taxonomy is a courtesy of The
Genome Taxonomy Database (GTDB), an initiative to establish a standardised
microbial taxonomy based on genome phylogeny, primarly funded by tax payers in
Australia. Please don't forget to cite the original work, doi:10.1038/nbt.4229
by Parks et al to explicitly mention the source of databases anvi'o relies upon
to estimate genome level taxonomy. If you are not sure how it should look like
in your methods sections, anvi'o developers will be happy to help you if you
can't find any published example to get inspiration.
Local directory to setup .....................: /home/dennistcc/miniconda3/envs/anvio-6.2/lib/python3.6/site-packages/anvio/data/misc/SCG_TAXONOMY/GTDB
Reset the directory first ....................: True
Remote database ..............................: GTDB
Remote files of interest .....................: VERSION, ar122_msa_individual_genes.tar.gz, ar122_taxonomy.tsv, bac120_msa_individual_genes.tar.gz, bac120_taxonomy.tsv
GTDB release found ...........................: v95 (Released July 17, 2020)
Downloaded succesfully .......................: /home/dennistcc/miniconda3/envs/anvio-6.2/lib/python3.6/site-packages/anvio/data/misc/SCG_TAXONOMY/GTDB/VERSION
Config Error: Something went wrong with your download attempt. Here is the problem: 'HTTP
Error 404: Not Found'
Could it be related? Any help would be greatly appreciated.
Sincerely, Dennis (an overwhelmed bioinformatics padawan)