gottcha.py: error: Incorrect BWA index: missing ... RefSeq-Release89.Virus.species.fna.gz.amb


gmoore777

Nov 20, 2018, 12:52:31 PM
to edge-users

The gottcha2-speDB-v analysis under Reads Taxonomy Classification fails with this message:

  gottcha.py: error: Incorrect BWA index: missing ... RefSeq-Release89.Virus.species.fna.gz.amb

I untarred all the database files found under: https://edge-dl.lanl.gov/EDGE/dev/

There isn't anything with "RefSeq-Release89..."

Is this database missing on that download site?

Guy

These are all the .amb files under the database directory:

$ find . -type f | grep ".amb" | sort
./bwa_index/Host/Cow_Btau_5.0.1/Cow.fa.amb
./bwa_index/Host/Goat_CHIR_2.0/Goat.fa.amb
./bwa_index/Host/Hamster_C_griseus_v1.0/Hamster.fa.amb
./bwa_index/Host/Human_ref_GRCh38_all/human_ref_GRCh38_all.fa.amb
./bwa_index/Host/Invertebrate_Vectors_of_Human_Pathogens/all_vector.fa.amb
./bwa_index/Host/Monkey_Chlorocebus_sabeus_1.1/Monkey.fa.amb
./bwa_index/Host/PhiX/phiX.fa.amb
./bwa_index/Host/Pig_Sscrofa10.2/Pig.fa.amb
./bwa_index/Host/Plasmids/plasmids.fa.amb
./bwa_index/Host/RefSeq_Bacteria/NCBI_genome_Bacteria.fa.amb
./bwa_index/Host/RefSeq_Viruses/NCBI_genome_Viruses.fa.amb
./bwa_index/Host/rRNA/rRNA.fasta.amb
./bwa_index/Host/Sheep_Oar_v4.0/Sheep.fa.amb
./bwa_index/Host/UniVec_core/UniVec_Core.fa.amb
./bwa_index/NCBI-Bacteria-Virus.fna.amb
./GOTTCHA2/RefSeq-Release81.Bacteria.species.fna.amb
./GOTTCHA2/RefSeq-Release81.Virus.genus.fna.amb
./GOTTCHA2/RefSeq-Release81.Virus.species.fna.amb
./GOTTCHA2/RefSeq-Release81.Virus.strain.fna.amb
./GOTTCHA/GOTTCHA_BACTERIA_c4937_k24_u30_xHUMAN3x.genus.amb
./GOTTCHA/GOTTCHA_BACTERIA_c4937_k24_u30_xHUMAN3x.species.amb
./GOTTCHA/GOTTCHA_BACTERIA_c4937_k24_u30_xHUMAN3x.strain.amb
./GOTTCHA/GOTTCHA_VIRUSES_c5900_k24_u30_xHUMAN3x.genus.amb
./GOTTCHA/GOTTCHA_VIRUSES_c5900_k24_u30_xHUMAN3x.species.amb
./GOTTCHA/GOTTCHA_VIRUSES_c5900_k24_u30_xHUMAN3x.strain.amb
./PanGIA/NCBI_genomes_refseq86_BAV.fa.amb
./PanGIA/NCBI_genomes_refseq86_Human.fa.amb
./PanGIA/NCBI_genomes_refseq86_Plasmodium.fa.amb
./SNPdb/Bacillus/results/ambiguousSNPpositions.txt
./SNPdb/Brucella/results/ambiguousSNPpositions.txt
./SNPdb/Ecoli/results/ambiguousSNPpositions.txt
./SNPdb/Francisella/results/ambiguousSNPpositions.txt
./SNPdb/Yersinia/results/ambiguousSNPpositions.txt
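
For anyone hitting the same error: the message means one of the files that `bwa index` writes next to the reference (`.amb`, `.ann`, `.bwt`, `.pac`, `.sa`) was not found for the expected prefix. The following is only a minimal sketch for checking that by hand; it is not part of EDGE or gottcha.py, and the function name and the example path are assumptions.

```python
from pathlib import Path

# File suffixes written by `bwa index` for a given reference prefix.
BWA_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def missing_bwa_files(prefix):
    """Return the BWA index files that do not exist for `prefix`."""
    return [prefix + s for s in BWA_SUFFIXES if not Path(prefix + s).is_file()]


if __name__ == "__main__":
    # Hypothetical location; adjust to the local EDGE database directory.
    prefix = "database/GOTTCHA2/RefSeq-Release89.Virus.species.fna.gz"
    for path in missing_bwa_files(prefix):
        print("missing:", path)
```

An empty result means all five files are present for that prefix; it says nothing about whether their contents are valid.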


Lo, Chien-Chi

Nov 20, 2018, 12:55:26 PM
to gmoore777, edge-users


Greenleaf

Nov 20, 2018, 1:09:47 PM
to Lo, Chien-Chi, edge-users, guy

I have already untarred those: All "81" files. No "89" files.

Here are the contents of them:

$ tar -tvzf  edge_dev_GOTTCHA2_virus_db.tgz
-rw-rw-r-- edge_user/genome_users 20 2017-08-30 06:35 database/GOTTCHA2/RefSeq-Release81.Virus.genus.fna.amb
-rw-rw-r-- edge_user/genome_users 104940898 2017-08-30 06:35 database/GOTTCHA2/RefSeq-Release81.Virus.genus.fna.ann
-rw-rw-r-- edge_user/genome_users 372400364 2017-08-30 06:35 database/GOTTCHA2/RefSeq-Release81.Virus.genus.fna.bwt
-rw-rw-r-- edge_user/genome_users  93100067 2017-08-30 06:35 database/GOTTCHA2/RefSeq-Release81.Virus.genus.fna.pac
-rw-rw-r-- edge_user/genome_users 186200184 2017-08-30 06:36 database/GOTTCHA2/RefSeq-Release81.Virus.genus.fna.sa
-rw-rw-r-- edge_user/genome_users   1761359 2018-03-22 21:55 database/GOTTCHA2/RefSeq-Release81.Virus.genus.fna.stats
-rw-rw-r-- edge_user/genome_users        20 2017-08-30 06:27 database/GOTTCHA2/RefSeq-Release81.Virus.species.fna.amb
-rw-rw-r-- edge_user/genome_users  99388200 2017-08-30 06:27 database/GOTTCHA2/RefSeq-Release81.Virus.species.fna.ann
-rw-rw-r-- edge_user/genome_users 313603196 2017-08-30 06:26 database/GOTTCHA2/RefSeq-Release81.Virus.species.fna.bwt
-rw-rw-r-- edge_user/genome_users  78400779 2017-08-30 06:27 database/GOTTCHA2/RefSeq-Release81.Virus.species.fna.pac
-rw-rw-r-- edge_user/genome_users 156801608 2017-08-30 06:29 database/GOTTCHA2/RefSeq-Release81.Virus.species.fna.sa
-rw-rw-r-- edge_user/genome_users   1682858 2018-03-22 21:55 database/GOTTCHA2/RefSeq-Release81.Virus.species.fna.stats
-rw-rw-r-- edge_user/genome_users        19 2017-08-30 06:22 database/GOTTCHA2/RefSeq-Release81.Virus.strain.fna.amb
-rw-rw-r-- edge_user/genome_users  43557514 2017-08-30 06:22 database/GOTTCHA2/RefSeq-Release81.Virus.strain.fna.ann
-rw-rw-r-- edge_user/genome_users 105429232 2017-08-30 06:22 database/GOTTCHA2/RefSeq-Release81.Virus.strain.fna.bwt
-rw-rw-r-- edge_user/genome_users  26357284 2017-08-30 06:22 database/GOTTCHA2/RefSeq-Release81.Virus.strain.fna.pac
-rw-rw-r-- edge_user/genome_users  52714616 2017-08-30 06:22 database/GOTTCHA2/RefSeq-Release81.Virus.strain.fna.sa
-rw-rw-r-- edge_user/genome_users    514801 2018-03-22 21:55 database/GOTTCHA2/RefSeq-Release81.Virus.strain.fna.stats
-rw-rw-r-- edge_user/genome_users   7185019 2017-09-05 17:22 database/GOTTCHA2/taxonomy.custom.tsv
-rw-rw-r-- edge_user/genome_users  80867763 2017-09-05 22:28 database/GOTTCHA2/taxonomy.tsv
$


$ tar -xvzf edge_dev_GOTTCHA2_bac_db.tgz
database/GOTTCHA2/RefSeq-Release81.Bacteria.species.fna.amb
database/GOTTCHA2/RefSeq-Release81.Bacteria.species.fna.ann
database/GOTTCHA2/RefSeq-Release81.Bacteria.species.fna.bwt
database/GOTTCHA2/RefSeq-Release81.Bacteria.species.fna.log
database/GOTTCHA2/RefSeq-Release81.Bacteria.species.fna.pac
database/GOTTCHA2/RefSeq-Release81.Bacteria.species.fna.sa
database/GOTTCHA2/RefSeq-Release81.Bacteria.species.fna.stats
database/GOTTCHA2/taxonomy.custom.tsv
database/GOTTCHA2/taxonomy.tsv
$

Lo, Chien-Chi

Nov 20, 2018, 1:24:44 PM
to Greenleaf, edge-users, guy

Yes. I think what you downloaded is for EDGE release version 2.3.1.

 

Sorry for the confusion. If you want to use the development version of EDGE, please keep an eye on the documentation (https://edge.readthedocs.io/en/latest/installation.html). We try hard to keep it up to date. The latest code we pushed to the GitHub development branch last week updated the GOTTCHA2 database again. It now uses a minimap2 index instead of a BWA index, which reduces memory usage.
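
To see which kind of index a given database prefix actually has on disk, something like the following could be used. This is a rough sketch and not EDGE code: the function name is made up, and it assumes a minimap2 index is stored as `<prefix>.mmi` and a BWA index as the five files `bwa index` writes.

```python
from pathlib import Path

# File suffixes written by `bwa index` for a given reference prefix.
BWA_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def index_type(prefix):
    """Classify the index found at `prefix` as 'minimap2', 'bwa', or None."""
    # Assumption: a minimap2 index lives in a single `<prefix>.mmi` file.
    if Path(prefix + ".mmi").is_file():
        return "minimap2"
    # A BWA index counts as present only if all five files exist.
    if all(Path(prefix + s).is_file() for s in BWA_SUFFIXES):
        return "bwa"
    return None
```

A result of `None` for the prefix the software expects would point to a mismatch between the installed code and the installed database.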

 

We appreciate users working with the development version and reporting any issues to us.

 

Thanks,

Chienchi

Greenleaf

Nov 27, 2018, 6:37:34 PM
to Lo, Chien-Chi, edge-users

1.)
I thought I was using the dev version and not 2.3.1 (sorry).
But on my EDGE main page, I do see "RUN EDGE 2.3.1".
Would that say "RUN EDGE dev" if I were using the "dev" version?

2A.)
I do see how the instructions changed to get these 2 databases which are now in the "/DB/" directory of
https://edge-dl.lanl.gov/EDGE/dev/

wget -c https://edge-dl.lanl.gov/EDGE/DB/edge_GOTTCHA2_db_20181115.tgz
wget -c https://edge-dl.lanl.gov/EDGE/DB/edge_dev_otherHostIndex.tgz

I also see this other database file: edge_GOTTCHA2_vir_db_20181011.tgz

I dragged all 3 files over, but I still don't see any "91" files:

$ tar -xvzf edge_GOTTCHA2_db_20181115.tgz
database/GOTTCHA2/RefSeq-r90.cg.BacteriaViruses.species.fna.mmi
database/GOTTCHA2/RefSeq-r90.cg.BacteriaViruses.species.fna.stats
database/GOTTCHA2/taxdump.tar.gz
database/GOTTCHA2/taxonomy.custom.tsv
$
$ tar -xzvf edge_GOTTCHA2_vir_db_20181011.tgz
database/GOTTCHA2/RefSeq-Release89.Virus.species.fna.gz.amb
database/GOTTCHA2/RefSeq-Release89.Virus.species.fna.gz.ann
database/GOTTCHA2/RefSeq-Release89.Virus.species.fna.gz.bwt
database/GOTTCHA2/RefSeq-Release89.Virus.species.fna.gz.pac
database/GOTTCHA2/RefSeq-Release89.Virus.species.fna.gz.sa
database/GOTTCHA2/RefSeq-Release89.Virus.species.fna.gz.stats
database/GOTTCHA2/merged.dmp
database/GOTTCHA2/names.dmp
database/GOTTCHA2/nodes.dmp
database/GOTTCHA2/taxonomy.custom.tsv
$
$ tar -xzvf edge_dev_otherHostIndex.tgz
database/bwa_index/Host/Cow_Btau_5.0.1/
database/bwa_index/Host/Cow_Btau_5.0.1/Cow.fa
database/bwa_index/Host/Cow_Btau_5.0.1/Cow.fa.amb
database/bwa_index/Host/Cow_Btau_5.0.1/Cow.fa.ann
database/bwa_index/Host/Cow_Btau_5.0.1/Cow.fa.bwt
database/bwa_index/Host/Cow_Btau_5.0.1/Cow.fa.pac
database/bwa_index/Host/Cow_Btau_5.0.1/Cow.fa.sa
database/bwa_index/Host/Goat_CHIR_2.0/
database/bwa_index/Host/Goat_CHIR_2.0/Goat.fa
database/bwa_index/Host/Goat_CHIR_2.0/Goat.fa.amb
database/bwa_index/Host/Goat_CHIR_2.0/Goat.fa.ann
database/bwa_index/Host/Goat_CHIR_2.0/Goat.fa.bwt
database/bwa_index/Host/Goat_CHIR_2.0/Goat.fa.pac
database/bwa_index/Host/Goat_CHIR_2.0/Goat.fa.sa
database/bwa_index/Host/README.txt
database/bwa_index/Host/Hamster_C_griseus_v1.0/
database/bwa_index/Host/Hamster_C_griseus_v1.0/Hamster.fa
database/bwa_index/Host/Hamster_C_griseus_v1.0/Hamster.fa.amb
database/bwa_index/Host/Hamster_C_griseus_v1.0/Hamster.fa.ann
database/bwa_index/Host/Hamster_C_griseus_v1.0/Hamster.fa.bwt
database/bwa_index/Host/Hamster_C_griseus_v1.0/Hamster.fa.pac
database/bwa_index/Host/Hamster_C_griseus_v1.0/Hamster.fa.sa
database/bwa_index/Host/Monkey_Chlorocebus_sabeus_1.1/
database/bwa_index/Host/Monkey_Chlorocebus_sabeus_1.1/Monkey.fa
database/bwa_index/Host/Monkey_Chlorocebus_sabeus_1.1/Monkey.fa.amb
database/bwa_index/Host/Monkey_Chlorocebus_sabeus_1.1/Monkey.fa.ann
database/bwa_index/Host/Monkey_Chlorocebus_sabeus_1.1/Monkey.fa.bwt
database/bwa_index/Host/Monkey_Chlorocebus_sabeus_1.1/Monkey.fa.pac
database/bwa_index/Host/Monkey_Chlorocebus_sabeus_1.1/Monkey.fa.sa
database/bwa_index/Host/Pig_Sscrofa10.2/
database/bwa_index/Host/Pig_Sscrofa10.2/Pig.fa
database/bwa_index/Host/Pig_Sscrofa10.2/Pig.fa.amb
database/bwa_index/Host/Pig_Sscrofa10.2/Pig.fa.ann
database/bwa_index/Host/Pig_Sscrofa10.2/Pig.fa.bwt
database/bwa_index/Host/Pig_Sscrofa10.2/Pig.fa.pac
database/bwa_index/Host/Pig_Sscrofa10.2/Pig.fa.sa
database/bwa_index/Host/Plasmids/
database/bwa_index/Host/Plasmids/plasmids.fa
database/bwa_index/Host/Plasmids/plasmids.fa.amb
database/bwa_index/Host/Plasmids/plasmids.fa.ann
database/bwa_index/Host/Plasmids/plasmids.fa.bwt
database/bwa_index/Host/Plasmids/plasmids.fa.pac
database/bwa_index/Host/Plasmids/plasmids.fa.sa
database/bwa_index/Host/rRNA/
database/bwa_index/Host/rRNA/rRNA.fasta
database/bwa_index/Host/rRNA/rRNA.fasta.amb
database/bwa_index/Host/rRNA/rRNA.fasta.ann
database/bwa_index/Host/rRNA/rRNA.fasta.bwt
database/bwa_index/Host/rRNA/rRNA.fasta.pac
database/bwa_index/Host/rRNA/rRNA.fasta.sa
database/bwa_index/Host/Sheep_Oar_v4.0/
database/bwa_index/Host/Sheep_Oar_v4.0/Sheep.fa
database/bwa_index/Host/Sheep_Oar_v4.0/Sheep.fa.amb
database/bwa_index/Host/Sheep_Oar_v4.0/Sheep.fa.ann
database/bwa_index/Host/Sheep_Oar_v4.0/Sheep.fa.bwt
database/bwa_index/Host/Sheep_Oar_v4.0/Sheep.fa.pac
database/bwa_index/Host/Sheep_Oar_v4.0/Sheep.fa.sa
database/bwa_index/Host/UniVec_core/
database/bwa_index/Host/UniVec_core/UniVec_Core.fa
database/bwa_index/Host/UniVec_core/UniVec_Core.fa.amb
database/bwa_index/Host/UniVec_core/UniVec_Core.fa.ann
database/bwa_index/Host/UniVec_core/UniVec_Core.fa.bwt
database/bwa_index/Host/UniVec_core/UniVec_Core.fa.pac
database/bwa_index/Host/UniVec_core/UniVec_Core.fa.sa
$
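
A quick way to check which RefSeq release a tarball carries before unpacking it is to list its member names and filter them. The snippet below is only a sketch using Python's standard `tarfile` module; the function name and the search strings are made up for illustration.

```python
import tarfile


def members_matching(tar_path, needle):
    """List archive member names containing `needle`, without extracting."""
    with tarfile.open(tar_path, "r:gz") as tar:
        return [name for name in tar.getnames() if needle in name]


if __name__ == "__main__":
    # Archive name taken from the listing above; assumes it is in the current directory.
    for name in members_matching("edge_GOTTCHA2_vir_db_20181011.tgz", "Release89"):
        print(name)
```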


2B.)
Should I assume that whenever any databases change or get updated on
https://edge-dl.lanl.gov/EDGE/,
I should also download the latest "edge_dev_main.tgz",
because the database, directory, and file names are hardcoded in the software (i.e., edge_dev_main.tgz)?

Or can I update the database directory without deleting, untarring, and installing a new edge_dev_main.tgz?


3.)
I see that the older file of:
https://edge-dl.lanl.gov/EDGE/dev/edge_dev_GOTTCHA2_virus_db.tgz
is 883 MB
but the newer file:
https://edge-dl.lanl.gov/EDGE/DB/edge_GOTTCHA2_vir_db_20181011.tgz
is a lot smaller at 179 MB.
Is this expected?

And the newer instructions at
https://edge.readthedocs.io/en/develop/installation.html
do not mention using either the newer or older file.


4.)
I plan on creating a new Linux machine (in a week or so), and going through all the installation steps again, to see if some of the issues I encountered disappear.

Thanks in advance for any comments.


gmoore777

Lo, Chien-Chi

Nov 27, 2018, 7:12:35 PM
to Greenleaf, edge-users

Sorry again for the confusion on all this. EDGE is under active development. Unless you are familiar with the platform, please use the stable release version and follow the documentation site for that version. Since you are going to build from scratch again, please use the stable release version. Alternatively, we highly recommend using the Docker image for ease of installation.

 

Release version 2.3 documentation:

https://edge.readthedocs.io/en/2.x/installation.html

 

EDGE Docker:

https://hub.docker.com/r/bioedge/edge_dev/

 

 

We put a warning at the top of the development version's documentation site in case users pull the wrong database while we are updating it.

 

 

If you want to use the development version of EDGE, please keep an eye on the documentation (https://edge.readthedocs.io/en/latest/installation.html). We try hard to keep it up to date.

 

 

1)

Since the development version is built on top of version 2.3.1, we had not updated the version number in the development code base. We recently changed it to EDGE 2.4.0 dev.

 

2A)

In the subject of this thread, the missing index is *89*. I think you have already downloaded and untarred it:

 

----

$ tar -xzvf edge_GOTTCHA2_vir_db_20181011.tgz
database/GOTTCHA2/RefSeq-Release89.Virus.species.fna.gz.amb
database/GOTTCHA2/RefSeq-Release89.Virus.species.fna.gz.ann
database/GOTTCHA2/RefSeq-Release89.Virus.species.fna.gz.bwt
database/GOTTCHA2/RefSeq-Release89.Virus.species.fna.gz.pac
database/GOTTCHA2/RefSeq-Release89.Virus.species.fna.gz.sa
database/GOTTCHA2/RefSeq-Release89.Virus.species.fna.gz.stats
database/GOTTCHA2/merged.dmp
database/GOTTCHA2/names.dmp
database/GOTTCHA2/nodes.dmp
database/GOTTCHA2/taxonomy.custom.tsv

----

 

2B) and 3)

The short answer is yes. You are using the development version of EDGE, for which we are still updating the code and databases over time.

 

4) See my response at the beginning of this message.
