Missing files when creating the Deconseq database

lord...@gmail.com

May 11, 2018, 10:47:03 AM
to Edwards Lab Tools
Hello,

I have seen other people having this problem but haven't found a solution anywhere.

Here is my problem. I need to remove some contaminant reads from an NGS dataset. I downloaded the genome of the species I suspect is the contaminant and built an indexed database for DeconSeq using BWA. Unfortunately, when I launch DeconSeq I get the error message: "ERROR: cannot find all database files for database "your_db" in dir "db/"."

Indeed, my database folder contains only 5 files instead of 8, even though indexing completed without errors. Apparently this is because the version of bwa I use (0.6.2) handles forward and reverse reads differently from earlier versions (< 0.6) and therefore creates 3 fewer files than before.
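
A quick way to see which files are absent is to test for each expected extension. The sketch below assumes the expected set is the eight-file layout written by older BWA builds (`.amb`, `.ann`, `.bwt`, `.pac`, `.rbwt`, `.rpac`, `.rsa`, `.sa`); the function name `missing_index_files` is made up for illustration and is not part of DeconSeq or BWA.

```shell
# Print the index files that are absent for a given database prefix.
# Assumption: the expected set is the eight-file layout of older BWA (< 0.6).
missing_index_files() {
    dir=$1      # directory holding the index, e.g. db
    prefix=$2   # database prefix, e.g. your_db
    for ext in amb ann bwt pac rbwt rpac rsa sa; do
        [ -f "$dir/$prefix.$ext" ] || printf '%s\n' "$prefix.$ext"
    done
}

# Usage: missing_index_files db your_db
```

For an index built with bwa 0.6 or later, this would be expected to print the `.rbwt`, `.rpac` and `.rsa` names, since those are the reverse-strand files that newer versions no longer write.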

Has anyone been able to circumvent this issue? I tried to install an older version of bwa but it does not seem possible to have several versions installed.

I am running out of ideas, any suggestion would be more than welcome.

Sébastien

alexandra...@gmail.com

Aug 1, 2018, 11:53:27 PM
to Edwards Lab Tools
Hi,

I had the same problem (I was using BWA 0.7 to index the db). You can simply use the bwa64 binary that comes with the standalone version of DeconSeq; it worked for me.

Good luck!

Alex

sgsut...@gmail.com

Apr 22, 2020, 4:51:08 PM
to Edwards Lab Tools
I had a similar issue: using the bundled BWA binary breaks on macOS Catalina:

-bwaMAC: Bad CPU type in executable

molda...@gmail.com

May 12, 2020, 4:59:33 PM
to Edwards Lab Tools
I've used the `bwa64` that came with the standalone version but I'm still getting the error. I've tried placing the files in a `db` folder within the working directory and have also tried placing them in `~/db` but I still get the same error.

```
# Downloading
for i in {1..22} X Y; do wget ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.39_GRCh38.p13/GCF_000001405.39_GRCh38.p13_assembly_structure/Primary_Assembly/assembled_chromosomes/FASTA/chr$i.fna.gz; done

# Extract and Join
for i in {1..22} X Y; do gzip -dvc chr$i.fna.gz >> hs_ref_GRCh37_p13.fa; rm chr$i.fna.gz; done

# Splitting by long repeats of ambiguous base N
cat hs_ref_GRCh37_p13.fa | perl -p -e 's/N\n/N/' | perl -p -e 's/^N+//;s/N+$//;s/N{200,}/\n>split\n/' >hs_ref_GRCh37_p13_split.fa; rm hs_ref_GRCh37_p13.fa

# Filtering sequences
perl ~/bin/prinseq-lite-0.20.4/prinseq-lite.pl -log -verbose -fasta hs_ref_GRCh37_p13_split.fa -min_len 200 -ns_max_p 10 -derep 12345 -out_good hs_ref_GRCh37_p13_split_prinseq -seq_id hs_ref_GRCh37_p13_$

# Creating the database
~/bin/deconseq-standalone-0.4.3/bwa64 index -p hs_ref_GRCh37_p13 -a bwtsw hs_ref_GRCh37_p13_split_prinseq.fasta >bwa.log 2>&1 &
```

I now have 8 files:

-rw-r--r-- 1 moldach mtgraovac         9116 May 12 13:47 hs_ref_GRCh37_p13.amb
-rw-r--r-- 1 moldach mtgraovac        17732 May 12 13:47 hs_ref_GRCh37_p13.ann
-rw-r--r-- 1 moldach mtgraovac   1099384668 May 12 14:23 hs_ref_GRCh37_p13.bwt
-rw-r--r-- 1 moldach mtgraovac    732923081 May 12 13:47 hs_ref_GRCh37_p13.pac
-rw-r--r-- 1 moldach mtgraovac   1099384668 May 12 14:23 hs_ref_GRCh37_p13.rbwt
-rw-r--r-- 1 moldach mtgraovac    732923081 May 12 13:47 hs_ref_GRCh37_p13.rpac
-rw-r--r-- 1 moldach mtgraovac    366461564 May 12 14:35 hs_ref_GRCh37_p13.rsa
-rw-r--r-- 1 moldach mtgraovac    366461564 May 12 14:29 hs_ref_GRCh37_p13.sa


I get the error:

(base) [moldach@synergy deconseq]$ perl /home/moldach/bin/deconseq-standalone-0.4.3/deconseq.pl -f CG00010F_R1.fastq -dbs hsref
ERROR: cannot find all database files for database "hsref" in dir "db/".

Try 'deconseq -h' for more information.
Exit program.


I assume DeconSeq is looking somewhere else for my files, but where?


molda...@gmail.com

May 12, 2020, 5:15:15 PM
to Edwards Lab Tools
Two changes to the `DeconSeqConfig.pm` config file were required; neither was entirely clear from the documentation.

For the issue above, the `db` value needs to be exactly the same as the prefix given after `-p` in `~/bin/deconseq-standalone-0.4.3/bwa64 index -p hs_ref_GRCh37_p13`. Specifically, I was missing the release suffix `_p13`, so I added it to the file:

use constant DBS => {hsref => {name => 'Human Reference GRCh37',  #database name used for d$
                               db => 'hs_ref_GRCh37_p13'},            #database name as def$
                     bact => {name => 'Bacterial genomes',
                              db => 'bactDB'},
                     vir => {name => 'Viral genomes',
                             db => 'virDB'}};
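
To see which prefix is actually on disk, and therefore what the `db` value has to match, the `.bwt` files in the database folder can be listed with their extension stripped. This is only a sketch; `list_index_prefixes` is a hypothetical helper name, not a DeconSeq command.

```shell
# Print the prefix of every BWA index found in a directory,
# taken from the names of its .bwt files.
list_index_prefixes() {
    dir=$1
    for f in "$dir"/*.bwt; do
        [ -e "$f" ] || continue   # the glob matched nothing
        basename "$f" .bwt
    done
}

# Usage: list_index_prefixes db
```

Whatever this prints is the string that has to appear, character for character, as the `db` value in the config.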


Next was setting PROG_DIR to where I installed the standalone version of DeconSeq:

use constant DB_DIR => 'db/';
use constant TMP_DIR => 'tmp/';
use constant OUTPUT_DIR => 'output/';

use constant PROG_NAME => 'bwa64';  # should be either bwa64 or bwaMAC (based on your syste$
use constant PROG_DIR => '/home/moldach/bin/deconseq-standalone-0.4.3/';      # should be t$



With `DB_DIR` set to `'db/'`, DeconSeq looks for the 8 index files in a `db` folder inside the working directory.
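
Both settings can be checked together before running DeconSeq. The sketch below pulls every `db => '...'` value out of the config file with `sed` and reports any of the eight index files that are absent from the database folder. It assumes the config keeps the `db => '...'` layout shown above; `check_config_dbs` is a hypothetical name, and the config path in the usage line is a placeholder.

```shell
# Report index files that the config refers to but that are not in db_dir.
# Assumption: database prefixes appear in the config as  db => 'prefix'.
check_config_dbs() {
    config=$1   # path to DeconSeqConfig.pm
    db_dir=$2   # directory named by DB_DIR, e.g. db
    status=0
    for name in $(sed -n "s/.*db => '\([^']*\)'.*/\1/p" "$config"); do
        for ext in amb ann bwt pac rbwt rpac rsa sa; do
            if [ ! -f "$db_dir/$name.$ext" ]; then
                printf 'missing: %s/%s.%s\n' "$db_dir" "$name" "$ext"
                status=1
            fi
        done
    done
    return "$status"
}

# Usage (run from the working directory):
#   check_config_dbs path/to/DeconSeqConfig.pm db
```

Entries left over from the default config, such as `bactDB` and `virDB`, will also be reported if those databases were never built; only the one passed to `-dbs` matters for a given run.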

