denovo_map.pl error: failed to find files


Ali Basuony

Dec 21, 2020, 4:34:48 PM
to Stacks

Hi everyone,

I am trying to run the denovo_map.pl pipeline on paired-end RAD data. I did demultiplexing and cleaning with process_radtags and clone_filter, and then used the output files from clone_filter (example below) as inputs for denovo_map.pl.

I'm pretty confused about which output files we should use with the de novo pipeline. The following is an example for one sample from both process_radtags and clone_filter.

process_radtags,
bob101.1.fq.gz
bob101.2.fq.gz
bob101.rem.1.fq.gz
bob101.rem.2.fq.gz

clone_filter,
bob101.1.1.fq.gz
bob101.2.2.fq.gz
bob101.log
bob101.rem.1.1.fq.gz
bob101.rem.2.2.fq.gz
bob101.rem.log

I am running into the error "Error: Failed to find the first reads file './data_qual/clean_nc/bob101(.1).(fq|fastq|fa|fasta)(.gz)'." However, I am sure the sample names in popmap.txt are consistent with the file names.


This is my denovo_map.pl command:

vulpesvulpes@vulpesvulpes:~/de_novo$ stacks denovo_map.pl -T 2 --samples  ~/data_qual/clean_nc --popmap popmap_mackerel33.txt -o denovo_M4 --paired --rm-pcr-duplicates --min-populations 2 --min-samples-per-pop 0.75 -X "populations: --write-single-snp  --structure --plink" -X "ustacks: -M 4 -N 0 --disable-gapped"

Parsed population map: 33 files in 5 populations and 1 group.
Error: Failed to find the first reads file '/home/vulpesvulpes/data_qual/clean_nc/bob101(.1).(fq|fastq|fa|fasta)(.gz)'.

This might be a small detail, but it is my first time using Stacks.


I appreciate your help and suggestions.

Regards,

Ali

Julian Catchen

Dec 21, 2020, 5:05:18 PM
to stacks...@googlegroups.com, Ali Basuony
Hi Ali,

Your population map should specify the post-clone filter samples as
"bob101.1", or you should rename them back.
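The renaming option can be sketched as below. This is demonstrated on dummy files created in the current directory (the `bob101` names are stand-ins); on real data, you would run just the two loops inside your clone_filter output directory:

```shell
# Sketch on dummy files: rename clone_filter outputs back to the
# sample.1.fq.gz / sample.2.fq.gz pattern that denovo_map.pl expects.
touch bob101.1.1.fq.gz bob101.2.2.fq.gz   # stand-ins for real read files

for f in *.1.1.fq.gz; do
  [ -e "$f" ] || continue                  # skip when nothing matches
  mv -- "$f" "${f%.1.1.fq.gz}.1.fq.gz"     # bob101.1.1.fq.gz -> bob101.1.fq.gz
done
for f in *.2.2.fq.gz; do
  [ -e "$f" ] || continue
  mv -- "$f" "${f%.2.2.fq.gz}.2.fq.gz"     # bob101.2.2.fq.gz -> bob101.2.fq.gz
done
ls bob101.*
```

The alternative, per the advice above, is to leave the files alone and list the samples in the population map as "bob101.1" rather than "bob101".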

Also, it depends on your analysis, but it is generally better to not run
clone_filter and instead allow gstacks to remove pcr duplicates (since
it has more information at that point to remove pcr clones). gstacks is
new to stacks v2, while clone_filter has been around longer and was used
before the main pipeline could handle paired-end reads (i.e. stacks v1).

Best,

julian


Ali Basuony

Dec 21, 2020, 5:43:01 PM
to Stacks
Hi Julian,

Thanks so much for your help.

I'm using Stacks v2. I used the outputs from process_radtags, and I got the error below. The pipeline worked through ustacks, cstacks, and sstacks, but stopped at tsv2bam. Attached are the log files.

Command:
vulpesvulpes@vulpesvulpes:~/de_novo$ stacks denovo_map.pl -T 2 --samples  ~/data_qual/clean --popmap popmap_mackerel33.txt -o denovo_M4 --paired --rm-pcr-duplicates --min-populations 2 --min-samples-per-pop 0.75 -X "populations: --write-single-snp  --structure --plink" -X "ustacks: -M 4 -N 0 --disable-gapped"

Error:
Generating catalog...
  /usr/lib/stacks/bin/cstacks -p 2 -M popmap_mackerel33.txt -P denovo_M4

Matching samples to the catalog...
  /usr/lib/stacks/bin/sstacks -P denovo_M4 -p 2 -M popmap_mackerel33.txt

Sorting reads by RAD locus...
  /usr/lib/stacks/bin/tsv2bam -P denovo_M4  -t 2 -M popmap_mackerel33.txt -R /home/vulpesvulpes/data_qual/clean/


denovo_map.pl: Aborted because the last command failed (128); see log file.
-----------------------


I really appreciate your help.

Thanks
denovo_map.log
tsv2bam.log.txt

Julian Catchen

Dec 21, 2020, 6:18:19 PM
to stacks...@googlegroups.com, Ali Basuony
As denovo_map.pl would have printed to the console (and as recorded in the log file), your data have very low coverage:

Depths of Coverage for Processed Samples:
bob101: 3.63x
bob102: 3.59x
bob103: 3.40x
bob104: 3.37x
bob105: 3.39x
bob106: 3.51x
bob107: 3.44x
bob108: 3.42x
sja109: 3.59x
sja110: 3.56x
sja111: 3.58x
sja112: 3.53x
sja113: 3.56x
sja114: 3.47x
sja115: 3.63x
sco116: 3.59x
sco117: 3.52x
sco118: 3.58x
sco119: 3.41x
sco120: 3.53x
sco121: 3.41x
gal122: 3.40x
gal123: 3.39x
gal124: 3.41x
gal125: 3.47x
gal126: 3.45x
gal127: 3.35x
ctb128: 3.41x
ctb129: 3.37x
ctb130: 3.44x
ctb131: 3.67x
ctb132: 3.41x
ctb133: 3.43x

I would have to look into what is causing that specific error, but it is
almost certainly related to your coverage.

julian


Ali Basuony

Dec 22, 2020, 7:50:05 AM
to Stacks
Hi Julian,

I tried to run the pipeline with another dataset, one that was published and worked well with Stacks before, but I still have the same problem. I performed the de novo assembly using only the read-1 files (example: adr146.1.fq.gz, adr147.1.fq.gz, ...).
Attached are log files.
Command:
stacks denovo_map.pl -T 2 --samples  ~/data_qual/clean1 --popmap popmap_mackerel.txt -o denovo_M6 --min-populations 2 --min-samples-per-pop 0.75 -X "populations: --write-single-snp  --structure --plink" -X "ustacks: -M 4 -N 0 --disable-gapped"
Error:
Generating catalog...
  /usr/lib/stacks/bin/cstacks -p 2 -M popmap_mackerel.txt -P denovo_M6


Matching samples to the catalog...
  /usr/lib/stacks/bin/sstacks -P denovo_M6 -p 2 -M popmap_mackerel.txt


Sorting reads by RAD locus...
  /usr/lib/stacks/bin/tsv2bam -P denovo_M6  -t 2 -M popmap_mackerel.txt



denovo_map.pl: Aborted because the last command failed (128); see log file.

Thanks,

Ali
denovo_map.log
tsv2bam.log

Nic Bail

Jul 25, 2022, 10:58:56 PM
to Stacks
Hi Ali and Julian,

Did either of you (or anyone else) manage to resolve this tsv2bam issue? I ask because I get a seemingly identical error when I run denovo_map.pl.

Julian, you mentioned it could be related to depth of coverage. What would you propose as the best way forward for the data at that point? Does that mean there is no solution beyond going back and sequencing better quality samples?

If anyone is willing to look into the issue more with me, the following are more details about how I encountered the error:

I am also working on ddRAD paired-end data, sequenced through the Australian Genomics Research Facility's GBS service. I ran process_radtags without any noticeable problems, and then ran denovo_map.pl on 12 samples to begin exploring parameters. I am using version 2.60. I cannot identify any issues in the log file until it reaches tsv2bam. At tsv2bam it tries to start working on one of the samples (or multiple, depending on the number of threads I allow it). It then gives the following error in the denovo_map.pl log, but not in the tsv2bam log:

tsv2bam: src/BamI.cc:104: void BamHeader::init_sdict(): Assertion `h_->sdict' failed.

denovo_map.pl: Aborted because the last command failed (128).

I have slightly better coverage than identified in previous posts, but if that's the issue, then it could also be too low.

BEGIN cov_per_sample
# Depths of Coverage for Processed Samples
sample       depth of cov  max cov  number reads incorporated  % reads incorporated
LL2_a        10.45         987      1896902                    59.5
LL4           7.62         1155      836138                    58.5
conx5383      7.68         1052      791316                    60.1
conx5391      8.50          402     1014947                    59.5
conx5399      8.24         1508      925542                    61.8
conx5405_b    9.81         2143     1538081                    60.4
conx5418      8.35         1545     1012460                    61.8
lorica46      8.30         1961      966802                    60.4
lorica52      7.94         1451      862842                    58.3
lorica55      8.15          603      920150                    60.7
lorica56      5.92          756      375528                    51.3
lorica60      8.65         1891     1061188                    60.4
END cov_per_sample

I have tried running just tsv2bam on the outputs of denovo_map.pl up to this point, and it runs into the same issue (but gives me no error message). I have also tried two different values of M and n (1 and 3 for both), requesting 90 GB of memory, and using fewer samples.

The commands I have used are:

process_radtags -P -p $raw_dir -b $barcodes_file -o ./ -r -c -q  --inline_index --renz_1 pstI --renz_2 mseI

denovo_map.pl --samples $reads_dir --popmap $popmap -o $out_dir -M 1 -n 1 -m 3 -T 8 --paired &> $log_file

Log files are attached.

If anyone has any suggestions, or would like to discuss this further I would be very grateful. Thank you for having read my message.

Kind regards,

Nic Bail
process_radtags.lane1.log
denovo_map.oe
denovo_map.log

Catchen, Julian

Jul 26, 2022, 2:22:37 PM
to stacks...@googlegroups.com

Hi Nic,


The code is failing during an 'assert' that makes sure a BAM header object could be created and initialized (by the HTS library, which handles reading/writing BAM files and isn't native Stacks code). Hence, this shouldn't typically fail; perhaps you are not allocating enough memory for the job? You could test by re-running the tsv2bam step with only a small number of your samples, or by increasing the amount of memory you request for the batch job.
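One way to isolate the tsv2bam step on a reduced sample set is sketched below. The popmap contents are dummy values, and the `denovo_M4` and `clean` paths are illustrative, modeled on the commands earlier in this thread:

```shell
# Sketch: build a two-sample popmap to re-test tsv2bam in isolation.
# The sample/population names here are dummies for demonstration.
printf 'bob101\tpop1\nbob102\tpop1\nbob103\tpop2\n' > popmap_full.txt
head -n 2 popmap_full.txt > popmap_test.txt   # keep only the first two samples
cat popmap_test.txt

# Then point tsv2bam at the reduced map (not run here; requires a Stacks
# install and an existing denovo_map.pl output directory):
#   tsv2bam -P denovo_M4 -t 2 -M popmap_test.txt -R ~/data_qual/clean/
```

If the reduced run succeeds, the full run likely hit a resource limit; if it still fails with two samples, memory is probably not the cause.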


Best,


julian

Nic Bail

Jul 26, 2022, 7:59:43 PM
to Stacks
Hi Julian,

Thank you for getting back to me. I will retest as you suggest and see how it goes.

Thanks again for your time.

Kind regards,
Nic

Nic Bail

Jul 27, 2022, 12:30:55 AM
to Stacks
Hi Julian,

Thanks for your suggestion. I tried running just tsv2bam with just one sample. I attempted it once on the login node of my university's HPC (using 'top' to monitor), and I also submitted a batch job requesting 200 GB. Unfortunately I ran into the same problem both times.

Do you have any other ideas of things to look into? What is the 'h_->sdict' assertion trying to check exactly? Is it something in the .bam file that is meant to be created by tsv2bam? 

Kind regards,
Nic Bail

Nic Bail

Jul 27, 2022, 12:35:27 AM
to Stacks
Sorry, I meant: is the BAM header object found in the .bam file that is meant to be created.

Catchen, Julian

Aug 9, 2022, 5:16:12 PM
to stacks...@googlegroups.com

Hi Nic,


I don’t know what could be happening at the moment. If you wanted to give me access to a few samples and the commands you ran that reproduce the error (say via Dropbox) I could try to debug it and see what is happening.

Nic Bail

Aug 11, 2022, 8:58:20 PM
to Stacks
Hi Julian,

I was just putting together these files to send to you when the HPC team at my university resolved the issue.

I tried running another dataset (one that had no issues in the past) through Stacks and ran into the same problem at tsv2bam. I note that Ali had this issue too. So I figured there must be an issue with the Stacks install on the HPC that led to the problem (possibly the recent transition to containerisation).

They reinstalled it a few different times, which I tested. Finally they installed v2.62 (rather than 2.60) and it worked. I don't know exactly what the issue was; it could have been a bug fix in the recent update. However, I think it was the way Stacks was installed or containerised on the HPC. I am discussing the issue more with them to see if I can find more of an explanation, and I will share it with the Stacks community here when I do.

Thank you so much for your time on this, Julian. It is great for the community that you are so available to support us. Let me know if there are any specific pieces of information from this problem that you might want.

All the best,
Nic Bail

Nic Bail

Aug 14, 2022, 8:54:43 PM
to Stacks
For anyone who has the same issue (now or in the future),

The people who run the High Performance Computing cluster at my university let me know what they did differently to resolve the issue. This is what they said:

"The previous containerised versions of stacks were binary installs (I simply put the pieces together) - packages available using standard Ubuntu "apt-get install ...". The latest version was compiled from source (I built the environment from "scratch")."

Hopefully that helps you resolve the issue faster if you encounter it.

Cheers,
Nic
