Catalogue construction failed on denovo_map.pl


bander...@gmail.com

Apr 15, 2015, 7:59:19 AM
to stacks...@googlegroups.com
Hi all,

I have been able to get Stacks to run through my dataset, but only by doing part with the wrapper (denovo_map.pl) and then the rest manually due to a problem building the catalogue. Here is my command to execute the wrapper:

names=`cut -f 2 $BARCODE_FILE`
list=""
for sample in $names
do
  list+="-s ./samples/${sample}.fq.gz ";
done
aprun -n 1 -d 20 denovo_map.pl -T 20 -S -m 5 -b 1 -n 2 -t -o ./stacks $list

The aprun is to launch the job onto a supercomputer (where Stacks is compiled and denovo_map.pl is in the path).
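The loop above collects the `-s` arguments in one string and relies on word splitting of `$list`, which works as long as no sample name contains whitespace. A minimal sketch of the same idea using the shell's positional parameters, so that each argument stays intact. The demo barcode file and its barcodes are made up for illustration (the sample names are taken from this thread), and the wrapper command is only printed, not run:

```shell
# Demo barcode file (hypothetical contents): barcode <TAB> sample name.
BARCODE_FILE=$(mktemp)
printf 'ACGTA\tTw-Pyr21\nCGTAC\tTw-YFlat06\n' > "$BARCODE_FILE"

# Collect one "-s <file>" pair per sample in the positional parameters.
tab=$(printf '\t')
set --
while IFS="$tab" read -r barcode sample rest; do
  if [ -n "$sample" ]; then
    set -- "$@" -s "./samples/${sample}.fq.gz"
  fi
done < "$BARCODE_FILE"

# Print the wrapper command (same options as in the post) instead of running it.
cmd="denovo_map.pl -T 20 -S -m 5 -b 1 -n 2 -t -o ./stacks $*"
echo "$cmd"
rm -f "$BARCODE_FILE"
```

For a real run, the last step would pass `"$@"` directly, for example `aprun -n 1 -d 20 denovo_map.pl -T 20 -S -m 5 -b 1 -n 2 -t -o ./stacks "$@"`.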
The job runs ustacks fine (a few warnings due to length differences... I will have to re-process samples and trim), but dies when trying to build the catalogue. If I run cstacks manually on the output from ustacks produced using denovo_map.pl, I can build the catalogue. I'm wondering if an extra whitespace is the problem. Here is the end of the command denovo_map.pl gave to cstacks:

... -s ./stacks/Tw-Pyr21 -s ./stacks/Tw-YFlat06 -s ./stacks/Tw-MtRob32 -s ./stacks/Tw-Rocklea26 -s ./stacks/Tw-Weeli03  -p 20 -n 2 2>&1
Catalog construction failed.

There is an extra space between the last sample and "-p 20". I don't know if this is causing the catalog construction failure. When I execute it by hand I put the list of samples at the end of the command, and it works.

Any thoughts?

Cheers,

Ben

Julian Catchen

Apr 20, 2015, 5:44:35 PM
to stacks...@googlegroups.com, bander...@gmail.com
Hi Ben,

Whitespace in the command is not your problem. The shell will handle
that just fine. Cstacks is failing and the denovo_map.pl wrapper is
picking that up and halting ("Catalog construction failed"). You should
look at the denovo_map.log file to see what the error is. Make sure you
are specifying the input and output paths correctly.

But you will need to trim and reprocess your files for ustacks
first; the error may have been caused by ustacks outputting bad data
when the lengths of reads were different in a single individual.
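One way to follow the advice about the log is to pull out only the lines that mention an error or failure. A small sketch; the helper name `show_log_errors` and the search pattern are assumptions for illustration, and the demo lines are copied from a later message in this thread rather than from Ben's log:

```shell
# Print lines of a log that mention an error or failure, with line numbers.
show_log_errors() {
  grep -n -i -E 'error|failed|unable to' "$1"
}

# Demo on a two-line stand-in log (lines quoted from a later post in this thread).
demo_log=$(mktemp)
printf '%s\n' \
  'Parsing ./conc_trimmed/MD-AD07-01.fq' \
  "Error: Unable to load data from './conc_trimmed/MD-AD07-01.fq'." \
  > "$demo_log"
hits=$(show_log_errors "$demo_log")
echo "$hits"
rm -f "$demo_log"
```

Against a real run this would be called as `show_log_errors ./stacks/denovo_map.log`, assuming the log sits in the directory given with `-o`.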

Best,

julian

bander...@gmail.com

Apr 22, 2015, 5:23:37 AM
to stacks...@googlegroups.com, bander...@gmail.com, jcat...@illinois.edu
Thanks Julian, that was the problem. After I ran process_radtags again and trimmed my reads to the same length, it was able to run through the pipeline without a problem. I checked the log and it failed when it was unable to load one of the samples (I assume that was because of the variable read lengths). Cheers!
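To see whether a sample contains reads of more than one length, the lengths can be tallied straight from the FASTQ file. A minimal sketch, assuming standard four-line FASTQ records; the function name `read_length_counts` is made up for illustration:

```shell
# Count reads per length in an uncompressed FASTQ stream on stdin.
# The sequence is the 2nd line of each 4-line record.
read_length_counts() {
  awk 'NR % 4 == 2 { n[length($0)]++ } END { for (len in n) print len, n[len] }' | sort -n
}

# Demo: two reads of length 8 and one of length 5.
demo=$(printf '@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nACGTA\n+\nIIIII\n@r3\nACGTACGT\n+\nIIIIIIII\n' | read_length_counts)
echo "$demo"
```

For a gzipped sample, the input would come from something like `gzip -dc ./samples/Tw-Pyr21.fq.gz | read_length_counts`. More than one output line means the reads are not all the same length.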

Vanessa Robitzch

Dec 2, 2015, 5:11:13 AM
to Stacks, bander...@gmail.com, jcat...@illinois.edu
Hi Ben, I was wondering, how did you trim your reads to the same length? I am having similar trouble building the catalog. I also checked the denovo_map.log file, and for some reason it is incomplete (Stacks ver. 1.35). It just stops in the middle of the file, as if it had stopped recording the process. I wonder why? Could it be due to too many samples? I know for a fact that all 160 samples went through ustacks, but the log file only shows the info up to sample 92, and even that entry is incomplete. Any thoughts? Thanks a lot! Below is the last sample in the log:

Identifying unique stacks; file  92 of 160 [MAG_010108]
/usr/local/bin/ustacks -t fastq -f ./MAG_010108.fq -o ./denovo -i 92 -m 3 -M 3 -p 15 -d -r 2>&1
Min depth of coverage to create a stack: 3
Max distance allowed between stacks: 3
Max distance allowed to align secondary reads: 5
Max number of stacks allowed per de novo locus: 3
Deleveraging algorithm: enabled
Removal algorithm: enabled
Model type: SNP
Alpha significance level for model: 0.05
Parsing ./MAG_010108.fq
Loaded 2706244 RAD-Tags; inserted 636044 elements into the RAD-Tags hash map.
  0 reads contained uncalled nucleotides that were modified.
  Mean coverage depth is 11; Std Dev: 15.87 Max: 2659
Coverage mean: 11; stdev: 15.87
Deleveraging trigger: 27; Removal trigger: 43
Calculating distance for removing repetitive stacks.
  Distance allowed between stacks: 1
  Using a k-mer length of 47
  Number of kmers per sequence: 50
  Minimum number of k-mers to define a match: 3
Removing r

Vanessa Robitzch

Dec 3, 2015, 4:11:29 AM
to Stacks, bander...@gmail.com, jcat...@illinois.edu
Ok, it seems that I just ran out of space. But is that also why the .log file is corrupted? And do I really need input sequences of the same length? Or would denovo_map.pl run without troubles with the default process_radtags settings?
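Since the run above ended with a full disk, a quick check of free space before launching the pipeline may help. A minimal sketch; the function name `has_free_space` and the 10 GiB threshold are arbitrary examples, not Stacks requirements:

```shell
# Succeed if the filesystem holding directory $1 has at least $2 KiB available.
has_free_space() {
  avail_kb=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
  [ -n "$avail_kb" ] && [ "$avail_kb" -ge "$2" ]
}

# 10 GiB = 10485760 KiB; pick a value that suits the size of the dataset.
if has_free_space . 10485760; then
  echo "at least 10 GiB free"
else
  echo "less than 10 GiB free (or df failed)"
fi
```

In practice the first argument would be the output directory, for example `has_free_space ./denovo 10485760`.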

thanks a lot,

Vanessa

Julian Catchen

Dec 3, 2015, 11:31:38 AM
to stacks...@googlegroups.com, vanessa...@gmail.com
Hi Vanessa,

Yes, you need sequences of the same length. The pipeline is trying to
identify the same SNPs consistently across individuals in your set of
populations. If you have variable length reads, you will simply add a
lot of poorly covered SNP calls and higher error rates associated with them.

If you received quality trimmed reads from your sequencing core, you
should ask for the original, unmodified files.

Otherwise, you can specify a final length of the reads to the
process_radtags program, which will trim reads to that length and
discard reads already shorter than that length.
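The trimming behaviour described here (truncate to a final length, discard reads already shorter) can be illustrated with a few lines of awk. This is only a sketch of the idea for four-line FASTQ records, not how process_radtags is implemented, and the function name `trim_fastq` is made up. A later post in this thread passes the length to process_radtags as `-t 100`.

```shell
# Truncate reads to $1 bases and drop reads shorter than that (4-line FASTQ on stdin).
trim_fastq() {
  awk -v len="$1" '
    { rec[(NR - 1) % 4] = $0 }                 # buffer the current record
    NR % 4 == 0 && length(rec[1]) >= len {     # record complete and long enough
      print rec[0]
      print substr(rec[1], 1, len)             # sequence, truncated
      print rec[2]
      print substr(rec[3], 1, len)             # quality string, truncated to match
    }'
}

# Demo: r1 (8 bases) is cut to 6, r2 (5 bases) is discarded.
trimmed=$(printf '@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nACGTA\n+\nIIIII\n' | trim_fastq 6)
echo "$trimmed"
```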

Best,

julian

--
Julian M Catchen, Ph.D.
Assistant Professor
Department of Animal Biology
University of Illinois, Urbana-Champaign
--
jcat...@illinois.edu; @jcatchen

IlaC

Feb 14, 2017, 6:00:33 AM
to Stacks

Dear all,

I am encountering this same problem, even after trimming (-t 100 in process_radtags).
I am analyzing single-digested, paired-end data. I run process_radtags (with the -t flag), then concatenate R1 and R2, and then run denovo_map.pl.

I get this for every sample:

 Identifying unique stacks; file   1 of 180 [MD-AD07-01]
/usr/local/bin/ustacks -t fastq -f ./conc_trimmed/MD-AD07-01.fq -o ./denovo_default -i 1 -r -m 3  2>&1
ustacks paramters selected:
  Min depth of coverage to create a stack: 3
  Max distance allowed between stacks: 2
  Max distance allowed to align secondary reads: 4
  Max number of stacks allowed per de novo locus: 3
  Deleveraging algorithm: disabled
  Removal algorithm: enabled
  Model type: SNP
  Alpha significance level for model: 0.05
  Gapped alignments: disabled
Parsing ./conc_trimmed/MD-AD07-01.fq
Loaded 0 RAD-Tags; inserted 0 elements into the RAD-Tags hash map.
Error: Unable to load data from './conc_trimmed/MD-AD07-01.fq'.


This is the command:

 denovo_map.pl -b 1 -o ./denovo_default/ -O ./concatenated/popmap_sort  --samples ./conc_trimmed -m 3 -S


Should I trim more?

Any ideas?

Thanks

Ilaria
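The thread does not say what caused "Loaded 0 RAD-Tags" here. Since ustacks is being called with `-t fastq`, one possible first step is to confirm that the concatenated files really are non-empty, uncompressed, four-line FASTQ. A rough sketch of such a check; the function name `check_fastq` and the individual tests are assumptions for illustration, not a diagnosis:

```shell
# Report the first basic problem found in a file expected to be uncompressed FASTQ.
check_fastq() {
  f=$1
  if [ ! -s "$f" ]; then echo "missing or empty"; return 1; fi
  # gzip files start with the bytes 0x1f 0x8b.
  magic=$(od -An -tx1 -N2 "$f" | tr -d ' \n')
  if [ "$magic" = "1f8b" ]; then echo "gzip-compressed"; return 1; fi
  first=$(head -n 1 "$f" | cut -c 1)
  if [ "$first" != "@" ]; then echo "first line does not start with @"; return 1; fi
  lines=$(wc -l < "$f")
  if [ $((lines % 4)) -ne 0 ]; then echo "line count is not a multiple of 4"; return 1; fi
  echo "ok"
}

# Demo on a one-record file and an empty file.
good=$(mktemp); empty=$(mktemp)
printf '@r1\nACGT\n+\nIIII\n' > "$good"
good_result=$(check_fastq "$good")
empty_result=$(check_fastq "$empty" || true)
echo "$good_result / $empty_result"
rm -f "$good" "$empty"
```

On the data from the post it would be run as `check_fastq ./conc_trimmed/MD-AD07-01.fq`.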

