Error: different sequence lengths (Stacks v2.60)

274 views
Skip to first unread message

agathe leveque

unread,
Mar 3, 2022, 8:19:43 AM3/3/22
to Stacks

Dear Julian, Nicolas, and Stacks users,

I’m having trouble running denovo_map.pl on my ddRADseq samples (NovaSeq 6000 Sequencing), and I hope you could help me.

I have sequenced several individuals (split into 4 pools) marked with combinations of inline-inline barcodes (barcodes of different lengths 5 to 8bp on one side and 5 to 7bp on the other side).

I demultiplex my data using process_radtags without noticeable problems using:
mod="inline_inline"
for i in 1 2 3 4
do
nohup process_radtags -1 agrion${i}_R1.fastq.gz -2 agrion${i}_R2.fastq.gz -b barcode/2-barcode_stacks_agrion${i}.txt -o test_demult_NO_BACKUP/${mod} --renz_1 pstI --renz_2 mspI --${mod} -c -q -r > 2-barcode_agrion${i}_${mod}.trace 2>&1 && mv test_demult_NO_BACKUP/${mod}/process_radtags.log test_demult_NO_BACKUP/${mod}/2-agrion${i}-${mod}-process_radtags.log&
done

I then proceed to run denovo_map.p (keeping separate R1 and R2 files) and I get an error in the log file:

ustacks

==========

Sample 1 of 20 'Cam1-13-m20'

----------

/eep/softwares/miniconda/envs/stacks-2.60/bin/ustacks -t gzfastq -f /eep/data1/genomic/reads/agrion-ddRADseq/demultiplexage/inline_inline/Cam1-13-m20.1.fq.gz -o /home/agathe.leveque/working_dir/Agrion-ddRADseq/Results_data/denovo/denovo.test -i 1 --name Cam1-13-m20 -p 10 -M 1

ustacks parameters selected:

  Input file: '/eep/data1/genomic/reads/agrion-ddRADseq/demultiplexage/inline_inline/Cam1-13-m20.1.fq.gz'

  Sample ID: 1

  Min depth of coverage to create a stack (m): 3

  Repeat removal algorithm: enabled

  Max distance allowed between stacks (M): 1

  Max distance allowed to align secondary reads: 3

  Max number of stacks allowed per de novo locus: 3

  Deleveraging algorithm: disabled

  Gapped assembly: enabled

  Minimum alignment length: 0.8

  Model type: SNP

  Alpha significance level for model: 0.05

Loading RAD-Tags...1M...2M...3M...4M...5M...6M...7M...8M...9M...10M...11M...12M...13M...14M...15M...16M...17M...18M...

Error: different sequence lengths detected, this will interfere with Stacks algorithms, trim reads to uniform length (override this check with --force-diff-len).

denovo_map.pl: Aborted because the last command failed (1).

So, I first looked at the Fast QC files for the raw dataset (1per pool) and it showed that both the R1 & R2 read lengths had different lengths (ex : R1 :35-251 and 35-251)

It seems that after process_radtags, the R1 files get shortened to 120, and the R2 files this variability remain and maybe exacerbate with the difference in barcode size

I also check the length of my sequences before and after demultiplexing using: zcat file.fq.gz | sed -n '2~4p' | awk '{print length($0)}' | sort -n | uniq -c and it seems that these variations are found both before and after the demultiplexing.

 I assume if I force the different lengths it will cause issues for my downstream analysis and SNP calling (which I have seen, you mention in other Stacks threads).

Do stacks allow a certain threshold of variability between read lengths? Should I trim my raw data (and lost information) to be able to run stacks? Or use the trimming parameter -t in process_radtags?

Many thanks and I apologize if this question has been asked before.

Best,

Agathe

paige...@ucsb.edu

unread,
Apr 7, 2022, 4:32:53 PM4/7/22
to Stacks
Hello Agathe,
I can't answer all of your questions as I'm no expert on stacks2, but I had the exact problem you are discussing with process_radtags earlier this year.  I spent some time trying to fix the problem.  I tried running using the -t option, but it actually did not lead to all the reads being the same length (I'm not sure what happened), some were still 1 or 2 bases different and when I tried to run the de novo pipeline with the process_radtags output using --force-diff-len, it would cause an error and not complete. I could not find any work around and did not get a timely answer on this group (stopped waiting after a week).  So, I used BBTools and trimmed them all to the same length (after completing the process_radtags step), which meant just losing at most 5 bases per read on the 3' end..  That worked great for me.  Just letting you know, in case it can save you some time!

Best,
Reply all
Reply to author
Forward
0 new messages