Error: Failed to find any matching paired-end reads

Dani Dols

Dec 9, 2019, 4:53:48 AM

to Stacks

Hello everyone!

I am currently working on some ddRAD data that I got recently and I'm a bit of a hatchling regarding Stacks.

I've been following the workflow proposed by Paris et al. (2017) and Rochette & Catchen (2017) to run the de novo pipeline. process_radtags apparently ran smoothly, but problems arose during the execution of denovo_map.pl. Specifically, this one: "Error: Failed to find any matching paired-end reads".

The error occurred while I was testing different values of the M and n parameters to find the optimal parametrization for my data. I noticed that only the *.2.fq.gz files triggered the error, but the affected files weren't always the same and varied between iterations of M and n (e.g. with M=n=1 the number of error-prompting files was different from M=n=2). I've checked some of the failing fq.gz files using simple UNIX commands and they look 'normal' to me, but I'm not much of an expert and I might be missing something. Here's a short example of zcat thefile.2.fq.gz | head -n 10 for one of the failing files and its *.1.fq.gz counterpart after process_radtags:

#thefile.1.fq.gz

@6_1101_12286_1156/1/1
CTAGCGAAGCAGAATTTAAACCAGTTGATATATCACAGAGTGGAATATACCAAACGAAAGATCTGGAACGATCCAGAGAACATCTATTATTCCCAGCACAAAGGCAAGCCATACTATCCAACATTGTAACTATAA
+
JJJJJJJJJJJJJJFJJJJJJJJJ-FAFJJJJJJJJJJJFJJJJJAFJJJJJJJJJJJJJJFJJJJJJFAJFFJJFAJJJFJFFJJJJFJFJJJJJJJJJJJFJJFJJJJJJJJJJ<FJFJJJFJJJJJJJJFFJ
@6_1101_16589_1156/1/1
CTAGCTCAGATCAGTCACAGATTAGGTTTAGTGAAACTCCATGAAAAGTCTGCAATTCTAAAAAAACAAAATGCAATATTTAGAACGGTACTTACTTATTAACGCCACTGCAGCAGAAGTTGCGCAAATCTTGTG
+
JJJJJJJJJJJJJJJJJJJJJJJJJJFFJFJJJJJJJJJJJJJJJJJFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFFFJJ<FJJFJJJJJJJJFJJJJAJJ<J-FJF<AFFJFJJJJFFFFF-A--F
@6_1101_19796_1156/1/1
CTAGCACCTTCCACATACATACTTCTAAGACCTACCGTTCTTAAACCTTTAACACACACACACACCAACCAACGATCACCGATCCGCACCGATCAAAACCGATCCGCACTGATCCGCACCGATCCATACTAGTAC

#thefile.2.fq.gz

@6_1101_12286_1156/2/2
AATTCCTACTAATCTCTTTATCTTGTGGATATCTTTTGAGTGTCCATGTAACACCATTGTCTTCGCTAAAAACGGTAAAAATCCAATTGCCATTTTCTAGTTGATTTTTCTTTCTCTTTTATAAATATCGAAACA
+
FAJFJJJJJJFJJJJF7JFFJJJFFAFJ-FJ--FJJ<7-AJJFJF<F-F<FFAJ<<FJJ-<FJF-A7FJA-7FJF77AAA<7<J7A<FJ<AJ-FA<A-AFF77AJ-F-<7FJJFJAAA<-AJ-AAJJJAF<F-7<
@6_1101_16589_1156/2/2
AATTCTGATTCACTCAAACCAGTTTGTCTTTTATAAGTTCCGTTTTGATGATTACCTGTTTTGGAAATTACATTTCCAAAACTTACGTTTACCAAAATAAAGTTTCACGTCATGGGAATTGGTACATTTTAGTAG
+
JJJJJJJJJJJJJJJFJJJJJJ<<AFFFJJJJJFJJJJFJJJJJJAJJJJJJJJJJJJ<AJJJJJJJJJJJJJJJJJJJJJJJAJJ<AJJJJJJJJJJJJJJJJFJJJFJJAFJJFJJJF<<--FJJJJJ7FFJF
@6_1101_19796_1156/2/2
AATTCATCCTTCTTTACTACGATTGCAGTATTAATTGCATGCTTCATGTTTAGCATATTCTTTTCACTGTGAAAAAAATTTGTTATATTTAATAAAATTTTAAATAGTACTAGTATGGATCGGTGCGGATCAGTG

Here is also the command I used to run denovo_map.pl:

/users/ddols/programes/stacks-2.41/scripts/denovo_map.pl -M 2 -n 2 -T 8 -X "ustacks: -m 3" --samples /users/ddols/top/tests.denovo -O /users/ddols/top/info/popmap.test_samples.tsv -o /users/ddols/top/tests.denovo/stacks.M2 --paired

Any help will be appreciated:)

Catchen, Julian

Dec 9, 2019, 12:47:32 PM

to stacks...@googlegroups.com, Dani Dols

Hi Dani,

The problem is that your FASTQ headers have a "/1/1" and "/2/2" suffix.
Stacks expects one of those suffixes, not two, so it strips off the /1
or /2 suffix and expects the rest of the FASTQ header to match so it can
tell which paired-end reads go with which single-end reads. With a
doubled suffix, what is left after stripping still ends in /1 in one
file and /2 in the other, so the headers never match.
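
One way to see this in your own files is to tally the suffixes found on the header lines. The following is a minimal sketch and not part of Stacks: fastq_header_suffixes is a hypothetical helper name, the code assumes strict 4-line FASTQ records, and it uses gzip -dc (equivalent to zcat).

```shell
# Summarise the trailing "/1" or "/2" suffixes on the header lines of a
# gzipped FASTQ file. Assumes 4-line records, so headers are lines 1, 5, 9, ...
# fastq_header_suffixes is a hypothetical helper name, not a Stacks tool.
fastq_header_suffixes() {
    gzip -dc "$1" |
    awk 'NR % 4 == 1 {
             # Capture the whole run of /1 or /2 suffixes at the end of the header.
             if (match($0, /(\/[12])+$/)) print substr($0, RSTART)
             else print "(none)"
         }' |
    sort | uniq -c
}

# Usage: fastq_header_suffixes thefile.1.fq.gz
```

For files like the ones in the original post, every header would be counted under /1/1 (or /2/2 for the paired file), whereas a single /1 or /2 is what the explanation above says Stacks expects.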

You need to remove one of those suffixes. You could do that with sed on
UNIX. Or, you could reprocess the raw data only one time with
process_radtags (or whatever program added them) so they don't appear twice.

julian

--
Julian M Catchen, Ph.D.
Assistant Professor
Department of Evolution, Ecology, and Behavior
Carl R. Woese Institute for Genomic Biology
University of Illinois, Urbana-Champaign
--
jcat...@illinois.edu; @jcatchen

Dani Dols

Dec 10, 2019, 5:32:53 AM

to Stacks

Thank you very much, Julian

Your insights were very helpful. I've come up with the following command to do as you suggested

zcat thefile.n.fq.gz | sed '/^@/ s/.\{2\}$//' | gzip > fixed_thefile.n.fq.gz

This way I can remove the last two characters from all lines starting with @ but I can't quite figure out how to loop it so it can be run as a .sh and be applied to all the files in a directory.

Could you give me any clues or suggestions in that regard?

Cheers!

Dani

Dani Dols

Dec 10, 2019, 5:44:46 AM

to Stacks

Could this work or is it too naïve?

for filename in *.fq.gz
do
    zcat "$filename" | sed '/^@/ s/.\{2\}$//' | gzip > "fixed_$filename"
done

Dani

Catchen, Julian

Dec 10, 2019, 4:26:43 PM

to stacks...@googlegroups.com, Dani Dols

Hi Dani,

I would use this regular expression with sed:

sed -E 's/\/1$//'

Test it with a piece of the file:

zcat testfile.fq.gz | head -n 24 | sed -E 's/\/1$//'

(I don't need to match on the '@' symbol because if sed doesn't match
the pattern, it won't change the line.)
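
A small illustration of that behaviour, as a sketch on a made-up 4-line record. The header is taken from the original post; the sequence is shortened and the quality string is hypothetical, chosen so that it ends in "/1".

```shell
# Hypothetical 4-line FASTQ record whose quality string happens to end in "/1".
record='@6_1101_12286_1156/1/1
CTAGCGAAGC
+
JJJJJJJJ/1'

# Apply the same substitution: any line ending in "/1" loses that suffix,
# every other line is passed through unchanged.
fixed=$(printf '%s\n' "$record" | sed -E 's/\/1$//')
printf '%s\n' "$fixed"
```

The header comes out as @6_1101_12286_1156/1 and the sequence and "+" lines are untouched, but the quality line is also shortened to JJJJJJJJ because it ends in /1 too, which leaves it shorter than its sequence. The quality strings quoted earlier in the thread only use a handful of characters (J, F, A, <, 7, -), so this may never happen with this data set, but it is worth keeping in mind for other data.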

Then you could incorporate it into a loop like this:

ls -1 *.1.fq | while read line; do cat $line | sed -E 's/\/1$//'; done

Or, if compressed (all on one line):

ls -1 *.1.fq.gz |
while read line; do zcat $line | sed -E 's/\/1$//'; done

Then, capturing the output into a renamed file would look like this (all
on one line):

ls -1 *.1.fq.gz |
sed -E 's/\.1\.fq\.gz$//' |
while read line; do zcat ${line}.1.fq.gz |
sed -E 's/\/1$//' | gzip > ${line}.fixed.1.fq.gz; done

(repeat for the *.2.fq.gz files, swapping '2' for '1' in the regular
expression pattern).
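
Putting the pieces together, here is a sketch of a batch version that handles both read files, edits only the header lines, and recompresses its output. It is an illustration under stated assumptions rather than a tested recipe for this data set: it assumes strict 4-line FASTQ records, strip_one_suffix and the "fixed" output directory are hypothetical names, and awk is swapped in for sed so that only header lines (1, 5, 9, ...) are touched.

```shell
# Remove one trailing "/1" or "/2" from the header line of every record in a
# gzipped FASTQ file and write a recompressed copy into another directory.
# Assumptions: strict 4-line records; strip_one_suffix and the output
# directory name are hypothetical, chosen for this example.
strip_one_suffix() {
    src=$1
    outdir=$2
    mkdir -p "$outdir"
    gzip -dc "$src" |
    awk 'NR % 4 == 1 { sub(/\/[12]$/, "") } { print }' |
    gzip > "$outdir/$(basename "$src")"
}

# Apply to every read file in the current directory; originals are untouched.
for f in *.1.fq.gz *.2.fq.gz; do
    [ -e "$f" ] || continue   # skip a glob pattern that matched nothing
    strip_one_suffix "$f" fixed
done
```

Writing to a separate directory keeps the original file names, so the sample names in an existing population map should not need to change; the --samples path would then point at that directory instead.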

Best,

julian

Dani Dols

Dec 11, 2019, 5:50:08 AM

to Stacks

Very helpful!

Thank you very much:)

Cheers,

Dani
