Error in denovo pipeline tsv2bam step (unrecognized paired-end read name format)

betsy.s...@gmail.com

Apr 9, 2024, 12:33:37 PM
to Stacks
Good afternoon stacks community, 

I am trying to solve an error I received when running the denovo pipeline, specifically at the tsv2bam step (denovo log attached). 

I know the error is related to the fact that I am using paired fq files output by the program RADOrgMiner (https://github.com/laczkol/RADOrgMiner), but I am not sure what the exact problem is or how to fix it. 

In RADOrgMiner I am aligning the output from process_radtags for each sample (with all adapters and PCR clones removed) to the chloroplast genome of a close relative and then using the unaligned reads (i.e., the nuclear genome). One very important point: this exact dataset output from process_radtags has worked many times in the denovo pipeline, so I know the problem is introduced somewhere in the RADOrgMiner pipeline; I am just not sure where.

The text from the portion of the denovo.log is: 

Error: Unrecognized paired-end read name format, at '1041_2_1106_7365_8860'.

Aborted.

Error: Unrecognized paired-end read name format, at '1041_2_1218_26964_28463'.

Aborted.

Error: Unrecognized paired-end read name format, at '1041_2_1119_23863_17174'.

Aborted.

Error: Unrecognized paired-end read name format, at '1041_2_1122_32660_10363'.

Aborted.

Error: Unrecognized paired-end read name format, at '1041_2_1358_29197_15452'.

Aborted.

Error: Unrecognized paired-end read name format, at '1041_2_1103_25093_15702'.

Aborted.

Error: Unrecognized paired-end read name format, at '1041_2_1104_7048_30138'.

Aborted.

Error: Unrecognized paired-end read name format, at '1041_2_1105_10502_2300'.

Aborted.

Error: Unrecognized paired-end read name format, at '1041_2_1101_12301_13714'.

Aborted.

Error: Unrecognized paired-end read name format, at '1041_2_1102_17508_34695'.

Aborted.

Error: Unrecognized paired-end read name format, at '1041_2_1131_11668_18192'.

Aborted.

Error: Unrecognized paired-end read name format, at '1041_2_1101_11017_31062'.

Aborted.

Error: Unrecognized paired-end read name format, at '1041_2_1102_15438_2957'.

Aborted.

Error: Unrecognized paired-end read name format, at '1041_2_1101_5547_7686'.

Aborted.

Error: Unrecognized paired-end read name format, at '1041_2_1102_26096_13683'.

Aborted.

Error: Unrecognized paired-end read name format, at '1041_2_1506_5602_20149'.

Aborted.

Error: Unrecognized paired-end read name format, at '1041_2_1315_17318_31485'.

Aborted.

Error: Unrecognized paired-end read name format, at '1041_2_1112_12102_12461'.

Aborted.

Error: Unrecognized paired-end read name format, at '1041_2_1102_21079_24972'.

Aborted.

Error: Unrecognized paired-end read name format, at '1041_2_1112_14724_26897'.

Aborted.

Error: Unrecognized paired-end read name format, at '1041_2_1125_20681_28416'.

Aborted.

Error: Unrecognized paired-end read name format, at '1041_2_1124_12174_11960'.

Aborted.

Error: Unrecognized paired-end read name format, at '1041_2_1419_16288_12258'.

Aborted.

Error: Unrecognized paired-end read name format, at '1041_2_1103_12192_31688'.

Aborted.

Error: Unrecognized paired-end read name format, at '1041_2_1106_10176_34021'.

Aborted.

Error: Unrecognized paired-end read name format, at '1041_2_1309_21206_27445'.

Aborted.

Error: Unrecognized paired-end read name format, at '1041_2_2136_7545_5916'.

Aborted.

Error: Unrecognized paired-end read name format, at '1041_2_1348_32696_15436'.

Aborted.

Error: Unrecognized paired-end read name format, at '1041_2_1348_5059_23876'.

Aborted.

Error: Unrecognized paired-end read name format, at '1041_2_1101_3730_35916'.

Aborted.

Error: Unrecognized paired-end read name format, at '1041_2_1321_12608_7200'.

Aborted.

Error: Unrecognized paired-end read name format, at '1041_2_1104_4182_10802'.

Aborted.


I really appreciate any ideas for potential causes and, especially, solutions so that I can hopefully use the files output from RADOrgMiner in the denovo pipeline. 


Thanks, 

Betsy Collins, 

Ph.D. candidate, George Mason University 

denovo_map.log

betsy.s...@gmail.com

Apr 9, 2024, 12:54:42 PM
to Stacks
Could it be as simple as adding the /1 and /2 at the end of the @ header line in the read 1 and read 2 files?! 
I just checked and the /1 and /2 are present in my input fq.gz files but absent from the output fq.gz files from RADOrgMiner. 

Does everything else look ok with the header?  

One example of a read2 sample file output from process_radtags and input into RADOrgMiner (the headers contain the /2 at the end):

zcat Collins_022.2.fq.gz | head -n 40 


@1041_2_1101_2618_1078/2

AATTCTATTAACAGAGTATGCATGTCAGCATTTCATTAATGCACTAAAAACTATTCAGCTGATGTATCAGAAATACATTAGAAGAAACTGTTGCAGTCCTATAGATGCTGTTTAATCCTTGCAGACCATCAACATATTGT

+

FF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

@1041_2_1101_1669_1094/2

AAGTCCAAATCGACTCAGAATGTTCTCTTATAAAAATCTGATTACTGTTTAGTGATCATTATTTCGACAAGGAATTGCTTCAACCCATTTTGACAAGTAATCAACAGCAACCAGGATGTAAACAAAACCAAATTAAGTTG

+

FF,:FFFFF,:F:FF:,FFFFF:FFFFF,,FFF::FFFF:FFFF,,:F:,,FFF,:,:,FFF:F,F,F:FFF:FF,FFF:F,:F,:F,:,FFFFF,FFF,F::FFF,F,:,FF,:,FFFFFFF:FFF,,FFF,,F,,,:F

@1041_2_1101_25518_1094/2

AATTCTTTGAGATTCTCAAAACTGGAAGATGCTTATTTCGGATGAGCCGAGCCAATAGGATGAACTAAAAAAAAGAGTTCTGCATCATGAACTTTGTATCGCGCACATCGCTTAGATGAGCTCTAGAGGGGCATATAGGA

+

FF,FFFF:FFFFF,FFFFFFF::FFFFFFF,FFFFF:FFF:FFFFF::FFF,FFFFFFFFFFFF:FFFFFFFFFFF,FFFFF:F:FFFFFFF:FF:FFFF:FFFFFF:FFFFFFFFFFFFFFFFFFFF::FFFFFFFFFF

@1041_2_1101_2003_1110/2

AATTCAGCTACTAATGACAAAATAAAACCCATCAAGCAGTTCATCAATGAAATGATTCCTCCTTCACGGCTCTCCTAAGAATTCTATCTAAAACTCCATAAATACCACGACCTAATTTAGTTTAAAAAAGAAGGACAAAC

+

FFF,,F:FFF:F,FFFFFFF,F:FFFF,::FFFFF:FF::FFFFFFF:,FFF,FFF,F,FFFFF,FFF:FFF,F,:FF,,:FFFF,F:F:FFF::,F,FFFF,FF,FF,,FFF,::F,:F,FFFFFFF,,FFF,F:FFFF


One example of a read2 sample file output from RADOrgMiner (no /2 at end of @ header line). 

zcat Collins_022.2.fq.gz | head -n 40 


@1041_2_1101_1063_11475

AATTCTTGGAAGAAGTTTAACAAATAAATTAATTAAATAAAAAATATCTACTGCTAGTTTGTCCCTAAGAAAATCAAACAAAAATTACAGTGATTATTGTTTGGTAAATTAAACCTTGCTTTATCAAAATGTTTAATTTC

+

FF,FFFFFFFF:,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFF:FFFFFFFFFFFF,:F:F:::FFF:FFFFFFFFFFFFFFFF:FFFF:FF:FFFFF:,FFFF,FFFF:,FF,F:F:FF

@1041_2_1101_1063_23156

AAGGCTTTATAACAATATTTTATTCAGATCGTGGTGACAATTTATAAATGAAAGTCAACCAACTTTTTTTATCCATCCAAGATAAGAATAACAAAACTTGGTTGTAGGAGTCAATATTTAAAATTTTTTTATTTTTTTTA

+

FF:::FFF:FFFFFFFFFFFFFFF:FFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFF,FF,FFFFFFF:FF:,:F:FFFFFFFFFFFFFFFFFFF:

@1041_2_1101_1099_22842

AAGGCTTGTAGAGTATTTTGAGCTTATTTCTCTACTTTGGCCGAATCTTCTTGTTCATTGTTATGGGCTATTTTCTTGCAATTTTGTGATGAAAATATGTTGAAATAGATCAAAATCATTCATACATTAGCTAGATGCAT

+

FF,:FFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFF,FFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFF:,FFFFF:FFFFFFFF,FFFF,FF:FFFFFFFFFFFFFFFFFFFFF:FFFFFF

@1041_2_1101_1118_30984

AAGGCAGAAAACAGAAACACTTTCAAAAATCACCGAAGACACTTCCGTTACCCGATGGAGAAGCGGCTACGTTCATCACTACAATCTTCAGCCGAAGAATTCCTAATCTTGGCCACAAATCAATCCCTCAAATCATCCAA

+

FF:FFFF:FF:FFF:FFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFF::FFF,FFFFFFFFF,:FFFFFFF


I am going to try the solution in this stacks thread here: 
If anyone has another solution I am open to it! 

Thanks
Betsy Collins
Ph.D. candidate, George Mason University
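
A minimal sketch of the header fix discussed above, assuming the RADOrgMiner output is standard four-line FASTQ, gzip-compressed, with no whitespace in the header lines. The function name `add_read_suffix` and the `fixed/` directory in the usage note are made up for illustration; neither comes from Stacks or RADOrgMiner.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Stacks or RADOrgMiner):
#   add_read_suffix <in.fq.gz> <out.fq.gz> <1|2>
# Appends "/1" or "/2" to every FASTQ header line that does not already end with it.
# Assumes four lines per record, so header lines are lines 1, 5, 9, ...
add_read_suffix() {
    local in="$1" out="$2" mate="$3"
    gzip -dc "$in" \
        | awk -v m="$mate" 'NR % 4 == 1 && $0 !~ ("/" m "$") { $0 = $0 "/" m } { print }' \
        | gzip -c > "$out"
}
```

For example, after creating an output directory with `mkdir -p fixed`, calling `add_read_suffix Collins_022.2.fq.gz fixed/Collins_022.2.fq.gz 2` would rewrite `@1041_2_1101_1063_11475` as `@1041_2_1101_1063_11475/2`. The read 1 file would be handled the same way with `1` as the last argument.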

Catchen, Julian

Apr 9, 2024, 2:12:26 PM
to stacks...@googlegroups.com

Hi Betsy,

 

The tsv2bam program needs to match the single-end reads used to build loci in the first half of the pipeline with the paired-end reads (so that gstacks can build a paired-end contig from them). The read IDs have to match between the pairs of reads, and tsv2bam expects the paired-end read ID to end in either “/2” or “_2”.

 

Best,

Julian
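
One way to check this before running tsv2bam is to count the header lines that lack the expected suffix. This is a rough sketch, assuming gzip-compressed four-line FASTQ. `count_unsuffixed` is a hypothetical helper, not part of Stacks, and accepting `/1` or `_1` for the forward reads is an assumption made by symmetry with the reply above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Stacks):
#   count_unsuffixed <file.fq.gz> <1|2>
# Prints how many header lines do NOT end in "/<mate>" or "_<mate>".
# Assumes four lines per FASTQ record.
count_unsuffixed() {
    local fq="$1" mate="$2"
    gzip -dc "$fq" \
        | awk -v m="$mate" 'NR % 4 == 1 && $0 !~ ("[/_]" m "$") { n++ } END { print n + 0 }'
}
```

A result of `0` for a paired-end file, for example from `count_unsuffixed Collins_022.2.fq.gz 2`, would mean every header carries the suffix. Any other number is the count of headers that would need fixing.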

Hannah Llames

Dec 30, 2024, 12:08:09 PM
to Stacks
Hi Julian,

I am experiencing the same error as Betsy and have also tried the solution posted here: https://groups.google.com/u/1/g/stacks-users/c/qvtQhPhXgT4/m/Mc5j-tVNCQAJ

However, I still receive the error message "Failed to find any matching paired-end reads."

I have a different set of RADseq data from a different sequencer (NovaSeq), and the solution worked for that. However, the run on the data from the HiSeq sequencer fails with the error I mentioned above. I would like to ask if there are any other solutions so that I can proceed with the pipeline.

I am also sending the head of some of my sample *fq files after processing with process_radtags:



==> BOL_C1_F_17.27.1.fq <==
@6_1101_2625_1191/1
AATTCGTTGTATAAATCAGATTTTTTTCTATTTCAATCATGTTTTTTTTTTTAATGGAAATCATGGGTTTTATCGAGGTTCTAATCAATGTGATTTAAATCAAGCAACCCTAATTCTGGA
+
JJJJFJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJFJJFJJJAJJJJJFJJJJJJAF-F-<AFJ<7-7A7<FFJ7-7----FFJAFFA77-FFAAFJJFFA<<JAFJJFJAJ<AAAJJJA
@6_1101_11718_1191/1
AATTCATGATCTGGGAGACATTTTGTTATTAAAATCCTTTTAAATGAGGTGAAACTACAAATAAGATTGTATATCCATAGGTGTTTAAACAACGTATTTTATTTCAAACAAAATCAACAC
+
JJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJFJJJJAFFJJFFJFFJ7FJJJJJJJJFFJFJAJJJFJJJJJJJJJJJJJJJJ
@6_1101_20669_1191/1
AATTCGGCTGTCGGGGCAGGACAGATGGCAAACCATTAGACACTGCATACACCTGAACAACTTCTCGTCCGTTTTGTAAGGAAAACTAGCAAAGCAAAATCAAGAAGATCGTCCGAGATC

==> BOL_C1_F_17.27.2.fq <==
@6_1101_2625_1191/2
CGGAGACAAAAATGCCGAAGCTTCGAAGAAACTATGAACGAAGCAAGTACTATGCAACTCAATGAAAAGGAAGGAGCAGAAAAGCCACATTGCTTCAGAGAAAATACTAGAAAAAAGGTA
+
#AAA<J-FJ-J---<--7777<<F<F--<JJAJJ<7AF-AFJ7AJJ--<A-<7FF7AFF--7--7<<-F<7A-)7<)7)A-A-7A<FJ)-7-7--7-------7--7--7A---7-77--
@6_1101_11718_1191/2
CGGAAAAGAGTTTTGGCCAGACCTAATAACCTATTGGTCAGAAGCTGAGCCGTAAAACCCCATATAACTAATAAAGTAAGCTTTTACTCGTATACCGCTATCCCTAACTGTGGTTGAAGA
+
#AAFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJFJJJ7AFFJJJJJJFJJJJFFJJJFAJJJJFF-A7AJJAF-)7A<AF--7--AF-)---<7<F
@6_1101_20669_1191/2
CGGACGATCTTCTTGATTTTGCTTTGCTAGTTTTCCTTACAAAACGGACGAGAAGTTGTTCAGGTGTATGCAGTGTCTAATGGTTTGCTATCTGTCCTGCCCCGTCAGCCAAATTTGGTT

I hope you can help me. Thank you!

Regards,
Hannah Llames
MS Marine Science Student, University of the Philippines
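
One possible diagnostic for a "no matching paired-end reads" message is to count how many read IDs the two files of a sample share once the mate suffix is removed. The sketch below assumes uncompressed four-line FASTQ, as in the `.fq` files shown above. `read_ids` and `count_shared_ids` are hypothetical helper names, not Stacks commands.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Stacks).
# read_ids <file.fq>: print the sorted, unique read IDs of a FASTQ file,
# with the leading "@" and a trailing /1, /2, _1 or _2 removed.
read_ids() {
    awk 'NR % 4 == 1 { sub("^@", ""); sub("[/_][12]$", ""); print }' "$1" | sort -u
}

# count_shared_ids <reads.1.fq> <reads.2.fq>: print the number of IDs found in both files.
count_shared_ids() {
    local ids1 ids2
    ids1=$(mktemp)
    ids2=$(mktemp)
    read_ids "$1" > "$ids1"
    read_ids "$2" > "$ids2"
    comm -12 "$ids1" "$ids2" | wc -l | tr -d ' '
    rm -f "$ids1" "$ids2"
}
```

Running `count_shared_ids BOL_C1_F_17.27.1.fq BOL_C1_F_17.27.2.fq` and getting a number close to the read count would suggest the IDs themselves pair up, so the cause is more likely to lie elsewhere, for instance in how tsv2bam is invoked.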

Catchen, Julian

Dec 30, 2024, 12:10:04 PM
to stacks...@googlegroups.com

Hi Hannah,

 

Please supply the commands you executed to analyze your RADseq data up until the error, the popmap you used (if any), etc.

 

Cheers,

 

Julian

Hannah Llames

Jan 4, 2025, 9:36:36 PM
to Stacks
Hi Julian,

I have been using the denovo pipeline, but I run each step individually because I have many samples and this lets me run them efficiently. Everything worked, following the manual, up to the tsv2bam step using the PE reads, for which I used the commands below. This produced the error:

# Working directory
wdir="/home/projects/dscf/jsllames/ddrad_lnk/re-run/04_ustacks"  ## FIXME: SET CORRECT PATH TO WDIR

# Population map path
popmap="/home/projects/dscf/jsllames/ddrad_lnk/re-run/04_ustacks/popmap3.txt"

# PE reads directory
sample="/home/projects/dscf/jsllames/ddrad_lnk/re-run/02_ln_cleaned120bp/trial" ## set PE reads directory

# Define the path to the tsv2bam executable
stacks="hpc joshuagad/stacks-custom tsv2bam"

# Run stacks (tsv2bam) for each pair of reads in the input directory
${stacks} -P ${wdir} -M ${popmap} -t 16

I am also attaching the population map as requested.

Thank you,
Hannah
popmap3.txt

Catchen, Julian

Jan 9, 2025, 4:02:13 PM
to stacks...@googlegroups.com

Hi Hannah,

 

It doesn’t look to me like you are supplying the path to the paired-end reads to tsv2bam. I see you define the path in the $sample variable, but I don’t see you using the $sample variable in the command that actually runs tsv2bam. See: https://catchenlab.life.illinois.edu/stacks/manual/#denovobyhand

 

Best,

 

Julian
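
A sketch of what the corrected invocation might look like, reusing the paths from the script earlier in the thread and passing the paired-end reads directory through tsv2bam's `-R` (`--pe-reads-dir`) option. The wrapper function `run_tsv2bam` and the default of a plain `tsv2bam` on the PATH are assumptions for illustration; the original script used a site-specific launcher string instead.

```shell
#!/usr/bin/env bash
# Paths copied from the script posted earlier in the thread.
wdir="/home/projects/dscf/jsllames/ddrad_lnk/re-run/04_ustacks"
popmap="/home/projects/dscf/jsllames/ddrad_lnk/re-run/04_ustacks/popmap3.txt"
sample="/home/projects/dscf/jsllames/ddrad_lnk/re-run/02_ln_cleaned120bp/trial"

# Assumption: tsv2bam is on the PATH. Set "stacks" beforehand to use a
# site-specific launcher string, as in the original script.
stacks="${stacks:-tsv2bam}"

# Hypothetical wrapper: same command as before, plus -R pointing at the PE reads.
run_tsv2bam() {
    # ${stacks} is left unquoted on purpose so a multi-word launcher splits into words.
    ${stacks} -P "${wdir}" -M "${popmap}" -R "${sample}" -t 16
}
```

Calling `run_tsv2bam` then launches the step. The only change from the original command is the added `-R "${sample}"`, which is the argument the reply above points out as missing.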

 
