paired-end alignment and rem files for ddRAD analysis in Stacks 2.68


Lucrezia Latini

Oct 17, 2025, 5:28:09 PM
to Stacks

Dear Dr. Catchen and group members,

I am new to bioinformatics and would greatly appreciate your guidance on analyzing my ddRAD dataset using Stacks 2.68.

Here is a summary of the tests I have run so far for the reference genome alignment, after running process_radtags on paired-end data with the inline_null barcode option.

  • First attempt: I concatenated all four files per sample (.1, .2, rem.1, rem.2) into a single FASTQ. The alignment worked well (~99.9% of reads mapped), but all reads were treated as single-end by BWA, without preserving any pairing information.

  • Second attempt: I concatenated .1 with rem.1 and .2 with rem.2, keeping R1 and R2 separate. The resulting BAM files were paired-end and still had over 90% of reads mapped, but only about 60% of the reads were properly paired, which may suggest that the rem files are not synchronized between forward and reverse reads.

  • Third attempt: we used only the .1 and .2 files, without concatenating the rem files. The percentage of correctly paired reads remained similar overall, although it varied from sample to sample, reaching 80% in some cases.

Based on these results, I would like to ask your opinion on a few points:

  • Should I still concatenate all four files into one? It seems to me that the latest protocols skip this step and handle the two files, R1 and R2, separately.

  • Should rem files be included in the creation of loci in Stacks 2 (v2.68), or is it preferable to work only with .1 and .2 files, ensuring correct pairing even if it means discarding rem reads?

  • Finally, should we be concerned about the relatively low percentage of properly paired reads (60–80%), or is this acceptable to proceed with downstream analyses in Stacks?

I'd be very grateful for any advice you can provide. I just want to make sure I'm processing my paired-end data correctly before proceeding with the full dataset.

Thank you so much for your time and for your continued support of the Stacks community.

Best regards,

Lucrezia


Catchen, Julian

Oct 17, 2025, 5:43:37 PM
to stacks...@googlegroups.com
Hi Lucrezia,

Some answers below.

Julian

  • Should I still concatenate all four files into one? It seems to me that the latest protocols skip this step and handle the two files, R1 and R2, separately.

No. As I mentioned in our previous email exchange, the software does not work this way in v2: it tracks pairs of reads and treats each set of paired reads as one RAD locus.
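
Because pairs are tracked, the R1 and R2 files given to the aligner need to list the same reads in the same order. The following is a minimal sketch of a check for that, not part of Stacks. It assumes standard 4-line FASTQ records and read names that differ only by a trailing /1 or /2 or by a comment after whitespace. The function names are hypothetical.

```python
from itertools import zip_longest


def read_names(fastq_lines):
    """Yield the read name of each 4-line FASTQ record, without /1 or /2."""
    records = (line for line in fastq_lines if line.strip())  # ignore blank lines
    for i, line in enumerate(records):
        if i % 4 == 0:  # header line of a record
            name = line.strip().split()[0]  # drop any comment after whitespace
            if name.startswith("@"):
                name = name[1:]
            if name.endswith("/1") or name.endswith("/2"):
                name = name[:-2]
            yield name


def first_unsynchronized(r1_lines, r2_lines):
    """Return the 0-based index of the first record whose names differ
    between the two files, or None if every record matches."""
    pairs = zip_longest(read_names(r1_lines), read_names(r2_lines))
    for index, (name1, name2) in enumerate(pairs):
        if name1 != name2:  # also true when one file is longer than the other
            return index
    return None
```

For gzipped files, each input can be opened with gzip.open(path, "rt") and passed in directly. A result other than None for a concatenated .1 + rem.1 and .2 + rem.2 pair of files would be consistent with the rem files not being synchronized.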

  • Should rem files be included in the creation of loci in Stacks 2 (v2.68), or is it preferable to work only with .1 and .2 files, ensuring correct pairing even if it means discarding rem reads?

No. You can include them, but doing so can create an imbalance in the locus, where SNPs are called on one side of the locus but not the other. The result can be loci with high levels of 'missing data' after genotyping, which are hard to interpret downstream.

  • Finally, should we be concerned about the relatively low percentage of properly paired reads (60–80%), or is this acceptable to proceed with downstream analyses in Stacks?

Potentially. If it were me, I would look at the alignments of your individual reads in the SAM/BAM files. You can use samtools to extract the reads that are not properly paired, then look at the alignments for those reads to see why they are not properly paired. Did one end of the read end up in a repetitive region (e.g., the improperly paired read may have multiple secondary alignments)? Are they being aligned to short contigs so both reads can't align?
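
As a starting point for that inspection, here is a minimal sketch that tallies paired reads by the likely reason they are not properly paired. It reads SAM text lines and uses only the FLAG, RNAME, MAPQ, and RNEXT columns defined in the SAM specification. The category names and the MAPQ cutoff of 20 are assumptions chosen for illustration, and a low MAPQ is only a hint of a repetitive region, not proof.

```python
from collections import Counter

# FLAG bits from the SAM specification
PAIRED, PROPER_PAIR, UNMAPPED, MATE_UNMAPPED = 0x1, 0x2, 0x4, 0x8
SECONDARY, SUPPLEMENTARY = 0x100, 0x800


def classify(sam_line, min_mapq=20):
    """Assign one SAM alignment line to a pairing category."""
    fields = sam_line.rstrip("\n").split("\t")
    flag, rname, mapq, rnext = int(fields[1]), fields[2], int(fields[4]), fields[6]
    if not flag & PAIRED or flag & (SECONDARY | SUPPLEMENTARY):
        return "skipped"  # single-end, secondary, or supplementary record
    if flag & PROPER_PAIR:
        return "proper_pair"
    if flag & UNMAPPED:
        return "read_unmapped"
    if flag & MATE_UNMAPPED:
        return "mate_unmapped"
    if rnext != "=" and rnext != rname:
        return "mate_on_other_contig"  # e.g. a short contig that cannot hold both reads
    if mapq < min_mapq:
        return "low_mapq"  # ambiguous placement, possibly a repeat
    return "same_contig_other"  # e.g. unexpected orientation or insert size


def summarize(sam_lines, min_mapq=20):
    """Count categories over all alignment lines, ignoring @ header lines."""
    return Counter(
        classify(line, min_mapq)
        for line in sam_lines
        if line.strip() and not line.startswith("@")
    )
```

The input can be the text output of samtools view on a BAM file, for example restricted to paired reads that lack the proper-pair flag with -f 1 -F 2. A large "mate_on_other_contig" count would point toward a fragmented reference, while a large "low_mapq" count would point toward repeats.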