Ambiguous RAD_Tags Problem(?)

Matthew Penney

Apr 22, 2021, 10:44:42 AM

to Stacks

Hello again,

So I have STACKS now reading my already-demultiplexed data (I'm not sure if PCR duplicates were removed already, I have asked Genome Quebec about this) with -c -q -r included. I think my adapters are removed, as well. So I have some questions.

For each sample, I get a few thousand retained reads with the Rescue function. That's it. The rest are lost entirely to Ambiguous RAD_Tags, and none are discarded due to low quality. And it seems like a tiny fraction of the data that each one has (see log). Would this be normal for samples that are already demultiplexed? Should I ask if they kept in the cut site for the RAD-Tags?

I also spoke with the scientist at Genome Quebec that told me there was lower quality sequence in areas with low diversity (e.g. cut sites). Is this the culprit, potentially? And if so, is there a way to correct this?

I have reposted a FASTQ header here and the log I'm currently getting. I'm just wondering if there's something about my code or the data that is throwing away reads for no reason. That would be bad (this data is expensive...). Thank you!

Cheers,

Matthew Penney

Acadia University

Attachment: Umm... is this normal..JPG

kwojt...@gmail.com

Apr 22, 2021, 11:15:12 AM

to Stacks

Hi Matthew,

It looks like your cut sites have been trimmed off already which is pretty common, so Stacks can't find your cut sites and is dropping reads because of that (at least that's what I think is happening).

You can disable the RAD cut site check with --disable_rad_check, and if your reads are retained then that is the likely culprit.
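A quick way to check this before re-running is to count how many reads actually begin with the expected cut-site remnant. The following is only a minimal sketch: the function name, the file name in the usage note, and the remnant string are assumptions for illustration, and the remnant has to be replaced with whatever your own enzyme leaves at the start of the read.

```python
import gzip
from itertools import islice


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path, "rt")


def cut_site_fraction(path, remnant, max_reads=100_000):
    """Fraction of the first max_reads reads whose sequence starts with remnant."""
    total = hits = 0
    remnant = remnant.upper()
    with open_fastq(path) as handle:
        # In a 4-line FASTQ record the sequence is the second line,
        # so take every 4th line starting at index 1.
        for seq in islice(handle, 1, max_reads * 4, 4):
            total += 1
            if seq.strip().upper().startswith(remnant):
                hits += 1
    return hits / total if total else 0.0
```

For example, `cut_site_fraction("sample_R1.fastq.gz", "TGCAGG")` (hypothetical file name and remnant) returning a value near zero would be consistent with the cut sites having been trimmed off, while a value near one would point to some other cause.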

Also, have you run FastQC? If you haven't, I would recommend doing that first. That will give you all the quality metrics you need (i.e. read quality, adapter contamination, etc.), which will help you decide which flags and filters you need to apply.

Hopefully this was more helpful than my last message!

-Kris

Matthew Penney

Apr 22, 2021, 11:46:59 AM

to stacks...@googlegroups.com

Hi Kris,

Thanks! I was wondering if that might be the issue. Now, GQ said that adapters are not removed nor are duplicates removed, so that's a bit odd. Though I might follow up and ask if that's in cases where data is not demultiplexed on site. Especially since I noted the reads don't seem to start with the same sequence. The FASTQ I showed was from an F2, though, just in case that makes a difference.

I am trying the run now with the --disable_rad_check flag. It certainly takes longer...

I haven't run the fastqc on it, no. I'll make a note to do that.

I'm also checking back with the contact I mentioned at GQ to see if the low quality issue for low diversity sites that he mentioned was more general or largely restricted to cut sites. The latter, from what I've read, might be fixable.
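One way to see whether the low quality is confined to the first few bases or spread along the read is to average the Phred scores at each position. This is a rough sketch, not a replacement for FastQC: the function name is made up for illustration, and it assumes standard 4-line FASTQ records with Phred+33 quality encoding.

```python
import gzip
from itertools import islice


def mean_quality_by_position(path, max_reads=100_000, offset=33):
    """Mean Phred score at each read position over the first max_reads reads."""
    sums, counts = [], []
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        # The quality string is the 4th line of each FASTQ record (index 3).
        for qual in islice(handle, 3, max_reads * 4, 4):
            for i, ch in enumerate(qual.rstrip("\r\n")):
                if i == len(sums):
                    # First time a read reaches this position: grow the lists.
                    sums.append(0)
                    counts.append(0)
                sums[i] += ord(ch) - offset  # assumes Phred+33 encoding
                counts[i] += 1
    return [s / c for s, c in zip(sums, counts)]
```

If the first handful of values in the returned list are much lower than the rest, the problem is probably limited to the start of the read, which is the case that trimming can address.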

Fingers crossed this all works out and my thesis doesn't end up being "Here Is How I Failed At RADseq." I don't even want to imagine defending that...

Thank you!


--
Stacks website: http://catchenlab.life.illinois.edu/stacks/
---
You received this message because you are subscribed to the Google Groups "Stacks" group.
To unsubscribe from this group and stop receiving emails from it, send an email to stacks-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/stacks-users/c3f86b57-c43b-479b-b350-857e4e7e3460n%40googlegroups.com.

Matthew Penney

Apr 22, 2021, 12:10:12 PM

to Stacks

Update:

So, the cut site is most likely the issue. When I re-ran my data with --disable_rad_check and all three quality filters (-c -q -r) in place, the number of reads shot up from about 5,000 to 34,154,382, with 4,962 reads lost to low quality. MASSIVE improvement. Okay, good. I don't need to drink a whole bottle of vodka now.

The journey continues. Thanks!

Roseanna Gamlen-Greene

Apr 22, 2021, 7:02:03 PM

to stacks...@googlegroups.com

Hi Matthew,

I recently encountered the exact same issue (poor quality cut sites on the reverse reads) with paired-end double digest GBS NovaSeq data from GenomeQC. My solution was to trim the cut site off all my reverse reads before demultiplexing each plate. I used Trimmomatic on Compute Canada servers. GenomeQC had already trimmed the adapters off the reads before they gave the data to me and had demultiplexed the lanes but not the plates. That is, I was given two files per plate (one file with the reverse reads and the other with the forward reads for that plate, with each plate containing 96 samples) that I needed to demultiplex into 192 files for my individual samples (96 reverse read files, 96 forward read files).

My FastQC showed that only the reverse reads had poor quality cut sites and so I only trimmed reverse reads and not forward reads. That meant I only specified one restriction enzyme in process_radtags. After I ran Trimmomatic, I then ran process_radtags successfully to demultiplex each plate into samples by calling the unaltered forward reads file and the altered trimmed reverse read file.
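To make the trimming step concrete, here is a plain-Python stand-in for cropping a fixed number of bases off the start of every read in a file. It is not Trimmomatic and not necessarily the exact step used above (that isn't stated); the function name and the crop length in the usage note are assumptions for illustration only.

```python
import gzip


def _open(path, mode):
    """Open a plain or gzip-compressed file in the given text mode."""
    return gzip.open(path, mode) if str(path).endswith(".gz") else open(path, mode)


def head_crop_fastq(in_path, out_path, n_bases):
    """Copy in_path to out_path with the first n_bases removed from every read.

    Every record is kept, so the output stays in step with its mate file.
    Returns the number of records written.
    """
    written = 0
    with _open(in_path, "rt") as src, _open(out_path, "wt") as dst:
        while True:
            header = src.readline()
            if not header:
                break  # end of file
            seq = src.readline().rstrip("\r\n")
            src.readline()  # the '+' separator line
            qual = src.readline().rstrip("\r\n")
            if len(seq) <= n_bases:
                raise ValueError(f"read too short to crop: {header.strip()}")
            # Crop sequence and quality together so they stay the same length.
            dst.write(f"{header.rstrip()}\n{seq[n_bases:]}\n+\n{qual[n_bases:]}\n")
            written += 1
    return written
```

Something like `head_crop_fastq("plate1_R2.fastq.gz", "plate1_R2.cropped.fastq.gz", 4)` (hypothetical file names and length) would be applied to the reverse-read file only, leaving the forward reads untouched, as described above.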

These were my process_radtags flags: --inline_null --renz_1 sbfI --quality --rescue --barcode_dist_1, s = 20, w = 0.25

Best,

Roseanna

Roseanna Gamlen-Greene (she/her)

PhD Candidate | Vanier Canada Scholar | Killam Laureate | National Geographic Explorer | UBC Public Scholar

Forest and Conservation Sciences

University of British Columbia, Vancouver


Matthew Penney

May 9, 2021, 10:54:46 AM

to Stacks

Hi Roseanna,

Thanks for this! Looking over my reverse reads I don't see any cut sites at all, or much consistent sequence from one read to another. I've been trying to see if someone at GQ can clarify if the adapter sequence is still in the reverse reads (doesn't appear to be the case, but just want to be sure) but haven't had much luck. There IS some adapter remaining in my forward reads (12bp, including my 4bp degenerate region for identifying clones), which is why I was suspicious.

I'm running my stuff through FastQC now (I have 93 samples, so it's going to take a while) but I do have some QC data on Nanuq, as well. One thing I know will be an issue for my data is PCR clones, and I'm planning to use filter_clones for that. I also chose to do 150bp sequence fragments, so I'm expecting to have to do some trimming, as well. Previously, the majority of reads for each sample passed basic process_radtags quality filtering (-c -q -r), so I'm hopeful there is usable data here.

Thanks very much for the help! ^_^

-Matt Penney

Acadia University

Matthew Penney

May 17, 2021, 8:45:50 AM

to Stacks

So I've looked through the FastQC results. Things are mostly good. I do have an issue with short reads and adapter read-through, and the initial sequence on my R2 files is not good, but those seem to be fixable. I also have a lot of clones (as expected), which I'm working through now.

One issue I continue to have is that STACKS will not read my data as paired-end, and I don't know why. I have the reads as separate R1 and R2 files, so I know they're not interleaved. Should I just process both R1 and R2 as single-end reads for clean-up?
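Before deciding, it may be worth ruling out that the two files have drifted out of step, since any paired-end tool needs the nth record of R1 and the nth record of R2 to be mates. The sketch below only checks that; it does not explain why STACKS is not treating the files as paired. The function names are made up for illustration, and it assumes 4-line FASTQ records whose mate IDs match once any text after the first space, or a trailing /1 or /2, is dropped.

```python
import gzip
from itertools import islice, zip_longest


def read_ids(path, max_reads=None):
    """Yield the read ID of each record, without mate-specific suffixes."""
    opener = gzip.open if str(path).endswith(".gz") else open
    stop = None if max_reads is None else max_reads * 4
    with opener(path, "rt") as handle:
        # The header is the first line of each 4-line FASTQ record.
        for header in islice(handle, 0, stop, 4):
            parts = header[1:].split()  # drop '@' and any comment field
            if not parts:
                continue  # skip a stray blank line
            name = parts[0]
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name


def first_mismatch(r1_path, r2_path, max_reads=None):
    """Return the 0-based index of the first record whose IDs differ, or None."""
    pairs = zip_longest(read_ids(r1_path, max_reads), read_ids(r2_path, max_reads))
    for i, (a, b) in enumerate(pairs):
        if a != b:  # also triggers when one file runs out before the other
            return i
    return None
```

If `first_mismatch` returns `None` for a sample's R1 and R2 files, the records are at least in matching order; a number points to the first record where the two files disagree.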