All reads coming back as "ambiguous barcodes"


Matthew Penney

Apr 19, 2021, 11:36:23 AM
to Stacks

Hello,

I am running process_radtags 2.53 on a set of sequences obtained from an Illumina NovaSeq 6000 platform at Genome Quebec. They were able to demultiplex the reads already (I'm just doing quality filtering), so I assume that the barcodes were successfully read by the company. The problem is as follows:

I generated my RAD tags using a double-digest approach with paired-end, combinatorially indexed barcodes. However, process_radtags will not read my files as paired (only as single-end). It was able to read both the sequence files (which are in fastq.gz format) and the barcode file I'm using, so I went forward (note: this was a test to get process_radtags to read the files, so I did not include any cleaning or rescue options). All of the reads came back as ambiguous barcodes, which is odd given that the samples were already demultiplexed from pooled libraries and the barcode file was formatted accordingly.

Here are some examples of the file names for my read files:
NS.1518.002.Msp1_4---Pst1_7.121419-31_R1.fastq.gz
NS.1518.002.Msp1_4---Pst1_7.121419-31_R2.fastq.gz
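
The two names above differ only in the _R1/_R2 token. Before troubleshooting paired-end mode, it can help to confirm that every R1 file in the input directory has an R2 mate. The following is a minimal Python sketch, assuming all files follow the `<prefix>_R1.fastq.gz` / `<prefix>_R2.fastq.gz` pattern seen in the examples; the function names are hypothetical and this is not part of Stacks.

```python
import re
from collections import defaultdict

# Assumed naming scheme, inferred from the two example file names:
#   <prefix>_R1.fastq.gz and <prefix>_R2.fastq.gz
READ_SUFFIX = re.compile(r"^(?P<prefix>.+)_R(?P<read>[12])\.fastq\.gz$")


def group_mates(filenames):
    """Map each sample prefix to the set of read numbers ('1', '2') found."""
    mates = defaultdict(set)
    for name in filenames:
        match = READ_SUFFIX.match(name)
        if match:  # names that do not fit the pattern are ignored
            mates[match.group("prefix")].add(match.group("read"))
    return dict(mates)


def unpaired(filenames):
    """Return the sorted prefixes that lack either an R1 or an R2 file."""
    groups = group_mates(filenames)
    return sorted(p for p, reads in groups.items() if reads != {"1", "2"})


names = [
    "NS.1518.002.Msp1_4---Pst1_7.121419-31_R1.fastq.gz",
    "NS.1518.002.Msp1_4---Pst1_7.121419-31_R2.fastq.gz",
]
print(unpaired(names))  # prints [] because both mates are present
```

To check a whole directory, pass `os.listdir("./Da_Data/")` to `unpaired`. This only checks the file names and says nothing about which naming patterns process_radtags itself accepts.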

When I tried again with the -c, -q, and -r options added, I got a segmentation fault. My current command is as follows:

process_radtags -p ./Da_Data/ -b ./Barcodes/Sample_Barcode_Match_Tab.csv -o ./Prophecy/ -c -q -r --index_index --renz_1 pstI --renz_2 mspI

Does anyone know what the issue might be? Thank you!


-Matt Penney
Acadia University

Matthew Penney

Apr 19, 2021, 1:59:16 PM
to Stacks
A small amendment to add:

So, the segmentation fault seems to be tied to the -r (rescue) option, since there was no trouble reading the files once that option was removed. Does anyone know more about this?

To further elaborate on this problem, I removed the -r and adapter sequences (for now) and tried the following code:

process_radtags -p ./Da_Data/ -b ./Barcodes/Sample_Barcode_Match_Tab.csv -o ./Prophecy/ -q --index_index --renz_1 pstI --renz_2 mspI

Here is the readout:

Processing single-end data.
Using Phred+33 encoding for quality scores.
Found 188 input file(s).
Searching for single and paired-end, indexed barcodes.
Loaded 93 barcodes (5bp / 5bp).
Processing file 1 of 188 [NS.1518.002.Msp1_2---Pst1_6.121419-14_R2.fastq.gz]
  Processing RAD-Tags...1M...2M...3M...4M...5M...6M...7M...8M...9M...10M...11M...12M...13M...14M...15M...16M...17M...18M...
  18468105 total reads; -18468105 ambiguous barcodes; -0 ambiguous RAD-Tags; +0 recovered; -0 low quality reads; 0 retained reads.
Processing file 2 of 188 [NS.1518.002.Msp1_7---Pst1_8.121419-56_R2.fastq.gz]
>Attempting to read first input record, unable to allocate Seq object (Was the correct input type specified?).<

Does anyone know what this error means?

If additional information is needed I am happy to provide (or find it and then provide). Thank you!

kwojt...@gmail.com

Apr 19, 2021, 2:24:20 PM
to Stacks
Hi Matt,

I am not sure I can help with your ambiguous barcode issue, but this latest error may be caused by omitting the -i flag, which tells Stacks what type of file is being input (e.g. fastq, gzfastq, etc.). I know the manual says it will guess, but I have never run Stacks without that flag.

Also, the reason it says it's running single-end is that you are using only the lowercase -p flag; adding the uppercase -P flag tells it that the directory contains paired files. (Disregard if you know this already or if it's not helpful!)

A screenshot of what your reads look like might help people figure out what is going on with your ambiguous barcode situation. Hopefully this helps a little.
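
As a text alternative to a screenshot, the first few records of a gzipped FASTQ can be printed directly. The sketch below is a hedged Python example, not a Stacks utility. It assumes the headers follow the Illumina CASAVA 1.8+ layout, where the index sequences sit after the last colon of the header comment and dual indexes are joined by "+". My understanding of the manual is that --index_index looks for the barcodes in that header field, so this is the part worth comparing against the barcode file. The header string and function names are made up for illustration.

```python
import gzip
from itertools import islice


def index_from_header(header):
    """Return the index field of a FASTQ header as a tuple of barcodes.

    Assumed layout: "@<read id> <read>:<filtered>:<control>:<index1>+<index2>"
    """
    parts = header.strip().split(" ", 1)
    if len(parts) < 2:  # no comment field, so no index information
        return ()
    index_field = parts[1].rsplit(":", 1)[-1]
    return tuple(index_field.split("+")) if index_field else ()


def peek(path, n_reads=4):
    """Yield (indices, sequence) for the first n_reads records of a gzipped FASTQ."""
    with gzip.open(path, "rt") as handle:
        for _ in range(n_reads):
            record = list(islice(handle, 4))  # one FASTQ record is four lines
            if len(record) < 4:
                break
            yield index_from_header(record[0]), record[1].strip()


# Hypothetical header, for illustration only
print(index_from_header("@M:1:FC:1:1101:1000:1000 2:N:0:ACGTA+GTGCA"))
# prints ('ACGTA', 'GTGCA')
```

Calling `list(peek(path))` on one of the R1 or R2 files shows which index sequences, if any, are present in the headers and what the first bases of each read look like.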

-Kris


Matthew Penney

Apr 19, 2021, 4:03:52 PM
to Stacks
Hi Kris,

I didn't think to add the -i flag to it, since my files are all fastq.gz anyway. I'll give that a try.

When I included the -P flag it wouldn't read any of the files. That's what I was trying to figure out.

As for the barcodes, I think I figured out part of the problem. The reverse barcodes in the reads are the reverse complements of the ones in the barcode file (i.e. the barcode TGCAC appears as GTGCA in the FASTQ). So that's something I'll have to fix in my barcode file. Okay, easy enough.
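
The TGCAC/GTGCA pair mentioned above is a reverse complement: complementing TGCAC gives ACGTG, and reversing that gives GTGCA. A minimal Python sketch of the conversion follows. The helper that rewrites one column assumes a tab-separated barcode file; the sample row (`ACGTA`, `sample_01`) and the function names are hypothetical.

```python
# Translation table for DNA bases (N is kept as N)
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def flip_barcode_column(lines, column):
    """Reverse-complement one tab-separated column in each barcode-file line."""
    flipped = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        fields[column] = reverse_complement(fields[column])
        flipped.append("\t".join(fields))
    return flipped


print(reverse_complement("TGCAC"))  # prints GTGCA

# Hypothetical row: forward barcode, reverse barcode, sample name
print(flip_barcode_column(["ACGTA\tTGCAC\tsample_01"], column=1))
# prints ['ACGTA\tGTGCA\tsample_01']
```

Which column needs flipping, if any, depends on how the barcodes actually appear in the reads, so it is worth checking a few records by eye before rewriting the file.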

Here is the head of one of my FASTQ files. This is one of the samples that was on the lower end of quality, though, so I might pick another one later. Does the format at least look right?

Thanks!

Sea Cuke Ilumina Reads.JPG

Matthew Penney

Apr 22, 2021, 9:50:00 AM
to Stacks
Hello again,

So, I tried adding in the -i flag and changing the Barcode file to match the FASTQ file I have (attached, reverse barcodes first as that is the orientation in the FASTQ file, forward barcodes are reverse complements).

When I retried running with the -i flag and new barcode file as single-end data, I got the same issue (all ambiguous). And it still will not read my data as paired-end at all (log attached). The photo also includes the directory with my data files.

Does anyone know what the issue might be, or how I might go about troubleshooting this further? Any help is appreciated. Thank you!

STACKS attempt 2 log.JPG

Sample_Barcode_Match_Tab2.txt

Matthew Penney

Apr 22, 2021, 10:03:01 AM
to stacks...@googlegroups.com
...well then..JPG
Oh... this might be the problem, then. Mine are demultiplexed already. XD

Well, I got some good tips anyway. Thanks!

