ddRAD combination barcodes did not yield anything in process_radtags

Uyen Nguyen

Mar 24, 2024, 10:14:43 PM

to Stacks

Good day everyone,

I'm an M.S. student studying the phylogeny of Psychotria mariniana, a coffee relative that doesn't have a reference genome. I'm analyzing my first FASTQ files from our in-house iSeq 100 and am having problems with demultiplexing.

Our samples were double-digested with EcoRI and MspI, ligated to barcodes 1-24 of the Peterson et al. set, and then PCR-amplified with 4 PCR2 indices, among the other ddRAD steps. Because it was my first time sequencing, I didn't know the machine could demultiplex the run for me, so I did not enter the indices. Now I'm stuck with undemultiplexed files that contain both primers and barcodes.

I read here that I can demultiplex my file with process_radtags using a barcode file that combines barcode (5 bp) TAB primer (6 bp) TAB sample name, e.g. AACCA TTAGGC HL116. After an error saying there were too many columns in my barcodesprimers file, I added --inline_index, as suggested here. process_radtags seemed to find all of my barcodes but no combination of barcode and primer.

I've attached barcodesprimers file and the full process_radtags.raw for reference but here's a summary:

process_radtags v2.2, executed 2024-03-24 16:55:42

process_radtags -P -p ./raw -b barcodesprimers.txt --inline_index -o ./samples -c -q -r --renz_1 ecoRI --renz_2 mspI -i gzfastq
File                        Retained Reads  Low Quality  Barcode Not Found  RAD cutsite Not Found  Total
s1_S1_L001_R1_001.fastq.gz  0               0            2876800            0                      2876800

Total Sequences 2876800
Barcode Not Found 2876800
Low Quality 0
RAD Cutsite Not Found 0
Retained Reads 0

[summarized]
Barcode        Filename  Total  NoRadTag  LowQuality  Retained
AACCA-TTAGGC   HL116     0      0         0           0
CGATC-ATCACG   JP1414    0      0         0           0
GCATG-TAGCTT   KFR11     0      0         0           0
AAGGA-GGCTAC   WT25      0      0         0           0

Sequences not recorded
Barcode Total
ACGGT-1 416662
AATTA-1 151902
CGTCG-1 96862
TCGAT-1 90238
GGTTG-1 88808
GAAAA-1 69516
GCATG-1 65748
GCGGT-1 59334
CGTAC-1 53336
GATCG-1 52376
GGAAA-1 51712
ACTTC-1 43526
ACTGG-1 38802
CAACC-1 27840
AAGGA-1 23650
GGACG-1 20592

etc.
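Reading the labels in this list the same way as in the sample table above (inline barcode, hyphen, index), a quick check hints at what went wrong: some expected inline barcodes such as GCATG and AAGGA are present, but in every entry shown the index part is "1" rather than one of the 6 bp indices. The Python sketch below assumes that interpretation of the labels and uses only the four barcode pairs from the summary table, not the full attached file.

```python
# Expected (inline barcode, index) pairs: the four rows of the summary table above.
EXPECTED = {
    ("AACCA", "TTAGGC"): "HL116",
    ("CGATC", "ATCACG"): "JP1414",
    ("GCATG", "TAGCTT"): "KFR11",
    ("AAGGA", "GGCTAC"): "WT25",
}

def classify(combo, expected=EXPECTED):
    """Split a label such as 'GCATG-1' into its inline and index parts and
    report whether each part matches a barcode from the expected pairs."""
    inline, index = combo.split("-", 1)
    known_inline = {i for i, _ in expected}
    known_index = {x for _, x in expected}
    return inline in known_inline, index in known_index

# A few entries copied from the 'Sequences not recorded' list above.
for combo, reads in [("ACGGT-1", 416662), ("GCATG-1", 65748), ("AAGGA-1", 23650)]:
    inline_ok, index_ok = classify(combo)
    print(f"{combo}: inline known={inline_ok}, index known={index_ok}, reads={reads}")
```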

My raw FASTQ files are linked above, but here are the first few reads.

FYI, I also tried the barcode-separation function in Geneious Prime, but it apparently found the primers in only a handful (2%) of my library.

Any help or suggestions would be highly appreciated.

Thank you in advance,

Uyen

barcodesprimers.txt

process_radtags.raw.log

Nguyen Vu

Mar 26, 2024, 9:20:20 AM

to stacks...@googlegroups.com

Hi Uyen,

My guess is that the problem is your barcode file format (it should have two columns: barcode and sample ID). Also check the options in your command, e.g. --renz-1 rather than --renz_1. See if fixing these resolves your issue.
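If each sample really does have its own inline barcode (true for the four rows in the summary table above, though not necessarily for the full attached barcodesprimers.txt), the two-column file suggested here could be generated with a small Python sketch like this. The function name is hypothetical. If I read the Stacks manual correctly, such a file would be used without --inline_index, so that process_radtags falls back to its default of a single inline barcode on the first read.

```python
def barcode_file_text(pairs):
    """Build a two-column (inline barcode TAB sample name) barcode file.

    Raises ValueError if an inline barcode appears twice, because an
    inline-only file could not tell those samples apart.
    """
    seen = set()
    lines = []
    for barcode, sample in pairs:
        if barcode in seen:
            raise ValueError(f"inline barcode {barcode} is used by more than one sample")
        seen.add(barcode)
        lines.append(f"{barcode}\t{sample}")
    return "\n".join(lines) + "\n"

# The four inline barcode / sample pairs shown in the summary table above.
PAIRS = [("AACCA", "HL116"), ("CGATC", "JP1414"), ("GCATG", "KFR11"), ("AAGGA", "WT25")]

# Print the file contents; redirect stdout to a file to pass it to -b.
print(barcode_file_text(PAIRS), end="")
```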

Hope it helps

Vu

--
Stacks website: http://catchenlab.life.illinois.edu/stacks/

Catchen, Julian

Mar 26, 2024, 4:35:44 PM

to stacks...@googlegroups.com

Hi Uyen,

It is hard to follow the different labels for sequences in your message. For example, we would talk about paired-end barcodes, which may be either inline (part of the sequenced read) or index (encoded in the FASTQ header by Illumina’s software). The inline barcode is typically part of the molecular P1 and/or P2 adaptor, while the index would be in the i5 or i7 region, upstream of this adaptor in the molecular protocol. We would not combine the labels ‘barcode’ and ‘primer’: both occur in the molecular protocol, but we would not use the ‘primer’ to demultiplex.
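To make the inline/index distinction concrete, here is a minimal Python sketch that pulls both kinds of barcode out of one FASTQ record. It assumes Illumina's usual header layout, in which the index is the last ':'-separated field of the text after the first space; when no index was configured, that field may hold a sample number instead, which would be consistent with the "-1" suffixes in the "Sequences not recorded" list above. The function name, header and read are made up for illustration.

```python
def split_record(header, seq, inline_len=5):
    """Return (inline_barcode, index_field) for one FASTQ record.

    The inline barcode is read from the start of the sequence itself; the
    index is taken from the last ':'-separated field of the header comment
    (the text after the first space), where Illumina software records it.
    """
    inline = seq[:inline_len]
    parts = header.split(maxsplit=1)
    index = parts[1].rsplit(":", 1)[-1] if len(parts) == 2 else ""
    return inline, index

# Hypothetical header and read, for illustration only.
hdr = "@INSTR:1:FLOWCELL:1:1101:1000:1000 1:N:0:1"
print(split_record(hdr, "AACCAAATTCGGATC"))  # -> ('AACCA', '1')
```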

If you did not specify the indexes to the Illumina software, it is possible that the sequencing machine not only did not demultiplex the files but also did not sequence the i5 and/or i7 region, so these barcodes are unknown. (I don’t know enough about setting up the iSeq 100 to say what may have happened.)

Anyway, the simplest thing might be for you to tell us where in the FASTQ file or in the sequence you expect the barcodes to be located.

Julian

Uyen Nguyen

Mar 27, 2024, 9:18:04 AM

to Stacks

@Anh Vu: Thanks for your response. I tried running "--renz-1" and got the error "process_radtags: unrecognized option '--renz-1'".

Uyen Nguyen

Mar 27, 2024, 9:18:05 AM

to Stacks

Hello Julian, thanks so much for your response!

"The simplest thing might be for you to tell us where in the FASTQ file or in the sequence you expect the barcodes to be located." I am not entirely sure. Based on the diagram in Peterson et al. (2012) (the document at the bottom), I expect to see:

1. The PCR1 primer at the start of each sequence, which is the same for all samples. It should be AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACG, but I can't find this exact sequence (or the pink segment in the diagram, or even 6 bp of it) anywhere.

2. The barcode in the middle. However, when I searched for a barcode, it appeared all over the place.

3. The multiplexing index (the PCR2 primer) near the end. However, when I searched for this 6 bp code (e.g. TTAGGC), it also appeared anywhere.

4. Many of my sequences end like this: TGAAAAAAGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG.
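Points 2 and 3 are roughly what chance alone would produce. With random, uniformly distributed bases, a fixed k-mer is expected to occur (L - k + 1)/4^k times on one strand of a read of length L, so short codes turn up everywhere in a few million reads. The Python sketch below assumes 150 bp reads (a hypothetical value, not taken from the run) and uses the 2,876,800 total reads from the log; real genomic sequence is not uniform, so treat the numbers as rough.

```python
def expected_chance_hits(read_len, k, n_reads):
    """Expected chance occurrences of one fixed k-mer on one strand,
    summed over n_reads, assuming independent, uniformly random bases."""
    positions = read_len - k + 1  # start positions for a k-mer in one read
    return n_reads * positions / 4 ** k

N_READS = 2_876_800  # total sequences reported by process_radtags above
READ_LEN = 150       # hypothetical read length

for k in (5, 6):     # 5 bp inline barcode, 6 bp index
    print(k, round(expected_chance_hits(READ_LEN, k, N_READS)))
```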

Please let me know what you think.

Sincerely,

Catchen, Julian

Apr 4, 2024, 3:16:53 PM

to stacks...@googlegroups.com

Hi Uyen,

The primers are not sequenced on the Illumina machine; they set up the molecule to be sequenced by binding it to the flow cell. If you look at the Peterson diagram, the only sequence the machine will produce is where you see the “NNNN”, which represents your inserted DNA, along with the upstream restriction enzyme cutsite and the inline barcode (in blue in the diagram).
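A practical consequence is that read 1 should begin with the inline barcode immediately followed by the EcoRI cut-site remnant (AATTC, since EcoRI cuts G^AATTC). The Python sketch below tallies how many reads match that pattern. The barcode list is only the four inline barcodes from the summary table, the 100,000-read sample is arbitrary, and the helper names are hypothetical.

```python
import gzip
from collections import Counter

ECORI_REMNANT = "AATTC"  # EcoRI cuts G^AATTC, so the insert should start with AATTC

def read_structure(seq, barcodes, remnant=ECORI_REMNANT):
    """Classify a read 1 sequence by whether it begins with a known inline
    barcode followed directly by the EcoRI cut-site remnant."""
    for bc in barcodes:
        if seq.startswith(bc):
            if seq[len(bc):].startswith(remnant):
                return "barcode+cutsite"
            return "barcode only"
    return "no barcode"

def fastq_seqs(path, limit=None):
    """Yield the sequences of up to `limit` records from a gzipped FASTQ file."""
    with gzip.open(path, "rt") as fh:
        for i, line in enumerate(fh):
            if limit is not None and i // 4 >= limit:
                break
            if i % 4 == 1:  # line 2 of every 4-line record is the sequence
                yield line.strip()

BARCODES = ["AACCA", "CGATC", "GCATG", "AAGGA"]  # inline barcodes from the table above

# Usage on the real data (path taken from the process_radtags command and log):
# tally = Counter(read_structure(s, BARCODES)
#                 for s in fastq_seqs("raw/s1_S1_L001_R1_001.fastq.gz", limit=100_000))
# print(tally)
```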

The GGGGGG… you report in your output sequences represents “no data”: one of the nucleotides on the Illumina flow cell is detected by a lack of fluorescence (‘G’), so when you sequence short DNA fragments the machine reports GGGGGGG… for all the sequencing rounds in which there was no molecule left on the flow cell to sequence. This suggests that your size selection was incorrect and/or you have a lot of sequencing primer-dimers (nothing but sequencing primers stuck together with none of your sequence, the ‘NNNN’ in the diagram, inserted between them).
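One way to gauge how much of the run is affected is to count reads that end in a long run of G. In this Python sketch the minimum run length of 20 is an arbitrary, hypothetical cutoff, and the example reads are made up apart from the poly-G pattern reported above.

```python
import re

def poly_g_fraction(seqs, min_run=20):
    """Fraction of reads whose 3' end is a run of at least `min_run` G bases,
    the 'no signal' pattern described above. min_run=20 is an arbitrary cutoff."""
    tail = re.compile(rf"G{{{min_run},}}$")  # e.g. 'G{20,}$'
    seqs = list(seqs)
    if not seqs:
        return 0.0
    return sum(1 for s in seqs if tail.search(s)) / len(seqs)

example = ["TGAAAAAA" + "G" * 66, "AACCAAATTCTGCATTGCA", "ACGTGGGGG"]
print(poly_g_fraction(example))
```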

You probably need to revisit the molecular protocol and try to verify if you have useable DNA that is making it through the whole protocol and evaluate if you did the size selection correctly.

Best,

Julian
