Adapterama III barcode file

Maciej Konopiński

Sep 26, 2023, 11:43:40 AM
to Stacks
Dear All,
I'm using the Adapterama III protocol (3RAD), following the 3rd design (PstI & DdeI, with NsiI as the dimer-cutting enzyme). The data were demultiplexed by the sequencing company using the Illumina indexes. I'm using inline barcodes to separate individuals.

To my surprise, process_radtags recognizes cut sites only when the barcodes include the first letter of the cut site (the one before the overhang). Does `--renz-1` recognize the overhang only? This is not discussed in the manual, so I am not sure I am doing it right. It is not very intuitive, so I prefer to ask.

Another thing: even with the "non-intuitive" solution (part of the cut site included in the barcode), about 50% of the i5 reads (forward, beginning with PstI) are dropped due to a missing cut site. When I examined some random reads, I found no issues with the sequences. Could this be a result of using the second cutting enzyme in the protocol? Should I be concerned about these issues?

Thanks in advance,
Maciek

-----------------------
Below you'll find some examples:
PstI cut site CTGCA (overhang italicised)

Working barcodes (the final C and G are the first letters of the cut sites of the two enzymes used, PstI & DdeI):
CCGAATC CACATGTCG BCK312
TTAGGCAC TGTGCACGAG BCK313
AACTCGTCC GCATCAG BCK314
GGTCTACGTC ATGCTGTG BCK315

Not working barcodes (taken from the publication):
CCGAAT CACATGTC BCU05
TTAGGCA TGTGCACGA BCU06
AACTCGTC GCATCA BPU13
GGTCTACGT ATGCTGT BPU14


Sample of input file (PstI/NsiI cut sites underlined, bold - barcodes):

@A00627:690:HKM2VDSX7:2:2150:9778:1141 1:N:0:ATTCAGAA+AGGCTATA
GATACCCTGCAGGTTTGCACTACACAACATCAGGAAGATCAGGCCCTTTCTGACAGAGCAT
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FF
@A00627:690:HKM2VDSX7:2:2150:10357:1141 1:N:0:ATTCAGAA+AGGCTATA
TTAGGCACTGCATTAAAGTAAATCGGGTCAGTTTTGATATCATGTTGATCTTAATCGTAACC
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFF:FFF
@A00627:690:HKM2VDSX7:2:2150:22706:1141 1:N:0:ATTCAGAA+AGGCTATA
CCGAATCTGCAGCTAATGTGCTTCCTCGAGGCGCCTTGTGTTTGTGTGTCTGTAAGTGTGT
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00627:690:HKM2VDSX7:2:2150:28203:1141 1:N:0:ATTCAGAA+AGGCTATA
TTAGGCACTGCAGTTTTTCTTTGGGTGCGATGAATGAAACTGAAATACTAGAAGCAGTAGA
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00627:690:HKM2VDSX7:2:2150:28492:1141 1:N:0:ATTCAGAA+AGGCTATA
GATACCCTGCAGGACGCCGCTTCATGGCCTGACCGCTACCCACTACTGTTCAACGGGCTGG
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00627:690:HKM2VDSX7:2:2150:30337:1141 1:N:0:ATTCAGAA+AGGCTATA
CTGCAACTCTGCATAAATCAAAGCAGACAGAAGGAAGCAGCTATGTAGCACAATTACAGTC
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00627:690:HKM2VDSX7:2:2150:31891:1141 1:N:0:ATTCAGAA+AGGCTATA
CCGAATCTGCATTTCCATCTCCCATTATTCGCATTAACCCTTTTTTTTTAGGAATTAAGGCAT
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFF:F:F
@A00627:690:HKM2VDSX7:2:2150:32000:1141 1:N:0:ATTCAAAA+AGGCTATA
GATACCCTGCATCTTACCAAATTGCTAATCTTACCTTTTGATTCAAAACAGAACAAGAATGT
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00627:690:HKM2VDSX7:2:2150:32452:1141 1:N:0:ATTCAGAA+AGGCTATA
CCGAATCTGCAGTAAACACAGAGCGCAATATCACCACGATGTCGTCATCGTCGCTACCTAA
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:
@A00627:690:HKM2VDSX7:2:2150:13386:1157 1:N:0:ATTCAGAA+AGGCTATA
CCGAATCTGCATAAATACAACTGATGTCACTATTTATACAAATTCATAACAAGGAAGTGCAC
+
FFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00627:690:HKM2VDSX7:2:2150:16875:1157 1:N:0:ATTCAGAA+AGGCTATA
TGTCTACGTCTGCATGTTACAGGGCACAAAACCGTTCAATGATAATGAACAAGAAACCAAT
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00627:690:HKM2VDSX7:2:2150:26259:1157 1:N:0:ATTCAGAA+AGGCTATA
AGCGTTGCTGCATGAATATATAGACAGGTCTGGGGAGTAACTAGTAACATGTAACGGAATTA
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFF
@A00627:690:HKM2VDSX7:2:2150:27326:1157 1:N:0:ATTCAGAA+AGGCTATA
CCGAATCTGCATTCTGATTATTTGCGGGTGGGTCGTGGATAAAATAAATCACAAATGATGCA
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFF:FFFF,FFFF
@A00627:690:HKM2VDSX7:2:2150:27344:1157 1:N:0:ATTCAGAA+AGGCTATA
AGCGTTGCTGCATGGTTTCTCTAAAGGCTTTTAATACAGCTTTAATGTGTGTGTGCTCGTGT
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFF
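
One way to look into the dropped reads is to tally which five bases follow the barcode in each read. The following is a minimal Python sketch, not part of Stacks. It uses the first column of the "working" barcodes and the first 20 bases of each read from the sample above; the function and variable names are made up for illustration, and matching is exact, with no mismatch tolerance.

```python
from collections import Counter

# First column of the "working" barcodes listed in the post.
BARCODES = ["CCGAATC", "TTAGGCAC", "AACTCGTCC", "GGTCTACGTC"]

# First 20 bases of each forward read in the sample, in order.
READ_STARTS = [
    "GATACCCTGCAGGTTTGCAC",
    "TTAGGCACTGCATTAAAGTA",
    "CCGAATCTGCAGCTAATGTG",
    "TTAGGCACTGCAGTTTTTCT",
    "GATACCCTGCAGGACGCCGC",
    "CTGCAACTCTGCATAAATCA",
    "CCGAATCTGCATTTCCATCT",
    "GATACCCTGCATCTTACCAA",
    "CCGAATCTGCAGTAAACACA",
    "CCGAATCTGCATAAATACAA",
    "TGTCTACGTCTGCATGTTAC",
    "AGCGTTGCTGCATGAATATA",
    "CCGAATCTGCATTCTGATTA",
    "AGCGTTGCTGCATGGTTTCT",
]


def bases_after_barcode(read, barcodes, n=5):
    """Return the n bases that follow the first exactly matching barcode,
    or None if the read starts with none of the listed barcodes."""
    for barcode in barcodes:
        if read.startswith(barcode):
            return read[len(barcode):len(barcode) + n]
    return None


tally = Counter()
unmatched = 0
for read in READ_STARTS:
    kmer = bases_after_barcode(read, BARCODES)
    if kmer is None:
        unmatched += 1
    else:
        tally[kmer] += 1

print(tally.most_common())                # [('TGCAT', 4), ('TGCAG', 3)]
print("no listed barcode:", unmatched)    # no listed barcode: 7
```

In this small sample, the reads that start with a listed barcode split between TGCAG and TGCAT after the barcode. NsiI recognizes ATGCAT and leaves the same TGCA overhang as PstI, so the TGCAT reads would be consistent with NsiI-cut fragments ligated to the PstI adapter. That is only a possible reading of fourteen reads, not something confirmed in this thread.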

Maciej Konopiński

Oct 18, 2023, 11:00:35 AM
to Stacks
Is there anyone out there? :)

Catchen, Julian

Oct 25, 2023, 4:06:14 PM
to stacks...@googlegroups.com

Hi Maciek,

 

This is incorrect: “To my surprise process_radtags recognizes cut sites only when barcodes contain first letter of the cut site.”

 

The process_radtags program recognizes the cutsite remnant that it expects to be present in the sequenced read. In the case of PstI, that expected remnant is “TGCAG.” These sequences are hardcoded into the program based on the enzyme you specify with --renz-1/--renz-2. If you provide a barcode, it expects to find those 5 bp immediately after your specified barcode. It does not have any special knowledge of barcodes and just looks for the sequence you specify, independent of the restriction enzyme specified. Some variant protocols include extra linker nucleotides that appear between the barcode and the restriction enzyme cutsite remnant. If yours has this (I don’t recall what 3RAD includes at the moment), you would just append the constant nucleotide to the end of your barcode.
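
The check described above can be sketched in a few lines of Python. This is only an illustration of the logic, not the process_radtags implementation: it matches exactly, whereas the real program does more (for example, optional rescue of barcodes and cut sites). The function and variable names are hypothetical, and the appended C is taken from the "working" barcodes earlier in the thread rather than from the 3RAD specification.

```python
# Illustrative sketch only; names are hypothetical and matching is exact.

PSTI_REMNANT = "TGCAG"  # PstI remnant as quoted in the reply above


def match_barcode(read, barcodes, remnant=PSTI_REMNANT):
    """Return the barcode if the read starts with it and the cut site
    remnant follows immediately; otherwise return None."""
    for barcode in barcodes:
        if read.startswith(barcode + remnant):
            return barcode
    return None


# Barcodes as listed in the publication (first column of the post).
published = ["CCGAAT", "TTAGGCA", "AACTCGTC", "GGTCTACGT"]

# Same barcodes with the constant nucleotide appended, as suggested above.
with_linker = [barcode + "C" for barcode in published]

# Start of one read from the sample in the original post.
read = "CCGAATCTGCAGCTAATGTGCTTCC"

print(match_barcode(read, published))    # prints None
print(match_barcode(read, with_linker))  # prints CCGAATC
```

With the published barcode CCGAAT, the five bases that follow in this read are CTGCA, so the remnant check fails. With CCGAATC they are TGCAG, so it passes.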

 

Best,

 

julian
