Barcode/key file format for ddRAD-seq data

Sarah Endenburg

Apr 4, 2016, 2:48:36 PM

to TASSEL - Trait Analysis by Association, Evolution and Linkage

Hello,

I'm trying to use the UNEAK pipeline on Tassel version 3 to analyze ddRAD-seq data. However, I'm not sure how to format my barcode file. All the documentation I've read shows only one barcode sequence per sample. My data has two barcodes per sample (one per enzyme used in the double digest) as shown below.

Barcode1    Barcode2    Sample    Column    Row
CCGAATG    CTAACGT    AB-R-01-01    A    1
TTAGGCAG    CTAACGT    AB-R-01-02    A    2
AACTCGTCG    CTAACGT    AB-R-01-03    A    3
GGTCTACGTG    CTAACGT    AB-R-01-04    A    4
GATACCG    CTAACGT    AB-R-01-05    A    5
AGCGTTGG    CTAACGT    AB-R-01-06    A    6
CTGCAACTG    CTAACGT    AB-R-01-07    A    7
TCATGGTCAG    CTAACGT    AB-R-01-08    A    8
CCGAATG    TCGGTACT    AB-R-01-09    A    9
TTAGGCAG    TCGGTACT    AB-R-01-10    A    10

Will Tassel accept a key file with two barcode columns, or is there another way to format this so that it will run properly?

Thanks!

Sarah Endenburg

Apr 5, 2016, 8:00:22 AM

As well, are the flowcell ID and lane number essential for UNEAK to work? The company that sequenced my data didn't provide that information, and Stacks can demultiplex it using only the barcodes and sample IDs.

Matt C

Apr 15, 2016, 9:48:18 AM

I have the exact same question. Please update if you learn anything! I will do the same.

Matt

Sarah Endenburg

Apr 15, 2016, 10:08:23 AM

I seem to have gotten around the two-barcode issue by listing every sample twice in the key file, once per barcode. Someone else did that here:

https://groups.google.com/forum/#!searchin/tassel/barcodes/tassel/4yXE1P0eEhY/GR6E_xBf60AJ
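
The workaround above can be sketched as a small Python script. This is only an illustration, not something taken from the thread: the function name, the placeholder flowcell and lane values, and the output column order (Flowcell, Lane, Barcode, Sample, Column, Row) are assumptions, so check them against the key file layout your Tassel version expects. Also note that in the example table Barcode2 is shared by several samples, so the expanded rows are not unique by barcode; whether UNEAK handles that correctly is not confirmed here.

```python
def expand_two_barcode_table(text, flowcell="FLOWCELL_ID", lane="1"):
    """List every sample twice, once per barcode column.

    `text` is a whitespace-separated table with the header
    Barcode1 Barcode2 Sample Column Row (as in the first post).
    `flowcell` and `lane` are placeholders (assumptions); replace
    them with the real values for your run.
    """
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    header, records = lines[0], lines[1:]
    out = []
    for fields in records:
        rec = dict(zip(header, fields))
        # One output row per barcode column, same sample on both rows.
        for col in ("Barcode1", "Barcode2"):
            out.append([flowcell, lane, rec[col], rec["Sample"],
                        rec["Column"], rec["Row"]])
    return out


if __name__ == "__main__":
    table = """
    Barcode1    Barcode2    Sample    Column    Row
    CCGAATG    CTAACGT    AB-R-01-01    A    1
    TTAGGCAG    CTAACGT    AB-R-01-02    A    2
    """
    # Assumed column order; adjust to what your Tassel version requires.
    print("\t".join(["Flowcell", "Lane", "Barcode", "Sample", "Column", "Row"]))
    for row in expand_two_barcode_table(table):
        print("\t".join(row))
```

Each input sample yields two key file rows, so the two-sample table in the usage example produces four rows.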

The flowcell ID should be in the first line of your Illumina files, if that's the kind of data you have. I found mine by running head -n1 * in the folder containing the raw files. I also e-mailed the sequencing company and got the lane information, but unfortunately it seems Tassel might not be suited to our data anyway. We were only using Tassel to produce files in the correct format for Haplotag, so some colleagues are now trying to write a script that will let us use Stacks instead.
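
For anyone who wants to pull the flowcell ID and lane out of that first line programmatically, here is a minimal Python sketch. It assumes the Casava 1.8+ style read header, where the colon-separated fields before the first space are instrument, run number, flowcell ID, lane, tile, x and y. Older Illumina headers do not carry the flowcell ID, and the sketch returns None for them. The function names and the example header are made up for illustration.

```python
import gzip


def first_header(path):
    """Return the first line of a FASTQ file (plain text or .gz)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return handle.readline()


def parse_illumina_header(header_line):
    """Extract flowcell ID and lane from a Casava 1.8+ style header.

    Returns None when the header has fewer than seven colon-separated
    fields before the first space (e.g. the older header style).
    """
    read_id = header_line.strip().lstrip("@").split(" ")[0]
    fields = read_id.split(":")
    if len(fields) < 7:
        return None
    return {"flowcell": fields[2], "lane": fields[3]}


if __name__ == "__main__":
    # Made-up example header, not from real data.
    example = "@HWI-ST1234:100:C1ABCACXX:3:1101:1234:2000 1:N:0:ATCACG"
    print(parse_illumina_header(example))
```

If the function returns None for your files, the header is probably in the older format and the flowcell ID has to come from the sequencing provider.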

Hope that helps a bit :)