demultiplexing/split_libraries_fastq.py


Reid Griggs

Jan 3, 2017, 8:07:49 PM
to Qiime 1 Forum
Hello,

I'm having trouble demultiplexing my Illumina run. I'm using a barcode file (a tab-delimited .txt file) that I created from the mapping file I was given, and I'm trying to demultiplex the forward sequences first. The following is an example of the command and the error:

split_libraries_fastq.py -i ./michaelv4r1.fastq -o ./demux -m ./2016_09_mike.map.v4_vs1.txt -b barcodesv4.txt --barcode_type 8 -r 999 -n 999 -q 0 -p 0.0001 &
[2] 7857
[1]   Exit 2                  split_libraries_fastq.py -i ./michaelv4r1.fastq -o ./demux -m ./2016_09_mike.map.v4_vs1.txt -b barcodesv4.txt -r 999 -n 999 -q 0 -p 0.0001
MacQIIME Reids-MacBook-Pro-2:Michaell $ Traceback (most recent call last):
  File "/macqiime/anaconda/bin/split_libraries_fastq.py", line 365, in <module>
    main()
  File "/macqiime/anaconda/bin/split_libraries_fastq.py", line 344, in main
    for fasta_header, sequence, quality, seq_id in seq_generator:
  File "/macqiime/anaconda/lib/python2.7/site-packages/qiime/split_libraries_fastq.py", line 317, in process_fastq_single_end_read_file
    parse_fastq(fastq_read_f, strict=False, phred_offset=phred_offset)):
  File "/macqiime/anaconda/lib/python2.7/site-packages/skbio/parse/sequences/fastq.py", line 174, in parse_fastq
    seqid)
skbio.parse.sequences._exception.FastqParseError: Failed qual conversion for seq id: SampleID BarcodeSequence. This may be because you passed an incorrect value for phred_offset.



Any help or direction would be greatly appreciated. 

Thanks,
Reid

TonyWalters

Jan 4, 2017, 1:09:31 AM
to Qiime 1 Forum
Hello Reid,

I think something may be amiss with your barcodesv4.txt and/or michaelv4r1.fastq file (normally sequence files, like the barcodes file, should be fastq format). Can you post the first few lines of each of these files?
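A quick way to check whether a file is FASTQ at all is to look at its first four lines. The seq id in the error above, `SampleID BarcodeSequence`, looks like a column header rather than a read name, which would be consistent with a tab-delimited file being parsed as FASTQ. The sketch below is only an illustration: `looks_like_fastq` is a hypothetical helper, not part of QIIME, and the sample rows are made up.

```python
def looks_like_fastq(lines):
    """Return True if the first four lines form a plausible FASTQ record."""
    record = [line.rstrip("\n") for line in lines[:4]]
    if len(record) < 4:
        return False
    header, seq, plus, qual = record
    return (
        header.startswith("@")      # read name line
        and plus.startswith("+")    # separator line
        and len(seq) > 0
        and len(seq) == len(qual)   # one quality character per base
    )


# Hypothetical tab-delimited barcode list (header row plus sample rows).
barcode_txt = [
    "#SampleID\tBarcodeSequence\n",
    "sampleA\tACGTACGT\n",
    "sampleB\tTGCATGCA\n",
    "sampleC\tGGTTCCAA\n",
]
# Hypothetical 8-base barcode read in FASTQ form, for comparison.
barcode_fastq = ["@read1\n", "ACGTACGT\n", "+\n", "IIIIIIII\n"]

print(looks_like_fastq(barcode_txt))
print(looks_like_fastq(barcode_fastq))
```

The first call reports that the tab-delimited list is not FASTQ; the second accepts the four-line record.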

Reid Griggs

Jan 4, 2017, 2:03:05 AM
to Qiime 1 Forum
Hi Tony,
As noted above, the barcode file is not in fastq format. Here's an example of the two files:

barcode:
#SampleID BarcodeSequence
blank03 GTCATGAC
blank04 GTCTTCAC

reverse run (tail):

+

>>A111>1>B3@33B3D1AF31D311BAFE0A0001AABEEFF1FGEE?/EB22DD1DAGFE0/0/AF/>@//BBBEGE10/B?/0BBGHGGHDDDDG2BFFGGF?>22FG?C>/><GD//B<<C0@1>GGHHFF1??GG111?G1CFFDFDF>=<<<F11==.<...D=00==<=<A/.-C/:CC;;A.00/9BB0F0::00//:9C00CBF0.:-.9.---------99/;/-;A9B///9/;;;--;/

@M00384:79:000000000-A7R67:1:2114:16551:29237 2:N:0:1

GGAATACCAGGGTTTCTAATCCCGTTTGCTCCCCATGCTTTCGCACCCCAGCGTCGGTAGGGACCCAGAGAGCTGCCTTCGCTTTTGGCGTTCCTTCGTAGATCTCCGGATTTCACCCCTACACACGAAATTTCACTCTCCTCTGTCTCACTCAAGTGAATTGGTTTCGAGAGCATTCCGCCACTTTTTGGCGACTTTCACTTTCAACCCGATTCACCGCCTACGTGCCCTTTACGCCCACTCATTCCGAA

+

1AA111BFFFAAAAEBGEFAGGH?0EG0FGFGFCB0B1DGGF0EGGGGEGGFEAE//ABA//0/BEF//>B/F1@FGFGGCE/?/11>F>>>?>FFGE0BEFHHFHH/@G?CGGHHF0BFFHGFHC?CGGEF<GG?<D1C0<F=C1>1<<GHBFBG0==000</=<<=CFEG-CG:0C0/;C::CCFF09;/A?;-.;FFF0C09;FFFB-9-9////-;;---9AFEEB9F/:F?B?>---;;F/9;/--

@M00384:79:000000000-A7R67:1:2114:17307:29239 2:N:0:1

GGACTACAGGGGTTTCTAATCCCGTTTGCTCCCCATGCTTTCGCACCCCAGCGTCGGTAGGGACCCAGAGAGCTGCCTTCGCCTTTGGCGTTCCTTCGTAGATCTCCGGATTTCACCCCTACACACGAAATTCCACTCTCCTCTGTCTCACTCAAGTGAATTGGTTTCGAGAGCATTCCGCCACTTTTTGGCGACTTTCACTTTCAACCCGATTCACCGCCTACGTGCCCTTTACGCCCAGTCATTCAGAA



I'm currently using fastx to demultiplex and have figured out that the number of sequences was too much for my computer. After splitting up the barcode file, it's running fine. Perhaps the same issue is happening in QIIME?

Thanks for the help.

Reid

TonyWalters

Jan 4, 2017, 2:15:03 AM
to Qiime 1 Forum
Hello Reid,

The barcodes are specified in the mapping file under the BarcodeSequence header. See this page for format and an example: http://qiime.org/documentation/file_formats.html#metadata-mapping-files

When split_libraries_fastq.py reads a sequence from the input (-i) fastq file, it reads a line from the input barcodes fastq (-b) file, and matches up the sequence (in your case, 8 base pairs from a fastq file) to the barcodes in the mapping file to determine which SampleID the sequence belongs to.
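The lockstep matching described above can be sketched roughly as follows. This is not QIIME's implementation: `read_fastq` and `assign_samples` are hypothetical names, the reads are made up, and quality filtering, barcode error correction and reverse-complement handling are all left out. The two barcodes in the lookup table are the ones from the barcode list posted earlier in this thread.

```python
from itertools import islice


def read_fastq(lines):
    """Yield (read_id, sequence, quality) from an iterable of FASTQ lines."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if len(record) < 4:
            return
        header, seq, _plus, qual = (line.strip() for line in record)
        yield header[1:], seq, qual  # drop the leading '@'


def assign_samples(read_lines, barcode_lines, barcode_to_sample):
    """Pair the Nth read with the Nth barcode record and look up its SampleID."""
    assigned = []
    for (read_id, seq, _), (_, barcode, _) in zip(
        read_fastq(read_lines), read_fastq(barcode_lines)
    ):
        sample = barcode_to_sample.get(barcode)  # None if the barcode is unknown
        assigned.append((read_id, sample, seq))
    return assigned


# Barcode -> SampleID, as it would be taken from the mapping file.
barcode_to_sample = {"GTCATGAC": "blank03", "GTCTTCAC": "blank04"}

# Hypothetical reads and their matching barcode records, in the same order.
reads = ["@r1", "GGAATACC", "+", "IIIIIIII", "@r2", "GGACTACA", "+", "IIIIIIII"]
barcodes = ["@r1", "GTCATGAC", "+", "IIIIIIII", "@r2", "AAAAAAAA", "+", "IIIIIIII"]

for row in assign_samples(reads, barcodes, barcode_to_sample):
    print(row)
```

Read `r1` is assigned to `blank03`; `r2` carries a barcode that is not in the mapping, so its sample is `None`.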

I need to know how your data are formatted to tell you how to demultiplex your data. Do you have reads with barcodes at the beginning of each of the reads in your fastq file? Do you have one/paired fastq file per sample? Do you have a separate fastq file with 8 base pair reads for your barcodes?

Reid Griggs

Jan 4, 2017, 2:24:09 AM
to Qiime 1 Forum
Hi Tony,

The data are paired-end reads, so I have a forward and a reverse fastq file. I've created the barcode file (pointed to with -b, because I wasn't given a barcode file) by grabbing the sequences from the mapping file under the heading "barcode sequence". These are in a column next to the sample id in the barcode file I've created. The barcodes are at the beginning of the sequences.
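With the barcode at the start of each read, separating the two amounts to slicing the first 8 bases, and their quality characters, off every record. A minimal sketch of that idea, assuming a fixed barcode length of 8; `split_barcode` is a hypothetical helper and the read below is made up:

```python
def split_barcode(header, seq, qual, bc_len=8):
    """Split one FASTQ record into a barcode record and a trimmed read record."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ for %s" % header)
    if len(seq) < bc_len:
        raise ValueError("read %s is shorter than the barcode length" % header)
    barcode_record = (header, seq[:bc_len], qual[:bc_len])
    read_record = (header, seq[bc_len:], qual[bc_len:])
    return barcode_record, read_record


# Hypothetical read: an 8-base barcode followed by 8 bases of amplicon.
barcode_rec, read_rec = split_barcode("r1", "GTCATGACGGAATACC", "I" * 16)
print(barcode_rec)
print(read_rec)
```

Writing the barcode records and the trimmed reads to two FASTQ files, in the same order, gives a reads file and a barcodes file that line up record for record.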

TonyWalters

Jan 4, 2017, 3:43:41 AM
to Qiime 1 Forum
Hello Reid,

Here is the process I think we should proceed with:

1. Stitch your paired-end R1/R2 reads. You should be able to do this with join_paired_ends.py.
2. On the reads that are joined together in step 1, extract the 8 base pair barcodes from the beginning of the reads with extract_barcodes.py (http://qiime.org/scripts/extract_barcodes.html). You should be able to use the option --bc1_len 8 to get the first 8 base pairs. This should yield a barcodes fastq and a reads fastq file.
3. Using the barcodes (-b) and reads (-i) fastq files from step 2, call split_libraries_fastq.py.
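The three steps above could be assembled along these lines. This is a sketch under assumptions, not a tested recipe: the reverse-read filename, the output directory names (`joined`, `extracted`, `demux`) and the helper `build_pipeline` are hypothetical, and the intermediate filenames (`fastqjoin.join.fastq`, `reads.fastq`, `barcodes.fastq`) are assumed to be the scripts' default outputs. Check each script's `-h` output before running anything. The code only builds and prints the commands; it does not execute them.

```python
import shlex


def build_pipeline(r1, r2, mapping, bc_len=8):
    """Return the three QIIME 1 commands as argument lists (nothing is run)."""
    joined = "joined/fastqjoin.join.fastq"  # assumed default output name
    return [
        # Step 1: stitch the paired-end reads.
        ["join_paired_ends.py", "-f", r1, "-r", r2, "-o", "joined"],
        # Step 2: pull the first bc_len bases off each joined read.
        ["extract_barcodes.py", "-f", joined, "-c", "barcode_single_end",
         "--bc1_len", str(bc_len), "-o", "extracted"],
        # Step 3: demultiplex using the extracted reads and barcodes.
        ["split_libraries_fastq.py", "-i", "extracted/reads.fastq",
         "-b", "extracted/barcodes.fastq", "-m", mapping,
         "--barcode_type", str(bc_len), "-o", "demux"],
    ]


commands = build_pipeline(
    "michaelv4r1.fastq",
    "michaelv4r2.fastq",  # hypothetical name for the reverse-read file
    "2016_09_mike.map.v4_vs1.txt",
)
for cmd in commands:
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

If joining gives a low yield, the same steps 2 and 3 can be pointed at read 1 directly by replacing the joined file with the read 1 fastq.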

There are caveats to this that can make the process more complicated, e.g., are the reads in mixed orientation, or are they consistent (i.e. read 1 is always in the sense direction), and if step 1 gives really low yield (the unjoined data are larger than joined data), then you may want to bypass that step and just use read 1 for steps 2/3.

-Tony

Reid Griggs

Feb 3, 2017, 1:09:25 AM
to Qiime 1 Forum
Hi Tony, 

Sorry for the delayed reply. I'm trying what you've suggested now. Two more things:
1. Can you please walk me through the initial error message I posted?
2. Is there a way to demultiplex/debarcode the forward and reverse runs separately?


Thanks,
Reid

TonyWalters

Feb 3, 2017, 1:18:33 AM
to Qiime 1 Forum
Hello,

1. The quality encoding in fastq files isn't always consistent: the characters on the quality line have to be converted to scores by subtracting an offset (the phred_offset, typically 33 or 64), and the wrong offset gives invalid score values. See https://en.wikipedia.org/wiki/FASTQ_format for more details about the characters/values and what they mean in the fastq quality lines.
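To make the offset idea concrete, here is a small sketch of the conversion. `decode_quality` is a hypothetical helper, not the scikit-bio function named in the traceback; the quality string is the first ten characters of a quality line posted earlier in this thread.

```python
def decode_quality(qual, phred_offset=33):
    """Convert a FASTQ quality string to a list of integer Phred scores."""
    scores = [ord(ch) - phred_offset for ch in qual]
    if any(score < 0 for score in scores):
        raise ValueError("negative score: wrong phred_offset or not a quality line")
    return scores


qual = "1AA111BFFF"

print(decode_quality(qual, phred_offset=33))

try:
    decode_quality(qual, phred_offset=64)
except ValueError as err:
    print("offset 64 fails:", err)
```

The character `1` has ASCII code 49, so it decodes to 16 with offset 33 but to a negative value with offset 64. Quality lines containing such characters can only be offset 33.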

2. You could do the extract_barcodes.py step above to generate the barcodes fastq file, which could then be used (independently) to demultiplex the data for read 1 or read 2 with separate calls to split_libraries_fastq.py, if you want to avoid the stitching process for read 1/read 2. I would also analyze these data independently; there's not really a great way to merge these data if that's what you're looking to do.

-Tony

Reid Griggs

Feb 3, 2017, 1:31:17 AM
to Qiime 1 Forum
Tony-

Thanks for the prompt reply. I've tried changing the phred offset value to both of the options, and that doesn't appear to resolve the situation.

I'll try the suggested method for demultiplexing/debarcoding separately and get back to you. 

Thanks,
Reid