Demultiplexing paired-end Illumina data


Chelsea Carey

Jan 10, 2014, 5:19:55 PM
to qiime...@googlegroups.com
Hello! 

I am new to QIIME and am trying to demultiplex (and quality filter) my 16S paired-end Illumina sequences. The sequences were generated using the Earth Microbiome Project protocol (http://www.earthmicrobiome.org/emp-standard-protocols/16s/). I received three files back: L001_R1_001.fastq, L001_R2_001.fastq, and L001_I1_001.fastq (forward and reverse reads, plus the index reads that carry the barcodes).

I want to make sure I am following the correct steps to demultiplex my data, because the quality filter results at the end of split_libraries_fastq.py don't seem right.

First, I used join_paired_ends.py to join the paired-end reads: join_paired_ends.py -f $PWD/L001_R1_001.fastq -r $PWD/L001_R2_001.fastq -b $PWD/L001_I1_001.fastq

This created fastqjoin.join.fastq and fastqjoin.join_barcodes.fastq files that I then used to run split_libraries_fastq.py: 

split_libraries_fastq.py -o slout -i fastq-join_joined/fastqjoin.join.fastq -b fastq-join_joined/fastqjoin.join_barcodes.fastq --rev_comp_mapping_barcodes -m 16S_Mapping.txt --rev_comp_barcode 

This created three files: seqs.fna, histograms.txt, and split_library_log.txt. When I opened split_library_log.txt, these were the quality filter results:

Quality filter results
Total number of input sequences: 8369799
Barcode not in mapping file: 24287
Read too short after quality truncation: 16
Count of N characters exceeds limit: 126
Illumina quality digit = 0: 0
Barcode errors exceed max: 8345252 

It seems like something is not right with the barcodes, given the high number of "barcode not in mapping file" and "barcode errors exceed max" occurrences.
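To put the log counts above in proportion, here is a minimal Python sketch of the arithmetic. It only uses the numbers copied from the log. It assumes each input read is counted under at most one filter category, which is an assumption about how the log is tallied rather than something the log itself states.

```python
# Counts copied from the split_library_log.txt excerpt above.
total_input = 8369799
filtered = {
    "Barcode not in mapping file": 24287,
    "Read too short after quality truncation": 16,
    "Count of N characters exceeds limit": 126,
    "Illumina quality digit = 0": 0,
    "Barcode errors exceed max": 8345252,
}

# Assumption: every read falls into at most one category,
# so the categories can simply be summed.
total_filtered = sum(filtered.values())
remaining = total_input - total_filtered

for reason, count in filtered.items():
    share = 100 * count / total_input
    print(f"{reason}: {count} ({share:.2f}% of input)")
print(f"Reads not removed by any filter: {remaining}")
```

Under that assumption, roughly 99.7% of the input reads are discarded for barcode errors and only 118 reads survive all filters, which points to a barcode problem rather than a read quality problem.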

Could someone please help me to understand if I am following the steps correctly - and what these quality filter results mean? 

Thank you! 

Chelsea 

Tony Walters

Jan 10, 2014, 6:37:29 PM
to qiime...@googlegroups.com
Hello Chelsea,

You are probably right to suspect that the barcodes are the issue. Did you try running the split_libraries_fastq.py command without the --rev_comp_barcode option? That is, just:
split_libraries_fastq.py -o slout -i fastq-join_joined/fastqjoin.join.fastq -b fastq-join_joined/fastqjoin.join_barcodes.fastq --rev_comp_mapping_barcodes -m 16S_Mapping.txt

Also, you might do some spot checks to get counts for a few of the barcodes from your mapping file (and their reverse complements) by using a grep command, e.g., for the fictional barcode AAAATTTTCCCC
you could use this command to get the counts in the fastq-join_joined/fastqjoin.join_barcodes.fastq file:
grep -c "^AAAATTTTCCCC" fastq-join_joined/fastqjoin.join_barcodes.fastq
and its reverse complement:
grep -c "^GGGGAAAATTTT" fastq-join_joined/fastqjoin.join_barcodes.fastq

You should expect many thousands of counts for each barcode. The ^ character tells grep to match only at the beginning of a line, which makes this run a bit faster.
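If checking several barcodes by hand gets tedious, the same spot check can be sketched in Python. This is only an illustration and not part of QIIME: the helper names `reverse_complement` and `count_barcodes` are made up for this example, and it assumes a standard four-line-per-record FASTQ file. Unlike the grep commands, it compares whole sequence lines rather than line prefixes, and it only looks at sequence lines.

```python
import itertools

# Translation table for complementing DNA bases (hypothetical helper).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_barcodes(fastq_path, barcodes):
    """Count exact occurrences of each barcode among the sequence lines.

    Assumes four lines per FASTQ record, so the sequence is the
    second line of every record.
    """
    counts = dict.fromkeys(barcodes, 0)
    with open(fastq_path) as handle:
        # Start at the second line and take every fourth line after it.
        for line in itertools.islice(handle, 1, None, 4):
            seq = line.strip()
            if seq in counts:
                counts[seq] += 1
    return counts
```

For the fictional barcode above, `count_barcodes("fastq-join_joined/fastqjoin.join_barcodes.fastq", ["AAAATTTTCCCC", reverse_complement("AAAATTTTCCCC")])` would return the counts for both orientations in one pass. Whichever orientation has the large counts suggests which reverse-complement options are needed.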

-Tony

