Hello!
I am new to QIIME and am trying to demultiplex (and quality filter) my 16S paired-end Illumina sequences. The sequences were generated using the Earth Microbiome Project protocol (http://www.earthmicrobiome.org/emp-standard-protocols/16s/).
I received three files back: L001_R1_001.fastq, L001_R2_001.fastq, and L001_I1_001.fastq (forward reads, reverse reads, and index/barcode reads).
I want to make sure I am following the correct steps to demultiplex my data, because the quality filter results at the end of split_libraries_fastq.py don't seem right.
First, I used join_paired_ends.py to join the paired-end reads:
join_paired_ends.py -f $PWD/L001_R1_001.fastq -r $PWD/L001_R2_001.fastq -b $PWD/L001_I1_001.fastq
This created fastqjoin.join.fastq and fastqjoin.join_barcodes.fastq files that I then used to run split_libraries_fastq.py:
split_libraries_fastq.py -o slout -i fastq-join_joined/fastqjoin.join.fastq -b fastq-join_joined/fastqjoin.join_barcodes.fastq --rev_comp_mapping_barcodes -m 16S_Mapping.txt --rev_comp_barcode
This created three files: seqs.fna, histograms.txt, and split_library_log.txt. When I opened split_library_log.txt, these were the quality filter results:
Quality filter results
Total number of input sequences: 8369799
Barcode not in mapping file: 24287
Read too short after quality truncation: 16
Count of N characters exceeds limit: 126
Illumina quality digit = 0: 0
Barcode errors exceed max: 8345252
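To put these numbers in perspective, here is a quick tally of how many reads survived filtering. It is only a sketch: the variable names are made up for illustration, and it assumes each discarded read is counted in exactly one category of the log.

```shell
# Counts copied from the split_library_log.txt excerpt above.
total=8369799
not_in_mapping=24287
too_short=16
too_many_n=126
quality_digit_zero=0
barcode_errors=8345252

# Reads removed by all filters combined
# (assumes the categories do not overlap).
discarded=$((not_in_mapping + too_short + too_many_n + quality_digit_zero + barcode_errors))

# Reads that passed every filter.
passed=$((total - discarded))

echo "discarded: $discarded"   # discarded: 8369681
echo "passed:    $passed"      # passed:    118
```

Under that assumption, only 118 of roughly 8.4 million reads were kept, and almost all of the rest were rejected at the barcode step.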
It seems like something is not right with the barcodes, given the high number of "Barcode not in mapping file" and "Barcode errors exceed max" occurrences.
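One way to look at the barcodes directly is to list the most frequent reads in the barcode file and compare them by eye with the BarcodeSequence column of the mapping file, both as written and reverse-complemented. The following is a minimal sketch, not a QIIME feature: `top_barcodes` and `revcomp` are hypothetical helper names, and it assumes a standard FASTQ file with four lines per record.

```shell
# Hypothetical helper: print the N most frequent sequences in a FASTQ file.
# Assumes four lines per record, so the sequence is line 2 of each record.
# Usage: top_barcodes FILE [N]   (N defaults to 10)
top_barcodes() {
    awk 'NR % 4 == 2' "$1" | sort | uniq -c | sort -rn | head -n "${2:-10}"
}

# Hypothetical helper: reverse-complement DNA sequences read from stdin,
# one sequence per line.
revcomp() {
    rev | tr 'ACGTacgt' 'TGCAtgca'
}

# Barcode file produced by join_paired_ends.py in the step above.
index_fastq="fastq-join_joined/fastqjoin.join_barcodes.fastq"

# Only run the tally if the file is actually present.
if [ -f "$index_fastq" ]; then
    top_barcodes "$index_fastq" 10
fi
```

For example, `echo AACG | revcomp` prints `CGTT`. If the common barcodes in the file match the mapping file only after reverse-complementing, that suggests an orientation mismatch rather than a problem with the reads themselves.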
Could someone please help me understand whether I am following the steps correctly, and what these quality filter results mean?
Thank you!
Chelsea