Hi QIIME developers and users,
My problem is in the demultiplexing step, using split_libraries_fastq.py. Almost 70% of my input reads have no matching barcode in the mapping file.
Here is my pipeline:
// generating barcode file
extract_barcodes.py -f SAM1-25_S8_L001_R1_001.fastq -r SAM1-25_S8_L001_R2_001.fastq -c barcode_paired_end --bc1_len 8 --bc2_len 8 -o barcode
output:
barcodes_20lines.fastq
// validating mapping file
validate_mapping_file.py -m mapping_file.txt -o check_id_map
output:
mapping_file_corrected.txt
// join_paired_ends
join_paired_ends.py -f SAM1-25_S8_L001_R1_001.fastq -r SAM1-25_S8_L001_R2_001.fastq -b barcodes.fastq -o fastq-join_joined
output:
fastqjoin.join_20lines.fastq fastqjoin.join_barcodes_20lines.fastq
// demultiplexing
split_libraries_fastq.py -i fastqjoin.join.fastq -b fastqjoin.join_barcodes.fastq -o demultiplex_NO_barCodes/ -m mapping_file_corrected.txt -q 19 --barcode_type 8
First lines from split_library_log.txt:
Mapping filepath: mapping_file_corrected.txt (md5: 2ebdc2c6e3cc86da635db8c3a537a407)
Sequence read filepath: fastqjoin.join.fastq (md5: 09dc1774afbbf8146598a44c650c02f4)
Barcode read filepath: fastqjoin.join_barcodes.fastq (md5: 5462eec2bc663960c2ae3c5f996bf3ef)
Quality filter results
Total number of input sequences: 1197320
Barcode not in mapping file: 811160
Read too short after quality truncation: 84368
Count of N characters exceeds limit: 3
Illumina quality digit = 0: 0
Barcode errors exceed max: 0
Result summary (after quality filtering)
Median sequence length: 523.00
P1b.18.9 17472
yucra.2a 15875
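To put the counts above in proportion, here is a quick sketch of the arithmetic in Python. The numbers are copied from the log; the helper name `percent` is my own, and I am assuming that each discarded read is counted under only one filter category. Under that assumption, about 67.7% of the reads are lost to "Barcode not in mapping file".

```python
# Counts copied from the split_library_log.txt excerpt above.
total_reads = 1197320
filtered = {
    "Barcode not in mapping file": 811160,
    "Read too short after quality truncation": 84368,
    "Count of N characters exceeds limit": 3,
}


def percent(count, total):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


for reason, count in filtered.items():
    print(f"{reason}: {count} ({percent(count, total_reads)}%)")

# Assumption: the categories are mutually exclusive, so the remainder
# is the number of reads that passed every filter.
retained = total_reads - sum(filtered.values())
print(f"Reads passing all filters: {retained} ({percent(retained, total_reads)}%)")
```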
I have already read this thread:
https://groups.google.com/forum/#!topic/qiime-forum/XLRdwnpILBs
It seems that the reverse complement is not the problem.
Any idea what is happening? Am I making a mistake in the previous steps? Is there a problem with the mapping file, such as missing barcodes?
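One check I could run to narrow this down is to tally the barcodes in the barcode FASTQ and compare them with the BarcodeSequence column of the mapping file, including reverse complements and barcode lengths. Below is a minimal sketch of that idea, not part of QIIME: all function names are hypothetical, the sample IDs and barcodes are made-up toy data, and it assumes a tab-separated mapping file with a `#SampleID` header line and unwrapped four-line FASTQ records.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def read_mapping_barcodes(lines):
    """Return {barcode: sample_id} from the lines of a mapping file."""
    barcode_col = None
    barcodes = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#SampleID"):
            barcode_col = fields.index("BarcodeSequence")
        elif line.startswith("#") or not line.strip():
            continue  # skip comment and blank lines
        elif barcode_col is not None:
            barcodes[fields[barcode_col].upper()] = fields[0]
    return barcodes


def count_fastq_barcodes(lines):
    """Tally the sequence line (2nd of every 4) of a FASTQ file."""
    counts = Counter()
    for i, line in enumerate(lines):
        if i % 4 == 1:
            counts[line.strip().upper()] += 1
    return counts


def classify(counts, mapping_barcodes):
    """Count reads whose barcode matches directly, as reverse complement, or not at all."""
    rc_set = {reverse_complement(b) for b in mapping_barcodes}
    summary = Counter()
    for barcode, n in counts.items():
        if barcode in mapping_barcodes:
            summary["match"] += n
        elif barcode in rc_set:
            summary["reverse_complement_match"] += n
        else:
            summary["no_match"] += n
    return summary


def barcode_lengths(counts):
    """Tally reads by barcode length, to compare against --barcode_type."""
    lengths = Counter()
    for barcode, n in counts.items():
        lengths[len(barcode)] += n
    return lengths


# Toy data for illustration only (not my real samples or barcodes).
mapping_lines = [
    "#SampleID\tBarcodeSequence\tLinkerPrimerSequence\tDescription",
    "S1\tAACCGGTA\t\ttoy sample 1",
    "S2\tAAGGCCTA\t\ttoy sample 2",
]
fastq_lines = [
    "@r1", "AACCGGTA", "+", "IIIIIIII",
    "@r2", "TACCGGTT", "+", "IIIIIIII",
    "@r3", "GGGGGGGG", "+", "IIIIIIII",
    "@r4", "AAGGCCTA", "+", "IIIIIIII",
]

counts = count_fastq_barcodes(fastq_lines)
summary = classify(counts, read_mapping_barcodes(mapping_lines))
lengths = barcode_lengths(counts)
print(dict(summary))
print(dict(lengths))
print(counts.most_common(3))
```

On the real data, the same functions could be fed open file handles for mapping_file_corrected.txt and fastqjoin.join_barcodes.fastq instead of the toy lists. The most common unmatched barcodes and the length tally should show whether the barcodes are missing from the mapping file or simply in a different form than it expects.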
Thanks for your help
Matias