Hello everyone,
I am seeking help with an issue that came up while running my sequence analysis.
I sequenced 100 samples (2x300) on a MiSeq run using bacterial and archaeal 16S rRNA gene primers (Bac357F/785R and Arc344F/806R). The samples were prepared following the strategy used by Herbold et al. 2015, with a new header sequence attached to the barcode and primer.
The MiSeq run yielded 10 million reads, and the R1 and R2 files contained no adapter sequences. I performed the following steps on the output files:
1) I extracted the barcodes, together with their unique header sequences, using
"extract_barcodes.py -f Friegrich-01_S1_L001_R1_001.fastq -r Friegrich-01_S1_L001_R2_001.fastq -c barcode_paired_end --bc1_len 24 --bc2_len 24 -o processed_seqs"
2) I joined the reads using
"join_paired_ends.py -f reads1.fastq -r reads2.fastq -b barcodes.fastq -j 10 -o joined.fastq"
3) I prepared my own mapping file and ran split_libraries_fastq.py twice, as follows:
split_libraries_fastq.py -i fastqjoin.join.fastq -b fastqjoin.join_barcodes.fastq -o slout/ -m Mappingfile.txt --store_demultiplexed_fastq -r 0 -q 19 -n 100 --barcode_type 24 --phred_offset 33
split_libraries_fastq.py -i fastqjoin.join.fastq -b fastqjoin.join_barcodes.fastq -o slout/ -m Mappingfile.txt --store_demultiplexed_fastq -r 0 -q 0 -n 100 --barcode_type 24
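For step 1, here is a rough Python sketch of what barcode_paired_end mode is understood to do. It is based on our reading of the QIIME 1 extract_barcodes.py documentation, not on its source. The first --bc1_len bases of R1 and the first --bc2_len bases of R2 are concatenated into one barcode, and the rest of each read is kept. The function name and the sequences are hypothetical.

```python
# Hypothetical sketch of barcode_paired_end extraction as we understand it;
# this is not QIIME's own implementation.
from typing import Tuple


def extract_paired_barcode(
    r1_seq: str, r2_seq: str, bc1_len: int = 24, bc2_len: int = 24
) -> Tuple[str, str, str]:
    """Return (combined_barcode, trimmed_r1, trimmed_r2).

    The combined barcode is the first bc1_len bases of R1 followed by the
    first bc2_len bases of R2; the remaining bases are kept as the reads.
    """
    barcode = r1_seq[:bc1_len] + r2_seq[:bc2_len]
    return barcode, r1_seq[bc1_len:], r2_seq[bc2_len:]


# Made-up sequences, not real header/barcode/primer sequences.
bc, read1, read2 = extract_paired_barcode("A" * 24 + "GTCAG", "C" * 24 + "TTGCA")
print(len(bc))  # 48 with bc1_len = bc2_len = 24
```

If this reading is correct, the entries in barcodes.fastq would be 48 nt long with --bc1_len 24 --bc2_len 24. That length may be worth comparing with the barcode lengths in Mappingfile.txt and with the --barcode_type 24 value used in step 3.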
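For step 2, note that join_paired_ends.py uses fastq-join by default, and -j sets the minimum overlap. The sketch below is a simplified stand-in, not fastq-join's algorithm. It merges R1 with the reverse complement of R2 at the longest overlap of at least min_overlap bases whose mismatch fraction stays within an assumed 8%. That 8% figure is our guess at something resembling join_paired_ends.py's -p default. The sketch ignores quality scores and read-through cases where the amplicon is shorter than the reads. Names and sequences are hypothetical.

```python
# Simplified, hypothetical overlap join; not fastq-join's actual algorithm.
from typing import Optional

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA sequence (A/C/G/T/N only)."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq))


def join_pair(r1: str, r2: str, min_overlap: int = 10,
              max_mismatch_frac: float = 0.08) -> Optional[str]:
    """Join R1 and R2 through the longest acceptable overlap between the end
    of R1 and the start of reverse-complemented R2.

    Returns None when no overlap of at least min_overlap bases has a mismatch
    fraction within max_mismatch_frac. Quality scores are ignored.
    """
    r2_rc = reverse_complement(r2)
    # Try the longest possible overlap first, shrinking down to min_overlap.
    for overlap in range(min(len(r1), len(r2_rc)), min_overlap - 1, -1):
        mismatches = sum(a != b for a, b in zip(r1[-overlap:], r2_rc[:overlap]))
        if mismatches <= max_mismatch_frac * overlap:
            return r1 + r2_rc[overlap:]
    return None


amplicon = "GATTACACATGCCGTAAGCT"          # made-up 20-nt "amplicon"
r1 = amplicon[:14]                          # forward read
r2 = reverse_complement(amplicon[6:])       # reverse read; true overlap is 8 nt
print(join_pair(r1, r2, min_overlap=5))     # GATTACACATGCCGTAAGCT
print(join_pair(r1, r2, min_overlap=10))    # None: overlap shorter than the minimum
```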
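For context on the -q question, here is a minimal Python sketch of split_libraries_fastq.py's quality truncation as we understand it from the QIIME 1 documentation. It is not QIIME's code, and the function names are hypothetical. A read is cut at the start of the first run of more than -r consecutive bases with a Phred score at or below -q. It is then discarded if the remaining length is below -p times the original length. The commands above do not set -p (min_per_read_length_fraction), so we assume its default of 0.75 applies. The quality values are made up.

```python
# Hypothetical sketch of -q / -r / -p quality truncation; not QIIME's code.
from typing import Sequence


def truncate_at_bad_run(quals: Sequence[int], q: int, r: int) -> int:
    """Return how many bases to keep: the read is cut at the start of the
    first run of more than r consecutive bases with Phred score <= q."""
    run_start = 0
    run_len = 0
    for i, score in enumerate(quals):
        if score <= q:
            if run_len == 0:
                run_start = i  # a new low-quality run begins here
            run_len += 1
            if run_len > r:
                return run_start
        else:
            run_len = 0
    return len(quals)


def passes_quality_filter(quals: Sequence[int], q: int, r: int = 0,
                          p: float = 0.75) -> bool:
    """Keep the read only if its truncated length is at least p * original length."""
    return truncate_at_bad_run(quals, q, r) >= p * len(quals)


# Made-up 450-base joined read: all Q38 except one Q15 base at position 200.
quals = [38] * 200 + [15] + [38] * 249
print(truncate_at_bad_run(quals, q=19, r=0))    # 200
print(passes_quality_filter(quals, q=19, r=0))  # False: 200 < 0.75 * 450
print(passes_quality_filter(quals, q=0, r=0))   # True: no base at Q0 or below
print(passes_quality_filter(quals, q=19, r=3))  # True: a single bad base is tolerated
```

Under this reading, combining -r 0 with -q 19 means that a single base at Q19 or below anywhere in the first 75% of a joined read is enough to discard it. That would be consistent with the large "too short" count for -q 19. With -q 0, essentially no base can trigger truncation, so that run applies almost no quality filtering at this step.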
Apart from the explicit --phred_offset 33 in the first command, the only difference between the two runs was the -q value.
With -q 19 the output was:
Total number of input sequences: 3891531
Barcode not in mapping file: 2776834
Read too short after quality truncation: ~704000
whereas with -q 0 the output was:
Total number of input sequences: 3891531
Barcode not in mapping file: 2776834
Read too short after quality truncation: 10381
As you can see, -q 0 retains far more sequences per sample than -q 19, and I am not sure which -q value I should use. Could somebody help me understand the implications of using -q 0 rather than -q 19 in terms of quality control for the sequence analysis?
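The reported counts can be compared side by side with a little arithmetic. The helper below is hypothetical. It treats ~704000 as exact and assumes no other filter (for example -n) removed reads, so the retained figures are upper bounds rather than QIIME's own totals.

```python
# Hypothetical helper; assumes only the two reported filters removed reads.
def summarize(total: int, no_barcode: int, too_short: int) -> dict:
    """Break down the split_libraries_fastq.py counts reported above."""
    assigned = total - no_barcode      # reads whose barcode matched the mapping file
    retained = assigned - too_short    # upper bound on reads kept after truncation
    return {
        "assigned": assigned,
        "pct_no_barcode": round(100 * no_barcode / total, 1),
        "retained_upper_bound": retained,
        "pct_of_assigned_lost": round(100 * too_short / assigned, 1),
    }


for label, too_short in (("-q 19", 704000), ("-q 0", 10381)):
    print(label, summarize(3891531, 2776834, too_short))
```

Under these assumptions, about 71% of input sequences were dropped for unmatched barcodes in both runs. Of the 1114697 barcode-matched reads, -q 19 removed roughly 63%, while -q 0 removed under 1%.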
The -q 0 setting was taken from the BR Microbiome protocols, which publish a set of commands for 16S analysis using QIIME (http://www.brmicrobiome.org/#!16s-profiling-pipeline-illumina/czxl).
Any help clearing up this confusion would be much appreciated.
I am fairly new to this, so please bear with my understanding and responses.
I look forward to your thoughts and suggestions.
Regards,
Ajinkya