Hi All,
I am struggling to pre-process my 16S rRNA Illumina sequencing data with QIIME. I have several issues that I have not been able to find specific answers for.
I have 4 files from the sequencing - read 1, read 2, index 1 and index 2 (MiSeq Paired End - 2x 250 cycle).
Workflow Plan:
1) Extract Barcodes (extract_barcodes.py): with the option to re-orient reads. I am finding the reverse complement of my i7 adapter / linker / pad and barcodes at the beginning of some of my reads in the read 1 file, and vice versa in the read 2 file (there it is the reverse complement of the i5 adapter and forward primer instead).
2) Join Paired Ends (join_paired_ends.py): with option to update the index / barcode reads file to match the surviving joined pairs.
3) Split Libraries (split_libraries.py): demultiplex and QC, with the -z option to remove the reverse primer (and adapter / linker / pad sequence).
My Questions:
1) I am struggling to see how the extract barcodes script helps me. QIIME's website says:
for two index/barcode reads and two fastq reads...
This situation can be treated as a special case of paired-end reads. One could supply the index files (labeled as index1.fastq, index2.fastq) and use the --input_type barcode_paired_end:
i.e.: extract_barcodes.py --input_type barcode_paired_end -f index1.fastq -r index2.fastq --bc1_len 6 --bc2_len 6 -o parsed_barcodes/
The output barcodes.fastq file would be used for downstream processing, and the reads1 and reads2 files could be ignored....
This is what I don't understand. I need the original read 1 and read 2 files to join the reads and then demultiplex samples in the other 2 scripts, correct? Or is this referring to the output files from this script? Also, I would like to run the extract barcodes script to re-orient my reads. Is this possible when I need to run it with my index read files? Or do I need to re-run the script to do this, and won't that generate an additional barcodes file? If so, how do I handle that additional barcodes file?
2) The mapping file for the split_libraries.py script: THIS is my biggest issue. How do I list both barcodes when the formatting and script allow for only one barcode column? Should I concatenate them? Do I need to reverse complement the barcodes? How do others handle dual barcodes with this script and mapping file format? I have been reading other pages but I can't find an answer or example on how best to handle this.
3) Are there any other examples or resources on handling paired-end Illumina MiSeq data in QIIME for first-time users, specifically for those with 4 original sequencing files (2 read files and 2 index read files)?
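To make question 2 concrete: one way to fit two barcodes into a single mapping-file column is to concatenate them, optionally reverse complementing one or both first. The Python sketch below only illustrates that string manipulation. The helper names `reverse_complement` and `combined_barcode` are hypothetical and not part of QIIME, the pairing of the two example barcodes is made up, and whether either barcode actually needs to be reverse complemented depends on the sequencing setup, so it should be checked against the real index reads.

```python
# Hypothetical helpers for illustration only; these are not QIIME functions.

# Translation table mapping each base to its complement (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def combined_barcode(bc1, bc2, rev_comp_bc1=False, rev_comp_bc2=False):
    """Concatenate two barcodes into one string for a single barcode column.

    The rev_comp_* switches are assumptions about what a given run might
    need; they are not a statement of what any particular run requires.
    """
    if rev_comp_bc1:
        bc1 = reverse_complement(bc1)
    if rev_comp_bc2:
        bc2 = reverse_complement(bc2)
    return bc1 + bc2


# Example with two 8-base barcodes (the pairing here is assumed).
print(combined_barcode("ACGACGTG", "ATATACAC"))
print(combined_barcode("ACGACGTG", "ATATACAC", rev_comp_bc2=True))
```

If the concatenated barcode in the mapping file and the barcode extracted from the index reads are built in the same order and orientation, a single barcode column can represent a dual-index sample. The total length is then the sum of the two barcode lengths (16 in the example above).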
Thank you in advance!
Sara
"An example of the output that gets generated can be found here, the main two outputs are the following folders:
sl-out: includes a summary of the quality controlled sequences per sample as processed by split_libraries_fastq.py.
closed-ref: the main file you want to pay attention to in this folder is otu_table.biom which contains your OTU table picked using pick_closed_reference_otus.py. This is the output that you will need to use as an input for the QIIME Visualizations app."
Thank you very much for your help with this. I wanted to clarify if I should use your script on the de-multiplexed read files that I can get from the sequencing site or run your script and your recommendations on the original files.
Thank you!
Sara
| ACGACGTG |
| ATATACAC |
| CGTCGCTA |
| CTGCGTGT |
| TCATCGAG |
| CGTGAGTG |
| GGATATCT |
| ACCTACTG |
| AGCGCTAT |
| AGTCTAGA |
| CAGTGAGT |
| CGTACTCA |
| CTACGCAG |
| GGAGACTA |