How to create a mapping file for dual-index MiSeq sequencing

student

Feb 4, 2015, 8:46:06 PM

to qiime...@googlegroups.com

Hi all,

I have data from a dual-index MiSeq sequencing run.

I extracted the barcodes with the extract_barcodes.py script (http://qiime.org/scripts/extract_barcodes.html, 3rd example).

Then I tried to create a mapping file (following http://qiime.org/documentation/file_formats.html), but I don't know how to fill in the "BarcodeSequence" column.

Each sample has one forward index sequence and one reverse index sequence.

Please tell me how to make the mapping file.

I would appreciate your help with this.

Sophie

Feb 5, 2015, 8:03:07 AM

to qiime...@googlegroups.com

Hello,

This post may help. The BarcodeSequence column is mainly used to split your libraries (split_libraries_fastq.py) when multiple samples were combined into one sequencing run. I'm not sure which step you are at, but start with join_paired_ends.py and then run split_libraries_fastq.py. extract_barcodes.py is mainly for producing files in a format compatible with split_libraries_fastq.py.

Thanks,

Sophie

student

Feb 6, 2015, 12:16:16 AM

to qiime...@googlegroups.com

Hello Sophie

Thank you for your reply.

I ran join_paired_ends.py (join_paired_ends.py -f $PWD/1-a_S4_L001_R1_001.fastq -r $PWD/1-a_S4_L001_R2_001.fastq -b $PWD/barcodes.fastq -o $PWD/fastq-join_joined/1-a), but it failed with the message "error: option -b: file does not exist".

Can I use the barcodes.fastq file produced by extract_barcodes.py here?

Thanks.

On Thursday, February 5, 2015 at 10:03:07 PM UTC+9, Sophie wrote:

Sophie

Feb 6, 2015, 8:22:41 AM

to qiime...@googlegroups.com

Hi,

We would recommend running the scripts in this order:

1. join_paired_ends.py to stitch the reads.
2. extract_barcodes.py with the forward and reverse barcode lengths matching the length of the primers (again, once per stitched read file from step 1 if there are multiple files).
3. split_libraries_fastq.py, following the last example on http://qiime.org/scripts/split_libraries_fastq.html to specify comma-separated SampleIDs and the trimmed fastq files from step 2 (the -q parameter is not needed).

So, try stitching the reads first (with the original files you got from the sequencing center), then extract the barcodes, and then follow the *second*-to-last example on http://qiime.org/scripts/extract_barcodes.html (not the last one) to try to remove the primer sequences (adjust the parameters based on the expected length of your primers). The input for that step should be the output of the barcode extraction step.

Also, make sure you create a mapping file with the right LinkerPrimerSequence and ReversePrimer fields (both need to be in 5'-3' orientation) for this.
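As a rough illustration of what such a mapping file could look like, here is a minimal Python sketch that writes a tab-separated table with the columns mentioned above. The sample ID, index sequences, and primer sequences are hypothetical placeholders, and putting forward index + reverse index in BarcodeSequence is an assumption that should be checked against the barcodes actually extracted from your reads.

```python
import csv
import io

# Column layout assumed from the QIIME 1 mapping file format:
# #SampleID first, Description last, ReversePrimer as an extra column.
HEADER = ["#SampleID", "BarcodeSequence", "LinkerPrimerSequence",
          "ReversePrimer", "Description"]


def mapping_rows(samples, forward_primer, reverse_primer):
    """Build mapping rows from (sample_id, fwd_index, rev_index) tuples."""
    rows = [HEADER]
    for sample_id, fwd_index, rev_index in samples:
        # Assumption: the barcode column holds forward index + reverse index.
        barcode = fwd_index + rev_index
        rows.append([sample_id, barcode, forward_primer, reverse_primer,
                     sample_id])
    return rows


def to_tsv(rows):
    """Serialize rows as tab-separated text."""
    buf = io.StringIO()
    csv.writer(buf, delimiter="\t", lineterminator="\n").writerows(rows)
    return buf.getvalue()


# Hypothetical placeholders: replace with your own sample IDs, indexes,
# and primers (both primers written 5'-3').
samples = [("Sample.1", "AAAACCCC", "GGGGTTTT")]
text = to_tsv(mapping_rows(samples, "ACGTACGTACGT", "TGCATGCATGCA"))
print(text)
```

Writing `text` to a file such as mapping_file.txt (name taken from the command quoted later in this thread) would give one row per sample.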

For more details, you can read the beginning of this post.

Thanks,

Sophie

student

Feb 9, 2015, 2:14:42 AM

to qiime...@googlegroups.com

Hello Sophie

Thank you for your reply.

I read the post you mentioned and tried steps you suggested.

I got 3 files (fastqjoin.join.fastq, fastqjoin.un1.fastq, and fastqjoin.un2.fastq) from "join_paired_ends.py -f $PWD/forward_reads.fastq -r $PWD/reverse_reads.fastq -o $PWD/fastq-join_joined".

Next, I want to run "extract_barcodes.py --input_type barcode_paired_end -m mapping_file.txt -a -f reads1.fastq -r reads2.fastq --bc1_len 6 --bc2_len 8 -o parsed_barcodes/" with fastqjoin.un1.fastq and fastqjoin.un2.fastq as input, but that requires a mapping file.

So I tried to make a mapping file, but I have two questions.

1. I have 2 index sequences per sample, for example "TAGATCGC" for the forward index and "TCGCCTTA" for the reverse index.

In that case, how should I fill in "BarcodeSequence"?

Is "TAGATCGCTCGCCTTA" OK?

2. I prepared the samples for MiSeq sequencing using the primer set 515f and 806r.

How should I fill in the "LinkerPrimerSequence" column?

Please tell me if I have misunderstood anything above (I am a beginner with QIIME).

Thanks.

On Friday, February 6, 2015 at 10:22:41 PM UTC+9, Sophie wrote:

Sophie

Feb 9, 2015, 12:47:12 PM

to qiime...@googlegroups.com

Hello,

The un1 and un2 files should probably just be discarded (thanks Tony), and just the stitched (join.fastq) files should be used. It sounds like the stitched reads should look like this:

barcode_1-primer_1-amplicon-primer_2-barcode_2

but depending on the primer design, the reads may be in mixed orientation. I would suggest the following:

1. Run extract_barcodes.py on the .join.fastq reads (--input_type barcode_paired_stitched), with a mapping file as input (-m) and the -a parameter to correct for read orientation, and specify the lengths of the forward and reverse barcodes as --bc1_len and --bc2_len.
2. Take the output reads with the barcodes removed from that call and run extract_barcodes.py on them again, with the barcode_paired_stitched input type, this time specifying the primer lengths as --bc1_len and --bc2_len.
3. Use the barcodes fastq file from the first call and the reads output from the second call as input to split_libraries_fastq.py.
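To make the two-pass idea concrete, here is a minimal Python sketch of what trimming fixed lengths from both ends of a stitched read amounts to. It is only a conceptual illustration, not QIIME's implementation: the function name and the short primer stand-ins are hypothetical, and read orientation and reverse-complementing of the second barcode are not modeled.

```python
def split_stitched_read(seq, bc1_len, bc2_len, primer1_len, primer2_len):
    """Split barcode_1-primer_1-amplicon-primer_2-barcode_2 by fixed lengths.

    Returns (concatenated barcodes, amplicon without primers).
    """
    if len(seq) < bc1_len + bc2_len + primer1_len + primer2_len:
        raise ValueError("read is shorter than barcodes plus primers")
    # First pass: take the barcodes off both ends of the read.
    bc1 = seq[:bc1_len]
    bc2 = seq[len(seq) - bc2_len:]
    without_barcodes = seq[bc1_len:len(seq) - bc2_len]
    # Second pass: take the primers off both ends of what is left.
    end = len(without_barcodes) - primer2_len
    amplicon = without_barcodes[primer1_len:end]
    return bc1 + bc2, amplicon


# Toy read: the index sequences quoted earlier in this thread around
# made-up 4-base "primers" and a made-up 6-base "amplicon".
read = "TAGATCGC" + "AAAA" + "GGGGGG" + "CCCC" + "TCGCCTTA"
barcodes, amplicon = split_stitched_read(read, 8, 8, 4, 4)
print(barcodes, amplicon)
```

In this toy layout the concatenated barcode is simply the first index followed by the second as they appear in the read. Whether the value in BarcodeSequence has to be written that way, or with one index reverse-complemented, depends on the library design, so it is worth comparing against a few entries in the extracted barcodes file.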

If you are using Illumina data, it doesn't really matter what you put in the LinkerPrimerSequence column, since the primers are usually not present in this type of data (please see this post for more details).

Thanks,

Sophie

Feb 9, 2015, 1:11:23 PM

to qiime...@googlegroups.com

Hi,

Slight correction: normally the primer sequence isn't used, but if you want extract_barcodes.py to try to orient the reads, you'll need appropriate data in the LinkerPrimerSequence and ReversePrimer fields (it does a regular expression search for the primers in different orientations to detect the orientation of each read).
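For intuition only, a primer-based orientation check along these lines could be sketched in Python as follows. This is not the code extract_barcodes.py runs; the function names and the short degenerate primers in the example are hypothetical, and the search rule (forward primer found means forward, reverse primer found means flip) is a simplifying assumption.

```python
import re

# IUPAC nucleotide codes expanded to regular expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse-complement a sequence, including IUPAC ambiguity codes."""
    return seq.translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Compile a degenerate primer (written 5'-3') into a regex."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def orient_read(read, forward_primer, reverse_primer):
    """Return the read in forward orientation, or None if no primer hits."""
    if primer_regex(forward_primer).search(read):
        return read  # forward primer seen as written: already forward
    if primer_regex(reverse_primer).search(read):
        return reverse_complement(read)  # read starts from the reverse primer
    return None  # would correspond to a "not oriented" read


# Hypothetical short primers, both written 5'-3'.
FWD, REV = "GTGYCAG", "GGACTAC"
forward_read = "AAAA" + "GTGCCAG" + "TTTTTTTT" + reverse_complement(REV) + "CC"
flipped_read = reverse_complement(forward_read)
print(orient_read(flipped_read, FWD, REV) == forward_read)
```

A read in which neither primer pattern is found cannot be oriented this way, which is one reason the primer fields in the mapping file need to match what is actually in the reads.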

student

Feb 11, 2015, 1:37:01 AM

to qiime...@googlegroups.com

Hello Sophie,

I ran the first extract_barcodes.py call with the -a and -m parameters and got 5 files:

1. barcodes.fastq
2. barcodes_not_oriented.fastq
3. reads.fastq
4. reads1_not_oriented.fastq
5. reads2_not_oriented.fastq

However, for some samples, files 1, 3, and 5 are 0 bytes.

Then I ran the second extract_barcodes.py call you described. The same 5 files were created again, and for most samples files 1, 3, and 5 are 0 bytes.

I understand from the forum post you pointed me to that reads2_not_oriented.fastq is expected to be 0 bytes, but why were the other 0-byte files created?
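One way to narrow this down is to count how many reads ended up in each output file per sample, which shows whether the reads are landing in the oriented or the not-oriented files. The following is a minimal Python sketch under the assumption that every FASTQ record spans exactly 4 lines; the helper names are hypothetical, and the demo uses throwaway files rather than real output.

```python
import os
import tempfile


def count_fastq_records(path):
    """Count records in a FASTQ file, assuming 4 lines per record."""
    with open(path) as handle:
        return sum(1 for _ in handle) // 4


def summarize_output(output_dir):
    """Map each .fastq file name in output_dir to its record count."""
    summary = {}
    for name in sorted(os.listdir(output_dir)):
        if name.endswith(".fastq"):
            path = os.path.join(output_dir, name)
            summary[name] = count_fastq_records(path)
    return summary


# Demo on temporary files named after two of the outputs listed above.
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "reads.fastq"), "w") as fh:
        fh.write("@read1\nACGT\n+\nIIII\n")
    open(os.path.join(demo_dir, "barcodes.fastq"), "w").close()
    summary = summarize_output(demo_dir)
print(summary)
```

Calling `summarize_output` on each sample's extract_barcodes.py output directory would show how the reads were distributed across the five files.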