Splitting Illumina Libraries

Tilley

Sep 17, 2014, 6:14:08 PM
to qiime...@googlegroups.com
Hello,

We have experience with 454 sequencing analysis but are new to Illumina. We are trying to process MiSeq data through QIIME and are having problems splitting the libraries. For each sample we have the following FASTQ files: forward read, reverse read, i5 index and i7 index. We are unsure how to get this data into a format that can be used in downstream analysis. We are also not sure what the barcode sequence in the mapping file should contain (i.e., whether it should be the full i7+i5 barcode). Any advice would be appreciated!

Thanks.

Katherine Amato

Sep 18, 2014, 12:50:46 PM
to qiime...@googlegroups.com
Hi Tilley,

We would suggest one of the following approaches:

1. Start with extract_barcodes.py, following the 3rd example on this page to merge the i5 and i7 files: http://qiime.org/scripts/extract_barcodes.html

2. Either process a single-end read, using the barcodes extracted in step 1 as the barcode reads for split_libraries_fastq.py, or run join_paired_ends.py on the forward/reverse reads (and filter the barcode reads from step 1 so they match the joined reads), then pass the stitched reads plus the filtered barcodes to split_libraries_fastq.py.
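Conceptually, merging the two index files means pairing records by position and concatenating their sequences and quality strings into a single barcode read. The Python sketch below illustrates only that idea; it is not QIIME's implementation, the function names are hypothetical, and it assumes both index files are uncompressed 4-line FASTQ in the same record order. Whichever file is passed first becomes the leading half of the combined barcode, and that order has to match how the barcodes are written in the mapping file.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def fastq_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield [header, seq, '+', qual] from a plain 4-line-per-record FASTQ."""
    it = iter(lines)
    while True:
        rec = [line.rstrip("\n") for line in islice(it, 4)]
        if not rec:
            return
        if len(rec) < 4 or not rec[0].startswith("@"):
            raise ValueError("truncated or malformed FASTQ record")
        yield rec


def merge_index_reads(first: Iterable[str], second: Iterable[str]) -> Iterator[str]:
    """Concatenate paired index reads (first + second) into one barcode record."""
    for r1, r2 in zip(fastq_records(first), fastq_records(second)):
        # Both index reads must come from the same cluster: compare the
        # read IDs up to the first whitespace.
        if r1[0].split()[0] != r2[0].split()[0]:
            raise ValueError(f"index files out of sync: {r1[0]} vs {r2[0]}")
        yield f"{r1[0]}\n{r1[1] + r2[1]}\n+\n{r1[3] + r2[3]}\n"
```

Open file handles can be passed directly, e.g. `out.writelines(merge_index_reads(i7_handle, i5_handle))`. Note that `zip` stops silently at the end of the shorter file, so comparing record counts first is worthwhile on real data.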

After either option, one should be able to proceed to OTU picking as usual.
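The joined-reads file contains only the pairs that could be joined, so the barcode file from step 1 ends up with extra records that have no matching read. A minimal sketch of the filtering step keeps only the barcode records whose read IDs also appear in the joined reads. The helper names are hypothetical, and it assumes read IDs match up to the first whitespace in both files.

```python
from itertools import islice
from typing import Iterable, Iterator, Set, Tuple


def fastq_records(lines: Iterable[str]) -> Iterator[Tuple[str, ...]]:
    """Yield (header, seq, '+', qual) tuples from a 4-line-per-record FASTQ."""
    it = iter(lines)
    while True:
        rec = [line.rstrip("\n") for line in islice(it, 4)]
        if not rec:
            return
        if len(rec) < 4:
            raise ValueError("truncated FASTQ record")
        yield tuple(rec)


def read_id(header: str) -> str:
    """'@M001:1:FC:1:1101:1:2 1:N:0:1' -> 'M001:1:FC:1:1101:1:2'"""
    return header[1:].split()[0]


def filter_barcodes(joined_lines: Iterable[str], barcode_lines: Iterable[str]) -> Iterator[str]:
    """Keep only barcode records whose read survived joining."""
    kept: Set[str] = {read_id(rec[0]) for rec in fastq_records(joined_lines)}
    for rec in fastq_records(barcode_lines):
        if read_id(rec[0]) in kept:
            yield "\n".join(rec) + "\n"
```

Because the joined IDs are held in a set, this stays fast for a full MiSeq run, at the cost of keeping all surviving IDs in memory.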

Best,
Katie

Tilley

Sep 28, 2014, 1:35:36 PM
to qiime...@googlegroups.com
Great, thanks Katie! We followed these steps and it seems to have worked.

Tilley

Katherine Amato

Sep 28, 2014, 5:12:11 PM
to qiime...@googlegroups.com
Great! Glad it worked!
Best,
Katie


Tilley

Feb 17, 2015, 2:03:01 PM
to qiime...@googlegroups.com
Hi Katie,
 
I am now working on a different dataset and am again having problems with this step. I can split the libraries successfully by first extracting the barcodes and then splitting the libraries using just the forward read. However, I'd like to process this data by joining the paired ends, and I can't get the split libraries step to complete successfully.
 
I first joined the forward and reverse reads using join_paired_ends.py, then extracted the barcodes from the stitched reads (passing -c barcode_paired_stitched --bc1_len 8 --bc2_len 8). For the split libraries step, I passed the joined reads and the extracted barcodes with --barcode_type 8, but all samples are being filtered out because their barcodes are not found in the mapping file.
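A likely explanation, offered tentatively: with `-c barcode_paired_stitched`, each entry in the extracted barcodes file is the two barcodes concatenated (8 + 8 = 16 nt here), so 8-nt mapping-file barcodes and `--barcode_type 8` would not match any read. A quick check, sketched below with a hypothetical `barcode_report` helper, tallies the barcode lengths in barcodes.fastq and counts exact matches against the set of BarcodeSequence values from the mapping file.

```python
from collections import Counter
from itertools import islice
from typing import Dict, Iterable, Set


def barcode_report(barcode_lines: Iterable[str], mapping_barcodes: Set[str]) -> Dict[str, object]:
    """Count barcode lengths and exact matches against the mapping file."""
    it = iter(barcode_lines)
    lengths: Counter = Counter()
    total = matched = 0
    while True:
        rec = list(islice(it, 4))
        if not rec:
            break
        if len(rec) < 4:
            raise ValueError("truncated FASTQ record")
        seq = rec[1].strip()
        lengths[len(seq)] += 1
        total += 1
        if seq in mapping_barcodes:
            matched += 1
    return {"total": total, "matched": matched, "lengths": dict(lengths)}
```

If `lengths` shows only 16-nt barcodes while `matched` is 0, the mapping file barcodes (and the barcode length given to the splitting script) likely need to reflect the concatenated barcode.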
 
Any input/advice would be appreciated!
 
Thanks,
 
Tilley

Jenya Kopylov

Feb 17, 2015, 5:02:53 PM
to qiime...@googlegroups.com
Hi Tilley,

There isn't a QIIME pipeline that does this in one easy step, but it is possible.
Below are the steps you will need to take; please let us know if something isn't clear.

Thanks,
Jenya

Prior to calling split_libraries, you will need to:

Step 1 (remove forward & reverse barcodes from each fastq read):
$ extract_barcodes.py -f seqs.fastq -c barcode_paired_stitched --bc1_len 8 --bc2_len 8 -o extract_barcodes
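Step 1 treats the first `--bc1_len` bases and the last `--bc2_len` bases of each stitched read as barcodes. The per-record slicing looks roughly like the sketch below. It is an illustration under assumptions, not QIIME's code: the helper name is hypothetical, and neither barcode is reverse-complemented here, so check `extract_barcodes.py -h` for options that change that.

```python
from typing import Tuple


def split_stitched(seq: str, qual: str, bc1_len: int = 8, bc2_len: int = 8) -> Tuple[str, str, str, str]:
    """Split one stitched read into (barcode_seq, barcode_qual, read_seq, read_qual).

    The barcode is bc1 (first bc1_len bases) followed by bc2 (last bc2_len bases).
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    if len(seq) <= bc1_len + bc2_len:
        raise ValueError("read too short to hold both barcodes")
    end = len(seq) - bc2_len
    return (
        seq[:bc1_len] + seq[end:],    # concatenated barcode -> barcodes.fastq
        qual[:bc1_len] + qual[end:],
        seq[bc1_len:end],             # remaining read -> reads.fastq
        qual[bc1_len:end],
    )
```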

Step 2 (add concatenated forward & reverse barcode to head of each fastq read using Tony's script merge_bcs_reads.py):
$ python merge_bcs_reads.py extract_barcodes/barcodes.fastq extract_barcodes/reads.fastq reads_with_barcodes_at_head.fastq
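The code of merge_bcs_reads.py isn't reproduced in this thread. Going only by the description of Step 2, a hypothetical stand-in would prepend each barcode sequence and its quality string to the matching read. The sketch below assumes barcodes.fastq and reads.fastq from Step 1 list the same reads in the same order; the script and function names are made up for illustration.

```python
import sys
from itertools import islice
from typing import Iterable, Iterator, List


def fastq_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield [header, seq, '+', qual] from a 4-line-per-record FASTQ."""
    it = iter(lines)
    while True:
        rec = [line.rstrip("\n") for line in islice(it, 4)]
        if not rec:
            return
        if len(rec) < 4:
            raise ValueError("truncated FASTQ record")
        yield rec


def barcodes_to_head(bc_lines: Iterable[str], read_lines: Iterable[str]) -> Iterator[str]:
    """Prepend barcode sequence and quality to the matching read."""
    for bc, rd in zip(fastq_records(bc_lines), fastq_records(read_lines)):
        if bc[0].split()[0] != rd[0].split()[0]:
            raise ValueError(f"files out of sync: {bc[0]} vs {rd[0]}")
        yield f"{rd[0]}\n{bc[1]}{rd[1]}\n+\n{bc[3]}{rd[3]}\n"


if __name__ == "__main__" and len(sys.argv) == 4:
    # Argument order as in Step 2: barcodes, reads, output.
    bc_path, reads_path, out_path = sys.argv[1:4]
    with open(bc_path) as bcs, open(reads_path) as reads, open(out_path, "w") as out:
        out.writelines(barcodes_to_head(bcs, reads))
```

Saved as a hypothetical barcodes_to_head.py, it would be called with the same three arguments as in Step 2.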

Step 3 (update your mapping file so that the BarcodeSequence column holds only the stitched, i.e. concatenated, barcodes):
For example,

original mapping file:

#SampleID BarcodeSequence LinkerPrimerSequence ReversePrimer ReverseBarcode
Daff.0.1 ACTCACACTGT GGTGGTGCATGGCCGTTCTTAGTT TACAAAGGGCAGGGACGTAAT ACTCCCACTGT
Daff.0.2 ACTCACACTGT GGTGGTGCATGGCCGTTCTTAGTT TACAAAGGGCAGGGACGTAAT ACTCCCTAGCT

updated mapping file (ReverseBarcode concatenated to the BarcodeSequence to form a new forward barcode):

#SampleID BarcodeSequence LinkerPrimerSequence ReversePrimer Description
Daff.0.1 ACTCACACTGTACTCCCACTGT GGTGGTGCATGGCCGTTCTTAGTT TACAAAGGGCAGGGACGTAAT Daff.0.1
Daff.0.2 ACTCACACTGTACTCCCTAGCT GGTGGTGCATGGCCGTTCTTAGTT TACAAAGGGCAGGGACGTAAT Daff.0.2
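The Step 3 edit can be scripted. A minimal sketch, assuming a tab-delimited mapping file whose columns are named as in the example above and that has no Description column yet: it concatenates ReverseBarcode onto BarcodeSequence, drops ReverseBarcode, and appends a Description column copied from #SampleID, as in the updated example. The function name is hypothetical.

```python
import csv
import io


def stitch_mapping(text: str) -> str:
    """Concatenate ReverseBarcode onto BarcodeSequence in a tab-delimited mapping file."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    header, body = rows[0], rows[1:]
    bc = header.index("BarcodeSequence")
    rev = header.index("ReverseBarcode")
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    # Drop ReverseBarcode and add a Description column (QIIME expects it last).
    writer.writerow([h for i, h in enumerate(header) if i != rev] + ["Description"])
    for row in body:
        if not row or row[0].startswith("#"):
            continue  # blank and comment lines are not carried over
        merged = [row[bc] + row[rev] if i == bc else v for i, v in enumerate(row) if i != rev]
        writer.writerow(merged + [row[0]])  # Description copies the SampleID
    return out.getvalue()
```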

Step 4 (convert your FASTQ reads into FASTA + QUAL files):
$ convert_fastaqual_fastq.py -f reads_with_barcodes_at_head.fastq -c fastq_to_fastaqual -o convert_fastaqual_fastq
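Step 4 only changes the file layout: the sequence goes into a FASTA record, and each quality character is decoded into an integer score in a matching QUAL record. A sketch for a single record, assuming Phred+33 quality encoding (as used by MiSeq output) and a hypothetical helper name:

```python
from typing import Tuple


def fastq_to_fasta_qual(header: str, seq: str, qual: str, offset: int = 33) -> Tuple[str, str]:
    """Turn one FASTQ record into a FASTA record and a matching QUAL record."""
    scores = [ord(ch) - offset for ch in qual]  # Phred+33: '!' -> 0, 'I' -> 40
    if len(scores) != len(seq):
        raise ValueError("sequence/quality length mismatch")
    name = header[1:] if header.startswith("@") else header
    fasta = f">{name}\n{seq}\n"
    qual_record = f">{name}\n{' '.join(str(s) for s in scores)}\n"
    return fasta, qual_record
```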

Step 5 (run split_libraries)
$ split_libraries.py -m mapping_file_with_stitched_barcodes.txt -f convert_fastaqual_fastq/reads_with_barcodes_at_head.fasta -q convert_fastaqual_fastq/reads_with_barcodes_at_head.qual --barcode-type 16 -o split_libraries
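With `--barcode-type 16`, split_libraries.py treats the first 16 bases of each read as the barcode, which fits the concatenated 8 + 8 barcodes from Steps 1 to 3. The number has to equal the concatenated barcode length in the mapping file; the 11-nt barcodes in the example mapping file above would give 22. The sketch below shows only the exact-match barcode lookup. split_libraries.py also performs primer checks and quality and length filtering that are not modeled here, and the helper name is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def demultiplex(reads: Iterable[Tuple[str, str]], barcodes: Dict[str, str], bc_len: int = 16):
    """Assign (read_id, sequence) pairs to samples by their leading barcode.

    barcodes maps each mapping-file BarcodeSequence to its #SampleID.
    """
    assigned: List[Tuple[str, str, str]] = []
    counts: Counter = Counter()
    for read_id, seq in reads:
        sample = barcodes.get(seq[:bc_len])
        if sample is None:
            counts["unassigned"] += 1
            continue
        counts[sample] += 1
        assigned.append((sample, read_id, seq[bc_len:]))  # barcode stripped
    return assigned, counts
```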

Tilley

Feb 18, 2015, 5:53:02 PM
to qiime...@googlegroups.com
Hi Jenya,
 
Thank you for your reply!  I think the process makes sense.
 
Could you provide more guidance on how to install the additional script? I'm running QIIME 1.8.0 in the VirtualBox image.
 
Thanks again,
 
Tilley

Jenya Kopylov

Feb 18, 2015, 6:52:33 PM
to qiime...@googlegroups.com
Hi Tilley,

If you have git installed in your VirtualBox, you can get the script by:


The script will be in 7326543/merge_bcs_reads.py

Otherwise, the simplest alternative is to get the code directly from this page.
You can either save it directly to your Shared_Folder, or
copy & save it to a text file directly inside the VirtualBox.

Let me know if that works,

Jenya