Orientation of mate pairs for paired-end Illumina reads

sunil mundra

Apr 4, 2014, 8:25:25 AM

to qiime...@googlegroups.com

Hello,

I have a MiSeq run that produced two fastq files: one contained all forward reads, and the second contained reads all in the same orientation, which needed to be reverse complemented.

The data that seem to be problematic used TruSeq amplicon library prep.
So each fastq file shows a mixture of read orientations:

Fastq 1
BARCODE--F.PRIMER--NNNN
BARCODE--R.PRIMER--NNNN

Fastq 2
BARCODE--R.PRIMER--NNNN
BARCODE--F.PRIMER--NNNN

I guess that the join_paired_ends.py script, which is designed to join overlapping PE reads, will not be able to take into account the mixture of Fwd and Rev sequences in each file?

Is this normal, or should I be seeing all Fwd and Rev primer sequences in separate fastq files?

Is there any way to analyze such data with QIIME?

I have used QIIME for 454 data analysis, but I am new to Illumina data and not good at custom scripting.

Cheers,
Sunil

sunil mundra

Apr 7, 2014, 3:45:24 PM

to qiime...@googlegroups.com

Hello all QIIMERS

How can I solve this problem?

Looking forward to hearing from someone!

Regards

Sunil

Kyle Bittinger

Apr 7, 2014, 3:59:27 PM

to qiime...@googlegroups.com

Forwarding this to the maintainer of join_paired_ends.py...

--Kyle

Tony Walters

Apr 7, 2014, 4:20:19 PM

to qiime...@googlegroups.com

Kyle, we had an internal discussion about this that didn't get posted:

Mike:

"The separate files must ONLY have the forward or reverse reads. They can not be interleaved forward and reverse. I think this is one possible use of the code Antonio is working on (to handle the interleaved JGI output). He can use FLASH (can handle interleaved files) but I would not recommend it. I have a thread on this in the qiime forum someplace."

So handling the reads with mixed orientation isn't going to work with join_paired_ends.py.

Me:

It may be possible to call extract_barcodes.py with the paired-end reads option and --attempt_read_reorientation, specifying zero-length forward and reverse barcodes, to try to orient the forward and reverse reads (at least the ones in which primers could be found) without otherwise altering the reads. Then you could try stitching the reads that were successfully oriented.

Once the reads are stitched, http://qiime.org/scripts/extract_barcodes.html could be called again with the stitched option to slice off the barcodes from the ends of the stitched reads, and the resulting reads and barcodes can then be used with split_libraries_fastq.py.
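
To get a rough idea of how mixed the orientations are in each file, before or after any re-orientation step, the reads containing each primer can be counted. This is only a sketch, assuming 4-line FASTQ records and exact (non-degenerate) primer matches; the function name count_primer_hits and the 40-base search window are made up for illustration and are not part of QIIME:

```shell
# Count reads in a FASTQ file that contain a given primer near the 5' end.
# Usage: count_primer_hits FASTQ PRIMER [WINDOW]
# WINDOW is the number of leading bases to search (barcode + primer);
# the default of 40 is an arbitrary illustrative value.
count_primer_hits() {
    # Sequence lines are every 4th line starting at line 2
    # (assumes unwrapped, 4-line FASTQ records).
    awk -v p="$2" -v w="${3:-40}" '
        NR % 4 == 2 && index(substr($0, 1, w), p) > 0 { n++ }
        END { print n + 0 }
    ' "$1"
}

# Example (file name from the thread; the primer is a placeholder):
# count_primer_hits L_R1.fastq ACGTACGTAC
```

Running it once with the forward primer and once with the reverse primer on each of the two files should show whether both orientations are present in both files. Primers containing degenerate IUPAC bases would need a regular expression instead of the exact match used here.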

sunil mundra

Apr 8, 2014, 5:06:17 AM

to qiime...@googlegroups.com

Hi,

Would this be possible?

1. First demultiplex the joined fastq file based on barcode + forward primer.

2. Then demultiplex the raw sequences in the joined fastq file again, but this time treating my reverse primer as the forward primer and vice versa.

3. Then reverse complement the output of step 2.

4. Join both fastq files.

5. Then I will have all reads in the same orientation. Is this the correct way to go?

Also, if any of you have an idea, I would like to know why I have reads in mixed orientation. Is this a sequencing problem? How can I ask the company to deliver reads in the same orientation?

Looking forward to your suggestions.

Kyle Bittinger

Apr 8, 2014, 9:08:10 AM

to qiime...@googlegroups.com

Sounds like an acceptable strategy, but you will definitely want to spot check the results manually to make sure all the steps worked for a few reads.

--Kyle
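
For the spot check, one option is to reverse complement a few reads independently of QIIME and compare them by eye with the output of the reverse-complement step. A minimal sketch, assuming unwrapped FASTA (one sequence line per record); the function name revcomp_fasta is hypothetical, and any base other than A, C, G or T is written as N:

```shell
# Reverse-complement the sequence lines of an unwrapped FASTA file,
# leaving ">" header lines untouched. Output goes to stdout.
# Usage: revcomp_fasta FILE.fna
revcomp_fasta() {
    awk '
        BEGIN {
            comp["A"] = "T"; comp["C"] = "G"
            comp["G"] = "C"; comp["T"] = "A"
        }
        /^>/ { print; next }          # header line: copy as-is
        {
            out = ""
            # Walk the sequence from its last base to its first.
            for (i = length($0); i >= 1; i--) {
                base = toupper(substr($0, i, 1))
                out = out ((base in comp) ? comp[base] : "N")
            }
            print out
        }
    ' "$1"
}

# Example: reverse complement the first two records of a file.
# head -n 4 seqs.fna > few.fna && revcomp_fasta few.fna
```

This is meant for checking a handful of reads, not for processing the whole data set.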

sunil mundra

Apr 14, 2014, 4:11:48 PM

to qiime...@googlegroups.com

Hello Kyle,

I am now trying to process the data (MiSeq paired-end data with mixed orientation) as discussed, and I have a few more questions. I ran the analysis as follows.

##############

join_paired_ends.py -f /usit/abel/u1/sunilm/LSP/L_R1.fastq -r /usit/abel/u1/sunilm/LSP/L_R2.fastq -o /usit/abel/u1/sunilm/LSP/fastq-join_joined/

convert_fastaqual_fastq.py -c fastq_to_fastaqual -f /usit/abel/u1/sunilm/LSP/fastq-join_joined/fastqjoin.join.fastq

split_libraries.py -o /usit/abel/u1/sunilm/LSP/filtered_forward/ -f /usit/abel/u1/sunilm/LSP/fastqjoin.join.fna -q /usit/abel/u1/sunilm/LSP/fastqjoin.join.qual -m /usit/abel/u1/sunilm/LSP/mapfile_LSP.txt -w 50 -s 25 -H 8 -l 200 -L 450 -a 0 -b variable_length -d -z truncate_only

split_libraries.py -o /usit/abel/u1/sunilm/LSP/filtered_reverse/ -f /usit/abel/u1/sunilm/LSP/fastqjoin.join.fna -q /usit/abel/u1/sunilm/LSP/fastqjoin.join.qual -m /usit/abel/u1/sunilm/LSP/mapfile_LSP_rev.txt -w 50 -s 25 -H 8 -l 200 -L 450 -a 0 -b variable_length -d -z truncate_only

adjust_seq_orientation.py -i /usit/abel/u1/sunilm/LSP/filtered_reverse/seqs.fna

Joining of forward and reverse reads

cat filtered_forward/seqs.fna filtered_reverse/seqs_rc.fna > merge/merged_seqs.fna

#####

I didn't manage to run:

identify_chimeric_seqs.py -m usearch61 -i /usit/abel/u1/sunilm/LSP/merge/merged_seqs.fna --suppress_usearch61_ref --keep_intermediates --usearch61_abundance_skew 2.0 --usearch61_mindiv 1.0 -o usearch61_chimera_checking/

######

pick_otus.py -s 0.97 -i /usit/abel/u1/sunilm/LSP/merge/merge.unique.pick.fna -m uclust --optimal -o /usit/abel/u1/sunilm/LSP/merge/uclust_97/

############

Issue 1. Is this workflow OK?

Issue 2. After joining the paired-end reads, I converted them into .fna and .qual files. Then I used split_libraries.py for demultiplexing and quality control. I am not sure how best to do quality control on the converted quality scores. The read quality scores do not look that good once they are converted into a .qual file. In some papers on Illumina data, researchers selected reads with a Phred score > 35. What is the comparable 454 quality score for Illumina data? In split_libraries.py I selected reads with a score of 25. Is this good enough?

Issue 3. While running identify_chimeric_seqs.py I got an error saying that usearch61 is not installed. When I talked to the people who manage the cluster, they said it is installed. How can I use it? Is there an alternative way to run it?

Looking forward to hearing from you...

Regards

Sunil

Kyle Bittinger

Apr 14, 2014, 4:21:31 PM

to qiime...@googlegroups.com

1. Sounds OK to me, though I would spot check some reads to make sure they were processed correctly.

2. Not sure that 454 and Illumina quality scores are really comparable. You want to select a quality score threshold. Ideally, your final results will be the same over a wide range of values. You probably want to check this once you work through your analysis if you are unsure which value to use. Also, try BLASTing 5-10 reads against the Greengenes training set, and look at where the alignments start to break down. Is there a clear quality score range where this starts to happen?

3. The usearch binary just needs to be located somewhere in your $PATH, so you can download it to your user directory and change your $PATH variable.

--Kyle
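
The $PATH change in point 3 could look roughly like the sketch below. It assumes the binary is named usearch61 (the name that appears in the error message) and has been saved to a directory such as $HOME/bin and made executable; the directory and the helper name add_to_path_and_check are assumptions for illustration only:

```shell
# Prepend a directory to PATH (if it is not already there) and report
# whether an executable with the given name can then be found.
# Usage: add_to_path_and_check DIR NAME
add_to_path_and_check() {
    case ":$PATH:" in
        *":$1:"*) ;;                          # directory already on PATH
        *) PATH="$1:$PATH"; export PATH ;;
    esac
    if command -v "$2" >/dev/null 2>&1; then
        echo "found: $(command -v "$2")"
    else
        echo "not found: $2"
        return 1
    fi
}

# Example (assumed location of the downloaded binary):
# chmod +x "$HOME/bin/usearch61"
# add_to_path_and_check "$HOME/bin" usearch61
```

The change only lasts for the current shell session; to make it permanent, the corresponding PATH line would normally go into a shell startup file such as ~/.bashrc. On a cluster, the same PATH also has to be set inside any submitted job script.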

sunil mundra

Apr 15, 2014, 12:06:06 PM

to qiime...@googlegroups.com

Hello Kyle

Thanks very much for your help. Now I have some more questions.

1. I used split_libraries.py as shown below. As output I got the seqs.fna and seqs_filtered.qual files, but I didn't get any log.txt or histogram.txt file.

2. I merged the .fna files using the cat command shown below. Is it possible to use the same cat command to merge the qual files? If the qual files are joined using cat, will the data be in the same order as in the merged .fna file?

3. I have a big MiSeq paired-end dataset (R1 and R2 files), and this trial run takes a long time. I would like to use a reduced subset for the trial run. Could you please suggest a command to make the dataset smaller?

split_libraries.py -o /usit/abel/u1/sunilm/LSP/filtered_forward/ -f /usit/abel/u1/sunilm/LSP/fastqjoin.join.fna -q /usit/abel/u1/sunilm/LSP/fastqjoin.join.qual -m /usit/abel/u1/sunilm/LSP/mapfile_LSP.txt -w 50 -s 25 -H 8 -l 200 -L 450 -a 0 -b variable_length -d -z truncate_only

cat filtered_forward/seqs.fna filtered_reverse/seqs_rc.fna > merge/merged_seqs.fna

Regards

Sunil
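
On the ordering question: cat writes its inputs in the order they are given, so if the .qual files are listed in the same order as the .fna files, the records should line up. A quick way to confirm this is sketched below; merge_and_check is a made-up helper name, and it compares only the first word of each header in case a tool has appended text to the description. Note also that if the sequences in one file were reverse complemented, the per-base scores in the matching .qual file would have to be reversed as well to stay aligned, which this sketch does not do.

```shell
# Concatenate matching .fna and .qual files in the same order, then
# verify that the record IDs of the two merged files line up.
# Usage: merge_and_check OUT_PREFIX FNA1 QUAL1 FNA2 QUAL2
merge_and_check() {
    cat "$2" "$4" > "$1.fna"
    cat "$3" "$5" > "$1.qual"
    # Keep only the sequence ID (first word of each ">" line).
    fna_ids=$(grep '^>' "$1.fna" | cut -d ' ' -f 1)
    qual_ids=$(grep '^>' "$1.qual" | cut -d ' ' -f 1)
    if [ "$fna_ids" = "$qual_ids" ]; then
        echo "headers match"
    else
        echo "headers differ"
        return 1
    fi
}

# Example (output prefix and .qual names are placeholders):
# merge_and_check merge/merged filtered_forward/seqs.fna forward.qual \
#     filtered_reverse/seqs_rc.fna reverse.qual
```

Holding all IDs in shell variables is fine for a trial subset; for a full run, writing the two ID lists to temporary files and comparing them with cmp would use less memory.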

Kyle Bittinger

Apr 15, 2014, 12:17:56 PM

to qiime...@googlegroups.com

1. Are you sure the script has finished without errors? The code indicates that the log file should be written immediately after the FASTA and QUAL files.

https://github.com/biocore/qiime/blob/master/qiime/split_libraries.py#L1438

2. Sure, but you should use the -n option in split_libraries.py, as in the "Duplicate Barcode Example" on this page:
http://qiime.org/scripts/split_libraries.html

3. head -n 4000 inputfile.fastq
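
Since a FASTQ record is normally four lines, head -n 4000 keeps the first 1000 reads. For paired-end data the same number of lines has to be taken from both files so that the pairs stay in sync. A small sketch, assuming unwrapped 4-line records and R1/R2 files whose reads are in the same order; subset_pairs and the output naming are illustrative only:

```shell
# Write the first N read pairs of two paired FASTQ files to new files
# named OUT_PREFIX_R1.fastq and OUT_PREFIX_R2.fastq.
# Usage: subset_pairs N R1.fastq R2.fastq OUT_PREFIX
subset_pairs() {
    lines=$(( $1 * 4 ))                 # 4 lines per FASTQ record
    head -n "$lines" "$2" > "${4}_R1.fastq"
    head -n "$lines" "$3" > "${4}_R2.fastq"
}

# Example with the file names used earlier in the thread:
# subset_pairs 1000 L_R1.fastq L_R2.fastq L_sub
```

The first reads in a file are not a random sample of the run, so this is mainly useful for testing that the commands work rather than for drawing conclusions.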
