join_paired_ends.py questions


MushroomLady

Jan 16, 2014, 3:05:48 PM
to qiime...@googlegroups.com
I was wondering about people's experiences with the join_paired_ends.py command, and what people recommend for the minimum overlap (in base pairs) required to join pairs and for the percent allowable differences within the region of overlap. The defaults on the command are set to 0, which doesn't seem like the most conservative way to go about things. Is a 100 bp overlap way too high? What do you all in the Knight lab like to use? Is the end trimming and quality filtering all done in the split_libraries_fastq.py command?

Mike R

Jan 17, 2014, 10:28:53 AM
to qiime...@googlegroups.com
Hi MushroomLady. :-)

If you notice, the help documentation for the '-j' option of the join_paired_ends.py script states "If not set, progam defaults will be used." This means that if the user does not enter a value for minimum overlap (i.e. it is left as "None"), the script uses the default setting of the chosen joining program. The default minimum overlap is 6 bp for fastq-join and 15 bp for SeqPrep.

-Mike

Mike R

Jan 17, 2014, 10:51:12 AM
to qiime...@googlegroups.com
I forgot to mention that the minimum overlap setting largely depends on how much overlap you expect between your forward and reverse reads. For example, if they are highly overlapping, then the minimum overlap setting will not make much of a difference. However, if you are trying to construct a 475 bp fragment from 2 x 250 bp reads, then your minimum overlap setting will have more of an effect on the joining process.
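To make the arithmetic concrete, here is a minimal sketch of how the expected overlap follows from the read lengths and the fragment length. The function name `expected_overlap` is made up for illustration and is not part of QIIME; it also assumes untrimmed reads that span the fragment from both ends.

```python
def expected_overlap(read_len_fwd, read_len_rev, fragment_len):
    """Estimate how many bases the forward and reverse reads share.

    Assumes both reads are untrimmed and start at opposite ends of the
    fragment (hypothetical helper, not a QIIME function).
    """
    overlap = read_len_fwd + read_len_rev - fragment_len
    # The overlap can never be negative, nor longer than either read
    # or the fragment itself.
    return max(0, min(overlap, read_len_fwd, read_len_rev, fragment_len))


# 2 x 250 bp reads covering a 475 bp fragment share 25 bases.
print(expected_overlap(250, 250, 475))
# 2 x 150 bp reads covering a 250 bp fragment share 50 bases.
print(expected_overlap(150, 150, 250))
```

With only about 25 bp expected, a minimum overlap set much above that would reject most true joins, and any bases lost to low quality at the read ends eat directly into that margin.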

One option: run the script "quality_scores_plot.py" on both your forward and reverse reads to help guide your minimum overlap setting. You can also run "quality_scores_plot.py" on the "join_paired_ends.py" output to help guide the quality filtering settings of "split_libraries_fastq.py".

Does this help?

-Mike

jirong.long

Mar 13, 2014, 10:46:04 AM
to qiime...@googlegroups.com
Hi there,

We got 150-bp paired-end data (V4 region of 16S rRNA) from MiSeq. After we ran join_paired_ends.py, only around 50-70% of the reads (depending on the sequencing run/plate) could be joined, and the median joined read length is 250 bp. Did we lose too much data during joining? Thanks.

Jirong

Mike R

Mar 13, 2014, 11:30:49 PM
to qiime...@googlegroups.com
Hi Jirong,

I'd suspect that the quality of the bases may be low, especially near the ends of the reads. Did you try running the script "quality_scores_plot.py" on both your forward and reverse reads? How do they look? Low-quality sequence near the region of overlap may produce too many mismatches for fastq-join or SeqPrep to join the reads.

Have you tried adjusting the `--min_overlap` and other options for join_paired_ends.py? Also, make sure the sequences in the forward and reverse read files are in exactly the same order.
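One rough way to check the read order is to compare the read IDs of the two files record by record. The sketch below is only an illustration, not a QIIME script: the function names are made up, and it assumes plain 4-line FASTQ records whose ID is the first whitespace-delimited token of the header, optionally ending in /1 or /2.

```python
from itertools import zip_longest


def read_ids(lines):
    """Yield the read ID from each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 != 0 or not line.strip():
            continue  # only header lines carry the ID; skip trailing blanks
        read_id = line.strip().split()[0][1:]  # drop the leading '@'
        # Older Illumina headers mark the mate with a /1 or /2 suffix.
        if read_id.endswith(("/1", "/2")):
            read_id = read_id[:-2]
        yield read_id


def first_order_mismatch(fwd_lines, rev_lines):
    """Return (record_index, fwd_id, rev_id) for the first pair of records
    whose IDs differ, or None if both files list the same IDs in the same
    order. A missing record shows up as None on the shorter side."""
    pairs = zip_longest(read_ids(fwd_lines), read_ids(rev_lines))
    for index, (fwd_id, rev_id) in enumerate(pairs):
        if fwd_id != rev_id:
            return index, fwd_id, rev_id
    return None


# Tiny in-memory example; with real data, pass two open file handles instead.
fwd = ["@read1 1:N:0:1", "ACGT", "+", "IIII", "@read2 1:N:0:1", "ACGT", "+", "IIII"]
rev = ["@read2 2:N:0:1", "ACGT", "+", "IIII", "@read1 2:N:0:1", "ACGT", "+", "IIII"]
print(first_order_mismatch(fwd, rev))
```

If this reports a mismatch, the two files need to be re-synchronized before joining, since the joining programs pair records by position.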

-Mike

Yevgeniy Marusenko

Oct 3, 2014, 8:25:45 PM
to qiime...@googlegroups.com
Hi Mike,

Is there a way to check the quality of the bases from Illumina data in .fastq format? The quality_scores_plot.py only takes .qual files...

Thanks

Yevgeniy Marusenko

Oct 4, 2014, 12:35:30 AM
to qiime...@googlegroups.com
I found an answer to my question:

convert_fastaqual_fastq.py can be used to convert .fastq to .qual as described in this post
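For a quick look without converting anything, the per-position mean quality can also be computed straight from the FASTQ file. This is a minimal sketch rather than a replacement for quality_scores_plot.py: the function name is made up, and it assumes 4-line records with Phred+33 encoded qualities (older Illumina pipelines used an offset of 64).

```python
def mean_quality_by_position(lines, offset=33):
    """Return the mean Phred score at each read position.

    Assumes 4-line FASTQ records; `offset` is the ASCII offset of the
    quality encoding (33 assumed here, 64 for old Illumina data).
    """
    totals, counts = [], []
    for i, line in enumerate(lines):
        if i % 4 != 3:
            continue  # the quality string is the 4th line of each record
        for pos, char in enumerate(line.rstrip("\r\n")):
            if pos == len(totals):
                # First time a read reaches this position.
                totals.append(0)
                counts.append(0)
            totals[pos] += ord(char) - offset
            counts[pos] += 1
    return [total / count for total, count in zip(totals, counts)]


# Tiny in-memory example; with real data, pass an open file handle instead.
records = ["@read1", "ACGT", "+", "IIII", "@read2", "AC", "+", "I5"]
print(mean_quality_by_position(records))
```

A steady drop in the means toward the 3' end of either read is the pattern that tends to hurt joining, because that is where the overlap sits.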