stitch reads with percent mismatch

Sanjeev Sariya

Sep 2, 2015, 9:41:09 AM
to EA Utils
Hi ea-utils team,

I'm using fastq-join to stitch 16S reads from the V3-V4 region (amplicon length ~460 bp). The data are Illumina paired-end, demultiplexed, with 300 bp reads, from 159 samples. I'm running QIIME 1.9.1.
I don't yet know the ea-utils version.  :(

I tried different quality and overlap settings over several iterations before settling on overlap 100 and quality 20. The ideal overlap would have been 140, but I allowed more leeway here. My raw input read count is 5581125.
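The "ideal overlap" of 140 follows from the read and amplicon lengths given above. A minimal sketch of that arithmetic, assuming both reads run their full 300 bp and together span the whole ~460 bp amplicon (the function name is made up for illustration):

```python
def expected_overlap(read_length: int, amplicon_length: int) -> int:
    """Bases covered by both mates when the pair spans the full amplicon.

    Assumes untrimmed, full-length reads; quality trimming shortens the
    reads and therefore shrinks the real overlap.
    """
    return 2 * read_length - amplicon_length


# Figures taken from the post: 300 bp reads, ~460 bp V3-V4 amplicon.
overlap = expected_overlap(read_length=300, amplicon_length=460)
print(f"expected overlap: {overlap} bp")
```

Any trimming before joining would reduce this figure, which is one reason to set the required overlap below 140.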

I'm using usearch5.2 for de novo clustering at 99% identity. I'm working on host-pathogen data.
I'm classifying with RDP version 2.2 at its default settings. That is, no reference is used, neither SILVA nor Greengenes (QIIME's default).

My observations:
1) 8% (default) max mismatch used, I get 486780 reads. That is - I lose ~91% of my reads.
Chimera 2.65% (this % is of the reads stitched, here 486780)
Rep sequences: 717

2) 10% max mismatch used, I get 855862 reads. That is - I lose ~85% of my reads.
Chimera 3.45% (this % is of the reads stitched, here 855862)
Rep sequences: 961

3) 20% max mismatch used, I get 2268370 reads. That is - I lose ~60% of my reads.
Chimera 7.26% (this % is of the reads stitched, here 2268370)
Rep sequences: 1706
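
The loss figures above can be rechecked from the raw and stitched counts. This is a small sketch, assuming the raw count (5581125) and the stitched counts are in the same units, so that each stitched read corresponds to one raw input read; the variable and function names are illustrative only.

```python
RAW_READS = 5581125

# Max mismatch (%) -> stitched read count, as reported in the post.
STITCHED = {8: 486780, 10: 855862, 20: 2268370}


def percent_lost(raw: int, kept: int) -> float:
    """Percentage of raw reads that did not survive stitching."""
    return 100.0 * (raw - kept) / raw


for mismatch, kept in STITCHED.items():
    lost = percent_lost(RAW_READS, kept)
    print(f"{mismatch:>2}% mismatch: kept {kept:>7}, lost {lost:.1f}%")
```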

I knew the data weren't pooled very well and that many reads would be thrown out. However, losing more than 90% of them is astonishingly high. My understanding is that the mismatch percentage is evaluated within the overlap region only. Kindly correct me if I'm wrong.
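
To make that understanding concrete, here is a toy sketch of a percent-mismatch check restricted to the overlap. It is not fastq-join's actual implementation, which searches over candidate overlaps itself; it assumes the overlap length is already known and that the reverse read has already been reverse-complemented. All names and sequences are hypothetical.

```python
def overlap_mismatch_percent(forward: str, reverse_rc: str, overlap: int) -> float:
    """Percent of differing bases between the forward read's tail and the
    head of the reverse-complemented reverse read, over `overlap` bases."""
    if overlap <= 0 or overlap > min(len(forward), len(reverse_rc)):
        raise ValueError("overlap must be between 1 and the shorter read length")
    tail = forward[-overlap:]
    head = reverse_rc[:overlap]
    mismatches = sum(a != b for a, b in zip(tail, head))
    return 100.0 * mismatches / overlap


# Toy reads: the last 6 bases of fwd meet the first 6 bases of rev_rc.
fwd = "GGGGACGTAC"
rev_rc = "ACGTTCTTTT"
pct = overlap_mismatch_percent(fwd, rev_rc, overlap=6)

for threshold in (8, 10, 20):
    verdict = "join" if pct <= threshold else "reject"
    print(f"max mismatch {threshold}%: {pct:.1f}% observed -> {verdict}")
```

On this reading, an 8% limit over a 140-base overlap tolerates roughly 11 mismatched bases, and a 20% limit tolerates 28.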

Following are my queries:

1) Would you recommend a 20% mismatch?
2) How much inflation would there be if I go ahead with 20% mismatch?
3) All rep sequences generated in the above iterations are assigned taxonomy down to genus level, so I can't tell whether I have correct reads or inflated ones.
4) With a higher mismatch I see an increase in chimera percentage, but that's garbage in, garbage out. Any thoughts from you?

Thanks very much for your tool and your time.
--Sanjeev