Mapping over NNNNNNNNNN


Josh Thia

Jul 19, 2017, 8:37:14 PM
to dDocent User Help Forum
Hi all,

I have been inspecting my alignments (from PE assembly), and I have noticed that in some contigs, reads are mapped over the series of Ns placed between the forward and reverse ends.

In this case, gaps were allowed to let the reads span the Ns:


In this case, a read (at the very bottom of the alignment) was allowed to completely span the Ns:


I am interested in whether other people have had a similar experience?

Just thinking through the BWA parameters, it would make sense that the gap open and gap extension penalties could be raised to prevent large gaps from happening.

But what about reads that are mapped over the Ns without gaps? Obviously they incur a mismatch penalty, but it evidently isn't enough to exclude them from the alignment. I am cautious about making the mismatch penalty too high because this will then affect all other alignments.
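To put rough numbers on this, here is a back-of-the-envelope sketch of both cases. It assumes BWA-MEM's default scoring (match 1, mismatch 4, gap open 6, gap extension 1, minimum reported score 30) and assumes that every spacer position crossed without a gap costs a full mismatch. The function names are made up for illustration, and the real aligner also does seeding and clipping, which this ignores.

```python
# Assumed BWA-MEM defaults: -A 1, -B 4, -O 6, -E 1, -T 30.
MATCH, MISMATCH, GAP_OPEN, GAP_EXTEND, MIN_SCORE = 1, 4, 6, 1, 30


def gapped_score(read_len, gap_len):
    """Every read base matches; one gap of gap_len skips the Ns."""
    return read_len * MATCH - (GAP_OPEN + gap_len * GAP_EXTEND)


def ungapped_score(read_len, n_len):
    """Read runs straight across n_len Ns, each assumed to cost a mismatch."""
    return (read_len - n_len) * MATCH - n_len * MISMATCH


read_len, n_len = 150, 10
for label, score in (("gapped", gapped_score(read_len, n_len)),
                     ("ungapped", ungapped_score(read_len, n_len))):
    # A score above MIN_SCORE means the alignment would still be reported.
    print(label, score, score >= MIN_SCORE)
```

Under these assumptions a 150 bp read scores 134 when it gaps over the 10 Ns and 100 when it runs straight across them, so both sit far above the reporting threshold.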

I thought perhaps extending the series of Ns from 10*N to say 200*N might be a useful way to really prevent this from happening. However, I wasn't sure how this might affect the mapping with regard to the insert size estimation that is calculated by dDocent. I know from our size selection that the insert should be at least 380-430bp, which exceeds the 150bp reads; in theory, no read should be able to cover the region between forward and reverse ends.
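As a rough check on the longer-spacer idea, the sketch below asks how many matching bases a read needs beyond the spacer before opening a gap across it pays for itself. It assumes BWA-MEM's default scoring (match 1, gap open 6, gap extension 1) and a simple affine gap cost; the function names are hypothetical, and it says nothing about insert size estimation.

```python
def min_far_side_bases(spacer_len, match=1, gap_open=6, gap_extend=1):
    """Fewest matching bases beyond the spacer that repay one gap across it."""
    gap_cost = gap_open + spacer_len * gap_extend
    # The bases gained must score strictly more than the gap costs.
    return gap_cost // match + 1


def gap_can_pay_off(spacer_len, read_len=150, **penalties):
    """True if a read of read_len could ever gain from gapping over the spacer."""
    # At least one base sits before the spacer, so at most read_len - 1 lie beyond it.
    return min_far_side_bases(spacer_len, **penalties) <= read_len - 1


for spacer_len in (10, 200):
    print(spacer_len, min_far_side_bases(spacer_len), gap_can_pay_off(spacer_len))
```

Under these assumptions a 10 N spacer is repaid by 17 matching bases on the far side, while a 200 N spacer would need 207, which is more than a 150 bp read has. BWA-MEM's band width option (-w, default 100) should also limit how long a gap it will look for.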

Thoughts and opinions most appreciated!

Cheers,

~ JT
Auto Generated Inline Image 1
Auto Generated Inline Image 2

Jon Puritz

Jul 24, 2017, 10:50:23 PM
to ddo...@googlegroups.com

Hi Josh,

I haven’t seen case 1 before, but I have seen loci like your case 2. In both cases, I would suspect that paralogs or multi-copy loci may be the problem here. Though, with only one weird read in case 2, it could simply be a rogue read. In general, with your type of data set, I look to filter any SNP that has both F and PE reads. See this part of my dDocent_filters script.

For case 1, you could also use grep or AWK to filter BAM files directly for the CIGAR string that indicates that particular type of alignment.
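The same kind of CIGAR filter can be sketched in Python instead of grep or AWK. This is only an illustration, not part of dDocent: it works on SAM text lines, assumes the case 1 reads show up as a deletion or insertion at least as long as the 10 N spacer, and uses made-up function names and made-up sample records.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def has_long_gap(cigar, min_gap=10, gap_ops="DI"):
    """True if any insertion or deletion in the CIGAR is at least min_gap long."""
    return any(op in gap_ops and int(length) >= min_gap
               for length, op in CIGAR_OP.findall(cigar))


def filter_sam(lines, min_gap=10):
    """Yield header lines and alignments whose CIGAR has no long gap."""
    for line in lines:
        if line.startswith("@"):
            yield line  # keep SAM header lines untouched
            continue
        fields = line.split("\t")
        # Column 6 of a SAM record is the CIGAR string.
        if len(fields) > 5 and has_long_gap(fields[5], min_gap):
            continue
        yield line


# Made-up records for demonstration only.
sample = [
    "@SQ\tSN:contig1\tLN:310\n",
    "read1\t0\tcontig1\t1\t60\t150M\t*\t0\t0\t*\t*\n",
    "read2\t0\tcontig1\t80\t60\t70M10D80M\t*\t0\t0\t*\t*\n",
]
kept = list(filter_sam(sample))
print(len(kept))
```

On real data the lines would come from something like the text output of `samtools view -h`. Note that dropping one read of a pair this way leaves its mate in the file, so the mate may need handling separately.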

Hope that helps,

Jon


-- 
Jon Puritz, PhD
Postdoctoral Research Associate
Northeastern University
Marine Science Center
430 Nahant Rd, Nahant, MA 01908

Webpage: MarineEvoEco.com

Email: 
jpu...@gmail.com 

Cell: 401-338-8739

"The most valuable of all talents is that of never using two words when one will do."
-Thomas Jefferson
