Repeat Sequences and false positives

davewin...@gmail.com

Nov 20, 2016, 7:07:56 AM

to SMufin

Dear SMuFin team,
I am currently testing the performance of SMuFin as you suggested.
However, I noticed some curiosities with the process and data.

In the first case:
I generated my own datasets on chr22 using ART, manually adding 10 SNVs.
There is no other genetic variation relative to the reference, and I ran ART with no error generation.
Effectively, the reads produced should be exact copies of the source sequence.
Despite this, I still get false positives in the output. Do you have any idea how these
could arise, so that I can correct for them when I start running on real data?
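
For reference, the test setup above can be sketched in a few lines of Python. This is only a toy illustration, not ART and not the script actually used: the function names (spike_snvs, error_free_reads), the random 2 kb sequence standing in for chr22, and the read length and step are all assumptions made for the example.

```python
import random


def spike_snvs(reference, n_snvs, seed=0):
    """Return (mutated sequence, truth list) after n_snvs single-base substitutions.

    The truth list holds (position, ref_base, alt_base) tuples sorted by position.
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    # Only substitute at unambiguous bases so every SNV is a real change.
    candidates = [i for i, base in enumerate(reference) if base in "ACGT"]
    positions = sorted(rng.sample(candidates, n_snvs))
    seq = list(reference)
    truth = []
    for pos in positions:
        ref_base = seq[pos]
        alt_base = rng.choice([b for b in "ACGT" if b != ref_base])
        seq[pos] = alt_base
        truth.append((pos, ref_base, alt_base))
    return "".join(seq), truth


def error_free_reads(sequence, read_len, step):
    """Tile exact substrings across the sequence (stand-in for error-free simulation)."""
    return [sequence[i:i + read_len]
            for i in range(0, len(sequence) - read_len + 1, step)]


if __name__ == "__main__":
    rng = random.Random(1)
    reference = "".join(rng.choice("ACGT") for _ in range(2000))
    mutated, truth = spike_snvs(reference, n_snvs=10)
    reads = error_free_reads(mutated, read_len=100, step=10)
    # With error-free reads, the only differences from the reference
    # should be the spiked positions recorded in `truth`.
    for pos, ref_base, alt_base in truth:
        print(pos, ref_base, alt_base)
```

With a truth list like this, any call in the output that is not at one of the recorded positions counts as a false positive.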

In the second case:
How does SMuFin handle repeat elements? From reading the paper, it sounds like SMuFin is
vulnerable to cases where reads from different genomic locations harbouring the same repeat
get grouped together because they share the same 30 bp overlap. How is this overcome?
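
To make the concern concrete, here is a minimal Python sketch. It does not reproduce SMuFin's actual method; it simply assumes reads are grouped whenever they share an exact 30 bp substring, and the sequences and helper names (kmers, group_by_shared_kmer) are made up for illustration.

```python
from collections import defaultdict

K = 30  # overlap length mentioned in the question


def kmers(seq, k=K):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def group_by_shared_kmer(reads, k=K):
    """Map each k-mer found in more than one read to the names of those reads."""
    index = defaultdict(set)
    for name, seq in reads.items():
        for kmer in kmers(seq, k):
            index[kmer].add(name)
    return {kmer: names for kmer, names in index.items() if len(names) > 1}


# A 36 bp repeat placed at two loci with different flanking sequence (toy data).
repeat = "ACGTTGCAAGGCTTAACCGGATATCGCGTATTGACC"
reads = {
    "locus_a": "GGGGGGGGGG" + repeat + "AAAAAAAAAA",
    "locus_b": "CCCCCCCCCC" + repeat + "TTTTTTTTTT",
    "unrelated": "AC" * 25,
}

if __name__ == "__main__":
    shared = group_by_shared_kmer(reads)
    for kmer, names in sorted(shared.items()):
        print(kmer, sorted(names))
```

Under this assumption, the two loci are linked by every 30-mer that lies entirely inside the repeat, even though their flanking sequences differ.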

Kind regards,
Dave Winter

davewin...@gmail.com

Nov 20, 2016, 7:11:18 AM

to SMufin

Regarding the curiosities with the process: I cannot find a call to BWA
for alignment. Indeed, I do not see a bwa process running when I monitor with htop.
When exactly does the call to bwa occur in the version you released?
