In the first case:
I have generated my own datasets with ART on chr22, manually adding 10 SNVs.
Apart from these SNVs there is no genetic variation relative to the reference, and I ran ART with error generation switched off.
Effectively, the reads produced should be a copy-paste of the reference.
Even so, I still get false positives in the output. Do you have any idea how these
could arise, so that I can correct for them when I start running on real data?
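A minimal sketch of how the calls could be separated into true and false positives against the spiked-in SNVs. The function name split_calls and all positions are made-up placeholders, not values from my run or anything SMuFin provides.

```python
def split_calls(called, truth):
    """Partition called positions into true positives, false positives and misses."""
    called, truth = set(called), set(truth)
    return {
        "true_positives": sorted(called & truth),
        "false_positives": sorted(called - truth),
        "missed": sorted(truth - called),
    }

# Placeholder 1-based chr22 positions, invented for illustration only.
truth_positions = [17_000_100, 17_250_300, 18_400_550]
called_positions = [17_000_100, 18_400_550, 20_123_456]

result = split_calls(called_positions, truth_positions)
print(result["false_positives"])
```

Anything left in "false_positives" is a call with no spiked-in SNV behind it, which is the set I am asking about.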
In the second case:
How does SMuFin handle repeat elements? From reading the paper, it sounds like SMuFin is
vulnerable to the case where reads from different genomic locations harbouring the same repeat
get grouped together, because they share the same 30 bp overlap. How is this overcome?
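To illustrate the concern, here is a toy sketch of grouping reads by shared 30-mers. It reflects only my reading of the paper and is not SMuFin's actual implementation; the function group_by_shared_kmer, the read names and the sequences are all hypothetical.

```python
from collections import defaultdict

K = 30  # overlap length as I understand it from the paper

def group_by_shared_kmer(reads, k=K):
    """Map each k-mer to the set of read names that contain it."""
    index = defaultdict(set)
    for name, seq in reads.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index

# Toy 40 bp repeat flanked by different unique sequence at two loci.
repeat = "ACGT" * 10
reads = {
    "locus_A": "TTTTTGGGGG" + repeat + "CCCCCAAAAA",
    "locus_B": "GATTACAGAT" + repeat + "TGCATGCATG",
}

index = group_by_shared_kmer(reads)
# k-mers seen in more than one read link reads from unrelated loci.
shared = {kmer for kmer, names in index.items() if len(names) > 1}
print(len(shared) > 0)
```

In this toy case every shared 30-mer lies inside the repeat, so the two reads would be linked even though their flanking sequence shows they come from different places.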
Kind regards,
Dave Winter