My group has noticed that delly's somatic SV calling infers a very different number of putative SVs depending on whether duplicates in the input BAMs were marked with sambamba or with samtools.
It's uncanny: if I take a BAM in which duplicates haven't been removed and use sambamba to mark and remove duplicate reads, the initial set of putative SVs is roughly 4x larger than if I use samtools to mark and remove duplicates from that same BAM.
As an example, for one whole-exome sample I get about 16500 SVs when samtools is used to mark and remove duplicates. Starting from the same initial BAM, if sambamba is used to mark and remove duplicates instead, delly infers roughly 62000 SVs. This isn't unique to that one sample; we've observed similar behavior in other samples.
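In case it helps anyone reproduce the comparison, here is a minimal sketch of how the two call sets can be counted. It assumes delly's output has already been converted to VCF text (for example with `bcftools view`); the function names are made up for illustration and are not part of delly.

```python
import gzip


def count_vcf_records(path):
    """Count data records (non-blank, non-header lines) in a plain or gzipped VCF."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        return sum(1 for line in handle if line.strip() and not line.startswith("#"))


def fold_change(count_a, count_b):
    """Return how many times larger count_a is than count_b."""
    return count_a / count_b


# Using the approximate counts quoted above (sambamba vs. samtools):
ratio = fold_change(62000, 16500)
print(f"{ratio:.2f}x")  # prints 3.76x
```

With real files, `fold_change(count_vcf_records(a), count_vcf_records(b))` gives the same ratio for any pair of call sets, where `a` and `b` are the paths to the two VCFs.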
Has anyone else observed this? If so, is it a known issue? I checked the README and didn't see any mention of it.
If there is an issue with running delly on BAMs that have been processed with sambamba, it might be a good idea to add a note about it to the README.