MACS users,
I was hoping to get your insight into an issue regarding read mapping and peak calling. When mapping short reads to the reference genome, do you use only the uniquely mapped reads, or do you use uniquely mapped reads plus multi-mapped reads?
For typical DHS samples (40 bp reads), about 70% of our reads map to exactly one location (unique mapping), while about 25% align to more than one location (multi-mapping). For multi-mapped reads, Bowtie2 randomly chooses which alignment to report. This may introduce a bias in repetitive/ambiguous genomic regions that carries over into peak calling.
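
To make the two categories concrete, here is a minimal sketch of one way the split could be done on Bowtie2 SAM output. It assumes the optional AS:i and XS:i tags are present (Bowtie2 writes XS:i only when it found a second alignment for the read) and treats a read as multi-mapped when that second alignment scores at least as well as the reported one. The function names and this threshold are my own assumptions for illustration, not anything MACS or Bowtie2 prescribes.

```python
from collections import Counter


def classify_sam_record(line):
    """Return 'header', 'unmapped', 'unique', or 'multi' for one SAM line.

    Assumption: a read counts as 'multi' when the XS:i score (best other
    alignment) is >= the AS:i score (reported alignment).
    """
    if line.startswith("@"):
        return "header"
    fields = line.rstrip("\n").split("\t")
    flag = int(fields[1])
    if flag & 0x4:  # SAM flag bit 0x4: read is unmapped
        return "unmapped"
    # Optional tags start after the 11 mandatory SAM columns.
    tags = {}
    for tag in fields[11:]:
        name, _type, value = tag.split(":", 2)
        tags[name] = value
    if "AS" in tags and "XS" in tags and int(tags["XS"]) >= int(tags["AS"]):
        return "multi"
    return "unique"


def count_mapping_classes(lines):
    """Tally how many alignment records fall into each class."""
    counts = Counter()
    for line in lines:
        label = classify_sam_record(line)
        if label != "header":
            counts[label] += 1
    return counts
```

Running `count_mapping_classes` over the lines of a SAM file gives the unique/multi/unmapped tallies, and the same classifier could be used to write only the 'unique' records to a separate file before peak calling.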
What is your rationale and general practice regarding read mapping in relation to peak calling? Is it better to use only the uniquely mapped reads, or uniquely mapped reads plus multi-mapped reads, as input for peak calling?
Thanks,
Andy