Hello everyone,
I'm currently analyzing whole genome BAM files using both Qualimap and samtools depth, and I’ve noticed differences in the reported coverage values that I’d like to understand better.
In particular, I’m comparing the output from the file coverage_across_reference.txt generated by Qualimap with coverage values computed by my own script, which processes the output from samtools depth -a.
Here's what I’m doing:
I use samtools depth -a to extract per-base coverage for all genomic positions.
Then I divide the genome into windows of 3,000,000 bp.
For each window, I calculate either the mean or median coverage from the depth values in that region.
However, the average coverage values I obtain are often lower than those reported by Qualimap — even in windows that appear to have high coverage in IGV or Qualimap plots.
I suspect the reason is that Qualimap computes coverage based on the total number of mapped bases per window, whereas samtools depth averages per-position coverage (which includes a lot of 0s). I’d like to confirm:
Does coverage_across_reference.txt contain mean coverage calculated as total mapped bases divided by the effective window length?
Does Qualimap include all mapped reads regardless of whether multiple reads pile up at the same positions?
Would this explain why peaks seen in Qualimap are "diluted" in the samtools-based script?
Thank you very much in advance for clarifying this! I'm aiming to reproduce Qualimap’s behavior more closely and would greatly appreciate any insight into the internal logic behind this output.