Hello Everyone,
We are happy to announce the release of the Panmask Easy 151b Regions track for hg38. This new track is available in the Problematic Regions superTrack. It contains a set of sample-agnostic easy regions: genomic intervals where general-purpose short-read variant callers achieve high accuracy without sophisticated filtering. The regions are derived for variant filtration that is agnostic to individual samples.
The pm151 regions can be used to filter out spurious variant calls that fall in centromeres, long repeats, and other genomic regions where short-read mapping is often problematic. They cover 88.2% of hg38, 92.2% of coding regions, and 96.3% of ClinVar pathogenic variants. The track can be applied to variant calls from clinical or research human samples. Note that it shows regions that are easy to sequence, rather than those that are problematic. The data were derived from the Human Pangenome Reference Consortium (HPRC) assemblies, and this track presents the 151b-easy panmask set.
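
For anyone who wants to apply this kind of filter outside the browser, here is a minimal sketch of the idea in Python. It assumes the easy regions have been exported as BED-style intervals (0-based, half-open) and that variants are given as (chromosome, 1-based position) pairs, as in a VCF. The function names and the toy coordinates are hypothetical and for illustration only; they are not part of the track or of any panmask tooling.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(bed_intervals):
    """Group BED intervals (0-based, half-open) by chromosome, then sort and merge them."""
    by_chrom = defaultdict(list)
    for chrom, start, end in bed_intervals:
        by_chrom[chrom].append((start, end))

    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        merged = []
        for start, end in intervals:
            if merged and start <= merged[-1][1]:
                # Overlapping or adjacent interval: extend the previous one.
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def in_easy_region(index, chrom, pos):
    """Return True if a 1-based position lies inside an easy region."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    zero_based = pos - 1  # convert VCF-style position to BED coordinates
    i = bisect_right(starts, zero_based) - 1
    return i >= 0 and zero_based < ends[i]


def filter_variants(index, variants):
    """Keep only variants whose position falls inside an easy region."""
    return [v for v in variants if in_easy_region(index, v[0], v[1])]


# Toy data (hypothetical coordinates, not taken from the real track).
easy_regions = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 0, 50)]
variants = [("chr1", 100), ("chr1", 101), ("chr1", 300),
            ("chr1", 301), ("chr2", 50), ("chr3", 10)]

index = build_index(easy_regions)
kept = filter_variants(index, variants)
```

Variants outside the easy regions are dropped rather than flagged here; in practice you may prefer to annotate them so they can be reviewed separately.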
We would like to thank the HLi lab at Harvard Medical School for making this data available. We would also like to thank Max Haeussler and Gerardo Perez for their efforts on this release.