Dear GIAB Community,
We are excited to make available v2.0 of the GA4GH/GIAB stratification BED files. These can be used with GIAB benchmark sets to understand the strengths and weaknesses of variant callsets in different genome contexts, including homopolymers, tandem repeats, segmental duplications, GC content, coding regions, and the MHC. Major enhancements relative to the previous release are:
Please let us know if you have any questions or suggestions for improvements in future versions. A description of the files and how they were produced is available at https://github.com/genome-in-a-bottle/genome-stratifications. The BED files, and the TSV files for use with hap.py when benchmarking, are at ftp://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/genome-stratifications/v2.0/. If you use these files, we ask that you follow the GA4GH Benchmarking Best Practices manuscript (https://doi.org/10.1038/s41587-019-0054-x) and cite both that manuscript and the data DOI https://doi.org/10.18434/M32190 (which will be active soon).
Best wishes,
Justin Zook, Jennifer McDaniel, Nate Olson, and Justin Wagner