Updated resource for stratifying variant calls by genome context


Zook, Justin M. (Fed)

Mar 12, 2020, 4:40:11 PM
to GIAB Analysis Team, genome-in...@googlegroups.com

Dear GIAB Community,

 

We are excited to make available v2.0 of the GA4GH/GIAB stratification BED files, which can be used with GIAB benchmark sets to understand the strengths and weaknesses of variant callsets in different genome contexts, including homopolymers, tandem repeats, segmental duplications, GC content, coding regions, and the MHC. Major enhancements relative to the previous release are:

 

  1. BED files for GRCh38 as well as updated files for GRCh37
  2. Merged files covering all difficult regions as well as BED files for relatively easy regions
  3. Additional difficult regions such as the MHC, VDJ, and errors in the GRCh37 reference
  4. Improved coverage of low complexity regions
  5. Genome-specific files for complex variants and SVs in HG002 used in our v4.1 small variant benchmark (v4.1 benchmark available at ftp://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/AshkenazimTrio/analysis/NIST_v4.1_SmallVariantDraftBenchmark_12182019/)

 

Please let us know if you have any questions or suggestions for improvements in future versions. A description of the files and how they were produced is at https://github.com/genome-in-a-bottle/genome-stratifications. The BED files, and the TSV files for use with hap.py during benchmarking, are at ftp://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/genome-stratifications/v2.0/. If you use these files, we ask that you follow and cite the GA4GH Benchmarking Best Practices manuscript (https://doi.org/10.1038/s41587-019-0054-x) as well as the data DOI https://doi.org/10.18434/M32190 (which will be active soon).
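
For readers new to stratification, here is a minimal Python sketch of the underlying idea: each BED file defines a set of intervals, and each variant is counted toward every stratum whose intervals contain it. This is only an illustration and not how hap.py does it internally. The stratum names, coordinates, and helper functions (`read_bed`, `in_regions`, `tally_by_stratum`) are made up for the example, and it handles only single-base positions, not indels or SVs that span interval boundaries. It relies on BED coordinates being 0-based and half-open while VCF positions are 1-based.

```python
import bisect
import io
from collections import defaultdict


def read_bed(handle):
    """Parse BED text into {chrom: sorted, merged list of (start, end)}.

    BED intervals are 0-based and half-open: [start, end).
    """
    raw = defaultdict(list)
    for line in handle:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        raw[chrom].append((int(start), int(end)))

    regions = {}
    for chrom, intervals in raw.items():
        merged = []
        for start, end in sorted(intervals):
            # Merge overlapping or touching intervals so lookups stay simple.
            if merged and start <= merged[-1][1]:
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        regions[chrom] = merged
    return regions


def in_regions(regions, chrom, pos):
    """Return True if the 1-based position `pos` lies in any interval on `chrom`."""
    intervals = regions.get(chrom, [])
    zero_based = pos - 1  # convert VCF-style 1-based position to BED coordinates
    # Index of the last interval whose start is <= zero_based.
    i = bisect.bisect_right(intervals, (zero_based, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= zero_based < intervals[i][1]


def tally_by_stratum(strata, variants):
    """Count how many (chrom, pos) variants fall into each named stratum."""
    counts = {name: 0 for name in strata}
    for chrom, pos in variants:
        for name, regions in strata.items():
            if in_regions(regions, chrom, pos):
                counts[name] += 1
    return counts


# Hypothetical toy data; real stratification files are far larger.
strata = {
    "homopolymers": read_bed(io.StringIO("chr1\t100\t110\nchr1\t200\t250\n")),
    "segdups": read_bed(io.StringIO("chr1\t240\t300\nchr2\t0\t50\n")),
}
variants = [("chr1", 101), ("chr1", 111), ("chr1", 245), ("chr2", 50), ("chr2", 51)]
counts = tally_by_stratum(strata, variants)
print(counts)
```

In practice, the same tally would be done separately for true positives, false positives, and false negatives from a benchmark comparison, so that precision and recall can be reported per genome context.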

 

Best wishes,

Justin Zook, Jennifer McDaniel, Nate Olson, and Justin Wagner

 
