New GIAB benchmark for challenging medically relevant genes


Zook, Justin M. (Fed)

Jun 8, 2021, 11:00:26 AM
to genome-in...@googlegroups.com, GIAB Analysis Team

Dear Genome in a Bottle Community,

 

We are excited to release a GIAB benchmark for 273 challenging medically relevant genes (CMRG), such as SMN1, based on a hifiasm diploid assembly of HG002. In the process, we identified and corrected false duplication errors in GRCh38. Specifically, we worked with the GRC to mask 5 GRCh38 false duplications on chr21, and we use the new CMRG benchmark to show that this solution improves short read accuracy from 8% to 100% in important genes on GRCh38 (CBS, CRYAA, and KCNE1).

We are also working with the Telomere-to-Telomere (T2T) Consortium on a more comprehensive list of false duplications that together cover >1 Mbp in GRCh38, so stay tuned for an improved masked GRCh38 soon. In the meantime, an initial masked version of GRCh38 is available at https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/references/. The new T2T reference fixes these and other issues as well, and we have a draft benchmark for the T2T-CHM13 reference. Notes from our curation of >1000 variants in this benchmark are in Supplementary File 5, and the main and supplementary figures include many IGV screenshots showing short and long read alignments to challenging genes.

 

The preprint describing the new benchmark and findings is at https://www.biorxiv.org/content/10.1101/2021.06.07.444885v1. Thanks to Justin Wagner, Jason Chin, and Fritz Sedlazeck for co-leading this work, as well as many others for contributing to the analysis and evaluation of the benchmark. As always, please let me know if you have any feedback about the benchmark.

 

Cheers,

Justin Zook
