Update on GIAB-related and T2T manuscripts

Zook, Justin M. (Fed)

unread,
Jul 14, 2021, 11:36:08 AM
to GIAB Analysis Team, genome-in...@googlegroups.com

Dear Genome in a Bottle Consortium,

 

We are excited about several recent preprints from GIAB and from the Telomere-to-Telomere Consortium, which may be of interest to many of you.

 

First, from GIAB: we are responding to generally positive reviews of three manuscripts, and we welcome any suggestions that we might incorporate into their final versions:

  1. Challenging Medically Relevant Genes small variant and SV benchmarks in HG002
  2. Small variant benchmark (v4.2.1) including more difficult regions
  3. precisionFDA Truth Challenge V2 Manuscript (based on v4.2.1 small variant benchmark) 

 

In addition, the Telomere-to-Telomere Consortium, led by Adam Phillippy and Karen Miga, recently finished the first complete human genome sequence, a monumental effort that has also produced a variety of companion papers. Of particular interest to GIAB, the Variants team demonstrated how the T2T-CHM13 reference assembly fixes errors in GRCh38, such as false duplications and collapsed duplications, and thereby improves variant calls in medically relevant genes. Almost all of the other papers also used GIAB data and samples, including a near-complete assembly of chrX in HG002. The flagship and companion manuscripts now available are:

  1. Flagship from assembly team: https://www.biorxiv.org/content/10.1101/2021.05.26.445798v1
  2. Variants team: https://www.biorxiv.org/content/10.1101/2021.07.12.452063v1
  3. Segmental Duplications team: https://www.biorxiv.org/content/10.1101/2021.05.26.445678v1
  4. Epigenetics team: https://www.biorxiv.org/content/10.1101/2021.05.26.443420v1
  5. Transposable elements team: https://www.biorxiv.org/content/10.1101/2021.07.12.451456v1
  6. Centromere/Satellite team: https://www.biorxiv.org/content/10.1101/2021.07.12.452052v1
  7. Polishing team: https://www.biorxiv.org/content/10.1101/2021.07.02.450803v1

We created the first benchmark on the T2T-CHM13 reference, analogous to the assembly-based Challenging Medically Relevant Genes benchmark for HG002; it is described in the Variants team manuscript above. In the next few weeks, we plan to release v4.2.1 benchmarks for HG001, HG005, HG006, and HG007 on GRCh37 and GRCh38, similar to those already available for HG002-HG004. We do not plan to develop v4.2.1 benchmarks on T2T-CHM13, because doing so would require regenerating all of the underlying callsets on a new reference. The medical gene benchmark helped demonstrate the improvements gained by basing benchmarks primarily on whole-genome diploid assemblies, so we plan to base future small variant and structural variant benchmarks (likely on GRCh37, GRCh38, and T2T-CHM13) on assemblies of GIAB samples, in collaboration with the Human Pangenome Reference Consortium and others. We will likely start with chrX and then work on the remainder of the genome.

 

Thank you for all your support of GIAB, and please let us know if you have any feedback about our manuscripts or plans.

 

Cheers,

Justin, on behalf of the NIST-GIAB Team