Hi All,
As many of you know, we have been working on new assembly-based small and structural variant benchmark sets for HG002 based on the Q100 T2T assembly: https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/AshkenazimTrio/analysis/NIST_HG002_DraftBenchmark_defrabbV0.019-20241113/.
Prior to an official release, our draft benchmark sets undergo external evaluation: we compare the draft benchmark to collaborator-provided callsets and manually curate a subset of the discrepancies to ensure the benchmark is fit for purpose, i.e., able to reliably identify errors (RIDE, https://rdcu.be/d0Kiw).
We would greatly appreciate it if you would be willing to participate in the external evaluation of our new benchmark sets. This entails providing a high-accuracy HG002 callset against GRCh38 for us to compare to our draft benchmark; we will then manually curate a subset of the discrepancies. Note that the goal of this effort is to evaluate the benchmark, not to evaluate individual callsets.

To streamline the evaluation, please provide only a single VCF unless you have reason to believe that different VCFs will help identify different types of errors in the benchmark and you are able to curate the variants we send from each VCF (50-100 variants for small variant callsets, or 40-60 for structural variant callsets). Structural variant callsets should be sequence-resolved.

Evaluators will be asked to provide callsets for HG002 against GRCh38 by 12/6, along with documentation of how the callset was generated, using the attached markdown template. We plan to send a list of variants for curation in early January and collect the curations two weeks later.
If you are interested in participating in the external evaluation process, please email me (nol...@nist.gov) with your proposed variant calling method and the sequencing data used, so we can select evaluators with complementary methods. To help with evaluator selection, and to ensure your VCF is appropriately formatted for benchmarking, we encourage potential evaluators to benchmark their callsets against the draft benchmark using the recommended methods described in the draft benchmark README.
If you do not have a callset but would like to help with the evaluation in another way, please let us know.
Thanks for your continued participation in the Genome in a Bottle consortium. This work would not be possible without you!
Nate Olson and the NIST GIAB Team