hgvs package clinical validation study


Somak Roy

Aug 14, 2018, 7:16:38 AM
to hgvs-discuss
Hi Reece,
I wanted to share our recently published work (J Mol Diagn) on the clinical implementation and validation of the hgvs package in our molecular pathology lab at UPMC. Thanks for your continued support of this project!


Regards,
Somak Roy

Reece Hart

Aug 14, 2018, 2:16:32 PM
to Somak Roy, Keith Callenberg, hgvs-discuss
Hi Somak-

I'm thrilled to see a third-party validation study. Thanks for letting us know.

Would you please send me a PDF of the paper and the validation set? May I add the validation set to the hgvs test suite?

As Keith may have informed you, we just had a paper accepted to Human Mutation that highlights concordance with Mutalyzer, ClinVar, and HGMD, with explanations of discrepancies. (Spoiler: discrepancies are less frequent than reported in your JMD abstract, and they represent cases where hgvs is correct.)

What version of hgvs did you use? Please cite the hgvs version number in the proofs if you haven't already and if that's still an option.

Thanks,
Reece


--
You received this message because you are subscribed to the Google Groups "hgvs-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hgvs-discuss+unsubscribe@googlegroups.com.
To post to this group, send email to hgvs-d...@googlegroups.com.
Visit this group at https://groups.google.com/group/hgvs-discuss.
To view this discussion on the web visit https://groups.google.com/d/msgid/hgvs-discuss/f4d08bde-cf99-4de2-95a0-c85f91a6b95f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reece Hart

Aug 16, 2018, 3:58:41 PM
to hgvs-discuss, Somak Roy, Dalgleish, Raymond W.M. (Prof.), Meng Wang, Causey-Freeman, Peter J. (Dr.)

Hi Somak-

Thank you for sending the file of purported errors in the hgvs package. I investigated your results and want to share my interpretations. My comments are in https://docs.google.com/spreadsheets/d/1y4f68Yg_cKNLtoAPYxmUqgRCWSPFxEZDVc2LBz1TTrI/edit#gid=57717990.

Your set contains 330 tests with 45 unique transcripts in 45 unique genes. There were 23 discrepancies (defined as “manual hgvs c.” not equal to “automated hgvs c.”) in the data set:

  • 1 typo
  • 3 raised exceptions
  • 19 intronic variants

The typo was not included in the 22 reported discrepancies, and I include it here only to help explain rows in the spreadsheet. The remaining discrepancies represent two distinct issues. In the code below, I use hgvs 1.1.0, the same version that was used in your report.
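For anyone who wants to reproduce this kind of tally on their own validation set, here is a minimal sketch. It assumes each row is a hypothetical (manual, automated) pair of c. strings, with automated set to None when hgvs raised an exception; this is not the layout of the spreadsheet above. The regular expression is a simplification that only looks for a position with an intronic offset such as 93+1, not a full HGVS parser.

```python
import re
from collections import Counter

# Simplified pattern for a c. position with an intronic offset, e.g. "93+1" or "1260-2".
# Assumption: adequate for triage, not a full HGVS parser.
INTRONIC_OFFSET = re.compile(r"\d+[+-]\d+")


def classify(manual, automated):
    """Assign one (manual, automated) row to a coarse category."""
    if automated is None:  # no output because the tool raised an exception
        return "raised exception"
    if manual == automated:
        return "concordant"
    if INTRONIC_OFFSET.search(automated):
        return "intronic variant"
    return "other discrepancy"  # e.g. a typo in the manual entry


def tally(rows):
    """Count rows per category; rows is an iterable of (manual, automated) pairs."""
    return dict(Counter(classify(manual, automated) for manual, automated in rows))
```

Categories are checked in order, so a row that raised an exception is never also counted as intronic.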


1) Regarding the raised exceptions, it appears that the TERT variants lie outside the bounds of the transcript your team selected. Note that this transcript is not returned by the relevant_transcripts() method for these variants (see below). When one tries to map a variant to a transcript that does not span the variant's region, hgvs raises an HGVSInvalidIntervalError stating that the position is beyond the bounds of the transcript record. Writing succinct error messages is hard, and I welcome your suggestions if another error message would have made the issue clearer.

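>>> import hgvs
>>> import hgvs.parser
>>> import hgvs.dataproviders.uta
>>> import hgvs.assemblymapper
>>> hp = hgvs.parser.Parser()
>>> hdp = hgvs.dataproviders.uta.connect()
>>> am37 = hgvs.assemblymapper.AssemblyMapper(hdp, assembly_name="GRCh37")
>>> hgvs.__version__
'1.1.0'
>>> var_g = hp.parse_hgvs_variant("NC_000005.9:g.1295250G>A")
>>> am37.relevant_transcripts(var_g)
[]
>>> am37.g_to_c(var_g, "NM_198253.2")
Traceback (most recent call last):
...
HGVSInvalidIntervalError: start or end or both are beyond the bounds of transcript record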
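One way a pipeline could avoid this exception is to consult relevant_transcripts() before mapping. The helper name g_to_c_if_relevant is hypothetical; it only assumes an object providing the relevant_transcripts(var_g) and g_to_c(var_g, tx_ac) methods used in the session above (such as an AssemblyMapper), and it returns None instead of raising when the transcript does not span the variant.

```python
def g_to_c_if_relevant(am, var_g, tx_ac):
    """Map var_g onto tx_ac only if tx_ac is reported as relevant for var_g.

    `am` is assumed to provide relevant_transcripts(var_g) and
    g_to_c(var_g, tx_ac), as hgvs.assemblymapper.AssemblyMapper does.
    Returns None when tx_ac does not span the variant's region.
    """
    if tx_ac not in am.relevant_transcripts(var_g):
        return None  # e.g. the TERT variant in the session above
    return am.g_to_c(var_g, tx_ac)
```

With am37 and var_g from the session above, g_to_c_if_relevant(am37, var_g, "NM_198253.2") would return None rather than raise. Whether to skip such variants, flag them for review, or report them on genomic coordinates remains a policy decision for the lab.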


2) Regarding the discrepancies in intronic variants, hgvs is operating correctly. Normalizing a variant requires the context of the surrounding sequence. RefSeq transcripts do not contain intronic sequence, and therefore intronic variants on RefSeq transcripts cannot be normalized. The hgvs package warns that intronic variants are not normalized and returns the unnormalized variant. Please note that one should not substitute a particular genome reference for the missing intronic sequence: because different genome references (e.g., GRCh37 and GRCh38) may imply different intronic sequences, the same variant might normalize differently depending on which intronic sequence is assumed.

>>> hgvs.__version__
'1.1.0'
>>> var_g = hp.parse_hgvs_variant("NC_000017.10:g.29528504_29528505insT")
>>> am37.g_to_c(var_g, "NM_001042492.2")
WARNING:hgvs.assemblymapper:Normalization of intronic variants is not supported; returning unnormalized variant
SequenceVariant(ac=NM_001042492.2, type=c, posedit=1260+1_1260+2insT)
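To see why the surrounding sequence matters, consider the 3' shifting step of normalization in isolation. The function below is a simplified, hypothetical sketch, not how the hgvs package implements normalization, and it handles only insertions on the forward strand: it slides an insertion toward the 3' end while the inserted bases match the reference. The same inserted base lands at a different position when the flanking sequence differs, which is why assuming a genome reference for missing intronic sequence can change the normalized result.

```python
def shift_insertion_3prime(ref, pos, ins):
    """Shift an insertion as far 3' as the reference allows.

    ref: reference sequence; pos: 0-based index in ref before which `ins`
    is inserted; ins: inserted bases (non-empty). Returns (pos, ins) after
    shifting; ins is rotated as it moves.
    """
    while pos < len(ref) and ref[pos] == ins[0]:
        # Moving one base 3' leaves the resulting sequence unchanged:
        # ref[:pos] + ins + ref[pos:] == ref[:pos+1] + rotated_ins + ref[pos+1:]
        ins = ins[1:] + ins[0]
        pos += 1
    return pos, ins


# Same "T" inserted after the first base, two different flanking contexts:
print(shift_insertion_3prime("ATTTG", 1, "T"))  # (4, 'T')
print(shift_insertion_3prime("ATTAG", 1, "T"))  # (3, 'T')
```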


It is important to have other groups validate hgvs, and I appreciate the time that your team took to do so. I do not yet have access to the manuscript and therefore do not know whether the above explanations were included in the Discussion. Either way, it seems to me that the abstract (PubMed) implies that hgvs generated incorrect variants when, in fact, both classes of discrepancies reflect expected behavior with a sound rationale. Furthermore, these apparent discrepancies were accompanied by exceptions and warnings that explained that behavior. If you agree with this assessment, please consider making changes in the proofs or attaching an addendum.

I welcome your reply and any suggestions that will improve the usability of hgvs.

Best wishes,
Reece


On Tue, Aug 14, 2018 at 4:16 AM, Somak Roy <roys...@gmail.com> wrote:


Ibrahim Vazirabad

Aug 16, 2018, 4:09:23 PM
to hgvs-d...@googlegroups.com
Here is the accepted manuscript, Reece.

callenberg2018.pdf

Angie Hinrichs

Aug 17, 2018, 6:39:23 PM
to hgvs-d...@googlegroups.com, Somak Roy, Dalgleish, Raymond W.M. (Prof.), Meng Wang, Causey-Freeman, Peter J. (Dr.)
Hi Somak et al.,

Thank you for sharing your validation test cases; all of our variant/HGVS tools can benefit from cross-testing. I found Yen et al.'s test cases to be very helpful when developing my HGVS parsing and encoding tools for use in the UCSC Genome Browser. Did you also try testing with their variants? I found what I believe are some typos and errors in the hgvs_test_cases_reference.txt file in the personalis/hgvslib GitHub repo and made corrections in my own fork, where I also added several variants shared by Mehis Pold and corrections found by Peter Causey-Freeman. Ultimately I would love for there to be an open test set that we can all more or less agree on, with clearly stated reasons for any differences.

I agree with Reece that where your manual HGVS terms differ from the hgvs package, a strict interpretation of the HGVS recommendations would favor the hgvs package. HGVS recommendations may sometimes be at odds with clinical utility, but one can't hold it against hgvs that it adheres to HGVS. Upstream variant c./n. terms are used in practice, but they are discouraged by HGVS (http://varnomen.hgvs.org/recommendations/DNA/variant/substitution/ - Q&A "How should I describe a variant in the promoter region of a gene?"). Strict HGVS does have recommendations for intronic variants, but only when the genomic reference sequence accession is included. For example, "NG_012232.1(NM_004006.1):c.93+1G>T", not "NM_004006.1:c.93+1G>T" (http://varnomen.hgvs.org/recommendations/DNA/variant/substitution/, second example). [NC_ chromosome identifiers would work, for those whose work is tied to GRCh37 or GRCh38.]
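As a sketch of how a pipeline might follow that recommendation, the hypothetical helpers below write c. descriptions with the genomic reference accession in front only when the position carries an intronic offset. The offset detection is a simplified regular expression rather than a full HGVS parser, and the function names are illustrative only.

```python
import re

# Simplified: a c. position with an intronic offset looks like "93+1" or "1260-2".
_INTRONIC = re.compile(r"\d+[+-]\d+")


def is_intronic(posedit):
    """Return True if the c. posedit string contains an intronic offset."""
    return _INTRONIC.search(posedit) is not None


def format_c_variant(tx_ac, posedit, g_ac=None):
    """Format a c. variant, prefixing the genomic reference for intronic positions."""
    if is_intronic(posedit):
        if g_ac is None:
            raise ValueError(f"intronic variant {tx_ac}:c.{posedit} needs a genomic reference")
        return f"{g_ac}({tx_ac}):c.{posedit}"
    return f"{tx_ac}:c.{posedit}"


print(format_c_variant("NM_004006.1", "93+1G>T", "NG_012232.1"))
# NG_012232.1(NM_004006.1):c.93+1G>T
```

Per the bracketed note above, an NC_ chromosome accession could be passed as g_ac instead of an NG_ accession for work tied to GRCh37 or GRCh38.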

I also feel that your manuscript could have given the hgvs package more credit.  The ship has sailed, but if I had been a peer reviewer, I would have asked you to do a global search & replace of "implementation" with "integration".  It was significant work for you to integrate and validate the hgvs package -- but Reece, Meng and collaborators implemented it.  My $0.02.  

Angie




Somak Roy

Aug 20, 2018, 12:10:50 PM
to Reece Hart, hgvs-d...@googlegroups.com, raymond....@leicester.ac.uk, wang...@gmail.com, pj...@leicester.ac.uk, Lucas Santana dos Santos, Keith Callenberg
Dear Reece, Raymond, and Angie,
Thank you for your comments and feedback on the manuscript. Sorry for not being able to provide the full text of the manuscript along with the validation data last week. 
The full text is available at https://www.sciencedirect.com/science/article/pii/S1525157817306220 (free access until October 9th).

During our validation, we were aware of the HGVS nomenclature limitations regarding intronic and intergenic sequence variants that you have described, and we agree that those are not directly attributable to the hgvs package. The point of including these findings was to provide examples of the challenges a clinical laboratory faces in generating HGVS nomenclature when establishing its NGS pipeline, whatever method or software it uses.
We explained the apparent discrepancies for the intronic and intergenic sequence variants (TERT), attributing them to the lack of intronic sequence in RefSeq transcripts and to the lack of a standardized, consensus strategy for naming intergenic variants using HGVS (in the Discussion, pg 633). Given that the intended audience was the broad molecular laboratory community, we did not go into the technical detail that you have provided in your email.
We are now conducting a more technical study of the challenges of generating HGVS nomenclature, using an expanded variant list that includes intronic, TERT, and other difficult sequence variants. We will be happy to incorporate the suggestions and technical details from this email thread.

Angie, we have not reviewed Yen et al.'s test cases. Thanks for pointing us to that resource. We agree that an open test set would be very helpful!
 
Overall, our clinical experience using the hgvs package has been very positive for streamlining our lab's workflow. We appreciate all your hard work in maintaining this resource!

Best regards,
Somak
