How to translate TERT NM_198253 promoter -124C>T?

Jonathan Freidin

Dec 6, 2018, 5:21:57 PM
to hgvs-discuss
Hi,
I read a related post that describes this limitation, but I'm wondering how to best work around it.

Mapping it the usual way I get:
In [40]: am37.c_to_g(hp.parse_hgvs_variant("NM_198253.2:c.-124C>T"))
HGVSInvalidIntervalError: The given coordinate is outside the bounds of the reference sequence.

I'm sort of wondering why the AssemblyMapper (AM) can't just do the math, but any suggestions would be appreciated.
Best,
Jonathan

Reece Hart

Dec 9, 2018, 12:49:59 PM
to hgvs-discuss
Coordinates beyond the bounds of a transcript are like street addresses beyond the end of a street: we can imagine that the land exists, but we shouldn't pretend that we know what's there with any certainty.

For transcript variants in the imaginary areas beyond the ends of the transcript, the reference sequence is unknown. That means that you can't validate or normalize such variants. (The same is true for RefSeq transcripts, but we give that special dispensation because those variants are so common and we emit a warning when they're encountered.) People often suggest that we could use "the reference genome", but such regions differ even between GRCh37 and GRCh38, so we'd need to specify which one to use. At that point, we'd have a transcript variant defined in a region that doesn't exist, with reference residues and normalization that depend on a sequence that isn't specified in the variant.

So, the problem with "just doing the math" is that it's guesswork at best. Extrapolating off the end of a transcript is inadvisable.

I think I'd be okay with a flag to change this behavior, but no one has offered to implement that. As always, PRs are welcome.

-Reece



Reece Hart

Dec 9, 2018, 1:53:30 PM
to hgvs-discuss
As for what you should do about it...

At this time, the only thing you can do is to compute it yourself. Here's one way off the cuff:
  • n. coordinates are like c., but the origin is at the start of the transcript sequence (not the translation start). 
  • Project n.1 to c. on the transcript. This will tell you the minimum c. position supported on that transcript. It'll be negative for most transcripts, and perhaps 1 (i.e., n.1 == c.1) for transcripts where there is no 5' UTR. The difference between this value and the given c. value (e.g., -124) is the extrapolation amount.
  • Project n.1 to g. This will give you the transcription start site on the genome. Now, subtract the extrapolation amount for + strand transcripts, and add for - strand transcripts.
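
The arithmetic in the steps above can be sketched roughly as follows. This is only an illustration, not part of the hgvs library: the function name `extrapolate_c_to_g` and its parameters are hypothetical, and the numbers in the usage lines are made-up placeholders rather than real TERT coordinates. In practice, `c_min` and `g_tss` would come from projecting n.1 to c. and to g. with the mapper (presumably via `n_to_c` and `n_to_g`). The sketch also assumes that c. numbering has no position 0 (c.-1 is followed directly by c.1), which matters only when the transcript has no 5' UTR.

```python
def extrapolate_c_to_g(c_pos, c_min, g_tss, strand):
    """Extrapolate an upstream c. position to a genomic position.

    Hypothetical helper, not an hgvs API.
    c_pos  -- requested c. position upstream of the transcript (e.g., -124)
    c_min  -- c. position corresponding to n.1 (negative, or 1 if no 5' UTR)
    g_tss  -- genomic position corresponding to n.1
    strand -- +1 for plus-strand transcripts, -1 for minus-strand transcripts
    """
    if strand not in (1, -1):
        raise ValueError("strand must be +1 or -1")
    if c_pos >= c_min:
        raise ValueError("c_pos is within the transcript; use the mapper instead")

    def linear(c):
        # c. numbering skips 0, so shift positive positions down by one
        # to get a gap-free scale before subtracting.
        return c if c < 0 else c - 1

    # Number of bases beyond the 5' end of the transcript.
    amount = linear(c_min) - linear(c_pos)

    # Subtract for + strand transcripts, add for - strand transcripts.
    return g_tss - strand * amount


# Placeholder values for illustration only (not real coordinates).
plus_example = extrapolate_c_to_g(-124, -58, 1000, 1)
minus_example = extrapolate_c_to_g(-124, -58, 1000, -1)
```

Note that this gives only a position. For a minus-strand transcript, the alleles would also need to be complemented on the genome (for example, C>T on the transcript corresponds to G>A), and, as discussed earlier in the thread, the reference base at an extrapolated position can't be validated against the transcript.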
-Reece



