Annotation for intronic variants

Matt Ducar

Sep 9, 2022, 2:28:30 PM
to hgvs-discuss
Hi,

I'm still digging into the nuances of how the package works, but I didn't see any discussion of this anywhere.

The HGVS 20.05 recommendations state that you can't describe an intronic variant using only an NM_ reference, because that reference doesn't include intronic sequence. Instead, they recommend syntax like:
NC_000023.10(NM_004006.2):c.357+1G>A

I'm parsing the variant as a genomic (g.) variant and calling g_to_c to map it to the transcript. Is there a parameter I'm missing that would make the hgvs package generate the full string recommended by HGVS?

If not:
1. Is this a bug I should report, or
2. Is this intended behavior and we need to post-process the output from the hgvs package to fully comply with HGVS recommendations?

Thanks,
Matt

Reece Hart

Sep 10, 2022, 5:05:35 PM
to hgvs-d...@googlegroups.com
Hi Matt-

Yes, this is the intended behavior, and it is unlikely to change. My recommendation is to pre-parse the variant to remove the chromosomal accession, then project the transcript variant onto that chromosome if you want a genomic variant.
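A minimal sketch of that pre-parse step in plain Python, assuming RefSeq-style accessions (e.g. NC_000023.10, NM_004006.2). The helper `strip_genomic_context` is hypothetical and not part of the hgvs package; it only rewrites the string into a form the parser already supports.

```python
import re
from typing import Optional, Tuple

# "genomic_ac(transcript_ac):c." or ":n.", e.g. NC_000023.10(NM_004006.2):c.357+1G>A.
# Assumption: RefSeq-style accessions (two capital letters, "_", digits, ".", version).
_GENOMIC_CONTEXT_RE = re.compile(
    r"^(?P<genomic_ac>[A-Z]{2}_\d+\.\d+)"
    r"\((?P<tx_ac>[A-Z]{2}_\d+\.\d+)\)"
    r":(?P<rest>[cn]\..+)$"
)


def strip_genomic_context(hgvs_string: str) -> Tuple[Optional[str], str]:
    """Split "NC_...(NM_...):c.X" into ("NC_...", "NM_...:c.X").

    Strings without the parenthetical form are returned unchanged, paired
    with None, so plain "NM_...:c.X" input passes straight through.
    """
    s = hgvs_string.strip()
    m = _GENOMIC_CONTEXT_RE.match(s)
    if m is None:
        return None, s
    return m.group("genomic_ac"), f"{m.group('tx_ac')}:{m.group('rest')}"


genomic_ac, tx_variant = strip_genomic_context("NC_000023.10(NM_004006.2):c.357+1G>A")
print(genomic_ac, tx_variant)  # NC_000023.10 NM_004006.2:c.357+1G>A
```

The returned transcript string could then go to the parser's `parse_hgvs_variant()` and be projected with an `AssemblyMapper` (for example `c_to_g`), while the returned genomic accession records which chromosome the original annotation referred to.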

hgvs does not validate intronic variants, precisely because intronic sequence is not part of NM_ records, so you will likely see a warning when doing this projection.

That HGVS syntax was added 3 years ago, long after the hgvs parser was written. The parenthetical transcript syntax isn't currently supported. (I also think it's an unfortunate syntax decision, but that's beside the point.)

-Reece

 

--
You received this message because you are subscribed to the Google Groups "hgvs-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hgvs-discuss...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/hgvs-discuss/dd887bfd-b07a-4dbb-a8be-9563b5ed2337n%40googlegroups.com.

Matt Ducar

Sep 27, 2022, 3:09:48 PM
to hgvs-discuss
Hi Reece,

Thanks for getting back to me. I understand your suggestion to pre-parse a variant to remove the chromosomal accession, but that goes in the opposite direction from what I'm trying to do.

Take the following code snippet:
from hgvs import parser
from hgvs.dataproviders.uta import connect
from hgvs.assemblymapper import AssemblyMapper

hp = parser.Parser()
hdp = connect()  # UTA data provider
am37 = AssemblyMapper(hdp, assembly_name='GRCh37')  # GRCh37 projector

# Parse the genomic variant, then project it onto the NM_ transcript
var_g = hp.parse_hgvs_variant("NC_000017.10:g.7578369A>C")
print(var_g)

var_t = am37.g_to_t(var_g, "NM_000546.5")
print(var_t)

I get the following printout:

NC_000017.10:g.7578369A>C

NM_000546.5:c.559+2T>G

That isn't the output I expected based on the HGVS documentation at:
https://varnomen.hgvs.org/bg-material/numbering/

That page states: "a coding DNA reference sequence does not contain intron or 5’ and 3’ gene flanking sequences and can therefore not be used as a reference to describe variants in these regions see Reference Sequences."

The expected annotation would instead be:
NC_000017.10(NM_000546.5):c.559+2T>G
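For intronic variants, whether this form is needed can be approximated from the c. string itself: intronic positions carry an offset such as `+2` or `-1` after the base position. A rough sketch, assuming simple positions (no parenthesized uncertain ranges); `has_intronic_offset` is a hypothetical helper, and it cannot detect 5' or 3' flanking positions, which would need the transcript's exon structure.

```python
import re

# One c. position: an optional "-" (5' UTR) or "*" (3' UTR) prefix, a base
# number, and an optional intronic offset such as "+2", "-1" or "+?".
_POS = r"[-*]?\d+(?:[+-](?:\d+|\?))?"

# Accession (no parentheses), then "c.", a start position and an optional end.
_C_VARIANT_RE = re.compile(rf"^(?P<ac>[^:()]+):c\.(?P<start>{_POS})(?:_(?P<end>{_POS}))?")

# Matches a position string that has an offset after its base number.
_OFFSET_RE = re.compile(r"^[-*]?\d+[+-]")


def has_intronic_offset(c_variant: str) -> bool:
    """Return True if the start or end position of a c. variant is intronic.

    Hypothetical helper: it inspects the text only, so it cannot tell whether
    a position such as c.-1000 lies upstream of the transcript.
    """
    m = _C_VARIANT_RE.match(c_variant.strip())
    if m is None:
        raise ValueError(f"not a simple c. variant: {c_variant!r}")
    positions = [p for p in (m.group("start"), m.group("end")) if p]
    return any(_OFFSET_RE.match(p) for p in positions)


print(has_intronic_offset("NM_000546.5:c.559+2T>G"))  # True
```

With a parsed hgvs variant, checking whether `var_t.posedit.pos.start.offset` or `.end.offset` is nonzero should be the object-level equivalent of this string check.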

Reece Hart

Sep 29, 2022, 9:06:42 PM
to hgvs-d...@googlegroups.com
Hi Matt-

Everything you wrote is correct. And, to be clearer (I hope): hgvs will neither parse nor generate the annotation you expect. 

The only solution at this time is to pre-parse the c. variant when it contains a genomic accession, and to post-process when you want to add a genomic accession. 
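A minimal sketch of that post-processing step, assuming the transcript variant has been formatted with `str()` and carries no gene symbol in parentheses. `add_genomic_context` is a hypothetical helper; the genomic accession would typically come from the variant that was projected (e.g. `var_g.ac` in the snippet earlier in this thread).

```python
import re

# An unqualified transcript variant: accession without parentheses, then c. or n.
_TX_VARIANT_RE = re.compile(r"^(?P<tx_ac>[^:()]+):(?P<rest>[cn]\..+)$")


def add_genomic_context(tx_variant: str, genomic_ac: str) -> str:
    """Rewrite "NM_...:c.X" as "NC_...(NM_...):c.X".

    Raises ValueError if the string already has a parenthetical (for example a
    gene symbol or an existing genomic accession) or is not a c./n. variant.
    """
    m = _TX_VARIANT_RE.match(tx_variant.strip())
    if m is None:
        raise ValueError(f"expected an unqualified c. or n. variant: {tx_variant!r}")
    return f"{genomic_ac}({m.group('tx_ac')}):{m.group('rest')}"


print(add_genomic_context("NM_000546.5:c.559+2T>G", "NC_000017.10"))
# NC_000017.10(NM_000546.5):c.559+2T>G
```

Keeping this as a separate formatting step leaves hgvs's own output untouched, so the unqualified NM_ string is still available for anything that expects it.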

Implementing this part of the spec isn't too hard, I think, but it's just not done.

Does https://github.com/biocommons/hgvs/issues/642 capture your feature request?

-Reece

