Reporting imprecise repeat expansion variants


Steven Cordes

Mar 1, 2025, 6:07:49 AM
to hgvs-nom...@googlegroups.com

Hi HGVS group,

 

At Epic, we are working on a project that lets labs report repeat expansion variants where the number of repeats of a given nucleotide sequence in the gene is known only as an inequality or a range.

 

For example, Southern blot analysis or other gel electrophoresis testing for Huntington’s disease (HTT gene) may indicate that there are at least 42 repeats of the CAG nucleotide sequence. Would we represent this as NG_009378.1:g.3074_3076CAG[>42], NG_009378.1:g.3074_3076CAG[(>42)], or some other format?

 

Another, tangentially related question: have you encountered questions about reporting variants that have an indeterminate number of repeats (e.g., instead of >42, the lab result indicates only that the nucleotides are “expanded”)? This came up during our conversations with some labs, and we’re trying to work out what would make sense for reporting a variant name against this type of result. Would it be valid to report NG_009378.1:g.3074_3076CAG[(expanded)]?

Lastly, if we document a range of values (e.g., 8-12 repeats), is the following the correct nomenclature? Based on the general recommendations I’ve seen, it seems correct, but I want to confirm.

NG_009378.1:g.3074_3076CAG[(8_12)]

 

Any additional insight or recommendations would be appreciated!

 

Best,

Steven

 

Steven Cordes

| Software Developer

1 (608) 271-9000

Johan den Dunnen

Mar 9, 2025, 12:34:44 PM
to HGVS Nomenclature
Dear Steven,

At the time of writing this reply, I could not access the NG_009378.1 reference sequence, so to be sure my answer follows HGVS recommendations I have used NM_002111.6. Please note that using NG_ reference sequences is not common; NC_ and NM_ reference sequences are the preferred choices.

Based on NM_002111.6, the description of "at least 42 repeats of the CAG nucleotide sequence" using HGVS nomenclature is NM_002111.6:c.54_110GCA[(42_?)]. A few remarks. First, it is essential to describe the full extent of the repeat region you refer to. I used c.54_110 since the reference sequence is interrupted after this position by an "ACA" unit. Second, following the 3' rule, the repeat unit in the HTT gene is GCA, not CAG. Third, a repeat of at least 42 units is described as (42_?), with the parentheses indicating the uncertainty and "42_?" giving the minimal and maximal estimated repeat length ("?" meaning unknown).
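The second remark (CAG becoming GCA under the 3' rule) can be illustrated with a minimal Python sketch. It assumes a short made-up sequence rather than the real NM_002111.6 sequence, uses 0-based string indices rather than HGVS positions, and the function name `shift_repeat_3prime` is hypothetical. It is not an HGVS validator, only a way to see why the unit changes when the tract is shifted as far 3' as possible.

```python
def shift_repeat_3prime(seq: str, start: int, unit_len: int) -> tuple[int, str]:
    """Shift a tandem repeat tract to its most 3' position.

    seq      : toy nucleotide sequence (assumption: not a real reference)
    start    : 0-based index of the first base of the repeat as first spotted
    unit_len : length of the repeat unit (3 for a trinucleotide repeat)
    Returns the shifted 0-based start and the repeat unit read from there.
    """
    # Extend 3' for as long as each base equals the base one unit earlier,
    # i.e. for as long as the sequence stays periodic with period unit_len.
    end = start + unit_len
    while end < len(seq) and seq[end] == seq[end - unit_len]:
        end += 1

    # The periodic stretch may contain leftover bases beyond whole units.
    # Moving the start 3' by that leftover puts the whole units at the 3' end.
    leftover = (end - start) % unit_len
    new_start = start + leftover
    return new_start, seq[new_start:new_start + unit_len]


# Toy example: three CAG units followed by "CAACC".
toy = "TC" + "CAG" * 3 + "CAACC"
print(shift_repeat_3prime(toy, 2, 3))
```

In the toy sequence the "CA" that follows the last CAG still matches the repeat pattern, so the tract can be moved two bases 3' and the unit is then read as GCA, with the next unit (ACC here) interrupting the repeat.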

The format for 8 to 12 copies is NM_002111.6:c.54_110GCA[(8_12)].
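For software that has to emit these strings, the pattern used above can be sketched as a small Python helper. The function name and parameters are hypothetical, the helper only assembles the text in the form shown in this thread, and it does not check the description against the reference sequence.

```python
from typing import Optional


def format_uncertain_repeat(
    reference: str,
    start: int,
    end: int,
    unit: str,
    min_count: Optional[int],
    max_count: Optional[int] = None,
    coordinate_type: str = "c",
) -> str:
    """Build a repeat description with an uncertain unit count.

    A bound of None is written as "?" (unknown), so min_count=42 with no
    max_count gives "(42_?)", i.e. at least 42 units.
    """
    span = end - start + 1
    # The described repeat region should consist of whole repeat units.
    if span % len(unit) != 0:
        raise ValueError("repeat region is not a whole number of units")
    if min_count is not None and max_count is not None and min_count > max_count:
        raise ValueError("min_count must not exceed max_count")

    low = "?" if min_count is None else str(min_count)
    high = "?" if max_count is None else str(max_count)
    return f"{reference}:{coordinate_type}.{start}_{end}{unit}[({low}_{high})]"


# The two cases discussed in this thread:
print(format_uncertain_repeat("NM_002111.6", 54, 110, "GCA", 42))
print(format_uncertain_repeat("NM_002111.6", 54, 110, "GCA", 8, 12))
```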

The format for an "expanded" allele is NM_002111.6:c.54_110GCA[(20_?)]. The only thing I can do here is assume that "expanded" means larger than the reference allele (which in NM_002111.6 is 19 units), so 20 or more copies. The authors using "expanded" probably mean "expanded to an allele size associated with disease", but to confirm this I would need to read the publication/report.
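The arithmetic behind "19 units, so 20 or more" can be checked with a short Python sketch. The function names are hypothetical, and the interpretation of "expanded" as "larger than the reference allele" is the assumption stated above, not a lab-specific or disease-specific threshold.

```python
def reference_unit_count(start: int, end: int, unit_len: int) -> int:
    """Number of repeat units in the reference, from inclusive positions."""
    span = end - start + 1
    if span % unit_len != 0:
        raise ValueError("repeat region is not a whole number of units")
    return span // unit_len


def expanded_lower_bound(start: int, end: int, unit_len: int) -> int:
    """Smallest unit count that is larger than the reference allele."""
    return reference_unit_count(start, end, unit_len) + 1


# c.54_110 with a 3-nucleotide unit (GCA):
ref_units = reference_unit_count(54, 110, 3)
lower = expanded_lower_bound(54, 110, 3)
print(ref_units, lower)
print(f"NM_002111.6:c.54_110GCA[({lower}_?)]")
```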

Best regards,

Johan den Dunnen
HUGO HGVS Variant Nomenclature Committee (HVNC)
