Protein nomenclature for insertions that encode a translation stop codon


David

May 30, 2024, 8:49:37 AM
to HGVS Nomenclature

Dear HGVS team,

I would like to submit a particular point of ambiguity in the HGVS nomenclature guidelines concerning insertions that encode a translation stop codon.

Example 1:

  • NM_013976.5(GCDH):c.1074_1075insAGTTGAAGGAC

This insertion results in the addition of SerTer after Asp358. The translation stop codon is encoded in the insertion.

The Variant Validator, Mutalyzer and VEP tools represent this alteration as a frameshift:
p.(Gln359Serfs*2)

Example 2:
Modified from example 1 by adding one nucleotide:

  • NM_013976.5(GCDH):c.1074_1075insAGTTGAAGGACT

Since the nucleotide is added after the encoded termination signal, it does not impact the actual protein modification. However, the insertion's total length is now a multiple of three, leading to its classification as an insertion rather than a frameshift:
p.(Asp358_Gln359insSer*)
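
Both examples can be checked by hand with a short sketch. Because c.1074 is the third base of codon 358 (1074 = 3 × 358), the insert starts at a codon boundary and can be translated from its first base. The helper names `codon_of` and `translate_insert` are hypothetical and written only for this illustration; the sketch uses the standard genetic code and does not consult the reference sequence itself.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_of(cds_pos):
    """Return (codon number, position within codon 1..3) for a 1-based c. position."""
    return (cds_pos - 1) // 3 + 1, (cds_pos - 1) % 3 + 1


def translate_insert(left_pos, inserted):
    """Translate an insert placed directly after c.left_pos, up to the first stop.

    Hypothetical helper: it only handles inserts that start at a codon boundary.
    """
    _, phase = codon_of(left_pos)
    if phase != 3:
        raise ValueError("insert does not start at a codon boundary")
    peptide = ""
    for i in range(0, len(inserted) - 2, 3):
        aa = CODON_TABLE[inserted[i:i + 3]]
        peptide += aa
        if aa == "*":  # translation ends at the stop codon inside the insert
            break
    return peptide


print(codon_of(1074))                           # (358, 3)
print(translate_insert(1074, "AGTTGAAGGAC"))    # S*
print(translate_insert(1074, "AGTTGAAGGACT"))   # S*
```

Both inserts read AGT TGA ..., i.e. Ser followed by a stop, so everything after the sixth inserted base is never translated.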

These two examples show that, for these prediction tools, the "multiple of three" rule decides whether a variant is called a frameshift, while the presence of an encoded stop codon plays no role. The HGVS guidelines, in contrast, do not mention a "multiple of three" rule. Instead, they refer to the open reading frame status of the inserted sequence.
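
The difference between the two criteria can be sketched as follows. This is a simplified illustration, not the actual logic of Variant Validator, Mutalyzer, or VEP; `length_rule` and `has_in_frame_stop` are hypothetical names, and the default `offset=0` assumes the insert begins at a codon boundary, as it does after c.1074.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def length_rule(inserted):
    """'Multiple of three' rule: label the variant by insert length alone."""
    return "insertion" if len(inserted) % 3 == 0 else "frameshift"


def has_in_frame_stop(inserted, offset=0):
    """True if the insert, read in frame from `offset`, contains a stop codon."""
    return any(
        inserted[i:i + 3] in STOP_CODONS
        for i in range(offset, len(inserted) - 2, 3)
    )


for ins in ("AGTTGAAGGAC", "AGTTGAAGGACT"):
    print(ins, len(ins), length_rule(ins), has_in_frame_stop(ins))
# AGTTGAAGGAC 11 frameshift True
# AGTTGAAGGACT 12 insertion True
```

The length rule gives different labels to the two inserts, while the stop-codon check treats them identically.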

The guidelines define frameshift as follows:
"Frameshift: a sequence change between the translation initiation (start) and termination (stop) codon where, compared to a reference sequence, translation shifts to another reading frame."

The Frameshift, Insertions, Deletions-Insertions, and Duplications sections state that inserted sequences on DNA or RNA level that:

  • encode a translation stop codon in the inserted sequence are on the protein level described as an insertion of this sequence, not as a deletion-insertion removing the entire C-terminal amino acid sequence.
  • encode an open reading frame which after the inserted sequence shifts to another reading frame, are described as a frameshift.

Notably, the emphasis on the term “after” (in the text) suggests that, for a variant to be classified as a frameshift, the sequence after the insertion must continue to be translated.
 
The two insertions reported above share the following characteristics:

  • they do not encode an open reading frame.
  • they do encode a translation stop codon in the inserted sequence.
  • the translation of the affected protein (if it occurs in vivo) does not shift to another reading frame.
  • they result in the same protein alteration.

Without the "multiple of three" rule, it is unclear whether NM_013976.5(GCDH):c.1074_1075insAGTTGAAGGAC should be represented as a frameshift, p.(Gln359Serfs*2), or simply as an insertion, p.(Asp358_Gln359insSer*).

Since none of the examples provided in the guidelines directly addresses this scenario, I suggest adding the "non-multiple-of-three" case as a new example in the Frameshift section. This would help resolve similar cases, including duplications and deletion-insertions, reducing subjective interpretation and ensuring consistent application of the HGVS nomenclature.
 
Kind Regards,
David Hernandez

Sophia Genetics SA

Johan den Dunnen

Jun 4, 2024, 10:51:18 AM
to HGVS Nomenclature
Dear David,

Regarding your question:

> NM_013976.5(GCDH):c.1074_1075insAGTTGAAGGAC
> This insertion results in the addition of SerTer after Asp358. The translation stop codon is encoded in 
> the insertion. The Variant Validator, Mutalyzer and VEP tools represent this alteration as a 
> frameshift: p.(Gln359Serfs*2)

Please note HGVS does not allow the addition of a gene symbol ("(GCDH)") in variant descriptions. Correct is NM_013976.5:c.1074_1075insAGTTGAAGGAC.

According to the HGVS nomenclature recommendations, when the "inserted sequence encodes a translation stop codon", the variant should be described as an insertion, NOT as a frameshift. The same is true for variant NM_013976.5:c.1074_1075insAGTTGAAGGACT; this insertion also encodes a translation stop codon.
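
As a rough sketch of this reading, the protein-level description can be assembled from the flanking residues and the inserted residues up to and including the stop. The function `describe_insertion` is hypothetical, and the output format simply follows the p.(Asp358_Gln359insSer*) form used earlier in this thread; any formatting detail beyond that example is an assumption.

```python
def describe_insertion(left, right, inserted_residues):
    """Format a protein-level insertion for an insert that encodes a stop codon.

    left, right: (three-letter residue, position) tuples flanking the insert.
    inserted_residues: translated insert as three-letter codes, '*' for a stop.
    Residues after the first stop are dropped because they are not translated.
    """
    if "*" not in inserted_residues:
        raise ValueError("sketch only covers inserts that encode a stop codon")
    kept = inserted_residues[: inserted_residues.index("*") + 1]
    return f"p.({left[0]}{left[1]}_{right[0]}{right[1]}ins{''.join(kept)})"


# Both inserts from this thread translate to Ser followed by a stop.
print(describe_insertion(("Asp", 358), ("Gln", 359), ["Ser", "*"]))
# p.(Asp358_Gln359insSer*)
```

The insert length does not appear anywhere in this function, so the 11-nucleotide and 12-nucleotide inserts receive the same description.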

I will check the nomenclature pages and see how we can improve them, e.g. by adding more examples as you suggest. Thanks for pointing this out.

Best regards,

Johan den Dunnen
HUGO HGVS Variant Nomenclature Committee (HVNC)

On Thursday, May 30, 2024 at 14:49:37 UTC+2, david.g.h...@gmail.com wrote:

David

Jul 17, 2025, 10:09:46 AM
to HGVS Nomenclature
Dear Johan,

I hope this message finds you well.

I would like to follow up on the question I raised in May 2024 regarding the correct HGVS description of DNA insertions that encode a stop codon. At that time, you confirmed that such variants should be described as insertions, regardless of whether the inserted sequence length is a multiple of three.

However, this point remains a source of confusion within the community. Major tools such as Variant Validator, Mutalyzer, and VEP still appear to apply the "multiple of three" rule as the primary criterion for deciding between a frameshift and an insertion (or a delins), without considering whether the inserted sequence encodes a stop codon.

This suggests that the current HGVS guidelines are not sufficiently clear on this point. To help avoid inconsistent variant descriptions across tools, databases, and publications, I would like to reiterate my request to clarify the guidelines.
This could be addressed by adding an explicit example of a non-multiple-of-three insertion that encodes a stop codon and should nevertheless be described as an insertion.

I appreciate your time and the ongoing efforts of the HGVS team to maintain a rigorous and practical nomenclature system.

Kind Regards,
David Hernandez

Sophia Genetics SA
