Query Regarding HGVS Nomenclature for Copy Number Variants (Amplifications)

Ketaki Karmalkar

May 2, 2026, 11:40:05 AM
to hgvs-nom...@googlegroups.com
Respected Sir/Ma'am,
In our laboratory, we perform next-generation sequencing (NGS) using Illumina's short-read technology. A question has arisen regarding the appropriate HGVS nomenclature for reporting duplication copy number variants (CNVs).  

 The HGVS nomenclature guidelines recommend the format xxx_yyy[n] for duplication events inserted directly 3' of the original copy (https://hgvs-nomenclature.org/stable/recommendations/DNA/duplication/). 

Since our CNV calls are derived from NGS data, we are unable to define exact breakpoints. Instead, we report a range of possible breakpoints, as illustrated in the following example:  
chr19:(45409925_45411016)_(45411210_45411789)dup
c.(43+1_44-1)_(236+1_237-1)dup
(Duplication of exon 3)

We have identified two limitations with using the dup suffix in this context:

  1. Tandem assumption: The term "duplication" implies that the amplified region is in tandem with the original sequence. Using short-read NGS data alone, we cannot definitively confirm whether the duplicated segment is arranged in tandem.
  2. Copy number ambiguity: The dup suffix does not convey the number of copies of the amplified region, which is a value we are able to derive from our NGS data.

To address these limitations, we propose replacing dup with [n], where n represents the copy number determined from NGS data. An example of this proposed notation is as follows:

chr19:(45409925_45411016)_(45411210_45411789) [3]
c.(43+1_44-1)_(236+1_237-1) [3]
(Duplication of exon 3) 

We would like to know whether this annotation approach is acceptable under current HGVS guidelines.

Regards
Ketaki Karmalkar
Scientist-III  

Ketaki Karmalkar

May 19, 2026, 7:03:10 AM
to hgvs-nom...@googlegroups.com
Hello,
Gentle reminder.

Regards
Ketaki Karmalkar
Scientist-III  

Johan den Dunnen

Jun 9, 2026, 10:21:13 AM
to HGVS Nomenclature
Dear Ketaki,

Sorry for the slow reply; the answer is not that simple.

1) You do not give the reference sequences used. I assume you mean the variant NC_000019.9:g.(45409925_45411016)_(45411210_45411789)dup, NM_000041.2:c.(43+1_44-1)_(236+1_237-1)dup. Note that the "g." prefix was missing from your genomic description.

2) The format you suggest, NC_000019.9:g.(45409925_45411016)_(45411210_45411789)[3], cannot be used in HGVS nomenclature. An essential element of the HGVS nomenclature is that it describes the "differences compared to a reference sequence". The difference in the example is one additional copy, that is, a change from 1 to 2 copies.

3) I agree that the description NC_000019.9:g.(45409925_45411016)_(45411210_45411789)dup, by using "dup", assumes the extra copy of the sequence is "in tandem" with the original copy, a conclusion you cannot draw when the breakpoint has not been sequenced. The topic is covered on the Recommendations > DNA > duplication page (see https://hgvs-nomenclature.org/stable/recommendations/DNA/duplication/), where the format g.?_?ins[NC_000019.9:g.(45409925_45411016)_(45411210_45411789)] is suggested as an alternative description. This format indicates that an extra copy was identified but that the location of the additional copy is not known and could be anywhere in the genome.
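
For readers who generate such descriptions from a CNV pipeline, the three points above can be illustrated with a minimal Python sketch. It is not part of the HGVS recommendations and does not validate HGVS syntax in general. All function names are hypothetical, and the prefix check is a simplified assumption that only recognises RefSeq-style accessions such as NC_000019.9 or NM_000041.2.

```python
import re

# Simplified, hypothetical check: a RefSeq-style accession.version, a colon,
# then a coordinate-type prefix such as "g." or "c.". Other reference
# sequence formats are deliberately not covered by this sketch.
_PREFIX_RE = re.compile(r"^[A-Z]{2}_\d+\.\d+:[gomcnrp]\.")


def has_reference_and_prefix(description: str) -> bool:
    """Return True if the description starts with a reference and a type prefix."""
    return _PREFIX_RE.match(description) is not None


def uncertain_range(start_min: int, start_max: int, end_min: int, end_max: int) -> str:
    """Format a range whose breakpoints are only known to lie within intervals."""
    return f"({start_min}_{start_max})_({end_min}_{end_max})"


def dup_description(ref: str, rng: str) -> str:
    """Description using "dup", which assumes a tandem arrangement."""
    return f"{ref}:g.{rng}dup"


def insertion_at_unknown_location(ref: str, rng: str) -> str:
    """Alternative from the thread: an extra copy whose location is not known."""
    return f"g.?_?ins[{ref}:g.{rng}]"


rng = uncertain_range(45409925, 45411016, 45411210, 45411789)
print(dup_description("NC_000019.9", rng))
print(insertion_at_unknown_location("NC_000019.9", rng))
```

The two printed strings correspond to the "dup" description and the alternative "ins" description discussed above; which one is appropriate depends on whether the tandem arrangement has been confirmed.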

Best regards,

Johan den Dunnen
HUGO HGVS Variant Nomenclature Committee (HVNC)