Alu insertions


Marco Montagna

Aug 14, 2024, 10:08:16 AM
to HGVS Nomenclature
Dear All, 
if I understand correctly, according to the most recent HGVS nomenclature guidelines an Alu insertion should be described with the positions flanking the insertion followed by the inserted sequence coordinates and its own reference. "When the inserted sequence is not present in the reference genome, it should be submitted to a database (e.g., GenBank); the accession.version number obtained can then be used to describe the variant".
It usually happens that Alu inserts differ by a few nucleotides even when belonging to the same subfamily. 
Submitting these slightly differing sequences to GenBank slows down reporting, and I really wonder what the point is of populating GenBank with sequences that in some cases are likely to be more or less patient-specific.
Is there any other way to report slightly different Alu sequences? I found the previous version much more reasonable and intuitive. But I'm probably missing something.
Many thanks
Marco Montagna

Johan den Dunnen

Aug 15, 2024, 2:39:03 AM
to HGVS Nomenclature
Dear Marco,

With HGVS nomenclature, the idea is that you describe a variant in enough detail that the sequence of the variant you detected can be generated from the description. So describing an Alu insertion as g.100_101insAlu will not work: although everybody understands that you are describing the insertion of an Alu repeat element, the sequence itself is not described. HGVS nomenclature therefore recommends: "When the inserted sequence is not present in the reference genome, it should be submitted to a database (e.g., GenBank); the accession.version number obtained can then be used to describe the variant".
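
To illustrate the point, here is a minimal Python sketch. The helper name `apply_insertion` and the 10-base toy reference are hypothetical and not part of any HGVS tool. It rebuilds a variant sequence from the two flanking positions plus an explicit inserted sequence, and rejects a placeholder such as "Alu" because it carries no bases to insert.

```python
def apply_insertion(reference: str, left_pos: int, inserted: str) -> str:
    """Insert `inserted` between 1-based positions left_pos and left_pos + 1."""
    if not 1 <= left_pos < len(reference):
        raise ValueError("flanking positions must lie inside the reference")
    if not inserted or set(inserted) - set("ACGT"):
        # A label like "Alu" names a repeat family, not a sequence.
        raise ValueError("inserted sequence must be explicit A/C/G/T bases")
    return reference[:left_pos] + inserted + reference[left_pos:]


toy_reference = "ACGTACGTAC"  # hypothetical toy reference
variant = apply_insertion(toy_reference, 4, "GGG")  # toy equivalent of g.4_5insGGG
print(variant)
```

Calling `apply_insertion(toy_reference, 4, "Alu")` raises a `ValueError`, which mirrors why g.100_101insAlu cannot be turned back into a sequence.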

It is, however, possible to describe the variant without a GenBank submission. One option is to describe the variant as g.100_101ins{SEQUENCE}, where {SEQUENCE} is the entire sequence of the insertion. Another option is to describe the insertion as derived from an Alu repeat element elsewhere in the genome and include the differences, using a format like g.100_101ins[NC_000004.11:g.106370094_106370200;A;106370202_106370350;C;106370352_106370420]. Note that in this hypothetical example, which I did not check for correctness, the Alu sequence located on chr 4 at g.106370094_106370420 is, compared to the reference, interrupted by substitutions at positions g.106370201 and g.106370351.
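
As a rough sketch of how such a composite description could be assembled without hand-typing coordinates, the Python below builds the string from a list of reference ranges and literal bases. The function names and the `parts` layout are hypothetical, and the output format simply mirrors the hypothetical example above; it has not been validated against the HGVS specification or any validator.

```python
from typing import List, Tuple, Union

# A part is either a (start, end) range on the source reference or literal bases.
Part = Union[Tuple[int, int], str]


def format_composite_insertion(left: int, accession: str, parts: List[Part]) -> str:
    """Build g.<left>_<left+1>ins[<accession>:g.<part>;<part>;...]."""
    pieces = []
    for part in parts:
        if isinstance(part, str):
            pieces.append(part)  # literal base(s) that differ from the source
        else:
            start, end = part
            pieces.append(f"{start}_{end}")
    return f"g.{left}_{left + 1}ins[{accession}:g." + ";".join(pieces) + "]"


def replaced_positions(parts: List[Part]) -> List[int]:
    """Source positions skipped between two ranges that flank a literal."""
    positions = []
    for prev, lit, nxt in zip(parts, parts[1:], parts[2:]):
        if isinstance(lit, str) and not isinstance(prev, str) and not isinstance(nxt, str):
            positions.extend(range(prev[1] + 1, nxt[0]))
    return positions


parts = [(106370094, 106370200), "A", (106370202, 106370350), "C", (106370352, 106370420)]
description = format_composite_insertion(100, "NC_000004.11", parts)
print(description)
print(replaced_positions(parts))
```

The second helper lists the source positions that each literal base replaces, which gives a quick consistency check on the coordinates: for the ranges above, the single-base gaps fall at 106370201 and 106370351.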

However, in all cases the description of the variant becomes rather lengthy (and prone to error); that is why HGVS recommends submitting the identified Alu repeat sequence.

Best regards,

Johan den Dunnen
HUGO HGVS Variant Nomenclature Committee (HVNC)

On Wednesday, August 14, 2024 at 16:08:16 UTC+2, marco.m...@iov.veneto.it wrote:

Ivo Fokkema

Aug 21, 2024, 10:17:15 AM
to marco.m...@iov.veneto.it, HGVS Nomenclature
Dear Marco,

I noticed this part in your question: "I found the previous version much more reasonable and intuitive".
Could you clarify what previous version you mean? We have recently updated large parts of the documentation, and I want to make sure that we haven't introduced any errors in this process. If you mean textual content, the previous version of the documentation can be found here: https://archive.hgvs-nomenclature.org/. Could you let us know what part of that documentation was more reasonable and intuitive? Or, if you mean the design/structure of the page, can you indicate what was better before?

Thank you!

Ivo


Marco Montagna

Aug 23, 2024, 9:40:43 AM
to Ivo Fokkema, HGVS Nomenclature
Hi Ivo,
I meant the still widely used description such as "BRCA2 c.156_157insAlu", possibly naming the specific Alu subfamily (i.e., without reference to the specific Alu sequence). I actually don't know which HGVS version, if any, ever allowed it. I assumed it was one of the previous versions.
Sorry for the mistake
Thanks
Marco