UTA transcripts obsolete in RefSeq

Somak Roy

Apr 30, 2020, 1:24:56 PM
to hgvs-discuss
Hi Reece,
I am building a list of prioritized RefSeq transcripts from the UTA database for generating HGVS nomenclature. While doing so, I noticed that for a handful of genes, the RefSeq transcript recorded in UTA has since been removed from the RefSeq database. For such genes, UTA does not have another RefSeq transcript to use. My UTA version and a couple of examples are below.
My prioritization is only for validated RefSeq transcripts (NM_ accession).

Am I missing any detail in UTA? If not, are you planning to update the transcripts in the UTA database for these genes to sync with the RefSeq database?

Thanks very much.
Regards,
Somak
 
UTA version (docker image): biocommons/uta:uta_20180821

examples
--------------------------------------------------
NM_004651.3     USP11

Response after searching the NCBI database:
NCBI Reference Sequence: NM_004651.3 (click to see this obsolete version)
Record removed. NM_004651.3: This RefSeq was removed because currently there is insufficient support for the transcript and the protein.
--------------------------------------------------

NM_031914.2     SYT16

NCBI Reference Sequence: NM_031914.2 (click to see this obsolete version)
Record removed. NM_031914.2: This RefSeq has been removed because currently there is insufficient support for the transcript and protein.
--------------------------------------------------
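The check described in this message can be sketched in a few lines of Python. This is only an illustration, not code from UTA or hgvs: it assumes the (accession, gene) pairs have already been exported from UTA and that a set of removed accessions has been assembled separately (for example, by looking each one up at NCBI). The function name, the GENE_X symbol, and the NM_999999/NR_999999 accessions are placeholders.

```python
from collections import defaultdict

# (transcript accession, gene symbol) pairs, as might be exported from UTA.
# The first two rows are the examples from this thread; GENE_X and its
# accessions are hypothetical placeholders.
TX_ROWS = [
    ("NM_004651.3", "USP11"),
    ("NM_031914.2", "SYT16"),
    ("NM_999999.1", "GENE_X"),
    ("NM_999999.2", "GENE_X"),
    ("NR_999999.1", "GENE_X"),  # not an NM_ accession, so it is ignored
]

# Accessions known to be removed from RefSeq (assumed to be collected elsewhere).
REMOVED = {"NM_004651.3", "NM_031914.2", "NM_999999.1"}


def genes_without_live_transcript(tx_rows, removed):
    """Return genes for which every NM_ transcript is in `removed`."""
    by_gene = defaultdict(list)
    for ac, gene in tx_rows:
        if ac.startswith("NM_"):
            by_gene[gene].append(ac)
    return sorted(
        gene for gene, acs in by_gene.items()
        if all(ac in removed for ac in acs)
    )


print(genes_without_live_transcript(TX_ROWS, REMOVED))
```

GENE_X is not reported because one of its NM_ transcripts is still live; USP11 and SYT16 are reported because their only NM_ transcript has been removed.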

Reece Hart

May 1, 2020, 8:41:56 PM
to hgvs-discuss
Hi Somak-

UTA is stale at the moment for two reasons: NCBI substantially changed some files that we relied on for loading, and UTA has only a maintenance level of support at the moment. The new USP11 transcript, NM_001371072.1, was released March 30, 2020, long after the most recent UTA update. I suspect that this is the case for other gaps. (Another possibility is that the gene symbol changed, which would result in a false negative if your query is based on the old symbol.)
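To illustrate the symbol-change false negative mentioned above, here is a toy sketch. It does not reflect the actual UTA schema; the tables, the symbols OLD1/NEW1, the ID "HGNC:0", and the accession are all hypothetical placeholders.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    hgnc_id: str  # stable identifier
    symbol: str   # current symbol, which can change over time


# Transcripts stored under the symbol that was current at load time.
TX_BY_SYMBOL = {"OLD1": ["NM_999999.1"]}
# The same transcripts stored under a stable identifier.
TX_BY_HGNC_ID = {"HGNC:0": ["NM_999999.1"]}

# The gene as it is known today: same ID, new symbol.
gene_now = Gene(hgnc_id="HGNC:0", symbol="NEW1")


def tx_by_symbol(gene):
    """Look up transcripts by symbol; misses if the symbol was renamed."""
    return TX_BY_SYMBOL.get(gene.symbol, [])


def tx_by_id(gene):
    """Look up transcripts by stable ID; unaffected by symbol renames."""
    return TX_BY_HGNC_ID.get(gene.hgnc_id, [])


print(tx_by_symbol(gene_now))  # empty list: the false negative
print(tx_by_id(gene_now))
```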

Since I left Invitae, Andreas Prlic has been performing updates. I'm still in close contact with Invitae folks and have been contracting with them to support UTA and hgvs. However, updating the UTA loading process was not part of the contract. I do think it will happen eventually, but I can't promise when.

If anyone on the list has funding available to upgrade and update UTA, I'm all ears!

I'll follow up on this group when UTA is updated.

-Reece


--
You received this message because you are subscribed to the Google Groups "hgvs-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hgvs-discuss...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/hgvs-discuss/35769cca-120c-47ac-a3a9-274495b84acd%40googlegroups.com.

Somak Roy

May 5, 2020, 4:56:27 AM
to hgvs-discuss
Hi Reece,
Thanks very much for the update! hgvs and UTA are invaluable tools for my research and clinical bioinformatics work, and I really hope that these projects get funded and maintained.

I use HGNC IDs instead of gene symbols for querying and therefore have not had trouble querying UTA, with the exception of a few transcript records that are missing gene annotations completely.

After prioritizing most of the transcripts in UTA by cross-referencing with RefSeq/MANE Select records and the APPRIS algorithm (GRCh37), the list is working pretty well for cancer-related genes in my projects. I am happy to share the data with you, Andreas, and other developers of UTA/hgvs if that is helpful.
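One way such a prioritization could be expressed is sketched below. The ranking order (MANE Select first, then APPRIS principal, then accession as a tie-breaker) is an assumption made for illustration and is not necessarily the order used in my list; the field names and accessions are hypothetical placeholders.

```python
def rank_key(tx):
    """Sort key: MANE Select first, then APPRIS principal, then accession."""
    # False sorts before True, so negate the flags to put flagged records first.
    return (not tx["mane_select"], not tx["appris_principal"], tx["ac"])


def pick_transcript(candidates):
    """Return the best-ranked NM_ accession, or None if there is none."""
    nm_only = [tx for tx in candidates if tx["ac"].startswith("NM_")]
    if not nm_only:
        return None
    return min(nm_only, key=rank_key)["ac"]


# Hypothetical candidate transcripts for a single gene.
candidates = [
    {"ac": "NM_999999.1", "mane_select": False, "appris_principal": True},
    {"ac": "NM_999998.1", "mane_select": True, "appris_principal": False},
    {"ac": "NR_999997.1", "mane_select": True, "appris_principal": True},
]

print(pick_transcript(candidates))
```

The NR_ record is excluded before ranking, so the MANE Select NM_ record is chosen even though the NR_ record carries both flags.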

Best,
Somak 

Reece Hart

May 25, 2020, 7:46:02 PM
to hgvs-discuss
Hi Somak-

Sorry for dropping the line. I appreciate your offer. I am happy to look if you have a question/concern about the data. However, I probably wouldn't make time to just explore what you've done.

FWIW, Andreas and I have talked about migrating UTA to use hgnc ids and/or NCBI gene ids. This would greatly simplify the kind of translation you're doing, I think.

-Reece



Somak Roy

May 26, 2020, 7:00:01 AM
to hgvs-discuss
Hi Reece,
No worries, these are busy times for everyone.
Thank you for offering to help with my questions/concerns. At this time, I have been able to get my project needs fulfilled with the current UTA version. Including HGNC (at least) and other gene ids in UTA will be a very helpful feature!

Best,
Somak