RefSeqGene discrepancies with HUGO

James Myslik

Sep 2, 2016, 10:41:45 AM
to gen...@soe.ucsc.edu

I’m working on a graph database that includes genomic information (hg38) downloaded from UCSC with the Table Browser, and I’m linking it to gene information from the HUGO database. I’ve noticed some discrepancies in chromosome location between the two. For instance, the Table Browser places the following records on chr22, while the HUGO location (last column) is on a different chromosome.

 

0  chr22  NR_132385  long intergenic non-protein coding RNA 1297      LINC01297  14q11.2
0  chr22  NR_073459  BMS1, ribosome biogenesis factor pseudogene 18   BMS1P18    14q11.2
0  chr22  NR_039973  microRNA 5096                                    MIR5096    4
0  chr22  NR_073460  BMS1, ribosome biogenesis factor pseudogene 17   BMS1P17    14q11.2

 

There are more discrepancies on other chromosomes. Forgive me if I’m asking a naïve question. I’m relatively new to bioinformatics.

 

Thanks!

 

Christopher Lee

Sep 2, 2016, 6:58:30 PM
to James Myslik, UCSC Genome Browser Discussion List

Hi James,

Thank you for your question about RefSeq and HUGO discrepancies. The genes you listed in your example are all RNAs or pseudogenes that map to multiple locations in the genome. Note the BLAT results for the first gene in your list:

BLAT Search Results

Go back to chr22:15746674-15778289 on the Genome Browser.

   ACTIONS      QUERY           SCORE START  END QSIZE IDENTITY CHRO STRAND  START    END      SPAN
---------------------------------------------------------------------------------------------------
browser details NR_132385        570     1   572   572 100.0%    22   +   15746674  15778289  31616
browser details NR_132385        561     1   572   572  99.4%    14   -   19344327  19375967  31641
browser details NR_132385        561     1   572   572  99.4%    14   +   19024061  19055696  31636
browser details NR_132385        520     1   572   572  96.0%     2   +  131644195 131676060  31866
browser details NR_132385        514     1   572   572  95.7%    18   -   14444408  14493506  49099
browser details NR_132385        499     1   572   572  94.7%  15_KI270852v1_alt   -      96785    148106  51322
browser details NR_132385        499     1   572   572  94.7%  15_KI270727v1_random   -      66034    117355  51322
browser details NR_132385        499     1   572   572  94.7%    15   -   21338762  21390139  51378
browser details NR_132385        499     1   572   572  94.7%    15   -   20769675  20821024  51350
browser details NR_132385        443     1   572   572  94.4%    21   +   13659844  13710474  50631
browser details NR_132385        441     1   572   572  94.1%    21   +    6794799   6845451  50653

You can see that there are multiple high-quality matches to the NR_132385 sequence, including the 2nd and 3rd results from the top, which correspond to 14q11.2.

The discrepancy between the HUGO and RefSeq Table Browser results comes from how we build the RefSeq track: we take the sequence (not the coordinates) from NCBI, align it to the genome with BLAT, and keep the best alignment, which in this case is the one on chr22. You can find more information about this procedure on the RefSeq track description page:
http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg38&g=refGene
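
As a rough illustration of the "keep the best alignment" step, here is a minimal Python sketch using the first few rows of the BLAT output above. The tuple layout, the `best_hit` and `near_best` helpers, and the 99% identity cutoff are assumptions made for this example; this is not the actual track-building code, which may apply additional criteria.

```python
# Hits copied from the BLAT output above:
# (score, identity %, chrom, strand, start, end)
hits = [
    (570, 100.0, "22", "+", 15746674, 15778289),
    (561, 99.4, "14", "-", 19344327, 19375967),
    (561, 99.4, "14", "+", 19024061, 19055696),
    (520, 96.0, "2", "+", 131644195, 131676060),
    (514, 95.7, "18", "-", 14444408, 14493506),
]


def best_hit(hits):
    """Return the hit with the highest score, breaking ties by identity."""
    return max(hits, key=lambda h: (h[0], h[1]))


def near_best(hits, min_identity=99.0):
    """Return hits other than the best one whose identity is still high.

    The 99.0 default is an arbitrary illustrative cutoff.
    """
    top = best_hit(hits)
    return [h for h in hits if h is not top and h[1] >= min_identity]


top = best_hit(hits)
print("kept placement: chr" + top[2], top[4], top[5])
for h in near_best(hits):
    print("alternative placement: chr" + h[2], h[3], h[4], h[5])
```

With these rows, the kept placement is the chr22 hit, and the two chr14 hits show up as near-identical alternatives, which is where the 14q11.2 annotation fits.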

There is an additional wrinkle in that sometimes the assemblies used to map transcripts change, in this case from hg19 to hg38. The following previously answered mailing list questions elaborate further on both of these topics:
https://groups.google.com/a/soe.ucsc.edu/d/msg/genome/kJh3YJCiCDs/PEGTRqZdMwAJ
https://groups.google.com/a/soe.ucsc.edu/d/msg/genome/JPlIGxbsN6o/iVG5lSjSAAAJ

What likely happened is that HUGO used a different pipeline (or different assembly) to map the transcripts, and thus came up with the 14q11.2 location, while we came up with the chr22 location.
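
If it helps when linking the two sources, a small check along these lines can flag records where the UCSC chromosome and the HUGO cytogenetic location disagree, so they can be reviewed rather than silently merged. This is only a sketch: the row layout and the `band_chrom` helper are hypothetical, and the data are the four records from your message.

```python
import re

# Rows from the original question:
# (UCSC chrom, RefSeq accession, symbol, HUGO location)
rows = [
    ("chr22", "NR_132385", "LINC01297", "14q11.2"),
    ("chr22", "NR_073459", "BMS1P18", "14q11.2"),
    ("chr22", "NR_039973", "MIR5096", "4"),
    ("chr22", "NR_073460", "BMS1P17", "14q11.2"),
]


def band_chrom(location):
    """Extract the chromosome from a cytogenetic location such as '14q11.2'.

    Returns a UCSC-style name ('chr14'), or None if the location does not
    start with a chromosome number or X/Y.
    """
    m = re.match(r"(\d+|[XY])", location)
    return "chr" + m.group(1) if m else None


# Keep only rows where the two sources name different chromosomes.
mismatches = [r for r in rows if band_chrom(r[3]) != r[0]]
for chrom, acc, symbol, loc in mismatches:
    print(f"{symbol} ({acc}): UCSC {chrom}, HUGO {loc}")
```

For multi-mapping transcripts like these, a flagged mismatch does not necessarily mean either source is wrong; it may just reflect a different choice among near-identical placements.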

Thank you again for your inquiry and for using the UCSC Genome Browser. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Christopher Lee
UCSC Genomics Institute

