knownGene tables for hg38 and mm10


James W. MacDonald

Jul 1, 2020, 1:10:54 PM
to gen...@soe.ucsc.edu
Have the knownGene tables for hg38 and mm10 always been based on GENCODE genes? If I wanted gene data for, say, hg38 that is most comparable (in provenance) to the knownGene table for hg19, would hg38.refGene be the closest match?

Thanks,

Jim



--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099

Jairo Navarro Gonzalez

Jul 8, 2020, 6:58:51 PM
to jma...@u.washington.edu, UCSC Genome Browser Discussion List

Hello Jim,

Thank you for using the UCSC Genome Browser and sending your inquiry.

Several years ago, we carried out a comparison of the human gene models from our traditional UCSC Genes with those from GENCODE Genes. We then made the decision to switch to GENCODE Gene models as our default set of genes (referred to as Known Genes) for human (hg38) and mouse (mm10).

Although it is beyond the scope of this mailing list to provide scientific advice, such as which track to use, I can provide information about the tracks so you can decide for yourself which one suits your research. The following FAQ entries explain the differences between the tracks:

You can also learn more about the different tracks by reading the track description pages:

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu.
All messages sent to that address are archived on a publicly-accessible Google Groups forum.
If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Jairo Navarro
UCSC Genome Browser

Want to share the Browser with colleagues?
Host a workshop: https://bit.ly/ucscTraining



James W. MacDonald

Jul 8, 2020, 7:40:02 PM
to Jairo Navarro Gonzalez, UCSC Genome Browser Discussion List
Hi Jairo,

Thanks! I think I do understand the difference between the tracks, and I'm not asking which data I should use for research. Instead, I ask because the Bioconductor project has used the knownGene track as the source for genomic data that we provide in various R data packages, for many years now. Our intent was always that the underlying data would be based on NCBI data, as the rest of our annotation packages use NCBI Gene IDs as their central ID. And as I am sure you are aware, converting IDs from NCBI to Ensembl is a non-trivial task, so we want to keep the annotations based on NCBI Gene IDs.

Since the knownGene table for hg38 is now based on GENCODE annotations, it's no longer consistent with the knownGene table for hg19, and requires mapping of Ensembl to NCBI IDs. It appears that the refGene table for hg38 is probably the closest thing to a direct replacement for the old knownGene table, but given the complexity of how these data are generated, I am not sure that's exactly correct, and I hoped that you would be able to confirm that, or point me to a better choice.

Thanks!

Jim


Luis Nassar

Jul 10, 2020, 7:32:05 PM
to jma...@u.washington.edu, Jairo Navarro Gonzalez, UCSC Genome Browser Discussion List

Hello Jim,

Thank you for the extra information.

There are two possible solutions. As you have said, you can change the track source from knownGene to ncbiRefSeq (http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg38&c=chrX&g=refSeqComposite). In this case, you may want to use ncbiRefSeq instead of refGene, as ncbiRefSeq is built directly with the NCBI alignments.

A solution that may be better, however, would be to use our knownToLocusLink table for hg38. This table matches knownGene items with an NCBI Gene ID (Entrez ID), and would allow the current system to stay in place. These data can be found on both our public MySQL server and our download server:

http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/knownToLocusLink.txt.gz
https://genome.ucsc.edu/goldenPath/help/mysql.html
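As a minimal sketch of the knownToLocusLink route, assuming a local copy of knownToLocusLink.txt.gz that is a header-less, tab-separated file with the knownGene transcript name in the first column and the NCBI Gene ID in the second (worth confirming against the table schema before relying on it), the mapping could be loaded in Python roughly like this. The function names are hypothetical, and the transcript IDs in the demo are made up.

```python
import gzip
from typing import Dict, Iterable, List, Tuple


def load_known_to_locus_link(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Build {knownGene transcript name: [NCBI Gene IDs]} from table rows.

    Assumes each row is 'name<TAB>geneId' with no header line.
    """
    mapping: Dict[str, List[str]] = {}
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 2 or not fields[0]:
            continue  # skip blank or malformed rows
        name, gene_id = fields[0], fields[1]
        ids = mapping.setdefault(name, [])
        if gene_id not in ids:  # keep each Gene ID once, in file order
            ids.append(gene_id)
    return mapping


def load_from_gz(path: str) -> Dict[str, List[str]]:
    """Read a local copy of knownToLocusLink.txt.gz (path is up to the caller)."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return load_known_to_locus_link(handle)


def split_by_mapping(
    transcript_names: Iterable[str], mapping: Dict[str, List[str]]
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Separate knownGene transcripts that have an NCBI Gene ID from those that do not."""
    matched: Dict[str, List[str]] = {}
    unmatched: List[str] = []
    for name in transcript_names:
        if name in mapping:
            matched[name] = mapping[name]
        else:
            unmatched.append(name)
    return matched, unmatched


if __name__ == "__main__":
    # Made-up rows in the assumed two-column layout.
    rows = ["ENST_EXAMPLE_A.1\t1001\n", "ENST_EXAMPLE_B.1\t1002\n"]
    mapping = load_known_to_locus_link(rows)
    print(split_by_mapping(["ENST_EXAMPLE_A.1", "ENST_EXAMPLE_C.1"], mapping))
```

Values are kept as lists in case a transcript is linked to more than one Gene ID. Transcripts that end up in `unmatched` have no NCBI Gene ID in this table, so a pipeline keyed on Gene IDs would have to decide whether to drop them or take them from another source such as ncbiRefSeq.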

I hope this is helpful. Please include gen...@soe.ucsc.edu in any replies to ensure visibility by the team. All messages sent to that address are archived on our public forum. If your question includes sensitive information, you may send it instead to genom...@soe.ucsc.edu.

Lou Nassar
UCSC Genomics Institute

