Possible error: Identical name/value fields in hg38 knownToEnsembl table

Archana Shenoy

Feb 27, 2019, 3:47:37 PM
to gen...@soe.ucsc.edu
Hi,

I am trying to convert Known Gene IDs to Ensembl IDs. When I download the knownToEnsembl.txt file for the hg38 reference, I find that the name and value fields contain identical Ensembl IDs, so the mapping from the Known Gene IDs is not available.

Searching by Known Gene ID in the Table Browser returns the row for the correct Ensembl ID, but again without the corresponding Known Gene ID.

Attached is a screenshot of the first few lines of the file.

Would it be possible to update to the correct version of the file?

Thanks,
Archana
Screen Shot 2019-02-27 at 11.58.16 AM.png

Daniel Schmelter

Feb 28, 2019, 9:06:04 PM
to Archana Shenoy, UCSC Genome Browser Discussion List

Hello Archana,

Thank you for your question about the knownToEnsembl table and for reporting what looked like strange behavior.

The duplication you saw is intentional: the table was kept, with identical columns, so that existing pipelines that query it continue to work. The knownToEnsembl table links our knownGene primary IDs to Ensembl IDs. About 7 months ago, UCSC switched the primary knownGene IDs to Ensembl identifiers, so both columns are now the same. If you are using hg38's knownGene, you may not need to convert IDs at all, since knownGene.txt contains both the Ensembl and the UCSC identifiers. You can download the hg38 knownGene file from our download site here:

http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/knownGene.txt.gz

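If you would like a file with just those two columns, you can extract them with an awk command like the following. Column 1 (name) holds the Ensembl ID and column 12 (alignID) holds the UCSC ID. Because the download is gzipped, decompress it on the fly, and set OFS so the output stays tab-separated:

gunzip -c knownGene.txt.gz | awk 'BEGIN {FS=OFS="\t"} {print $1, $12}'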
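If you would rather build the lookup inside a script than write an intermediate file, the same two columns can be read directly from the gzipped download. This is a minimal Python sketch, assuming the hg38 knownGene.txt.gz layout (tab-separated, name in column 1, alignID in column 12); load_ucsc_to_ensembl is a hypothetical helper name, not part of any UCSC tool.

```python
import gzip
from typing import Dict


def load_ucsc_to_ensembl(path: str) -> Dict[str, str]:
    """Map UCSC IDs (alignID, column 12) to knownGene names (column 1).

    Assumes a gzipped, tab-separated hg38 knownGene.txt.gz with 12 columns.
    """
    mapping: Dict[str, str] = {}
    with gzip.open(path, "rt") as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 12:
                continue  # skip blank or truncated lines
            # Python lists are 0-based, so awk's $12 is fields[11] and $1 is fields[0].
            mapping[fields[11]] = fields[0]
    return mapping
```

Usage: call mapping = load_ucsc_to_ensembl("knownGene.txt.gz"), then mapping.get(ucsc_id) for each UCSC ID you want to convert; unknown IDs return None.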

For hg19, the knownToEnsembl table still contains the UCSC ID to Ensembl ID conversion you may have expected. The hg19 file can be downloaded here: http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/knownToEnsembl.txt.gz
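For the hg19 file, which has two tab-separated columns (name holding the UCSC ID and value holding the Ensembl ID), a lookup could be sketched in Python as follows. This is a minimal illustration, not a UCSC-provided tool: load_known_to_ensembl and convert_ids are hypothetical names, and each UCSC ID is allowed to map to more than one Ensembl ID in case a name appears on several rows.

```python
import gzip
from collections import defaultdict
from typing import Dict, Iterable, List


def load_known_to_ensembl(path: str) -> Dict[str, List[str]]:
    """Read a gzipped knownToEnsembl file into {UCSC ID: [Ensembl IDs]}.

    Assumes two tab-separated columns per line: name (UCSC ID), value (Ensembl ID).
    """
    mapping: Dict[str, List[str]] = defaultdict(list)
    with gzip.open(path, "rt") as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            mapping[fields[0]].append(fields[1])
    return dict(mapping)


def convert_ids(ids: Iterable[str], mapping: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Look up each UCSC ID; IDs with no match map to an empty list."""
    return {known_id: mapping.get(known_id, []) for known_id in ids}
```

Returning an empty list for unmatched IDs, rather than dropping them, makes it easy to see which inputs had no Ensembl counterpart.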

I hope that was helpful. Thank you for writing in!

Kindly,
Daniel Schmelter
UCSC Genomics Institute

If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.


--

---
You received this message because you are subscribed to the Google Groups "UCSC Genome Browser Public Support" group.
Visit this group at https://groups.google.com/a/soe.ucsc.edu/group/genome/.