Dear Eric,
You can get the data that you're looking for using the table browser
found here: http://www.genome.ucsc.edu/cgi-bin/hgTables.
The "knownToLocusLink" table is part of the UCSC Genes set of tables. This table has Entrez identifiers for 164,238 of the 197,782 transcripts in the knownGene table, so it covers most of the gene set (about 83%). In some of your examples, you may notice that there is no match for certain IDs. For example, Entrez Gene ID: 388795 (https://www.ncbi.nlm.nih.gov/gene/388795) is not included because it is an XM_* ID, not an NM_* ID.
For this example, I'll use the hg38 assembly with the knownGene table, but you can choose the assembly or gene set as desired. I'll filter on the following two example Entrez IDs to obtain basic annotation information:
26863
729574
Once you navigate to the Table Browser,
1. Select the following:
clade: Mammal
genome: human
assembly: hg38
group: Genes and Gene Prediction Tracks
track: GENCODE v24
table: knownGene
region: genome
output file: leave blank to see the output in the browser, or enter a file name to download the results as a file.
2. Click on the "create" filter button.
We need to create the filter on the linked table, "knownToLocusLink", which contains mappings from Entrez IDs to UCSC IDs. To do this, first tell the Table Browser that you want to add a Linked Table.
2a. Under the "Linked Tables" section, check the checkbox "hg38 knownToLocusLink."
2b. Next, scroll to the bottom of the page and click the "allow filtering using fields in checked tables" button, which will take you to the filter page where you can filter for your Entrez IDs.
3. From the Filter page, go to the 3rd section from the top, "hg38.knownToLocusLink based filters."
3a. The "value" column contains the Entrez Gene IDs, and the "name" column contains the UCSC IDs for transcripts. In the "value" field, first delete the "*" asterisk, then paste in your IDs, separated by spaces. I will paste in "26863 729574" without the quotes.
The value row should now read, "value does match 26863 729574" (without the quotes).
Although the filter field is small, you should be able to paste in thousands of IDs.
3b. Click any "submit" button on this filter page to submit your filter and return to the main Table Browser settings page.
4. Now we can select which fields to output.
4a. Next to "output format" select the drop-down selection for "selected fields from primary and related tables."
4b. Click the "get output" button, which will simply move you to the next step where you can select fields.
5. We now need to tell the Table Browser that we want to select fields from a linked table.
5a. From the section "Linked Tables" you can check the box for "hg38 knownToLocusLink".
5b. Scroll to the bottom of the page and click, "allow selection from checked tables" to move to the next step.
6. Make all of your output selections. For example,
6a. From the section, "Select Fields from hg38.knownGene" you can select "name" (Name of gene), "chrom", "txStart", and "txEnd."
6b. From the section, "hg38.knownToLocusLink fields" you can select the checkbox "value" which will output Entrez IDs.
6c. If you want to, under "hg38.kgXref fields" you can select "geneSymbol."
7. Near the top of the page, click the "get output" button. This time, you actually get output! For my example, output is:
#filter: (knownToLocusLink.value = '729574' OR knownToLocusLink.value = '26863')
#hg38.knownGene.name hg38.knownGene.chrom hg38.knownGene.txStart hg38.knownGene.txEnd hg38.kgXref.geneSymbol hg38.knownToLocusLink.value
uc031tqt.1 chr1 16514121 16514285 RNU1-1 26863
uc031tqw.1 chr1 16666784 16666948 RNU1-3 26863
uc031tqx.1 chr1 16673002 16673512 FAM231A 729574
uc031tra.1 chr1 16740515 16740679 RNU1-4 26863
uc031trd.1 chr1 16895979 16896143 RNU1-2 26863
uc031uqm.1 chr1 143729406 143729570 RNVU1-18 26863
uc031uub.1 chr1 146376806 146376970 U1 26863
uc032aye.1 chr14 34546713 34546877 RNU1-27P 26863
uc032ayf.1 chr14 34556225 34556389 RNU1-28P 26863
uc032mze.1 chr1_KI270713v1_random 21860 22024 U1 26863
uc032mzh.1 chr1_KI270713v1_random 35406 35916 FAM231B 729574
Note that there may be multiple transcripts, each identified by its UCSC ID, for one Entrez Gene ID.
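If you download the output to a file, a short script can collect the transcripts that belong to each Entrez ID. The following is only a minimal sketch: the function name group_by_entrez is made up for illustration, the rows are a handful copied from the example output above, and it assumes the six columns appear in the order shown in that output.

```python
from collections import defaultdict

# A few rows copied from the example output above. The real download is
# tab-separated; split() below handles tabs and spaces alike.
EXAMPLE_OUTPUT = """\
#hg38.knownGene.name hg38.knownGene.chrom hg38.knownGene.txStart hg38.knownGene.txEnd hg38.kgXref.geneSymbol hg38.knownToLocusLink.value
uc031tqt.1 chr1 16514121 16514285 RNU1-1 26863
uc031tqx.1 chr1 16673002 16673512 FAM231A 729574
uc031tra.1 chr1 16740515 16740679 RNU1-4 26863
uc032mzh.1 chr1_KI270713v1_random 35406 35916 FAM231B 729574
"""


def group_by_entrez(text):
    """Map each Entrez Gene ID to a list of (UCSC ID, gene symbol) pairs.

    Hypothetical helper; assumes the column order of the example output.
    """
    groups = defaultdict(list)
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the "#" filter/header lines
        name, chrom, tx_start, tx_end, symbol, entrez = line.split()
        groups[entrez].append((name, symbol))
    return dict(groups)


groups = group_by_entrez(EXAMPLE_OUTPUT)
for entrez_id, transcripts in sorted(groups.items()):
    print(entrez_id, len(transcripts), transcripts)
```

To use it on a real download, read the file's contents into a string and pass that to group_by_entrez in place of EXAMPLE_OUTPUT.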
Here is a session of this example. You can click on this session and look at the Table Browser settings.
http://genome.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=cath&hgS_otherUserSessionName=EntrezIDsFilter
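Conceptually, the filter and field selection in the steps above amount to joining knownGene to knownToLocusLink (and kgXref) on the UCSC transcript ID and keeping the rows whose Entrez ID is in your list. Here is a rough sketch of that idea using an in-memory SQLite database as a stand-in for the UCSC tables. It is not how the Table Browser is implemented. The column names "name", "chrom", "txStart", "txEnd", "value", and "geneSymbol" come from the output above; the kgXref key column "kgID", the column types, and the row "uc000000.0" are assumptions made for illustration.

```python
import sqlite3

# In-memory stand-in for the three UCSC tables (simplified schemas).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE knownGene (name TEXT, chrom TEXT, txStart INTEGER, txEnd INTEGER);
CREATE TABLE knownToLocusLink (name TEXT, value TEXT);
CREATE TABLE kgXref (kgID TEXT, geneSymbol TEXT);  -- kgID is an assumed key name
""")
conn.executemany("INSERT INTO knownGene VALUES (?, ?, ?, ?)", [
    ("uc031tqt.1", "chr1", 16514121, 16514285),
    ("uc031tqx.1", "chr1", 16673002, 16673512),
    ("uc000000.0", "chr1", 100, 200),  # made-up transcript with no Entrez mapping
])
conn.executemany("INSERT INTO knownToLocusLink VALUES (?, ?)", [
    ("uc031tqt.1", "26863"),
    ("uc031tqx.1", "729574"),
])
conn.executemany("INSERT INTO kgXref VALUES (?, ?)", [
    ("uc031tqt.1", "RNU1-1"),
    ("uc031tqx.1", "FAM231A"),
    ("uc000000.0", "MADEUP"),
])

entrez_ids = ["26863", "729574"]
placeholders = ", ".join("?" for _ in entrez_ids)  # one "?" per ID
query = f"""
SELECT kg.name, kg.chrom, kg.txStart, kg.txEnd, x.geneSymbol, ll.value
FROM knownGene AS kg
JOIN knownToLocusLink AS ll ON ll.name = kg.name
JOIN kgXref AS x ON x.kgID = kg.name
WHERE ll.value IN ({placeholders})
ORDER BY kg.txStart
"""
rows = conn.execute(query, entrez_ids).fetchall()
for row in rows:
    print(*row, sep="\t")
```

The made-up transcript drops out because it has no row in knownToLocusLink, which mirrors why some IDs return no match in the Table Browser.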
For future reference, if needed, here is an external conversion tool (e.g., convert Entrez Gene IDs to Gene Names):
http://www.uniprot.org/uploadlists/
Regarding your question about HUGO identifiers, you would follow a process similar to the steps above, but you can upload (or paste in) a list using the "identifiers" button of the Table Browser (instead of using the "filter" button). Please see some previously answered mailing list questions below related to HUGO queries:
https://groups.google.com/a/soe.ucsc.edu/forum/#!searchin/genome/HUGO$20table$20browser$20gene$20names/genome/5sLOrAT_z34/AjWfGUJSBgAJ
https://groups.google.com/a/soe.ucsc.edu/forum/#!searchin/genome/HUGO$20table$20browser$20gene$20names/genome/cMP7Z9ECBsM/MeMT8ZYoDMsJ
https://groups.google.com/a/soe.ucsc.edu/forum/#!searchin/genome/HUGO$20table$20browser$20gene$20names/genome/PfiL1gkuGXw/j2RUssWYXtYJc
I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.
-Browser Team
UCSC Genome Browser