downloading a list of chromosomal locations based on Entrez Gene ID numbers

Eric Foss

Jan 20, 2017, 11:02:48 AM
to gen...@soe.ucsc.edu
Dear UCSC Genome, 

I have a list of Entrez Gene ID numbers (388795, 390284, 391343, etc.) for human genes, and I would like to retrieve a table with a row for each gene and columns with chromosome, transcription start site, etc. Searching the help, I found this: 


I tried to go through the steps listed below this text:

If you have Entrez Gene identifiers of the type like "382301", which corresponds to UCSC identifier "uc009vfk.1", you can get the same output using the Table Browser "filter" tool …

Unfortunately, it didn’t work (the final “get output” gave me all genes in the human genome with no Entrez Gene IDs, clearly somehow having ignored the list of Entrez Gene IDs I had entered earlier in the instructions). Could you please give me some guidance with this? 

Thanks very much. 

Eric

P.S. I have the corresponding HUGO gene names as well, and I would appreciate information on how to get the same information using those names. 

Chris Villarreal

Jan 26, 2017, 5:05:41 PM
to Eric Foss, UCSC

Dear Eric,

You can get the data that you're looking for using the Table Browser, found here: http://www.genome.ucsc.edu/cgi-bin/hgTables.

The "knownToLocusLink" table is part of the UCSC Genes set of tables. This table has Entrez identifiers for 164,238 of the 197,782 transcripts in the knownGene table (about 83%), so it covers most of the gene set. For some of your examples, you may notice that there is no match. For example, Entrez Gene ID 388795 (https://www.ncbi.nlm.nih.gov/gene/388795) is not included because its transcript accession is an XM_* rather than an NM_*.

For this example, I'll use the hg38 assembly with the knownGene table, but you can choose the assembly or gene set as desired. I'll be using two example Entrez IDs from which I will filter my results to obtain basic annotation information.

26863
729574

Once you navigate to the Table Browser,

1. Select the following:

   clade: Mammal
   genome: human
   assembly: hg38
   group: Genes and Gene Prediction Tracks
   track: GENCODE v24
   table: knownGene
   region: genome
   output file: leave blank to see the output in the browser, or add a file name to download the output as a file.
2. Click on the "create" filter button.

We need to create the filter on the linked table, "knownToLocusLink", which contains mappings from Entrez IDs to UCSC IDs. To do this, first tell the Table Browser that you want to add a Linked Table.

2a. Under the "Linked Tables" section, check the checkbox "hg38 knownToLocusLink."

2b. Next, scroll to the bottom of the page and click the "allow filtering using fields in checked tables" button, which will take you to the filter page where you can filter for your Entrez IDs.

3. From the Filter page, go to the 3rd section from the top, "hg38.knownToLocusLink based filters."

3a. It is the "value" column which contains the Entrez Gene IDs. The "name" column contains UCSC ID for transcripts. In the "value" row, you can paste in your IDs, separated by a space. First, delete the "*" asterisk in the "value" field, then paste in your IDs. I will paste in "26863 729574" without the quotes.

The value row should now read, "value does match 26863 729574" (without the quotes).

Although the filter field is small, you should be able to paste in thousands of IDs.
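If your IDs live in a spreadsheet column or a text file, a short script can produce the space-separated string to paste into the "value" field. This is only a minimal sketch: the function name build_filter_value is made up for illustration, and the assumption that every Entrez Gene ID is purely numeric comes from the examples in this thread.

```python
def build_filter_value(entrez_ids):
    """Return unique Entrez Gene IDs as one space-separated string, in input order."""
    seen = set()
    cleaned = []
    for raw in entrez_ids:
        token = str(raw).strip()
        # Assumption: Entrez Gene IDs are plain integers such as 26863.
        if not token.isdigit():
            raise ValueError(f"not a numeric Entrez Gene ID: {raw!r}")
        if token not in seen:
            seen.add(token)
            cleaned.append(token)
    return " ".join(cleaned)


# The two example IDs from this message, with one duplicate that gets dropped.
print(build_filter_value([26863, "729574", 26863]))
```

The printed string can be pasted directly into the "value" row after deleting the asterisk.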

3b. Click any "submit" button on this filter page to submit your filter and return to the main Table Browser settings page.

4. Now we can select which fields to output.

4a. Next to "output format" select the drop-down selection for "selected fields from primary and related tables."

4b. Click the "get output" button, which will simply move you to the next step where you can select fields.

5. First, tell the Table Browser that you want to select fields from a linked table.

5a. From the section "Linked Tables" you can check the box for "hg38 knownToLocusLink".

5b. Scroll to the bottom of the page and click, "allow selection from checked tables" to move to the next step.

6. Make all of your output selections. For example,

6a. From the section, "Select Fields from hg38.knownGene" you can select "Name of gene, chrom, txStart, and txEnd."

6b. From the section, "hg38.knownToLocusLink fields" you can select the checkbox "value" which will output Entrez IDs.

6c. If you want to, under "hg38.kgXref fields" you can select "geneSymbol."

7. Near the top of the page, click the "get output" button. This time, you actually get output! For my example, output is:

#filter: (knownToLocusLink.value = '729574' OR knownToLocusLink.value = '26863')
#hg38.knownGene.name hg38.knownGene.chrom hg38.knownGene.txStart hg38.knownGene.txEnd hg38.kgXref.geneSymbol hg38.knownToLocusLink.value
uc031tqt.1 chr1 16514121 16514285 RNU1-1 26863
uc031tqw.1 chr1 16666784 16666948 RNU1-3 26863
uc031tqx.1 chr1 16673002 16673512 FAM231A 729574
uc031tra.1 chr1 16740515 16740679 RNU1-4 26863
uc031trd.1 chr1 16895979 16896143 RNU1-2 26863
uc031uqm.1 chr1 143729406 143729570 RNVU1-18 26863
uc031uub.1 chr1 146376806 146376970 U1 26863
uc032aye.1 chr14 34546713 34546877 RNU1-27P 26863
uc032ayf.1 chr14 34556225 34556389 RNU1-28P 26863
uc032mze.1 chr1_KI270713v1_random 21860 22024 U1 26863
uc032mzh.1 chr1_KI270713v1_random 35406 35916 FAM231B 729574

Note that there may be multiple transcripts, each identified by its UCSC ID, for one Entrez Gene ID.
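If you want one entry per Entrez Gene ID rather than one row per transcript, you can group the output afterwards. The following is a rough sketch, not part of the Table Browser itself: the function name group_by_entrez is hypothetical, and it assumes the six-column order shown above (name, chrom, txStart, txEnd, geneSymbol, value) with no whitespace inside any field.

```python
from collections import defaultdict


def group_by_entrez(output_text):
    """Group Table Browser rows by the Entrez Gene ID in the last column."""
    groups = defaultdict(list)
    for line in output_text.splitlines():
        line = line.strip()
        # Skip blank lines and the '#filter' / '#header' comment lines.
        if not line or line.startswith("#"):
            continue
        # Assumed column order: name, chrom, txStart, txEnd, geneSymbol, value.
        name, chrom, tx_start, tx_end, symbol, entrez_id = line.split()
        groups[entrez_id].append((name, chrom, int(tx_start), int(tx_end), symbol))
    return dict(groups)


# Three rows copied from the example output above.
sample = """\
#filter: (knownToLocusLink.value = '729574' OR knownToLocusLink.value = '26863')
uc031tqt.1 chr1 16514121 16514285 RNU1-1 26863
uc031tqx.1 chr1 16673002 16673512 FAM231A 729574
uc032mzh.1 chr1_KI270713v1_random 35406 35916 FAM231B 729574
"""

for entrez_id, transcripts in group_by_entrez(sample).items():
    print(entrez_id, len(transcripts), [t[0] for t in transcripts])
```

From here you can decide how to collapse each group, for example by keeping only transcripts on the primary chromosomes.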

Here is a session of this example. You can click on this session and look at the Table Browser settings.
http://genome.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=cath&hgS_otherUserSessionName=EntrezIDsFilter

For future reference, if needed, here is an external conversion tool (e.g., convert Entrez Gene IDs to Gene Names):

http://www.uniprot.org/uploadlists/

Regarding your question about HUGO identifiers, you can follow a process similar to the steps above, but upload (or paste in) your list using the "identifiers" button of the Table Browser instead of the "filter" button. Please see some previously answered mailing list questions below related to HUGO queries:

https://groups.google.com/a/soe.ucsc.edu/forum/#!searchin/genome/HUGO$20table$20browser$20gene$20names/genome/5sLOrAT_z34/AjWfGUJSBgAJ

https://groups.google.com/a/soe.ucsc.edu/forum/#!searchin/genome/HUGO$20table$20browser$20gene$20names/genome/cMP7Z9ECBsM/MeMT8ZYoDMsJ

https://groups.google.com/a/soe.ucsc.edu/forum/#!searchin/genome/HUGO$20table$20browser$20gene$20names/genome/PfiL1gkuGXw/j2RUssWYXtYJc
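If you plan to upload the gene names rather than paste them, a small script can prepare the file. This is a sketch under assumptions: the helper name format_identifier_list and the file name hugo_symbols.txt are invented for illustration, and one name per line is assumed to be an acceptable layout for the upload, so please check the "identifiers" page for the formats it accepts.

```python
def format_identifier_list(symbols):
    """Return unique, non-empty gene symbols, one per line, in input order."""
    seen = set()
    lines = []
    for raw in symbols:
        symbol = raw.strip()
        if symbol and symbol not in seen:
            seen.add(symbol)
            lines.append(symbol)
    return "\n".join(lines) + "\n"


# Gene symbols taken from the example output earlier in this message.
text = format_identifier_list(["RNU1-1", "FAM231A", " RNU1-1 ", ""])

# Hypothetical file name; choose any path you like.
with open("hugo_symbols.txt", "w", encoding="utf-8") as handle:
    handle.write(text)
```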



I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.




-Browser Team

UCSC Genome Browser 


