should be simple in table browser


LaFramboise, William A

Sep 11, 2014, 10:57:46 AM
to gen...@soe.ucsc.edu
Pardon my abilities, but I am trying to submit a list of genomic positions (I have many) like the following:

chr1 987462 987463
chr1 1149055 1149056
chr1 1196236 1196237
chr1 1234068 1234069
chr1 1263077 1263078
chr1 1288539 1288540

and get back the gene names and/or symbols encoded for each region.

Following your directions in the Table Browser and "help" list from previous questions for submission of a list of positions I used the following settings:

Mammal
Human
Feb. 2009 (hg19)
Genes and Gene Predictions
UCSC Genes
knownGene

output format: all fields from selected table

I get an output of

#name chrom strand txStart txEnd cdsStart cdsEnd exonCount exonStarts exonEnds proteinID alignID

However, I can neither get gene names nor relate this output back to my input file to interrogate it with the protein IDs. May I trouble you for insight as to which settings will reveal the gene names associated with these positions?

Many thanks,

Bill



Matthew Speir

Sep 12, 2014, 6:41:57 PM
to LaFramboise, William A, gen...@soe.ucsc.edu
Hi Bill,

Thank you for your question about getting gene symbols as part of your
output from the Table Browser. You are on the right track with your current
Table Browser settings, and the only issue is your output settings. The
reason you are not getting gene symbols as part of your output is
because these are not stored in the knownGene table for the UCSC Genes
track, but instead stored in a linked table. When you select the output
option "all fields from selected table", you are only getting the
information contained in the knownGene table. I recommend using the
"selected fields from primary and related tables" output option. After
you click "get output", you will be taken to another page where you will
be able to select fields from both the knownGene table and various
linked tables that you want as part of your output. On this page, select
those fields from the "Select Fields from hg19.knownGene" section that
you are interested in. In the "hg19.kgXref fields" section, you will
find a number of alternative IDs for the transcripts in the knownGene
table. Check the box next to "geneSymbol", and any other IDs you are
interested in. Finally, click "get output". Your output will consist of
the fields you selected as columns in order starting from the top of the
"hg19.knownGene" section. While this output option doesn't necessarily
format your output in a terribly useful way, you can use a simple UNIX
command line utility such as awk to rearrange the columns however you want.
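
The column rearrangement mentioned above can also be sketched in Python instead of awk. This is a minimal sketch, not an official UCSC tool. It assumes the output was saved as tab-separated text whose first line is a header beginning with "#". The function name reorder_columns and the column names in the usage note are assumptions for illustration; check the header line of your own file for the exact names.

```python
def reorder_columns(lines, wanted):
    """Return the rows of a tab-separated table keeping only the
    columns named in `wanted`, in that order.

    `lines` is an iterable of text lines; the first line is the header.
    A leading '#' on the header (assumed Table Browser style) is dropped.
    """
    rows = [line.rstrip("\n") for line in lines if line.strip()]
    header = rows[0].lstrip("#").split("\t")
    # Position of each requested column in the original header.
    # header.index raises ValueError if a requested name is missing.
    index = [header.index(name) for name in wanted]
    result = ["\t".join(wanted)]
    for row in rows[1:]:
        fields = row.split("\t")
        result.append("\t".join(fields[i] for i in index))
    return result
```

For example, with a saved file (the file name is hypothetical), `reorder_columns(open("output.tsv"), ["hg19.kgXref.geneSymbol", "hg19.knownGene.chrom"])` would return the rows with the gene symbol first, assuming those two names appear in the header.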

I hope this is helpful. If you have any further questions, please reply
to gen...@soe.ucsc.edu. All messages sent to that address are archived
on a publicly-accessible Google Groups forum. If your question includes
sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Matthew Speir
UCSC Genome Bioinformatics Group

LaFramboise, William A

Sep 15, 2014, 11:31:49 AM
to gen...@soe.ucsc.edu, Matthew Speir
This solution worked nicely. One niggling issue: I get many output rows (multiple variants) for each of my entries, but I cannot align them with my original entries because none of my input is retained in the output file. Is there a simple way to retain one of my entry coordinates, or to add a series number to the input, so that I can sort and align the output?

Thanks,

Bill.

Steve Heitner

Sep 16, 2014, 5:36:03 PM
to LaFramboise, William A, gen...@soe.ucsc.edu, Matthew Speir
Hello, Bill.

The Table Browser is not designed to include any of your input in its
output. The one exception is sequence retrieval, where you can add an
optional description field to your input and it is inserted into the FASTA
headers of your output. You will have to write a script to parse your input
and insert the appropriate information into the Table Browser output.
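
One possible shape for such a script is sketched below in Python. It is only an illustration under stated assumptions: the input positions are whitespace-separated "chrom start end" lines like those in the original question, and the Table Browser output was saved as tab-separated text with the columns chrom, txStart, txEnd and geneSymbol in that order. Both sets of coordinates are treated as 0-based, half-open intervals. The function names and the gene symbols in the usage note are hypothetical.

```python
def parse_positions(lines):
    """Parse whitespace-separated 'chrom start end' lines into tuples."""
    positions = []
    for line in lines:
        parts = line.split()
        if len(parts) >= 3:
            positions.append((parts[0], int(parts[1]), int(parts[2])))
    return positions


def parse_transcripts(lines):
    """Parse tab-separated rows of chrom, txStart, txEnd, geneSymbol.

    Header lines (starting with '#') and blank lines are skipped.
    The column order is an assumption; adjust it to your own output.
    """
    transcripts = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, tx_start, tx_end, symbol = line.rstrip("\n").split("\t")[:4]
        transcripts.append((chrom, int(tx_start), int(tx_end), symbol))
    return transcripts


def annotate(positions, transcripts):
    """Attach to each input position the sorted, de-duplicated gene
    symbols of all transcripts that overlap it."""
    annotated = []
    for chrom, start, end in positions:
        symbols = sorted({
            symbol
            for t_chrom, t_start, t_end, symbol in transcripts
            # Two half-open intervals overlap when each starts before
            # the other ends.
            if t_chrom == chrom and start < t_end and end > t_start
        })
        annotated.append((chrom, start, end, symbols))
    return annotated
```

Each returned tuple keeps the original input coordinates next to the matching symbols, so several transcript variants of one gene collapse into a single entry per input position. The comparison checks every position against every transcript, which is simple but slow for very large lists; an interval index or a tool from Galaxy may be a better fit in that case.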

You may also consider checking out Galaxy (https://usegalaxy.org/) to see if
they have any tools that would be useful to you in this regard.

Please contact us again at gen...@soe.ucsc.edu if you have any further
questions. Questions sent to that address will be archived in a
publicly-accessible forum for the benefit of other users. If your question
contains sensitive data, you may send it instead to genom...@soe.ucsc.edu.

---
Steve Heitner
UCSC Genome Bioinformatics Group


LaFramboise, William A

Sep 17, 2014, 11:53:42 AM
to st...@soe.ucsc.edu, gen...@soe.ucsc.edu, Matthew Speir
Thanks Steve.
Bill