GFF for promoters


Robin Mjelle

Mar 13, 2013, 5:04:50 AM
to gen...@soe.ucsc.edu
Dear all,

I have ChIP-seq data for a transcription factor, and I want to annotate the peaks against all promoters in the genome to see which promoters it binds. How do I get a GFF file with genome-wide promoter coordinates?
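Once a promoter file exists, "which promoters does the factor bind" comes down to an interval-overlap test between peaks and promoter regions. The sketch below is a minimal illustration, assuming both sets are already parsed into BED-style coordinates (0-based start, exclusive end); the `Interval` type, the function names and the sample names such as `geneA_promoter` are hypothetical. It uses a naive scan over all pairs, which is fine for a demo; for genome-wide data an interval tree or a dedicated tool such as BEDTools would be much faster.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # 0-based, exclusive (BED convention)
    name: str


def overlaps(a: Interval, b: Interval) -> bool:
    """Half-open intervals overlap if they are on the same chromosome and share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def promoters_hit(peaks, promoters):
    """Map each peak name to the names of the promoters it overlaps (naive all-pairs scan)."""
    return {p.name: [q.name for q in promoters if overlaps(p, q)] for p in peaks}


peaks = [Interval("chr1", 950, 1050, "peak1"), Interval("chr2", 10, 20, "peak2")]
promoters = [
    Interval("chr1", 0, 1000, "geneA_promoter"),
    Interval("chr1", 5000, 6000, "geneB_promoter"),
]
print(promoters_hit(peaks, promoters))
# {'peak1': ['geneA_promoter'], 'peak2': []}
```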

Robin

Brooke Rhead

Mar 15, 2013, 4:23:56 PM
to Robin Mjelle, gen...@soe.ucsc.edu
Hi Robin,

We don't have a track that consists of coordinates of promoters per se.
There are several tracks that contain data related to promoters (to
see a list, hit the "track search" button and look for the word
"promoter"). One track in particular that you might be interested in is
the ENCODE Integrated Regulation track:
http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeReg.

However, if you would like to retrieve a GFF file that contains the
regions just upstream of the genes in one of our gene tracks, I can tell
you how to do that. You will actually create a BED file first, which
can be converted to a GFF file using Galaxy (usegalaxy.org), a site that
is hosted by Penn State and works in conjunction with the Genome Browser.

Go to the Table Browser (http://genome.ucsc.edu/cgi-bin/hgTables) and
select the "Gene and Gene Prediction Tracks" group. Now select a gene
track to use. If you hit the "view table schema" button, you will see a
count of the genes in the table (you may want to choose a track with a
lower count, if you are not interested in seeing many isoforms for each
gene). You will also see a description of how the track was made. Once
you have selected a gene track, make sure "region: genome" is set, then
choose "output format: BED" and check the "send output to Galaxy" box.
Hit "get output." On the next page, choose to create one BED record per
gene using the "Upstream by <some number> bases" option, and hit "send
query to Galaxy."
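What "upstream" means depends on the strand: for a plus-strand gene the region lies before the gene start, and for a minus-strand gene it lies after the gene end. The sketch below shows that logic for a single BED6 line (chrom, start, end, name, score, strand). It is an illustration under the assumption that the Table Browser option works this way, not UCSC's actual code; the function name is hypothetical, and minus-strand regions are not clipped to the chromosome length because chromosome sizes are not available here.

```python
def upstream_bed_line(line: str, n: int) -> str:
    """Return a BED6 line covering n bases upstream of the gene in `line`.

    Coordinates are BED-style: 0-based start, exclusive end.
    Assumption: upstream of a '+' gene is [start - n, start),
    upstream of a '-' gene is [end, end + n).
    """
    f = line.rstrip("\n").split("\t")
    chrom, start, end, name, score, strand = f[0], int(f[1]), int(f[2]), f[3], f[4], f[5]
    if strand == "+":
        up_start, up_end = max(0, start - n), start  # clip at the chromosome start
    elif strand == "-":
        up_start, up_end = end, end + n  # not clipped to chromosome length
    else:
        raise ValueError(f"unknown strand: {strand!r}")
    return "\t".join([chrom, str(up_start), str(up_end), name, score, strand])


print(upstream_bed_line("chr1\t5000\t8000\tgeneA\t0\t+", 1000))  # chr1 4000 5000 geneA 0 +
print(upstream_bed_line("chr1\t5000\t8000\tgeneB\t0\t-", 1000))  # chr1 8000 9000 geneB 0 -
```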

On the Galaxy page, under "Convert Formats," choose the BED-to-GFF
converter. Select the BED file you just imported and hit "Execute."
Your data should be converted to GFF format.
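The main thing a BED-to-GFF conversion has to get right is the coordinate system: BED starts are 0-based with an exclusive end, while GFF is 1-based with an inclusive end, so only the start shifts by one. The sketch below converts a single BED line to a nine-column GFF line. It is not the Galaxy converter's code; the `source` and `feature` values and the GFF3-style `ID=` attribute are assumptions chosen for illustration. GFF3 expects `ID` values to be unique, so gene tracks with repeated names may need a different attribute.

```python
def bed_to_gff(line: str, source: str = "UCSC", feature: str = "promoter") -> str:
    """Convert one BED line (3 to 6+ columns) to a 9-column GFF line.

    BED: 0-based start, exclusive end.  GFF: 1-based start, inclusive end.
    The source, feature type and ID= attribute are illustrative assumptions.
    """
    f = line.rstrip("\n").split("\t")
    chrom, start, end = f[0], int(f[1]), int(f[2])
    name = f[3] if len(f) > 3 else "."
    score = f[4] if len(f) > 4 else "."
    strand = f[5] if len(f) > 5 else "."
    attrs = f"ID={name}" if name != "." else "."
    # Only the start changes: start + 1; the exclusive BED end equals the inclusive GFF end.
    return "\t".join([chrom, source, feature, str(start + 1), str(end),
                      score, strand, ".", attrs])


print(bed_to_gff("chr1\t4000\t5000\tgeneA\t0\t+"))
# chr1  UCSC  promoter  4001  5000  0  +  .  ID=geneA  (tab-separated)
```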

If you have any questions about how to use the Galaxy website, please
direct them to the Galaxy team. If you have further questions for UCSC,
please contact us again at gen...@soe.ucsc.edu.

--
Brooke Rhead
UCSC Genome Bioinformatics Group