Question about ENCODE tracks on genome browser


Brian Lee

Jan 29, 2015, 2:20:43 PM
to gen...@soe.ucsc.edu
Hi,

I had a general question about the DNase-hypersensitivity tracks on the UCSC genome browser. With just the "ENCODE regulation" track selected, the DHSs for 125 cell types are shown, as well as ChIP-seq for transcription factors. I had assumed that this was a comprehensive summary of all the DHSs and ChIP peaks seen at a given genomic region. However, under the "DNase/FAIRE" track, there is a DNase "master list" that shows additional DHSs that do not show up in the default DNase track. I was wondering what the difference is, i.e., why are there sites in the master list that are not included in the default track, which presumably shows data on all 125 cell types?

Specifically, I'm interested in lymphoblast regulatory elements at a particular locus, and the "master list" shows an additional DHS that I had never seen before (it was detected in only one of many lymphoblast cell lines). It also overlaps H3K4me1 and FAIRE-seq peaks in lymphoblasts, so I was confused about why it doesn't show up in the "ENCODE regulation" track.

Thanks!

Brian Lee

Jan 29, 2015, 2:21:31 PM
to gen...@soe.ucsc.edu
Dear Sir,

Thank you for your question about the differences between the DNaseI Hypersensitivity Clusters and DNaseI Hypersensitive Site Master List tracks in the UCSC Genome Browser.

Below are links to the Track Description pages for each track where you can read more detailed information about these data:

Here is a session loading the two tracks around a gene, SIRT1:
http://genome.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=Brian%20Lee&hgS_otherUserSessionName=hg19.SIRT1.DNase

Your original thinking is correct: the clusters track provides another view of the master sites track. As the track descriptions explain, the clusters track is a version of the same data in which UCSC clustered the master site list with a program called regCluster and then filtered out low-scoring clusters (score < 100). Thus there will be some discrepancies between the two tracks: some master sites failed to cluster into groups that met the minimum score threshold and are therefore missing from the clusters track. If the clustering and filtering in the DNaseI Hypersensitivity Clusters track are undesirable, the master list offers a less processed view of the data.
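To see which master-list sites have no counterpart in the clusters track over a region of interest, one option is to export both tracks in BED format from the Table Browser and compare them by interval overlap. The Python below is a minimal sketch under that assumption: it calls a master site "missing" when no cluster overlaps it at all, which is only a proxy for how regCluster actually grouped sites. The function names are hypothetical, and only the first three BED columns are used.

```python
from bisect import bisect_left
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), BED-style half-open

def parse_bed(lines: Iterable[str]) -> List[Interval]:
    """Read chrom/start/end from BED lines, skipping blank, comment and header lines."""
    out: List[Interval] = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        out.append((chrom, int(start), int(end)))
    return out

def sites_without_cluster(sites: List[Interval], clusters: List[Interval]) -> List[Interval]:
    """Return the sites that overlap no cluster (hypothetical helper)."""
    # Per chromosome: cluster starts in sorted order plus a running max of cluster ends.
    grouped: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in clusters:
        grouped[chrom].append((start, end))
    index: Dict[str, Tuple[List[int], List[int]]] = {}
    for chrom, ivs in grouped.items():
        ivs.sort()
        starts = [s for s, _ in ivs]
        running_max, best = [], -1
        for _, e in ivs:
            best = max(best, e)
            running_max.append(best)
        index[chrom] = (starts, running_max)

    missing: List[Interval] = []
    for chrom, start, end in sites:
        if chrom not in index:
            missing.append((chrom, start, end))
            continue
        starts, running_max = index[chrom]
        k = bisect_left(starts, end)  # clusters[0:k] begin before this site ends
        # Overlap exists iff one of those clusters also ends after the site starts.
        if k == 0 or running_max[k - 1] <= start:
            missing.append((chrom, start, end))
    return missing

# Toy coordinates for illustration only; real input would be the exported BED files.
sites = parse_bed(["chr1\t100\t200", "chr1\t500\t600", "chr2\t10\t20"])
clusters = parse_bed(["chr1\t150\t250"])
print(sites_without_cluster(sites, clusters))
# [('chr1', 500, 600), ('chr2', 10, 20)]
```

Sorting cluster starts and keeping a running maximum of ends lets each site be checked with one binary search instead of scanning every cluster on the chromosome.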

You can also view the DNaseI Hypersensitivity Uniform Peaks tracks, which sit one level below these two in the degree of data processing: http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeAwgDnaseUniform

Also, in case you have previously viewed the wgEncodeRegDnaseClustered track: under Release Notes in its description you will find an explanation that this is the third version of the track. There was a previous V2 version, now located only on genome-preview (http://genome-preview.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeRegDnaseClusteredV2), in which the clustering program limited clusters to those that had at least two cell types.
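The two filtering rules mentioned here can be compared on toy data. The sketch below assumes a hypothetical cluster record carrying a score and a count of contributing cell types; it models only the score < 100 cutoff of the current version and the two-cell-type requirement of V2, not any other part of regCluster. A DHS seen in a single cell line, like the lymphoblast site in the question, would always be dropped by the V2 rule, but can survive the current rule if its score is high enough.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Cluster:
    """Hypothetical cluster record; field names are illustrative only."""
    name: str
    score: int       # cluster score as shown in the track
    cell_types: int  # number of cell types contributing to the cluster

MIN_SCORE = 100      # current version: clusters scoring below 100 are filtered out
MIN_CELL_TYPES = 2   # V2: clusters needed at least two cell types

def kept_current(clusters: List[Cluster]) -> List[str]:
    """Names of clusters passing the score threshold."""
    return [c.name for c in clusters if c.score >= MIN_SCORE]

def kept_v2(clusters: List[Cluster]) -> List[str]:
    """Names of clusters passing the two-cell-type requirement."""
    return [c.name for c in clusters if c.cell_types >= MIN_CELL_TYPES]

# Toy values, not real track data.
toy = [Cluster("single_cell_type", 500, 1),
       Cluster("low_score", 80, 3),
       Cluster("shared", 300, 4)]
print(kept_current(toy))  # ['single_cell_type', 'shared']
print(kept_v2(toy))       # ['low_score', 'shared']
```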

Thank you again for your inquiry and using the UCSC Genome Browser. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

All the best,

Brian Lee
UCSC Genome Bioinformatics Group