ENCODE ChIPseq data / background

Alessandro Testori

Oct 18, 2016, 12:24:39 PM

to gen...@soe.ucsc.edu

Hello.

Should I subtract some background from the ENCODE ChIP-seq files hosted at UCSC (https://genome.ucsc.edu/ENCODE/dataMatrix/encodeDataMatrixHuman.html)? And what about DNase data?

Please let me know.

With kind regards.

Alessandro

Alessandro Testori

Oct 25, 2016, 11:09:00 AM

to gen...@soe.ucsc.edu

Hello. I am interested in DNase and ChIP-seq signals from the ENCODE tracks at UCSC. For each experiment several types of data are available: Peaks, Hotspots, Raw Signals, Density Signal, Overlap Signal. I thought that choosing Peaks was a good idea, but in several cases they seem too different from the other types of data. Any suggestions? What about input files (background)? Please let me know. Cheers.

Brian Lee

Oct 25, 2016, 1:19:35 PM

to Alessandro Testori, gen...@soe.ucsc.edu

Dear Alessandro,

Thank you for using the UCSC Genome Browser and your question about different data types for different ENCODE tracks at UCSC.

You will want to review the Track Description pages for the different experiments to find a Methods section that outlines how the data was produced. In summary, processing goes from the raw reads to a signal file, and then to a final summarized peak file. The Methods section should clarify what was done at each step, and whether an input was used.

Once you select the data you are interested in from the matrix and see a display of related track data, the link under "Track Name" opens a pop-up of the Track Description. Alternatively, when you are viewing track data in the browser, you can click an item in the track to see the Track Description.

For quicker access, here are the groupings of the DNase and TFBS ChIP-Seq super tracks:

http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeDNAseSuper

http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeTfBindingSuper

This super track grouping collects related experiments and provides links to further composites representing different experiments. From the DNase super track, if you click the "UW DNaseI HS" link, you will see the ENCODE data from the University of Washington. Its Methods section describes which algorithms were used, and its Credits section lists a contact in case you want to ask specific questions about this data. The Track Description page also has a "Downloads" link that lists the files available for download with descriptive information. From there, a "files.txt" link collects the metadata for every file from this experiment in one location, with each line describing one file.
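To make the files.txt step more concrete, here is a minimal Python sketch for reading one of those lines. It assumes each line holds a file name, a tab, and then semicolon-separated key=value pairs, which is how these files have generally looked, but please check your own copy first. The function name and the example line (including its file name and values) are made up for illustration and are not taken from a real download.

```python
def parse_files_txt_line(line):
    """Split one files.txt line into (file_name, metadata dict).

    Assumed layout: <file name> TAB key=value; key=value; ...
    """
    file_name, _, meta_text = line.rstrip("\n").partition("\t")
    metadata = {}
    for pair in meta_text.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # skip empty fragments, e.g. after a trailing ';'
        key, _, value = pair.partition("=")
        metadata[key.strip()] = value.strip()
    return file_name, metadata


# Hypothetical line, invented for this example only.
example_line = (
    "exampleHotspotsRep1.broadPeak.gz\t"
    "lab=UW; dataType=DnaseSeq; view=Hotspots; cell=K562"
)

file_name, metadata = parse_files_txt_line(example_line)
print(file_name, metadata["view"], metadata["cell"])
```

With the metadata in a dictionary, you could, for instance, keep only the lines whose "view" matches the data type you decided to use after reading the Methods section.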

Similarly, from the TFBS ChIP-seq super track, if you click the "UW CTCF Binding" link, you will find similar Methods and Credits sections for this different experiment, and a column that lists the "Input" used for each cell type.

In essence, for whichever experiment you are looking at, you will want to consult the corresponding Track Description page. The Methods section describes how the data was created, the References section will likely list related papers, and the Credits section gives contacts if you want to send file-specific inquiries directly to the labs.

I also suggest that you consider using the current ENCODE portal at https://www.encodeproject.org/ where newer information, added since 2012, is available as well.

The current ENCODE portal has many benefits, such as being more directly connected with the labs as it accepts new data for the current phase of the project, and thus being more active in the curation of all metadata. You can search for the UCSC ENCODE accession of a file at UCSC (like wgEncodeEH000492) at the current ENCODE portal to easily find the corresponding match. For example, this link represents one of the UW DNase-seq files: https://www.encodeproject.org/experiments/ENCSR000EMT/. There you can find the same Track Description in the bottom "Documents" and "General Protocol" section, as well as a graphical representation of the metadata about the pipeline used to produce the data.
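If you want to script that accession lookup rather than type it into the search box, a small Python sketch could look like the one below. It only builds the URL and makes no network request. The "searchTerm" and "format=json" query parameters reflect my understanding of the portal's search interface and should be treated as assumptions to verify against the portal's own help pages; the function name is hypothetical.

```python
from urllib.parse import urlencode

PORTAL = "https://www.encodeproject.org"


def portal_search_url(term, as_json=False):
    """Build a search URL for the ENCODE portal (no request is sent).

    Assumption: the portal accepts 'searchTerm' and, for machine-readable
    results, 'format=json' as query parameters.
    """
    params = {"searchTerm": term}
    if as_json:
        params["format"] = "json"
    return f"{PORTAL}/search/?{urlencode(params)}"


# The UCSC accession mentioned above, used here as the search term.
print(portal_search_url("wgEncodeEH000492"))
```

Opening the printed URL in a web browser should show whichever portal entries match that UCSC accession.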

Thank you again for your inquiry and using the UCSC Genome Browser. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

All the best,

Brian Lee

UCSC Genomics Institute
