bed vs narrowPeak format for custom tracks


Eric Van Nostrand

May 31, 2016, 5:01:01 PM
to Cricket Alicia Sloan, gen...@soe.ucsc.edu

Hi,

I wanted to email to follow up on Cricket's discussion with you guys about bed format headers. We're currently working with the ENCODE DCC to make some files in the narrowPeak bed format available for public use. We're having issues because the narrowPeak format appears to explicitly require a header in order to load properly as a custom track on the UCSC browser, but as Cricket notes:

A narrowPeak BED file is just a BED file that has specified columns.  The track headers that you want to prepend to the .bed.narrowpeak are commands to the browser, and not part of the BED definition.   Those commands to the browser are saying basically, turn on the narrowPeak features.  You could give the same files and say “turn on the BED6 features.”   So here is the core of the problem: A BED file is a file format.  A customTrack is a list of commands to the UCSC browser that can have BED files embedded in it.  UCSC documentation has generally confused these two concepts.  

I am currently trying to convince the UCSC genome browser that they need to relax the BED format now and tolerate headers, but they are giving resistance because they do not see the use case.  If you would like to join the fray, please email your confusion about narrowPeaks to genome.soe.ucsc.edu.  It might help me to be convincing. 


Here's my problem:

If you download the input-normalized narrowPeak format peak file ENCFF735HJV from eCLIP experiment https://www.encodeproject.org/experiments/ENCSR987FTF/ and try to upload it to UCSC, you get the following error:

"    Error File 'ENCFF735HJV.bed' - Error line 1 of custom track: thickStart out of range (chromStart to chromEnd, or 0 if no CDS) "

This is because in standard .bed+6 format (i.e., what UCSC reads the file as if there is no header), column 7 is:

thickStart - The starting position at which the feature is drawn thickly (for example, the start codon in gene displays). When there is no thick part, thickStart and thickEnd are usually set to the chromStart position.

However, in official UCSC narrowPeak format (which I believe was defined by UCSC and ENCODE 2, http://genome.ucsc.edu/FAQ/FAQformat.html#format12), column 7 is:

signalValue - Measurement of overall (usually, average) enrichment for the region.

This breaks because, with no header line to define the file as narrowPeak format, it is read as a standard .bed+6 file, and in that format thickStart is not allowed to be negative (there's obviously no meaning to a negative chromosome position). For ChIP-seq peaks, maybe this worked for you guys because your peak calls happened to always have positive signalValues. For the eCLIP peaks, though, because of the way the peak calling is done, there exist entries that are actually depleted relative to the paired input experiment (the initial clusters are called using a transcript-level normalization, and then input-normalized afterward).
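To make the mismatch concrete, here is a minimal Python sketch of the check that appears to trip the loader. The function name and the simplified rule are assumptions based only on the wording of the error message above; this is not the browser's actual validation code. It reads column 7 the way a plain BED parser would, as thickStart, and reports whether the value would be accepted. The example rows are made up and are not taken from ENCFF735HJV.

```python
def column7_valid_as_thick_start(line: str) -> bool:
    """Return True if column 7 would pass as a BED thickStart.

    Simplified rule, inferred from the error message: thickStart must lie
    between chromStart and chromEnd, or be 0.
    """
    fields = line.rstrip("\n").split("\t")
    chrom_start, chrom_end = int(fields[1]), int(fields[2])
    try:
        thick_start = int(fields[6])
    except ValueError:
        # A non-integer signalValue such as -1.25 cannot be a coordinate.
        return False
    return thick_start == 0 or chrom_start <= thick_start <= chrom_end


# Hypothetical narrowPeak rows: chrom, chromStart, chromEnd, name, score,
# strand, signalValue, pValue, qValue, peak.
depleted = "chr1\t1000\t1100\tpeak1\t0\t+\t-1.25\t3.2\t-1\t-1"
negative_int = "chr1\t1000\t1100\tpeak2\t0\t+\t-3\t3.2\t-1\t-1"

print(column7_valid_as_thick_start(depleted))      # False
print(column7_valid_as_thick_start(negative_int))  # False
```

A depleted peak with a negative signalValue fails this check, which matches the behavior described above when the file is loaded without a track line.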

For that example file, if you add the proper header line

track type=narrowPeak visibility=3 db=hg19 name="RBFOX2_HepG2_rep01" description="RBFOX2_HepG2_rep01 input-normalized peaks"

then the file is interpreted properly as narrowPeak format, and displays properly. You should be able to validate this behavior using any of the 'filetype: bed narrowPeak' files on the eCLIP pages (they're all submitted but not public yet)
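As a workaround on the user side, the track line can be prepended programmatically. The following is a minimal Python sketch; the function name and its parameters are hypothetical, and the only track-line settings used are the ones shown in the header above.

```python
def with_narrowpeak_track_line(peaks_text: str, name: str, description: str,
                               db: str = "hg19", visibility: int = 3) -> str:
    """Return the peak file contents with a narrowPeak track line on top."""
    header = (
        f"track type=narrowPeak visibility={visibility} db={db} "
        f'name="{name}" description="{description}"'
    )
    return header + "\n" + peaks_text


# Hypothetical single-row peak file, for illustration only.
peaks = "chr1\t1000\t1100\tpeak1\t0\t+\t-1.25\t3.2\t-1\t-1\n"
track = with_narrowpeak_track_line(
    peaks,
    name="RBFOX2_HepG2_rep01",
    description="RBFOX2_HepG2_rep01 input-normalized peaks",
)
print(track.splitlines()[0])
```

The resulting text can be pasted into the custom track box or saved to a file and uploaded, at the cost of the file no longer being a bare BED file.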


Do you have a recommended solution? We appear to need some way to define a .bed file as 'narrowPeak bed'; otherwise it doesn't get read properly.

Thanks!
-Eric

-- 
Eric Van Nostrand
Merck Fellow of the Damon Runyon Cancer Research Foundation
Yeo Lab / UCSD
2880 Torrey Pines Scenic Drive, La Jolla, CA 92037
(858)246-1491

Cricket Alicia Sloan

Jun 1, 2016, 8:58:23 PM
to Eric Van Nostrand, gen...@soe.ucsc.edu
Dear Eric,

I believe that the recommendation of the UCSC Genome Browser will be to have us store the files in bigBed format with embedded narrowPeak.as files.  In this way, you should be able to copy the url of the bigBed into the custom track box.  For example:


If the file ENCFF997ZOA in this experiment was made with a  narrowPeak.as file, then copying the url of that file https://www.encodeproject.org/files/ENCFF997ZOA/@@download/ENCFF997ZOA.bigBed into https://genome.ucsc.edu/cgi-bin/hgCustom should give you the narrow peak functionality that you desire.
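For reference, a bigBed with an embedded narrowPeak.as is typically built with UCSC's bedToBigBed tool, using its -type and -as options. The Python sketch below only assembles the command line and does not run it. The helper name and all file names are placeholders, and it assumes the input has already been sorted by chromosome and start position; it is not necessarily how the DCC produces its files.

```python
import shlex


def bedtobigbed_command(peaks: str, chrom_sizes: str, out_bb: str,
                        autosql: str = "narrowPeak.as") -> list[str]:
    """Build a bedToBigBed argument list for a narrowPeak (bed6+4) file."""
    return [
        "bedToBigBed",
        "-type=bed6+4",     # 6 standard BED columns plus 4 narrowPeak columns
        f"-as={autosql}",   # autoSql file describing the extra columns
        peaks,
        chrom_sizes,
        out_bb,
    ]


# Placeholder file names, for illustration only.
cmd = bedtobigbed_command("peaks.sorted.narrowPeak", "hg19.chrom.sizes", "peaks.bigBed")
print(shlex.join(cmd))
```

The list form can be passed to subprocess.run once the tool and input files are in place.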

However,  it looks like that solution is not going to work, because custom tracks does not seem to like our S3 url links.  So, I think what you would have to do is


That does work, but it does not satisfy the request, which is for a single file or link that can be dropped right into custom tracks without the user needing to understand track lines.

Maybe our colleagues at the UCSC Browser have another idea for simplifying the transition from bed file or bigBed file into custom tracks.  


I hope this helps further clarify,

Cricket



Cricket Alicia Sloan
Lead Data Wrangler
ENCODE DCC
Stanford University
3165 Porter Dr
Palo Alto, CA 94304 
cric...@stanford.edu
Skype: cricket_sloan 

Luvina Guruvadoo

Jun 8, 2016, 3:05:52 PM
to Cricket Alicia Sloan, Eric Van Nostrand, gen...@soe.ucsc.edu
Hello Eric and Cricket,

We have implemented this feature and it is scheduled to go out with our next release on June 21st. You will be able to paste your link into the Custom Tracks box without the track line. Thanks again for bringing this to our attention.

If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Regards,
Luvina

--
Luvina Guruvadoo
UCSC Genome Browser

http://genome.ucsc.edu






Cricket Alicia Sloan

Jun 9, 2016, 11:44:12 AM
to Luvina Guruvadoo, Eric Van Nostrand, gen...@soe.ucsc.edu

Luvina,


Thank you for the fast turnaround.  Keep up the excellent work.

Cricket


