Hi,
I wanted to follow up on Cricket's discussion with you guys
about BED format headers. We're currently working with the ENCODE
DCC to make some files in the narrowPeak BED format available for
public use, and we're running into a problem: the narrowPeak format
appears to explicitly require a header in order to load properly
as a custom track on the UCSC browser. As Cricket notes:
Here's my problem:
If you download the input-normalized narrowPeak format peak
file ENCFF735HJV from eCLIP experiment
https://www.encodeproject.org/experiments/ENCSR987FTF/ , and
try to upload it to UCSC, you get the following error:
"Error File 'ENCFF735HJV.bed' - Error line 1 of
custom track: thickStart out of range (chromStart to chromEnd, or
0 if no CDS)"
This is because in standard .bed+6 format (i.e., what UCSC reads the file as if there is no header), column 7 is:
thickStart - The starting position at which the feature is drawn thickly (for example, the start codon in gene displays). When there is no thick part, thickStart and thickEnd are usually set to the chromStart position.
However, in official UCSC narrowPeak format (which I
believe was defined by UCSC and ENCODE 2,
http://genome.ucsc.edu/FAQ/FAQformat.html#format12), column
7 is:
signalValue - Measurement of overall
(usually, average) enrichment for the region.
This breaks because, with no header line to define the
files as narrowPeak format, they are read as standard .bed+6 files,
and in that format thickStart is not allowed to be negative (as
there's obviously no meaning to a negative chromosome position).
For ChIP-seq peaks, maybe this worked for you guys because your
peak calls happened to always have positive signalValues, but for
the eCLIP peaks, because of the way the peak calling is done, some
entries are actually depleted relative to the paired input
experiment (the initial clusters are called using a
transcript-level normalization, and then input normalized after).
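
To see whether a given file is affected, one could scan column 7 for negative values before uploading. The following is only a rough sketch, not anything UCSC or ENCODE provides: the function name negative_column7_rows and the sample rows are made up for illustration, and it assumes tab-separated fields with any track/browser/comment lines skipped.

```python
def negative_column7_rows(lines):
    """Yield (line_number, value) for data rows whose 7th column is negative.

    Hypothetical helper: in narrowPeak this column is signalValue, but a
    header-less file is read as plain BED, where column 7 is thickStart
    and a negative number is rejected.
    """
    for line_number, line in enumerate(lines, start=1):
        stripped = line.strip()
        # Skip blank lines and track/browser/comment lines.
        if not stripped or stripped.startswith(("track", "browser", "#")):
            continue
        fields = stripped.split("\t")
        if len(fields) < 7:
            continue
        if float(fields[6]) < 0:
            yield line_number, fields[6]


# Made-up example rows (not taken from ENCFF735HJV).
sample = [
    "chr1\t100\t200\tpeak_a\t1000\t+\t3.2\t5.0\t-1\t-1\n",
    "chr1\t300\t400\tpeak_b\t200\t-\t-1.7\t0.4\t-1\t-1\n",
]

for line_number, value in negative_column7_rows(sample):
    print(f"line {line_number}: column 7 = {value}")
```

On a real file the same function can be fed an open file handle, e.g. `with open("ENCFF735HJV.bed") as fh: ...`.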
For that example file, if you add the proper header line
track type=narrowPeak visibility=3 db=hg19 name="RBFOX2_HepG2_rep01" description="RBFOX2_HepG2_rep01 input-normalized peaks"
then the file is interpreted properly as narrowPeak format, and displays properly. You should be able to validate this behavior using any of the 'filetype: bed narrowPeak' files on the eCLIP pages (they're all submitted but not public yet).
--
Eric Van Nostrand
Merck Fellow of the Damon Runyon Cancer Research Foundation
Yeo Lab / UCSD
2880 Torrey Pines Scenic Drive, La Jolla, CA 92037
(858)246-1491