big format for narrowPeak

Daofeng Li

Mar 28, 2014, 11:16:07 AM
to gen...@soe.ucsc.edu
Dear UCSC mailing list,

I need to add some narrowPeak tracks to our remote hub.
I usually load narrowPeak tracks into the database when viewing them locally.
For viewing narrowPeak data in a remote hub, which format should I use when specifying the bigDataUrl property?

Thanks.

Daofeng

Jonathan Casper

Mar 28, 2014, 5:31:59 PM
to Daofeng Li, gen...@soe.ucsc.edu

Hello Daofeng,

Thank you for your question about viewing narrowPeak data in a track hub. narrowPeak data is stored in BED 6+4 format (see http://genome.ucsc.edu/FAQ/FAQformat.html#format12). You can add narrowPeak data to a track hub by converting it into a bigBed file and then specifying the URL of that bigBed file in the bigDataUrl property.

To convert a narrowPeak data file into bigBed format, use the bedToBigBed utility as described in Example Three on http://genome.ucsc.edu/goldenPath/help/bigBed.html. Example Three is important because it shows how to convert a file that uses additional non-standard BED fields. The narrowPeak format uses 4 additional non-standard fields. For the bedToBigBed utility to convert your data, you will need to provide a .as (AutoSQL) file that describes these fields. You can find a .as file describing the narrowPeak format in the Kent source tree, at src/hg/lib/encode/narrowPeak.as. This file is also available from our online repository at http://genome-source.cse.ucsc.edu/gitweb/?p=kent.git;a=tree. Use this file as described in Example Three, and you should be able to create a bigBed file that contains all of your narrowPeak data.
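As a rough illustration of the conversion described above, the following Python sketch assembles a bedToBigBed command line for a BED 6+4 file. The file names are hypothetical placeholders, and the sketch only builds and prints the command rather than running it. It assumes the input has already been sorted by chromosome and start position, which bedToBigBed requires.

```python
import shlex


def narrowpeak_to_bigbed_cmd(peaks, chrom_sizes, as_file, out):
    """Build the bedToBigBed argument list for a narrowPeak (BED 6+4) file."""
    return [
        "bedToBigBed",
        f"-as={as_file}",  # AutoSQL file describing the 4 extra fields
        "-type=bed6+4",    # 6 standard BED columns plus 4 narrowPeak columns
        peaks,             # input peaks, sorted by chromosome then start
        chrom_sizes,       # two-column file: chromosome name, size
        out,               # bigBed file to create
    ]


# Hypothetical file names, for illustration only.
cmd = narrowpeak_to_bigbed_cmd(
    "peaks.sorted.narrowPeak", "hg19.chrom.sizes", "narrowPeak.as", "peaks.bb"
)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run once the paths point at real files; the URL of the generated bigBed file is then what goes into the bigDataUrl property.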

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. Questions sent to that address will be archived in a publicly-accessible forum for the benefit of other users. If your question contains sensitive data, you may send it instead to genom...@soe.ucsc.edu.

--
Jonathan Casper
UCSC Genome Bioinformatics Group





Daofeng Li

Apr 2, 2014, 4:17:44 PM
to Jonathan Casper, gen...@soe.ucsc.edu
Hi Jonathan,

Thank you very much for your reply.
One more question: for broadPeak and gappedPeak, should I also use the bigBed format, just with a different .as file?
Is that correct?

Thank you again.

Daofeng

Steve Heitner

Apr 2, 2014, 4:51:39 PM
to Daofeng Li, Jonathan Casper, gen...@soe.ucsc.edu

Hello, Daofeng.

Yes, this is correct.  You can find the broadPeak and gappedPeak descriptions just beneath the narrowPeak description (http://genome.ucsc.edu/FAQ/FAQformat.html#format13 and http://genome.ucsc.edu/FAQ/FAQformat.html#format14) and the .as files for broadPeak and gappedPeak are also in the same location as the narrowPeak.as file (http://genome-source.cse.ucsc.edu/gitweb/?p=kent.git;a=tree;f=src/hg/lib/encode).
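To summarize how the three peak formats differ at conversion time, here is a small Python sketch that maps each format to its bedToBigBed options. The column counts follow the format descriptions linked above (narrowPeak is BED 6+4, broadPeak is BED 6+3, gappedPeak is BED 12+3). The helper name and the assumption that each .as file sits in the working directory are illustrative only.

```python
# (standard BED columns, extra columns) for each ENCODE peak format.
PEAK_LAYOUT = {
    "narrowPeak": (6, 4),
    "broadPeak": (6, 3),
    "gappedPeak": (12, 3),
}


def bigbed_options(fmt):
    """Return the -as and -type options for one peak format.

    Assumes a local copy of <fmt>.as taken from src/hg/lib/encode.
    """
    std, extra = PEAK_LAYOUT[fmt]
    return [f"-as={fmt}.as", f"-type=bed{std}+{extra}"]


for fmt in PEAK_LAYOUT:
    print(fmt, " ".join(bigbed_options(fmt)))
```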

Please contact us again at gen...@soe.ucsc.edu if you have any further questions. 
Questions sent to that address will be archived in a publicly-accessible forum for the benefit of other users.  If your question contains sensitive data, you may send it instead to genom...@soe.ucsc.edu.

---
Steve Heitner
UCSC Genome Bioinformatics Group


Daofeng Li

Apr 2, 2014, 4:59:32 PM
to st...@soe.ucsc.edu, Jonathan Casper, gen...@soe.ucsc.edu
Hi Steve,

Thank you very much for your reply.
Yes, I found them and they worked perfectly.

However, I encountered a problem when converting gappedPeak to bigBed. I am not sure whether it is a problem with my file format, or how to fix this error.

bedToBigBed -as=/home/dli/gappedPeak.as -type=bed12+3 y /home/dli/hg19.size y.bigBed

pass1 - making usageList (24 chroms): 41 millis

Error line 1 of y: BED blocks must span chromStart to chromEnd.  BED chromStarts[0] = 18, must be 0 so that (chromStart + chromStarts[0]) equals chromStart.


Here are some lines of my file y in gappedPeak format:

chr1    723974  727047  Rank_1116       22      .       723992  727038  0       5       584,508,241,298,352     18,1052,1743,2377,2712  2.03076 4.32095 2.27921
chr1    814267  818069  Rank_15955      8       .       816761  817048  0       1       287     2494    2.03690 2.45937 0.85795
chr1    822752  826312  Rank_11921      9       .       824956  825264  0       1       308     2204    1.84550 2.63678 0.98213
chr1    831923  834034  Rank_4006       15      .       833008  833987  0       2       572,325 1085,1739       2.59614 3.36433 1.56091
chr1    927046  928904  Rank_48625      4       .       927208  927578  0       1       370     162     1.91915 1.88345 0.48029
chr1    1033775 1036039 Rank_42052      5       .       1034184 1034424 0       1       240     409     1.93991 1.96547 0.52877


Thanks in advance for any response.


Daofeng

Steve Heitner

Apr 2, 2014, 6:20:48 PM
to Daofeng Li, Jonathan Casper, gen...@soe.ucsc.edu

Hello, Daofeng.

The problem here is that column 12 of each line in your file (the block starts, which the error message calls chromStarts) must begin with 0. In other words, the first exon of each item in your track, whether coding or noncoding, must start at the very beginning of the item. If it does not, bedToBigBed will report an error. If you treat the first 12 columns of your data as a BED 12 custom track and attempt to load it at http://genome.ucsc.edu/cgi-bin/hgCustom, you will receive the same error.
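The rule about column 12 can be checked before running the conversion. The Python sketch below reads the first block start from a BED 12 line and reports whether it is 0. The function name is hypothetical, and the sample is the first line quoted earlier in this thread.

```python
def first_block_offset(line):
    """Return the first value in column 12 (block starts) of a BED 12 line."""
    fields = line.split()
    # Column 12 is a comma-separated list; a trailing comma is allowed.
    starts = [int(s) for s in fields[11].rstrip(",").split(",")]
    return starts[0]


sample = (
    "chr1 723974 727047 Rank_1116 22 . 723992 727038 0 5 "
    "584,508,241,298,352 18,1052,1743,2377,2712 2.03076 4.32095 2.27921"
)

offset = first_block_offset(sample)
if offset != 0:
    print(f"first block starts at offset {offset}, but it must be 0")
```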


Daofeng Li

Apr 2, 2014, 8:05:54 PM
to Steve Heitner, Jonathan Casper, gen...@soe.ucsc.edu
Thank you Steve.

Actually, I tried replacing the first number in column 12 with 0 and converting to bigBed again.
Another error occurred:

pass1 - making usageList (24 chroms): 34 millis
Error line 1 of y: BED blocks must span chromStart to chromEnd.  (chromStart + chromStarts[last] + blockSizes[last]) must equal chromEnd.

Is there a way to ignore these errors and generate the bigBed files?

Thanks.

Daofeng

Matthew Speir

Apr 3, 2014, 6:18:45 PM
to Daofeng Li, Steve Heitner, Jonathan Casper, gen...@soe.ucsc.edu
Hello Daofeng,

Thank you for your question about these bedToBigBed errors. Unfortunately, it is not possible to skip over these error messages and generate your bigBed file. You shouldn't ignore these errors anyway, as they often indicate that something is seriously wrong with your file. The reason you are seeing this most recent error is that the 'exon' sizes, 'exon' starts, and 'gene' size do not match up the way they should. If we take the first sample line that you provided:


    chr1    723974  727047  Rank_1116       22      .       723992  727038  0       5       584,508,241,298,352     18,1052,1743,2377,2712  2.03076 4.32095 2.27921

We can see that the transcript length is 727047 - 723974 = 3073, based on the chromosome start and end coordinates. The last 'exon' begins at position 2712 in the 'gene', so its length should be 361 rather than 352, since 3073 - 2712 = 361. You may want to check your file for other similar errors.
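The arithmetic above can be reproduced for any line with a short Python sketch. It computes the item length, the last block size listed in the file, and the last block size that would make the blocks end exactly at chromEnd. The function name is hypothetical; the sample is the line quoted above.

```python
def last_block_check(line):
    """Return (item length, listed last block size, required last block size)."""
    f = line.split()
    chrom_start, chrom_end = int(f[1]), int(f[2])
    sizes = [int(x) for x in f[10].rstrip(",").split(",")]   # column 11
    starts = [int(x) for x in f[11].rstrip(",").split(",")]  # column 12
    span = chrom_end - chrom_start
    # The last block must end at chromEnd:
    # chromStart + starts[-1] + sizes[-1] == chromEnd
    required = span - starts[-1]
    return span, sizes[-1], required


sample = (
    "chr1 723974 727047 Rank_1116 22 . 723992 727038 0 5 "
    "584,508,241,298,352 18,1052,1743,2377,2712 2.03076 4.32095 2.27921"
)

span, listed, required = last_block_check(sample)
print(span, listed, required)  # 3073 352 361
```

Running the same check over every line of a file is one way to find the other similar errors mentioned above before calling bedToBigBed.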

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Matthew Speir
UCSC Genome Bioinformatics Group


Daofeng Li

Apr 4, 2014, 12:32:10 AM
to Matthew Speir, Steve Heitner, Jonathan Casper, gen...@soe.ucsc.edu
It's very helpful, thank you Matthew.

Daofeng