big format for narrowPeak

Daofeng Li

Mar 28, 2014, 11:16:07 AM

to gen...@soe.ucsc.edu

Dear UCSC mailing list,

I need to add some narrowPeak tracks to our remote hub.

I usually load narrowPeak tracks into the database when viewing them locally.

To view narrowPeak data in a remote hub, which file format should the bigDataUrl property point to?

Thanks.

Daofeng

Jonathan Casper

Mar 28, 2014, 5:31:59 PM

to Daofeng Li, gen...@soe.ucsc.edu

Hello Daofeng,

Thank you for your question about viewing narrowPeak data in a track hub. narrowPeak data is stored in a BED 6+4 format (see http://genome.ucsc.edu/FAQ/FAQformat.html#format12). You can add narrowPeak data to a track hub by converting it into a bigBed file and then specifying the URL of that bigBed file in the bigDataUrl property.

To convert a narrowPeak data file into bigBed format, use the bedToBigBed utility as described in Example Three on http://genome.ucsc.edu/goldenPath/help/bigBed.html. Example Three is important because it shows how to convert a file that uses additional non-standard BED fields, and the narrowPeak format uses 4 such fields. For bedToBigBed to convert your data, you will need to provide a .as (AutoSQL) file that describes these fields. A .as file describing the narrowPeak format is in the Kent source tree at src/hg/lib/encode/narrowPeak.as. The source tree is also available from our online repository at http://genome-source.cse.ucsc.edu/gitweb/?p=kent.git;a=tree. Use this file as described in Example Three, and you should be able to create a bigBed file that contains all of your narrowPeak data.
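
As a rough illustration of the conversion step, here is a minimal Python sketch that assembles the bedToBigBed command line. The -as and -type options are the ones bedToBigBed accepts for extra fields, and bed6+4 follows from the BED 6+4 layout mentioned above. The file names (peaks.narrowPeak, hg19.chrom.sizes, peaks.bb) and the helper function name are hypothetical placeholders, not part of any UCSC tool.

```python
import shlex


def bed_to_bigbed_command(peak_file, chrom_sizes, out_file,
                          as_file="narrowPeak.as", bed_type="bed6+4"):
    """Build the bedToBigBed argument list (hypothetical helper).

    as_file is the AutoSQL description of the extra fields and
    bed_type tells bedToBigBed how many standard and extra columns
    to expect (6 standard + 4 extra for narrowPeak).
    """
    return [
        "bedToBigBed",
        f"-as={as_file}",
        f"-type={bed_type}",
        peak_file,
        chrom_sizes,
        out_file,
    ]


# Placeholder file names, assumed for illustration only.
cmd = bed_to_bigbed_command("peaks.narrowPeak", "hg19.chrom.sizes", "peaks.bb")
print(shlex.join(cmd))

# To actually run the conversion (requires bedToBigBed on your PATH and
# an input file sorted by chromosome and start position):
# import subprocess
# subprocess.run(cmd, check=True)
```

The resulting peaks.bb file would then be placed on a web server, and its URL used as the bigDataUrl value in the hub's track definition.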

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. Questions sent to that address will be archived in a publicly-accessible forum for the benefit of other users. If your question contains sensitive data, you may send it instead to genom...@soe.ucsc.edu.

--
Jonathan Casper
UCSC Genome Bioinformatics Group

--

Daofeng Li

Apr 2, 2014, 4:17:44 PM

to Jonathan Casper, gen...@soe.ucsc.edu

Hi Jonathan,

Thank you very much for your reply.

One more question: for broadPeak and gappedPeak, should I also use the bigBed format, but with a different .as file?

Is that correct?

Thank you again.

Daofeng

Steve Heitner

Apr 2, 2014, 4:51:39 PM

to Daofeng Li, Jonathan Casper, gen...@soe.ucsc.edu

Hello, Daofeng.

Yes, this is correct. You can find the broadPeak and gappedPeak descriptions just beneath the narrowPeak description (http://genome.ucsc.edu/FAQ/FAQformat.html#format13 and http://genome.ucsc.edu/FAQ/FAQformat.html#format14) and the .as files for broadPeak and gappedPeak are also in the same location as the narrowPeak.as file (http://genome-source.cse.ucsc.edu/gitweb/?p=kent.git;a=tree;f=src/hg/lib/encode).

Please contact us again at gen...@soe.ucsc.edu if you have any further questions. Questions sent to that address will be archived in a publicly-accessible forum for the benefit of other users. If your question contains sensitive data, you may send it instead to genom...@soe.ucsc.edu.

---
Steve Heitner
UCSC Genome Bioinformatics Group

--

Daofeng Li

Apr 2, 2014, 4:59:32 PM

to st...@soe.ucsc.edu, Jonathan Casper, gen...@soe.ucsc.edu

Hi Steve,

Thank you very much for your reply.

Yes, I found them and they worked perfectly.

However, I encountered a problem when converting gappedPeak to bigBed. I am not sure whether it is a problem with my file format, or how to fix the error.

bedToBigBed -as=/home/dli/gappedPeak.as -type=bed12+3 y /home/dli/hg19.size y.bigBed

pass1 - making usageList (24 chroms): 41 millis

Error line 1 of y: BED blocks must span chromStart to chromEnd. BED chromStarts[0] = 18, must be 0 so that (chromStart + chromStarts[0]) equals chromStart.

Here are some lines of my file y in gappedPeak format:

chr1 723974 727047 Rank_1116 22 . 723992 727038 0 5 584,508,241,298,352 18,1052,1743,2377,2712 2.03076 4.32095 2.27921

chr1 814267 818069 Rank_15955 8 . 816761 817048 0 1 287 2494 2.03690 2.45937 0.85795

chr1 822752 826312 Rank_11921 9 . 824956 825264 0 1 308 2204 1.84550 2.63678 0.98213

chr1 831923 834034 Rank_4006 15 . 833008 833987 0 2 572,325 1085,1739 2.59614 3.36433 1.56091

chr1 927046 928904 Rank_48625 4 . 927208 927578 0 1 370 162 1.91915 1.88345 0.48029

chr1 1033775 1036039 Rank_42052 5 . 1034184 1034424 0 1 240 409 1.93991 1.96547 0.52877

Thanks in advance for any response.

Daofeng

Steve Heitner

Apr 2, 2014, 6:20:48 PM

to Daofeng Li, Jonathan Casper, gen...@soe.ucsc.edu

Hello, Daofeng.

The problem here is that column 12 of each line in your file should begin with 0, meaning that the first exon of each item in your track, whether coding or noncoding, should start at the very beginning of the item. If it does not, it will cause an error. If you treat the first 12 columns of your data as a BED 12 custom track and attempt to load it at http://genome.ucsc.edu/cgi-bin/hgCustom, you will receive the same error.
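
To find every affected line before running bedToBigBed, a small check along these lines may help. This is only a sketch: it assumes whitespace-separated BED12-style lines, and the function name is hypothetical. It reports each line whose first value in column 12 is not 0, using two of the sample lines from the message above.

```python
def first_block_offset_errors(lines):
    """Yield (line_number, offset) for lines whose first block start is not 0."""
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if len(fields) < 12:
            continue  # not a BED12-style line; skipped in this sketch
        # Column 12 (index 11) holds comma-separated block starts,
        # possibly with a trailing comma.
        starts = [int(s) for s in fields[11].rstrip(",").split(",")]
        if starts[0] != 0:
            yield number, starts[0]


sample = [
    "chr1 723974 727047 Rank_1116 22 . 723992 727038 0 5 "
    "584,508,241,298,352 18,1052,1743,2377,2712 2.03076 4.32095 2.27921",
    "chr1 814267 818069 Rank_15955 8 . 816761 817048 0 1 "
    "287 2494 2.03690 2.45937 0.85795",
]

errors = list(first_block_offset_errors(sample))
print(errors)  # [(1, 18), (2, 2494)]
```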

Daofeng Li

Apr 2, 2014, 8:05:54 PM

to Steve Heitner, Jonathan Casper, gen...@soe.ucsc.edu

Thank you Steve.

I replaced the first number in column 12 with 0 and tried the conversion to bigBed again.

Now a different error occurs:

pass1 - making usageList (24 chroms): 34 millis

Error line 1 of y: BED blocks must span chromStart to chromEnd. (chromStart + chromStarts[last] + blockSizes[last]) must equal chromEnd.

Is there a way to ignore these errors and generate the bigBed files?

Thanks.

Daofeng

Matthew Speir

Apr 3, 2014, 6:18:45 PM

to Daofeng Li, Steve Heitner, Jonathan Casper, gen...@soe.ucsc.edu

Hello Daofeng,

Thank you for your question about these bedToBigBed errors. Unfortunately, it is not possible to skip over these errors and still generate your bigBed file. You shouldn't ignore them anyway, as they often indicate that something is seriously wrong with your file. You are seeing this most recent error because the 'exon' sizes, 'exon' starts, and 'gene' size do not match up the way they should. If we take the first sample line that you provided:

chr1 723974 727047 Rank_1116 22 . 723992 727038 0 5 584,508,241,298,352 18,1052,1743,2377,2712 2.03076 4.32095 2.27921

We can see that the transcript length is 727047 - 723974 = 3073, based on the chromosome start and end coordinates. The last 'exon' begins at position 2712 in the 'gene', so its length should be 3073 - 2712 = 361 rather than 352. You may want to check your file for other similar errors.
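
The same arithmetic can be written as a short Python sketch. The function names are hypothetical, and the block starts below assume the first offset has already been changed from 18 to 0, as in your second attempt. Whether lengthening the last block is the right correction for your data is something only you can judge; the code just shows what the check requires.

```python
def blocks_span_item(chrom_start, chrom_end, block_sizes, block_starts):
    """True if the first block starts at chromStart and the last ends at chromEnd."""
    return (
        block_starts[0] == 0
        and chrom_start + block_starts[-1] + block_sizes[-1] == chrom_end
    )


def last_block_size_needed(chrom_start, chrom_end, block_starts):
    """Size the last block must have so that it ends exactly at chromEnd."""
    return (chrom_end - chrom_start) - block_starts[-1]


# Values from the first sample line, with the first block start set to 0.
chrom_start, chrom_end = 723974, 727047
sizes = [584, 508, 241, 298, 352]
starts = [0, 1052, 1743, 2377, 2712]

print(blocks_span_item(chrom_start, chrom_end, sizes, starts))  # False
print(last_block_size_needed(chrom_start, chrom_end, starts))   # 361
```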

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Matthew Speir
UCSC Genome Bioinformatics Group

--

Daofeng Li

Apr 4, 2014, 12:32:10 AM

to Matthew Speir, Steve Heitner, Jonathan Casper, gen...@soe.ucsc.edu

It's very helpful, thank you Matthew.

Daofeng
