BED file can not be loaded into UCSC genome browser

425 views
Skip to first unread message

Tobias Kockmann

unread,
Jan 23, 2009, 10:59:49 AM1/23/09
to macs-ann...@googlegroups.com
Dear MACS users,

I am having problems viewing my called peaks using the BED output file and UCSC genome browser. After running MACS on my input data I get the following files:

-rw-r--r-- 1 tobiasko dsf-bioinf    386 Jan 23 14:31 CB-116_mfold15_diag.xls
-rw-r--r-- 1 tobiasko dsf-bioinf  37684 Jan 23 14:26 CB-116_mfold15_model.r
-rw-r--r-- 1 tobiasko dsf-bioinf  82232 Jan 23 14:31 CB-116_mfold15_negative_peaks.xls
-rw-r--r-- 1 tobiasko dsf-bioinf 148187 Jan 23 14:31 CB-116_mfold15_peaks.bed
-rw-r--r-- 1 tobiasko dsf-bioinf 363015 Jan 23 14:31 CB-116_mfold15_peaks.xls

Now I try to add a custom track by uploading CB-116_mfold15_peaks.bed to the browser:

Error File 'CB-116_test2_peaks.bed' - Unrecognized format line 1 of custom track: 2L 5512 6101 (note: chrom names are case sensitive)

The file head looks like this:

[tobiasko@bs-dsvr09 CB-116]$ head CB-116_mfold15_peaks.bed
2L      5466    6101
2L      66344   67552
2L      72020   72827
2L      73016   73477
2L      73821   74124
2L      87267   87922
2L      107811  108389
2L      120779  121155
2L      128998  129580
2L      131750  132203

Does anyone have an idea what goes wrong here???

Thx,
Tobi

Tao Liu

unread,
Jan 23, 2009, 11:19:53 AM1/23/09
to macs-ann...@googlegroups.com
Hi Tobias,

The reason is very simple. In your input files for MACS, the
chromosome names are like 2L,2R,3L,3R... but for UCSC genome browser,
they use chr2L,chr2R,chr3L,chr3R... You just need to add 'chr' at the
beginning of every lines. Try this in your terminal:

$ awk '{ print "chr"$0 }' CB-116_mfold15_peaks.bed >
new_CB-116_mfold15_peaks.bed

Also, if you want to load WIGGLE files into UCSC, you need to modify
wiggle files as well, for example:

$ zcat CB-116_treat_afterfiting_2L.wig.gz | awk
'{ sub("chrom=","chrom=chr");print $0 }' >
CB-116_treat_afterfiting_chr2L.wig

Regards,
Tao

Aengus Stewart

unread,
Jan 23, 2009, 12:29:34 PM1/23/09
to macs-ann...@googlegroups.com
I understand that MACS will scale the total tag count of the control to be the same as the ChIP
experiment.

However in the current case I have, I have a number of experiments with different conditions and
these have obviously produced different number of total tags of the sequencer.

When I run MACS on these, as I am using the same control in each case (control common to all) the
control will be scaled to a different value, however to compare across the conditions I would like
all the experiments as well as the control to have the same total tag value.

Would it be possible to have an option to do this, or is my thinking incorrect and I shouldnt do this?


Regards
Aengus


--
-----------------------------------------------------------------------
Aengus Stewart
Head of Bioinformatics and BioStatistics
Bioinformatics and BioStatistics Tel: +44 (0)20 7269 3679
Cancer Research UK, Lincoln's Inn Fields, Holborn, London, WC2A 3PX, UK
-----------------------------------------------------------------------

This electronic message contains information which may be privileged and
confidential. The information is intended to be for the use of the
individual(s) or entity named above. Be aware that any third party
disclosure, distribution, copying or use of this communication, without
prior permission, is strictly prohibited.

This communication is from Cancer Research UK. Our website is at www.cancerresearchuk.org. We are a charity registered under number 1089464 and a company limited by guarantee registered in England & Wales under number 4325234. Our registered address is 61 Lincoln's Inn Fields, London WC2A 3PX. Our central telephone number is 020 7242 0200.

This communication and any attachments contain information which is confidential and may also be privileged. It is for the exclusive use of the intended recipient(s). If you are not the intended recipient(s) please note that any form of disclosure, distribution, copying or use of this communication or the information in it or in any attachments is strictly prohibited and may be unlawful. If you have received this communication in error, please notify the sender and delete the email and destroy any copies of it.

E-mail communications cannot be guaranteed to be secure or error free, as information could be intercepted, corrupted, amended, lost, destroyed, arrive late or incomplete, or contain viruses. We do not accept liability for any such matters or their consequences. Anyone who communicates with us by e-mail is taken to accept the risks in doing so.

Tobias Kockmann

unread,
Jan 26, 2009, 4:33:41 AM1/26/09
to macs-ann...@googlegroups.com
Hi Tao,

Thanks for the code lines to modify the MACS bed output file. After adding
the "chr" to each line I got a second error message from UCSC browser:

Error File 'CB-116_mfold15_peaks.mod.bed' - Error line 2911 of custom track:
chromStart after chromEnd (-26 > 302)

It seems like MACS calculated a negative start position for a peak, which is
of course not wanted for a chromosome. Is this a known bug? I simply
corrected this value to 0 using vim.

Next ucsc complained about "chrUextra", which is by definition of course not
a chromosome, but part of the genome assembly that was used by ELAND to
generate the alignments. I simply removed the lines and now it works.

Error File 'CB-116_mfold15_peaks.mod.bed' - Error line 6379 of custom track:
chrUextr not a chromosome (note: chrom names are case sensitive)

Is there a list of "accepted" chromosome identifiers for the UCSC browser?

Thx,
Tobi

MACS maintainer

unread,
Jan 26, 2009, 10:32:15 AM1/26/09
to MACS announcement
Hi Tobi,

The negative start position is due to the tag shifting and extension.
For example, if you have a -1 strand tag at 0 position, then any
shifting and extension will make it go before 0. It's ok to modify any
negative position to 0.

To check acceptable chromosome names for fruitfly genome, go to UCSC
hgGateWay 'http://genome.ucsc.edu/cgi-bin/hgGateway', select the
species and assembly, then click the 'Sequences' link in the 'About
Assembly' line and you will see the list. In your case, I think the
correct chr name must be 'chrUextra' instead of 'chrUextr' (missing an
'a').

Hope it helps,
Tao


On Jan 26, 4:33 am, Tobias Kockmann <tobias.kockm...@bsse.ethz.ch>
wrote:
> Hi Tao,
>
> Thanks for the code lines to modify the MACS bed output file. After adding
> the "chr" to each line I got a second error message from UCSC browser:
>
> Error File 'CB-116_mfold15_peaks.mod.bed' - Error line 2911 of custom track:
> chromStart after chromEnd (-26 > 302)
>
> It seems like MACS calculated a negative start position for a peak, which is
> of course not wanted for a chromosome. Is this a known bug? I simply
> corrected this value to 0 using vim.
>
> Next ucsc complained about "chrUextra", which is by definition of course not
> a chromosome, but part of the genome assembly that was used by ELAND to
> generate the alignments. I simply removed the lines and now it works.
>
> Error File 'CB-116_mfold15_peaks.mod.bed' - Error line 6379 of custom track:
> chrUextr not a chromosome (note: chrom names are case sensitive)
>
> Is there a list of "accepted" chromosome identifiers for the UCSC browser?
>
> Thx,
> Tobi
>

Tao Liu

unread,
Feb 11, 2009, 11:54:10 AM2/11/09
to macs-ann...@googlegroups.com
Hi Aengus,

Currently, it's ok to use the same control, if they are from the same
sequencer and the same protocol. MACS will scale the control to the
total tag number of ChIP, but there is no option to scale the total
tag number of all conditions to a certain number.

In order to compare different conditions, you could use fold-change
values for each peaks. But, I think you want a function which can
normalize tag numbers from ChIP and control channel then apply some
model to give each nucleotide a score along the whole genome in order
to make a universal scoring system. Unfortunately, MACS doesn't have
such function now.

Best,
Tao

Tobias Kockmann

unread,
Mar 2, 2009, 12:07:37 PM3/2/09
to macs-ann...@googlegroups.com
Hi Tao,

After using the —wig option MACS generates gzipped wig files for every chromosome:

[tobiasko@bs-dsvr09 treat]$ ls -l *.wig.gz
-rw-r--r-- 1 tobiasko dsf-bioinf   55836 Mar  2 14:49 CB-116_mfold10_pvalue-10_treat_afterfiting_2LHet.wig.gz
-rw-r--r-- 1 tobiasko dsf-bioinf 4369156 Mar  2 14:49 CB-116_mfold10_pvalue-10_treat_afterfiting_2L.wig.gz
-rw-r--r-- 1 tobiasko dsf-bioinf  527156 Mar  2 14:49 CB-116_mfold10_pvalue-10_treat_afterfiting_2RHet.wig.gz
-rw-r--r-- 1 tobiasko dsf-bioinf 3933176 Mar  2 14:49 CB-116_mfold10_pvalue-10_treat_afterfiting_2R.wig.gz
-rw-r--r-- 1 tobiasko dsf-bioinf  408092 Mar  2 14:49 CB-116_mfold10_pvalue-10_treat_afterfiting_3LHet.wig.gz
-rw-r--r-- 1 tobiasko dsf-bioinf 4372700 Mar  2 14:49 CB-116_mfold10_pvalue-10_treat_afterfiting_3L.wig.gz
-rw-r--r-- 1 tobiasko dsf-bioinf  382923 Mar  2 14:50 CB-116_mfold10_pvalue-10_treat_afterfiting_3RHet.wig.gz
-rw-r--r-- 1 tobiasko dsf-bioinf 5234259 Mar  2 14:50 CB-116_mfold10_pvalue-10_treat_afterfiting_3R.wig.gz
-rw-r--r-- 1 tobiasko dsf-bioinf  219146 Mar  2 14:50 CB-116_mfold10_pvalue-10_treat_afterfiting_4.wig.gz
-rw-r--r-- 1 tobiasko dsf-bioinf    4528 Mar  2 14:50 CB-116_mfold10_pvalue-10_treat_afterfiting_dmel_mitochondrion_genome.wig.gz
-rw-r--r-- 1 tobiasko dsf-bioinf 3720628 Mar  2 14:50 CB-116_mfold10_pvalue-10_treat_afterfiting_Uextra.wig.gz
-rw-r--r-- 1 tobiasko dsf-bioinf 1061424 Mar  2 14:50 CB-116_mfold10_pvalue-10_treat_afterfiting_U.wig.gz
-rw-r--r-- 1 tobiasko dsf-bioinf   31916 Mar  2 14:50 CB-116_mfold10_pvalue-10_treat_afterfiting_XHet.wig.gz
-rw-r--r-- 1 tobiasko dsf-bioinf 2989635 Mar  2 14:50 CB-116_mfold10_pvalue-10_treat_afterfiting_X.wig.gz
-rw-r--r-- 1 tobiasko dsf-bioinf   10839 Mar  2 14:50 CB-116_mfold10_pvalue-10_treat_afterfiting_YHet.wig.gz

Is there an easy way to merge all the wig files? I would like to have one wig file containing all chromosomes, but only one track. Of course one could concatenate all the files (cat *.wig > merge.wig), but then you should end up with one track per chromosome in the UCSC browser, right?

Thx,
Tobi

Tao Liu

unread,
Mar 2, 2009, 12:35:49 PM3/2/09
to macs-ann...@googlegroups.com
Hi Tobi,

Yes. You are right. If you concatenate them together, there will be a
problem to load the merged file directly to UCSC browser's custom
track. The reason why I decide to separate them into chromosomes is 1)
files can be loaded to Affymetrix IGB software; 2) if you have your
own UCSC genome browser, you can use wgEncode and hgLoadWiggle (UCSC
genome browser command line tools) to import these wiggle files one by
one into database then show them in a single track. Anywa, to solve
your problem and merge gzipped wiggle files together, try these
commands:

$ zcat CB-116_mfold10_pvalue-10_treat_afterfiting_2LHet.wig.gz | head
-1 > CB-116_mfold10_pvalue-10_treat_afterfiting.wig
$ zcat CB-116_mfold10_pvalue-10_treat_afterfiting_*.gz | grep -v
^track >> CB-116_mfold10_pvalue-10_treat_afterfiting.wig

Best,
Tao

Tobias Kockmann

unread,
Mar 2, 2009, 1:51:23 PM3/2/09
to macs-ann...@googlegroups.com
Hi Tao,

Thanks again for the fast reply! Actually we have our own UCSC browser
running and those tools sounds very helpful. Where can I find information on
them?

Greetings,
Tobi

Tobias Kockmann

unread,
Mar 2, 2009, 1:57:08 PM3/2/09
to macs-ann...@googlegroups.com
....found the load binaries:

[tobiasko@bs-dsvr09 loader]$ ll
total 5716
-rwxrwxr-x 1 wernerv bsse-dsf 1915312 Oct 30 15:05 hgLoadBed
-rwxrwxr-x 1 wernerv bsse-dsf 1914528 Oct 30 15:05 hgLoadMaf
-rwxrwxr-x 1 wernerv bsse-dsf 1912144 Oct 30 15:05 hgLoadWiggle
-rwxrwxr-x 1 wernerv bsse-dsf 87816 Oct 30 15:05 wigEncode
[tobiasko@bs-dsvr09 loader]$ pwd
/array0/ucsc/cgi-bin/loader

Tao Liu

unread,
Mar 2, 2009, 2:16:52 PM3/2/09
to macs-ann...@googlegroups.com
Hi Tobi,

Check this document <http://genomewiki.ucsc.edu/index.php/Adding_New_Tracks_to_a_browser_installation
>. There are plenty of other useful hints on this genome wiki site.

Here is the shell script I used:

#!/bin/bash

if [ $# -lt 2 ];then
echo 'l.sh <dbname> <wiggle file>!'
exit
fi

DB=ce4
WEBROOT=/Library/WebServer/Documents/

wigEncode ${2} ${1}.wig ${1}.wib
hgLoadWiggle ${DB} ${1} ${1}.wig
mv ${1}.wib /gbdb/${DB}/wib/
#####eof#####

Which will encode wiggle files to binary wib files and copy wib files
to gbdb ( binary db) directory. Then you need to modify the source
code in kent's source code tree: kent/src/hg/makeDb/trackDb/worm/ce4/
trackDb.ra ( I am dealing with C. elegans data). For example, add this:

track XXXTags
shortLabel XXX tags from XXX lab
longLabel Raw tags for XXXX
group genes
priority 100
visibility hide
chromosomes chrI,chrII,chrIII,chrIV,chrV,chrX
color 0,100,100
type wig 0 50
spanList 10

Then run kent/src/hg/makeDb/trackDb/loadTracks:

$ ./loadTracks trackDb hgFindSpec ce4

Regards,
Tao
Reply all
Reply to author
Forward
0 new messages