generating BED with sliding-windows intervals


Assaf Gordon

Jan 24, 2012, 11:20:15 AM
to bedtools...@googlegroups.com
Hello Aaron and all,

Is there a recommended (quick and easy) way to generate a BED file containing "windows" intervals over an entire genome?

Example, if I wanted 1MB windows over the human genome, a program that would produce:
===
chr1 0 1000000
chr1 1000000 2000000
chr1 2000000 3000000
...
===

Or if I wanted a sliding-window with 0.5MB overlap:
===
chr1 0 1000000
chr1 500000 1500000
chr1 1000000 2000000
chr1 1500000 2500000
...
===

Seems like something that might be useful in many cases.

Is there such a program or does it need to be re-invented ?

Thanks,
-gordon

Aaron Quinlan

Jan 24, 2012, 11:23:35 AM
to bedtools...@googlegroups.com
Hi Gordon,

Version 2.15.0 has a tool called "makewindows" for this.

Ex1. 1MB, non-sliding (what's the right word for this?)
bedtools makewindows -g hg19.genome -w 1000000

Ex2. 1MB, sliding with a step size of 500K
bedtools makewindows -g hg19.genome -w 1000000 -s 500000

Best,
Aaron
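
For readers without bedtools at hand, the two commands above can be approximated with plain awk. This is a rough sketch, not the makewindows implementation: make_windows is a hypothetical helper name, the genome file is assumed to have two tab-separated columns (chromosome, size), and clipping the trailing windows at the chromosome end is an assumption about the desired output.

```shell
# Sketch only: approximates fixed-size and sliding windows with awk.
# make_windows is a hypothetical helper, not part of bedtools.
# Usage: make_windows <genome_file> <window_size> [step_size]
make_windows() {
    mw_genome=$1
    mw_win=$2
    mw_step=${3:-$2}   # step defaults to the window size (non-overlapping windows)
    awk -v w="$mw_win" -v s="$mw_step" '
    BEGIN { if (w < 1 || s < 1) exit 1 }   # refuse sizes that would loop forever
    {
        # $1 = chromosome name, $2 = chromosome size
        for (start = 0; start < $2; start += s) {
            end = start + w
            if (end > $2) end = $2         # clip the last window(s) at the chromosome end
            printf "%s\t%d\t%d\n", $1, start, end
        }
    }' "$mw_genome"
}
```

Calling make_windows hg19.genome 1000000 corresponds to Ex1, and make_windows hg19.genome 1000000 500000 to Ex2.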

Assaf Gordon

Jan 24, 2012, 11:35:37 AM
to bedtools...@googlegroups.com
BEDTools to the rescue! as usual.

Thanks Aaron!

Assaf Gordon

Jan 26, 2012, 11:51:22 AM
to bedtools...@googlegroups.com
To continue the window-making question:

Now I would like to generate windows based on given intervals, not based on entire genomes.
i.e. for each given interval, I want to generate either N windows equal in size, or windows of size N nucleotides.

Example - input:
chr1 50000 60000
chr2 4000 4500

Example - output with 5 windows for each input interval:
chr1 50000 52000
chr1 52000 54000
chr1 54000 56000
chr1 56000 58000
chr1 58000 60000
chr2 4000 4100
chr2 4100 4200
chr2 4200 4300
chr2 4300 4400
chr2 4400 4500

Is there an easy way to do it (easy - not writing a new script) ?

Thanks,
-gordon

Aaron Quinlan

Jan 26, 2012, 12:20:24 PM
to bedtools...@googlegroups.com
Hi Gordon,

There's not a tool in bedtools that will do this, but there probably should be. This likely buggy awk script may get you close. In the interim, I will add this feature to the "to do" list.


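awk -v num_win=5 'BEGIN { OFS = "\t" }
{
    len = $3 - $2
    # Integer boundaries: no fractional coordinates, and the last window always ends at $3
    for (i = 0; i < num_win; i++)
        print $1, $2 + int(i * len / num_win), $2 + int((i + 1) * len / num_win)
}' < test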

Assaf Gordon

Jan 26, 2012, 3:41:27 PM
to bedtools...@googlegroups.com
Hi Aaron,

Aaron Quinlan wrote, On 01/26/2012 12:20 PM:
> Hi Gordon,
>
> There's not a tool in bedtools that will do this, but there probably should be. This likely buggy awk script may get you close. In the interim, I will add this feature to the "to do" list.
>

I've added those features to "makewindows".

The code is available in this branch:
https://github.com/agordon/bedtools/tree/feature/makewindows_extra1

It includes four commits that gradually refactor the code (hopefully making it easier to understand and merge):
https://github.com/agordon/bedtools/commits/feature/makewindows_extra1

Comments are welcome,
-gordon


The updated "bedtools makewindows" looks like this:
==========================

*****
*****ERROR: Need -g (genome file) or -b (BED file) for interval source.
*****

*****
*****ERROR: Need -w (window size) or -n (number of windows).
*****

Tool: bedtools makewindows
Version: 2.15.0
Summary: Makes adjacent and/or sliding windows across a genome.

Usage: bedtools makewindows [OPTIONS] [-g <genome> OR -b <bed>]
[ -w <window_size> OR -n <number of windows> ]

Input Options:
-g <genome>
Genome file size (see notes below).
Windows will be created for each chromosome in the file.

-b <bed>
BED file (with chrom,start,end fields).
Windows will be created for each interval in the file.

Windows Output Options:
-w <window_size>
Divide each input interval (either a chromosome or a BED interval)
to fixed-sized windows (i.e. same number of nucleotide in each window).
Can be combined with -s <step_size>

-s <step_size>
Step size: i.e., how many base pairs to step before
creating a new window. Used to create "sliding" windows.
- Defaults to window size (non-sliding windows).

-n <number_of_windows>
Divide each input interval (either a chromosome or a BED interval)
to fixed number of windows (i.e. same number of windows, with
varying window sizes).

Notes:
(1) The genome file should tab delimited and structured as follows:
<chromName><TAB><chromSize>

For example, Human (hg19):
chr1 249250621
chr2 243199373
...
chr18_gl000207_random 4262

Tips:
One can use the UCSC Genome Browser's MySQL database to extract
chromosome sizes. For example, H. sapiens:

mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -e \
"select chrom, size from hg19.chromInfo" > hg19.genome

Examples:
# Divide the human genome into windows of 1MB:
$ bedtools makewindows -g hg19.txt -w 1000000
chr1 0 1000000
chr1 1000000 2000000
chr1 2000000 3000000
chr1 3000000 4000000
chr1 4000000 5000000
...

# Divide the human genome into sliding (=overlapping) windows of 1MB, with 500KB overlap:
$ bedtools makewindows -g hg19.txt -w 1000000 -s 500000
chr1 0 1000000
chr1 500000 1500000
chr1 1000000 2000000
chr1 1500000 2500000
chr1 2000000 3000000
...

# Divide each chromosome in human genome to 1000 windows of equal size:
$ bedtools makewindows -g hg19.txt -n 1000
chr1 0 249251
chr1 249251 498502
chr1 498502 747753
chr1 747753 997004
chr1 997004 1246255
...

# Divide each interval in the given BED file into 10 equal-sized windows:
$ cat input.bed
chr5 60000 70000
chr5 73000 90000
chr5 100000 101000
$ bedtools makewindows -b input.bed -n 10
chr5 60000 61000
chr5 61000 62000
chr5 62000 63000
chr5 63000 64000
chr5 64000 65000
...
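
The -n output shown above is consistent with a window size of ceil(length / n), with the final window clipped to the interval end (for example, chr1 at 249250621 bp split into 1000 windows gives 249251 bp per window). A minimal awk sketch under that assumption follows; split_n is a hypothetical name, and the actual C++ code in the branch may handle edge cases differently.

```shell
# Sketch only: split each BED interval into at most n windows.
# split_n is a hypothetical helper, not the makewindows implementation.
# Usage: split_n <n> < input.bed
split_n() {
    awk -v n="$1" 'BEGIN { OFS = "\t" }
    {
        len = $3 - $2
        size = int((len + n - 1) / n)      # ceil(len / n) for non-negative integers
        if (size < 1) next                 # skip empty intervals
        for (start = $2; start < $3; start += size) {
            end = start + size
            if (end > $3) end = $3         # clip the final window
            print $1, start, end
        }
    }'
}
```

Because the size is rounded up, an interval can yield fewer than n windows when its length is small relative to n.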


Aaron Quinlan

Jan 26, 2012, 5:26:12 PM
to bedtools...@googlegroups.com
Hi Gordon,

This is a great addition. I will add it to the next release, but it might take me a bit to merge it into the repository, as I am integrating a bunch of other changes.

Gratefully,
Aaron

Assaf Gordon

Jan 27, 2012, 12:23:44 PM
to bedtools...@googlegroups.com
Hi Aaron,

One tiny addition + bugfix to "makewindows", based on requests of users in our lab:
The ability to give each window a numeric ID, which will help group the results later (with groupBy).

A second commit is a bugfix: an extra window was printed when the interval size was evenly divisible by the window size.

The branch is here:
https://github.com/agordon/bedtools/tree/feature/makewindows_with_IDs1

The commits are here:
https://github.com/agordon/bedtools/commits/feature/makewindows_with_IDs1

I based them on your latest "master", hopefully all the whitespace is OK this time.

-gordon


The added option looks like this:
===========
ID Naming Options:
-i src|winnum|srcwinnum
The default output is 3 columns: chrom, start, end .
With this option, a name column will be added.
"-i src" - use the source interval's name.
"-i winnum" - use the window number as the ID (e.g. 1,2,3,4...).
"-i srcwinnum" - use the source interval's name with the window number.
See below for usage examples.

# Add a name column, based on the window number:
$ cat input.bed
chr5 60000 70000 AAA
chr5 73000 90000 BBB
chr5 100000 101000 CCC
$ bedtools makewindows -b input.bed -n 3 -i winnum
chr5 60000 63334 1
chr5 63334 66668 2
chr5 66668 70000 3
chr5 73000 78667 1
chr5 78667 84334 2
chr5 84334 90000 3
chr5 100000 100334 1
chr5 100334 100668 2
chr5 100668 101000 3
...

# Add a name column, based on the source ID + window number:
$ cat input.bed
chr5 60000 70000 AAA
chr5 73000 90000 BBB
chr5 100000 101000 CCC
$ bedtools makewindows -b input.bed -n 3 -i srcwinnum
chr5 60000 63334 AAA_1
chr5 63334 66668 AAA_2
chr5 66668 70000 AAA_3
chr5 73000 78667 BBB_1
chr5 78667 84334 BBB_2
chr5 84334 90000 BBB_3
chr5 100000 100334 CCC_1
chr5 100334 100668 CCC_2
chr5 100668 101000 CCC_3
...

===========
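
The naming scheme in the -i examples above can be mimicked in awk, which may help when post-processing windows from another source. This is only a sketch: label_windows is a hypothetical helper, the input is assumed to carry the source name in column 4, and the window size is assumed to be ceil(length / n), which matches the sample output.

```shell
# Sketch only: split intervals into windows and label them like the "-i" option.
# label_windows is a hypothetical helper; mode is src, winnum or srcwinnum.
# Usage: label_windows <n> <mode> < input.bed   (input needs a 4th name column)
label_windows() {
    awk -v n="$1" -v mode="$2" 'BEGIN { OFS = "\t" }
    {
        size = int(($3 - $2 + n - 1) / n)   # ceil(length / n)
        if (size < 1) next                  # skip empty intervals
        num = 0                             # window counter restarts for every source interval
        for (start = $2; start < $3; start += size) {
            end = (start + size > $3) ? $3 : start + size
            num++
            if (mode == "src")         id = $4
            else if (mode == "winnum") id = num
            else                       id = $4 "_" num
            print $1, start, end, id
        }
    }'
}
```

The per-interval counter is what makes the winnum column usable as a grouping key later, for instance with groupBy.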

Aaron Quinlan

Jan 29, 2012, 8:15:27 PM
to bedtools...@googlegroups.com
Hi Gordon,

These look like useful additions. Would you mind making a "Pull Request" on Github? This will make integrating the code very simple for me.

Thanks again,
Aaron

w.elm...@gmail.com

Feb 1, 2017, 3:07:32 PM
to bedtools-discuss
Hi Aaron and Assaf,


I'm currently trying to use bedtools makewindows.
I have a question: when sliding the window, is there a way to slide from the center rather than from the beginning of the window?

I'd like to calculate the score of a window of size k as the number of reads spanning the window minus those with an endpoint within the window. I would assign that score to the center of the window.
Let's say k = 16 and I want to test every bp.

Would the command be the following?

$ bedtools makewindows -b input.bed -w 16 -s 1 -i winnum > output.bed

Thank you!


Best,

Walid
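
One possible reading of this question is that the windows themselves can still be generated from their start positions, and the center is derived afterwards. A hedged sketch of that post-processing step: window_centers is a hypothetical helper, it assumes 0-based half-open BED coordinates, and it assumes that windows shorter than k (if any appear near the end of an interval) should be dropped.

```shell
# Sketch only: window_centers is hypothetical. It reads windows
# (chrom, start, end, ...) and reports the centre base of each
# full-length window as a 1-bp BED interval.
# Usage: window_centers <k> < windows.bed
window_centers() {
    awk -v k="$1" 'BEGIN { OFS = "\t" }
    $3 - $2 == k {                 # keep only windows of exactly k bases
        c = $2 + int(k / 2)        # 0-based centre (upper middle when k is even)
        print $1, c, c + 1
    }'
}
```

The output of bedtools makewindows -b input.bed -w 16 -s 1 could then be piped through window_centers 16 before scoring.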

