[Genome] bed files for entire chromosomes

Minou Bina

Sep 16, 2011, 3:58:30 PM
to gen...@soe.ucsc.edu

Hi

We are planning to analyze entire human chromosomes and create .bed files for display on the human genome browser.

It is not clear to us how we should specify the block position for an entire chromosome and how we should do the numbering.

Minou Bina
Purdue University

Katrina Learned

Sep 16, 2011, 7:11:18 PM
to Minou Bina, gen...@soe.ucsc.edu
Hi Minou,

Here is some information about creating bed files:
http://genome.ucsc.edu/FAQ/FAQformat.html#format1

You might consider converting your bed files to our bigBed format using
the program bedToBigBed. The main advantage of the bigBed files is that
only the portions of the files needed to display a particular region are
transferred to UCSC, so for large data sets bigBed is considerably
faster than regular bed files. The bigBed file remains on your web
accessible server (http, https, or ftp), not on the UCSC server.
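bedToBigBed expects its input sorted by chromosome and then by chromStart (the usual recipe is `sort -k1,1 -k2,2n`), and it takes a chrom.sizes file of chromosome lengths as its second argument. A minimal Python sketch of the sorting step, assuming a tab-separated BED file and assuming that track, browser, and comment lines should be dropped before conversion; the function names are hypothetical:

```python
from typing import List, Tuple

# (chrom, chromStart, chromEnd, remaining BED fields)
BedRecord = Tuple[str, int, int, List[str]]


def parse_bed_line(line: str) -> BedRecord:
    """Split one tab-separated BED line into typed fields."""
    fields = line.rstrip("\n").split("\t")
    return fields[0], int(fields[1]), int(fields[2]), fields[3:]


def is_data_line(line: str) -> bool:
    """Skip blank lines and track/browser/comment header lines (assumed unwanted)."""
    stripped = line.strip()
    return bool(stripped) and not stripped.startswith(("#", "track", "browser"))


def sort_bed_lines(lines: List[str]) -> List[str]:
    """Order records like `sort -k1,1 -k2,2n`: chrom as text, chromStart as a number."""
    records = [parse_bed_line(line) for line in lines if is_data_line(line)]
    records.sort(key=lambda rec: (rec[0], rec[1]))
    return ["\t".join([chrom, str(start), str(end), *rest])
            for chrom, start, end, rest in records]
```

The sorted lines can then be written to a file and converted with something like `bedToBigBed sorted.bed chrom.sizes out.bb` (the file names are placeholders).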

I am not sure if this is what you were asking, but to specify a block to
cover the entire chromosome, you would set your chromStart to 0 and your
chromEnd to the length of the chromosome. To find out the length of each
chromosome, from the gateway page
(http://genome.ucsc.edu/cgi-bin/hgGateway), use the drop-down menus to
select your assembly of interest, and under the drop-downs, in the light
blue bar that states, "About the <assembly information> (sequences)",
click on the 'sequences' link. For example, the hg19 sequences link
would take you here:
http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&chromInfoPage=
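Once the chromosome lengths are in hand, a whole-chromosome block is simply chromStart 0 and chromEnd equal to the length. A small sketch, assuming the lengths have been saved as two whitespace-separated columns per line (chromosome name, length); the helper name is hypothetical:

```python
from typing import List


def whole_chromosome_bed(chrom_sizes_text: str) -> List[str]:
    """Return one BED line per chromosome, each covering the whole chromosome.

    chrom_sizes_text is assumed to hold "name length" pairs, one per line.
    """
    bed_lines = []
    for line in chrom_sizes_text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        chrom, length = line.split()[:2]
        # BED intervals are 0-based and half-open, so [0, length) is every base.
        bed_lines.append(f"{chrom}\t0\t{int(length)}")
    return bed_lines
```

For instance, a line `chr1 100` yields the BED line `chr1 0 100` (tab-separated).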

I hope this information is helpful. Please contact the mail list
(gen...@soe.ucsc.edu) again if you have any further questions.

Katrina Learned
UCSC Genome Bioinformatics Group
> _______________________________________________
> Genome maillist - Gen...@soe.ucsc.edu
> https://lists.soe.ucsc.edu/mailman/listinfo/genome

Minou Bina

Sep 19, 2011, 11:22:01 AM
to gen...@soe.ucsc.edu



Dear Katrina


Thank you for the info.

It is not clear how nucleotide numbering is handled for gaps and chromosome ends when one uses entire chromosomes to create bed files.

How could one check the results for quality control?

Minou Bina



Greg Roe

Sep 20, 2011, 7:35:06 PM
to Minou Bina, gen...@soe.ucsc.edu
Hi Minou,

Whole-chromosome coordinates: for example, a 100-base chromosome
(chr1:1-100 in the browser's 1-based position format) would be written
like this in BED format: chr1 0 100
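The shift between the browser's 1-based, fully closed positions and BED's 0-based, half-open coordinates is a one-line conversion in each direction. A sketch with hypothetical helper names, assuming positions are written as chrom:first-last (commas allowed):

```python
from typing import Tuple


def position_to_bed(position: str) -> Tuple[str, int, int]:
    """Convert a 1-based closed position like 'chr1:1-100' to BED (chrom, start, end)."""
    chrom, span = position.split(":")
    first, last = (int(part.replace(",", "")) for part in span.split("-"))
    # Subtract 1 from the first base only; the 1-based last base is already
    # the exclusive BED end because BED counts from 0.
    return chrom, first - 1, last


def bed_to_position(chrom: str, start: int, end: int) -> str:
    """Convert BED (chrom, start, end) back to a 1-based closed position string."""
    return f"{chrom}:{start + 1}-{end}"
```

Either way the length agrees: last - first + 1 in 1-based terms equals chromEnd - chromStart in BED.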

The gap table indicates gaps at telomeres and centromeres. Examples for a
hypothetical 1,000-base chromosome:
chr1 0 10 10-base_telomere
chr1 400 500 100-base_centromere
chr1 990 1000 10-base_telomere

If you do not want your annotations to fall in gaps, you can use the
Table Browser to intersect your annotations with the gap table and see
whether any of them overlap a gap:

Table Browser: http://genome.ucsc.edu/cgi-bin/hgTables

- upload your bed (bigBed) file as a custom track and select it in the
table browser.
- click the intersect button and select:
group: mapping and seq. tracks
track: gap
- and submit
- choose an output format and submit
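The same check can also be run locally, for example against a downloaded copy of the gap table. A minimal sketch, assuming annotations and gaps have already been parsed into BED-style (chrom, chromStart, chromEnd) tuples; it uses a simple pairwise scan and is not a description of what the Table Browser does internally:

```python
from typing import List, Sequence, Tuple

Interval = Tuple[str, int, int]  # (chrom, chromStart, chromEnd), 0-based half-open


def overlaps(a: Interval, b: Interval) -> bool:
    """Half-open intervals overlap when each one starts before the other ends."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def annotations_in_gaps(annotations: Sequence[Interval],
                        gaps: Sequence[Interval]) -> List[Interval]:
    """Return the annotations that overlap at least one gap base (O(n*m) scan)."""
    return [ann for ann in annotations if any(overlaps(ann, gap) for gap in gaps)]
```

Because BED ends are exclusive, an annotation ending at 10 does not overlap a gap starting at 10. For genome-scale inputs, sorting both lists and sweeping through them once would be much faster than this quadratic scan.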


Please let us know if you have any additional questions: gen...@soe.ucsc.edu

-
Greg Roe
UCSC Genome Bioinformatics Group

Minou Bina

Sep 21, 2011, 10:58:21 AM
to Gen...@soe.ucsc.edu


Hi Greg

I am just concerned about the numbering of nucleotides in regions that include gaps.

My output consists of a nucleotide position followed by a motif:

50 CGCG
12000 CG

etc
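Output in this position-plus-motif form can be turned into BED lines by supplying the chromosome name and using the motif length to get chromEnd. This sketch assumes each position is the 1-based coordinate of the motif's first base (pass one_based=False if the program reports 0-based offsets); the function name is hypothetical:

```python
from typing import Iterable, List


def motif_hits_to_bed(lines: Iterable[str], chrom: str, one_based: bool = True) -> List[str]:
    """Convert 'position motif' lines into BED lines with the motif as the name field."""
    bed_lines = []
    for line in lines:
        if not line.strip():
            continue
        pos_text, motif = line.split()[:2]
        start = int(pos_text) - 1 if one_based else int(pos_text)
        # chromEnd is exclusive, so it is simply start + motif length.
        bed_lines.append(f"{chrom}\t{start}\t{start + len(motif)}\t{motif}")
    return bed_lines
```

With 1-based input on chr1, `50 CGCG` becomes `chr1 49 53 CGCG`, which the browser displays as chr1:50-53.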

Pauline Fujita

Sep 21, 2011, 7:39:23 PM
to Minou Bina, Gen...@soe.ucsc.edu
Hello Minou,

The coordinate numbering is continuous across gaps, the locations of
which are specified in the gap track as outlined by my colleague.
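Assuming the gaps appear as runs of N in the chromosome sequence, continuous numbering means a motif's coordinate is just its 0-based offset in the chromosome string, with gap bases counted like any others. A sketch assuming the whole chromosome has been read into one Python string; the helper names are hypothetical:

```python
import re
from typing import List, Tuple


def find_motif(seq: str, motif: str) -> List[int]:
    """0-based start offsets (= BED chromStart) of every occurrence, overlaps included."""
    # A zero-width lookahead lets overlapping hits (e.g. CGCG in CGCGCG) all match;
    # upper() lets soft-masked (lowercase) bases match too.
    pattern = re.compile(f"(?={re.escape(motif.upper())})")
    return [m.start() for m in pattern.finditer(seq.upper())]


def n_runs(seq: str) -> List[Tuple[int, int]]:
    """Half-open [start, end) spans of N runs, i.e. gap positions in the sequence."""
    return [(m.start(), m.end()) for m in re.finditer(r"[Nn]+", seq)]
```

In the toy string ACGNNNNCGT, the second CG is reported at offset 7: the four gap bases at offsets 3 to 6 are numbered just like any other bases, and n_runs reports that gap as (3, 7).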


Best regards,

Pauline Fujita,
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu