bed format


isabelle...@iee.unibe.ch

Jan 30, 2015, 12:20:23 PM
to gen...@soe.ucsc.edu

Good morning,

 

I have a short question concerning bed format.

I read carefully the FAQ on this format:

http://genome.ucsc.edu/FAQ/FAQformat.html#format1

but I still have a doubt.

 

If I define a region using the bed format as:

Chr1        14642   14882

Does it correspond to the interval (14642, 14882], i.e. positions from 14643 (included) to 14882 (included), 240 bases in total?

 

Many thanks in advance for your feedback.

And also many thanks for providing such a great browser/platform for the scientific community.

Isa

__________________________

Isabelle Dupanloup Duperret, PhD

University of Bern – IEE – CMPG

Baltzerstrasse, 6

CH-3012 Bern

 

Luvina Guruvadoo

Feb 2, 2015, 3:25:48 PM
to isabelle...@iee.unibe.ch, gen...@soe.ucsc.edu
Hello Isabelle,

Thank you for using the UCSC Genome Browser. chromStart refers to the starting position of the feature on a chromosome, counted from zero, and chromEnd is the ending position. Note, chromEnd is not included in the display of the feature. If you want to include positions 14643 through 14882 (counting the first base of the chromosome as 1), the correct BED format is chr1 14642 14882, for a total of 14882 - 14642 = 240 bases.

If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

- - -
Luvina Guruvadoo
UCSC Genome Bioinformatics Group


--


isabelle...@iee.unibe.ch

Feb 4, 2015, 12:20:57 PM
to gen...@soe.ucsc.edu

Hi there,


I’m using the bigWigAverageOverBed tool to compute the average GERP score (http://hgdownload.cse.ucsc.edu/gbdb/hg19/bbi/All_hg19_RS.bw) over BED intervals.

 

And if I want to get the gerp score for a single position

let’s say: chr1:14643

the correct input bed interval should be:

Chr1      14642    14643

I verified this using the UCSC genome browser.

chromEnd is, in this case, the position to include and not chromStart.
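
As a minimal sketch of the single-position case above, the following Python helper builds a one-base BED line from a position counted from 1. The function name `position_to_bed` and the feature name `site1` are hypothetical, and the optional fourth column is included only on the assumption that the downstream tool wants a name per interval.

```python
def position_to_bed(chrom, pos_1based, name=None):
    """Return a BED line that covers exactly one position counted from 1."""
    if pos_1based < 1:
        raise ValueError("positions counted from 1 cannot be smaller than 1")
    start = pos_1based - 1  # zero-based start, included
    end = pos_1based        # zero-based end, excluded
    fields = [chrom, str(start), str(end)]
    if name is not None:
        fields.append(name)  # hypothetical name column
    return "\t".join(fields)


# chr1:14643 becomes the interval 14642-14643
print(position_to_bed("chr1", 14643, name="site1"))
```
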


Also, in the tables you provide (for instance, the common SNPs in dbSNP 141), SNP positions are given as intervals, see below,

but the real positions of the SNPs correspond to chromEnd and not chromStart.


#bin	chrom	chromStart	chromEnd	name	score	strand	refNCBI	refUCSC	observed	molType	class	valid	avHet	avHetSE	func	locType	weight	exceptions	submitterCount	submitters	alleleFreqCount	alleles	alleleNs	alleleFreqs	bitfields
837	chr21	33031973	33031974	rs7277748	0	+	A	A	A/G	genomic	single	by-cluster,by-frequency,by-hapmap,by-1000genomes	0.061353	0.164049	untranslated-5	exact	1		9	1000GENOMES,BCM_SSAHASNP,BUSHMAN,COMPLETE_GENOMICS,EGP_SNPS,ILLUMINA,KRIBB_YJKIM,STEJUSTINE-
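
The leading columns of the row above can be checked with a short Python sketch. It assumes tab-separated fields in the order given by the header line and uses only the first seven columns; the variable names are illustrative.

```python
# First seven columns of the rs7277748 row: bin, chrom, chromStart, chromEnd, name, score, strand
row = "837\tchr21\t33031973\t33031974\trs7277748\t0\t+"

_bin, chrom, chrom_start, chrom_end, name = row.split("\t")[:5]
chrom_start, chrom_end = int(chrom_start), int(chrom_end)

length = chrom_end - chrom_start  # number of bases in the interval
first_base = chrom_start + 1      # first covered base, counting from 1
last_base = chrom_end             # last covered base, counting from 1

print(name, chrom, length, first_base, last_base)
```

For an interval of length 1, the first and last covered base coincide, which is why the SNP position matches chromEnd.
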


And this is where I'm confused

since as explained in your FAQ

http://genome.ucsc.edu/FAQ/FAQformat.html#format1

in the bed format defined as

chr chromStart chromEnd

chromStart refers to the starting position of the feature on a chromosome, and chromEnd is the ending position.

chromEnd is not included in the display of the feature. 


I'm confused by this definition.

Could you please explain these apparent discrepancies ?


Many thanks in advance for your feedback.

And also many thanks for providing such a great browser/platform for the scientific community.

Isa


-----------------------------------------

Dr. Isabelle Dupanloup Duperret
CMPG - University of Bern
Baltzerstrasse 6 - CH-3012 Bern
Tel. +41(0)316314549 - Fax +41(0)316314888

Luvina Guruvadoo

Feb 12, 2015, 11:48:52 AM
to Isabelle Duperret Dupanloup, gen...@soe.ucsc.edu
Hello Isa,

Thanks for your question. The BED format you have for a single position is correct. I think you may be confused by "chromEnd is not included in the display of the feature". chromStart is counted from zero, so the feature covers the bases that the Genome Browser displays as positions chromStart + 1 through chromEnd. Here's a session I created using the example in your previous email (chr1 14642 14643 ): http://genome.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=luvina&hgS_otherUserSessionName=snp_example.

I hope this helps. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.


- - -
Luvina Guruvadoo
UCSC Genome Bioinformatics Group

--


Galt Barber

Feb 12, 2015, 3:10:09 PM
to Isabelle Duperret Dupanloup, gen...@soe.ucsc.edu

This has probably been our number one FAQ over the years:

https://genome.ucsc.edu/FAQ/FAQtracks.html#tracks1

http://genomewiki.ucsc.edu/index.php/Coordinate_Transforms

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC165576/
The Genome Browser Database contains both positional tables
with data based on genomic start–stop coordinates and
non-positional tables with data independent of position.
The coordinates in positional tables are defined using
half-open zero-based ranges,
i.e. the first 100 bases of a chromosome are represented as 0,100,
while the next 100 bases are represented as 100,200 and so on.
Half-open coordinates allow the length of a feature to be obtained by
simply subtracting the start from the end and tend to
minimize +/−1 errors during software development.
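
A small Python sketch of the arithmetic described in the quoted passage, under the assumption that "positions" in the Genome Browser display are counted from 1 and are inclusive at both ends. The helper names are hypothetical.

```python
def bed_to_one_based(start, end):
    """Zero-based half-open [start, end) -> inclusive (first, last) counted from 1."""
    return start + 1, end


def one_based_to_bed(first, last):
    """Inclusive (first, last) counted from 1 -> zero-based half-open (start, end)."""
    return first - 1, last


def bed_length(start, end):
    """Length of a half-open interval is simply end minus start."""
    return end - start


# The interval from the first message in this thread
print(bed_to_one_based(14642, 14882), bed_length(14642, 14882))

# Consecutive 100-base windows share a boundary without overlapping
windows = [(i, i + 100) for i in range(0, 300, 100)]
print(windows)
```

Because the end of one window equals the start of the next, adjacent half-open intervals tile a chromosome with no gap and no shared base.
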

Here is a question on Biostars about the use and purpose
of various coordinate systems:

https://www.biostars.org/p/6373/

-Galt

--

