position gaps in liftOver


Marie Saitou

Jul 7, 2016, 11:37:48 AM
to gen...@soe.ucsc.edu
Dear UCSC genome browser team,


I am afraid there may be an error in the liftOver panTro3-to-hg19 chain.

When I input panTro3 chr1 127140000 127140002, the liftOver result is hg19
chr1 110255311 110255313.
But it should be chr1 110255312 110255314, judging from the UCSC Genome
Browser sequences of that region in hg19 and panTro3.
Is there any way to detect and fix these kinds of errors?

On the other hand, when I input panTro3 chr1:127140001-127140001 into the
online liftOver, the output is hg19 chr1:110255313-110255313, which seems
to be correct.
However, I would like to convert a large set of SNP data, so I want to use
the local liftOver (the online one stops when I upload a huge file). The
local liftOver (macOSX.x86_64 version) does not seem to recognize the
"chr1:127140001-127140001" format. Is there any way to input the
"chr1:127140001-127140001" format into the local liftOver?


--

Marie SAITOU
Unit of Human Biology & Genetics,
Department of Biological Sciences,
Graduate School of Science,
The University of Tokyo

Cath Tyner

Jul 7, 2016, 6:36:53 PM
to Marie Saitou, UCSC Genome Browser Public Help Forum
Hello Marie,

As you noted, the following conversion provides expected results when using web-based liftOver:

panTro3 chr1:127140001-127140001 (1-based)

lifts to 

hg19 chr1:110255313-110255313 (1-based)

Please note that the web-based output file extension is misleading in this case: although the file is named "*.bed", its coordinates are not actually in 0-based BED format, because you gave 1-based positional input.

There are two types of coordinate formats that you can use:

1) 0-based-start format coordinates (such as BED format) and
2) 1-based-start "positional" format coordinates.

The example above is in the 1-based-start "positional" format. 

You can go back to the web-based liftOver tool and enter the same position to lift, but this time, use the "0-based-start BED format." 

For this test, change your 1-based "position" format:
chr1:127140001-127140001
to the 0-based-start BED format:
chr1 127140000 127140001

Note the differences: 
1 is subtracted from the start coordinate (127140001 - 1 = 127140000), and the punctuation delimiters (colon ":" and dash "-") are replaced with a space. 
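For a large list of SNPs, the two changes above (subtract 1 from the start, replace the punctuation with whitespace) are easy to script. The Python sketch below is only an illustration, not part of the UCSC tools; `position_to_bed` and `bed_to_position` are hypothetical names, and it assumes plain `chrom:start-end` strings without commas. It writes tab-separated fields, the usual BED delimiter.

```python
def position_to_bed(position: str) -> str:
    """Convert a 1-based 'chrom:start-end' position to a 0-based BED line."""
    # Split on the last colon so the chromosome name is kept intact.
    chrom, span = position.rsplit(":", 1)
    start, end = (int(x) for x in span.split("-"))
    # BED starts are 0-based, so subtract 1; the end coordinate is unchanged.
    return f"{chrom}\t{start - 1}\t{end}"


def bed_to_position(bed_line: str) -> str:
    """Convert the first three fields of a 0-based BED line to a 1-based position."""
    chrom, start, end = bed_line.split()[:3]
    # Add 1 back to the start; the end coordinate is unchanged.
    return f"{chrom}:{int(start) + 1}-{end}"


print(position_to_bed("chr1:127140001-127140001"))  # tab-separated: chr1, 127140000, 127140001
print(bed_to_position("chr1 110255312 110255313"))  # chr1:110255313-110255313
```

Only the start coordinate changes between the two conventions, so a round trip through both functions returns the original position string.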

Here are the web-based liftOver results, where the input and output are 0-based BED format.

panTro3 chr1 127140000 127140001 (0-based)

lifts to 

hg19 chr1 110255312 110255313 (0-based)

Since you have a list of 1-based "position" formatted coordinates and want to use the command-line liftOver utility, you will need to tell liftOver that your input uses "positional" coordinates.

To view the liftOver utility usage statement and options, enter "liftOver" on your command-line (with no other parameters and without the quotes). 

Command-line LiftOver Utility examples:

1. Using input of 0-based-start BED format coordinates
The liftOver utility expects this format by default, so no special options are needed in the command.
liftOver panTro3.bed liftOver/panTro3ToHg19.over.chain.gz mapped unMapped

input: panTro3.bed
chr1 127140000 127140001

Results in output file "mapped":
chr1 110255312 110255313

2. Using input of 1-based-start "positional" format coordinates
The liftOver utility needs the "-positions" option included in the command. 
liftOver -positions panTro3.txt liftOver/panTro3ToHg19.over.chain.gz mapped unMapped

input: panTro3.txt
chr1:127140001-127140001

Results in output file "mapped":
chr1:110255313-110255313


Explanation:

0-based start
A 0-based start counting mechanism, such as the BED file format, begins counting at 0 (instead of 1). For example, starting at 0 and incrementing by 1, the number “5” will be in position 6.

0-based count:  0-1-2-3-4-5
position:       1-2-3-4-5-6

For example, the first 100 bases of a chromosome are defined as chromStart=0, chromEnd=100, and span the bases numbered 0-99.
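Python's `range` uses the same half-open [start, end) convention as BED, which makes it a quick way to check the arithmetic above. A minimal sketch; the variable names are illustrative only and not part of any UCSC tool.

```python
# BED intervals are 0-based and half-open: [chromStart, chromEnd).
chrom_start, chrom_end = 0, 100

bases = list(range(chrom_start, chrom_end))  # 0-based indices of the bases covered
length = chrom_end - chrom_start             # no +1 needed for a half-open interval

print(bases[0], bases[-1], length)  # 0 99 100

# A single base has chromEnd == chromStart + 1, e.g. the lifted
# BED result "chr1 110255312 110255313" covers exactly one base.
snp_start, snp_end = 110255312, 110255313
print(snp_end - snp_start)  # 1
```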

1-based start
A 1-based start counting mechanism begins counting at 1 (instead of 0). For example, starting at 1 and incrementing by 1, the number “5” will also be in position 5.

1-based count:  1-2-3-4-5
position:       1-2-3-4-5

For example, the first 100 bases of a chromosome are defined as chromStart=1, chromEnd=100, and span the bases numbered 1-100.
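In the 1-based "position" convention both ends are included, so the length needs a +1, and a single base is written with start equal to end (as in chr1:110255313-110255313). A minimal Python sketch of that arithmetic; the variable names are illustrative only.

```python
# 1-based "position" intervals are closed: both start and end are included.
start, end = 1, 100

positions = list(range(start, end + 1))  # +1 because the end base is included
length = end - start + 1

print(positions[0], positions[-1], length)  # 1 100 100

# A single base has start == end, e.g. "chr1:110255313-110255313".
snp_start, snp_end = 110255313, 110255313
print(snp_end - snp_start + 1)  # 1

# The same 100 bases in 0-based BED form: subtract 1 from the start only.
bed_start, bed_end = start - 1, end
print(bed_start, bed_end)  # 0 100
```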


Resources:

If you submit data to the browser in position format (chr#:##-##), the browser assumes this information is 1-based. If you submit data in any other format (BED (chr# ## ##) or otherwise), the browser will assume it is 0-based. You can see this both in our liftOver utility and in our search bar.
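When checking liftOver results locally, the rule above (position format read as 1-based, everything else as 0-based) can be mimicked to put both kinds of output on the same footing. The sketch below uses a hypothetical helper, `to_zero_based`, which is not a UCSC tool; it assumes only two input shapes, `chrom:start-end` and whitespace-separated BED, and normalizes both to a 0-based start. Applied to the two command-line results above, it confirms they name the same hg19 base.

```python
import re
from typing import Tuple

# 'chrom:start-end' with plain integers (no commas), as in the examples above.
POSITION_RE = re.compile(r"^(\S+):(\d+)-(\d+)$")


def to_zero_based(coord: str) -> Tuple[str, int, int]:
    """Normalize a coordinate string to (chrom, 0-based start, end).

    'chrom:start-end' is treated as 1-based; anything else is treated
    as whitespace-separated BED with a 0-based start.
    """
    text = coord.strip()
    match = POSITION_RE.match(text)
    if match:
        chrom, start, end = match.groups()
        return chrom, int(start) - 1, int(end)
    chrom, start, end = text.split()[:3]
    return chrom, int(start), int(end)


bed_result = to_zero_based("chr1 110255312 110255313")
position_result = to_zero_based("chr1:110255313-110255313")
print(bed_result == position_result)  # True
```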

Please respond to this list if you have further questions!

Thank you again for your inquiry and for using the UCSC Genome Browser.
Please send new and follow-up questions to one of our UCSC Genome Browser mailing lists below:

  * Post to the Public Help Forum: Email gen...@soe.ucsc.edu or search the Public Archives
  * Post to the Mirror Help Forum: Email genome...@soe.ucsc.edu or search the Mirror Archives
  * Confidential/private help: Email genom...@soe.ucsc.edu

UCSC Genome Browser Announcements List (email alerts for new data & software):
  * Subscribe: Email genome-annou...@soe.ucsc.edu 
  * Unsubscribe: Email genome-announ...@soe.ucsc.edu

Join us on Social Media! Facebook, Twitter, WordPress Blog, YouTube

Enjoy,
Cath
. . .
Cath Tyner
UCSC Genome Browser, Software QA & User Support
UC Santa Cruz Genomics Institute




Marie Saitou

Jul 8, 2016, 12:24:23 PM
to Cath Tyner, UCSC Genome Browser Public Help Forum
Dear Cath,

Thank you very much for your detailed explanation. I now understand how
the formats work, and I think my problem is solved.

Sincerely,
Marie