liftOver for single base position


Hoover, David (NIH/CIT) [E]

Jan 15, 2021, 10:56:31 AM
to gen...@soe.ucsc.edu
I've noticed that liftOver can convert single base positions when running with -positions, but not with BED files.

For example, if the input file is a text file with content

chr7:127471196-127471196

then

liftOver -positions old chain.file new unmapped

works, but if the input file is a BED file with content

chr7 127471196 127471196

then

liftOver old.bed chain.file new.bed unmapped

fails, marking the base position as deleted.

I can work around this by first incrementing the end position by one, running liftOver, then decrementing the end position in the result, but it is kind of baffling.

Why shouldn't a BED file allow a single base?

David
--
David Hoover, Ph.D.
Computational Biologist
High Performance Computing Services,
Center for Information Technology,
National Institutes of Health
12 South Dr., Rm 2N207
Bethesda, MD 20892, USA
TEL: (+1) 301-435-2986
Email: hoov...@hpc.nih.gov

Gerardo Perez

Jan 15, 2021, 7:29:09 PM
to Hoover, David (NIH/CIT) [E], genome

Hello, David.

Thank you for your question about converting single base positions.

A BED file does allow a single base. The issue is that the line you shared (chr7 127471196 127471196) uses a 1-based coordinate for the start. Most of our tools, including LiftOver when given BED input, require the zero-based, half-open BED format: the start position is counted from 0 and the end position is not included in the item. In BED format, an item whose start and end positions are the same therefore has a size of 0 bases, as is the case for chr7 127471196 127471196. The following blog post has a lot of great details about how coordinates work in BED format and position format: http://genome.ucsc.edu/blog/the-ucsc-genome-browser-coordinate-counting-systems/.
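
As a rough illustration of the two counting systems described above, the following Python sketch converts a position-format string into a BED interval and computes the size of a BED item. The function names position_to_bed and bed_size are hypothetical helpers written for this example; they are not part of LiftOver or any UCSC tool.

```python
def position_to_bed(position: str) -> tuple[str, int, int]:
    """Convert 'chrom:start-end' (1-based, end included) to a BED
    interval (0-based start, end not included)."""
    chrom, _, span = position.partition(":")
    start_text, _, end_text = span.replace(",", "").partition("-")
    start = int(start_text)
    # Assumption: a bare 'chrom:pos' means a single base.
    end = int(end_text) if end_text else start
    return chrom, start - 1, end


def bed_size(start: int, end: int) -> int:
    """Number of bases covered by a BED item."""
    return end - start


print(position_to_bed("chr7:127471196-127471196"))
# ('chr7', 127471195, 127471196)
print(bed_size(127471196, 127471196))  # 0: the BED line from the question
print(bed_size(127471195, 127471196))  # 1: a true single-base BED item
```

Only the start changes between the two notations; the end value is the same number in both.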

To convert this base position using LiftOver with BED coordinates, you would have to subtract 1 from the start position wherever the start and end positions are the same. If we do this to your coordinate, we get chr7 127471195 127471196. This is equivalent to chr7:127471196. In short, for single-base items in BED format, the end position is the genomic position.
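
A minimal Python sketch of that adjustment is shown here. It is not the awk code from the earlier mailing list answer, and the function name shift_zero_length_starts is hypothetical. It assumes that every line with equal start and end was meant as a 1-based single base; a file that contains genuine zero-length BED items (for example, insertion points) should not be processed this way.

```python
def shift_zero_length_starts(lines):
    """Yield BED lines, subtracting 1 from the start wherever start == end.

    Assumption: such lines are 1-based single-base positions written in
    BED layout, not intentional zero-length features.
    """
    for line in lines:
        fields = line.split()
        is_bed_record = (
            len(fields) >= 3 and fields[1].isdigit() and fields[2].isdigit()
        )
        if is_bed_record and fields[1] == fields[2] and int(fields[1]) > 0:
            fields[1] = str(int(fields[1]) - 1)
            yield "\t".join(fields)
        else:
            # Header lines and ordinary intervals pass through unchanged.
            yield line.rstrip("\n")


for fixed in shift_zero_length_starts(["chr7 127471196 127471196\n"]):
    print(fixed)  # chr7, 127471195, 127471196 separated by tabs
```

The adjusted lines can then be written to a file and passed to liftOver as ordinary BED input.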

Here's a previous mailing list answer that includes a short piece of awk code that can adjust the start position: https://groups.google.com/a/soe.ucsc.edu/g/genome/c/K_ZWDs_NRQY/m/SJzDRlSLAAAJ

I hope this is helpful. Please include gen...@soe.ucsc.edu in any replies to ensure visibility by the team. All messages sent to that address are archived on our public forum. If your question includes sensitive information, you may send it instead to genom...@soe.ucsc.edu.

Gerardo Perez
UCSC Genomics Institute

