Question about LiftOver program


A l

Feb 2, 2021, 11:41:55 AM
to gen...@soe.ucsc.edu
Hello,

I am Aude Martin, a French student in bioinformatics. I have a question about your LiftOver program.
Is it possible to use your tool with an input file in this format?
chr_a          start_a          end_a          chr_b          start_b          end_b          weight
For example: chr5     18291818      18292839      chr7      162783     162888       9
It is a file with the coordinates of pairs of regions of the hg38 genome that interact, and I would like to obtain the same file for hg19. It was obtained with the ChIA-PET technology.
Maybe it is impossible to do this with your tool, but maybe it is possible with a Linux cut command, in which case I would be happy to have your help.

Thank you very much in advance for your answers; I hope I am not bothering you too much.
Aude Martin, student in bioinformatics 

Luis Nassar

Feb 8, 2021, 8:27:30 PM
to A l, gen...@soe.ucsc.edu

Hello, Aude.

Thank you for your interest in the Genome Browser.

While liftOver cannot directly convert a file in that format, you can reformat the file, convert it, and recombine it as you suggest.

The liftOver utility can be found in our download directory (http://hgdownload.soe.ucsc.edu/admin/exe/). You will want to download the utility corresponding to your operating system; for example, the Linux version is found here (http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/liftOver):

$ wget http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/liftOver
$ chmod +x liftOver

You can run the utility on its own to see a help message, e.g.:

$ ./liftOver

You can start with a file as you suggest, separated by tabs (or spaces). Note that you will want to comment out the header line, or remove it:

$ cat ChIAPETdata.tsv
#chr_a    start_a    end_a    chr_b    start_b    end_b    weight
chr5    18291818    18292839    chr7    162783    162888    9

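You will then want to create two separate files, one with the first region of each pair (columns 1-3) and one with the second region (columns 4-6); the file names below call these the start and end positions. We are also saving the original coordinates of the whole pair as the 4th column in order to be able to accurately recombine the file: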

$ awk '{print $1, $2, $3, $1 ":" $2 ":" $3 ":" $4 ":" $5 ":" $6, $7;}' ChIAPETdata.tsv > startPos.hg38.bed
$ awk '{print $4, $5, $6, $1 ":" $2 ":" $3 ":" $4 ":" $5 ":" $6, $7;}' ChIAPETdata.tsv > endPos.hg38.bed

If you kept the header line, you will want to comment it out in the created endPos.hg38.bed file now, because in that file the header starts with chr_b rather than #chr_a.
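A variant of the two awk commands, sketched under the assumption that the input is whitespace-separated with exactly seven columns: it skips any line starting with `#` and writes tab-separated output, so the header does not need to be commented out by hand afterwards. The sample file written by `printf` is only there to make the example self-contained.

```shell
# Sample input: the header and record from the example above.
printf '#chr_a\tstart_a\tend_a\tchr_b\tstart_b\tend_b\tweight\n' > ChIAPETdata.tsv
printf 'chr5\t18291818\t18292839\tchr7\t162783\t162888\t9\n' >> ChIAPETdata.tsv

# Region A goes to one file, region B to the other. Column 4 carries the
# original pair as a key and column 5 the weight. Lines starting with "#"
# are skipped, and OFS makes the output tab-separated.
awk 'BEGIN { OFS = "\t" } !/^#/ { print $1, $2, $3, $1 ":" $2 ":" $3 ":" $4 ":" $5 ":" $6, $7 }' ChIAPETdata.tsv > startPos.hg38.bed
awk 'BEGIN { OFS = "\t" } !/^#/ { print $4, $5, $6, $1 ":" $2 ":" $3 ":" $4 ":" $5 ":" $6, $7 }' ChIAPETdata.tsv > endPos.hg38.bed

cat startPos.hg38.bed endPos.hg38.bed
```

Tab-separated output also matches the `-t $'\t'` delimiter used in the join step later on.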

Both of these new files are bed 4+1 files, with the additional field being the weight column seen in your example. You will now want to download the chain file from hg38 to hg19:

$ wget http://hgdownload.soe.ucsc.edu/goldenPath/hg38/liftOver/hg38ToHg19.over.chain.gz

At this point you can run the liftOver command on both files:

$ ./liftOver startPos.hg38.bed hg38ToHg19.over.chain.gz startPos.hg19.bed startPosUnmapped
$ ./liftOver endPos.hg38.bed hg38ToHg19.over.chain.gz endPos.hg19.bed endPosUnmapped

You will now have four new files. startPos.hg19.bed and endPos.hg19.bed will contain the successfully lifted coordinates for each file, and the two Unmapped files will contain any records that failed to lift.
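To see how many records failed to lift, and which interaction pairs lost at least one side, something like the following may help. It assumes the unmapped files use the usual liftOver layout, in which each failed record is preceded by a `#` comment line giving the reason; the mock files created here are made-up stand-ins for real output, and `lostPairs.txt` is a hypothetical file name.

```shell
# Mock unmapped files (assumed layout: a "#reason" line, then the failed record).
printf '#Deleted in new\nchr1\t100\t200\tchr1:100:200:chr2:300:400\t5\n' > startPosUnmapped
printf '#Deleted in new\nchr2\t300\t400\tchr1:100:200:chr2:300:400\t5\n' > endPosUnmapped
printf '#Partially deleted in new\nchr9\t700\t800\tchr3:10:20:chr9:700:800\t2\n' >> endPosUnmapped

# Number of failed records per file, with the comment lines excluded.
grep -vc '^#' startPosUnmapped
grep -vc '^#' endPosUnmapped

# Unique keys (column 4) of the pairs that lost at least one side.
awk '!/^#/ { print $4 }' startPosUnmapped endPosUnmapped | sort -u > lostPairs.txt
cat lostPairs.txt
```

A pair listed in `lostPairs.txt` will be missing from the joined result, since the join only keeps keys present in both lifted files.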

You can now combine the two files and sort the results (the first sort key is the original-coordinate string in column 1):

$ join -t $'\t' -j 4 startPos.hg19.bed endPos.hg19.bed | sort -k1,1 -k2n,2n > liftedJoined.tsv
$ head liftedJoined.tsv
chr5:18291818:18292839:chr7:162783:162888    chr5    18291927    18292948    9    chr7    162783    162888    9

The file will be recombined based on the initial coordinates, which can still be found in the first column of this final output. Keep in mind that there may be unmapped records.
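Two details of the join step may be worth a sketch. `join` expects both inputs to be sorted on the join field, which starts to matter once a record lifts on one side but not on the other; and the joined line can be reshaped back into the original seven-column layout. The mock `.hg19.bed` files below use made-up coordinates rather than real liftOver results, and the `*.sorted.bed` and `ChIAPETdata.hg19.tsv` names are hypothetical.

```shell
# Mock lifted files (tab-separated, made-up coordinates).
printf 'chr5\t18291927\t18292948\tchr5:18291818:18292839:chr7:162783:162888\t9\n' > startPos.hg19.bed
printf 'chr1\t1100\t1200\tchr1:1000:1100:chr2:500:600\t0.087161\n' >> startPos.hg19.bed
printf 'chr7\t162783\t162888\tchr5:18291818:18292839:chr7:162783:162888\t9\n' > endPos.hg19.bed
printf 'chr2\t550\t650\tchr1:1000:1100:chr2:500:600\t0.087161\n' >> endPos.hg19.bed

TAB="$(printf '\t')"

# Sort both files on the key column so join sees its inputs in the order it expects.
LC_ALL=C sort -t "$TAB" -k4,4 startPos.hg19.bed > startPos.sorted.bed
LC_ALL=C sort -t "$TAB" -k4,4 endPos.hg19.bed > endPos.sorted.bed

# join prints: key, region A (3 columns), weight, region B (3 columns), weight.
# Keep region A, region B and one copy of the weight, then order by
# chromosome name and numeric start of region A.
LC_ALL=C join -t "$TAB" -j 4 startPos.sorted.bed endPos.sorted.bed \
  | awk 'BEGIN { FS = OFS = "\t" } { print $2, $3, $4, $6, $7, $8, $5 }' \
  | LC_ALL=C sort -t "$TAB" -k1,1 -k2,2n > ChIAPETdata.hg19.tsv

cat ChIAPETdata.hg19.tsv
```

Dropping the key column loses the link back to the hg38 coordinates, so keeping `liftedJoined.tsv` alongside the reshaped file may be useful.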

Let us know if you have any questions on this process, or any other Genome Browser features or utilities.

I hope this is helpful. Please include gen...@soe.ucsc.edu in any replies to ensure visibility by the team. All messages sent to that address are archived on our public forum. If your question includes sensitive information, you may send it instead to genom...@soe.ucsc.edu.

Lou Nassar
UCSC Genomics Institute



A l

Feb 9, 2021, 11:37:49 AM
to Luis Nassar, gen...@soe.ucsc.edu
Hello Lou,

Thank you very much for your answer; it is totally clear to me now and everything works.
I just have one more question: sometimes my weights are floats, for example 0.087161, and after the liftOver command they are converted to integers, so 0.087161 becomes 0. Is it possible to prevent this conversion?

Thank you very much in advance for your answers.
Aude Martin, student in bioinformatics 





Luis Nassar

Feb 10, 2021, 4:19:38 PM
to A l, gen...@soe.ucsc.edu
Hello, Aude.

We are glad to hear the explanation helped.

The liftOver command should not change the values in the fifth column, meaning that you should be able to follow these same steps with floating points.

We are not able to replicate any behavior that shows otherwise. Would it be possible for you to send us some example records along with a list of the commands you are running and the final output?
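One possible way to gather such example records is to compare the weight column before and after the lift, matching records by the key in column 4. This is only a sketch: the mock files use made-up values, the second record's weight was altered on purpose to show what a mismatch looks like, and `weightChanges.txt` is a hypothetical file name.

```shell
# Mock before/after files (made-up values; keyB's weight is deliberately
# different in the "after" file).
printf 'chr5\t100\t200\tkeyA\t9\nchr1\t300\t400\tkeyB\t0.087161\n' > startPos.hg38.bed
printf 'chr5\t110\t210\tkeyA\t9\nchr1\t310\t410\tkeyB\t0\n' > startPos.hg19.bed

# First file: remember each weight by key. Second file: report any key whose
# weight differs. Appending "" forces a string comparison, so 9 and 9.0
# would also count as different.
awk 'NR == FNR { w[$4] = $5; next }
     ($4 in w) && (w[$4] "") != ($5 "") { print $4, "before=" w[$4], "after=" $5 }' \
    startPos.hg38.bed startPos.hg19.bed > weightChanges.txt

cat weightChanges.txt
```

An empty `weightChanges.txt` would suggest the weights survived the lift unchanged for every record present in both files.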



Lou Nassar
UCSC Genomics Institute

A l

Feb 16, 2021, 12:34:22 PM
to Luis Nassar, gen...@soe.ucsc.edu
Hello Luis,

Thank you for your answer. In fact, the problem came from an error on my part; sorry about that, and thank you again. Everything works now.

Have a nice day.
Aude Martin, student in bioinformatics 

