liftOver for genes which are moved to different chromosomes

maka...@mskcc.org

Feb 18, 2020, 7:26:32 PM
to gen...@soe.ucsc.edu

Good evening

 

Is there a way to handle genes that sit on different chromosomes in different assemblies when we translate coordinates between those assemblies?

 

For example:

 

CD24:

chrY:21152526-21154705 - hg19

chr6:106969831-106975465 - hg38

 

When we translate from hg19 to hg38 coordinates, tools simply add or subtract an offset, for example:

https://www.ytree.net/hg19tohg38.html

They do not move genes between chromosomes.

Thus,

chrY:21152526-21154705

becomes

chrY:18990640-18992819

and not

chr6:106969831-106975465
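
As a quick check of the arithmetic above, the following Python sketch computes the shift between the hg19 and hg38 chrY coordinates quoted in this message. The helper `parse_region` is hypothetical and exists only for this illustration. Both ends move by the same amount and the region keeps its length, which is what a single ungapped alignment block would produce.

```python
def parse_region(region):
    """Split 'chrom:start-end' into (chrom, start, end). Hypothetical helper."""
    chrom, span = region.split(":")
    start, end = span.split("-")
    return chrom, int(start), int(end)

# Coordinates copied from the message above.
hg19 = parse_region("chrY:21152526-21154705")
hg38 = parse_region("chrY:18990640-18992819")

# Shift applied to each end of the region.
start_shift = hg19[1] - hg38[1]
end_shift = hg19[2] - hg38[2]

# The region length is unchanged if both ends shift equally.
same_length = (hg19[2] - hg19[1]) == (hg38[2] - hg38[1])

print(start_shift, end_shift, same_length)
```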

 

Thank you very much,

 

Vlad

 

Luis Nassar

Feb 21, 2020, 5:51:53 PM
to maka...@mskcc.org, UCSC Genome Browser Discussion List

Hello Vlad,

Thank you for your interest in the Genome Browser.

Our liftOver chains, which are what the liftOver program uses, are based on whole-genome sequence alignments and are filtered to include only the best alignment on the "from" genome (hg19 in this case). The best alignment of that part of the hg19 chrY sequence is to hg38 chrY sequence, not chr6 sequence.
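
To illustrate why a region stays on the chromosome of its best alignment, here is a minimal Python sketch of a block-based lookup. It is not the liftOver implementation: the `Block` type, the `lift` function, and the single block below are hypothetical, and the block boundaries were invented so that the offset reproduces the chrY example from this thread. Because each source position is covered by at most one retained block, the lookup can only ever return the target of that block.

```python
from typing import List, NamedTuple, Optional, Tuple

class Block(NamedTuple):
    """One ungapped aligned block (hypothetical, simplified structure)."""
    src_chrom: str
    src_start: int
    dst_chrom: str
    dst_start: int
    size: int

# Made-up block, NOT taken from a real chain file. Its offset is chosen
# to match the hg19 -> hg38 chrY shift seen in this thread.
BLOCKS: List[Block] = [Block("chrY", 21000000, "chrY", 18838114, 300000)]

def lift(chrom: str, pos: int,
         blocks: List[Block] = BLOCKS) -> Optional[Tuple[str, int]]:
    """Map one position through the first block that covers it."""
    for b in blocks:
        if b.src_chrom == chrom and b.src_start <= pos < b.src_start + b.size:
            return b.dst_chrom, b.dst_start + (pos - b.src_start)
    return None  # no retained alignment covers this position

print(lift("chrY", 21152526))
print(lift("chrY", 21154705))
print(lift("chr6", 1000))
```

In this toy model the chrY region can never land on chr6, because no retained block links the two. That mirrors the filtering described above.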

In the case of CD24, it is debatable whether it belonged on hg19 chrY to begin with. In hg19, it appears that chr6 was missing some sequence, and the best alignment of the CD24 transcript sequence was to chrY. However, there is a fix patch for that part of chr6 in hg19 (and the fix was incorporated in hg38 chr6). If you click the following session link, you will see updated CD24 annotation on the chr6 fix sequence: http://genome-preview.soe.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=lou&hgS_otherUserSessionName=hg19CD24Fix

We are currently in the process of updating our liftOver chains, which should catch some of these cases.

In the meantime, a solution would be to query our data tables directly with the gene identifiers or gene symbols of interest. The coordinates you reference:

chrY:21152526-21154705 - hg19
chr6:106969831-106975465 - hg38

correspond to a transcript with the accession NM_013230. If you go to the Table Browser (http://genome.ucsc.edu/cgi-bin/hgTables) and make the following selections:

[Image: Table Browser selections (image.png)]

Then, next to "identifiers (names/accessions):", paste your list of identifiers (in this case NM_013230). Click "get output" and, on the following screen, select these fields: chrom, txStart, txEnd, geneSymbol. Finally, click "get output" again. This will prompt a file download, and the file should look as follows:

#hg38.knownGene.chrom    hg38.knownGene.txStart    hg38.knownGene.txEnd    hg38.kgXref.geneSymbol
chr6    106969830    106975465    CD24

You can vary the checked boxes if you would prefer more/less data. If you pass the CD24 gene symbol as the identifier instead of the specific transcript accession NM_013230, your output will include all the transcripts of the CD24 gene.
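
Note that the txStart in the table output (106969830) is one less than the start of the browser position (106969831). UCSC tables store 0-based, half-open starts, while browser position strings are 1-based. The Python sketch below shows one way to turn the downloaded output into position strings. The function name `table_to_positions` is hypothetical, and the sketch assumes the file is tab-separated with the four columns shown above.

```python
# Table Browser output from the example above, assumed tab-separated.
TABLE_OUTPUT = """\
#hg38.knownGene.chrom\thg38.knownGene.txStart\thg38.knownGene.txEnd\thg38.kgXref.geneSymbol
chr6\t106969830\t106975465\tCD24
"""

def table_to_positions(text):
    """Return {geneSymbol: [position strings]} with 1-based starts."""
    positions = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header line
        chrom, tx_start, tx_end, symbol = line.split("\t")
        # 0-based half-open start -> 1-based start; the end is unchanged.
        position = f"{chrom}:{int(tx_start) + 1}-{tx_end}"
        positions.setdefault(symbol, []).append(position)
    return positions

print(table_to_positions(TABLE_OUTPUT))
```

A gene symbol that returns several transcripts would simply produce several position strings in its list.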

I hope this is helpful. If you have any further questions, please include gen...@soe.ucsc.edu in your reply to ensure visibility by the team. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Lou Nassar
UCSC Genomics Institute


maka...@mskcc.org

Feb 24, 2020, 10:15:39 AM
to lrna...@ucsc.edu, gen...@soe.ucsc.edu

Good evening,

 

Thank you very much for your response.

Basically, it tells us the hg38 coordinates of CD24 (which was on chrY in hg19):

 

chr6       106969830          106975465          CD24

 

However, my task is to convert not whole-gene coordinates but enrichment peaks (a BED file with many regions).

 

It is a little off-topic, but I found the NCBI Remap tool

 

https://www.ncbi.nlm.nih.gov/genome/tools/remap

 

which performs the conversion (the result is practically the same as yours), but accepts multiple peaks in BED format in hg19 and outputs their coordinates in hg38.

 

https://www.ncbi.nlm.nih.gov/genome/tools/remap/JSID_01_180209_130.14.18.128_9000_remap__1582339494

 

 

Would you recommend using it? It is slow, but it seems to do the right conversion.

 

Thank you

 

Vlad

 

 


Matthew Speir

Feb 25, 2020, 3:01:58 PM
to maka...@mskcc.org, Lou Nassar, UCSC Genome Browser Discussion List
Hello, Vlad.

There are a few different tools for converting coordinates between assemblies, including LiftOver (UCSC), ReMap (NCBI), and CrossMap (http://crossmap.sourceforge.net/). Each tool struggles with certain regions, so use the one whose results best fit what you intend to do with them.
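
One way to choose between tools for a peak set is to lift the same BED file with each and look at where they disagree. The Python sketch below is only an illustration: `read_bed`, `disagreements`, the peak names, and the two inline results are hypothetical stand-ins for real output files, with the CD24 coordinates borrowed from earlier in the thread and written as 0-based BED starts. It assumes each peak carries a unique name in the fourth BED column.

```python
def read_bed(text):
    """Parse BED text into {name: (chrom, start, end)}."""
    peaks = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blanks, comments, and track/browser lines
        chrom, start, end, name = line.split()[:4]
        peaks[name] = (chrom, int(start), int(end))
    return peaks

def disagreements(a, b):
    """Return peaks whose lifted location differs between two tools.

    A peak missing from one result is reported with None on that side.
    """
    out = {}
    for name in sorted(set(a) | set(b)):
        if a.get(name) != b.get(name):
            out[name] = (a.get(name), b.get(name))
    return out

# Illustrative results only, not real tool output.
tool_a = read_bed("chrY 18990639 18992819 peak_CD24\nchr1 1000 2000 peak_1\n")
tool_b = read_bed("chr6 106969830 106975465 peak_CD24\nchr1 1000 2000 peak_1\n")

for name, (loc_a, loc_b) in disagreements(tool_a, tool_b).items():
    print(name, loc_a, loc_b)
```

Peaks that change chromosome in one tool but not the other, like the CD24 example, are the ones worth inspecting by hand.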

If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Matthew Speir
UCSC Genomics Institute





--
Matthew Speir
User Experience, Quality Assurance and User Support
HCA, CIRM, and UCSC Genome Browser
UCSC Genomics Institute