TCGA LiftOver


Jorge Fernandez de Cossio

Mar 30, 2016, 5:38:22 PM
to gen...@soe.ucsc.edu

Dear Greg Roe,

 

I would appreciate any help with the following:

 

I tried to use LiftOver to convert TCGA AML data from genome version 36 to version 37.

I created a tab-delimited BED file like the one below, using the chr, StartPos, and EndPos columns from the MAF file, then pasted in the data and submitted it.

chr3 127396114  127396115

chr3 127396114  127396115

chr15 86262345   86262351

chr4 36149165   36149168

chr6 129950550  129950553

chr20 31023271   31023272

chr2 25967233   25967234

chr12 14619468   14619471

chr5 40981538   40981540
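
[Editor's sketch] The BED construction described above can be written as a small Python helper. This is only an illustration, not the poster's actual procedure: the function name `maf_to_bed` is hypothetical, and it assumes the MAF start and end positions are 1-based and inclusive, whereas BED uses a 0-based start and an exclusive end. Under that assumption the start is decremented by one and the end is kept as-is.

```python
def maf_to_bed(chrom, start_position, end_position):
    """Build one tab-delimited BED line from MAF-style coordinates.

    Assumption: the inputs are 1-based, inclusive positions. BED starts
    are 0-based and BED ends are exclusive, so only the start shifts.
    """
    chrom = str(chrom)
    # UCSC tools expect names such as "chr3"; add the prefix if missing.
    if not chrom.startswith("chr"):
        chrom = "chr" + chrom
    bed_start = int(start_position) - 1
    bed_end = int(end_position)
    return f"{chrom}\t{bed_start}\t{bed_end}"


# Example with the first record listed above.
print(maf_to_bed("3", 127396114, 127396115))
```

If the positions were already 0-based, the subtraction would not be needed.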

The output was:

Successfully converted 220 records

Conversion failed on 2324 records

Almost all of the failures were of the type "#Deleted in new".

 

Is there something wrong? Is it possible to remap all of this important project data to updated genome assemblies?

Thank you in advance for your help.

 

Best regards, -jorge

 

Jorge Fernandez de Cossio

Center for Genetic Engineering and Biotechnology

Cuba

 

 

 


************************************************************************
    CIGB, inaugurated on July 1, 1986 by Fidel. 30 years of commitment to life.
************************************************************************

Matthew Speir

Apr 1, 2016, 1:37:17 PM
to Jorge Fernandez de Cossio, gen...@soe.ucsc.edu
Hi Jorge,

Thank you for your question about the LiftOver utility in the UCSC Genome Browser.

If your LiftOver query includes some failed conversions, you will see a "failure file" that includes messages like "#Deleted in new". Alongside the link to the failure file, there will be a link titled "Explain failure messages" that includes a little bit more information on what the different messages in the failure file mean. Here is a direct link to the explanation of these failure messages: http://genome.ucsc.edu/cgi-bin/hgLiftOver?hglft_errorHelp=1.

Here is the explanation for "Deleted in new" from that link:

Deleted in new:
    Sequence intersects no chains
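
To see how the failures break down by message, the failure file can be tallied with a few lines of Python. This is a minimal sketch, assuming the failure file places each message on a line beginning with "#" followed by the original record. The function name `tally_failures` is hypothetical, and the sample records are made up for illustration; they are not taken from the query above.

```python
from collections import Counter


def tally_failures(lines):
    """Count failure messages (lines starting with '#') in a failure file."""
    reasons = Counter()
    for line in lines:
        line = line.strip()
        if line.startswith("#"):
            # Drop the leading '#' so the key is the bare message text.
            reasons[line.lstrip("#").strip()] += 1
    return reasons


# Made-up sample in the assumed layout: a message line, then the record.
sample = [
    "#Deleted in new",
    "chr1\t100\t200",
    "#Deleted in new",
    "chr2\t300\t400",
    "#Partially deleted in new",
    "chr3\t500\t600",
]

counts = tally_failures(sample)
for reason, n in counts.most_common():
    print(f"{n}\t{reason}")
```

For a real failure file, the list could be replaced with an open file handle, since the function only iterates over lines.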

This means that the regions you specified in your LiftOver query don't match up to any regions in the new genome/assembly. This is not unexpected when going from one assembly release of a genome to the next. A genome changes from one assembly to the next as it is improved by adding, changing, or removing sequence. If you're curious, you can look at these regions in the NCBI36/hg18 browser alongside the "Hg19 Diff" track, http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg18&g=hg18ContigDiff. This track shows contigs used in the NCBI36/hg18 assembly that are different in the GRCh37/hg19 assembly. Here you can see a case where an entire contig in the hg18 assembly was dropped from the hg19 assembly:

http://genome.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=mspeir&hgS_otherUserSessionName=hg18_contigChanged

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Matthew Speir
UCSC Genome Bioinformatics Group