liftOver and BroadPeak data (human genome)


Ghislain Durif

Mar 5, 2014, 11:16:08 AM
to gen...@soe.ucsc.edu
Dear UCSC Community,

I apologize if this question has already been asked; I searched the FAQ and the mailing list archives without finding an answer to my problem.

I am working on histone epigenetic marks in the human genome, and I am especially interested in broad peaks such as wgEncodeBroadHistoneK562H3k27acStdPk.broadPeak.gz found at http://genome.ucsc.edu/cgi-bin/hgFileUi?db=hg19&g=wgEncodeBroadHistone (or http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeBroadHistone/ on the download server).

I would like to lift over this data set from hg18 to hg19, but I encountered problems with both the online and local tools.

The online tool (http://genome.ucsc.edu/cgi-bin/hgLiftOver), applied to the uncompressed BroadPeak file, returns the following error (using Firefox 27.0.1 on Ubuntu):

Warning/Error(s):
  • Expecting integer field 7 line 1 of ../trash/hglft_genome_5b20_7447e0.user, got 2.120582
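A minimal sketch of why this message can appear, assuming the file follows the ENCODE broadPeak layout (BED 6+3): the seventh column is signalValue, a floating-point number, whereas a reader expecting plain BED treats field 7 as the integer thickStart. The example line is hypothetical; only the value 2.120582 is taken from the error message.

```python
# Column names of the ENCODE broadPeak format (BED 6+3).
BROADPEAK_FIELDS = [
    "chrom", "chromStart", "chromEnd", "name", "score", "strand",
    "signalValue", "pValue", "qValue",
]


def parse_broadpeak_line(line):
    """Split one tab-separated broadPeak line into a typed dict."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != len(BROADPEAK_FIELDS):
        raise ValueError(f"expected 9 fields, got {len(parts)}")
    rec = dict(zip(BROADPEAK_FIELDS, parts))
    # The first six columns follow plain BED: integer coordinates and score.
    for key in ("chromStart", "chromEnd", "score"):
        rec[key] = int(rec[key])
    # The three extra columns are floating-point values.
    for key in ("signalValue", "pValue", "qValue"):
        rec[key] = float(rec[key])
    return rec


# Hypothetical line; only 2.120582 comes from the error message.
example = "chr1\t1000\t2000\t.\t500\t+\t2.120582\t3.1\t-1\n"
record = parse_broadpeak_line(example)
print(record["signalValue"])
```

Reading the seventh field with int() instead of float() raises a ValueError, which mirrors the "Expecting integer field 7" complaint.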

For the local tool, I downloaded the Linux binary for my computer (Ubuntu 12.04) from http://hgdownload.cse.ucsc.edu/admin/exe/. It did not succeed in lifting my BroadPeak file either, but I am not sure I am using it properly.

In response to the command "liftOver -minMatch=0.95 -bedPlus=9 -tab -errorHelp oldFile map.chain newFile unMapped", the shell reports that it cannot find the newFile and unMapped files (the oldFile and map.chain files exist). I then tried creating the newFile and unMapped files beforehand with a touch command, and I got this output:

Deleted in new:
    Sequence intersects no chains
Partially deleted in new:
    Sequence insufficiently intersects one chain
Split in new:
    Sequence insufficiently intersects multiple chains
Duplicated in new:
    Sequence sufficiently intersects multiple chains
Boundary problem:
    Missing start or end base in an exon

But the newFile and unMapped files stay empty. Note that BroadPeak data files are supposed to be in BED 6+3 format.

I also tried the option -bedPlus=6, and running without the -bedPlus and -tab options, with the same result.
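For a case where a lift is genuinely needed, the invocation can be sketched as below. This is only an illustration under assumptions: the file names are the placeholders from the command above, the helper names build_liftover_command and run_liftover are hypothetical, and -errorHelp is left out on the assumption that the output quoted above is the explanatory text that flag prints rather than the result of a conversion. -bedPlus=6 is used on the assumption that only the first six columns should be treated as standard BED.

```python
import subprocess


def build_liftover_command(old_file, chain_file, new_file, unmapped_file,
                           min_match=0.95, bed_plus=6):
    """Return the liftOver argument list (hypothetical helper)."""
    return [
        "liftOver",
        f"-minMatch={min_match}",
        f"-bedPlus={bed_plus}",  # first 6 columns are BED, the rest pass through
        "-tab",                  # fields are tab-separated
        old_file, chain_file, new_file, unmapped_file,
    ]


def run_liftover(*args, **kwargs):
    """Run liftOver; assumes the binary is on PATH and the inputs exist."""
    cmd = build_liftover_command(*args, **kwargs)
    return subprocess.run(cmd, check=True)


# Placeholder file names taken from the command quoted in the message.
cmd = build_liftover_command("oldFile", "map.chain", "newFile", "unMapped")
print(" ".join(cmd))
```

Building the argument list separately from running it makes it easy to inspect the exact command before executing it.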


Thank you in advance for your help


-- 
Ghislain DURIF

PhD Student
UMR 5558 Biométrie et Biologie Évolutive
Claude Bernard Lyon 1 University (France)

Matthew Speir

Mar 5, 2014, 4:02:44 PM
to Ghislain Durif, gen...@soe.ucsc.edu
Hi Ghislain,

Thank you for your question about lifting data between assemblies. The file specified in your email is already data from hg19, so trying to convert these coordinates from hg18 to hg19 is unnecessary. You can view this data on the hg19 browser as part of the 'Histone Modifications by ChIP-seq from ENCODE/Broad Institute' track, http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeBroadHistone.

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Matthew Speir
UCSC Genome Bioinformatics Group
--


Ghislain Durif

Mar 6, 2014, 3:17:20 AM
to gen...@soe.ucsc.edu
Thank you for your answer. I understand my mistake. I thought that the origAssembly parameter specified for each file was the current reference assembly of the data, and not just the original one.
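The distinction can be illustrated with a small sketch that reads a metadata string and reports origAssembly separately from the assembly the file is served under. The "key=value; key=value" layout and the sample string are assumptions made for illustration, and parse_metadata is a hypothetical helper; the hg19 value is taken from the download path quoted earlier in the thread.

```python
def parse_metadata(text):
    """Parse 'key=value; key=value' pairs into a dict (assumed layout)."""
    pairs = {}
    for item in text.split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition("=")
        pairs[key.strip()] = value.strip()
    return pairs


# Hypothetical metadata string, not copied from a real file listing.
meta = parse_metadata("cell=K562; antibody=H3K27ac; origAssembly=hg18")

# The file sits under goldenPath/hg19/, so its coordinates are hg19;
# origAssembly only records where the data was first mapped.
current_assembly = "hg19"
print(f"original: {meta['origAssembly']}, current: {current_assembly}")
```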

Thank you again

Ghislain
-- 
Ghislain DURIF

UMR 5558 Biométrie et Biologie Évolutive
Université Claude Bernard Lyon 1