Genome liftover question

Shiyong Li

Jun 20, 2014, 5:08:03 PM
to genome, lishiyong
Hi:

I have a question about converting genomic coordinates from hg18 to hg19. I used BED format as input and got the following message:
"Successfully converted 14828 records: View Conversions 
Conversion failed on 305 records.    Display failure file    Explain failure messages"
The error output is attached.

The 305 regions exist in hg18 gene regions, so why can they not be converted?
The explanation page says:
Deleted in new:
    Sequence intersects no chains
Partially deleted in new:
    Sequence insufficiently intersects one chain
Split in new:
    Sequence insufficiently intersects multiple chains
Duplicated in new:
    Sequence sufficiently intersects multiple chains
Boundary problem:
    Missing start or end base in an exon


I cannot understand this. Could you give me more detailed information about it?

Best Wishes.

Shiyong Li
hg18_2.txt

Jonathan Casper

Jun 23, 2014, 7:58:58 PM
to Shiyong Li, genome

Hello Shiyong,

Thank you for your question about the liftOver tool. The 305 regions were not converted for several reasons. Most of them failed because they were "Deleted in new", which means that no region in hg19 aligned well to your region in hg18. Some regions were "Partially deleted in new", which means that a piece of an alignment matched your hg18 region, but not enough of it to convert the region to hg19. Others were "Split in new", which means that your hg18 region was split across different parts of hg19.
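
To see how the failures break down by reason, the failure file can be tallied with a short script. This is only a sketch: it assumes the failure file puts each reason on a comment line starting with "#", followed by the BED record it applies to. The function name tally_failures is hypothetical, and the chr1 and chr2 records are made up for illustration.

```python
from collections import Counter


def tally_failures(lines):
    """Count BED records listed under each '#reason' comment line.

    Assumed layout (not verified against your file): a line such as
    '#Deleted in new' followed by the record that failed for that reason.
    """
    counts = Counter()
    reason = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # A comment line sets the reason for the records that follow.
            reason = line.lstrip("#").strip()
        elif reason is not None:
            counts[reason] += 1
    return counts


# Illustrative input; only the chr14 record comes from this thread.
example = [
    "#Deleted in new",
    "chr14\t106805208\t106805716",
    "#Partially deleted in new",
    "chr1\t1000\t2000",
    "#Deleted in new",
    "chr2\t500\t900",
]

for reason, count in tally_failures(example).items():
    print(reason, count)
```

With a real failure file, the same function could be called on an open file handle instead of the example list.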

All of these issues are expected to happen when moving to a new assembly. Some parts of the assembly will change or move around, and mistakes from bad data may be fixed.

One of our engineers suggests that you can turn on the "Allow multiple output regions" checkbox to allow the output to still be mapped, even if it gets split onto multiple parts of the new genome. This will help reduce the number of "Split in new" errors. You will need to submit your data file in BED4 format (with a unique name or number for each region to be lifted) to permit this option. Our engineer also suggests that lowering the value of "Minimum ratio of bases that must remap" from the default of 0.95 will reduce the number of "Partially deleted in new" errors. Please note that this may also reduce the quality of your results.
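
If your input file has only three columns, a name column can be added before uploading. The following is a minimal sketch, assuming whitespace-separated BED3 input; the function name to_bed4 and the "region_N" naming scheme are hypothetical choices, and any unique name would do.

```python
def to_bed4(lines, prefix="region"):
    """Return BED4 lines (chrom, start, end, name) with a unique name per record.

    Header and comment lines ('#', 'track', 'browser') and blank lines are
    skipped. Columns beyond the first three are dropped in this sketch.
    """
    out = []
    n = 0
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith(("#", "track", "browser")):
            continue
        fields = stripped.split()
        if len(fields) < 3:
            raise ValueError(f"expected at least 3 columns: {stripped!r}")
        n += 1
        chrom, start, end = fields[:3]
        # The running counter guarantees that every name is unique.
        out.append("\t".join([chrom, start, end, f"{prefix}_{n}"]))
    return out


bed3 = ["chr14 106805208 106805716", "chr1\t10\t20"]
for row in to_bed4(bed3):
    print(row)
```

The unique names make it possible to trace each output piece back to its original region when a region is split across several locations.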

Some of your errors may also occur for other reasons. One of your hg18 regions, for example, is "chr14 106805208 106805716". This region does not exist in hg18 - chr14 ends at 106,368,585. The "Deleted in new" error for this entry means that there was no alignment to this region because it does not exist in hg18.
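
Entries like this one can be caught before running the conversion by comparing each region against the chromosome lengths of the source assembly. The sketch below assumes BED coordinates (0-based start, end not included) and uses only the hg18 chr14 length quoted above; the function name find_out_of_range is hypothetical, and a full table of chromosome sizes would have to be supplied for real use.

```python
def find_out_of_range(records, chrom_sizes):
    """Return (chrom, start, end, problem) for records that do not fit.

    records: iterable of (chrom, start, end) using BED coordinates.
    chrom_sizes: dict mapping chromosome name to its length in bases.
    """
    problems = []
    for chrom, start, end in records:
        size = chrom_sizes.get(chrom)
        if size is None:
            problems.append((chrom, start, end, "unknown chromosome"))
        elif start < 0 or start >= end:
            problems.append((chrom, start, end, "invalid interval"))
        elif end > size:
            # BED end is exclusive, so end == size is still valid.
            problems.append((chrom, start, end, "past chromosome end"))
    return problems


# Only the chr14 length comes from this thread; add other chromosomes as needed.
hg18_sizes = {"chr14": 106368585}

regions = [
    ("chr14", 106805208, 106805716),  # the entry discussed above
    ("chr14", 100, 200),              # made-up region that fits
]

for problem in find_out_of_range(regions, hg18_sizes):
    print(problem)
```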

If you would like to learn more about chains and how they are used, please see the wiki page at http://genomewiki.ucsc.edu/index.php/Chains_Nets.

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu or genome...@soe.ucsc.edu. Questions sent to those addresses will be archived in publicly-accessible forums for the benefit of other users. If your question contains sensitive data, you may send it instead to genom...@soe.ucsc.edu.

--
Jonathan Casper
UCSC Genome Bioinformatics Group


