pslMap -swapMap -chainMapFile human_peaks.psl hg19ToMm9.over.chain mapped_human_peaks.psl
pslMap -swapMap -chainMapFile mouse_peaks.psl mm9ToHg19.over.chain mapped_mouse_peaks.psl
pslMap -chainMapFile mouse_peaks.psl hg19ToMm9.over.chain mapped_mouse_peaks.psl
Hello David,
Thank you for your question about converting genome coordinates with the pslMap tool. You are quite right: the difference arises because the liftOver chain files are constructed to be single-coverage in the target, but not in the query. One of our engineers comments that we build liftOver chains de novo for each assembly, and they are not expected to be consistent with the chains for the reverse direction.
The construction of the liftOver chains is a bit convoluted. To build the hg19ToMm9 chains, we start by aligning mm9 sequences to hg19. This may result in a single mm9 sequence being aligned to multiple hg19 regions. It may also result in multiple mm9 sequences finding a match in the same hg19 region. We then filter for single coverage in the "target" (in this case, human). This leaves most of the first set intact (where a single mm9 sequence goes to multiple hg19 sequences), but filters the second set down to a single match for each hg19 region. You then have two choices of how to use the mapping.
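To make the filtering step concrete, here is a toy sketch in Python. It is only an illustration under simplifying assumptions, not how the chains are actually built: the names Block and single_cover_target are hypothetical, the coordinates and scores are made up, and the real chain/net pipeline is considerably more involved than "keep the best-scoring alignment per human region".

```python
from collections import namedtuple

# Hypothetical toy record: one alignment of a mouse (query) segment
# to a human (target) interval, with a made-up score.
Block = namedtuple("Block", "t_start t_end q_name q_start q_end score")

def single_cover_target(blocks):
    """Greedy toy filter: visit blocks from best to worst score and keep a
    block only if its target interval does not overlap one already kept."""
    kept = []
    for b in sorted(blocks, key=lambda blk: -blk.score):
        if all(b.t_end <= k.t_start or b.t_start >= k.t_end for k in kept):
            kept.append(b)
    return sorted(kept, key=lambda blk: blk.t_start)

blocks = [
    Block(0, 100, "mmA", 0, 100, 900),    # mmA aligns to human 0-100 ...
    Block(200, 300, "mmA", 0, 100, 800),  # ... and also to human 200-300
    Block(400, 500, "mmB", 0, 100, 700),  # mmB aligns to human 400-500 ...
    Block(400, 500, "mmC", 0, 100, 600),  # ... and so does mmC
]

kept = single_cover_target(blocks)
kept_names = [b.q_name for b in kept]          # ['mmA', 'mmA', 'mmB']
mouse_before = {b.q_name for b in blocks}      # mmA, mmB, mmC
mouse_after = {b.q_name for b in kept}         # mmA, mmB (mmC was filtered out)
```

In this toy case every human interval is still covered exactly once after filtering, mmA still points at two human regions, and mmC is no longer covered at all.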
The first option (-swapMap) uses this mapping in the hg19-to-mm9 direction. It maps more reads because the filtering didn't affect the amount of hg19 covered by the chains, but it sometimes results in collisions. The alternative is to use the mapping in the mm9-to-hg19 direction (no -swapMap). This maps fewer reads because the filtering did affect the amount of mm9 covered by the chains. It will also sometimes map one mm9 location to multiple locations in hg19. This means that your example pslMap command without the -swapMap option will place the same peak annotation in multiple regions of hg19. Only you can really answer which method makes more sense for your project.
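If you would like to see how often a peak ends up in more than one place, one rough way is to count repeated query names in the mapped PSL output. The sketch below assumes the standard 21-column PSL layout (qName in column 10, tName in column 14, tStart and tEnd in columns 16 and 17); the function names are hypothetical helpers, not part of any UCSC tool. Note that a repeated query name may also reflect a single peak that was split into fragments, so treat the result as a starting point for inspection.

```python
from collections import defaultdict

def placements_by_query(psl_lines):
    """Collect (tName, tStart, tEnd) for every PSL row, keyed by qName.
    Header and blank lines are skipped."""
    placements = defaultdict(list)
    for line in psl_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 21 or not fields[0].isdigit():
            continue  # not a PSL data row
        placements[fields[9]].append((fields[13], int(fields[15]), int(fields[16])))
    return placements

def multi_placed(psl_lines):
    """Return only the query names that appear in more than one output row."""
    return {q: locs for q, locs in placements_by_query(psl_lines).items()
            if len(locs) > 1}
```

For example, opening mapped_mouse_peaks.psl and passing the file object to multi_placed() would give a dictionary of the peaks that were placed more than once, along with their hg19 locations.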
I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu or genome...@soe.ucsc.edu. Questions sent to those addresses will be archived in publicly-accessible forums for the benefit of other users. If your question contains sensitive data, you may send it instead to genom...@soe.ucsc.edu.
--
Jonathan Casper
UCSC Genome Bioinformatics Group
--