Understanding the swapMap option


David Garfield

May 23, 2014, 1:12:37 PM
to gen...@soe.ucsc.edu, Pierre Khoueiry
I've hunted about in the archives, but not found a great explanation, so here goes.
Typical calls to pslMap look like the following. 

1) For translating human peaks to mouse, use the human-to-mouse chain 
pslMap -swapMap -chainMapFile human_peaks.psl hg19ToMm9.over.chain mapped_human_peaks.psl

2) For translating mouse peaks to human, use the mouse-to-human chain

pslMap -swapMap -chainMapFile mouse_peaks.psl mm9ToHg19.over.chain mapped_mouse_peaks.psl

In both cases, the -swapMap option is used so that the psl and the chain file have a common target (right?)

But the question here is, why -swapMap? It seems one could just as well use a call like:

pslMap -chainMapFile mouse_peaks.psl hg19ToMm9.over.chain mapped_mouse_peaks.psl

and skip the -swapMap option entirely.
Both approaches seem to work to a large degree. The first (with -swapMap) maps more regions, but sometimes maps multiple peaks to the same location in the other genome.
The second (without -swapMap) maps fewer peaks, but the mapped peaks never overlap.

Is this due to the construction of the chain files, such that only the target is single coverage? And, if so, is there a principled reason for not favouring option 2 (the call without -swapMap)?

Thanks,

David


-------------------------------------------------------------------------------------
David Garfield, PhD
Furlong Group
European Molecular Biology Laboratory (EMBL)

Telephone    +49 6221 387 8426
Fax                 +49 6221 387 166
Snail Meyerhofstraße 1
D-69012 Heidelberg
Germany





Jonathan Casper

May 27, 2014, 8:03:11 PM
to David Garfield, gen...@soe.ucsc.edu, Pierre Khoueiry

Hello David,

Thank you for your question about converting genome coordinates with the pslMap tool. You are quite right - the difference is because the liftOver chain files are constructed to be single-coverage in the target, but not in the query. One of our engineers comments that we build liftOver chains de novo on each assembly, and they are not expected to be consistent with the chains for the mirrored direction.

The construction of the liftOver chains is a bit convoluted. To build the hg19ToMm9 chains, we start by aligning mm9 sequences to hg19. This may result in a single mm9 sequence being aligned to multiple hg19 regions. It may also result in multiple mm9 sequences finding a match in the same hg19 region. We then filter for single coverage in the "target" (in this case, human). This leaves most of the first set intact (where a single mm9 sequence goes to multiple hg19 sequences), but filters the second set down to a single match for each hg19 region. You then have two choices of how to use the mapping.
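
As a rough illustration of the filtering step described above, here is a toy sketch in Python. It is not how the chain-building pipeline is implemented: the region names and scores are made up, and keeping the highest-scoring alignment per target region is an assumption chosen only to make "single coverage in the target" concrete.

```python
# Toy model only: region names and scores are hypothetical, and real chains
# are built from base-level alignments rather than whole-region labels.
raw_alignments = [
    # (hg19 target region, mm9 query region, score)
    ("hg19_A", "mm9_X", 90),
    ("hg19_B", "mm9_X", 80),  # one mm9 sequence aligned to two hg19 regions
    ("hg19_C", "mm9_Y", 70),
    ("hg19_C", "mm9_Z", 60),  # two mm9 sequences competing for one hg19 region
]


def filter_single_coverage_in_target(alignments):
    """Keep one alignment per target region (assumed here: the best-scoring one)."""
    best = {}
    for target, query, score in alignments:
        if target not in best or score > best[target][1]:
            best[target] = (query, score)
    return [(target, query) for target, (query, _score) in best.items()]


filtered = filter_single_coverage_in_target(raw_alignments)
print(filtered)
```

In this toy case mm9_X keeps both of its hg19 partners, while mm9_Z drops out entirely because hg19_C was given to mm9_Y.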

The first option (-swapMap) uses this mapping in the hg19-to-mm9 direction. It maps more reads because the filtering didn't affect the amount of hg19 covered by the chains, but sometimes results in collisions. The alternative is to use the mapping in the mm9-to-hg19 direction (no -swapMap). This maps fewer reads because the filtering did affect the amount of mm9 covered by the chains. It will also sometimes map one mm9 location to multiple locations in hg19. This means that your example pslMap command without the -swapMap option will place the same peak annotation in multiple regions of hg19. Only you can really answer which method makes more sense for your project.
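
The two directions can be compared on a toy mapping. The sketch below is only a model of the behaviour described above, not of pslMap itself: the region names are hypothetical, and the dictionary stands in for a chain file that is already single-coverage in hg19.

```python
from collections import defaultdict

# Toy model only: region names are made up. Each hg19 region has exactly one
# mm9 partner, mimicking a chain file that is single-coverage in hg19.
HG19_TO_MM9 = {"hg19_A": "mm9_X", "hg19_B": "mm9_X", "hg19_C": "mm9_Y"}
# mm9_Z is assumed to have lost its only hg19 match during filtering.
ALL_MM9 = {"mm9_X", "mm9_Y", "mm9_Z"}


def invert(mapping):
    """Turn target -> query into query -> sorted list of targets."""
    inverted = defaultdict(list)
    for target, query in mapping.items():
        inverted[query].append(target)
    return {query: sorted(targets) for query, targets in inverted.items()}


mm9_to_hg19 = invert(HG19_TO_MM9)

# hg19 -> mm9 direction: every hg19 region maps, but two of them
# collide on the same mm9 region.
collisions = {q: t for q, t in mm9_to_hg19.items() if len(t) > 1}

# mm9 -> hg19 direction: mm9_Z cannot be mapped at all, and mm9_X
# is placed in two hg19 regions.
unmapped_mm9 = ALL_MM9 - set(mm9_to_hg19)

print(collisions)
print(unmapped_mm9)
```

Here the hg19-to-mm9 direction maps all three hg19 regions but sends two of them to mm9_X, while the mm9-to-hg19 direction maps only two of the three mm9 regions and places mm9_X in two hg19 locations.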

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu or genome...@soe.ucsc.edu. Questions sent to those addresses will be archived in publicly-accessible forums for the benefit of other users. If your question contains sensitive data, you may send it instead to genom...@soe.ucsc.edu.

--
Jonathan Casper
UCSC Genome Bioinformatics Group


