Liftover: question about orders for input and output files


WANG, YAN

Sep 7, 2016, 2:04:21 PM
to gen...@soe.ucsc.edu, Elfaramawi, Mohammed

Good afternoon,

 

I am a LiftOver user from the University of Arkansas for Medical Sciences. I have a question about the order of SNPs in the input and output files. Is the order the same in both files? For example, my uploaded data is on hg17, and I want the new assembly to be hg19. The SNPs in my uploaded file are in the order “1, 2, 3, 4, … , 998, 999, 1000”. Will the SNPs in the output file be in exactly the same order? I am looking forward to hearing from you. Thanks!



Matthew Speir

Sep 9, 2016, 1:04:15 PM
to WANG, YAN, gen...@soe.ucsc.edu, Elfaramawi, Mohammed
Hi Yan,

Thank you for your question about the LiftOver tool in the UCSC Genome Browser.

Based on a small experiment, it appears that the order of regions in the input is maintained in the output. For example, I used the web-based LiftOver tool to convert the following regions from hg38 to hg19:

    chrX    151383000    151390000    region1
    chrX    151183000    151190000    region2
    chrX    151073054    151173000    region3
    chrX    151283000    151290000    region4
    chr1    100000000    100005000    region5

And the order of these regions was maintained in the output:

    chrX    150551472    150558472    region1
    chrX    150351472    150358472    region2
    chrX    150241526    150341472    region3
    chrX    150451472    150458472    region4
    chr1    100465556    100470556    region5

This little experiment was limited in scope and may not reflect what happens with a large number of regions or with regions that map to multiple locations. I would highly recommend that you give each region a unique name in the input so that input regions can be definitively matched with output regions.
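As a rough illustration of matching by name, here is a minimal Python sketch. It assumes both files are BED-style with the unique name in the fourth column, as in the example above. The function names `read_bed_names` and `compare_order` are made up for this example and are not part of LiftOver.

```python
def read_bed_names(lines):
    """Return the name column (4th field) of each BED line, in file order."""
    names = []
    for line in lines:
        fields = line.split()  # splits on tabs or runs of spaces
        if len(fields) >= 4:
            names.append(fields[3])
    return names


def compare_order(input_lines, output_lines):
    """Check whether regions present in the output keep their input order.

    Returns (same_order, dropped), where dropped lists input names that
    are missing from the output (for example, regions that failed to convert).
    """
    in_names = read_bed_names(input_lines)
    out_names = read_bed_names(output_lines)
    out_set = set(out_names)
    dropped = [n for n in in_names if n not in out_set]
    surviving = [n for n in in_names if n in out_set]
    # A name that appears more than once in the output (a region mapped to
    # several locations) makes this comparison False, which flags the file
    # for a closer look.
    same_order = surviving == out_names
    return same_order, dropped


if __name__ == "__main__":
    # The regions from the experiment above.
    hg38 = [
        "chrX 151383000 151390000 region1",
        "chrX 151183000 151190000 region2",
        "chrX 151073054 151173000 region3",
        "chrX 151283000 151290000 region4",
        "chr1 100000000 100005000 region5",
    ]
    hg19 = [
        "chrX 150551472 150558472 region1",
        "chrX 150351472 150358472 region2",
        "chrX 150241526 150341472 region3",
        "chrX 150451472 150458472 region4",
        "chr1 100465556 100470556 region5",
    ]
    print(compare_order(hg38, hg19))
```

To run this on real files, pass open file handles instead of lists, since both are iterated line by line.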

Additionally, if your SNPs have rsIDs associated with them, you may be able to use the snp147.txt.gz files we have on our downloads server and some UNIX commands like "grep" to extract the regions for these SNPs from the most recent SNP build available in the UCSC Genome Browser. You can find the snp147.txt.gz here: http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/snp147.txt.gz.
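If you prefer a script to "grep", the lookup could be sketched in Python as follows. It assumes the file is tab-separated with the chromosome, start, end, and rsID in the second through fifth columns (after a leading bin column); please check this against the table schema before relying on it. The function names `extract_snps` and `extract_from_gzip` are made up for this example.

```python
import gzip


def extract_snps(lines, wanted_ids):
    """Collect (chrom, start, end) for each wanted rsID.

    Assumes tab-separated rows laid out as:
    bin, chrom, chromStart, chromEnd, name, ...
    Returns a dict mapping rsID -> list of positions, because one rsID
    can appear on more than one row.
    """
    wanted = set(wanted_ids)
    found = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 4 and fields[4] in wanted:
            position = (fields[1], int(fields[2]), int(fields[3]))
            found.setdefault(fields[4], []).append(position)
    return found


def extract_from_gzip(path, wanted_ids):
    """Run extract_snps over a gzip-compressed table such as snp147.txt.gz."""
    with gzip.open(path, "rt") as handle:
        return extract_snps(handle, wanted_ids)
```

Matching on the exact value of the name column avoids a pitfall of a plain "grep": searching for one rsID can also match longer rsIDs that begin with the same digits.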

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Matthew Speir
UCSC Genome Bioinformatics Group