mapping kgp ids to rs ids


Xingbin

Jan 4, 2014, 9:42:31 PM
to st...@soe.ucsc.edu, gen...@soe.ucsc.edu
Dear Dr. Heitner,

I am trying to map kgp IDs to rs IDs, following the instructions
you provided at
http://redmine.soe.ucsc.edu/forum/index.php?t=msg&goto=13224&S=29fba0172e0e677085609b8e1da196cb.
I have a total of 1,608,576 kgp IDs, so I would have to submit at least
1,609 times because of the limit of 1,000 regions per submission. Is
there an alternative method for mapping kgp IDs to rs numbers?

Thanks!

--
===================================================
Xingbin Wang

Research Assistant Professor
Department of Human Genetics & Biostatistics
University of Pittsburgh
https://sites.google.com/site/xingbinwang
===================================================

Pauline Fujita

Jan 10, 2014, 5:48:01 PM
to UCSC Genome Browser discussion list
Hello Xingbin,

We think Galaxy is probably the best fit for your needs. You can find
the homepage for Galaxy here:

https://usegalaxy.org/

We also have a tool that may be of use to you called the Variant
Annotation Integrator (hgVai):

http://genome.ucsc.edu/cgi-bin/hgVai

hgVai can match up rsIDs with submitted variants, but it has a hard
upper limit of 100,000 variants per submission, and if too many
additional outputs are selected then the practical limit can be 10,000
(or the query may time out). With 1,608,576 variants, you would have to
break up your input into 17 chunks of 100,000 or fewer variants each.
You would need to convert your kgp data to pgSnp format, upload the 17
pgSnp files as custom tracks, and then run hgVai with max variants =
100,000 (and probably with the protein-coding change scores that are
enabled by default turned off) on each of the 17 custom tracks. 17
manual operations is better than 1,609, and you might get some
additional useful information from hgVai, but if you really only want
to associate rs IDs, Galaxy is an easier choice.
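
For the hgVai route, the chunking step could be scripted along these
lines. This is only a rough sketch, not an official UCSC tool. It
assumes each variant is already available as a (chrom, 1-based
position, allele1, allele2) tuple, which is a hypothetical input layout
since the format of the kgp data is not described in this thread. The
function names and the output file prefix are also made up for
illustration, and the 0 placeholders in the frequency and score columns
are an assumption for data where those values are unknown.

```python
import math
from pathlib import Path
from typing import Iterable, Iterator, List, Tuple

# Hard per-submission limit for hgVai, as stated in this thread.
MAX_VARIANTS = 100_000

# Hypothetical input record: (chrom, 1-based position, allele1, allele2).
Variant = Tuple[str, int, str, str]


def to_pgsnp(variant: Variant) -> str:
    """Format one single-base variant as a tab-separated pgSnp line."""
    chrom, pos, a1, a2 = variant
    # pgSnp coordinates are 0-based, half-open: a SNP at 1-based
    # position p spans [p - 1, p).
    alleles = [a1] if a1 == a2 else [a1, a2]
    # Assumption: frequencies and scores are unknown, so fill with 0.
    zeros = ",".join("0" for _ in alleles)
    fields = [chrom, str(pos - 1), str(pos), "/".join(alleles),
              str(len(alleles)), zeros, zeros]
    return "\t".join(fields)


def n_chunks(total: int, size: int = MAX_VARIANTS) -> int:
    """Number of submissions needed for `total` variants."""
    return math.ceil(total / size)


def chunked(items: Iterable, size: int = MAX_VARIANTS) -> Iterator[List]:
    """Yield successive lists of at most `size` items."""
    batch: List = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # final, possibly shorter, chunk
        yield batch


def write_chunks(variants: Iterable[Variant], out_dir: str,
                 prefix: str = "kgp_chunk",
                 size: int = MAX_VARIANTS) -> List[Path]:
    """Write one pgSnp file per chunk and return the file paths."""
    paths = []
    for i, batch in enumerate(chunked(variants, size), start=1):
        path = Path(out_dir) / f"{prefix}_{i:02d}.pgSnp"
        with open(path, "w") as handle:
            for variant in batch:
                handle.write(to_pgsnp(variant) + "\n")
        paths.append(path)
    return paths
```

Each resulting file would then be uploaded as its own custom track
before running hgVai on it. Calling n_chunks with the total number of
variants gives the number of uploads to expect.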

If you have any further questions, please reply to
gen...@soe.ucsc.edu. All messages sent to that address are archived on
a publicly-accessible Google Groups forum. If your question includes
sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Best regards,

Pauline Fujita
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu