I am trying to put together a dataset of hg38 human genome SNP data (CDS only) that contains information on:
Hi Viv,
Thank you for following up on your original BioStars question and
for asking about obtaining hg38 gene and SNP info. My apologies for
taking so long to get back to you. Unfortunately, there is no easy
way to get the info you want. The Variant Annotation Integrator (VAI)
can produce all of the information you want, but it is limited to
100,000 variants at a time, which means not even all of chromosome 1
can be annotated in a single run:
http://genome.ucsc.edu/cgi-bin/hgVai
If you were to go with the VAI approach, the best way to get this info is to load one of the VCFs from NCBI:
https://www.ncbi.nlm.nih.gov/variation/docs/human_variation_vcf/
as a custom track, then navigate to the VAI and limit your output to one chromosome (or sections of a chromosome) at a time. However, this approach is time-consuming and not really the best option.
You can also run queries against our public MySQL server, but in my
experience while investigating this question, a genome-wide query will
time out. You could, however, put the query in a bash loop over the
chromosome names, which will work but may still time out depending on
how far away you are from our MySQL server:
for chr in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X Y M; do
    mysql --host=genome-mysql.soe.ucsc.edu --user=genome -A -Ne "select a.chrom, \
        a.chromStart, a.chromEnd, a.name, a.transcript, a.alleles, a.codons, a.peptides, \
        b.refNCBI, b.observed, b.func, b.alleleFreqs, refGene.name2 from snp147CodingDbSnp a \
        join snp147 b on b.name=a.name join refGene on a.transcript=refGene.name where \
        a.chrom='chr${chr}'" hg38 >> hg38.snpData
done
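Since an individual chromosome query may still time out, here is a minimal sketch of one way to make the loop above more tolerant of failures: each chromosome is written to its own file, and a failed query is retried a few times. The retry count, the sleep interval, the per-chromosome file names, and the function names build_query, fetch_chrom, and main are illustrative assumptions, not part of any UCSC tooling. The SQL is the same query as in the loop above.

```shell
#!/usr/bin/env bash
# Sketch only: the same query as the loop above, one output file per
# chromosome, with retries. MAX_TRIES, RETRY_SLEEP, and the file names
# are assumptions chosen for illustration.

MAX_TRIES=3      # attempts per chromosome (assumed value)
RETRY_SLEEP=10   # seconds to wait between attempts (assumed value)

# Print the SQL for one chromosome name such as "1", "X", or "M".
build_query() {
  local chr="$1"
  cat <<EOF
select a.chrom, a.chromStart, a.chromEnd, a.name, a.transcript, a.alleles,
       a.codons, a.peptides, b.refNCBI, b.observed, b.func, b.alleleFreqs,
       refGene.name2
from snp147CodingDbSnp a
join snp147 b on b.name = a.name
join refGene on a.transcript = refGene.name
where a.chrom = 'chr${chr}'
EOF
}

# Run the query for one chromosome, retrying if mysql exits non-zero.
# Output goes to a temporary file first so a failed attempt never
# leaves a partial result behind.
fetch_chrom() {
  local chr="$1" out="hg38.snpData.chr$1" try
  for try in $(seq 1 "$MAX_TRIES"); do
    if mysql --host=genome-mysql.soe.ucsc.edu --user=genome -A -N \
         -e "$(build_query "$chr")" hg38 > "$out.tmp"; then
      mv "$out.tmp" "$out"
      return 0
    fi
    echo "chr${chr}: attempt ${try} failed" >&2
    sleep "$RETRY_SLEEP"
  done
  rm -f "$out.tmp"
  return 1
}

# Loop over all chromosomes; report any that never succeeded.
main() {
  local chr
  for chr in {1..22} X Y M; do
    fetch_chrom "$chr" || echo "chr${chr}: giving up" >&2
  done
}
```

After sourcing or pasting these definitions, running main fetches every chromosome, and rerunning fetch_chrom for a single name (for example, fetch_chrom 7) repeats only the one that failed. The per-chromosome files can then be concatenated with cat into a single hg38.snpData file.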
I hope this is helpful. Please let us know if you have any other questions.
Thank you again for your inquiry and for using the UCSC Genome Browser. If
you have any further questions, please reply to gen...@soe.ucsc.edu.
All messages sent to that address are archived on a
publicly-accessible forum. If your question includes sensitive data,
you may send it instead to genom...@soe.ucsc.edu.
Christopher Lee
UCSC Genomics Institute