Thank you for your questions about phased genotypes for certain individuals. You can import other data into the UCSC Genome Browser in the form of custom tracks or track hubs. For more information about these two options, please see the following help pages:
Custom Tracks: https://genome.ucsc.edu/goldenPath/help/customTrack.html
Track Hubs: https://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html
In the case of getting the 1000 Genomes data into the Genome Browser, we already have the 1000 Genomes Phase 3 data available as a native track in the Variation Group. Here is a session (https://genome.ucsc.edu/goldenPath/help/hgSessionHelp.html) of some variants from this dataset on chromosome 22:
https://genome.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=chmalee&hgS_otherUserSessionName=hg19_1000GenomesNativeVsCustomTrack
This session displays data from a VCF custom track of just the chr22 variants (labeled as ALL.chr22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz), and data from the native version (labeled as 1000 Genomes Phase 3 Integrated Variant Calls: SNVs, Indels, SVs). In either case, the data is the same, and I just wanted to show that you can load a VCF file as a custom track if we don't have the data available natively. If you click on any of the variants shown, you will be directed to a details page that displays a multitude of information about the particular variant you clicked on (all from the corresponding VCF file). If you click the plus next to "Detailed genotypes" to expand the section, you can see the phasing information for all the individuals you are interested in.
You can extract the information from this VCF file using the Table Browser. For instance, say you are interested in finding the phased genotypes of NA21144 and NA20911 for the rs1799967 and rs57205909 variants. What you could do is get the positions of those two variants and then grab all 1000 Genomes data corresponding to those two individuals. Please note that this is effectively the same approach used previously with the Data Integrator except substituting the 1000 Genomes dataset for the Genome Variants data and the Table Browser for Data Integrator:
1. Navigate to the Table Browser: http://genome.ucsc.edu/cgi-bin/hgTables
2. Make the following selections:
clade: Mammal
genome: Human
assembly: Feb 2009 (GRCh37/hg19)
group: Variation
track: Common SNPs(150), or any dbSNP track of interest
table: snp150Common
3. Next to "identifiers", click "paste list", enter rs1799967 and rs57205909, and click submit.
4. Next to output format, make sure "selected fields from primary and related tables" is selected, and click "get output".
5. On the resulting page check the boxes for chrom, chromStart, and chromEnd and click "get output".
6. Copy the resulting coordinates, and then head back to the Table Browser.
7. This time instead of selecting the Common SNPs track, select the 1000G Ph3 Vars track.
8. Click the "define regions" button next to the position search, and paste the coordinates from step 6.
9. Choose "all fields from selected table" from the output format dropdown and then click "get output".
On the resulting page you will see the VCF header, followed by the VCF lines corresponding to your variants of interest. The last ~2,500 columns on each line (one per individual, following the nine fixed VCF columns) contain the phased genotypes, and the individuals are named in the header line that begins with "#CHROM POS ID...". Find the column number of your individual of interest in that header line, then read the genotype from the same column of each variant line. A value such as "0|1" means the first haplotype carries the reference allele and the second carries the first alternate allele, with "|" marking the genotype as phased. This way you can obtain phasing information for multiple individuals in one go.
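Counting columns by hand is error-prone with this many individuals, so here is a minimal sketch of how the column lookup could be scripted. It assumes a tab-separated VCF on standard input and a POSIX awk; sample_columns is a hypothetical helper name, not a UCSC tool.

```shell
# Print "sampleID<TAB>columnNumber" for each requested sample, using the
# "#CHROM" header line of a VCF read from standard input.
# sample_columns is a hypothetical helper name chosen for this sketch.
sample_columns() {
    awk -F'\t' -v wanted="$1" '
        BEGIN {
            n = split(wanted, ids, ",")
            for (i = 1; i <= n; i++) want[ids[i]] = 1
        }
        /^#CHROM/ {
            for (col = 10; col <= NF; col++)   # sample columns start at column 10
                if (($col) in want) print $col "\t" col
            exit                               # nothing else is needed after the header
        }'
}

# Usage (NameOfVcf.gz is a placeholder file name):
#   gzip -dc NameOfVcf.gz | sample_columns NA21144,NA20911
```

The numbers it prints can be passed straight to "cut -f" along with column 3 (the variant ID).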
An easier way of obtaining this information is to download the VCF data that you are interested in, and then make a file of rsIDs, and use the following commands to extract the genotypes of particular individuals:
$ zgrep -Fwf rsIds.txt NameOfVcf.gz > rsIds.vcf
$ cut -f 3,2463,2513 rsIds.vcf > genotypes.txt
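The cut command relies on fixed column numbers, and the grep step drops the "#CHROM" header, so the output columns are unlabeled. As a rough alternative sketch (not an official UCSC recipe), the same extraction could be done by sample name with awk. The function name extract_genotypes is hypothetical, the rsID file is assumed to be non-empty with one rsID per line, and only the ID column is matched, whereas "zgrep -w" matches the rsID anywhere on the line.

```shell
# extract_genotypes RSID_FILE SAMPLE[,SAMPLE...]   (uncompressed VCF on stdin)
# Prints a header row, then one row per matching variant: ID plus the
# genotype of each requested sample, in the order the samples were given.
# extract_genotypes is a hypothetical helper name chosen for this sketch.
extract_genotypes() {
    awk -F'\t' -v OFS='\t' -v wanted="$2" '
        NR == FNR { if ($1 != "") rs[$1] = 1; next }   # first file: one rsID per line
        /^##/ { next }                                  # skip VCF meta lines
        /^#CHROM/ {
            n = split(wanted, ids, ",")
            for (col = 10; col <= NF; col++) colOf[$col] = col
            line = "ID"
            for (i = 1; i <= n; i++) {
                if (!(ids[i] in colOf)) {
                    print "sample not found: " ids[i] > "/dev/stderr"
                    exit 1
                }
                line = line OFS ids[i]
            }
            print line
            next
        }
        {
            m = split($3, vids, ";"); hit = 0          # ID column may list several IDs
            for (j = 1; j <= m; j++) if (vids[j] in rs) hit = 1
            if (!hit) next
            line = $3
            for (i = 1; i <= n; i++) line = line OFS $(colOf[ids[i]])
            print line
        }' "$1" -
}

# Usage (file names are placeholders):
#   gzip -dc NameOfVcf.gz | extract_genotypes rsIds.txt NA21144,NA20911 > genotypes.txt
```

Because the sample columns are looked up from the header, this keeps working if you switch to a VCF with a different set or order of individuals.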
Unfortunately, there is currently no way to do genotype imputation with the Genome Browser, although we have noted this as a feature request and will be sure to let you know if the feature gets added. Could you provide an example of an rsID that was not available at UCSC that was available at Ensembl? I'm not sure I understand what you mean by this, or what you were trying to accomplish.
As to the blank genotype results from the Data Integrator, that only means that there were no items in the secondary tables intersecting the first, or if there were, the fields you selected had no values.
I hope I answered all of your questions. Please let me know if there was anything I missed or if you need further clarification.
Thanks,
Christopher Lee
UCSC Genomics Institute
Want to share the Browser with colleagues?
Host a workshop: http://bit.ly/ucscTraining
Thank you again for your inquiry and using the UCSC Genome Browser. If
you have any further questions, please reply to gen...@soe.ucsc.edu.
All messages sent to that address are archived on a
publicly-accessible forum. If your question includes sensitive data,
you may send it instead to genom...@soe.ucsc.edu.