Dear Pooja,
Thank you for using the UCSC Genome Browser and for your question about the Methylation 450K Bead Array from ENCODE/HAIB track and Illumina's annotation file.
To begin with, the information for this particular track is incorrect on our public server, and the updated information is currently available only on our test server:
http://genome-test.soe.ucsc.edu/cgi-bin/hgTrackUi?g=wgEncodeHaibMethyl450
In general, you should look to the public server and not the test server, unless specifically advised to do so.
The original mappings for this track are being revised. The initial release had incorrect probe mappings, and the data providers also requested that, in the new version, each annotation be limited to the first nucleotide in the probe. As a result, the genome-test site displays only the single CpG assayed, whereas the public site (which will be updated soon) still displays the full probe, sometimes incorrectly for reverse probes. If you have further questions about the track, you can look under the Credits section of the track description page (link above) and write to the listed contact, Dr. Florencia Pauli.
We also have a CpG island track, under the blue bar "Regulation", adjacent to the "ENC DNA Methyl..." super track link. In case you want to show this track in relation to the Methyl 450 track, here is the test site link:
http://genome-test.soe.ucsc.edu/cgi-bin/hgTrackUi?g=cpgIslandExt
Please read the track description.
I suggest you contact Illumina about the annotation information provided in the supplemental file you referenced. It appears that many of the questions you are asking are answered by the annotation Illumina created for their Infinium HumanMethylation450 BeadChip Kit. Their annotation includes hg19 (build 37) reference coordinates. I performed a liftOver of the provided hg18 (build 36) coordinates for one item, below, and the result matches their provided hg19 coordinates. I also BLATed the provided SourceSeq onto the hg19 assembly, and it aligns to the same location.
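If you would like to repeat this kind of check yourself, here is a rough Python sketch of one way to do it with the command-line liftOver tool. It is only an illustration under some assumptions: that the liftOver binary is on your PATH, that you have downloaded the hg18ToHg19.over.chain.gz chain file into the working directory, and that Coordinate_36 is a 1-based position. The file names probe_hg18.bed, probe_hg19.bed and unmapped.bed are placeholders I made up.

```python
import shutil
import subprocess
from pathlib import Path


def bed_line(chrom, pos_1based, name):
    """Build one BED line (0-based start, exclusive end) for a 1-based position."""
    return f"{chrom}\t{pos_1based - 1}\t{pos_1based}\t{name}\n"


# Chromosome_36 / Coordinate_36 for cg00381604, taken from the Illumina annotation.
hg18_line = bed_line("chr1", 19298, "cg00381604")

# Assumed local chain file; placeholder input/output file names.
chain = "hg18ToHg19.over.chain.gz"
# Argument order of the tool: liftOver oldFile map.chain newFile unMapped
cmd = ["liftOver", "probe_hg18.bed", chain, "probe_hg19.bed", "unmapped.bed"]

if shutil.which("liftOver") and Path(chain).exists():
    Path("probe_hg18.bed").write_text(hg18_line)
    subprocess.run(cmd, check=True)
    print(Path("probe_hg19.bed").read_text(), end="")
else:
    # Nothing is run; just show what would have been executed.
    print("liftOver or chain file not found; the command would be:", " ".join(cmd))
```

If the lift agrees with Illumina's hg19 MAPINFO value for this item, the lifted line should end at 29435.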
The item in question is cg00381604, which we display at chr1:29435 from table wgEncodeHaibMethyl450Gm12878SitesRep1. Below is our entry:
bin chrom chromStart chromEnd name score strand thickStart thickEnd itemRgb
585 chr1 29434 29435 cg00381604 55 + 29434 29435 0,0,205
Below is the Illumina annotation, likely created from multiple MySQL queries and other specific methods by Illumina:
IlmnID,Name,AddressA_ID,AlleleA_ProbeSeq,AddressB_ID,AlleleB_ProbeSeq,Infinium_Design_Type,Next_Base,Color_Channel,Forward_Sequence,Genome_Build,CHR,MAPINFO,SourceSeq,Chromosome_36,Coordinate_36,Strand,Probe_SNPs,Probe_SNPs_10,Random_Loci,Methyl27_Loci,UCSC_RefGene_Name,UCSC_RefGene_Accession,UCSC_RefGene_Group,UCSC_CpG_Islands_Name,Relation_to_UCSC_CpG_Island,Phantom,DMR,Enhancer,HMM_Island,Regulatory_Feature_Name,Regulatory_Feature_Group,DHS
cg00381604,cg00381604,26752380,AAATCAACAAAATCCTAAAACCACACTCAAAAAAAACACAATAAAAAACA,50693408,GAATCGACGAAATCCTAAAACCGCGCTCGAAAAAAACGCAATAAAAAACG,I,A,Red,CTGGGTCCTAGCCCCGCCGCCCCCAGTCCGCCCGCGCCTCCGGGTCCTAACGCCGCCGCT[CG]CCCTCCACTGCGCCCTCCCCGAGCGCGGCTCCAGGACCCCGTCGACCCGGAGCGCTGTCC,37,1,29435,CGCCCTCCACTGCGCCCTCCCCGAGCGCGGCTCCAGGACCCCGTCGACCC,1,19298,F,,rs2462493,,,WASH5P,NR_024540,TSS200,chr1:28735-29810,Island,,,,1:18599-19663,,,
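To make the comparison between the two records concrete, here is a small Python sketch. Our tables store chromStart as 0-based and chromEnd as exclusive, so the entry 29434-29435 is the single base chr1:29435. The sketch assumes that Illumina's MAPINFO is a 1-based position; the helper name bed_to_one_based is made up for illustration.

```python
def bed_to_one_based(chrom_start, chrom_end):
    """Convert a 0-based, half-open interval to 1-based inclusive (start, end)."""
    return chrom_start + 1, chrom_end


# Our table entry, copied from above (first column is the bin).
ucsc_row = "585 chr1 29434 29435 cg00381604 55 + 29434 29435 0,0,205".split()
bed = {
    "chrom": ucsc_row[1],
    "chromStart": int(ucsc_row[2]),
    "chromEnd": int(ucsc_row[3]),
    "name": ucsc_row[4],
}

# The relevant fields of the Illumina record, copied from above.
# Assumption: MAPINFO is a 1-based coordinate.
illumina = {"IlmnID": "cg00381604", "CHR": "1", "MAPINFO": 29435}

start, end = bed_to_one_based(bed["chromStart"], bed["chromEnd"])
same_site = (
    bed["name"] == illumina["IlmnID"]
    and bed["chrom"] == "chr" + illumina["CHR"]
    and start == end == illumina["MAPINFO"]
)
print(bed["name"], f"{bed['chrom']}:{start}-{end}", "matches MAPINFO:", same_site)
```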
If you do not use direct MySQL queries, the Table Browser method will in essence involve generating custom tracks of your regions of interest. With two tracks, A and B, a Table Browser intersection will produce output with all lines of file A that overlap file B. Should you instead want to merge the two tracks/tables, send both over to Galaxy (use the Table Browser and check the output option "Galaxy") and perform an interval intersection there. This will produce output with all lines of file A along with the lines from file B that overlap them. The Table Browser will time out with a large number of items, so you would likely need to break your custom tracks into smaller, chromosome-sized pieces.
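If it helps to see the difference between the two kinds of output, here is a minimal Python sketch of the idea. It is not how the Table Browser or Galaxy implement it; the names Interval, intersect and join_overlapping are made up, the second probe is a hypothetical item added only so that something gets filtered out, and the island chr1:28735-29810 from the Illumina record is assumed to be a 1-based range (so its 0-based start is 28734).

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based
    end: int    # exclusive
    name: str


def overlaps(a, b):
    """True if two half-open intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def intersect(a_items, b_items):
    """Lines of A that overlap anything in B (only A's lines are kept)."""
    return [a for a in a_items if any(overlaps(a, b) for b in b_items)]


def join_overlapping(a_items, b_items):
    """Each line of A paired with every line of B that overlaps it."""
    return [(a, b) for a in a_items for b in b_items if overlaps(a, b)]


probes = [
    Interval("chr1", 29434, 29435, "cg00381604"),
    Interval("chr1", 50000, 50001, "hypothetical_probe"),  # made-up item
]
islands = [Interval("chr1", 28734, 29810, "chr1:28735-29810")]

kept = intersect(probes, islands)
pairs = join_overlapping(probes, islands)

for probe, island in pairs:
    print(probe.name, "lies in", island.name)
```

Comparing every pair like this is fine for a handful of items but gets slow quickly, which is one more reason to work one chromosome at a time with large tracks.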
Thank you again for your inquiry and for using the UCSC Genome Browser. If you have further questions, please feel free to contact the mailing list again at
gen...@soe.ucsc.edu.
All the best,
Brian Lee
UCSC Genome Bioinformatics Group