Methylation 450k CpG sites information on genome build 37 / hg19


pooja mandaviya

Jun 27, 2013, 10:22:42 AM
to gen...@soe.ucsc.edu
Dear UCSC Team,

I am currently analysing 450k methylation data. I have a question about the 450k annotation provided by Illumina, which can be downloaded from http://hgdownload-test.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeHaibMethyl450/supplemental/.

However, since the annotation file from Illumina is based on genome build 36, I am interested in CpG site annotations for UCSC genome build 37.

Can you tell me whether there is a way to get the following annotations for the CpG sites on the UCSC build 37 genome?
- Position relative to the gene: whether the site is in a promoter, within the gene body, in the 5' UTR, etc.
- Location in the genome: CpG island, shore or shelf.
- Which gene is associated with each CpG site.

Is any of the above annotation available for the 450k CpGs as a downloadable data file? If not, can you suggest how I could look it up?

Thanks in advance. 
 

Pooja
 

Brian Lee

Jun 27, 2013, 5:04:46 PM
to pooja mandaviya, gen...@soe.ucsc.edu
Dear Pooja,

Thank you for using the UCSC Genome Browser and for your question about the Methylation 450K Bead Array from ENCODE/HAIB track and Illumina's annotation file.

To begin with, the information for this particular track is currently incorrect on our public server, and the updated information is available only on our test server: http://genome-test.soe.ucsc.edu/cgi-bin/hgTrackUi?g=wgEncodeHaibMethyl450. In general, you should use the public server rather than the test server unless specifically advised otherwise. The track's original mappings are being revised: the initial release had incorrect probe mappings, and the data providers also requested that, in the new version, these annotations be limited to the first nucleotide in the probe. As a result, the genome-test site displays the single CpG assayed, whereas the public site (which will be updated soon) still displays the full probe, sometimes incorrectly for reverse probes. If you have further questions about the track, see the Credits section of the track description page (link above) and write to the listed contact, Dr. Florencia Pauli.

We also have a CpG island track, under the blue bar "Regulation", adjacent to the "ENC DNA Methyl..." super track link. In case you want to view this track alongside the Methyl 450 track, here is the test site link: http://genome-test.soe.ucsc.edu/cgi-bin/hgTrackUi?g=cpgIslandExt. Please read the track description.

I suggest you contact Illumina about the annotation information provided in the supplemental file you referenced. It appears that much of the information you are asking about is already provided in what Illumina created for their Infinium HumanMethylation450 BeadChip Kit. If you look at their annotation information, it includes hg19 (build 37) reference coordinates. I performed a liftOver of the provided hg18 (build 36) coordinates for one item, below, and the result matches their provided hg19 coordinates. I also BLATed the provided SourceSeq onto the hg19 assembly, and it maps to the same location.

For example, we display the item cg00381604 at chr1:29435 from table wgEncodeHaibMethyl450Gm12878SitesRep1. Below is our entry:

bin chrom chromStart chromEnd name score strand thickStart thickEnd itemRgb
585 chr1 29434 29435 cg00381604 55 + 29434 29435 0,0,205
(UCSC uses 0-based coordinates, see explanation here: http://genome.ucsc.edu/FAQ/FAQtracks#tracks1)
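
To make the coordinate note concrete, here is a minimal Python sketch of the conversion between the 0-based, half-open chromStart/chromEnd stored in the UCSC table and the 1-based position that Illumina reports in MAPINFO. The function names are illustrative only and are not part of any UCSC or Illumina tool.

```python
def bed_to_one_based(chrom_start, chrom_end):
    """Convert a 0-based, half-open interval to a 1-based, inclusive (start, end)."""
    return chrom_start + 1, chrom_end


def one_based_to_bed(position):
    """Convert a single 1-based position to a 0-based, half-open (chromStart, chromEnd)."""
    return position - 1, position


# cg00381604 as stored in wgEncodeHaibMethyl450Gm12878SitesRep1
print(bed_to_one_based(29434, 29435))  # (29435, 29435)

# MAPINFO value for the same probe in the Illumina annotation
print(one_based_to_bed(29435))  # (29434, 29435)
```

So the table entry 29434-29435 and the position chr1:29435 describe the same single base.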

Below is the Illumina annotation, likely created by Illumina from multiple MySQL queries and other specific methods:

IlmnID,Name,AddressA_ID,AlleleA_ProbeSeq,AddressB_ID,AlleleB_ProbeSeq,Infinium_Design_Type,Next_Base,Color_Channel,Forward_Sequence,Genome_Build,CHR,MAPINFO,SourceSeq,Chromosome_36,Coordinate_36,Strand,Probe_SNPs,Probe_SNPs_10,Random_Loci,Methyl27_Loci,UCSC_RefGene_Name,UCSC_RefGene_Accession,UCSC_RefGene_Group,UCSC_CpG_Islands_Name,Relation_to_UCSC_CpG_Island,Phantom,DMR,Enhancer,HMM_Island,Regulatory_Feature_Name,Regulatory_Feature_Group,DHS

cg00381604,cg00381604,26752380,AAATCAACAAAATCCTAAAACCACACTCAAAAAAAACACAATAAAAAACA,50693408,GAATCGACGAAATCCTAAAACCGCGCTCGAAAAAAACGCAATAAAAAACG,I,A,Red,CTGGGTCCTAGCCCCGCCGCCCCCAGTCCGCCCGCGCCTCCGGGTCCTAACGCCGCCGCT[CG]CCCTCCACTGCGCCCTCCCCGAGCGCGGCTCCAGGACCCCGTCGACCCGGAGCGCTGTCC,37,1,29435,CGCCCTCCACTGCGCCCTCCCCGAGCGCGGCTCCAGGACCCCGTCGACCC,1,19298,F,,rs2462493,,,WASH5P,NR_024540,TSS200,chr1:28735-29810,Island,,,,1:18599-19663,,,
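
As a rough illustration of how the columns above answer the three questions (gene region, CpG island context, associated gene), the following Python sketch reads a trimmed copy of the header and the cg00381604 row and pulls out the relevant fields. The trimmed column set, the `summarize` helper, and the assumption that multi-valued fields are separated by ";" are mine for the purpose of the example; the full file has many more columns.

```python
import csv
import io

# Trimmed copy of the header and the cg00381604 row shown above.
ANNOTATION = """\
Name,CHR,MAPINFO,UCSC_RefGene_Name,UCSC_RefGene_Group,UCSC_CpG_Islands_Name,Relation_to_UCSC_CpG_Island
cg00381604,1,29435,WASH5P,TSS200,chr1:28735-29810,Island
"""


def summarize(row):
    """Pick out gene, gene region and CpG island context for one probe."""
    return {
        "probe": row["Name"],
        "hg19_position": "chr{}:{}".format(row["CHR"], row["MAPINFO"]),
        # Assumption: several genes or regions are listed as ';'-separated values.
        "genes": [g for g in row["UCSC_RefGene_Name"].split(";") if g],
        "gene_regions": [g for g in row["UCSC_RefGene_Group"].split(";") if g],
        "cpg_island": row["UCSC_CpG_Islands_Name"],
        "island_relation": row["Relation_to_UCSC_CpG_Island"],
    }


summaries = [summarize(r) for r in csv.DictReader(io.StringIO(ANNOTATION))]
for s in summaries:
    print(s)
```

With the real file, an open file handle would take the place of the `io.StringIO` object.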

I would ask Illumina to provide or help update their annotation. To recreate parts of this information, you could perform various table intersections or MySQL queries to generate similar results. Please see http://genome-test.soe.ucsc.edu/goldenPath/help/mysql.html and http://genome-test.soe.ucsc.edu/goldenPath/help/hgTablesHelp.html#Intersection. If you decide not to contact Illumina for insight into how they created their annotations, you may want to search our Mailing List Archives (http://genome.ucsc.edu/FAQ/index.html, bottom box) for steps previously offered in answer to similar questions, such as this one about mapping CpG Island Shores: https://lists.soe.ucsc.edu/pipermail/genome/2010-April/021971.html.
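
As one possible way to recreate an island/shore/shelf label once you have CpG island coordinates (for example from the cpgIslandExt table), here is a small Python sketch. It assumes the commonly used convention that shores are the 2 kb flanking an island and shelves the next 2 kb beyond that; these cut-offs, the "Open sea" label, and the function name are assumptions for illustration and are not necessarily how Illumina computed their Relation_to_UCSC_CpG_Island column.

```python
def island_relation(pos0, island_start, island_end, shore=2000, shelf=4000):
    """Classify a 0-based position relative to a 0-based, half-open island.

    shore and shelf are maximum distances (in bases) from the island edge;
    the defaults are an assumed convention, not values taken from UCSC.
    """
    if island_start <= pos0 < island_end:
        return "Island"
    if pos0 < island_start:
        distance = island_start - pos0      # bases upstream of the first island base
    else:
        distance = pos0 - island_end + 1    # bases downstream of the last island base
    if distance <= shore:
        return "Shore"
    if distance <= shelf:
        return "Shelf"
    return "Open sea"


# cg00381604 (chromStart 29434) against the island chr1:28735-29810
print(island_relation(29434, 28735, 29810))  # Island
```

Running this over every probe requires finding the nearest island first, which is where the table intersections or MySQL queries mentioned above come in.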

If you do not use direct MySQL queries, the Table Browser method will in essence involve generating custom tracks of your regions of interest. With two tracks, A and B, an intersection will produce output with all lines of file A that overlap file B. Should you want to merge the two tracks/tables instead, send both to Galaxy (in the Table Browser, check the "Galaxy" output option) and perform an interval intersection there. This will produce output with all lines of file A along with the lines from file B that overlap them. The Table Browser will time out with a large number of items, so you would likely need to break your custom tracks into smaller, chromosome-sized pieces.
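
The difference between the two kinds of output can be sketched in a few lines of Python. This is only a toy model of the behaviour described above, not the Table Browser's or Galaxy's actual implementation; the second probe and the island name are made up, and `join` returns only the overlapping pairs, which is one reading of the merge described above.

```python
def overlaps(a, b):
    """True if two (chrom, start, end, name) half-open intervals overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def intersect(track_a, track_b):
    """Intersection-style result: lines of A that overlap anything in B."""
    return [a for a in track_a if any(overlaps(a, b) for b in track_b)]


def join(track_a, track_b):
    """Merge-style result: each line of A paired with each overlapping line of B."""
    return [(a, b) for a in track_a for b in track_b if overlaps(a, b)]


cpg_sites = [
    ("chr1", 29434, 29435, "cg00381604"),
    ("chr1", 50000, 50001, "hypothetical_probe"),  # made-up probe, overlaps nothing
]
islands = [("chr1", 28735, 29810, "island_1")]     # made-up island name

print(intersect(cpg_sites, islands))  # only the cg00381604 line
print(join(cpg_sites, islands))       # the cg00381604 line together with the island line
```

The nested loops compare every pair of lines, which is fine for an example but far too slow for all 450k probes; that cost is part of why large inputs need to be split up or handled by dedicated tools.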

Thank you again for your inquiry and for using the UCSC Genome Browser. If you have further questions, please feel free to contact the mailing list again at gen...@soe.ucsc.edu.

All the best,

Brian Lee
UCSC Genome Bioinformatics Group



 
