Hi Marcin,
Thank you for your question about obtaining CpG Sites overlapping exons. Is it possible for you to expand upon your question a little further? Do you want the coordinates of all CG dinucleotides that overlap exons? Or do you just want the coordinates of the items in the CpG Islands track that have any overlap with exons? If you want the coordinates of the individual dinucleotides, then it will take some custom scripting on your part as we do not store these positions anywhere. However, grabbing the coordinates of where the CpG Islands track overlaps with exons from a gene track can be accomplished with our Table Browser tool: http://genome.ucsc.edu/cgi-bin/hgTables.
To obtain this information, follow the steps below:
1. Navigate to the Table Browser: http://genome.ucsc.edu/cgi-bin/hgTables
2. Choose "Mammal", "Human", "Dec. 2013 (GRCh38/hg38)" from the "clade", "genome", and "assembly" dropdowns.
3. Now make the following selections:
group: Genes and Gene Predictions
track: Gene Track of Interest
table: Table of interest
region: genome
output format: custom track
4. Click "get output"
5. On the Output page, enter an informative name in the name and description fields, something like "geneTrack Exons", and then select "Exons plus" from the "Create one BED record per" section. Lastly, click "get custom track in table browser".
6. From the resulting Table Browser page, hover over the Tools section of the top blue menu bar, and click "Data Integrator".
7. From the Data Integrator page, make sure the hg38 assembly is selected, then change "region to annotate" to "genome".
8. Now, in the "Add Data Source" section, select the custom track of exons we just created by selecting "Custom Tracks" and "geneTrack Exons" from the "track group" and "track" dropdowns, then click "Add".
9. Select the CpG Islands track by selecting "Regulation" and "CpG Islands (cpgIslandExt)" from the "track group" and "track" dropdowns, then click "Add".
10. Now we are ready to obtain the overlapping items. In the "Output Options" section, click "choose fields". You will likely want to deselect all the fields from the custom track and leave only the fields from the CpG Islands track, but which fields you choose is up to you.
11. When you are done selecting fields, click "Done" and then click "Get output".
You will now have output in the following format, depending on the fields you chose. Here I limited the query to a small region of chr9 that contains two islands that overlap exons and two that don't; your results will vary:
# hgIntegrator: database=hg38 region=chr9:133223138-133309723 Mon May 1 10:48:43 2017
#cpgIslandExt.chrom cpgIslandExt.chromStart cpgIslandExt.chromEnd cpgIslandExt.name cpgIslandExt.length cpgIslandExt.cpgNum cpgIslandExt.gcNum cpgIslandExt.perCpg cpgIslandExt.perGc cpgIslandExt.obsExp
chr9 133255614 133256444 CpG: 63 830 63 535 15.2 64.5 0.73
chr9 133255614 133256444 CpG: 63 830 63 535 15.2 64.5 0.73
chr9 133255614 133256444 CpG: 63 830 63 535 15.2 64.5 0.73
chr9 133255614 133256444 CpG: 63 830 63 535 15.2 64.5 0.73
chr9 133274615 133275949 CpG: 154 1334 154 967 23.1 72.5 0.89
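The same island can appear on several lines, as the first island does above. If you saved the output to a file (here cpgPerExon.txt), you can strip the leading whitespace and collapse the repeated lines on the command line:
$ perl -pe 's/^[ \t]*//' cpgPerExon.txt | uniq
chr9 133255614 133256444 CpG: 63 830 63 535 15.2 64.5 0.73
chr9 133274615 133275949 CpG: 154 1334 154 967 23.1 72.5 0.89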
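If you would rather remove the repeated rows in a script than with the perl and uniq command above, the same idea can be sketched in Python. This is only a minimal sketch under a few assumptions: the function name unique_rows is made up for illustration, the input is the saved Data Integrator output with "#" header lines, and a row counts as a duplicate when the whole line matches once surrounding whitespace is removed.

```python
def unique_rows(lines):
    """Return data rows with surrounding whitespace removed.

    Header lines starting with '#', blank lines, and repeated rows are
    dropped; the first occurrence of each row is kept, in input order.
    """
    seen = set()
    rows = []
    for line in lines:
        row = line.strip()
        if not row or row.startswith("#"):
            continue
        if row not in seen:
            seen.add(row)
            rows.append(row)
    return rows


# Hypothetical input modelled on the example output (fields shortened).
example = [
    "# hgIntegrator: database=hg38 region=chr9:133223138-133309723",
    "  chr9\t133255614\t133256444\tCpG: 63",
    "  chr9\t133255614\t133256444\tCpG: 63",
    "chr9\t133274615\t133275949\tCpG: 154",
]
for row in unique_rows(example):
    print(row)
```

Unlike uniq, which only collapses adjacent repeats and leaves the "#" lines in place, this sketch drops the header lines and removes repeats wherever they occur. To run it on a real file, pass an open file handle, for example unique_rows(open("cpgPerExon.txt")).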
For the positions of CG dinucleotides within each CpG Island, we don't store the positions anywhere, so you will have to download the sequences of where these tracks overlap, and then write a script to extract positions from there. Such scripting is outside the scope of this mailing list, but you can use the Table Browser to extract the sequences of the overlapping positions:
1. Head to the Table Browser: http://genome.ucsc.edu/cgi-bin/hgTables (or Tools->Table Browser).
2. Make sure the hg38 assembly is selected.
3. Make the following selections:
group: Regulation
track: CpG Islands
table: cpgIslandExt
region: genome
output format: sequence
4. Now intersect the CpG Islands track and our exon-only custom track. We could not do this in the previous example because the Table Browser intersection discards fields of interest from our tables and only outputs position ranges, but since all we want now is sequence, it does not matter.
- Next to intersection click "create".
- Select your exons-only custom track from the group, track, and table dropdowns, and then select the bullet for "Base-pair-wise intersection (AND) of CpG Islands and exonsOnlyCustomTrack"
- Click submit
5. Click "get output"
6. On the sequence retrieval page, select any formatting you would like and click "get sequence"
You will now have a FASTA format file like the following:
>hg38_cpgIslandExt_chr9.1 range=chr9:133255615-133256356 5'pad=0 3'pad=0 strand=+ repeatMasking=none
CGGGAGGGGGACGGGGCTGCCGGCAGCCCTCCCAGAGCCCCTGGCAGCCG
CTCACGGGTTCCGGACCGCCTGGTGGTTCTTGGGCACCGCAGTGAACCTC
AGCTTCCTCAGGACGGCGGGCCAGCCCAGCAGCTGCTGGTCCCACAAGTA
CTCGGGGGAGAGCACCTTGGTGGGTTTGTGGCGCAGCAGGTACTTGTTCA
GGTGGCTCTCGTCGTGCCACACGGCCTCGATGCCGTTGGCCTGGTCGACC
ATCATGGCCTGGTGGCAGGCCCTGGTGAGCCGCTGCACCTCTTGCACCGA
...
...
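As a possible starting point for the scripting step, here is a minimal Python sketch that reads FASTA records like the one above and reports the position of each CG dinucleotide. It is an illustration only, not something the Table Browser provides: the function names parse_fasta and cg_positions are made up, and the sketch assumes every header carries a "range=chrom:start-end" field whose start is the 1-based position of the first base, and that the sequence is on the + strand, as in the example record.

```python
import re

# Matches the "range=chr9:133255615-133256356" field of the FASTA header.
RANGE_RE = re.compile(r"range=(\S+):(\d+)-(\d+)")


def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def cg_positions(header, sequence):
    """Return (chrom, position) for the C of every CG in the sequence.

    Positions are 1-based, in the same coordinates as the range= field.
    Assumes the record is on the + strand.
    """
    match = RANGE_RE.search(header)
    if match is None:
        raise ValueError("no range= field in header: " + header)
    chrom, start = match.group(1), int(match.group(2))
    seq = sequence.upper()  # lower-case (soft-masked) bases still count
    hits = []
    i = seq.find("CG")
    while i != -1:
        hits.append((chrom, start + i))
        i = seq.find("CG", i + 1)
    return hits


# First two sequence lines of the example record above.
example = [
    ">hg38_cpgIslandExt_chr9.1 range=chr9:133255615-133256356 "
    "5'pad=0 3'pad=0 strand=+ repeatMasking=none",
    "CGGGAGGGGGACGGGGCTGCCGGCAGCCCTCCCAGAGCCCCTGGCAGCCG",
    "CTCACGGGTTCCGGACCGCCTGGTGGTTCTTGGGCACCGCAGTGAACCTC",
]
for header, seq in parse_fasta(example):
    for chrom, pos in cg_positions(header, seq):
        print(chrom, pos)
```

The sequence lines are joined before searching so that a CG split across two FASTA lines is still found. If you need 0-based BED-style coordinates instead, subtract 1 from each reported position.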
Thank you again for your inquiry and for using the UCSC Genome Browser. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.
Christopher Lee
UCSC Genomics Institute