associating DM-CpG, gene part, delta_methylation with a GENE NAME


Alejandro

Oct 15, 2014, 5:18:26 PM
to methylkit_...@googlegroups.com
Hi,
I have identified DM-CpGs and annotated them. Now I want to ask some questions of my data, such as:
 Which genes have DM-CpGs in promoters whose difference from the control is > 50%?
Which genes have DM-CpGs in exons but not in promoters?
and similar questions.

So I'm looking for an object that I can print as a table containing that information, something like:

CpG  Chr   start    end       GeneID   promoter   exon intron      %diff


Is there any table like that? If I understand what has been discussed in other posts, it looks like I have to do that outside methylKit. Am I right?
I looked at the GenomicRanges user guide, but it does not seem to produce this kind of table either.
Can someone help with this?
Thanks!

Altuna Akalin

Oct 16, 2014, 7:09:09 PM
to methylkit_...@googlegroups.com
There is no function to get such a table. But methylKit objects are coercible to GRanges objects from GenomicRanges, and using functions from GenomicRanges you can build such a table. I don't have example code for that, though. You can check the GenomicRanges vignette. Other people in the forum have asked for a custom table like that, so you can check those threads as well, but the answer will not be much different from this.

Best,
Altuna
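
A minimal sketch of the route described above, not code from the thread: coerce the differential methylation results to GRanges, find overlaps with an annotation, and assemble a table by hand. The data frame `diff_df` is a toy stand-in for a methylDiff object (for a real one, `as(myDiff, "GRanges")` does the coercion), and `promoters_gr`, its coordinates, and the gene IDs in its `name` column are assumptions made for illustration.

```r
library(GenomicRanges)

# Toy stand-in for a methylDiff object; with real data use as(myDiff, "GRanges").
diff_df <- data.frame(
  chr       = c("chr1", "chr1", "chr1"),
  start     = c(109, 3000, 8000),
  end       = c(109, 3000, 8000),
  meth.diff = c(1.3, 55.0, -12.5)
)
diff_gr <- makeGRangesFromDataFrame(diff_df, keep.extra.columns = TRUE)

# Hypothetical promoter annotation carrying a gene identifier in 'name'.
promoters_gr <- GRanges(
  seqnames = "chr1",
  ranges   = IRanges(start = c(2630, 7737), end = c(4630, 9737)),
  strand   = c("+", "-"),
  name     = c("AT1G01010", "AT1G01020")
)

# One hit per overlapping (CpG, promoter) pair.
hits <- findOverlaps(diff_gr, promoters_gr)

# Pull the matching rows from each side into one table.
cpg_table <- data.frame(
  chr       = as.character(seqnames(diff_gr))[queryHits(hits)],
  start     = start(diff_gr)[queryHits(hits)],
  end       = end(diff_gr)[queryHits(hits)],
  GeneID    = promoters_gr$name[subjectHits(hits)],
  meth.diff = diff_gr$meth.diff[queryHits(hits)]
)

# Promoter CpGs whose methylation difference exceeds 50% in either direction.
big_change <- cpg_table[abs(cpg_table$meth.diff) > 50, ]
print(big_change)
```

A CpG that overlaps several promoters appears once per promoter in `cpg_table`, so `unique(big_change$GeneID)` gives the gene list.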

--
You received this message because you are subscribed to the Google Groups "methylkit_discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to methylkit_discus...@googlegroups.com.
To post to this group, send email to methylkit_...@googlegroups.com.
Visit this group at http://groups.google.com/group/methylkit_discussion.
For more options, visit https://groups.google.com/d/optout.

Alejandro

Oct 17, 2014, 10:30:57 AM
to methylkit_...@googlegroups.com
Thanks, Altuna. Are you planning to incorporate a convenience function that provides such an output? I think it is very important; it is actually one of the key questions. You detect changes in methylation, and then you really want to know which genes are affected. I find methylKit a very nice tool for inexperienced people like me. A convenience function like this is what could make the tool more and more popular. I will see what I can do with GenomicRanges.



Altuna Akalin

Oct 17, 2014, 10:56:10 AM
to methylkit_...@googlegroups.com
There was a plan, but I didn't have time to do it. I agree it is an important convenience function.

Best,
Altuna


Kalyan K Pasumarthy

Oct 24, 2014, 5:49:46 AM
to methylkit_...@googlegroups.com
Here is a blog post that does something close to it. It may help in extracting the list of differential CpGs overlapping an element of interest, or vice versa.

Regards,
Kalyan

Alejandro

Oct 30, 2014, 11:55:20 AM
to methylkit_...@googlegroups.com

I was trying to follow your blog post “Extending methylKit : Extract promoters with differentially methylated CpGs”.

I got a list of promoters overlapping my DM-CpGs. However, I still cannot get the gene names or the methylation information.


My methylDiff object (called myDiff) looks like this:


> head(myDiff)
methylDiff object with 6 rows
--------------
   chr start end strand     pvalue    qvalue meth.diff
1 chr1   109 109      * 0.61932523 0.8603847  1.322751
2 chr1   115 115      * 0.25197783 0.5729940 -4.483048
3 chr1   161 161      * 0.26025821 0.5729940  5.505142
4 chr1   310 310      * 0.64983930 0.8832111  2.763158
5 chr1   500 500      * 0.06680346 0.4517701 11.034295
6 chr1   511 511      * 0.32199412 0.5866261  5.961400


and gene.obj looks like this:


> head(gene.obj)
GRangesList of length 4:
$exons
GRanges with 217183 ranges and 2 metadata columns:
           seqnames           ranges strand   |     score        name
              <Rle>        <IRanges>  <Rle>   | <integer> <character>
       [1]     chr1     [3631, 3913]      +   |         1   AT1G01010
       [2]     chr1     [3996, 4276]      +   |         2   AT1G01010
       [3]     chr1     [4486, 4605]      +   |         3   AT1G01010
       [4]     chr1     [4706, 5095]      +   |         4   AT1G01010
       [5]     chr1     [5174, 5326]      +   |         5   AT1G01010



When I get the list of promoters overlapping the differentially methylated CpGs using this:

> diff_promo= subsetByOverlaps(gene.obj$promoters, mydiff_GR)


I get this:


> diff_promo= subsetByOverlaps(gene.obj$promoters, mydiff_GR)
> head(diff_promo)
GRanges with 6 ranges and 2 metadata columns:
      seqnames         ranges strand |     score        name
         <Rle>      <IRanges>  <Rle> | <integer> <character>
  [1]     chr1 [ 2630,  4630]      + |         0           .
  [2]     chr1 [ 7737,  9737]      - |         0           .
  [3]     chr1 [12714, 14714]      - |         0           .
  [4]     chr1 [22145, 24145]      + |         0           .
  [5]     chr1 [22415, 24415]      + |         0           .
  [6]     chr1 [27499, 29499]      + |         0           .


But this is not what I'm trying to get. I would like to have the myDiff object with two extra columns: one indicating the gene name, and another with an integer value that distinguishes promoter, exon, intron, etc.

Can you or someone here help me with this? I would greatly appreciate it.
Ale
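
One way the requested columns could be built with GenomicRanges alone is sketched here under stated assumptions; it is not a methylKit function. `cpgs` is a toy stand-in for `as(myDiff, "GRanges")`, and `gene_parts` mimics the shape of gene.obj with hypothetical coordinates and with gene IDs in every `name` column (including the promoters, which is not the case in the output shown above).

```r
library(GenomicRanges)

# Toy CpGs standing in for as(myDiff, "GRanges").
cpgs <- GRanges("chr1", IRanges(start = c(109, 3700, 4300, 5000), width = 1),
                meth.diff = c(1.3, -4.5, 5.5, 11.0))

# Toy annotation shaped like gene.obj; names on promoters are an assumption.
gene_parts <- GRangesList(
  promoters = GRanges("chr1", IRanges(2630, 4630), strand = "+",
                      name = "AT1G01010"),
  exons     = GRanges("chr1", IRanges(c(3631, 3996, 4706), c(3913, 4276, 5095)),
                      strand = "+", name = rep("AT1G01010", 3)),
  introns   = GRanges("chr1", IRanges(c(3914, 4277), c(3995, 4705)),
                      strand = "+", name = rep("AT1G01010", 2))
)

# One 0/1 column per gene part: does the CpG overlap any range of that part?
for (part in names(gene_parts)) {
  mcols(cpgs)[[part]] <- as.integer(overlapsAny(cpgs, gene_parts[[part]]))
}

# Gene name from the first overlapping annotated range (NA if none overlaps).
all_parts   <- unlist(gene_parts, use.names = FALSE)
first_hit   <- findOverlaps(cpgs, all_parts, select = "first")
cpgs$GeneID <- all_parts$name[first_hit]

out <- as.data.frame(cpgs)

# Example question: CpGs in exons but not in promoters.
exon_only <- out[out$exons == 1 & out$promoters == 0, ]
print(exon_only)
```

Taking only the first hit is a simplification: a CpG lying in parts of two different genes would need one row per gene, which `findOverlaps()` without `select` provides.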





Kalyan K Pasumarthy

Oct 30, 2014, 5:19:56 PM
to methylkit_...@googlegroups.com
Dear Alejandro,

Are you sure that your promoter object contains the names of the genes? Please look at the diff_promo object you posted: it doesn't contain any identifier for the gene (the name column is just ".").

To start with, one needs a common identifier across all the objects (exons, promoters and introns) to link them. Moreover, linking needs some level of custom scripting!

Regards,
Kalyan
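
If the promoter ranges lack gene identifiers, one possible workaround is to derive promoters from a gene-level GRanges that does carry names, since `promoters()` from GenomicRanges keeps the metadata columns. This is only a sketch: the `genes` object, its coordinates, and the 1000 bp flanks are assumptions, not the annotation used in this thread.

```r
library(GenomicRanges)

# Hypothetical gene-level annotation with an identifier per gene.
genes <- GRanges(
  seqnames = "chr1",
  ranges   = IRanges(start = c(3631, 6788), end = c(5899, 8737)),
  strand   = c("+", "-"),
  name     = c("AT1G01010", "AT1G01020")
)

# Strand-aware windows around each TSS; metadata columns are carried over.
proms <- promoters(genes, upstream = 1000, downstream = 1000)
print(proms)

# Quick check that every promoter has a usable identifier.
has_id <- !is.na(proms$name) & proms$name != "."
print(all(has_id))
```

With identifiers on every part, the `name` column becomes the common key for linking promoters, exons and introns.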



 



Alejandro

Nov 3, 2014, 12:50:54 PM
to methylkit_...@googlegroups.com
Kalyan,
What if I first subset my methylDiff objects to extract the rows that overlap promoters:

# Get the list of differentially methylated CpGs overlapping the promoters
myHyper_CpGs_inPromoters= subsetByOverlaps(myHyper_GR, gene.obj$promoters);
myHypo_CpGs_inPromoters= subsetByOverlaps(myHypo_GR, gene.obj$promoters);

Then I subset my gene.obj:

myPromoters=subsetByOverlaps(gene.obj, gene.obj$promoters)


Then I export both myHyper_CpGs (or myHypo_CpGs) and myPromoters to BED files.
Then I use bedtools to combine (intersect) both tables, e.g. myHyper_CpGs with myPromoters. I suppose that if both tables are sorted the same way, the data from gene.obj will be added to the data of my methylDiff object.
Is there something wrong with this approach?

PLEASE HELP ME TO UNDERSTAND THIS.

If I call head() on gene.obj I get this:



> head(gene.obj)
GRangesList of length 4:
$exons
GRanges with 217183 ranges and 2 metadata columns:
           seqnames           ranges strand   |     score        name
              <Rle>        <IRanges>  <Rle>   | <integer> <character>
       [1]     chr1     [3631, 3913]      +   |         1   AT1G01010
       [2]     chr1     [3996, 4276]      +   |         2   AT1G01010
       [3]     chr1     [4486, 4605]      +   |         3   AT1G01010
       [4]     chr1     [4706, 5095]      +   |         4   AT1G01010
       [5]     chr1     [5174, 5326]      +   |         5   AT1G01010


WHERE IS THE INFORMATION ABOUT PROMOTERS?

because when I subset by overlap following this:

> diff_promo= subsetByOverlaps(gene.obj$promoters, mydiff_GR)

I do not understand where in the previous table there is information that allows the command to identify rows overlapping promoters.


Kalyan K Pasumarthy

Nov 3, 2014, 2:47:10 PM
to methylkit_...@googlegroups.com
Please see my replies inline in the text below. Note that the solution for your tabulated output is not straightforward. It needs familiarity with the package suggested below.

Regards,
Kalyan



 


On 3 November 2014 19:50, Alejandro <alejandro...@gmail.com> wrote:
Kalyan,
What if I first subset my methylDiff objects to extract the rows that overlap promoters:

# Get the list of differentially methylated CpGs overlapping the promoters
myHyper_CpGs_inPromoters= subsetByOverlaps(myHyper_GR, gene.obj$promoters);
myHypo_CpGs_inPromoters= subsetByOverlaps(myHypo_GR, gene.obj$promoters);

You may do it this way to extract the list of hyper/hypo CpGs.

Then I subset my gene.obj:

myPromoters=subsetByOverlaps(gene.obj, gene.obj$promoters)
This does not help you. Note that gene.obj is a GRangesList object and gene.obj$promoters is one of the list items. You may try the following:

myPromoters_hyper = subsetByOverlaps(gene.obj$promoters, myHyper_GR)
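
To make the list structure concrete, here is a small self-contained sketch with toy data. `gene_parts` and `hyper_gr` are hypothetical stand-ins for gene.obj and myHyper_GR, and the promoter names are assumed for illustration. It shows that each element of a GRangesList is an ordinary GRanges, and that the first argument of `subsetByOverlaps()` decides which rows come back.

```r
library(GenomicRanges)

# Toy annotation shaped like gene.obj (coordinates and names are illustrative).
gene_parts <- GRangesList(
  exons     = GRanges("chr1", IRanges(c(3631, 3996), c(3913, 4276)),
                      strand = "+", name = c("AT1G01010", "AT1G01010")),
  promoters = GRanges("chr1", IRanges(c(2630, 7737), c(4630, 9737)),
                      strand = c("+", "-"),
                      name = c("AT1G01010", "AT1G01020"))
)

print(names(gene_parts))             # element names of the list
promoter_gr <- gene_parts$promoters  # one element, itself a GRanges

# Toy hypermethylated CpGs standing in for myHyper_GR.
hyper_gr <- GRanges("chr1", IRanges(c(3000, 15000), width = 1),
                    meth.diff = c(30, 42))

# Rows are taken from the first argument, filtered by overlap with the second.
promoters_with_hyper <- subsetByOverlaps(promoter_gr, hyper_gr)  # promoter rows
hyper_in_promoters   <- subsetByOverlaps(hyper_gr, promoter_gr)  # CpG rows

print(promoters_with_hyper)
print(hyper_in_promoters)
```

Neither result carries columns from the other object, which is why a combined table needs `findOverlaps()` and an explicit join.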

Then I export both myHyper_CpGs (or myHypo_CpGs) and myPromoters to BED files.
Then I use bedtools to combine (intersect) both tables, e.g. myHyper_CpGs with myPromoters. I suppose that if both tables are sorted the same way, the data from gene.obj will be added to the data of my methylDiff object.
Is there something wrong with this approach?
If you are aiming for the table format you mentioned in this thread, it may help you, but make sure that you are exporting the proper objects. You can use bedtools and shell commands to get a list of CpGs and the corresponding promoter names.

PLEASE HELP ME TO UNDERSTAND THIS.

If I call head() on gene.obj I get this:


> head(gene.obj)
GRangesList of length 4:
$exons
GRanges with 217183 ranges and 2 metadata columns:
           seqnames           ranges strand   |     score        name
              <Rle>        <IRanges>  <Rle>   | <integer> <character>
       [1]     chr1     [3631, 3913]      +   |         1   AT1G01010
       [2]     chr1     [3996, 4276]      +   |         2   AT1G01010
       [3]     chr1     [4486, 4605]      +   |         3   AT1G01010
       [4]     chr1     [4706, 5095]      +   |         4   AT1G01010
       [5]     chr1     [5174, 5326]      +   |         5   AT1G01010


WHERE IS THE INFORMATION ABOUT PROMOTERS?
I suggest getting familiar with the GenomicRanges library and its related functions. This package is very useful for everyone working with NGS data in R/Bioconductor, and familiarity with it gives the analysis a new dimension. gene.obj is a GRangesList object. To access the promoter-related information, you need to know how to work with this kind of object. That is why I am suggesting this package! Don't be offended by my suggestion; it will be beneficial for you to go through it.

because when I subset by overlap following this:

> diff_promo= subsetByOverlaps(gene.obj$promoters, mydiff_GR)

I do not understand where in the previous table there is information that allows the command to identify rows overlapping promoters.
Understanding GenomicRanges will help you.

Alejandro

Nov 3, 2014, 10:17:10 PM
to methylkit_...@googlegroups.com
Dear Kalyan, I completely agree on the importance of understanding how GenomicRanges works, but even after reading the vignette it is not clear to me how to subset that list. I can see that gene.obj has information about the genes, but when it is subset by overlap that information is somehow lost. Can you suggest an alternative source of information about GenomicRanges where I can learn the full structure of gene.obj and how to subset only those intervals that contain DM-CpGs (while still keeping the information about their annotations)?




jonathan corbi

Mar 25, 2015, 9:07:05 AM
to methylkit_...@googlegroups.com
Hi Kalyan, 

First, thanks for all your posts; they are very helpful.

Also, the example given in the blog extracts DMRs for annotated regions such as exons, introns, promoters and TSSs. Would it be possible to proceed the same way but exclude these regions, to "isolate" the intergenic regions?

Thanks,

Jonathan 
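
A possible sketch for the intergenic case, assuming "intergenic" means "overlapping none of the annotated parts". `cpgs` and `gene_parts` are toy stand-ins for the differential CpGs (or DMRs) and for gene.obj. The idea is to flatten the list into one GRanges and keep the ranges with no overlap.

```r
library(GenomicRanges)

# Toy differential CpGs; the same code works for wider DMR ranges.
cpgs <- GRanges("chr1", IRanges(c(109, 3700, 12000), width = 1),
                meth.diff = c(26.0, -31.5, 40.2))

# Toy annotation shaped like gene.obj (coordinates are illustrative).
gene_parts <- GRangesList(
  promoters = GRanges("chr1", IRanges(2630, 4630), strand = "+"),
  exons     = GRanges("chr1", IRanges(3631, 3913), strand = "+"),
  introns   = GRanges("chr1", IRanges(3914, 3995), strand = "+")
)

# Flatten all annotated parts into a single GRanges.
annotated <- unlist(gene_parts, use.names = FALSE)

# Keep only the CpGs that overlap none of the annotated ranges.
intergenic <- cpgs[!overlapsAny(cpgs, annotated)]
print(intergenic)
```

How far a promoter extends upstream determines what counts as intergenic here, so the result depends on the flank sizes used when the annotation was built.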
 


Kalyan K Pasumarthy

Apr 1, 2015, 10:19:18 AM
to methylkit_...@googlegroups.com
I lost track! Could you elaborate on the question and context?

Regards,
Kalyan



Juan Pablo Aguilar Cabezas

Apr 25, 2024, 12:09:00 PM
to methylkit_discussion
Alejandro,
Were you able to do your analysis?
I am interested in doing the same: I want to find the genes that overlap the CpGs, to see whether the genes that are "differentially methylated" match the genes that are differentially expressed, and to test association/causality.

Thanks
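
Once a gene list has been extracted from the DM-CpG overlaps, the comparison with expression results could look roughly like the base R sketch below. All gene names and the background set `all_genes` are made up for illustration. Fisher's exact test is just one common choice for testing association, and it says nothing about causality.

```r
# Hypothetical background: every gene tested in both assays.
all_genes <- paste0("gene", 1:10)

# Hypothetical gene lists from the methylation and expression analyses.
dm_genes <- c("gene1", "gene2", "gene3", "gene4")  # genes with DM-CpGs
de_genes <- c("gene2", "gene3", "gene4", "gene9")  # differentially expressed

# Genes found in both lists.
shared <- intersect(dm_genes, de_genes)
print(shared)

# 2x2 contingency table over the background, then a test of association.
tab <- table(DM = all_genes %in% dm_genes, DE = all_genes %in% de_genes)
print(tab)
print(fisher.test(tab))
```

The choice of background matters: genes that could not have been called in one of the assays should be left out of `all_genes`.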

Tuba

Sep 19, 2024, 5:17:06 PM
to methylkit_discussion
Hi,
I tried to detect promoters in myDiff25p.hypo. For this I tried the commands below (from http://chitka-kalyan.blogspot.com/2014/10/extending-methylkit-extract-promoters.html):
> myDiff25p.hypo=getMethylDiff(myDiff,difference=25,qvalue=0.01,type="hypo")
> test_gr=as(myDiff25p.hypo, "GRanges")
> diff_promo= subsetByOverlaps(gene.obj$promoters, test_gr)
But I could not get the promoter percentage in the output of the command below:
> getTargetAnnotationStats(diffAnn.hypo,percentage=TRUE,precedence=TRUE)


What am I doing wrong? Can you help me?
@Kalyan K Pasumarthy


Thanks

alex....@gmail.com

Sep 20, 2024, 11:43:43 AM
to methylkit_discussion
Hi Tuba,

You were probably planning on using annotateWithGeneParts to annotate the DMRs with gene regions (including promoters), but this isn't shown in your code. You need to annotate your diff_promo or test_gr regions before using getTargetAnnotationStats. Here's how you can do this:

library(genomation)
diffAnn.hypo = annotateWithGeneParts(test_gr, gene.obj)

This will annotate the regions in test_gr with gene parts like promoters, exons, and introns, using gene.obj.

Getting the annotation stats should work after the annotation step. If you're still not seeing promoter percentages, ensure that the annotation (diffAnn.hypo) contains promoter regions.

Best,
Alex