PCA1 | PCA2 | PCA3 | PCA4 | |
-0.0257456039 | -0.045478216 | -0.00483179 | 0.0210543866 | |
-0.0442088488 | -0.0081143345 | -0.0009079279 | 0.0350200662 | |
-0.02690392 | -0.0285829765 | -0.0242796867 | 0.0123196758 | |
-0.0256274555 | -0.0559394237 | 0.0523019875 | 0.041726609 | |
-0.003918779 | 0.0343353494 | 0.054446566 | -0.0098142739 |
1) How does methylKit calculate PCA? Is it based on the methylation level at each base pair or does methylKit create clusters and calculates a cumulative methylationscore for a particular region or cluster and then use the methylation score for each region to do a PCA analysis? What is the information that the methylKit input to prcomp() function?
2) How can I download/capture this information (the methylation profile for each tissue sample) that methylKit submits to the prcomp() function?
3) Also, if I have certain regions of interest (for example, promoters or CpG islands) and I have their chromosomal positions in the genome, how can I get the methylation scores for these regions of interest? Using methylKit, is there a way to filter out everything so that I only have the methylation scores for these regions of interest?
Thank you.Best regards,Jing Xian
--
You received this message because you are subscribed to the Google Groups "methylkit_discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to methylkit_discus...@googlegroups.com.
To post to this group, send email to methylkit_...@googlegroups.com.
Visit this group at http://groups.google.com/group/methylkit_discussion.
For more options, visit https://groups.google.com/d/optout.
To unsubscribe from this group and stop receiving emails from it, send an email to methylkit_discussion+unsub...@googlegroups.com.
Therefore, my questions are:a) What is coverage1? numCs1? and numTs1? I know that they represent values for my first sample.
Is "coverage" the number of times that base is covered by the sequence reads?
Does "numCs" represent the number of methylated cytosines (since these have not been converted by the bisulfite treatment) identified by the reads? So, for"chr1" 854966 854966 "+" 11 11 0 15 8 7In the first sample, there were 11 reads that covered this base and in all 11 reads, the base identified at this position by all 11 reads is cytosine.
Does "numTs" represent the number of thymine? As in the above example,"chr1" 854966 854966 "+" 11 11 0 15 8 7
does this mean that of the 15 reads that covered this base at this position (854966), 7 were converted to thymine by the bisulfite treatment?
b) When methylKit does a hierarchical clustering of the samples based on the methylation profile of the samples, what does it use for the to as the methylation level value? Does it use the "coverage" value or the "numCs" value or the "numTs" value or perhaps a ratio of numCs" to "numTs"? Is the hierarchical clustering based on the methylation level at EACH base pair?
2) Is it possible to produce a "heatmap" for the hierarchical clustering?
3) Based on the annotation of the UCSC bed file, I noticed that"chr1" 845810 845810 "+" 12 12 0 11 11 0"chr1" 845816 845816 "+" 12 12 0 11 11 0belong to a single annotated CpG island. Therefore, I was wondering how can one calculate the cumulative methylation value for this CpG island?
4) Finally, when I do the following:> getMethylationStats(myobj[[2]],plot=FALSE,both.strands=FALSE)methylation statistics per basesummary:Min. 1st Qu. Median Mean 3rd Qu. Max.0.000 0.000 5.263 36.020 86.050 100.000percentiles:0% 10% 20% 30% 40%0.000000 0.000000 0.000000 0.000000 0.00000050% 60% 70% 80% 90%5.263158 40.000000 76.470588 91.666667 97.46835495% 99% 99.5% 99.9% 100%100.000000 100.000000 100.000000 100.000000 100.000000At 50%, you can see the value is "5.263158". What does the value "5.263158" represent? Is this the percentage of bases that are 50% methylated? What does it mean? How do I interpret this table?
To unsubscribe from this group and stop receiving emails from it, send an email to methylkit_discus...@googlegroups.com.
# setting destrand=TRUE, will merge reads on both strans of a CpG dinucleotide. This provides better
meth=unite(myobj,destrand=TRUE)
--
You received this message because you are subscribed to a topic in the Google Groups "methylkit_discussion" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/methylkit_discussion/q6qUN-pyvwQ/unsubscribe.
To unsubscribe from this group and all its topics, send an email to methylkit_discus...@googlegroups.com.
1) I was looking through the methylation profile file for one of my samples and I noticed that, in general, only bases that have been methylated are recorded for "coverage", "numCs", and "numTs". That is to say, that if a base was not methylated then there is generally no information about it. However, I found that on some occasions, it was reported. That is to say, I found, for example,"chr1" 28739459 28739459 "-" 27 0 27So, this is an unmethylated base that was reported. Why? Why does MethylKit sometimes report these unmethylated bases and sometimes throws them out. Please note that this is the methylation profile for only one sample.
2) I also noticed in https://code.google.com/p/methylkit/ that it says:"# merge all samples to one table by using base-pair locations that are covered in all samples# setting destrand=TRUE, will merge reads on both strans of a CpG dinucleotide. This provides better# coverage, but only advised when looking at CpG methylation"meth=unite(myobj,destrand=TRUE)It says that it will use "base-pair locations that are covered in ALL samples". Therefore, am I right to understand that if I have 10 samples and for a particular location/base, 8 samples have values for "coverage", "numCs", and "numTs" but the remaining 2 samples do not have any information for this particular base, then MethylKit will not include the information for this base in the meth=unite(myobj,destrand=TRUE) function?If this is true, isn't this criteria a little too stringent? Because of 2 samples, all information for this base would be thrown out? Can this be avoided?
3) it also says that this is "only advised when looking at CpG methylation". I am a little confused. What else can I be looking at with MethylKit if I am not looking at CpG methylation? When would I NOT use "destrand=TRUE"?I look forward to your reply and once again, thank you for your time and patience.
To unsubscribe from this group and stop receiving emails from it, send an email to methylkit_discussion+unsubscrib...@googlegroups.com.
To post to this group, send email to methylkit_...@googlegroups.com.
Visit this group at http://groups.google.com/group/methylkit_discussion.
For more options, visit https://groups.google.com/d/optout.
To post to this group, send email to methylkit_...@googlegroups.com.--
You received this message because you are subscribed to the Google Groups "methylkit_discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to methylkit_discussion+unsub...@googlegroups.com.
Visit this group at http://groups.google.com/group/methylkit_discussion.
For more options, visit https://groups.google.com/d/optout.
--
You received this message because you are subscribed to a topic in the Google Groups "methylkit_discussion" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/methylkit_discussion/q6qUN-pyvwQ/unsubscribe.
To unsubscribe from this group and all its topics, send an email to methylkit_discussion+unsub...@googlegroups.com.
To post to this group, send email to methylkit_...@googlegroups.com.
Visit this group at http://groups.google.com/group/methylkit_discussion.
For more options, visit https://groups.google.com/d/optout.
I was looking at a specific gene promoter and I noticed that for a particular sample, there were no methylation data ("coverage", "numCs", "numTs") for that region. At first, I thought it may be because there were no reads for that region for this particular sample. However, when I loaded the SAM file for this sample into IGV, I noticed that there were indeed reads in this region. But I also noticed that for all the reads in this region for this sample, they were coincidentally all unmethylated (for bases where C is in the reference genome, the reads for that position were all Ts). So, I guessed that they made not have been reported because they were unmethylated. However, when I looked at methylation data for the same promoter region in another sample, I found that there were instances when the unmethylated regions were reported.
You said: "There is no description how you achieved that control."This is only an observation. I am not sure what kind of control you are expecting.
You said: " There is coverage and quality thresholds if you are using read.bismark() function that's the only way a base would be discarded."I used the default parameters for the read.bismark() function so I don't know if that would discard any bases. I have decided to redo this and set mincov to 10 and minqual to 0 to see if this changes anything. However, the reason I asked this question was because I thought you may know the reason for this or you may have made the same observation when you were testing your software.
###Why do you think methylation is specific to only CpG dinucleotides? Where did you read that?"DNA methylation typically occurs in a CpG dinucleotide context; non-CpG methylation is prevalent in embryonic stem cells,[5][6][7] and has also been indicated in neural development."I think you've mis-interpreted the intent of my question. When I asked what else I could be looking at with MethylKit, I was actually thinking that perhaps your software might be able to estimate CNV. I have read that some other methylation analysis software are able to do this (see CopyNumber450K package). I could not find any mention of this in the MethylKit user guide so I decided to ask to be sure.
...