question about the output of programs

64 views

Skip to first unread message

Mathias Lesche

unread,

Aug 2, 2016, 3:33:34 PM8/2/16

to meth...@googlegroups.com

Hello,

I have some questions about the output from several programs which are described in the manual. Chapter 3.2.1 shows the call of methdiff when one compares two methylomes. Here is a line from the example:

chr1 3001006 + CpG 0.874119 21 18 15 22

The last 4 columns give the coverage of each CPG in NHFF, number of methylated reads in NHFF, coverage in ESC, and number of methylated reads in ESC. So 21 is the coverage, 18 the methylated reads, 15 the coverage for ESC and 22 the number of methylated reads for ESC. How come that the number of methylated reads is higher than the coverage. Shouldn't it be always lower?

Chapter 3.2.2 describes the usage of radmeth and the test for differential methylation. The manual is quite clear what to do and how to call the programs. The output of radmeth adjust is shown at the bottom of page 13 of the manual. It's a table with 11 columns. Columns 1-7 are clear and 8-11 is again coverage, number of methylated reads for my two conditions. But which conditions is the control or the case in this example? Is the control column 8 and 9 and case 10 and 11 and? It's nowhere explained and therefore I can't apply it to my data. I end up with differential methylated regions but I don't know to which of my conditions they belong or in which the CpG is higher methylated (control or case). The example in 3.2.1 (DM analysis of small groups) with the program dmr is able to delivers this classification

The selection of differential methylated sites is descripbed on the top of page 14.The command call is

awk '$5 < 0.01 "{print $0; $}"' cpgs.adjusted.bed > dm_cpgs.bed.

This is the call to get CpGs with FDR-corrected p-value below 0.01. Shouldn't it be $7 instead of $5 because it's written on the previous page that the FDR-correct p-value is in the 7th column.
Furthermore some rows have a -1-1 in the 7th columns. Here is an example:

chr1 10488 + CpG -1 -1 -1-1 6 6 4 4

Lastly, the output of radmeth merge gives a bed file with 6 columns as described on page 14. I was wondering how to read the last column which is the estimated methylation difference?. Is the from control to case which would mean for row 1 that case has 49% lower methylation in that region compared to control? Here is the output

chr1 57315 57721 dmr 10 -0.498148

So would'nt it be possible to have headers for all those outputs because I find it quite confusing to interpret the results

Thanks for the answers and feedback

Mathias

Meng Zhou

unread,

Aug 3, 2016, 2:29:04 PM8/3/16

to meth...@googlegroups.com

Hi Mathias,

(1) The last four columns of the methdiff output are:

methylated and unmethylated reads in input A, and methylated and unmethylated reads in input B, respectively.

So the coverage of each input is the sum of the corresponding two read counts. It turns out the manual section is incorrect. I will fix it. Thanks for pointing it out.

(2) For the output of radmeth regression, the last four columns correspond to (a) coverage of the case; (b) methylated reads of the case; (c) coverage of the control, and (d) methylated reads of the control.

(3) Yes you are right. The manual needs to be fixed.

(4) For the last column of radmeth merge, it is computed by: case_methylation_level - control_methylation_level. So it’s the difference between estimated case and control methylation levels. Your interpretation is correct.

Thanks for all these valuable comments, we will make the manual sections more clear and accurate. As adding a header to all the output, in some cases when the downstream program anticipates a BED format, but does not ignore the header line properly, it will cause some trouble. I’m considering adding the header information to the -v verbose output, so that users can see the header information in the screen output separated from the output file.