MLML produces all NaN

23 views
Skip to first unread message

Samantha Schaffner

unread,
Nov 20, 2018, 9:19:14 PM11/20/18
to meth...@googlegroups.com
Hello,

I am running mlml with paired BS and oxBS data in order to estimate hydroxymethylation levels. Before discovering MethPipe I had processed my data with BSMAP (alignment and methratio). I took the corresponding information from BSMAP's methratio output and formatted it to match the 6-column BED file described in the MethPipe manual:

   chr     pos   pos.1 read_count ratio strand

1 chr1 3037802 3037803     CpG:10 1.000      +

2 chr1 3037803 3037804     CpG:12 0.167      -

3 chr1 3037820 3037821     CpG:10 1.000      +

4 chr1 3037821 3037822     CpG:12 0.833      -

5 chr1 3037825 3037826     CpG:10 0.900      +


I saved these two files as WT_12m_EE_4_BS_methratio.txt and WT_12m_EE_4_oxBS_methratio.txt, then ran the following:

mlml -u WT_12m_EE_4_BS_methratio.txt -m WT_12m_EE_4_oxBS_methratio.txt -o WT_12m_EE_mlml.txt

mlml seemed to run with no errors but the output only included the chromosome and position -- no methylation or hydroxymethylation levels (across all 1,353,091 CpG sites which had already been filtered for read coverage >10X and for sites shared between the BS and oxBS data):

   chr      X0      X1 X0.1 X0.2 X1.1 X0.3

1 chr1 3037802 3037803  NaN  NaN  NaN  NaN

2 chr1 3037803 3037804  NaN  NaN  NaN  NaN

3 chr1 3037820 3037821  NaN  NaN  NaN  NaN

4 chr1 3037821 3037822  NaN  NaN  NaN  NaN

5 chr1 3037825 3037826  NaN  NaN  NaN  NaN

6 chr1 3037826 3037827  NaN  NaN  NaN  NaN


I am having trouble figuring out why this occurred. I tried changing my input files to a .bed extension in case there was a formatting issue but this gave the same result. 

Let me know if anyone has some insight on this -- thank you in advance for the help!

Sincerely,

Samantha

Meng Zhou

unread,
Nov 21, 2018, 1:46:15 PM11/21/18
to meth...@googlegroups.com
Hi Samantha,

Thanks for your feedback. It seems that we have changed the input format of MLML to the new methcounts format, but the manual was not updated accordingly. We will fix this issue as soon as possible.

Please reformat your input as the following:
chr  pos  strand  context  level  coverage
chr1  3037802  +  CpG  10  1.000
chr1  3037803  -  CpG  12  0.167

Best regards,
Meng


--
You received this message because you are subscribed to the Google Groups "MethPipe and MethBase Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to methpipe+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Samantha Schaffner

unread,
Nov 21, 2018, 3:36:04 PM11/21/18
to meth...@googlegroups.com
Hi Meng,

Thank you for your reply and for letting me know about the update! This allowed me to directly take the corresponding columns from my methratio output which were already in this order, and mlml seems to be working correctly now.

Sincerely,

Samantha


Samantha Schaffner

unread,
Nov 30, 2018, 7:56:36 PM11/30/18
to meth...@googlegroups.com
Hi Meng,

I got mlml working and had run it individually on each sample in my experiment. However, all the hydroxymethylation level estimates came out as either zero or extremely low (on the magnitude of 1e-17).

I noticed in the manual it was suggested to first merge replicates of an experiment using merge-methcounts. I thought this might help me to detect more hydroxymethylation so I have been trying this approach. I merged my four replicates for each condition but now when I am running mlml on the merged files it is giving me an error. I have been using .txt file extensions throughout.

Here is how I merged four methratio files (from BSMAP):
merge-methcounts ACAGTG_methratio.txt ACTTGA_methratio.txt ATCACG_methratio.txt CAGATC_methratio.txt -o WT_12m_STD_oxBS.txt

The output from merge-methcounts looked like this:

  chr      X0 X X.1      X0.1 X562912164834048  

1 chr1 3014611 + CpG 1.0000000                1 

2 chr1 3014612 - CpG 0.0666667               15 

3 chr1 3014928 + CpG 1.0000000                1 

4 chr1 3014929 - CpG 0.6111110               18 

5 chr1 3014974 + CpG 0.0000000                1 

6 chr1 3014975 - CpG 0.8125000               16 


Then I ran mlml:
mlml -u WT_12m_STD_BS.txt -m WT_12m_STD_oxBS.txt -o mlml_WT_12m_STD.txt

Error message:

mlml: mlml.cpp:509: int main(int, const char**): Assertion `f_chr == s_chr && f_pos == s_pos' failed.

Aborted (core dumped)


I checked my merged files and filtered so that the same sites were all included in both the oxBS and BS files, in the same order. So I am not sure why there seems to be a problem matching the chr and pos columns.

Thank you again for all of your help and for creating this pipeline!

Sincerely,

Samantha


Meng Zhou

unread,
Nov 30, 2018, 8:27:37 PM11/30/18
to meth...@googlegroups.com
Hi Samantha,

Can you try to create a small subset of your data, which can reproduce this error? For example, you can use the head command to get the first ten lines of each input file, and run mlml again to see if you still get the error.

And just to make sure, is the following line actually included in your merged methcounts file? It looks like an invalid entry.
  chr      X0 X X.1      X0.1 X562912164834048  

Best regards,
Meng

Samantha Schaffner

unread,
Dec 3, 2018, 1:35:43 PM12/3/18
to meth...@googlegroups.com
Hi Meng,

The line "chr      X0 X X.1      X0.1 X562912164834048" showed up as a header in my merge-methcounts files. I saved the files again without a header, and with the filtering to include all the same cytosine sites in each, and mlml is working again!

Thank you for the help!

Samantha
Reply all
Reply to author
Forward
0 new messages