MeDeCom (DecompPipeline) install issue

Peter Mcerlean

May 26, 2020, 6:26:37 AM
to Epigenomics forum
Dear CompEpigen Team,

I've been trying to install MeDeCom as part of DecompPipeline, following the recommendation on the GitHub page, and had issues with the DecompPipeline/FactorViz packages not locating it.

I tried all sorts of things, including installing the binary manually and running R CMD INSTALL in the terminal, but was still getting errors (below).

After some investigation I was able to determine that the error is specific to MeDeCom 1.0.0, since with MeDeCom 0.3.0 everything seems to load fine.

However, I haven't run any analysis with MeDeCom 0.3.0, as I should presumably be using the latest version?

Apologies if this forum isn't the correct place for a development issue!

Best,

Peter
 

> library(MeDeCom)

…everything good until…

Error: package or namespace load failed for ‘MeDeCom’ in dyn.load(file, DLLpath = DLLpath, ...):
 unable to load shared object '/Library/Frameworks/R.framework/Versions/3.6/Resources/library/MeDeCom/libs/MeDeCom.so':
  dlopen(/Library/Frameworks/R.framework/Versions/3.6/Resources/library/MeDeCom/libs/MeDeCom.so, 6): Symbol not found: ____chkstk_darwin
  Referenced from: /Library/Frameworks/R.framework/Versions/3.6/Resources/library/MeDeCom/libs/MeDeCom.so
  Expected in: /usr/lib/libSystem.B.dylib
 in /Library/Frameworks/R.framework/Versions/3.6/Resources/library/MeDeCom/libs/MeDeCom.so

 

Reinstalling from GitHub:

> devtools::install_github("lutsik/MeDeCom")
Downloading GitHub repo lutsik/MeDeCom@master
Skipping 1 packages not available: RnBeads
 checking for file ‘/private/var/folders/ls/g6jy40g97xn6482d5tr5vfy80000gs/T/RtmpLxCd6p/remotesf93c46ed127d/lutsik-MeDeCom-d740fe6/DESCRIPTION’ ...
─  preparing ‘MeDeCom’:
  checking DESCRIPTION meta-information ...
─  cleaning src
─  checking for LF line-endings in source and make files and shell scripts
─  checking for empty or unneeded directories
─  looking to see if a ‘data/datalist’ file should be added
─  building ‘MeDeCom_1.0.0.tar.gz’

* installing *source* package ‘MeDeCom’ ...
** using staged installation
** libs
clang++ -std=gnu++11 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG  -I"/Library/Frameworks/R.framework/Versions/3.6/Resources/library/Rcpp/include" -I"/Library/Frameworks/R.framework/Versions/3.6/Resources/library/RcppEigen/include" -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk -I/usr/local/include `/Library/Frameworks/R.framework/Resources/bin/Rscript -e "Rcpp:::CxxFlags()"` `/Library/Frameworks/R.framework/Resources/bin/Rscript -e "RcppEigen:::CxxFlags()"` -I. -std=c++11 -fPIC  -Wall -g -O2  -c HCLasso.cpp -o HCLasso.o
HCLasso.cpp:47:10: fatal error: 'omp.h' file not found
#include <omp.h>
         ^~~~~~~
1 error generated.
make: *** [HCLasso.o] Error 1
ERROR: compilation failed for package ‘MeDeCom’
* removing ‘/Library/Frameworks/R.framework/Versions/3.6/Resources/library/MeDeCom’
* restoring previous ‘/Library/Frameworks/R.framework/Versions/3.6/Resources/library/MeDeCom’
Error: Failed to install 'MeDeCom' from GitHub:
  (converted from warning) installation of package ‘/var/folders/ls/g6jy40g97xn6482d5tr5vfy80000gs/T//RtmpLxCd6p/filef93c49311100/MeDeCom_1.0.0.tar.gz’ had non-zero exit status


Michael Scherer

May 27, 2020, 3:13:32 AM
to Epigenomics forum
Hi Peter,

This forum is definitely the right place for this issue. However, I have no idea what the issue might be, since the installation works fine in my setups (Debian 10 with R-4.0.0, and Debian 7 with R-3.6.1). Could you give us some more information about your setup? Please note that for Windows systems we have a dedicated branch on GitHub (devtools::install_github("lutsik/MeDeCom",ref="windows")). For macOS, you should use the binary available from GitHub (https://github.com/lutsik/MeDeCom/releases/download/v1.0.0/MeDeCom_1.0.0.tgz).

In case none of these installations work, we also created a Docker container comprising MeDeCom, DecompPipeline and FactorViz, available from https://hub.docker.com/r/mscherer/medecom. As a last resort, this should solve all installation issues.

Best,

Michael

Peter Mcerlean

May 27, 2020, 6:35:18 AM
to Epigenomics forum
Hi Michael,

Thanks for the reply. 

I'm on a Mac running macOS Sierra and R 3.6.3 via RStudio v1.2.5033.

I thought it might have been an update issue, but when I found it was restricted to MeDeCom I was equally perplexed. And yes, I did try the binary version, but it still threw the error. Should I try running it with MeDeCom 0.3.0?

I also thought about the Docker container, so I may just go ahead and use it.

Also, my Mac has a 2.2 GHz processor and 16 GB of memory. Do you think it's capable of running 44 EPIC array samples through the pipeline?

Best,

Peter


Michael Scherer

May 27, 2020, 10:30:19 AM
to Epigenomics forum
Hi Peter,

OK, this is odd. The binary version was working properly in our hands, but thanks for reporting this; we will look into it further. The core functionality of MeDeCom is also available in version 0.3.0; only some of the interpretation functions described in our preprint (https://www.biorxiv.org/content/10.1101/853150v2) are not available in the older version. You might consider running the analysis with 0.3.0 and using the Docker container for the interpretation.

Yes, your Mac should be sufficient to run such an analysis, although we have had better experiences running the framework on larger compute servers (e.g. with 128 GB of main memory). Let us know if you face further issues.

Best,

Michael

Peter Mcerlean

May 27, 2020, 11:16:20 AM
to Epigenomics forum
Hi Michael, 

So I updated to R 4.0.0 and found I was getting this error:

dyld: lazy symbol binding failed: Symbol not found: _utimensat
  Referenced from: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libR.dylib (which was built for Mac OS X 10.13)
  Expected in: /usr/lib/libSystem.B.dylib

So it seems my macOS version could be the issue?

I ended up biting the bullet and went the Docker route, but am running into other (probably novice) issues.

I can use the links provided in the terminal and load the libraries well enough:

library(RnBeads)
library(MeDeCom)
library(DecompPipeline)
library(FactorViz)


But since I already have an "rnb.set" file, I'm jumping straight into the analysis, and it seems my files aren't being recognized:

> ref.set <- load.rnb.set("/Users/.../rnbSet_preprocessed")
Error in load.rnb.set("/Users/.../rnbSet_preprocessed") : 
  invalid value for path; the path does not exist
> getwd()
[1] "/"
> setwd ("/Users/../Decomp")
Error in setwd("/Users/PFM/Decomp") : cannot change working directory

The path and file are fine, since they can be loaded in RStudio (after reverting to R 3.6.3). I suspect I'm not setting up a directory for the files/analysis to point to?

Best,

Peter


Michael Scherer

May 28, 2020, 2:37:21 AM
to Epigenomics forum
Hi Peter,

Indeed, it looks like a macOS issue. Sorry that I cannot help you further with that, since I don't have access to a Mac.

For the Docker environment, you have to make your local folders known to Docker explicitly, since the container is an isolated subsystem without access to your PC's folder structure. This can be done by mounting a folder with the following command:

docker run -v <PC_path>:/mnt/win mscherer/medecom

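Inside the container, the contents of <PC_path> then appear under /mnt/win, so paths given to functions such as load.rnb.set() need to start with /mnt/win. Please refer to the Docker documentation for further information.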

Best,

Michael

Peter Mcerlean

Jun 3, 2020, 5:23:11 AM
to Epigenomics forum
Hi Michael,

So I ran the pipeline using MeDeCom 0.3.0, FactorViz, etc., all within RStudio, and it seemed to work OK, with some interesting LMCs identified (attached).

FYI, for my 44 EPIC array samples with the default K/lambda settings, the analysis took around 3.5 days (gulp!).

I wanted to see how my results compare to reference data sets (WGBS), but running run.refbased gives me an error (below).

I'm aware of the missing values resulting from the EPIC vs. WGBS comparison, but I don't think that's the issue.

I've played around a bit with some of the parameters (e.g. opt.method), but it still throws the same error.

Another issue with my macOS version?

Best,

Peter


cg_subsets = the 5000 most variable CpGs

> ref.set
Object of class RnBeadRawSet
      44 samples
  784669 probes
of which: 784669 CpG, 0 CpH, and 0 rs
Region types:
  237884 regions of type tiling
   32060 regions of type genes
   41708 regions of type promoters
   24887 regions of type cpgislands
Intensity information is present
Detection p-values are present
Bead counts are present
Quality control information is present
Summary of normalization procedures:
The methylation data was normalized with method wm.dasen.
No background correction was performed.

> Blue.set
Object of class RnBiseqSet
      13 samples
  832953 methylation sites
Region types:
  247394 regions of type tiling
   43385 regions of type promoters
Coverage information is present

> Medecom.BlueP.test <- run.refbased(ref.set, Ks = 5, lambdas = 0.0001, cg_subsets = cg_subset, opt.method = "MeDeCom.cppTAfact", temp.dir = NULL, ref.base = "local", most.var = NULL, NCORES = 4, cluster.settings = NULL, ref.set = Blue.set, id.col = "CellType", save.restricted.sites = FALSE)
2020-06-02 16:55:40     6.4  STATUS                 STARTED Imputation procedure knn 
2020-06-02 16:59:42     3.2  STATUS                 COMPLETED Imputation procedure knn 
2020-06-02 17:03:42     4.9  STATUS                 STARTED Imputation procedure knn 
Error: C stack usage  7971012 is too close to the limit
In addition: Warning message:
In knnimp(x, k, maxmiss = rowmax, maxp = maxp) :
  31376 rows with more than 50 % entries missing;
 mean imputation used for these rows
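To get a sense of how many CpGs the knnimp warning above refers to, one could count the rows of a methylation matrix whose share of missing values exceeds 50 %. A minimal base-R sketch; the helper name `count_mostly_missing` is hypothetical and not part of RnBeads, MeDeCom or DecompPipeline:

```r
# Hypothetical helper (not part of RnBeads/MeDeCom/DecompPipeline):
# count rows (CpGs) whose fraction of missing values exceeds `threshold`.
count_mostly_missing <- function(m, threshold = 0.5) {
  frac_missing <- rowMeans(is.na(m))  # per-row share of NA entries
  sum(frac_missing > threshold)
}

# Toy matrix: 3 CpGs x 3 samples
toy <- rbind(c(NA, NA, 0.8),    # 2/3 missing -> counted
             c(0.1, NA, 0.9),   # 1/3 missing -> not counted
             c(0.2, 0.4, 0.6))  # complete
count_mostly_missing(toy)
```

On real data this would be applied to a matrix such as `meth(Blue.set)`; which matrix the pipeline actually imputes at this step is not visible in the log.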


Attachment: LMCs.png

Michael Scherer

Jun 4, 2020, 2:33:21 AM
to Epigenomics forum
Hi Peter,

That looks really interesting. Please note that MeDeCom tests different combinations of the regularization parameter lambda and the number of components K, and also performs cross-validation; all of these contribute to the long running time. 3.5 days is not unusual for such an analysis.
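How the parameter grid drives the running time can be estimated with simple arithmetic. The log further down in this thread reports `[Main:] 11 factorization runs in total` for a single K and a single lambda, which is consistent with one fit on all samples plus one fit per cross-validation fold with 10 folds. The sketch below assumes exactly that scheme; the fold count and the helper `n_factorization_runs` are assumptions, not taken from the MeDeCom documentation:

```r
# Assumed scheme: every (K, lambda) pair is fitted once on all samples
# plus once per cross-validation fold.
n_factorization_runs <- function(Ks, lambdas, nfolds = 10) {
  length(Ks) * length(lambdas) * (nfolds + 1)
}

n_factorization_runs(Ks = 5, lambdas = 1e-4)  # 11, matching the log
# Hypothetical larger grid: 9 K values x 5 lambda values
n_factorization_runs(Ks = 2:10, lambdas = c(0, 1e-5, 1e-4, 1e-3, 1e-2))  # 495
```

Under this assumption, the number of runs grows multiplicatively with the number of K and lambda values, which is why a broad default grid can take days.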

The issue you report is most likely caused by insufficient main memory for the step MeDeCom is performing here, which is data imputation. You should be able to solve it by setting an RnBeads option, in this case:
rnb.options(enforce.memory.management=TRUE)

Hope that solves the issue.

Peter Mcerlean

Jun 4, 2020, 11:09:15 AM
to Epigenomics forum
Hi Michael,

Thanks for the suggestion. Using the 'enforce memory' option worked, but unfortunately I then ran into two errors, the first when trying to plot (below).

I read around a bit on the error and found it could be due to too many zero values, so I removed them from my reference set and reran the analysis, with the same result.

Upon further inspection, I noticed that the analysis was not restricted to the 5K CpGs but used all the EPIC probes. I tried various ways of selecting 5K CpGs, using either the cg_subsets or the most.var option. The error seems to come up with the most.var option.
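Selecting the 5000 most variable CpGs by hand, independently of the most.var option, could look like the base-R sketch below. It returns row indices; whether run.refbased() accepts cg_subsets as a list of such index vectors is an assumption to check against the DecompPipeline documentation. The helper name `top_variable_rows` is hypothetical:

```r
# Hypothetical helper: indices of the n rows (CpGs) with the highest variance
# across samples. Rows whose variance is NA (e.g. fewer than two observed
# values) are ordered last by order().
top_variable_rows <- function(m, n = 5000) {
  v <- apply(m, 1, var, na.rm = TRUE)
  n <- min(n, length(v))
  sort(order(v, decreasing = TRUE)[seq_len(n)])  # return in original row order
}

toy <- rbind(c(0.0, 0.0, 0.0),  # variance 0
             c(0.0, 1.0, 0.0),  # variance 1/3
             c(0.0, 0.5, 1.0))  # variance 1/4
top_variable_rows(toy, n = 2)
```

Under the stated assumption, usage on the EPIC data would be along the lines of `cg_subset <- list(top_variable_rows(meth(ref.set), 5000))`; apply() over roughly 785,000 rows is slow but workable.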

I'm not sure whether the two issues are related.

Thanks again for your continued help!

Best,

Peter

> rnb.options(enforce.memory.management=TRUE)
> Medecom.BlueP.test <-run.refbased(ref.set, Ks = 5, lambdas=0.0001, cg_subsets = cg_subset, opt.method = "MeDeCom.cppTAfact", temp.dir = NULL, ref.base = "local", most.var = NULL, NCORES = 4, cluster.settings = NULL, ref.set = Blue.set , id.col = "CellType",save.restricted.sites = FALSE)

2020-06-04 09:05:19     2.3    INFO                     Low memory imputation not compatible with method knn, switched to mean.cpgs
2020-06-04 09:05:19     2.3  STATUS                     STARTED Low memory footprint version of imputation
2020-06-04 09:05:24     3.9  STATUS                     COMPLETED Low memory footprint version of imputation
2020-06-04 09:13:32     4.9    INFO                     No low memory footprint imputation available for matrix
[Main:] checking inputs
[Main:] preparing data
[Main:] preparing jobs
[Main:] 11 factorization runs in total
[Main:] 11 runs complete
[Main:] finished all jobs. Creating the object

> Medecom.BlueP.test
$MeDeComSet
An object of class MeDeComSet
Input data set:
776688 CpGs
44 methylomes
Experimental parameters:
k values: 5
lambda values: 1e-04
        ....etc.
        ....
        ....until end.....
[76,] 0.841269841 0.848484848 0.87500000 0.866666667 0.70967742
 [ reached getOption("max.print") -- omitted 776612 rows ] <- All epic probes?

> plotLMCs(Medecom.BlueP.test, K=5, lambda=0.0001, type="dendrogram")
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘getLMCs’ for signature ‘"list"’
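The error message says that getLMCs() has no method for a plain list, and the printout above shows that run.refbased() returned a list whose element `$MeDeComSet` holds the actual MeDeComSet. A likely workaround, not confirmed in this thread, is to pass that element instead, e.g. `Medecom.BlueP.test$MeDeComSet`. A small, hypothetical helper that picks the first element of a given class out of such a result list, shown with a toy S3 stand-in rather than the real S4 class:

```r
# Hypothetical helper: return `res` itself if it already has class `cls`,
# otherwise the first element of the list `res` that inherits from `cls`.
pick_by_class <- function(res, cls) {
  if (inherits(res, cls)) return(res)
  hits <- Filter(function(el) inherits(el, cls), res)
  if (length(hits) == 0) stop("no element of class '", cls, "' found")
  hits[[1]]
}

# Toy stand-in for the run.refbased() result (S3 object, not the real S4 class)
res <- list(MeDeComSet = structure(list(), class = "MeDeComSet"),
            RefMeth    = matrix(0.5, nrow = 2, ncol = 2))
mds <- pick_by_class(res, "MeDeComSet")
```

With the real objects, this would read `plotLMCs(pick_by_class(Medecom.BlueP.test, "MeDeComSet"), K = 5, lambda = 0.0001, type = "dendrogram")`.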

Removing NAs

> meth.data <- meth(Blue.set)
> has.missing <- apply(meth.data,1,function(x)any(is.na(x)))
> BlueP.NAs.removed <- remove.sites(BlueP.test,has.missing)
> BlueP.NAs.removed
Object of class RnBiseqSet
      13 samples
  661849 methylation sites
Region types:
  217439 regions of type tiling
   39651 regions of type promoters
Coverage information is present

CpG selection issue:

most.var = 5000
or
CpGs <- 5000
most.var = CpGs

Running with the same parameters as above produces this:

Error in get.i.vector(probelist, rownames(object@sites)) : 
  specified logical vector is of unexpected length

Michael Scherer

Jun 5, 2020, 2:27:51 AM
to Epigenomics forum
Hi Peter,

Thanks for reporting this. This might be a bug in the run.refbased function, and I will look further into this and let you know about any updates.

For the remove.sites method, it looks as if the vector you specify does not have the same length as the number of CpGs in the object. Are Blue.set and BlueP.test the same object?
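A length mismatch of this kind can be ruled out by building the logical vector from the same object that is passed to remove.sites() and checking its length before the call. A base-R sketch on a toy matrix; the helper names are hypothetical:

```r
# Hypothetical helper: TRUE for every row (site) with at least one NA.
# Vectorised equivalent of apply(m, 1, function(x) any(is.na(x))).
rows_with_na <- function(m) rowSums(is.na(m)) > 0

# Hypothetical helper: fail early if the mask does not match the number of sites.
check_site_mask <- function(mask, n_sites) {
  stopifnot(is.logical(mask), length(mask) == n_sites)
  invisible(mask)
}

m <- rbind(c(0.1, NA), c(0.2, 0.3), c(NA, NA))
mask <- check_site_mask(rows_with_na(m), nrow(m))
complete <- m[!mask, , drop = FALSE]  # keep only rows without NAs
```

With the real objects, the same pattern would be `mask <- check_site_mask(rows_with_na(meth(Blue.set)), nrow(meth(Blue.set)))` followed by `remove.sites(Blue.set, mask)`, so the mask and the object always come from the same data set.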

Sorry for not being able to solve the issue.

Peter Mcerlean

Jun 5, 2020, 4:10:22 AM
to Epigenomics forum
Hi Michael,

No worries. 

Datasets used in the analysis are below. 

Let me know if you'd like me to share any of the data. 

Best,

Peter


> Blue.set (WGBS data)
Object of class RnBiseqSet
      13 samples
  832953 methylation sites
Region types:
  247394 regions of type tiling
   43385 regions of type promoters
Coverage information is present

> ref.set (EPIC Arrays)
Object of class RnBeadRawSet
      44 samples
  784669 probes
of which: 784669 CpG, 0 CpH, and 0 rs
Region types:
  237884 regions of type tiling
   32060 regions of type genes
   41708 regions of type promoters
   24887 regions of type cpgislands
Intensity information is present
Detection p-values are present
Bead counts are present
Quality control information is present
Summary of normalization procedures:
The methylation data was normalized with method wm.dasen.
No background correction was performed.

> Medecom.BlueP.test (medecom output)
$MeDeComSet
An object of class MeDeComSet
Input data set:
776688 CpGs
44 methylomes
Experimental parameters:
k values: 5
lambda values: 1e-04

$RefMeth....etc....