Initializing COBRA Toolbox in a script in parallel on a high-performance computer cluster


Neel Sanjay Sanghvi

May 17, 2024, 6:07:39 PM
to COBRA Toolbox
Greetings COBRA toolbox community, 

I am trying to run a COBRA Toolbox pipeline in MATLAB 2022a on a remote high-performance computing cluster (HPCC) running Linux. I installed the toolbox directly onto the remote cluster (it has internet access), following the "Download and Installation" section of the opencobra.github.io website. The cobratoolbox/ folder is in my personal space on the cluster, so I have access/permissions to it. 

My pipeline is a single .m file that processes one data sample, so I want to run this file anywhere from 7 to 400 times in parallel. In this cluster environment, each run loads MATLAB 2022a independently and therefore also initializes the COBRA Toolbox (via the initCobraToolbox function) before using the rest of its functions. This is where the trouble starts: initCobraToolbox throws the "The submodules could not be initialized" error. Please see any of the logs (plain text files) in the attached "fail_logs.zip", which contain the standard/console output of the runs that failed after being submitted on the cluster. Below is the relevant code snippet where the code fails (it is just the very beginning of my .m file); notably, my code works fine when not run in parallel: 

<<CODE SNIPPET START>>
trueStart = tic;
rng(42); % set seed for reproducibility.

cd(pathsStruct.currDir) % pathsStruct is just a struct of paths with various information. 

% Add the proper cobratoolbox to the path.
addpath(pathsStruct.cobraPath)

% Wait a particular amount of time before initiating...this was a hack I came up with 
% to try and avoid multiple instances of initCobraToolbox being run simultaneously. 
% It only kind of works and is quite suboptimal.  
% Report the actual delay: pause() takes seconds, and the delay is 10 * time_i seconds.
fprintf("Waiting %d seconds before initializing COBRA Toolbox...\n", 10 * time_i)
pause(10 * time_i)

initCobraToolbox(false) % <- code FAILS here!!!
changeCobraSolver('gurobi', 'all');
cd(pathsStruct.currDir)
% REST OF ANALYSIS BELOW %
<<CODE SNIPPET END>>

I then tested this snippet alone in two interactive sessions of MATLAB 2022a on the cluster, and in doing so reproduced the crash in more detail. Attached you will find three logs (titled "two_simultaneous*.txt") from three attempts to run the above snippet simultaneously in two MATLAB windows; in those logs the same overall "The submodules could not be initialized." error comes up. The logs also show a failure to LOCK some sort of config file when connecting to GitHub, which suggests a data race when initCobraToolbox is run simultaneously. Notably, in these experiments one script runs normally while the other crashes, producing the attached logs. 

Is there any way to get around this data race when initializing the COBRA Toolbox in parallel? I also tried adding the full path and subfolders of the cobratoolbox installation folder to the MATLAB path (addpath(genpath(pathsStruct.cobraPath))) and running only changeCobraSolver("gurobi", "all"), without calling initCobraToolbox(false) first, but doing so resulted in the same errors. Is there, perhaps, some way to avoid this process of updating/connecting to GitHub? I do not know exactly what initCobraToolbox sets up, other than the paths to its components, so I'm not sure how to avoid the steps that cause the data race... 

Apologies if anything is unclear; this is my first time writing in this Google Group. Please let me know if I can clarify or provide any helpful details about the setup, code, etc. I would be very appreciative of any guidance! 

With great gratitude, 
Neel Sanghvi

Predoctoral Fellow – Cancer Data Science Laboratory

National Cancer Institute

Graduate Research Assistant, Bioengineering

University of Maryland, College Park

Pronouns: he/him

fail_logs.zip
two_simultaneous_runs_crash_3.txt
two_simultaneous_runs_crash_2.txt
two_simultaneous_runs_crash_1.txt

Ronan M.T. Fleming

May 17, 2024, 6:24:05 PM
to cobra-...@googlegroups.com
Check that the dependencies, e.g. curl, are installed on each node.
--

---
You received this message because you are subscribed to the Google Groups "COBRA Toolbox" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cobra-toolbox+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/cobra-toolbox/97822dbe-8fd3-48aa-9bdf-25e904455460n%40googlegroups.com.


--
--
Mr. Ronan MT Fleming B.V.M.S. Dip. Math. Ph.D.
----------------------------------------------------------------------------
Associate Professor,
School of Medicine,
University of Galway,
Ireland.
Coordinator of the Horizon Europe project "Reconstruction and Computational Modelling for Inherited Metabolic Diseases" (Recon4IMD)
----------------------------------------------------------------------------
Peer-reviewed publications: https://goo.gl/FZPG23
Mobile:  +353 852 109 806
Github: @rmtfleming
----------------------------------------------------------------------------

Nsanghvi

May 17, 2024, 6:30:50 PM
to Cobra-Toolbox
Thanks for the quick response! Yes, the dependencies would be installed on each node, which is why the script DOES work for the most part. It just so happens that jobs that start simultaneously on the cluster also run initCobraToolbox simultaneously, which then causes the error described in my original email. So the question remains: can I circumvent this subroutine of initCobraToolbox somehow? Or is there perhaps another way of going about this so that I can run my jobs in parallel? 

Thanks,
Neel 

Neel Sanjay Sanghvi

Jun 4, 2024, 11:47:54 AM
to COBRA Toolbox
Hi folks, just following up on this. Does anyone know a workaround for the problem I am facing with initializing the COBRA Toolbox in parallel on a cluster? 

Many thanks, 
Neel Sanghvi


Thomas Pfau

Jun 6, 2024, 8:28:39 AM
to cobra-...@googlegroups.com

Hi,

It used to be that setting updateToolbox to false stopped the toolbox from going through the whole git process, but it seems this has changed at some point.

I'm attaching a modified initCobraToolbox.m which essentially skips the whole git processing and assumes that you have manually initialized the toolbox at some prior point.

Best,

Thomas

initCobraToolbox.m

Neel Sanjay Sanghvi

Jun 22, 2024, 2:33:08 PM
to COBRA Toolbox
Thanks Thomas! Replacing the original initCobraToolbox.m file in my installation with your attached file worked perfectly. No such data races seem to occur now, and I can run a ton of jobs in parallel on the HPCC. 

Many thanks,
Neel Sanghvi
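
Note for readers who would rather not replace initCobraToolbox.m: an alternative to the pause-based stagger in the original post is to serialize the call with a lock, so that only one job at a time runs the initialization. The following is a minimal sketch of that idea and not the contents of the attached file. The helper name runSerialized, the lock path, and the timeout are hypothetical. It assumes a Linux shell and a lock location on storage that every job can see, and it relies on `mkdir` (without -p) failing when the directory already exists. It has not been tested against the COBRA Toolbox itself.

```matlab
function runSerialized(lockDir, fn, timeoutSec)
% RUNSERIALIZED  Hypothetical helper (not part of the COBRA Toolbox).
% Runs fn() while holding a directory-based lock, so that only one
% job at a time executes it. Assumes a Linux shell and a lockDir on
% storage that is visible to all jobs.
    started = tic;
    while true
        % 'mkdir' without -p fails if the directory already exists, so
        % at most one job can create it. Requesting two outputs captures
        % the shell message instead of printing it to the log.
        [status, ~] = system(sprintf('mkdir "%s"', lockDir));
        if status == 0
            break  % lock acquired
        end
        if toc(started) > timeoutSec
            error('runSerialized:timeout', ...
                  'Could not acquire lock %s within %g s.', lockDir, timeoutSec);
        end
        pause(5);  % wait before retrying
    end
    % Remove the lock directory when this function exits, even if fn errors.
    releaseLock = onCleanup(@() rmdir(lockDir));
    fn();
end
```

Saved as runSerialized.m on the MATLAB path, it could be called as runSerialized(fullfile(pathsStruct.cobraPath, '.init_lock'), @() initCobraToolbox(false), 3600) in place of the pause and the direct initCobraToolbox call. A job that is killed while holding the lock leaves the directory behind, so a stale lock has to be deleted by hand; the timeout only turns that situation into an error rather than an endless wait.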
