When I try to validate the standard configuration of the Parallel Computing Toolbox, I get the following error. Can anyone tell me what is going wrong? Thanks.
Stage: Parallel Job
Status: Failed
Description: The job creation or submission encountered a MATLAB exception.
Command Line Output: (none)
Error Report:
Java exception occurred:
com.mathworks.toolbox.distcomp.local.MpiexecException: Failed to launch SMPD process
at com.mathworks.toolbox.distcomp.local.SmpdDaemonManager.ensureSmpdProcess(SmpdDaemonManager.java:98)
at com.mathworks.toolbox.distcomp.local.SmpdDaemonManager.getSmpdPort(SmpdDaemonManager.java:61)
Caused by: java.io.IOException: Couldn't get the port from the SMPD process output: <>
at com.mathworks.toolbox.distcomp.local.SmpdDaemonManager.buildSmpdProcess(SmpdDaemonManager.java:166)
at com.mathworks.toolbox.distcomp.local.SmpdDaemonManager.ensureSmpdProcess(SmpdDaemonManager.java:96)
... 1 more
Debug Log: (none)
Looks like we're failing to start the smpd process which manages the
local parallel workers. Could you please tell us:
- what OS you're running on
- what version of MATLAB/PCT you're using
Could you also check that in <MATLAB>/bin/<computer> there's a file called
"smpd" (on Linux/Mac) or "smpd.exe" on Windows.
Cheers,
Edric.
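The file check requested above can also be done from inside MATLAB. This is only a minimal sketch: the variable names are illustrative, and it assumes the binary sits in the bin/<arch> folder under matlabroot, as described in the post.

```matlab
% Build the platform-specific path to the smpd binary.
arch = computer('arch');              % e.g. 'glnxa64' on 64-bit Linux
if ispc
    exeName = 'smpd.exe';
else
    exeName = 'smpd';
end
smpdPath = fullfile(matlabroot, 'bin', arch, exeName);

% exist(...) returns 2 when the name refers to a file on disk.
smpdExists = (exist(smpdPath, 'file') == 2);
fprintf('%s exists: %d\n', smpdPath, smpdExists);

% On Linux/Mac, also show the permission bits of the file.
if smpdExists && isunix
    [status, listing] = system(['ls -l "', smpdPath, '"']);
    disp(listing);
end
```

If the file exists and is executable, the problem is more likely in how smpd starts than in the installation itself.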
I am running MATLAB R2010b on a 64-bit machine with Ubuntu 10.04. The smpd file is there (<matlab>/bin/glnxa64/smpd) and is executable by everyone.
Hmm, could you please try
smpd = sprintf( '%s/bin/glnxa64/smpd', matlabroot )
system( [smpd, ' -d'] )
Hopefully, this should print out a whole stack of debug messages to the
command window - you'll need to hit CTRL-C to stop it.
Cheers,
Edric.
This gives me the result below. Does this mean there already is a shared memory object, but MATLAB cannot access it?
By the way, I have set the Java heap space to 16 GB (the maximum), but I have 64 GB of memory in total (+64 GB swap).
shm_open msg: Permission denied
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(294)..................: Initialization failed
MPID_Init(94)..........................: channel initialization failed
MPIDI_CH3_Init(116)....................:
MPIDI_CH3U_Init_sshm(260)..............: unable to create a bootstrap message queue
MPIDI_CH3I_BootstrapQ_create_named(347): failed to create a shared memory message queue
MPIDI_CH3I_mqshm_create(97)............: Out of memory
MPIDI_CH3I_SHM_Get_mem_named(465)......:
MPIDI_CH3I_SHM_Get_mem_named(464)......: unable to open shared memory object /mpich2q59C28A81208F2C654FFA8F924D341600 (errno 13)
job aborted using terminate/kill:
process: node: exit code: error message:
0: localhost: 1: Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(294)..................: Initialization failed
MPID_Init(94)..........................: channel initialization failed
MPIDI_CH3_Init(116)....................:
MPIDI_CH3U_Init_sshm(260)..............: unable to create a bootstrap message queue
MPIDI_CH3I_BootstrapQ_create_named(347): failed to create a shared memory message queue
MPIDI_CH3I_mqshm_create(97)............: Out of memory
MPIDI_CH3I_SHM_Get_mem_named(465)......:
MPIDI_CH3I_SHM_Get_mem_named(464)......: unable to open shared memory object /mpich2q59C28A81208F2C654FFA8F924D341600 (errno 13)
/usr/local/matlab/2010b/bin/glnxa64/smpd -d: Killed
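On Linux, errno 13 is EACCES ("Permission denied"), and POSIX shared memory objects opened with shm_open normally live under /dev/shm. A rough diagnostic sketch, assuming a Linux machine with the usual /dev/shm mount, is to look at the permissions of that directory and at any leftover mpich2 objects in it. Whether this is the actual cause here is an assumption, not something the log confirms.

```matlab
% Inspect /dev/shm from within MATLAB (Linux only; assumes the standard mount).
shmListing = '';
if isunix && ~ismac
    % On many distributions the directory is world-writable with the
    % sticky bit set, shown as drwxrwxrwt.
    [status, shmListing] = system('ls -ld /dev/shm');
    disp(shmListing);

    % List leftover MPICH2 queue objects, which may belong to another user.
    [status, leftovers] = system('ls -l /dev/shm | grep mpich2');
    if status == 0
        disp(leftovers);
    else
        disp('No mpich2 objects found in /dev/shm.');
    end
end
```

If the directory is not writable for the current user, or an object with the same name is owned by someone else, a "Permission denied" from shm_open would be consistent with that.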
Strange, I haven't seen that failure mode before. (JVM memory is not
related here). I'll ask the MPICH2 developers to see if they have any
ideas. In the meantime, it should work to run
distcomp.feature( 'LocalUseMpiexec', false )
in MATLAB before running anything else.
Cheers,
Edric.
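Note that the workaround above is a MATLAB command, not a field in the configuration GUI: it is typed at the MATLAB prompt (or placed in a script) before any parallel work starts. A minimal sketch for R2010b follows. Putting the call into startup.m is a suggestion of this sketch rather than something stated in the thread, and 'local' is assumed to be the name of the default configuration.

```matlab
% Run once per MATLAB session, before validating the configuration
% or opening a pool. Could also be placed in startup.m (assumption).
distcomp.feature( 'LocalUseMpiexec', false );

% Then start the local workers as usual (R2010b syntax).
matlabpool open local

% ... parallel code such as parfor loops goes here ...

matlabpool close
```

After running the distcomp.feature line, validation can be started again from the same MATLAB session.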
This works great. Thank you very much :)
distcomp.feature('LocalUseMpiexec',false) solved the problem.
My error message:
----
shm_open msg: Permission denied
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(294)..................: Initialization failed
MPID_Init(94)..........................: channel initialization failed
MPIDI_CH3_Init(116)....................:
MPIDI_CH3U_Init_sshm(260)..............: unable to create a bootstrap message queue
MPIDI_CH3I_BootstrapQ_create_named(347): failed to create a shared memory message queue
MPIDI_CH3I_mqshm_create(97)............: Out of memory
MPIDI_CH3I_SHM_Get_mem_named(465)......:
MPIDI_CH3I_SHM_Get_mem_named(464)......: unable to open shared memory object /mpich2q22D2B0891650E9536DEC94604DE55193 (errno 13)
job aborted using terminate/kill:
process: node: exit code: error message:
0: localhost: 1: Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(294)..................: Initialization failed
MPID_Init(94)..........................: channel initialization failed
MPIDI_CH3_Init(116)....................:
MPIDI_CH3U_Init_sshm(260)..............: unable to create a bootstrap message queue
MPIDI_CH3I_BootstrapQ_create_named(347): failed to create a shared memory message queue
MPIDI_CH3I_mqshm_create(97)............: Out of memory
MPIDI_CH3I_SHM_Get_mem_named(465)......:
MPIDI_CH3I_SHM_Get_mem_named(464)......: unable to open shared memory object /mpich2q22D2B0891650E9536DEC94604DE55193 (errno 13)
/home/smanohar/MATLAB/2010b/bin/glnxa64/smpd -d: Killed
Thank you for this very helpful discussion. I have the same problem on my machine:
Intel X 980, 12 GB RAM, Win7 64-bit, MATLAB R2010b.
I created an (importable) mpiexec file, which runs perfectly on several different 32-bit multicore systems. Since the error message I receive during validation is basically identical to what you have all posted, I will not repost it in full; in short, it can't find "option -d" for smpd. As in your cases, all the files are actually there...
So when you apply the fix
distcomp.feature('LocalUseMpiexec',false)
Where exactly do you put that command in the Parallel Manager? I tried several things, but I think I am doing something wrong.
Thank you very much in advance for your help!
Kind Regards,
Falk