
Parallel Job Validation fails


Alexandra

Jan 12, 2011, 5:57:04 AM
Hi,

When I try to validate the standard configuration of the parallel toolbox, I get the following error. Can anyone tell me what is going wrong? Thanks.

Stage: Parallel Job

Status: Failed
Description: The job creation or submission encountered a MATLAB exception.

Command Line Output: (none)

Error Report:
Java exception occurred:
com.mathworks.toolbox.distcomp.local.MpiexecException: Failed to launch SMPD process
at com.mathworks.toolbox.distcomp.local.SmpdDaemonManager.ensureSmpdProcess(SmpdDaemonManager.java:98)
at com.mathworks.toolbox.distcomp.local.SmpdDaemonManager.getSmpdPort(SmpdDaemonManager.java:61)
Caused by: java.io.IOException: Couldn't get the port from the SMPD process output: <>
at com.mathworks.toolbox.distcomp.local.SmpdDaemonManager.buildSmpdProcess(SmpdDaemonManager.java:166)
at com.mathworks.toolbox.distcomp.local.SmpdDaemonManager.ensureSmpdProcess(SmpdDaemonManager.java:96)
... 1 more

Debug Log: (none)

Edric M Ellis

Jan 12, 2011, 6:37:11 AM
"Alexandra " <rue...@tum.de> writes:

Looks like we're failing to start the smpd process which manages the
local parallel workers. Could you please tell us:

- what OS you're running on
- what version of MATLAB/PCT you're using

Could you also check that in <MATLAB>/bin/<computer> there's a file called
"smpd" (on Linux/Mac) or "smpd.exe" on Windows.

Cheers,

Edric.
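
The file check asked for above can also be done from within MATLAB. This is a minimal sketch, assuming that computer('arch') returns the same platform directory name as <computer> (for example glnxa64); the variable names are only illustrative.

```matlab
% Build the expected path to the smpd binary for this platform.
smpdName = 'smpd';
if ispc
    smpdName = 'smpd.exe';   % Windows uses the .exe suffix
end
smpdPath = fullfile( matlabroot, 'bin', computer('arch'), smpdName );

% exist(..., 'file') returns 2 when the path is an existing file.
if exist( smpdPath, 'file' ) == 2
    fprintf( 'Found %s\n', smpdPath );
else
    fprintf( 'Missing %s\n', smpdPath );
end
```

This only confirms that the file is present; it does not show whether the process can actually start.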

Alexandra

Jan 12, 2011, 8:18:05 AM
> Looks like we're failing to start the smpd process which manages the
> local parallel workers. Could you please tell us:
>
> - what OS you're running on
> - what version of MATLAB/PCT you're using
>
> Could you also check that in <MATLAB>/bin/<computer> there's a file called
> "smpd" (on Linux/Mac) or "smpd.exe" on Windows.

I am running MATLAB R2010b on a 64-bit machine with Ubuntu 10.04. The smpd file is there (<matlab>/bin/glnxa64/smpd) and is executable by everyone.

Edric M Ellis

Jan 14, 2011, 5:50:44 AM
"Alexandra " <rue...@tum.de> writes:

Hmm, could you please try

smpd = sprintf( '%s/bin/glnxa64/smpd', matlabroot )
system( [smpd, ' -d'] )


Hopefully, this should print out a whole stack of debug messages to the
command window - you'll need to hit CTRL-C to stop it.

Cheers,

Edric.

Alexandra

Jan 17, 2011, 5:48:04 AM
> Hmm, could you please try
>
> smpd = sprintf( '%s/bin/glnxa64/smpd', matlabroot )
> system( [smpd, ' -d'] )
>
>
> Hopefully, this should print out a whole stack of debug messages to the
> command window - you'll need to hit CTRL-C to stop it.
>
> Cheers,
>
> Edric.

This gives me the result below. Does this mean that a shared memory object already exists, but MATLAB cannot access it?
By the way, I have set the Java heap space to 16G (the maximum), but I have a total memory of 64G (+64G swap).

shm_open msg: Permission denied
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(294)..................: Initialization failed
MPID_Init(94)..........................: channel initialization failed
MPIDI_CH3_Init(116)....................:
MPIDI_CH3U_Init_sshm(260)..............: unable to create a bootstrap message queue
MPIDI_CH3I_BootstrapQ_create_named(347): failed to create a shared memory message queue
MPIDI_CH3I_mqshm_create(97)............: Out of memory
MPIDI_CH3I_SHM_Get_mem_named(465)......:
MPIDI_CH3I_SHM_Get_mem_named(464)......: unable to open shared memory object /mpich2q59C28A81208F2C654FFA8F924D341600 (errno 13)
job aborted using terminate/kill:
process: node: exit code: error message:
0: localhost: 1: Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(294)..................: Initialization failed
MPID_Init(94)..........................: channel initialization failed
MPIDI_CH3_Init(116)....................:
MPIDI_CH3U_Init_sshm(260)..............: unable to create a bootstrap message queue
MPIDI_CH3I_BootstrapQ_create_named(347): failed to create a shared memory message queue
MPIDI_CH3I_mqshm_create(97)............: Out of memory
MPIDI_CH3I_SHM_Get_mem_named(465)......:
MPIDI_CH3I_SHM_Get_mem_named(464)......: unable to open shared memory object /mpich2q59C28A81208F2C654FFA8F924D341600 (errno 13)
/usr/local/matlab/2010b/bin/glnxa64/smpd -d: Killed
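
A note on reading this output: errno 13 is EACCES ("Permission denied"), which matches the shm_open line at the top. On Linux, POSIX shared memory objects are normally created under /dev/shm, so one possible cause is that this directory is not writable by the current user. That is an assumption and is not confirmed anywhere in this thread. A minimal sketch for inspecting the permissions from MATLAB:

```matlab
% Assumption: POSIX shared memory is backed by /dev/shm on this system.
shmDir = '/dev/shm';

% On UNIX, fileattrib returns a struct of permission flags on success.
[ok, attr] = fileattrib( shmDir );
if ok
    fprintf( 'user write: %d, group write: %d, other write: %d\n', ...
             attr.UserWrite, attr.GroupWrite, attr.OtherWrite );
else
    fprintf( 'Could not read attributes of %s\n', shmDir );
end

% Show the raw mode bits as well, as reported by the shell.
system( [ 'ls -ld ', shmDir ] );
```

If the directory turns out not to be writable for the user running MATLAB, that would be consistent with the error above, but other causes cannot be ruled out from this check alone.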

Edric M Ellis

Jan 17, 2011, 7:57:06 AM
"Alexandra " <rue...@tum.de> writes:
>> Hmm, could you please try
>>
>> smpd = sprintf( '%s/bin/glnxa64/smpd', matlabroot )
>> system( [smpd, ' -d'] )
>>
>>

Strange, I haven't seen that failure mode before. (JVM memory is not
related here). I'll ask the MPICH2 developers to see if they have any
ideas. In the meantime, it should work to run

distcomp.feature( 'LocalUseMpiexec', false )

in MATLAB before running anything else.

Cheers,

Edric.
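
As a sketch of where the workaround goes: it is typed at the MATLAB command prompt (or placed at the top of a script) before any pool or parallel job is started in that session. The matlabpool call and the parfor loop below are just one assumed way of exercising the local workers in this MATLAB release.

```matlab
% Set the flag first, before any pool or parallel job exists in this session.
% Going by its name, it stops the local scheduler from using mpiexec.
distcomp.feature( 'LocalUseMpiexec', false );

% Only then start the local workers.
matlabpool open local

% A trivial loop to confirm that the workers respond.
parfor i = 1:4
    fprintf( 'iteration %d\n', i );
end

matlabpool close
```

Whether the setting persists across MATLAB sessions is not stated in the thread, so it may need to be repeated at the start of each session.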

Alexandra

Jan 18, 2011, 6:15:05 AM
> Strange, I haven't seen that failure mode before. (JVM memory is not
> related here). I'll ask the MPICH2 developers to see if they have any
> ideas. In the meantime, it should work to run
>
> distcomp.feature( 'LocalUseMpiexec', false )
>
> in MATLAB before running anything else.
>
> Cheers,
>
> Edric.

This works great. Thank you very much :)

Sanjay Manohar

May 31, 2011, 4:48:04 PM
In case this helps, I had the same problem.
It may have started after the upgrade to Ubuntu 11.04 (Natty).
Currently running kernel 2.6.32-29.
6-core Xeon W3...@3.33GHz, 12GB ram

distcomp.feature('LocalUseMpiexec',false) solved the problem.

My error message:
----


shm_open msg: Permission denied
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(294)..................: Initialization failed
MPID_Init(94)..........................: channel initialization failed
MPIDI_CH3_Init(116)....................:
MPIDI_CH3U_Init_sshm(260)..............: unable to create a bootstrap message queue
MPIDI_CH3I_BootstrapQ_create_named(347): failed to create a shared memory message queue
MPIDI_CH3I_mqshm_create(97)............: Out of memory
MPIDI_CH3I_SHM_Get_mem_named(465)......:

MPIDI_CH3I_SHM_Get_mem_named(464)......: unable to open shared memory object /mpich2q22D2B0891650E9536DEC94604DE55193 (errno 13)


job aborted using terminate/kill:
process: node: exit code: error message:
0: localhost: 1: Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(294)..................: Initialization failed
MPID_Init(94)..........................: channel initialization failed
MPIDI_CH3_Init(116)....................:
MPIDI_CH3U_Init_sshm(260)..............: unable to create a bootstrap message queue
MPIDI_CH3I_BootstrapQ_create_named(347): failed to create a shared memory message queue
MPIDI_CH3I_mqshm_create(97)............: Out of memory
MPIDI_CH3I_SHM_Get_mem_named(465)......:

MPIDI_CH3I_SHM_Get_mem_named(464)......: unable to open shared memory object /mpich2q22D2B0891650E9536DEC94604DE55193 (errno 13)
/home/smanohar/MATLAB/2010b/bin/glnxa64/smpd -d: Killed

Falk

Jul 22, 2011, 4:48:07 AM
Hi Everybody,


Thank you for this very helpful discussion. I have the same problem on my machine:
Intel X 980, 12GB RAM, Win7 64-bit, MATLAB R2010b.

I created an (importable) mpiexec file, which runs perfectly on several different 32-bit multicore systems. Since the error message I receive during validation is basically identical to what you have all posted, I will not repost it in full. In short, it cannot find the option "-d" for smpd. As in your cases, all the files are actually there...

So when you apply the fix

distcomp.feature('LocalUseMpiexec',false)

Where exactly do you put that call in the Parallel Manager? I have tried several things, but I think I am doing something wrong.

Thank you very much in advance for your help!
Kind Regards,
Falk
