As a first stage, just to familiarize myself with the process, I have attempted to run a job on my local machine through the local scheduler.
The tutorial is set up to produce five tasks, where each task generates a 3-by-3 matrix of random numbers.
==========================================================
sched = findResource('scheduler','type','local');
obj = createJob();
createTask(obj, @rand, 1, {{3,3} {3,3} {3,3} {3,3} {3,3}});
get(obj,'Tasks')
submit(obj);
waitForState(obj)
out = getAllOutputArguments(obj);
==========================================================
Unfortunately, the tasks keep showing a 'failed' state when I check the status with:
get(obj,'Tasks')
As a result, the final call, out = getAllOutputArguments(obj);, displays an "Empty cell array: 5-by-0".
Can you please suggest what is missing in my code?
Many thanks
> I am in the process of setting up my machine to eventually submit jobs to a
> cluster through MATLAB's Parallel Computing Toolbox.
>
> As a first stage, just to familiarize myself with the process, I have
> attempted to run a job on my local machine through the local scheduler.
>
> The tutorial is set up to produce five tasks, where each task generates a 3-by-3 matrix of random numbers.
> ==========================================================
> sched = findResource('scheduler','type','local');
> obj = createJob();
The above line creates a job called "obj" on whatever happens to be your default
scheduler. You should force it to use the scheduler "sched" by doing
obj = createJob( sched );
Other than that, your code looks fine.
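For reference, here is a minimal sketch of the whole script with that one change applied. It assumes the R2008a/R2008b-era function names already used in this thread and a working local scheduler; the size display at the end is an addition for checking the result, not part of the tutorial.

```matlab
% Find the local scheduler and create the job on it explicitly.
sched = findResource('scheduler', 'type', 'local');
obj = createJob(sched);

% Five tasks, each calling rand(3,3) and returning one output.
createTask(obj, @rand, 1, {{3,3} {3,3} {3,3} {3,3} {3,3}});

submit(obj);
waitForState(obj);   % waits for the 'finished' state by default

% One row per task, one column per requested output argument.
out = getAllOutputArguments(obj);
disp(size(out));
```

If the tasks ran successfully, each cell of "out" should hold one 3-by-3 matrix; an empty second dimension (as in the original post) suggests the tasks failed before producing output.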
> Unfortunately, the simulation continues to display a 'failed' state run, when
> checking the status though: get(obj,'Tasks')
If you still get the problems, one way of getting more information is to check
the "debug log" for a task, like so:
sched.getDebugLog( obj.Tasks(1) )
Posting that output here may help - preferably after executing
setenv( 'MDCE_DEBUG', 'true' );
to give more debug information.
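Put together, that might look like the sketch below. It assumes the local scheduler and the same @rand tasks as above; looping over every task, rather than looking only at the first, is my own addition.

```matlab
% Ask for more verbose logging before the job is submitted.
setenv('MDCE_DEBUG', 'true');

sched = findResource('scheduler', 'type', 'local');
obj = createJob(sched);
createTask(obj, @rand, 1, {{3,3} {3,3}});
submit(obj);
waitForState(obj);

% Print the debug log of each task in turn.
tasks = get(obj, 'Tasks');
for k = 1:numel(tasks)
    fprintf('--- Task %d ---\n', k);
    disp(getDebugLog(sched, tasks(k)));
end
```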
Cheers,
Edric.
Thanks Edric.
Here is the error that I am getting from
sched.getDebugLog( obj.Tasks(1) )
===========================================================
LOG FILE OUTPUT:
License checkout failed.
License Manager Error -9
This error may occur when:
-The hostid of this computer does not match the hostid in the license file.
-A Designated Computer installation is in use by another user.
If no other user is currently running MATLAB, you may need to activate.
==========================================================
The first point may apply here, so I am currently working my way through this. Hope to have this resolved shortly.
Thanks again.
Just checked. The hostid of my computer does indeed match the hostid in the license file!
> "Mr. CFD" <s210...@student.rmit.edu.au> wrote in message <gcmk0j$56o$1...@fred.mathworks.com>...
> > Thanks Edric.
> > Here is the error that I am getting from
> > sched.getDebugLog( obj.Tasks(1) )
> > ===========================================================
> > LOG FILE OUTPUT:
> > License checkout failed.
> > License Manager Error -9
> > This error may occur when:
> > -The hostid of this computer does not match the hostid in the license file.
> > -A Designated Computer installation is in use by another user.
> > If no other user is currently running MATLAB, you may need to activate.
> > ==========================================================
> > The first point may apply here, so I am currently working my way through
> > this. Hope to have this resolved shortly. Thanks again.
>
> Just checked. The hostid of my computer does indeed match the hostid in the
> license file!
Hi,
That certainly seems to have narrowed things down. Could I suggest you contact
our installation support people - they know way more than I do about this sort
of licensing thing.
Cheers,
Edric.
Hi Edric,
I sent off an email to MathWorks about this. In the meantime, I decided to upgrade to R2008b, since the Parallel Computing Toolbox was reportedly updated in that release. I tried the same exercise again on the newer version and, happy to report, it ran smoothly! I still don't really know why it was an issue in the older version, but the newer release seems to have resolved it.
I have now submitted the same task to a cluster to see how it goes there. It seems that the job is set to the default walltime, which in this case is 3 months! Thus, you can imagine my job will be in the queue for some time! According to the notes provided by the computing facility, there is no functionality within the PCT to allow for the selection of a walltime. Sounds strange! Is that really the case?
Thanks
> I sent off an email to mathworks in regards to this. In the meantime, I decided
> to upgrade to 2008b, since it was reported that the parallel computing toolbox
> was updated. Tried the same exercise again on the newer version and happy to
> report, it ran smoothly! Still don't really know why it was an issue in the
> previous older version, but the newer copy seemed to have resolved this issue.
That's good news. I'm not sure if you're already aware, but you do need to have
the same version of MATLAB installed on your cluster too.
> Have now uploaded the same task onto a cluster, to see how it goes on that. It
> seems that the job is set on default walltime, which in this case is 3 months!
> Thus, you can imagine my job will be on queue for some time! According to the
> notes provided by the computing facility, there is no functionality within the
> PCT to allow for the selection of a walltime. Sounds strange! Is that really
> the case? Thanks
What scheduler are you using? If you're using one of our built-in integrations
(i.e. Torque, PBSPro, or LSF), you can add any arbitrary command-line flags to
qsub by setting things in the "SubmitArguments" property of the scheduler
object. (Torque and PBSPro also have the ResourceTemplate property). Let me know
if this doesn't work for you.
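As a rough sketch, assuming the built-in Torque integration (scheduler type 'torque') and a site that accepts the usual "-l walltime=HH:MM:SS" form of qsub resource request, it could look like this. The two-minute value is only an illustration.

```matlab
% Assumption: the cluster is reached through the built-in Torque integration.
sched = findResource('scheduler', 'type', 'torque');

% Extra flags are passed through to qsub when the job is submitted.
% The walltime syntax shown here is the common qsub form; check your
% site's notes in case they expect something different.
set(sched, 'SubmitArguments', '-l walltime=00:02:00');

% Confirm what will be handed to qsub.
disp(get(sched, 'SubmitArguments'));
```

Jobs created on "sched" after this point should be submitted with the extra flag.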
Cheers,
Edric.
Hi Edric,
Thanks for your message.
> That's good news. I'm not sure if you're already aware,
> but you do need to have the same version of MATLAB
> installed on your cluster too.
This must explain why I got the following error when the job was submitted to the cluster. What are your thoughts?
===========================================================
Warning: you did not specify a shell in the first line of your PBS script
We have assumed you wish to use bash, however please update your script with a valid shell
< M A T L A B (R) >
Copyright 1984-2008 The MathWorks, Inc.
Version 7.6.0.324 (R2008a)
February 10, 2008
To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.
MATLAB core dump: Exit on fatal error (no core) enabled.
About to construct the storage object using constructor "makeFileStorageObject" and location "/home/matlab/"
About to find job proxy using location "Job1"
About to find task proxy using location "Job1/Task1"
Completed pre-execution phase
Unexpected error in PreJobEvaluate - MATLAB will now exit.
Error using ==> pSetInitData at 39
Error setting license information from job.
Nested Error :
No method 'addFeatures' with matching signature found for class 'com.mathworks.toolbox.distcomp.nativedmatlab.NativeMethods'.
===========================================================
As I said in my earlier post, I have upgraded to R2008b, while the cluster has R2008a. I have since asked them to upgrade to the newer version if possible.
When this comes through, I can give you more information about the scheduler settings that I have. Hopefully we can then set an appropriate walltime.
Thanks again for your feedback and time
> Edric M Ellis <eel...@mathworks.com> wrote in message
> <ytwljww...@uk-eellis-deb4-64.mathworks.co.uk>...
> > That's good news. I'm not sure if you're already aware, but you do need to
> > have the same version of MATLAB installed on your cluster too.
>
> This must explain why I get the following error when the job was submitted to
> the cluster. What are your thoughts?
Yes, that error is a consequence of sending a job from R2008b for execution
under R2008a.
Cheers,
Edric.
Hi Edric,
Well, we have finally got R2008b on the clusters, and the simple (@rand) tutorial provided in the help file works fine when submitted to the cluster, unlike before when the two MATLAB versions were different.
Now I am trying to feed in my own function and have it submitted to the cluster. I have a very simple test.m function, as below:
=====================================================
function result=test(x)
y = x+2 ;
result = {y};
=======================================================
Using the following submit functions:
clusterHost = 'xxx.xxx.xxx';
remoteDataLocation = '/home/matlab/';
timeLimit = '00:02:00';
sched = findResource('scheduler', 'type', 'generic');
set(sched, 'DataLocation', 'C:\MATLAB')
set(sched, 'ClusterMatlabRoot', '/usr/local/matlab/R2008b');
set(sched, 'HasSharedFilesystem', true)
set(sched, 'ClusterOsType', 'unix');
set(sched, 'GetJobStateFcn', @pbsGetJobState);
set(sched, 'DestroyJobFcn', @pbsDestroyJob);
set(sched, 'SubmitFcn', {@pbsNonSharedSimpleSubmitFcn, timeLimit,clusterHost, remoteDataLocation});
j = createJob(sched)
createTask(j, @test, 1, {{1} {3}});
submit(j)
waitForState(j)
results = getAllOutputArguments(j);
celldisp(results)
=======================================================
The clusters report no errors, but the MATLAB window continues to display an 'Empty cell array: 1-by-0' for the results output. Do you have any idea where the problem is?
Thanks
> Hi Edric,
> Well we have finally got R2008b on the clusters and the simple (@rand)
> tutorial provided in the help file works fine, when submitting it to the
> cluster, unlike before when the two MATLAB versions were different.
>
> Now I am trying to feed in my own function and have it submitted to the
> cluster. Have a very simple test.m model as below
>
> =====================================================
> function result=test(x)
> y = x+2 ;
> result = {y};
> =======================================================
> Using the following submit functions:
>
> clusterHost = 'xxx.xxx.xxx'; [...]
>
> j = createJob(sched)
> createTask(j, @test, 1, {{1} {3}});
> submit(j)
I suspect that the problem is that your cluster cannot find "test.m". You should
add it to the FileDependencies of the job, like so:
j.FileDependencies = {'test.m'};
(This is not needed for "rand" since that's already on the MATLAB path of the
workers.)
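In context, a minimal sketch of the submission might look like the following. The local scheduler is used here only as a stand-in so the snippet runs on its own (substitute the generic scheduler object configured in your post), and it assumes test.m is in the current directory.

```matlab
% Stand-in scheduler for illustration; use your configured 'sched' instead.
sched = findResource('scheduler', 'type', 'local');
j = createJob(sched);

% Ship test.m to the workers with the job. Set this before submitting.
set(j, 'FileDependencies', {'test.m'});

createTask(j, @test, 1, {{1} {3}});
submit(j);
waitForState(j);

% One row per task; each cell holds whatever test() returned.
results = getAllOutputArguments(j);
disp(size(results));
```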
> The clusters report no errors, but MATLAB window continues to display an
> 'Empty cell array: 1-by-0' for the results output.
When you say "The clusters report no errors" - where are you looking? Did you
try displaying the job (simply type the job name with no semicolon)? That might
indicate problems. Here's what I see if I try to run a job which doesn't have
correct FileDependencies:
>> j = jm.createJob(); j.createTask( @test, 1, {{1}, {3}} );
>> j.submit(); j.wait();
>> j
j =
Job ID 2 Information
====================
UserName : eellis
State : finished
SubmitTime : Tue Nov 11 10:56:30 GMT 2008
StartTime : Tue Nov 11 10:56:30 GMT 2008
Running Duration : 0 days 0h 0m 2s
- Data Dependencies
FileDependencies : {}
PathDependencies : {}
- Associated Task(s)
Number Pending : 0
Number Running : 0
Number Finished : 2
TaskID of errors : [1 2]
>> j.Tasks(1)
ans =
Task ID 1 from Job ID 4 Information
===================================
State : finished
Function : @test
StartTime : Tue Nov 11 10:56:30 GMT 2008
Running Duration : 0 days 0h 0m 1s
- Task Result Properties
ErrorIdentifier : MATLAB:UndefinedFunction
ErrorMessage : Undefined function or method 'test' for input...
: s of type 'double'.
Error Stack :
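The same information can be read programmatically. The sketch below is only an illustration: it uses the local scheduler and a deliberately failing task (calling the built-in "error" function with a made-up identifier and message), then prints the ErrorMessage property of each task that reported one.

```matlab
% Build a small job whose single task is guaranteed to fail.
% The identifier 'demo:fail' and its message are hypothetical.
sched = findResource('scheduler', 'type', 'local');
j = createJob(sched);
createTask(j, @error, 0, {{'demo:fail', 'deliberate failure'}});
submit(j);
waitForState(j);

% A job can reach 'finished' even though its tasks errored,
% so inspect each task's ErrorMessage rather than the job state.
tasks = get(j, 'Tasks');
for k = 1:numel(tasks)
    msg = get(tasks(k), 'ErrorMessage');
    if ~isempty(msg)
        fprintf('Task %d failed: %s\n', k, msg);
    end
end
```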
Cheers,
Edric.