USPEX job submission setup (submitJob_local.m and checkStatus_local.m) for Stampede supercomputer.


hom sharma

Feb 25, 2014, 3:30:08 PM
to us...@googlegroups.com
Hi,
I was wondering if anyone has used USPEX on the Stampede supercomputer. If so, could you please help me set up the job submission? I also couldn't run MATLAB there.

thanks,
Hom

qian guangrui

Feb 25, 2014, 5:17:41 PM
to us...@googlegroups.com
Hi,

    I will post the submitJob.m and checkStatusC.m files for running jobs on Stampede later. The cluster is under maintenance right now.


Best,
Guangrui

hom sharma

Feb 25, 2014, 11:59:45 PM
to us...@googlegroups.com
Thanks Guangrui !!



qian guangrui

Feb 26, 2014, 10:39:09 AM
to us...@googlegroups.com
Hi Hom,

   The submitJob.m and checkStatusC.m files for Stampede are uploaded.

   Please use:
       MIPT : whichCluster
   in the INPUT.txt file for the Stampede cluster.

   Please don't run more than 50 parallel jobs on Stampede, and call MATLAB every 6-7 minutes.


Best,
Guangrui
Attachments: submitJob.m, checkStatusC.m

hom sharma

Feb 26, 2014, 1:45:19 PM
to us...@googlegroups.com
Hi Guangrui,
I am trying to use the latest USPEX, 9.3.9. The job submission and check scripts are different from (and look much simpler than) those in older versions. Here is what I am doing; please advise what I am doing wrong. Thanks for your help. Please see below. - Hom

login1$ cat Submission/submitJob_local.m
function jobNumber = submitJob_local()
%-------------------------------------------------------------
%This routine is to submit the job and return its job ID
%One needs to do a little edit based on your own case.

%1   : whichCluster (default 0, 1: local submission, 2: remote submission)
%-------------------------------------------------------------

%Step 1: to prepare the job script which is required by your supercomputer
fp = fopen('myrun', 'w');
fprintf(fp, '#!/bin/sh\n');
fprintf(fp, '#SBATCH -J Test\n');
fprintf(fp, '#SBATCH -A TG-DM-------\n');
fprintf(fp, '#SBATCH -o test \n');
fprintf(fp, '#SBATCH -n 32 \n');
fprintf(fp, '#SBATCH -p normal \n');
fprintf(fp, '#SBATCH -t 01:00:00 \n');
fprintf(fp, 'module load vasp/5.3.3\n');
fprintf(fp, 'ibrun vasp_std_vtst > log\n');
fclose(fp);

%Step 2: to submit the job with the command like qsub, bsub, llsubmit, .etc.
%It will output some message on the screen like '2350873.nano.cfn.bnl.local'
[a,b]=unix(['sbatch myrun'])


%Step 3: to get the jobID from the screen message
end_marker = findstr(b,'.');
jobNumber = b(1:end_marker(1)-1);
disp(jobNumber)

check script:
login1$ cat Submission/checkStatus_local.m
function doneOr = checkStatus_local(jobID)
%--------------------------------------------------------------------
%This routine is to check if the submitted job is done or not
%One needs to do a little edit based on your own case.
%1   : whichCluster (0: no-job-script, 1: local submission, 2: remote submission)
%--------------------------------------------------------------------

%Step1: the command to check job by ID.
    [a,b] = unix(['showq -u ' jobID ''])

%Step2: to find the keywords from screen message to determine if the job is done
%Below is just a sample:
%-------------------------------------------------------------------------------
%Job id                    Name             User            Time Use S Queue
%------------------------- ---------------- --------------- -------- - -----
%2455453.nano              USPEX            qzhu            02:28:42 R cfn_gen04
%-------------------------------------------------------------------------------
%If the job is still running, it will show as above.

%If there is no key words like 'R/Q Cfn_gen04', it indicates the job is done.
    if isempty(findstr(b,'Waiting')) & isempty(findstr(b,'Running'))
       doneOr = 1
       unix('rm USPEX*');    % to remove the log file
    else
       doneOr = 0;
    end
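
The check above treats a job as finished once the queue listing no longer contains 'Waiting' or 'Running'. A stricter variant matches the job ID itself, so that other jobs of the same user cannot keep the status at "not done". Below is a minimal shell sketch of that idea, not the USPEX implementation: job_is_done is a hypothetical helper, and it assumes the queue listing puts the job ID in the first column, as the default squeue format does.

```shell
#!/bin/sh
# Hypothetical helper (not part of USPEX): decide whether a job has left the queue.
# Usage: <queue listing on stdin> | job_is_done JOBID
# Assumption: each listing line starts with the job ID (default squeue layout).
job_is_done() {
    # awk exits 0 if some line's first field equals the job ID, 1 otherwise.
    if awk -v id="$1" '$1 == id { found = 1 } END { exit !found }'; then
        return 1    # still listed: waiting or running
    fi
    return 0        # not listed: treat the job as done
}

# Example call (assumes squeue is available on the cluster):
#   squeue -h -u "$USER" | job_is_done 2887392 && echo "job finished"
```

A helper like this could be called from MATLAB through unix(...) in the same way as showq. Whether `showq -u` accepts a job ID rather than a user name on Stampede is worth checking in the TACC documentation.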

I also get an error when I type matlab on Stampede. Is there anything special needed to use MATLAB there?
login1$ matlab
Warning: No display specified.  You will not be able to display graphics on the screen.
login1$ Error occurred during initialization of VM
Could not reserve enough space for object heap
Could not create the Java virtual machine.


qian guangrui

Feb 26, 2014, 2:52:45 PM
to us...@googlegroups.com
Hi Hom,

   Sorry for not giving the correct version  :)

On Wednesday, February 26, 2014 1:45:19 PM UTC-5, hom sharma wrote:
Hi Guangrui,
I am trying to use the latest USPEX, 9.3.9. The job submission and check scripts are different from (and look much simpler than) those in older versions. Here is what I am doing; please advise what I am doing wrong. Thanks for your help. Please see below. - Hom

login1$ cat Submission/submitJob_local.m
function jobNumber = submitJob_local()
%-------------------------------------------------------------
%This routine is to submit the job and return its job ID
%One needs to do a little edit based on your own case.

%1   : whichCluster (default 0, 1: local submission, 2: remote submission)
%-------------------------------------------------------------

%Step 1: to prepare the job script which is required by your supercomputer
fp = fopen('myrun', 'w');
fprintf(fp, '#!/bin/sh\n');
fprintf(fp, '#SBATCH -J Test\n');
fprintf(fp, '#SBATCH -A TG-DM-------\n');
fprintf(fp, '#SBATCH -o test \n');
fprintf(fp, '#SBATCH -n 32 \n');
I think 16 cores are enough for VASP on Stampede. If you want to use 32 cores, I think you need to add another line here:
fprintf(fp, '#SBATCH -N 2 \n');
 
fprintf(fp, '#SBATCH -p normal \n');
fprintf(fp, '#SBATCH -t 01:00:00 \n'); 
fprintf(fp, 'module load vasp/5.3.3\n');
 
fprintf(fp, 'ibrun vasp_std_vtst > log\n');

For this line, I suggest using the full path of ibrun. It should be:

   fprintf(fp, '/usr/local/bin/ibrun vasp_std_vtst > log\n');

 
fclose(fp);

%Step 2: to submit the job with the command like qsub, bsub, llsubmit, .etc.
%It will output some message on the screen like '2350873.nano.cfn.bnl.local'
 
[a,b]=unix(['sbatch myrun'])

%Step 3: to get the jobID from the screen message
end_marker = findstr(b,'.');
jobNumber = b(1:end_marker(1)-1);
disp(jobNumber)

 
Replace Step 2 and Step 3 with:
 unix(['sbatch myrun > job.log'] );
 [a,b]=unix(['cat job.log | grep "batch job"'])
 start_marker=findstr(b,'job ');
 %end_marker = findstr(b,'');
 jobNumber = b(start_marker(1)+4:end-1);

As for MATLAB, try using PuTTY or another terminal tool without the X Window feature.
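
The original Step 3 looks for the first '.' because PBS-style schedulers print IDs such as '2350873.nano.cfn.bnl.local'. SLURM's sbatch instead reports "Submitted batch job <ID>", which is why the replacement greps for "batch job" and takes the text after 'job '. The same extraction can be sketched in shell as follows. This is only an illustration: parse_job_id is a hypothetical name, and the code assumes the ID is purely numeric.

```shell
#!/bin/sh
# Hypothetical helper: read sbatch output on stdin and print the job ID
# found on the "Submitted batch job <ID>" line. Other lines (banners,
# sanity-check messages) are ignored.
parse_job_id() {
    sed -n 's/.*Submitted batch job \([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Example call (assumes sbatch is available and myrun is the job script):
#   jobid=$(sbatch myrun | parse_job_id)
```

If no matching line is present, the function prints nothing, so an empty result is a signal that the submission itself failed.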

hom sharma

Feb 26, 2014, 10:29:31 PM
to us...@googlegroups.com
Hi Guangrui,
I followed your suggestion and tried to run the job. It does submit the job, but the job doesn't run. I was wondering whether fprintf(fp, 'cd ${PBS_O_WORKDIR}\n'); needs to be included, as in the sample file. I had not used it in my script.

The log file inside the Calc folder looks like this:

login3$ cat CalcFold1/log
TACC: Starting up job 2887392
TACC: Setting up parallel environment for MVAPICH2+mpispawn.
TACC: Starting parallel tasks...

TACC: Shutdown complete. Exiting.

Job submission file:
function jobNumber = submitJob_local()
%-------------------------------------------------------------
%This routine is to submit the job and return its job ID
%One needs to do a little edit based on your own case.

%1   : whichCluster (default 0, 1: local submission, 2: remote submission)
%-------------------------------------------------------------

%Step 1: to prepare the job script which is required by your supercomputer
fp = fopen('myrun', 'w');
fprintf(fp, '#!/bin/sh\n');
fprintf(fp, '#SBATCH -J Test\n');
fprintf(fp, '#SBATCH -A TG-D----\n');
fprintf(fp, '#SBATCH -o test \n');
fprintf(fp, '#SBATCH -n 16 \n');
fprintf(fp, '#SBATCH -p normal \n');
fprintf(fp, '#SBATCH -N 2 \n');
fprintf(fp, '#SBATCH -t 00:30:00 \n');
fprintf(fp, 'module load vasp/5.3.3\n');
%fprintf(fp, 'cd ${PBS_O_WORKDIR}\n');
fprintf(fp, '/usr/local/bin/ibrun vasp_std_vtst > log\n');
fclose(fp);

%Step 2: to submit the job with the command like qsub, bsub, llsubmit, .etc.
%It will output some message on the screen like '2350873.nano.cfn.bnl.local'
%[a,b]=unix(['sbatch myrun'])


%Step 3: to get the jobID from the screen message
%end_marker = findstr(b,'.');
%jobNumber = b(1:end_marker(1)-1);
%disp(jobNumber)

 unix(['sbatch myrun > job.log'] );
 [a,b]=unix(['cat job.log | grep "batch job"'])
 start_marker=findstr(b,'job ');
 %end_marker = findstr(b,'');
 jobNumber = b(start_marker(1)+4:end-1);


login3$ cat CalcFold1/test
/tmp/slurmd/job2887392/slurm_script: line 12: module: command not found
[c447-104.stampede.tacc.utexas.edu:mpispawn_0][spawn_processes] Failed to execvp() 'vasp_std_vtst': `bc (2)
[c447-104.stampede.tacc.utexas.edu:mpispawn_0][spawn_processes] Failed to execvp() 'vasp_std_vtst': `bc (2)
[c447-104.stampede.tacc.utexas.edu:mpispawn_0][spawn_processes] Failed to execvp() 'vasp_std_vtst': `bc (2)
[c447-104.stampede.tacc.utexas.edu:mpispawn_0][spawn_processes] Failed to execvp() 'vasp_std_vtst': `bc (2)
[c447-104.stampede.tacc.utexas.edu:mpispawn_0][spawn_processes] Failed to execvp() 'vasp_std_vtst': `bc (2)
[c447-104.stampede.tacc.utexas.edu:mpispawn_0][spawn_processes] Failed to execvp() 'vasp_std_vtst': `bc (2)
[c447-104.stampede.tacc.utexas.edu:mpispawn_0][child_handler] MPI process (rank: 1, pid: 29926) exited with status 1
[c447-104.stampede.tacc.utexas.edu:mpispawn_0][spawn_processes] Failed to execvp() 'vasp_std_vtst': `bc (2)
[c447-104.stampede.tacc.utexas.edu:mpispawn_0][spawn_processes] Failed to execvp() 'vasp_std_vtst': `bc (2)
[c447-104.stampede.tacc.utexas.edu:mpispawn_0][child_handler] MPI process (rank: 0, pid: 29925) exited with status 1
[c447-104.stampede.tacc.utexas.edu:mpispawn_0][child_handler] MPI process (rank: 2, pid: 29927) exited with status 1
[c447-104.stampede.tacc.utexas.edu:mpispawn_0][child_handler] MPI process (rank: 3, pid: 29928) exited with status 1
[c447-104.stampede.tacc.utexas.edu:mpispawn_0][child_handler] MPI process (rank: 4, pid: 29929) exited with status 1


qian guangrui

Feb 26, 2014, 10:33:50 PM
to us...@googlegroups.com
Hi Hom,

   The error is very clear. 
   
login3$ cat CalcFold1/test
/tmp/slurmd/job2887392/slurm_script: line 12: module: command not found
[c447-104.stampede.tacc.utexas.edu:mpispawn_0][spawn_processes] Failed to execvp() 'vasp_std_vtst': `bc (2)
[c447-104.stampede.tacc.utexas.edu:mpispawn_0][spawn_processes] Failed to execvp() 'vasp_std_vtst': `bc (2)
[c447-104.stampede.tacc.utexas.edu:mpispawn_0][spawn_processes] Failed to execvp() 'vasp_std_vtst': `bc (2)


  You should not use the module command in your job submission script.
  Instead, first change your default modules on the login nodes to include VASP and save them as your default set. Then try running the USPEX calculation again.

  The Stampede user guide explains how to load modules and save them.
 

Best,
Guangrui
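
The first line of the log explains the rest: the module command is not defined in the shell that runs the job script (plausibly because the script starts with #!/bin/sh, a non-login shell), so VASP is never put on the PATH and every MPI rank fails to start vasp_std_vtst. A small guard in the job script can make this failure explicit before ibrun is called. The sketch below is only an illustration; require_cmd is a hypothetical helper, not something provided by USPEX or TACC.

```shell
#!/bin/sh
# Hypothetical helper: succeed if NAME resolves to a command in this shell,
# otherwise print a message to stderr and fail.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "error: '$1' not found in PATH; check your default modules" >&2
    return 1
}

# Possible use inside the generated job script (an assumption, adapt as needed):
#   require_cmd vasp_std_vtst || exit 1
#   /usr/local/bin/ibrun vasp_std_vtst > log
```

With such a check, a missing VASP module shows up as a single clear message in the output file rather than one execvp failure per MPI rank.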