Restart a simulation on a cluster where the simulation time is limited

fds-quang

Dec 18, 2015, 8:45:50 AM
to FDS and Smokeview Discussions
Hello everyone!

I am modeling an apartment fire with a duration of 3000 s. The simulation runs on a Linux cluster where the wall clock time per job is limited; in my case the maximum is about 100 hours. That is not enough to reach t=3000 s. After 100 hours my calculation is stopped ("walltime 360082 exceeded limit 360000"), at about t=2000 s of the simulated fire.

I want to restart the simulation and run it from t=2000 s to t=3000 s, but I have a problem with the batch file:

***************************


#!/bin/sh
#PBS -l walltime=100:00:00
#PBS -l select=3:ncpus=20:mpiprocs=20
#PBS -q default

module purge
module load intel-compilers-14/14.0.2.144
module load intel-mpi-4/4.1.3.048

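# Run from the directory the job was submitted from
cd "$PBS_O_WORKDIR"

# One MPI process per entry in the PBS node file
NCPUS=$(wc -l < "$PBS_NODEFILE")

mpirun -np "${NCPUS}" ../../source_file/fds6_v3_impi_26_11_2015 appt_T5_test54.fds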

***********************************

If I keep walltime=100:00:00, I can't restart the simulation, because the maximum walltime was already reached by the first part (the fire simulation from t=0 to t=2000 s). And we cannot use a larger value for the walltime.

Has anyone encountered this problem?

Thanks in advance!
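
A rough estimate from the numbers above, assuming (optimistically) that the simulation keeps advancing at the same average rate: 2000 s of fire took 100 h, so the remaining 1000 s would need about 50 h, which fits into one more job under the 100 h limit. The function name `estimate_restart_jobs` is made up for this sketch; the FDS time step can change during a run, so treat the result as a guide only.

```shell
#!/bin/sh
# Hypothetical helper, not part of FDS or PBS.
# Usage: estimate_restart_jobs <t_end_s> <t_reached_s> <hours_used> <limit_hours>
estimate_restart_jobs() {
    t_end="$1"; t_reached="$2"; hours_used="$3"; limit_hours="$4"
    remaining=$(( t_end - t_reached ))
    # Assumes a constant simulation rate; both divisions round up
    hours_needed=$(( (remaining * hours_used + t_reached - 1) / t_reached ))
    jobs_needed=$(( (hours_needed + limit_hours - 1) / limit_hours ))
    echo "about ${hours_needed} h more, ${jobs_needed} additional job(s)"
}

# Numbers from the post: 3000 s target, 2000 s reached in 100 h, 100 h limit
estimate_restart_jobs 3000 2000 100 100
```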

Kevin

Dec 18, 2015, 8:52:59 AM
to FDS and Smokeview Discussions
You have to use the RESTART feature. See the User's Guide. I suggest you try a case where you stop the calculation after a short time, then start it again to make sure the feature works for you.
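
A minimal sketch of the short test suggested above, assuming the usual FDS convention that a file named `<CHID>.stop` in the working directory asks a running case to shut down gracefully. The helper names `request_stop` and `clear_stop` are made up here, and the CHID is whatever is set on the &HEAD line of the input file, which may differ from the file name.

```shell
#!/bin/sh
# Hypothetical helpers, not part of FDS.

# Ask a running case to stop gracefully by creating <CHID>.stop.
# Usage: request_stop <CHID> [case_dir]
request_stop() {
    touch "${2:-.}/$1.stop"
}

# Remove a leftover stop file before resubmitting, so the restarted
# run is not asked to stop again right away.
# Usage: clear_stop <CHID> [case_dir]
clear_stop() {
    rm -f "${2:-.}/$1.stop"
}
```

For the test, one would start the case, call `request_stop` after a few minutes, wait for the job to end, call `clear_stop`, set RESTART=.TRUE. on the MISC line and submit again. Whether restart files were written at the stop should be checked in the output directory; see the User's Guide for the version in use.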

fds-quang

Dec 18, 2015, 5:27:24 PM
to FDS and Smokeview Discussions
Hi Kevin,

I've used the restart feature; I set RESTART=.TRUE. on the MISC line. My problem is the limit on the simulation time on our cluster. When you run your calculations on your cluster, do you have a quota for the simulation time?

Dave McGill

Dec 18, 2015, 5:32:36 PM
to FDS and Smokeview Discussions
I ran into this years ago. I assume you are using PBS as the queuing system. Just ask whoever runs the cluster to modify PBS to extend the maximum time available for a single job.

Dave

Lukas

Dec 19, 2015, 5:29:02 AM
to fds...@googlegroups.com
Hi,

you can ask the batch system to send a signal before the wall clock time of your job ends. When it does, you just create a stop file and the FDS simulation is then shut down gracefully. This can be embedded in a job chain with dependent jobs, i.e. you set up a chain where each FDS simulation depends on the previous one.

Attached you will find an example which I just stripped from our working pipeline: a) a shell script to set up a job chain (fds-chain.sh) and b) a sample job file that stops the simulation 10 minutes before the wall clock time ends (fds-chain-part.job).

It is written for SLURM, but it can easily be adapted to PBS and will therefore give you an idea of a potential implementation for your setup. If you get stuck, just consult the PBS documentation (now you know what to look for) or ask your admins, who in general know all the tweaks of their batch system.

Best,
Lukas

ps: I might not be responsive for the next couple of weeks...
fds-chain-part.job
fds-chain.sh
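
The attachments are not reproduced in this thread. As an independent sketch of the job chain idea (not Lukas's script), the function below submits the same job file several times with SLURM so that each part starts only after the previous one has ended. The name `submit_chain` is made up; `--parsable` and `--dependency=afterany:<jobid>` are standard `sbatch` options. It assumes the job file itself takes care of setting RESTART=.TRUE. for every part after the first.

```shell
#!/bin/sh
# Hypothetical helper: submit <n> chained copies of one job file.
# Usage: submit_chain <n> <jobfile>
submit_chain() {
    n="$1"; jobfile="$2"
    prev=""
    i=1
    while [ "$i" -le "$n" ]; do
        if [ -z "$prev" ]; then
            # First part has no dependency
            prev=$(sbatch --parsable "$jobfile")
        else
            # Later parts wait until the previous job has ended
            prev=$(sbatch --parsable --dependency=afterany:"$prev" "$jobfile")
        fi
        # --parsable may print "jobid;cluster"; keep only the job id
        prev=${prev%%;*}
        echo "part $i: job $prev"
        i=$((i + 1))
    done
}
```

`afterany` is used rather than `afterok` so that the chain continues even if a part ends with a non-zero exit code at the time limit. On PBS the counterpart should be `qsub -W depend=afterany:<jobid>`; check the local documentation.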

fds-quang

Apr 13, 2016, 4:57:46 AM
to FDS and Smokeview Discussions
Hi Lukas

Thank you for your answer!

Just a small question: where did you declare the 10 minute value to stop the simulation? I don't see it in either of the two files.

Lukas

Apr 14, 2016, 2:41:02 AM
to FDS and Smokeview Discussions
See the line with the '--signal' declaration in the job script. It defines how many seconds before the end of the wall clock time the given signal is sent. The batch script must catch and interpret the signal; here it just creates a *.stop file.

Best,
Lukas
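
A sketch of what such a job file could look like (this is not the attached fds-chain-part.job). `--signal=B:USR1@600` asks SLURM to send USR1 to the batch shell about 600 s before the time limit. The function name `run_until_signal` is made up. The command is started in the background because a shell runs its trap handler only once the foreground command has returned, and `wait` is repeated because it returns early when a trapped signal arrives.

```shell
#!/bin/sh
#SBATCH --time=100:00:00        # wall clock limit from the thread
#SBATCH --signal=B:USR1@600     # USR1 to the batch shell, ~10 min before the limit

# Hypothetical helper: run a command and turn USR1 into <CHID>.stop.
# Usage: run_until_signal <CHID> <command> [args...]
run_until_signal() {
    chid="$1"; shift
    rm -f "${chid}.stop"                  # no stale stop file at start
    trap 'touch "${chid}.stop"' USR1      # signal handler: create the stop file
    "$@" &                                # run the solver in the background
    pid=$!
    # wait returns early on a trapped signal; loop until the command has exited
    while kill -0 "$pid" 2>/dev/null; do
        wait "$pid" || true
    done
    trap - USR1
}

# In a real job file the last line would be something like (assumed names):
# run_until_signal appt_T5_test54 mpirun -np "$SLURM_NTASKS" fds appt_T5_test54.fds
```

SLURM may deliver the signal up to about a minute earlier than requested, so the margin should be chosen generously enough for FDS to write its restart files.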

Mohamed

Apr 14, 2016, 4:15:45 AM
to FDS and Smokeview Discussions
Hi,
 
You can try adding DT_RESTART=50. to your FDS file to save restart files every 50 s. After the end of the first job, just set RESTART=.TRUE. and submit a new job.
It worked for me.
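
The manual step of setting RESTART=.TRUE. between jobs can be scripted. This is a sketch that assumes the input file already contains RESTART=.FALSE. on its MISC line; the function name `enable_restart` is made up. The pattern only matches a RESTART flag set to .FALSE., so a DT_RESTART=50. entry is left untouched.

```shell
#!/bin/sh
# Hypothetical helper: switch RESTART=.FALSE. to RESTART=.TRUE. in an FDS input file.
# Usage: enable_restart <input.fds>
enable_restart() {
    f="$1"
    # Write to a temporary file first; avoids non-portable "sed -i"
    sed 's/RESTART *= *\.FALSE\./RESTART=.TRUE./' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
}
```

For example, `enable_restart appt_T5_test54.fds` would be run once after the first job has ended and before the next one is submitted.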