LSF bsub: Provided memory on cluster node defined by -R and -M flags


Elke Schaper

Dec 26, 2014, 8:08:09 AM
to gc3...@googlegroups.com
Dear all,

this is just to report some LSF behaviour fyi, in case you ever stumble into related problems:

There seem to be two interfering, slightly different ways to request memory for a bsub command: the -R and -M options.
In GC3PIE, the -R option is used. This can lead to interesting effects when the memory requested through -R is above the memory requested through -M, as shown in this LSF log file:

TERM_MEMLIMIT: job killed after reaching LSF memory usage limit.
Exited with exit code 1.

Resource usage summary:

   CPU time :               1.03 sec.
   Max Memory :             21.73 MB
   Average Memory :         21.73 MB
   Total Requested Memory : 10000.00 MB
   Delta Memory :           9978.27 MB
   (Delta: the difference between total requested memory and actual max usage.)
   Max Swap :               340 MB

A solution is to set the -M flag to a high enough value, e.g. in the gc3pie defaults .ini file (on my system, the -M flag uses KB as its unit). For this example:
bsub = bsub -M 10000000
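A minimal sketch of that unit conversion (the helper name is hypothetical; it assumes, as in the example above, that a 10000 MB request maps to `-M 10000000`, i.e. a factor of 1000 KB per MB):

```python
def mem_limit_kb(mem_mb):
    """Convert a memory request in MB to the KB value for bsub's -M flag.

    Assumes 1 MB = 1000 KB, matching the example above where a 10000 MB
    request becomes `bsub -M 10000000`; actual units may vary per cluster.
    """
    return mem_mb * 1000

# e.g. mem_limit_kb(10000) gives 10000000, the value used above
```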


May LSF be with you,

Elke


Riccardo Murri

Jan 6, 2015, 4:44:43 PM
to gc3...@googlegroups.com
Hi Elke,

> There seem to be two interfering, slightly different ways to request memory
> for a bsub command, the -R and -M options.
> In GC3PIE, the -R option is used. This can lead to interesting effects in
> case the requested memory through -R is above the requested memory through
> -M, [...]
> A solution is to set the -M flag to a high enough level, e.g. in the gc3pie
> defaults .ini file (on my system, the -M flag used kb as unit.). For this
> example:
> bsub = bsub -M 10000000

Many thanks for this explanation!

I think our use of `-R rusage[mem=...]` comes from the way the LSF system is
set up on the "Brutus" cluster at ETHZ. We can change this to use `-M`
(instead, or in addition), depending on what the correct LSF way of doing
things is, or on what is most common on LSF clusters out there.

So, LSF users, which option should be used and how?

Thanks,
Riccardo

Riccardo Murri

Jan 6, 2015, 5:03:27 PM
to gc3...@googlegroups.com
Hi Elke, all,

I've done some googling and it seems that options `-M` and `-R rusage[mem=...]`
have different meanings:

* `-M` is for setting an *upper limit* to memory consumption; if a job exceeds
that limit, it is killed. However, man pages for `bsub` state that the
default is no limit -- that is likely why nobody complained so far.

* `-R rusage[mem=...]` is for *reserving* memory: i.e., it tells LSF that the
job needs this much free memory in order to run on a node, so that the
scheduler can select appropriate hardware.

Therefore I would say that the correct behavior should be, as you actually
suggested, to use `-R rusage[mem=...]` and to use `-M` with a *higher* value.

I'm not sure whether it's worth adding a new configuration option to set the
ratio of `-M` to `-R`, or whether we can just pick a "sensible" value, e.g. 2.
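A sketch of how such a ratio could be applied when building the submission flags (the function name is hypothetical, and the units are assumptions: `-R rusage[mem=...]` taken in MB and `-M` in KB, as in Elke's example):

```python
def bsub_memory_flags(mem_mb, ratio=2):
    """Build bsub memory flags: reserve mem_mb MB via -R rusage[mem=...],
    and set the hard kill limit -M to `ratio` times the reservation,
    converted to KB (1 MB = 1000 KB here; units vary across LSF setups).
    """
    limit_kb = mem_mb * ratio * 1000
    return ['-R', 'rusage[mem=%d]' % mem_mb, '-M', str(limit_kb)]

# e.g. bsub_memory_flags(10000) gives
#   ['-R', 'rusage[mem=10000]', '-M', '20000000']
```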

Ciao,
Riccardo

P.S. For the record, here are a few snippets of documentation extracted from man
pages for `bsub` and instruction pages found with a Google search. The
documentation text apparently varies with the LSF version, but the behavior
described seems to be consistent.

-M mem_limit
Set a per-process (soft) process resident set size limit to mem_limit
Kbytes for each of the processes that belong to this batch job (see
getrlimit(2)). The default is no soft limit.

-M mem_limit
Set the total process resident set size limit to
mem_limit KBytes for the whole job. The default is no
limit. Exceeding the limit causes the job to terminate.

-M [memory_limit in KB] : set a memory limit for all the processes that
belong to this batch job. The memory_limit is specified in KB. LSF kills
the job when it exceeds the memory limit. This parameter does not
guarantee memory allocation, it's just a threshold. To reserve memory on
a node, use the -R "rusage[mem=X]" option described below.

-M mem_in_kb
With this option one sets minimum memory requirements for the execution
hosts of a job. This is useful on the Altix (queue gwdg-ia64), where
nodes with 3GB as well as 6GB of RAM are available. The value is in KB
per CPU. Please note that stating a memory requirement with -M is not
the same as making a memory reservation with -R, as it is necessary in
queue gwdg-p690.