Thanks for the comprehensive responses (and for Melissa dropping by
yesterday); I think I'm very close to solving this problem.
Here's what works:
AMBER MPI executables with "normal" slurm submission
AMBER MPI executables with BigJob submission
AMBER serial executables with "normal" slurm submission
Here's what gives an AMBER-related error:
AMBER serial executables with BigJob submission
Unfortunately, the specific functionality that I need is ONLY available
with the serial executable (yes, that needs to change, but we don't get
funded to improve old code these days).
The specific error is a failure to find a shared object file (libsvml.so);
I'm told that it is related to the Intel Math Libraries (MKL?). The AMBER
install I am using was compiled against the default environment on
Stampede ("module list" shows this to be intel/13.0.2.146).
configure script also asks for a "MKL_HOME" variable, which I set in my
".profile_user":
export MKL_HOME=/opt/apps/intel/13/composer_xe_2013.3.163/mkl
although "locate libsvml.so" gives multiple "hits" under
/opt/apps/intel/13/composer_xe_2013.0.062/compiler/lib/
So maybe this is not related. I also added the above MKL_HOME variable to
the "environment" of the compute unit description, with the same results.
So what is different in this last scenario? Is BigJob doing something
other than "ibrun <executable> <arguments>"? Presumably something special
must be done if 16 jobs are to be placed on a single node (the -o option
Yaakoub mentioned?).
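If I understand the earlier reply correctly, the command-line equivalent
would look roughly like the following (a sketch only; "my_serial_exe" and its
inputs are placeholders, and I have not verified the exact -n/-o behavior on
Stampede):
# launch 16 single-core tasks within one node's allocation,
# offsetting each task into a different slot
for i in $(seq 0 15); do
    ibrun -n 1 -o $i ./my_serial_exe input.$i > output.$i &
done
wait
Is that roughly what BigJob would need to generate under the hood?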
It also might be relevant that the stdout file in /agent contains an ibrun
usage message. Does that indicate an invalid ibrun invocation, or is it
something that always gets printed?
Brian
>>
>> His first question is regarding ibrun: "Is there a way to trick Stampede
>> into using ibrun to run 16 single-core jobs on a single node?" Can you
>> please tell me if this is possible on the command line? If it is, it's
>> possible we may be able to add similar functionality to BigJob.
>
> ibrun -n -o is one way. ibrun -help will show you more. However, if you
> are using BigJob, I think there's already something in BigJob to do this.
> Going from Lonestar to Stampede, you should not see any difference in the
> way ibrun works.
>
>> Brian also notes that some of his jobs on particular compute nodes are
>> not loading the same environment as on the login nodes (he notes SGE -V is
>> the Lonestar equivalent to ensure this happens - not sure if SLURM has a
>> similar feature?). I wanted to verify that the compute node configurations
>> for Stampede are uniform. If they are indeed uniform, can you please tell
>> me in what order the login nodes and in what order the compute nodes
>> source configuration files (i.e. I am trying to make sure something like
>> .bash_profile isn't overriding .bashrc in one case and not the other)?
>
> This is built into SLURM; no need for an explicit option. Depending on the
> shell used: .profile, .bashrc, .profile_user, or .login, .cshrc,
> .login_user.
>
> Hope this helps. Let me know how it goes.