Running serial jobs on Stampede


rad...@rci.rutgers.edu

Nov 3, 2013, 1:39:11 PM
to bigjob...@googlegroups.com
Is there a way to trick Stampede into using ibrun to run 16 single core
jobs on a single node? I know that's not really a BigJob question, but I'd
also like to know if/how I can do that via BigJob.

On an unrelated note, some of my jobs appear to have trouble loading the
same environment as the login node (I'm thinking of the SGE -V option that
always gets passed on Lonestar). Is this always done on Stampede too or is
there another option I should be toggling with the compute unit
description?

Thanks,
Brian

============================================ Current Address ============
Brian Radak : Rutgers University
PhD candidate - York Research Group : BioMaPS Institute
University of Minnesota - Twin Cities : CIPR 308
Graduate Program in Chemical Physics : 174 Frelinghuysen Road,
Department of Chemistry : Piscataway, NJ 08854
rada...@umn.edu : rad...@biomaps.rutgers.edu
=======================================================================
Sorry for the multiple e-mail addresses, just use the institute
appropriate address.

Melissa Romanus

Nov 5, 2013, 1:30:44 PM
to bigjob...@googlegroups.com
Hi Brian,

I actually have no idea. As you said, this is a system-level thing, not a BigJob thing, so I will ask Yaakoub (our TACC sysadmin POC) regarding ibrun.

For the second half of your question, this is again something I will have to ask Yaakoub (specifically, SGE -V vs. a SLURM alternative). But for the part of the question concerning the CUD, can you please elaborate? What in the environment doesn't get loaded? Is it a problem with a "module load", for example? If it's a specific environment variable, you could always add it to the CUD, but if it's in your bashrc, this should get sourced on the compute nodes. This again comes down to a misconfiguration or a problem with the actual nodes of Stampede, not really a BigJob issue, so Yaakoub will be better poised to answer these kinds of questions. I will also ask him in what order the bash config files are sourced - this was previously an issue on Lonestar.

-Melissa




Melissa Romanus

Nov 5, 2013, 1:37:15 PM
to Yaakoub El Khamra, bigjob...@googlegroups.com, Brian Radak, Shantenu Jha
Hi Yaakoub,

One of our BigJob users at Rutgers, Brian, has transitioned from Lonestar to Stampede. He has some Stampede-specific questions that I was unable to answer, and I was hoping you could help us if possible. I can also open a ticket with TACC if you are too busy - please let me know if that is the case, so I can get back to Brian ASAP.

His first question is regarding ibrun: "Is there a way to trick Stampede into using ibrun to run 16 single core jobs on a single node?" Can you please tell me if this is possible on the command line? If it is, it's possible we may be able to add similar functionality to BigJob.

Brian also notes that some of his jobs on particular compute nodes are not loading the same environment as on the login nodes (he notes SGE -V is the Lonestar equivalent to ensure this happens - not sure if SLURM has a similar feature?). I wanted to verify that the compute node configurations for Stampede are uniform. If they are indeed uniform, can you please tell me in what order the login nodes and in what order the compute nodes source configuration files (i.e. I am trying to make sure something like bash_profile isn't overwriting bashrc in one case and not the other)?

Thank you for your time!

-Melissa 

rad...@rci.rutgers.edu

Nov 7, 2013, 10:36:50 AM
to Yaakoub El Khamra, Melissa Romanus, Shantenu Jha, bigjob...@googlegroups.com
Thanks for the comprehensive responses (and for Melissa dropping by
yesterday), I think I'm very close to solving this problem.

Here's what works:

AMBER MPI executables with "normal" slurm submission
AMBER MPI executables with BigJob submission
AMBER serial executables with "normal" slurm submission

Here's what gives an AMBER related error:

AMBER serial executables with BigJob submission

Unfortunately, the specific functionality that I need is ONLY available
with the serial executable (yes, that needs to change, but we don't get
funded to improve old code these days).

The specific error is a failure to find a shared object file (libsvml.so);
I'm told that it is related to the Intel Math Libraries (MKL?). The AMBER
install I am using was compiled against the default environment on
Stampede ("module list" shows this to be intel/13.0.2.146). AMBER's
configure script also asks for a "MKL_HOME" variable, which I set in my
".profile_user":

export MKL_HOME=/opt/apps/intel/13/composer_xe_2013.3.163/mkl

although "locate libsvml.so" gives multiple "hits" under

/opt/apps/intel/13/composer_xe_2013.0.062/compiler/lib/

So maybe this is not related. I also added the above MKL_HOME variable to
the "environment" of the compute unit description, with the same results.

So what is different in this last scenario? Is BigJob doing something
different other than "ibrun <executable> <arguments>"? Presumably
something special must be done if 16 jobs are to be placed on a single
node (the -o option Yaakoub mentioned?).
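For what it's worth, here is a rough sketch of what that by-hand placement might look like on the command line, assuming ibrun's -n (task count) and -o (rank offset) flags work as Yaakoub described; the gen_cmds helper and the executable name are invented for illustration, and the loop only prints the commands instead of launching them:

```shell
# Hypothetical sketch: one single-core ibrun invocation per slot 0..15.
# Drop the 'echo' (and background each command with '&', then 'wait')
# to actually launch the 16 tasks on the node.
gen_cmds() {
    exe="$1"
    for i in $(seq 0 15); do
        # -n 1: run one task; -o $i: offset it onto rank/slot i
        echo "ibrun -n 1 -o $i $exe"
    done
}
gen_cmds ./sander   # prints the 16 commands
```

Whether BigJob issues something equivalent under the hood is exactly the open question.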

It also might be relevant that the stdout file in /agent contains an ibrun
usage message. Is this in response to an unexpected usage attempt or just
something that happens?

Brian

> His first question is regarding ibrun: "Is there a way to trick Stampede
> into using ibrun to run 16 single core jobs on a single node?" Can you
> please tell me if this is possible on the command line? If it is, it's
> possible we may be able to add similar functionality to BigJob.
>
> ibrun -n -o is one way. ibrun -help will show you more. However, if you
> are using BigJob, I think there's already something to do this in BigJob.
> Going from Lonestar to Stampede, you should not see any difference in the
> way ibrun works.
>
> Brian also notes that some of his jobs on particular compute nodes are
> not loading the same environment as on the login nodes (he notes SGE -V is
> the Lonestar equivalent to ensure this happens - not sure if SLURM has a
> similar feature?). I wanted to verify that the compute node configurations
> for Stampede are uniform. If they are indeed uniform, can you please tell
> me in what order the login nodes and in what order the compute nodes
> source configuration files (i.e. I am trying to make sure something like
> bash_profile isn't overwriting bashrc in one case and not the other)?
>
> This is built into SLURM; no need for an explicit option. Depending on
> the shell used: .profile, .bashrc, .profile_user; or .login, .cshrc,
> .login_user.
>
> Hope this helps. Let me know how it goes.
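Tying Yaakoub's list of startup files together: for bash, login shells typically read ~/.bash_profile (or ~/.profile) but not ~/.bashrc, while non-login interactive shells read only ~/.bashrc. One common way to keep login and compute-node environments in sync is to have the login file source .bashrc. A minimal sketch, assuming stock bash startup behavior rather than anything Stampede-specific:

```shell
# Possible contents of ~/.bash_profile (or ~/.profile):
# pull in ~/.bashrc so login shells see the same exports (MKL_HOME, etc.)
# that non-login compute-node shells get.
if [ -f "$HOME/.bashrc" ]; then
    . "$HOME/.bashrc"
fi
```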

rad...@rci.rutgers.edu

Nov 7, 2013, 5:06:26 PM
to Yaakoub El Khamra, Melissa Romanus, Shantenu Jha, bigjob...@googlegroups.com
AMBER looks for its own variable called MKL_HOME, maybe in addition to
MKLROOT? I also tried mucking with LD_LIBRARY_PATH, but perhaps in the
wrong fashion.

The binary is:
/home1/01992/radakb/amber12/bin/sander

and

>> ldd `which sander` | grep libsvml
libsvml.so =>
/opt/apps/intel/13/composer_xe_2013.2.146/compiler/lib/intel64/libsvml.so
(0x00002b216eb80000)
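One way to act on that ldd line is to pull the resolved directory out and prepend it to LD_LIBRARY_PATH before submitting, along the lines Yaakoub suggests. A sketch - the lib_dir_of helper is invented here, and the line is just the ldd output above:

```shell
# Hypothetical helper: take an ldd line of the form
# "libsvml.so => /path/to/libsvml.so (0x...)" and return its directory.
lib_dir_of() {
    echo "$1" | awk '{print $3}' | xargs dirname
}

line='libsvml.so => /opt/apps/intel/13/composer_xe_2013.2.146/compiler/lib/intel64/libsvml.so (0x00002b216eb80000)'
dir=$(lib_dir_of "$line")

# Prepend to LD_LIBRARY_PATH, preserving any existing value
export LD_LIBRARY_PATH="$dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```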


In any event, apparently running this in parallel DOES work for the
functionality I want, and that seems to run without error on Stampede. For
the present I will file this as "solved", although I am suspicious of the
AMBER code, independent of BigJob or Stampede; that's clearly not your
problem. If it turns out to not be doing what I want, then I may have to
revisit this.

Thanks,
Brian


> You want to edit LD_LIBRARY_PATH, not set MKL_HOME. Also, it is MKLROOT,
> not MKL_HOME.
>
> Can you point me to the binary please? Can you do a quick ldd on it,
> ldd <binary> |grep libsvml
>
> That should tell you which directory to add to LD_LIBRARY_PATH.
>
> Regards
> Yaakoub
>
>
>
> Regards
> Yaakoub El Khamra