Would an addprocs(machinefile, startflag=-p) make sense?


Florian Oswald

Aug 5, 2014, 10:11:52 AM
to juli...@googlegroups.com
I'm working on a cluster where you request the number of nodes (i.e. machines) you want, plus how many processors per machine. The machine file generated by the cluster manager (PBS in this case) looks something like

node1
node1
...
node1
node2
...
node2

i.e. within a node, there are no unique names for CPUs. I'm wondering how best to use addprocs in such a case. It seems what I really want is just one ssh connection (from node1 to node2, say), and then to launch julia on node2 as

julia -p NumberOfCPUsOnNode2

I've been looking at multi.jl and was trying to figure out something along that line, but I could do with a bit of guidance. 
  1. Is what I'm describing reasonable, or is there a better way of working on a cluster set up like this?
  2. What in multi.jl would need to be extended?

gael....@gmail.com

Aug 10, 2014, 4:46:39 AM
to juli...@googlegroups.com
If you got a certain number of nodes thanks to PBS, you really don't want to use only a part of them.

So you can either:

1- Ask PBS for two smaller jobs instead of one big one, and have julia use the machinefile directly with the suitable command line option.

2- Split your machinefile manually and then call julia twice with each of the new files.

Of course, instead of using the command line options, you can use the addprocs function. If you need two cores on node2, you can call addprocs(["node2", "node2"]).
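For option 2, a minimal sketch of splitting the machinefile by node could look like the following. The function name split_machinefile and the output file naming are made up for illustration, and it assumes the file holds one hostname per line, as in the PBS file quoted above.

```shell
# Hypothetical helper (not part of PBS or julia): write one machinefile
# per distinct host found in the input file.
# Assumes one hostname per line, repeated once per core.
split_machinefile() {
    infile=$1
    outdir=$2
    mkdir -p "$outdir"
    # sort -u lists each host once; grep -Fx then copies that host's lines
    # (fixed-string, whole-line match) into a file of its own.
    sort -u "$infile" | while IFS= read -r host; do
        [ -n "$host" ] || continue   # skip blank lines
        grep -Fx -- "$host" "$infile" > "$outdir/machinefile.$host"
    done
}
```

Inside a qsub script this could be called as split_machinefile "$PBS_NODEFILE" ./mf, and each resulting file passed to julia with its --machinefile option.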

gael....@gmail.com

Aug 10, 2014, 5:05:02 AM
to juli...@googlegroups.com
Oh, and yes: if you want to "split by node", you can read the machine file, count the number of cores for a given node, and execute julia directly through ssh in your qsub script with the required -p ncores option.
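A rough sketch of that counting step, for a qsub script, might look like this. The function names cores_per_node and launch_commands and the file script.jl are placeholders of mine, and the machinefile is assumed to have one hostname per line. The second function only prints the ssh commands so they can be checked before anything is launched.

```shell
# Hypothetical helper: print "host ncores" for each distinct host,
# where ncores is the number of times the host appears in the file.
cores_per_node() {
    # uniq -c needs sorted input; awk swaps the columns and drops blank lines.
    sort "$1" | uniq -c | awk 'NF == 2 { print $2, $1 }'
}

# Hypothetical helper: print (not run) one ssh command per node.
# script.jl is a placeholder for whatever julia should execute.
launch_commands() {
    cores_per_node "$1" | while read -r host ncores; do
        echo "ssh $host julia -p $ncores script.jl"
    done
}
```

Calling launch_commands "$PBS_NODEFILE" shows what would be run. Note that each such ssh call starts a separate julia session with its own local workers, so the sessions on different nodes are not connected to each other.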