I am interested in the status of XCPU, and in the long-term goals and
future plans for the project.
Thanks!
--
Greg M. Kurtzer
Chief Technology Officer
HPC Systems Architect
Infiscale, Inc. - http://www.infiscale.com
Hi Greg.
I am afraid that ssh rules. My feeling after 5 years of xcpu and 9
years of bproc is that people really want their ssh on a cluster. It
scales well enough for the small stuff (64 nodes or fewer) that
constitutes most systems out there, and people don't care enough about
scaling to large systems. It gives them a familiar environment.
Note that the fastest machine on the planet, the Oak Ridge Jaguar
system, runs sshd on every node. I had xcpu running on my XT4, and
demo'd it, and it always came back to: "But it doesn't look like ssh".
I think any future job spawning system for clusters has to either be
ssh or feel enough like ssh that nobody knows the difference.
ron
-eric
Look and feel like SSH?? Are these the same people that don't use a
scheduler/resource manager?
I assume that this fate includes XGET...
Greg
--
Thanks,
Lucho
> Look and feel like SSH?? Are these the same people that don't use a
> scheduler/resource manager?
They're the same people who want to run emacs on 10,000 cluster nodes.
They're our customers, sadly, and they still do things like run
editors written in Fortran.
Greg, we need to get into a different business where people are more
open to change.
Henry Ford put it best, I paraphrase: "Had I listened to my customers
I would have made better buggy whips".
>
> I assume that this fate includes XGET...
Well, I am taking another look at beoboot. Beoboot did a really good
job for what we needed and scaled very well indeed.
I'm actually moving on a bit. We're looking at what it takes to boot
10M kernels and it's pretty clear to me that nothing we've done to
date is really up to snuff.
Thanks
ron
Another option would be to provide side-by-side options so that people
can transition at their leisure. SSH for those that want familiarity,
and something like XCPU for those that want to actually use their big
systems. Many LANL users eventually got used to our BProc systems,
but we should have made the transition much smoother for them (e.g.
run the system as lightweight BProc, but provide a full distro file
system and start up sshd on each node).
People don't expect to be able to ssh into individual cores, but
unfortunately they don't tend to think of clusters the same way,
unless it is something like an Altix.
--
Andrew Shewmaker
-eric
> Can I ask the stupid question of what behavior is different with ssh?
> It seems like ssh is a subset of xcpu functionality so why couldn't
> that look/feel be provided side-by-side? (disclaimer: I don't
> understand why someone would want ssh functionality, but I don't see
> where xcpu falls short of that)
I think the core of compatibility is rsh, actually. There are tons of
programs, automation, and other things that work by assuming you can
say <prefix cmd> <command to run on other machine>.
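The convention Eric describes can be sketched in a few lines of shell. The `RUN` variable here is a stub (a plain `echo`) so the sketch runs anywhere; in real use it would be something like `ssh node` or `bpsh node`, and the point is that the rest of the command line is untouched:

```shell
#!/bin/sh
# Hypothetical illustration of the <prefix cmd> <command> convention shared
# by rsh, ssh, bpsh, and xrx. RUN is a stub so nothing actually runs remotely.
RUN="echo node1:"        # in real use: RUN="ssh node1" or RUN="bpsh node1"
$RUN tar czf /tmp/backup.tgz /etc
```

Because automation only prepends the prefix and leaves the command untouched, any tool that speaks this shape can be dropped in for rsh.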
Eric
> I think the core of compatibility is rsh actually.
True. But even mentioning something like rsh in the USG nowadays is a
bad thing to do. So you say "ssh" and it bypasses all the
cybersecurity mental filters. "Oh, ssh, that's secure".
ron
I hate to say it but one thing missing is ptys.
ron
That's easy enough to emulate, what specifically? So you can run
vi/emacs over the xcpu connection?
-eric
> xcpu has both rsh/ssh syntax compatibility and security. there is something else missing, i guess :)
I know what I missed in bproc was the ability to run basic shell
scripts. When I was looking at that, I figured I could provide that
capability with 10-20 MB of basic binaries, and much less if I used
busybox.
My gut feel says you have to demonstrate a lot of benefit to convince
people to leave their creature comforts behind.
Eric
xrx n[1-100] /etc/init.d/cups start
They were used to
ssh node cmd
and if cmd was a shell script,
ssh node script
Now on bproc (bpsh) we learned that asking people to do this instead:
./script
and change commands in script from:
command
to
bpsh node-list command
in essence, to turn the script inside out, was a big hurdle for many
folks, and they did not like it, *even if it was only one line to
change*, and *even if it gave them a 1000-fold or greater performance
improvement*. I am not making this up.
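The "inside-out" change above can be sketched directly. Here bpsh is stubbed as a shell function so the sketch runs anywhere; the real bpsh takes a node list and runs the command on those nodes:

```shell
#!/bin/sh
# Stub bpsh for illustration: print the node list, then run the command
# locally. The real bpsh executes the command on the listed nodes.
bpsh() { echo "on nodes $1:"; shift; "$@"; }

# before: the whole script was copied and run remotely ("ssh node script"),
# so its commands appeared plain, e.g.:
#   hostname
# after: the script runs once on the head node and each command is prefixed:
bpsh 0-99 hostname
```

The script itself no longer travels to the nodes; only the individual commands do, which is the one-line change users balked at.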
People want to ssh in and have a full system, with command history and
all that jazz. This has other implications.
And I hate to say it, but people here at SNL who run clusters for a
living have found xcpu hard to set up and use. Performance is still
disappointing and really lags bproc by quite a bit.
Setup difficulty was also true for bproc -- it had a kernel footprint,
keeping it all working was pretty awful, and it could not handle even
minor heterogeneity; e.g., a Geode and a P4 were not usable as one
bproc cluster.
No matter what, xcpu2 has to be as easy to set up and use as ssh, and
"different even if better" translates to "harder" for most people.
Anyway, still tired from travel but hope this is not too incoherent.
ron
Yes, but they support the type of things Eric B. is mentioning. I
think what Eric is saying is correct.
ron
Lucho
-eric
xcpu2 is the way forward. The remaining issues are making it more
familiar to sysadmins: setup, performance, and so on. But it's all
doable.
ron
I think it would be useful to have some well defined requirements
here. What about it do sysadmins find difficult to configure?
My experience is there are two things that one needs to know how to do:
a) setup authentication
b) manage the machine list
ssh gets around (a) by using LDAP or some other local auth. xcpu2
could do something similar, but there are some folks who use xcpu2
locally specifically because they don't have to get involved with an
outside userid/authentication mechanism.
ssh doesn't get around (b) either, but it does seem like it would be
nice to have some easier mechanism for maintaining this sort of
information. Something like zeroconf could help on a local network,
but many of the networks we deploy xcpu2 on are segmented.
I guess (b) can be worked around by using another workload management
system on top of xcpu/ssh which handles the management/monitoring of
physical resources.
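At its simplest, (b) is a flat file of node names that a prefix command iterates over. A minimal sketch; the file name and the xrx invocation are placeholders, not a fixed xcpu2 convention (the commands are echoed rather than run):

```shell
#!/bin/sh
# Hypothetical machine-list file; real deployments would generate or
# distribute this per network segment.
cat > /tmp/nodes.txt <<EOF
n001
n002
n003
EOF

# Fan a command out over the list. echo stands in for actually running xrx.
while read -r node; do
    echo "xrx $node /etc/init.d/cups status"
done < /tmp/nodes.txt
```

Anything fancier (zeroconf, a registry) is essentially a way of producing and refreshing this list without hand-editing it on every segment.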
Something like a distributed registry with distributed auth could help
solve some of these problems, but you'll likely need to configure at
least one node per network segment with the right information (auth
server and registry server), and then everyone else could pick it up
with zeroconf. Alternatively, you could point all nodes to a
hierarchical parent, which would eventually fully connect a
hierarchical tree of nodes through which the auth/registry/monitoring
information could be distributed.
-eric
> I think it would be useful to have some well defined requirements
> here. What about it do sysadmins find difficult to configure?
>
> My experience is there are two things that one needs to know how to do:
>
> a) setup authentication
> b) manage the machine list
>
> ssh gets around (a) by using LDAP or some other local auth. xcpu2
> could do something similar, but there are some folks that use xcpu2
> locally specifically because they don't have to get involved with an
> outside userid/authentication mechanism
At one point I was tasked to try out Scali MPI on a BProc cluster, but
its daemons required PAM. Torque and slurm also manage authentication
using PAM modules. Is there a module so those sorts of things can look
to xcpu2 for authentication? Or can xcpu2 use PAM auth instead of its
own?
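For reference, the SLURM mechanism Andrew mentions is an ordinary PAM stack entry: pam_slurm is an account module that denies logins on nodes where the user has no active job. Whether xcpu2 could ship an analogous module is exactly the open question here; the fragment below shows the existing SLURM shape only as a point of comparison, and the file chosen is illustrative:

```
# /etc/pam.d/sshd (excerpt): permit logins only for users with an
# active SLURM job on this node
account    required     pam_slurm.so
```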
--
Andrew Shewmaker
Actually, so you can run bash :/
-JE
I run bash with xcpu2 without a problem.
-eric
The unfortunate side of that is it requires a shared distributed file
system or shared auth mechanisms to be present, which means you need
something more than the drone systems we currently deploy with xcpu2,
which are much easier to manage.
However, that being said, there is no reason (that I can see) why
xcpu2 couldn't support both sorts of environments.
-eric
We don't necessarily use a shared distributed file system for things
like system keys. Since they don't change often, we may put them into
a RAM root image and perhaps update them with a tree'd remote copy.
I want to clarify what I said before, since I combined authentication
and account authorization. In addition to something like ssh key
authentication, resource managers like torque use PAM to determine
which accounts are active on a node at a given time.
Now, I'm not particularly fond of any of the existing resource
managers, so I would be content if a scheduler (Moab in our case)
talked directly to xcpu2. We also need tight integration with MPI
implementations. Currently we have a situation where the resource
manager has to establish connections to all of the nodes in an
allocation, then MPI has to do the same sort of wireup. I understand
that it is non-trivial to get Open MPI to utilize xcpu.
--
Andrew Shewmaker
To me the big missing pieces were the scheduler and MPI. Even though
mvapich was kind of working, it never really got debugged enough, and
the bjs port remained buggy too. If those two had worked properly, the
cluster would still be running xcpu.
I am not managing it anymore, so it has gone to caoslinux with torque/maui.
I still have a couple of bproc clusters running...
Now we manage a 4,000 node cluster using Moab, xCAT, and
diskless/stateless booting, but with a "real" OS image on every node.
It works, even if it is ugly. Enough said...
Daniel