Re: [Rocks-Discuss] ssh takes forever [SOLVED]

tomisla...@gmx.com

Oct 21, 2010, 2:11:27 PM
to npaci-rocks...@sdsc.edu
Hi everyone, thanks for the help.

I've followed the instructions: I edited /etc/ssh/sshd_config, set UseDNS to "no", and restarted sshd with /etc/init.d/sshd restart. It works really fast now. :)
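
For the archive, the change boils down to this line in /etc/ssh/sshd_config:

UseDNS no

followed by a restart (as root):

# /etc/init.d/sshd restart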

Tomislav

> ----- Original Message -----
> From: npaci-rocks-dis...@sdsc.edu
> Sent: 10/20/10 09:00 PM
> To: npaci-rocks...@sdsc.edu
> Subject: npaci-rocks-discussion Digest, Vol 51, Issue 20
>
> Send npaci-rocks-discussion mailing list submissions to
> npaci-rocks...@sdsc.edu
>
> To subscribe or unsubscribe via the World Wide Web, visit
> https://lists.sdsc.edu/mailman/listinfo/npaci-rocks-discussion
> or, via email, send a message with subject or body 'help' to
> npaci-rocks-dis...@sdsc.edu
>
> You can reach the person managing the list at
> npaci-rocks-di...@sdsc.edu
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of npaci-rocks-discussion digest..."
>
>
> Today's Topics:
>
>  1. Queue instance in Ss state (Doll, Margaret Ann)
>  2. Re: looking for simple ROCKS sysadmin training materials
>  (jean-francois prieur)
>  3. Re: Queue instance in Ss state (Mike Hanby)
>  4. Dreaded "choose language" on the nodes (Yoon Tiem Leong)
>  5. Re: looking for simple ROCKS sysadmin training materials
>  (Nick Holway)
>  6. Re: Queue instance in Ss state (Doll, Margaret Ann)
>  7. Re: Anyone use the ticket/share system on grid engine?
>  (Steven Dick)
>  8. Re: Anyone use the ticket/share system on grid engine?
>  (David Noriega)
>  9. Re: Processor limits for MX (Myrinet) (Ray Muno)
>  10. Re: Queue instance in Ss state (Ian Kaufman)
>  11. Re: Queue instance in Ss state (Doll, Margaret Ann)
>  12. Re: Processor limits for MX (Myrinet) (Philip Papadopoulos)
>  13. Re: ssh takes forever (Philip Papadopoulos)
>  (tomisla...@gmx.com)
>  14. 3. Re: profiling the cluster (newb)(Scott L. Hamilton)
>  (tomisla...@gmx.com)
>  15. Re: ssh takes forever (Philip Papadopoulos) (Joe Landman)
>  16. Re: ssh takes forever (Philip Papadopoulos) (Ian Kaufman)
>  17. Re: ssh takes forever (Philip Papadopoulos) (Kevin Doman)
>  18. Fully Qualified Domain Name Change (Jim Kress)
>  19. Re: ssh takes forever (Philip Papadopoulos) (Greg Bruno)
>  20. Call a shell script from extend-compute.xml
>  (Karengin, Mr. Dean, Contractor, Code 7501.1)
>  21. Re: SGE, slots, and affinity (Lino García Tarrés)
>  22. LDAP Authentication in ROCKS 5.3 x86_64
>  (Karengin, Mr. Dean, Contractor, Code 7501.1)
>  23. Re: SGE, slots, and affinity (Noam Bernstein)
>  24. Re: Call a shell script from extend-compute.xml (Bart Brashers)
>  25. Re: Dreaded "choose language" on the nodes (Greg Bruno)
>  26. Re: Call a shell script from extend-compute.xml
>  (Philip Papadopoulos)
>  27. Re: LDAP Authentication in ROCKS 5.3 x86_64 (Ian Kaufman)
>  28. Re: Dreaded "choose language" on the nodes (Bart Brashers)
>  29. Re: Call a shell script from extend-compute.xml (Larry Baker)
>  30. ganglia & profiling (tomisla...@gmx.com)
>  31. Re: Call a shell script from extend-compute.xml
>  (Karengin, Mr. Dean, Contractor, Code 7501.1)
>  32. Re: Call a shell script from extend-compute.xml
>  (Karengin, Mr. Dean, Contractor, Code 7501.1)
>  33. Re: ganglia & profiling (Greg Bruno)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 19 Oct 2010 15:14:39 -0400
> From: "Doll, Margaret Ann" <margar...@brown.edu>
> Subject: [Rocks-Discuss] Queue instance in Ss state
> To: ROCKS <npaci-rocks...@sdsc.edu>
> Message-ID:
> <AANLkTi=JLu-ZQz0LwrJ16aRMfkYFJWeTcJfzN6M=T5...@mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> I have tried to remove the Ss status from the queue instance by using qmon
> and clicking on force and resume. That does not change the status. I
> rebooted the host on which the queue instance exists; that did not resume the queue.
>
> There are no jobs in the queue instance.
>
> I am running ROCKS 5.0, CentOS (kernel 2.6.18-53.1.14.el5), Grid Engine 6.1u4
>
>
> ------------------------------
>
> Message: 2
> Date: Tue, 19 Oct 2010 16:16:54 -0400
> From: jean-francois prieur <jfpr...@gmail.com>
> Subject: Re: [Rocks-Discuss] looking for simple ROCKS sysadmin
> training materials
> To: Discussion of Rocks Clusters <npaci-rocks...@sdsc.edu>
> Message-ID:
> <AANLkTim-jb5Gp=QjBUW12dQHUBEAe...@mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> I have nothing useful to add ;) but I just wanted to thank you for your SGE
> training presentations that you made public. Very useful resource.
>
> Regards,
> JF Prieur
>
> On 19 October 2010 14:09, Chris Dagdigian <d...@sonsorol.org> wrote:
>
> >
> > Just wanted to thank everyone for the rapid feedback, in particular Phil
> > and others who pointed me to the ROCKS-A-Palooza presentations - I'm going
> > to use these materials to drive a basic half-day hackfest/workshop on ROCKS
> > admin using a lot of interactive and "How do I do X?" type examples on an
> > active cluster.
> >
> > Regards,
> > Chris
> >
> >
> >
>
>
> ------------------------------
>
> Message: 3
> Date: Tue, 19 Oct 2010 17:57:25 -0500
> From: Mike Hanby <mha...@uab.edu>
> Subject: Re: [Rocks-Discuss] Queue instance in Ss state
> To: "'Discussion of Rocks Clusters'" <npaci-rocks...@sdsc.edu>
> Message-ID:
> <A72C1C64C331B445A593C...@UABEXMBS3.ad.uab.edu>
> Content-Type: text/plain; charset="us-ascii"
>
> Does the following command resolve the issue (for example, say it's compute-0-1)?
>
> qmod -usq all.q@compute-0-1
>
>
>
> -----Original Message-----
> From: npaci-rocks-dis...@sdsc.edu [mailto:npaci-rocks-dis...@sdsc.edu] On Behalf Of Doll, Margaret Ann
> Sent: Tuesday, October 19, 2010 2:15 PM
> To: ROCKS
> Subject: [Rocks-Discuss] Queue instance in Ss state
>
> I have tried to remove the Ss status from the queue instance by using qmon
> and clicking on force and resume. That does not change the status. I
> rebooted the host on which the queue instance exists; that did not resume the queue.
>
> There are no jobs in the queue instance.
>
> I am running ROCKS 5.0, CentOS (kernel 2.6.18-53.1.14.el5), Grid Engine 6.1u4
>
>
> ------------------------------
>
> Message: 4
> Date: Wed, 20 Oct 2010 10:41:11 +0800
> From: Yoon Tiem Leong <tly...@usm.my>
> Subject: [Rocks-Discuss] Dreaded "choose language" on the nodes
> To: "npaci-rocks...@sdsc.edu"
> <npaci-rocks...@sdsc.edu>
> Message-ID: <EF015C6CB58956498F829...@MBX1.usm.my>
> Content-Type: text/plain; charset="iso-8859-1"
>
> I experienced a similar problem. It seems that the compute node can't get the kickstart file from the frontend. What I did was enable https on the frontend (check https in System -> Security and Firewall) to see if the compute node could find the kickstart key again. It worked for me.
>
> tl
>
> >It's not quite the problem I had, but when I had a 'choose language'
> >problem while kickstarting my nodes, it was something about the path
> >of the img files on the frontend node.
>
> On Mon, Sep 20, 2010 at 10:40 AM, Frank Bures <lisfrank at chem.toronto.edu> wrote:
> > After a power loss I am running into the dreaded "choose language"
> > situation on all nodes.
> >
> > I tried to delete the nodes with insert-ethers and then re-register them, with no effect.
> >
> > I tried "xmllint -noout" on all xml files in
> > /export/rocks/install/site-profiles/5.3/nodes/
> > The files are clean.
> >
> > rocks list host profile compute-0-2
> >
> > returns
> >
> > Traceback (most recent call last):
> > File "/opt/rocks/bin/rocks", line 264, in ?
> > command.runWrapper(name, args[i:])
> > File "/opt/rocks/lib/python2.4/site-packages/rocks/commands/__init__.py",
> > line 1774, in runWrapper
> > self.run(self._params, self._args)
> > File
> > "/opt/rocks/lib/python2.4/site-packages/rocks/commands/list/host/profile/__init__.py",
> > line 273, in run
> > [
> > File "/opt/rocks/lib/python2.4/site-packages/rocks/commands/__init__.py",
> > line 1467, in command
> > o.runWrapper(name, args)
> > File "/opt/rocks/lib/python2.4/site-packages/rocks/commands/__init__.py",
> > line 1774, in runWrapper
> > self.run(self._params, self._args)
> > File
> > "/opt/rocks/lib/python2.4/site-packages/rocks/commands/list/host/xml/__init__.py",
> > line 189, in run
> > xml = self.command('list.node.xml', args)
> > File "/opt/rocks/lib/python2.4/site-packages/rocks/commands/__init__.py",
> > line 1467, in command
> > o.runWrapper(name, args)
> > File "/opt/rocks/lib/python2.4/site-packages/rocks/commands/__init__.py",
> > line 1774, in runWrapper
> > self.run(self._params, self._args)
> > File
> > "/opt/rocks/lib/python2.4/site-packages/rocks/commands/list/node/xml/__init__.py",
> > line 511, in run
> > handler.parseNode(node, doEval)
> > File "/opt/rocks/lib/python2.4/site-packages/rocks/profile.py", line 388,
> > in parseNode
> > xml = handler.getXML()
> > File "/opt/rocks/lib/python2.4/site-packages/rocks/profile.py", line 916,
> > in getXML
> > return self.getXMLHeader() + string.join(self.xml, '')
> > File "/opt/rocks/lib/python2.4/string.py", line 318, in join
> > return sep.join(words)
> > UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 50:
> > ordinal not in range(128)
> >
> > Any help would be greatly appreciated.
> >
> > Thanks
> > Frank
>
>
> ------------------------------
>
> Message: 5
> Date: Wed, 20 Oct 2010 08:48:31 +0100
> From: Nick Holway <nick....@gmail.com>
> Subject: Re: [Rocks-Discuss] looking for simple ROCKS sysadmin
> training materials
> To: Discussion of Rocks Clusters <npaci-rocks...@sdsc.edu>
> Message-ID:
> <AANLkTikBaq1v-X3s=0NF0YS9qw8Lrm...@mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> On 19 October 2010 19:09, Chris Dagdigian <d...@sonsorol.org> wrote:
> >
> > Just wanted to thank everyone for the rapid feedback, in particular Phil and
> > others who pointed me to the ROCKS-A-Palooza presentations - I'm going to
> > use these materials to drive a basic half-day hackfest/workshop on ROCKS
> > admin using a lot of interactive and "How do I do X?" type examples on an
> > active cluster.
>
> Has anyone pointed you in the direction of Rocks' wiki at
> https://wiki.rocksclusters.org/wiki/index.php/Main_Page ?
>
> Nick
>
>
> ------------------------------
>
> Message: 6
> Date: Wed, 20 Oct 2010 08:37:45 -0400
> From: "Doll, Margaret Ann" <margar...@brown.edu>
> Subject: Re: [Rocks-Discuss] Queue instance in Ss state
> To: Discussion of Rocks Clusters <npaci-rocks...@sdsc.edu>
> Message-ID:
> <AANLkTikLLUk5mb08Q53HA...@mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> No. There was no change in the status of the queue instance. This queue
> instance in the het queue is part of a subordinate queue het-2hr.
>
> [root@ted g03]# qmod -usq het@compute-0-31
> [root@ted g03]# qmod -usq h...@compute-0-31.local
> [root@ted g03]# qmod -usq het...@compute-0-31.local
> Queue instance "het...@compute-0-31.local" is already in the specified
> state: unsuspended
> [root@ted g03]# qmod -usq h...@compute-0-31.local
>
>
> On Tue, Oct 19, 2010 at 6:57 PM, Mike Hanby <mha...@uab.edu> wrote:
>
> > Does the following command resolve the issue (for example, say it's
> > compute-0-1)?
> >
> > qmod -usq all.q@compute-0-1
> >
> >
> >
> > -----Original Message-----
> > From: npaci-rocks-dis...@sdsc.edu [mailto:
> > npaci-rocks-dis...@sdsc.edu] On Behalf Of Doll, Margaret Ann
> > Sent: Tuesday, October 19, 2010 2:15 PM
> > To: ROCKS
> > Subject: [Rocks-Discuss] Queue instance in Ss state
> >
> > I have tried to remove the Ss status from the queue instance by using qmon
> > and clicking on force and resume. That does not change the status. I
> > rebooted the host on which the queue instance exists; that did not resume the queue.
> >
> > There are no jobs in the queue instance.
> >
> > I am running ROCKS 5.0, CentOS (kernel 2.6.18-53.1.14.el5), Grid Engine 6.1u4
>
>
> ------------------------------
>
> Message: 7
> Date: Wed, 20 Oct 2010 08:51:39 -0400
> From: Steven Dick <kg4...@gmail.com>
> Subject: Re: [Rocks-Discuss] Anyone use the ticket/share system on
> grid engine?
> To: Discussion of Rocks Clusters <npaci-rocks...@sdsc.edu>
> Message-ID:
> <AANLkTinWiCxBAdN3BcJHv...@mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> I've played with all the share balancing methods with SGE, and have put
> weights in all of them.
>
> Share tree is the most flexible, but seems to only account for jobs that
> have completed, not ones currently running.
> It's kind of hard to tell for sure, as it _is_ based on a running average.
>
> Functional is based only on currently running jobs, so it balances share
> tree nicely.
>
> I also use quotas, to put a max limit on users, so that one user can't take
> over the whole cluster with a job that takes a week to finish.
>
> I also set up a "short" queue, which limits jobs to one day, but bypasses
> all of the quotas, and relies on the share tree to still keep things fair.
> I have a quota set up so that there's always at least one node available to
> run short jobs.
>
> This seems to keep my users happy. Everyone gets a fair chance; one user
> can't hog the whole cluster just because they got there first. Even when
> the cluster is full, and there's a big backlog of pending jobs, a user who
> hasn't run anything in a week typically can still get at least one job
> started in a day or so (because their job will run next when someone else's
> completes). Users running test jobs and debug jobs before they do the real
> job usually don't have to wait at all as long as they remember to use the
> short queue.
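>
> A rough sketch of the quota part (queue and host names here are just
> examples; shown in "qconf -srqs" form):
>
> {
>    name         per_user_slots
>    description  "cap any single user outside the short queue"
>    enabled      TRUE
>    limit        users {*} queues !short.q to slots=32
> }
> {
>    name         reserve_short_node
>    description  "keep one node free for short jobs only"
>    enabled      TRUE
>    limit        queues !short.q hosts compute-0-0 to slots=0
> }
>
> The one-day limit on the short queue itself is just h_rt set to 24:00:00
> in its queue configuration (qconf -mq).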
>
>
> ------------------------------
>
> Message: 8
> Date: Wed, 20 Oct 2010 09:41:19 -0500
> From: David Noriega <tsk...@my.utsa.edu>
> Subject: Re: [Rocks-Discuss] Anyone use the ticket/share system on
> grid engine?
> To: Discussion of Rocks Clusters <npaci-rocks...@sdsc.edu>
> Message-ID:
> <AANLkTimDXCEWLNvMfH1RKOPW9GTDtsN1Z39==i4U...@mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Your setup is the one we've been thinking about. Can you describe how
> you set up your 'short' queue? Was that done using limits?
>
> But functional looks like the way I'll go. I suppose I'll set up a
> default project with some amount of shares, while those who request
> priority could be part of another project with more shares. Am I
> correct in my thinking?
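>
> Something like this is what I'm imagining (project names and share
> values are made up), with the functional shares carried by SGE projects:
>
> # qconf -sprj default
> name     default
> oticket  0
> fshare   100
> acl      NONE
> xacl     NONE
>
> # qconf -sprj priority
> name     priority
> oticket  0
> fshare   500
> acl      NONE
> xacl     NONE
>
> Priority users would then submit with "qsub -P priority ...".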
>
>
>
> On Wed, Oct 20, 2010 at 7:51 AM, Steven Dick <kg4...@gmail.com> wrote:
> > I've played with all the share balancing methods with SGE, and have put
> > weights in all of them.
> >
> > Share tree is the most flexible, but seems to only account for jobs that
> > have completed, not ones currently running.
> > It's kind of hard to tell for sure, as it _is_ based on a running average.
> >
> > Functional is based only on currently running jobs, so it balances share
> > tree nicely.
> >
> > I also use quotas, to put a max limit on users, so that one user can't take
> > over the whole cluster with a job that takes a week to finish.
> >
> > I also set up a "short" queue, which limits jobs to one day, but bypasses
> > all of the quotas, and relies on the share tree to still keep things fair.
> > I have a quota set up so that there's always at least one node available to
> > run short jobs.
> >
> > This seems to keep my users happy. Everyone gets a fair chance; one user
> > can't hog the whole cluster just because they got there first. Even when
> > the cluster is full, and there's a big backlog of pending jobs, a user who
> > hasn't run anything in a week typically can still get at least one job
> > started in a day or so (because their job will run next when someone else's
> > completes). Users running test jobs and debug jobs before they do the real
> > job usually don't have to wait at all as long as they remember to use the
> > short queue.
> >
>
>
>
> --
> Personally, I liked the university. They gave us money and facilities,
> we didn't have to produce anything! You've never been out of college!
> You don't know what it's like out there! I've worked in the private
> sector. They expect results. -Ray Ghostbusters
>
>
> ------------------------------
>
> Message: 9
> Date: Wed, 20 Oct 2010 10:23:12 -0500
> From: Ray Muno <mu...@aem.umn.edu>
> Subject: Re: [Rocks-Discuss] Processor limits for MX (Myrinet)
> To: Discussion of Rocks Clusters <npaci-rocks...@sdsc.edu>
> Message-ID: <4CBF0960...@aem.umn.edu>
> Content-Type: text/plain; charset=ISO-8859-1
>
> On 10/19/2010 10:02 AM, Philip Papadopoulos wrote:
> > We needed/wanted to build an mx roll for ourselves in Triton.
> > We (obviously) can't give you the MX source code, but you can download the
> > current roll source, drop in your version of MX (edit version files
> > appropriately),
> > and build a new roll.
> >
> > web site for this roll source is
> > http://git.rocksclusters.org/cgi-bin/gitweb.cgi?p=triton/myrinet_mx.git/.git;a=summary
> >
> To download, you will need to use git. This is an experimental site for us
> to supply unsupported roll sources. You will need to install git and then
> point at this repository for a simple download (we're working on providing
> source tarballs, too).
> >
> > -P
> >
>
> OK, I have the above-mentioned Roll template.
>
> I can build a Roll, following the docs I find online. I am ending up
> with a Roll ISO, but all it does, in the end, is make an RPM that
> installs the source tarball in /opt/mx.
>
> I do get the proper configs for startup on each of the nodes (to set
> mx_max_endpoints and mx_max_nodes). At least I got that part right.
>
> This is my first adventure into building a custom roll. It is a bit of
> a misadventure at this point. I must be missing something obvious...
>
> The Myrinet source is an easy build (configure, make, make install).
> --
>
>  Ray Muno
>  University of Minnesota
>
>
> ------------------------------
>
> Message: 10
> Date: Wed, 20 Oct 2010 08:33:09 -0700
> From: Ian Kaufman <ikau...@soe.ucsd.edu>
> Subject: Re: [Rocks-Discuss] Queue instance in Ss state
> To: Discussion of Rocks Clusters <npaci-rocks...@sdsc.edu>
> Message-ID:
> <AANLkTi=b8vWq9xTJTdkPD2z-jN_eFDeD=CHX_yZ=RQ...@mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
>
> Have you tried:
>
> qmod -f -usq h...@compute-0-31.local
>
> Ian
>
> --
> Ian Kaufman
> Research Systems Administrator
> UC San Diego, Jacobs School of Engineering ikaufman AT ucsd DOT edu
>
>
> ------------------------------
>
> Message: 11
> Date: Wed, 20 Oct 2010 12:11:34 -0400
> From: "Doll, Margaret Ann" <margar...@brown.edu>
> Subject: Re: [Rocks-Discuss] Queue instance in Ss state
> To: Discussion of Rocks Clusters <npaci-rocks...@sdsc.edu>
> Message-ID:
> <AANLkTin=M1ho8NDEOY-JvioTCm...@mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> I had tried the force in qmon.
>
> When I ran your suggestion, there was no change in the status of the queue
> instance.
>
> On Wed, Oct 20, 2010 at 11:33 AM, Ian Kaufman <ikau...@soe.ucsd.edu> wrote:
>
> > Have you tried:
> >
> > qmod -f -usq h...@compute-0-31.local
> >
> > Ian
> >
> > --
> > Ian Kaufman
> > Research Systems Administrator
> > UC San Diego, Jacobs School of Engineering ikaufman AT ucsd DOT edu
> >
>
>
> ------------------------------
>
> Message: 12
> Date: Wed, 20 Oct 2010 09:25:22 -0700
> From: Philip Papadopoulos <philip.pa...@gmail.com>
> Subject: Re: [Rocks-Discuss] Processor limits for MX (Myrinet)
> To: Discussion of Rocks Clusters <npaci-rocks...@sdsc.edu>
> Message-ID:
> <AANLkTi=qd9MBk42a-tEZqz+ZL...@mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> On Wed, Oct 20, 2010 at 8:23 AM, Ray Muno <mu...@aem.umn.edu> wrote:
>
> > On 10/19/2010 10:02 AM, Philip Papadopoulos wrote:
> > > We needed/wanted to build an mx roll for ourselves in Triton.
> > > We (obviously) can't give you the MX source code, but you can download the
> > > current roll source, drop in your version of MX (edit version files
> > > appropriately),
> > > and build a new roll.
> > >
> > > web site for this roll source is
> > > http://git.rocksclusters.org/cgi-bin/gitweb.cgi?p=triton/myrinet_mx.git/.git;a=summary
> > >
> > > To download, you will need to use git. This is an experimental site for us
> > > to supply unsupported roll sources. You will need to install git and then
> > > point at this repository for a simple download (we're working on providing
> > > source tarballs, too).
> > >
> > > -P
> > >
> >
> > OK, I have the above-mentioned Roll template.
> >
> > I can build a Roll, following the docs I find online. I am ending up
> > with a Roll ISO, but all it does, in the end, is make an RPM that
> > installs the source tarball in /opt/mx.
> >
> > I do get the proper configs for startup on each of the nodes (to set
> > mx_max_endpoints and mx_max_nodes). At least I got that part right.
> >
> > This is my first adventure into building a custom roll. It is a bit of
> > a misadventure at this point. I must be missing something obvious...
> >
> > The Myrinet source is an easy build (configure, make, make install).
> > --
> >
> > Ray Muno
> > University of Minnesota
> >
> >
> You need to add the roll, recreate the distribution so that the new code
> will be used, and then apply the contents of the roll. The roll is not just
> putting a source tarball in /opt/mx; it also builds a first-boot script in
> /etc/rc.d/rocksconfig.d that rebuilds MX (and the driver) against your
> running kernel.
>
>
> # rocks add roll myrinet_mx*iso
> # cd /export/rocks/install
> # rocks create distro
>
> To rebuild the driver on the frontend:
> # rocks run roll myrinet_mx | sh
> # shutdown -r now
>
> To rebuild a node:
> # ssh <node> /boot/kickstart/cluster-kickstart-pxe
>
>
> -P
>
>
>
> --
> Philip Papadopoulos, PhD
> University of California, San Diego
> 858-822-3628 (Ofc)
> 619-331-2990 (Fax)
>
>
> ------------------------------
>
> Message: 13
> Date: Wed, 20 Oct 2010 18:26:40 +0200
> From: "tomisla...@gmx.com" <tomisla...@gmx.com>
> Subject: Re: [Rocks-Discuss] ssh takes forever (Philip Papadopoulos)
> To: npaci-rocks...@sdsc.edu
> Message-ID: <2010102016...@gmx.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hi Philip,
>
> thanks for the advice. I've tried it, and the login time climbed to 31s:
>
> [foam@showme ~]$ unset DISPLAY
> [foam@showme ~]$ time ssh compute-0-0 "ls"
> Desktop
> Documents
> OpenFOAM
>
> real    0m31.019s
> user    0m0.018s
> sys     0m0.005s
> [foam@showme ~]$
>
> Any further advice on the ssh issue? Otherwise I would have to run very long simulations just so that the variation in login time becomes negligible when I time the runs for profiling. Let's hope ganglia has the answer I'm looking for: the actual computing time of the process. I don't want the preparations or the write time in my data, but it would be nice if ssh didn't take so long; then I could neglect it with respect to the computational time.
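>
> (To separate the two, I suppose I can time the connection alone and time
> the command on the remote side, so that the ssh handshake is excluded:
>
> [foam@showme ~]$ time ssh compute-0-0 true
> [foam@showme ~]$ ssh compute-0-0 'time ls'
>
> The first line measures pure connection overhead; the second times the
> command itself, independent of the ssh negotiation.)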
>
> Best regards,
> Tomislav Maric
>
> Message: 6
> Date: Sun, 17 Oct 2010 11:08:01 -0700
> From: Philip Papadopoulos <philip.pa...@gmail.com>
> Subject: Re: [Rocks-Discuss] ssh takes forever
> To: Discussion of Rocks Clusters <npaci-rocks...@sdsc.edu>
> Message-ID:
>  <AANLkTik8kw-c6CNZk2-_r...@mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> try
> unset DISPLAY
> and then try the ssh.
> -P
>
>
> ------------------------------
>
> Message: 14
> Date: Wed, 20 Oct 2010 18:32:45 +0200
> From: "tomisla...@gmx.com" <tomisla...@gmx.com>
> Subject: [Rocks-Discuss] 3. Re: profiling the cluster (newb)(Scott L.
> Hamilton)
> To: npaci-rocks...@sdsc.edu
> Message-ID: <2010102016...@gmx.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hi Scott,
>
> it seems to be a very interesting project, thanks a lot for the advice.
>
> Since I've already bought the PCs, and because I would like to develop and test OpenMP parallelisation of OpenFOAM in the future, I'll stick to my quad-core boxes for now.
>
> Still, I've bookmarked the link. :)
>
> Thanks,
> Tomislav Maric
>
> > ----- Original Message -----
> > From: npaci-rocks-dis...@sdsc.edu
> > Sent: 10/18/10 09:00 PM
> > To: npaci-rocks...@sdsc.edu
> > Subject: npaci-rocks-discussion Digest, Vol 51, Issue 18
> >
> > Send npaci-rocks-discussion mailing list submissions to
> > npaci-rocks...@sdsc.edu
> >
> > To subscribe or unsubscribe via the World Wide Web, visit
> > https://lists.sdsc.edu/mailman/listinfo/npaci-rocks-discussion
> > or, via email, send a message with subject or body 'help' to
> > npaci-rocks-dis...@sdsc.edu
> >
> > You can reach the person managing the list at
> > npaci-rocks-di...@sdsc.edu
> >
> > When replying, please edit your Subject line so it is more specific
> > than "Re: Contents of npaci-rocks-discussion digest..."
> >
> >
> > Today's Topics:
> >
> >  1. #cores of front-end in /proc/cpuinfo differs from #cores
> >  that ganglia presents (Nastou Panagiotis)
> >  2. SGE: Nodes in E state (pooja gupta)
> >  3. Re: profiling the cluster (newb) (Tim Carlson)
> >  (Scott L. Hamilton)
> >  4. Re: Processor limits for MX (Myrinet) (Raymond Muno)
> >  5. Re: #cores of front-end in /proc/cpuinfo differs from #cores
> >  that ganglia presents (Bart Brashers)
> >  6. Re: SGE: Nodes in E state (Mike Hanby)
> >  7. Re: #cores of front-end in /proc/cpuinfo differs from
> >  #cores that ganglia presents (Nastou Panagiotis)
> >  8. Re: Re: #cores of front-end in /proc/cpuinfo differs from
> >  #cores that ganglia presents (Bart Brashers)
> >  9. Dacapo and Rocks cluster (Tadeu Leonardo Soares e Silva)
> >  10. two clusters on one network with the same Private IP
> >  (Edsall, William (WJ) )
> >  11. Re: two clusters on one network with the same Private IP
> >  (Lloyd Brown)
> >
> >
> > ----------------------------------------------------------------------
> >
> > Message: 1
> > Date: Mon, 18 Oct 2010 14:43:59 +0300
> > From: Nastou Panagiotis <pna...@aegean.gr>
> > Subject: [Rocks-Discuss] #cores of front-end in /proc/cpuinfo differs
> > from #cores that ganglia presents
> > To: "npaci-rocks...@sdsc.edu"
> > <npaci-rocks...@sdsc.edu>
> > Message-ID:
> > <6AEE2EE3778E1241B57804...@hermes.aegean.gr>
> > Content-Type: text/plain; charset="us-ascii"
> >
> > Hi,
> >
> > Our cluster (a front-end with 5 compute nodes) is based on Rocks 5.2. Our front-end is an HP ProLiant ML350 with two dual-core Intel Xeon processors.
> > Looking in /proc/cpuinfo, I noticed that only one processor is shown instead of 4; top reports the same. This problem does not appear on the compute nodes.
> > Ganglia, however, shows that the front-end has 4 processors.
> >
> > During reboot no error messages appeared, but I noticed that Intel Virtualization Technology is disabled.
> > How can I fix this so that /proc/cpuinfo shows the same number of processors as ganglia? Any suggestions?
> >
> > Regards,
> >
> > Panagiotis Nastou
> > Dpt of Mathematics
> > Aegean University
> > Karlovasi, Samos
> >
> > ------------------------------
> >
> > Message: 2
> > Date: Mon, 18 Oct 2010 06:06:00 -0700
> > From: pooja gupta <po...@fnal.gov>
> > Subject: [Rocks-Discuss] SGE: Nodes in E state
> > To: npaci-rocks...@sdsc.edu
> > Message-ID:
> > <AANLkTikZuVJ53kL8FKugg...@mail.gmail.com>
> > Content-Type: text/plain; charset="utf-8"
> >
> > Dear All,
> >
> > We recently upgraded the RAM on our compute nodes; the qhost command shows
> > the increased RAM. We also use our head node as an execute node. But all my
> > nodes are in a permanent E state; please have a look at the output below. We
> > are also not able to qlogin. The output of the qlogin command is:
> >
> > Your job 1739 ("QLOGIN") has been submitted
> > waiting for interactive job to be scheduled ...
> >
> > Your "qlogin" request could not be scheduled, try again later.
> > error: error shutting down the connection: undefined commlib error code
> >
> >
> > We have stopped and restarted SGE many times but it didn't help. Please help
> > us in this regard.
> > We would highly appreciate your suggestions.
> > Many thanks in advance.
> >
> > With best regards,
> > Pooja
> >
> >
> >
> > =======================================================
> >
> > queuename qtype resv/used/tot. load_avg arch states
> > ---------------------------------------------------------------------------------
> > al...@compute-0-0.local BIP 0/0/16 0.00 lx26-amd64 E
> > ---------------------------------------------------------------------------------
> > al...@compute-0-1.local BIP 0/0/16 0.00 lx26-amd64 E
> > ---------------------------------------------------------------------------------
> > al...@compute-0-2.local BIP 0/0/16 0.00 lx26-amd64 E
> > ---------------------------------------------------------------------------------
> > al...@compute-0-3.local BIP 0/0/16 0.02 lx26-amd64 E
> > ---------------------------------------------------------------------------------
> > al...@nanda.local BIP 0/0/8 0.00 lx26-amd64 E
> >
> > =======================================================
> >
> > ------------------------------
> >
> > Message: 3
> > Date: Mon, 18 Oct 2010 08:41:49 -0500
> > From: "Scott L. Hamilton" <hamil...@mst.edu>
> > Subject: Re: [Rocks-Discuss] profiling the cluster (newb) (Tim
> > Carlson)
> > To: Discussion of Rocks Clusters <npaci-rocks...@sdsc.edu>
> > Message-ID: <4CBC4E9D...@mst.edu>
> > Content-Type: text/plain; charset=UTF-8; format=flowed
> >
> > Tomislav,
> >
> > I have a recommendation for you. I met a group at SuperComputing that
> > builds mini-clusters for classroom use from single-board computers.
> > These are perfect for home projects and software development. They
> > are relatively inexpensive and look like a really fun project to build.
> > Check out http://littlefe.net/ for more information.
> >
> > Scott
> >
> > On 10/17/2010 06:58 AM, tomisla...@gmx.com wrote:
> > > Thanks a lot for the advice.
> > >
> > > The N is very very small (it's a home project), so I'm dealing with 8 cores on 2 nodes. It's a miniature thing, just 2 Quad Core processors, one per each node. As for the HSI and the production level server boards, I'm afraid this will remain out of my range indefinitely. I'm keeping this little thing at home, so I can expect to have maybe 8-10 nodes made out of regular PCs at most. It's supposed to serve as a development platform for my codes, and to test the parameters on mid-sized cases so that I can avoid the real simulation crashing on the real cluster, 10 minutes after I start it.
> > >
> > > The mpirun command takes forever, but the case is very, very small: it's a tutorial case (2D CFD lid-driven cavity, with just 400 volumes). It's ridiculous to run this in parallel, I know, but I just used it for a first try. I will generate a normal case (3D box for incompressible flow, basically the same thing, but much coarser) and try running it with increased volume densities on multiple cores to see whether the time needed for mpirun to execute changes.
> > >
> > > Thanks,
> > > Tomislav Maric
> > >
> > >
> > >> ----- Original Message -----
> > >> From: npaci-rocks-dis...@sdsc.edu
> > >> Sent: 10/14/10 09:00 PM
> > >> To: npaci-rocks...@sdsc.edu
> > >> Subject: npaci-rocks-discussion Digest, Vol 51, Issue 14
> > >>
> > >> Send npaci-rocks-discussion mailing list submissions to
> > >> npaci-rocks...@sdsc.edu
> > >>
> > >> To subscribe or unsubscribe via the World Wide Web, visit
> > >> https://lists.sdsc.edu/mailman/listinfo/npaci-rocks-discussion
> > >> or, via email, send a message with subject or body 'help' to
> > >> npaci-rocks-dis...@sdsc.edu
> > >>
> > >> You can reach the person managing the list at
> > >> npaci-rocks-di...@sdsc.edu
> > >>
> > >> When replying, please edit your Subject line so it is more specific
> > >> than "Re: Contents of npaci-rocks-discussion digest..."
> > >>
> > >>
> > >> Today's Topics:
> > >>
> > >> 1. Re: Node stuck at 'starting install process' (Alex H)
> > >> 2. Re: Strange NFS lsof output (Jeremy Mann)
> > >> 3. Re: Remote cluster access - wordpress: Does anyone have a
> > >> clue? (Jim Kress)
> > >> 4. Re: Remote cluster access - wordpress: Does anyonehave a
> > >> clue? (Gladu, Charles)
> > >> 5. some questions of Xen Roll again (??)
> > >> 6. Re: some questions of Xen Roll agin (Greg Bruno)
> > >> 7. Re: Utilising Front-end for Jobs (Richard Chang)
> > >> 8. Re: Utilising Front-end for Jobs (Roy Dragseth)
> > >> 9. New appliance's XML to copy a library - Node doesn't show on
> > >> SGE. (Pablo Barrio)
> > >> 10. SGE, slots, and affinity (Noam Bernstein)
> > >> 11. profiling the cluster (newb) (tomisla...@gmx.com)
> > >> 12. install ganglia roll (Paula R T Coelho)
> > >> 13. Re: New appliance's XML to copy a library - Node doesn't
> > >> show on SGE. (Greg Bruno)
> > >> 14. Re: Utilising Front-end for Jobs (Bart Brashers)
> > >> 15. Re: profiling the cluster (newb) (Tim Carlson)
> > >>
> > >>
> > >> ----------------------------------------------------------------------
> > >>
> > >> Message: 1
> > >> Date: Wed, 13 Oct 2010 12:18:10 -0700
> > >> From: Alex H<alexon...@gmail.com>
> > >> Subject: Re: [Rocks-Discuss] Node stuck at 'starting install process'
> > >> To: Discussion of Rocks Clusters<npaci-rocks...@sdsc.edu>
> > >> Message-ID:
> > >> <AANLkTimph4GMid_xxVWGh...@mail.gmail.com>
> > >> Content-Type: text/plain; charset="iso-8859-1"
> > >>
> > >> Hi Greg,
> > >>
> > >> I did try
> > >>
> > >> #cd /export/rocks/install
> > >> #rocks create distro
> > >>
> > >> but with no luck.
> > >>
> > >> Also, under Ctrl-Alt-F4 I see these messages repeated at different times:
> > >>
> > >> lighttpd[534]: (mod_fastcgi.c.2568) unexpected end-of-file (perhaps the
> > >> fastcgi process died): pid: 886 socket: unix:/tmp/fastcgi.socket-3
> > >> lighttpd[534]: (mod_fastcgi.c.3356) response not received, request sent: 793
> > >> on socket: unix:/tmp/fastcgi.socket-3 for
> > >> /tracker/tracker-client?filename=/install/rocks-dist/x86_64/RedHat/RPMS/libXdmcp-1.0.1-2.1.x86_64.rpm,
> > >> closing connection
> > >> lighttpd[534]: (connections.c.1228) connection closed: poll() -> ERR 16
> > >>
> > >> I appreciate the help.. any other ideas?
> > >>
> > >> Alex
> > >>
> > >>
> > >> On Wed, Oct 13, 2010 at 11:40 AM, Greg Bruno<greg....@gmail.com> wrote:
> > >>
> > >>
> > >>> On Wed, Oct 13, 2010 at 11:23 AM, Alex H<alexon...@gmail.com> wrote:
> > >>>
> > >>>> Hi Bart,
> > >>>>
> > >>>> Thanks for the quick reply. After Ctrl-Alt-F1 -F2 -F3 -F4 I get this
> > >>>>
> > >>> message
> > >>>
> > >>>> repeating:
> > >>>>
> > >>>> INFO: ROCKS: DownloadHeader: calling _handlefailure: 2
> > >>>> WARNING: Try 1/10 for
> > >>>>
> > >>>>
> > >>> http://127.0.0.1/install/rocks-dist/x86_64/RedHat/RPMS/libXdmcp-1.0.1-2.1.x86_64.rpm
> > >>>
> > >>>> from mirror 1/1
> > >>>>
> > >>>> By the way, I'm using Rocks 5.4 Beta.
> > >>>>
> > >>> try rebuilding your distro:
> > >>>
> > >>> # cd /export/rocks/install
> > >>> # rocks create distro
> > >>>
> > >>> then try to reinstall the login node.
> > >>>
> > >>> - gb
> > >>>
> > >>>
> > >>
> > >>
> > >> ------------------------------
> > >>
> > >> Message: 2
> > >> Date: Wed, 13 Oct 2010 15:48:35 -0500
> > >> From: Jeremy Mann<jerem...@gmail.com>
> > >> Subject: Re: [Rocks-Discuss] Strange NFS lsof output
> > >> To: Discussion of Rocks Clusters<npaci-rocks...@sdsc.edu>
> > >> Message-ID:
> > >> <AANLkTinQ2XgPR3O6DLd96...@mail.gmail.com>
> > >> Content-Type: text/plain; charset="iso-8859-1"
> > >>
> > >> I had this happen after migrating two systems into one. Two users had the
> > >> same UID and the top one took precedence over the second.
> > >> On Oct 13, 2010 1:52 PM, "Gustavo Berman"<gustav...@gmail.com> wrote:
> > >>
> > >>> 2010/10/13 Greg Bruno<greg....@gmail.com>
> > >>>
> > >>>
> > >>>> it looks like the permissions on your users' home areas allow all
> > >>>> users to access files from other user accounts.
> > >>>>
> > >>>> perhaps your users have opened their permissions in order share files.
> > >>>>
> > >>>>
> > >>>>
> > >>> At the frontend, the users' home directories have 700 permissions,
> > >>> so I don't think that's the problem.
> > >>
> > >>
> > >> ------------------------------
> > >>
> > >> Message: 3
> > >> Date: Wed, 13 Oct 2010 18:03:07 -0400
> > >> From: "Jim Kress"<jimkr...@kressworks.org>
> > >> Subject: Re: [Rocks-Discuss] Remote cluster access - wordpress: Does
> > >> anyone have a clue?
> > >> To:<npaci-rocks...@sdsc.edu>
> > >> Message-ID:<000001cb6b22$6f0f39c0$4d2dad40$@org>
> > >> Content-Type: text/plain; charset="us-ascii"
> > >>
> > >> Thanks guys. I've set up my router so that the kwfoundation.cluster.hpc.org
> > >> hostname is permanently assigned to 192.168.0.248. However, that does not
> > >> seem to solve the problem.
> > >>
> > >>
> > >>
> > >> When I ping 192.168.0.248 from my Windows 7 Pro x64 machine I get:
> > >>
> > >>
> > >>
> > >> PS C:\ff71g> ping 192.168.0.248
> > >>
> > >>
> > >>
> > >> Pinging 192.168.0.248 with 32 bytes of data:
> > >>
> > >> Reply from 192.168.0.248: bytes=32 time<1ms TTL=64
> > >>
> > >> Reply from 192.168.0.248: bytes=32 time<1ms TTL=64
> > >>
> > >> Reply from 192.168.0.248: bytes=32 time<1ms TTL=64
> > >>
> > >> Reply from 192.168.0.248: bytes=32 time<1ms TTL=64
> > >>
> > >>
> > >>
> > >> Ping statistics for 192.168.0.248:
> > >>
> > >> Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
> > >>
> > >> Approximate round trip times in milli-seconds:
> > >>
> > >> Minimum = 0ms, Maximum = 0ms, Average = 0ms
> > >>
> > >>
> > >>
> > >> However, when I ping
> > >>
> > >>
> > >>
> > >> PS C:\ff71g> ping kwfoundation.cluster.hpc.org
> > >>
> > >>
> > >>
> > >> Pinging kwfoundation.cluster.hpc.org [64.158.56.56] with 32 bytes of data:
> > >>
> > >> Reply from 64.158.56.56: bytes=32 time=42ms TTL=53
> > >>
> > >> Reply from 64.158.56.56: bytes=32 time=60ms TTL=53
> > >>
> > >> Reply from 64.158.56.56: bytes=32 time=46ms TTL=53
> > >>
> > >> Reply from 64.158.56.56: bytes=32 time=46ms TTL=53
> > >>
> > >>
> > >>
> > >> Ping statistics for 64.158.56.56:
> > >>
> > >> Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
> > >>
> > >> Approximate round trip times in milli-seconds:
> > >>
> > >> Minimum = 42ms, Maximum = 60ms, Average = 48ms
> > >>
> > >>
> > >>
> > >> This also happens when I permanently assign (in my router)
> > >> KWFoundation.cluster.hpc.org to 192.168.0.248 and then ping
> > >> KWFoundation.cluster.hpc.org.
> > >>
> > >>
> > >>
> > >> 64.158.56.56 is some poor machine out in the ether somewhere. It is not on
> > >> my network.
> > >>
> > >>
> > >>
> > >> However, on the headnode I get:
> > >>
> > >>
> > >>
> > >> [root@KWFoundation ~]# ping kwfoundation.cluster.hpc.org
> > >>
> > >> PING KWFoundation.cluster.hpc.org (192.168.0.248) 56(84) bytes of data.
> > >>
> > >> 64 bytes from KWFoundation.cluster.hpc.org (192.168.0.248): icmp_seq=1
> > >> ttl=64 time=0.018 ms
> > >>
> > >> 64 bytes from KWFoundation.cluster.hpc.org (192.168.0.248): icmp_seq=2
> > >> ttl=64 time=0.009 ms
> > >>
> > >> 64 bytes from KWFoundation.cluster.hpc.org (192.168.0.248): icmp_seq=3
> > >> ttl=64 time=0.008 ms
> > >>
> > >> 64 bytes from KWFoundation.cluster.hpc.org (192.168.0.248): icmp_seq=4
> > >> ttl=64 time=0.008 ms
> > >>
> > >> 64 bytes from KWFoundation.cluster.hpc.org (192.168.0.248): icmp_seq=5
> > >> ttl=64 time=0.008 ms
> > >>
> > >> 64 bytes from KWFoundation.cluster.hpc.org (192.168.0.248): icmp_seq=6
> > >> ttl=64 time=0.014 ms
> > >>
> > >> 64 bytes from KWFoundation.cluster.hpc.org (192.168.0.248): icmp_seq=7
> > >> ttl=64 time=0.009 ms
> > >>
> > >> 64 bytes from KWFoundation.cluster.hpc.org (192.168.0.248): icmp_seq=8
> > >> ttl=64 time=0.010 ms
> > >>
> > >>
> > >>
> > >> --- KWFoundation.cluster.hpc.org ping statistics ---
> > >>
> > >> 8 packets transmitted, 8 received, 0% packet loss, time 6999ms
> > >>
> > >> rtt min/avg/max/mdev = 0.008/0.010/0.018/0.004 ms
> > >>
> > >>
> > >>
> > >> and
> > >>
> > >>
> > >>
> > >> [root@KWFoundation ~]# ping 192.168.0.248
> > >>
> > >> PING 192.168.0.248 (192.168.0.248) 56(84) bytes of data.
> > >>
> > >> 64 bytes from 192.168.0.248: icmp_seq=1 ttl=64 time=0.016 ms
> > >>
> > >> 64 bytes from 192.168.0.248: icmp_seq=2 ttl=64 time=0.012 ms
> > >>
> > >> 64 bytes from 192.168.0.248: icmp_seq=3 ttl=64 time=0.009 ms
> > >>
> > >> 64 bytes from 192.168.0.248: icmp_seq=4 ttl=64 time=0.015 ms
> > >>
> > >> 64 bytes from 192.168.0.248: icmp_seq=5 ttl=64 time=0.010 ms
> > >>
> > >> 64 bytes from 192.168.0.248: icmp_seq=6 ttl=64 time=0.009 ms
> > >>
> > >> 64 bytes from 192.168.0.248: icmp_seq=7 ttl=64 time=0.010 ms
> > >>
> > >>
> > >>
> > >> --- 192.168.0.248 ping statistics ---
> > >>
> > >> 7 packets transmitted, 7 received, 0% packet loss, time 5999ms
> > >>
> > >> rtt min/avg/max/mdev = 0.009/0.011/0.016/0.004 ms
> > >>
> > >>
> > >>
> > >> Is there a DNS setup on the CLUSTER I should be modifying?
> > >>
> > >>
> > >>
> > >> Thanks for the help.
> > >>
> > >>
> > >>
> > >> Jim
> > >>
> > >>
> > >>
> > >> ------------------------------
> > >>
> > >> Message: 4
> > >> Date: Wed, 13 Oct 2010 15:13:38 -0700
> > >> From: "Gladu, Charles"<cgl...@mgmresorts.com>
> > >> Subject: Re: [Rocks-Discuss] Remote cluster access - wordpress: Does
> > >> anyonehave a clue?
> > >> To: "Discussion of Rocks Clusters"<npaci-rocks...@sdsc.edu>
> > >> Message-ID:
> > >> <24C67244DEC69A49B4E...@PREXCHMB01P.MGMMIRAGE.ORG>
> > >> Content-Type: text/plain; charset="us-ascii"
> > >>
> > >>
> > >> I'm confused...why would you think you could control the DNS settings
> > >> for kwfoundation.cluster.hpc.org?
> > >>
> > >> Are you the owner of the hpc.org domain?
> > >>
> > >> The issue may be that you need to pick an FQDN that is in a domain that
> > >> you own or have DNS control over.
> > >>
> > >> Chuck
> > >>
> > >>
> > >>
> > >> -----Original Message-----
> > >> From: npaci-rocks-dis...@sdsc.edu
> > >> [mailto:npaci-rocks-dis...@sdsc.edu] On Behalf Of Jim Kress
> > >> Sent: Wednesday, October 13, 2010 3:03 PM
> > >> To: npaci-rocks...@sdsc.edu
> > >> Subject: Re: [Rocks-Discuss] Remote cluster access - wordpress: Does
> > >> anyonehave a clue?
> > >>
> > >> Thanks guys. I've set up my router so that the
> > >> kwfoundation.cluster.hpc.org hostname is permanently assigned to
> > >> 192.168.0.248. However, that does not seem to solve the problem.
> > >>
> > >>
> > >>
> > >> When I ping 192.168.0.248 from my Windows 7 Pro x64 machine I get:
> > >>
> > >>
> > >>
> > >> PS C:\ff71g> ping 192.168.0.248
> > >>
> > >>
> > >>
> > >> Pinging 192.168.0.248 with 32 bytes of data:
> > >>
> > >> Reply from 192.168.0.248: bytes=32 time<1ms TTL=64
> > >>
> > >> Reply from 192.168.0.248: bytes=32 time<1ms TTL=64
> > >>
> > >> Reply from 192.168.0.248: bytes=32 time<1ms TTL=64
> > >>
> > >> Reply from 192.168.0.248: bytes=32 time<1ms TTL=64
> > >>
> > >>
> > >>
> > >> Ping statistics for 192.168.0.248:
> > >>
> > >> Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
> > >>
> > >> Approximate round trip times in milli-seconds:
> > >>
> > >> Minimum = 0ms, Maximum = 0ms, Average = 0ms
> > >>
> > >>
> > >>
> > >> However, when I ping
> > >>
> > >>
> > >>
> > >> PS C:\ff71g> ping kwfoundation.cluster.hpc.org
> > >>
> > >>
> > >>
> > >> Pinging kwfoundation.cluster.hpc.org [64.158.56.56] with 32 bytes of
> > >> data:
> > >>
> > >> Reply from 64.158.56.56: bytes=32 time=42ms TTL=53
> > >>
> > >> Reply from 64.158.56.56: bytes=32 time=60ms TTL=53
> > >>
> > >> Reply from 64.158.56.56: bytes=32 time=46ms TTL=53
> > >>
> > >> Reply from 64.158.56.56: bytes=32 time=46ms TTL=53
> > >>
> > >>
> > >>
> > >> Ping statistics for 64.158.56.56:
> > >>
> > >> Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
> > >>
> > >> Approximate round trip times in milli-seconds:
> > >>
> > >> Minimum = 42ms, Maximum = 60ms, Average = 48ms
> > >>
> > >>
> > >>
> > >> This also happens when I permanently assign (in my router)
> > >> KWFoundation.cluster.hpc.org to 192.168.0.248 and then ping
> > >> KWFoundation.cluster.hpc.org.
> > >>
> > >>
> > >>
> > >> 64.158.56.56 is some poor machine out in the ether somewhere. It is not
> > >> on my network.
> > >>
> > >>
> > >>
> > >> However, on the headnode I get:
> > >>
> > >>
> > >>
> > >> [root@KWFoundation ~]# ping kwfoundation.cluster.hpc.org
> > >>
> > >> PING KWFoundation.cluster.hpc.org (192.168.0.248) 56(84) bytes of data.
> > >>
> > >> 64 bytes from KWFoundation.cluster.hpc.org (192.168.0.248): icmp_seq=1
> > >> ttl=64 time=0.018 ms
> > >>
> > >> 64 bytes from KWFoundation.cluster.hpc.org (192.168.0.248): icmp_seq=2
> > >> ttl=64 time=0.009 ms
> > >>
> > >> 64 bytes from KWFoundation.cluster.hpc.org (192.168.0.248): icmp_seq=3
> > >> ttl=64 time=0.008 ms
> > >>
> > >> 64 bytes from KWFoundation.cluster.hpc.org (192.168.0.248): icmp_seq=4
> > >> ttl=64 time=0.008 ms
> > >>
> > >> 64 bytes from KWFoundation.cluster.hpc.org (192.168.0.248): icmp_seq=5
> > >> ttl=64 time=0.008 ms
> > >>
> > >> 64 bytes from KWFoundation.cluster.hpc.org (192.168.0.248): icmp_seq=6
> > >> ttl=64 time=0.014 ms
> > >>
> > >> 64 bytes from KWFoundation.cluster.hpc.org (192.168.0.248): icmp_seq=7
> > >> ttl=64 time=0.009 ms
> > >>
> > >> 64 bytes from KWFoundation.cluster.hpc.org (192.168.0.248): icmp_seq=8
> > >> ttl=64 time=0.010 ms
> > >>
> > >>
> > >>
> > >> --- KWFoundation.cluster.hpc.org ping statistics ---
> > >>
> > >> 8 packets transmitted, 8 received, 0% packet loss, time 6999ms
> > >>
> > >> rtt min/avg/max/mdev = 0.008/0.010/0.018/0.004 ms
> > >>
> > >>
> > >>
> > >> and
> > >>
> > >>
> > >>
> > >> [root@KWFoundation ~]# ping 192.168.0.248
> > >>
> > >> PING 192.168.0.248 (192.168.0.248) 56(84) bytes of data.
> > >>
> > >> 64 bytes from 192.168.0.248: icmp_seq=1 ttl=64 time=0.016 ms
> > >>
> > >> 64 bytes from 192.168.0.248: icmp_seq=2 ttl=64 time=0.012 ms
> > >>
> > >> 64 bytes from 192.168.0.248: icmp_seq=3 ttl=64 time=0.009 ms
> > >>
> > >> 64 bytes from 192.168.0.248: icmp_seq=4 ttl=64 time=0.015 ms
> > >>
> > >> 64 bytes from 192.168.0.248: icmp_seq=5 ttl=64 time=0.010 ms
> > >>
> > >> 64 bytes from 192.168.0.248: icmp_seq=6 ttl=64 time=0.009 ms
> > >>
> > >> 64 bytes from 192.168.0.248: icmp_seq=7 ttl=64 time=0.010 ms
> > >>
> > >>
> > >>
> > >> --- 192.168.0.248 ping statistics ---
> > >>
> > >> 7 packets transmitted, 7 received, 0% packet loss, time 5999ms
> > >>
> > >> rtt min/avg/max/mdev = 0.009/0.011/0.016/0.004 ms
> > >>
> > >>
> > >>
> > >> Is there a DNS setup on the CLUSTER I should be modifying?
> > >>
> > >>
> > >>
> > >> Thanks for the help.
> > >>
> > >>
> > >>
> > >> Jim
> > >>
> > >>
> > >>
> > >> ------------------------------
> > >>
> > >> Message: 5
> > >> Date: Thu, 14 Oct 2010 10:04:58 +0800
> > >> From: ??<wbin...@gmail.com>
> > >> Subject: [Rocks-Discuss] some questions of Xen Roll again
> > >> To: npaci-rocks...@sdsc.edu
> > >> Message-ID:
> > >> <AANLkTin-4fQ5Lsq6B=wYPR2b_7wtpTN...@mail.gmail.com>
> > >> Content-Type: text/plain; charset="gb2312"
> > >>
> > >> Hi,
> > >>
> > >> Inside the virt-manager window I can only see domain-0 (in the
> > >> user guide there is a "frontend-0-0-0").
> > >>
> > >> Double-clicking "domain-0" should show the screen of the "domain-0
> > >> virtual machine",
> > >>
> > >> but in the console panel I see "console not configured for guest" (even
> > >> though I log in as root).
> > >>
> > >> Why?
> > >>
> > >> --
> > >> ^_^ ????
> > >>
> > >> ------------------------------
> > >>
> > >> Message: 6
> > >> Date: Wed, 13 Oct 2010 20:56:25 -0700
> > >> From: Greg Bruno<greg....@gmail.com>
> > >> Subject: Re: [Rocks-Discuss] some questions of Xen Roll again
> > >> To: Discussion of Rocks Clusters<npaci-rocks...@sdsc.edu>
> > >> Message-ID:
> > >> <AANLkTinxj4F9k=m9K9MrwmeeXOxK8...@mail.gmail.com>
> > >> Content-Type: text/plain; charset=GB2312
> > >>
> > >> 2010/10/13 ??<wbin...@gmail.com>:
> > >>
> > >>> Hi,
> > >>>
> > >>> Inside the virt-manager window I can only see domain-0 (in the
> > >>> user guide there is a "frontend-0-0-0").
> > >>>
> > >>> Double-clicking "domain-0" should show the screen of the "domain-0
> > >>> virtual machine",
> > >>>
> > >>> but in the console panel I see "console not configured for guest" (even
> > >>> though I log in as root).
> > >>>
> > >> Try starting your frontend VM again with 'rocks start host vm'. You
> > >> should then see 'frontend-0-0-0' under 'domain-0'.
> > >>
> > >> - gb
> > >>
> > >>
> > >> ------------------------------
> > >>
> > >> Message: 7
> > >> Date: Thu, 14 Oct 2010 09:58:06 +0530
> > >> From: Richard Chang<rchang...@gmail.com>
> > >> Subject: Re: [Rocks-Discuss] Utilising Front-end for Jobs
> > >> To: npaci-rocks...@sdsc.edu
> > >> Message-ID:<4CB686D6...@gmail.com>
> > >> Content-Type: text/plain; charset=ISO-8859-1
> > >>
> > >> On 8/18/2010 10:55 PM, Bart Brashers wrote:
> > >>
> > >>> For Torque/Maui (which you failed to specify) you can do the following
> > >>> (as root, on the frontend):
> > >>>
> > >>> # scp compute-0-0:/etc/pbs.conf /etc/pbs.conf
> > >>> # chkconfig pbs on
> > >>> # qmgr -c "create node<frontend>"
> > >>> # qmgr -c "set node<frontend> np = 6"
> > >>> # qmgr -c "set node<frontend> ntype=cluster"
> > >>> # service pbs start
> > >>>
> > >>> That should do it. I'm pretty sure you don't have to use
> > >>> <frontend>.local these days, but if it doesn't work you might try that.
> > >>> And to be clear, don't include the<> around the actual hostname of your
> > >>> frontend: qmgr -c "create node mycluster"
> > >>>
> > >>> I strongly suggest you specify the number of processors (the "np = 6"
> > >>> above) to be a few cores LESS than the number you really have in your
> > >>> frontend. If you have 8 cores, you want to reserve 2 cores for system
> > >>> stuff and login shells. Pick a number, you can always change it later
> > >>> if you notice the load on the frontend is too low or too high for your
> > >>> liking. That np number just says how many jobs/threads/cores are
> > >>> available to Torque.
> > >>>
> > >> Hello Bart,
> > >> The last time I asked for my front-end to be used for jobs, you had suggested the above. I had also seen you post in the wiki page. Thanks for this.
> > >>
> > >> Though I did this, I am not able to run jobs across the front-end and compute nodes. I have a total of 4 nodes along with the front-end. All of them are dual-processor 6-core Westmeres with a total of 12 cores per node.
> > >>
> > >> Jobs run fine when I submit a job across the compute nodes. But when I also involve the front-end, the job just sits there and doesn't output anything. My guess is that the nodes are not able to communicate with each other, so the job is not able to proceed any further.
> > >>
> > >> Any suggestions about where to check? Also, is there any way I can specify that the front-end should be the last node to get a job? I have seen that if I enable my front-end for jobs, whenever I submit a job, it is always the first to get the job even though the compute nodes are free.
> > >>
> > >> regards,
> > >> Richard.
> > >>
> > >>
> > >> ------------------------------
> > >>
> > >> Message: 8
> > >> Date: Thu, 14 Oct 2010 10:25:41 +0200
> > >> From: Roy Dragseth<roy.dr...@uit.no>
> > >> Subject: Re: [Rocks-Discuss] Utilising Front-end for Jobs
> > >> To: Discussion of Rocks Clusters<npaci-rocks...@sdsc.edu>
> > >> Message-ID:<201010141025.41...@uit.no>
> > >> Content-Type: Text/Plain; charset="iso-8859-1"
> > >>
> > >> On Thursday, October 14, 2010 06:28:06 Richard Chang wrote:
> > >>
> > >>> On 8/18/2010 10:55 PM, Bart Brashers wrote:
> > >>>
> > >>>> For Torque/Maui (which you failed to specify) you can do the following
> > >>>> (as root, on the frontend):
> > >>>>
> > >>>> # scp compute-0-0:/etc/pbs.conf /etc/pbs.conf
> > >>>> # chkconfig pbs on
> > >>>> # qmgr -c "create node<frontend>"
> > >>>> # qmgr -c "set node<frontend> np = 6"
> > >>>> # qmgr -c "set node<frontend> ntype=cluster"
> > >>>> # service pbs start
> > >>>>
> > >>>> That should do it. I'm pretty sure you don't have to use
> > >>>> <frontend>.local these days, but if it doesn't work you might try that.
> > >>>> And to be clear, don't include the<> around the actual hostname of your
> > >>>> frontend: qmgr -c "create node mycluster"
> > >>>>
> > >>>> I strongly suggest you specify the number of processors (the "np = 6"
> > >>>> above) to be a few cores LESS than the number you really have in your
> > >>>> frontend. If you have 8 cores, you want to reserve 2 cores for system
> > >>>> stuff and login shells. Pick a number, you can always change it later
> > >>>> if you notice the load on the frontend is too low or too high for your
> > >>>> liking. That np number just says how many jobs/threads/cores are
> > >>>> available to Torque.
> > >>>>
> > >>> Hello Bart,
> > >>> The last time I asked for my front-end to be used for jobs, you had
> > >>> suggested the above. I had also seen you post in the wiki page. Thanks for
> > >>> this.
> > >>>
> > >>> Though I did this, I am not able to run jobs across the front-end and
> > >>> compute nodes. I have a total of 4 nodes along with the front-end. All
> > >>> of them are dual-processor 6-core Westmeres with a total of 12 cores per
> > >>> node.
> > >>>
> > >>> Jobs run fine when I submit a job across the compute nodes.
> > >>> But when I also involve the front-end, the job just sits there and
> > >>> doesn't output anything. My guess is that the nodes are not able to
> > >>> communicate with each other, so the job is not able to proceed any
> > >>> further.
> > >>>
> > >>> Any suggestions about where to check? Also, is there any way I can specify
> > >>> that the front-end should be the last node to get a job? I have seen that
> > >>> if I enable my front-end for jobs, whenever I submit a job, it is always
> > >>> the first to get the job even though the compute nodes are free.
> > >>>
> > >> The easiest way to do this is to change /opt/torque/server_priv/nodes so that
> > >> the frontend name is on the top. Maui schedules jobs on the nodes in the
> > >> order they are presented. Remember to turn off automatic updates by editing
> > >> /etc/torque-roll.conf or else your changes to the node list will be
> > >> overwritten the next time you run rocks sync config.
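> > >>
> > >> For illustration (hostnames and np values are just examples), assuming a
> > >> frontend named "mycluster" and four 12-core compute nodes,
> > >> /opt/torque/server_priv/nodes would then look something like:
> > >>
> > >> mycluster np=10
> > >> compute-0-0 np=12
> > >> compute-0-1 np=12
> > >> compute-0-2 np=12
> > >> compute-0-3 np=12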
> > >>
> > >> r.
> > >>
> > >>
> > >> --
> > >>
> > >> The Computer Center, University of Tromsø, N-9037 TROMSØ, Norway.
> > >> phone: +47 77 64 41 07, fax: +47 77 64 41 00
> > >> Roy Dragseth, Team Leader, High Performance Computing
> > >> Direct call: +47 77 64 62 56. email: roy.dr...@uit.no
> > >>
> > >>
> > >> ------------------------------
> > >>
> > >> Message: 9
> > >> Date: Thu, 14 Oct 2010 11:20:49 +0200
> > >> From: Pablo Barrio<pba...@die.upm.es>
> > >> Subject: [Rocks-Discuss] New appliance's XML to copy a library - Node
> > >> doesn't show on SGE.
> > >> To: Rocks discussion list<npaci-rocks...@sdsc.edu>
> > >> Message-ID:<4CB6CB71...@die.upm.es>
> > >> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> > >>
> > >> Hello,
> > >>
> > >> I have installed a PCI board with two FPGAs in one of my compute nodes.
> > >> In order to compile programs for these boards, I just need to install
> > >> two header files and a shared library (libgidelproc.so). Compilation
> > >> will be done at the frontend, so the header files will only reside
> > >> there. Is it necessary to install the shared library on the compute
> > >> nodes? If so, could I modify the extend-compute.xml to copy the library
> > >> from the frontend to the compute nodes at kickstart with the minimum
> > >> fuss, i.e., without making a custom RPM for that single file? I had a
> > >> look at the node XML syntax, but the distinction between eval, include,
> > >> etc. is not clear to me. The programs currently execute OK inside the boards, but I
> > >> want to automate the library installation.
> > >>
> > >> Additionally, I've created a new appliance type, compute-FPGA. I have
> > >> changed the node with the new board to this type (by reinstalling).
> > >> After the name has changed, I cannot see the new node (compute-FPGA-0-1)
> > >> in SGE, and the old node (compute-0-1) is still present even though it
> > >> no longer exists. I thought that insert-ethers took care of these details.
> > >> Is there any action that I must take to sync SGE with the cluster?
> > >> Perhaps I did something wrong.
> > >>
> > >> I'm using Rocks 5.3. Thanks in advance!
> > >>
> > >> --
> > >> Pablo Barrio
> > >> Dpto. Ing. Electrónica - E.T.S.I. Telecomunicación
> > >> Despacho C-203
> > >> Avda. Complutense s/n, 28040 Madrid
> > >> Tlf. 915495700 ext. 4234
> > >> @: pba...@die.upm.es
> > >>
> > >>
> > >>
> > >> ------------------------------
> > >>
> > >> Message: 10
> > >> Date: Thu, 14 Oct 2010 10:19:38 -0400
> > >> From: Noam Bernstein<noam.be...@nrl.navy.mil>
> > >> Subject: [Rocks-Discuss] SGE, slots, and affinity
> > >> To: Discussion of Rocks Clusters<npaci-rocks...@sdsc.edu>
> > >> Message-ID:<796FA33D-E4B9-4B89...@nrl.navy.mil>
> > >> Content-Type: text/plain; charset=us-ascii
> > >>
> > >> Hi all - we have some new 32 core AMD nodes (Rocks 5.2, SGE 6.2u2,
> > >> openmpi 1.4.1), and it makes a huge difference to add --mca mpi_paffinity_alone 1
> > >> (or perhaps --bind-to-core) for our openmpi runs on these nodes. However, I'd
> > >> like to be able to share the nodes between multiple small parallel jobs,
> > >> but each mpirun just assigns all its processes to the same cores, so they conflict
> > >> (and slow down by a factor of 100). Does anyone know what it would take for
> > >> something like this to work? Is there any integration for this via SGE (it would
> > >> need to assign particular slots to particular jobs, so that we could construct an
> > >> OpenMPI rank mapping file)? Another SGE version? Another queuing system?
> > >> Some other solution that I've missed?
> > >>
> > >> I've considered adding prologue/epilogue scripts that write which slots
> > >> each job is using in some temp directory, and avoid slots that another job
> > >> is claiming (by looking in the same directory), but any such implementation
> > >> will not be that simple, and will probably be prone to race conditions.
> > >>
> > >> thanks,
> > >> Noam
> > >>
> > >>
> > >>
> > >> ------------------------------
> > >>
> > >> Message: 11
> > >> Date: Thu, 14 Oct 2010 19:05:04 +0200
> > >> From: "tomisla...@gmx.com"<tomisla...@gmx.com>
> > >> Subject: [Rocks-Discuss] profiling the cluster (newb)
> > >> To: npaci-rocks...@sdsc.edu
> > >> Message-ID:<2010101417...@gmx.com>
> > >> Content-Type: text/plain; charset="utf-8"
> > >>
> > >> Hello everyone,
> > >>
> > >> The machine is running fine; now I would like to profile it for various CFD simulations with the OpenFOAM app. I have a few questions regarding the ROCKS structure. I am deeply grateful for any advice/help on these issues; I'm a Mechanical Engineer and more || less newbish to all this.
> > >>
> > >> 1) Why do the apps that are to be run in parallel with mpirun need to be installed to /share/apps (or another NFS directory)? Is it because OpenMPI relies on the availability of the binaries over the network?
> > >>
> > >> 2) Why does it take so long for the "`which mpirun` -np N -machinefile machines `which icoFoam` -parallel" command to start running? It takes a few minutes. Is it possible to speed this up?
> > >>
> > >> 3) What would be the best way to profile the OpenMPI apps on a ROCKS cluster? I have a way of generating a parametric study of the simulations (OpenFOAM solvers) with the help of a Python library (PyFOAM) that could increase:
> > >>   - the number of calculation points (coarseness),
> > >>   - the number of cores on each node for the mpirun,
> > >> but I need to know how to check (in a script of some sort) if the memory of the whole machine is nearly filled (it must not swap, otherwise the time measurement makes no sense). Info on the network traffic would be excellent also, so that I can see where the bottleneck will appear first.
> > >>
> > >> This way I can stop the study automatically and evaluate results. When the speedup is still good enough for the current simulation size, I can buy more RAM and continue where I stopped, until the speedup starts to drop (overload of the memory controller, the GigE switch or the processor). This way I can tune the RAM quantity to the CPU frequency of the nodes, for a single type of GigE switch, without writing the data over NFS (to exclude the writing time). Although I've heard that CFD apps are RAM dependent, this is my home project, so I'm very, very careful about what hardware I'm buying and how I'm scaling the machine.
> > >>
> > >> I've seen that there are programs such as Ganglia for monitoring the jobs, but I would prefer to work with raw data because the multitude of cases are to be automatically generated and run on the machines with different parameters. Is there a Python module of the ROCKS distro that gives me access to the current system status of the whole machine?
> > >>
> > >> Thanks again,
> > >>
> > >> Tomislav Maric
> > >>
> > >>
> > >>
> > >>
> > >> ------------------------------
> > >>
> > >> Message: 12
> > >> Date: Thu, 14 Oct 2010 14:54:01 -0300
> > >> From: Paula R T Coelho<paular...@gmail.com>
> > >> Subject: [Rocks-Discuss] install ganglia roll
> > >> To: Discussion of Rocks Clusters<npaci-rocks...@sdsc.edu>
> > >> Message-ID:
> > >> <AANLkTikgdAj8SRZfFFvO87YW=+tYsk6=pwevSB...@mail.gmail.com>
> > >> Content-Type: text/plain; charset="iso-8859-1"
> > >>
> > >> hello,
> > >> newbie question: the cluster is already running and i don't feel like
> > >> reinstalling the frontend.
> > >> how can i install ganglia on it, without reinstalling it all?
> > >>
> > >> thx
> > >> p.
> > >> -------------- next part --------------
> > >> An HTML attachment was scrubbed...
> > >> URL: https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/attachments/20101014/eff926f1/attachment.html
> > >>
> > >>
> > >> ------------------------------
> > >>
> > >> Message: 13
> > >> Date: Thu, 14 Oct 2010 11:09:52 -0700
> > >> From: Greg Bruno<greg....@gmail.com>
> > >> Subject: Re: [Rocks-Discuss] New appliance's XML to copy a library -
> > >> Node doesn't show on SGE.
> > >> To: Discussion of Rocks Clusters<npaci-rocks...@sdsc.edu>
> > >> Message-ID:
> > >> <AANLkTi=yhdEsHp8OQXFZNWvtM...@mail.gmail.com>
> > >> Content-Type: text/plain; charset=ISO-8859-1
> > >>
> > >> On Thu, Oct 14, 2010 at 2:20 AM, Pablo Barrio<pba...@die.upm.es> wrote:
> > >>
> > >>> Hello,
> > >>>
> > >>> I have installed a PCI board with two FPGAs in one of my compute nodes. In
> > >>> order to compile programs for these boards, I just need to install two
> > >>> header files and a shared library (libgidelproc.so). Compilation will be
> > >>> done at the frontend, so the header files will only reside there. Is it
> > >>> necessary to install the shared library on the compute nodes? If so, could I
> > >>> modify the extend-compute.xml to copy the library from the frontend to the
> > >>> compute nodes at kickstart with the minimum fuss? i.e. no need to make a
> > >>> custom RPM for that single file. I had a look at the node XML syntax, but I
> > >>> don't get things clear between eval, include, etc. The programs currently
> > >>> execute OK inside the boards, but I want to automate the library
> > >>> installation.
> > >>>
> > >> you could make a package for your header files and shared library like this:
> > >>
> > >> # cd /tmp
> > >> # mkdir new
> > >> # cd new
> > >>
> > >> assuming your header files are in /usr/include:
> > >>
> > >> # mkdir -p usr/include
> > >> # cp /usr/include/header1.h usr/include
> > >> # cp /usr/include/header2.h usr/include
> > >>
> > >> assuming your library is in /usr/lib:
> > >>
> > >> # mkdir -p usr/lib
> > >> # cp /usr/lib/libgidelproc.so usr/lib
> > >>
> > >> now make a package named 'gidelproc':
> > >>
> > >> # rocks create package $PWD/usr gidelproc prefix=/
> > >>
> > >> this will create an RPM named:
> > >>
> > >> gidelproc-1.0-1.*.rpm
> > >>
> > >> you can verify the files will be installed in the right place by executing:
> > >>
> > >> # rpm -qlp gidelproc-1.0-1.*.rpm
> > >>
> > >> now you can add the gidelproc RPM to your compute nodes by following
> > >> this procedure:
> > >>
> > >> http://www.rocksclusters.org/roll-documentation/base/5.3/customization-adding-packages.html
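> > >>
> > >> in short (see that page for the exact steps): copy the RPM into the
> > >> contrib directory, reference it in extend-compute.xml with a line like
> > >>
> > >> <package>gidelproc</package>
> > >>
> > >> and rebuild the distribution with 'rocks create distro' before
> > >> reinstalling the compute nodes.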
> > >>
> > >>
> > >>> Additionally, I've created a new appliance type, compute-FPGA. I have
> > >>> changed the node with the new board to this type (by reinstalling). After
> > >>> the name has changed, I cannot see the new node (compute-FPGA-0-1) in SGE,
> > >>> and the old node (compute-0-1) is still present even when it does not exist.
> > >>> I thought that insert-ethers took care of these details. Is there any action
> > >>> that I must take to sync SGE with the cluster? Perhaps I did something
> > >>> wrong.
> > >>>
> > >> what is the output of:
> > >>
> > >> # rocks list host
> > >>
> > >> - gb
> > >>
> > >>
> > >> ------------------------------
> > >>
> > >> Message: 14
> > >> Date: Thu, 14 Oct 2010 11:16:12 -0700
> > >> From: "Bart Brashers"<bbra...@Environcorp.com>
> > >> Subject: Re: [Rocks-Discuss] Utilising Front-end for Jobs
> > >> To: "Discussion of Rocks Clusters"<npaci-rocks...@sdsc.edu>
> > >> Message-ID:
> > >> <1B8D1B9BF4DCDC4A90A4...@irvine01.irvine.environ.local>
> > >>
> > >> Content-Type: text/plain; charset="us-ascii"
> > >>
> > >>
> > >>>> For Torque/Maui (which you failed to specify) you can do the following
> > >>>> (as root, on the frontend):
> > >>>>
> > >>>> # scp compute-0-0:/etc/pbs.conf /etc/pbs.conf
> > >>>> # chkconfig pbs on
> > >>>> # qmgr -c "create node<frontend>"
> > >>>> # qmgr -c "set node<frontend> np = 6"
> > >>>> # qmgr -c "set node<frontend> ntype=cluster"
> > >>>> # service pbs start
> > >>>>
> > >>>> That should do it. I'm pretty sure you don't have to use
> > >>>> <frontend>.local these days, but if it doesn't work you might try that.
> > >>>> And to be clear, don't include the <> around the actual hostname of
> > >>>> your frontend: qmgr -c "create node mycluster"
> > >>>>
> > >>>> I strongly suggest you specify the number of processors (the "np = 6"
> > >>>> above) to be a few cores LESS than the number you really have in your
> > >>>> frontend. If you have 8 cores, you want to reserve 2 cores for system
> > >>>> stuff and login shells. Pick a number, you can always change it later
> > >>>> if you notice the load on the frontend is too low or too high for your
> > >>>> liking. That np number just says how many jobs/threads/cores are
> > >>>> available to Torque.
> > >>>>
> > >>> Hello Bart,
> > >>> The last time I asked for my front-end to be used for jobs, you had
> > >>> suggested the above. I had also seen your post in the wiki page. Thanks
> > >>> for this.
> > >>>
> > >>> Though I did this, I am not able to run jobs across the front-end and
> > >>> compute-nodes, i.e., I have a total of 4 nodes along with the front-end.
> > >>> All of them are dual-processor 6-core Westmeres, with 12 cores per node.
> > >>>
> > >>> Jobs are running fine when I submit a job to run across the compute-nodes.
> > >>> But when I involve the front-end also, the job just stays there and
> > >>> doesn't output anything. It appears the processes cannot communicate
> > >>> with each other, so the job cannot proceed any further.
> > >>>
> > >> I'm guessing here that you are running MPI jobs, and using "across" to
> > >> indicate that jobs assigned to cores on both the frontend and a compute
> > >> node are failing.
> > >>
> > >> I'm going to further guess that single-threaded jobs assigned to the
> > >> frontend work without problem. I will also guess that MPI jobs assigned
> > >> only to cores on the FE work fine. Stop me when my guesses are not
> > >> correct...
> > >>
> > >> So it's really not that the _jobs_ are failing, it's an _MPI_ problem.
> > >>
> > >> Because the compute nodes are not allowed password-less ssh to the
> > >> frontend for security reasons, when the main thread attempts to ssh to
> > >> the FE and spawn a thread there, it's not being allowed in. So you need
> > >> to enable password-less ssh from the computes to the FE. Do that with
> > >> SSH keys.
> > >>
> > >> You could copy /root/.ssh/id_dsa* to /root/.ssh/ on all the compute
> > >> nodes. That should do it. Or you could generate ID's on each compute
> > >> node, and add all the id_dsa.pub lines to
> > >> frontend:/root/.ssh/authorized_keys.
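> > >>
> > >> A quick sketch of the first approach (node names are examples; loop over
> > >> whatever your compute nodes are actually called):
> > >>
> > >> # for n in compute-0-0 compute-0-1 compute-0-2 compute-0-3; do scp -p /root/.ssh/id_dsa* $n:/root/.ssh/; done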
> > >>
> > >> Bart
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> This message contains information that may be confidential, privileged or otherwise protected by law from disclosure. It is intended for the exclusive use of the Addressee(s). Unless you are the addressee or authorized agent of the addressee, you may not review, copy, distribute or disclose to anyone the message or any information contained within. If you have received this message in error, please contact the sender by electronic reply to em...@environcorp.com and immediately delete all copies of the message.
> > >>
> > >>
> > >> ------------------------------
> > >>
> > >> Message: 15
> > >> Date: Thu, 14 Oct 2010 11:41:04 -0700 (PDT)
> > >> From: Tim Carlson<tim.c...@pnl.gov>
> > >> Subject: Re: [Rocks-Discuss] profiling the cluster (newb)
> > >> To: Discussion of Rocks Clusters<npaci-rocks...@sdsc.edu>
> > >> Message-ID:<alpine.LRH.2.00.1...@scorpion.emsl.pnl.gov>
> > >> Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
> > >>
> > >> On Thu, 14 Oct 2010, tomisla...@gmx.com wrote:
> > >>
> > >>
> > >>> 1) Why do the apps that are to be run in parallel with mpirun need to be
> > >>> installed to /share/apps (or another NFS directory)? Is it because
> > >>> OpenMPI relies on the availability of the binaries over the network?
> > >>>
> > >> OpenMPI needs access to the binaries. These can be installed locally on
> > >> each machine or shared over the network. The easy thing to do is keep
> > >> them in an NFS share.
> > >>
> > >>
> > >>
> > >>> 2) Why does it take so long for the "`which mpirun` -np N -machinefile
> > >>> machines `which icoFoam` -parallel" command to start running? It takes
> > >>> a few minutes. Is it possible to speed this up?
> > >>>
> > >> How big is N? If N is reasonably large (and I know OpenFOAM is a monster
> > >> of a program), it could take a while for everything to load. Also, what
> > >> do you mean by "running"? With gigabit ethernet, it might take some time
> > >> to distribute your solution mesh.
> > >>
> > >>
> > >>> but I need to know how to check (in a script of some sort) if the memory of the whole machine is nearly filled (it must not swap, otherwise the time measurement makes no sense). Info on the network traffic would be excellent also, so that I can see where the bottleneck will appear first.
> > >>>
> > >> ganglia gives you quite a bit of output on the fly.
> > >>
> > >> ganglia mem_free
> > >>
> > >> is a parameter you might want to look at.
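> > >>
> > >> As a rough sketch of the automated check you described (the 512 MB
> > >> threshold and the column layout are assumptions; verify the actual
> > >> output of "ganglia mem_free" on your cluster first):
> > >>
> > >> # ganglia mem_free | awk '$2 < 524288 { print $1, "is low on memory"; bad=1 } END { exit bad }'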
> > >>
> > >>
> > >>
> > >>> This way I can stop the study automatically and evaluate results. When
> > >>> the speedup is still good enough for the current simulation size, I can
> > >>> buy more RAM and continue where I stopped, until the speedup starts to
> > >>> drop (overload of the memory controller, the GigE switch or the
> > >>> processor). This way I can tune the RAM quantity to the CPU frequency
> > >>> of the nodes, for a single type of GigE switch, without writing the
> > >>> data over NFS (to exclude the writing time). Although I've heard that
> > >>> CFD apps are RAM dependent, this is my home project, so I'm very, very
> > >>> careful about what hardware I'm buying and how I'm scaling the machine.
> > >>>
> > >> CFD apps are typically RAM-bandwidth dependent. The higher the memory
> > >> bandwidth, the faster the app runs. Intel Nehalem and AMD Magny Cours
> > >> shine because of all the memory channels feeding the CPUs.
> > >>
> > >> My experience is that your code is probably not going to scale very well
> > >> with your network configuration. OpenFOAM really wants to have a good
> > >> (Infiniband) network. People pay an extra 20% per node to get IB because
> > >> apps like OpenFOAM really need the low latency and high bandwidth provided
> > >> by IB networks.
> > >>
> > >>
> > >>> I've seen that there are programs such as Ganglia for monitoring the
> > >>> jobs, but I would prefer to work with raw data because the multitude of
> > >>> cases are to be automatically generated and run on the machines with
> > >>> different parameters. Is there a Python module of the ROCKS distro that
> > >>> gives me access to the current system status of the whole machine?
> > >>>
> > >> ganglia actually does this for you pretty well. It collects the raw data
> > >> and displays it in graphical format on a web page. You can also use the
> > >> command line tools to collect the raw data.
> > >>
> > >> Tim
> > >>
> > >> --
> > >> -------------------------------------------
> > >> Tim Carlson, PhD
> > >> Senior Research Scientist
> > >> Environmental Molecular Sciences Laboratory
> > >>
> > >>
> > >> ------------------------------
> > >>
> > >> _______________________________________________
> > >> npaci-rocks-discussion mailing list
> > >> npaci-rocks...@sdsc.edu
> > >> https://lists.sdsc.edu/mailman/listinfo/npaci-rocks-discussion
> > >>
> > >>
> > >> End of npaci-rocks-discussion Digest, Vol 51, Issue 14
> > >> ******************************************************
> > >>
> > >
> >
> >
> > --
> > Scott L. Hamilton
> > Research Support Services
> > Missouri University of Science and Technology
> > 316 Engineering Research Lab
> > Rolla, MO 65409
> > Phone: 573-341-6117
> > hamil...@mst.edu
> >
> >
> >
> >
> > ------------------------------
> >
> > Message: 4
> > Date: Mon, 18 Oct 2010 09:35:48 -0500
> > From: Raymond Muno <mu...@aem.umn.edu>
> > Subject: Re: [Rocks-Discuss] Processor limits for MX (Myrinet)
> > To: npaci-rocks...@sdsc.edu
> > Message-ID: <4CBC5B44...@aem.umn.edu>
> > Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> >
> > Still waiting to hear back from Myricom.
> >
> > Looks like the limitation is hardcoded in their MX stack. That is
> > counter to the options they give in the startup files for the driver.
> >
> > Without looking deeper, there may be a balance between number of
> > endpoints and number of hosts in the cluster.
> >
> > Not a Rocks-specific issue, but I posted to see if anyone had insight
> > into the issue on nodes with higher core counts. We ran this hardware for
> > a few years in a set of nodes with 2x dual core processors.
> >
> > -Ray Muno
> > University of Minnesota
> >
> > On 10/15/2010 7:48 PM, Philip Papadopoulos wrote:
> > > Sounds like a limitation in the Myrinet LANai code. Have you asked Myricom
> > > about it?
> > > (This doesn't look to be Rocks-specific)
> > >
> > > -P
> > >
> >
> >
> >
> > ------------------------------
> >
> > Message: 5
> > Date: Mon, 18 Oct 2010 08:11:51 -0700
> > From: "Bart Brashers" <bbra...@Environcorp.com>
> > Subject: Re: [Rocks-Discuss] #cores of front-end in /proc/cpuinfo
> > differs from #cores that ganglia presents
> > To: "Discussion of Rocks Clusters" <npaci-rocks...@sdsc.edu>
> > Message-ID:
> > <1B8D1B9BF4DCDC4A90A4...@irvine01.irvine.environ.local>
> >
> > Content-Type: text/plain; charset="us-ascii"
> >
> > When you installed your frontend, did you select the Xen roll? What's
> > the output of
> >
> > # rocks list roll
> > # uname -a
> >
> > If you have the Xen roll installed, do you actually use virtual
> > machines?
> >
> > Bart
> >
> >
> > > Hi,
> > >
> > > Our cluster (a front-end with 5 compute nodes) is based on Rocks 5.2.
> > Our
> > > front-end is an HP Proliant ML350 with 2 Dual Core Intel Xeon
> > processors.
> > > Looking in /proc/cpuinfo, I noticed that only one processor shows up
> > > instead of 4. I get the same info using top. This problem does not
> > > appear on the compute nodes.
> > > But, ganglia shows that the front-end has 4 processors.
> > >
> > > During reboot no error messages appeared, but I noticed that Intel
> > > Virtualization Technology is disabled.
> > > How can I fix it so that /proc/cpuinfo shows the same number of
> > > processors as ganglia? Any suggestions?
> > >
> > > Regards,
> > >
> > > Panagiotis Nastou
> > > Dpt of Mathematics
> > > Aegean University
> > > Karlovasi, Samos
> >
> >
> > This message contains information that may be confidential, privileged or otherwise protected by law from disclosure. It is intended for the exclusive use of the Addressee(s). Unless you are the addressee or authorized agent of the addressee, you may not review, copy, distribute or disclose to anyone the message or any information contained within. If you have received this message in error, please contact the sender by electronic reply to em...@environcorp.com and immediately delete all copies of the message.
> >
> >
> > ------------------------------
> >
> > Message: 6
> > Date: Mon, 18 Oct 2010 10:51:40 -0500
> > From: Mike Hanby <mha...@uab.edu>
> > Subject: Re: [Rocks-Discuss] SGE: Nodes in E state
> > To: "npaci-rocks...@sdsc.edu"
> > <npaci-rocks...@sdsc.edu>
> > Message-ID:
> > <A72C1C64C331B445A593C...@UABEXMBS3.ad.uab.edu>
> > Content-Type: text/plain; charset="utf-8"
> >
> > Try using:
> >
> > qstat -explain E
> >
> > That may reveal the root problem.
> >
> > Also, if you previously made memory / virtual memory consumable, you may need to update the execution hosts with the new ram totals.
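> >
> > In SGE 6.2, for example (host and queue names are placeholders), that
> > would be "qconf -me compute-0-0.local" to edit the exec host's
> > complex_values, then "qmod -cq 'all.q@compute-0-0.local'" to clear the E
> > state once the underlying cause is fixed.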
> >
> > -----Original Message-----
> > From: npaci-rocks-dis...@sdsc.edu [mailto:npaci-rocks-dis...@sdsc.edu] On Behalf Of pooja gupta
> > Sent: Monday, October 18, 2010 8:06 AM
> > To: npaci-rocks...@sdsc.edu
> > Subject: [Rocks-Discuss] SGE: Nodes in E state
> >
> > Dear All,
> >
> > We recently updated the RAM on our compute nodes. The qhost command shows the RAM
> > has increased. We also have our head node as an execute node. But all my nodes
> > are in a permanent E state. Please have a look at the output below. We are also
> > not able to qlogin. The output of the qlogin command is:
> >
> > Your job 1739 ("QLOGIN") has been submitted
> > waiting for interactive job to be scheduled ...
> >
> > Your "qlogin" request could not be scheduled, try again later.
> > error: error shutting down the connection: undefined commlib error code
> >
> >
> > We have stopped and restarted SGE many times but it didn't help. Please help
> > us in this regard.
> > We would highly appreciate your suggestions.
> > Many thanks in advance.
> >
> > With best regards,
> > Pooja
> >
> >
> >
> > =======================================================
> >
> > queuename                      qtype resv/used/tot. load_avg arch       states
> > ---------------------------------------------------------------------------------
> > al...@compute-0-0.local BIP 0/0/16 0.00 lx26-amd64 E
> > ---------------------------------------------------------------------------------
> > al...@compute-0-1.local BIP 0/0/16 0.00 lx26-amd64 E
> > ---------------------------------------------------------------------------------
> > al...@compute-0-2.local BIP 0/0/16 0.00 lx26-amd64 E
> > ---------------------------------------------------------------------------------
> > al...@compute-0-3.local BIP 0/0/16 0.02 lx26-amd64 E
> > ---------------------------------------------------------------------------------
> > al...@nanda.local BIP 0/0/8 0.00 lx26-amd64 E
> >
> > =======================================================
> > -------------- next part --------------
> > An HTML attachment was scrubbed...
> > URL: https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/attachments/20101018/41f2f232/attachment.html
> >
> > ------------------------------
> >
> > Message: 7
> > Date: Mon, 18 Oct 2010 19:18:18 +0300
> > From: Nastou Panagiotis <pna...@aegean.gr>
> > Subject: [Rocks-Discuss] Re: #cores of front-end in /proc/cpuinfo
> > differs from #cores that ganglia presents
> > To: Discussion of Rocks Clusters <npaci-rocks...@sdsc.edu>
> > Message-ID:
> > <6AEE2EE3778E1241B57804...@hermes.aegean.gr>
> > Content-Type: text/plain; charset="iso-8859-7"
> >
> > Hi Bart,
> >
> > Yes, I have selected the Xen roll during installation.
> >
> > [root@pythagoras ~]# rocks list roll
> > NAME VERSION ARCH ENABLED
> > kernel: 5.2 x86_64 yes
> > area51: 5.2 x86_64 yes
> > base: 5.2 x86_64 yes
> > ganglia: 5.2 x86_64 yes
> > hpc: 5.2 x86_64 yes
> > java: 5.2 x86_64 yes
> > sge: 5.2 x86_64 yes
> > viz: 5.2 x86_64 yes
> > web-server: 5.2 x86_64 yes
> > xen: 5.2 x86_64 yes
> > os: 5.2 x86_64 yes
> > os: 5.2 x86_64 yes
> > os: 5.2 x86_64 yes
> > os: 5.2 x86_64 yes
> > os: 5.2 x86_64 yes
> > os: 5.2 x86_64 yes
> > os: 5.2 x86_64 yes
> > service-pack: 5.2.2 x86_64 yes
> > torque: 5.2.0 x86_64 yes
> >
> > [root@pythagoras ~]# uname -a
> > Linux pythagoras.math.aegean.gr 2.6.18-128.1.14.el5xen #1 SMP Wed Jun 17 07:10:16 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
> >
> > For the moment, I do not use virtual machines. What should I do?
> >
> > Thank you in advance.
> >
> > Regards,
> >
> > Panos
> > ________________________________________
> > From: npaci-rocks-dis...@sdsc.edu [npaci-rocks-dis...@sdsc.edu] on behalf of Bart Brashers [bbra...@Environcorp.com]
> > Sent: Monday, October 18, 2010 6:11 PM
> > To: Discussion of Rocks Clusters
> > Subject: Re: [Rocks-Discuss] #cores of front-end in /proc/cpuinfo differs from #cores that ganglia presents
> >
> > When you installed your frontend, did you select the Xen roll? What's
> > the output of
> >
> > # rocks list roll
> > # uname -a
> >
> > If you have the Xen roll installed, do you actually use virtual
> > machines?
> >
> > Bart
> >
> >
> > > Hi,
> > >
> > > Our cluster (a front-end with 5 compute nodes) is based on Rocks 5.2.
> > Our
> > > front-end is an HP Proliant ML350 with 2 Dual Core Intel Xeon
> > processors.
> > > Looking in /proc/cpuinfo, I noticed that only one processor shows up
> > > instead of 4. I get the same info using top. This problem does not
> > > appear on the compute nodes.
> > > But, ganglia shows that the front-end has 4 processors.
> > >
> > > During reboot no error messages appeared, but I noticed that Intel
> > > Virtualization Technology is disabled.
> > > How can I fix it so that /proc/cpuinfo shows the same number of
> > > processors as ganglia? Any suggestions?
> > >
> > > Regards,
> > >
> > > Panagiotis Nastou
> > > Dpt of Mathematics
> > > Aegean University
> > > Karlovasi, Samos
> >
> >
> > This message contains information that may be confidential, privileged or otherwise protected by law from disclosure. It is intended for the exclusive use of the Addressee(s). Unless you are the addressee or authorized agent of the addressee, you may not review, copy, distribute or disclose to anyone the message or any information contained within. If you have received this message in error, please contact the sender by electronic reply to em...@environcorp.com and immediately delete all copies of the message.
> >
> >
> > ------------------------------
> >
> > Message: 8
> > Date: Mon, 18 Oct 2010 09:37:09 -0700
> > From: "Bart Brashers" <bbra...@Environcorp.com>
> > Subject: Re: [Rocks-Discuss] Re: #cores of front-end in /proc/cpuinfo
> > differs from #cores that ganglia presents
> > To: "Discussion of Rocks Clusters" <npaci-rocks...@sdsc.edu>
> > Message-ID:
> > <1B8D1B9BF4DCDC4A90A4...@irvine01.irvine.environ.local>
> >
> > Content-Type: text/plain; charset="utf-8"
> >
> > This happens fairly often: people select ALL the rolls on the jumbo DVD, even if they don't intend to use them. The Xen roll is for virtual machines. The default frontend is a virtual machine with just one core (1 CPU), which is why /proc/cpuinfo shows just one. You can see that you're running the Xen kernel in the output of "uname -a" (the "xen" at the end).
> >
> > Note also that you must choose either the SGE roll or the torque roll, but you should NOT install both. They compete, and get in each other's way. Pick only one.
> >
> > If you never intend to use virtual machines, then I suggest making a restore roll and re-installing. This might be a good time to upgrade to 5.3 or 5.4 (due out on Thursday).
> >
> > http://www.rocksclusters.org/roll-documentation/base/5.2/upgrade-frontend.html
> > https://wiki.rocksclusters.org/wiki/index.php/Tips_and_tricks#Q._What_files_should_survive_a_frontend_re-install.2Fupgrade.3F
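> >
> > (For reference, the restore roll itself is built on the frontend with
> > something like:
> >
> > # cd /export/site-roll/rocks/src/roll/restore
> > # make roll
> >
> > See the first link above for the full procedure.)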
> >
> > If you think you might someday want to run virtual nodes and virtual clusters, then someone else will have to chime in and answer how to make your FE have access to more cores.
> >
> > Bart
> >
> > > Hi Bart,
> > >
> > > Yes, I have selected the Xen roll during installation.
> > >
> > > [root@pythagoras ~]# rocks list roll
> > > NAME VERSION ARCH ENABLED
> > > kernel: 5.2 x86_64 yes
> > > area51: 5.2 x86_64 yes
> > > base: 5.2 x86_64 yes
> > > ganglia: 5.2 x86_64 yes
> > > hpc: 5.2 x86_64 yes
> > > java: 5.2 x86_64 yes
> > > sge: 5.2 x86_64 yes
> > > viz: 5.2 x86_64 yes
> > > web-server: 5.2 x86_64 yes
> > > xen: 5.2 x86_64 yes
> > > os: 5.2 x86_64 yes
> > > os: 5.2 x86_64 yes
> > > os: 5.2 x86_64 yes
> > > os: 5.2 x86_64 yes
> > > os: 5.2 x86_64 yes
> > > os: 5.2 x86_64 yes
> > > os: 5.2 x86_64 yes
> > > service-pack: 5.2.2 x86_64 yes
> > > torque: 5.2.0 x86_64 yes
> > >
> > > [root@pythagoras ~]# uname -a
> > > Linux pythagoras.math.aegean.gr 2.6.18-128.1.14.el5xen #1 SMP Wed Jun 17
> > > 07:10:16 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
> > >
> > > For the moment, I do not use virtual machines. What should I do?
> > >
> > > Thank you in advance.
> > >
> > > Regards,
> > >
> > > Panos
> > > ________________________________________
> > > From: npaci-rocks-dis...@sdsc.edu [npaci-rocks-discussion-
> > > bou...@sdsc.edu] on behalf of Bart Brashers [bbra...@Environcorp.com]
> > > Sent: Monday, October 18, 2010 6:11 PM
> > > To: Discussion of Rocks Clusters
> > > Subject: Re: [Rocks-Discuss] #cores of front-end in /proc/cpuinfo differs from
> > > #cores that ganglia presents
> > >
> > > When you installed your frontend, did you select the Xen roll? What's
> > > the output of
> > >
> > > # rocks list roll
> > > # uname -a
> > >
> > > If you have the Xen roll installed, do you actually use virtual
> > > machines?
> > >
> > > Bart
> > >
> > >
> > > > Hi,
> > > >
> > > > Our cluster (a front-end with 5 compute nodes) is based on Rocks 5.2.
> > > Our
> > > > front-end is an HP Proliant ML350 with 2 Dual Core Intel Xeon
> > > processors.
> > > > Looking in /proc/cpuinfo, I noticed that only one processor shows up
> > > > instead of 4. I get the same info using top. This problem does not
> > > > appear on the compute nodes.
> > > > But, ganglia shows that the front-end has 4 processors.
> > > >
> > > > During reboot no error messages appeared, but I noticed that Intel
> > > > Virtualization Technology is disabled.
> > > > How can I fix it so that /proc/cpuinfo shows the same number of
> > > > processors as ganglia? Any suggestions?
> > > >
> > > > Regards,
> > > >
> > > > Panagiotis Nastou
> > > > Dpt of Mathematics
> > > > Aegean University
> > > > Karlovasi, Samos
> > >
> > >
> > > This message contains information that may be confidential, privileged or
> > > otherwise protected by law from disclosure. It is intended for the exclusive
> > > use of the Addressee(s). Unless you are the addressee or authorized agent of
> > > the addressee, you may not review, copy, distribute or disclose to anyone the
> > > message or any information contained within. If you have received this
> > > message in error, please contact the sender by electronic reply to
> > > em...@environcorp.com and immediately delete all copies of the message.
> >
> >
> > This message contains information that may be confidential, privileged or otherwise protected by law from disclosure. It is intended for the exclusive use of the Addressee(s). Unless you are the addressee or authorized agent of the addressee, you may not review, copy, distribute or disclose to anyone the message or any information contained within. If you have received this message in error, please contact the sender by electronic reply to em...@environcorp.com and immediately delete all copies of the message.
> >
> >
> > ------------------------------
> >
> > Message: 9
> > Date: Mon, 18 Oct 2010 16:02:16 -0200
> > From: "Tadeu Leonardo Soares e Silva" <tsi...@peq.coppe.ufrj.br>
> > Subject: [Rocks-Discuss] Dacapo and Rocks cluster
> > To: npaci-rocks...@sdsc.edu
> > Message-ID: <2010101817...@peq.coppe.ufrj.br>
> > Content-Type: text/plain; charset=utf-8
> >
> >
> > Can anyone tell us how to install Dacapo on Rocks?
> >
> > Thanks in advance
> >
> > Tadeu
> >
> >
> >
> > +++++++++++++++++++++++++++++++++++++++++++++++++++++
> >
> > PEQ/COPPE renews its CAPES course rating of 7, the maximum:
> > 45 years of excellence in graduate teaching and research in
> > Chemical Engineering.
> >
> > ************************************
> >
> > PEQ/COPPE : 45 years of commitment to excellence in teaching and
> > research in Chemical Engineering.
> >
> > +++++++++++++++++++++++++++++++++++++++++++++++++++++
> >
> >
> >
> > ------------------------------
> >
> > Message: 10
> > Date: Mon, 18 Oct 2010 14:10:49 -0400
> > From: "Edsall, William (WJ) " <WJEd...@dow.com>
> > Subject: [Rocks-Discuss] two clusters on one network with the same
> > Private IP
> > To: "Discussion of Rocks Clusters" <npaci-rocks...@sdsc.edu>
> > Message-ID:
> > <52CD990A674498429E6A...@USMDLMDOWX025.dow.com>
> > Content-Type: text/plain; charset="us-ascii"
> >
> > Hello,
> > ?We have a strange problem.
> >
> > Two rocks clusters on one network. Cluster A is rocks 5.1, Cluster B is
> > rocks 5.3.
> >
> > Cluster A just started acting up: the private network interface
> > disables itself.
> > If you try to manually start:
> > # ifup eth0
> > Error, some other host already uses address 10.1.1.1.
> >
> > So I did an arping for this address, and it returns the MAC of the
> > public NIC on Cluster B:
> > # arping -c2 -w3 -D -I eth1 10.1.1.1
> > ARPING 10.1.1.1 from 0.0.0.0 eth1
> > Unicast reply from 10.1.1.1 [00:30:48:BB:60:07] for 10.1.1.1
> > [00:30:48:BB:60:07] 0.736ms
> > Sent 1 probes (1 broadcast(s))
> > Received 1 response(s)
> >
> > The public NICs of both cluster A and B are on the same switch. Would
> > splitting them onto separate physical switches prevent this broadcast
> > hiccup?
> >
> >
> > _______________________________________
> > William J. Edsall
> >
> >
> >
> >
> > -------------- next part --------------
> > An HTML attachment was scrubbed...
> > URL: https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/attachments/20101018/757167eb/attachment.html
> >
> >
> > ------------------------------
> >
> > Message: 11
> > Date: Mon, 18 Oct 2010 12:23:55 -0600
> > From: Lloyd Brown <lloyd...@byu.edu>
> > Subject: Re: [Rocks-Discuss] two clusters on one network with the same
> > Private IP
> > To: Discussion of Rocks Clusters <npaci-rocks...@sdsc.edu>
> > Message-ID: <4CBC90B...@byu.edu>
> > Content-Type: text/plain; charset=ISO-8859-1
> >
> > I could be wrong, but I believe the default behavior in the Linux
> > kernel is to respond to ARP requests for any of the IPs on the host,
> > even if the request didn't arrive on the interface that owns that
> > address. Therefore, if both clusters use the same internal IP address,
> > then you're going to have problems like this.
> >
> > I believe the knob you want for adjusting this behavior is
> > "arp_ignore", which can be set on a system-wide or a per-interface
> > basis, via either the /proc filesystem or the "sysctl" command line.
> > Although it's not for Rocks, I think this wiki shows the syntax well
> > enough:
> > http://kb.linuxvirtualserver.org/wiki/Using_arp_announce/arp_ignore_to_disable_ARP
> > I think you'd want an arp_ignore value of 1 or possibly 2. It's mostly
> > up to you.
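> >
> > A minimal sketch of the system-wide variant (pick the value that fits
> > your setup):
> >
> > # sysctl -w net.ipv4.conf.all.arp_ignore=1
> > # echo "net.ipv4.conf.all.arp_ignore = 1" >> /etc/sysctl.conf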
> >
> > Another solution would be to use different internal subnets for the
> > different clusters.
> >
> >
> > Lloyd
> >
> >
> > On 10/18/10 12:10 PM, Edsall, William (WJ) wrote:
> > > Hello,
> > > We have a strange problem.
> > >
> > > Two rocks clusters on one network. Cluster A is rocks 5.1, Cluster B is
> > > rocks 5.3.
> > >
> > > Cluster A just started acting up: the private network interface
> > > disables itself.
> > > If you try to manually start:
> > > # ifup eth0
> > > Error, some other host already uses address 10.1.1.1.
> > >
> > > So I did an arping for this address, and it returns the MAC of the
> > > public NIC on Cluster B:
> > > # arping -c2 -w3 -D -I eth1 10.1.1.1
> > > ARPING 10.1.1.1 from 0.0.0.0 eth1
> > > Unicast reply from 10.1.1.1 [00:30:48:BB:60:07] for 10.1.1.1
> > > [00:30:48:BB:60:07] 0.736ms
> > > Sent 1 probes (1 broadcast(s))
> > > Received 1 response(s)
> > >
> > > The public NICs of both cluster A and B are on the same switch. Would
> > > splitting them onto separate physical switches prevent this broadcast
> > > hiccup?
> > >
> > >
> > > _______________________________________
> > > William J. Edsall
> > >
> > >
> > >
> > >
> > > -------------- next part --------------
> > > An HTML attachment was scrubbed...
> > > URL: https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/attachments/20101018/757167eb/attachment.html
> >
> >
> > --
> >
> >
> > Lloyd Brown
> > Systems Administrator
> > Fulton Supercomputing Lab
> > Brigham Young University
> > http://marylou.byu.edu
> >
> >
> >
> >
> > ------------------------------
> >
> > _______________________________________________
> > npaci-rocks-discussion mailing list
> > npaci-rocks...@sdsc.edu
> > https://lists.sdsc.edu/mailman/listinfo/npaci-rocks-discussion
> >
> >
> > End of npaci-rocks-discussion Digest, Vol 51, Issue 18
> > ******************************************************
>
>
>
> ------------------------------
>
> Message: 15
> Date: Wed, 20 Oct 2010 12:38:27 -0400
> From: Joe Landman <lan...@scalableinformatics.com>
> Subject: Re: [Rocks-Discuss] ssh takes forever (Philip Papadopoulos)
> To: npaci-rocks...@sdsc.edu
> Message-ID: <4CBF1B03...@scalableinformatics.com>
> Content-Type: text/plain; charset=UTF-8; format=flowed
>
> On 10/20/2010 12:26 PM, tomisla...@gmx.com wrote:
> > Hi Philip,
> >
> > thanks for the advice, I have tried it, and the login time climbed up
> > to 31s:
> >
> > [foam@showme ~]$ unset DISPLAY
> > [foam@showme ~]$ time ssh compute-0-0 "ls"
> > Desktop
> > Documents
> > OpenFOAM
> >
> > real    0m31.019s
> > user    0m0.018s
> > sys     0m0.005s
> > [foam@showme ~]$
> >
> > Any further advice on the ssh issue? Otherwise I would have to run
> > very long simulations just to make the variable login time negligible,
> > if I am to time the runs for profiling. Let's hope ganglia has the
> > answer I'm looking for: the actual computing time of the process. I
> > don't want the preparations or the write time in my data, but it would
> > be nice if the ssh didn't take so long; then I could neglect it with
> > respect to the computational time.
>
> This is a DNS issue ... we've seen this many times before; it usually
> happens when resolution on the client side is broken. Change the
> compute nodes' /etc/ssh/sshd_config to have 'UseDNS no' (it's on by
> default). Restart sshd on the compute node, and then try again.
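>
> From the frontend, something like this should handle all the compute
> nodes at once (assuming sshd_config has no UseDNS line yet):
>
> # rocks run host compute command='echo "UseDNS no" >> /etc/ssh/sshd_config; service sshd restart'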
>
> Another mechanism is to replicate the hosts file to the compute nodes.
>
>
> --
> Joseph Landman, Ph.D
> Founder and CEO
> Scalable Informatics Inc.
> email: lan...@scalableinformatics.com
> web : http://scalableinformatics.com
>  http://scalableinformatics.com/jackrabbit
> phone: +1 734 786 8423 x121
> fax : +1 866 888 3112
> cell : +1 734 612 4615
>
>
> ------------------------------
>
> Message: 16
> Date: Wed, 20 Oct 2010 09:38:11 -0700
> From: Ian Kaufman <ikau...@soe.ucsd.edu>
> Subject: Re: [Rocks-Discuss] ssh takes forever (Philip Papadopoulos)
> To: Discussion of Rocks Clusters <npaci-rocks...@sdsc.edu>
> Message-ID:
> <AANLkTi=RByg04aBxs0bh9rErS...@mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
>
> Tomislav,
>
> Looks like Joe beat me to it.
>
> To make sure, try sshing to the IP address and see if the delay decreases.
>
> Ian
>
> On Wed, Oct 20, 2010 at 9:26 AM, tomisla...@gmx.com
> <tomisla...@gmx.com> wrote:
> > Hi Philip,
> >
> > thanks for the advice, I have tried it, and the login time climbed up to 31s:
> >
> > [foam@showme ~]$ unset DISPLAY
> > [foam@showme ~]$ time ssh compute-0-0 "ls"
> > Desktop
> > Documents
> > OpenFOAM
> >
> > real    0m31.019s
> > user    0m0.018s
> > sys     0m0.005s
> > [foam@showme ~]$
> >
> > Any further advice on the ssh issue? Otherwise I would have to run very long simulations just to make the variable login time negligible, if I am to time the runs for profiling. Let's hope ganglia has the answer I'm looking for: the actual computing time of the process. I don't want the preparations or the write time in my data, but it would be nice if the ssh didn't take so long; then I could neglect it with respect to the computational time.
> >
> > Best regards,
> > Tomislav Maric
>
> --
> Ian Kaufman
> Research Systems Administrator
> UC San Diego, Jacobs School of Engineering ikaufman AT ucsd DOT edu
>
>
> ------------------------------
>
> Message: 17
> Date: Wed, 20 Oct 2010 11:42:19 -0500
> From: Kevin Doman <kdom...@gmail.com>
> Subject: Re: [Rocks-Discuss] ssh takes forever (Philip Papadopoulos)
> To: Discussion of Rocks Clusters <npaci-rocks...@sdsc.edu>
> Message-ID:
> <AANLkTimtDDweerN_s_Q2O3C2=apJqgN=k_-fw+...@mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> I would first check the /etc/resolv.conf file to make sure something
> didn't change it.
>
> Then follow the others' advice and propagate the /etc/hosts file to all
> your nodes. I modified my /var/411/Files.mk file to include /etc/hosts
> (FILES += /etc/hosts \)
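>
> After editing Files.mk, the change is pushed out with the usual 411
> rebuild step:
>
> # make -C /var/411 force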
>
> K.
>
>
>
> On Wed, Oct 20, 2010 at 11:38 AM, Joe Landman
> <lan...@scalableinformatics.com> wrote:
> > On 10/20/2010 12:26 PM, tomisla...@gmx.com wrote:
> >>
> >> Hi Philip,
> >>
> >> thanks for the advice, I have tried it, and the login time climbed up
> >> to 31s:
> >>
> >> [foam@showme ~]$ unset DISPLAY
> >> [foam@showme ~]$ time ssh compute-0-0 "ls"
> >> Desktop
> >> Documents
> >> OpenFOAM
> >>
> >> real    0m31.019s
> >> user    0m0.018s
> >> sys     0m0.005s
> >> [foam@showme ~]$
> >>
> >> Any further advice on the ssh issue? Otherwise I would have to run
> >> very long simulations just to make the variable login time negligible,
> >> if I am to time the runs for profiling. Let's hope ganglia has the
> >> answer I'm looking for: the actual computing time of the process. I
> >> don't want the preparations or the write time in my data, but it would
> >> be nice if the ssh didn't take so long; then I could neglect it with
> >> respect to the computational time.
> >
> > This is a DNS issue ... we've seen this many times before; it usually happens
> > when resolution on the client side is broken. Change the compute
> > nodes' /etc/ssh/sshd_config to have 'UseDNS no' (it's on by default).
> > Restart sshd on the compute node, and then try again.
> >
> > Another mechanism is to replicate the hosts file to the compute nodes.
> >
> >
> > --
> > Joseph Landman, Ph.D
> > Founder and CEO
> > Scalable Informatics Inc.
> > email: lan...@scalableinformatics.com
> > web  : http://scalableinformatics.com
> >        http://scalableinformatics.com/jackrabbit
> > phone: +1 734 786 8423 x121
> > fax  : +1 866 888 3112
> > cell : +1 734 612 4615
> >
>
>
> ------------------------------
>
> Message: 18
> Date: Wed, 20 Oct 2010 12:56:53 -0400
> From: "Jim Kress" <jimkr...@kressworks.org>
> Subject: [Rocks-Discuss] Fully Qualified Domain Name Change
> To: <npaci-rocks...@sdsc.edu>
> Message-ID: <000001cb7077$cc4c6950$64e53bf0$@org>
> Content-Type: text/plain; charset="us-ascii"
>
> How does one change the Fully Qualified Domain Name (FQDN) for a ROCKS 5.3
> cluster WITHOUT having to reinstall it?
>
>  
>
> BTW, I suggest a change in the installation instructions that informs the
> user that the FQDN must be a real domain, owned or controlled by the user.
> This is not clear in the current instructions. In fact, it appears to the
> normal user installing ROCKS that any dummy name can be used for the FQDN
> when doing the install.
>
>  
>
> Thanks.
>
>  
>
> Jim
>
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL: https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/attachments/20101020/ac4a9042/attachment.html
>
>
> ------------------------------
>
> Message: 19
> Date: Wed, 20 Oct 2010 10:24:37 -0700
> From: Greg Bruno <greg....@gmail.com>
> Subject: Re: [Rocks-Discuss] ssh takes forever (Philip Papadopoulos)
> To: Discussion of Rocks Clusters <npaci-rocks...@sdsc.edu>
> Message-ID:
> <AANLkTin6fY8PHJh6_AxC4...@mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> On Wed, Oct 20, 2010 at 9:38 AM, Joe Landman
> <lan...@scalableinformatics.com> wrote:
> >
> > This is a DNS issue ... we've seen this many times before; it usually happens
> > when resolution on the client side is broken. Change the compute
> > nodes' /etc/ssh/sshd_config to have 'UseDNS no' (it's on by default).
> > Restart sshd on the compute node, and then try again.
> >
> > Another mechanism is to replicate the hosts file to the compute nodes.
>
> another method is this:
>
> on the frontend, execute:
>
>  # rocks set attr ssh_use_dns false
>
> then reinstall your compute nodes.
>
>  - gb
>
>
> ------------------------------
>
> Message: 20
> Date: Tue, 19 Oct 2010 15:55:10 -0700
> From: "Karengin, Mr. Dean, Contractor, Code 7501.1"
> <dean.kar...@nrlmry.navy.mil>
> Subject: [Rocks-Discuss] Call a shell script from extend-compute.xml
> To: "'Discussion of Rocks Clusters'" <npaci-rocks...@sdsc.edu>
> Message-ID: <BCC96F2E4E87334C942E85D098C5450057216318@zeus>
> Content-Type: text/plain; charset="us-ascii"
>
> Good people of the ROCKS world,
>
> I am trying to add a rather long shell script to configure the compute nodes on a new ROCKS 5.3 cluster. I set:
>
> <shell="bash"> script </shell> and still get parser errors.
>
> Is there a better way to do this? Or am I completely off base?
>
>  
>
>
> Dean Karengin (Contractor)
> Systems Administrator
> Dell Services Federal Government
> Naval Research Laboratory
> Marine Meteorology Division
> 7 Grace Hopper Ave.
> Monterey, Ca 93943
> Code 7501.1
> Office: 831-656-4243
> Cell: 831-392-7092
> -----Original Message-----
> From: npaci-rocks-dis...@sdsc.edu [mailto:npaci-rocks-dis...@sdsc.edu] On Behalf Of jean-francois prieur
> Sent: Tuesday, October 19, 2010 1:17 PM
> To: Discussion of Rocks Clusters
> Subject: Re: [Rocks-Discuss] looking for simple ROCKS sysadmin training materials
>
> I have nothing useful to add ;) but I just wanted to thank you for your SGE training presentations that you made public. Very useful resource.
>
> Regards,
> JF Prieur
>
> On 19 October 2010 14:09, Chris Dagdigian <d...@sonsorol.org> wrote:
>
> >
> > Just wanted to thank everyone for the rapid feedback, in particular
> > Phil and others who pointed me to the ROCKS-A-Palooza presentations -
> > I'm going to use these materials to drive a basic half-day
> > hackfest/workshop on ROCKS admin using a lot of interactive and "How
> > do I do X?" type examples on an active cluster.
> >
> > Regards,
> > Chris
> >
> >
> >
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL: https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/attachments/20101019/e60eb742/attachment.html
>
>
> ------------------------------
>
> Message: 21
> Date: Mon, 18 Oct 2010 20:20:12 +0200
> From: Lino García Tarrés <lino....@upc.edu>
> Subject: Re: [Rocks-Discuss] SGE, slots, and affinity
> To: Discussion of Rocks Clusters <npaci-rocks...@sdsc.edu>
> Message-ID: <4CBC8FDC...@upc.edu>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> >
> > Hi all - we have some new 32 core AMD nodes (Rocks 5.2, SGE 6.2u2,
> > openmpi 1.4.1), and it makes a huge difference to add --mca mpi_paffinity_alone 1
> > (or perhaps --bind-to-core) for our openmpi runs on these nodes. However, I'd
> > like to be able to share the nodes between multiple small parallel jobs,
> > but each mpirun just assigns all its processes to the same cores, so they conflict
> > (and slow down by a factor of 100). Does anyone know what it would take for
> > something like this to work? Is there any integration for this via SGE (it would
> > need to assign particular slots to particular jobs, so that we could construct an
> > OpenMPI rank mapping file)? Another SGE version? Another queuing system?
> > Some other solution that I've missed?
> >
> >
> Hi Noam,
> Some time ago I had to deal with a similar race condition problem with
> both MVAPICH and Intel-MPI under PBS on Rocks 5.1. I solved it by using
> the following mpirun parameters:
>
> -. MVAPICH: VIADEV_USE_AFFINITY=0
> -. IntelMPI: -env I_MPI_PIN_DOMAIN auto
>
> I hope it helps you as well.
> Cheers,
> Lino.
>
>
> ------------------------------
>
> Message: 22
> Date: Wed, 20 Oct 2010 10:31:41 -0700
> From: "Karengin, Mr. Dean, Contractor, Code 7501.1"
> <dean.kar...@nrlmry.navy.mil>
> Subject: [Rocks-Discuss] LDAP Authentication in ROCKS 5.3 x86_64
> To: "'npaci-rocks...@sdsc.edu'"
> <npaci-rocks...@sdsc.edu>
> Message-ID: <BCC96F2E4E87334C942E85D098C545005721631E@zeus>
> Content-Type: text/plain; charset="us-ascii"
>
> We use LDAP for authentication and so on. Does every compute node need to be made LDAP-capable (querying our LDAP servers directly), or can I configure just the front end and do a "rocks sync users"? It doesn't seem like the latter would work, but I have to ask before I go through the effort.
>
> On a related note, if I put all the config info in the extend-compute.xml, will that suffice? Has anyone actually done this? Or is there a better way to do it?
>
> Thanks for taking the time to read this.
>
>
>
> Dean Karengin (Contractor)
> Systems Administrator
> Dell Services Federal Government
> Naval Research Laboratory
> Marine Meteorology Division
> 7 Grace Hopper Ave.
> Monterey, Ca 93943
> Code 7501.1
> Office: 831-656-4243
> Cell: 831-392-7092
>
> ------------------------------
>
> Message: 23
> Date: Wed, 20 Oct 2010 13:36:14 -0400
> From: Noam Bernstein <noam.be...@nrl.navy.mil>
> Subject: Re: [Rocks-Discuss] SGE, slots, and affinity
> To: Discussion of Rocks Clusters <npaci-rocks...@sdsc.edu>
> Message-ID: <679FD177-AFF5-4639...@nrl.navy.mil>
> Content-Type: text/plain; charset=iso-8859-1
>
> Unfortunately, I really want affinity to be on, because it makes my
> code much faster. But for that to work with different jobs sharing
> a node, they have to agree on which slots belong to which
> jobs, and so either the scheduler assigns that information,
> or I hack up something which does it automatically (which will
> be complex and cumbersome, I suspect).
>
> Noam
>
> On Oct 18, 2010, at 2:20 PM, Lino García Tarrés wrote:
>
> > Hi Noam,
> > Some time ago I had to deal with a similar race condition problem with both MVAPICH and Intel-MPI under PBS on Rocks 5.1. I solved it by using the following mpirun parameters:
> >
> > -. MVAPICH: VIADEV_USE_AFFINITY=0
> > -. IntelMPI: -env I_MPI_PIN_DOMAIN auto
> >
> > I hope it helps you as well.
> > Cheers,
> > Lino.
>
>
>
> ------------------------------
>
> Message: 24
> Date: Wed, 20 Oct 2010 10:42:26 -0700
> From: "Bart Brashers" <bbra...@Environcorp.com>
> Subject: Re: [Rocks-Discuss] Call a shell script from
> extend-compute.xml
> To: "Discussion of Rocks Clusters" <npaci-rocks...@sdsc.edu>
> Message-ID:
> <1B8D1B9BF4DCDC4A90A4...@irvine01.irvine.environ.local>
>
> Content-Type: text/plain; charset="us-ascii"
>
> All the code inside the <post> section of an extend-compute.xml uses
> bash by default. It says so here:
>
> http://www.rocksclusters.org/roll-documentation/base/5.3/customization-postconfig.html
>
> " Put your bash scripts in between the tags <post> and </post>:"
>
> Bart
>
>
>
> > Good people of the ROCKS world,
> >
> > I am trying to add a rather long shell script to configure the
> > compute nodes on a new ROCKS 5.3 cluster. I set:
> >
> > <shell="bash"> script </shell> and still get parser errors.
> >
> > Is there a better way to do this? Or am I completely off base?
> >
> >
> >
> >
> > Dean Karengin (Contractor)
> > Systems Administrator
> > Dell Services Federal Government
> > Naval Research Laboratory
> > Marine Meteorology Division
> > 7 Grace Hopper Ave.
> > Monterey, Ca 93943
> > Code 7501.1
> > Office: 831-656-4243
> > Cell: 831-392-7092
> > -----Original Message-----
> > From: npaci-rocks-dis...@sdsc.edu
> > [mailto:npaci-rocks-discussion-bou...@sdsc.edu] On Behalf Of jean-francois prieur
> > Sent: Tuesday, October 19, 2010 1:17 PM
> > To: Discussion of Rocks Clusters
> > Subject: Re: [Rocks-Discuss] looking for simple ROCKS sysadmin training
> > materials
> >
> > I have nothing useful to add ;) but I just wanted to thank you for your
> > SGE training presentations that you made public. Very useful resource.
> >
> > Regards,
> > JF Prieur
> >
> > On 19 October 2010 14:09, Chris Dagdigian <d...@sonsorol.org> wrote:
> >
> > >
> > > Just wanted to thank everyone for the rapid feedback, in particular
> > > Phil and others who pointed me to the ROCKS-A-Palooza presentations -
> > > I'm going to use these materials to drive a basic half-day
> > > hackfest/workshop on ROCKS admin using a lot of interactive and "How
> > > do I do X?" type examples on an active cluster.
> > >
> > > Regards,
> > > Chris
> > >
> > >
> > >
> > -------------- next part --------------
> > An HTML attachment was scrubbed...
> > URL: https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/attachments/20101019/e60eb742/attachment.html
>
>
> This message contains information that may be confidential, privileged or otherwise protected by law from disclosure. It is intended for the exclusive use of the Addressee(s). Unless you are the addressee or authorized agent of the addressee, you may not review, copy, distribute or disclose to anyone the message or any information contained within. If you have received this message in error, please contact the sender by electronic reply to em...@environcorp.com and immediately delete all copies of the message.
>
>
> ------------------------------
>
> Message: 25
> Date: Wed, 20 Oct 2010 10:52:16 -0700
> From: Greg Bruno <greg....@gmail.com>
> Subject: Re: [Rocks-Discuss] Dreaded "choose language" on the nodes
> To: Discussion of Rocks Clusters <npaci-rocks...@sdsc.edu>
> Message-ID:
> <AANLkTin6Vjav-X0pa2KXs...@mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> On Tue, Oct 19, 2010 at 7:41 PM, Yoon Tiem Leong <tly...@usm.my> wrote:
> > I experienced a similar problem too. It seems that the compute node can't find the kickstart from the frontend. What I did was to enable https on the frontend (check https in System -> Security and Firewall) to see if the compute node can find the kickstart key again. It works for me.
> >
>
> you shouldn't use the RedHat tools to modify the frontend's firewall.
>
> to restore your firewall settings to the stock rocks settings, execute:
>
>  # cd /etc/sysconfig
>  # rm iptables
>  # co iptables
>  # service iptables restart
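>
>  (note: co is the RCS checkout command; Rocks keeps the stock config files under RCS in /etc/sysconfig, so this restores a pristine copy of the iptables file.)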
>
>  - gb
>
>
> ------------------------------
>
> Message: 26
> Date: Wed, 20 Oct 2010 10:51:14 -0700
> From: Philip Papadopoulos <philip.pa...@gmail.com>
> Subject: Re: [Rocks-Discuss] Call a shell script from
> extend-compute.xml
> To: Discussion of Rocks Clusters <npaci-rocks...@sdsc.edu>
> Message-ID:
> <AANLkTi=CTqNn9oRTWFzN9uWm3WszT9J0WO=rQJ4...@mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> I'm certain that your shell script itself has characters (like < and &)
> that the higher-level XML parser is trying to parse. The simplest way
> around this is:
> <post>
> <![CDATA[
> "contents of your shell script"
> ]]>
> </post>
>
> Text inside a CDATA block is passed through uninterpreted by the XML
> parser. This means that substitutions using Rocks attributes (e.g.
> &Hostname;) are NOT expanded inside a CDATA block. If you need such
> attributes expanded, do not use CDATA; instead, within your shell
> script, replace the XML-interpreted characters with their escaped
> counterparts, e.g.:
> & -> &amp;
> < -> &lt;
> > -> &gt;
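>
> For illustration, a sketch of the escaped, non-CDATA form (the entity name follows the example above; the marker path is hypothetical):
>
> <post>
> if [ "$(hostname -s)" = "&Hostname;" ]; then
>     echo "attributes are expanded here" &gt;&gt; /tmp/marker
> fi
> </post>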
>
> -P
>
>
> On Tue, Oct 19, 2010 at 3:55 PM, Karengin, Mr. Dean, Contractor, Code 7501.1
> <dean.kar...@nrlmry.navy.mil> wrote:
>
> > Good people of the ROCKS world,
> >
> > I am trying to add a rather long shell script to configure the
> > compute nodes on a new ROCKS 5.3 cluster. I set:
> >
> > <shell="bash"> script </shell> and still get parser errors.
> >
> > Is there a better way to do this? Or am I completely off base?
> >
>
>
> --
> Philip Papadopoulos, PhD
> University of California, San Diego
> 858-822-3628 (Ofc)
> 619-331-2990 (Fax)
>
>
> ------------------------------
>
> Message: 27
> Date: Wed, 20 Oct 2010 10:54:16 -0700
> From: Ian Kaufman <ikau...@soe.ucsd.edu>
> Subject: Re: [Rocks-Discuss] LDAP Authentication in ROCKS 5.3 x86_64
> To: Discussion of Rocks Clusters <npaci-rocks...@sdsc.edu>
> Message-ID:
> <AANLkTim6vdd=G-Ue_494=Rf8s=4ZVZKTf_Q...@mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
>
> It's been my experience that you need each node to query the LDAP
> servers. You need proper UID and GID mappings on each node to run
> executables, read files, and, most importantly, to read your .ssh
> files.
>
> I used to manage a cluster that added the LDAP RPMs via
> extend-compute, and then called a post install script to drop the
> necessary files into place. But this was a ROCKS 4.X system. I am not
> sure if ROCKS 5 handles this more elegantly.
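>
> As a sketch of that 4.X-era pattern in extend-compute.xml (package names and file contents are illustrative, not a tested recipe):
>
> <package>openldap-clients</package>
> <package>nss_ldap</package>
> <post>
> <file name="/etc/ldap.conf" perms="644">
> base dc=example,dc=com
> uri ldap://ldap.example.com/
> </file>
> </post>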
>
> Ian
>
> On Wed, Oct 20, 2010 at 10:31 AM, Karengin, Mr. Dean, Contractor, Code
> 7501.1 <dean.kar...@nrlmry.navy.mil> wrote:
> > We use LDAP for authentication and so on. Does every compute node need to be made LDAP-capable (querying our LDAP servers directly), or can I just configure the frontend and do a "rocks sync users"? It doesn't seem like that would work, but I have to ask before I go through the effort.
> >
> > On a related note, if I put all the config info in the extend-compute.xml, will that suffice? Has anyone actually done this? Or is there a better way to do it?
> >
> > Thanks for taking the time to read this.
> >
> >
> >
> > Dean Karengin (Contractor)
> > Systems Administrator
> > Dell Services Federal Government
> > Naval Research Laboratory
> > Marine Meteorology Division
> > 7 Grace Hopper Ave.
> > Monterey, Ca 93943
> > Code 7501.1
> > Office: 831-656-4243
> > Cell: 831-392-7092
>
> --
> Ian Kaufman
> Research Systems Administrator
> UC San Diego, Jacobs School of Engineering ikaufman AT ucsd DOT edu
>
>
> ------------------------------
>
> Message: 28
> Date: Wed, 20 Oct 2010 11:03:04 -0700
> From: "Bart Brashers" <bbra...@Environcorp.com>
> Subject: Re: [Rocks-Discuss] Dreaded "choose language" on the nodes
> To: "Discussion of Rocks Clusters" <npaci-rocks...@sdsc.edu>
> Message-ID:
> <1B8D1B9BF4DCDC4A90A4...@irvine01.irvine.environ.local>
>
> Content-Type: text/plain; charset="us-ascii"
>
> > On Tue, Oct 19, 2010 at 7:41 PM, Yoon Tiem Leong <tly...@usm.my> wrote:
> > > I experienced a similar problem too. It seems that the compute node can't find the kickstart from the frontend. What I did was to enable https on the frontend (check https in System -> Security and Firewall) to see if the compute node can find the kickstart key again. It works for me.
> > >
> >
> > you shouldn't use the RedHat tools to modify the frontend's firewall.
> >
> > to restore your firewall settings to the stock rocks settings,
> execute:
> >
> > # cd /etc/sysconfig
> > # rm iptables
> > # co iptables
> > # service iptables restart
> >
> > - gb
>
> Would y'all be willing to add a section 2.7 to the docs "What NOT to do
> on your frontend"? Include things like this...
>
> Bart
>
>
>
>
>
> ------------------------------
>
> Message: 29
> Date: Wed, 20 Oct 2010 11:18:45 -0700
> From: Larry Baker <ba...@usgs.gov>
> Subject: Re: [Rocks-Discuss] Call a shell script from
> extend-compute.xml
> To: "Karengin, Mr. Dean, Contractor, Code 7501.1"
> <dean.kar...@nrlmry.navy.mil>
> Cc: Discussion of Rocks Clusters <npaci-rocks...@sdsc.edu>
> Message-ID: <96A5A8C9-7CAA-44CA...@usgs.gov>
> Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes
>
> Dean,
>
> I assume you want to use <eval shell="bash"> script </eval>.
>
> The syntax you tried
>
> > <shell="bash">
>
> is not valid XML. XML has elements, whose name, or tag, opens and
> closes the definition of an element's contents (<element_name ...> ...
> </element_name>). The opening tag of the element definition may also
> include modifiers, called attributes. Attributes are always of the
> form attribute_name="value". The XML element you tried to define
> contained only an attribute, but no element name. Google "xml syntax"
> and you will find quite a few basic XML resources.
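>
> To make it concrete, in the working form
>
>   <eval shell="bash"> ... </eval>
>
> "eval" is the element name and shell="bash" is an attribute on it; <shell="bash"> supplies an attribute with no element name, which is why the parser rejects it.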
>
> Larry Baker
> US Geological Survey
> 650-329-5608
> ba...@usgs.gov
>
> On Oct 19, 2010, at 3:55 PM, Karengin, Mr. Dean, Contractor, Code
> 7501.1 wrote:
>
> > Good people of the ROCKS world,
> >
> > I am trying to add a rather long shell script to configure
> > the compute nodes on a new ROCKS 5.3 cluster. I set:
> >
> > <shell="bash"> script </shell> and still get parser errors.
> >
> > Is there a better way to do this? Or am I completely off base?
> >
>
>
>
> ------------------------------
>
> Message: 30
> Date: Wed, 20 Oct 2010 20:19:30 +0200
> From: "tomisla...@gmx.com" <tomisla...@gmx.com>
> Subject: [Rocks-Discuss] ganglia & profiling
> To: npaci-rocks...@sdsc.edu
> Message-ID: <2010102018...@gmx.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hi everyone,
>
> do I need to reinstall the whole frontend if I want to install the ganglia roll? I didn't add it during the initial installation, and there seems to be no option to install it on an existing cluster.
>
> Profiling: cluster profiling seems to be a nonlinear problem. How can I determine the right CPU frequencies and the right amount of RAM without going to the store and buying hardware? That's not really possible, I guess. Are there any published profiling results for COTS hardware running MPI CFD applications? All over the literature I keep reading about having a "well balanced machine", so I would be really grateful if someone could point me in that direction ("it depends" is not a pointer, it's a slap in the face :D ). The differences in the prices of both power and components (yes, even for COTS gear), as well as what I've read in books and articles, are making me rely on profiling to decide on the cluster config. For a newb, all this is really confusing (now I know why it's called a dark art).
>
> Tomislav Maric
>
>
> ------------------------------
>
> Message: 31
> Date: Wed, 20 Oct 2010 11:25:54 -0700
> From: "Karengin, Mr. Dean, Contractor, Code 7501.1"
> <dean.kar...@nrlmry.navy.mil>
> Subject: Re: [Rocks-Discuss] Call a shell script from
> extend-compute.xml
> To: "'Discussion of Rocks Clusters'" <npaci-rocks...@sdsc.edu>
> Message-ID: <BCC96F2E4E87334C942E85D098C5450057216320@zeus>
> Content-Type: text/plain; charset="us-ascii"
>
> Thanks for the quick replies.
>
> I had actually tried <eval shell="bash"> script </eval> and it failed to run, but I did find (just this morning!) that <eval shell="bash"> absolute path to script </eval> worked quite nicely.
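>
> For illustration, the form that worked, with a hypothetical script path:
>
> <eval shell="bash">
> /export/site/scripts/configure-node.sh
> </eval>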
>
> Thanks for taking the time to answer!
>
>
> Dean Karengin (Contractor)
> Systems Administrator
> Dell Services Federal Government
> Naval Research Laboratory
> Marine Meteorology Division
> 7 Grace Hopper Ave.
> Monterey, Ca 93943
> Code 7501.1
> Office: 831-656-4243
> Cell: 831-392-7092
> -----Original Message-----
> From: npaci-rocks-dis...@sdsc.edu [mailto:npaci-rocks-dis...@sdsc.edu] On Behalf Of Bart Brashers
> Sent: Wednesday, October 20, 2010 10:42 AM
> To: Discussion of Rocks Clusters
> Subject: Re: [Rocks-Discuss] Call a shell script from extend-compute.xml
>
> All the code inside the <post> section of an extend-compute.xml uses bash by default. It says so here:
>
> http://www.rocksclusters.org/roll-documentation/base/5.3/customization-postconfig.html
>
> " Put your bash scripts in between the tags <post> and </post>:"
>
> Bart
>
>
> ------------------------------
>
> Message: 32
> Date: Wed, 20 Oct 2010 11:27:00 -0700
> From: "Karengin, Mr. Dean, Contractor, Code 7501.1"
> <dean.kar...@nrlmry.navy.mil>
> Subject: Re: [Rocks-Discuss] Call a shell script from
> extend-compute.xml
> To: "'Discussion of Rocks Clusters'" <npaci-rocks...@sdsc.edu>
> Message-ID: <BCC96F2E4E87334C942E85D098C5450057216321@zeus>
> Content-Type: text/plain; charset="us-ascii"
>
> Philip,
>
> Yes, that was exactly the issue. I did find a workaround, thank you very much!
>
>
> Dean Karengin (Contractor)
> Systems Administrator
> Dell Services Federal Government
> Naval Research Laboratory
> Marine Meteorology Division
> 7 Grace Hopper Ave.
> Monterey, Ca 93943
> Code 7501.1
> Office: 831-656-4243
> Cell: 831-392-7092
> -----Original Message-----
> From: npaci-rocks-dis...@sdsc.edu [mailto:npaci-rocks-dis...@sdsc.edu] On Behalf Of Philip Papadopoulos
> Sent: Wednesday, October 20, 2010 10:51 AM
> To: Discussion of Rocks Clusters
> Subject: Re: [Rocks-Discuss] Call a shell script from extend-compute.xml
>
> I'm certain that your shell script itself has characters (like < and &) that the higher-level XML parser is trying to parse.
> The simplest way around this is:
> <post>
> <![CDATA[
> "contents of your shell script"
> ]]>
> </post>
>
> Text inside a CDATA block is passed through uninterpreted by the XML parser. This means that substitutions using Rocks attributes (e.g. &Hostname;) are NOT expanded inside a CDATA block. If you need such attributes expanded, do not use CDATA; instead, within your shell script, replace the XML-interpreted characters with their escaped counterparts, e.g.:
> & -> &amp;
> < -> &lt;
> > -> &gt;
>
> -P
>
>
>
> ------------------------------
>
> Message: 33
> Date: Wed, 20 Oct 2010 11:47:16 -0700
> From: Greg Bruno <greg....@gmail.com>
> Subject: Re: [Rocks-Discuss] ganglia & profiling
> To: Discussion of Rocks Clusters <npaci-rocks...@sdsc.edu>
> Message-ID:
> <AANLkTikMV8jKQBaLtnitU...@mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> On Wed, Oct 20, 2010 at 11:19 AM, tomisla...@gmx.com
> <tomisla...@gmx.com> wrote:
> > Hi everyone,
> >
> > do I need to reinstall the whole frontend if I want to install the ganglia roll? I didn't add it during the initial installation, and there seems to be no option to install it on an existing cluster.
> >
>
> you have to do a bit of work, but the following should do it:
>
>  - download the ganglia ISO from the rocks web site.
>
>  - add and enable the roll:
>
>  # rocks add roll ganglia*iso
>  # rocks enable roll ganglia
>
>  - rebuild the distro:
>
>  # cd /export/rocks/install
>  # rocks create distro
>
>  - install the roll's packages on the frontend:
>
>  # rocks run roll ganglia | bash
>
> you have to do one extra step in order to get the 'cluster status' link
> on the frontend's web site:
>
>  - create a file named /tmp/ganglia.sql and put the following in it:
>
> insert into wp_links (link_name, link_description, link_url, link_category)
>   values ('Cluster Status', 'Click here to monitor your cluster', '/ganglia/', 2);
>
>  - now add that to the wordpress database:
>
>  # /opt/rocks/bin/mysql --user=root -p wordpress < /tmp/ganglia.sql
>
>  - the above will prompt you for a password. enter the root password
> for your frontend.
>
>  - reboot your frontend.
>
> let us know how it goes.
>
>  - gb
>
>
> ------------------------------
>
> _______________________________________________
> npaci-rocks-discussion mailing list
> npaci-rocks...@sdsc.edu
> https://lists.sdsc.edu/mailman/listinfo/npaci-rocks-discussion
>
>
> End of npaci-rocks-discussion Digest, Vol 51, Issue 20
> ******************************************************
