Main challenges I see are:
The new server will have the same IP as the previous head node, so
only one can be connected to the external network at a time.
The new server will need all the data from the old /export moved across,
preferably keeping file permissions and ownership. I also intend to use
XFS for the new /export file system, while the old server uses ext3. If they
both have the same IP address, are there any suggestions for the best way to
copy across the network? I was thinking of temporarily creating a secondary
IP on both servers in the 192.168 private range so they can communicate over that.
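For what it's worth, a secondary address like that can be added without touching the main configuration. A minimal sketch, assuming eth0 as the private interface and an unused 192.168 subnet (run as root on each machine; addresses and interface name are assumptions):

```shell
# On the old head node (assumed interface eth0, unused subnet 192.168.100.0/24):
ip addr add 192.168.100.1/24 dev eth0

# On the new head node:
ip addr add 192.168.100.2/24 dev eth0

# Verify they can reach each other before starting the copy:
ping -c 3 192.168.100.1
```

The addresses go away on reboot (or with `ip addr del`), so nothing permanent needs to be undone afterwards.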
I plan to create a Restore Roll on the old server to transfer the cluster
settings across to the new server, but I am unsure whether this will have
any problems with MAC address changes.
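For reference, the Restore Roll is built on the frontend itself; the path below is the Rocks 5.x layout, so double-check it against your version's documentation:

```shell
# Build the Restore Roll on the old frontend (Rocks 5.x directory layout assumed):
cd /export/site-roll/rocks/src/roll/restore
make roll

# The resulting ISO (named after your cluster) lands in the same directory;
# burn it to DVD or feed it to the new frontend at install time.
ls *.iso
```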
Any other suggestions, or problems that may crop up?
Thanks,
James
James Rudd
http://jrudd.org/
---------------------
HPC Cluster Administrator
Centre of Excellence for Silicon Photovoltaics and Photonics
University of New South Wales
Sydney NSW 2052
AUSTRALIA
Step 1. Create a restore roll and burn to DVD.
Step 2. Reconfigure your old frontend's private IP Address to 10.1.1.2 (or
something else non-conflicting)
Step 3. Turn off dhcpd on your old frontend. Turn off gmond/gmetad.
Step 4. Disconnect the public network (remove cable from old frontend)
Step 5. Build your new frontend with the restore roll
Step 6. Copy via the local private network from 10.1.1.2 (old frontend) to
10.1.1.1 (new frontend). Something like:
ssh 10.1.1.2 'cd /export/home; tar cf - *' | (cd /export/home; tar xvfBp -)
Step 7. Rebuild your nodes.
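The tar-pipe in step 6 is what preserves permissions and ownership. Here is a minimal local demonstration of the same idiom, with temporary directories standing in for the two frontends and the ssh leg omitted:

```shell
# Stand-ins for the old and new /export/home:
src=$(mktemp -d); dst=$(mktemp -d)
echo "hello" > "$src/file.txt"
chmod 640 "$src/file.txt"

# Same pipe shape as step 6: a writing tar on one side,
# an extracting tar (-p keeps modes) on the other.
(cd "$src" && tar cf - .) | (cd "$dst" && tar xpf -)

# The copy keeps its permissions:
stat -c '%a' "$dst/file.txt"    # prints 640
```

Run the real thing as root (and with tar's `p` on the extract side, as in step 6) so ownership is preserved as well, not just modes.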
-P
--
Philip Papadopoulos, PhD
University of California, San Diego
858-822-3628 (Ofc)
619-331-2990 (Fax)
A side note: I was planning on using the CentOS 5.7 DVD instead of the OS
roll. Has anyone experienced any problems using CentOS 5.7 with Rocks
5.4.3?
Mailing list archives show 5.6 appeared to work OK, so I'm hoping
there are no new problems with 5.7.
Thanks,
James
My latest cluster is 5.4.3. I started with 5.6 and then patched it to
whatever is the most current patch set from Red Hat.
I have one problem, but I'm not sure if it is a 5.7 issue or a scalability
issue. When I add more nodes, I end up with a number of "rocks" zombie
processes (one for each node I add).
[root@pic-admin01 install]# rocks list host | wc
1291 9045 100698
So there are over 1200 entries (half of these are IPMI or PDU type
entries) in the database and at some point along the way, nodes didn't
install right off the bat because of a lag in updating the dhcpd process.
I haven't bothered to debug what is going on yet.
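A quick way to see whether those zombies are still accumulating (portable ps options; prints nothing when there are none):

```shell
# List defunct (zombie) processes grouped by command name.
# On an affected frontend this would show one "rocks" entry per added node.
ps -eo stat,comm | awk '$1 ~ /^Z/ {print $2}' | sort | uniq -c
```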
Otherwise no issues, but of course YMMV.
Tim
If it's working out of the box, leave it alone! Don't fix what ain't
broken :-)
I of course also could be completely and utterly wrong and this is no
longer the case with Rocks, but this is what has been passed down as
sound advice :)
Let's see, how can I put this subtly. Oh yeah.. That is pretty much
complete BS. :) I've been patching dozens of Rocks clusters since 2.x and
while I have had issues here and there, I have never really "broken" a
cluster.
Tim
--
-------------------------------------------
Tim Carlson, PhD
Senior Research Scientist
Environmental Molecular Sciences Laboratory
> On Thu, 13 Oct 2011, Russell Jones wrote:
>
> Let's see, how can I put this subtly. Oh yeah.. That is pretty much
> complete BS. :) I've been patching dozens of Rocks clusters since 2.x and
> while I have had issues here and there, I have never really "broken" a
> cluster.
>
I would add that Tim also has a reasonable "skip list" of things that he
does NOT update.
And, unless I've misunderstood, he points only to the appropriate
RHEL/CentOS repos for the OS, no external repos that can really cause havoc.
Each release of Rocks gets better in terms of isolation from dependence on
particular system versions, so updating most OS-supplied packages is just
fine in practice.
Most damage (unrecoverable) comes about when people do something like "I
enabled the EPEL repo and now everything is broken". Yep, broken alright.
-P
--
Philip Papadopoulos, PhD
University of California, San Diego
858-822-3628 (Ofc)
619-331-2990 (Fax)
At least I can attest that not upgrading it via yum has prevented me
from even having issues here and there :)
Thanks,
Joe
--
Doing beautiful things is its own reward. - Teller of "Penn and Teller"
Two issues I found after the reinstall:

The /etc/profile.d/ssh-key.sh script appears to have a bug in it
related to checking the shell level ($SHLVL). If you are running locally it
is fine, but if you are using SSH it quits any subshells you start.
E.g. from an SSH session, type "bash" and it immediately exits. The more
common occurrence I found was that as soon as I started 'screen' from
within a bash window, it would exit with a [screen is terminating]
message.
I commented out the "exit $?" line to fix it, but there should be a
better way to solve it.
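I don't have the stock script in front of me, but one alternative to deleting the exit line outright would be to guard the setup so it only runs in the top-level login shell; subshells and screen then pass through untouched. A runnable sketch of the idea (the file name and guard are mine, not the script's actual contents):

```shell
# Write a stand-in profile.d snippet that only acts in the top-level shell.
cat > /tmp/ssh-key-guard.sh <<'EOF'
if [ "${SHLVL:-1}" -le 1 ]; then
    # ... the ssh-keygen first-login setup would go here ...
    :
fi
EOF

# Sourcing it from a subshell no longer kills the shell:
bash -c '. /tmp/ssh-key-guard.sh; echo subshell-survived'   # prints subshell-survived
```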
I had a problem with DHCP for the power and remote management appliances. I
found that although they were registered in the DB and appear in rocks list
host, none of my power or management hosts were added to dhcpd.conf or the
named files. I had to go through with an "insert-ethers --replace
manager-0-?" for each server before they were added to dhcpd.conf.
These existed on the previous head node and came across with the DB, but it
seems that they don't get added to DHCP until they are re-inserted by
insert-ethers. This may be related to the appliance attribute managed =
false on both appliance types, but I'm not sure.
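If the managed attribute really is the culprit, something along these lines might avoid the per-host --replace dance. This is a sketch from memory of the Rocks 5.x CLI; verify the exact command names and appliance names on a test box before trusting it:

```shell
# Inspect the attributes on the two appliance types in question:
rocks list appliance attr power
rocks list appliance attr manager

# If managed is false, try flipping it and regenerating dhcpd.conf/named
# through the normal path:
rocks set appliance attr power managed true
rocks set appliance attr manager managed true
rocks sync config
```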
Thanks,
James
On Fri, Oct 14, 2011 at 10:02 AM, Philip Papadopoulos
<philip.pa...@gmail.com> wrote: