upgrade questions

Eric Rostetter

Mar 2, 2012, 5:58:36 PM
to gan...@googlegroups.com
I need to add another node to my cluster, so it seems like a good time to
upgrade my Ganeti and/or CentOS software.

Currently all my nodes run CentOS 5.x with Ganeti 2.2.1 on them.

Note that none of this is about instances, only about the host nodes.

So my questions are:

1) Has anyone run a mixed cluster of CentOS 5 and CentOS 6 nodes? Or
should I keep the new node at CentOS 5 also?
2) Is Ganeti 2.5.0-rc5 stable enough to consider using, or should I stick
with 2.4.5 for production use?
3) If using 2.5.0~rc5, should the update to 2.5.0 when released be pain-free?
4) Can I upgrade directly from 2.2 to 2.4/2.5, or should I do step-wise
upgrades (2.2 to 2.3, then 2.3 to 2.4, etc)?

Thanks for any advice you can provide.

--
Eric Rostetter
The Department of Physics
The University of Texas at Austin

Go Longhorns!

Jun Futagawa

Mar 3, 2012, 3:10:56 AM
to gan...@googlegroups.com
Hi,

On 2012/03/03 7:58, Eric Rostetter wrote:

> I need to add another node to my cluster, so it seems like a good time to
> upgrade my Ganeti and/or CentOS software.
>
> Currently all my nodes run CentOS 5.x with Ganeti 2.2.1 on them.
>
> Note that none of this is about instances, only about the host nodes.
>
> So my questions are:
>
> 1) Has anyone run a mixed cluster of CentOS 5 and CentOS 6 nodes? Or
> should I keep the new node at CentOS 5 also?

I have run a mixed cluster temporarily, during an upgrade from
CentOS 5 to Scientific Linux 6.

- node1: CentOS 5, node2: CentOS 5, VMs: DRBD and KVM
1. Back up /var/lib/ganeti on both node1 and node2
2. Live migrate all VMs to node1
3. Clean install of node2, leaving the volume group for the VM instances intact
   -> node1: CentOS 5, node2: Scientific Linux 6
4. Restore /var/lib/ganeti to the new node2, and start the ganeti service
   I recommend using the same DRBD major version on both nodes.
5. Activate the disks of all VMs: gnt-instance activate-disks VM_NAME
   * Wait for the DRBD sync to complete
6. Live migrate all VMs to node2
7. Clean install of node1, leaving the volume group for the VM instances intact
   -> node1: Scientific Linux 6, node2: Scientific Linux 6
8. Restore /var/lib/ganeti to the new node1, and start the ganeti service
9. Activate the disks of all VMs: gnt-instance activate-disks VM_NAME
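
The first half of the procedure above (steps 1 to 5, for one node) can be sketched as a dry-run script that only prints the commands. This is a rough sketch, not a tested procedure: the node and instance names, the BACKUP_DIR location and the use of gnt-node migrate to move the instances are assumptions, and the gnt-* commands would be run on the master node.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands for reinstalling ONE node of a
# two-node DRBD cluster and executes nothing. The node names, instance
# names and BACKUP_DIR are placeholders (assumptions, not from the thread).

BACKUP_DIR=/mnt/offnode-backup    # must survive the clean install

plan_node_reinstall() {
    keep=$1     # node that keeps running the instances
    redo=$2     # node that gets the clean install
    shift 2     # remaining arguments: instance names

    echo "tar czf $BACKUP_DIR/ganeti-$redo.tar.gz -C /var/lib ganeti   # on $redo"
    echo "gnt-node migrate -f $redo   # live migrate its primary instances to $keep"
    echo "# clean install of $redo here, leaving the instance volume group untouched"
    echo "tar xzf $BACKUP_DIR/ganeti-$redo.tar.gz -C /var/lib   # on $redo"
    echo "/etc/init.d/ganeti start   # on $redo"
    for vm in "$@"; do
        echo "gnt-instance activate-disks $vm"
    done
    echo "cat /proc/drbd   # repeat until all devices show UpToDate/UpToDate"
}

plan_node_reinstall node1 node2 vm1 vm2
```

Swapping the two node arguments gives the second half (steps 6 to 9). Printing instead of executing makes it easy to review the plan on a test cluster first.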

> 2) Is Ganeti 2.5.0-rc5 stable enough to consider using, or should I stick
> with 2.4.5 for production use?

I think a stable release is recommended for production use.

> 3) If using 2.5.0~rc5, should the update to 2.5.0 when released be pain-free?

I'm not sure.

> 4) Can I upgrade directly from 2.2 to 2.4/2.5, or should I do step-wise
> upgrades (2.2 to 2.3, then 2.3 to 2.4, etc)?

Yes. I have confirmed that a direct upgrade from
ganeti-2.2.0-1.el5.noarch.rpm to ganeti-2.4.5-2.el5.noarch.rpm works.
You will need to run cfgupgrade after updating the RPM.
http://docs.ganeti.org/ganeti/current/html/upgrade.html
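
As a hedged sketch of what such a direct upgrade might look like, the script below prints one possible command sequence without running anything. The cfgupgrade path, the backup file name and the exact step order are assumptions based on the general shape of the Ganeti upgrade notes; check them against the document linked above for your version.

```shell
#!/bin/sh
# Dry-run sketch of a direct 2.2 -> 2.4 upgrade; prints commands only.
# CFGUPGRADE and the step order are assumptions to verify against the
# upgrade document for the release you install.

RPM=ganeti-2.4.5-2.el5.noarch.rpm
CFGUPGRADE=/usr/lib/ganeti/tools/cfgupgrade

plan_upgrade() {
    echo "[all nodes] /etc/init.d/ganeti stop"
    echo "[all nodes] tar czf /var/lib/ganeti-backup.tar.gz -C /var/lib ganeti"
    echo "[all nodes] rpm -Uvh $RPM"
    echo "[master]    $CFGUPGRADE --verbose --dry-run"
    echo "[master]    $CFGUPGRADE --verbose"
    echo "[all nodes] /etc/init.d/ganeti start"
    echo "[master]    gnt-cluster redist-conf"
    echo "[all nodes] /etc/init.d/ganeti restart"
    echo "[master]    gnt-cluster verify"
}

plan_upgrade
```

Running cfgupgrade with --dry-run first shows whether the configuration can be converted before anything is written.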

Regards,

Jun Futagawa

Iustin Pop

Mar 3, 2012, 6:17:43 PM
to gan...@googlegroups.com
On Fri, Mar 02, 2012 at 04:58:36PM -0600, Eric Rostetter wrote:
> I need to add another node to my cluster, so it seems like a good time to
> upgrade my Ganeti and/or CentOS software.
>
> Currently all my nodes run CentOS 5.x with Ganeti 2.2.1 on them.
>
> Note that none of this is about instances, only about the host nodes.
>
> So my questions are:
>
> 2) Is Ganeti 2.5.0-rc5 stable enough to consider using, or should I stick
> with 2.4.5 for production use?

While the 2.5 final release will come "soon" (we wanted to do it this
week), it's not the "stable" one.

> 3) If using 2.5.0~rc5, should the update to 2.5.0 when released be pain-free?

Yes, between the rc series and the final release, upgrading will be
trivial.

regards,
iustin

Eric Rostetter

Mar 5, 2012, 4:34:42 PM
to gan...@googlegroups.com
Quoting Eric Rostetter <rost...@mail.utexas.edu>:

> I need to add another node to my cluster, so it seems like a good time to
> upgrade my Ganeti and/or CentOS software.

It appears from the list that I can run a mixed OS environment, and I can
even upgrade all the OS versions without downtime (by doing them one by one,
migrating instances off each machine as needed). Nice!

And it also appears that upgrading the Ganeti code may be useful and not
too complicated, but requires downtime. Am I correct that there is no way
to upgrade all the nodes' Ganeti versions (from 2.2 to either 2.4 or 2.5)
without whole-cluster downtime (since the versions won't talk to each
other)?

Assuming there will be downtime, and assuming I use the RPM packages from
Jun Futagawa, does anyone have any "real time estimates" on how long it
takes to shutdown all the nodes, backup /var/lib/ganeti, upgrade the RPM
packages on a single node, run the cfgupgrade on that node, and get a node up
again so we can re-start critical instances on it?

Seems like it shouldn't take long (RPM install is fast, cfgupgrade is probably
fast too, and Ganeti startup is fast, so overall I assume not too much
downtime...). But maybe others have done this and know otherwise (or can
give me pointers on what to avoid, etc)?

Iustin Pop

Mar 5, 2012, 4:41:28 PM
to gan...@googlegroups.com
On Mon, Mar 05, 2012 at 03:34:42PM -0600, Eric Rostetter wrote:
> Quoting Eric Rostetter <rost...@mail.utexas.edu>:
>
> >I need to add another node to my cluster, so it seems like a good time to
> >upgrade my Ganeti and/or CentOS software.
>
> It appears from the list that I can run a mixed OS environment, and I can
> even upgrade all the OS versions without downtime (by doing them one by one,
> migrating instances off each machine as needed). Nice!

Thanks :)

> And it also appears that upgrading the Ganeti code may be useful and not
> too complicated, but requires downtime. Am I correct that there is no way
> to upgrade all the nodes' Ganeti versions (from 2.2 to either 2.4 or 2.5)
> without whole-cluster downtime (since the versions won't talk to each
> other)?

Yes, but note that "downtime" means Ganeti downtime, i.e. you won't be
able to run ganeti commands, not that the instances have to be down.

In other words, operation of the cluster is impacted during the
downtime, but the instances themselves are not.

> Assuming there will be downtime, and assuming I use the RPM packages from
> Jun Futagawa, does anyone have any "real time estimates" on how long it
> takes to shutdown all the nodes, backup /var/lib/ganeti, upgrade the RPM
> packages on a single node, run the cfgupgrade on that node, and get a node up
> again so we can re-start critical instances on it?

Well, the downtime should be quite short (10-20 minutes or less if you prepared
well and did a few dry-runs on a test cluster). But again, instances
will not be impacted.

> Seems like it shouldn't take long (RPM install is fast, cfgupgrade is probably
> fast too, and Ganeti startup is fast, so overall I assume not too much
> downtime...). But maybe others have done this and know otherwise (or can
> give me pointers on what to avoid, etc)?

I would only mention that this process will go a lot smoother if you can
repeat the procedure on a test cluster first. It doesn't have to be a
real cluster, it can even be a simulated one, per
http://code.google.com/p/ganeti/wiki/GanetiInVirtualBox.

Let us know how it goes!

iustin

Eric Rostetter

Mar 5, 2012, 4:47:25 PM
to gan...@googlegroups.com
Quoting Iustin Pop <ius...@google.com>:

>> And it also appears that upgrading the Ganeti code may be useful and not
>> too complicated, but requires downtime. Am I correct that there is no way
>> to upgrade all the nodes' Ganeti versions (from 2.2 to either 2.4 or 2.5)
>> without whole-cluster downtime (since the versions won't talk to each
>> other)?
>
> Yes, but note that "downtime" means Ganeti downtime, i.e. you won't be
> able to run ganeti commands, not that the instances have to be down.
>
> In other words, operation of the cluster is impacted during the
> downtime, but the instances themselves not.

Ah, cool! So all my instances stay up and running? The only impact
is that I can't do things like create new instances, migrate instances,
add nodes, etc? If so, that is better than I expected, and indeed
quite wonderful!

> I would only mention that this process will go a lot smoother if you can
> repeat the procedure on a test cluster first. It doesn't have to be a
> real cluster, it can even be a simulated one, per
> http://code.google.com/p/ganeti/wiki/GanetiInVirtualBox.

I'll look into that..

> Let us know how it goes!

Will do.

> iustin

Iustin Pop

Mar 5, 2012, 4:53:46 PM
to gan...@googlegroups.com
On Mon, Mar 05, 2012 at 03:47:25PM -0600, Eric Rostetter wrote:
> Quoting Iustin Pop <ius...@google.com>:
>
> >>And it also appears that upgrading the Ganeti code may be useful and not
> >>too complicated, but requires downtime. Am I correct that there is no way
> >>to upgrade all the nodes' Ganeti versions (from 2.2 to either 2.4 or 2.5)
> >>without whole-cluster downtime (since the versions won't talk to each
> >>other)?
> >
> >Yes, but note that "downtime" means Ganeti downtime, i.e. you won't be
> >able to run ganeti commands, not that the instances have to be down.
> >
> >In other words, operation of the cluster is impacted during the
> >downtime, but the instances themselves not.
>
> Ah, cool! So all my instances stay up and running? The only impact
> is that I can't do things like create new instances, migrate instances,
> add nodes, etc? If so, that is better than I expected, and indeed
> quite wonderful!

Yes, as far as I remember from 2.2 to 2.5 we didn't have significant
changes. Note that for safety, it would be best to restart all the
instances (gnt-instance stop --all, gnt-instance start --all), but it's
not absolutely needed (you can delay it for a while).

In general, if an upgrade will require instance downtime or restart, we
will note it in the doc/upgrade.rst file. Please read that, and make
sure that cluster verify is happy after the upgrade!

iustin

Eric Rostetter

Mar 5, 2012, 5:18:29 PM
to gan...@googlegroups.com
Quoting Iustin Pop <ius...@google.com>:

> Yes, as far as I remember from 2.2 to 2.5 we didn't have significant
> changes. Note that for safety, it would be best to restart all the
> instances (gnt-instance stop --all, gnt-instance start --all), but it's
> not absolutely needed (you can delay it for a while).

By gosh, this software is even better than I had imagined... What
a treat!

> In general, if an upgrade will require instance downtime or restart, we
> will note it in the doc/upgrade.rst file. Please read that, and make
> sure that cluster verify is happy after the upgrade!

I did read it, but I failed to understand it. My assumption was that stopping
the daemons (/etc/init.d/ganeti stop/restart) would kill the instances. I
see now that is not correct. I'm a very happy camper now, understanding
how things work a lot better than before.

Thanks for all your replies.

Eric Rostetter

Mar 15, 2012, 11:41:16 AM
to gan...@googlegroups.com
Quoting Iustin Pop <ius...@google.com>:

> On Mon, Mar 05, 2012 at 03:34:42PM -0600, Eric Rostetter wrote:
>> Quoting Eric Rostetter <rost...@mail.utexas.edu>:
>>
>> >I need to add another node to my cluster, so it seems like a good time to
>> >upgrade my Ganeti and/or CentOS software.

[...]

> Let us know how it goes!
>
> iustin

Well, just an update. I upgraded my existing ganeti 2-node cluster to
CentOS 5.8 (from 5.7) without any instance downtime (migrating between
hosts) leaving them at ganeti 2.2.

I then installed a new machine with CentOS 6.2, and ganeti 2.4.5, but
didn't join it to the cluster yet.

I then upgraded the original two nodes to ganeti 2.4.5 without any instance
downtime. (But see below...)

I then joined the new machine to the cluster.

Everything looks pretty good (gnt-cluster verify goes okay, etc). I've
not put any instances on the new machine yet. (NB: this is now a mixed
OS cluster -- two nodes at CentOS 5.x and one at CentOS 6.x, all running
the same ganeti version).

But, on the new CentOS 6.x node, this is what I am seeing in the logs,
which is why I've not tried any instances on it yet:

Mar 15 10:30:02 vm3 abrt-server[19687]: statvfs('(null)'): Bad address
Mar 15 10:30:02 vm3 abrtd: Package 'ganeti' isn't signed with proper key
Mar 15 10:30:02 vm3 abrtd: Corrupted or bad dump /var/spool/abrt/pyhook-2012-03-15-10:30:02-19686 (res:2), deleting
Mar 15 10:30:02 vm3 python: abrt: detected unhandled Python exception in /usr/sbin/ganeti-confd
Mar 15 10:30:02 vm3 abrtd: New client connected
Mar 15 10:30:02 vm3 abrt-server[19786]: Saved Python crash dump of pid 19785 to /var/spool/abrt/pyhook-2012-03-15-10:30:02-19785
Mar 15 10:30:02 vm3 abrtd: Directory 'pyhook-2012-03-15-10:30:02-19785' creation detected
Mar 15 10:30:02 vm3 abrt-server[19786]: statvfs('(null)'): Bad address
Mar 15 10:30:02 vm3 abrtd: Package 'ganeti' isn't signed with proper key
Mar 15 10:30:02 vm3 abrtd: Corrupted or bad dump /var/spool/abrt/pyhook-2012-03-15-10:30:02-19785 (res:2), deleting

I get a message like this every 5 minutes for ganeti-confd and every 10
minutes for ganeti-confd. So something is not happy, but I have yet to
figure out what.

All machines are running ganeti-2.4.5-2 via RPM's from Jun Futagawa.

> Yes, as far as I remember from 2.2 to 2.5 we didn't have significant
> changes. Note that for safety, it would be best to restart all the
> instances (gnt-instance stop --all, gnt-instance start --all), but it's
> not absolutely needed (you can delay it for a while).

Yes, some instances seemed to hang after _many_ hours (without any restart
after all the migrations and upgrades), while others didn't.

So restarting them _is_ recommended. The nice thing is I can do the upgrade
during work hours, and wait many hours until a downtime window comes around
that night to restart the instances... Very nice.

Iustin Pop

Mar 15, 2012, 12:11:41 PM
to gan...@googlegroups.com
On Thu, Mar 15, 2012 at 10:41:16AM -0500, Eric Rostetter wrote:
> Quoting Iustin Pop <ius...@google.com>:
>
> >On Mon, Mar 05, 2012 at 03:34:42PM -0600, Eric Rostetter wrote:
> >>Quoting Eric Rostetter <rost...@mail.utexas.edu>:
> >>
> >>>I need to add another node to my cluster, so it seems like a good time to
> >>>upgrade my Ganeti and/or CentOS software.
>
> [...]
>
> >Let us know how it goes!
> >
> >iustin
>
> Well, just an update. I upgraded my existing ganeti 2-node cluster to
> CentOS 5.8 (from 5.7) without any instance downtime (migrating between
> hosts) leaving them at ganeti 2.2.
>
> I then installed a new machine with CentOS 6.2, and ganeti 2.4.5, but
> didn't join it to the cluster yet.
>
> I then upgraded the original two nodes to ganeti 2.4.5 without any instance
> downtime. (But see below...)
>
> I then joined the new machine to the cluster.
>
> Everything looks pretty good (gnt-cluster verify goes okay, etc). I've
> not put any instances on the new machine yet. (NB: this is now a mixed
> OS cluster -- two nodes at CentOS 5.x and one at CentOS 6.x, all running
> the same ganeti version).

Nice.

> But, on the new CentOS 6.x node, this is what I am seeing in the logs,
> which is why I've not tried any instances on it yet:
>
> Mar 15 10:30:02 vm3 abrt-server[19687]: statvfs('(null)'): Bad address
> Mar 15 10:30:02 vm3 abrtd: Package 'ganeti' isn't signed with proper key
> Mar 15 10:30:02 vm3 abrtd: Corrupted or bad dump /var/spool/abrt/pyhook-2012-03-15-10:30:02-19686 (res:2), deleting
> Mar 15 10:30:02 vm3 python: abrt: detected unhandled Python exception in /usr/sbin/ganeti-confd
> Mar 15 10:30:02 vm3 abrtd: New client connected
> Mar 15 10:30:02 vm3 abrt-server[19786]: Saved Python crash dump of pid 19785 to /var/spool/abrt/pyhook-2012-03-15-10:30:02-19785
> Mar 15 10:30:02 vm3 abrtd: Directory 'pyhook-2012-03-15-10:30:02-19785' creation detected
> Mar 15 10:30:02 vm3 abrt-server[19786]: statvfs('(null)'): Bad address
> Mar 15 10:30:02 vm3 abrtd: Package 'ganeti' isn't signed with proper key
> Mar 15 10:30:02 vm3 abrtd: Corrupted or bad dump /var/spool/abrt/pyhook-2012-03-15-10:30:02-19785 (res:2), deleting
>
> I get a message like this every 5 minutes for ganeti-confd and every 10
> minutes for ganeti-confd. So something is not happy, but have yet to
> figure out what.
>
> All machines are running ganeti-2.4.5-2 via RPM's from Jun Futagawa.

Hmm. Yes, we know confd is broken, and we are in the process of changing
confd significantly for the 2.6 release. I just didn't realise it is
_that_ broken.

Can you make the abrt-server (I don't know what that is) ignore that
daemon?

> >Yes, as far as I remember from 2.2 to 2.5 we didn't have significant
> >changes. Note that for safety, it would be best to restart all the
> >instances (gnt-instance stop --all, gnt-instance start --all), but it's
> >not absolutely needed (you can delay it for a while).
>
> Yes, some instances seemed to hang after _many_ hours (without any restart
> after all the migrations and upgrades), while others didn't.

Hmm, interesting.

> So restarting them _is_ recommended. The nice thing is I can do the upgrade
> during work hours, and wait many hours until a downtime window comes around
> that night to restart the instances... Very nice.

Glad to hear it has worked nicely!

iustin

Iustin Pop

Mar 15, 2012, 12:19:00 PM
to gan...@googlegroups.com
On Thu, Mar 15, 2012 at 10:41:16AM -0500, Eric Rostetter wrote:

To make sure these are the same crashes that we see, could you give us (or
me directly) the /var/log/ganeti/conf-daemon.log, or the snippets of
it that contain the exception?

thanks,
iustin

Eric Rostetter

Mar 15, 2012, 12:25:57 PM
to gan...@googlegroups.com
Quoting Iustin Pop <ius...@google.com>:

> Hmm. Yes, we know confd is broken, and we are in the process of changing
> confd significantly for the 2.6 release. I just didn't realise it is
> _that_ broken.

What about noded? I get one for it also...

> Can you make the abrt-server (I don't know what that is) ignore that
> daemon?

I can turn the abrt service off actually. :) Or probably get it to
ignore confd. Just curious what is going on.

The abrt service is only in newer CentOS/RHEL/Fedora versions, hence
the reason I don't see anything on my CentOS 5.x machines, but do on
the newer CentOS 6.x machine.

Iustin Pop

Mar 15, 2012, 12:27:21 PM
to gan...@googlegroups.com
On Thu, Mar 15, 2012 at 11:25:57AM -0500, Eric Rostetter wrote:
> Quoting Iustin Pop <ius...@google.com>:
>
> >Hmm. Yes, we know confd is broken, and we are in the process of changing
> >confd significantly for the 2.6 release. I just didn't realise it is
> >_that_ broken.
>
> What about noded? I get one for it also...

No, then it means something else is broken. Looking forward to the log
files :)

> >Can you make the abrt-server (I don't know what that is) ignore that
> >daemon?
>
> I can turn the abrt service off actually. :) Or probably get it to
> ignore confd. Just curious what is going on.
>
> The abrt service is only in newer CentOS/RHEL/Fedora versions, hence
> the reason I don't see anything on my CentOS 5.x machines, but do on
> the newer CentOS 6.x machine.

Could be, but if noded crashes, I suspect that Ganeti isn't actually
healthy (separate from the confd issues, which could trigger _sporadic_
crashes, not continuous ones).

iustin

Eric Rostetter

Mar 15, 2012, 12:32:25 PM
to gan...@googlegroups.com
Quoting Iustin Pop <ius...@google.com>:

> To make sure these are the same crashes that we see, could you give us (or
> directly to me) the /var/log/ganeti/conf-daemon.log, or the snippets of
> it that contain the exception?

Appears to be a locking issue:

Traceback (most recent call last):
  File "/usr/sbin/ganeti-confd", line 21, in <module>
    sys.exit(main.Main())
  File "/usr/lib/python2.6/site-packages/ganeti/server/confd.py", line 305, in Main
    daemon.GenericMain(constants.CONFD, parser, CheckConfd, PrepConfd, ExecConfd)
  File "/usr/lib/python2.6/site-packages/ganeti/daemon.py", line 690, in GenericMain
    utils.WritePidFile(utils.DaemonPidFileName(daemon_name))
  File "/usr/lib/python2.6/site-packages/ganeti/utils/io.py", line 730, in WritePidFile
    filelock.LockFile(fd_pidfile)
  File "/usr/lib/python2.6/site-packages/ganeti/utils/filelock.py", line 45, in LockFile
    raise errors.LockError("File already locked")
ganeti.errors.LockError: File already locked


> thanks,
> iustin

Iustin Pop

Mar 15, 2012, 1:30:55 PM
to gan...@googlegroups.com
On Thu, Mar 15, 2012 at 11:32:25AM -0500, Eric Rostetter wrote:
> Quoting Iustin Pop <ius...@google.com>:
>
> >To make sure these are the same crashes that we see, could you give us (or
> >directly to me) the /var/log/ganeti/conf-daemon.log, or the snippets of
> >it that contain the exception?
>
> Appears to be a locking issue:
>
> Traceback (most recent call last):
>   File "/usr/sbin/ganeti-confd", line 21, in <module>
>     sys.exit(main.Main())
>   File "/usr/lib/python2.6/site-packages/ganeti/server/confd.py", line 305, in Main
>     daemon.GenericMain(constants.CONFD, parser, CheckConfd, PrepConfd, ExecConfd)
>   File "/usr/lib/python2.6/site-packages/ganeti/daemon.py", line 690, in GenericMain
>     utils.WritePidFile(utils.DaemonPidFileName(daemon_name))
>   File "/usr/lib/python2.6/site-packages/ganeti/utils/io.py", line 730, in WritePidFile
>     filelock.LockFile(fd_pidfile)
>   File "/usr/lib/python2.6/site-packages/ganeti/utils/filelock.py", line 45, in LockFile
>     raise errors.LockError("File already locked")
> ganeti.errors.LockError: File already locked

Hmm, that simply shows a double startup. Can you kill any leftover
ganeti-confd and then run it manually and see what it says? Use
ganeti-confd -f to run it in the foreground, and leave it running like
that for 5-10 minutes.

iustin

Eric Rostetter

Mar 19, 2012, 12:39:43 PM
to gan...@googlegroups.com
Quoting Eric Rostetter <rost...@mail.utexas.edu>:

> I need to add another node to my cluster, so it seems like a good time to
> upgrade my Ganeti and/or CentOS software.

So this is what I've learned so far.

1) There is a KVM bug in RHEL/CentOS 5.8 (not in previous versions --
    maybe only with virtio disk mode?) If you run KVM on 5.7, don't
    upgrade to 5.8. If you do upgrade, you can try the patches at
    http://people.redhat.com/myamazak/.kvm-83-249.affinity_fix.el5_8/

2) CentOS 6.x seems to have a problem with ganeti-watcher. The problem
    is probably related to rapi in general, and is perhaps explained by
    the following message in watcher.log:
    NotImplementedError: cURL uses unsupported SSL version 'NSS/3.12.7.0'

3) Using ganeti-instance-image with dump on a CentOS/RHEL 6.x machine
    using ext4 won't work right, as dump includes /boot in the root dump,
    which conflicts with the restoration of the boot dump. Solution is
    to tell the root dump not to include the boot/ directory.

4) Otherwise things pretty much work as advertised, and much better than
    I ever expected.
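
For point 2, a quick way to see which SSL library the local curl was built against is to look at the first line of `curl --version`. The helper below is a hypothetical sketch (ssl_backend is not a Ganeti or curl tool), and the list of library names it recognises is an assumption; it only reports the backend and does not say which ones Ganeti accepts.

```shell
#!/bin/sh
# Sketch: extract the SSL library token from the first line of
# "curl --version". ssl_backend is a hypothetical helper; the set of
# library names matched below is an assumption.

ssl_backend() {
    # $1: first line of "curl --version", e.g. "curl X libcurl/X NSS/3.12.7.0 ..."
    for tok in $1; do
        case $tok in
            OpenSSL/*|NSS/*|GnuTLS/*)
                echo "$tok"
                return 0
                ;;
        esac
    done
    return 1    # no known SSL library token found
}

# Only query curl if it is installed on this machine.
if command -v curl >/dev/null 2>&1; then
    ssl_backend "$(curl --version | head -n 1)" || echo "no known SSL backend found"
fi
```

On the CentOS 6.x node from the log above, this would be expected to report an NSS/... token, matching the version string in the NotImplementedError.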

Eric Rostetter

Mar 20, 2012, 3:53:45 PM
to gan...@googlegroups.com
Quoting Iustin Pop <ius...@google.com>:

It appears the problem is with RHEL/CentOS combined with
/usr/lib/ganeti/daemon-util when called from ganeti-watcher.
Specifically:

if [ -f /sbin/start-stop-daemon ]; then
  start-stop-daemon --stop --signal 0 --quiet \
    --pidfile $(_daemon_pidfile $name)
else
  # if not Debian or Ubuntu
  checkpid $(_daemon_pidfile $name)
fi

calls checkpid with the name of the pidfile, but CentOS/RHEL wants checkpid
called with the pid itself, not the name of the pidfile. I "fixed" it via

if [ -f /sbin/start-stop-daemon ]; then
  start-stop-daemon --stop --signal 0 --quiet \
    --pidfile $(_daemon_pidfile $name)
else
  # if not Debian or Ubuntu
  #checkpid $(_daemon_pidfile $name)
  ejr=`cat $(_daemon_pidfile $name)`
  checkpid $ejr
fi

which no longer generates any errors, and which does restart an instance
when I "kill -TERM" its kvm process and then run ganeti-watcher, but is
surely somewhat hackish...

Iustin Pop

Mar 20, 2012, 5:20:36 PM
to gan...@googlegroups.com

Thanks, this looks good. Not sure why you say it's hackish; if
checkpid wants the pid, it should get the pid :) Maybe it needs some
tweaking for a missing pid file or such, but otherwise it looks
reasonable.
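
A minimal sketch of that kind of tweak, written as a standalone helper so it can be tried outside daemon-util: check_pidfile is a hypothetical name, the pidfile path in the usage line is an assumption, and kill -0 stands in for the RHEL checkpid function.

```shell
#!/bin/sh
# Sketch of a pidfile check that tolerates a missing, empty or garbled
# pidfile. check_pidfile is a hypothetical helper, not part of daemon-util.
# "kill -0" sends no signal; it only tests whether the process exists
# (run as root, otherwise it also fails for other users' processes).

check_pidfile() {
    pidfile=$1
    [ -r "$pidfile" ] || return 1             # no pidfile: not running
    read -r pid < "$pidfile" || [ -n "$pid" ] || return 1
    case $pid in
        ''|*[!0-9]*) return 1 ;;              # empty or not a number
    esac
    kill -0 "$pid" 2>/dev/null                # status 0 if the process exists
}

# Usage; the pidfile location is an assumption for illustration.
if check_pidfile /var/run/ganeti/ganeti-noded.pid; then
    echo "ganeti-noded is running"
else
    echo "ganeti-noded is not running"
fi
```

Reading the pid inside the function keeps the "no pidfile" case from turning into a checkpid call with no argument.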

thanks,
iustin

Martin Beauchamp

Apr 23, 2012, 3:08:00 PM
to gan...@googlegroups.com

> Quoting Eric Rostetter <rost...@mail.utexas.edu>:
>
> > I need to add another node to my cluster, so it seems like a good time to
> > upgrade my Ganeti and/or CentOS software.
>
> So this is what I've learned so far.
>
> 3) Using ganeti-instance-image with dump on a CentOS/RHEL 6.x machine
>     using ext4 won't work right, as dump includes /boot in the root dump,
>     which conflicts with the restoration of the boot dump.  Solution is
>     to tell the root dump not to include the boot/ directory.
>

I'm also running into a problem with ganeti-instance-image's make-dump tool on CentOS 6.2 with ext4-based instances. I'm not quite certain if it's the same issue.

Eric, would you summarize how you overcame the problem with G.I.I.?

Best,
Martín

Martin Beauchamp

May 29, 2012, 2:56:22 PM
to gan...@googlegroups.com

Regarding the problem with ganeti-watcher, it looks like there is a fix for this issue on CentOS 6.2. So maybe this is superfluous, but I've made a bug report [1] just to make sure the patch makes it to release.

Best,
Martín

[1] http://code.google.com/p/ganeti/issues/detail?id=241