On Thu, Aug 30, 2012 at 6:46 PM, Gustave Hellman <gustave...@gmail.com> wrote:
>
> Hi,
>
> I have inherited a Ganeti 2.4.5 cluster of 11 servers running 14 instances.
> Most of the servers are running Ubuntu 11.04/11.10. Using KVM. The servers
> are Dell 610's and 710's.
>
> In preparation for upgrading Ganeti to a new version I decided to upgrade to
> Ubuntu 12.04. I upgraded one of the servers so that both the server and
> instance were at 12.04 with no problems. This instance runs OpenGrok and
> not much else. I upgraded another server/instance where the instance is
> used as a Jenkins build slave with no problems. So after a couple more
> days I upgraded another Jenkins build slave.
>
> The problem is that after a number of days the network connection between
> the server and the instance stops transmitting data. This only happens on
> the two servers that are Jenkins build slaves. That a network problem would
> turn up on the two servers that have more network throughput is not
> surprising.
>
> Checking the servers I can see all of the Ganeti processes running. The
> Ganeti cluster master sees the instances as up. When I check the network on
> the servers br0 and tap0 are there. Commands such as "brctl show" return
> correct answers. The servers have no problems communicating out to the
> network. They just won't communicate to the instances. I can use VNC to
> connect to the instances and everything looks fine.
>
> Bouncing the network on the instance produced no changes. Didn't expect it
> to. Bouncing the network on the server sometimes worked, sometimes not.
>
> When I first came into the group the SA's had said there was a problem with
> bonding ethernets on Ubuntu that would cause the instances in their terms
> "to go away". So they had removed bonding. Using a newer version of
> ifenslave I was able to get bonding working. The problem I am seeing
> sounded much like that so first I made sure the networking packages were
> fully patched and they are. I then removed the bonding and rebooted but
> after a couple of days the problem comes back.
>
> I have looked around but haven't seen anything close to this problem.
>
> Any ideas?
>
> By the way, the people here are not locked into Ubuntu for the servers.
> They need Ubuntu on the instances to match target systems. Is there a
> better OS for Ganeti servers? Not a short term solution but perhaps
> something to aim for.
>
Hi Gustave,
Sorry for the late reply. Unfortunately, I don't know what exactly
could cause your issue. :(
Can you tell us more about your configuration? Check if it works with
a different kernel? Try to sniff packets on the bridge and the tap,
and see where/when they stop? Perhaps check if you have any "strange"
firewall rule that could stop traffic once it exceeds some
quota...
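
As a rough sketch of the bridge/tap comparison (not something we ship with
Ganeti), the snippet below samples the kernel's per-interface packet
counters and prints how much they changed. It assumes the interface names
br0 and tap0 from your mail and the standard Linux counters under
/sys/class/net; the function names and the sysfs_root parameter are made up
for illustration.

```python
import time
from pathlib import Path

# Counter files the Linux kernel exposes per interface (assumed present).
COUNTERS = ("rx_packets", "tx_packets", "rx_dropped", "tx_dropped")


def read_counters(iface, sysfs_root="/sys/class/net"):
    """Return a snapshot of the packet counters for one interface."""
    stats_dir = Path(sysfs_root) / iface / "statistics"
    return {name: int((stats_dir / name).read_text()) for name in COUNTERS}


def counter_deltas(before, after):
    """Return how much each counter changed between two snapshots."""
    return {name: after[name] - before[name] for name in before}


def watch(ifaces=("br0", "tap0"), interval=5.0, sysfs_root="/sys/class/net"):
    """Print per-interval counter changes until interrupted with Ctrl-C."""
    previous = {iface: read_counters(iface, sysfs_root) for iface in ifaces}
    while True:
        time.sleep(interval)
        for iface in ifaces:
            current = read_counters(iface, sysfs_root)
            print(iface, counter_deltas(previous[iface], current))
            previous[iface] = current
```

Calling watch() on the node while pinging the instance should show whether
tx_packets on tap0 still grows (the node is still handing frames to the
guest) and whether rx_packets on tap0 stays flat (nothing is coming back).
For the per-packet view, tcpdump -i br0 and tcpdump -i tap0 side by side are
the more direct tools.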
Ubuntu should be ok for running Ganeti. We use Debian ourselves, but if
traffic simply stops flowing, this looks less like a distro-specific
issue and more like something in the kernel. :)
Thanks,
Guido