Re: Upgrade server to Ubuntu 12.04 causes strange connectivity issue with instances on Ganeti 2.4.5

Guido Trotter

Sep 30, 2012, 12:55:56 AM
to gan...@googlegroups.com
On Thu, Aug 30, 2012 at 6:46 PM, Gustave Hellman
<gustave...@gmail.com> wrote:
>
> Hi,
>
> I have inherited a Ganeti 2.4.5 cluster of 11 servers running 14 instances.
> Most of the servers are running Ubuntu 11.04/11.10. Using KVM. The servers
> are Dell 610's and 710's.
>
> In preparation for upgrading Ganeti to a new version I decided to upgrade to
> Ubuntu 12.04. I upgraded one of the servers so that both the server and
> instance were at 12.04 with no problems. This instance runs OpenGrok and
> not much else. I upgraded another server/instance where the instance is
> used as a Jenkins build slave with no problems. So after a couple of more
> days I upgraded another Jenkins build slave.
>
> The problem is that after a number of days the network connection between
> the server and the instance stops transmitting data. This only happens on
> the two servers that are Jenkins build slaves. That a network problem would
> turn up on the two servers that have more network throughput is not
> surprising.
>
> Checking the servers I can see all of the Ganeti processes running. The
> Ganeti cluster master sees the instances as up. When I check the network on
> the servers br0 and tap0 are there. Commands such as "brctl show" return
> correct answers. The servers have no problems communicating out to the
> network. They just won't communicate to the instances. I can use VNC to
> connect to the instances and everything looks fine.
>
> Bouncing the network on the instance produced no changes. Didn't expect it
> to. Bouncing the network on the server sometimes worked sometimes not.
>
> When I first came into the group the SA's had said there was a problem with
> bonding ethernets on Ubuntu that would cause the instances in their terms
> "to go away". So they had removed bonding. Using a newer version of
> ifenslave I was able to get bonding working. The problem I am seeing
> sounded much like that so first I made sure the networking packages were
> fully patched and they are. I then removed the bonding and rebooted but
> after a couple of days the problem comes back.
>
> I have looked around but haven't seen anything close to this problem.
>
> Any ideas?
>
> By the way, the people here are not locked into Ubuntu for the servers.
> They need Ubuntu on the instances to match target systems. Is there a
> better OS for Ganeti servers? Not a short term solution but perhaps
> something to aim for.
>

Hi Gustave,

Sorry for the late reply; unfortunately I don't know what exactly
could cause your issue. :(
Can you tell us more about your configuration? Check if it works with
a different kernel? Try to sniff packets on the bridge and the tap,
and see where/when they stop? Perhaps check if you have any "strange"
firewalling rule that could stop traffic after it has passed some
quota...
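
A rough sketch of the "see where they stop" idea, in case a full packet capture is inconvenient: instead of sniffing, it compares the kernel's per-interface packet counters under /sys/class/net at two points in time and reports which ones did not move. The interface names br0 and tap0 come from the original report; the helper names (read_counters, snapshot, stalled) and the approach itself are illustrative assumptions, not something Ganeti provides.

```python
from pathlib import Path

# Counter files the Linux kernel exposes under <iface>/statistics/.
COUNTERS = ("rx_packets", "tx_packets")


def read_counters(iface, root="/sys/class/net"):
    """Return the packet counters of one interface as a dict of ints."""
    stats = Path(root) / iface / "statistics"
    return {name: int((stats / name).read_text()) for name in COUNTERS}


def snapshot(ifaces, root="/sys/class/net"):
    """Read the counters of several interfaces in one go."""
    return {iface: read_counters(iface, root) for iface in ifaces}


def stalled(before, after):
    """List (iface, counter) pairs that did not increase between snapshots."""
    return sorted(
        (iface, name)
        for iface, counters in before.items()
        for name, value in counters.items()
        if after[iface][name] <= value
    )
```

For example, take first = snapshot(["br0", "tap0"]), wait a while with traffic running, take a second snapshot, and print stalled(first, second). If br0 keeps counting while tap0 does not, that suggests packets are being lost between the bridge and the tap. This only narrows things down and is no substitute for an actual capture.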

Ubuntu should be OK for running Ganeti. We use Debian ourselves, but this
issue doesn't seem to be distro-specific; if traffic stops flowing, it
looks more like something in the kernel. :)

Thanks,

Guido

Karl Katzke

Sep 30, 2012, 1:56:09 AM
to gan...@googlegroups.com
We're running fine on 12.04 and haven't noticed any networking issues. It might be an issue with the upgrade. Also, we hand-build our packages and don't depend on the Ubuntu repo, as we have specific paths that we want to keep things in and prefer to pass those in at compile time.

-K 

marcolinuz

Oct 3, 2012, 3:29:12 AM
to gan...@googlegroups.com
Crap!
This is exactly what happens in my network. I have a 4-node cluster with 35 VMs on Debian Squeeze, and for some months I've been experiencing these random network losses lasting a few minutes during heavy load.
It's quite frustrating because it makes my backups and my live migrations fail at random. :(

My configuration is the following:
QEMU PC emulator version 0.12.5 (qemu-kvm-0.12.5) on Debian Squeeze, with two bonded network adapters providing the LAN bridge connectivity for the VMs.
It happens on both Windows and Linux guests, so I guess it's a KVM bug or (worse) a kernel bug.

Is there any known patch or workaround that solves it?


On Wed, Oct 3, 2012 at 3:21 AM, Ben Kochie <sup...@gmail.com> wrote:
This is probably this bug:



--
By MCM.

"A computer is like the God of the Old Testament: a whole lot of rules and no mercy."

Ben Kochie

Oct 3, 2012, 2:31:05 PM
to gan...@googlegroups.com

marcolinuz

Oct 4, 2012, 5:06:48 AM
to gan...@googlegroups.com
I see, thanks a lot.

However, I have decided to wait for an official Debian patch.
I just hope it will be published soon.