Bug#864642: vmxnet3: Reports suspect GRO implementation on vSphere hosts / one VM crashes

Sven Hartge

Jul 6, 2017, 6:10:03 AM
On Mon, 12 Jun 2017 10:02:56 +0200 Patrick Matthäi
<pmat...@debian.org> wrote:

> Since updating the kernel from linux-image-4.9.0-2-amd64 (4.9.18-1) to
> linux-image-4.9.0-3-amd64 (4.9.30-1), all VMs report the following, but
> only for the "primary" interface:
>
> TCP: ens192: Driver has suspect GRO implementation, TCP performance may
> be compromised.
>
> I can't see any performance impact. This happens on all our vSphere 6.0
> and 6.5 hosts (running on HPE ProLiant DL 360 G8 - G9 HW / ProLiant ML
> 350 G9 and so on).

I see the same for my Stretch Test VMs, running on ESXi 5.5 on Dell R720.
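
A minimal sketch for listing which interfaces trigger the warning,
assuming the kernel log line has the same wording as quoted above. The
function name gro_warned_ifaces is made up for illustration; it reads
kernel log text on stdin and prints each affected interface once.

```shell
# Hypothetical helper: extract interface names from "suspect GRO" warnings.
# Assumes lines of the form
#   TCP: <iface>: Driver has suspect GRO implementation, ...
# optionally preceded by a dmesg timestamp.
gro_warned_ifaces() {
    sed -n 's/.*TCP: \([^:[:space:]]*\): Driver has suspect GRO implementation.*/\1/p' \
        | sort -u
}
```

Possible usage: dmesg | gro_warned_ifaces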

I have yet to experience a kernel panic, but those VMs are mostly idle
and don't transfer much data over the network, so the likelihood of a
crash might be related to the amount of data transmitted or to the peak
throughput.

Regards,
Sven.


Sven Hartge

Jul 6, 2017, 4:00:03 PM
Hi!

Could this be https://bugzilla.kernel.org/show_bug.cgi?id=191201 ?

Try the following, from comment 37
https://bugzilla.kernel.org/show_bug.cgi?id=191201#c37

| In the meantime, suggested workaround:
| - disable rx data ring: ethtool -G eth? rx-mini 0

Also adding "vmxnet3.rev.30 = FALSE" to the vmx file of the VM seems to
be needed. https://bugzilla.kernel.org/show_bug.cgi?id=191201#c40
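
A hedged sketch of applying the ring workaround to several interfaces,
assuming the affected interface names are already known (for example
ens192 from the original report). The helper name disable_rx_mini and
the DRY_RUN variable are made up for illustration; only the command
"ethtool -G <iface> rx-mini 0" comes from the bug report. As far as I
know, ring settings made with ethtool -G do not survive a reboot, so
this would have to be reapplied at boot.

```shell
# Hypothetical helper: apply the rx-mini workaround to each interface
# given as an argument. With DRY_RUN=1 the commands are only printed.
disable_rx_mini() {
    for iface in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Show what would be run without touching the device.
            echo "ethtool -G $iface rx-mini 0"
        else
            # Needs root and a vmxnet3 device that exposes the rx-mini ring.
            ethtool -G "$iface" rx-mini 0
        fi
    done
}
```

Possible usage: DRY_RUN=1 disable_rx_mini ens192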

Also: Which hardware version are you running? It is v10 for me (highest
for ESX5.5)

Regards,
Sven.

Ben Hutchings

Jul 16, 2017, 5:50:02 PM
Control: tag -1 moreinfo

Sven asked this, but forgot to add you to the recipients:

On Thu, 2017-07-06 at 21:50 +0200, Sven Hartge wrote:
> Hi!
>
> > Could this be https://bugzilla.kernel.org/show_bug.cgi?id=191201 ?

Note that this has been root-caused as a bug in the virtual device, not
the driver. (Though it would be nice if the driver could work around
it.)

Ben.

> Try the following, from comment 37 
> https://bugzilla.kernel.org/show_bug.cgi?id=191201#c37
>
> > In the meantime, suggested workaround:
> >  - disable rx data ring: ethtool -G eth? rx-mini 0
>
> Also adding "vmxnet3.rev.30 = FALSE" to the vmx file of the VM seems to 
> be needed. https://bugzilla.kernel.org/show_bug.cgi?id=191201#c40
>
> Also: Which hardware version are you running? It is v10 for me (highest 
> for ESX5.5)

--
Ben Hutchings
If the facts do not conform to your theory, they must be disposed of.


Debian Bug Tracking System

Jul 16, 2017, 5:50:03 PM
Processing control commands:

> tag -1 moreinfo
Bug #864642 [src:linux] vmxnet3: Reports suspect GRO implementation on vSphere hosts / one VM crashes
Added tag(s) moreinfo.

--
864642: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=864642
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems

Patrick Matthäi

Aug 3, 2017, 9:40:05 AM
severity #864642 normal
thanks


On 16.07.2017 at 23:42, Ben Hutchings wrote:
> Control: tag -1 moreinfo
>
> Sven asked this, but forgot to add you to the recipients:
>
> On Thu, 2017-07-06 at 21:50 +0200, Sven Hartge wrote:
>> Hi!
>>
>>> Could this be https://bugzilla.kernel.org/show_bug.cgi?id=191201 ?
> Note that this has been root-caused as a bug in the virtual device, not
> the driver. (Though it would be nice if the driver could work around
> it.)
>
> Ben.

I can confirm that the VMs no longer crash with vSphere 6.5 build
5969303 from 27.07.2017, which is why I lowered the severity.

But we still see the message "Driver has suspect GRO implementation,
TCP performance may be compromised". It also remains the case that
4.9.18-1 neither crashed nor showed this message, while 4.9.30-1
crashed and showed the message.

--
/*
With kind regards,
Patrick Matthäi
GNU/Linux Debian Developer

Blog: http://www.linux-dev.org/
E-Mail: pmat...@debian.org
pat...@linux-dev.org
*/


Sven Hartge

Aug 3, 2017, 10:30:02 AM
On 03.08.2017 15:34, Patrick Matthäi wrote:
> On 16.07.2017 at 23:42, Ben Hutchings wrote:
>> On Thu, 2017-07-06 at 21:50 +0200, Sven Hartge wrote:

>>>> Could this be https://bugzilla.kernel.org/show_bug.cgi?id=191201 ?
>> Note that this has been root-caused as a bug in the virtual device, not
>> the driver. (Though it would be nice if the driver could work around
>> it.)

> I can confirm that the VMs no longer crash with vSphere 6.5 build
> 5969303 from 27.07.2017, which is why I lowered the severity.

This is the version from 6.5u1, right?

Still, Stretch is basically unusable with HW13 on ESXi 6.5 before Update 1.

Regards,
Sven.


Sven Hartge

Aug 8, 2017, 5:50:05 AM
At 16:22 on 03.08.17, Sven Hartge wrote:
Hmm. There are discussions on Reddit right now indicating that the bug
still occurs even with the latest ESXi 6.5u1 (build 5969303).

https://www.reddit.com/r/homelab/comments/6s5dh6/debian_9_on_esxi_65u1_complete_lockup/

One of the latest comments on the Kernel Bugzilla shows the same:

https://bugzilla.kernel.org/show_bug.cgi?id=191201#c54

(For me, this is really frustrating right now, since I waited until
ESX6.5u1 before updating my infrastructure and now it seems I have to push
this update even farther into the future because of this critical blocker
bug.)

I really wonder what could be done on the kernel side to avoid the
problem, since only newer kernels are affected while older ones don't
show it.

Regards,
Sven.