Bug#669184: qemu-kvm: Virtio network drops in case of a heavy load

Michael Tokarev

Apr 23, 2012, 11:30:03 AM

On 23.04.2012 11:54, Vugar Dzhamalov wrote:
[]

I'll take a look. Next time, please don't send me the
contents of your /usr/bin or any other directories. Just
send the _versions_ of the affected software and, most
importantly, the steps you took to (re)produce the issue.
This includes the way you start your guests too, which is
your kvm command line.

> Is there anything else I can do here to help with this? Thank you.

Yes. What are the guest kernel details, please?

What is your kvm command line?

Also, does it happen with regular bridge networking
too, or only vde?

Thanks,

/mjt

--
To UNSUBSCRIBE, email to debian-bugs-...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listm...@lists.debian.org

Michael Tokarev

Apr 23, 2012, 12:00:02 PM

Okay, I re-created your configuration with vde_switch,
and I found the kvm command lines you used in one of
the files. You sent so much information that the really
important bits initially got lost in the noise.

I'm running a copy test from /dev/zero on one guest to
/dev/null on the other guest. It has copied 120 GB of data
so far, and counting. Should it stop at some point, as per
your bug report? How can I make it stop or stall?
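For anyone trying to reproduce this, the copy test described above (zeros streamed from one guest and discarded on the other) can be approximated with a small TCP sender and receiver. This is a hypothetical sketch, not the commands actually used in this thread (those are not given): the function names, the 64 KiB chunk size and the 5-second stall timeout are all assumptions. It runs both ends over loopback so that it is self-contained.

```python
import socket
import threading

CHUNK = 64 * 1024  # bytes per send()/recv() call (an arbitrary choice)


def sink(server, result, stall_timeout=5.0):
    """Accept one connection and discard everything received, like /dev/null.

    If no data arrives for stall_timeout seconds, the transfer is treated as
    stalled. Appends (bytes_received, stalled) to result.
    """
    conn, _ = server.accept()
    conn.settimeout(stall_timeout)
    total, stalled = 0, False
    with conn:
        while True:
            try:
                data = conn.recv(CHUNK)
            except socket.timeout:
                stalled = True
                break
            if not data:  # sender closed the connection: transfer complete
                break
            total += len(data)
    result.append((total, stalled))


def stream_zeros(host, port, total_bytes):
    """Send total_bytes of zero bytes to host:port, like reading /dev/zero."""
    zeros = bytes(CHUNK)
    sent = 0
    with socket.create_connection((host, port)) as s:
        while sent < total_bytes:
            n = min(CHUNK, total_bytes - sent)
            s.sendall(zeros[:n])
            sent += n
    return sent


def run_local_test(total_bytes):
    """Run both ends over loopback; returns (sent, received, stalled)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    result = []
    receiver = threading.Thread(target=sink, args=(server, result))
    receiver.start()
    sent = stream_zeros("127.0.0.1", port, total_bytes)
    receiver.join()
    server.close()
    received, stalled = result[0]
    return sent, received, stalled


if __name__ == "__main__":
    sent, received, stalled = run_local_test(10 * 1024 * 1024)
    print(f"sent={sent} received={received} stalled={stalled}")
```

On two real guests, `sink` would run on a listening socket on the receiving guest and `stream_zeros` would be pointed at that guest's address from the sending guest; a `stalled=True` result would correspond to the kind of stall described in the bug report.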

Thank you!

Vugar Dzhamalov

Apr 26, 2012, 4:30:02 AM

Thank you for doing this. I really appreciate it, and I'm sorry for wasting
your time with this.
Let's be honest here: I reported it more than a week ago and so far no one
else has joined the discussion. I guess it is quite obvious that something
is wrong with my setup rather than anything else...

I've launched two 64-bit CentOS 6.2 guests in the same manner as the two
64-bit Debian testing (wheezy) guests in my previous attempt, and got the
same issues with the virtio NIC. There is something wrong with my setup...

I've found the following in my /var/log/syslog:

kernel: [201326.063213] kvm: 17363: cpu0 unhandled rdmsr: 0xc001100d
kernel: [201326.063226] kvm: 17363: cpu0 unhandled rdmsr: 0xc0010112
kernel: [201326.193885] kvm: 17363: cpu0 unhandled rdmsr: 0xc0010001
kernel: [201326.205717] kvm: 17363: cpu1 unhandled rdmsr: 0xc001100d
kernel: [201336.675267] kvm: 17377: cpu0 unhandled rdmsr: 0xc001100d
kernel: [201336.675285] kvm: 17377: cpu0 unhandled rdmsr: 0xc0010112
kernel: [201336.815707] kvm: 17377: cpu0 unhandled rdmsr: 0xc0010001
kernel: [201336.827603] kvm: 17377: cpu1 unhandled rdmsr: 0xc001100d

I don't know if this is related in any way. Could you please tell me what it
is all about? I can't see anything suspicious in my other logs, on either
the guest or the host...

I can't see anything else I can do at this stage. I can use e1000; it is
more than sufficient for my current needs. Thank you.

Michael Tokarev

Apr 26, 2012, 4:50:02 AM

On 26.04.2012 12:24, Vugar Dzhamalov wrote:
>
> Thank you for doing this. I really appreciate it, and I'm sorry for wasting
> your time with this.
> Let's be honest here: I reported it more than a week ago and so far no one
> else has joined the discussion. I guess it is quite obvious that something
> is wrong with my setup rather than anything else...

I'm not sure I understand. You have a problem which looks very much like
a bug. The fact that no one else has joined this discussion tells us
nothing: at this stage, bugs are much harder to hit, since the most
obvious bugs, the ones all users hit, were fixed ages ago. Likewise,
the fact that I can't reproduce it does not mean it does not exist; it
most likely means my setup is (slightly) different from yours, and
something we overlooked is preventing me from hitting this bug.

Lack of other users hitting this bug does not mean the bug does not exist.

We should actually find out what is going on (a real bug in the code,
something "wrong" in your setup, or something else) before deciding
what to do next.

> I've launched two 64-bit CentOS 6.2 guests in the same manner as the two
> 64-bit Debian testing (wheezy) guests in my previous attempt, and got the
> same issues with the virtio NIC. There is something wrong with my setup...
>
> I've found the following in my /var/log/syslog:
>
>
> kernel: [201326.063213] kvm: 17363: cpu0 unhandled rdmsr: 0xc001100d
> kernel: [201326.063226] kvm: 17363: cpu0 unhandled rdmsr: 0xc0010112
> kernel: [201326.193885] kvm: 17363: cpu0 unhandled rdmsr: 0xc0010001

These are accesses by the guest to model-specific CPU registers
(MSRs). Qemu must emulate these, but it does not emulate all
of them (and we can't even know all of them, since CPUs
differ). What you see here are:

0xC001100d CPU_ID_HYPER_EXT_FEATURES
(information about extended features of hypervisor)
0xc0010112 MSR_K8_TSEG_ADDR
(some AMD K8-specific register)
0xC0010001 MSR_K7_EVNTSEL1
(some AMD K7-specific register)

The guest just probes various features of the CPU to learn what
the CPU can do, poking at the registers it knows about. Qemu
merely reports that it didn't know what to do with these, and
returned some sort of failure to the guest, which the guest,
at this stage, was fully prepared to handle. So these are all
harmless.
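To make log lines like the ones quoted above easier to read, a small helper could map each address to the name given in this message. This is a hypothetical sketch: the function name is made up, the table contains only the three MSRs listed above, and treating the number after `kvm:` as the guest's qemu process ID is an assumption about the log format.

```python
import re

# MSR addresses and names exactly as listed in the message above.
KNOWN_MSRS = {
    0xC001100D: "CPU_ID_HYPER_EXT_FEATURES",
    0xC0010112: "MSR_K8_TSEG_ADDR",
    0xC0010001: "MSR_K7_EVNTSEL1",
}

# Matches the tail of lines such as
# "kernel: [201326.063213] kvm: 17363: cpu0 unhandled rdmsr: 0xc001100d"
RDMSR_RE = re.compile(r"kvm: (\d+): cpu(\d+) unhandled rdmsr: (0x[0-9a-fA-F]+)")


def annotate(line):
    """Return (pid, cpu, msr_name) for an 'unhandled rdmsr' line, else None."""
    m = RDMSR_RE.search(line)
    if m is None:
        return None
    pid, cpu = int(m.group(1)), int(m.group(2))
    msr = int(m.group(3), 16)  # int() accepts the "0x" prefix with base 16
    return pid, cpu, KNOWN_MSRS.get(msr, f"unknown MSR {msr:#x}")


if __name__ == "__main__":
    syslog = [
        "kernel: [201326.063213] kvm: 17363: cpu0 unhandled rdmsr: 0xc001100d",
        "kernel: [201326.063226] kvm: 17363: cpu0 unhandled rdmsr: 0xc0010112",
        "kernel: [201326.193885] kvm: 17363: cpu0 unhandled rdmsr: 0xc0010001",
    ]
    for line in syslog:
        print(annotate(line))
```

For the first log line it returns `(17363, 0, 'CPU_ID_HYPER_EXT_FEATURES')`; an address missing from the table is reported as unknown rather than guessed.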

> I can't see anything else I can do at this stage. I can use e1000; it is
> more than sufficient for my current needs. Thank you.

It is very good that you have a workaround: it means I can
do some more urgent work before I take another look at this.

But the problem does appear to be there, and we need to find it
and fix it, or at least understand why and where it happens.

(And yes, I'd be very glad to close this bug report so I'd
have fewer bugs to deal with :)

Thank you!

/mjt

Vugar Dzhamalov

Apr 30, 2012, 2:00:01 AM

I just wanted to let you know that I have reinstalled the system.
Unfortunately it didn't help: it is basically the same issue
with the virtio NIC.

At the time of writing:

Host OS: Debian Testing (Wheezy) amd64

qemu-kvm version: 1.0+dfsg-11

I'll let you know if I stumble on something new. Thank you!

Vugar Dzhamalov

May 1, 2012, 7:00:01 AM

Another quick update here; it could be related somehow...
I just got a kernel failure popup. Here is the message:

Kernel failure message 1:
------------[ cut here ]------------
WARNING: at /build/buildd-linux-2.6_3.2.15-1-amd64-EOdTQR/linux-2.6-3.2.15/debian/build/source_amd64_none/net/sched/sch_generic.c:255 dev_watchdog+0xe9/0x148()
Hardware name: GA-970A-D3
NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
Modules linked in: mperf cpufreq_powersave cpufreq_stats cpufreq_userspace
cpufreq_conservative parport_pc ppdev lp parport nfsd nfs nfs_acl auth_rpcgss
fscache lockd sunrpc kvm_amd kvm iptable_nat nf_nat nf_conntrack_ipv4
nf_defrag_ipv4 nf_conntrack ip_tables x_tables tun ext3 jbd loop usbhid hid
snd_usb_audio snd_usbmidi_lib arc4 uvcvideo snd_seq_midi snd_seq_midi_event
snd_rawmidi snd_hda_codec_hdmi btusb bluetooth tuner_simple tuner_types wm8775
snd_hda_codec_realtek tda9887 tda8290 snd_hda_intel snd_hda_codec snd_hwdep
snd_pcm snd_page_alloc sp5100_tco tuner radeon cx25840 ath5k ath ttm mac80211
ivtv cx2341x tveeprom v4l2_common videodev snd_seq cfg80211 rfkill
drm_kms_helper drm v4l2_compat_ioctl32 power_supply media snd_seq_device
i2c_piix4 i2c_algo_bit sr_mod xhci_hcd ohci_hcd edac_mce_amd cdrom i2c_core
ehci_hcd usbcore edac_core r8169 k10temp mii snd_timer fam15h_power usb_common
snd wmi evdev pcspkr button processor soundcore thermal_sys ext4 crc16 jbd2
mbcache dm_mod sd_mod crc_t10dif ahci libahci libata scsi_mod
Pid: 0, comm: swapper/2 Not tainted 3.2.0-2-amd64 #1
Call Trace:
<IRQ> [<ffffffff81046811>] ? warn_slowpath_common+0x78/0x8c
[<ffffffff810468bd>] ? warn_slowpath_fmt+0x45/0x4a
[<ffffffff812a1e75>] ? netif_tx_lock+0x40/0x72
[<ffffffff812a1fd6>] ? dev_watchdog+0xe9/0x148
[<ffffffff81051ebc>] ? run_timer_softirq+0x19a/0x261
[<ffffffff812a1eed>] ? netif_tx_unlock+0x46/0x46
[<ffffffff810659ff>] ? timekeeping_get_ns+0xd/0x2a
[<ffffffff8104be30>] ? __do_softirq+0xb9/0x177
[<ffffffff813504ac>] ? call_softirq+0x1c/0x30
[<ffffffff8100f8e5>] ? do_softirq+0x3c/0x7b
[<ffffffff8104c098>] ? irq_exit+0x3c/0x9a
[<ffffffff81023fe8>] ? smp_apic_timer_interrupt+0x74/0x82
[<ffffffff8134ed1e>] ? apic_timer_interrupt+0x6e/0x80
<EOI> [<ffffffff8100d6a3>] ? __switch_to+0x133/0x258
[<ffffffffa0117398>] ? arch_local_irq_enable+0x4/0x8 [processor]
[<ffffffffa0118020>] ? acpi_idle_enter_simple+0xc6/0x102 [processor]
[<ffffffff8126b8ab>] ? cpuidle_idle_call+0xec/0x179
[<ffffffff8100d248>] ? cpu_idle+0xa5/0xf2
[<ffffffff810706c2>] ? arch_local_irq_restore+0x2/0x8
[<ffffffff8133b77f>] ? start_secondary+0x1d5/0x1db
---[ end trace 633101d7318d6344 ]---

Possibly related to this bug:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=526983

Michael Tokarev

Jul 3, 2012, 11:00:02 AM

[Replying to an old bugreport]

01.05.2012 14:50, Vugar Dzhamalov wrote:
>
> Another quick update here. It could be related somehow...
> Just got a kernel failure popup, here is the message:
>
> Kernel failure message 1:
> ------------[ cut here ]------------
> WARNING: at /build/buildd-linux-2.6_3.2.15-1-amd64-EOdTQR/linux-2.6-3.2.15/debian/build/source_amd64_none/net/sched/sch_generic.c:255 dev_watchdog+0xe9/0x148()
> Hardware name: GA-970A-D3
> NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out

Um. No, this is unrelated to virtio or guest networking. It is a
problem with your host's networking, which is a separate issue.
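The conclusion above rests on the driver name in parentheses in the warning: r8169 is the kernel driver for Realtek Gigabit Ethernet chips (here the host's physical NIC), whereas a Linux guest's virtio NIC is driven by virtio_net. A hypothetical helper that pulls the interface, driver and queue out of such a line might look like this; the function name is made up, and the regular expression assumes the exact message format quoted above.

```python
import re

# Matches lines such as
# "NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out"
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (\S+) \(([^)]+)\): transmit queue (\d+) timed out"
)


def parse_watchdog(line):
    """Return (interface, driver, queue, is_virtio), or None if no match.

    is_virtio is True only for the virtio_net driver used by guest virtio NICs.
    """
    m = WATCHDOG_RE.search(line)
    if m is None:
        return None
    iface, driver, queue = m.group(1), m.group(2), int(m.group(3))
    return iface, driver, queue, driver == "virtio_net"


if __name__ == "__main__":
    line = "NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out"
    print(parse_watchdog(line))
```

For the warning in this report the driver field is `r8169`, so `is_virtio` comes out False, which matches the reading that the timeout is on the host NIC rather than on a guest's virtio interface.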

Now... Looking at the whole issue... You did lots of work, but the outcome
is zero. It is something I don't like... :(

Have you tried anything since the initial work you've done? For example,
it is worth trying the current 1.1-z0+dfsg-1 version in sid/unstable,
which is a new upstream release with lots of changes. Does it behave
any differently?

If not, and you are still interested, shall we try to address this issue
upstream?

Thank you for your patience!

/mjt
