CPU steal time 100% after live migration of kvm guest debian jessie

scakkia

Apr 1, 2015, 12:32:29 PM
to gan...@googlegroups.com
Hi,

I have a 3-node cluster running Debian jessie with KSM enabled, with everything installed from jessie packages:
ganeti 2.12.0-3
qemu 1:2.1+dfsg-11

I see strange behavior when I live-migrate a Debian jessie VM (it does not happen with wheezy VMs):
after the live migration, CPU steal time goes to 100% and never comes back down.
I have to restart the VM to recover.

$ mpstat -P ALL
Linux 3.16.0-4-amd64 (xxxxxxx) 04/01/2015 _x86_64_ (2 CPU)

06:07:23 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
06:07:23 PM  all    0.00    0.00    0.00    0.00    0.00    0.00  100.00    0.00    0.00    0.00
06:07:23 PM    0    0.00    0.00    0.00    0.00    0.00    0.00  100.00    0.00    0.00    0.00
06:07:23 PM    1    0.00    0.00    0.00    0.00    0.00    0.00  100.00    0.00    0.00    0.00
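
For anyone who wants to cross-check the %steal column without mpstat: the figure comes from the steal counter in the guest's /proc/stat. Below is a minimal sketch, assuming the usual field order on the aggregate "cpu" line (user nice system idle iowait irq softirq steal guest guest_nice). The function name steal_pct is my own, and this is not how mpstat itself is implemented.

```sh
#!/bin/sh
# steal_pct OLD NEW: percentage of CPU time accounted as steal between two
# aggregate "cpu" lines sampled from /proc/stat (hypothetical helper name).
steal_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        # $1 is the "cpu" label; $2..$9 are user nice system idle iowait irq softirq steal.
        # guest and guest_nice ($10, $11) are already counted in user and nice, so they are skipped.
        { total[NR] = 0; for (i = 2; i <= 9; i++) total[NR] += $i; steal[NR] = $9 }
        END {
            dt = total[2] - total[1]
            if (dt <= 0) { print "0.00"; exit }
            printf "%.2f\n", 100 * (steal[2] - steal[1]) / dt
        }'
}

# Usage inside the guest: take two samples a few seconds apart.
# old=$(grep "^cpu " /proc/stat); sleep 5; new=$(grep "^cpu " /proc/stat)
# steal_pct "$old" "$new"
```

A value stuck near 100 across repeated samples would match the mpstat output above.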

Has anyone else run into this?
Any input would be appreciated.

thanks

scakkia

Apr 3, 2015, 10:57:47 AM
to gan...@googlegroups.com
Hi all,

Does nobody else have this issue?

I tried disabling KSM, changing cpu_type, and using only 1 vCPU for the VM, but the problem persists.

Should I ask on the KVM mailing list?

The VM is started with these parameters:


/usr/bin/kvm -name coffee08 -m 2048 -smp 1 -pidfile /var/run/ganeti/kvm-hypervisor/pid/coffee08 -balloon virtio,id=balloon,bus=pci.0,addr=0x3 -daemonize -machine pc-i440fx-2.1 -monitor unix:/var/run/ganeti/kvm-hypervisor/ctrl/coffee08.monitor,server,nowait -serial unix:/var/run/ganeti/kvm-hypervisor/ctrl/coffee08.serial,server,nowait -usb -usbdevice tablet -vnc :5155 -cpu host -uuid 6734b4a2-9bea-45ef-a56f-879bba874b65 -netdev type=tap,id=hotnic-cfb81326-pci-5,fd=10 -device virtio-net-pci,mac=aa:00:00:5b:70:99,id=hotnic-cfb81326-pci-5,bus=pci.0,addr=0x5,netdev=hotnic-cfb81326-pci-5 -incoming tcp:10.x.x.x:8102 -qmp unix:/var/run/ganeti/kvm-hypervisor/ctrl/coffee08.qmp,server,nowait -qmp unix:/var/run/ganeti/kvm-hypervisor/ctrl/coffee08.kvmd,server,nowait -boot c -device virtio-blk-pci,drive=hotdisk-f2f3465c-pci-4,id=hotdisk-f2f3465c-pci-4,bus=pci.0,addr=0x4 -drive file=/var/run/ganeti/instance-disks/coffee08:0,format=raw,if=none,id=hotdisk-f2f3465c-pci-4,bus=0,unit=4


and info migrate:

# echo 'info migrate' | /usr/bin/socat STDIO UNIX-CONNECT:/var/run/ganeti/kvm-hypervisor/ctrl/coffee08.monitor
QEMU 2.1.2 monitor - type 'help' for more information
(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off
Migration status: active
total time: 1785 milliseconds
expected downtime: 60 milliseconds
setup: 1 milliseconds
transferred ram: 110616 kbytes
throughput: 503.36 mbps
remaining ram: 59008 kbytes
total ram: 2106180 kbytes
duplicate: 485257 pages
skipped: 0 pages
normal: 26536 pages
normal bytes: 106144 kbytes
dirty sync count: 0
(qemu)
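
If you want to script around the monitor query above, the status line can be pulled out of the output with a small filter. This is only a sketch: the function name migration_status is made up, and it assumes the "Migration status:" line starts at the beginning of a line, as it does in the transcript above.

```sh
#!/bin/sh
# migration_status: read "info migrate" monitor output on stdin and print the
# value of the first "Migration status:" line, e.g. "active".
# (Hypothetical helper; carriage returns from the monitor are stripped first.)
migration_status() {
    tr -d '\r' | awk -F': ' '!seen && /^Migration status:/ { print $2; seen = 1 }'
}

# Usage on the node, with the socket path from the command above:
# echo "info migrate" | /usr/bin/socat STDIO UNIX-CONNECT:/var/run/ganeti/kvm-hypervisor/ctrl/coffee08.monitor | migration_status
```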



best regards
Daniele

Neal Oakey

Apr 3, 2015, 12:04:54 PM
to gan...@googlegroups.com
Hi Daniele,

This is more or less a known issue.
KSM has nothing to do with this, and neither does cpu_type.

The problem is a bug in KVM 2.x, which should be fixed in KVM 2.1.3 or KVM 2.2.x.
=> This has to be fixed by Debian.

Greetings
Neal

scakkia

Apr 3, 2015, 1:19:03 PM
to gan...@googlegroups.com
Hi Neal,

Thanks for your response.
I should have looked more closely at the KVM bugs.

regards

Osvaldo T Crispim Filho

Apr 8, 2015, 7:25:13 AM
to gan...@googlegroups.com
Is there any workaround?

Apollon Oikonomopoulos

Jun 3, 2015, 5:21:21 AM
to scakkia, gan...@googlegroups.com
On 16:57 Fri 03 Apr , scakkia wrote:
> Hi @all,
>
> nobody else have this issue?
>
> I tried to disable ksm, changing cpu_type, use only 1 vcpu for vm, but the
> problem persists.

Hi,

It seems this is a bug with steal time accounting. I don't know (yet) if
it's qemu, the kernel or both, but if you turn off kvm steal time
reporting, it should go away:

gnt-instance modify -H 'cpu_type=qemu64,-kvm_steal_time' <instance_name>

Can you give it a try?
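
Once the instance has been restarted with the changed cpu_type, one way to check from inside the guest whether steal time reporting is still active is to look for the kernel's steal-time registration message in the boot log. This is a rough sketch under assumptions: the function name has_kvm_stealtime is made up, and the "kvm-stealtime: cpu N, msr ..." wording is what I expect Linux guests of this era to log, so adjust the pattern if your kernel prints something different. An old boot message may also have rotated out of the log buffer.

```sh
#!/bin/sh
# has_kvm_stealtime LOGTEXT: succeed if the given kernel log text contains a
# steal-time registration line for at least one CPU.
# (Hypothetical helper; the message format is an assumption, see above.)
has_kvm_stealtime() {
    printf '%s\n' "$1" | grep -q 'kvm-stealtime: cpu [0-9]'
}

# Usage inside the guest (reading the kernel log may need root):
# if has_kvm_stealtime "$(dmesg)"; then
#     echo "steal time reporting looks enabled"
# else
#     echo "no steal-time registration found in the kernel log"
# fi
```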

Regards,
Apollon

--
Apollon Oikonomopoulos apo...@skroutz.gr
Skroutz S.A. http://skroutz.gr

scakkia

Jun 3, 2015, 9:35:48 AM
to Apollon Oikonomopoulos, gan...@googlegroups.com
Hi Apollon,

I get a parameter error:

gnt-instance modify -H 'cpu_type=qemu64,-kvm_steal_time' test-jessie
 
Parameter Error: Unknown parameter 'kvm_steal_time'


I have ganeti from jessie-backports:

gnt-cluster version 
Software version: 2.12.4
Internode protocol: 2120000
Configuration format: 2120000
OS api version: 20
Export interface: 0
VCS version: (ganeti) version v2.12.4

and all the rest is jessie default:
qemu-system-x86_64 --version
QEMU emulator version 2.1.2 (Debian 1:2.1+dfsg-12)

uname -a
Linux 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt9-3~deb8u1 (2015-04-24) x86_64 GNU/Linux


regards
Daniele

Apollon Oikonomopoulos

Jun 3, 2015, 9:37:24 AM
to scakkia, gan...@googlegroups.com
Hi,

On 15:35 Wed 03 Jun , scakkia wrote:
> Hi Apollon,
>
> I've Parameter Error:
>
> gnt-instance modify -H 'cpu_type=qemu64,-kvm_steal_time' test-jessie

You need to escape the comma using '\,':
-H 'cpu_type=qemu64\,-kvm_steal_time'
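
To illustrate why the escape matters: as far as I understand it, the -H value is split on commas into separate entries, so an unescaped comma turns '-kvm_steal_time' into an entry of its own, which is then treated as a hypervisor parameter name (hence "Unknown parameter 'kvm_steal_time'"). The sketch below only mimics that splitting rule; split_hvparams is a made-up name, not Ganeti's actual parser.

```sh
#!/bin/sh
# split_hvparams VALUE: print one entry per line, splitting on commas but
# keeping backslash-escaped commas ("\,") inside an entry.
# (Illustration only; hypothetical helper, not Ganeti code.)
split_hvparams() {
    printf '%s\n' "$1" | awk '{
        marker = "\001"                 # placeholder for escaped commas
        gsub(/\\,/, marker)             # protect "\," before splitting
        n = split($0, parts, ",")
        for (i = 1; i <= n; i++) {
            gsub(marker, ",", parts[i]) # restore the protected commas
            print parts[i]
        }
    }'
}

# Unescaped: two entries, the second one looks like a parameter name.
# split_hvparams "cpu_type=qemu64,-kvm_steal_time"
# Escaped: a single cpu_type entry whose value contains the comma.
# split_hvparams "cpu_type=qemu64\,-kvm_steal_time"
```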

scakkia

Jun 3, 2015, 9:52:42 AM
to Apollon Oikonomopoulos, gan...@googlegroups.com
Hi Apollon,

Great!

It works fine with cpu_type set to either qemu64 or host:

gnt-instance modify -H 'cpu_type=qemu64\,-kvm_steal_time' test-jessie
gnt-instance modify -H 'cpu_type=host\,-kvm_steal_time' test-jessie


No more 100% CPU steal time after live migration:

mpstat -P ALL
Linux 3.16.0-4-amd64 (test-jessie) 06/03/2015 _x86_64_ (2 CPU)

03:50:39 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
03:50:39 PM  all    0.21    0.00    0.23    0.37    0.00    0.00    0.00    0.00    0.00   99.18
03:50:39 PM    0    0.26    0.00    0.34    0.42    0.00    0.00    0.00    0.00    0.00   98.98
03:50:39 PM    1    0.15    0.00    0.13    0.32    0.00    0.00    0.00    0.00    0.00   99.40
 

thanks
Daniele

Neal Oakey

Jun 11, 2015, 11:54:45 AM
to gan...@googlegroups.com, Apollon Oikonomopoulos
Hi Apollon,

Thanks!
It seems that this has solved my issue
(https://code.google.com/p/ganeti/issues/detail?id=986)

I'll keep an eye on it, migrate some more VMs over the next few nights, and
report back if it doesn't work out.

Greetings,
Neal

Alexandre Derumier

Aug 31, 2015, 3:56:50 AM
to ganeti
Hi,

I hit this bug today on 3 different Debian jessie VMs (kernel 3.16) while live-migrating from qemu 2.2 to qemu 2.4.

Is it a qemu bug or a kernel bug?

Chad S.

Sep 4, 2015, 10:08:15 AM
to ganeti
Hi Alex,
  I've seen you on the Ceph mailing list giving good advice.  :)
  It appears as though this is a QEMU bug, but different kernels react differently:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=785557#64


Chad.