virtio-scsi, CEPH and timeouts


Phil Regnauld

May 7, 2018, 6:09:23 AM
to gan...@googlegroups.com
Currently running 2.16.0-rc2 on Ubuntu 18.04, with a CEPH cluster in the
backend. Some instances are experiencing I/O timeouts on the block device.

Unfortunately, vd* block devices don't implement /sys/block/*/device/timeout...

Until now I've been using virtio (-blk) as the default disk backend, but
as I understand it, I should be getting better performance, and the ability
to control the I/O timeout -- but so far I'm not sure I'm doing it right.
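For reference, on sd* devices that timeout knob lives in sysfs; a minimal sketch of setting it (the helper name is made up, and the sysfs root is parameterized purely for illustration — vd* devices will simply report the attribute missing):

```python
from pathlib import Path

def set_scsi_timeout(device: str, seconds: int, sysfs: str = "/sys/block") -> bool:
    """Set the SCSI command timeout for a block device (e.g. 'sda').

    Returns False when the device exposes no timeout attribute, which is
    exactly the situation with virtio-blk (vd*) devices.
    """
    attr = Path(sysfs) / device / "device" / "timeout"
    if not attr.exists():
        return False
    attr.write_text(str(seconds))
    return True
```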

a) gnt-instance modify -H disk_type=scsi,scsi_controller_type=virtio-scsi-pci testvm

(figured this out from looking at http://docs.ganeti.org/ganeti/2.17/html/design-scsi-kvm.html)

But it doesn't seem to do what I need - write performance is abysmal
(20 MB/sec!), and judging from the KVM command line, it doesn't seem to be
passing the right parameters:

qemu-system-x86_64 -enable-kvm -name testvm -m 4096 -smp 1 -pidfile /var/run/ganeti/kvm-hypervisor/pid/testvm -balloon virtio -daemonize -machine pc-i440fx-bionic -monitor unix:/var/run/ganeti/kvm-hypervisor/ctrl/testvm.monitor,server,nowait -serial unix:/var/run/ganeti/kvm-hypervisor/ctrl/testvm.serial,server,nowait -usb -usbdevice tablet -vnc :5100,password -uuid 30c59506-b332-4ecf-aa08-1a0dbfc267b9 -netdev type=tap,id=nic-ffe9b380-e29e-4ced,fd=9 -device virtio-net-pci,bus=pci.0,id=nic-ffe9b380-e29e-4ced,addr=0xd,netdev=nic-ffe9b380-e29e-4ced,mac=aa:00:00:80:47:7c -qmp unix:/var/run/ganeti/kvm-hypervisor/ctrl/testvm.qmp,server,nowait -qmp unix:/var/run/ganeti/kvm-hypervisor/ctrl/testvm.kvmd,server,nowait -boot c -drive file=/var/run/ganeti/instance-disks/testvm:0,format=raw,if=scsi,cache=none -S

b) using kvm_extra

... from reading https://groups.google.com/forum/#!topic/ganeti/YuTE-292FDg

... but not really sure about the proper invocation - is there an example
somewhere ?

Any advice appreciated, as this is causing significant frustration :)

Cheers,
Phil


Phil Regnauld

May 7, 2018, 7:48:53 AM
to gan...@googlegroups.com
Phil Regnauld (regnauld) writes:
> b) using kvm_extra
>
> ... from reading https://groups.google.com/forum/#!topic/ganeti/YuTE-292FDg
>
> ... but not really sure about the proper invocation - is there an example
> somewhere ?
>
> Any advice appreciated, as this is causing significant frustration :)

Ok, I was able to get the above to work, but it means hardcoding the device
in the following fashion:

gnt-instance modify -H kvm_extra="-device virtio-scsi-pci\,id=scsi -device scsi-hd\,drive=hd -drive file=/dev/rbd5\,if=none\,id=hd\,cache=none\,aio=native" testvm

... and, yeah, performance is rather better :)

root@testvm:~# dd if=/dev/zero of=junk bs=1M count=20000 status=progress
20000+0 records in
20000+0 records out
20971520000 bytes (21 GB, 20 GiB) copied, 25.117 s, 835 MB/s
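(The reported rate is simply bytes copied divided by elapsed time, in SI megabytes:)

```python
# dd's figure is bytes copied divided by elapsed seconds, in SI megabytes.
bytes_copied = 20971520000
elapsed_s = 25.117
mb_per_sec = bytes_copied / elapsed_s / 1e6
print(round(mb_per_sec))  # 835, matching dd's "835 MB/s"
```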

Now, obviously it's not going to be optimal to have to map RBD devices
beforehand and point KVM at them.

Am I missing something ?

sascha...@web.de

May 7, 2018, 11:34:04 AM
to gan...@googlegroups.com
Hi Phil,

on Mo, 07 Mai 2018 13:48:50 +0200 Phil Regnauld wrote:

> Now, obviously it's not going to be optimal to have to map RBD devices
> beforehand and point KVM at them.

AFAIK virtio-scsi is not implemented in 2.16, so your kvm_extra approach
seems the only way.

> Unfortunately, vd* block devices don't implement /sys/block/*/device/timeout

This is an advantage from my point of view. Error handling and retry is done
at the host, not at the guest kernel level. Whatever error the host device
reports is propagated to the guest. In my case (NFS/sharedfile with the
"hard" mount option), instances/VMs can wait hours without losing disks (vd*).
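For illustration, a hard mount of that kind could look like this in /etc/fstab (the server name and paths here are hypothetical):

```text
nfsserver:/export/ganeti  /srv/ganeti/shared-file  nfs  hard,vers=3  0 0
```

With "hard", a stalled server makes I/O block and retry indefinitely instead of returning an error to the guest.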

Switching to devices with a timeout seems more like a workaround than a solution?

Thanks, Sascha.

Dimitris Aragiorgis

May 7, 2018, 12:22:28 PM
to Phil Regnauld, gan...@googlegroups.com
Hi,

* Phil Regnauld <regn...@gmail.com> [2018-05-07 12:09:16 +0200]:

> Currently running 2.16.0-rc2 on Ubuntu 18.04, with a CEPH cluster in the
> backend. Some instances are experiencing I/O timeouts on the block device.
>
> Unfortunately, vd* block devices don't implement /sys/block/*/device/timeout...
>
> Until now I've been using virtio (-blk) as the default disk backend, but
> as I understand, I should be getting better performance, and the ability
> to control I/O timeout -- but so far I'm not sure I'm doing it right.
>
> a) gnt-instance modify -H disk_type=scsi,scsi_controller_type=virtio-scsi-pci testvm
>

I think you should use disk_type=scsi-hd. Copying from the design doc:

Up until now, if disk_type hvparam was set to scsi, QEMU would use the
deprecated device model and end up using SCSI emulation, e.g.:

-drive file=/var/run/ganeti/instance-disks/test:0,if=scsi,format=raw

Now the equivalent, which will also enable hotplugging, will be to set
disk_type to scsi-hd. The QEMU command line will include:

-drive file=/var/run/ganeti/instance-disks/test:0,if=none,format=raw,id=disk-9e7c85f6-b6e5-4243
-device scsi-hd,id=disk-9e7c85f6-b6e5-4243,drive=disk-9e7c85f6-b6e5-4243,bus=scsi.0,channel=0,scsi-id=0,lun=0

The disk_type=scsi is kept for backwards compatibility.
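The mapping described above can be sketched roughly as follows (simplified and illustrative, not Ganeti's actual code; the function name is made up):

```python
def qemu_disk_args(disk_type: str, path: str, disk_id: str) -> list[str]:
    """Illustrative mapping of Ganeti's disk_type onto QEMU arguments."""
    if disk_type == "scsi":
        # deprecated model: SCSI emulation configured directly on the drive
        return ["-drive", f"file={path},format=raw,if=scsi"]
    if disk_type == "scsi-hd":
        # modern model: anonymous backend drive plus an explicit scsi-hd
        # device attached to the SCSI bus (this is what enables hotplug)
        return [
            "-drive", f"file={path},if=none,format=raw,id={disk_id}",
            "-device", f"scsi-hd,id={disk_id},drive={disk_id}"
                       ",bus=scsi.0,channel=0,scsi-id=0,lun=0",
        ]
    raise ValueError(f"unhandled disk_type: {disk_type}")
```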

The patchset that implements this design doc ends practically with commit
56eb706d [1], which is included in v2.16.0beta2 onwards.

If you had real SCSI devices on the host (not /dev/rbdX) then you could also
use disk_type=scsi-block for SCSI block device passthrough.

Note that with the current implementation of hvparams/diskparams, you cannot
set any other QEMU specific drive options, e.g., aio=native.

Hope it helps,
dimara


[1] https://github.com/ganeti/ganeti/commit/56eb706dddfa06d5432c411dd530af3c845f279a

Phil Regnauld

May 7, 2018, 2:54:32 PM
to gan...@googlegroups.com
sascha...@web.de (sascha_lucas) writes:
>
> AFAIK virtio-scsi is not implemented in 2.16, so your kvm_extra approach
> seems the only way.

Ok, that's a problem...

> > Unfortunately, vd* block devices don't implement /sys/block/*/device/timeout
>
> This is an advantage from my point of view. Error handling and retry is done
> at the host, not at the guest kernel level. Whatever error the host device
> reports is propagated to the guest. In my case (NFS/sharedfile with the
> "hard" mount option), instances/VMs can wait hours without losing disks (vd*).

Except that ZFS, ext4, etc. give up after a while because they assume
the underlying disk is experiencing errors.

> Switching to devices with a timeout seems more like a workaround than a solution?

To some extent I agree, but unfortunately the workaround would solve
my problem :)

Cheers,
Phil

Phil Regnauld

May 7, 2018, 5:37:52 PM
to Dimitris Aragiorgis, gan...@googlegroups.com
Dimitris Aragiorgis (dimara) writes:
>
> Now the equivalent, which will also enable hotplugging, will be to set
> disk_type to scsi-hd. The QEMU command line will include:
>
> -drive file=/var/run/ganeti/instance-disks/test:0,if=none,format=raw,id=disk-9e7c85f6-b6e5-4243
> -device scsi-hd,id=disk-9e7c85f6-b6e5-4243,drive=disk-9e7c85f6-b6e5-4243,bus=scsi.0,channel=0,scsi-id=0,lun=0
>
> The disk_type=scsi is kept for backwards compatibility.
>
> The patchset that implements this design doc ends practically with commit
> 56eb706d [1], which is included in v2.16.0beta2 onwards.

Cool. I just ran some tests on this.

A simple dd test is giving me ~300 MB/sec writes, and I can control
device timeouts (which may not be necessary).

On the down side, on the host I see:

top - 16:57:59 up 7 days, 11:47, 1 user, load average: 22.41, 9.10, 3.52

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
48867 root 20 0 9643460 4.027g 13216 R 142.4 3.2 1:18.94 qemu-system-x86

Load 22 and 140% CPU usage - ouch!

Here's what it looks like if I pass the disk config via kvm_extra :

top - 17:31:53 up 7 days, 12:21, 1 user, load average: 1.16, 0.59, 1.13

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
84424 root 20 0 5548656 4.048g 13064 S 101.0 3.2 0:46.73 qemu-system-x86

Seems there is quite a difference in overhead between 'virtio-scsi-pci'
(specified using kvm_extra) and 'virtio-blk-pci' (using disk_type=scsi-hd)...

> Note that with the current implementation of hvparams/diskparams, you cannot
> set any other QEMU specific drive options, e.g., aio=native.

Right.

Phil Regnauld

May 8, 2018, 4:04:01 AM
to Dimitris Aragiorgis, gan...@googlegroups.com
Phil Regnauld (regnauld) writes:
> top - 16:57:59 up 7 days, 11:47, 1 user, load average: 22.41, 9.10, 3.52
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 48867 root 20 0 9643460 4.027g 13216 R 142.4 3.2 1:18.94 qemu-system-x86
>
> Load 22 and 140% CPU usage - ouch!

Something slightly more worrying:

state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: http://zfsonlinux.org/msg/ZFS-8000-8A
scan: none requested
config:

NAME STATE READ WRITE CKSUM
zDATA ONLINE 0 0 3
sdb ONLINE 0 0 6

errors: Permanent errors have been detected in the following files:

<420 files listed>

root# sha256sum base.tar.gz

sha256sum: base.tar.gz: Input/output error

And on the console :

[ 5921.788836] INFO: task txg_sync:882 blocked for more than 120 seconds.
[ 5921.789151] Tainted: P O 4.15.0-21-generic #22-Ubuntu
[ 5921.789396] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 5921.789660] txg_sync D 0 882 2 0x80000000
[ 5921.789663] Call Trace:
[ 5921.789716] __schedule+0x297/0x8b0
[ 5921.790194] ? zio_taskq_dispatch+0x73/0x90 [zfs]
[ 5921.790198] schedule+0x2c/0x80
[ 5921.790210] io_schedule+0x16/0x40
[ 5921.790244] cv_wait_common+0xb2/0x140 [spl]
[ 5921.790250] ? wait_woken+0x80/0x80
[ 5921.790255] __cv_wait_io+0x18/0x20 [spl]
[ 5921.790285] zio_wait+0xf8/0x1b0 [zfs]
[ 5921.790309] dsl_pool_sync+0xb8/0x430 [zfs]
[ 5921.790335] spa_sync+0x43e/0xd80 [zfs]
[ 5921.790363] txg_sync_thread+0x2cd/0x4a0 [zfs]
[ 5921.790389] ? txg_quiesce_thread+0x3d0/0x3d0 [zfs]
[ 5921.790392] thread_generic_wrapper+0x74/0x90 [spl]
[ 5921.790400] kthread+0x121/0x140
[ 5921.790403] ? __thread_exit+0x20/0x20 [spl]
[ 5921.790406] ? kthread_create_worker_on_cpu+0x70/0x70
[ 5921.790408] ret_from_fork+0x35/0x40

... which makes me think something is very wrong :(


Dimitris Aragiorgis

May 8, 2018, 4:21:01 AM
to Phil Regnauld, gan...@googlegroups.com
* Phil Regnauld <regn...@gmail.com> [2018-05-07 23:37:49 +0200]:
Why bother using kvm_extra? For the aio drive option? I think all you need is
to set both disk_type=scsi-hd and scsi_controller_type=virtio-scsi-pci hvparams.
If you don't set scsi_controller_type explicitly, QEMU will see that you
need a SCSI bus and will use the lsi controller by default (I think).

Phil Regnauld

May 8, 2018, 4:45:16 AM
to Dimitris Aragiorgis, gan...@googlegroups.com
Dimitris Aragiorgis (dimara) writes:
>
> Why bother using kvm_extra? For the aio drive option? I think all you need is
> to set both disk_type=scsi-hd and scsi_controller_type=virtio-scsi-pci hvparams.

I did - and just spent 1 hour recovering a partially corrupted ZFS pool
in a customer's VM.

So, now back to virtio until I find an alternative :(


Dimitris Aragiorgis

May 8, 2018, 12:25:35 PM
to Phil Regnauld, gan...@googlegroups.com
* Phil Regnauld <regn...@gmail.com> [2018-05-08 10:45:12 +0200]:
Just for the record:

* Which version of QEMU are you using?
* With what QEMU/Ganeti configuration have you encountered this corruption?
disk_type, scsi_controller_type, kvm_extra?

Phil Regnauld

May 8, 2018, 5:33:29 PM
to Dimitris Aragiorgis, gan...@googlegroups.com
Dimitris Aragiorgis (dimara) writes:
>
> Just for the record:
>
> * Which version of QEMU are you using?
> * With what QEMU/Ganeti configuration have you encountered this corruption?
> disk_type, scsi_controller_type, kvm_extra?

Ok, at this point I'm still trying to figure out if this is a KVM problem
or a CEPH problem - since I've done some burn-in testing on the cluster
(stock Luminous deployment), I'm tempted to think this is a KVM/QEMU caching
snafu (reminds me of https://access.redhat.com/articles/41313).

If I map the RBD image on the node itself, I can't cause any corruption -
creating an FS, running some heavy I/O testing - all good.

I can easily corrupt it as soon as I use it from within an instance.

To answer your questions:

qemu-kvm 2.11+dfsg-1ubuntu7 amd64 QEMU Full virtualization on x86 hardware

This is on 2.16.0-rc2 on Ubuntu 18.04 release, Kernel 4.15.0-13.

RBD backend enabled, standard replication pool (3 copies), heavily tested
before deploying Ganeti.

Observing corruption with paravirtual and scsi-hd, scsi_controller_type set
to either virtio-scsi-pci or default, and kvm_extra not set.

To compare, I'll be testing with libvirt + kvm using the same mapped block
device...



Phil Regnauld

May 8, 2018, 6:10:48 PM
to Dimitris Aragiorgis, gan...@googlegroups.com
Phil Regnauld (regnauld) writes:
>
> To compare, I'll be testing with libvirt + kvm using the same mapped block
> device...

I've been able to reproduce the problem with libvirt/kvm + rbd.
It's now narrowed down to:

virtio, cache=none, aio=native => no issue

virtio, cache=none, aio=threads => corruption.

What does kvm use by default ? Looking at the Ganeti-spawned instances,
aio is not specified - but *all* VMs we had set up for customers in the
past couple of days are experiencing massive corruption.
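For anyone reproducing this outside Ganeti, the two cases map onto a libvirt disk stanza roughly like this (illustrative; the device path is an example):

```xml
<disk type='block' device='disk'>
  <!-- io='native' corresponds to aio=native: no corruption seen here -->
  <driver name='qemu' type='raw' cache='none' io='native'/>
  <source dev='/dev/rbd5'/>
  <target dev='vda' bus='virtio'/>
</disk>
<!-- io='threads' corresponds to aio=threads, the corrupting case -->
```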


Phil Regnauld

May 8, 2018, 6:25:54 PM
to Dimitris Aragiorgis, gan...@googlegroups.com
Phil Regnauld (regnauld) writes:
>
> What does kvm use by default ? Looking at the Ganeti-spawned instances,
> aio is not specified - but *all* VMs we had set up for customers in the
> past couple of days are experiencing massive corruption.

To answer my own question:

gnt-cluster modify -H kvm:disk_aio=native

(default is threads)

No more corruption.

On the other hand, performance is abysmal (30-35 MB/sec writes).

I wonder where the right place to report this bug would be ?

Phil Regnauld

May 8, 2018, 6:30:59 PM
to Dimitris Aragiorgis, gan...@googlegroups.com
Phil Regnauld (regnauld) writes:
>
> On the other hand, performance is abysmal (30-35 MB/sec writes).
>
> I wonder where the right place to report this bug would be ?

Another possible bug - disk_cache=writeback (or any other value,
really) seems to be ignored by Ganeti. Stopping/starting a VM
after setting the value at both cluster & instance level doesn't
make a difference - the command line still shows cache=none:

134788 ? Sl 0:11 qemu-system-x86_64 -enable-kvm -name testvm -m 4096 -smp 1 -pidfile /var/run/ganeti/kvm-hypervisor/pid/testvm -balloon virtio -daemonize -machine pc-i440fx-bionic -monitor unix:/var/run/ganeti/kvm-hypervisor/ctrl/testvm.monitor,server,nowait -serial unix:/var/run/ganeti/kvm-hypervisor/ctrl/testvm.serial,server,nowait -usb -usbdevice tablet -vnc :5100,password -uuid 30c59506-b332-4ecf-aa08-1a0dbfc267b9 -netdev type=tap,id=nic-ffe9b380-e29e-4ced,fd=9 -device virtio-net-pci,bus=pci.0,id=nic-ffe9b380-e29e-4ced,addr=0xe,netdev=nic-ffe9b380-e29e-4ced,mac=aa:00:00:80:47:7c -qmp unix:/var/run/ganeti/kvm-hypervisor/ctrl/testvm.qmp,server,nowait -qmp unix:/var/run/ganeti/kvm-hypervisor/ctrl/testvm.kvmd,server,nowait -boot c -device virtio-blk-pci,bus=pci.0,id=disk-1482a045-cae9-4796,addr=0xc,drive=disk-1482a045-cae9-4796 -drive file=/var/run/ganeti/instance-disks/testvm:0,format=raw,if=none,cache=none,aio=native,id=disk-1482a045-cae9-4796 -device virtio-blk-pci,bus=pci.0,id=disk-b1a88112-2d33-4e01,addr=0xd,drive=disk-b1a88112-2d33-4e01 -drive file=/var/run/ganeti/instance-disks/testvm:1,format=raw,if=none,cache=none,aio=native,id=disk-b1a88112-2d33-4e01 -S

Cheers,
Phil


Phil Regnauld

May 8, 2018, 6:38:04 PM
to Dimitris Aragiorgis, gan...@googlegroups.com
Phil Regnauld (regnauld) writes:
>
> Another possible bug - disk_cache=writeback (or any other value,
> really) seems to be ignored by Ganeti. stop/starting a VM
> after setting the value at both cluster & instance level doesn't
> make a difference - the command line still shows cache=none:

Ah, now that's interesting:

/usr/share/ganeti/2.16/ganeti/hypervisor/hv_kvm/__init__.py

# TODO: make this a hard error, instead of a silent overwrite
logging.warning("KVM: overriding disk_cache setting '%s' with 'none'"
" to prevent shared storage corruption on migration",
disk_cache)


:-|


Ansgar Jazdzewski

May 9, 2018, 2:27:12 AM
to Ganeti, regn...@gmail.com
Hi,

Let me join in - we also use Ganeti with the Ceph backend, so I'm highly
interested in your work and would like to help and test on my side.

- we just made some changes for the librbd path inside qemu
(userspace access), but we can check both ways.
- I recently added the discard option to bdev.py, so if you want me
to add/change some stuff in the bdev we can test it.

I've lost track of your last mails a bit - can you give us a small summary?

Thanks,
Ansgar

Phil Regnauld

May 9, 2018, 3:31:41 AM
to Ansgar Jazdzewski, Ganeti
Ansgar Jazdzewski (a.jazdzewski) writes:
> Hi,
>
> Let me join in, we also use ganeti with the ceph backend so i'am
> heigly interestet in your work and like to help and test on my side.
>
> - we just made some changes to gor the librbd-path inside qemu
> (userspace-access), but we can check both ways.

Do you think userspace access would make a difference ?
And what about native KVM RBD support ?

> - i resently added the discard-option to the bdev.py so if you want me
> to add/change some stuff in the bdev we can test it.

Hmm, not quite sure that would help here.

> i lost a bit the track from your last mails can you give aus a smal summary

Yes sorry, I did spam the list a bit :)

Basically, CEPH/RBD + Ganeti 2.16 rc2 (Ubuntu 18.04) with the disk_aio default
(i.e. threads) leads to corruption of the VM filesystem (in this case,
ZFS). Easy to reproduce, observed in test and customer environments.

Switching to disk_aio=native solves the issue, at the expense of disk write
performance (very low, about 30 MB/sec).

cache= is always "none" (anything else is apparently ignored, as far as I could
tell from reading /usr/share/ganeti/2.16/ganeti/hypervisor/hv_kvm/__init__.py).

Cheers,
Phil

Ansgar Jazdzewski

May 9, 2018, 5:06:22 AM
to Phil Regnauld, Ganeti
2018-05-09 9:31 GMT+02:00 Phil Regnauld <regn...@gmail.com>:
> Ansgar Jazdzewski (a.jazdzewski) writes:
>> Hi,
>>
>> Let me join in - we also use Ganeti with the Ceph backend, so I'm highly
>> interested in your work and would like to help and test on my side.
>>
>> - we just made some changes for the librbd path inside qemu
>> (userspace access), but we can check both ways.
>
> Do you think userspace access would make a difference ?
> And what about native KVM RBD support ?

not 100% sure, but you bypass the kernel page cache for the RBD volume,
so you get rid of one potential cause, and can also test whether it will
work that way.

remember to enable the RBD client cache in ceph.conf

>> - I recently added the discard option to bdev.py, so if you want me
>> to add/change some stuff in the bdev we can test it.
>
> Hmm, not quite sure that would help here.

this was just me saying that I can add some options to Ganeti so it
makes use of the KVM settings you'd like to have added

Phil Regnauld

May 9, 2018, 5:25:30 AM
to Ansgar Jazdzewski, Ganeti
Ansgar Jazdzewski (a.jazdzewski) writes:
>
> not 100% sure, but you bypass the kernel page cache for the RBD volume,
> so you get rid of one potential cause, and can also test whether it will
> work that way.
>
> remember to enable the RBD client cache in ceph.conf

Will check - will it help writes ?

Also, how do you use CEPH ? With userspace access ? I saw there are
some discussions about it not being fully implemented. Or does it work ?
And if so, what is the proper way to enable it ?


> > Hmm, not quite sure that would help here.
>
> this was just me saying that I can add some options to Ganeti so it
> makes use of the KVM settings you'd like to have added

One that doesn't corrupt data :)

Joke aside, I'll think about it - but I'm curious as to why caching
methods other than "none" are disabled on shared storage (cf. mentions
of risk of data loss during live migration).

Cheers,
Phil

Ansgar Jazdzewski

May 9, 2018, 5:40:05 AM
to Phil Regnauld, Ganeti
2018-05-09 11:25 GMT+02:00 Phil Regnauld <regn...@gmail.com>:
> Ansgar Jazdzewski (a.jazdzewski) writes:
>>
>> not 100% sure, but you bypass the kernel page cache for the RBD volume,
>> so you get rid of one potential cause, and can also test whether it will
>> work that way.
>>
>> remember to enable the RBD client cache in ceph.conf
>
> Will check - will it help writes ?
>
> Also, how do you use CEPH ? With userspace access ? I saw there are
> some discussions about it not being fully implemented. Or does it work ?
> And if so, what is the proper way to enable it ?

all you basically have to do is:

gnt-cluster modify -D rbd:access=userspace

ceph.conf
...
[client]
rbd cache = true
rbd cache writethrough until flush = true
...

https://ceph.com/geen-categorie/ceph-validate-that-the-rbd-cache-is-active/
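As a quick sanity check that the [client] section parses the way you expect, ceph.conf is INI-style and can be read with a standard parser (a sketch, not an official Ceph tool):

```python
import configparser

def rbd_cache_enabled(conf_text: str) -> bool:
    """Return True if ceph.conf's [client] section turns the RBD cache on."""
    cp = configparser.ConfigParser()
    cp.read_string(conf_text)
    # ceph.conf keys may contain spaces; fallback covers a missing section/key
    return cp.getboolean("client", "rbd cache", fallback=False)

conf = """
[client]
rbd cache = true
rbd cache writethrough until flush = true
"""
print(rbd_cache_enabled(conf))  # True
```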

>
>> > Hmm, not quite sure that would help here.
>>
>> this was just me saying that I can add some options to Ganeti so it
>> makes use of the KVM settings you'd like to have added
>
> One that doesn't corrupt data :)
>
> Joke aside, I'll think about it - but I'm curious as to why caching
> methods other than "none" are disabled on shared storage (cf. mentions
> of risk of data loss during live migration).

I "patched" the Python code in our installation to allow cache modes with
rbd - as far as I know, most KVM management systems use writeback as the
default

diff --git a/lib/hypervisor/hv_kvm/__init__.py b/lib/hypervisor/hv_kvm/__init__.py
index 8d8ad6f..3a615d7 100644
--- a/lib/hypervisor/hv_kvm/__init__.py
+++ b/lib/hypervisor/hv_kvm/__init__.py
@@ -914,9 +914,9 @@ class KVMHypervisor(hv_base.BaseHypervisor):
     if cpus == constants.CPU_PINNING_OFF:
       # we checked this at import time
       # pylint: disable=E1101
-      target_process.set_cpu_affinity(range(psutil.cpu_count()))
+      target_process.cpu_affinity(range(psutil.cpu_count()))
     else:
-      target_process.set_cpu_affinity(cpus)
+      target_process.cpu_affinity(cpus)
 
   @classmethod
   def _AssignCpuAffinity(cls, cpu_mask, process_id, thread_dict):
@@ -1123,14 +1123,7 @@ class KVMHypervisor(hv_base.BaseHypervisor):
     # Cache mode
     disk_cache = up_hvp[constants.HV_DISK_CACHE]
     for cfdev, link_name, uri in kvm_disks:
-      if cfdev.dev_type in constants.DTS_EXT_MIRROR:
-        if disk_cache != "none":
-          # TODO: make this a hard error, instead of a silent overwrite
-          logging.warning("KVM: overriding disk_cache setting '%s' with 'none'"
-                          " to prevent shared storage corruption on migration",
-                          disk_cache)
-        cache_val = ",cache=none"
-      elif disk_cache != constants.HT_CACHE_DEFAULT:
+      if disk_cache != constants.HT_CACHE_DEFAULT:
         cache_val = ",cache=%s" % disk_cache
       else:
         cache_val = ""

> Cheers,
> Phil


Have a nice one,
Ansgar

Phil Regnauld

May 9, 2018, 5:55:57 AM
to Ansgar Jazdzewski, Ganeti
Ansgar Jazdzewski (a.jazdzewski) writes:
>
> all you basically have to do is:
>
> gnt-cluster modify -D rbd:access=userspace

Ack, testing this now - found out by reading one of your older posts :)
https://groups.google.com/forum/#!topic/ganeti/kApy2vclbmg

> ceph.conf
> ...
> [client]
> rbd cache = true
> rbd cache writethrough until flush = true

I believe it's the default now, but it won't hurt.

> https://ceph.com/geen-categorie/ceph-validate-that-the-rbd-cache-is-active/

Found that link as well. Will test.
Thanks, will give it a spin!

Phil Regnauld

May 9, 2018, 6:38:34 AM
to Ansgar Jazdzewski, Ganeti
Ansgar Jazdzewski (a.jazdzewski) writes:
>
> all you basically have to do is:
>
> gnt-cluster modify -D rbd:access=userspace
>
> ceph.conf
> ...
> [client]
> rbd cache = true
> rbd cache writethrough until flush = true

I'm happy to report that I am no longer seeing corruption on the VMs after
switching to userspace mode!

I didn't touch cache settings at the RBD level, but did enable cache=writeback
as per your patch to __init__.py.

Last question: how do I modify existing VMs to use userspace for already
created disks ? Is it at all possible ? I see either:

gnt-cluster modify -D rbd:access=userspace
or
gnt-instance add -t rbd --disk 0:access=userspace

... but no option to *modify* existing disks.

PS: This still leaves a nasty bug when using default settings (RBD via mapped
block device) - but not sure if this

Cheers,
Phil

Phil Regnauld

May 9, 2018, 8:35:43 AM
to Ansgar Jazdzewski, Ganeti
Ansgar Jazdzewski (a.jazdzewski) writes:
> - i resently added the discard-option to the bdev.py so if you want me
> to add/change some stuff in the bdev we can test it.

BTW, it could be useful to be able to run fstrim and reclaim
unused space from the CEPH cluster - do you have that patch
somewhere ?

Ansgar Jazdzewski

May 9, 2018, 9:02:58 AM
to Phil Regnauld, Ganeti
Hi,

the patch can be found at
https://groups.google.com/forum/#!msg/ganeti/I1QNr0mBKEE/qsF63n8mCAAJ
Because of interface changes it is not included in Ganeti 2.16,
so you have to patch and build Ganeti to enable it.

But feel free to do so,
Ansgar