Hugepages support


aluk...@redhat.com

unread,
Jun 4, 2018, 11:35:26 AM6/4/18
to kubevirt-dev
Hi guys,

I am currently working on hugepages support for VMs. The PR [1] is pretty straightforward, but before continuing I want to hear some opinions about the VM API and validation aspects.


How should the hugepages VM API look?
I thought about using the resources section to specify the number of hugepages needed, similar to the Kubernetes approach [2], but:
  • unlike pods, a VM cannot use both regular memory and hugepages, so specifying both memory and hugepages resources looks too complicated to me
  • the hugepages resource does not support over-commitment: you can specify both the limit and the request, but you do not need to (if you specify only the limit, k8s will add the request automatically, but if you specify only the request, k8s will complain that the limit is missing)
So in my opinion the resources section does not suit VM hugepages well; I prefer a new field that specifies the hugepages size (2Mi, 1Gi) alongside the memory request.


What do we want to do when the VM memory request is not divisible by the hugepage size?
  • create the VM with a number of hugepages equal to ceil(memory request / hugepage size); for example, for a 21Mi request we would set 11 2Mi hugepages on the virt-launcher pod
  • just show a validation error, without creating the VM
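The first option above is just a ceiling division; a quick sketch (the function name is illustrative, not part of any API):

```python
import math

def hugepage_count(memory_request: int, hugepage_size: int) -> int:
    """Round a VM memory request up to a whole number of hugepages:
    ceil(memory_request / hugepage_size)."""
    return math.ceil(memory_request / hugepage_size)

Mi = 1 << 20
# A 21Mi request with 2Mi pages needs 11 pages (22Mi of backing memory).
print(hugepage_count(21 * Mi, 2 * Mi))  # -> 11
```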


Marcus Sorensen

unread,
Jun 4, 2018, 1:01:06 PM6/4/18
to kubevirt-dev
Just a side note that there is currently an open issue with regard to how resource limits and requests are interpreted. I could see the outcome of that possibly having an effect on how this is implemented.

aluk...@redhat.com

unread,
Jun 5, 2018, 2:58:39 AM6/5/18
to kubevirt-dev
Thanks, Marcus, you are right; it is another problem introduced by the use of k8s resources.

aluk...@redhat.com

unread,
Jun 5, 2018, 3:58:49 AM6/5/18
to kubevirt-dev
I want to add more context about how things are done in Kubernetes, oVirt, and OpenStack.

Kubernetes
  • you can request both regular memory and hugepages; if you want to use hugepages, you then use the hugepages mount for memory allocation
  • k8s does not allow allocating more hugepages than floor(hugepages_memory_size / hugepages_size); for example, when you specify "hugepages-2Mi: 101Mi" on a pod, a process in the pod cannot use more than 50 2Mi hugepages
oVirt
  • to configure hugepages, you need to specify the VM memory and the size of the hugepages (2Mi, 1Gi)
  • when the VM memory is not divisible by the hugepages size, it will ceil the number of hugepages (a user who requested 1.5Gi of memory with hugepages of size 1Gi gets a VM with 2 hugepages); the oVirt scheduler checks whether some host has 2 hugepages of size 1Gi; if yes, it starts the VM, if not, it rejects the start and shows a validation error
I am less familiar with OpenStack, so please correct me if I am wrong; I took all the information from [1]
  • you need to specify the hugepages size at the flavor level; OpenStack has more options (large|any|2MB|1GB)
  • if the flavor memory size is not a multiple of the specified huge page size, this is considered an error and causes the instance to fail to boot
[1] https://specs.openstack.org/openstack/nova-specs/specs/kilo/implemented/virt-driver-large-pages.html
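The Kubernetes floor behavior described above can be sketched as follows (an illustrative helper, not actual k8s code):

```python
def max_usable_hugepages(hugepages_limit: int, hugepage_size: int) -> int:
    """Kubernetes grants at most floor(limit / page size) whole pages;
    any partial page at the end is unusable."""
    return hugepages_limit // hugepage_size  # integer division == floor

Mi = 1 << 20
# "hugepages-2Mi: 101Mi" -> at most 50 usable 2Mi pages (1Mi is wasted).
print(max_usable_hugepages(101 * Mi, 2 * Mi))  # -> 50
```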

Fabian Deutsch

unread,
Jun 6, 2018, 8:59:05 AM6/6/18
to Artyom Lukianov, kubevirt-dev

Hey Artyom,

thanks for providing this additional context.

As you nicely noted, for the VM guest either memory or hugepages can be allocated. However - and I know we spoke about it, just noting it - qemu, libvirt, etc. still need memory, so if hugepages are requested, we also need to request memory (the overhead amount).

WRT API.
I'd probably go with a flag on the domain spec:

kind: VM
spec:
  domain:
    memory:
      hugepages:
        size: 2M

This would actually be analogous to the CPU configuration, where we separately define the vCPU setup (.spec.domain.cpu) and, on the other hand, request the CPU shares (.spec.domain.resources).
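Putting that field together with the existing resources section, a full spec might look like the following. This is a sketch only: the hugepages field name follows the proposal in this thread and is not a settled API.

```yaml
# Sketch: hugepages size on the domain, guest memory via resources.
# Field names follow the proposal in this thread, not a final API.
kind: VM
spec:
  domain:
    memory:
      hugepages:
        size: 2M
    resources:
      requests:
        memory: 1Gi   # guest memory, to be backed by 2M hugepages
```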

There will be more options in the hugepages struct if we look at
https://libvirt.org/formatdomain.html#elementsMemoryBacking

OTOH I wonder if we need to expose them 1:1 or aggregate them into certain abstractions.

- fabian


Steve Gordon

unread,
Jun 6, 2018, 9:22:19 AM6/6/18
to Fabian Deutsch, Artyom Lukianov, kubevirt-dev
How closely are we aiming to estimate the overhead? My understanding is that this has traditionally been challenging, and that other offerings in the virtualization space using this stack have used a fixed value per host (which is imperfect, since the amount required by QEMU can fluctuate based on the number of devices etc. of the guest itself).

Thanks,

Steve
 




--
Stephen Gordon,
Principal Product Manager,
Red Hat

Fabian Deutsch

unread,
Jun 6, 2018, 10:09:52 AM6/6/18
to Steve Gordon, Artyom Lukianov, kubevirt-dev
For now we have settled on a function derived from how oVirt calculates the overhead.

For KubeVirt we do need a per-VM overhead (regardless of where this value comes from), since we need it to express the pod memory requirements.
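As an illustration only - the constants below are invented, and KubeVirt's actual function is derived from oVirt's calculation - the shape of such a per-VM overhead estimate is roughly:

```python
Mi = 1 << 20

def pod_memory_request(guest_memory: int, vcpus: int) -> int:
    """Sketch of a per-VM overhead model: the virt-launcher pod requests
    the guest memory plus a fixed base for qemu/libvirt and a small
    per-vCPU term. All constants here are hypothetical, for illustration
    only; they are not KubeVirt's real values."""
    base_overhead = 100 * Mi      # hypothetical fixed qemu/libvirt cost
    per_vcpu_overhead = 8 * Mi    # hypothetical per-vCPU cost
    return guest_memory + base_overhead + vcpus * per_vcpu_overhead

# e.g. a 64Mi guest with 1 vCPU would yield a 172Mi pod request here
print(pod_memory_request(64 * Mi, 1) // Mi)  # -> 172
```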

- fabian

Marcus Sorensen

unread,
Jun 7, 2018, 2:23:04 PM6/7/18
to kubevirt-dev
As an aside, I have personally run into a few instances where virt-launcher pods were killed for memory, though I was troubleshooting something else, so I opted to increase the limit vs. the request to get past it tactically. I've been meaning to try to reproduce it, but the current overhead estimation may not be sufficient. I want to say that generally it was a 4G-request Windows VM (with all the recommended acpi/apic/timer stuff) and two PVCs.

aluk...@redhat.com

unread,
Jun 7, 2018, 3:02:59 PM6/7/18
to kubevirt-dev
Hi Marcus, thanks for the information. I think we can open another topic related to the memory overhead, and if you encounter the problem with Windows machines again, can you please open an issue?

Fabian Deutsch

unread,
Jun 7, 2018, 4:12:48 PM6/7/18
to Artyom Lukianov, kubevirt-dev
Yes, please do so.

This might actually be a good time to think about adding custom metrics for several parts of the picture.

- fabian


Itamar Heim

unread,
Jun 7, 2018, 5:21:51 PM6/7/18
to Fabian Deutsch, Artyom Lukianov, kubevirt-dev


Would libvirt and qemu consume hugepages for themselves?



Michal Skrivanek

unread,
Jun 8, 2018, 5:14:57 AM6/8/18
to RH - Itamar Heim, Fabian Deutsch, Artyom Lukianov, kubevirt-dev

Ovirt
  • to configure hugepages, you need to specify the VM memory and the size of hugepages(2Mi, 1Gi)

sizes depend on host capability (arch)

Would libvirt and qemu consume hugepages for themselves?

AFAIK no, it’s just for the guest address space

Fabian Deutsch

unread,
Jun 11, 2018, 7:57:47 AM6/11/18
to Artyom Lukianov, kubevirt-dev
Hey,

in a quick test I measured the allocated VM memory against the actual requirements of the pod:

This is from a 64MB VM:

NAMESPACE     NAME                                    CPU(cores)   MEMORY(bytes)  
default       virt-launcher-testvm-tee.local-xljks    162m         164Mi    

So roughly 100Mi of overhead (164Mi pod usage for a 64MB VM). This includes _everything_ in the _pod_.

Needs a few more trials to see how this behaves in relation to different VM configurations.

- fabian



Fabian Deutsch

unread,
Jun 11, 2018, 7:58:06 AM6/11/18
to Artyom Lukianov, kubevirt-dev
Hey Artyom,

has been a few days - what is your current approach, and how does the API look now?

- fabian


aluk...@redhat.com

unread,
Jun 11, 2018, 8:00:28 AM6/11/18
to kubevirt-dev
Hey,
  • I will use an API similar to the one that you suggested
  • I will use strict validation on the VM memory request and the hugepages size
The PR is already updated, and I am waiting for additional reviews :)
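The strict-validation choice can be sketched as follows (a hypothetical helper, not the actual webhook code):

```python
def validate_hugepages(memory_request: int, hugepage_size: int) -> None:
    """Strict validation: reject a VM whose memory request is not an
    exact multiple of the hugepage size, instead of rounding up."""
    if memory_request % hugepage_size != 0:
        raise ValueError(
            "memory request must be a multiple of the hugepage size")

Mi = 1 << 20
validate_hugepages(22 * Mi, 2 * Mi)   # ok: exactly 11 pages
# validate_hugepages(21 * Mi, 2 * Mi) would raise ValueError
```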


Fabian Deutsch

unread,
Jun 11, 2018, 8:05:53 AM6/11/18
to Artyom Lukianov, kubevirt-dev
Cool.

Could you also create a user-guide update in order to give a high-level view of the API?

- fabian


Artyom Lukianov

unread,
Jun 11, 2018, 8:14:35 AM6/11/18
to Fabian Deutsch, kubevirt-dev

Fabian Deutsch

unread,
Jun 11, 2018, 8:31:17 AM6/11/18
to Artyom Lukianov, kubevirt-dev
Thanks!

I missed that one.

- fabian

Fabian Deutsch

unread,
Jun 11, 2018, 8:43:09 AM6/11/18
to Artyom Lukianov, kubevirt-dev
Mainly to move hugepages into a memory struct in order to provide some more context.

- fabian
