Are hugepages by default a good idea for compute instance types?


Chandler Wilkerson

Sep 18, 2023, 8:56:23 AM
to kubevirt-dev
Using a compute or memory instance type requires a node with huge pages support, which is not a default setting, so out of the box, compute and memory instance types are unusable.

When trying to create a VM with one of these instance types, it fails with an unhelpful error that points to scheduling, but does not mention huge pages as the reason.

I propose we add better messaging to call out the lack of huge pages when scheduling is prevented for this reason.

I would like to see a discussion here about whether to remove huge pages from the compute and memory instance types (by default) and find another way to introduce to admins the ability to require huge pages in an instance type.

Huge pages make sense across the board for VM handling nodes, but fine tuning the number of huge pages per compute node is a per-cluster exercise, and IMO fits better in a performance and tuning guide than a default.
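For reference, roughly what that per-node exercise looks like (a sketch assuming 2Mi pages and a plain kubelet; the exact mechanism differs per distribution, e.g. kernel args or a MachineConfig on OpenShift):

  # pre-allocate 1024 x 2Mi huge pages on the node (not persistent; use sysctl.d or kernel args for that)
  echo 1024 | sudo tee /proc/sys/vm/nr_hugepages

  # the kubelet only picks up pre-allocated huge pages at startup, so restart it
  sudo systemctl restart kubelet

  # the node should now advertise the resource the instance types request
  kubectl describe node <node> | grep hugepages-2Mi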

I have opened [1] to discuss.

1. https://github.com/kubevirt/common-instancetypes/issues/105

--
Chandler Wilkerson, RHCE
Sr. Software Engineer

Red Hat

Fabian Deutsch

Sep 18, 2023, 9:14:07 AM
to Chandler Wilkerson, Lee Yarwood, kubevirt-dev
Chandler, hi!

Adding @Lee Yarwood 

On Mon, Sep 18, 2023 at 2:56 PM Chandler Wilkerson <cwil...@redhat.com> wrote:
Using a compute or memory instance type requires a node with huge pages support, which is not a default setting, so out of the box, compute and memory instance types are unusable.

When trying to create a VM with one of these instance types, it fails with an unhelpful error that points to scheduling, but does not mention huge pages as the reason.

Please share the exact message - here and in [1].
 

I propose we add better messaging to call out the lack of huge pages when scheduling is prevented for this reason.

All errors need to be bubbled up to the user properly. If they are not, then this is a bug.
 

I would like to see a discussion here about whether to remove huge pages from the compute and memory instance types (by default) and find another way to introduce to admins the ability to require huge pages in an instance type.

Can you please specify which instanceTypes you are referring to?

To me, hugepages should be limited to memory intensive, compute exclusive, and network (coming up).
 

Huge pages make sense across the board for VM handling nodes, but fine tuning the number of huge pages per compute node is a per-cluster exercise, and IMO fits better in a performance and tuning guide than a default.

In general, when there are no specific needs, the U series is the right one to take.
The goal of instance types is to reduce the tuning needed from users to practically zero.
 

I have opened [1] to discuss

--
Chandler Wilkerson, RHCE
Sr. Software Engineer

Red Hat


Chandler Wilkerson

Sep 18, 2023, 11:04:57 AM
to Fabian Deutsch, Lee Yarwood, kubevirt-dev
On Mon, Sep 18, 2023 at 8:14 AM Fabian Deutsch <fdeu...@redhat.com> wrote:
Chandler, hi!

Adding @Lee Yarwood 

On Mon, Sep 18, 2023 at 2:56 PM Chandler Wilkerson <cwil...@redhat.com> wrote:
Using a compute or memory instance type requires a node with huge pages support, which is not a default setting, so out of the box, compute and memory instance types are unusable.

When trying to create a VM with one of these instance types, it fails with an unhelpful error that points to scheduling, but does not mention huge pages as the reason.

Please share the exact message - here and in [1].
 

Here's the describe Status: output:
 
Status:
  Active Pods:
    bd8dc179-cd57-4e75-a589-40c1c02e770a:  
  Conditions:
    Last Probe Time:       2023-09-18T14:42:43Z
    Last Transition Time:  2023-09-18T14:42:43Z
    Message:               Guest VM is not reported as running
    Reason:                GuestNotRunning
    Status:                False
    Type:                  Ready
    Last Probe Time:       <nil>
    Last Transition Time:  2023-09-18T14:42:43Z
    Message:               0/6 nodes are available: 1 node(s) were unschedulable, 2 Insufficient hugepages-2Mi, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/6 nodes are available: 2 No preemption victims found for incoming pod, 4 Preemption is not helpful for scheduling..
    Reason:                Unschedulable
    Status:                False
    Type:                  PodScheduled
  Guest OS Info:
  Phase:  Scheduling
  Phase Transition Timestamps:
    Phase:                        Pending
    Phase Transition Timestamp:   2023-09-18T14:42:43Z
    Phase:                        Scheduling
    Phase Transition Timestamp:   2023-09-18T14:42:43Z
  Qos Class:                      Burstable
  Runtime User:                   107
  Virtual Machine Revision Name:  revision-start-vm-b0215b66-8379-456d-a4c0-ee318b3d0307-2
Events:
  Type    Reason            Age   From                       Message
  ----    ------            ----  ----                       -------
  Normal  SuccessfulCreate  2m    virtualmachine-controller  Created virtual machine pod virt-launcher-imaginative-marmoset-b8mgv


I propose we add better messaging to call out the lack of huge pages when scheduling is prevented for this reason.

All errors need to be bubbled up to the user properly. If they are not, then this is a bug.

I recognize this may well be a deeper issue with the K8s Pod messaging too:
 
Events:
  Type     Reason            Age    From               Message
  ----     ------            ----   ----               -------
  Warning  FailedScheduling  3m20s  default-scheduler  0/6 nodes are available: 1 node(s) were unschedulable, 2 Insufficient hugepages-2Mi, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/6 nodes are available: 2 No preemption victims found for incoming pod, 4 Preemption is not helpful for scheduling..
  Warning  FailedScheduling  2m14s  default-scheduler  0/6 nodes are available: 1 node(s) were unschedulable, 2 Insufficient hugepages-2Mi, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/6 nodes are available: 2 No preemption victims found for incoming pod, 4 Preemption is not helpful for scheduling..

The Pod's describe output does mention the huge pages under Limits and Requests, but nothing flags them as the requirement that is not being met:

    Limits:
      devices.kubevirt.io/kvm:        1
      devices.kubevirt.io/tun:        1
      devices.kubevirt.io/vhost-net:  1
      hugepages-2Mi:                  16Gi
    Requests:
      cpu:                            200m
      devices.kubevirt.io/kvm:        1
      devices.kubevirt.io/tun:        1
      devices.kubevirt.io/vhost-net:  1
      ephemeral-storage:              50M
      hugepages-2Mi:                  16Gi
      memory:                         295698433
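
As an aside, the gap is easy to see by comparing that request with what the nodes actually advertise, e.g.:

  kubectl describe nodes | grep -E '^Name:|hugepages-2Mi'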
 

I would like to see a discussion here about whether to remove huge pages from the compute and memory instance types (by default) and find another way to introduce to admins the ability to require huge pages in an instance type.

Can you please specify which instanceTypes you are referring to?

To me, hugepages should be limited to memory intensive, compute exclusive, and network (coming up).
 

I'm referring to the cx and m types specifically; both require 2Mi hugepages.

For that matter, the cx types require dedicated CPU placement, which requires cpumanager support, exposed in the virt-launcher Pod as a nodeSelector:

  nodeSelector:
    cpumanager: "true"
    kubevirt.io/schedulable: "true"
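
A quick way to check whether any workers actually carry those labels:

  kubectl get nodes -l cpumanager=true,kubevirt.io/schedulable=true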
 

Huge pages make sense across the board for VM handling nodes, but fine tuning the number of huge pages per compute node is a per-cluster exercise, and IMO fits better in a performance and tuning guide than a default.

In general, when there are no specific needs, the U series is the right one to take.
The goal of instance types is to reduce the tuning needed from users to practically zero.

I agree with that goal; would it help to add guidance to the instancetype.kubevirt.io/description annotation for the instance type itself?
Current for a CX is defined here [2]
In part:
       The exclusive resources are given to the compute threads of the
      VM. In order to ensure this, some additional cores (depending
      on the number of disks and NICs) will be requested to offload
      the IO threading from cores dedicated to the workload.
      In addition, in this series, the NUMA topology of the used
      cores is provided to the VM.
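
Something along these lines, purely as a sketch (assuming the v1beta1 API; the user-guide link is a placeholder, not an existing page):

  apiVersion: instancetype.kubevirt.io/v1beta1
  kind: VirtualMachineClusterInstancetype
  metadata:
    name: cx1.medium
    annotations:
      instancetype.kubevirt.io/description: |-
        The CX Series provides exclusive compute resources for compute
        intensive applications.
        Requires nodes with pre-allocated 2Mi huge pages and the kubelet
        CPU manager static policy; see <user-guide URL> for cluster setup.
  # spec: cpu/memory/hugepages left exactly as currently defined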

Perhaps a link to a page in the user-guide explaining how to adjust a cluster node to support the required features for each instance type that needs them?

2. https://github.com/kubevirt/common-instancetypes/blob/main/instancetypes/cx/1/cx1.yaml
 
--
Chandler Wilkerson, RHCE
Sr. Software Engineer

Red Hat


Lee Yarwood

Sep 18, 2023, 3:28:21 PM
to Chandler Wilkerson, Fabian Deutsch, kubevirt-dev
Thanks Chandler, Fabian, comments in-line below.

On Mon, 18 Sept 2023 at 16:05, Chandler Wilkerson <cwil...@redhat.com> wrote:
> On Mon, Sep 18, 2023 at 8:14 AM Fabian Deutsch <fdeu...@redhat.com> wrote:
>>
>> Chandler, hi!
>>
>> Adding @Lee Yarwood

:) thanks!

>> On Mon, Sep 18, 2023 at 2:56 PM Chandler Wilkerson <cwil...@redhat.com> wrote:
>>>
>>> Using a compute or memory instance type requires a node with huge pages support, which is not a default setting, so out of the box, compute and memory instance types are unusable.

I tend to agree for compute but I think it's fine for the memory
intensive class.

>>> When trying to create a VM with one of these instance types, it fails with an unhelpful error that points to scheduling, but does not mention huge pages as the reason.

Insufficient hugepages-2Mi is listed in the message below; I don't
think there's much more we could do here tbh, as we don't want to
wrap the VMI scheduling process with awareness of instance types
and preferences when the reason is already documented pretty well.

>> Please share the exact message - here and in [1].
>>
> Here's the describe Status: output:
>
> Status:
> Active Pods:
> bd8dc179-cd57-4e75-a589-40c1c02e770a:
> Conditions:
> Last Probe Time: 2023-09-18T14:42:43Z
> Last Transition Time: 2023-09-18T14:42:43Z
> Message: Guest VM is not reported as running
> Reason: GuestNotRunning
> Status: False
> Type: Ready
> Last Probe Time: <nil>
> Last Transition Time: 2023-09-18T14:42:43Z
> Message: 0/6 nodes are available: 1 node(s) were unschedulable, 2 Insufficient hugepages-2Mi, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/6 nodes are available: 2 No preemption victims found for incoming pod, 4 Preemption is not helpful for scheduling..

^ `2 Insufficient hugepages-2Mi`
I honestly think Insufficient hugepages-2Mi is pretty clear here.

>>> I would like to see a discussion here about whether to remove huge pages from the compute and memory instance types (by default) and find another way to introduce to admins the ability to require huge pages in an instance type.
>>
>>
>> Can you please specify which instanceTypes you are referring to?
>>
>> To me, hugepages should be limited to memory intensive, compute exclusive, and network (coming up).

What's the justification for compute exclusive again?

> I'm referring to the cx and m types specifically; both require 2Mi hugepages.
>
> For that matter, the cx types require dedicated CPU placement, which requires cpumanager support, exposed in the virt-launcher Pod as a nodeSelector:
>
> nodeSelector:
> cpumanager: "true"
> kubevirt.io/schedulable: "true"
>
>>> Huge pages make sense across the board for VM handling nodes, but fine tuning the number of huge pages per compute node is a per-cluster exercise, and IMO fits better in a performance and tuning guide than a default.
>>
>> In general, when there are no specific needs, the U series is the right one to take.
>> The goal of instance types is to reduce the tuning needed from users to practically zero.
>
> I agree with that goal; would it help to add guidance to the instancetype.kubevirt.io/description annotation for the instance type itself?
> Current for a CX is defined here [2]
> In part:
> The exclusive resources are given to the compute threads of the
> VM. In order to ensure this, some additional cores (depending
> on the number of disks and NICs) will be requested to offload
> the IO threading from cores dedicated to the workload.
> In addition, in this series, the NUMA topology of the used
> cores is provided to the VM.
>
> Perhaps a link to a page in user-guide explaining how to adjust a cluster node to support the required features for each instance type that requires them?

ACK to links to documentation; we also expose the requirements as
labels now, FWIW:

https://blog.yarwood.me.uk/2023/06/22/kubevirt_instancetype_update_5/#resource-labels

>>>
>>> I have opened [1] to discuss
>>>
>>> 1. https://github.com/kubevirt/common-instancetypes/issues/105

Thanks, I'll document my thoughts there as well.

> 2. https://github.com/kubevirt/common-instancetypes/blob/main/instancetypes/cx/1/cx1.yaml

Cheers,

Lee

Chandler Wilkerson

Sep 18, 2023, 6:04:28 PM
to Lee Yarwood, Fabian Deutsch, kubevirt-dev
Oops, I definitely scanned that too quickly and missed it. You're right, of course. Now if we could get that info into the VMI, we'd be good.

Fabian Deutsch

Sep 22, 2023, 8:10:04 AM
to Chandler Wilkerson, Lee Yarwood, kubevirt-dev
We have it:

status:
  conditions:
    - lastProbeTime: '2023-09-22T12:08:51Z'
      lastTransitionTime: '2023-09-22T12:08:51Z'
      message: Guest VM is not reported as running
      reason: GuestNotRunning
      status: 'False'
      type: Ready
    - lastProbeTime: null
      lastTransitionTime: '2023-09-22T12:08:51Z'
      message: >-
        0/24 nodes are available: 1 node(s) had untolerated taint {dedicated:
        nfv-qe}, 1 node(s) were unschedulable, 2 node(s) had untolerated taint
        {dedicated: realtime}, 20 Insufficient hugepages-2Mi. preemption: 0/24
        nodes are available: 20 No preemption victims found for incoming pod, 4
        Preemption is not helpful for scheduling..
      reason: Unschedulable
      status: 'False'
      type: PodScheduled
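
For anyone who wants to pull that out programmatically, something like this should work against the VMI:

  kubectl get vmi <name> -o jsonpath='{.status.conditions[?(@.type=="PodScheduled")].message}'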

Chandler Wilkerson

Sep 22, 2023, 10:32:48 AM
to Fabian Deutsch, Lee Yarwood, kubevirt-dev
I went back and double-checked, and I found the point where I got careless.
Thus far, I have been representing this as an issue with huge pages, but that's incorrect. As demonstrated, a lack of huge pages is effectively reported all the way back to the VM, and at all levels underneath.

My issue was originally with the compute profile, which requires both huge pages and the cpu manager node selector. When you don't have cpu-manager-capable nodes, nothing reports that explicitly, and (I am guessing here) the scheduler stops before it ever evaluates the huge pages requirement.

POD:
  Warning  FailedScheduling  5m    default-scheduler  0/6 nodes are available: 3 node(s) didn't match Pod's node affinity/selector, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling..

Both the VMI and the VM include the above message in their Status.conditions[type=Ready] object.
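
For completeness, if I understand the node labelling correctly, cpumanager=true only gets set on nodes where the kubelet runs the static CPU manager policy, which (distro-specific mechanics aside) boils down to something like this in the KubeletConfiguration:

  apiVersion: kubelet.config.k8s.io/v1beta1
  kind: KubeletConfiguration
  cpuManagerPolicy: static
  # the static policy also needs a non-zero CPU reservation, e.g.:
  kubeReserved:
    cpu: "500m"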

Chandler Wilkerson

Sep 22, 2023, 12:05:56 PM
to Fabian Deutsch, Lee Yarwood, kubevirt-dev
Now that I am going over the warning again, I see that 3 nodes did not match the Pod's node affinity/selector, which does point the admin at the missing cpumanager node attribute. The message is still truncated, and the later part might clarify more about the huge pages, but the first issue that needs fixing is exposed by the message after all.

Apologies for dragging this out, and thanks all for your patience!