EgressQOS

Girish Moodalbail

Oct 15, 2022, 3:50:39 PM

to ovn-kubernetes, obra...@redhat.com, Tim Rozet, Dumitru Ceara, Numan Siddique

Hello All,

The OVN QoS rules for OVN-K8s’s EgressQoS resource should be using `from-lport` instead of `to-lport`, shouldn’t they? See also: https://ovn-org.slack.com/archives/C010SQ5FSNL/p1663276948752959

With the `to-lport` direction, the DSCP marking is applied in the egress pipeline of the logical switch. Say we want all the packets from PodA marked with DSCP 28.

This causes the following two issues:

1. Pods on the same logical switch on the same node:

PodA on NodeA ------------ PodB on NodeA

The packets are marked with DSCP 28 only on ingress to PodB, which is too late. This is similar to the Day-0 CVE we had with egress K8s NetworkPolicy rules, which used `to-lport`; we later fixed that CVE by changing the egress rules to use `from-lport`.

2. Pods on secondary OVN networks with a single Layer-2 logical switch (flat network):

PodA on NodeA ------- Layer-2 overlay ----- PodB on NodeB

Packets from PodA will traverse the overlay network with TOS/DSCP set to 0. They will arrive on br-int on NodeB, and only then are they marked with DSCP 28 and sent to PodB.
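
To make the two cases concrete, here is a toy model (not OVN code). It assumes only what is described above: the switch's ingress pipeline, where `from-lport` QoS applies, runs on the sender's node, and the egress pipeline, where `to-lport` QoS applies, runs on the receiver's node. The function name `mark_points` and its return keys are made up for illustration.

```python
def mark_points(direction: str, dscp: int, cross_node: bool) -> dict:
    """Return the DSCP seen on the overlay (if any) and at delivery.

    Toy model of a single logical switch; the packet starts with DSCP 0.
    """
    if direction not in ("from-lport", "to-lport"):
        raise ValueError(f"unknown direction: {direction}")

    current = 0

    # Ingress pipeline: executed on the sender's node.
    if direction == "from-lport":
        current = dscp

    # The packet only crosses the overlay when the pods are on different nodes.
    on_wire = current if cross_node else None

    # Egress pipeline: executed on the receiver's node.
    if direction == "to-lport":
        current = dscp

    return {"on_wire": on_wire, "delivered": current}


if __name__ == "__main__":
    for direction in ("to-lport", "from-lport"):
        print(direction, mark_points(direction, 28, cross_node=True))
```

In this model the delivered value is 28 either way; the difference is what the overlay sees between the two nodes.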

I understand that we chose `to-lport` because with `from-lport` the DSCP is marked in the pre-LB stage, so the rule can’t be used to select k8s endpoints. Is that correct? In that case, end users should be using the K8s Cluster IP. The endpoint IPs are ephemeral anyway, right?

Regards,

~Girish

Dumitru Ceara

Oct 18, 2022, 4:28:33 PM

to Girish Moodalbail, ovn-kubernetes, obra...@redhat.com, Tim Rozet, Numan Siddique

On 10/15/22 21:50, Girish Moodalbail wrote:
> Hello All,
>

Hi Girish,

> The OVN QoS rules for OVN-K8s’s EgressQoS resource should be using from-lport instead of to-lport, isn’t it? See also: https://ovn-org.slack.com/archives/C010SQ5FSNL/p1663276948752959
>
> With the `to-lport` direction, the DSCP marking is going to be applied on the egress pipeline of the logical switch. Say, we want all the packets from PodA marked with DSCP 28
>
> This causes following two issues:
>
> 1. Pods on the same logical switch on same node:
>
> PodA on NodeA ------------ PodB on NodeA
>
> The packets are marked with DSCP of 28 on PodB ingress. It is too late. This is similar to Day-0 CVE we had with Egress K8s NetworkPolicy rules as it was using `to-lport` and then we later provided the CVE-Fix by changing egress rules to use `from-lport`

Maybe I'm missing some context here but why does it matter where we mark
packets for pods that are attached to the same logical switch (and on
the same physical node)?

In between the ls_in_qos_mark (from-lport qos) and ls_out_qos_mark
(to-lport qos) stages OVN will not read the DSCP value.

IIRC, the original discussion was about traffic leaving the cluster
(S-N) and in that case it made no difference whether we applied qos in
the node-switch ingress or egress pipeline. Except of course for the
fact that using to-lport would allow matching on already load balanced
traffic.

>
> 2. Pods on secondary OVN networks with single Layer-2 logical switch (flat network):
>
> PodA on NodeA ------- Layer-2 overlay ----- PodB on NodeB
>
> Packets from PodA will traverse the overlay network with TOS/DSCP set to 0. They will arrive on the br-int on NodeB. The packets are then marked with DSCP of 28 and sent to PodB.
>
> I understand that we chose ‘to-lport’ because DSCP is marked in the pre-LB state and therefore can’t be used to select k8s endpoints, is that correct? In that case, the end-users should be using K8s Cluster IP. The endpoint IPs are ephemeral anyways, right?
>

Does that still apply for secondary OVN networks you described above?
Are load balancers applied on the single Layer-2 logical switch
implementing the flat network?

>
> Regards,
> ~Girish
>

Regards,
Dumitru

Girish Moodalbail

Oct 19, 2022, 12:55:22 AM

to Dumitru Ceara, ovn-kubernetes, obra...@redhat.com, Tim Rozet, Numan Siddique

From: Dumitru Ceara <dce...@redhat.com>
Date: Tuesday, October 18, 2022 at 1:28 PM
To: Girish Moodalbail <gmood...@nvidia.com>, ovn-kubernetes <ovn-kub...@googlegroups.com>, obra...@redhat.com <obra...@redhat.com>
Cc: Tim Rozet <tro...@redhat.com>, Numan Siddique <nusi...@redhat.com>
Subject: Re: EgressQOS

> On 10/15/22 21:50, Girish Moodalbail wrote:
>> Hello All,
>>
>
> Hi Girish,
>
>> The OVN QoS rules for OVN-K8s’s EgressQoS resource should be using from-lport instead of to-lport, isn’t it? See also: https://ovn-org.slack.com/archives/C010SQ5FSNL/p1663276948752959
>>
>> With the `to-lport` direction, the DSCP marking is going to be applied on the egress pipeline of the logical switch. Say, we want all the packets from PodA marked with DSCP 28
>>
>> This causes following two issues:
>>
>> 1. Pods on the same logical switch on same node:
>>
>> PodA on NodeA ------------ PodB on NodeA
>>
>> The packets are marked with DSCP of 28 on PodB ingress. It is too late. This is similar to Day-0 CVE we had with Egress K8s NetworkPolicy rules as it was using `to-lport` and then we later provided the CVE-Fix by changing egress rules to use `from-lport`
>
> Maybe I'm missing some context here but why does it matter where we mark
> packets for pods that are attached to the same logical switch (and on
> the same physical node)?
>
> In between the ls_in_qos_mark (from-lport qos) and ls_out_qos_mark
> (to-lport qos) stages OVN will not read the DSCP value.
>
> IIRC, the original discussion was about traffic leaving the cluster
> (S-N) and in that case it made no difference whether we applied qos in
> the node-switch ingress or egress pipeline. Except of course for the
> fact that using to-lport would allow matching on already load balanced
> traffic.

Is there a need to support `already load balanced traffic` if the EgressQoS is for North/South traffic alone? In this case, the EgressQoS object would have destination CIDR which is neither OVN K8s overlay IPs nor K8s Service Cluster IPs. They will be non-OVN LB IPs --- IPs in the underlay or towards ISP or some such, right?

Keeping the N/S discussion aside, using EgressQoS one can specify that all the packets originating from a Pod have a certain DSCP value (which includes East-West traffic as well). For example:

kind: EgressQoS
apiVersion: k8s.ovn.org/v1
metadata:
  name: default
  namespace: default
spec:
  egress:
  - dscp: 28
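
For comparison, the rule such an object implies could be written by hand with `ovn-nbctl qos-add`, which takes a switch, a direction, a priority, a match and a `dscp=` value. The sketch below only builds the argument list and runs nothing. The switch name `node-a`, the priority 1000 and the pod IP 10.244.0.5 are hypothetical, and ovn-kubernetes does not necessarily program the northbound database this way.

```python
import shlex


def qos_add_args(switch: str, direction: str, priority: int,
                 match: str, dscp: int) -> list[str]:
    """Build an `ovn-nbctl qos-add` argument list for a DSCP-marking rule."""
    if direction not in ("from-lport", "to-lport"):
        raise ValueError(f"unknown direction: {direction}")
    if not 0 <= dscp <= 63:
        # DSCP is a 6-bit field.
        raise ValueError(f"DSCP out of range: {dscp}")
    return ["ovn-nbctl", "qos-add", switch, direction,
            str(priority), match, f"dscp={dscp}"]


if __name__ == "__main__":
    # Hypothetical values: a per-node switch "node-a" and a pod at 10.244.0.5.
    match = "ip4.src == 10.244.0.5"
    for direction in ("to-lport", "from-lport"):
        args = qos_add_args("node-a", direction, 1000, match, 28)
        print(shlex.join(args))
```

The only thing that differs between the two printed commands is the direction argument, which is the entire subject of this thread.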

Consider two pods, PodA and PodB. Each has a VF, and OVS is offloaded. Say the packets from PodA are sent to PodB, and those packets should have a DSCP value of 28. Now, PodB receives the packets with the DSCP value already set to 28 in the IP header. Shouldn’t we process these packets in the context of the sender (PodA) and subject them to QoS processing before sending them to PodB? What if the DSCP-marked packets are configured for severe throttling (using tools outside of OVN)?

>>
>> 2. Pods on secondary OVN networks with single Layer-2 logical switch (flat network):
>>
>> PodA on NodeA ------- Layer-2 overlay ----- PodB on NodeB
>>
>> Packets from PodA will traverse the overlay network with TOS/DSCP set to 0. They will arrive on the br-int on NodeB. The packets are then marked with DSCP of 28 and sent to PodB.
>>
>> I understand that we chose ‘to-lport’ because DSCP is marked in the pre-LB state and therefore can’t be used to select k8s endpoints, is that correct? In that case, the end-users should be using K8s Cluster IP. The endpoint IPs are ephemeral anyways, right?
>>
>
> Does that still apply for secondary OVN networks you described above?
> Are load balancers applied on the single Layer-2 logical switch
> implementing the flat network?

As it stands today, we don’t have any LB on the Layer-2 logical switch. I am pretty sure there will be requirements in the future to add LB for the secondary Layer-2 logical switch.

Thanks,

~Girish

Dumitru Ceara

Oct 19, 2022, 4:42:44 AM

to Girish Moodalbail, ovn-kubernetes, obra...@redhat.com, Tim Rozet, Numan Siddique

On 10/19/22 06:55, Girish Moodalbail wrote:
>
> *From: *Dumitru Ceara <dce...@redhat.com>
> *Date: *Tuesday, October 18, 2022 at 1:28 PM
> *To: *Girish Moodalbail <gmood...@nvidia.com>, ovn-kubernetes
> <ovn-kub...@googlegroups.com>, obra...@redhat.com <obra...@redhat.com>
> *Cc: *Tim Rozet <tro...@redhat.com>, Numan Siddique <nusi...@redhat.com>
> *Subject: *Re: EgressQOS

What about traffic destined to a non-local node-port service? Should
that be QoSed too?

> Keeping the N/S discussion aside, using EgressQoS one can specify that
> all the packets originating from a Pod to have a certain DSCP value
> (which includes East-West as well). For example:
>
> kind: EgressQoS
> apiVersion: k8s.ovn.org/v1
> metadata:
>   name: default
>   namespace: default
> spec:
>   egress:
>   - dscp: 28
>
>
> Consider these two pods below. Each have a VF and OVS is offloaded. Say,
> the packets from PodA are sent to PodB. Say, those packets should have a
> DSCP value of 28. Now, PodB receives the packets with DSCP value set to
> 28 in its IP header. Shouldn’t we process these packets in the context
> of the sender (PodA) and subject them to QoS processing before sending
> it to PodB? What if the DSCP marked packets are configured for severe
> throttling (using tools outside of the OVN)?
>

I understand the requirement in a non-virtualized network. What's not
clear to me is how the OVS dataplane implementation (HWOL in this case)
will inject some custom actions to enforce checking and interpreting of
DSCP.

The OVS datapath flow for traffic podA->podB will look something like
this (oversimplified):

in_port=podA,smac=podAmac,dmac=podBmac actions=(set_dscp(28),out_port=podB)

This is the same in both cases ("from-lport" or "to-lport") so I don't
see how the dataplane can decide to apply qos differently in one case vs
the other.
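
A minimal sketch of that point, assuming (as a simplification) that for same-node traffic the logical ingress and egress pipelines run back to back and collapse into one datapath action list. The helper names are made up, and `set_dscp(...)` / `out_port=...` mirror the simplified notation above rather than real datapath syntax.

```python
def datapath_actions(direction: str, dscp: int, out_port: str) -> list[str]:
    """Flatten a toy ingress+egress pipeline into one datapath action list."""
    if direction not in ("from-lport", "to-lport"):
        raise ValueError(f"unknown direction: {direction}")

    # Actions contributed by each logical pipeline.
    ingress = [f"set_dscp({dscp})"] if direction == "from-lport" else []
    egress = [f"set_dscp({dscp})"] if direction == "to-lport" else []

    # Same node: nothing sits between the two pipelines, so the datapath
    # only sees their concatenation followed by the output action.
    return ingress + egress + [f"out_port={out_port}"]


def dscp_to_tos(dscp: int) -> int:
    """DSCP occupies the upper six bits of the IPv4 TOS/DS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError(f"DSCP out of range: {dscp}")
    return dscp << 2


if __name__ == "__main__":
    print(datapath_actions("from-lport", 28, "podB"))
    print(datapath_actions("to-lport", 28, "podB"))
    print(hex(dscp_to_tos(28)))
```

Both calls produce the same list, and DSCP 28 corresponds to a TOS byte of 0x70.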

>
>
>>
>> 2. Pods on secondary OVN networks with single Layer-2 logical switch (flat network):
>>
>> PodA on NodeA ------- Layer-2 overlay ----- PodB on NodeB
>>
>> Packets from PodA will traverse the overlay network with TOS/DSCP set to 0. They will arrive on the br-int on NodeB. The packets are then marked with DSCP of 28 and sent to PodB.
>>
>> I understand that we chose ‘to-lport’ because DSCP is marked in the pre-LB state and therefore can’t be used to select k8s endpoints, is that correct? In that case, the end-users should be using K8s Cluster IP. The endpoint IPs are ephemeral anyways, right?
>>
>
> Does that still apply for secondary OVN networks you described above?
> Are load balancers applied on the single Layer-2 logical switch
> implementing the flat network?
>
>
> As it currently stands today, we don’t have any LB on Layer-2 logical
> switch. I am pretty sure there will be requirements in the future for
> adding LB for the secondary Layer-2 Logical switch.
>

If that's the case, maybe the simplest would be indeed to add support
for an "apply-after-lb" qos from-lport rule. But I think that will
require two new OVN stages in the pipeline. So I'd wait until we
exhaust all other options.

> Thanks,
>
> ~Girish
>

Regards,
Dumitru

Tim Rozet

Oct 19, 2022, 9:27:08 AM

to Dumitru Ceara, Girish Moodalbail, ovn-kubernetes, obra...@redhat.com, Numan Siddique

Egress QoS is only for egress S->N traffic:

The EgressQoS feature enables marking pods egress traffic with a valid QoS Differentiated Services Code Point (DSCP) value. The QoS markings will be consumed and acted upon by network appliances outside of the Kubernetes cluster to optimize traffic flow throughout their networks.

Non-local node port traffic is considered egress traffic as well.

Tim Rozet

Red Hat OpenShift Networking Team
