Hello All,
The OVN QoS rules for OVN-K8s’s EgressQoS resource should be using from-lport instead of to-lport, isn’t it? See also: https://ovn-org.slack.com/archives/C010SQ5FSNL/p1663276948752959
With the `to-lport` direction, the DSCP marking is applied in the egress pipeline of the logical switch. Say we want all the packets from PodA marked with DSCP 28.
This causes the following two issues:
I understand that we chose ‘to-lport’ because with ‘from-lport’ the DSCP is marked before load balancing and therefore the rule can’t match on k8s endpoint IPs, is that correct? In that case, the end-users should be using the K8s Cluster IP. The endpoint IPs are ephemeral anyway, right?
Regards,
~Girish
From: Dumitru Ceara <dce...@redhat.com>
Date: Tuesday, October 18, 2022 at 1:28 PM
To: Girish Moodalbail <gmood...@nvidia.com>, ovn-kubernetes <ovn-kub...@googlegroups.com>, obra...@redhat.com <obra...@redhat.com>
Cc: Tim Rozet <tro...@redhat.com>, Numan Siddique <nusi...@redhat.com>
Subject: Re: EgressQOS
On 10/15/22 21:50, Girish Moodalbail wrote:
> Hello All,
>
Hi Girish,
> The OVN QoS rules for OVN-K8s’s EgressQoS resource should be using from-lport instead of to-lport, isn’t it? See also: https://ovn-org.slack.com/archives/C010SQ5FSNL/p1663276948752959
>
> With the `to-lport` direction, the DSCP marking is going to be applied on the egress pipeline of the logical switch. Say, we want all the packets from PodA marked with DSCP 28
>
> This causes following two issues:
>
>
> 1. Pods on the same logical switch on same node:
>
> PodA on NodeA ------------ PodB on NodeA
>
> The packets are marked with DSCP of 28 on PodB ingress. It is too late. This is similar to Day-0 CVE we had with Egress K8s NetworkPolicy rules as it was using `to-lport` and then we later provided the CVE-Fix by changing egress rules to use `from-lport`
Maybe I'm missing some context here but why does it matter where we mark
packets for pods that are attached to the same logical switch (and on
the same physical node)?
In between the ls_in_qos_mark (from-lport qos) and ls_out_qos_mark
(to-lport qos) stages OVN will not read the DSCP value.
IIRC, the original discussion was about traffic leaving the cluster
(S-N) and in that case it made no difference whether we applied qos in
the node-switch ingress or egress pipeline. Except of course for the
fact that using to-lport would allow matching on already load balanced
traffic.
Is there a need to support `already load balanced traffic` if the EgressQoS is for North/South traffic alone? In this case, the EgressQoS object would have a destination CIDR that covers neither OVN K8s overlay IPs nor K8s Service Cluster IPs. They will be non-OVN LB IPs --- IPs in the underlay or towards the ISP or some such, right?
Keeping the N/S discussion aside, using EgressQoS one can specify that all packets originating from a Pod have a certain DSCP value (which includes East-West traffic as well). For example:
kind: EgressQoS
apiVersion: k8s.ovn.org/v1
metadata:
  name: default
  namespace: default
spec:
  egress:
  - dscp: 28
Consider these two pods below. Each has a VF and OVS is offloaded. Say the packets from PodA are sent to PodB, and those packets should have a DSCP value of 28. Now, PodB receives the packets with the DSCP value set to 28 in the IP header. Shouldn’t we process these packets in the context of the sender (PodA) and subject them to QoS processing before sending them to PodB? What if the DSCP-marked packets are configured for severe throttling (using tools outside of OVN)?
>
> 2. Pods on secondary OVN networks with single Layer-2 logical switch (flat network):
>
> PodA on NodeA ------- Layer-2 overlay ----- PodB on NodeB
>
> Packets from PodA will traverse the overlay network with TOS/DSCP set to 0. They will arrive on the br-int on NodeB. The packets are then marked with DSCP of 28 and sent to PodB.
>
> I understand that we chose ‘to-lport’ because DSCP is marked in the pre-LB state and therefore can’t be used to select k8s endpoints, is that correct? In that case, the end-users should be using K8s Cluster IP. The endpoint IPs are ephemeral anyways, right?
>
Does that still apply for secondary OVN networks you described above?
Are load balancers applied on the single Layer-2 logical switch
implementing the flat network?
As it stands today, we don’t have any LB on the Layer-2 logical switch. I am pretty sure there will be requirements in the future for adding LB to the secondary Layer-2 logical switch.
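To make the two cases discussed above a bit more concrete, here is a toy Python model of where the DSCP value becomes visible for each QoS direction. It is only a sketch, not OVN code: the path lists and the `trace_dscp` helper are hypothetical names made up for illustration, and only the stage names `ls_in_qos_mark` and `ls_out_qos_mark` come from this thread. It assumes, as described in case 2, that the ingress pipeline runs on the sender's node and the egress pipeline on the receiver's node.

```python
# Simplified, hypothetical view of the points a packet passes between PodA
# and PodB. Only the two QoS marking stage names are taken from the thread.
SAME_NODE_PATH = [
    "ls_in_qos_mark",   # logical switch ingress pipeline (from-lport QoS)
    "ls_out_qos_mark",  # logical switch egress pipeline (to-lport QoS)
    "deliver_to_PodB",
]

CROSS_NODE_PATH = [
    "ls_in_qos_mark",   # assumed to run on NodeA (sender side)
    "overlay_tunnel",   # packet crosses the Layer-2 overlay
    "ls_out_qos_mark",  # assumed to run on NodeB (receiver side)
    "deliver_to_PodB",
]


def trace_dscp(path, direction, dscp=28):
    """Return the DSCP value the packet carries at each point along path."""
    mark_stage = {
        "from-lport": "ls_in_qos_mark",
        "to-lport": "ls_out_qos_mark",
    }[direction]
    current = 0  # packets leave PodA unmarked
    seen = {}
    for point in path:
        if point == mark_stage:
            current = dscp  # the QoS rule rewrites the DSCP here
        seen[point] = current
    return seen


if __name__ == "__main__":
    for direction in ("from-lport", "to-lport"):
        print(direction, trace_dscp(CROSS_NODE_PATH, direction))
```

In this model, `to-lport` leaves the packet at DSCP 0 while it is on the overlay and only marks it on the receiving side, whereas `from-lport` marks it before it leaves the sender's node. In both cases PodB ends up seeing DSCP 28, which is why the difference only matters to whatever acts on the DSCP value in between.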
Thanks,
~Girish