Hello OVN Experts,

With ovn-k8s, we need to keep the flows on br-int always installed, since they are needed by the pods running on the K8s node. Is there an ongoing project to address this problem? If not, I have one proposal, though I am not sure it is doable. Please share your thoughts.

The issue:
In a large-scale ovn-k8s cluster there are 200K+ OpenFlow flows on br-int on every K8s node. When we restart ovn-controller for an upgrade using `ovs-appctl -t ovn-controller exit --restart`, existing traffic still works fine, since br-int keeps its installed flows.
However, when the new ovn-controller starts, it connects to the OVS IDL and does an engine init run, clearing all OpenFlow flows and installing flows based on the SB DB.
With a flow count above 200K, it took more than 15 seconds to get all the flows installed on the br-int bridge again.
Proposed solution for the issue:
When ovn-controller receives `exit --restart`, it writes the current "ovs-cond-seqno" from the OVS IDL into the external-ids column of the Open_vSwitch table. When the new ovn-controller starts, it checks whether "ovs-cond-seqno" exists in the Open_vSwitch table and compares it with the seqno from the OVS IDL to decide whether it must force a recompute.
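At the CLI level, the stash/check this proposal describes might look like the sketch below. This is hypothetical (ovn-controller would do the equivalent through the IDL, not by shelling out), and the VSCTL variable exists only so the logic can be dry-run:

```shell
#!/bin/sh
# Hypothetical sketch of the proposed stash/check -- not existing
# ovn-controller behavior.  In production VSCTL is just ovs-vsctl.
VSCTL=${VSCTL:-ovs-vsctl}

stash_seqno() {
    # On "exit --restart": record the current IDL seqno in the Open_vSwitch
    # table so the next ovn-controller instance can find it.
    $VSCTL set Open_vSwitch . external_ids:ovs-cond-seqno="$1"
}

stashed_seqno() {
    # On startup: read the stashed value back (empty if never stashed).
    $VSCTL --if-exists get Open_vSwitch . external_ids:ovs-cond-seqno
}
```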
Checking the flow count on br-int every second:
packet_count=0 byte_count=0 flow_count=0
packet_count=0 byte_count=0 flow_count=0
packet_count=0 byte_count=0 flow_count=0
packet_count=0 byte_count=0 flow_count=0
packet_count=0 byte_count=0 flow_count=0
packet_count=0 byte_count=0 flow_count=0
packet_count=0 byte_count=0 flow_count=10322
packet_count=0 byte_count=0 flow_count=34220
packet_count=0 byte_count=0 flow_count=60425
packet_count=0 byte_count=0 flow_count=82506
packet_count=0 byte_count=0 flow_count=106771
packet_count=0 byte_count=0 flow_count=131648
packet_count=2 byte_count=120 flow_count=158303
packet_count=29 byte_count=1693 flow_count=185999
packet_count=188 byte_count=12455 flow_count=212764
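The log above can be reproduced with a loop along these lines (my reconstruction; the exact script was not shared). It samples `ovs-ofctl dump-aggregate` on br-int once per second:

```shell
#!/bin/sh
# Sketch of a 1-second flow-count watcher (a reconstruction; the script
# that produced the log above was not shared).

# Strip the reply header from a dump-aggregate line on stdin, e.g.
#   "NXST_AGGREGATE reply (xid=0x4): packet_count=0 byte_count=0 flow_count=10322"
# becomes "packet_count=0 byte_count=0 flow_count=10322".
parse_counters() {
    sed -n 's/^.*reply[^:]*: //p'
}

# The live loop only runs when invoked with --run, so the helper above can
# be exercised without a running switch.
if [ "${1:-}" = "--run" ]; then
    while true; do
        ovs-ofctl dump-aggregate br-int | parse_counters
        sleep 1
    done
fi
```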
--Winson
--
You received this message because you are subscribed to the Google Groups "ovn-kubernetes" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ovn-kubernete...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ovn-kubernetes/CAMu6iS8eC2EtMJbqBccGD0hyvLFBkzkeJ9sXOsT_TVF3Ltm2hA%40mail.gmail.com.
Hi Winson,

Thanks for the proposal. Yes, the connection break during upgrade is a real issue in a large-scale environment. However, the proposal doesn't work: the "ovs-cond-seqno" belongs to the OVSDB IDL connection to the local conf DB, which is a completely different connection from the OpenFlow connection to ovs-vswitchd.

To avoid clearing the OpenFlow table during ovn-controller startup, we can find a way to postpone clearing the OVS flows until the recompute in ovn-controller has completed, right before ovn-controller replaces them with the new flows. This should largely reduce the window of broken connectivity during an upgrade. Some changes in the ofctrl module's state machine are required, and I am not 100% sure this approach is applicable; it needs more detailed checking.
Regards,
~Girish
Hi, Han:
A comment inline:
From: ovn-kub...@googlegroups.com <ovn-kub...@googlegroups.com> On Behalf Of Han Zhou
Sent: Wednesday, August 5, 2020 3:36 PM
To: Winson Wang <windso...@gmail.com>
Cc: ovs-d...@openvswitch.org; ovn-kub...@googlegroups.com; Dumitru Ceara <dce...@redhat.com>; Han Zhou <hz...@ovn.org>
Subject: Re: ovn-k8s scale: how to make new ovn-controller process keep the previous Open Flow in br-int
To avoid clearing the OpenFlow table during ovn-controller startup, we can find a way to postpone clearing the OVS flows until the recompute in ovn-controller has completed, right before ovn-controller replaces them with the new flows.
[vi> ] It seems we force a recompute today if the OVS IDL is reconnected. Would it be possible to defer the decision to recompute the flows based on the SB's nb_cfg we have synced with? I.e., if our nb_cfg is in sync with the SB's global nb_cfg, we can skip the recompute? At least if nothing has changed since the restart, we won't need to do anything. We could stash nb_cfg in OVS (once ovn-controller receives confirmation from OVS that the physical flows for an nb_cfg update are in place), and it would be cleared if OVS itself is restarted. (I mean: currently nb_cfg is used to check whether NB, SB, and Chassis are in sync; we could extend this to OVS/physical flows?)

I have not thought this through, so maybe I am missing something.
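At the CLI level, the stash described here might look like this sketch (an assumption, not existing behavior; the external-ids key name "ovn-installed-nb-cfg" is invented for illustration, and VSCTL is parameterized only for dry-running):

```shell
#!/bin/sh
# Sketch of stashing the last confirmed nb_cfg in OVS and comparing it on
# startup.  Not existing ovn-controller behavior; key name is made up.
VSCTL=${VSCTL:-ovs-vsctl}

record_synced_nb_cfg() {
    # Called once OVS confirms the physical flows for this nb_cfg are in place.
    $VSCTL set Open_vSwitch . external_ids:ovn-installed-nb-cfg="$1"
}

need_recompute() {
    # On startup: recompute only if the stashed nb_cfg does not match the
    # SB global nb_cfg (passed in by the caller).  A restarted OVS would
    # have an empty stash, which also forces a recompute.
    stashed=$($VSCTL --if-exists get Open_vSwitch . \
                  external_ids:ovn-installed-nb-cfg | tr -d '"')
    [ "$stashed" != "$1" ]
}
```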
Thanks,
-venu
We can also think about whether it is possible to do it the following way:
- When ovn-controller starts, it will not clear the flows; instead it will get the dump of flows from br-int and populate them into its installed flows.
- Then, when it connects to the SB DB and computes the desired flows, it will sync the installed flows with the desired flows anyway.
- If there is no difference between the desired flows and the installed flows, there will be no impact on the datapath at all.

This would require careful thought and proper handling, though.
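A minimal sketch of the no-datapath-impact check in the last point, assuming the dump is taken with `ovs-ofctl --no-stats` so lines are directly comparable (OFCTL is parameterized only for dry-running; this is not existing ovn-controller code):

```shell
#!/bin/sh
# Sketch (assumption, not current ovn-controller behavior): decide whether
# syncing the desired flows would touch the datapath at all.
OFCTL=${OFCTL:-ovs-ofctl}

sync_needed() {
    desired=$1            # file of desired flows, one per line
    live=$(mktemp)
    # --no-stats strips duration/packet counters so the lines from the
    # live table can be compared textually against the desired flows.
    $OFCTL --no-stats dump-flows br-int | sort > "$live"
    if sort "$desired" | cmp -s - "$live"; then
        rm -f "$live"
        return 1          # identical: no datapath impact at all
    fi
    rm -f "$live"
    return 0              # differs: only the delta needs to be pushed
}
```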
Thanks
Numan
I have another suggestion to handle this issue during an upgrade. Let's say br-int has ports p1, p2, ..., p10, which correspond to the logical ports p1, p2, ..., p10. Then the following can be done:

1. Create a temporary bridge, br-temp:
   ovs-vsctl add-br br-temp
2. Create the ports p1 ... p10 in br-temp with different names but with external_ids:iface-id set properly. E.g.:
   ovs-vsctl add-port br-temp temp-p1 -- set interface temp-p1 type=internal -- set interface temp-p1 external_ids:iface-id=p1
   ...
   ovs-vsctl add-port br-temp temp-p10 -- set interface temp-p10 type=internal -- set interface temp-p10 external_ids:iface-id=p10
   (I think this can be easily scripted.)
3. Just before restarting ovn-controller, run: ovs-vsctl set open . external_ids:ovn-bridge=br-temp
4. Restart ovn-controller after upgrading.
5. Wait until ovn-controller connects to the SB ovsdb-server and all the flows appear in br-temp.
6. Switch the OVN bridge back to br-int: ovs-vsctl set open . external_ids:ovn-bridge=br-int
7. Delete br-temp: ovs-vsctl del-br br-temp

Up to step 5 there should be no datapath impact, as br-int is untouched and all its flows remain. There could be some downtime after step 6, as ovn-controller may delete all the flows in br-int and re-add them, but the duration should be shorter.

Please note I have not tested this myself; it is worth trying in a small environment before attempting it on an actual deployment. You could skip step 2, but if ovn-monitor-all is false you would still see some delay due to conditional monitoring.

This is totally under the operator/admin's control, and there is no need for any ovn-controller changes. We can still work on approach (2) and handle all the tricky parts mentioned by Han, but that may take time. Any thoughts on this? We used a similar approach when I worked on a migration script to migrate an existing OpenStack deployment from ML2/OVS to ML2/OVN.
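The steps above can be scripted roughly as follows (untested, as the message itself cautions; VSCTL is parameterized only so the logic can be dry-run, and the port list is taken from `ovs-vsctl list-ports br-int` rather than hard-coded):

```shell
#!/bin/sh
# Sketch of the br-temp upgrade procedure.  Untested; in production VSCTL
# is just ovs-vsctl.
VSCTL=${VSCTL:-ovs-vsctl}

switch_to_temp_bridge() {     # steps 1-3, run just before the restart
    $VSCTL add-br br-temp
    # Mirror every br-int port onto br-temp under a new name but with the
    # same external_ids:iface-id, so ovn-controller computes the same flows.
    for port in $($VSCTL list-ports br-int); do
        $VSCTL add-port br-temp "temp-$port" \
            -- set interface "temp-$port" type=internal \
            -- set interface "temp-$port" external_ids:iface-id="$port"
    done
    # Point ovn-controller at the temporary bridge before restarting it.
    $VSCTL set open . external_ids:ovn-bridge=br-temp
}

switch_back_to_br_int() {     # steps 6-7, run once br-temp is populated
    $VSCTL set open . external_ids:ovn-bridge=br-int
    $VSCTL del-br br-temp
}

case "${1:-}" in
    --switch) switch_to_temp_bridge ;;
    --revert) switch_back_to_br_int ;;
esac
```

Steps 4 and 5 (restart the upgraded ovn-controller, wait for the flows to appear in br-temp) happen between the `--switch` and `--revert` invocations.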
Thanks
Numan
Hi, Han:
[vi> ] So, if I understand it, ovn-controller completes all its iterations and then finally does a bundle (replace-flows)? That sounds good to me: since it is a replace-flows, we don't flush the table, so we don't need any explicit delay?
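For experimenting with this behavior, ovs-ofctl already exposes atomic table replacement from the command line via `--bundle replace-flows`. A sketch (OFCTL is parameterized only for dry-running):

```shell
#!/bin/sh
# Sketch of the atomic-replacement experiment.  "--bundle replace-flows"
# applies the new table as a single transaction, so there is no window
# with an empty flow table.  In production OFCTL is just ovs-ofctl.
OFCTL=${OFCTL:-ovs-ofctl}

replace_with_snapshot() {
    bridge=$1
    snapshot=$2
    $OFCTL dump-flows "$bridge" > "$snapshot"
    # Replacing the table with its own snapshot should be a datapath no-op,
    # which makes it a safe way to try the mechanism on a live node.
    $OFCTL --bundle replace-flows "$bridge" "$snapshot"
}
```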
Thanks,
-venu
Hi, Han:
An additional comment:
[vi> ] Though ovn-controller needs to handle the case where the bundle op fails, since a failed bundle will revert all the flows to what they were before?
Thanks,
-venu