OVN Scale with RAFT: how to make raft cluster clients to balanced state again


Winson Wang

Aug 5, 2020, 3:51:38 PM
to ovs-d...@openvswitch.org, winson wang, Han Zhou, Dumitru Ceara, ovn-kub...@googlegroups.com
Hello OVN Experts:

In a large-scale ovn-k8s cluster, several conditions can move the ovn-controller clients' connections to SB central from a balanced state to an unbalanced state.
Is there an ongoing project to address this problem?
If not, I have a proposal, though I am not sure whether it is doable.
Please share your thoughts.

The issue:

In an OVN SB RAFT cluster with 3 nodes, the ovn-controller clients initially connect to all 3 nodes in a balanced state.

The following conditions make the connections unbalanced:

  • One RAFT node restarts, and all of its ovn-controller clients reconnect to the two remaining cluster nodes.

  • In ovn-k8s, after a rolling upgrade of the SB RAFT pods, the last RAFT pod has no client connections.

RAFT clients in an unbalanced state put more stress on the RAFT cluster, which makes the cluster less stable under stress than it is in a balanced state.

The proposed solution:

ovn-controller adds a new UNIX command, "reconnect", that takes the IP of the preferred SB node as an argument.

When the connections become unbalanced, this command can trigger ovn-controller to reconnect to the new SB RAFT node with fast resync, which does not trigger the whole DB download process.
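
As a rough illustration of how an operator-side tool might decide which clients to move once such a "reconnect" command exists, here is a minimal Python sketch. It only computes a list of (client, target server) moves; it does not talk to ovn-controller. The function name `plan_moves`, the client names and the server addresses are hypothetical and not part of OVN.

```python
from collections import Counter


def plan_moves(connections, servers):
    """Return (client, target_server) moves that even out per-server load.

    connections: dict mapping client name -> server it is connected to now.
    servers: list of all SB servers, including ones with no clients.
    Assumes every server named in `connections` also appears in `servers`.
    """
    load = Counter({s: 0 for s in servers})
    load.update(connections.values())

    # Group clients by their current server, in a stable order.
    by_server = {s: [] for s in servers}
    for client, server in sorted(connections.items()):
        by_server[server].append(client)

    moves = []
    while True:
        busiest = max(servers, key=lambda s: load[s])
        idlest = min(servers, key=lambda s: load[s])
        # Stop once no server has more than one client above any other.
        if load[busiest] - load[idlest] <= 1:
            break
        client = by_server[busiest].pop()
        by_server[idlest].append(client)
        load[busiest] -= 1
        load[idlest] += 1
        moves.append((client, idlest))
    return moves
```

Each returned pair would then be turned into one invocation of the proposed command on that client, with the target server as the preferred SB node.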



--
Winson

Girish Moodalbail

Aug 5, 2020, 7:35:08 PM
to Han Zhou, Winson Wang, winson wang, ovn-kub...@googlegroups.com, ovs-d...@openvswitch.org


On Wed, Aug 5, 2020 at 3:05 PM Han Zhou <hz...@ovn.org> wrote:


Thanks Winson. The proposal sounds good to me. Will you implement it?

Han/Winson,

The fast re-sync is for an ovsdb-server restart and does not apply to an ovn-controller restart, right?

If the ovsdb client (ovn-controller) restarts, it loses all its state, and when it starts again it still needs to download logical_flows, port_bindings, and the other tables it cares about. So fast re-sync may not apply to this case.

Also, ovn-controller should stash the IP address of the SB server it is connected to in the Open_vSwitch table's external_ids column. It would update this field whenever it reconnects to a different SB server (because that ovsdb-server instance failed or restarted). When ovn-controller itself restarts, it could check the value in this field and try to connect to that server first, falling back to the default connection approach on failure.
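
The "try the stashed server first, then fall back" idea above can be sketched in a few lines of Python. This is only an illustration of the ordering logic, not ovn-controller code: `remote_order`, `connect_with_fallback` and the `try_connect` callback are hypothetical names, and reading or writing the stashed value in the database is left out.

```python
def remote_order(default_remotes, last_remote=None):
    """Return the remotes to try: the stashed one first, then the defaults."""
    order = []
    if last_remote:
        order.append(last_remote)
    order.extend(r for r in default_remotes if r != last_remote)
    return order


def connect_with_fallback(default_remotes, last_remote, try_connect):
    """Try each remote in order and return the first that connects, else None.

    try_connect: callable taking a remote string and returning True on success.
    """
    for remote in remote_order(default_remotes, last_remote):
        if try_connect(remote):
            return remote
    return None
```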

Regards,
~Girish


 

Han

 

--
Winson

--
You received this message because you are subscribed to the Google Groups "ovn-kubernetes" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ovn-kubernete...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ovn-kubernetes/CAMu6iS--iOW0LxxtkOhJpRT49E-9bJVy0iXraC1LMDUWeu6kLA%40mail.gmail.com.
_______________________________________________
discuss mailing list
dis...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss

Han Zhou

Aug 5, 2020, 8:23:43 PM
to Girish Moodalbail, Han Zhou, Winson Wang, winson wang, ovn-kub...@googlegroups.com, ovs-d...@openvswitch.org
On Wed, Aug 5, 2020 at 4:35 PM Girish Moodalbail <gmood...@gmail.com> wrote:


On Wed, Aug 5, 2020 at 3:05 PM Han Zhou <hz...@ovn.org> wrote:


Thanks Winson. The proposal sounds good to me. Will you implement it?

Han/Winson,

The fast re-sync is for an ovsdb-server restart and does not apply to an ovn-controller restart, right?

 
Right, but the proposal is to provide a command just to reconnect, without restarting. In that case fast-resync should work.
 
If the ovsdb client (ovn-controller) restarts, it loses all its state, and when it starts again it still needs to download logical_flows, port_bindings, and the other tables it cares about. So fast re-sync may not apply to this case.

Also, ovn-controller should stash the IP address of the SB server it is connected to in the Open_vSwitch table's external_ids column. It would update this field whenever it reconnects to a different SB server (because that ovsdb-server instance failed or restarted). When ovn-controller itself restarts, it could check the value in this field and try to connect to that server first, falling back to the default connection approach on failure.

The imbalance is usually caused by failover on the server side. When one server is down, all of its clients are expected to connect to the remaining servers, and when the server comes back, nothing prompts the clients to reconnect (unless you purposely restart the clients, which would bring about 1/3 of the restarted clients back to the old server). So I don't understand how "stash the IP address" would work in this scenario.

The proposal above by Winson is to purposely trigger a reconnection to the desired server without restarting the clients, which I think solves this problem directly.
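
The failover behavior described above can be reproduced with a small simulation. This is a toy model under simplifying assumptions (clients pick servers uniformly at random, and all clients of the failed server move at once); the server names `sb1`..`sb3` and the function name are made up for the example.

```python
import random
from collections import Counter


def simulate_failover(num_clients=300, servers=("sb1", "sb2", "sb3"), seed=0):
    """Return per-server client counts at each stage of a failover."""
    rng = random.Random(seed)
    servers = list(servers)

    # Initial state: every client picks a server at random.
    conn = {c: rng.choice(servers) for c in range(num_clients)}
    stages = {"initial": Counter(conn.values())}

    # One server goes down: its clients move to the remaining servers.
    down = servers[0]
    alive = [s for s in servers if s != down]
    for client in list(conn):
        if conn[client] == down:
            conn[client] = rng.choice(alive)
    stages["after_failover"] = Counter(conn.values())

    # The server comes back, but no client has a reason to move.
    stages["after_recovery"] = Counter(conn.values())

    # Restarting the clients makes each one pick at random again,
    # so roughly one third of them land on the recovered server.
    for client in conn:
        conn[client] = rng.choice(servers)
    stages["after_client_restart"] = Counter(conn.values())
    return stages
```

In this model the recovered server stays at zero clients until something forces a reconnection, which is the gap the proposed command is meant to close.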

Thanks,
Han
 

Regards,
~Girish


 

Han

 

--
Winson


Han Zhou

Aug 5, 2020, 8:29:02 PM
to Tony Liu, Han Zhou, Winson Wang, winson wang, ovn-kub...@googlegroups.com, ovs-d...@openvswitch.org


On Wed, Aug 5, 2020 at 3:59 PM Tony Liu <tonyl...@hotmail.com> wrote:
Sorry for hijacking this thread, I'd like to get some clarifications.

How is the initial balanced state established, say 100 ovn-controllers
connecting to 3 ovn-sb-db?

By default, ovn-controller connects to a randomly chosen server among those specified in the connection method, e.g. tcp:<IP1>:6642,tcp:<IP2>:6642,tcp:<IP3>:6642.
(Please see ovsdb(7) for details on "Connection Method".)

So initially it is balanced.
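
To make the "100 ovn-controllers, 3 servers" case concrete, here is a minimal Python sketch of random selection from a comma-separated connection method. It is a simplification of what the client library does, and the helper names are hypothetical. With a random choice per client, the split is only roughly even, not exact.

```python
import random
from collections import Counter


def parse_remotes(connection_method):
    """Split a comma-separated connection method into individual remotes."""
    return [r.strip() for r in connection_method.split(",") if r.strip()]


def spread(connection_method, num_clients, seed=None):
    """Count how many clients land on each remote when each picks at random."""
    rng = random.Random(seed)
    remotes = parse_remotes(connection_method)
    return Counter(rng.choice(remotes) for _ in range(num_clients))
```

For example, `spread("tcp:10.0.0.1:6642,tcp:10.0.0.2:6642,tcp:10.0.0.3:6642", 100)` returns a count per remote that sums to 100 (the addresses are placeholders).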
 
The ovn-controller doesn't have to connect to the leader of ovn-sb-db,
does it? In case it connects to the follower, the write request still
needs to be forwarded to the leader, right?

These logs keep showing up.
========
2020-08-05T22:48:33.141Z|103607|reconnect|INFO|tcp:10.6.20.84:6642: connecting...
2020-08-05T22:48:33.151Z|103608|reconnect|INFO|tcp:127.0.0.1:6640: connected
2020-08-05T22:48:33.151Z|103609|reconnect|INFO|tcp:10.6.20.84:6642: connected
2020-08-05T22:48:33.159Z|103610|main|INFO|OVNSB commit failed, force recompute next time.
2020-08-05T22:48:33.161Z|103611|ovsdb_idl|INFO|tcp:10.6.20.84:6642: clustered database server is disconnected from cluster; trying another server
2020-08-05T22:48:33.161Z|103612|reconnect|INFO|tcp:10.6.20.84:6642: connection attempt timed out
2020-08-05T22:48:33.161Z|103613|reconnect|INFO|tcp:10.6.20.84:6642: waiting 2 seconds before reconnect
========
What does "clustered database server is disconnected from cluster" mean?

It means the server is part of a cluster but is disconnected from it, e.g. due to network partitioning, or because it is overloaded and lost heartbeats, or because the cluster lost quorum and no leader is elected.
If you use a clustered DB, it's better to set the connection method to all servers (or use a LB VIP that points to all servers) instead of specifying only a single server, which doesn't provide the desired HA.
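
To check how often each server reports this condition, the log lines quoted above can be tallied with a short script. This is a sketch that assumes the `timestamp|sequence|module|level|message` layout seen in that excerpt and that the remote precedes the first ": " in the message; the function name is made up.

```python
import re
from collections import Counter

# Layout as seen in the quoted excerpt: timestamp|seq|module|level|message
LOG_RE = re.compile(
    r"^(?P<ts>\S+)\|(?P<seq>\d+)\|(?P<module>[^|]+)\|(?P<level>[^|]+)\|(?P<msg>.*)$"
)
MARKER = "clustered database server is disconnected from cluster"


def disconnected_remotes(lines):
    """Count, per remote, the log lines reporting a disconnected server."""
    hits = Counter()
    for line in lines:
        m = LOG_RE.match(line.strip())
        if not m or MARKER not in m.group("msg"):
            continue
        # The message starts with the remote, e.g. "tcp:10.6.20.84:6642: ...".
        remote = m.group("msg").split(": ", 1)[0]
        hits[remote] += 1
    return hits
```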
 

Thanks!

Tony




Winson Wang

Aug 5, 2020, 8:45:39 PM
to Han Zhou, ovs-d...@openvswitch.org, winson wang, Dumitru Ceara, ovn-kub...@googlegroups.com
Hi Han,


On Wed, Aug 5, 2020 at 3:05 PM Han Zhou <hz...@ovn.org> wrote:
Thanks Winson. The proposal sounds good to me. Will you implement it?

Thanks for reviewing my proposal solution.
I am hoping someone from the OVN team who is more familiar with the ovn-controller code can deliver the feature, if possible :).

Regards,
Winson


Han

 

--
Winson



--
Winson

Girish Moodalbail

Aug 5, 2020, 9:22:00 PM
to Han Zhou, Han Zhou, Winson Wang, winson wang, ovn-kub...@googlegroups.com, ovs-d...@openvswitch.org
Right. This is what we discussed internally; however, when I read this email on the list I confused it with the other thread (the rolling update of ovn-controller in a K8s cluster, which involves restarting ovn-controller). Sorry for the noise.

Regards,
~Girish