In a 3-node OVN SB RAFT cluster, the ovn-controller clients initially connect to all 3 nodes in a balanced state.
The following conditions make the connections unbalanced:
* One RAFT node restarts, and all of its ovn-controller clients reconnect to the two remaining cluster nodes.
* In ovn-k8s, after a rolling upgrade of the SB RAFT pods, the last RAFT pod has no client connections.
RAFT clients in an unbalanced state put more stress on the RAFT cluster, which makes it less stable under load than it is in a balanced state.
Proposal: ovn-controller adds a new unix command, "reconnect", that takes the IP of the preferred SB node as its argument.
When the connections become unbalanced, this unix command can trigger ovn-controller to reconnect to a new SB RAFT node with fast sync, which doesn't trigger the whole DB download process.
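To make the rebalancing step concrete, here is a minimal Python sketch (not part of OVN) of how an operator script might work out how many clients each SB node should shed or accept before issuing the proposed "reconnect" command. The function name `plan_rebalance` and the node IPs are hypothetical, and the connection counts are assumed to have been collected by some other means.

```python
from typing import Dict


def plan_rebalance(conns: Dict[str, int]) -> Dict[str, int]:
    """Per SB node: clients to move away (positive) or to accept (negative)."""
    total = sum(conns.values())
    base, extra = divmod(total, len(conns))
    # Hand the leftover clients to the currently busiest nodes so that
    # as few clients as possible have to reconnect.
    order = sorted(conns, key=lambda node: conns[node], reverse=True)
    target = {node: base + (1 if i < extra else 0) for i, node in enumerate(order)}
    return {node: conns[node] - target[node] for node in conns}


# Hypothetical example: 100 ovn-controllers after a rolling upgrade
# that left the last SB pod without any client connections.
plan = plan_rebalance({"10.0.0.1": 50, "10.0.0.2": 50, "10.0.0.3": 0})
```

A positive value is the number of ovn-controllers that would be asked to reconnect away from that node; a negative value is the number the node should receive.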
Thanks Winson. The proposal sounds good to me. Will you implement it?
Han
You received this message because you are subscribed to the Google Groups "ovn-kubernetes" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ovn-kubernete...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ovn-kubernetes/CAMu6iS--iOW0LxxtkOhJpRT49E-9bJVy0iXraC1LMDUWeu6kLA%40mail.gmail.com.
discuss mailing list
dis...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
On Wed, Aug 5, 2020 at 3:05 PM Han Zhou <hz...@ovn.org> wrote:
> On Wed, Aug 5, 2020 at 12:51 PM Winson Wang <windso...@gmail.com> wrote:
> > Hello OVN Experts:
> >
> > With a large-scale ovn-k8s cluster, several conditions can move the
> > ovn-controller clients' connections to SB central from a balanced
> > state to an unbalanced state.
> >
> > Is there an ongoing project to address this problem?
> > If not, I have a proposal, though I am not sure whether it is doable.
> > Please share your thoughts.
> >
> > The issue:
> >
> > In a 3-node OVN SB RAFT cluster, the ovn-controller clients initially
> > connect to all 3 nodes in a balanced state.
> >
> > The following conditions make the connections unbalanced:
> >
> > * One RAFT node restarts, and all of its ovn-controller clients
> >   reconnect to the two remaining cluster nodes.
> > * In ovn-k8s, after a rolling upgrade of the SB RAFT pods, the last
> >   RAFT pod has no client connections.
> >
> > RAFT clients in an unbalanced state put more stress on the RAFT
> > cluster, which makes it less stable under load than it is in a
> > balanced state.
> >
> > The proposed solution:
> >
> > ovn-controller adds a new unix command, "reconnect", that takes the
> > IP of the preferred SB node as its argument. When the connections
> > become unbalanced, this unix command can trigger ovn-controller to
> > reconnect to a new SB RAFT node with fast sync, which doesn't trigger
> > the whole DB download process.
> Thanks Winson. The proposal sounds good to me. Will you implement it?

Han/Winson,

The fast re-sync is for an ovsdb-server restart and does not apply to an ovn-controller restart, right?
If the ovsdb client (ovn-controller) restarts, it loses all its state, and when it starts again it still needs to download logical_flows, port_bindings, and the other tables it cares about. So fast re-sync may not apply to this case.

Also, ovn-controller should stash the IP address of the SB server it is connected to in the external_ids column of the Open_vSwitch table. It would update this field whenever it reconnects to a different SB server (because that ovsdb-server instance failed or restarted). When ovn-controller itself restarts, it could check the value in this field and try to connect to that server first, and on failure fall back to the default connection approach.
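The "try the stashed server first, then fall back" idea could look roughly like the Python sketch below. It is only an illustration of the ordering logic, not ovn-controller code: the function name `connection_order` is hypothetical, reading the stashed value out of external_ids is left out, and ignoring a stashed value that is no longer in the configured remote list is an assumption of this sketch. Apart from 10.6.20.84, the addresses are made up.

```python
from typing import List, Optional


def connection_order(default_remotes: List[str], stashed: Optional[str]) -> List[str]:
    """Order in which to try SB remotes after an ovn-controller restart."""
    # Assumption: a stashed remote that is no longer configured is stale,
    # so it is ignored and the default order is used instead.
    if stashed and stashed in default_remotes:
        return [stashed] + [r for r in default_remotes if r != stashed]
    return list(default_remotes)


remotes = ["tcp:10.6.20.84:6642", "tcp:10.6.20.85:6642", "tcp:10.6.20.86:6642"]
order = connection_order(remotes, "tcp:10.6.20.85:6642")
```

If the first entry cannot be reached, the caller simply moves on to the remaining entries, which is the fallback to the default connection approach.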
--
Regards,
~Girish
Sorry for hijacking this thread, I'd like to get some clarifications.
How is the initial balanced state established, say 100 ovn-controllers
connecting to 3 ovn-sb-db?
The ovn-controller doesn't have to connect to the leader of ovn-sb-db,
does it? In case it connects to the follower, the write request still
needs to be forwarded to the leader, right?
These logs keep showing up.
========
2020-08-05T22:48:33.141Z|103607|reconnect|INFO|tcp:10.6.20.84:6642: connecting...
2020-08-05T22:48:33.151Z|103608|reconnect|INFO|tcp:127.0.0.1:6640: connected
2020-08-05T22:48:33.151Z|103609|reconnect|INFO|tcp:10.6.20.84:6642: connected
2020-08-05T22:48:33.159Z|103610|main|INFO|OVNSB commit failed, force recompute next time.
2020-08-05T22:48:33.161Z|103611|ovsdb_idl|INFO|tcp:10.6.20.84:6642: clustered database server is disconnected from cluster; trying another server
2020-08-05T22:48:33.161Z|103612|reconnect|INFO|tcp:10.6.20.84:6642: connection attempt timed out
2020-08-05T22:48:33.161Z|103613|reconnect|INFO|tcp:10.6.20.84:6642: waiting 2 seconds before reconnect
========
What does "clustered database server is disconnected from cluster" mean?
Thanks!
Tony
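For anyone who wants to count how often this happens, the log lines above can be split on "|" into timestamp, sequence number, module, level, and message. The Python sketch below does that and pulls out the remote named in each "disconnected from cluster" record. The helper names are hypothetical, and the field layout is inferred only from the lines quoted in this message.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class LogRecord(NamedTuple):
    timestamp: str
    seq: int
    module: str
    level: str
    message: str


MARKER = "clustered database server is disconnected from cluster"


def parse_line(line: str) -> Optional[LogRecord]:
    """Split one log line into its five '|'-separated fields."""
    parts = line.rstrip("\n").split("|", 4)
    if len(parts) != 5 or not parts[1].isdigit():
        return None  # not in the expected layout
    ts, seq, module, level, msg = parts
    return LogRecord(ts, int(seq), module, level, msg)


def disconnected_remotes(lines: Iterable[str]) -> Iterator[str]:
    """Yield the remote (e.g. tcp:IP:port) of each ovsdb_idl record with MARKER."""
    for line in lines:
        rec = parse_line(line)
        if rec and rec.module == "ovsdb_idl" and MARKER in rec.message:
            # The remote is the text before the first ": " in the message.
            yield rec.message.split(": ", 1)[0]


sample = [
    "2020-08-05T22:48:33.151Z|103609|reconnect|INFO|tcp:10.6.20.84:6642: connected",
    "2020-08-05T22:48:33.161Z|103611|ovsdb_idl|INFO|tcp:10.6.20.84:6642: "
    "clustered database server is disconnected from cluster; trying another server",
]
found = list(disconnected_remotes(sample))
```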
> -----Original Message-----
> From: discuss <ovs-discu...@openvswitch.org> On Behalf Of Han
> Zhou
> Sent: Wednesday, August 5, 2020 3:05 PM
> To: Winson Wang <windso...@gmail.com>
> Cc: winson wang <zhe...@nvidia.com>; ovn-kub...@googlegroups.com;
> ovs-d...@openvswitch.org
> Subject: Re: [ovs-discuss] OVN Scale with RAFT: how to make raft cluster
> clients to balanced state again
>
> On Wed, Aug 5, 2020 at 12:51 PM Winson Wang <windso...@gmail.com> wrote:
>
> > Hello OVN Experts:
> >
> > With large scale ovn-k8s cluster, there are several conditions
> > that would make ovn-controller clients connect SB central from a
> > balanced state to an unbalanced state.
> >
> > Is there an ongoing project to address this problem?
> > If not, I have one proposal not sure if it is doable.
> > Please share your thoughts.
> >
> > The issue:
> >
> > OVN SB RAFT 3 node cluster, at first all the ovn-controller
> > clients will connect all the 3 nodes in a balanced state.
> >
> > The following conditions will make the connections become
> > unbalanced.
> >
> > * One RAFT node restart, all the ovn-controller clients to
> >   reconnect to the two remaining cluster nodes.
> > * Ovn-k8s, after SB raft pods rolling upgrade, the last raft
> >   pod has no client connections.
> >
> > RAFT clients in an unbalanced state would trigger more stress to
> > the raft cluster, which makes the raft unstable under stress compared
> > to a balanced state.
> >
> > The proposal solution:
> >
> > Ovn-controller adds next unix commands "reconnect" with argument of
> > preferred SB node IP.
> > When unbalanced state happens, the UNIX command can trigger
> > ovn-controller reconnect to new SB raft node with fast sync which
> > doesn't trigger the whole DB downloading process.
> >
> > --
> > Winson
>
> Thanks Winson. The proposal sounds good to me. Will you implement it?
>
> Han