Hello Han,

Sorry, I was monitoring the ovn-kubernetes Google group and didn't see your emails until now.

> On the other hand, why wouldn't splitting the join logical switch into 1000 LSes solve the problem? I understand that there will be 1000 more datapaths and 1000 more LRPs, but these are all O(n), which is much more efficient than the O(n^2) explosion. What other scale issues does this create?

Splitting a single join logical switch into 1000 different logical switches is how I have resolved the problem for now. However, I see the following issues with this design:

(1) Complexity: where one logical switch should have sufficed, we now need to create 1000 logical switches just to work around the O(n^2) logical flows.

(2) IPAM management:
- Before, I had one IP subnet, 100.64.0.0/16, for the single logical switch, and I relied on OVN IPAM to allocate IPs from that subnet.
- Now I first need to do subnet management in OVN K8s (break the /16 into /29 CIDRs) and then assign one subnet to each join logical switch.

(3) Each of these join logical switches is a distributed switch, so the flows for every one of them will be present on every hypervisor. This increases the number of OpenFlow flows. However, from the OVN K8s point of view, each such logical switch is essentially pinned to one hypervisor. Its only role is to connect that hypervisor's L3 gateway router to the distributed router.

We are trying to simplify the OVN logical topology for OVN K8s so that the number of logical flows (and therefore the number of OpenFlow flows) goes down. That in turn reduces the pressure on ovn-northd, the OVN SB DB, and finally the ovn-controller processes.

Every node in an OVN K8s cluster adds 4 resources, so a 1000-node K8s cluster has 4000 + 1 (the distributed router) resources. This creates around 250K OpenFlow rules on each hypervisor, and that number covers only the initial logical topology.
I am not accounting for any flows that will be generated for K8s network policies, services, and so on.

> In addition, Girish, for the external LS, I am not sure why it can't be shared if all the nodes are connected to a single L2 network. (If they are connected to separate L2 networks, different external LSes should be created, at least according to the current OVN model.)

Yes, the plan was to share the same external LS among all of the L3 gateway routers, since they are all in the same broadcast domain. However, we would end up with the same 2M logical flows, because a single external LS would connect all the L3 gateway routers in that broadcast domain.

In short, for a 1000-node K8s cluster, if we eliminate the logical flow explosion, we can reduce the number of logical resources in the OVN K8s topology by 1998: the 1000 join LSes become 1, and the 1000 external LSes become 1.
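The extra subnet management described in point (2) above can be sketched with Python's standard `ipaddress` module. This is a minimal illustration and not OVN K8s code. The helper name `carve_join_subnets` is hypothetical. Giving the first usable host of each /29 to the distributed router port and the second to the gateway router port is an assumption.

```python
import ipaddress
from itertools import islice
from typing import List, Tuple

Allocation = Tuple[ipaddress.IPv4Network, ipaddress.IPv4Address, ipaddress.IPv4Address]


def carve_join_subnets(parent: str, new_prefix: int, count: int) -> List[Allocation]:
    """Split `parent` into /new_prefix subnets, one per join logical switch.

    Returns (subnet, dr_port_ip, gr_port_ip) tuples. Which host address goes
    to which router port is an illustrative assumption.
    """
    net = ipaddress.IPv4Network(parent)
    available = 2 ** (new_prefix - net.prefixlen)
    if count > available:
        raise ValueError(f"{parent} holds only {available} /{new_prefix} subnets")
    allocations: List[Allocation] = []
    # subnets() yields child networks lazily and in address order.
    for subnet in islice(net.subnets(new_prefix=new_prefix), count):
        hosts = list(subnet.hosts())  # a /29 has 6 usable host addresses
        allocations.append((subnet, hosts[0], hosts[1]))
    return allocations


allocs = carve_join_subnets("100.64.0.0/16", 29, 1000)
for subnet, dr_ip, gr_ip in allocs[:2]:
    print(subnet, dr_ip, gr_ip)
# 100.64.0.0/29 100.64.0.1 100.64.0.2
# 100.64.0.8/29 100.64.0.9 100.64.0.10
```

A /16 holds 8192 such /29 subnets, so address space is not the constraint here. The cost is the bookkeeping itself, which previously OVN IPAM handled on a single subnet.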
--
You received this message because you are subscribed to the Google Groups "ovn-kubernetes" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ovn-kubernete...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ovn-kubernetes/CADtzDC%3Dp4fmsQPY38eezAqENG65ftXk6CAxKn%3DsF1X%3Dp92gw0A%40mail.gmail.com.
Girish, Han,

From my understanding, the GR (per node) <----> DR link is a local subnet. You don't want the overhead of many switch objects in OVN, but you also don't want all the GRs connected to a single switch, because that creates a large L2 domain. Isn't the simple solution to allow connecting routers to each other without an intermediary switch?
Tim Rozet
Red Hat CTO Networking Team
> Hello Han,
> I did consider the distributed gateway port (DGP). However, there are two issues with it:
> 1. To support K8s NodePort services, we need to create a north-south LB, and the L3 gateway router is a perfect solution for that. AFAIK, DGP doesn't support it.

In fact, DGP does support LB (at least judging from the code: https://github.com/ovn-org/ovn/blob/master/northd/ovn-northd.c#L9318), but the ovn-nb manpage may need an update.
> 2. Datapath performance would be bad with DGP. We want packets meant for the host or the Internet to exit from the hypervisor on which the pod exists, and the L3 gateway router provides us with this functionality. With DGP, and with OVN supporting only one instance of it, packets unnecessarily get forwarded over a tunnel to the DGP chassis for SNAT. They are then forwarded back over a tunnel to the host, only to exit locally.

This is related to the changes needed for DGP (the first point I mentioned in my previous email). In the diagram I drew, there would be 1000 DGPs, each residing on its own chassis. That ensures north-south traffic is forwarded on the local chassis without going through a central node, just as it works today in ovn-k8s. However, this may not be a small change, because today the NAT and LB processing on such LRs (LRs with a DGP) is all based on the assumption that there is only one DGP. For example, the NB schema would also need to change so that the NAT/LB rules for a router can specify a DGP, which determines the central processing location for those rules.
So, to summarize: if we can make multi-DGP work, it would be the best solution for the ovn-k8s scenario. If we can't (either because of a design problem, or because the effort is too big for the gains), configurably avoiding the static neighbour flows may be a good way to go. Both options require changes in OVN.
Without changes in OVN, Tim has suggested a further optimization on top of your current workaround: replace the large number of small join LSes (and the LRPs and patch ports on both sides) with the same number of directly connected LRPs.
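A back-of-the-envelope model makes the O(n^2) versus O(n) difference discussed in this thread concrete. It is a simplification and not OVN's exact flow accounting. It assumes that every router port attached to a segment gets one static neighbor entry for each other router port on that same segment. The totals quoted in the thread, such as ~2M logical flows, depend on per-entry details this model ignores. Function names are hypothetical.

```python
def shared_switch_entries(routers: int) -> int:
    """All routers on one switch: each router port holds an entry per peer."""
    return routers * (routers - 1)


def per_node_entries(gateways: int) -> int:
    """One join LS (or one direct LRP link) per gateway router: only the GR
    and the DR share each segment, so every segment contributes 2 entries."""
    return gateways * shared_switch_entries(2)


gateways = 1000
print(f"{shared_switch_entries(gateways + 1):,}")  # 1,001,000 (1000 GRs + 1 DR)
print(f"{per_node_entries(gateways):,}")           # 2,000
```

Splitting the switch and connecting routers directly both land in the linear case. They differ in how many datapaths, ports, and subnets they add, which is the trade-off being debated.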
The distributed router and the gateway router are connected by another logical switch, sometimes referred to as a "join" logical switch. (OVN logical routers may be connected to one another directly, without an intervening switch, but the OVN implementation only supports gateway logical routers that are connected to logical switches. Using a join logical switch also reduces the number of IP addresses needed on the distributed router.)
>
> Thanks,
> Han
Hello Han,

Can you please explain how dynamic IP-to-MAC resolution will work with this new option set? Say a packet is being forwarded from router2 towards the distributed router. The nexthop (reg0) is set to IP1, and we need to find the MAC address M1 to set as eth.dst.

+----------------+     +----------------+
|   l3gateway    |     |   l3gateway    |
|    router2     |     |    router3     |
+-------------+--+     +-+--------------+
         IP2,M2          IP3,M3
              |          |
           +--+----------+-----+
           |    join switch    |
           +---------+---------+
                     |
                  IP1,M1
             +-------+--------+
             |  distributed   |
             |     router     |
             +----------------+

M1 will obviously not be in the MAC_Binding table. On the hypervisor where the packet originated, router2's port and the distributed router's port are both locally present. So, does this result in a PACKET_IN to ovn-controller, with the resolution happening there?

How does the resolution of IP3-to-M3 happen on gateway router2? Will an ARP request packet be broadcast on the join switch in this case?
On Sat, May 16, 2020 at 12:13 PM Girish Moodalbail <gmood...@gmail.com> wrote:
> So, does this result in a PACKET_IN to ovn-controller, with the resolution happening there?

Yes, there will be a PACKET_IN, and then:
1. ovn-controller will generate the ARP request for IP1 and send a PACKET_OUT to OVS.
2. The ARP request will be delivered only to the distributed router pipeline, even though it is a broadcast. This is because OVN has special handling for ARP requests that target IPs of router ports. (Without that special handling, it would have been broadcast to all GRs.)
3. The distributed router pipeline should learn the IP2-M2 binding (through a PACKET_IN to ovn-controller) and, at the same time, send an ARP reply to router2.
4. The router2 pipeline will handle the ARP reply and learn the IP1-M1 binding (through a PACKET_IN to ovn-controller).
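The four steps above can be modeled as a toy simulation. It uses a simplified in-memory representation: each router port owns a dict standing in for its MAC_Binding entries, and "PACKET_IN to ovn-controller" is reduced to a plain function call. The class and function names are hypothetical and do not correspond to ovn-controller code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class RouterPort:
    name: str
    ip: str
    mac: str
    # Stand-in for the MAC_Binding entries this router has learned (IP -> MAC).
    mac_bindings: Dict[str, str] = field(default_factory=dict)


@dataclass
class JoinSwitch:
    ports: List[RouterPort]

    def deliver_arp_request(self, sender: RouterPort, target_ip: str) -> Optional[RouterPort]:
        # Step 2: the request targets a router-port IP, so it is delivered only
        # to the router owning that IP instead of being flooded to every port.
        for port in self.ports:
            if port is not sender and port.ip == target_ip:
                return port
        return None


def resolve(switch: JoinSwitch, src: RouterPort, next_hop_ip: str) -> Optional[str]:
    if next_hop_ip in src.mac_bindings:  # already learned: no PACKET_IN needed
        return src.mac_bindings[next_hop_ip]
    # Step 1: lookup miss -> PACKET_IN; ovn-controller emits the ARP request.
    target = switch.deliver_arp_request(src, next_hop_ip)
    if target is None:
        return None
    # Step 3: the target learns the sender's binding and replies.
    target.mac_bindings[src.ip] = src.mac
    # Step 4: the sender handles the reply and learns the target's binding.
    src.mac_bindings[target.ip] = target.mac
    return target.mac


dr = RouterPort("distributed-router", "IP1", "M1")
gr2 = RouterPort("router2", "IP2", "M2")
gr3 = RouterPort("router3", "IP3", "M3")
join = JoinSwitch([dr, gr2, gr3])

print(resolve(join, gr2, "IP1"))  # M1
print(gr2.mac_bindings, dr.mac_bindings, gr3.mac_bindings)  # {'IP1': 'M1'} {'IP2': 'M2'} {}
```

router3 learns nothing here because, under the step 2 handling, it never sees the request. Each GR ends up with one entry, and the DR with one entry per GR that talked to it.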
just a quick question below..
________________________________________
From: ovn-kub...@googlegroups.com <ovn-kub...@googlegroups.com> on behalf of Girish Moodalbail <gmood...@gmail.com>
Sent: Tuesday, May 19, 2020 11:09 PM
To: Han Zhou
Cc: Han Zhou; Dan Winship; ovs-discuss; ovn-kub...@googlegroups.com
Subject: Re: [ovs-discuss] [OVN] flow explosion in lr_in_arp_resolve table
Hello Han,
Please see in-line:
On Sat, May 16, 2020 at 11:17 PM Han Zhou <zho...@gmail.com> wrote:
<vi> This is probably obvious and I am missing it, but...
<vi> I see the lflow that directs the ARP request to the router port instead of broadcasting it. However,
<vi> we also add flows to broadcast self-originated (unsolicited?) ARP requests (we should
<vi> not see these for router IPs, I suppose). But given that we just match on the source
<vi> MAC address of the packet for such packets, does it differ from the ARP
<vi> request generated for a router IP?
thanks,
-venu
Note that the direction of ARP request is from Gateway Router to Distributed Router.
Regards,
~Girish
> How does the resolution of IP3-to-M3 happen on gateway router2? Will an ARP request packet be broadcast on the join switch in this case?
I think in the ovn-k8s use case, as you described before, this should not happen. However, if it does happen, it is similar to the steps above, except that in steps 2) and 3) the ARP request and reply will be sent between the chassis through a tunnel. If this happens between all pairs of GRs, then there will again be O(n^2) MAC_Binding entries.
I haven't tested the GR scenario yet, so I can't guarantee it works as expected. Please let me know if you see any problems. I will submit a formal patch with more test cases once it is confirmed in your environment.
Thanks,
Han
Regards,
~Girish
On Sat, May 16, 2020 at 10:25 AM Girish Moodalbail <gmood...@gmail.com> wrote:
On Sat, May 16, 2020 at 12:36 AM Han Zhou <zho...@gmail.com> wrote:
On Tue, May 5, 2020 at 11:57 AM Han Zhou <hz...@ovn.org> wrote:
Hi Girish,
Thanks,
Han
Regards,
~Girish
>
> Thanks,
> Han
________________________________________
From: ovn-kub...@googlegroups.com <ovn-kub...@googlegroups.com> on behalf of Han Zhou <zho...@gmail.com>
Sent: Thursday, May 21, 2020 2:00 PM
To: Venugopal Iyer; Dumitru Ceara
Cc: Girish Moodalbail; Han Zhou; Dan Winship; ovs-discuss; ovn-kub...@googlegroups.com
Subject: Re: [ovs-discuss] [OVN] flow explosion in lr_in_arp_resolve table
On Thu, May 21, 2020 at 10:33 AM Venugopal Iyer <venug...@nvidia.com> wrote:
Han,
just a quick question below..
Good catch! That seems to be why it is broadcast. I thought the feature only allowed GARPs to be broadcast, but it actually allows (G)ARPs, including regular ARPs generated by the LRs. It could be an easy fix to commit 32f5ebb062 ("ovn-northd: Limit ARP/ND broadcast domain whenever possible."), but I am not sure whether there are other concerns with doing that. @Dumitru Ceara (dce...@redhat.com), could you comment on whether we can restrict it to GARPs only?
On the other hand, in this use case, if there is any ARP from the distributed router to any of the GRs, then all the GRs should have learned the IP1-M1 binding and won't send ARPs for IP1 any more. That would not result in N x N MAC bindings, right? In the real use case, it may depend on which direction of traffic comes first. If it is always from external to K8s workloads first, then yes, it will eventually end up with N x N MAC bindings.
<vi> That's right. However, I am not sure why MAC bindings are learnt from
<vi> ARP requests unconditionally. I thought you update a binding if you already have
<vi> it in the table, but don't add it unless you need to. Linux has "arp_accept", which
<vi> allows adding it if needed, but by default it doesn't.
"
arp_accept - BOOLEAN
Define behavior for gratuitous ARP frames who's IP is not
already present in the ARP table:
0 - don't create new entries in the ARP table
1 - create new entries in the ARP table
"
<vi> Shouldn't we have a similar knob for learning via ARP requests, which should be
<vi> "false" by default?
thanks,
-venu
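Whether N x N bindings appear depends on traffic direction and on the learn-from-request knob being discussed. A small counting model can check this. It makes several hypothetical assumptions: one broadcast domain containing a distributed router (DR) and n gateway routers; GR ARP requests for the DR are flooded to every port (the behavior Venu points out); the target always learns from a request; and other routers that see the flood learn the sender only when `learn_from_requests` is true. It counts entries only and is not OVN's implementation.

```python
from typing import Dict, List, Set


def simulate(n_gateways: int, dr_arps_first: bool, learn_from_requests: bool) -> int:
    """Count learned bindings once every GR has resolved the DR's IP."""
    routers: List[str] = ["DR"] + [f"GR{i}" for i in range(n_gateways)]
    learned: Dict[str, Set[str]] = {r: set() for r in routers}

    def broadcast_request(sender: str, target: str) -> None:
        for r in routers:
            if r == sender:
                continue
            # The target always learns the sender; other routers that see the
            # flooded request learn it only if the knob allows.
            if r == target or learn_from_requests:
                learned[r].add(sender)
        learned[sender].add(target)  # the target's reply teaches the sender

    if dr_arps_first:
        broadcast_request("DR", "GR0")  # hypothetical: DR resolves a GR first

    for i in range(n_gateways):
        gr = f"GR{i}"
        if "DR" not in learned[gr]:  # ARP only on a cache miss
            broadcast_request(gr, "DR")

    return sum(len(peers) for peers in learned.values())


n = 1000
print(simulate(n, dr_arps_first=False, learn_from_requests=True))   # 1001000
print(simulate(n, dr_arps_first=True, learn_from_requests=True))    # 1001
print(simulate(n, dr_arps_first=False, learn_from_requests=False))  # 2000
```

Under this model, the GR-first order with unconditional learning gives n(n+1) entries, the quadratic growth Han describes. A DR-first order gives n + 1. Turning the knob off gives 2n regardless of order.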
I think that if you directly connect the GRs to the DR, you don't need to learn any ARP entries with PACKET_IN and can preprogram static entries. Each GR will have 1 entry for the DR, while the DR will have N entries for N nodes.
The real issue with ARP learning comes from the GR-----External side. You have to learn these entries, and from my conversation with Girish, it seems that every GR adds an entry for every ARP request it sees. So when 1 GR sends an ARP request to the external L2 network, every GR sees the request and adds an entry. I think GRs should only add ARP entries when:
- An ARP response is sent to the GR
- The GR receives a GARP broadcast and already has an entry in its cache for that IP (Girish mentioned this is similar to the Linux arp_accept behavior)
In addition, as Michael Cambria pointed out in our weekly meeting, these ARP cache entries should have expiry timers on them. If they are permanently learned, you will end up with a growing ARP table over time, and end up in the same place. We can probably just program the GR ARP flows with an idle_timeout and have the flow removed. What do you think?
Should I file a bugzilla outlining the above so we can have proper tracking?
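The expiry idea above can be illustrated with a minimal idle-timeout cache. In this sketch a lookup refreshes an entry, and an entry unused for `idle_timeout` seconds is dropped. That is similar in spirit to an OpenFlow flow's idle_timeout, but it is plain Python and not how OVN stores MAC_Binding rows. The class name, the injectable clock, and the example addresses are hypothetical; the clock is injectable so the behavior is easy to test.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class IdleExpiringArpCache:
    """IP -> MAC cache whose entries expire after `idle_timeout` seconds unused."""

    def __init__(self, idle_timeout: float, clock: Callable[[], float] = time.monotonic):
        self.idle_timeout = idle_timeout
        self.clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # ip -> (mac, last_used)

    def learn(self, ip: str, mac: str) -> None:
        self._entries[ip] = (mac, self.clock())

    def lookup(self, ip: str) -> Optional[str]:
        entry = self._entries.get(ip)
        if entry is None:
            return None
        mac, last_used = entry
        now = self.clock()
        if now - last_used > self.idle_timeout:
            del self._entries[ip]  # idle too long: forget it
            return None
        self._entries[ip] = (mac, now)  # a hit refreshes the idle timer
        return mac

    def expire(self) -> int:
        """Drop every idle entry; return how many were removed."""
        now = self.clock()
        stale = [ip for ip, (_, t) in self._entries.items() if now - t > self.idle_timeout]
        for ip in stale:
            del self._entries[ip]
        return len(stale)

    def __len__(self) -> int:
        return len(self._entries)


now = [0.0]
cache = IdleExpiringArpCache(idle_timeout=300, clock=lambda: now[0])
cache.learn("172.16.0.10", "0a:58:ac:10:00:0a")
now[0] = 200
print(cache.lookup("172.16.0.10"))  # 0a:58:ac:10:00:0a (hit refreshes the timer)
now[0] = 600
print(cache.lookup("172.16.0.10"))  # None: idle for 400 s > 300 s
```

The periodic `expire()` pass is what keeps the table from growing forever. The cost Han raises below comes from doing the equivalent for every learned binding in the datapath.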
________________________________________
From: ovn-kub...@googlegroups.com <ovn-kub...@googlegroups.com> on behalf of Han Zhou <zho...@gmail.com>
Sent: Thursday, May 21, 2020 4:42 PM
To: Tim Rozet
Cc: Venugopal Iyer; Dumitru Ceara; Girish Moodalbail; Han Zhou; Dan Winship; ovs-discuss; ovn-kub...@googlegroups.com; Michael Cambria
Subject: Re: [ovs-discuss] [OVN] flow explosion in lr_in_arp_resolve table
On Thu, May 21, 2020 at 2:35 PM Tim Rozet <tro...@redhat.com> wrote:
> I think that if you directly connect the GRs to the DR, you don't need to learn any ARP entries with PACKET_IN and can preprogram static entries. Each GR will have 1 entry for the DR, while the DR will have N entries for N nodes.
Hi Tim, as Girish mentioned, directly connecting the GRs to the DR requires N ports on the DR and a lot of small subnets, which is not desirable. Since OVN needs changes to support that anyway, we chose to solve the problem by avoiding the static ARP flows instead.
> The real issue with ARP learning comes from the GR-----External side. You have to learn these entries, and from my conversation with Girish, it seems that every GR adds an entry for every ARP request it sees. So when 1 GR sends an ARP request to the external L2 network, every GR sees the request and adds an entry. I think the behavior should be:
> GRs only add ARP entries when:
> 1. An ARP response is sent to the GR
> 2. The GR receives a GARP broadcast and already has an entry in its cache for that IP (Girish mentioned this is similar to the Linux arp_accept behavior)
For 2), it is expensive to do in OVN because OpenFlow doesn't support a match condition of the form "field1 == field2". That condition is required to check whether an incoming ARP request is a GARP, i.e. SPA == TPA. However, it is OK to support something similar to the Linux arp_accept configuration, but slightly different: in OVN we can make it configurable to allow or disable learning from all ARP requests to IPs not belonging to the router, including GARPs. Would that solve the problem here? (@Venugopal Iyer (venug...@nvidia.com) brought up the same thing about "arp_accept"; I hope this reply addresses that as well.)
<vi> I can't think of any side effects of this, so it seems fine to me. I believe Linux behaves that way w.r.t. ARP requests
<vi> anyway (assuming I am reading it right).
https://elixir.bootlin.com/linux/v5.7-rc6/source/net/ipv4/arp.c (L874)
thanks,
-venu
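Han's proposal can be expressed as a per-packet decision. This sketch uses a simplified ARP packet record and a hypothetical `learn_from_all_requests` knob. It shows the GARP test (SPA == TPA) that Han notes is hard to express as an OpenFlow match. It also shows the simpler rule he proposes instead: replies are learned, requests aimed at one of the router's own IPs are learned, and any other request (GARPs included) is learned only when the knob is enabled. Opcode values follow RFC 826. This is a model of the discussion, not OVN code.

```python
from dataclasses import dataclass
from typing import Set

ARP_REQUEST = 1  # opcodes defined in RFC 826
ARP_REPLY = 2


@dataclass(frozen=True)
class ArpPacket:
    op: int
    spa: str  # sender protocol address
    sha: str  # sender hardware address
    tpa: str  # target protocol address


def is_gratuitous(pkt: ArpPacket) -> bool:
    # A GARP announces the sender's own address: SPA == TPA. Comparing two
    # packet fields like this is what a plain OpenFlow match cannot express.
    return pkt.op == ARP_REQUEST and pkt.spa == pkt.tpa


def should_learn(pkt: ArpPacket, router_ips: Set[str], learn_from_all_requests: bool) -> bool:
    if pkt.op == ARP_REPLY:
        return True
    if pkt.op != ARP_REQUEST:
        return False
    if pkt.tpa in router_ips:
        # The request is for this router: learning the sender lets it send
        # return traffic without issuing its own ARP.
        return True
    # Anything else, GARPs included (their TPA is the sender's own IP, never
    # ours), is learned only when enabled. No SPA == TPA check is needed.
    return learn_from_all_requests
```

The useful property is in the last branch. A GARP's target is the announcing host's own address, so "TPA is not one of my IPs" already covers it. That check compares a packet field against constants, which OpenFlow can match.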
> In addition, as Michael Cambria pointed out in our weekly meeting, these ARP cache entries should have expiry timers on them. If they are permanently learned, you will end up with a growing ARP table over time, and end up in the same place. We can probably just program the GR ARP flows with an idle_timeout and have the flow removed. What do you think?
This has been discussed before, and it is also mentioned in TODO.rst. However, it has not been addressed because no good solution has been found yet. It can be done, but it would be expensive, and the gains are not worth the costs. Learning from ARP requests only conditionally partially reduces the need for ARP expiration. It is true that it could still be a problem in some scenarios, but so far we haven't heard of any use case with a hard dependency on it.
> Should I file a bugzilla outlining the above so we can have proper tracking?
I think Bugzilla is outside the control of the OVN community, so please feel free to file it or not ;)
Thanks,
Han
Thanks,
Tim Rozet
Red Hat CTO Networking Team
On Thu, May 21, 2020 at 8:45 PM Venugopal Iyer <venug...@nvidia.com> wrote:
> Hi, Han:
On Thu, May 21, 2020 at 2:35 PM Tim Rozet <tro...@redhat.com> wrote:
I think that if you directly connect GR to DR you don't need to learn any ARP with packet_in and you can preprogram the static entries. Each GR will have 1 entry for the DR, while the DR will have N number of entries for N nodes.
Hi Tim, as mentioned by Girish, directly connecting GRs to DR requires N ports on the DR and also requires a lot of small subnets, which is not desirable. And since changes are needed anyway in OVN to support that, we moved forward with the current approach of avoiding the static ARP flows to solve the problem instead of directly connecting GRs to DR.
Why is that not desirable? They are all private subnets with /30 (if using ipv4). If IPv6, it's even less of a concern from an addressing perspective.
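The static-entry arithmetic in the direct-connection proposal can be compared with a shared join switch using a deliberately simplified Python model. It assumes one static entry per (router, peer router port) pair and counts nothing else; real lr_in_arp_resolve flow counts depend on the ovn-northd version and on dual-stack addressing, and the model ignores the extra patch ports and datapaths raised later in the thread. The function names are hypothetical.

```python
def direct_connect_entries(n_nodes: int) -> int:
    # Direct GR-to-DR layout: each GR holds 1 entry for the DR, and the DR
    # holds 1 entry per GR.
    return n_nodes + n_nodes


def shared_join_switch_entries(n_nodes: int) -> int:
    # n GRs plus the DR on one join LS; if every router is pre-programmed
    # with an entry for every other router port on that switch, the total
    # is routers * (routers - 1), i.e. O(n^2).
    routers = n_nodes + 1
    return routers * (routers - 1)


for n in (10, 100, 1000):
    print(f"{n:>5} nodes: direct={direct_connect_entries(n):>7} "
          f"shared={shared_join_switch_entries(n):>9}")
```

At 1,000 nodes the model gives 2,000 entries versus roughly one million, which is the linear-versus-quadratic gap the thread is about.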
The real issue with ARP learning comes from the GR-----External. You have to learn these, and from my conversation with Girish it seems like every GR is adding an entry on every ARP request it sees. This means 1 GR sends ARP request to external L2 network and every GR sees the ARP request and adds an entry. I think the behavior should be:
GRs only add ARP entries when:
1. An ARP Response is sent to it
2. The GR receives a GARP broadcast, and already has an entry in its cache for that IP (Girish mentioned this is similar to Linux arp_accept behavior)
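Tim's two rules, together with the SPA == TPA test mentioned in the reply, can be spelled out in Python to make the decision explicit. This is a sketch, not OVN's logical-flow pipeline: field names follow RFC 826 (SHA, SPA, TPA), while the function names, the dict used as the cache, and the addresses are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Set

ARP_REQUEST = 1  # RFC 826 opcode for a request
ARP_REPLY = 2    # RFC 826 opcode for a reply


@dataclass
class ArpPacket:
    op: int   # ARP_REQUEST or ARP_REPLY
    sha: str  # sender hardware (MAC) address
    spa: str  # sender protocol (IP) address
    tpa: str  # target protocol (IP) address


def is_garp(pkt: ArpPacket) -> bool:
    # A gratuitous ARP announces the sender's own IP, so SPA == TPA. This
    # field-to-field comparison is what an OpenFlow match cannot express.
    return pkt.spa == pkt.tpa


def should_learn(pkt: ArpPacket, router_ips: Set[str],
                 cache: Dict[str, str]) -> bool:
    """Tim's proposed learning rule for a gateway router."""
    if pkt.op == ARP_REPLY and pkt.tpa in router_ips:
        return True   # Rule 1: a reply addressed to this router.
    if is_garp(pkt) and pkt.spa in cache:
        return True   # Rule 2: a GARP for an IP already in the cache.
    return False


gr_ips = {"192.0.2.10"}                      # this GR's external IP
cache = {"192.0.2.50": "02:00:00:00:00:50"}  # previously learned VIP
# Another GR resolving the upstream gateway: not a reply, not a GARP.
peer_request = ArpPacket(ARP_REQUEST, "02:00:00:00:00:11", "192.0.2.11", "192.0.2.1")
# The VIP's new owner announcing itself after failover.
vip_garp = ArpPacket(ARP_REQUEST, "02:00:00:00:00:51", "192.0.2.50", "192.0.2.50")
print(should_learn(peer_request, gr_ips, cache))
print(should_learn(vip_garp, gr_ips, cache))
```

The peer request is the case that today makes every GR add an entry; under these rules it is ignored, while the VIP announcement is still learned.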
For 2), it is expensive to do in OVN because OpenFlow doesn't support a match condition of "field1 == field2", which is required to check if the incoming ARP request is a GARP, i.e. SPA == TPA. However, it is ok to support something similar to the Linux arp_accept configuration, but slightly different. In OVN we can configure it to allow/disable learning from all ARP requests to IPs not belonging to the router, including GARPs. Would that solve the problem here? (@Venugopal Iyer brought up the same thing about "arp_accept". I hope this reply addresses that as well)
I think the issue there is if you have an external device which is using a VIP and it fails over, it will usually send a GARP to inform of the MAC change. In this case, if you ignore GARP, what happens? You won't send another ARP because OVN programs the ARP entry forever and doesn't expire it, right? So you won't learn the new MAC and will keep sending packets to a dead MAC?
Regards,
~Girish
________________________________________
From: ovn-kub...@googlegroups.com <ovn-kub...@googlegroups.com> on behalf of Han Zhou <zho...@gmail.com>
Sent: Thursday, May 21, 2020 7:43 PM
To: Girish Moodalbail
Cc: Tim Rozet; Venugopal Iyer; Dumitru Ceara; Han Zhou; Dan Winship; ovs-discuss; ovn-kub...@googlegroups.com; Michael Cambria
Subject: Re: [ovs-discuss] [OVN] flow explosion in lr_in_arp_resolve table
On Thu, May 21, 2020 at 7:12 PM Girish Moodalbail <gmood...@gmail.com> wrote:
On Thu, May 21, 2020 at 6:58 PM Tim Rozet <tro...@redhat.com> wrote:
On Thu, May 21, 2020 at 8:45 PM Venugopal Iyer <venug...@nvidia.com> wrote:
Hi, Han:
________________________________________
From: ovn-kub...@googlegroups.com <ovn-kub...@googlegroups.com> on behalf of Han Zhou <zho...@gmail.com>
Sent: Thursday, May 21, 2020 4:42 PM
To: Tim Rozet
Cc: Venugopal Iyer; Dumitru Ceara; Girish Moodalbail; Han Zhou; Dan Winship; ovs-discuss; ovn-kub...@googlegroups.com; Michael Cambria
Subject: Re: [ovs-discuss] [OVN] flow explosion in lr_in_arp_resolve table
On Thu, May 21, 2020 at 2:35 PM Tim Rozet <tro...@redhat.com> wrote:
I think that if you directly connect GR to DR you don't need to learn any ARP with packet_in and you can preprogram the static entries. Each GR will have 1 entry for the DR, while the DR will have N number of entries for N nodes.
Hi Tim, as mentioned by Girish, directly connecting GRs to DR requires N ports on the DR and also requires a lot of small subnets, which is not desirable. And since changes are needed anyway in OVN to support that, we moved forward with the current approach of avoiding the static ARP flows to solve the problem instead of directly connecting GRs to DR.
Why is that not desirable? They are all private subnets with /30 (if using ipv4). If IPv6, it's even less of a concern from an addressing perspective.
It is not just about the subnet management but also about the additional logical flows created by the two ways of connecting the DR and GRs.
Say we have a fix that efficiently allows one to connect 1000s of GRs using a single logical switch; would you rather use that instead of 1000 patch cables connecting each GR to the DR? It is not only the issue of subnet management for those 1000 point-to-point connections; those 1000 patch ports are also local to each of the chassis, so we need to understand, in such a topology, how many additional logical flows get created in the SB and how many OpenFlow flows get created on each of the 1000 chassis for those 1000 patch cables.
The real issue with ARP learning comes from the GR-----External. You have to learn these, and from my conversation with Girish it seems like every GR is adding an entry on every ARP request it sees. This means 1 GR sends ARP request to external L2 network and every GR sees the ARP request and adds an entry. I think the behavior should be:
GRs only add ARP entries when:
1. An ARP Response is sent to it
2. The GR receives a GARP broadcast, and already has an entry in its cache for that IP (Girish mentioned this is similar to Linux arp_accept behavior)
For 2), it is expensive to do in OVN because OpenFlow doesn't support a match condition of "field1 == field2", which is required to check if the incoming ARP request is a GARP, i.e. SPA == TPA. However, it is ok to support something similar to the Linux arp_accept configuration, but slightly different. In OVN we can configure it to allow/disable learning from all ARP requests to IPs not belonging to the router, including GARPs. Would that solve the problem here? (@Venugopal Iyer brought up the same thing about "arp_accept". I hope this reply addresses that as well)
I think the issue there is if you have an external device which is using a VIP and it fails over, it will usually send a GARP to inform of the MAC change. In this case, if you ignore GARP, what happens? You won't send another ARP because OVN programs the ARP entry forever and doesn't expire it, right? So you won't learn the new MAC and will keep sending packets to a dead MAC?
I think we will have to support GARP, otherwise VIPs will not work, as Tim mentions. If we do learn from GARP, then as long as the GARP itself is not originated by any of the 1000s of GRs, we should be fine.
Right, I didn't think this through. I thought it was just a configurable option, but it seems we will always need to support GARP, so the option becomes useless.
However, there is no easy way to achieve "do learn from GARP as long as the GARP itself is not originated by any of the 1000s of GRs", because OVN doesn't have knowledge of the use case. The requirement is: don't learn neighbours from ARP requests if the ARP's source belongs to OVN routers. Firstly, this requirement is hard to understand for users outside of the particular ovn-k8s setup. Secondly, implementing it already requires O(n^2) flows just to bypass the OVN-owned router IPs, which defeats the purpose of fixing the original problem. We will have to figure out a clean way.
<vi> I suppose the use of GARP as a reply v/s request is not very clear; [1], Section 3 seems to offer a concise summary of this. If the application sends GARP as
<vi> a reply we are covered, but the question is if the GARP is a request (which is allowed) then what our response should be. Tim is right, we can't ignore
<vi> the request (more so, since aging is not supported currently), however "arp_accept" ignores the request for creating a new cache entry, not updating
<vi> an existing one (see last para below)
[2]
arp_accept - BOOLEAN
Define behavior for gratuitous ARP frames who's IP is not
already present in the ARP table:
0 - don't create new entries in the ARP table
1 - create new entries in the ARP table
Both replies and requests type gratuitous arp will trigger the
ARP table to be updated, if this setting is on.
If the ARP table already contains the IP address of the
gratuitous arp frame, the arp table will be updated regardless
if this setting is on or off.
<vi> if we lookup and get a hit, we should still process the GARP; only if we don't have a hit, we should ignore (instead of
<vi> creating an entry). BTW, do we update today? if I understand the use of reg9[2] / REGBIT_LOOKUP_NEIGHBOR_RESULT (assuming lookup_arp
<vi> returns 1 if entry exists), I am not sure it does? maybe I missed it ..
thanks,
-venu
[1] https://www.ietf.org/rfc/rfc5227.txt
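Venu's reading of arp_accept, where a gratuitous ARP always updates an existing entry but only creates a new one when arp_accept is on, fits in one small function. This Python sketch models only that GARP path as described in the quoted documentation, not the kernel's full neighbour code; the function name, its return strings, and the addresses are hypothetical.

```python
from typing import Dict


def apply_garp(cache: Dict[str, str], spa: str, sha: str, arp_accept: bool) -> str:
    """Apply a gratuitous ARP (sender IP spa, sender MAC sha) to an IP -> MAC
    cache and report the action taken: "updated", "created" or "ignored"."""
    if spa in cache:
        # Existing entry: updated regardless of arp_accept.
        cache[spa] = sha
        return "updated"
    if arp_accept:
        # New IP, and creation is allowed.
        cache[spa] = sha
        return "created"
    # New IP with arp_accept off: no entry is created.
    return "ignored"


cache = {"192.0.2.50": "02:00:00:00:00:50"}
# VIP failover: the cached IP moves to a new MAC even with arp_accept off.
print(apply_garp(cache, "192.0.2.50", "02:00:00:00:00:51", arp_accept=False))
# An IP this router has never resolved is not added.
print(apply_garp(cache, "192.0.2.77", "02:00:00:00:00:77", arp_accept=False))
```

With arp_accept off, the VIP failover case Tim raised still works because the IP is already cached, while GARPs for unrelated addresses add nothing.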
For the internal join switch this is easier. I think allowing LRs to broadcast only GARP requests and ARP requests for unknown IPs (all others will be unicast) will solve the problem. But for the external logical switch, I have no idea. Can it be handled from the operator's perspective, by initiating a ping from the external network to the GR, so that the GR learns the external GW IP-MAC binding before sending broadcasts to all neighbours?
Regards,
~Girish
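The join-switch suggestion above amounts to a forwarding decision per ARP request: flood GARPs and requests for unknown IPs, and deliver everything else only to the port that owns the target IP. The Python sketch below assumes the switch knows which router port owns each IP; the port names, addresses, and function name are hypothetical, and real OVN would express this as logical flows rather than a function.

```python
from typing import Dict, List


def arp_request_egress(spa: str, tpa: str, in_port: str,
                       ip_owner: Dict[str, str], ports: List[str]) -> List[str]:
    """Return the ports an ARP request arriving on in_port is sent to."""
    is_garp = spa == tpa
    if not is_garp and tpa in ip_owner:
        # Known target: unicast to its owner, so no other router sees it.
        return [ip_owner[tpa]]
    # GARP or unknown target: flood to every port except the ingress one.
    return [p for p in ports if p != in_port]


ports = ["dr-port", "gr1-port", "gr2-port", "gr3-port"]
ip_owner = {"100.64.0.1": "dr-port", "100.64.0.2": "gr1-port",
            "100.64.0.3": "gr2-port", "100.64.0.4": "gr3-port"}
# GR1 resolving the DR: delivered to the DR only.
print(arp_request_egress("100.64.0.2", "100.64.0.1", "gr1-port", ip_owner, ports))
# GR1 sending a GARP: flooded to the other ports.
print(arp_request_egress("100.64.0.2", "100.64.0.2", "gr1-port", ip_owner, ports))
```

Because a request for the DR's address reaches only the DR, the other GRs never see it and have nothing to learn from it.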
--
You received this message because you are subscribed to the Google Groups "ovn-kubernetes" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ovn-kubernete...@googlegroups.com<mailto:ovn-kubernete...@googlegroups.com>.
To view this discussion on the web visit https://groups.google.com/d/msgid/ovn-kubernetes/CADtzDCmKJ4JpZ-HfKhmb18LU3HmqAiAvUmFGnRrPcDF5M7u0yw%40mail.gmail.com<https://groups.google.com/d/msgid/ovn-kubernetes/CADtzDCmKJ4JpZ-HfKhmb18LU3HmqAiAvUmFGnRrPcDF5M7u0yw%40mail.gmail.com?utm_medium=email&utm_source=footer>.
Sorry, Han, for messing up the indents; it looks like my Outlook browser client is either not set
correctly, or doesn't work well.
Let me try from the app and see if it is any better.
From: ovn-kub...@googlegroups.com <ovn-kub...@googlegroups.com>
On Behalf Of Han Zhou
Sent: Friday, May 22, 2020 1:51 PM
To: Venugopal Iyer <venug...@nvidia.com>
Cc: Girish Moodalbail <gmood...@gmail.com>; Tim Rozet <tro...@redhat.com>; Dumitru Ceara <dce...@redhat.com>; Han Zhou <hz...@ovn.org>; Dan Winship <danwi...@redhat.com>; ovs-discuss <ovs-d...@openvswitch.org>; ovn-kub...@googlegroups.com;
Michael Cambria <mcam...@redhat.com>
Subject: Re: [ovs-discuss] [OVN] flow explosion in lr_in_arp_resolve table
[vi> ] Yes, I believe that should work.
Do you think this works?
Regarding your question on lookup_arp(), today it looks up the exact IP-MAC binding, just to avoid unnecessary updates if the pair already exists and has not changed.
thanks,
-venu
Thanks,
Han
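Han's answer explains why lookup_arp() alone does not give arp_accept-style behavior: checking for an exact IP-MAC pair cannot distinguish a known IP whose MAC changed from an IP that was never seen. The Python stand-ins below are hypothetical, not OVN actions; `lookup_pair` mirrors the exact-pair check described above and `lookup_ip` is the IP-only check an update-only policy would need.

```python
from typing import Dict


def lookup_pair(bindings: Dict[str, str], ip: str, mac: str) -> bool:
    # True only if this exact IP-MAC pair is already stored (skip the update).
    return bindings.get(ip) == mac


def lookup_ip(bindings: Dict[str, str], ip: str) -> bool:
    # True if any binding exists for ip, whatever its MAC.
    return ip in bindings


bindings = {"192.0.2.50": "02:00:00:00:00:50"}
moved = ("192.0.2.50", "02:00:00:00:00:51")   # known IP, new MAC (VIP failover)
unseen = ("192.0.2.77", "02:00:00:00:00:77")  # IP never learned

# The pair check misses in both cases, so it cannot separate "update" from "create".
print(lookup_pair(bindings, *moved), lookup_pair(bindings, *unseen))
# The IP-only check tells them apart.
print(lookup_ip(bindings, moved[0]), lookup_ip(bindings, unseen[0]))
```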
On 5/28/20 12:48 PM, Daniel Alvarez Sanchez wrote:
> Hi all
>
> Sorry for top posting. I want to thank you all for the discussion and
> give also some feedback from OpenStack perspective which is affected
> by the problem described here.
>
> In OpenStack, it's kind of common to have a shared external network
> (logical switch with a localnet port) across many tenants. Each tenant
> user may create their own router where their instances will be
> connected to access the external network.
>
> In such a scenario, we are hitting the issue described here. In
> particular, in our tests we exercise 3K VIFs (each with 1 FIP) spanning
> 300 LSes; each LS is connected to an LR (i.e. 300 LRs) and that router
> is connected to the public LS. This is creating a huge problem in terms
> of performance and tons of events due to the MAC_Binding entries
> generated as a consequence of the GARPs sent for the floating IPs.
>
Just as an addition to this, GARPs wouldn't be the only reason why all
routers would learn the MAC_Binding. Even if we weren't sending
GARPs for the FIPs, when a VM that's behind a FIP sends traffic to
the outside, the router will generate an ARP request for the next hop
using the FIP-IP and FIP-MAC. This will be broadcast to all routers
connected to the public LS and will trigger them to learn the
FIP-IP:FIP-MAC binding.
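The point above implies that, on a shared public LS, every FIP binding can end up learned by every other router. A quick Python estimate under that worst-case assumption, using the figures from Daniel's test (3K FIPs, 300 LRs), shows the scale; the function name is hypothetical and the result is an upper bound, not a measurement.

```python
def worst_case_mac_bindings(n_routers: int, n_fips: int) -> int:
    """Each FIP is owned by one router; assume each of the other
    n_routers - 1 routers learns its FIP-IP:FIP-MAC pair once."""
    return n_fips * (n_routers - 1)


print(worst_case_mac_bindings(n_routers=300, n_fips=3000))
```

That is 3000 × 299, roughly 900K MAC_Binding rows in the worst case, growing with the product of routers and FIPs.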
Sorry for the delay, Han, a quick question below:
From: ovn-kub...@googlegroups.com <ovn-kub...@googlegroups.com>
On Behalf Of Han Zhou
Sent: Wednesday, June 3, 2020 4:27 PM
To: Girish Moodalbail <gmood...@gmail.com>
Cc: Tim Rozet <tro...@redhat.com>; Dumitru Ceara <dce...@redhat.com>; Daniel Alvarez Sanchez <dalv...@redhat.com>; Dan Winship <danwi...@redhat.com>; ovn-kub...@googlegroups.com; ovs-discuss <ovs-d...@openvswitch.org>; Michael Cambria <mcam...@redhat.com>;
Venugopal Iyer <venug...@nvidia.com>
Subject: Re: [ovs-discuss] [OVN] flow explosion in lr_in_arp_resolve table
Hi Girish, yes, that's what we concluded in last OVN meeting, but sorry that I forgot to update here.
On Wed, Jun 3, 2020 at 3:32 PM Girish Moodalbail <gmood...@gmail.com> wrote:
>
> Hello all,
>
> To kind of proceed with the proposed fixes, with minimal impact, is the following a reasonable approach?
>
> Add an option, namely dynamic_neigh_routes={true|false}, for a gateway router. With this option enabled, the nextHop IP's MAC will be learned through an ARP request on the physical network. The ARP request will be flooded on the L2 broadcast domain (for both join switch and external switch).
>
The RFC patch fulfils this purpose: https://patchwork.ozlabs.org/project/openvswitch/patch/1589614395-99499-1-...@ovn.org/
I am working on the formal patch.
> Add an option, namely learn_from_arp_request={true|false}, for a gateway router. The option is interpreted as below:
> "true" - learn the MAC/IP binding and add a new MAC_Binding entry (default behavior)
> "false" - if there is a MAC_Binding for that IP and the MAC is different, then update that MAC/IP binding. The external entity might be trying to advertise the new MAC for that IP. (If we don't do this, then we will never learn External VIP to MAC changes)
>
> (Irrespective of whether learn_from_arp_request is true or false, always do this: if the TPA is on the router, add a new entry (it means the remote wants to communicate with this node, so it makes sense to learn the remote as well))
>
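The proposed learn_from_arp_request semantics, including the rule that requests targeting the router are always learned, can be written as one decision function. This is a Python sketch of the proposal as described in the thread, not of the patch being worked on; the function name, its parameters, and the addresses are hypothetical.

```python
from typing import Dict, Set


def on_arp_request(cache: Dict[str, str], router_ips: Set[str],
                   spa: str, sha: str, tpa: str,
                   learn_from_arp_request: bool) -> bool:
    """Return True if the IP -> MAC cache was changed by this ARP request."""
    if tpa in router_ips or learn_from_arp_request:
        # Request aimed at this router (always learned), or option is "true".
        changed = cache.get(spa) != sha
        cache[spa] = sha
        return changed
    if spa in cache and cache[spa] != sha:
        # Option "false": only refresh a known binding whose MAC changed.
        cache[spa] = sha
        return True
    # Option "false" and the sender is unknown: do not create an entry.
    return False


gr_ips = {"192.0.2.10"}
cache = {"192.0.2.50": "02:00:00:00:00:50"}
# Another GR resolving the upstream gateway 192.0.2.1: not learned.
print(on_arp_request(cache, gr_ips, "192.0.2.11", "02:00:00:00:00:11", "192.0.2.1", False))
# The VIP announced from a new MAC: the existing binding is updated.
print(on_arp_request(cache, gr_ips, "192.0.2.50", "02:00:00:00:00:51", "192.0.2.50", False))
```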
I am working on this as well, but delayed a little. I hope to have something this week.
[vi> ] Just wanted to check if this should be learn_From_unsolicit_arp (unsolicited ARP request or reply) instead of learn_from_arp_request? This is just to protect against potential rogue usage of GARP replies flooding the MAC bindings.
Thanks,
-venu
Hi Girish, Venu,
I sent an RFC patch series for the solution discussed. Could you give it a try when you get the chance?