MoM of today's OCP SONiC call 06/18/2019.
Topics discussed
- Error Handling - BRCM (Santhosh)
Review (Q&A)
We had a great discussion with a lot of input from the community; some of it is captured here. Feel free to add any missing comments.
- How does the framework support multiple CRUD failures?
[Ben]: See below
- Do you provide a knob to switch off the error handling feature? Is a knob necessary?
[Ben]: No knob is necessary. The error handling proposal is a framework that is available for a) implementation of error reporting in SWSS on a feature-by-feature basis and b) application processing of such errors. Both a) and b) are implementation choices that can be made on a feature-by-feature basis. And if an application does not want to process a supported error, then it can just ignore it.
- Can applications get out-of-order notifications from the feedback loop, and how should they handle that case? Example: a user does create/delete/create; is the error feedback expected to arrive in order?
[Ben]: The specific comment was that the key/values used to refer to APP_DB (or other) in an ERROR_DB report may not be specific enough to distinguish between different error events. The example given (by Nikos) was a route add-withdraw-add case - since the APP_DB table entry may be the same between the 2 adds, then, if there's an error report, how does the application (FRR in this case) know which of the adds failed? We will come back on this point.
- What is the design decision behind a new Error DB? Why can't we merge error attributes into APP DB?
[Ben]: We thought about both options, and decided that the ERROR_DB gave a bit more flexibility and avoided changing existing application tables. It was not a clear decision, but we see no reason to move away from it.
- What is the mechanism to synchronize route CRUD between APP DB and the new Error DB?
[Ben]: See above
- Is the new Error DB a mirror of APP DB?
[Ben]: Not really - but each error table entry points to a corresponding entry in another table (usually APP_DB)
- The current design mentions an approach to stop propagating failed/error routes to the neighbors. This may not be right as per RFC; the routes should propagate even though they failed due to some policy. (Nikos)
[Ben]: This topic went beyond scope of the framework (#1 above) and into the BGP doc (#2). We will set up a separate offline discussion for this.
Overall feedback - The feedback loop is necessary to address SAI fatal errors. However, the community requested that the design decouple the feedback loop as much as possible so that applications have the freedom to react to and handle errors in their own way.
[Ben]: That's exactly how it's set up today.
One option suggested - The framework should be more generic and should accommodate an opaque error context for the applications.
[Ben]: This is a different topic - see above ("The specific comment was that the key/values ....")
Xin will extend an offline discussion on this topic, stay tuned.
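The route add-withdraw-add concern discussed above can be illustrated with a small sketch. This is not the proposed ERROR_DB schema: plain dictionaries stand in for the databases, and the key format, the `TABLE_FULL` return code, and the sequence-number context are all hypothetical. It only shows why an error entry keyed solely by the APP_DB key cannot tell the two adds apart, and how an opaque per-operation context supplied by the application could.

```python
# Illustrative only: a plain dict stands in for ERROR_DB.
# The key format, the "TABLE_FULL" code and the context field are assumptions,
# not part of the error handling proposal.

def report_error(error_db, table, key, rc, context=None):
    """Record a failure; the error entry points back at the (table, key) it refers to."""
    error_key = f"{table}:{key}" if context is None else f"{table}:{key}:{context}"
    error_db[error_key] = {"rc": rc, "ref_table": table, "ref_key": key}
    return error_key

# The application issues add -> withdraw -> add for the same prefix.
# Each operation carries a hypothetical application-side sequence number.
ops = [("add", 1), ("withdraw", 2), ("add", 3)]
prefix = "10.0.0.0/24"

# Case 1: error keyed only by the APP_DB key. Suppose the first add fails.
error_db = {}
k = report_error(error_db, "ROUTE_TABLE", prefix, "TABLE_FULL")
# Both adds map to the same error key, so the report is ambiguous.
matching_adds = [seq for op, seq in ops if op == "add"]
ambiguous = len(matching_adds) > 1

# Case 2: the application attaches an opaque context (its own sequence number).
error_db_ctx = {}
k_ctx = report_error(error_db_ctx, "ROUTE_TABLE", prefix, "TABLE_FULL", context=1)
failed_seq = int(k_ctx.rsplit(":", 1)[1])  # the application recovers which add failed
```

Whether such a context would live in the key, in a field of the entry, or somewhere else is exactly the open point to be taken up in the offline discussion.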
Announcements
- SONiC Release 201908 tracking page - Xin, can you post the link?
- Action Item for community - Sign up for PR reviews
MoM of today's OCP SONiC call 06/04/2019.
Topics discussed
- STP/PVST - Sandeep (BRCM)
Q & A
- Can the STP feature be disabled at compile time? BRCM will explore this (compile-time/run-time options to disable/enable the STP/PVST feature).
- Is warm reboot not supported for PVST? The community requested that more details be added to the design.
- Multiple questions on the design decision not to program STP states into the kernel: 1) With the current STP design, the STP states are not populated in the kernel, so the ASIC and kernel will be out of sync; what is the downside? 2) If a port/VLAN is not blocking in the kernel but is blocked in the ASIC, what is the behavior of ARP/ping/OSPF in this scenario? BRCM should document the scenarios.
- The community requested that the ASIC and kernel out-of-sync scenarios be documented - AI: BRCM
- Should there be no drops if HW says forwarding? Yes.
- Is there a mechanism to program the states into the kernel? BRCM to explore it.
- If a trap is configured on a port that is blocked, does the packet come to the CPU? Yes, based on the trap configuration.
- When a port is blocked in HW, which packets should still be sent? HW shouldn't block L2 packets/LACP exchanges but should drop L3 packets.
- Can COPP be programmed to trap to CPU? Yes.
- HLD on NAT - Kiran Kella (BRCM)
Q & A
- Does it support payload/embedded headers (ALGs - application level gateways)? Not right now.
- Discussion will continue in the next sub group meeting.
Announcements
- Next sub group meetings: HLD on NAT, sFlow
MoM of today's OCP SONiC SUB GROUP call 05/28/2019.
Topics discussed:
- Status on MLAG Design discussions - Nephos Team
Q & A
- Does this solution address L3 MLAG alone? Both L3 and L2. It seems the L2 MLAG HLD needs some updates.
- Does MCLAG support multicast? The Nephos team will update the HLD with all the use cases and missing pieces.
- When is the next meeting to discuss MCLAG? June 11th
- The community requested an updated MCLAG HLD from the Nephos team before June 11th.
Action Items/Announcements
MoM of today's OCP SONiC call 05/21/2019.
Topics discussed
- L2 - FDB/MAC enhancements - Anil (Broadcom)
Q & A
- Is FDB aging per device? Yes.
- Does FDB aging support per-second granularity? Yes.
- Can MAC aging be supported per port and per VLAN? Anil will add support to the proposal.
- Can the design restrict the warning logs for the VLAN range feature? Broadcom will consider this in the proposal [aggregated log, etc.].
- Does this feature need SAI support from vendors? No new SAI attributes are needed; Broadcom will list the SAI APIs currently used for this feature.
- How are VLAN range updates implemented? The VLAN range is consolidated at config_db and applied down to the hardware in a single shot, with no deletes and adds.
- Do we have an FDB type in the FDB entry? Yes [static vs dynamic], and it will be displayed in show commands.
- How are FDB flushes on topo/STP events optimized? This is outside of the ASIC; in the case of Broadcom, flushes are quick.
- How does a system-wide FDB flush work? It should be handled by SAI, by going over all the ports and VLANs; this is vendor specific.
- The community asked for MAC aging and MAC move scale numbers. Broadcom will add them to the proposal.
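The VLAN range answer above (consolidate at config_db, then apply to hardware in a single shot) can be sketched roughly as follows. This is only an illustration of the consolidation idea, not Broadcom's implementation: `consolidate_vlans` and `apply_to_hardware` are hypothetical names, and the "hardware" call just returns the operations it would issue.

```python
def consolidate_vlans(vlan_ids):
    """Collapse VLAN IDs into sorted, inclusive (start, end) ranges."""
    ranges = []
    for vid in sorted(set(vlan_ids)):
        if ranges and vid == ranges[-1][1] + 1:
            # Extend the current range when the ID is contiguous with it.
            ranges[-1] = (ranges[-1][0], vid)
        else:
            ranges.append((vid, vid))
    return ranges

def apply_to_hardware(ranges):
    """Hypothetical stand-in for one bulk programming pass over the ranges."""
    return [f"create vlan {s}-{e}" if s != e else f"create vlan {s}"
            for s, e in ranges]

# Pending VLAN IDs, unordered and with a duplicate, as they might accumulate in config.
pending = [10, 11, 12, 20, 30, 31, 12]
ranges = consolidate_vlans(pending)   # [(10, 12), (20, 20), (30, 31)]
ops = apply_to_hardware(ranges)       # three operations instead of six per-VLAN ones
```

Consolidating first is also what makes an aggregated log message per range, rather than one warning per VLAN, straightforward to produce.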
- BFD - Sumit Agarwal (Broadcom)
Q & A
- Discussed BFD implementation Phase 1 and Phase 2.
- In BFD Phase 1: BFD is part of the BGP docker.
- In BFD Phase 2: BFD will be implemented in hardware.
- Can SONiC users turn it off if they don't want it? Yes, at compile time, but the community suggested not running it by default and providing a CLI to enable it.
- How does BFD work with warm reboots? 1) For a planned warm reboot, users can update the BFD timers upfront. 2) For an unplanned warm reboot, the BFD session will time out before BGP times out.
- Can BFD timeouts be configured/controlled on remote BGP peers? Question from Nikos. Needs more discussion.
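The warm reboot answer above comes down to comparing timers. A minimal sketch, assuming asynchronous-mode BFD as described in RFC 5880, where the detection time is the remote detect multiplier times the larger of the local required min RX interval and the remote desired min TX interval. The timer values, the 30-second downtime, and the 180-second BGP hold time below are assumed example numbers, not SONiC defaults.

```python
def bfd_detection_time_ms(remote_detect_mult, local_required_min_rx_ms,
                          remote_desired_min_tx_ms):
    """Asynchronous-mode detection time: multiplier times the negotiated interval."""
    return remote_detect_mult * max(local_required_min_rx_ms,
                                    remote_desired_min_tx_ms)

def session_survives(detection_ms, expected_downtime_ms):
    """True if the session would not time out during the expected downtime."""
    return detection_ms > expected_downtime_ms

# Assumed example values.
downtime_ms = 30_000        # hypothetical control-plane downtime during warm reboot
bgp_hold_time_ms = 180_000  # hypothetical BGP hold time

# Unplanned case: aggressive timers stay in place, so BFD expires long before BGP.
aggressive = bfd_detection_time_ms(3, 300, 300)            # 900 ms
bfd_expires_first = aggressive < bgp_hold_time_ms          # True

# Planned case: timers are relaxed upfront so the session outlives the reboot.
relaxed = bfd_detection_time_ms(4, 10_000, 10_000)         # 40000 ms
planned_ok = session_survives(relaxed, downtime_ms)        # True
unplanned_ok = session_survives(aggressive, downtime_ms)   # False
```

Because the detection time depends on the peer's multiplier and TX interval, relaxing timers for a planned reboot involves the remote side as well, which relates to the open question about controlling timeouts on remote BGP peers.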
Announcements
- More design reviews are lined up for Aug 2019.
- Provide feedback on PRs
- Watch out for the bi-weekly meetings on design proposals and reviews.
MoM of today's OCP SONiC call 05/07/2019.
Topics discussed
- SONiC 201908 release Planning - 05/07/2019
Q & A
- Need code review support for multi-db performance improvements - MSFT & AVIZ Networks
- What is the scope of Error handling mechanism work by BRCM - It covers SAI error surfacing and handling
- What is the scope of configuration validations - Open for design; the current scope is to use the syslog mechanism to propagate config errors.
- What is the VRF feature planned in SONiC? It is VRF lite support, not MPLS.
- Do we have a plan for multi-tenancy VPN with the VRF feature? No, that would be handled separately.
- When is the VRF lite design review - Expected 5/21
- What is the ETA for dynamic breakout - Xin will work with LNKD
- For dynamic breakout, is it possible to get ASIC vendor ETA ? Xin will talk to ASIC vendors [an ETA early July would help to test it]
- Do we have a list of platform APIs? Refer to the PMON APIs.
- How can companies earn OCP credits? Check the OCP website for how to get credits, such as for software contributions.
- Is the sub-port feature the same as sub-interface? Yes.
- What kind of features run on sub-port? No HLD yet, Jipan will come back with HLD on this
- Can we have a small description of sub-port? Xin will work with Alibaba.
- When is the SAI proposal on sFlow? Dell is working on the SAI proposal for sFlow and will send it for design review.
- What does the SONiC side use for sFlow? HSflowD; it's an open source package and the licensing needs to be checked [need to explore the licensing part, work with Xin].
- Build improvements - experimental, BRCM? A design review is needed on the changes. Ben will provide a design review.
- What is the Mgmt framework? The goal is to easily manage the SONiC switch [models, serialization, unified CLI, gNMI].
- What is BFD for FRR used for? For BGP failures.
- Does BFD-FRR require SAI support? No, the current work does not use any SAI BFD APIs; they will be used in the next iteration.
- Is there official SONiC release support on ONL? No, SONiC has a tight roadmap for the next 8 months.
Announcements
- OCP events - www.opencompute.org/events/upcoming events - road show Taiwan, Beijing, India
- SONiC next meeting 05/21/2019
- The SONiC team will use the alternate Tuesdays for workgroup meetings [Test workgroups, MLAG/L2 workgroups, etc.]
APR release
- Redis performance - out of the APR release
- CLI improvement - moved to the next release
- Any ETA for APR release stabilization? Need to estimate.