Hi @Zhenggen Xu, @Praveen Chaudhary,
We are currently validating DPB in the 202111 branch. During validation we filed the issues listed below. These are critical for us in qualifying DPB for 202111. Could you please investigate them? We can also hold a port breakout workgroup meeting if you want more details on these bugs.
https://github.com/Azure/sonic-buildimage/issues/10131 - [Functional] [DPB] [MSN4410] | incorrect port configuration after breakout
https://github.com/Azure/sonic-buildimage/issues/10191 - [DPB] | breakout cleanup fail with error "Key '{COUNTERS_PORT_NAME_MAP}' unavailable"
https://github.com/Azure/sonic-buildimage/issues/10005 - [DPB] [portsorch] Orchagent missing DELETE update from CONFIG_DB during DPB
https://github.com/Azure/sonic-buildimage/issues/9802 - [DPB] [lldp] LLDP does not have logic to gracefully handle port deletion
With Regards,
Sudharsan
Hi @Zhenggen Xu, @Praveen Chaudhary, @Yanzhao Zhang,
Can we have an update here? I believe we should have a DPB sub-group meeting to go over the issues.
With Regards,
Sudharsan
During the Yang subgroup meeting, @Praveen Chaudhary agreed to take a look at this and drive a discussion in the DPB sub-group meeting.
Praveen,
I have updated each of the bugs @Sudharsan mentioned. Please take a look.
You have correctly identified that some of these bugs involve parts of SONiC other than the sonic-utilities DPB code. However, if possible, could a DPB owner please review the analysis for each issue, confirm whether the issue lies within another SONiC module, and, if so, involve that module's owner?
We are only seeing these issues during DPB, and the flows that are broken are generally dynamic port removal flows, which we only see in place during DPB operations.
In that regard, if we can confidently identify the code responsible for each issue, other feature owners in SONiC can be assured that due care was taken before the issues were assigned to them. A second opinion from someone deeply familiar with the expected DPB flows would help a lot here.
Please let me know if there is anything I can do to help. We are also checking the reproducibility of each of these issues internally and will report back soon, but I believe you should have enough information to move forward for now. If you feel someone else in the community would be better placed to work on an issue, please feel free to involve them. I only wish to see us arrive at some consensus for each of these issues.
Thanks,
Alexander
From: Praveen Chaudhary <pchau...@linkedin.com>
Sent: Monday, March 14, 2022 2:23 PM
To: Yanzhao Zhang <yanz...@microsoft.com>; Sudharsan Dhamal Gopalarathnam <sudha...@nvidia.com>; sonic-breako...@googlegroups.com; Zhenggen Xu <z...@linkedin.com>
Cc: Liat Grozovik <li...@nvidia.com>; Alexander Allen <ara...@nvidia.com>; Moshe Moshe <mmo...@nvidia.com>; Dror Prital <dr...@nvidia.com>; NBU-Contact-Guohan Lu (EXTERNAL) <gu...@microsoft.com>; Gaurav Dawra <gda...@linkedin.com>
Subject: Re: Reg DPB issues in 202111
Hi @Praveen Chaudhary, @Zhenggen Xu, @Yanzhao Zhang,
Please let us know the next steps. Is it possible to have a sub-group meeting to discuss?
With Regards,
Sudharsan
@Zhenggen Xu @Praveen Chaudhary,
Can you please help find someone at LinkedIn who is familiar with DPB to take a look at those bugs?
@Sudharsan Dhamal Gopalarathnam, according to SONiC | Workgroups (azure.github.io), the DPB workgroup is done for now. We can discuss these bugs during the next SONiC issue triage meeting.
Simplicity is the ultimate Sophistication – Leonardo Da Vinci
When is the SONiC issue triage meeting? I can join for the discussions.
The LinkedIn team is currently shorthanded for fixing the ongoing community DPB issues, which were likely caused by new use cases, newly onboarded features, missing Yang models, etc. We can help identify the root cause, but we would need help from the community to fix and retest (especially since some of the issues are hard to reproduce or require special HW).
I updated all 4 tickets and am also attaching the summary here:
https://github.com/Azure/sonic-buildimage/issues/10131 - the analysis is correct. The uniqueness of a port as determined by speed needs to be improved, perhaps by also including the lane numbers.
https://github.com/Azure/sonic-buildimage/issues/10191 - seems to be a timing issue with fast-reboot. We may need to wait for the table to be ready before continuing the DPB process.
https://github.com/Azure/sonic-buildimage/issues/10005 - the DEL op does appear to be missing from swss.rec. We need redis-cli monitor data to see what could be wrong at the Redis level.
https://github.com/Azure/sonic-buildimage/issues/9802 - seems to be a timing issue. Is it also only a temporary error message? Some code refactoring is needed to solve it.
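As a rough illustration of the "wait for the table to be ready" idea for issue 10191, the sketch below polls a readiness check until it succeeds or a timeout expires. It is not taken from the sonic-utilities code: the function name `wait_for_key`, its parameters, and the `key_exists` probe are all hypothetical, and the actual lookup of COUNTERS_PORT_NAME_MAP is deliberately left to the caller.

```python
import time
from typing import Callable


def wait_for_key(
    key_exists: Callable[[], bool],
    timeout_s: float = 30.0,
    interval_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll key_exists() until it returns True or timeout_s elapses.

    key_exists is a caller-supplied (hypothetical) probe, e.g. one that
    checks whether the port name map has been populated yet.
    Returns True if the key appeared in time, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if key_exists():
            return True
        if clock() >= deadline:
            return False
        # Back off briefly before probing again.
        sleep(interval_s)
```

The breakout cleanup step would call something like this first and proceed only when it returns True. On timeout it could report a clear error instead of failing on the missing key.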
3/30 is the next issue triage meeting.