Reg DPB issues in 202111


Sudharsan Dhamal Gopalarathnam

Mar 10, 2022, 1:04:30 AM
to sonic-breako...@googlegroups.com, Zhenggen Xu, Praveen Chaudhary, Liat Grozovik, Alexander Allen, Moshe Moshe, Dror Prital, Yanzhao Zhang, NBU-Contact-Guohan Lu (EXTERNAL)

Hi @Zhenggen Xu, @Praveen Chaudhary,

 

We are currently validating DPB in the 202111 branch. During validation we filed the issues listed below. They are critical for us in qualifying DPB for 202111. Could you please investigate them? We can also have a port breakout workgroup meeting if you want more details on these bugs.

 

https://github.com/Azure/sonic-buildimage/issues/10131 - [Functional] [DPB] [MSN4410] | incorrect port configuration after breakout

 

https://github.com/Azure/sonic-buildimage/issues/10191 - [DPB] | breakout cleanup fail with error "Key '{COUNTERS_PORT_NAME_MAP}' unavailable"

 

https://github.com/Azure/sonic-buildimage/issues/10005 - [DPB] [portsorch] Orchagent missing DELETE update from CONFIG_DB during DPB

 

https://github.com/Azure/sonic-buildimage/issues/9802 - [DPB] [lldp] LLDP does not have logic to gracefully handle port deletion

 

With Regards,

Sudharsan

Sudharsan Dhamal Gopalarathnam

Mar 14, 2022, 1:21:55 PM
to sonic-breako...@googlegroups.com, Zhenggen Xu, Praveen Chaudhary, Yanzhao Zhang, Liat Grozovik, Alexander Allen, Moshe Moshe, Dror Prital, NBU-Contact-Guohan Lu (EXTERNAL)

Hi @Zhenggen Xu, @Praveen Chaudhary, @Yanzhao Zhang,

Can we have an update here? I believe we should have a DPB sub-group meeting to go over the issues.

 

With Regards,

Sudharsan

Yanzhao Zhang

Mar 14, 2022, 1:24:18 PM
to Sudharsan Dhamal Gopalarathnam, sonic-breako...@googlegroups.com, Zhenggen Xu, Praveen Chaudhary, Liat Grozovik, Alexander Allen, Moshe Moshe, Dror Prital, Guohan Lu

During the Yang subgroup meeting, @Praveen Chaudhary agreed to take a look at this and drive a discussion in the DPB sub-group meeting.

 

Yours sincerely

Yanzhao Zhang | He/Him | SONiC PM

Email: yanz...@microsoft.com

Praveen Chaudhary

Mar 14, 2022, 2:23:25 PM
to Yanzhao Zhang, Sudharsan Dhamal Gopalarathnam, sonic-breako...@googlegroups.com, Zhenggen Xu, Liat Grozovik, Alexander Allen, Moshe Moshe, Dror Prital, Guohan Lu, Gaurav Dawra

That is not correct. I did not agree to look at any of these issues in the Yang WG.

All of them are missing analysis that clearly connects them to DPB. DPB is like 'config reload': it may impact many components, and if a bug exists in another component, it will be exposed during DPB.

For example: DPB pushed the correct CONFIG_DB changes, yet the port is still not deleted. PortOrch is common code, and it should be analysed why exactly the port is not deleted, i.e. whether it is due to dependencies or some other bug.

https://github.com/Azure/sonic-buildimage/issues/10005 - [DPB] [portsorch] Orchagent missing DELETE update from CONFIG_DB during DPB

Thanks,

Regards
Praveen

From: Yanzhao Zhang <yanz...@microsoft.com>
Sent: Monday, March 14, 2022 10:24 AM
To: Sudharsan Dhamal Gopalarathnam <sudha...@nvidia.com>; sonic-breako...@googlegroups.com <sonic-breako...@googlegroups.com>; Zhenggen Xu <z...@linkedin.com>; Praveen Chaudhary <pchau...@linkedin.com>
Cc: Liat Grozovik <li...@nvidia.com>; Alexander Allen <ara...@nvidia.com>; Moshe Moshe <mmo...@nvidia.com>; Dror Prital <dr...@nvidia.com>; Guohan Lu <gu...@microsoft.com>
Subject: RE: Reg DPB issues in 202111
 

Yanzhao Zhang

Mar 14, 2022, 5:43:10 PM
to Praveen Chaudhary, Sudharsan Dhamal Gopalarathnam, sonic-breako...@googlegroups.com, Zhenggen Xu, Liat Grozovik, Alexander Allen, Moshe Moshe, Dror Prital, Guohan Lu, Gaurav Dawra

@Praveen Chaudhary @Zhenggen Xu,

 

What is your suggestion for the next step?

 

Yours sincerely

Yanzhao Zhang | He/Him | SONiC PM

Email: yanz...@microsoft.com

 

Alexander Allen

Mar 17, 2022, 7:19:40 PM
to Praveen Chaudhary, Yanzhao Zhang, Sudharsan Dhamal Gopalarathnam, sonic-breako...@googlegroups.com, Zhenggen Xu, Liat Grozovik, Moshe Moshe, Dror Prital, NBU-Contact-Guohan Lu (EXTERNAL), Gaurav Dawra

Praveen,

 

I have updated each of the bugs @Sudharsan mentioned. Please take a look.

 

You have correctly identified that some of these bugs involve parts of SONiC other than the sonic-utilities DPB code. However, if possible, could a DPB owner please review the analysis for these issues, confirm whether each issue lies within another SONiC module, and, if so, involve that module's owner?

 

We are only seeing these issues during DPB, and the broken flows are generally dynamic port removal flows, which we only see exercised during DPB operations.

 

In that regard, if we can confidently identify the code responsible for these issues, other SONiC feature owners can be assured that due care was taken before the issues were assigned to them. A second opinion from someone deeply familiar with the expected DPB flows would help a lot here.

 

Please let me know if there is anything I can do to help. We are also checking the reproducibility of each of these issues internally and will report back on that soon, but I believe you have enough information to move forward for now. If you feel someone else in the community would be better placed to work on an issue, please feel free to involve them. I only wish to see us arrive at some consensus for each of these issues.

 

Thanks,

Alexander

 

From: Praveen Chaudhary <pchau...@linkedin.com>
Sent: Monday, March 14, 2022 2:23 PM
To: Yanzhao Zhang <yanz...@microsoft.com>; Sudharsan Dhamal Gopalarathnam <sudha...@nvidia.com>; sonic-breako...@googlegroups.com; Zhenggen Xu <z...@linkedin.com>
Cc: Liat Grozovik <li...@nvidia.com>; Alexander Allen <ara...@nvidia.com>; Moshe Moshe <mmo...@nvidia.com>; Dror Prital <dr...@nvidia.com>; NBU-Contact-Guohan Lu (EXTERNAL) <gu...@microsoft.com>; Gaurav Dawra <gda...@linkedin.com>
Subject: Re: Reg DPB issues in 202111

 


Sudharsan Dhamal Gopalarathnam

Mar 21, 2022, 3:19:45 PM
to Alexander Allen, Praveen Chaudhary, Yanzhao Zhang, sonic-breako...@googlegroups.com, Zhenggen Xu, Liat Grozovik, Moshe Moshe, Dror Prital, NBU-Contact-Guohan Lu (EXTERNAL), Gaurav Dawra

Hi @Praveen Chaudhary, @Zhenggen Xu, @Yanzhao Zhang,

 

Please let us know the next steps. Is it possible to have a sub-group meeting to discuss?

 

With Regards,

Sudharsan

Yanzhao Zhang

Mar 21, 2022, 8:24:56 PM
to Sudharsan Dhamal Gopalarathnam, Alexander Allen, Praveen Chaudhary, sonic-breako...@googlegroups.com, Zhenggen Xu, Liat Grozovik, Moshe Moshe, Dror Prital, Guohan Lu, Gaurav Dawra

@Zhenggen Xu @Praveen Chaudhary,

 

Can you please help find someone at LinkedIn who is familiar with DPB to take a look at these bugs?

 

@Sudharsan Dhamal Gopalarathnam, according to SONiC | Workgroups (azure.github.io), the DPB workgroup is done for now. We can discuss these bugs during the next SONiC issue triage meeting.

 

Yours sincerely

Yanzhao Zhang | He/Him | SONiC PM

Email: yanz...@microsoft.com

Simplicity is the ultimate Sophistication – Leonardo Da Vinci

Zhenggen Xu

Mar 22, 2022, 2:50:27 AM
to Yanzhao Zhang, Sudharsan Dhamal Gopalarathnam, Alexander Allen, Praveen Chaudhary, sonic-breako...@googlegroups.com, Liat Grozovik, Moshe Moshe, Dror Prital, Guohan Lu, Gaurav Dawra

When is the SONiC issue triage meeting? I can join the discussion.

 

Currently the LinkedIn team is short-handed for fixing the ongoing community DPB issues, which were likely caused by new use cases, newly onboarded features, missing Yang models, etc. We can help identify the root cause but would need help from the community to fix and retest (especially since some of the issues are hard to reproduce or require special HW).

 

I updated all 4 tickets and attach the summary here:

https://github.com/Azure/sonic-buildimage/issues/10131: the analysis is correct. The uniqueness of the port by speed needs to be improved, for example by also including the lane number.

https://github.com/Azure/sonic-buildimage/issues/10191 seems to be a timing issue with fast-reboot; the DPB process may need to wait for the table to be ready before continuing.

https://github.com/Azure/sonic-buildimage/issues/10005: the DEL op does seem to be missing from swss.rec. We need redis-cli monitor data to see what could be wrong at the Redis level.

https://github.com/Azure/sonic-buildimage/issues/9802 seems to be a timing issue. Is it also just a temporary error message? Some code refactoring is needed to solve that.
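
To make two of the suggestions above more concrete (waiting for a table to be ready, as for issue 10191, and checking redis-cli monitor data for the missing DEL, as for issue 10005), here is a rough Python sketch. It is not taken from the DPB code. The helper names `wait_until` and `port_deletes` are hypothetical, the readiness check is passed in as a callable rather than tied to any real SONiC API, and the `PORT|` key prefix and the assumption that the delete shows up as a plain DEL/HDEL in the monitor output would both need to be confirmed on a real system.

```python
import shlex
import time
from typing import Callable, Iterable, List, Tuple


def wait_until(ready: Callable[[], bool], timeout_s: float = 30.0,
               interval_s: float = 0.5) -> bool:
    """Poll ready() until it returns True or timeout_s elapses.

    `ready` is a caller-supplied check (hypothetical here), e.g. one that
    tests whether the COUNTERS_PORT_NAME_MAP key exists yet.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if ready():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


def port_deletes(monitor_lines: Iterable[str],
                 key_prefix: str = "PORT|") -> List[Tuple[str, str]]:
    """Return (command, key) for each DEL/HDEL on a key starting with key_prefix.

    Expects `redis-cli monitor` style lines:
        <timestamp> [<db> <client>] "<command>" "<arg>" ...
    Lines that do not match this shape are skipped.
    """
    found = []
    for line in monitor_lines:
        try:
            tokens = shlex.split(line)
        except ValueError:
            continue  # unbalanced quotes, not a monitor record
        # tokens: timestamp, "[db", "client]", command, first argument, ...
        if len(tokens) < 5:
            continue
        command, key = tokens[3].upper(), tokens[4]
        if command in ("DEL", "HDEL") and key.startswith(key_prefix):
            found.append((command, key))
    return found
```

With the redis-py client, the readiness check could be something like `lambda: bool(r.exists("COUNTERS_PORT_NAME_MAP"))` for a connection `r` to the counters database, and `port_deletes` could be fed the lines of a saved `redis-cli monitor` capture taken while the breakout command runs. An empty result for the port being removed would match what swss.rec shows.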

Yanzhao Zhang

Mar 22, 2022, 2:51:58 AM
to Zhenggen Xu, Sudharsan Dhamal Gopalarathnam, Alexander Allen, Praveen Chaudhary, sonic-breako...@googlegroups.com, Liat Grozovik, Moshe Moshe, Dror Prital, Guohan Lu, Gaurav Dawra

3/30 is the next issue triage meeting.

 

Yours sincerely

Yanzhao Zhang | He/Him | SONiC PM

Email: yanz...@microsoft.com

Simplicity is the ultimate Sophistication – Leonardo Da Vinci
