DASH Workgroup Community Update 9/10/2025

Kristina Moore

Sep 11, 2025, 7:17:02 PM
to sonic...@googlegroups.com, sonic-o...@lists.sonicfoundation.dev, eddie.ruan, guizhao.lh, Yanfeng, Yuezhou, Murthy Vakkalagadda, Arun, Doddapaneni, Krishna, Moopath velayudhan, Mukesh, Narayanan, Swaminathan, Selvarajan, Arunachalam, Srinivasan, Vijay, Sundara Murthy Gurunathan, Thyamagundalu, Sanjay, Veerappan, Senthilnathan, Venkatesh Srinivasan, Marc Meunier, Chid, Harrish SJ, Madhu, Israel Meilik, Jai Kumar, Lisa Nguyen, Mohammad Hanif, Sandeep Balani, Suresh Satapati, Kannan Selvaraj, grboudre, Hon Lon Lum (honllum), janapal, nissampa, Sid Singhal, vijamoha, Abdel Baig (abdbaig), Anand Srinivasan, Anant Kishor Sharma, Andy Fingerhut, Andy Fingerhut, Ansel Li, Bhagyashree Hanumaiah (bhanumai), Bhavani, Carol Gal (cgal), Don Ewald (doewald), Dylan Peterson (dypeters), Franko Zamora Chacon (fzamora), Guy Duryee (guduryee), Jack Sexton (jacsexto), Joanna Li (joannali), Julia Tamayo (juledesm), Keerthy Erode Mohanasundaram (keerodem), Ken Parker (kentp), Krithika Srinivas (kritsrin), murali Venkateshaiah (muraliv), Perumal Venkatesh (pevenkat), Praveen Bhagwatula (pbhagwat), Ramesh Raghupathy (ram), Rob Murphy (robermur), Satish Ananthanarayana (sanantha), Selvam Ramanathan (selraman), Sudhir Kayamkulangara, TJ Barker (tjbarker), Venkat Sukavanam (vsukavan), Wenchung Wang (vincwang), Yue Gao (yuega2), Joseph White, Mark Sanders, Phaniraj Vattem, Senthil Kumar Ganesa, Shawn Dube, Venkatesan Mahalinga, Faisal Khan, Farhan Tariq J, Mohammad Qasim Farooqi, Saad Mazhar GMail, Zafir, Zarif Hafeez GMail, Ahmed Guetari, Chris McDonald, Heath Parrott, Joel Moses, John Gruber, Tony Torzillo, Ziv Saar, Ravindran Suresh, jame...@geico.com, Amith, Erum Frahim, Ghani, Ixim, Kwangsuk, Lin Songnan, Mahendar Byra, Meyappan K Gmail, Nitesh, Piotr P, Ravi, RS4681, Venkat External, Yoyo, Chatterjee, Deb, Cristian Dumitrescu, Dan Peng, Limaye, Namrata, Naren Mididaddi, Paul Kappler, Rao, Radhika, Shan Greer, Shweta Shrivastava, Singhai, Anjali, Stephen Doyle, Subramanian, Maheswari, 
Dean Lee, Alberto Villarreal, Alex Bortok, Chris Sommers, Manodipto Ghose, Mircea Dan Gheorghe, Nitesh Jha, Swaminathan Balasubramanian, Vinod Kumar, Alexander Cheskis, Mike Woster, Kishore Atreya, Sonny Mei, Brad House, Christian Kuhtz, John Evans, Rawal, Amol (Nokia - US/Westford), Abdul Rouff, Alan Lo, E Blatt, Eilon Greenstein, Gagan Punathil Ellath, Idan Hac, Liat Grozovik, Marian Pritsak, Matty Kadosh, Nikhil Sandugula, Oleksandr Ivantsiv, Paul Cummins, Shay Schlafman, Venice Hawa, Wei Bai, Yohad Tor, Yuval Degani, Madhu, Jamal Hadi Salim, Andriy Kokhan, Leonid Khedyk, Mykola Zhuravel, Tetyana Zubova, Michael Offel, Philipp Keydel, VolodymyrX Mytnyk, Aditya Sahni, Mahaboob Gani, Pranay Sahay, Sairam Rangaswamy, Satya Valli Rama, Sohan Prabhu (TATA CONSULTANCY SERVICES LTD), Syed Mehemood, Richard Wu, arham...@xflowresearch.com, Kanza Latif, Muhammad Ali, rimsh...@xflowresearch.com, Bud Grise, Ezra Y, John C Carney, Ted Weatherford, Vincent L

Hello DASH Open Source Community, thank you for your time this week!

We still have a potential contribution up for grabs. It would be great to have a volunteer submit a PR against the dash-sonic-hld (in the SONiC repo here) adding commands to show ENI counters and DPU global metrics. Please submit a PR if you are interested!

 

In Summary, we discussed HA Feature Development and Integration including integration of HAMgrD, test case and automation progress, and recent fixes contributed by teams from Cisco, NVIDIA, and Keysight. 

 

We are close to completing the initial sonic-mgmt tests for PL NSG, Trusted VNI, Floating NIC, and Return Path ECMP (with Lawrence Lee from the SONiC team). Once they are done, we will move forward with flow offload re-testing.

 

The team discussed inconsistencies in configuration files across different vendor images, focusing on fields like Region ID and Trusted VNI, and agreed on steps to standardize configurations and improve test coverage. Lastly, we will open a work item for supporting multiple VNIs per DPU and multi-tenancy, as there is a gap between current schema capabilities and production requirements.

 

Also, I am looking to leverage the Linux Foundation lists more.  If you could please take the time to enter your info into the list here, I can initiate deletion of the sonic-dash@googlegroups list we used when we began the project.    

For Complete Details, please see the “Full DASH Community Notes” near the end of this communication. 

 

Follow-up tasks:

 

  • Review of In-Progress PRs: review all "yellow in progress" PRs and check their current status, especially older and lower priority items. (Kristina)
  • Smart Switch Container Offloader HLD PR1976: Review the Container Offloader PR that is awaiting review from Gang and Prabhat. (Gang, Prabhat)
  • HA Scope Config Table Merge: Merge the HA scope config table for automation, currently waiting on Bing. (Bing)
  • Track VNET Map PA Validation: Continue working on the track VNET map PA validation task. (Lawrence)
  • Demo of HA Functionality: Schedule and conduct a detailed demo or call next week to review HA functionality and DPU replacement strategies with Michal's team. (Jing)
  • SONiC State Table Handling Design: Finalize the design for handling the SONiC state table after DPU reboots, including whether to reprogram state or delete state to trigger re-push, and provide an update after POC. (Prabhat, SONiC team)
  • Configuration Differences Ticket: Open a GitHub ticket documenting configuration differences (diffs) between vendor images, assign to Prabhat, specify OS version, provide full config, and list the diffs observed. (Mircea)
  • Sample Config for Management Test Alignment: Provide a sample config used in POC to the SONiC team to ensure SONiC management tests are aligned with actual usage. (Michal)
  • Management Test Coverage for Config Diffs: Check if the specific diff fields identified by Mircea are covered in the SONiC management test and add coverage if missing. (Prabhat)
  • SONiC Schema Multi-Tenancy Support: Open a ticket to track updating the SONiC schema to support both Floating and VM modes (multi-tenancy) per DPU for long-term support, and assign to appropriate owners (e.g., Prince, Lawrence) when available. (Prabhat)
  • Clarification of Inbound/Outbound Lookup Implementation: Send an email listing questions and request all vendors to clarify how inbound/outbound lookup is implemented in their firmware for both normal and Floating modes. (Michal, Prabhat)
  • VTAP Feature Documentation Presentation: Coordinate with @Pranjal Shrivastava to present the VTAP feature documentation in the next month. (Kristina)
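
For the Configuration Differences Ticket above, here is a minimal sketch of how the diffs between two vendor configs could be listed before filing. It is only an illustration: the field names (region_id, trusted_vni, sip), the values, and the two sample configs are hypothetical and do not come from any vendor image.

```python
import json


def flatten(node, prefix=""):
    """Flatten nested dicts/lists into a {dotted.path: value} mapping."""
    flat = {}
    if isinstance(node, dict):
        for key, value in node.items():
            flat.update(flatten(value, f"{prefix}{key}."))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            flat.update(flatten(value, f"{prefix}{index}."))
    else:
        flat[prefix.rstrip(".")] = node
    return flat


def config_diffs(config_a, config_b):
    """Return {path: (value_in_a, value_in_b)} for every field that differs."""
    flat_a, flat_b = flatten(config_a), flatten(config_b)
    missing = "<missing>"
    return {
        path: (flat_a.get(path, missing), flat_b.get(path, missing))
        for path in sorted(flat_a.keys() | flat_b.keys())
        if flat_a.get(path, missing) != flat_b.get(path, missing)
    }


# Hypothetical configs standing in for two vendor images.
vendor_a = json.loads(
    '{"appliance": {"region_id": 100, "trusted_vni": 5000, "sip": "10.0.0.1"}}'
)
vendor_b = json.loads('{"appliance": {"trusted_vni": 5001, "sip": "10.0.0.1"}}')

diffs = config_diffs(vendor_a, vendor_b)
for path, (value_a, value_b) in diffs.items():
    print(f"{path}: vendor A = {value_a!r}, vendor B = {value_b!r}")
```

The per-field output could be pasted into the ticket alongside the OS version and the full config.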

 

 

In Summary (full list below), since the last Community call we have:

25 PRs Completed (+ 13)

9 in To Do (+/- 0)

6 in Draft (+/- 0)

39 in Progress (+/- 0)

9 Awaiting Review (+/- 0)

 

Just a reminder that we encourage and invite Community members to present to the Community (test runs or progress, new scenarios, etc.); just reply to let me know, or open a PR in the repo.

The DASH channel link is here to subscribe / access WG content (and click the bell to receive notifications). 

 

Thank you for your time/contributions – see you on 9/17/2025

 

Meeting Title:  SONiC-DASH-Workgroup Community Meeting #159

Attendees (14):

DASH Group to join: https://groups.google.com/g/sonic-dash

Linux Foundation list: https://lists.sonicfoundation.dev/g/SONiC-Dash

 

Gagan Punathil Ellath - Nvidia

Mukesh MV - AMD

Ramesh Raghupathy - Cisco

Veerappan, Senthilnathan - AMD

Kristina Moore - MSFT

murali Venkateshaiah - Cisco

Shrivastava, Shweta - Intel

Vivek Reddy Karri - Nvidia

Michal Zygmunt - MSFT

Oleksandr Ivantsiv - Nvidia

Swami Balasubramanian - Keysight

Mircea Dan Gheorghe - Keysight

Prabhat Aravind - MSFT

Ted Weatherford - XSightLabs

 

  

 

Full DASH Community Notes 😊

 

  • HA Feature Development and Integration Progress: the team discussed ongoing high availability (HA) feature development, including integration of HAMgrD, test case progress, and recent fixes contributed by teams from Cisco, NVIDIA, Keysight, and others.
    • HAMgrD Integration: the integration test for HAMgrD is ongoing, with peering established between two DPUs and control plane connectivity in place. The team is now focusing on BFD signaling and fast failover testing, with private link flows being created and further testing planned.
    • HA Test Cases and Automation: the HA team is working on HA test cases, including the new HA scope config table and UHD support. The HA scope config table for automation was awaiting a merge by Bing, and additional cleanup logic for HA database entries was implemented by Fred from Cisco.
    • Recent HA-Related Fixes: include correcting the HA state setting, updating the DPU scope state table name, and implementing cleanup logic for HA actors to ensure proper database entry deletion.
    • HA Orchestration Workflow Review: Michal from Azure SDN described a review of the HA orchestration workflow, noting a need to clarify active/passive card orchestration and DPU replacement strategies. Jing is preparing a detailed demo for the team, and discussions are ongoing about supporting both active-active and active-standby modes.
  • PR and Feature Status Review: provided an overview of completed, in-progress, and awaiting-review pull requests (PRs), highlighting key features such as VNET mapping improvements, DPU graceful shutdown, ZMQ reliability fixes, and platform-specific updates from multiple vendors.  While our team is not metrics-driven, there has been significant progress with over 25 PRs completed since the last meeting and 40 done in the days since our last call.
    • PL NSG Tests Progress: Lawrence Lee from the SONiC team continues to work on tests to cover DASH Trusted VNI, Floating NIC, and Return Path ECMP.
    • VNET Mapping and Validation: The team is working on tracking VNET mapping usage of PA validation entries, with a feature to improve visibility and management. Mukesh at AMD implemented a fix to ensure VNET mapping entries are properly associated with port-map IDs in the data path.
    • Graceful Shutdown and ZMQ Fixes: A PR for module graceful shutdown support is under review, with Ramesh addressing comments. ZMQ reliability was improved by setting the ZMQ_IMMEDIATE flag and implementing lazy binding to prevent data loss during DPU restarts.
    • State Table and Versioning Discussion: Prabhat and Michal discussed the management of the GET state table, especially after DPU reboots. Two designs are under consideration: either SONiC reprograms the state after reboot or deletes the state to signal the control plane to re-push it. This is pending further internal discussion after the POC.
    • Platform-Specific Updates: NVIDIA contributed a fix for platform PCIe checks to support both light and dark modes, while Cisco and AMD teams addressed various HA and SmartSwitch-related issues. Keysight added SmartSwitch UHD config support for HA testing.
    • Reporting and Metrics: Kristina is developing a Power BI dashboard for tracking GitHub activity across SONiC repositories, discussed her data filtering approach, and solicited feedback.
      • Filter: on relevant keywords such as DASH, Smart Switch, and DPU
  • Configuration Consistency Across Vendors: discussed inconsistencies in configuration files across different vendor images, focusing on fields like Region ID and Trusted VNI, and agreed on steps to standardize configurations and improve test coverage.
    • Vendor-Specific Configuration Differences: Mircea raised issues where the same JSON configuration does not load across different vendor images due to fields like Region ID and Trusted VNI. Oleksandr clarified that Region ID is not used in SONiC and can be ignored, while Trusted VNI should be consistent.
    • Action Items for Configuration Diffs: The group agreed that Mircea would log tickets with configuration diffs, assigning them to Prabhat, who will verify if the differences are covered in management tests. If not, test coverage will be extended.
    • Sample Configurations and Test Alignment: Michal and Prabhat discussed the need for a sample configuration that works across all vendors. Once a clean run is achieved, the configuration will be shared to ensure SONiC management tests are aligned with POC programming.
  • SONiC Schema Limitations and Multi-Tenancy Modeling: the team discussed limitations in the SONiC schema regarding support for multiple VNIs per DPU and multi-tenancy, identifying gaps between current schema capabilities and production requirements, and planning for future schema updates.
    • Single VNI Limitation in Appliance Table: Vivek highlighted that the current SONiC schema allows only 1 VM VNI per appliance, while SAI supports multiple. This restricts multi-tenancy, as 1 DPU may need to support multiple ENIs in different modes.
    • POC Agreement and Long-Term Plans: For the POC, the team agreed to proceed with Floating NIC mode only, but acknowledged the need to update the schema for long-term support of both Floating and VM modes on a single DPU. Kristina and Prabhat agreed to open a tracking issue for this enhancement.
    • Vendor Implementation Differences: Mukesh clarified that AMD/Pensando does not use VM VNI for Express Gateway Bypass or Floating NIC mode, while NVIDIA relies on VM VNI for direction lookup. The group agreed to document and share implementation details for inbound/outbound lookup across vendors.
    • Next Steps and Action Items: Mircea will log configuration diff tickets for Prabhat; the group will also align management tests, document vendor-specific behaviors, and track schema enhancement requests.
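
To make the two state-table designs from the State Table and Versioning Discussion easier to compare, here is a toy sketch. It is not the SONiC implementation: the tables are plain dictionaries, and the function names, keys (eni-1, eni-2), and values are hypothetical stand-ins chosen for illustration only.

```python
def reboot_dpu(dpu_state):
    """Model a DPU reboot: everything programmed on the DPU is lost."""
    dpu_state.clear()


def reprogram_state(sonic_state, dpu_state):
    """Design 1: SONiC keeps its state table and replays it to the DPU."""
    dpu_state.update(sonic_state)
    return []  # nothing left for the control plane to re-push


def delete_state(sonic_state):
    """Design 2: SONiC deletes its state so the control plane re-pushes it."""
    keys_to_repush = sorted(sonic_state)
    sonic_state.clear()
    return keys_to_repush


# Hypothetical entries; real state table keys and values will differ.
entries = {"eni-1": "active", "eni-2": "standby"}

# Design 1: the DPU is restored from SONiC's copy.
sonic_a, dpu_a = dict(entries), dict(entries)
reboot_dpu(dpu_a)
repush_a = reprogram_state(sonic_a, dpu_a)

# Design 2: the state is dropped and the control plane is asked to re-push.
sonic_b, dpu_b = dict(entries), dict(entries)
reboot_dpu(dpu_b)
repush_b = delete_state(sonic_b)

print("Design 1: DPU state", dpu_a, "| to re-push:", repush_a)
print("Design 2: DPU state", dpu_b, "| to re-push:", repush_b)
```

In this toy model, design 1 restores the DPU without involving the control plane, while design 2 leaves both tables empty until the control plane re-pushes the listed keys.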

 

 

 

 

Sticky for Links/Reference:

 

 

DASH Groups to join to receive Invites, Meeting Notes, and Comms

DASH: https://groups.google.com/g/sonic-dash    

DASH-Test-Workgroup Group: https://groups.google.com/g/sonic-dash-test-workgroup  

Linux Foundation list: https://lists.sonicfoundation.dev/g/SONiC-Dash

If you know anyone who may be interested in our community, please have them join these groups to receive comms, etc.

Links to Recording 

Teams / SharePoint

SONiC-DASH Workgroup Community Meeting-20250910_090616-Meeting Recording.mp4

 

DASH Community
https://youtu.be/tiX697gAQUs
 

HA moved to SmartSwitch LF group on Thursdays

YouTube Behavioral Model:
No agenda this week

9/10/2025 DASH Community Call; please request access via the link if you are not able to view/listen

Azure DASH GitHub Repo:                     

https://github.com/sonic-net/DASH

 


Test/Docs folder:

https://github.com/sonic-net/DASH/blob/main/test/docs/dash-test-workflow-saithrift.md

Ideal test workflow is here, converted to .md

SAI Thrift     

SAI Thrift PR

Client server needed for testing

P4

https://opennetworking.org/p4/ and https://p4.org/working-groups/

Open source, domain-specific programming language for network devices, specifying packet processing for data plane devices (switches, routers, NICs, filters, etc.)

PINS

https://opennetworking.org/pins/

 

PNA consortium spec

https://p4.org/p4-spec/docs/PNA-v0.5.0.html

An architecture describing the structure and common capabilities of network interface controller (NIC) devices which process packets transiting one or more interfaces and a host system.

Describes the structure and capabilities of the pipeline, and a user program, which specifies the functionality of the programmable blocks within that pipeline. For more information, see the P4 Language Consortium specifications.

IPDK

Infrastructure Programmer Development Kit (ipdk.io) and

https://github.com/ipdk-io/ipdk-io.github.io

IPDK is an open source, vendor agnostic framework of drivers and APIs for infrastructure offload and management which runs on a CPU, IPU, DPU or switch. IPDK runs in Linux and uses a set of well-established tools such as DPDK and P4 to enable network virtualization.

bmv2

https://github.com/p4lang/behavioral-model

The second version of the reference P4 software switch, nicknamed bmv2 (for behavioral model version 2). The software switch is written in C++11. It takes as input a JSON file generated from your P4 program by a P4 compiler and interprets it to implement the packet-processing behavior specified by that P4 program.

DPDK

https://www.dpdk.org/

DPDK is the Data Plane Development Kit which consists of libraries to accelerate packet processing workloads running on a wide variety of CPU architectures.

Linux Foundation SmartSwitch

https://lists.sonicfoundation.dev/g/sonic-smartswitch/calendar

 

 

 

Thank you again for your participation…

Kristina Moore MBA, M.S., CISSP - Azure Core Principal PM / DASH & SmartSwitch
Office: 425-722-7720     Mobile: 425-876-2040     Email:
kri...@microsoft.com
DASH Group to join: https://groups.google.com/g/sonic-dash    
Linux Foundation:  
https://lists.sonicfoundation.dev/g/SONiC-Dash

 
