Multi-asic support for fabric ASICs

Eswaran Baskaran

unread,

Mar 19, 2021, 6:06:29 PM3/19/21

to sonic-chassis-subgroup, Ngoc Do, Samuel Angebault, Maxime Lorrillere

Hi All,

Here's a problem the Arista team has been looking at and it would be useful to get community input on this topic.

Problem: Today, we use the multi-asic model to start the containers needed for all the fabric asics in the supervisor of a chassis switch. There is a per-platform asic.conf file that carries a NUM_ASIC variable that's used to generate the service files that starts as many containers as are needed for the number of asics in the system. In the Arista architecture, the same supervisor is used in a 4-slot chassis or a 8-slot chassis. I suspect this might be the case for other vendors too. This implies we will need to set NUM_ASIC to a different value depending on where the supervisor is deployed and not just have it a static value.

Some options to solve this.

1. Introduce a new notion called "chassis_configuration". We could create the asic.conf file per chassis_configuration in the swi and have a platform API that populates the chassis_configuration by invoking some vendor implemented function. This will work, except that we have to introduce this new notion in Sonic.

2. Start the system with NUM_ASIC=<max possible> and find a way to gracefully kill off or pause the containers for asics that don't exist in the system.

3. Start the system with NUM_ASIC=0 and find a way to start the new containers once the actual number of asics is known. At that point, we could run systemd daemon-reload to initialize the required service files and then have the new containers running. We tried this out but have a challenge in figuring out how to load the config for the new asic after startup.

Any thoughts/suggestions on this topic?

Thanks,

Eswaran

Sureshkannan

unread,

Mar 19, 2021, 8:07:14 PM3/19/21

to Eswaran Baskaran, sonic-chassis-subgroup, Ngoc Do, Samuel Angebault, Maxime Lorrillere

During other design discussions, option 2 with help of Feature Table was proposed for this problem. I don't know how exactly the feature table can be extended for this purpose, the idea was to define max fabric numbers (That many swss, syncd) features will be created but not activated until the slot presence is notified by pmon.

Currently /etc/sonic/init_cfg.json has the features

"swss": {
"state": "enabled",
"has_timer" : false,
"has_global_scope": false,
"has_per_asic_scope": true,
"auto_restart": "enabled",
"high_mem_alert": "disabled"
},

How does per asic scope is enabled and disabled? for example, swss0 to be enabled when slot0 is present/online.

Thanks,

Suresh

--
You received this message because you are subscribed to the Google Groups "sonic-chassis-subgroup" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sonic-chassis-sub...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/sonic-chassis-subgroup/CA%2BrWxa2%2BU1%2BJJNE6BKCFrC4Pc6E1PWFkYjRUtKwQskss%2Bzi1hA%40mail.gmail.com.

Eswaran Baskaran

unread,

Mar 19, 2021, 8:12:54 PM3/19/21

to Sureshkannan, sonic-chassis-subgroup, Ngoc Do, Samuel Angebault, Maxime Lorrillere

These two PRs are proposed to use feature flags for this -

https://github.com/Azure/sonic-utilities/pull/1493/files

https://github.com/Azure/sonic-buildimage/pull/7026/files

This doesn't exactly work because we also need to run database container only for the asics that need it.

Sureshkannan

unread,

Mar 19, 2021, 8:24:51 PM3/19/21

to Eswaran Baskaran, Maxime Lorrillere, Ngoc Do, Samuel Angebault, sonic-chassis-subgroup

Why can’t database instance to be running all the time and only syncd is enabled/disabled based on slot presence?

Thanks

Suresh

Ngoc Do

unread,

Mar 19, 2021, 8:52:51 PM3/19/21

to Sureshkannan, Eswaran Baskaran, Maxime Lorrillere, Samuel Angebault, sonic-chassis-subgroup

FEATURE TABLE is in database (not database@X) which always runs. So database@X is also disabled if asic X is not in the system.

I think that with FEATURE TABLE support, we can start with MAX_POSSIBLE_NUM_ASIC in asic.conf. Once the system identifies an actual number of asics, we can disable/mask the corresponding containers/services created for the asics not available in the system.

I think the downside of this design is we will create a bunch of services and then disable them, that sounds not so right.

It could also depend how fast the services are disabled. I see disabling via FEATURE TABLE takes some time. Say if swss/syncd for asic X which is not in the system runs before they could be disabled, they fail and issue a bunch of syslog messages, which is also not what we want.

Thanks,

Ngoc

Eswaran Baskaran

unread,

Mar 19, 2021, 9:00:33 PM3/19/21

to Ngoc Do, Sureshkannan, Maxime Lorrillere, Samuel Angebault, sonic-chassis-subgroup

What's the mechanism that would be used to disable the swssX/syncdX for asic X that doesn't exist ? That part is not yet clear to me.

Sureshkannan

unread,

Mar 19, 2021, 9:24:08 PM3/19/21

to Ngoc Do, Eswaran Baskaran, Maxime Lorrillere, Samuel Angebault, sonic-chassis-subgroup

IMHO, most of the production deployment will have all fabric cards populated and current solution is very much optimized for production use case. So I consider all services to be active is good for production deployment. Also it’s simple solution.

You would still check for PCI slot presence before actually attaching to the fabric. That should take care of containers failing/syslog messages.

My 2 cents,

Suresh

On Fri, Mar 19, 2021 at 5:52 PM Ngoc Do <ngo...@arista.com> wrote:

Eswaran Baskaran

unread,

Mar 19, 2021, 9:40:38 PM3/19/21

to Sureshkannan, Maxime Lorrillere, Ngoc Do, Samuel Angebault, sonic-chassis-subgroup

Appreciate your quick response, Suresh. A couple of questions/notes.

1. I agree we can assume all cards will be present all the time, unless fabric cards are removed/replaced. We can cross that bridge later. For the Arista case, we also have 4-slot chassis and fabric cards with different numbers of ASICs. So, for production deployments, we will have situations with fewer ASICs than the max possible. I want to make sure spurious logs are not going to be a concern operationally if we go with this model.

2. "You would still check for PCI slot presence before actually attaching to the fabric" Do you know where this happens? I am still a bit unclear on what mechanism we can use to update the FEATURE flags for the asics that are absent. We could write something Arista specific, but this feels like it should be something more generic.

3. We should also consider fabric card hotswaps as a regular production operation and design for it at some point.

Thanks,

Eswaran

Sureshkannan

unread,

Mar 20, 2021, 3:48:54 AM3/20/21

to Eswaran Baskaran, Maxime Lorrillere, Ngoc Do, Samuel Angebault, sonic-chassis-subgroup

2. I was thinking syncd.sh or docker_image_ctl.j2 can be enhanced to check if /proc/linux-kernel-bde has a device identified/resources allocated. Until then, syncd.sh will be waiting for PCI device probed and resource allocated. This will avoid any spurious logs until the device is actually probed by the kernel.

3. for hot swap, it can be a safe/graceful method of removal. Users/Operators are expected to deconfigure/shutdown the SW services first before actually physically removing the fabric from the slot. They are kernel resources to be de-allocated and it's better be graceful/user driven SW unplug and then HW unplug. Certainly this isn't cool but it reduces software complications.

Thanks,

Suresh

Shyam Kumar

unread,

Mar 22, 2021, 3:19:55 PM3/22/21

to Sureshkannan, Ngoc Do, Eswaran Baskaran, Maxime Lorrillere, Samuel Angebault, sonic-chassis-subgroup

Happened to see/check this thread today

I brought up this issue 4-5 months ago :

NUM_ASIC is the max number of ASICs on a card.

Since RP is catering to all CPU-less FCs, it may not have all FCs populated (and operational) even in a production/deployed scenario.

So, spawning of services/containers should be based on FC-NPU detected at run-time and not static (based on max ASICs/NPUs possible).

Further, when a chassis is in a steady/operational state, system can encounter failure cases (thermal/ ASIC related etc.) or config/topology changes:

ASICs may detect fault (SBE/MBE/parity etc.) and entire FC to be reloaded or shutdown; someone may remove FC; or change in FC configuration modes etc.

Considering all these long-term/eventual deployment goals, I'd recommend designing based on the number of ASICs detected at run-time (chassis bring-up/reload; card bring-up/reload; config-relaod).

At Cisco, I/team explored few options and have following proposal

Proposal A)

Start with the max NUM_ASIC possible for a given chassis_type. Maks the services for ASICs not available [systemctl mask <service>]

Mechanism: Have a vendor specified chassis_topology.serviceto detect MAX and FC presence/absence and then mask services accordingly

This helped/worked to certain extent but has following challenges to be discussed/addressed from SONiC infra standpoint

(i) database services are started before platform-topology.service and as per current impl., some global services require all instances to be running.

SONiC needs to handle this for asics which are not present.

(ii) config-reload has dependency on NUM_ASIC and will try to stop and restart all instantiated services for all asics. This try to start all masked services as well and as it won;t be able to start those, config reload fails. So this script would require handling for masked services.

(iii) There are other functionalities like show lldp which are functions for NUM_ASIC and do not handle uninitialized/down asics.

Also, in this email thread, some of us feel that it's a downside to create services and then disable them!

Proposal B)

Run all services (for max possible NUM_ASIC) but do not start containers for ASICs not available/present.

H-L Workflow: systemdctl starts the service instance like sw...@0.service. The service startup script creates a service container.

Modify service startup script (such as swss.sh) to use platform-hook where platform/vendor in turn can update about ASIC presence/absence.

Don't spawn the container if ASIC is not detected else spawn the container and continue with the current workflow.

When ASIC detected later (like FC insertion), this script to be notified via platform-hook of ASIC arrival and script to resume (i.e. spawn required container).

Pros:

This will make sure no container instance runs for unidentified asics and also satisfies systemdctl.

Also, takes care of FC insertion/removal and ASIC failure(s).

Proposal C)

Run all services and containers (for max possible NUM_ASIC).

H-L Workflow: Container to detect ASIC presence or not. Facilitate this via platform-hook (where platform/vendor to update about ASIC presence/absence)

If yes (i.e. ASIC present), simply continue with the current workflow else If not, container to stay put in a dormant state.

I'd prefer B the most as it takes care of the issue at the bottom-most layer and helps with dynamic insertion/removal of FCs (and ASIC(s) failures).

Fee free to share your feedback/thoughts...

Thanks,

Shyam

To view this discussion on the web visit https://groups.google.com/d/msgid/sonic-chassis-subgroup/CAAfdQhNQq0KQ1Sr9v%3D_acdEPJEhVsRdeykzToHy_uyHj5wFzVg%40mail.gmail.com.

Eswaran Baskaran

unread,

Mar 22, 2021, 3:52:59 PM3/22/21

to Shyam Kumar, Sureshkannan, Ngoc Do, Maxime Lorrillere, Samuel Angebault, sonic-chassis-subgroup

Hi Shyam,

Thanks for the comprehensive description.

One question on your proposal B. When a card is removed after the system reaches steady state, the swss/syncd container will have to be shutdown so that it can come back up and wait for re-insertion. Did you have an idea on how that would work?

Thanks,

Eswaran

Shyam Kumar

unread,

Mar 22, 2021, 4:22:54 PM3/22/21

to Eswaran Baskaran, Sureshkannan, Ngoc Do, Maxime Lorrillere, Samuel Angebault, sonic-chassis-subgroup

Hi Eswaran,

In proposal B, intent/plan is to avoid spawning container (under a service).

So, put service on hold (i.e, not spawn container) on ASIC absence based on platform-hook input and resume service flow (i.e. spawn container) when platform-hook notifies about ASIC presence (e.g. on account of FC insertion).

'resume' could be changed to 'restart service' - depending upon which option works better for the system (as a whole).

Thanks,

Shyam

>> Don't spawn the container if ASIC is not detected else spawn the container and continue with the current workflow.

>> When ASIC detected later (like FC insertion), this script to be notified via platform-hook of ASIC arrival and script to resume (i.e. spawn required container).

Eswaran Baskaran

unread,

Mar 22, 2021, 6:32:43 PM3/22/21

to Shyam Kumar, Sureshkannan, Ngoc Do, Maxime Lorrillere, Samuel Angebault, sonic-chassis-subgroup

Thanks Anand and Shyam.

Thinking more about the service that could detect the card presence and the related platform-hook, how about this?

1. chassisd(pmon) can invoke a platform API that detects the card presence and map that to the ASIC numbers. I think the card presence API might already be there?

2a. Based on the ASIC presence, chassisd can populate the FEATURE table in config DB per ASIC and this can enable/disable the service as needed.

2b. Based on the ASIC presence, chassisd can populate something that unblocks the swss/syncd startup script.

We could discuss between 2a and 2b on Wednesday and pick an approach. What do you all think?

Eswaran Baskaran

unread,

Mar 24, 2021, 2:21:44 PM3/24/21

to Shyam Kumar, Anshu Verma, Judy Joseph, Sureshkannan, Ngoc Do, Maxime Lorrillere, Samuel Angebault, sonic-chassis-subgroup

Hi All,

Thanks for the discussion today. What time works best for you all to continue this discussion on Friday?

A) Friday 3/26 9-10AM

B) Friday 3/26 10-11AM

Anshu/Judy, can you please add Arvind to this email thread as well?

Thanks,

Eswaran

Shyam Kumar

unread,

Mar 25, 2021, 12:07:01 AM3/25/21

to Eswaran Baskaran, Anshu Verma, Judy Joseph, Sureshkannan, Ngoc Do, Maxime Lorrillere, Samuel Angebault, sonic-chassis-subgroup

Hi Eswaran,

Sorry, have conflict for both time slots!

Can we try for Monday?

Meanwhile, if possible, can folks please chime in and share their high-level thought process / proposal etc. we touched upon in meeting, like:

1. spawning pmon before swss (and any other relevant service which currently spawns before pmon)

2. populating feature-table to enable/disable service

3. Pros/Cons of statically spawning MAX N services, containers and then disabling them later for N-M (M = number of NPUs not found based on FC absence detection)

vs dynamically spawning 'M' services only after detecting 'M' NPUs.

Can we think about shifting altogether to a dynamic model?

Thanks,

Shyam

Eswaran Baskaran

unread,

Mar 25, 2021, 5:58:55 PM3/25/21

to Judy Joseph, Shyam Kumar, Arvindsrinivasan Lakshmi Narasimhan, Anshu Verma, Sureshkannan, Ngoc Do, mlorrillere, staphylo, sonic-chassis-subgroup

Shyam,

Can you propose a time on Monday? Other than 1-2pm PST, I am available.

Here's how I understood the latest proposal.

1. System starts up with NUM_ASIC set to MAX_ASICs. But, we change the systemd generator to only spawn the database service files and not the other service files.

2. PMON starts up and the chassisd daemon detects card presence, figures out which asics are present and populates a) the FEATURE table in config DB and b) some other config file describing the ASICs including its PCI address.

3. hostcfgd reacts to the FEATURE flag and creates the service files as necessary and enables the services.

4. Since the database containers were already up per-ASIC and the minigraph was already parsed into the per-ASIC config DB, the per-ASIC containers should be able to start up without issues.

Thanks,
Eswaran

On Wed, Mar 24, 2021 at 11:01 PM Judy Joseph <Judy....@microsoft.com> wrote:

Including Arvind to the email thread ..

Shyam Kumar

unread,

Mar 26, 2021, 1:46:53 PM3/26/21

to Eswaran Baskaran, Judy Joseph, Arvindsrinivasan Lakshmi Narasimhan, Anshu Verma, Sureshkannan, Ngoc Do, mlorrillere, staphylo, sonic-chassis-subgroup

Hi Eswaran,

Monday 11 am or anytime between 2 pm to 3:30 pm (as meeting start time) works for me

Thanks,

Shyam

Eswaran Baskaran

unread,

Mar 26, 2021, 4:49:42 PM3/26/21

to Shyam Kumar, Judy Joseph, Arvindsrinivasan Lakshmi Narasimhan, Anshu Verma, Sureshkannan, Ngoc Do, mlorrillere, staphylo, sonic-chassis-subgroup

All of those slots work for me as well, but I would prefer 2-3pm

Judy/Arvind, your input is quite important for this discussion. Can you please let us know if one of these slots work?

Eswaran Baskaran

unread,

Mar 26, 2021, 4:50:33 PM3/26/21

to Shyam Kumar, Judy Joseph, Arvindsrinivasan Lakshmi Narasimhan, Anshu Verma, Sureshkannan, Ngoc Do, mlorrillere, staphylo, sonic-chassis-subgroup

Suresh/Nokia team as well, can you let us know if these timeslots work for you?

Sureshkannan

unread,

Mar 26, 2021, 6:03:51 PM3/26/21

to Eswaran Baskaran, Anshu Verma, Arvindsrinivasan Lakshmi Narasimhan, Judy Joseph, Ngoc Do, Shyam Kumar, mlorrillere, sonic-chassis-subgroup, staphylo

Hi Eawaran/All,

11am or after 2:30pm works for me

Thanks & Regards

Suresh

Judy Joseph

unread,

Mar 26, 2021, 7:48:04 PM3/26/21

to Sureshkannan, Eswaran Baskaran, Anshu Verma, Arvindsrinivasan Lakshmi Narasimhan, Judy Joseph, Ngoc Do, Shyam Kumar, mlorrillere, sonic-chassis-subgroup, staphylo

Hi Eswaran

2/2:30 would be good for me as well

Regards,

Judy

To view this discussion on the web visit https://groups.google.com/d/msgid/sonic-chassis-subgroup/CAAfdQhOcr5cR-NayucNQWtcgaP6yDXd9MK7fKQ_amVGZX9AqHQ%40mail.gmail.com.

Eswaran Baskaran

unread,

Mar 26, 2021, 7:57:26 PM3/26/21

to Judy Joseph, Sureshkannan, Anshu Verma, Arvindsrinivasan Lakshmi Narasimhan, Judy Joseph, Ngoc Do, Shyam Kumar, mlorrillere, sonic-chassis-subgroup, staphylo

Thanks everyone. Anshu, could you please help set up a call at 2:30pm PST on Monday? I don't have teams access to do this.

Anshu Verma

unread,

Mar 26, 2021, 8:57:47 PM3/26/21

to eswaran, Judy Joseph, Sureshkannan, Arvindsrinivasan Lakshmi Narasimhan, Judy Joseph, Ngoc Do, Shyam Kumar, mlorrillere, sonic-chassis-subgroup, staphylo

Reserving slot to discuss following:

1. System starts up with NUM_ASIC set to MAX_ASICs. But, we change the systemd generator to only spawn the database service files and not the other service files.

2. PMON starts up and the chassisd daemon detects card presence, figures out which asics are present and populates a) the FEATURE table in config DB and b) some other config file describing the ASICs including its PCI address.

3. hostcfgd reacts to the FEATURE flag and creates the service files as necessary and enables the services.

4. Since the database containers were already up per-ASIC and the minigraph was already parsed into the per-ASIC config DB, the per-ASIC containers should be able to start up without issues.

________________________________________________________________________________

Microsoft Teams meeting

Join on your computer or mobile app

Click here to join the meeting

Or call in (audio only)

+1 323-849-4874,,774543066# United States, Los Angeles

Phone Conference ID: 774 543 066#

Find a local number | Reset PIN

Learn More | Meeting options

________________________________________________________________________________

_____________________________________________
From: Eswaran Baskaran <esw...@arista.com>
Sent: Friday, March 26, 2021 4:57 PM
To: Judy Joseph <judy....@gmail.com>
Cc: Sureshkannan <suresh...@gmail.com>; Anshu Verma <ans...@microsoft.com>; Arvindsrinivasan Lakshmi Narasimhan <Arvindsriniv...@microsoft.com>; Judy Joseph <Judy....@microsoft.com>; Ngoc Do <ngo...@arista.com>; Shyam Kumar <shy...@gmail.com>; mlorrillere <mlorr...@arista.com>; sonic-chassis-subgroup <sonic-chass...@googlegroups.com>; staphylo <stap...@arista.com>
Subject: Re: [EXTERNAL] Re: Multi-asic support for fabric ASICs

invite.ics

Eswaran Baskaran

unread,

Mar 29, 2021, 7:17:03 PM3/29/21

to Judy Joseph, Sureshkannan, Anshu Verma, Arvindsrinivasan Lakshmi Narasimhan, Judy Joseph, Ngoc Do, Shyam Kumar, mlorrillere, sonic-chassis-subgroup, staphylo

Thanks everyone. That was a good discussion. Some notes from the meeting.

AI: Arista will make hostcfgd changes in step 3.

AI: Arista will investigate the chassisd changes for active asics.

AI: Shyam/Anand will investigate systemd_generator vs hostcfgd dependency change (swss/syncd after hostcfgd) with respect to masking all services.

Existing service start sequence

Systemd generator creates service files for MAX_ASICs (based on asic.conf)
All database services run first

Host database service runs first
Per-ASIC database service runs first
pmon starts in parallel after host database starts

Minigraph is parsed and config db is updated
Other ASIC services are started
Hostcfgd is operational to react to FEATURE table and mask/unmask services

Why not start syncd/swss all the time?

Syncd might start too early before the PCI device can be discovered (the PCI device may not show up in the device tree) and this could lead to the ASIC not coming up because syncd doesn’t start after 3 crashes.
The actual PCI address may not be known for the device by the time swss/syncd starts.

Proposal

1. System starts up with NUM_ASIC set to MAX_ASICs. But, we change the

systemd generator to only spawn the database service files and not the

other service files.

Can systemd generators generate these service files but leave the services disabled? Can the per-ASIC databases also start disabled? No, we should have the per-ASIC databases start no matter what because the config load should happen even when the cards are not present yet. (config-setup service will load the db from either config_db.json or minigraph and this service needs all the database services to be up)

We can avoid this step if we can change the dependencies so that swss service can start after hostcfgd. This is an alternative design.

2. PMON starts up and the chassisd daemon detects card presence, figures

out which asics are present and populates a) the FEATURE table in config DB

and b) some other config file describing the ASICs including its PCI

Address.

Chassisd will be notified by the platform of the list of active ASICs (PCI devices are detected) for a given slot and publish that list into the existing STATE table.

Addendum: chassisd can, in addition, get the PCI address of these devices and populate the STATE table.

3. hostcfgd reacts to the FEATURE flag and creates the service files as

necessary and enables the services.

Can hostcfgd listen to the chassis STATE table and the FEATURE table content to turn the services on/off as needed for each ASIC?

Addendum: Can hostcfgd populate the CONFIG DB with the PCI address for these ASICs?

4. Since the database containers were already up per-ASIC and the minigraph

was already parsed into the per-ASIC config DB, the per-ASIC containers should be able to start up without issues.

Sureshkannan

unread,

Mar 30, 2021, 9:23:46 AM3/30/21

to Eswaran Baskaran, Anshu Verma, Arvindsrinivasan Lakshmi Narasimhan, Judy Joseph, Judy Joseph, Ngoc Do, Shyam Kumar, mlorrillere, sonic-chassis-subgroup, staphylo

Thinking more on this from what Arvind asked, what is the issue if all syncd.sh (max ASICS) is started and waiting for CHASSIS_STATE_TABLE to be created by chassisd, corresponding slot to be online and PCID devid to be written by chassisd. In this approach, it’s inline with multi asic solution and only difference is pcid devid became discoverable from platform driver.

When SFM removed(hot unplug), syncd container will anyway exit and goes back waiting for slot to be online.

Thanks

Suresh

Eswaran Baskaran

unread,

Mar 30, 2021, 2:09:32 PM3/30/21

to Sureshkannan, Anshu Verma, Arvindsrinivasan Lakshmi Narasimhan, Judy Joseph, Judy Joseph, Ngoc Do, Shyam Kumar, mlorrillere, sonic-chassis-subgroup, staphylo

Questions on this proposal - Can syncd.sh just be in a blocking state like that forever? Does that mean the service is UP but the container is not started? How will this interact with the liveliness check of the system?

Shyam Kumar

unread,

Mar 30, 2021, 3:02:56 PM3/30/21

to Eswaran Baskaran, Sureshkannan, Anshu Verma, Arvindsrinivasan Lakshmi Narasimhan, Judy Joseph, Judy Joseph, Ngoc Do, mlorrillere, sonic-chassis-subgroup, staphylo

Hi Suresh,

Have additional Qs on this proposal :

1) "inline with multi asic solution" - can you elaborate on this? and benefits of keeping it inline?

2) Just bringing service (and not container) would mean hostcfgd to trigger/notify service to resume its container spawning - right?

- "starting service and container as part of service spawning" seems simpler approach then this one

- Other services/daemons having any dependency/expectation once this service is spawned?

If so, such cases to be tracked, handled - required synchronization/code flow to be added

- Need to assess implications on reload operations (config-reload, whole-board reload wrt shutting down services gracefully/ in proper order) with this approach

- on FC reload, container to be brought down only and not service? and then service to go back into blocking/wait state?

3) If this proposal means service (and its container too), wouldn't this land into issue what we discussed:

service started but NPU/ASIC not there or its initialization not through (or other bring-up faults etc.)?

leading to unnecessary repeated restart eventually aborting! and not starting service at all

I'd prefer to spawn service immediately followed by service spawning its container AND both to go down together (on NPU/board failure events).

IMO, it would provide more streamlined/comprehensible workflow considering - bring-up, steady-state and bring-down - scenarios

Thanks,

Shyam

Sureshkannan

unread,

Mar 30, 2021, 3:54:38 PM3/30/21

to Shyam Kumar, Eswaran Baskaran, Anshu Verma, Arvindsrinivasan Lakshmi Narasimhan, Judy Joseph, Judy Joseph, Ngoc Do, mlorrillere, sonic-chassis-subgroup, staphylo

Hi Shyam/Eswaran,

Reason to inline with multi-asic solution, it's already solved multi asic problem in sonic. Things that are different for a modular system is that asic is hot swappable and pci-devid is not static. So, my thinking is to enhance the multi-asic solution that is per asic sonic apps service files are created and started in systemd generated using MAX_ASIC and all service (.sh files started but not the container)... only syncd "container" has to be kept in waiting till asic to be online.

syncd.sh will be running but the container won't be created until it finds the asic (aka slot) found to be online.

syncd.sh can wait like database.sh (linecard per asic database).. systemd state will be "activating" (not active). I already see the code below in database.sh

if [ "$DATABASE_TYPE" != "chassisdb" ]; then
# Wait until supervisord and redis starts. This change is needed
# because now database_config.json is jinja2 templated based
# and by the time file gets generated if we do redis ping
# then we catch python exception of file not valid
# that comes to syslog which is unwanted so wait till database
# config is ready and then ping
until [[ ($(docker exec -i database$DEV pgrep -x -c supervisord) -gt 0) && ($($SONIC_DB_CLI PING | grep -c PONG) -gt 0) ]]; do
sleep 1;
done

One thing I'm not clear is, how do we pass the PCI-DEVID to syncd. Does it need to be configured in db or can it be an argument like how orchagent takes "-m" for the mgmt mac. Since PCI-DEVID isn't static anymore, Is it OK to be an argument and let system/bringup pass this while spawning the syncd?.

Thanks,

Suresh

Eswaran Baskaran

unread,

Mar 30, 2021, 4:03:30 PM3/30/21

to Sureshkannan, Anshu Verma, Arvindsrinivasan Lakshmi Narasimhan, Judy Joseph, Judy Joseph, Ngoc Do, Shyam Kumar, mlorrillere, sonic-chassis-subgroup, staphylo

Suresh,

This example is not great for services that are expected to be in this state in steady state, right? It’s checking on the status every 1s forever. Will it be killed at some point in your thinking?

Sureshkannan

unread,

Mar 30, 2021, 5:05:05 PM3/30/21

to Eswaran Baskaran, Anshu Verma, Arvindsrinivasan Lakshmi Narasimhan, Judy Joseph, Judy Joseph, Ngoc Do, Shyam Kumar, mlorrillere, sonic-chassis-subgroup, staphylo

Certainly asic presence notification(push) can be done instead of poll. The loop that polls can be a simple lua script that subscribes and waits instead of poll?. Sorry, I'm not lua expert but someone can throw more light on this.

Thanks,

Suresh

Sureshkannan

unread,

Mar 30, 2021, 5:50:21 PM3/30/21

to Eswaran Baskaran, Anshu Verma, Arvindsrinivasan Lakshmi Narasimhan, Judy Joseph, Judy Joseph, Ngoc Do, Shyam Kumar, mlorrillere, sonic-chassis-subgroup, staphylo

If lua script is possible, slight improvement can be done further.

Lets say fabric-asi...@n.service introduced and it uses lua script wait.. make swss/syncd services as dependent services of this fabric-asi...@n.service.

n being {0..MAX_ASIC}.

asic removal/shutdown is more of user driven activity.

# config asic shutdown <n>

Thanks,

Suresh

Shyam Kumar

unread,

Mar 31, 2021, 11:59:44 AM3/31/21

to Sureshkannan, Eswaran Baskaran, Anshu Verma, Arvindsrinivasan Lakshmi Narasimhan, Judy Joseph, Judy Joseph, Ngoc Do, mlorrillere, sonic-chassis-subgroup, staphylo

Hi Suresh,

>> Reason to inline with multi-asic solution, it's already solved multi asic problem in sonic. Things that are different for a modular system is that asic is hot swappable and pci-devid is not static.

[Shyam] Beside ASIC being hot-swappable and pcie-id learning dynamic nature, number of ASICs detected (and effective) at boot-time/run-time make a lot of difference from what was decided earlier (statically spawn max possible etc.).

spawning all services (and their containers) based on Static max possible is OK for LC/Fixed box where all ASIC/NPU comes up (and expected to remain up) as long as LC/ Fixed box is up/running.

However, that doesn't seem to fit the bill for Modular chassis (with CPU-less FCs on RP/Supervisor)

>>So, my thinking is to enhance the multi-asic solution that is per asic sonic apps service files are created and started in systemd generated using MAX_ASIC and all service (.sh files started but not the container)... only syncd "container" has to be kept in waiting till asic to be online.

[Shyam] Had few Qs (please refer to #2 in my earlier response).

We need to take into account all these aspects from a system standpoint. Any gotcha with any of them (or any future point) won't be a fool-proof solution!

Also, have an underlying Q: I don't see/foresee advantage of "spawning service w/o its container or leaving container in wait/hold state & then resume later" VS "spawning both service and its container together & die together". Latter would be cleaner, scalable, less error-prone considering bring-up/steady-state and failures/reload across system

I couldn't see any technical/architectural reason to follow existing workflow/mechanism

>> syncd.sh will be running but the container won't be created until it finds the asic (aka slot) found to be online.

>>syncd.sh can wait like database.sh (linecard per asic database).. systemd state will be "activating" (not active). I already see the code below in database.sh

[Shyam] This is more from an implementation standpoint but prior to that, IMHO, we need to list pros/cons of this vs what we decided, discussed per Monday sync-up

>> asic removal/shutdown is more of user driven activity. # config asic shutdown <n>

[Shyam] User performing ASIC removal/shutdown is a rare case.

ASIC may hit SBE/MBE/parity or other kinds of errors while the system is operational carrying/transiting traffic.

These may happen any time and SW (along with the underlying platform/HW) should be capable of handling all such faults.

In such cases, the platform would ask NOS (SONiC) to take various possible actions - ASIC reset/shutdown config-reload/ whole-board reload/shutdown chassis reload etc.

In short, All such cases should be dynamically detected and handled.

Thanks,

Shyam

Sureshkannan

unread,

Mar 31, 2021, 5:34:06 PM3/31/21

to Shyam Kumar, Eswaran Baskaran, Anshu Verma, Arvindsrinivasan Lakshmi Narasimhan, Judy Joseph, Judy Joseph, Ngoc Do, mlorrillere, sonic-chassis-subgroup, staphylo

Hi Shyam

Regarding

On Wed, Mar 31, 2021 at 8:59 AM Shyam Kumar <shy...@gmail.com> wrote:

Hi Suresh,

>> Reason to inline with multi-asic solution, it's already solved multi asic problem in sonic. Things that are different for a modular system is that asic is hot swappable and pci-devid is not static.
[Shyam] Beside ASIC being hot-swappable and pcie-id learning dynamic nature, number of ASICs detected (and effective) at boot-time/run-time make a lot of difference from what was decided earlier (statically spawn max possible etc.).
spawning all services (and their containers) based on Static max possible is OK for LC/Fixed box where all ASIC/NPU comes up (and expected to remain up) as long as LC/ Fixed box is up/running.
However, that doesn't seem to fit the bill for Modular chassis (with CPU-less FCs on RP/Supervisor)

Suresh> In my view, let's treat FC to be treated as a Line Card.

FC wil have admin enabled and admin disabled in the configuration. FC's that aren't plugged in, i would expect users to have admin disabled (most preferred) and hence some of the dockers can be shutdown. But again, this is user/operator driven. (not system driven).

>>So, my thinking is to enhance the multi-asic solution that is per asic sonic apps service files are created and started in systemd generated using MAX_ASIC and all service (.sh files started but not the container)... only syncd "container" has to be kept in waiting till asic to be online.
[Shyam] Had few Qs (please refer to #2 in my earlier response).
We need to take into account all these aspects from a system standpoint. Any gotcha with any of them (or any future point) won't be a fool-proof solution!
Also, have an underlying Q: I don't see/foresee advantage of "spawning service w/o its container or leaving container in wait/hold state & then resume later" VS "spawning both service and its container together & die together". Latter would be cleaner, scalable, less error-prone considering bring-up/steady-state and failures/reload across system
I couldn't see any technical/architectural reason to follow existing workflow/mechanism

Suresh> In my view, multi asic has already solved multi-asic and gives control to users to decide what to do with HW and also SW. The state of art is defined by the user.

>> syncd.sh will be running but the container won't be created until it finds the asic (aka slot) found to be online.
>>syncd.sh can wait like database.sh (linecard per asic database).. systemd state will be "activating" (not active). I already see the code below in database.sh
[Shyam] This is more from an implementation standpoint but prior to that, IMHO, we need to list pros/cons of this vs what we decided, discussed per Monday sync-up

>> asic removal/shutdown is more of user driven activity. # config asic shutdown <n>
[Shyam] User performing ASIC removal/shutdown is a rare case.
ASIC may hit SBE/MBE/parity or other kinds of errors while the system is operational carrying/transiting traffic.
These may happen any time and SW (along with the underlying platform/HW) should be capable of handling all such faults.
In such cases, the platform would ask NOS (SONiC) to take various possible actions - ASIC reset/shutdown config-reload/ whole-board reload/shutdown chassis reload etc.
In short, All such cases should be dynamically detected and handled.

Suresh> Isn't this the case for fixed asic as well? any fixed asic platform could run into this issue or even line cards can run into this issue?. So, in disaggregated architecture, it's all handled in disaggregated modeling. What I mean is, its local issue and chassis could help or the user has advantage to put local policy, it's all defined beyond the platform.

Sureshkannan

unread,

Mar 31, 2021, 8:53:43 PM3/31/21

to Shyam Kumar, Eswaran Baskaran, Anshu Verma, Arvindsrinivasan Lakshmi Narasimhan, Judy Joseph, Judy Joseph, Ngoc Do, mlorrillere, sonic-chassis-subgroup, staphylo

Just to summarize the overall steps/procedure.

asic.conf can have MAX_ASIC as per vendor platform.
systemd-generator will create all per asic services(database, fabric-asic-bootstrap, syncd, swss). fabric-asi...@n.service created only if its VOQ supervisor. This kind of flexibility is already supported in sonic. For example BGP services won't be running in supervisor.
fabric-asi...@n.service will have systemd rules like, before=swss, before=syncd, (all per asic services started only after fabric-asic-bootstrap.service)
1. fabric-asi...@n.service is a simple lua script that subscribes to CHASSIS_STATE_TABLE and waits for its slot/asic to be ONLINE
2. fabric-asic-bootstrap.lua or python will be blocking the service state become active until slot/asic is online.
3. Once slot/asic is online, fabric-asic-bootstrap.py can write per asic config DB with PCI-ID info.
global database container (config_db) can have per FC admin enable/disable.
if admin disabled, as part of configuration handling,
1. disable fabric-asi...@n.service and all other per asic services.
if admin enabled,

fabrice-asi...@n.service will be waiting until asic is online.

Maybe, we can go over the details in a call if required.

Thanks,

Suresh

Eswaran Baskaran

unread,

Apr 2, 2021, 12:49:25 PM4/2/21

to Sureshkannan, Shyam Kumar, Anshu Verma, Arvindsrinivasan Lakshmi Narasimhan, Judy Joseph, Judy Joseph, Ngoc Do, mlorrillere, sonic-chassis-subgroup, staphylo

Suresh,

This should work overall. Some questions inline.

On Wed, Mar 31, 2021 at 5:53 PM Sureshkannan <suresh...@gmail.com> wrote:

Just to summarize the overall steps/procedure.
asic.conf can have MAX_ASIC as per vendor platform.
systemd-generator will create all per asic services(database, fabric-asic-bootstrap, syncd, swss). fabric-asi...@n.service created only if its VOQ supervisor. This kind of flexibility is already supported in sonic. For example BGP services won't be running in supervisor.
fabric-asi...@n.service will have systemd rules like, before=swss, before=syncd, (all per asic services started only after fabric-asic-bootstrap.service)

fabric-asi...@n.service is a simple lua script that subscribes to CHASSIS_STATE_TABLE and waits for its slot/asic to be ONLINE
fabric-asic-bootstrap.lua or python will be blocking the service state become active until slot/asic is online.
Once slot/asic is online, fabric-asic-bootstrap.py can write per asic config DB with PCI-ID info.

Do you know if the existing schema for CHASSIS_STATE_TABLE supports this already or if we need to enhance it?

global database container (config_db) can have per FC admin enable/disable.
if admin disabled, as part of configuration handling,

disable fabric-asi...@n.service and all other per asic services.

Do you know which part of the system will do this(react to changes in config DB and turn off services) ? Has this been solved already for some use case?

Sureshkannan

unread,

Apr 2, 2021, 3:37:38 PM4/2/21

to Eswaran Baskaran, Shyam Kumar, Anshu Verma, Arvindsrinivasan Lakshmi Narasimhan, Judy Joseph, Judy Joseph, Ngoc Do, mlorrillere, sonic-chassis-subgroup, staphylo

Answers inline below

Suresh>

On Fri, Apr 2, 2021 at 9:49 AM Eswaran Baskaran <esw...@arista.com> wrote:

Suresh,

This should work overall. Some questions inline.

On Wed, Mar 31, 2021 at 5:53 PM Sureshkannan <suresh...@gmail.com> wrote:
Just to summarize the overall steps/procedure.
asic.conf can have MAX_ASIC as per vendor platform.
systemd-generator will create all per asic services(database, fabric-asic-bootstrap, syncd, swss). fabric-asi...@n.service created only if its VOQ supervisor. This kind of flexibility is already supported in sonic. For example BGP services won't be running in supervisor.
fabric-asi...@n.service will have systemd rules like, before=swss, before=syncd, (all per asic services started only after fabric-asic-bootstrap.service)

fabric-asi...@n.service is a simple lua script that subscribes to CHASSIS_STATE_TABLE and waits for its slot/asic to be ONLINE
fabric-asic-bootstrap.lua or python will be blocking the service state become active until slot/asic is online.
Once slot/asic is online, fabric-asic-bootstrap.py can write per asic config DB with PCI-ID info.
Do you know if the existing schema for CHASSIS_STATE_TABLE supports this already or if we need to enhance it?

Suresh> CHASSIS_STATE_TABLE has to be enhanced for this purpose. Please feel free to make the changes. We could help with review.

Suresh> When we added this, there was no requirement to make PCI-ID to be discovered and allow non static PCI-ID. In my view, as such it's still a vendor platform specific requirement. SONiC already allows PCI-ID to be configured and hence operators/users can still configure if that's what is desired.

global database container (config_db) can have per FC admin enable/disable.
if admin disabled, as part of configuration handling,

disable fabric-asi...@n.service and all other per asic services.
Do you know which part of the system will do this(react to changes in config DB and turn off services) ? Has this been solved already for some use case?

Suresh> Looking at currently implemented design pattern references in SONiC, I think these are called xyz-cfgmgr (i.e nbrmgr) but it's still inside SWSS containers. One can have multi-asic-cfgmgr running on host namespace. But it's not really needed to have a cfgmgr for operations like asic shutdown/startup alone. I would propose to enhance config reload(python scripts) to take care of shutdown/startup of an asic instance and update config-db as part of this cli interface. It's a cli interface implementation or part of minigraph handling. I don't know the exact current implementation of config reload but it seems like a good fit.

Eswaran Baskaran

unread,

Apr 2, 2021, 3:54:17 PM4/2/21

to Sureshkannan, Shyam Kumar, Anshu Verma, Arvindsrinivasan Lakshmi Narasimhan, Judy Joseph, Judy Joseph, Ngoc Do, mlorrillere, sonic-chassis-subgroup, staphylo

On Fri, Apr 2, 2021 at 12:37 PM Sureshkannan <suresh...@gmail.com> wrote:

Answers inline below
Suresh>

On Fri, Apr 2, 2021 at 9:49 AM Eswaran Baskaran <esw...@arista.com> wrote:
Suresh,

This should work overall. Some questions inline.

On Wed, Mar 31, 2021 at 5:53 PM Sureshkannan <suresh...@gmail.com> wrote:
Just to summarize the overall steps/procedure.
asic.conf can have MAX_ASIC as per vendor platform.
systemd-generator will create all per asic services(database, fabric-asic-bootstrap, syncd, swss). fabric-asi...@n.service created only if its VOQ supervisor. This kind of flexibility is already supported in sonic. For example BGP services won't be running in supervisor.
fabric-asi...@n.service will have systemd rules like, before=swss, before=syncd, (all per asic services started only after fabric-asic-bootstrap.service)

fabric-asi...@n.service is a simple lua script that subscribes to CHASSIS_STATE_TABLE and waits for its slot/asic to be ONLINE
fabric-asic-bootstrap.lua or python will be blocking the service state become active until slot/asic is online.
Once slot/asic is online, fabric-asic-bootstrap.py can write per asic config DB with PCI-ID info.
Do you know if the existing schema for CHASSIS_STATE_TABLE supports this already or if we need to enhance it?
Suresh> CHASSIS_STATE_TABLE has to be enhanced for this purpose. Please feel free to make the changes. We could help with review.
Suresh> When we added this, there was no requirement to make PCI-ID to be discovered and allow non static PCI-ID. In my view, as such it's still a vendor platform specific requirement. SONiC already allows PCI-ID to be configured and hence operators/users can still configure if that's what is desired.

Ok, we (Arista) will take a stab at defining this.

global database container (config_db) can have per FC admin enable/disable.
if admin disabled, as part of configuration handling,

disable fabric-asi...@n.service and all other per asic services.
Do you know which part of the system will do this(react to changes in config DB and turn off services) ? Has this been solved already for some use case?
Suresh> Looking at currently implemented design pattern references in SONiC, I think these are called xyz-cfgmgr (i.e nbrmgr) but it's still inside SWSS containers. One can have multi-asic-cfgmgr running on host namespace. But it's not really needed to have a cfgmgr for operations like asic shutdown/startup alone. I would propose to enhance config reload(python scripts) to take care of shutdown/startup of an asic instance and update config-db as part of this cli interface. It's a cli interface implementation or part of minigraph handling. I don't know the exact current implementation of config reload but it seems like a good fit.

The expectation would be for the user to turn off the card and do 'config reload' on the supervisor for the services to disappear. That's reasonable. In my opinion though, if we get the fabric-asic-bootstrap service done right, it can disable the swss/syncd services for fabric cards that disappear (using the existing FEATURE table mechanism). I think we can start solving the first part of the problem (CHASSIS_TABLE schema, pmon API, fabric bootstrap service) and revisit this.

Thanks,

Eswaran

Shyam Kumar

unread,

Apr 6, 2021, 7:22:39 PM4/6/21

to Eswaran Baskaran, Sureshkannan, Anshu Verma, Arvindsrinivasan Lakshmi Narasimhan, Judy Joseph, Judy Joseph, Ngoc Do, mlorrillere, sonic-chassis-subgroup, staphylo

Hi Suresh, Easwaran,

Apologies for catching up late on this!

@Suresh - Thanks for laying out high-level workflow.

Just to be on the same page, this is similar model to what we discussed in the call last week i.e. primarily hostcfgd replaced with fabric-asi...@n.service - right?

comment inline

On Fri, Apr 2, 2021 at 12:54 PM Eswaran Baskaran <esw...@arista.com> wrote:

On Fri, Apr 2, 2021 at 12:37 PM Sureshkannan <suresh...@gmail.com> wrote:
Answers inline below
Suresh>

On Fri, Apr 2, 2021 at 9:49 AM Eswaran Baskaran <esw...@arista.com> wrote:
Suresh,

This should work overall. Some questions inline.

On Wed, Mar 31, 2021 at 5:53 PM Sureshkannan <suresh...@gmail.com> wrote:
Just to summarize the overall steps/procedure.
asic.conf can have MAX_ASIC as per vendor platform.
systemd-generator will create all per asic services(database, fabric-asic-bootstrap, syncd, swss). fabric-asi...@n.service created only if its VOQ supervisor. This kind of flexibility is already supported in sonic. For example BGP services won't be running in supervisor.
fabric-asi...@n.service will have systemd rules like, before=swss, before=syncd, (all per asic services started only after fabric-asic-bootstrap.service)

fabric-asi...@n.service is a simple lua script that subscribes to CHASSIS_STATE_TABLE and waits for its slot/asic to be ONLINE
fabric-asic-bootstrap.lua or python will be blocking the service state become active until slot/asic is online.

[Shyam] Does "blocking the service state become active until slot/asic is online" implies swss, syncd services under each fabric-asi...@n.service would get spawned prior to checking FC/ASIC presence and these swss, syncd services would be put on hold until 'ASIC online' is notified? also implies their respective containers won't be spawned?

This means all (MAX) ASICs swss, syncd services started even if FC (and hence ASIC) were absent/not-inserted?

In that case, I'd recommend NOT-enabling these services (swss, syncd) at all until ASIC online status is notified to fabric-asi...@n.service. This would keep the bring-up, going-down workflow simpler, comprehensible and less complex while debugging.

Can we look into this?

Once slot/asic is online, fabric-asic-bootstrap.py can write per asic config DB with PCI-ID info.
Do you know if the existing schema for CHASSIS_STATE_TABLE supports this already or if we need to enhance it?
Suresh> CHASSIS_STATE_TABLE has to be enhanced for this purpose. Please feel free to make the changes. We could help with review.
Suresh> When we added this, there was no requirement to make PCI-ID to be discovered and allow non static PCI-ID. In my view, as such it's still a vendor platform specific requirement. SONiC already allows PCI-ID to be configured and hence operators/users can still configure if that's what is desired.

[Shyam] can you please update/confirm on following queries (or we can take a note/AI to check them)

a) fabric-asi...@n.service has NO dependency whatsoever on any other service/daemon of the system and vice-versa?

b) Each of fabric-asi...@n.service is a standalone service/utility for that ASIC from the time its detected until it goes offline?

c) For FCs not preset/inserted in the chassis, fabric-asi...@n.service would keep running indefinitely? and causes no-harm/ side-effect(s)?

d) None of the operations - config-reload, system-daemon-reload, chassis reload - are impacted when FC (or NPU) is turned-off/not found on going down or coming up path

Ok, we (Arista) will take a stab at defining this.

global database container (config_db) can have per FC admin enable/disable.
if admin disabled, as part of configuration handling,

disable fabric-asi...@n.service and all other per asic services.
Do you know which part of the system will do this(react to changes in config DB and turn off services) ? Has this been solved already for some use case?
Suresh> Looking at currently implemented design pattern references in SONiC, I think these are called xyz-cfgmgr (i.e nbrmgr) but it's still inside SWSS containers. One can have multi-asic-cfgmgr running on host namespace. But it's not really needed to have a cfgmgr for operations like asic shutdown/startup alone. I would propose to enhance config reload(python scripts) to take care of shutdown/startup of an asic instance and update config-db as part of this cli interface. It's a cli interface implementation or part of minigraph handling. I don't know the exact current implementation of config reload but it seems like a good fit.

The expectation would be for the user to turn off the card and do 'config reload' on the supervisor for the services to disappear. That's reasonable. In my opinion though, if we get the fabric-asic-bootstrap service done right, it can disable the swss/syncd services for fabric cards that disappear (using the existing FEATURE table mechanism). I think we can start solving the first part of the problem (CHASSIS_TABLE schema, pmon API, fabric bootstrap service) and revisit this.

[Shyam] Besides user-configurable FC shutdown operation and config reload, there could/would be a case of FC ASIC hitting fault.

This may happen at runtime and system (SW) to act upon by shutting down the FC.

wouldn't the H-L flow is like - PMON/chassisD taking action and updating CHASSIS_STATE_TABLE about FC and all its ASICs offline.

Each of the ASIC state is in turn notified to fabric-asi...@n.service (the one who is subscribed for this). This in turn would disable all services like swss@n; syncd@n etc. and then self.

IMO - be it user/admin-configured FC enable/disable or system initiated (due to fault/otherwise) - all should have a common workflow.

In that case, how and where does the FEATURE table mechanism fit?

Thanks,

Shyam

Sureshkannan

unread,

Apr 7, 2021, 1:15:45 PM4/7/21

to Shyam Kumar, Eswaran Baskaran, Anshu Verma, Arvindsrinivasan Lakshmi Narasimhan, Judy Joseph, Judy Joseph, Ngoc Do, mlorrillere, sonic-chassis-subgroup, staphylo

Hi Shyam

Please find inline answers.

Suresh>

Thanks,

Suresh

On Tue, Apr 6, 2021 at 4:22 PM Shyam Kumar <shy...@gmail.com> wrote:

Hi Suresh, Easwaran,

Apologies for catching up late on this!
@Suresh - Thanks for laying out high-level workflow.
Just to be on the same page, this is similar model to what we discussed in the call last week i.e. primarily hostcfgd replaced with fabric-asi...@n.service - right?
comment inline

On Fri, Apr 2, 2021 at 12:54 PM Eswaran Baskaran <esw...@arista.com> wrote:

On Fri, Apr 2, 2021 at 12:37 PM Sureshkannan <suresh...@gmail.com> wrote:
Answers inline below
Suresh>

On Fri, Apr 2, 2021 at 9:49 AM Eswaran Baskaran <esw...@arista.com> wrote:
Suresh,

This should work overall. Some questions inline.

On Wed, Mar 31, 2021 at 5:53 PM Sureshkannan <suresh...@gmail.com> wrote:
Just to summarize the overall steps/procedure.
asic.conf can have MAX_ASIC as per vendor platform.
systemd-generator will create all per asic services(database, fabric-asic-bootstrap, syncd, swss). fabric-asi...@n.service created only if its VOQ supervisor. This kind of flexibility is already supported in sonic. For example BGP services won't be running in supervisor.
fabric-asi...@n.service will have systemd rules like, before=swss, before=syncd, (all per asic services started only after fabric-asic-bootstrap.service)

fabric-asi...@n.service is a simple lua script that subscribes to CHASSIS_STATE_TABLE and waits for its slot/asic to be ONLINE
fabric-asic-bootstrap.lua or python will be blocking the service state become active until slot/asic is online.
[Shyam] Does "blocking the service state become active until slot/asic is online" implies swss, syncd services under each fabric-asi...@n.service would get spawned prior to checking FC/ASIC presence and these swss, syncd services would be put on hold until 'ASIC online' is notified? also implies their respective containers won't be spawned?
This means all (MAX) ASICs swss, syncd services started even if FC (and hence ASIC) were absent/not-inserted?

In that case, I'd recommend NOT-enabling these services (swss, syncd) at all until ASIC online status is notified to fabric-asi...@n.service. This would keep the bring-up, going-down workflow simpler, comprehensible and less complex while debugging.
Can we look into this?

Suresh> only when fabric-asic-bootstrap service is active, systemd will spawn other services. fabric-asic-bootstrap service won't be active until fc/asic is online.

Once slot/asic is online, fabric-asic-bootstrap.py can write per asic config DB with PCI-ID info.
Do you know if the existing schema for CHASSIS_STATE_TABLE supports this already or if we need to enhance it?
Suresh> CHASSIS_STATE_TABLE has to be enhanced for this purpose. Please feel free to make the changes. We could help with review.
Suresh> When we added this, there was no requirement to make PCI-ID to be discovered and allow non static PCI-ID. In my view, as such it's still a vendor platform specific requirement. SONiC already allows PCI-ID to be configured and hence operators/users can still configure if that's what is desired.
[Shyam] can you please update/confirm on following queries (or we can take a note/AI to check them)
a) fabric-asi...@n.service has NO dependency whatsoever on any other service/daemon of the system and vice-versa?

Suresh> swss, syncd is depends on fabric-asic-bootstrap service (ie. after=fabric-asic-bootstrap.service)

b) Each of fabric-asi...@n.service is a standalone service/utility for that ASIC from the time its detected until it goes offline?

Suresh> yes.

c) For FCs not preset/inserted in the chassis, fabric-asi...@n.service would keep running indefinitely? and causes no-harm/ side-effect(s)?

Suresh> fabric-asic-bootstrap.service will be running but sleeping for the event from chassis-d for card/asic to be online.

d) None of the operations - config-reload, system-daemon-reload, chassis reload - are impacted when FC (or NPU) is turned-off/not found on going down or coming up path

Suresh> I'm not fully understand the question. fabric-asic-bootstrap.service that is spawned for FC(which is not inserted/not-found/turned-off), will not have any impact on existing operations.

Ok, we (Arista) will take a stab at defining this.

global database container (config_db) can have per FC admin enable/disable.
if admin disabled, as part of configuration handling,

disable fabric-asi...@n.service and all other per asic services.
Do you know which part of the system will do this(react to changes in config DB and turn off services) ? Has this been solved already for some use case?
Suresh> Looking at currently implemented design pattern references in SONiC, I think these are called xyz-cfgmgr (i.e nbrmgr) but it's still inside SWSS containers. One can have multi-asic-cfgmgr running on host namespace. But it's not really needed to have a cfgmgr for operations like asic shutdown/startup alone. I would propose to enhance config reload(python scripts) to take care of shutdown/startup of an asic instance and update config-db as part of this cli interface. It's a cli interface implementation or part of minigraph handling. I don't know the exact current implementation of config reload but it seems like a good fit.

The expectation would be for the user to turn off the card and do 'config reload' on the supervisor for the services to disappear. That's reasonable. In my opinion though, if we get the fabric-asic-bootstrap service done right, it can disable the swss/syncd services for fabric cards that disappear (using the existing FEATURE table mechanism). I think we can start solving the first part of the problem (CHASSIS_TABLE schema, pmon API, fabric bootstrap service) and revisit this.
[Shyam] Besides user-configurable FC shutdown operation and config reload, there could/would be a case of FC ASIC hitting fault.
This may happen at runtime and system (SW) to act upon by shutting down the FC.
wouldn't the H-L flow is like - PMON/chassisD taking action and updating CHASSIS_STATE_TABLE about FC and all its ASICs offline.

Suresh> Fault monitoring of FC card/asic isn't out of scope at this point.In my view, It depends on vendor specific. Certainly pmond can do fault monitoring of a FC asic and notify fabric-asic-bootstrap.service via chassis_state_db with state=fault,. That will make fabric-asic-bootstrap.service to shutdown swss,syncd and itself.

Each of the ASIC state is in turn notified to fabric-asi...@n.service (the one who is subscribed for this). This in turn would disable all services like swss@n; syncd@n etc. and then self.
IMO - be it user/admin-configured FC enable/disable or system initiated (due to fault/otherwise) - all should have a common workflow.
In that case, how and where does the FEATURE table mechanism fit?

Suresh> In my view, the FEATURE table isn't much useful for fabric asic handling. fabric-asic-bootstrap.service will be single controller of activating, deactivating all fabric related asic dockers (i.e swss, syncd, fabric-asic-bootstrap.service)

Eswaran Baskaran

unread,

Apr 7, 2021, 1:51:39 PM4/7/21

to Sureshkannan, Shyam Kumar, Anshu Verma, Arvindsrinivasan Lakshmi Narasimhan, Judy Joseph, Judy Joseph, Ngoc Do, mlorrillere, sonic-chassis-subgroup, staphylo

Catching up on this discussion.

A fabric-asic-bootstrap.service will work fine for our purposes. Just to confirm - this is not a per-ASIC service right? Just a global service that turns on/off all the per-asic services, right? If so, the choice is between using hostcfgd vs fabric-asic-bootstrap.service to do this. We could go with the fabric-asic-bootstrap.service if this is a problem super-specific to fabric ASICs or the hostcfgd model if this problem is slightly more general and could potentially include other types of ASICs in the future. I will let the Sonic experts decide this choice.

From the Arista side, we are working out the platform API details and the schema for CHASSIS_ASIC_TABLE. We will post that soon.

Shyam Kumar

unread,

Apr 7, 2021, 2:09:12 PM4/7/21

to Eswaran Baskaran, Sureshkannan, Anshu Verma, Arvindsrinivasan Lakshmi Narasimhan, Judy Joseph, Judy Joseph, Ngoc Do, mlorrillere, sonic-chassis-subgroup, staphylo

Hi Eswaran,

>> Just to confirm - this is not a per-ASIC service right?

This is per-ASIC service.

>> Just a global service that turns on/off all the per-asic services, right?

No. That's not the case. Its per-ASIC service - fabric-asi...@n.service - 'n' implies service for that 'n' th ASIC

Excerpts from what Suresh earlier (#2) in this email thread and today morning:

2. systemd-generator will create all per asic services(database, fabric-asic-bootstrap, syncd, swss). fabric-asi...@n.service created only if its VOQ supervisor. This kind of flexibility is already supported in sonic. For example BGP services won't be running in supervisor.

b) Each of fabric-asi...@n.service is a standalone service/utility for that ASIC from the time its detected until it goes offline?

Suresh> yes.

Thanks,

Shyam

Sureshkannan

unread,

Apr 7, 2021, 2:28:19 PM4/7/21

to Eswaran Baskaran, Shyam Kumar, Anshu Verma, Arvindsrinivasan Lakshmi Narasimhan, Judy Joseph, Judy Joseph, Ngoc Do, mlorrillere, sonic-chassis-subgroup, staphylo

fabric-asi...@n.service is per asic instance. It will listen to its own asic to be online/offline/fault/etc events.

sw...@n.service will have

----

after=fabric-asi...@n.service

sy...@n.service will have

----

after=fabric-asi...@n.service

thinking more on this, we could name it as "asic-bo...@n.service" and let it always on for all cards. supervisor card, it will be running and putting state as "activating" or "active" based on slot/asic online. On the linecards, it will be empty and always returns success. This is to keep common services across all types of cards. (i.e supervisor or linecards)

Thanks,

Suresh

Eswaran Baskaran

unread,

Apr 7, 2021, 2:33:37 PM4/7/21

to Sureshkannan, Shyam Kumar, Anshu Verma, Arvindsrinivasan Lakshmi Narasimhan, Judy Joseph, Judy Joseph, Ngoc Do, mlorrillere, sonic-chassis-subgroup, staphylo

Why not make it a single service that can bootstrap multiple asics? If we make this a per-asic service, we are back to the question of creating all these services and running them unnecessarily on systems where the asics will never show up. (I understand it's a lightweight service). I am trying to understand the reason to make this a per-asic service vs a global service.

Thanks,

Eswaran

Sureshkannan

unread,

Apr 8, 2021, 2:41:17 AM4/8/21

to Eswaran Baskaran, Anshu Verma, Arvindsrinivasan Lakshmi Narasimhan, Judy Joseph, Judy Joseph, Ngoc Do, Shyam Kumar, mlorrillere, sonic-chassis-subgroup, staphylo

Given its disaggregated management of each asic, in my view each asic related work can be done independently and nicely fit with existing multi asic design of using systemd generator, systemd dependencies. I like the idea of using systemd infrastructure of Linux as much as possible and that’s what done for existing multi asic.

asic-bootstrap is light weight and distributed service and deals with per asic rather than across asic. By doing per asic, we already take advantage of per asic config, per asic dependencies, per asic monitoring.

Thanks

Suresh

Sureshkannan

unread,

Apr 8, 2021, 10:30:57 AM4/8/21

to Eswaran Baskaran, Anshu Verma, Arvindsrinivasan Lakshmi Narasimhan, Judy Joseph, Judy Joseph, Ngoc Do, Shyam Kumar, mlorrillere, sonic-chassis-subgroup, staphylo

asic-bootstrap@n will not be ‘active’ until asic is online and hence all dependent services(swss@n, syncd@n, etc) will be not spawned. This dependency is taken care by systemd.

Thanks

Suresh

Sureshkannan

unread,

Apr 26, 2021, 10:20:18 AM4/26/21

to Eswaran Baskaran, Anshu Verma, Arvindsrinivasan Lakshmi Narasimhan, Judy Joseph, Judy Joseph, Ngoc Do, Shyam Kumar, mlorrillere, sonic-chassis-subgroup, staphylo

Hi All,

We(Nokia) are working on prototype version of asic-bootstrap.service and will send it as PR this week. We can discuss further with PR.

Idea was to use systemd notify.

notify: This indicates that the service will issue a notification when it has finished starting up. The systemd process will wait for this to happen before proceeding to other units.

Thanks

Suresh

Eswaran Baskaran

unread,

Apr 26, 2021, 5:59:43 PM4/26/21

to Sureshkannan, Anshu Verma, Arvindsrinivasan Lakshmi Narasimhan, Judy Joseph, Judy Joseph, Ngoc Do, Shyam Kumar, mlorrillere, sonic-chassis-subgroup, staphylo

Hi All,

Arista team (Ngoc Do) has posted the following two PRs

1. https://github.com/Azure/sonic-platform-common/pull/185/files A new platform API that returns the list of ASICs with the associated PCI addresses of each device.

2. https://github.com/Azure/sonic-platform-daemons/pull/175/files The chassisd changes to invoke this platform API and use the result to publish the ASIC info in the CHASSIS_ASIC_TABLE.

Nokia team, can you check if the contents of the CHASSIS_ASIC_TABLE can be used to drive the asic-bootstrap.service ?

Thanks,

Eswaran

Manjunath Prabhu

unread,

Apr 29, 2021, 6:31:03 PM4/29/21

to Eswaran Baskaran, Sureshkannan, Anshu Verma, Arvindsrinivasan Lakshmi Narasimhan, Judy Joseph, Judy Joseph, Ngoc Do, Shyam Kumar, mlorrillere, sonic-chassis-subgroup, staphylo

Hi,

In continuation to the proposal on this thread and discussions we had yesterday, Nokia team has posted the below PR

https://github.com/Azure/sonic-buildimage/pull/7477 [systemd] bootstrap service for pluggable fabric card on VOQ chassis

Thanks,

Manju

To view this discussion on the web visit https://groups.google.com/d/msgid/sonic-chassis-subgroup/CA%2BrWxa3a5pBtXH59CYgH6utn3CkZy-kbdbPyUkeD9FQcE0W1yA%40mail.gmail.com.

Manjunath Prabhu

unread,

May 7, 2021, 1:28:18 PM5/7/21

to Eswaran Baskaran, Sureshkannan, Anshu Verma, Arvindsrinivasan Lakshmi Narasimhan, Judy Joseph, Judy Joseph, Ngoc Do, Shyam Kumar, mlorrillere, sonic-chassis-subgroup, staphylo

Hi,

I have updated the PR with the below options. https://github.com/Azure/sonic-buildimage/pull/7477 has 2 commits.

Commit1 is for Option1. We got some comments from the Msft team about dependencies of mgmt dockers. Waiting to hear from them if we need to have those dependencies.
Commit2 has delta changes required to support Option2.

Option1 (Hard dependencies)
Relying on systemd After=, Before= hard dependency features to make sure that swss@->syncd@ services come up only after the asic has been detected as Online. However, services like snmp have After=swss which creates a hard dependency on the service to be started.
We have to remove After=swss/syncd tag from snmp, telemetry, mgmt-framework services.

Option2 (Soft dependencies)
Use mask/unmask to control asic services.

For swss/syncd services, on every boot, systemd-generator will mask these services only on the Supervisor card. When asic is detected by pmon, bootstrap-asic service will unmask swss/syncd for that asic.
(With this option, we may not require a per-asic bootstrap_asic.service. We can just have a single one)

Thanks,

Manju

Shyam Kumar

unread,

May 7, 2021, 1:54:12 PM5/7/21

to Manjunath Prabhu, Eswaran Baskaran, Sureshkannan, Anshu Verma, Arvindsrinivasan Lakshmi Narasimhan, Judy Joseph, Judy Joseph, Ngoc Do, mlorrillere, sonic-chassis-subgroup, staphylo

Thanks Manju for the update.

W.R.T options2:

option 2 is what we started the discussion with i.e. it has challenges/issues.

- Vendor supports multiple platforms/variants of Modular/Distributed chassis. As a result, this enforces us to put platform_hook to determine NUM_ASICS based on chassis_type. systemd-generator to invoke such platform hook.

- config reload and other kind of warm/soft reload may run into issue(s). It'd try to stop and restart all instantiated srvs for all asics. Making an attempt to start masked srvs would lead to failure(s) and in turn config reload etc. operation failure.

- Run-time FC (and/or) ASIC failure handling would be another challenge to assess!

Masking the services at systemd-generator on system bring-up doesn't gel well with run-time handling of masking/unmasking services.

Instead, having a COMMON approach for ALL scenarios - be it system bring-up or run-time FC/ASIC failure or reload (config/warm/power-cycle) - serves better (i.e. asic-bootstrap option we discussed couple of weeks ago).

- other concern(s) mentioned at the start of this email thread

Regards,

Shyam

Eswaran Baskaran

unread,

May 7, 2021, 2:48:45 PM5/7/21

to Shyam Kumar, Manjunath Prabhu, Sureshkannan, Anshu Verma, Arvindsrinivasan Lakshmi Narasimhan, Judy Joseph, Judy Joseph, Ngoc Do, mlorrillere, sonic-chassis-subgroup, staphylo

Shyam,

I am not sure if I understand option 2 the same way you did. The masking/unmasking of the systemd services will happen from the bootstrap service, but systemd-generator will generate the service files for MAX_ASICs just like today and like in option 1. If we go with option2, the overall solution would look like

1. Platform vendor implements the pmon API defined in https://github.com/Azure/sonic-platform-common/pull/185/files

2. CHASSIS_ASIC_TABLE will be populated with the ASIC info by pmon/chassisd - this is common code.

3. Based on the data in CHASSIS_ASIC_TABLE, the bootstrap service will mask/unmask the service (commit2 in https://github.com/Azure/sonic-buildimage/pull/7477)

I prefer option2 over option1 because option1 will have other implications on service dependency as Manju described that will require more changes to the system and the benefit of option1 over option2 is not super clear to me. I believe the mask/unmask approach will apply to all situations in general - run-time failure handling, OIR, etc.

Thanks,

Eswaran

Sureshkannan

unread,

May 9, 2021, 2:10:41 AM5/9/21

to Eswaran Baskaran, Anshu Verma, Arvindsrinivasan Lakshmi Narasimhan, Judy Joseph, Judy Joseph, Manjunath Prabhu, Ngoc Do, Shyam Kumar, mlorrillere, sonic-chassis-subgroup, staphylo

Just to make sure Option 2 is explained fully here, because Nokia ran into a few issues with Issue 1.

Option 2:

Systemd generator generates MAX_ASICS service unit files and it will keep swss, syncd services mask disabled on VOQ supervisor card based (better to add option in asic.conf to say if its VOQ supervisor card)
asic-bootstrap.service (single service for entire supervisor card, and it's not multi unit service) that listens to pmon state table (CHASSIS_ASIC_TABLE) and unmasks swss,syncd services once fabric asic found be present.
While 1, 2 covers bootstrap, failure scenarios and unmasking can be done from bootstrap_asic.py upon asic found to not-present (sudden plug out), runtime failure even though swss, syncd might have crashed by then. But, asic-bootstrap will make sure unmask has happened upon asic went offline.
To make PCI-ID mapping more of a system overwritten(even though user can configure it), bootstrap_asic.py will update DEVICE_METADATA:asic_id with PCI-ID information coming from CHASSIS_ASIC_TABLE before unmasking swss,syncd.

The question we have is, PCI-ID info being written to ConfigDB by bootstrap_asic.py. Is that OK in sonic to update user supplied config(DEVICE_METADATA:asic_id) by runtime service?

Thanks,

Suresh

Eswaran Baskaran

unread,

May 10, 2021, 12:28:42 PM5/10/21

to Sureshkannan, Anshu Verma, Arvindsrinivasan Lakshmi Narasimhan, Judy Joseph, Judy Joseph, Manjunath Prabhu, Ngoc Do, Shyam Kumar, mlorrillere, sonic-chassis-subgroup, staphylo

On Sat, May 8, 2021 at 11:10 PM Sureshkannan <suresh...@gmail.com> wrote:

Just to make sure Option 2 is explained fully here, because Nokia ran into a few issues with Issue 1.

Option 2:
Systemd generator generates MAX_ASICS service unit files and it will keep swss, syncd services mask disabled on VOQ supervisor card based (better to add option in asic.conf to say if its VOQ supervisor card)
asic-bootstrap.service (single service for entire supervisor card, and it's not multi unit service) that listens to pmon state table (CHASSIS_ASIC_TABLE) and unmasks swss,syncd services once fabric asic found be present.
While 1, 2 covers bootstrap, failure scenarios and unmasking can be done from bootstrap_asic.py upon asic found to not-present (sudden plug out), runtime failure even though swss, syncd might have crashed by then. But, asic-bootstrap will make sure unmask has happened upon asic went offline.
To make PCI-ID mapping more of a system overwritten(even though user can configure it), bootstrap_asic.py will update DEVICE_METADATA:asic_id with PCI-ID information coming from CHASSIS_ASIC_TABLE before unmasking swss,syncd.
The question we have is, PCI-ID info being written to ConfigDB by bootstrap_asic.py. Is that OK in sonic to update user supplied config(DEVICE_METADATA:asic_id) by runtime service?

Perhapd swss can read CHASSIS_ASIC_TABLE directly to get the PCI_ID instead of reading it from DEVICE_METADATA? Will that be better?

Sureshkannan

unread,

May 10, 2021, 3:24:01 PM5/10/21

to Eswaran Baskaran, Anshu Verma, Arvindsrinivasan Lakshmi Narasimhan, Judy Joseph, Judy Joseph, Manjunath Prabhu, Ngoc Do, Shyam Kumar, mlorrillere, sonic-chassis-subgroup, staphylo

That would work as well. Orchagent can read CHASSIS_ASIC_TABLE before doing sai->switch_create().

Thanks,

Suresh

Manjunath Prabhu

unread,

May 10, 2021, 9:33:42 PM5/10/21

to Sureshkannan, Eswaran Baskaran, Anshu Verma, Arvindsrinivasan Lakshmi Narasimhan, Judy Joseph, Judy Joseph, Ngoc Do, Shyam Kumar, mlorrillere, sonic-chassis-subgroup, staphylo

Update: https://github.com/Azure/sonic-buildimage/pull/7477

With the latest commit, this is what we have identified:

bootstrap-asic service by itself works well for boot, asic online/offline scenarios
There is no need for any changes in systemd-sonic-generator if bootstrap-asic service can mask all swss/syncd services when it initially runs.
hostcfgd conflicts with bootstrap-asic service.

This is the sequence of events for #3:

The feature table has swss and syncd enabled (from init_cfg.json).
Bootstrap service would have masked the swss/syncd services and would be waiting for pmon.
hostcfgd will then go ahead and enable it for all asics. This service runs slower even if it was started before boostrap-service.

So, we will need a mechanism in hostcfgd to skip certain services on supervisor cards. In the current commit, I have hardcoded to skip swss, syncd to qualify the changes. However, we need to find a cleaner approach.

Hi Msft-team,

Could you please weigh in on the hostcfg changes, if that is the way to proceed?

Thanks,

Manju

Eswaran Baskaran

unread,

May 11, 2021, 5:10:22 PM5/11/21

to Manjunath Prabhu, Sureshkannan, Anshu Verma, Arvindsrinivasan Lakshmi Narasimhan, Judy Joseph, Judy Joseph, Ngoc Do, Shyam Kumar, mlorrillere, sonic-chassis-subgroup, staphylo

Hi Manju,

Can we change the bootstrap-asic service to write to the FEATURE_TABLE as well? hostcfgd does react to the contents of the FEATURE_TABLE and then they can both be in sync?

Thanks,

Eswaran

Manjunath Prabhu

unread,

May 11, 2021, 5:21:05 PM5/11/21

to Eswaran Baskaran, Sureshkannan, Anshu Verma, Arvindsrinivasan Lakshmi Narasimhan, Judy Joseph, Judy Joseph, Ngoc Do, Shyam Kumar, mlorrillere, sonic-chassis-subgroup, staphylo

Ideally, I would want to avoid another service touching the FEATURE_TABLE. There is already, hostcfgd and then the config-cli itself.

If we are planning to come up with a mechanism (or already have one) where certain services like BGP etc are not acted upon by hostcfgd on the supervisor, then we can use the same for swss and syncd.

Thanks,

Manju

Eswaran Baskaran

unread,

May 11, 2021, 5:26:24 PM5/11/21

to Manjunath Prabhu, Sureshkannan, Anshu Verma, Arvindsrinivasan Lakshmi Narasimhan, Judy Joseph, Judy Joseph, Ngoc Do, Shyam Kumar, mlorrillere, sonic-chassis-subgroup, staphylo

AIUI, hostcfgd reads FEATURE_TABLE and config-cli writes to it. It seems reasonable to me that FEATURE_TABLE can have multiple writers and it reflects the state of which services are left ON/OFF and hostcfgd executes the desired behavior. However, I am fine with alternative solutions and certainly, we should get more feedback from the Sonic experts.

Eswaran Baskaran

unread,

May 12, 2021, 1:53:31 PM5/12/21

to Manjunath Prabhu, Abhishek Dosi, Sureshkannan, Anshu Verma, Arvindsrinivasan Lakshmi Narasimhan, Judy Joseph, Judy Joseph, Ngoc Do, Shyam Kumar, mlorrillere, sonic-chassis-subgroup, staphylo

Abhishek/Judy/Suresh,

We talked about setting asic.conf per-hwsku to deal with the same supervisor module running in a 4-slot or a 8-slot chassis. I realized later why that doesn't work. asic.conf is needed by systemd-generator much earlier than the config is parsed, so the hwsku is unknown at this time. We haven't yet solved this problem of setting NUM_ASICs correctly. If we let it be MAX_ASICs, we will start all swss and syncd containers - otherwise, snmp will not start.

Overall solution -

1. asic.conf uses MAX_POSSIBLE_ASICs in the chassis

2. pmon populates the actual number of asics and their PCI addresses in CHASSIS_TABLE

3. All swss/syncd services start and get blocked waiting on the PCI ID of that ASIC.

This results in lots of services that don't actually fully start in steady state. While this is not great, there doesn't seem to be a solution without fixing snmp code to not rely on swss services to start. Please let me know if I missed anything here.

Thanks,

Eswaran

Sureshkannan

unread,

May 12, 2021, 3:12:20 PM5/12/21

to Eswaran Baskaran, Manjunath Prabhu, Abhishek Dosi, Anshu Verma, Arvindsrinivasan Lakshmi Narasimhan, Judy Joseph, Judy Joseph, Ngoc Do, Shyam Kumar, mlorrillere, sonic-chassis-subgroup, staphylo

Hi,

When HWSKU is configured/changed in DEVICE_METADATA by the user, does it require sonic to be restarted?.

How about soft link <ONIE-PLATFORM>/asic.conf to <ONIE-PLATFORM>/<HWSKU>/asic.conf upon HWSKU change?

Thanks,

Suresh

Eswaran Baskaran

unread,

May 12, 2021, 3:37:53 PM5/12/21

to Sureshkannan, Manjunath Prabhu, Abhishek Dosi, Anshu Verma, Arvindsrinivasan Lakshmi Narasimhan, Judy Joseph, Judy Joseph, Ngoc Do, Shyam Kumar, mlorrillere, sonic-chassis-subgroup, staphylo

On Wed, May 12, 2021 at 12:12 PM Sureshkannan <suresh...@gmail.com> wrote:

Hi,

When HWSKU is configured/changed in DEVICE_METADATA by the user, does it require sonic to be restarted?.

No, Sonic does not restart when hwsku is configured.

Shyam Kumar

unread,

May 18, 2021, 10:35:14 PM5/18/21

to Eswaran Baskaran, Sureshkannan, Manjunath Prabhu, Abhishek Dosi, Anshu Verma, Arvindsrinivasan Lakshmi Narasimhan, Judy Joseph, Judy Joseph, Ngoc Do, mlorrillere, sonic-chassis-subgroup, staphylo

Agree so far with what Easwaran mentioned "On Wed, May 12, 2021 at 10:53 AM"

We discussed and agreed upon those line until we found snmp issue (hard-dependency on swss)

Couple of things to check:

[1] Can someone please mention (help recollect/refresh) the reason for snmp, telemetry dependency on swss?

IMO, these services are required when the stage is set (i.e. basic system bring-up is through).

Would it be appropriate (from workflow standpoint) to defer spawning them 'After' swss? or is there any downside, if any?

[2] hostcfgd, bootstrap-asic service and interface to FEATURE_TABLE

Anshu has scheduled for the sync-up on this in tomorrow's 9 am call. Let's discuss it further

Thanks,

Shyam

Eswaran Baskaran

unread,

May 19, 2021, 3:19:30 PM5/19/21

to Shyam Kumar, Sureshkannan, Manjunath Prabhu, Abhishek Dosi, Anshu Verma, Arvindsrinivasan Lakshmi Narasimhan, Judy Joseph, Judy Joseph, Ngoc Do, mlorrillere, sonic-chassis-subgroup, staphylo

Meeting summary 5/19/2021

1. We prefer not to run any unnecessary docker containers because it increases memory usage.

2. While it's not preferable, it's okay to create and run unnecessary services because they could be lightweight.

3. If we want to avoid swss/syncd dockers from running forever for chips that will never show up, we have to a) check for the presence of the chip in swss.sh:start() and not create the docker, b) check for the presence of the chip in swss.sh:wait() before creating the peer services and c) create the swss docker and peer dockers after the chip shows up.

4. Ideally, we wouldn't even create the database dockers for chips that will never show up. This can be done if the number of chips in this chassis type is known to systemd-sonic-generator. We could have a new platform API that could be invoked by systemd-sonic-generator to get the chassis type and use the correct NUM_ASIC. Presumably, this can be done by the platform code by reading the chassis EEPROM and there is a risk that this may not be available at the time of systemd-sonic-generator invocation. This part could be considered in the next phase.

Please let me know if I missed anything here.

In terms of action items, are there any volunteers to propose the code for #3 above (the changes to the swss.sh start() and wait() scripts) ?

Thanks,

Eswaran

Shyam Kumar

unread,

May 26, 2021, 2:20:34 PM5/26/21

to Eswaran Baskaran, mehra...@gmail.com, sonic-chassis-subgroup, rita...@gmail.com, Sureshkannan, Manjunath Prabhu, Abhishek Dosi, Anshu Verma, Arvindsrinivasan Lakshmi Narasimhan, Judy Joseph, Judy Joseph, Ngoc Do, mlorrillere, staphylo

Summarizing today's (05/26) sonic-chassis WorkGroup discussion wrt this issue (handling multi-NPU ASICs):

Following would be the order/priority of the work-items/PRs:

1. Add a template function that returns list of asics on module

https://github.com/Azure/sonic-platform-common/pull/185/

Once this is committed, each platform vendor in turn to provide support for get_all_asics() API

2. Collect asic info and store in CHASSIS_STATE_DB

https://github.com/Azure/sonic-platform-daemons/pull/175/files

3. https://github.com/Azure/sonic-buildimage/pull/7477

This PR is already open and is in works. Per today's sync-up, plan is to modify this PR according to the revised/agreed-upon proposal.

i.e. bootstrap-asic.py script in here could be leveraged to listen to CHASSIS_STATE_DB before starting the docker.

swss.sh changes and invocation if this script would be covered as part of this PR.

Note- Please do check w.r.t. syncd, teamd etc. services - i.e. these services spawning too to be put on hold until ASIC/NPU is detected ONLINE.

Also refer to Eswaran's e-mail dated Wednesday, May 19, 2021, point #3.

Decided to close https://github.com/Azure/sonic-buildimage/pull/7621 as 7477 would cater to required changes/enhancement.

4. Please file a new git/PR to track:

Determine MAX_ASICS_POSSIBLE on a given chassis_type at runtime (via a platform hook to systemd-sonic-generator).

This would avoid creating unnecessary DBs from get go (system boot up).

Also refer to Eswaran's's e-mail dated Wednesday, May 19, 2021, point #4.

5. config handling for Modular chassis based system

Presently, mini-graph has no info for this for Modular chassis based system.

So, it's to be downloaded to the box/Supervisor and then made effective via config reload.

Discuss/brain-storm in next session as to what all can be done to handle this in a better way.

Anand to add further details and findings etc.

Please update if anything missed

Thanks,

Shyam

On Wed, May 19, 2021 at 3:42 PM Shyam Kumar <shy...@gmail.com> wrote:

+ Anand to keep him in the loop with the latest discussion/update on email wrt workflow and work-items

On Wed, May 19, 2021 at 3:26 PM Eswaran Baskaran <esw...@arista.com> wrote:

On Wed, May 19, 2021 at 2:00 PM Shyam Kumar <shy...@gmail.com> wrote:
Thanks Easwaran for summarizing meeting discussion

w.r.t #3 is the same as one of the proposals I initially shared at the start of this email thread.
Anand worked on this and came up with following PRs:
- platform hook in swss.sh : https://github.com/Azure/sonic-buildimage/pull/7621
- another PR which is under platform cisco where it deals with platform hook implementation (NPU detection etc.)
Cisco can take this up further.
Also, we need to check for 'config reload' and other relevant use-cases with this approach.

Thanks Shyam. The posted PR needs to be modified in 2 ways
1. Avoid creating the docker when the chip is present.
2. Use platform independent mechanisms to check for chip presence. This PR stashes the ASIC PCI address in CHASSIS_STATE_DB and that can be used as an indication for chip presence (https://github.com/Azure/sonic-platform-daemons/pull/175/).

Thanks,
Eswaran

Hi Rita,
>> #fabric cards actually existing is less than expected fabric cards in chassis (this can be the max fabric cards for a give chassis sku), for first phase, we may consider the chassis is bad, and leave the chassis as isolated until the fabric card is replaced.

[1] Even for the first phase, I am not sure that all MAX FCs would be populated as the system can still run with required bandwidth etc. with <MAX_possible_FCs for a given chassis.
Taking Cisco chassis (as use case here), IIRC, we decided for 5 or 6 out of max 8.
Shall we double check on this?
[2] a) Another big thing is, for EFT personnel (at customer/MSFT lab) to validate chassis along with SONiC image, there may NOT be all MAX_FCs populated in the chassis or one/two may not come up.Rendering the entire chassis not-usable for 1 or 2 FCs not present/coming-up might not fit their test suite validation.
b) Internally, for platform vendor, it's difficult (near impossible) to have MAX FCs populated in many of the validation and development test-setups.
They would go with a bunch of FCs in many setups and near-max FCs in many test setups and only few setups with MAX FCs inserted (which does validation prior to image release) etc.
Bringing the chassis down won't help. Each platform would start adding their own patch to mitigate this; which would bring us back to the original ask (can FCs detection be dynamically handled instead of assuming max for a given chassis type?)

Thanks,
Shyam

On Wed, May 19, 2021 at 12:50 PM <rita...@gmail.com> wrote:
Thank you Eswaran.

One point I thought of after meeting is, if a fabric card is bad/missing in a chassis, i.e. #fabric cards actually existing is less than expected fabric cards in chassis (this can be the max fabric cards for a give chassis sku), for first phase, we may consider the chassis is bad, and leave the chassis as isolated until the fabric card is replaced.

Thanks,
Rita

Thanks,
Suresh

Thanks,
Shyam

Thanks,
Eswaran

Hi Shyam

Regarding

Thanks,
Shyam

Existing service start sequence

a.    Systemd generator creates service files for MAX_ASICs (based on asic.conf)
b.    All database services run first
                                                                                                                                                    i.Host database service runs first
                                                                                                                                                   ii.Per-ASIC database service runs first
                                                                                                                                                  iii.pmon starts in parallel after host database starts
c.     Minigraph is parsed and config db is updated
d.    Other ASIC services are started
e.    Hostcfgd is operational to react to FEATURE table and mask/unmask services

To view this discussion on the web visit https://groups.google.com/d/msgid/sonic-chassis-subgroup/CA%2BrWxa07n-vuVPkPr0TojoeP-DOyLjd-2yWpQm2pP-6jNZq1Cw%40mail.gmail.com.

kamal kumar

unread,

May 26, 2021, 3:17:15 PM5/26/21

to sonic-chass...@googlegroups.com

Hi All,

I know it is still under coding phase for Modular chassis based systems, I am wondering if there is any product available that you guys are working on for multi-asic design/testing or how you guys are testing?

Regards,

Kamal.

To view this discussion on the web visit https://groups.google.com/d/msgid/sonic-chassis-subgroup/CAE_Gcs91WK52tS6ptdE-6sLMhfLSoOGyuxnCwtJsr-Ck7yQsVQ%40mail.gmail.com.

Rita Hui

unread,

May 26, 2021, 5:03:31 PM5/26/21

to kamal kumar, sonic-chass...@googlegroups.com

Hi Kamal,

I believe the participants here have hardware platforms containing multiple ASICs, that are being used for testing.

Thanks,

Rita

To view this discussion on the web visit https://groups.google.com/d/msgid/sonic-chassis-subgroup/CAGQTY9j9KcgFdjy9sHYE2-c1M5mTA_2DAP-2jnGjdisjgcTp1A%40mail.gmail.com.

Heidi net

unread,

Jun 7, 2021, 7:55:15 AM6/7/21

to 'Rita Hui' via sonic-chassis-subgroup

Hello, I'm currently on leave with limited access to my email. I will reply to your email as soon as possible. Thank you.

> 1. bootstrap-asic service by itself works well for boot, asic online/offline scenarios
> 2. There is no need for any changes in systemd-sonic-generator if bootstrap-asic service can mask all swss/syncd services when it initially runs.
> 3. hostcfgd conflicts with bootstrap-asic service.

> This is the sequence of events for #3:

> 1. The feature table has swss and syncd enabled (from init_cfg.json).
> 2. Bootstrap service would have masked the swss/syncd services and would be waiting for pmon.
> 3. hostcfgd will then go ahead and enable it for all asics. This service runs slower even if it was started before boostrap-service.

> So, we will need a mechanism in hostcfgd to skip certain services on supervisor cards. In the current commit, I have hardcoded to skip swss, syncd to qualify the changes. However, we need to find a cleaner approach.
>
> Hi Msft-team,
> Could you please weigh in on the hostcfg changes, if that is the way to proceed?
>
> Thanks,
> Manju
>
> On Mon, 10 May 2021 at 12:23, Sureshkannan <suresh...@gmail.com> wrote:
> That would work as well. Orchagent can read CHASSIS_ASIC_TABLE before doing sai->switch_create().
>
> Thanks,
> Suresh
>
>
> On Mon, May 10, 2021 at 9:28 AM Eswaran Baskaran <esw...@arista.com> wrote:
> On Sat, May 8, 2021 at 11:10 PM Sureshkannan <suresh...@gmail.com> wrote:
> Just to make sure Option 2 is explained fully here, because Nokia ran into a few issues with Issue 1.
>
> Option 2:

> 1. Systemd generator generates MAX_ASICS service unit files and it will keep swss, syncd services mask disabled on VOQ supervisor card based (better to add option in asic.conf to say if its VOQ supervisor card)
> 2. asic-bootstrap.service (single service for entire supervisor card, and it's not multi unit service) that listens to pmon state table (CHASSIS_ASIC_TABLE) and unmasks swss,syncd services once fabric asic found be present.
> 3. While 1, 2 covers bootstrap, failure scenarios and unmasking can be done from bootstrap_asic.py upon asic found to not-present (sudden plug out), runtime failure even though swss, syncd might have crashed by then. But, asic-bootstrap will make sure unmask has happened upon asic went offline.
> 4. To make PCI-ID mapping more of a system overwritten(even though user can configure it), bootstrap_asic.py will update DEVICE_METADATA:asic_id with PCI-ID information coming from CHASSIS_ASIC_TABLE before unmasking swss,syncd.
> 1. The question we have is, PCI-ID info being written to ConfigDB by bootstrap_asic.py. Is that OK in sonic to update user supplied config(DEVICE_METADATA:asic_id) by runtime service?

> 1. Commit1 is for Option1. We got some comments from the Msft team about dependencies of mgmt dockers. Waiting to hear from them if we need to have those dependencies.
> 2. Commit2 has delta changes required to support Option2.

> 1. asic.conf can have MAX_ASIC as per vendor platform.
> 2. systemd-generator will create all per asic services(database, fabric-asic-bootstrap, syncd, swss). fabric-asi...@n.service created only if its VOQ supervisor. This kind of flexibility is already supported in sonic. For example BGP services won't be running in supervisor.
> 3. fabric-asi...@n.service will have systemd rules like, before=swss, before=syncd, (all per asic services started only after fabric-asic-bootstrap.service)
> 1. fabric-asi...@n.service is a simple lua script that subscribes to CHASSIS_STATE_TABLE and waits for its slot/asic to be ONLINE
> 2. fabric-asic-bootstrap.lua or python will be blocking the service state become active until slot/asic is online.

> [Shyam] Does "blocking the service state become active until slot/asic is online" implies swss, syncd services under each fabric-asi...@n.service would get spawned prior to checking FC/ASIC presence and these swss, syncd services would be put on hold until 'ASIC online' is notified? also implies their respective containers won't be spawned?
> This means all (MAX) ASICs swss, syncd services started even if FC (and hence ASIC) were absent/not-inserted?
>
> In that case, I'd recommend NOT-enabling these services (swss, syncd) at all until ASIC online status is notified to fabric-asi...@n.service. This would keep the bring-up, going-down workflow simpler, comprehensible and less complex while debugging.
> Can we look into this?
> Suresh> only when fabric-asic-bootstrap service is active, systemd will spawn other services. fabric-asic-bootstrap service won't be active until fc/asic is online.
>

> 1.
> 1. Once slot/asic is online, fabric-asic-bootstrap.py can write per asic config DB with PCI-ID info.

> Do you know if the existing schema for CHASSIS_STATE_TABLE supports this already or if we need to enhance it?
> Suresh>  CHASSIS_STATE_TABLE has to be enhanced for this purpose. Please feel free to make the changes. We could help with review.
> Suresh> When we added this, there was no requirement to make PCI-ID to be discovered and allow non static PCI-ID. In my view, as such it's still a vendor platform specific requirement. SONiC already allows PCI-ID to be configured and hence operators/users can still configure if that's what is desired.
> [Shyam] can you please update/confirm on following queries (or we can take a note/AI to check them)
> a) fabric-asi...@n.service has NO dependency whatsoever on any other service/daemon of the system and vice-versa?
> Suresh> swss, syncd is depends on fabric-asic-bootstrap service (ie. after=fabric-asic-bootstrap.service)
> b) Each of fabric-asi...@n.service is a standalone service/utility for that ASIC from the time its detected until it goes offline?
> Suresh> yes.
> c) For FCs not preset/inserted in the chassis, fabric-asi...@n.service would keep running indefinitely? and causes no-harm/ side-effect(s)?
> Suresh> fabric-asic-bootstrap.service will be running but sleeping for the event from chassis-d for card/asic to be online.
> d) None of the operations - config-reload, system-daemon-reload, chassis reload - are impacted when FC (or NPU) is turned-off/not found on going down or coming up path
> Suresh> I'm not fully understand the question.  fabric-asic-bootstrap.service that is spawned for FC(which is not inserted/not-found/turned-off), will not have any impact on existing operations.
>
>
> Ok, we (Arista) will take a stab at defining this.
>

> 1. global database container (config_db) can have per FC admin enable/disable.
> 2. if admin disabled, as part of configuration handling,
> 1. disable fabric-asi...@n.service and all other per asic services.

> Do you know which part of the system will do this(react to changes in config DB and turn off services) ? Has this been solved already for some use case?
> Suresh> Looking at currently implemented design pattern references in SONiC, I think these are called xyz-cfgmgr (i.e nbrmgr) but it's still inside SWSS containers. One can have multi-asic-cfgmgr running on host namespace. But it's not really needed to have a cfgmgr for operations like asic shutdown/startup alone. I would propose to enhance config reload(python scripts) to take care of shutdown/startup of an asic instance and update config-db as part of this cli interface. It's a cli interface implementation or part of minigraph handling. I don't know the exact current implementation of config reload but it seems like a good fit.
>
> The expectation would be for the user to turn off the card and do 'config reload' on the supervisor for the services to disappear. That's reasonable. In my opinion though, if we get the fabric-asic-bootstrap service done right, it can disable the swss/syncd services for fabric cards that disappear (using the existing FEATURE table mechanism). I think we can start solving the first part of the problem (CHASSIS_TABLE schema, pmon API, fabric bootstrap service) and revisit this.
> [Shyam] Besides user-configurable FC shutdown operation and config reload, there could/would be a case of FC ASIC hitting fault.
> This may happen at runtime and system (SW) to act upon by shutting down the FC.
> wouldn't the H-L flow is like - PMON/chassisD taking action and updating CHASSIS_STATE_TABLE about FC and all its ASICs offline.
> Suresh> Fault monitoring of FC card/asic isn't out of scope at this point.In my view, It depends on vendor specific. Certainly pmond can do fault monitoring of a FC asic and notify fabric-asic-bootstrap.service via chassis_state_db with state=fault,. That will make fabric-asic-bootstrap.service to shutdown swss,syncd and itself.
> Each of the ASIC state is in turn notified to fabric-asi...@n.service (the one who is subscribed for this). This in turn would disable all services like swss@n; syncd@n etc. and then self.
> IMO - be it user/admin-configured FC enable/disable or system initiated (due to fault/otherwise) - all should have a common workflow.
> In that case, how and where does the FEATURE table mechanism fit?
> Suresh> In my view, the FEATURE table isn't much useful for fabric asic handling. fabric-asic-bootstrap.service will be single controller of activating, deactivating all fabric related asic dockers (i.e swss, syncd, fabric-asic-bootstrap.service)
>
> Thanks,
> Shyam
>
>
> Thanks,
> Eswaran

> 1. if admin enabled,
> 1. fabrice-asi...@n.service will be waiting until asic is online.

> 1. Syncd might start too early before the PCI device can be discovered (the PCI device may not show up in the device tree) and this could lead to the ASIC not coming up because syncd doesn’t start after
> 2. 3 crashes.
> 3. The actual PCI address may not be known for the device by the time swss/syncd starts.

> To view this discussion on the web visit https://groups.google.com/d/msgid/sonic-chassis-subgroup/SJ0PR21MB198232E91EF30800C67256DD95249%40SJ0PR21MB1982.namprd21.prod.outlook.com.

Heidi net

unread,

Jun 7, 2021, 8:38:48 AM6/7/21

to 'Eswaran Baskaran' via sonic-chassis-subgroup

Hello, I'm currently on leave with limited access to my email. I will reply to your email as soon as possible. Thank you.

> 1. bootstrap-asic service by itself works well for boot, asic online/offline scenarios
> 2. There is no need for any changes in systemd-sonic-generator if bootstrap-asic service can mask all swss/syncd services when it initially runs. >
> 3. hostcfgd conflicts with bootstrap-asic service.

> This is the sequence of events for #3:

> 1. The feature table has swss and syncd enabled (from init_cfg.json).
> 2. Bootstrap service would have masked the swss/syncd services and would be waiting for pmon.  >
> 3. hostcfgd will then go ahead and enable it for all asics. This service runs slower even if it was started before boostrap-service. >

> So, we will need a mechanism in hostcfgd to skip certain services on supervisor cards. In the current commit, I have hardcoded to skip swss, syncd to qualify the changes. However, we need to find a cleaner approach.
>
> Hi Msft-team,
> Could you please weigh in on the hostcfg changes, if that is the way to proceed?
>
> Thanks,
> Manju
>
> On Mon, 10 May 2021 at 12:23, Sureshkannan <suresh...@gmail.com> wrote:
> That would work as well. Orchagent can read CHASSIS_ASIC_TABLE before doing sai->switch_create().
>
> Thanks,
> Suresh
>
>
> On Mon, May 10, 2021 at 9:28 AM Eswaran Baskaran <esw...@arista.com> wrote:
> On Sat, May 8, 2021 at 11:10 PM Sureshkannan <suresh...@gmail.com> wrote:
> Just to make sure Option 2 is explained fully here, because Nokia ran into a few issues with Issue 1.
>
> Option 2:

> 1. Systemd generator generates MAX_ASICS service unit files and it will keep swss, syncd services mask disabled on VOQ supervisor card based (better to add option in asic.conf to say if its VOQ supervisor card) >
> 2. asic-bootstrap.service (single service for entire supervisor card, and it's not multi unit service) that listens to pmon state table (CHASSIS_ASIC_TABLE) and unmasks swss,syncd services once fabric asic found be present.
> 3. While 1, 2 covers bootstrap, failure scenarios and unmasking can be done from bootstrap_asic.py upon asic found to not-present (sudden plug out), runtime failure even though swss, syncd might have crashed by then. But, asic-bootstrap will make sure unmask has happened upon asic went offline.
> 4. To make PCI-ID mapping more of a system overwritten(even though user can configure it), bootstrap_asic.py will update DEVICE_METADATA:asic_id with PCI-ID information coming from CHASSIS_ASIC_TABLE before unmasking swss,syncd.
> 1. The question we have is, PCI-ID info being written to ConfigDB by bootstrap_asic.py. Is that OK in sonic to update user supplied config(DEVICE_METADATA:asic_id) by runtime service?

> 1. Commit1 is for Option1. We got some comments from the Msft team about dependencies of mgmt dockers. Waiting to hear from them if we need to have those dependencies. >
> 2. Commit2 has delta changes required to support Option2. >

> 1. asic.conf can have MAX_ASIC as per vendor platform.
> 2. systemd-generator will create all per asic services(database, fabric-asic-bootstrap, syncd, swss). fabric-asi...@n.service created only if its VOQ supervisor. This kind of flexibility is already supported in sonic. For example BGP services won't be running in supervisor.
> 3. fabric-asi...@n.service will have systemd rules like, before=swss, before=syncd, (all per asic services started only after fabric-asic-bootstrap.service)
> 1. fabric-asi...@n.service is a simple lua script that subscribes to CHASSIS_STATE_TABLE and waits for its slot/asic to be ONLINE
> 2. fabric-asic-bootstrap.lua or python will be blocking the service state become active until slot/asic is online.

> [Shyam] Does "blocking the service state become active until slot/asic is online" implies swss, syncd services under each fabric-asi...@n.service would get spawned prior to checking FC/ASIC presence and these swss, syncd services would be put on hold until 'ASIC online' is notified? also implies their respective containers won't be spawned?
> This means all (MAX) ASICs swss, syncd services started even if FC (and hence ASIC) were absent/not-inserted?
>
> In that case, I'd recommend NOT-enabling these services (swss, syncd) at all until ASIC online status is notified to fabric-asi...@n.service. This would keep the bring-up, going-down workflow simpler, comprehensible and less complex while debugging.
> Can we look into this?
> Suresh> only when fabric-asic-bootstrap service is active, systemd will spawn other services. fabric-asic-bootstrap service won't be active until fc/asic is online.
>
>

> 1. Once slot/asic is online, fabric-asic-bootstrap.py can write per asic config DB with PCI-ID info.

> Do you know if the existing schema for CHASSIS_STATE_TABLE supports this already or if we need to enhance it?
> Suresh>  CHASSIS_STATE_TABLE has to be enhanced for this purpose. Please feel free to make the changes. We could help with review.
> Suresh> When we added this, there was no requirement to make PCI-ID to be discovered and allow non static PCI-ID. In my view, as such it's still a vendor platform specific requirement. SONiC already allows PCI-ID to be configured and hence operators/users can still configure if that's what is desired.
> [Shyam] can you please update/confirm on following queries (or we can take a note/AI to check them)
> a) fabric-asi...@n.service has NO dependency whatsoever on any other service/daemon of the system and vice-versa?
> Suresh> swss, syncd is depends on fabric-asic-bootstrap service (ie. after=fabric-asic-bootstrap.service)
>
> b) Each of fabric-asi...@n.service is a standalone service/utility for that ASIC from the time its detected until it goes offline?
> Suresh> yes.
>
> c) For FCs not preset/inserted in the chassis, fabric-asi...@n.service would keep running indefinitely? and causes no-harm/ side-effect(s)?
> Suresh> fabric-asic-bootstrap.service will be running but sleeping for the event from chassis-d for card/asic to be online.
>
> d) None of the operations - config-reload, system-daemon-reload, chassis reload - are impacted when FC (or NPU) is turned-off/not found on going down or coming up path
> Suresh> I'm not fully understand the question.  fabric-asic-bootstrap.service that is spawned for FC(which is not inserted/not-found/turned-off), will not have any impact on existing operations.
>
>
>
> Ok, we (Arista) will take a stab at defining this.
>

> 1. global database container (config_db) can have per FC admin enable/disable.
> 2. if admin disabled, as part of configuration handling,
> 1. disable fabric-asi...@n.service and all other per asic services.

> Do you know which part of the system will do this(react to changes in config DB and turn off services) ? Has this been solved already for some use case?
> Suresh> Looking at currently implemented design pattern references in SONiC, I think these are called xyz-cfgmgr (i.e nbrmgr) but it's still inside SWSS containers. One can have multi-asic-cfgmgr running on host namespace. But it's not really needed to have a cfgmgr for operations like asic shutdown/startup alone. I would propose to enhance config reload(python scripts) to take care of shutdown/startup of an asic instance and update config-db as part of this cli interface. It's a cli interface implementation or part of minigraph handling. I don't know the exact current implementation of config reload but it seems like a good fit.
>
> The expectation would be for the user to turn off the card and do 'config reload' on the supervisor for the services to disappear. That's reasonable. In my opinion though, if we get the fabric-asic-bootstrap service done right, it can disable the swss/syncd services for fabric cards that disappear (using the existing FEATURE table mechanism). I think we can start solving the first part of the problem (CHASSIS_TABLE schema, pmon API, fabric bootstrap service) and revisit this.
> [Shyam] Besides user-configurable FC shutdown operation and config reload, there could/would be a case of FC ASIC hitting fault.
> This may happen at runtime and system (SW) to act upon by shutting down the FC.
> wouldn't the H-L flow is like - PMON/chassisD taking action and updating CHASSIS_STATE_TABLE about FC and all its ASICs offline.
> Suresh> Fault monitoring of FC card/asic isn't out of scope at this point.In my view, It depends on vendor specific. Certainly pmond can do fault monitoring of a FC asic and notify fabric-asic-bootstrap.service via chassis_state_db with state=fault,. That will make fabric-asic-bootstrap.service to shutdown swss,syncd and itself.
> Each of the ASIC state is in turn notified to fabric-asi...@n.service (the one who is subscribed for this). This in turn would disable all services like swss@n; syncd@n etc. and then self.
> IMO - be it user/admin-configured FC enable/disable or system initiated (due to fault/otherwise) - all should have a common workflow.
> In that case, how and where does the FEATURE table mechanism fit?
> Suresh> In my view, the FEATURE table isn't much useful for fabric asic handling. fabric-asic-bootstrap.service will be single controller of activating, deactivating all fabric related asic dockers (i.e swss, syncd, fabric-asic-bootstrap.service)
>
> Thanks,
> Shyam
>
>
>
> Thanks,
> Eswaran

> 1. if admin enabled,
> 1. fabrice-asi...@n.service will be waiting until asic is online.

> 1. Systemd generator creates service files for MAX_ASICs (based on asic.conf) >
> 2. All database services run first >
> 1. Host database service runs first >
> 2. Per-ASIC database service runs first >
> 3. pmon starts in parallel after host database starts >
> 3. Minigraph is parsed and config db is updated >
> 4. Other ASIC services are started >
> 5. Hostcfgd is operational to react to FEATURE table and mask/unmask services >

>
> Why not start syncd/swss all the time?

> 1. Syncd might start too early before the PCI device can be discovered (the PCI device may not show up in the device tree) and this could lead to the ASIC not coming up because syncd doesn’t start after 3 crashes. >
> 2. The actual PCI address may not be known for the device by the time swss/syncd starts.  >

> To view this discussion on the web visit https://groups.google.com/d/msgid/sonic-chassis-subgroup/CA%2BrWxa07n-vuVPkPr0TojoeP-DOyLjd-2yWpQm2pP-6jNZq1Cw%40mail.gmail.com.

Heidi net

unread,

Jun 7, 2021, 9:17:43 AM6/7/21