--
You received this message because you are subscribed to the Google Groups "sonic-chassis-subgroup" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sonic-chassis-sub...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/sonic-chassis-subgroup/CA%2BrWxa2%2BU1%2BJJNE6BKCFrC4Pc6E1PWFkYjRUtKwQskss%2Bzi1hA%40mail.gmail.com.
Adding Arvind to the email thread.
Hi Eswaran,
2/2:30 would be good for me as well.
Regards,
Judy
Reserving slot to discuss following:
1. The system starts up with NUM_ASIC set to MAX_ASICs, but we change the systemd generator to spawn only the database service files, not the other service files.
2. PMON starts up and the chassisd daemon detects card presence, figures out which ASICs are present, and populates a) the FEATURE table in config DB and b) some other config file describing the ASICs, including their PCI addresses.
3. hostcfgd reacts to the FEATURE flag, creates the service files as necessary, and enables the services.
4. Since the database containers were already up per ASIC and the minigraph was already parsed into the per-ASIC config DB, the per-ASIC containers should be able to start up without issues.
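Step 3 (hostcfgd reacting to the FEATURE table) can be sketched roughly as below. This is a hedged illustration, not the real hostcfgd: the table shape, the "state" field, and the service names are assumptions, and the systemctl calls are printed rather than executed by default.

```python
# Illustrative sketch only: map a FEATURE-table snapshot to systemctl actions.
# Table/field names ("state": "enabled"/"disabled") are assumptions.
import subprocess

def feature_actions(feature_table):
    """Map {service_name: {"state": ...}} to a list of systemctl commands."""
    cmds = []
    for name, attrs in sorted(feature_table.items()):
        if attrs.get("state") == "enabled":
            cmds.append(["systemctl", "unmask", f"{name}.service"])
            cmds.append(["systemctl", "enable", "--now", f"{name}.service"])
        else:
            cmds.append(["systemctl", "disable", "--now", f"{name}.service"])
            cmds.append(["systemctl", "mask", f"{name}.service"])
    return cmds

def apply_actions(cmds, dry_run=True):
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))      # show what would be run
        else:
            subprocess.run(cmd, check=True)

if __name__ == "__main__":
    # Example CONFIG_DB FEATURE snapshot (made up for illustration)
    apply_actions(feature_actions({"swss@1": {"state": "enabled"},
                                   "syncd@2": {"state": "disabled"}}))
```

A real daemon would subscribe to CONFIG_DB change notifications instead of taking a one-shot snapshot, but the enable/disable decision logic is the same.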
_____________________________________________
From: Eswaran Baskaran <esw...@arista.com>
Sent: Friday, March 26, 2021 4:57 PM
To: Judy Joseph <judy....@gmail.com>
Cc: Sureshkannan <suresh...@gmail.com>; Anshu Verma <ans...@microsoft.com>; Arvindsrinivasan Lakshmi Narasimhan <Arvindsriniv...@microsoft.com>;
Judy Joseph <Judy....@microsoft.com>; Ngoc Do <ngo...@arista.com>; Shyam Kumar <shy...@gmail.com>; mlorrillere <mlorr...@arista.com>;
sonic-chassis-subgroup <sonic-chass...@googlegroups.com>; staphylo <stap...@arista.com>
Subject: Re: [EXTERNAL] Re: Multi-asic support for fabric ASICs
Existing service start sequence
Systemd generator creates service files for MAX_ASICs (based on asic.conf)
All database services run first
Host database service runs first
Per-ASIC database services run next
pmon starts in parallel after host database starts
Minigraph is parsed and config db is updated
Other ASIC services are started
Hostcfgd is operational to react to FEATURE table and mask/unmask services
Why not start syncd/swss all the time?
Syncd might start too early, before the PCI device can be discovered (the PCI device may not show up in the device tree yet), and this could leave the ASIC down because systemd won't restart syncd after 3 crashes.
The actual PCI address may not be known for the device by the time swss/syncd starts.
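One way to sidestep this race is to have the service wrapper wait for the ASIC's PCI device to appear in sysfs before launching syncd. A minimal sketch follows; the sysfs path convention is standard Linux, but the example bus address is made up.

```python
# Sketch: gate syncd startup on the ASIC's PCI device showing up in sysfs,
# instead of letting syncd crash-loop into systemd's start limit.
import os
import time

def pci_device_present(pci_addr, sysfs_root="/sys/bus/pci/devices"):
    """True if the PCI device directory exists (i.e. the kernel enumerated it)."""
    return os.path.isdir(os.path.join(sysfs_root, pci_addr))

def wait_for_pci(pci_addr, timeout_s=60, poll_s=1.0,
                 sysfs_root="/sys/bus/pci/devices"):
    """Poll until the device appears; return False on timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if pci_device_present(pci_addr, sysfs_root):
            return True
        time.sleep(poll_s)
    return False

if __name__ == "__main__":
    # "0000:06:00.0" is an illustrative address, not a real ASIC mapping.
    print("present:", pci_device_present("0000:06:00.0"))
```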
Proposal
1. System starts up with NUM_ASIC set to MAX_ASICs. But, we change the systemd generator to only spawn the database service files and not the other service files.
Can systemd generators generate these service files but leave the services disabled? Can the per-ASIC databases also start disabled?

No, we should have the per-ASIC databases start no matter what, because the config load should happen even when the cards are not present yet. (The config-setup service will load the DB from either config_db.json or the minigraph, and this service needs all the database services to be up.)
We can avoid this step if we can change the dependencies so that swss service can start after hostcfgd. This is an alternative design.
2. PMON starts up and the chassisd daemon detects card presence, figures out which asics are present and populates a) the FEATURE table in config DB and b) some other config file describing the ASICs including its PCI address.
Chassisd will be notified by the platform of the list of active ASICs (i.e. those whose PCI devices are detected) for a given slot and will publish that list into the existing STATE table.
Addendum: chassisd can, in addition, get the PCI address of these devices and populate the STATE table.
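For illustration, the entries chassisd might publish could look like the sketch below. The key layout and field names here are assumptions for the example, not the actual SONiC schema (which the thread later agrees needs to be defined/enhanced).

```python
# Sketch of the shape of per-ASIC state entries a chassisd-like daemon could
# publish. Key format and field names are illustrative assumptions.
def asic_state_entries(slot, asics):
    """asics: list of (asic_index, pci_addr) tuples detected for a slot."""
    entries = {}
    for idx, pci in asics:
        key = f"CHASSIS_ASIC_TABLE|asic{idx}"
        entries[key] = {
            "slot": str(slot),
            "asic_id_in_module": str(idx),
            "asic_pci_address": pci,   # discovered at runtime, not static
            "status": "online",
        }
    return entries

if __name__ == "__main__":
    # A real chassisd would HSET these into STATE_DB via a redis connector.
    for key, fields in asic_state_entries(3, [(0, "0000:06:00.0")]).items():
        print(key, fields)
```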
3. hostcfgd reacts to the FEATURE flag and creates the service files as necessary and enables the services.
Can hostcfgd listen to the chassis STATE table and the FEATURE table content to turn the services on/off as needed for each ASIC?
Addendum: Can hostcfgd populate the CONFIG DB with the PCI address for these ASICs?
4. Since the database containers were already up per-ASIC and the minigraph was already parsed into the per-ASIC config DB, the per-ASIC containers should be able to start up without issues.
Hi Suresh,

>> Reason to align with the multi-asic solution: it has already solved the multi-asic problem in SONiC. What's different for a modular system is that the ASIC is hot-swappable and the PCI device ID is not static.

[Shyam] Besides the ASIC being hot-swappable and the dynamic nature of PCIe-ID learning, the number of ASICs detected (and effective) at boot time/run time makes a lot of difference compared to what was decided earlier (statically spawn the max possible, etc.). Spawning all services (and their containers) based on a static maximum is OK for an LC/fixed box, where all ASICs/NPUs come up (and are expected to remain up) as long as the LC/fixed box is up and running. However, that doesn't seem to fit the bill for a modular chassis (with CPU-less FCs on the RP/Supervisor).
>> So, my thinking is to enhance the multi-asic solution: per-asic SONiC app service files are created and started by the systemd generator using MAX_ASIC, with all services (.sh files) started but not the containers... only the syncd "container" has to be kept waiting until the ASIC is online.

[Shyam] I had a few questions (please refer to #2 in my earlier response). We need to take all these aspects into account from a system standpoint; a gotcha with any of them (now or at any future point) won't make for a fool-proof solution. I also have an underlying question: I don't see/foresee an advantage of "spawning a service without its container, leaving the container in a wait/hold state and resuming later" versus "spawning both the service and its container together, and having them die together". The latter would be cleaner, more scalable, and less error-prone considering bring-up, steady state, and failures/reloads across the system. I couldn't see any technical/architectural reason to follow the existing workflow/mechanism.
>> syncd.sh will be running, but the container won't be created until the ASIC (aka slot) is found to be online.
>> syncd.sh can wait like database.sh (the linecard per-asic database); the systemd state will be "activating" (not active). I already see code for this in database.sh.

[Shyam] This is more of an implementation point, but before that, IMHO we need to list the pros/cons of this versus what we decided and discussed in the Monday sync-up.

>> ASIC removal/shutdown is more of a user-driven activity: # config asic shutdown <n>

[Shyam] A user performing ASIC removal/shutdown is the rare case. An ASIC may hit SBE/MBE/parity or other kinds of errors while the system is operational, carrying/transiting traffic. These can happen at any time, and the SW (along with the underlying platform/HW) should be capable of handling all such faults. In such cases, the platform would ask the NOS (SONiC) to take various possible actions: ASIC reset/shutdown, config reload, whole-board reload/shutdown, chassis reload, etc. In short, all such cases should be dynamically detected and handled.
Just to summarize the overall steps/procedure.
- asic.conf can have MAX_ASIC as per the vendor platform.
- The systemd generator will create all per-asic services (database, fabric-asic-bootstrap, syncd, swss). fabric-asi...@n.service is created only if it's a VOQ supervisor. This kind of flexibility is already supported in SONiC; for example, BGP services won't be running on the supervisor.
- fabric-asi...@n.service will have systemd rules like Before=swss and Before=syncd (all per-asic services are started only after fabric-asic-bootstrap.service).
- fabric-asi...@n.service is a simple Lua script that subscribes to CHASSIS_STATE_TABLE and waits for its slot/asic to be ONLINE.
- fabric-asic-bootstrap.lua (or Python) will block the service state from becoming active until the slot/asic is online.
- Once the slot/asic is online, fabric-asic-bootstrap.py can write the per-asic config DB with the PCI-ID info.
- The global database container (config_db) can have a per-FC admin enable/disable.
- If admin-disabled, as part of configuration handling:
  - disable fabric-asi...@n.service and all other per-asic services.
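The "block until the slot/asic is ONLINE" behavior in the steps above can be sketched as follows. The status source is injected as a callable so that the actual CHASSIS_STATE_TABLE lookup stays an abstract detail; the state names and the polling approach are assumptions.

```python
# Sketch of a fabric-asic-bootstrap-style wait loop: the service simply does
# not exit until its ASIC reports ONLINE, leaving the unit in systemd's
# "activating" state, so Before=/After= ordering holds back dependent services.
import time

def wait_until_online(asic, get_status, poll_s=2.0, timeout_s=None):
    """Block until get_status(asic) == 'ONLINE'; False if timeout_s expires."""
    start = time.monotonic()
    while True:
        if get_status(asic) == "ONLINE":
            return True
        if timeout_s is not None and time.monotonic() - start > timeout_s:
            return False
        time.sleep(poll_s)

if __name__ == "__main__":
    # Simulated status source; a real service would read CHASSIS_STATE_TABLE.
    statuses = iter(["EMPTY", "PRESENT", "ONLINE"])
    print(wait_until_online("asic4", lambda a: next(statuses), poll_s=0.01))
```

A production version would subscribe to DB change notifications rather than poll, but the blocking contract toward systemd is the same.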
Suresh,

This should work overall. Some questions inline.

On Wed, Mar 31, 2021 at 5:53 PM Sureshkannan <suresh...@gmail.com> wrote:
Just to summarize the overall steps/procedure.
- asic.conf can have MAX_ASIC as per vendor platform.
- systemd-generator will create all per asic services(database, fabric-asic-bootstrap, syncd, swss). fabric-asi...@n.service created only if its VOQ supervisor. This kind of flexibility is already supported in sonic. For example BGP services won't be running in supervisor.
- fabric-asi...@n.service will have systemd rules like, before=swss, before=syncd, (all per asic services started only after fabric-asic-bootstrap.service)
- fabric-asi...@n.service is a simple lua script that subscribes to CHASSIS_STATE_TABLE and waits for its slot/asic to be ONLINE
- fabric-asic-bootstrap.lua or python will be blocking the service state become active until slot/asic is online.
- Once slot/asic is online, fabric-asic-bootstrap.py can write per asic config DB with PCI-ID info.
Do you know if the existing schema for CHASSIS_STATE_TABLE supports this already or if we need to enhance it?
- global database container (config_db) can have per FC admin enable/disable.
- if admin disabled, as part of configuration handling,
- disable fabric-asi...@n.service and all other per asic services.
Do you know which part of the system will do this(react to changes in config DB and turn off services) ? Has this been solved already for some use case?
Answers inline below (marked Suresh>).

On Fri, Apr 2, 2021 at 9:49 AM Eswaran Baskaran <esw...@arista.com> wrote:
Suresh, this should work overall. Some questions inline.
On Wed, Mar 31, 2021 at 5:53 PM Sureshkannan <suresh...@gmail.com> wrote:
Just to summarize the overall steps/procedure.
Do you know if the existing schema for CHASSIS_STATE_TABLE supports this already, or if we need to enhance it?

Suresh> CHASSIS_STATE_TABLE has to be enhanced for this purpose. Please feel free to make the changes; we could help with review.

Suresh> When we added this, there was no requirement for the PCI-ID to be discovered or for a non-static PCI-ID to be allowed. In my view, as such, it's still a vendor-platform-specific requirement. SONiC already allows the PCI-ID to be configured, so operators/users can still configure it if that's what is desired.
Do you know which part of the system will do this (react to changes in config DB and turn off services)? Has this been solved already for some use case?

Suresh> Looking at the design patterns currently implemented in SONiC, I think these are called xyz-cfgmgr (e.g. nbrmgr), but they still live inside the SWSS containers. One could have a multi-asic-cfgmgr running in the host namespace, but a cfgmgr isn't really needed for operations like ASIC shutdown/startup alone. I would propose enhancing config reload (Python scripts) to take care of shutdown/startup of an ASIC instance and to update config DB as part of this CLI interface. It's a CLI implementation or part of minigraph handling. I don't know the exact current implementation of config reload, but it seems like a good fit.
On Fri, Apr 2, 2021 at 12:37 PM Sureshkannan <suresh...@gmail.com> wrote:
Answers inline below (marked Suresh>).
On Fri, Apr 2, 2021 at 9:49 AM Eswaran Baskaran <esw...@arista.com> wrote:
Suresh, this should work overall. Some questions inline.
On Wed, Mar 31, 2021 at 5:53 PM Sureshkannan <suresh...@gmail.com> wrote:
Just to summarize the overall steps/procedure.
[Shyam] Can you please update/confirm the following queries (or we can take a note/AI to check them)?
a) fabric-asi...@n.service has NO dependency whatsoever on any other service/daemon of the system, and vice versa?
b) Each fabric-asi...@n.service is a standalone service/utility for that ASIC from the time it's detected until it goes offline?
c) For FCs not present/inserted in the chassis, fabric-asi...@n.service would keep running indefinitely, causing no harm/side effects?
d) None of the operations (config reload, systemd daemon-reload, chassis reload) are impacted when an FC (or NPU) is turned off/not found on the way down or the way up?
Ok, we (Arista) will take a stab at defining this.
Do you know which part of the system will do this (react to changes in config DB and turn off services)? Has this been solved already for some use case?

Suresh> Looking at the design patterns currently implemented in SONiC, I think these are called xyz-cfgmgr (e.g. nbrmgr), but they still live inside the SWSS containers. One could have a multi-asic-cfgmgr running in the host namespace, but a cfgmgr isn't really needed for operations like ASIC shutdown/startup alone. I would propose enhancing config reload (Python scripts) to take care of shutdown/startup of an ASIC instance and to update config DB as part of this CLI interface. It's a CLI implementation or part of minigraph handling. I don't know the exact current implementation of config reload, but it seems like a good fit.

The expectation would be for the user to turn off the card and do 'config reload' on the supervisor for the services to disappear. That's reasonable. In my opinion, though, if we get the fabric-asic-bootstrap service done right, it can disable the swss/syncd services for fabric cards that disappear (using the existing FEATURE table mechanism). I think we can start by solving the first part of the problem (CHASSIS_TABLE schema, pmon API, fabric bootstrap service) and revisit this.
Hi Suresh, Eswaran,

Apologies for catching up late on this!

@Suresh - Thanks for laying out the high-level workflow. Just to be on the same page, this is a similar model to what we discussed in the call last week, i.e. primarily hostcfgd is replaced with fabric-asi...@n.service - right?

Comments inline.

On Fri, Apr 2, 2021 at 12:54 PM Eswaran Baskaran <esw...@arista.com> wrote:
On Fri, Apr 2, 2021 at 12:37 PM Sureshkannan <suresh...@gmail.com> wrote:
Answers inline below (marked Suresh>).
On Fri, Apr 2, 2021 at 9:49 AM Eswaran Baskaran <esw...@arista.com> wrote:
Suresh, this should work overall. Some questions inline.
On Wed, Mar 31, 2021 at 5:53 PM Sureshkannan <suresh...@gmail.com> wrote:
Just to summarize the overall steps/procedure.
- fabric-asic-bootstrap.lua or python will be blocking the service state become active until slot/asic is online.
[Shyam] Does "blocking the service state from becoming active until the slot/asic is online" imply that the swss and syncd services under each fabric-asi...@n.service would get spawned prior to checking FC/ASIC presence and would be put on hold until 'ASIC online' is notified? And that their respective containers won't be spawned? This means the swss/syncd services for all (MAX) ASICs are started even if the FC (and hence the ASIC) is absent/not inserted. In that case, I'd recommend NOT enabling these services (swss, syncd) at all until the ASIC-online status is notified to fabric-asi...@n.service. This would keep the bring-up and going-down workflows simpler, more comprehensible, and less complex while debugging. Can we look into this?
[Shyam] Besides a user-configured FC shutdown operation and config reload, there could/would be the case of an FC ASIC hitting a fault. This may happen at runtime, while the system is operational, and the system (SW) has to act on it by shutting down the FC. Wouldn't the high-level flow be: PMON/chassisd takes action and updates CHASSIS_STATE_TABLE to mark the FC and all its ASICs offline; each ASIC's state is in turn notified to fabric-asi...@n.service (which is subscribed to it), which then disables all services like swss@n, syncd@n, etc., and finally itself. IMO, be it a user/admin-configured FC enable/disable or a system-initiated one (due to a fault or otherwise), all cases should share a common workflow. In that case, how and where does the FEATURE table mechanism fit?
b) Each fabric-asi...@n.service is a standalone service/utility for that ASIC from the time it's detected until it goes offline?

The systemd process will wait for this to happen before proceeding to other units.
Hi,
I have updated the PR with the below options.
https://github.com/Azure/sonic-buildimage/pull/7477 has 2 commits.
Option1 (Hard dependencies)
Relying on the systemd After=/Before= hard-dependency features to make sure that the swss@ and syncd@ services come up only after the ASIC has been detected as online. However, services like snmp have After=swss, which creates a hard dependency on that service being started. We would have to remove the After=swss/syncd tags from the snmp, telemetry, and mgmt-framework services.
Option2 (Soft dependencies)
Use mask/unmask to control the ASIC services. For the swss/syncd services, on every boot the systemd generator will mask these services, but only on the supervisor card. When an ASIC is detected by pmon, the bootstrap-asic service will unmask swss/syncd for that ASIC. (With this option, we may not require a per-asic bootstrap_asic.service; we can just have a single one.)
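The generator-side decision in Option 2 might look like the sketch below. The `is_voq_supervisor` flag and the exact unit names are assumptions for illustration; a real generator writes unit files (and mask symlinks) rather than returning lists.

```python
# Illustrative sketch of Option 2's generator decision: which per-ASIC units
# get created, and which start masked on a VOQ supervisor card.
def generated_units(max_asics, is_voq_supervisor):
    """Return (all generated units, units that start masked)."""
    units, masked = [], []
    for n in range(max_asics):
        units.append(f"database@{n}.service")  # databases always start
        for svc in ("swss", "syncd"):
            unit = f"{svc}@{n}.service"
            units.append(unit)
            if is_voq_supervisor:
                masked.append(unit)  # unmasked later by asic-bootstrap
    return units, masked

if __name__ == "__main__":
    units, masked = generated_units(4, is_voq_supervisor=True)
    print(len(units), "units generated,", len(masked), "start masked")
```

On a linecard or fixed box (`is_voq_supervisor=False`) nothing is masked, matching the thread's point that static max-spawn is fine there.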
Just to make sure Option 2 is explained fully here, because Nokia ran into a few issues with Option 1.

Option 2:
- The systemd generator generates MAX_ASICS service unit files and will keep the swss and syncd services masked (disabled) on the VOQ supervisor card (better to add an option in asic.conf to say whether it's a VOQ supervisor card).
- asic-bootstrap.service (a single service for the entire supervisor card, not a multi-unit service) listens to the pmon state table (CHASSIS_ASIC_TABLE) and unmasks the swss/syncd services once a fabric ASIC is found to be present.
- While the above covers bootstrap, failure scenarios (an ASIC found to be not present after a sudden plug-out, or a runtime failure) can also be handled from bootstrap_asic.py: even though swss/syncd might have crashed by then, asic-bootstrap will make sure the services are masked again once the ASIC has gone offline.
- To make the PCI-ID mapping system-overwritten (even though the user can configure it), bootstrap_asic.py will update DEVICE_METADATA:asic_id with the PCI-ID information coming from CHASSIS_ASIC_TABLE before unmasking swss/syncd.
- The question we have is about the PCI-ID info being written to CONFIG_DB by bootstrap_asic.py: is it OK in SONiC for a runtime service to update user-supplied config (DEVICE_METADATA:asic_id)?
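The DEVICE_METADATA update described above can be illustrated as follows, with a plain dict standing in for the per-ASIC CONFIG_DB. The key and field names follow the names used in this thread (DEVICE_METADATA, asic_id); the helper itself is hypothetical, and real code would go through a redis/swsssdk connector.

```python
# Hypothetical sketch of the bootstrap_asic.py write discussed above: overwrite
# DEVICE_METADATA's asic_id with the PCI-ID learned from CHASSIS_ASIC_TABLE.
# A dict models the per-ASIC CONFIG_DB here.
def update_asic_id(config_db, pci_id):
    """Set asic_id to the discovered PCI-ID; return any previous user value."""
    meta = config_db.setdefault("DEVICE_METADATA|localhost", {})
    old = meta.get("asic_id")
    meta["asic_id"] = pci_id  # system-written, overriding any user-supplied value
    return old

if __name__ == "__main__":
    db = {"DEVICE_METADATA|localhost": {"asic_id": "06:00.0"}}  # user-configured
    prev = update_asic_id(db, "0000:0b:00.0")                   # discovered
    print("replaced", prev, "with", db["DEVICE_METADATA|localhost"]["asic_id"])
```

Returning the previous value makes the "runtime service overwrites user config" concern explicit: the caller can at least log when a user-supplied asic_id is being replaced.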
Hi,

When the HWSKU is configured/changed in DEVICE_METADATA by the user, does it require SONiC to be restarted?
+ Anand, to keep him in the loop on the latest discussion/updates on email w.r.t. the workflow and work items.

On Wed, May 19, 2021 at 3:26 PM Eswaran Baskaran <esw...@arista.com> wrote:
On Wed, May 19, 2021 at 2:00 PM Shyam Kumar <shy...@gmail.com> wrote:

Thanks Eswaran for summarizing the meeting discussion. W.r.t. #3: it is the same as one of the proposals I initially shared at the start of this email thread. Anand worked on this and came up with the following PRs:
- platform hook in swss.sh: https://github.com/Azure/sonic-buildimage/pull/7621
- another PR, under platform/cisco, which deals with the platform hook implementation (NPU detection etc.)
Cisco can take this up further. Also, we need to check 'config reload' and other relevant use cases with this approach.

Thanks Shyam. The posted PR needs to be modified in 2 ways:
1. Avoid creating the docker when the chip is not present.
2. Use platform-independent mechanisms to check for chip presence. This PR stashes the ASIC PCI address in CHASSIS_STATE_DB, and that can be used as an indication of chip presence (https://github.com/Azure/sonic-platform-daemons/pull/175/).

Thanks,
Eswaran

Hi Rita,

>> If the number of fabric cards actually present is less than the expected fabric cards in the chassis (this can be the max fabric cards for a given chassis SKU), for the first phase we may consider the chassis bad and leave it isolated until the fabric card is replaced.

[1] Even for the first phase, I am not sure that all MAX FCs would be populated, as the system can still run with the required bandwidth etc. with < MAX_possible_FCs for a given chassis. Taking the Cisco chassis as the use case here, IIRC we decided on 5 or 6 out of a max of 8. Shall we double-check this?

[2] a) Another big thing: for EFT personnel (at a customer/MSFT lab) validating the chassis along with the SONiC image, there may NOT be all MAX_FCs populated in the chassis, or one or two may not come up. Rendering the entire chassis unusable because 1 or 2 FCs are not present/coming up might not fit their test-suite validation.
b) Internally, it's difficult (near impossible) for a platform vendor to have MAX FCs populated in many of the validation and development test setups. They would go with a handful of FCs in many setups, near-max FCs in others, and only a few setups with MAX FCs inserted (which do validation prior to image release) etc.
Bringing the chassis down won't help. Each platform would start adding its own patch to mitigate this, which would bring us back to the original ask (can FC detection be handled dynamically instead of assuming the max for a given chassis type?).

Thanks,
Shyam

On Wed, May 19, 2021 at 12:50 PM <rita...@gmail.com> wrote:
Thank you Eswaran.
One point I thought of after the meeting: if a fabric card is bad/missing in a chassis, i.e. the number of fabric cards actually present is less than the expected number for the chassis (this can be the max fabric cards for a given chassis SKU), then for the first phase we may consider the chassis bad and leave it isolated until the fabric card is replaced.
Thanks,
Rita
Hi Kamal,
I believe the participants here have hardware platforms containing multiple ASICs that are being used for testing.
Thanks,
Rita