Hi,
> > > > playing with updates, I maneuvered the EBG envs on a system into this
> > > > weird state:
> > > >
> > > >
> > > > ----------------------------
> > > > Config Partition #0 Values:
> > > > in_progress: yes
> > > > revision: 4
> > > > kernel: C:BOOT1:linux.efi
> > > > kernelargs:
> > > > watchdog timeout: 0 seconds
> > > > ustate: 3 (FAILED)
> > > >
> > > > user variables:
> > > > recovery_status = failed
Hm, did you start with a clean environment and SWUpdate >= 2022.12?
> > > > ----------------------------
> > > > Config Partition #1 Values:
> > > > in_progress: no
> > > > revision: 3
> > > > kernel: C:BOOT1:linux.efi
> > > > kernelargs:
> > > > watchdog timeout: 0 seconds
> > > > ustate: 2 (TESTING)
> > > >
> > > > user variables:
> > > >
> > >
> > > I see - we should *never* reach this state.
> > >
> > > >
> > > > To get there, I started an update with swupdate and booted into testing
> > > > path #1.
> > >
> > > Ok
> > >
> > > > But then didn't confirm this update and rather started it
> > > > again, using the same swu.
> > >
> > > It looks to me that this is the point. SWUpdate requires to close the
> > > transaction, for itself or for the deployment server (Hawkbit). If a
> > > system boots with TESTING, the glue logic should start SWUpdate asking
> > > to close the transaction - with OK or FAILED by passing the -c parameter.
> > >
> > > However, this was thought to work together with the deployment server,
> > > because it handles the state machine on Hawkbit. The parameter is
> > > ignored if another deployment interface (Webserver, USB, ..) is used.
The suricatta modules handle this for you, as a "convenience" feature
and to keep the (hawkBit, ...) server's view of things consistent with
the device's, which is more important than the convenience aspect :)
If you're running it with other modules/modes, you're on your own.
Then, you have to play along with the (convention) rules to close the
transaction, as there's nothing preventing you from getting into this
situation with EFI Boot Guard.
Hence the valid question: should this be allowed / denied by EFI
Boot Guard, or by the tools (SWUpdate in this case) making use of it?
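
To make the "denied by the tools" option concrete, here is a rough sketch
of a guard that reads a dump in the format quoted above and refuses to
start a new update while a transaction is still open. The function names
are made up for illustration, the INSTALLED = 1 value is my recollection of
the EFI Boot Guard numbering (only 2 and 3 appear in the dump), and treating
INSTALLED/TESTING as "open" is an assumed policy, not something either
project enforces today:

```python
import re

# ustate numbering: 2 and 3 are taken from the dump above; 1 (INSTALLED)
# is assumed from EFI Boot Guard's conventions.
INSTALLED, TESTING = 1, 2


def parse_ebg_dump(text):
    """Parse a dump like the one quoted above into {partition: {field: value}}."""
    partitions = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = re.match(r"Config Partition #(\d+) Values:", line)
        if header:
            current = partitions.setdefault(int(header.group(1)), {})
        elif current is not None and ":" in line:
            # Split on the first colon only, so "C:BOOT1:linux.efi" survives.
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return partitions


def ustate_of(env):
    """Turn a value such as '2 (TESTING)' into the integer 2."""
    return int(env["ustate"].split()[0])


def has_open_transaction(partitions):
    """True if any environment is installed or in testing but not yet confirmed."""
    return any(ustate_of(env) in (INSTALLED, TESTING) for env in partitions.values())


# Abbreviated copy of the dump from this thread.
SAMPLE = """\
Config Partition #0 Values:
in_progress: yes
revision: 4
kernel: C:BOOT1:linux.efi
ustate: 3 (FAILED)
----------------------------
Config Partition #1 Values:
in_progress: no
revision: 3
kernel: C:BOOT1:linux.efi
ustate: 2 (TESTING)
"""

if __name__ == "__main__":
    if has_open_transaction(parse_ebg_dump(SAMPLE)):
        print("refusing to start a new update: previous one is unconfirmed")
```

Partition #1 was presumably already in TESTING before the second run, so a
check like this would have stopped it. Whether that check belongs in EFI
Boot Guard or in SWUpdate is exactly the open question.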
> > > In such a situation this is (again) managed by the glue logic, and the
> > > transaction (that is, setting ustate) is done before starting SWUpdate. Or
> > > in the case of U-Boot, it is also managed with the help of additional (and
> > > custom) variables.
> > >
> > > In your case, it seems that nothing is done at boot time, and SWUpdate
> > > is started. SWUpdate does not know (because it expects that someone has
> > > already decided, and ustate is not checked) that a new software is
> > > running, and the same SWU is loaded again.
Exactly, here you're on your own. You have to instrument EFI Boot Guard
so that it's happy... which is convention and not enforced, currently.
Granted, this requires a lot of context knowledge of how to integrate
things properly and seamlessly...
One common pattern is to have a "health" target and once that's reached
you start SWUpdate with the appropriate parameters (or set them yourself via
some glue mechanism). But again, that is convention, not enforced, and
it's currently the responsibility of the system integrator to get right.
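
A minimal sketch of the decision such a "health" gate has to make, in
Python for illustration. `Ustate`, `close_transaction` and the health check
callables are hypothetical names, the OK = 0 and INSTALLED = 1 values are
assumed from EFI Boot Guard's conventions, and writing the result back to
the bootloader environment is deliberately left out:

```python
from enum import IntEnum


class Ustate(IntEnum):
    # TESTING and FAILED match the dump in this thread; OK and INSTALLED
    # are assumed values.
    OK = 0
    INSTALLED = 1
    TESTING = 2
    FAILED = 3


def close_transaction(ustate, health_checks):
    """Return the ustate to write back, or None if no transaction is open.

    health_checks is an iterable of zero-argument callables returning bool,
    e.g. "is the application up?" or "did the database migration succeed?".
    """
    if ustate != Ustate.TESTING:
        return None  # nothing to confirm or reject
    healthy = all(check() for check in health_checks)
    return Ustate.OK if healthy else Ustate.FAILED


if __name__ == "__main__":
    # Stand-in checks; a real integration would probe its own services.
    checks = [lambda: True, lambda: True]
    print(close_transaction(Ustate.TESTING, checks))
```

The point is only that the decision is taken once, before SWUpdate is
started again, so a second run can never see an unconfirmed TESTING state.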
> > I was running swupdate manually from the command line. No backend
> > involved, just the desire to intentionally break things. ;)
>
> The best way to reach the goal...:-D
If you had used suricatta, you would have missed this :)
> And yes, this can happen because the part deciding if previous update was ok,
> is missing. In most projects, if system is up and running, it is considered
> ok. That means the decision is done in SWUpdate's systemd run unit (or SystemV
> init script), see also glue logic under /usr/lib/swupdate. In some other
> cases, the update is ok only if the application is running, a migration of a
> custom database was ok, and so on... that means the decision is made outside
> SWUpdate. SWUpdate supports all these use cases.
Yes, that's the codified context knowledge. Still, if you miss out on
one thing, the whole integration will crash and burn. And it's quite
easy to miss a thing...
The question is whether there is a generic pattern like the "health"
target I sketched above so that SWUpdate can handle and abstract
the bootloader interactions?
Then, any SWUpdate mode/module would behave the same and it would all be
in one place, reducing the need for having all the context knowledge...
> To avoid the issue you are seeing, the decision should be made inside SWUpdate:
> something like a transition TESTING ==> OK, because SWUpdate is running. But
> as I said, this can only be done if it is configurable, or it will break the
> use cases I mentioned.
This is essentially promoting the current suricatta behavior to all
SWUpdate modes/modules, w/o the remote reporting part if not run from
a suricatta module. That would be a start...
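
To illustrate what "configurable" could mean here, a tiny sketch. The
option name `confirm_policy` and its values "auto" / "external" are
invented for this mail, nothing like this exists in SWUpdate as far as
this thread goes, and OK = 0 is again an assumed value:

```python
OK, TESTING = 0, 2  # TESTING = 2 as in the dump; OK = 0 is assumed


def startup_ustate(ustate, confirm_policy="external"):
    """Return the ustate SWUpdate would leave behind when it starts.

    "auto":     SWUpdate running is taken as proof of health, TESTING ==> OK.
    "external": glue logic or a deployment server decides; leave ustate alone.
    """
    if confirm_policy not in ("auto", "external"):
        raise ValueError(f"unknown confirm_policy: {confirm_policy!r}")
    if ustate == TESTING and confirm_policy == "auto":
        return OK
    return ustate


if __name__ == "__main__":
    print(startup_ustate(TESTING, "auto"))
    print(startup_ustate(TESTING, "external"))
```

Defaulting to "external" keeps today's behavior for integrations that
decide outside SWUpdate, which is the constraint Stefano mentioned.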
Kind regards,
Christian
--
Dr. Christian Storm
Siemens AG, Technology, T CED SES-DE
Otto-Hahn-Ring 6, 81739 München, Germany