Detecting corrupted-failed update

Mathieu Alexandre-Tétreault

Nov 19, 2018, 4:32:57 PM
to efibootg...@googlegroups.com

Howdy,

I went through the source code and I have a few questions:

  1. Is efibootguard able to detect a failed boot? How is it handled? From what I understand, the code simply returns via error_exit.
  2. I’d like to use a boot counter to toggle the rootfs/kernel used. For example, after three failed boot attempts, the boot would switch to rootfs2 and kernel2. Each time efibootguard boots, the counter is incremented, and as soon as the kernel is fully loaded, a user service resets the boot counter. As far as I know, this feature is not implemented in efibootguard (am I wrong?). I was thinking about using the uservars for that purpose. Are there any concerns with that, or things I should be aware of?

Any advice would be welcome.

Cheers,

Mathieu

Andreas Reichel

Nov 20, 2018, 5:28:16 AM
to Mathieu Alexandre-Tétreault, efibootg...@googlegroups.com
On Mon, Nov 19, 2018 at 09:32:52PM +0000, Mathieu Alexandre-Tétreault wrote:
> Howdy,
> I went through the source code and I have a few questions:
> 1. Is efibootguard able to detect a failed boot? How is it handled? From what
> I understand, the code simply returns via error_exit.

Hi, yes, this is the whole point of efibootguard. It is explained in the
docs in the docs/ folder.

Short version:
It works by using the hardware watchdog timer together with a redundant
environment on disk. When you update the system, you set a variable in
the environment. If the system then fails to boot, the watchdog resets
it, and efibootguard sees the already-set variable and brings the system
back to a working state.
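
As a rough illustration of the short version above, here is a minimal Python sketch of that rollback logic. It is a simplified model under stated assumptions, not efibootguard's actual code: the `Env` class, the `select_boot_env` and `confirm` helpers, the `revision` field, and the state names "ok" and "installed" are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class Env:
    """One of the redundant on-disk environments (hypothetical model)."""
    name: str
    revision: int        # assumption: the newest revision is preferred
    state: str = "ok"    # "ok", "installed", "testing" or "failed"


def select_boot_env(envs):
    """Model what the boot loader decides on every (re)boot."""
    # Look at candidates from newest to oldest revision.
    for env in sorted(envs, key=lambda e: e.revision, reverse=True):
        if env.state == "testing":
            # The variable was already set on a previous boot and nobody
            # confirmed the update, so the watchdog must have reset us.
            env.state = "failed"
            continue
        if env.state == "failed":
            continue
        if env.state == "installed":
            # First boot of a fresh update: set the variable, then try it.
            env.state = "testing"
        return env
    raise RuntimeError("no bootable environment left")


def confirm(env):
    """Model the running system acknowledging a successful boot."""
    env.state = "ok"
```

In this model a watchdog reset is simply a second call to `select_boot_env` without an intervening `confirm`: the update that was left in "testing" is marked "failed" and the older environment is chosen instead.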

> 2. I’d like to use a boot counter to toggle the rootfs/kernel used.
> For example, after three failed boot attempts, the boot would switch to
> rootfs2 and kernel2. Each time efibootguard boots, the counter is
> incremented, and as soon as the kernel is fully loaded, a user service
> resets the boot counter. As far as I know, this feature is not implemented
> in efibootguard (am I wrong?). I was thinking about using the uservars for
> that purpose. Are there any concerns with that, or things I should be aware
> of?

You would then be implementing completely different logic on top of the
actual switching logic. Also, I don't get your example:
"... if after three failed boot attempts, the boot switched to rootfs2
and kernel2."
That does not make sense to me. efibootguard switches back after the first
failed attempt to bring up the new rootfs/kernel. It tests once, not three
times. Why would you try it more than once? A system should
be reliable.

And sure, you can implement your own logic with user variables. You
can disable the watchdog and script whatever you want with them.

Still, I would suggest reading the docs more thoroughly first.

Kind regards
Andreas


--
Andreas Reichel
Dipl.-Phys. (Univ.)
Software Consultant

Andreas...@tngtech.com, +49-174-3180074
TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterfoehring
Geschaeftsfuehrer: Henrik Klagges, Dr. Robert Dahlke, Gerhard Mueller
Sitz: Unterfoehring * Amtsgericht Muenchen * HRB 135082

Mathieu Alexandre-Tétreault

Nov 21, 2018, 9:56:36 AM
to Andreas Reichel, efibootg...@googlegroups.com
Hi Andreas,

Thank you for your answer.

On Mon, Nov 19, 2018 at 09:32:52PM +0000, Mathieu Alexandre-Tétreault wrote:
>> Howdy,
>> I went through the source code and I have a few questions:
>> 1. Is efibootguard able to detect a failed boot? How is it handled? From what
>> I understand, the code simply returns via error_exit.

>Hi, yes, this is the whole point of efibootguard. It is explained in the docs in the docs/ folder.
>
>Short version:
>It works by using the hardware watchdog timer together with a redundant environment on disk. When you update the system, you set a variable in the environment. If the system then fails to boot, the watchdog resets it, and efibootguard sees the already-set variable and brings the system back to a working state.

My bad, it seems I went through the docs a little too fast.

>> 2. I’d like to use a boot counter to toggle the rootfs/kernel used.
>> For example, after three failed boot attempts, the boot would switch to
>> rootfs2 and kernel2. Each time efibootguard boots, the counter is
>> incremented, and as soon as the kernel is fully loaded, a user service
>> resets the boot counter. As far as I know, this feature is not implemented
>> in efibootguard (am I wrong?). I was thinking about using the uservars for
>> that purpose. Are there any concerns with that, or things I should be aware
>> of?

>You would then be implementing completely different logic on top of the actual switching logic. Also, I don't get your example:
>"... if after three failed boot attempts, the boot switched to rootfs2 and kernel2."
>That does not make sense to me. efibootguard switches back after the first failed attempt to bring up the new rootfs/kernel. It tests once, not three times. Why would you try it more than once? A system should be reliable.

This is actually a customer requirement. They want the system to test the update more than once. Their point is that the power could get disconnected, so a failed attempt could be due to a power outage rather than a corrupted update.

Mathieu

Jan Kiszka

Nov 21, 2018, 10:30:06 AM
to Mathieu Alexandre-Tétreault, Andreas Reichel, efibootg...@googlegroups.com
On 21.11.18 15:56, Mathieu Alexandre-Tétreault wrote:
> Hi Andreas,
>
> Thank you for your answer.
>
> On Mon, Nov 19, 2018 at 09:32:52PM +0000, Mathieu Alexandre-Tétreault wrote:
>>> Howdy,
>>> I went through the source code and I have a few questions:
>>> 1. Is efibootguard able to detect a failed boot? How is it handled? From what
>>> I understand, the code simply returns via error_exit.
>
>> Hi, yes, this is the whole point of efibootguard. It is explained in the docs in the docs/ folder.
>>
>> Short version:
>> It works by using the hardware watchdog timer together with a redundant environment on disk. When you update the system, you set a variable in the environment. If the system then fails to boot, the watchdog resets it, and efibootguard sees the already-set variable and brings the system back to a working state.
>
> My bad, it seems I went through the docs a little too fast.
>
>>> 2. I’d like to use a boot counter to toggle the rootfs/kernel used.
>>> For example, after three failed boot attempts, the boot would switch to
>>> rootfs2 and kernel2. Each time efibootguard boots, the counter is
>>> incremented, and as soon as the kernel is fully loaded, a user service
>>> resets the boot counter. As far as I know, this feature is not implemented
>>> in efibootguard (am I wrong?). I was thinking about using the uservars for
>>> that purpose. Are there any concerns with that, or things I should be aware
>>> of?
>
>> You would then be implementing completely different logic on top of the actual switching logic. Also, I don't get your example:
>> "... if after three failed boot attempts, the boot switched to rootfs2 and kernel2."
>> That does not make sense to me. efibootguard switches back after the first failed attempt to bring up the new rootfs/kernel. It tests once, not three times. Why would you try it more than once? A system should be reliable.
>
> This is actually a customer requirement. They want the system to test the update more than once. Their point is that the power could get disconnected, so a failed attempt could be due to a power outage rather than a corrupted update.
>

Hmm, that does not sound completely far-fetched, though I wonder how
unreliable their power supply must be for such a case to require device-side
handling. Conceptually, we could introduce a retry counter that only makes the
transition to "failed" when it reaches 0 and otherwise keeps the device in the
"testing" state. Feel free to propose a patch.

Jan
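
The retry-counter idea described above could be sketched roughly as follows. This is only an illustrative Python model of the proposed state transition, not an existing efibootguard feature: the function name `next_state`, the `retries_left` parameter, and the "installed" and "ok" state names are assumptions made for the example.

```python
def next_state(state, retries_left):
    """Return (state, retries_left) after the boot loader runs once.

    Hypothetical model: an update that is "installed" or still "testing"
    gets another attempt as long as retries remain, and only becomes
    "failed" once the counter has reached 0.
    """
    if state in ("installed", "testing"):
        if retries_left == 0:
            # No attempts left: give up on the update.
            return "failed", 0
        # Spend one attempt and keep (or enter) the testing state.
        return "testing", retries_left - 1
    # "ok" and "failed" are left untouched by the boot loader.
    return state, retries_left
```

With `retries_left` set to 3 at install time, the update is booted three times before the fourth unconfirmed boot marks it "failed". A power cut during one of those attempts would therefore cost one retry rather than the whole update.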

--
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux

Andreas Reichel

Nov 21, 2018, 10:51:42 AM
to Jan Kiszka, Mathieu Alexandre-Tétreault, efibootg...@googlegroups.com
Yes, we could do that, but is it really a good idea? It amounts to
treating symptoms instead of getting to the root of the problem. A machine
either works or it does not. If it is somewhere in between, something is wrong.
And there are indeed a lot of machines/systems around that are in
between, with software hacked around them to cope :)
Well, as you wish of course, but I had to point that out first.

Andreas

