Flatcar does not boot into updated version

Yasha C

May 20, 2020, 3:35:51 AM
to Flatcar Linux Dev
Hi, 

Sorry for cross-posting this everywhere. I have already filed a GitHub issue, posted this on the user mailing list, and asked on IRC. I am on a tight schedule, so I am posting here too in case anyone has suggestions.

In summary: after the update is rolled out and the machine reboots, /flatcar/vmlinuz-b is not found. The file exists in /boot/coreos, alongside the first_boot file.

For convenience, here is the content of the GitHub issue (https://github.com/flatcar-linux/Flatcar/issues/122):

Flatcar does not boot into updated version

I am trying to upgrade my Flatcar deployment from the current stable version (2345.3.1) to the current beta version (2411.1.1) via the Omaha protocol, using CoreRoller as the rollout mechanism. During the rollout, the machine running 2345.3.1 successfully downloads and applies the update to 2411.1.1:

May 19 17:51:03 localhost update_engine[2575]: I0519 17:51:03.514751  2575 update_attempter.cc:316] Update successfully applied, waiting to reboot.
May 19 17:52:47 localhost update_engine[2575]: I0519 17:52:47.786886  2575 prefs.cc:51] certificate-report-to-send-update not present in /var/lib/update_engine/prefs
May 19 17:52:47 localhost update_engine[2575]: I0519 17:52:47.786928  2575 prefs.cc:51] certificate-report-to-send-download not present in /var/lib/update_engine/prefs
May 19 17:52:47 localhost update_engine[2575]: I0519 17:52:47.786934  2575 update_attempter.cc:131] Not updating b/c we already updated and we're waiting for reboot, we'll ping Omaha instead

It then reboots, but the boot stops with the following error:
[Screenshot 1: boot error]

Pressing Enter shows the GRUB menu:
[Screenshot: GRUB menu]

On selecting "Flatcar default" it will boot back into 2345.3.1 version and the whole cycle repeats.

Impact

I am not able to roll out upgrades to my flatcar deployments.

Environment and steps to reproduce

  1. Set-up: the OS is deployed on VMware vSphere
  2. Task: upgrade the OS version
  3. Action(s): as described above
  4. Error: Screenshot 1

Expected behavior
The OS is upgraded to version 2411.1.1.

Additional information
If I understand this document (https://docs.flatcar-linux.org/os/manual-rollbacks/) correctly, USR-A, with UUID 7130c94a-213a-4e5a-8e26-6cce9662f132, is the active partition and holds the current version, in my case 2345.3.1.

And from grub.cfg:

function gptprio {
    # TODO: device name is no longer needed, should make it optional...
    gptprio.next -d usr_device -u usr_uuid
    if [ $? -ne 0 -o -z "$usr_uuid" ]; then
        echo
        echo "Reading or updating the GPT failed!"
        echo "Please file a bug with any messages above to Flatcar:"
        echo " https://issues.flatcar-linux.org"
        abort
    fi

    set gptprio_cmdline="@@MOUNTUSR@@=PARTUUID=$usr_uuid"
    if [ "$usr_uuid" = "7130c94a-213a-4e5a-8e26-6cce9662f132" ]; then
        set gptprio_kernel="/flatcar/vmlinuz-a"
    else
        set gptprio_kernel="/flatcar/vmlinuz-b"
    fi
}

So, based on the rollback document and grub.cfg, the update rollout and subsequent reboot select a USR partition UUID other than 7130c94a-213a-4e5a-8e26-6cce9662f132. GRUB therefore tries to load /flatcar/vmlinuz-b, which it cannot find. Inspecting the /boot partition, I see that vmlinuz-b is not in /boot/flatcar/ but in /boot/coreos/:

core@localhost ~ $ sudo su
localhost core # cd /boot/flatcar/
localhost flatcar # ls
grub  vmlinuz-a
localhost flatcar # cd ../coreos/
localhost coreos # ls
first_boot  vmlinuz-b
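The kernel selection in the gptprio function above can be reproduced outside GRUB to check, from the running system, whether the kernel for a given USR partition is where GRUB expects it. This is a minimal POSIX sh sketch; the function names kernel_for_uuid and check_kernel are hypothetical, and it assumes GRUB's /flatcar path corresponds to /boot/flatcar on the running system (the EFI-SYSTEM partition mounted at /boot), as the listing above suggests.

```shell
#!/bin/sh
# USR-A partition UUID, as hard-coded in the grub.cfg excerpt above.
USR_A_UUID="7130c94a-213a-4e5a-8e26-6cce9662f132"

# Mirror grub.cfg's gptprio kernel choice: USR-A boots vmlinuz-a,
# any other USR partition boots vmlinuz-b.
# cgpt prints UUIDs in upper case, so normalize before comparing.
kernel_for_uuid() {
    uuid="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
    if [ "$uuid" = "$USR_A_UUID" ]; then
        echo "/flatcar/vmlinuz-a"
    else
        echo "/flatcar/vmlinuz-b"
    fi
}

# Check that the kernel GRUB would load exists under the boot directory
# ($2, default /boot, an assumption). Prints the path; returns 1 if missing.
check_kernel() {
    boot_dir="${2:-/boot}"
    kernel="${boot_dir}$(kernel_for_uuid "$1")"
    if [ -f "$kernel" ]; then
        echo "ok: $kernel"
    else
        echo "missing: $kernel"
        return 1
    fi
}
```

For example, `check_kernel e03dd35c-7c2d-4a47-b3fe-27f15780a57c` (the USR-B UUID) would be expected to report /boot/flatcar/vmlinuz-b as missing on the machine described here, since that file sits in /boot/coreos instead.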

This output may also help:

core@localhost ~ $ sudo cgpt show /dev/sda
       start        size    part  contents
           0           1          Hybrid MBR
           1           1          Pri GPT header
           2          32          Pri GPT table
        4096      262144       1  Label: "EFI-SYSTEM"
                                  Type: EFI System Partition
                                  UUID: 15F4D856-29DA-4F9D-BDFB-93CEA0FC7748
                                  Attr: Legacy BIOS Bootable
      266240        4096       2  Label: "BIOS-BOOT"
                                  Type: BIOS Boot Partition
                                  UUID: 757841A0-7143-4BB6-AD13-03F795C6B11F
      270336     4194304       3  Label: "USR-A"
                                  Type: Alias for coreos-rootfs
                                  UUID: 7130C94A-213A-4E5A-8E26-6CCE9662F132
                                  Attr: priority=2 tries=0 successful=1
     4464640     4194304       4  Label: "USR-B"
                                  Type: Alias for coreos-rootfs
                                  UUID: E03DD35C-7C2D-4A47-B3FE-27F15780A57C
                                  Attr: priority=1 tries=0 successful=0
     8658944      262144       6  Label: "OEM"
                                  Type: Alias for linux-data
                                  UUID: 8F99D006-4F26-46D2-A567-B6FAC073BB61
     8921088      131072       7  Label: "OEM-CONFIG"
                                  Type: CoreOS reserved
                                  UUID: FE8334B2-114A-4074-95DF-0030E3F70293
     9052160    29913055       9  Label: "ROOT"
                                  Type: CoreOS auto-resize
                                  UUID: 1C0A563E-E727-4D3D-8920-B0D09FBA7982
    38965215          32          Sec GPT table
    38965247           1          Sec GPT header
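To see which USR partition GRUB is likely to pick from `cgpt show` output like the above, the priority, tries and successful attributes can be evaluated directly. This sketch assumes the gptprio rule is "highest priority among partitions with tries>0 or successful=1"; that rule is our reading of GRUB's gptprio module, not something stated in this thread, and the function name pick_usr_partition is hypothetical.

```shell
#!/bin/sh
# Read `cgpt show` output on stdin and print "<label> <uuid>" of the USR
# partition that the ASSUMED gptprio rule would select: highest priority
# among USR-* partitions with tries>0 or successful=1. Ties go to the
# partition listed first (an arbitrary choice). Returns 1 if none qualifies.
pick_usr_partition() {
    awk '
        BEGIN { best = -1 }
        # Start of a USR-A / USR-B entry: remember its label.
        /Label: "USR-/ { label = $NF; gsub(/"/, "", label) }
        # UUID line belonging to the current USR entry.
        /UUID:/ && label != "" { uuid = $2 }
        # Attribute line, e.g. "Attr: priority=2 tries=0 successful=1".
        /Attr: priority=/ && label != "" {
            split($2, p, "="); split($3, t, "="); split($4, s, "=")
            if ((t[2] + 0 > 0 || s[2] + 0 == 1) && p[2] + 0 > best) {
                best = p[2] + 0
                chosen = label " " uuid
            }
            label = ""
        }
        END { if (chosen != "") print chosen; else exit 1 }
    '
}
```

Usage: `sudo cgpt show /dev/sda | pick_usr_partition`. With the attributes shown above (USR-A priority=2 successful=1, USR-B priority=1 tries=0 successful=0), this prints USR-A, which is consistent with the machine falling back to 2345.3.1.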


Kindly let me know if any further information is required. Any help here is much appreciated.


Regards

Yasha C

Kai Lüke

May 21, 2020, 12:46:01 PM
to Yasha C, Flatcar Linux Dev
Hi,
For other readers: I've responded in the ticket, but here is a summary.
The problem is that on a new Flatcar installation (not migrated from CoreOS), as soon as /boot/coreos exists, the update places the kernel there instead of in /boot/flatcar.
Currently, on new installations the first_boot flag file can only be created under the new location, /boot/flatcar/first_boot.
In-place migrations from CoreOS have to stick to /boot/coreos/first_boot, because GRUB checks this file and the GRUB code is not migrated (see https://docs.flatcar-linux.org/ignition/boot-process/#reprovisioning).
I will try to add support for the old location /boot/coreos/first_boot on new Flatcar installations too. This should reduce the migration burden and confusion in clusters that contain both newly provisioned Flatcar machines and machines migrated in place, where you don't know which location to create the flag file in.
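The two rules described above (where the update places the kernel, and where the first_boot flag must go) can be written down as small checks. This is a sketch only: the function names are hypothetical, and because the thread does not say how to detect an in-place CoreOS migration, the caller passes that in explicitly.

```shell
#!/bin/sh
# Where the first_boot flag file must be created, per the explanation above:
# machines migrated in place from CoreOS keep the old GRUB code, which checks
# /boot/coreos/first_boot; new Flatcar installations currently only honor
# /boot/flatcar/first_boot. $1 is "migrated" or "new", supplied by the caller.
first_boot_flag_path() {
    if [ "$1" = "migrated" ]; then
        echo "/boot/coreos/first_boot"
    else
        echo "/boot/flatcar/first_boot"
    fi
}

# Where an update writes the new kernel, per the explanation above:
# if <boot>/coreos exists, the kernel goes there instead of <boot>/flatcar.
# $1 is the boot directory (default /boot).
update_kernel_dir() {
    boot_dir="${1:-/boot}"
    if [ -d "$boot_dir/coreos" ]; then
        echo "$boot_dir/coreos"
    else
        echo "$boot_dir/flatcar"
    fi
}
```

On the new installation in this thread, /boot/coreos exists (it holds first_boot), so `update_kernel_dir` returns /boot/coreos while GRUB looks for /flatcar/vmlinuz-b, which matches the reported mismatch.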

Regards,
Kai

--
Kinvolk GmbH | Adalbertstr.6a, 10999 Berlin | tel: +491755589364

Geschäftsführer/Directors: Alban Crequy, Chris Kühl, Iago López Galeiras

Registergericht/Court of registration: Amtsgericht Charlottenburg

Registernummer/Registration number: HRB 171414 B

Ust-ID-Nummer/VAT ID number: DE302207000
