CoreOS version 1632.3.0 / file system corruption on VMware.

Mukarram Syed

Mar 24, 2018, 6:11:20 AM
to CoreOS User
Hi,
I was able to use Matchbox to install CoreOS onto my test VMware VM using the "simple-install.json" from Matchbox (profiles/groups/ignition). I configured Ignition to use my SSH public key. The coreos-install script works and the system reboots, but when I SSH in using the key I am not able to log in.
I used "coreos.autologin" in the GRUB menu to log in, but a few minutes after logging in, my /dev/sda9 filesystem gets corrupted without my doing anything.
When I run fsck -y /dev/sda9, I get a bunch of inode errors and my whole / partition is wiped out.
I don't understand what's going on with CoreOS. I have tried this many times, in 2 entirely different VMware infrastructures: one in my lab and the other in my datacenter.
I am using CoreOS 1632.3.0 and I tried 1576.4.0 as well.

Any help is appreciated.

Thanks

# mukarram

Mukarram Syed

Mar 27, 2018, 6:53:09 PM
to Derek Gonyeo, CoreOS User
Hi Derek
Thank you for responding!  Appreciate it.

It looks like coreos-install does not partition the disk correctly when it is run through Matchbox.
When I run coreos-install manually with my own ".ign" file, it works fine on bare metal or on a VM.
Before coreos-install runs, a "curl" command downloads the Ignition file, and that seems to be the problem in my setup.

The system works fine at first, but after some time, either when I reboot or when I just leave it running, it starts giving file system corruption errors, particularly on /dev/sda6 where the

Last part of the dmesg output:

[   12.301931] EDAC sbridge:  Ver: 1.1.2
[   12.310121] vmxnet3 0000:0b:00.0 ens192: intr type 3, mode 0, 5 vectors allocated
[   12.311648] vmxnet3 0000:0b:00.0 ens192: NIC Link is Up 10000 Mbps
[   12.312714] IPv6: ADDRCONF(NETDEV_UP): ens192: link is not ready
[   12.312934] IPv6: ADDRCONF(NETDEV_CHANGE): ens192: link becomes ready
[   12.323588] mousedev: PS/2 mouse device common for all mice
[   24.996588] random: crng init done
[   27.279664] Alternate GPT is invalid, using primary GPT.
[   27.279846]  sda: sda1 sda2 sda3 sda4 sda6 sda7 sda9
[  103.732620] GPT:Primary header thinks Alt. header is not at the end of the disk.
[  103.732881] GPT:9289727 != 335544319
[  103.733001] GPT:Alternate GPT header not at the end of the disk.
[  103.733181] GPT:9289727 != 335544319
[  103.733299] GPT: Use GNU Parted to correct GPT errors.
[  103.733469]  sda: sda1 sda2 sda3 sda4 sda6 sda7 sda9
[  103.860595] GPT:Primary header thinks Alt. header is not at the end of the disk.
[  103.860989] GPT:9289727 != 335544319
[  103.861112] GPT:Alternate GPT header not at the end of the disk.
[  103.861300] GPT:9289727 != 335544319
[  103.861423] GPT: Use GNU Parted to correct GPT errors.
[  103.861600]  sda: sda1 sda2 sda3 sda4 sda6 sda7 sda9
[  104.161906] EXT4-fs (sda6): mounted filesystem with ordered data mode. Opts: (null)
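
A side note on the two numbers in the GPT messages above: the kernel is comparing the sector where the primary header says the backup header lives (9289727) with the last sector of the disk (335544319). The sketch below converts both to sizes. It assumes 512-byte logical sectors, which the log does not state, and the helper name `lba_to_gib` is only illustrative. Under that assumption the figures correspond to an image of roughly 4.4 GiB written to a 160 GiB disk, which is the kind of mismatch one would expect right after a raw image is written to a larger disk, so these messages alone may not indicate corruption.

```python
SECTOR_BYTES = 512  # assumption: 512-byte logical sectors (not stated in the log)


def lba_to_gib(last_lba: int, sector_bytes: int = SECTOR_BYTES) -> float:
    """Size in GiB of a region whose last addressable sector is last_lba."""
    return (last_lba + 1) * sector_bytes / 2**30


# Values copied from the "GPT:9289727 != 335544319" dmesg lines.
alt_header_lba = 9289727   # backup-header location recorded in the primary header
disk_last_lba = 335544319  # actual last sector of /dev/sda

image_gib = lba_to_gib(alt_header_lba)
disk_gib = lba_to_gib(disk_last_lba)

print(f"extent described by the GPT: {image_gib:.2f} GiB")
print(f"size of the disk:            {disk_gib:.2f} GiB")
```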

Screenshots of my outputs:

[Screenshot: pre-reboot GPT errors, same as the dmesg output above]

[Screenshot: fsck errors after the final reboot from disk, 1]

[Screenshot: fsck errors after the final reboot from disk, 2]


groups/install-reboot.yaml shown below:

---
systemd:
  units:
    - name: installer.service
      enable: true
      contents: |
        [Unit]
        Requires=network-online.target
        After=network-online.target
        [Service]
        Type=simple
        ExecStart=/opt/installer
        [Install]
        WantedBy=multi-user.target
storage:
  files:
    - path: /opt/installer
      filesystem: root
      mode: 0500
      contents:
        inline: |
          #!/bin/bash -ex
          curl --retry 10 --fail "{{.ignition_endpoint}}?{{.request.raw_query}}&os=installed" -o ignition.json
          coreos-install -d /dev/sda -C {{.coreos_channel}} -V {{.coreos_version}} -i ignition.json {{if index . "baseurl"}}-b {{.baseurl}}{{end}} -o vmware_raw
          udevadm settle
          ##systemctl reboot

{{ if index . "ssh_authorized_keys" }}
passwd:
  users:
    - name: core
      ssh_authorized_keys:
        {{ range $element := .ssh_authorized_keys }}
        - {{$element}}
        {{end}}
{{end}}

This is the output of installer.service, which runs the /opt/installer script:

localhost ~ #  journalctl -u installer.service
-- Logs begin at Tue 2018-03-27 22:07:50 UTC, end at Tue 2018-03-27 22:12:00 UTC. --
Mar 27 22:08:13 localhost systemd[1]: Started installer.service.
Mar 27 22:08:13 localhost installer[725]: + curl --retry 10 --fail 'http://matchbox1-corp.int:8080/ignition?uuid=421bf121-caf3-9f8b-a747-7a504447a9cb&mac=00-50-56-9b-eb-65&os=installed' -o ignition.json
Mar 27 22:08:13 localhost installer[725]:   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
Mar 27 22:08:13 localhost installer[725]:                                  Dload  Upload   Total   Spent    Left  Speed
Mar 27 22:08:13 localhost installer[725]: [158B blob data]
Mar 27 22:08:13 localhost installer[725]: + coreos-install -d /dev/sda -C stable -V 1632.3.0 -i ignition.json -b http://matchbox1-corp.int:8080/assets/coreos -o vmware_raw
Mar 27 22:08:14 localhost installer[725]: Downloading the signature for http://matchbox1-corp.int:8080/assets/coreos/1632.3.0/coreos_production_vmware_raw_image.bin.bz2...
Mar 27 22:08:14 localhost installer[725]: 2018-03-27 22:08:14 URL:http://matchbox1-corp.int:8080/assets/coreos/1632.3.0/coreos_production_vmware_raw_image.bin.bz2.sig [566/566] -> "/tmp/coreos-install.Zbh
Mar 27 22:08:14 localhost installer[725]: Downloading, writing and verifying coreos_production_vmware_raw_image.bin.bz2...
Mar 27 22:09:23 localhost installer[725]: 2018-03-27 22:09:23 URL:http://matchbox1-corp.int:8080/assets/coreos/1632.3.0/coreos_production_vmware_raw_image.bin.bz2 [353911287/353911287] -> "-" [1]
Mar 27 22:09:30 localhost installer[725]: gpg: Signature made Wed Feb 14 04:42:32 2018 UTC
Mar 27 22:09:30 localhost installer[725]: gpg:                using RSA key 8826AD9569F575AD3F5643E7DE2F8F87EF4B4ED9
Mar 27 22:09:30 localhost installer[725]: gpg: key 50E0885593D2DCB4 marked as ultimately trusted
Mar 27 22:09:30 localhost installer[725]: gpg: checking the trustdb
Mar 27 22:09:30 localhost installer[725]: gpg: marginals needed: 3  completes needed: 1  trust model: pgp
Mar 27 22:09:30 localhost installer[725]: gpg: depth: 0  valid:   1  signed:   0  trust: 0-, 0q, 0n, 0m, 0f, 1u
Mar 27 22:09:30 localhost installer[725]: gpg: Good signature from "CoreOS Buildbot (Offical Builds) <buil...@coreos.com>" [ultimate]
Mar 27 22:09:31 localhost installer[725]: Installing Ignition config ignition.json...
Mar 27 22:09:31 localhost installer[725]: Success! CoreOS Container Linux stable 1632.3.0 (vmware_raw) is installed on /dev/sda
Mar 27 22:09:31 localhost installer[725]: + udevadm settle

This is the ignition.json file that was downloaded by the curl command, captured before the reboot from disk:

localhost ~ # cat /ignition.json |jq "."
{
  "ignition": {
    "config": {},
    "timeouts": {},
    "version": "2.1.0"
  },
  "networkd": {},
  "passwd": {
    "users": [
      {
        "name": "core",
        "sshAuthorizedKeys": [
          "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCW49vWpygX+mk2iuLWhL0DILm49k+5oh3A0w1gi3MGeHjBnc7ujCIGcKC3CEmjYzACcm4VVRpTS0gv+cM+Gyh/xyZ+zPgHWhFjuBYPreBgahRoLzhSDQjnVtKP08XHFr4M4c3qbNj/bIYB27NGpXxLRm0JFxixjLVT65MYmHBzPJr/tMsYF5yA7HODJOu0Tja0mo1laqsNoTSe6NOKM9usBAhYmZD4QtxZkghws6PwHAZM6qKBGPabk5DH6CyNugzXu2brgRzT8YR9Kv7Tq1v8tk8S4zDQnxFckv2yJobwS6rUr2bf54gROTGpRg9vMvmYQzhLbInexLxe0LQWV+Gd matc...@matchbox1-corp.8x8hosts.corp"
        ]
      }
    ]
  },
  "storage": {
    "files": [
      {
        "filesystem": "root",
        "group": {},
        "path": "/opt/installer",
        "user": {},
        "contents": {
          "source": "data:,%23!%2Fbin%2Fbash%20-ex%0Acurl%20--retry%2010%20--fail%20%22http%3A%2F%2Fmatchbox1-corp.int%3A8080%2Fignition%3Fuuid%3D421bf121-caf3-9f8b-a747-7a504447a9cb%26mac%3D00-50-56-9b-eb-65%26os%3Dinstalled%26os%3Dinstalled%22%20-o%20ignition.json%0Acoreos-install%20-d%20%2Fdev%2Fsda%20-C%20stable%20-V%201632.3.0%20-i%20ignition.json%20-b%20http%3A%2F%2Fmatchbox1-corp.int%3A8080%2Fassets%2Fcoreos%20-o%20vmware_raw%0Audevadm%20settle%0A%23%23systemctl%20reboot%0A",
          "verification": {}
        },
        "mode": 320
      }
    ]
  },
  "systemd": {
    "units": [
      {
        "contents": "[Unit]\nRequires=network-online.target\nAfter=network-online.target\n[Service]\nType=simple\nExecStart=/opt/installer\n[Install]\nWantedBy=multi-user.target\n",
        "enable": true,
        "name": "installer.service"
      }
    ]
  }
}
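
The "source" field in the JSON above is a percent-encoded `data:` URL, so it is hard to read at a glance. The sketch below decodes an excerpt of it (copied from the output above and cut off after the curl line) using only the Python standard library; the function name `decode_data_url` is illustrative. Decoded, the file turns out to be the /opt/installer script again, with `os=installed` appearing twice in the query string. That suggests the config served for `os=installed` may still be the installer config, though this is an inference from the output and is not confirmed anywhere in the thread. The last line checks that the decimal `"mode": 320` is the same value as the octal `0500` in the YAML.

```python
from urllib.parse import unquote


def decode_data_url(source: str) -> str:
    """Decode a plain (non-base64) 'data:,' URL into its text payload."""
    prefix = "data:,"
    if not source.startswith(prefix):
        raise ValueError("not a plain data: URL")
    return unquote(source[len(prefix):])


# Excerpt of the "source" value from the downloaded ignition.json,
# truncated after the curl line to keep the example short.
source = (
    "data:,%23!%2Fbin%2Fbash%20-ex%0Acurl%20--retry%2010%20--fail%20"
    "%22http%3A%2F%2Fmatchbox1-corp.int%3A8080%2Fignition%3Fuuid%3D"
    "421bf121-caf3-9f8b-a747-7a504447a9cb%26mac%3D00-50-56-9b-eb-65"
    "%26os%3Dinstalled%26os%3Dinstalled%22%20-o%20ignition.json%0A"
)

script = decode_data_url(source)
print(script)

# How many times the os=installed selector appears in the request URL.
print("os=installed occurrences:", script.count("os=installed"))

# Ignition JSON stores file modes in decimal; 320 decimal is 0500 octal.
print("mode 320 in octal:", oct(320))
```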


On Mon, Mar 26, 2018 at 11:42 AM, Derek Gonyeo <dgo...@redhat.com> wrote:
After you log in via setting coreos.autologin can you grab the output of "journalctl -t ignition --no-pager" and share it here? No clue what's up with the disk corruption, but I might at least be able to help hunt down why your SSH key isn't being set.


Mukarram Syed

Mar 30, 2018, 5:56:37 PM
to CoreOS User

Hi,
Looks like this is the same kind of bug I am hitting.

https://github.com/coreos/bugs/issues/1091

Please advise.