Locksmith on single host deployment


MikeM

Apr 10, 2017, 9:52:58 PM
to CoreOS User
I'm new to CoreOS but have been using it on a single host EC2 deployment and want to use locksmith to set a maintenance window.

I've tried the following two cloud-configs:

#cloud-config
write_files:
  - path: /home/core/.dockercfg
[...]

coreos:
  locksmith:
    window-start: 05:00
    window-length: 1h

  update:
    reboot-strategy: reboot

  units:
[...]



and one where I start etcd:

#cloud-config
write_files:
  - path: /home/core/.dockercfg
[...]

coreos:
  locksmith:
    window-start: 05:00
    window-length: 1h

  update:
    reboot-strategy: reboot

  units:
    - name: etcd.service
      command: start
[...]


Based on the docs at https://github.com/coreos/locksmith#reboot-windows, I would expect to see environment variables for locksmith but there are none - not sure if there is another way to check if the window is properly set. The logs also indicate the system's last update reboot occurred outside of the maintenance window. Everything else seems to work great.

I'm sure it's something unbelievably simple that I'm missing, but I can't figure it out. All suggestions greatly appreciated!

Alex Crawford

Apr 11, 2017, 3:23:55 PM
to MikeM, CoreOS User
On 04/10, MikeM wrote:
> I'm new to CoreOS but have been using it on a single host EC2 deployment
> and want to use locksmith to set a maintenance window.
>
> Based on the docs at https://github.com/coreos/locksmith#reboot-windows, I
> would expect to see environment variables for locksmith but there are none
> - not sure if there is another way to check if the window is properly set.
> The logs also indicate the system's last update reboot occurred outside of
> the maintenance window. Everything else seems to work great.
>
> I'm sure it's something unbelievably simple that I'm missing, but I can't
> figure it out. All suggestions greatly appreciated!

The parameters are written as drop-ins for locksmithd.service. For
example, on my system I see the following:

# /etc/systemd/system/locksmithd.service.d/window.conf
[Service]
Environment=REBOOT_WINDOW_START=05:00
Environment=REBOOT_WINDOW_LENGTH=10m

Off the top of my head, there are two potential reasons why it may have
rebooted outside the window. First, there may be a bug around the etcd
locking mechanism where very large clusters may take longer than the
reboot window to fully upgrade. I haven't looked at the code, so I'm not
sure if this is actually possible. The second reason, and much more
likely, is that locksmith's config was written after it was started.

We can test to see if there was an ordering issue by looking at the
logs (`journalctl -u locksmithd -t coreos-cloudinit --no-pager`). If
coreos-cloudinit wrote the config after locksmith started and didn't
restart it, that would explain the error.
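To check for the drop-in without guessing at a single location, here is a minimal Python sketch that looks in each of the directories systemd reads unit drop-ins from and prints any Environment= lines it finds. The function name and the `root` parameter are made up for illustration (the latter only exists so the function can be pointed at a test directory); this is not part of locksmith or coreos-cloudinit.

```python
from pathlib import Path

# Directories systemd searches for unit drop-ins, highest priority first.
SYSTEMD_DIRS = ("/etc/systemd/system", "/run/systemd/system", "/usr/lib/systemd/system")


def find_dropin_environment(unit, root="/"):
    """Return {drop-in path: [Environment= values]} for the given unit.

    `root` is a hypothetical convenience for testing against a fake tree.
    """
    found = {}
    for base in SYSTEMD_DIRS:
        dropin_dir = Path(root) / base.lstrip("/") / f"{unit}.d"
        if not dropin_dir.is_dir():
            continue
        for conf in sorted(dropin_dir.glob("*.conf")):
            values = [
                line.split("=", 1)[1].strip()
                for line in conf.read_text().splitlines()
                if line.strip().startswith("Environment=")
            ]
            found[str(conf)] = values
    return found


if __name__ == "__main__":
    for path, values in find_dropin_environment("locksmithd.service").items():
        print(path)
        for value in values:
            print("   ", value)
```

On a running machine, `systemctl cat locksmithd.service` should show the same information, since it prints the unit file together with every drop-in systemd has loaded for it.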

-Alex

MikeM

Apr 11, 2017, 9:32:40 PM
to CoreOS User, mikekm...@gmail.com
Hi Alex - thanks for your suggestions. Here are the results:

So I don't seem to have anything at /etc/systemd/system/locksmithd.service.d - so I guess that's the main issue. I also get no entries when I run "journalctl -u locksmithd -t coreos-cloudinit --no-pager". It looks like for whatever reason the locksmith part of my cloud-config is essentially having no effect. Any thoughts about what could cause that? Is there a cloud-config ordering issue?

Thanks again for your help.

Alex Crawford

Apr 12, 2017, 5:27:12 PM
to MikeM, CoreOS User
On 04/11, MikeM wrote:
> So I don't seem to have anything at
> /etc/systemd/system/locksmithd.service.d - so I guess that's the main
> issue.

Yeah, that doesn't sound good. You can use the online validator [1] to
double check your config. You can also look at the logs for
coreos-cloudinit specifically to see what it's running:

journalctl -b -t coreos-cloudinit

> Is there a cloud-config ordering issue?

There are almost always cloud-config ordering issues. Partly for this
reason, we've been investing in Ignition [2] and ct [3] instead. As an
example, you will be able to use the following Container Linux Config
to configure locksmith in the future [4]:

locksmith:
  reboot_strategy: reboot
  window_start: 05:00
  window_length: 1h

From this config, ct will generate the following Ignition config which
can be passed directly to the machine:

{
  "ignition": { "version": "2.0.0" },
  "storage": {
    "files": [{
      "filesystem": "root",
      "path": "/etc/coreos/update.conf",
      "contents": {
        "source": "data:,%0AREBOOT_STRATEGY%3D%22reboot%22%0ALOCKSMITHD_REBOOT_WINDOW_START%3D%2205%3A00%22%0ALOCKSMITHD_REBOOT_WINDOW_LENGTH%3D%221h%22"
      },
      "mode": 420
    }]
  }
}

Since Ignition runs long before locksmith and even systemd itself, this
won't have any ordering issues or race conditions.
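The "source" value in the Ignition config is hard to read because it is a percent-encoded data URL. As a small sketch (Python standard library only; the helper name is made up for illustration), this decodes it to show what ends up in /etc/coreos/update.conf, and prints the decimal "mode" in the more familiar octal form:

```python
from urllib.parse import unquote

# The "source" string from the Ignition config above, split only for readability.
SOURCE = (
    "data:,%0AREBOOT_STRATEGY%3D%22reboot%22"
    "%0ALOCKSMITHD_REBOOT_WINDOW_START%3D%2205%3A00%22"
    "%0ALOCKSMITHD_REBOOT_WINDOW_LENGTH%3D%221h%22"
)


def decode_data_url(url):
    """Decode a plain (non-base64) 'data:,' URL into its text payload."""
    prefix = "data:,"
    if not url.startswith(prefix):
        raise ValueError("only plain 'data:,' URLs are handled in this sketch")
    return unquote(url[len(prefix):])


if __name__ == "__main__":
    print(decode_data_url(SOURCE))
    print(oct(420))  # the file mode from the config, shown in octal
```

The decoded payload starts with an empty line, followed by REBOOT_STRATEGY="reboot", LOCKSMITHD_REBOOT_WINDOW_START="05:00" and LOCKSMITHD_REBOOT_WINDOW_LENGTH="1h". The mode 420 is decimal for octal 644.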

-Alex

Alex Crawford

Apr 12, 2017, 5:29:01 PM
to MikeM, CoreOS User

MikeM

Apr 14, 2017, 10:54:41 PM
to CoreOS User, mikekm...@gmail.com
So I ran "journalctl -b -t coreos-cloudinit" as you suggested, and it turns out that the drop-in was being written to "/run/systemd/system/locksmithd.service.d/20-cloudinit.conf" rather than /etc/... . The variables are there as expected.

So I'm wondering if the reboot was triggered by something else (e.g. me). I'm just going to assume the reboot was my error and monitor it on the next upgrade to see if everything behaves as expected. 
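For checking the next reboot against the window, a rough Python sketch like the one below may help. It is not locksmith's own logic: it assumes a daily window with a plain HH:MM start and a length made of hours and/or minutes (e.g. "1h", "10m", "1h30m"), and it does not handle other forms locksmith may accept, such as a start with a weekday. The timestamp also has to be in the same time zone locksmithd evaluates the window in, which this sketch does not determine. The function names are made up for illustration.

```python
import re
from datetime import datetime, timedelta


def parse_length(text):
    """Parse a simple duration such as '1h', '10m' or '1h30m'."""
    match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?", text)
    if not text or match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    hours, minutes = (int(g) if g else 0 for g in match.groups())
    return timedelta(hours=hours, minutes=minutes)


def in_reboot_window(moment, start="05:00", length="1h"):
    """Return True if `moment` falls inside the daily window [start, start + length)."""
    hour, minute = (int(part) for part in start.split(":"))
    window = parse_length(length)
    opened = moment.replace(hour=hour, minute=minute, second=0, microsecond=0)
    # Also try the window that opened the previous day, in case it spans midnight.
    candidates = (opened, opened - timedelta(days=1))
    return any(o <= moment < o + window for o in candidates)


if __name__ == "__main__":
    # Hypothetical reboot time; replace with the time taken from the logs.
    print(in_reboot_window(datetime(2017, 4, 10, 5, 30)))
```

The reboot time to pass in can be read from `journalctl --list-boots`, which lists the first and last log timestamp of each boot.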

Alex - I really appreciate your help on this. 