CoreOS doesn't fetch Ignition config when booting from PXE

Václav Rozsypálek

Dec 4, 2017, 5:01:09 AM
to CoreOS User
Hello,

I have a strange situation with booting from PXE and using Ignition.

I have set up a DHCP + PXE server, where DHCP hands out an IP address and serves an iPXE image.
The iPXE image fetches the kernel and initrd from the PXE server.
The kernel command line sets "coreos.config.url" to the URL of the Ignition config on that server.

When I test this on virtualized infrastructure (KVM-based), it works nicely and I have no issues, but the exact same setup does not work on bare metal.

During the kernel boot phase, the Ignition config is never fetched (I can't see any log entries on the HTTP server that serves it). I can ping the server during the boot phase (I get an IP via DHCP). Also, in an older version of our system we used cloud-config for installation on the same physical machines and it worked nicely, so it's not the bare-metal/network configuration.

I tried to increase the kernel boot log verbosity, but I don't see any errors regarding Ignition. ("systemd.journald.max_level_console=debug debug ignore_loglevel log_buf_len=10M print_fatal_signals=1 LOGLEVEL=8 earlyprintk=vga,keep sched_debug initcall_debug udev.log_priority=8")

Do you have any hints on how I could debug the issue further? (The kernel log output is also large, so maybe I missed something? It's hard to get the logs, as I only have iLO access.)

The CoreOS version is "1465.8.0" for both environments.

Václav Rozsypálek

Dec 4, 2017, 5:02:43 AM
to CoreOS User
Additional info:
The boot process waits for some time (around 120 s), then decides that it has failed and reboots the machine.

Seán C. McCord

Dec 4, 2017, 10:35:41 AM
to Václav Rozsypálek, CoreOS User
I would turn up the logging of matchbox and see if, perhaps, you have a mismatch in the, er, matching parameters.  With debug logging turned on, you can see each of the requests as they come in.

--
You received this message because you are subscribed to the Google Groups "CoreOS User" group.
To unsubscribe from this group and stop receiving emails from it, send an email to coreos-user...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
--
Seán C McCord
CyCore Systems, Inc

Václav Rozsypálek

Dec 4, 2017, 10:42:12 AM
to CoreOS User
Hello Sean,

thanks for the answer.

We are not using matchbox. We built our own custom tooling instead (because matchbox lacks some extra functionality that we need).
In any case, the Ignition config is served by a basic Go HTTP server, and we see all of its logs: no request reaches the Ignition URL at all. So we have the same visibility as we would have with matchbox debug logging.

There is just no GET request for the Ignition URL. That is the issue.

Seán C. McCord

Dec 4, 2017, 10:56:52 AM
to Václav Rozsypálek, CoreOS User
In that case, I would next pass `coreos.autologin=tty1` as an additional kernel parameter and see if you can't inspect what is happening directly.  All the normal networking things come to mind:  DNS resolution, IP conflicts, default gateway, wrong interface, NAT, etc.  And, of course, you will also be able to debug ignition's execution itself.

Alex Crawford

Dec 4, 2017, 11:11:31 AM
to Václav Rozsypálek, CoreOS User
On 12/04, Václav Rozsypálek wrote:
> We are not using matchbox. We built our custom tooling around that (because
> matchbox lack some extra functionality that we need).
> But anyway the ignition is served by basic golang http server and we see
> all logs of that http server and there is no request reaching the URL of
> ignition on http server at all. So we have the same visibility as we would
> have with matchbox with debug log.
>
> There is just no GET request for the ignition URL. That is the issue.

You need to pass `coreos.first_boot=1` on the kernel command line. Since
Ignition only runs on the first boot, it needs somewhere to store that
state. In a PXE environment, there is no constant location for this
state, so it is moved to the PXE server instead.

-Alex

Václav Rozsypálek

Dec 4, 2017, 11:43:02 AM
to CoreOS User
Hey guys,

I think I got a little bit closer.

The issue might be the following: the bare-metal machine has lots of NICs, and the one that is actually connected to DHCP is the last one.

So the scenario plays out like this:
* the system runs DHCP on eth0 - times out after some time
* the system runs DHCP on eth1 - times out after some time
* the system runs DHCP on eth2 - times out after some time
* the system runs DHCP on eth3 - times out after some time
* the system runs DHCP on eth4 - times out after some time
* the system runs DHCP on eth5 - times out after some time
* the system runs DHCP on eth6 - times out after some time
* the system runs DHCP on eth7 - successfully gets an IP from DHCP

The problem is that by the time eth7 is configured, the Ignition service has already reached the maximum number of failed attempts to fetch the Ignition config and has failed.

Is there a way to delay the Ignition start, or to disable DHCP or specific interfaces, via a kernel parameter?

Václav Rozsypálek

Dec 4, 2017, 12:23:40 PM
to CoreOS User

I can confirm this.
I dropped into the emergency shell, waited for DHCP to get an IP on the last interface, then ran the ignition command manually and voilà, Ignition worked. It doesn't solve the issue, but at least it confirms where the problem is.

Any idea how I could force DHCP to run first on a specific interface, or increase the number of times the Ignition service tries to fetch the Ignition config?

Seán C. McCord

Dec 4, 2017, 2:58:10 PM
to Václav Rozsypálek, CoreOS User
You can try passing the kernel-level autoconfiguration flag on the kernel command line: `ip=dhcp`. That way, the interface will already be up when userspace is called. The problem will likely be, though, that systemd probably won't consider networking to be "up." Still, it's worth a try.


Alex Crawford

Dec 4, 2017, 3:45:19 PM
to Seán C. McCord, Václav Rozsypálek, CoreOS User
On 12/04, Seán C. McCord wrote:
> You can try passing the kernel-level autoconfiguration flag at the kernel
> commandline: `ip=dhcp`. That way, the interface will already be up when
> userspace is called. The problem will likely be, though, that systemd
> probably won't consider networking to be "up." Still, it's worth a try.

The `ip` command line option is still evaluated by networkd [1], so I
don't think that would help. It's interesting that DHCP takes so long
when there are multiple interfaces. That sounds like a networkd bug. As
a workaround, it might work to assign static, bogus addresses to the
unused interfaces.

-Alex

[1]: https://github.com/coreos/bootengine/blob/f927ba07477286996d636e658bd8daaad9eccb98/dracut/03coreos-network/parse-ip-for-networkd.sh

Václav Rozsypálek

Dec 4, 2017, 4:04:45 PM
to CoreOS User
Hey,

It takes around 2 minutes for networkd to run DHCP on the last interface.

For now, I have a workaround: I inject a systemd drop-in into the PXE image.

File '/etc/systemd/system/ignition-disks.service.d/00-delay.conf'
with the content:
[Service]
# Poll every 2 s until some interface has an IPv4 address other than 127.0.0.1
ExecStartPre=/bin/bash -c "while [ \"$(ip addr | grep 'inet ' | grep -v '127.0.0.1')\" = \"\" ];do echo 'Waiting for ip on any interface' && sleep 2s;done;"

This basically waits until some valid IP is assigned to one of the interfaces (I know there is probably a more elegant solution without that much bash magic).

This works. The boot phase is delayed, but once the interface gets an IP from DHCP, the machine continues booting and fetches the Ignition config.

PS: I tried the kernel parameter 'ip=dhcp', but that did not really help.