ANNOUNCE: Microkernel-002

David Lutterkort

Oct 28, 2013, 9:44:56 PM
to puppet...@googlegroups.com
I am pleased to announce the availability of the 2nd iteration of the razor-el-mk microkernel. The release is a drop-in replacement for the old microkernel, and can be installed on a running Razor server with

tar xf razor-microkernel-002.tar -C $repo_store_root

where repo_store_root is the directory you set in your config.yaml for storing repositories.


New in this release

The NEWS file contains all the details. Highlights include:
  • Use latest Facter (1.7.3) from Puppet Labs directly, rather than the somewhat stale Facter in Fedora 19
  • Reduced image size (~ 130 MB down from ~ 190 MB)
  • DHCP on all physical network interfaces
  • Note that sshd is not started by default. To start it, log into the console of a machine and run 'systemctl start sshd.service'

Verifying the release

The microkernel-002.tar file contains the sha256sums of the kernel and initrd, signed with key FC6E8A22; to verify the integrity of your download, do the following:

tar xf microkernel-002.tar
cd microkernel
gpg --verify SHA256SUM.sig && sha256sum -c SHA256SUM

David

croel...@gmail.com

Oct 30, 2013, 10:42:48 AM
to puppet...@googlegroups.com
I ran into a problem with the new mk on a physical server: the network interfaces are never brought online.
I have to manually run ifup or service network restart. Only then does the network come online and the mk service start.

On virtual machines it runs fine. The differences between the physical and virtual machines are:
1: The VM has 1 NIC; the physical machine has 6 NICs in total, 2 of them connected.
2: Device names on the physical machine are eno# for onboard adapters instead of ens###

This particular machine PXE boots over the second NIC because PXE is somehow broken on the first NIC. After booting the mk it can use the first NIC. On the previous mk this was no problem. After manually bringing up eno1 everything works just fine, and deployment also works over that NIC.

Looks like the "DHCP Everything" might cause a problem here. Maybe some script does not wait long enough for all adapters to come online? Just guessing here.
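
The manual workaround described above (running ifup per interface) can be scripted. The following is a minimal sketch, not part of the microkernel: list_physical_nics is a hypothetical helper that relies on the Linux sysfs convention that only hardware-backed interfaces have a device entry under /sys/class/net, which skips lo.

```shell
# list_physical_nics [SYSFS_NET_DIR]
# Prints the name of every network interface that is backed by a real
# device. SYSFS_NET_DIR defaults to /sys/class/net; the optional argument
# exists only so the helper can be pointed at a test directory.
list_physical_nics() {
    root=${1:-/sys/class/net}
    for path in "$root"/*; do
        # Virtual interfaces such as lo have no "device" entry.
        if [ -e "$path/device" ]; then
            basename "$path"
        fi
    done
    return 0
}
```

On an affected machine, something like `for nic in $(list_physical_nics); do ifup "$nic"; done` would then bring up each interface in turn, assuming each one has an ifcfg file for ifup to read.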

croel...@gmail.com

Oct 30, 2013, 11:25:48 AM
to puppet...@googlegroups.com
Did some more digging. The log shows that network.service gets restarted for every NIC in the system after the NIC is renamed by udev. By the time it gets to NIC 6, systemd complains that start requests were repeated too quickly.

Here's the log:
Oct 30 15:06:53 localhost systemd-udevd[384]: renamed network interface eth0 to eno1
Oct 30 15:06:53 localhost systemd[1]: Stopping LSB: Bring up/down networking...
Oct 30 15:06:53 localhost systemd[1]: Starting LSB: Bring up/down networking...
Oct 30 15:06:53 localhost kernel: scsi 0:2:0:0: Direct-Access     DELL     PERC 6/i         1.22 PQ: 0 ANSI: 5
Oct 30 15:06:53 localhost systemd-udevd[388]: renamed network interface eth4 to enp5s0f0
Oct 30 15:06:53 localhost systemd[1]: Stopping LSB: Bring up/down networking...
Oct 30 15:06:53 localhost kernel: scsi 0:0:32:0: Attached scsi generic sg0 type 13
Oct 30 15:06:53 localhost kernel: sd 0:2:0:0: Attached scsi generic sg1 type 0
Oct 30 15:06:53 localhost systemd[1]: Starting LSB: Bring up/down networking...
Oct 30 15:06:53 localhost kernel: sd 0:2:0:0: [sda] 142082048 512-byte logical blocks: (72.7 GB/67.7 GiB)
Oct 30 15:06:53 localhost kernel: sd 0:2:0:0: [sda] Write Protect is off
Oct 30 15:06:53 localhost kernel: sd 0:2:0:0: [sda] Mode Sense: 1f 00 10 08
Oct 30 15:06:53 localhost kernel: sd 0:2:0:0: [sda] Write cache: disabled, read cache: enabled, supports DPO and FUA
Oct 30 15:06:53 localhost kernel:  sda: unknown partition table
Oct 30 15:06:53 localhost kernel: sd 0:2:0:0: [sda] Attached SCSI disk
Oct 30 15:06:53 localhost systemd[1]: Stopping LSB: Bring up/down networking...
Oct 30 15:06:53 localhost systemd[1]: Starting LSB: Bring up/down networking...
Oct 30 15:06:53 localhost network[531]: Bringing up loopback interface:
Oct 30 15:06:53 localhost network[436]: Bringing up loopback interface:
Oct 30 15:06:53 localhost kernel: fbcon: mgadrmfb (fb0) is primary device
Oct 30 15:06:53 localhost kernel: [drm] mga base 0
Oct 30 15:06:53 localhost systemd-udevd[389]: renamed network interface eth5 to enp5s0f1
Oct 30 15:06:53 localhost kernel: ses 0:0:32:0: Attached Enclosure device
Oct 30 15:06:53 localhost kernel: Console: switching to colour frame buffer device 160x64
Oct 30 15:06:53 localhost kernel: CE: hpet increased min_delta_ns to 20113 nsec
Oct 30 15:06:53 localhost kernel: mgag200 0000:07:03.0: fb0: mgadrmfb frame buffer device
Oct 30 15:06:53 localhost kernel: mgag200 0000:07:03.0: registered panic notifier
Oct 30 15:06:53 localhost kernel: [drm] Initialized mgag200 1.0.0 20110418 for 0000:07:03.0 on minor 0
Oct 30 15:06:53 localhost systemd-udevd[386]: renamed network interface eth2 to enp3s0f0
Oct 30 15:06:53 localhost systemd[1]: Stopping LSB: Bring up/down networking...
Oct 30 15:06:53 localhost systemd-udevd[387]: renamed network interface eth3 to enp3s0f1
Oct 30 15:06:53 localhost systemd[1]: Starting LSB: Bring up/down networking...
Oct 30 15:06:53 localhost systemd[1]: Stopping LSB: Bring up/down networking...
Oct 30 15:06:53 localhost systemd[1]: Starting LSB: Bring up/down networking...
Oct 30 15:06:53 localhost systemd[1]: network.service start request repeated too quickly, refusing to start.
Oct 30 15:06:53 localhost systemd[1]: Failed to start LSB: Bring up/down networking.
Oct 30 15:06:53 localhost systemd[1]: Unit network.service entered failed state.
Oct 30 15:06:53 localhost systemd[1]: Starting Network.
Oct 30 15:06:53 localhost systemd[1]: Reached target Network.
Oct 30 15:06:53 localhost systemd[1]: Starting Razor Microkernel Agent trigger.
Oct 30 15:06:53 localhost systemd[1]: Started Razor Microkernel Agent trigger.
Oct 30 15:06:53 localhost systemd[1]: Starting Multi-User System.
Oct 30 15:06:53 localhost systemd[1]: Reached target Multi-User System.
Oct 30 15:06:53 localhost systemd[1]: Starting Graphical Interface.
Oct 30 15:06:53 localhost systemd[1]: Reached target Graphical Interface.
Oct 30 15:06:53 localhost systemd[1]: Starting Stop Read-Ahead Data Collection 10s After Completed Startup.
Oct 30 15:06:53 localhost systemd[1]: Started Update UTMP about System Runlevel Changes.
Oct 30 15:06:53 localhost systemd[1]: Startup finished in 1.784s (kernel) + 755ms (initrd) + 1.324s (userspace) = 3.864s.
Oct 30 15:06:53 localhost systemd-udevd[385]: renamed network interface eth1 to eno2
Oct 30 15:06:53 localhost systemd[1]: Starting LSB: Bring up/down networking...
Oct 30 15:06:53 localhost systemd[1]: network.service start request repeated too quickly, refusing to start.
Oct 30 15:06:53 localhost systemd[1]: Failed to start LSB: Bring up/down networking.
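
To summarize a log like the one above without reading it line by line, a small filter can tally the relevant events. This is only a sketch: count_network_starts is a hypothetical helper name, and the patterns are copied from the messages in the log.

```shell
# count_network_starts
# Reads journal or syslog lines on stdin and prints how many interface
# renames, network.service start attempts, and refused starts it saw.
count_network_starts() {
    awk '
        /renamed network interface/                { renames++ }
        /Starting LSB: Bring up\/down networking/  { starts++ }
        /start request repeated too quickly/       { refused++ }
        END {
            printf "renames=%d starts=%d refused=%d\n", renames, starts, refused
        }
    '
}
```

On the microkernel it could be fed with `journalctl -b | count_network_starts`.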

Daniel Pittman

Oct 30, 2013, 1:28:19 PM
to puppet...@googlegroups.com
On Wed, Oct 30, 2013 at 8:25 AM, <croel...@gmail.com> wrote:
> Did some more digging. The log shows that network.service gets restarted for
> every NIC in the system after the nic is renamed by udev. By the time it
> gets to NIC 6 it complains that start requests were too quickly.

Yes: when I implemented dynamic interface configuration, I had it
trigger a restart of the interface by kicking the `network.service`
thing. I tested multiple interfaces, but obviously not enough to hit
that failure mode. :(

Directly calling ifup from the udev script ended up causing more
trouble than it was worth -- the dhclient instances ended up killed by
systemd if they took too long, which they did when the interface was
present, cabled, but no DHCP server was responding.

My fallback plan was to add NetworkManager to the microkernel, since
that is the upstream default model, and ask it to "manage" the
interfaces rather than using the legacy network.service.

That should be a little more robust, but more "developer effort
intensive" to get working. I updated
https://github.com/puppetlabs/razor-el-mk/issues/4 to reflect that.

Thanks for testing this, and I am sorry it didn't work out in the real
world. I noted the NIC count in that ticket, and will try to
reproduce the failure on my testbench before I update the MK, so that
I know it resolves the problem. :)

--
Daniel Pittman
⎋ Puppet Labs Developer – http://puppetlabs.com
♲ Made with 100 percent post-consumer electrons

croel...@gmail.com

Oct 31, 2013, 7:35:55 AM
to puppet...@googlegroups.com
I created a quick fix to get the new mk working on my physical machines. I added this to mk-network.ks:

cat > /etc/systemd/system/network.service << 'EOF'
[Unit]
Description=Network Connectivity for <interface
Wants=network.target
Before=network.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/etc/init.d/network start
ExecStop=/etc/init.d/network stop
StartLimitBurst=10

[Install]
WantedBy=multi-user.target

EOF


It takes five minutes for the network service to start but at least it works :)
The StartLimitBurst=10 should allow for 10 NICs.
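
Why a larger StartLimitBurst helps can be illustrated with a simplified model of start rate limiting: at most BURST starts are allowed per INTERVAL seconds, and anything beyond that is refused. This is a sketch, not systemd's implementation. first_refused_start is a hypothetical helper, and the limit of 5 starts per 10 seconds used in the example is systemd's documented default, assumed here to be unchanged in the microkernel.

```shell
# first_refused_start BURST INTERVAL
# Reads start timestamps (whole seconds, ascending, one per line) on stdin
# and prints the 1-based index of the first start that would be refused,
# or "none" if every start fits within the limit.
first_refused_start() {
    awk -v burst="$1" -v interval="$2" '
        # Open a new counting window on the first start, or once the
        # previous window has expired.
        NR == 1 || $1 - begin > interval { begin = $1; num = 0 }
        {
            if (num < burst) { num++ }
            else { print NR; found = 1; exit }
        }
        END { if (!found) print "none" }
    '
}
```

For six starts in the same second, `printf '0\n0\n0\n0\n0\n0\n' | first_refused_start 5 10` prints 6, while the same input with `first_refused_start 10 10` prints none.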

David Lutterkort

Oct 31, 2013, 1:05:32 PM
to puppet...@googlegroups.com
Could you open an issue? I just want to make sure this doesn't disappear in the ML archives.

David

