KVM and F22 move, saffron reboot


Jeremy Morse

Aug 24, 2015, 7:20:01 PM
to srobo...@googlegroups.com
Hi,

Sometime in the very near future we should move to F22, to avoid running
the year on an out-of-date distro. This would also help with nemesis, which
is currently disabled due to some python/wsgi problems.

I'm going to try mangling some F22 things this evening / tomorrow, with
a view to deploying before the 1st of Sept. I'd like to have a situation
like we had last year, where we created teacher accounts the moment they
had their place confirmed, and could register students immediately.

~

Linode are currently offering KVM linodes that are faster; eventually
they're going to stop supporting Xen, it seems. Currently saffron is
configured with pv-grub, which allows us to boot an arbitrary
kernel from disk, where VMs normally have their kernel configured
externally. This allows us to pick up kernel upgrades the moment they
hit repos, rather than waiting for Linode. However, that doesn't work
with the new KVM situation; instead one has to install a bootloader
stage to the partition table and go through the whole grub rigmarole
[0]. Happily one doesn't need to install it to the MBR or anything.

I'm going to muck about with this on a different linode tomorrow, and if
it works fine I'll reboot saffron to that configuration tomorrow
evening. There's no particular time to perform this move, aside from
"before the competition starts".

[0]
https://www.linode.com/docs/tools-reference/custom-kernels-distros/run-a-distribution-supplied-kernel-with-kvm

--
Thanks,
Jeremy


Rob Spanton

Aug 24, 2015, 8:07:07 PM
to srobo...@googlegroups.com
On Tue, 2015-08-25 at 00:19 +0100, Jeremy Morse wrote:
> I'm going to try mangling some F22 things this evening / tomorrow, with
> a view to deploying before the 1st of Sept.

What's the current level of srobo-server-on-F22 functionality?

R

Jeremy Morse

Aug 24, 2015, 8:24:03 PM
to srobo...@googlegroups.com
Hi,

On 25/08/15 01:07, Rob Spanton wrote:
> What's the current level of srobo-server-on-F22 functionality?

Peter's done some work on rejuggling how configuration works, to make it
happen through hiera rather than the extlookup function and external
'common.csv'. Details on this in #3069, #3090. The cause of this is that
the latest version of puppet removes extlookup.

The configuration hasn't actually been tested on F22 yet. I currently
have some unpushed commits that move sufficient amounts of
'extlookup' configuration to hiera to get that working. From reading the
release notes, I don't believe any serious breaking changes are likely
(mariadb and php were upgraded, XFS is the new default fs, Peter has
addressed the yum <=> dnf situation).

--
Thanks,
Jeremy


Jeremy Morse

Aug 24, 2015, 10:19:39 PM
to srobo...@googlegroups.com
Hi,

My latest work is on the 'f22' branch of server/puppet.git. Most of the
changes there are:
* Moving default configuration from common.csv to the hiera defaults
  file
* Replacing 'extlookup' with 'hiera' in various files
* Resolving numerous breakages due to namespace rule changes
* Moving a few classes that had hyphens in their names, which are now
  not permitted.

The following error may be interesting in the future, but is not important:

    Error: Could not get latest version: undefined method `[]' for nil:NilClass
    Error: /Stage[main]/Sr_site/Package[yum-plugin-fastestmirror]/ensure: change from 1.1.31-508.fc22 to latest failed: Could not get latest version: undefined method `[]' for nil:NilClass

The only real problem was that the subversion module that we're using
doesn't work with puppet 4.0, due to using octal integers in the 'mode'
argument to files, where it should now be using a string. I've switched
puppet to load from my github, fixed that particular problem, and sent a
pull request upstream.

~

I was going to send the email above and then work through whatever
problems cropped up when applying puppet, but then metacity decided that
mouse clicks weren't a feature needed for modern computers. (Only for
thunderbird it seems).

Anyway, I've wrangled my way through some more puppet kludge, mostly now
fixed on the f22 branch of the puppet repo. There are two gotchas however:
* /etc/dnf/dnf.conf needs a line adding, "obsoletes=0" or something;
  I don't have the exact line as I rebooted to write this email. A
  similar line exists in /etc/yum.conf. Anyway, "obsoletes" needs to
  be turned off / set to zero, otherwise, due to a bug, puppet fails
  to parse the output of yum correctly.
* nslcd now appears to have its own config file instead of sucking
  information out of nscd, and the ldap connection details are going
  to need to be copied into it. This is the point where I gave up
  for the night.

I haven't tried using any of the services on the machine.

--
Thanks,
Jeremy




Jeremy Morse

Aug 26, 2015, 8:35:07 AM
to srobo...@googlegroups.com
Hi,

The nslcd hiccup has now been addressed. I might test services later, I
might not. It should be possible to deploy a machine, given the caveats:
* It'll probably require several runs at this point
* The "obsoletes=0" line needs to be added to dnf.conf
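
The second caveat can be scripted ahead of the first puppet run. A minimal
sketch, assuming GNU sed and a dnf.conf that contains only a [main] section
(as the stock file does); the function name ensure_obsoletes_off is made up
for illustration:

```shell
# Sketch: make sure "obsoletes=0" is set in a dnf/yum style config file.
# Assumption: the file has only a [main] section, so appending a line
# at the end still lands inside [main].
ensure_obsoletes_off() {
    conf="$1"
    if grep -q '^obsoletes[[:space:]]*=' "$conf"; then
        # A setting already exists: rewrite it in place (GNU sed -i).
        sed -i 's/^obsoletes[[:space:]]*=.*/obsoletes=0/' "$conf"
    else
        # No setting yet: append one.
        printf 'obsoletes=0\n' >> "$conf"
    fi
}

# Usage (as root):
#   ensure_obsoletes_off /etc/dnf/dnf.conf
```

Running it twice leaves a single "obsoletes=0" line, so it is safe to repeat
on every deployment.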

Regarding the several runs, there's still an existing problem that the 'fritter'
user isn't found on the first run. This requires that after it's added
to LDAP, `nscd -i passwd` is run to invalidate the passwd db cache, so
that subsequent getpwent(3)'s work. For some reason a previous attempt
at this failed, and requires more investigation.

--
Thanks,
Jeremy


Jeremy Morse

Aug 27, 2015, 6:55:40 PM
to srobo...@googlegroups.com
Hi,

Because people need to preserve their jobs apparently, all the PAM
configuration mechanisms have changed again. Today, nslcd will be the
location of our shell-users authorization filter. This particular
filtering mechanism does not have the ability to set a search base,
meaning that any group with name cn=shell-users, not a specific group,
will let people in. (This is not a problem, just irritating).
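
For reference, nslcd expresses this kind of filter through the
pam_authz_search option in /etc/nslcd.conf. A sketch only: it assumes that
is the option in use here, and that shell-users is a posixGroup listing its
members by memberUid, neither of which is stated in this thread:

```
# /etc/nslcd.conf (fragment)
# Assumption: shell-users is a posixGroup with memberUid entries.
# $username is substituted by nslcd with the user being authorised.
pam_authz_search (&(objectClass=posixGroup)(cn=shell-users)(memberUid=$username))
```

As far as I know the option takes only a filter and is searched against
nslcd's configured base, which matches the limitation described above.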

This makes for four server upgrades and four different PAM configuration
mechanisms. Colour me unamused.

The part that currently doesn't work is everything web-based: because
things sit behind nginx now and get rewritten, being hosted on
'localhost:5443' doesn't seem (AFAIK) to be correctly rewritten.
Specifically, accessing '/ide/' is reverse-proxied to the IDE httpd
server, but the URL is not rewritten to '/'. This isn't my area of
expertise.

--
Thanks,
Jeremy


Jeremy Morse

Aug 27, 2015, 9:55:55 PM
to srobo...@googlegroups.com
Hi,

On 26/08/15 13:35, Jeremy Morse wrote:
> Regarding the several runs, there's still an existing problem that the 'fritter'
> user isn't found on the first run. This requires that after it's added
> to LDAP, `nscd -i passwd` is run to invalidate the passwd db cache, so
> that subsequent getpwent(3)'s work. For some reason a previous attempt
> at this failed, and requires more investigation.

Neither 'nscd -i passwd' nor just 'service nscd restart' fixes this. Ctrl+C
immediately after the problem occurs shows that the 'fritter' user
doesn't exist immediately after a flush/restart... but then magically
does 10 seconds later. I suspect this means there's some kind of
cache out there with a delay. If so, there's not much we can do about that.
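
If the delay really is a cache expiring, one workaround is to have the
deployment wait for the user to become visible rather than fail straight
away. A sketch, assuming getent(1) goes through the same NSS lookup path as
whatever needs the user; wait_for_user and its 30 second default are
hypothetical:

```shell
# Sketch: poll the passwd database until a user shows up or we give up.
# Usage: wait_for_user NAME [TIMEOUT_SECONDS]
wait_for_user() {
    user="$1"
    timeout="${2:-30}"
    waited=0
    # getent exits non-zero while the user cannot be resolved.
    while ! getent passwd "$user" >/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1    # still missing after the timeout
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Usage, after adding the user to LDAP:
#   nscd -i passwd
#   wait_for_user fritter 30 || echo "fritter still not visible" >&2
```

This only papers over the delay; it does nothing to explain where the stale
answer is coming from.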

--
Thanks,
Jeremy


Peter Law

Aug 31, 2015, 12:43:28 PM
to srobo...@googlegroups.com
Jeremy wrote:
> Peter has addressed the yum <=> dnf situation

I'm not aware that I've done anything about this other than dumping
comments onto the upgrade ticket (#3069), where Rob eventually pointed
out that yum was still present on F22 and that we could use it.

From the looks of this thread this is true, but there is some manual
gunge (presumably disabling the obsoletion warnings?) needed to
actually make things work.

Thanks,
Peter

#3069 https://www.studentrobotics.org/trac/ticket/3069

Jeremy Morse

Aug 31, 2015, 2:25:59 PM
to srobo...@googlegroups.com
Hi,

On 31/08/15 17:43, Peter Law wrote:
> I'm not aware that I've done anything about this other than dumping
> comments onto the upgrade ticket (#3069), where Rob eventually pointed
> out that yum was still present on F22 and that we could use it.

Ah, OK, I'd over-interpreted what'd been going on then.

> From the looks of this thread this is true, but there is some manual
> gunge (presumably disabling the obsoletion warnings?) needed to
> actually make things work.

I think it's currently relying on the fact that yum is symlinked to dnf,
and puppet already just ran yum commands and interpreted the results.
The obsoletes thing is going to have to rely on some kind of pre-puppet
deployment misery to work, I think.

IMO F22 seems to be working fine under the SR configuration after some
wrangling; I think we should deploy it next weekend.

--
Thanks,
Jeremy


Jeremy Morse

Sep 5, 2015, 7:24:36 AM
to srobo...@googlegroups.com
Hi,

On 31/08/15 19:25, Jeremy Morse wrote:
> IMO F22 seems to be working fine under the SR configuration after some
> wrangling; I think we should deploy it next weekend.

I've had all the spare time sucked out of me recently, however I'd like
to designate 10pm onwards, tonight and tomorrow, as being "at risk"
periods. If things get deployed, it'll be via Fedora's 'fedup' tool
which'll leave the machine down for something like an hour, after which
it won't work until puppet completes.

My backup plan is to a) take a backup before and b) deploy to a
different VM in case of failure.

--
Thanks,
Jeremy


Jeremy Morse

Sep 11, 2015, 8:40:15 AM
to srobo...@googlegroups.com
Hi,

On 05/09/15 12:24, Jeremy Morse wrote:
> I've had all the spare time sucked out of me recently, however I'd like
> to designate 10pm onwards, tonight and tomorrow, as being "at risk"
> periods. If things get deployed, it'll be via Fedora's 'fedup' tool
> which'll leave the machine down for something like an hour, after which
> it won't work until puppet completes.
>
> My backup plan is to a) take a backup before and b) deploy to a
> different VM in case of failure.

Same again, just this time my weekend isn't completely booked out.

--
Thanks,
Jeremy


Jeremy Morse

Sep 12, 2015, 6:26:34 PM
to srobo...@googlegroups.com
Hi,

On 11/09/15 13:40, Jeremy Morse wrote:
> Same again, just this time my weekend isn't completely booked out.

Being conscious and un-tired, this will happen in half an hour. I'll move
to KVM first as that's guaranteed to go wrong first time. If there's
sufficient time in the evening I'll then fedup and deploy puppet.

--
Thanks,
Jeremy



Jeremy Morse

Sep 12, 2015, 9:10:21 PM
to srobo...@googlegroups.com
Hi,

Much later, this now works again. Only the KVM move has occurred. Long
story short, complexity involving the root filesystem being renamed (and
certain boot systems not being particularly good at telling me this...),
and me having to figure out how to switch from xen to virtio drivers in
initrd, led to fun for all the family.

I'll look at F22 late on Sunday.

--
Thanks,
Jeremy


Jeremy Morse

Sep 13, 2015, 8:46:41 PM
to srobo...@googlegroups.com
Hi,

On 13/09/15 02:10, Jeremy Morse wrote:
> I'll look at F22 late on Sunday.

This went OK, and things appear to be working across reboots and puppet
applications. Errata:
* I had to update grub.cfg manually to switch to the updated kernel
  and initrd. That's completely standard though, I would have had to
  under xen too.
* I'd set the immutable unix flag on /etc/nslcd.conf because crazy
  things were happening when updating to F20 last time, which caused
  some silliness this time. Mea culpa.
* Ticket #3090 happened, in the anticipated manner: I didn't fully
  understand what was about to happen, and all the passwords got
  reset to "123456" under my feet. I've now performed some remedial
  actions, documented in #3090. There was a brief period when we were
  vulnerable to this: it's probably not that big a deal.
* Ruby / puppet's error messages are rubbish: some fields in the
  saffron.s.o.yaml config file require enclosing in quotes. I don't
  know which ones, because the aforementioned diagnostics are poor.
  They're all now enclosed.
* Peter encountered a gerrit error in #3069 related to mysql: after
  doing some prodding [0] this turns out to be a mismatch between the
  mysql java connector and the mariadb one. Installing the fedora-repo
  one and symlinking it into gerrit's lib dir resolves this. Puppet
  attempts to re-install the old java connector, which makes gerrit
  croak. I've patched this for the moment by zeroing the old connector
  file and marking it immutable.
* Grub currently hangs on boot, with the message:
    error: file `/boot/grub/i386-pc/all_video.mod' not found.
  after which it demands a key is pressed to continue. I haven't put
  any effort into debugging this.
* Gerrit sometimes doesn't start automatically. It's probably because
  its LSB block doesn't identify either mysql or slapd as
  dependencies.
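
The suspicion about gerrit's LSB block is easy to check from the shell. A
sketch, assuming gerrit is started from a SysV-style init script with a
standard "### BEGIN INIT INFO" header; the helper names and the script path
in the usage lines are hypothetical:

```shell
# Print the start-ordering lines from an init script's LSB header.
lsb_start_deps() {
    script="$1"
    sed -n '/^### BEGIN INIT INFO/,/^### END INIT INFO/p' "$script" |
        grep -E '^# (Required|Should)-Start:'
}

# Succeed if SERVICE appears as a whole word in either of those lines.
# Usage: lsb_depends_on SCRIPT SERVICE
lsb_depends_on() {
    script="$1"
    service="$2"
    lsb_start_deps "$script" | grep -qw -- "$service"
}

# Usage (path is an assumption):
#   lsb_depends_on /etc/init.d/gerrit slapd || echo "slapd not listed"
#   lsb_depends_on /etc/init.d/gerrit mysql || echo "mysql not listed"
```

If neither service is listed, adding them to a Should-Start line would be
the usual way to express the ordering, though whether that cures the startup
failures here is untested.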

This leaves us with an apparently working system, but with some
additional work required to fix things up, which I'll do over the next
week I guess.

Peter asked me why I was bothering trying an upgrade rather than just
deploying a new VM: after thinking about this for a while, there's no
good reason in /this/ particular circumstance of why we can't do that
(aside from the risk of losing data we haven't built in to the puppet
deploy system). However, if we weren't using VMs and had dedicated
hardware instead, that wouldn't be an option. To my mind, this upgrade
path should be available and should be exercised; otherwise we have a
Windows-tier amount of flexibility.

I haven't yet re-enabled nemesis (which was disabled due to some
python-wsgi vulnerabilities). I'll do that tomorrow.

[0] As ever, it's impossible to diagnose any problem nowadays
without the use of strace.

--
Thanks,
Jeremy


Jeremy Morse

Sep 15, 2015, 6:18:44 AM
to srobo...@googlegroups.com
Hi,

In addition, some of piwik's functionality has been deprecated, and it
occasionally prints backtraces to the diagnostics page.

--
Thanks,
Jeremy


Rob Spanton

Sep 15, 2015, 6:42:34 AM
to srobo...@googlegroups.com
On Tue, 2015-09-15 at 11:18 +0100, Jeremy Morse wrote:
> In addition, some of piwik's functionality has been deprecated, and it
> occasionally prints backtraces to the diagnostics page.

Maybe it's hoping that one of its users could magically fix the problem!

R