Configuration Tools.


anthonyclark

Mar 19, 2008, 12:47:11 AM
to seed-linux-dev
I'm starting this thread to discuss the development of configuration
tools for seed-linux. I have some ideas in the works and the
documentation for each is on its way. I'm still digging into the inner
workings of the creation of 'seeds' so I can get a better grasp on
some areas that need tools, etc. If anyone has any ideas or whatnot,
please help me and everyone else out. I'd be glad to hear them.

Thanks!
Anthony

samba

Mar 19, 2008, 1:07:57 PM
to seed-linux-dev
Anthony,

I've put a little bit of thought into the manner of simplifying
configuration. Others may have objections to this proposal, so I
welcome any and all feedback.

It seems to me that a package's configuration can be handled by way of
simple string replacement (using 'sed'). This requires that seeds
include an "abstracted" configuration system though, where the
standard configuration files for a version of some software are
modified and perhaps treated as a different package in the seed-linux
portage overlay.

The necessary "abstraction" of the configuration would be to literally
define all the values that we believe are standard enough, and replace
all user-defined values with strings like "%hostname%",
"%socketpath%", "%someOtherVariable%". Upon installation, a script would ask the
user for relevant information, and then use 'sed' to replace the
appropriate strings in the configuration file.

So, an example to clarify. I'll be using the MySQL configuration in
this case...

In the default configuration (/etc/mysql/my.cnf) we see a line
"bind-address = 127.0.0.1". To abstract this, we replace this with
something like "bind-address = %bindIP%". Ideally, because lots of
packages will use the same value for binding to an IP, we would use a
standardized naming convention for these strings. This file should be
included in a package in the seed-linux portage overlay, perhaps named
something like "seed-linux/config-mysql-5.0.0.2", specifying a package
and a (MySQL) version.

A configuration script could be set up:
<shellscript>

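#!/bin/sh

# Ask for the bind address; a blank answer falls back to loopback.
printf 'Bind to IP (blank for loopback 127.0.0.1): '
read -r BINDIP
# Double quotes let the shell expand the variable before sed runs.
sed -i -e "s/%bindIP%/${BINDIP:-127.0.0.1}/g" /etc/mysql/my.cnf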

</shellscript>
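
The same substitution can be wrapped in a small helper so that values containing characters special to sed (such as the slashes in a socket path) do not break the command. This is only a minimal sketch: the names escape_replacement and fill_placeholder are made up for illustration, placeholder names are assumed to be plain alphanumerics, and values are assumed not to contain newlines.

```shell
#!/bin/sh
# Sketch only: escape_replacement and fill_placeholder are hypothetical
# helpers, not part of any existing seed-linux tool.

# Escape the characters that are special in a sed replacement:
# the '/' delimiter used below, '&' and the backslash itself.
escape_replacement() {
    printf '%s' "$1" | sed -e 's|[/&\\]|\\&|g'
}

# fill_placeholder NAME VALUE
# Reads a template on stdin and replaces every %NAME% with VALUE.
fill_placeholder() {
    fp_value=$(escape_replacement "$2")
    sed -e "s/%$1%/${fp_value}/g"
}
```

A template could then be piped through one call per placeholder, for example `fill_placeholder bindIP 127.0.0.1 < my.cnf.template | fill_placeholder socketpath /var/run/mysqld/mysqld.sock` (the template file name is likewise an assumption).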

It may also be possible to develop a central configuration repository,
a database of global and per-file replacements and/or specifications.
It could look something like this:
<example>
/etc/mysql/my.cnf::bindIP::127.0.0.1 # only for MySQL
/etc/apache2/*::bindIP::192.168.0.1 # only for Apache2
*::enableClustering::true # for all files
</example>
Here I've specified that MySQL should bind only to localhost loopback,
while all files in the Apache configuration should refer to a specific
IP on the local network, accessible from other nodes. I've also
indicated that all software should be configured to support
clustering; this is probably not a direct replacement, but
configuration scripts can read the 'enableClustering' value and make
further configuration decisions based on its specification. To
determine which lines relate to a specific file, the file
specification on each line will need to be treated as a mask, and a
particular file will need to be tested against that mask; e.g.
/etc/mysql/my.cnf matches the 1st and 3rd lines, but not the 2nd.
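
As a rough sketch of the mask test described above, the shell's own glob matching (via `case`) is enough to decide which repository lines apply to a file. The function name lookup_value is invented for this example, and the rule that the last matching line wins is an assumption; the proposal does not say how conflicts should be resolved.

```shell
#!/bin/sh
# Sketch only: lookup_value is a hypothetical helper. It assumes the
# "mask::key::value # comment" layout shown in the example above and
# lets the LAST matching line win (an assumption, not a stated rule).

# lookup_value REPOSITORY FILE KEY
# Prints the value for KEY that applies to FILE; fails if none matches.
lookup_value() {
    lv_result=
    while IFS= read -r lv_line || [ -n "$lv_line" ]; do
        lv_line=${lv_line%%#*}                  # drop a trailing comment
        case $lv_line in *::*::*) ;; *) continue ;; esac
        lv_mask=${lv_line%%::*}                 # text before the first '::'
        lv_rest=${lv_line#*::}
        lv_key=${lv_rest%%::*}
        lv_value=${lv_rest#*::}
        # Trim trailing whitespace left behind by the comment removal.
        lv_value=${lv_value%"${lv_value##*[![:space:]]}"}
        [ "$lv_key" = "$3" ] || continue
        # The unquoted mask is deliberately interpreted as a glob pattern.
        case $2 in $lv_mask) lv_result=$lv_value ;; esac
    done < "$1"
    [ -n "$lv_result" ] && printf '%s\n' "$lv_result"
}
```

With the three example lines stored in a file, `lookup_value repo /etc/mysql/my.cnf bindIP` would consult the 1st and 3rd lines only, as described above.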

Obviously this idea needs to be fleshed out a bit, but I think you all
see where I'm going with it. Your thoughts?


Regards,

Sam

anthonyclark

Mar 19, 2008, 1:25:08 PM
to seed-linux-dev
Thanks for the feedback!


I'll be taking notes and will try to find the best way possible to implement
configuration tools. Thanks for your help, Sam.

-Anthony

Stuart Herbert

Mar 19, 2008, 3:03:08 PM
to seed-li...@googlegroups.com
Heya,


On Wed, Mar 19, 2008 at 5:07 PM, samba <sam.brie...@gmail.com> wrote:
> It seems to me that a package's configuration can be handled by way of
> simple string replacement (using 'sed').

I think this is a good starting place - but that we'll need more than one approach to make all seeds easy to configure.

What can we do to *really* look at config files, and come up with a model for what goes into them?  If we can produce an abstract model that we can apply to all the config files, then we can make a killer tool that'll serve us well both now and in the future.

> This requires that seeds
> include an "abstracted" configuration system though, where the
> standard configuration files for a version of some software are
> modified and perhaps treated as a different package in the seed-linux
> portage overlay.

Not a problem.  IIRC, a couple of the seed packages already install replacement config files.  It would be no trouble at all to extend this.
 
Best regards,
Stu

--
Stuart Herbert

e: stu...@stuartherbert.com
t: +44 7966 284577
w: http://www.stuartherbert.com/
b: http://blog.stuartherbert.com/

samba

Mar 19, 2008, 5:20:35 PM
to seed-linux-dev

> I think this is a good starting place - but that we'll need more than one
> approach to make all seeds easy to configure.
>
> What can we do to *really* look at config files, and come up with a model
> for what goes into them? If we can produce an abstract model that we can
> apply to all the config files, then we can make a killer tool that'll serve
> us well both now and in the future.

Right, that's what I had in mind with the "central configuration
repository". It would be unique to each installation (unless manually
copied to another), and establishes a set of variables relevant to a
certain file. When a seed package is installed, it would poll values
from the repository, and if none is found then ask the user (and then
save it back to the repository).
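
The "poll, otherwise ask, then save" behaviour could look roughly like the sketch below. For brevity it uses a flat key=value file rather than the mask-based layout from the earlier post; the SEED_STORE path, that file format and the name get_or_ask are all assumptions made for the example.

```shell
#!/bin/sh
# Sketch only: SEED_STORE, the key=value layout and get_or_ask are
# hypothetical and stand in for the real repository format.

SEED_STORE=${SEED_STORE:-/etc/seed-linux/values}

# get_or_ask KEY PROMPT
# Prints the stored value for KEY. If none is stored, asks on stdin,
# appends the answer to the store, and prints it.
get_or_ask() {
    if [ -f "$SEED_STORE" ]; then
        goa_value=$(sed -n "s/^$1=//p" "$SEED_STORE" | tail -n 1)
    else
        goa_value=
    fi
    if [ -z "$goa_value" ]; then
        # Prompt on stderr so that stdout carries only the value.
        printf '%s: ' "$2" >&2
        IFS= read -r goa_value || return 1
        printf '%s=%s\n' "$1" "$goa_value" >> "$SEED_STORE"
    fi
    printf '%s\n' "$goa_value"
}
```

A package's configuration script would then call something like `bind_ip=$(get_or_ask bindIP "Bind to IP")`, so the user is asked at most once per installation.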

Configuration files have default values, and many use unique layout
and formatting. I would imagine that each file's formatting and such
can be handled by regular expressions. Such regular expressions should
match default values in the configuration (in full-line context) and
replace them with variables from the configuration repository; they
should also leave customized configurations alone.
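
To illustrate the "full-line context" idea: anchoring the expression at both ends of the line means a setting is only abstracted while it still holds the stock default, and anything the admin has customized falls through untouched. A minimal sketch, assuming a simple "key = value" layout; the helper name abstract_default is made up for this example.

```shell
#!/bin/sh
# Sketch only: abstract_default is a hypothetical helper and assumes
# "key = value" style lines, which not every config format uses.

# abstract_default KEY DEFAULT PLACEHOLDER
# Filters stdin to stdout. A line is rewritten to "KEY = %PLACEHOLDER%"
# only if the whole line still matches "KEY = DEFAULT".
abstract_default() {
    # Escape regex metacharacters so DEFAULT is matched literally
    # (e.g. the dots in an IP address).
    ad_default=$(printf '%s' "$2" | sed -e 's|[][\.*^$/]|\\&|g')
    sed -e "s/^\([[:space:]]*$1[[:space:]]*=[[:space:]]*\)${ad_default}[[:space:]]*\$/\1%$3%/"
}
```

For instance, `abstract_default bind-address 127.0.0.1 bindIP < /etc/mysql/my.cnf` would rewrite a stock bind-address line but leave one set to another address alone.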

To properly implement this, seed-linux would need to develop
expressions for each version of each configuration file. As a
package's configuration files evolve, the patterns will need to be
updated. This may require ongoing work to maintain the expressions and
scripting for each package. Fortunately, most packages don't change
formatting too much between versions, though some (e.g. Apache) change
their config file layout, moving config data from one file to another
and such.

This amounts to a re-configuration script for every new version,
though some updates may work fine with a previous version of the
script. Due to the nature of seed-linux as a rapid-deployment,
enterprise-oriented solution, we may want to distribute seed updates
somewhat infrequently, choosing stable, thoroughly tested software,
and waiting until major updates provide significant security and/or
feature advantages. This would provide a way to reduce the burden of
maintaining the configuration scripts and such.

> IIRC, a couple of the seed packages already install
> replacement config files. It would be no trouble at all to extend this.

Glad to hear - maybe we don't need separate packages for seed
configuration after all?

As always, feedback is welcome. Please comment ;)

Ranjit Singh

Mar 20, 2008, 2:08:05 AM
to seed-li...@googlegroups.com
On Wednesday 19 March 2008 21:20:35 samba wrote:
> > I think this is a good starting place - but that we'll need more than one
> > approach to make all seeds easy to configure.
> >
> > What can we do to *really* look at config files, and come up with a model
> > for what goes into them? If we can produce an abstract model that we can
> > apply to all the config files, then we can make a killer tool that'll
> > serve us well both now and in the future.

It would be good imo to have a meta config format similar to what tuomov
outlines here:
http://modeemi.fi/~tuomov/b//archives/2007/01/20/T11_58_29/
iow similar to .INI but with \t indentation.

>
> Right, that's what I had in mind with the "central configuration
> repository". It would be unique to each installation (unless manually
> copied to another), and establishes a set of variables relevant to a
> certain file. When a seed package is installed, it would poll values
> from the repository, and if none is found then ask the user (and then
> save it back to the repository).
>

I like this idea a lot.

> Configuration files have default values, and many use unique layout
> and formatting. I would imagine that each file's formatting and such
> can be handled by regular expressions. Such regular expressions should
> match default values in the configuration (in full-line context) and
> replace them with variables from the configuration repository; they
> should also leave customized configurations alone.
>

+1 to leaving custom configs alone. etc-proposals makes nice work of handling
config changes (similar to dispatch-conf but will use X if available.)

> To properly implement this, seed-linux would need to develop
> expressions for each version of each configuration file. As a
> package's configuration files evolve, the patterns will need to be
> updated. This may require ongoing work to maintain the expressions and
> scripting for each package. Fortunately, most packages don't change
> formatting too much between versions, though some (e.g. Apache) change
> their config file layout, moving config data from one file to another
> and such.
>

I think we should distinguish between XML type files and simpler INI style
formats. More complex configs, like apache, need to handle settings across
files, and parsing the values when an SGML/XML/HTML type format is used
requires specialised tools; I recommend xmlstarlet[1] for handling that from
a shellscript.

> This amounts to a re-configuration script for every new version,
> though some updates may work fine with a previous version of the
> script. Due to the nature of seed-linux as a rapid-deployment,
> enterprise-oriented solution, we may want to distribute seed updates
> somewhat infrequently, choosing stable, thoroughly tested software,
> and waiting until major updates provide significant security and/or
> feature advantages. This would provide a way to reduce the burden of
> maintaining the configuration scripts and such.
>
> > IIRC, a couple of the seed packages already install
> > replacement config files. It would be no trouble at all to extend this.
>
> Glad to hear - maybe we don't need separate packages for seed
> configuration after all?
>

I'd hope not; if we can enhance the existing Gentoo tools where required (like
the various etc-updaters) then we can avoid extra maintenance work.

I'd be happy to help out with bash (and C if needed); we've been working on an
emerge wrapper[2] for quite a while, which also works with pkgcore (we'll add
paludis when a user tells us how it should be handled) and the latest version
(pre-release on last page) allows USE editing and list selection from the
cmd-line (in a nice UI.)

wrt tools can I ask that ed be included in the system set? It's a real
omission in gentoo imo, as it makes editing config files and the like a lot
simpler, as well as more secure, and is part of the base POSIX set.[3]

Also, are you guys familiar with OpenRC[4] which will be the Gentoo
baselayout2? It might be wise to start testing it now.

[1] http://xmlstar.sourceforge.net/
[2] http://forums.gentoo.org/viewtopic-t-546828.html
[3] http://www.opengroup.org/onlinepubs/009695399/utilities/contents.html
[4] http://roy.marples.name/openrc
https://bugs.gentoo.org/show_bug.cgi?id=212696

samba

Mar 20, 2008, 11:43:54 AM
to seed-linux-dev
All,

I woke up this morning and realized that I had neglected to highlight
a key point in my 2nd post.

I'll keep this one brief: variables in the central configuration
repository aren't necessarily used for replacement. Some simply define
a variable for the context of a different file, which is used for
*other* decisions.

That variable (such as 'enableClustering' in my example) can be used
for other decisions, such as (if 'enableClustering' were false)
commenting out every line relevant to clustering in MySQL, Apache, and
numerous other packages. Alternatively (if 'enableClustering' were
true) it would uncomment all such lines and (as scripted) would
perform the necessary operations to adjust values for the clustering
configuration - this may require that other values (e.g.
'clusterMasterIP') be defined.
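
A minimal sketch of that kind of decision: a flag such as 'enableClustering' drives whether matching lines are commented out or restored. The helper name set_lines_enabled is invented here, '#' is assumed to be the comment character, and the pattern is a basic regular expression that must not contain '/'.

```shell
#!/bin/sh
# Sketch only: set_lines_enabled is a hypothetical helper; it assumes
# '#' starts a comment in the target file.

# set_lines_enabled true|false PATTERN
# Filters stdin to stdout. With "false", lines starting with PATTERN are
# commented out; with "true", such commented lines are restored.
set_lines_enabled() {
    case $1 in
        true)
            sed -e "/^[[:space:]]*#[[:space:]]*\($2\)/s/^\([[:space:]]*\)#[[:space:]]*/\1/" ;;
        false)
            sed -e "/^[[:space:]]*\($2\)/s/^/#/" ;;
        *)
            echo "usage: set_lines_enabled true|false PATTERN" >&2
            return 2 ;;
    esac
}
```

Called as `set_lines_enabled "$enableClustering" ndb < my.cnf`, this would toggle every line whose option name starts with "ndb"; which patterns count as "relevant to clustering" for each package would still have to be defined per seed.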

Ranjit, thanks for your input. I had not gotten specific about
formats. What would you suggest for handling hybrid formats, such as
Apache's, where it's pseudo-XML with INI-like contents for most nodes?

> if we can enhance the existing Gentoo tools where required (like
> the various etc-updaters) then we can avoid extra maintenance work.

Indeed. I should think we could wrap etc-update, identify and merge
old customized configurations (including those common with the
'central...repository') over the new defaults, and then leave the rest
to the user. This would allow packages to be updated regularly and
still require minimal user involvement. (Negating my comments
previously about releasing infrequently.) My only concern is those
rare cases where a package makes significant changes to its config
layout...

Stuart, this post should be considered an amendment to my previous
posts on this topic - please re-consider it ;)

Best regards,

Sam

Ranjit Singh

Mar 27, 2008, 2:19:48 AM
to seed-li...@googlegroups.com
Hi Sam,
Sorry for tardy response; kids are hard work! ;p

On Thursday 20 March 2008 15:43:54 samba wrote:
> I'll keep this one brief: variables in the central configuration
> repository aren't necessarily used for replacement. Some simply define
> a variable for the context of a different file, which is used for
> *other* decisions.
>
> That variable (such as 'enableClustering' in my example) can be used
> for other decisions, such as (if 'enableClustering' were false)
> commenting out every line relevant to clustering in MySQL, Apache, and
> numerous other packages. Alternatively (if 'enableClustering' were
> true) it would uncomment all such lines and (as scripted) would
> perform the necessary operations to adjust values for the clustering
> configuration - this may require that other values (e.g.
> 'clusterMasterIP') be defined.
>

Makes sense; can it not be restricted to one file, say /etc/seed.conf if it's
for such system-wide settings? I can see a case for e.g. /etc/seed.conf/net but
meta-settings would filter down to /etc/conf.d/foo and I can't see the need
for more than one file given that we have a Gentoo config to play with.

> Ranjit, thanks for your input. I had not gotten specific about
> formats. What would you suggest for handling hybrid formats, such as
> Apache's, where it's pseudo-XML with INI-like contents for most nodes?
>

I'd suggest we handle apache config as its own format, since other projects do
use it, and it's so critical to the stack. It's been years since I had to
wrestle with apache config (can you tell? ;) and a basic search doesn't
reveal much on DTDs apart from strut stuff. So if we have to (surely not?) we
hack a basic DTD together, enough for us to query the elements which we can
parse separately. Yeah, this is going to take some hacking but only at script
level.

> > if we can enhance the existing Gentoo tools where required (like
> > the various etc-updaters) then we can avoid extra maintenance work.
>
> Indeed. I should think we could wrap etc-update, identify and merge
> old customized configurations (including those common with the
> 'central...repository') over the new defaults, and then leave the rest
> to the user. This would allow packages to be updated regularly and
> still require minimal user involvement. (Negating my comments
> previously about releasing infrequently.) My only concern is those
> rare cases where a package makes significant changes to its config
> layout...
>

Well we built a warning mechanism into update last year, to handle ABI
upgrades (expat) which we could extend. There are also post actions for a few
packages, and we were planning a watch list for generic user-defined stuff
(the config option has been there for ages, we just haven't done anything
with it, beyond highlighting the packages in resume.) We could make one of
those call an interactive thing. Personally I think /etc/warning is the best
mechanism since we can push changes to it out via portage. ATM it deals with
building other stuff before, after, before revdep and so on, if that info is
available. Adding a function is easy; working out what it should do with the
new config stuff is a bit trickier.

I guess simply masking it til the admin wants to deal with it (which is what
update defaults to in automated mode, though it's a temporary mask so doesn't
affect normal emerge operations) would be a start?

Regards,
Ranjit.

[update] http://forums.gentoo.org/viewtopic-t-546828.html

Eric Thibodeau

Jun 26, 2008, 6:36:16 PM
to seed-li...@googlegroups.com
Hello all,
    A quick introduction, I have been using Gentoo for years, am now participating in a Google Summer of Code for Gentoo (see some details below) and like the Seed idea for many reasons.

    Sam and I have had a few conversations concerning a "central configuration" approach to ease the machine's configuration. A simple example for justification:

In a small-business context, say you want to change the hostname. You can't just change it in /etc/conf.d/hostname. Why? Here are a few hints:

- DNS identification via DHCP response...
- LDAP depending on correct name resolution
- KERBEROS (idem)
- some daemons like apache might be better configured knowing the current hostname
- mysql

Yes, localhost is handy but this is only one example of many where changing a config element needs to be propagated correctly.
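
As a small illustration of why such a change has to be propagated, the sketch below lists every file under a set of directories that still mentions the old hostname. The helper name list_hostname_refs is made up, and a plain fixed-string search is only a first approximation: it will also report files where the name merely appears as part of a longer word.

```shell
#!/bin/sh
# Sketch only: list_hostname_refs is a hypothetical helper.

# list_hostname_refs OLDNAME DIR...
# Prints, sorted, the files below DIR... that contain OLDNAME
# (matched as a fixed string, not as a regular expression).
list_hostname_refs() {
    lhr_name=$1
    shift
    grep -rlF -- "$lhr_name" "$@" 2>/dev/null | sort
}
```

Running `list_hostname_refs oldbox /etc` before renaming a machine gives a quick list of the configuration files a central tool would have to touch.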

But read on and do refer to the linked historical references ;)

Eric Thibodeau

Sam Briesemeister wrote:
Sounds good to me. If you'd like to re-start that thread, this looks like a good foundation.

Thanks ;)

On Wed, Jun 18, 2008 at 7:17 PM, Eric Thibodeau <ky...@neuralbs.com> wrote:
[I missed your original comments, see below for replies...if any ;)]

For re-launching the thread, how's this:

- We need to centralize setup-only config directives,
- I vote for BASH variables in a file and use bash to modify virgin config files
- If you want a philosophical debate about configuration files, read: http://modeemi.fi/~tuomov/b//archives/2007/01/20/T11_58_29/
- The original thread (a must! :P ) : http://groups.google.com/group/seed-linux-dev/browse_thread/thread/73cb8a4fef940903
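
A rough sketch of the "shell variables in a file" approach from the list above: the setup-only directives live in one sourced file, and a pristine template is filled in from them. The variable names SEED_BIND_IP and SEED_HOSTNAME, the %placeholder% convention and the function apply_settings are illustrative assumptions, and values are assumed to contain no characters special to sed.

```shell
#!/bin/sh
# Sketch only: apply_settings and the SEED_* variable names are
# hypothetical; values are assumed to be free of '/', '&' and '\'.

# apply_settings SETTINGS_FILE TEMPLATE
# Sources SETTINGS_FILE (give it with a path, e.g. ./seed.conf) and
# prints TEMPLATE with %bindIP% and %hostname% filled in.
apply_settings() {
    (
        # Subshell: the sourced variables do not leak into the caller.
        . "$1" || exit 1
        sed -e "s/%bindIP%/${SEED_BIND_IP:-127.0.0.1}/g" \
            -e "s/%hostname%/${SEED_HOSTNAME:-localhost}/g" "$2"
    )
}
```

Because the settings file is ordinary shell, an ebuild or setup script could source the same file directly instead of parsing a separate format.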

Eric


Sam Briesemeister wrote:
I take that back - looks like I can't reply to the main thread for config tools either. I guess the thread is expired. We'll have to start a new one. Whoever does it should compile some of the feedback from the first to summarize the current status of the conversation.

On Wed, Jun 18, 2008 at 6:36 PM, Sam Briesemeister <sam.brie...@gmail.com> wrote:
Eric,

I've actually seen that behavior as well. I think for some reason you (and me too) are not allowed to respond to threads which were last active before we joined the group. Funky policy if you ask me, but that's my best guess. I'll push this onto the thread and hopefully you'll be able to participate directly then.

Would your contribution be limited to the summer only?


On Wed, Jun 18, 2008 at 6:28 PM, Eric Thibodeau <ky...@neuralbs.com> wrote:
Be damned that sucky Google interface!!! No, I obviously didn't. Could you please reply whilst including the mailing list (I didn't have the "reply to thread" option for some reason and still don't :( )

My current work is extremely compressed as far as time line is concerned, this is a Google Summer of Code project, which implies an end at the end of summer. So the work might not be well integrated and thought out but it should work and provide some bases to evolve upon ;)

Eric


Sam Briesemeister wrote:
Eric,

Thanks for your input. I have some further ideas I'd like to contribute as well. As you've noticed, that thread has been inactive for a while; during that time, my ideas have evolved a bit.

I'm in the process of moving to another city, so I'll be busy for a couple more weeks getting things settled there. When I'm done, I'll begin more active contribution to the project.

Also, did you know you were sending this only to me? I don't think the rest of the thread got it.

So, to your input...

Yes, and as an example of this implementation, see net-nds/openldap.
The default config file is "inspired" by one in the files dir and
modified using sed. Which adds much validity to your claim. The entire
process is performed in src_install() and brings me to wonder if it
wouldn't be a nice addition for portage to add the ability to
supersede such a function for such purposes as providing turnkey
solutions (ie: use this ebuild except that we want portage to use
_our_ src_install() function)

Without knowing about this thread, here is what I have made while
working on a Gentoo Clustering LiveCD (which I hope to change into an
actual SEED):

http://wiki.neuralbs.com/~kyron/soc2008/

The two files of interest are cluster_ldap_skel.conf and
ldap-setup.sh; all other files are auto-generated by calling
./ldap-setup.sh cluster_ldap_skel.conf

The final intention is that the functions in the .sh file be
integrated into multiple .ebuilds/seed-ebuilds and that the .conf file
be a reference for these seeds.

As much as possible information is deduced from the system (ie: usage
of $(hostname -d) for generating the ldap DC info) and all.
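
For illustration, deriving the LDAP DC components from the DNS domain can be as small as the sketch below. The function name domain_to_base_dn is made up here and this is not the code from ldap-setup.sh; it only shows the general idea of turning the output of `hostname -d` into a base DN.

```shell
#!/bin/sh
# Sketch only: domain_to_base_dn is a hypothetical helper, not taken
# from ldap-setup.sh.

# domain_to_base_dn DOMAIN
# Converts a DNS domain such as example.com into dc=example,dc=com.
domain_to_base_dn() {
    [ -n "$1" ] || return 1      # no domain configured, nothing to derive
    printf '%s\n' "$1" |
        sed -e 's/\.$//' -e 's/\./,dc=/g' -e 's/^/dc=/'
}

# Typical call, assuming the host has a DNS domain configured:
#   base_dn=$(domain_to_base_dn "$(hostname -d)")
```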

Well, with my comments above, my bias towards using BASH is obvious.


Funny you mention clustering. We believe that to be a niche that Seed Linux could really effectively target, automating things and such as we intend to. Your insight on the subject would be quite interesting.
 
Note that this is also emphasized by the ebuilds being BASH scripts
(better integration into portage if we stick to BASH and
BASH-variable-config formats IMHO)

I'm in agreement with you regarding the use of BASH to integrate with the rest of Portage and the ebuild system.

IMHO, 'enableClustering' should be implemented as a USE flag.

That's fine by me. Properly implemented, it would achieve the goal I had in mind anyway.

Also, I am strongly against duplicating the location of information,
especially as critical as IP addresses. This information should be
derived from the "official" location (ie: /etc/conf.d/net if defined
in there, otherwise, derived directly from the system's utilities)

The idea was to provide configuration pools that could be used to configure multiple nodes as a collective, comparable to what MS Active Directory can do with Group Policies. If we could develop a configuration server of sorts, which hands out compiled configuration data to a client on-boot, that might achieve the same goal and allow total centralization. The purpose of having it replicated was primarily to avoid losing the data in the event of one node failing.
Ok, this aspect is important and should be addressed with the proper use of LDAP. (I strongly suggest OpenLDAP 2.4.10, since it seems to have many very cool features enhancing performance, and it integrates nssov into its modules to effectively replace the clumsy nss_ldap, under Gentoo anyway.) Furthermore, Stuart mentioned registering OIDs specifically with LDAP, and I believe this to be a step towards being able to use LDAP as a configuration backend.

Some words of caution:

- A DB is corruptible, do we want LDAP as the source to generating the config files or LDAP as the live repository of configs?
- LDAP is optimized for reads, not writes, I know of banks that made that fatal mistake and it brought down their personal transaction web site (quite a stupid mistake IMHO)
- LDAP is way less intuitive than a human-readable text file and would suffer from the same complexity as the AD
** Nice little story about that one actually: one of my friends just deployed MS Server 2008 with Vista stations yesterday and they actually ended up with useless Vista machines, since they wouldn't even talk to the local AD due to a too-restrictive firewall setting pushed out through the AD....M$ has gone Hara-kiri, Japanese style ;)

Centralizing a cluster's node config is trivial since all nodes are typically replicas of a central NFS-mounted root with some slight modifications of /etc files at boot up (etc being unionfs mounted so only modified ROOT files are stored into local RAM)...this approach wouldn't be applicable in the case of a regular workstation setup unless you're thinking of building diskless stations "à la LTSP", which I have also done (we need an nfs-booted X terminal Seed :P )



Again, I'll pitch more of my thoughts to the group once I get more time.

Thanks for the input!

--
__________________
Samuel Briesemeister
Information Systems Consultant and Business Analyst

<sam.brie...@gmail.com>


355/113
Eric (again :P )




Stuart Herbert

Jun 27, 2008, 4:07:49 AM
to seed-li...@googlegroups.com
I'm not entirely clear ... what are you proposing we do about configuration?

My current thinking is that we have two inter-related configuration areas to solve:

a) Configuration required for initial installation of a Seed, which breaks down into
    1) hardware prep (partitioning, network, et al)
    2) seed-specific configuration
b) Central management and deployment of configurations

The traditional ways of solving a1) are via an installer (e.g. Anaconda) or by cookbook instructions (e.g. the Gentoo installation handbook).  It currently looks like we'll have to start with the cookbook approach until we have an installer available to integrate into releases.  The nice thing about a1) is that this step should be generic, and have little-to-no seed-specific component.  However, good installers are hard to do, because they play a big part in the first impressions that your user forms.  It's risky to try to innovate here, and I believe we should adapt either Anaconda or Ubuntu's installer to do this work.

a2) is the first place where Seed Linux can significantly improve over RedHat et al.  We have a distinct advantage over them - we *know* what the specific installation of Seed Linux is there to do.  It seems to make sense to add a seed-specific configuration capability into each Seed we invent, to make it as easy as possible for users to get things up and running with minimal fuss and effort.  I currently believe that the majority of this configuration should be stored in an LDAP system, and that we should build as many packages as possible to get their information directly from LDAP.  Where that's not possible, we have to decide whether or not to auto-generate config files from data stored in LDAP.

b) is the other place where Seed Linux can significantly improve over RedHat et al.  Seed Linux isn't a generic solution - we're deliberately building Seeds that are meant to do one job well, and that are suitable for clustering / farming.  It's likely that folks who choose to use Seed Linux don't need one well-specified web server; they need lots.  LDAP again seems the ideal conduit for this: it supports master/slave replication, so we can run LDAP locally on each server to improve robustness; provided we design things right, we can easily migrate local LDAP data into the master repository when a machine is added to the cluster.  But we have to go beyond LDAP too.  One example I can think of is monitoring.  A standalone seed will have all aspects of monitoring running locally, but when the seed joins a larger setup, you would want the monitoring data to be shipped off to a central location.

These aren't easy problems to solve, although I think using LDAP will give us the edge over the competition.  Fedora, for example, has just started using Func [1],[2].  Puppet is gaining in popularity, and there are still the older solutions such as cfengine out there (and let's not forget NIS/YP).  These are examples where everyone keeps re-inventing the same wheel over and over again.  

We need to do something different, and I think the cornerstone of that is building our seeds to take their configuration in real-time from LDAP.

[2] https://fedorahosted.org/func

Best regards,
Stu