Sources of truth on the network

Matthew Macdonald-Wallace

Sep 29, 2011, 2:33:54 AM
to devops-toolchain,
Hi all,

Through many conversations with the community and at work, it is clear
that the DevOps movement is doing an excellent job of improving the
status quo in systems administration (even if I personally feel it's
a bit too much "dev" and not enough "ops") through tools such as Chef,
Puppet and all the other code that has been written.

Where I feel we are missing a huge area is that of canonical sources of truth.

At present on my network I have the following systems which store
information about my servers/AMI instances/Printers/Desktops etc:

1) Nagios
2) Our "homebrew" inventory system
3) Puppet/Cobbler
4) Legacy Build System
5) Legacy "command and control" shell scripts
6) Change Management System
7) Support Ticket System

This means that in order to add a new host to the network and be
confident that it will get picked up by all of our systems and
scripts, I need to update seven places, which seems crazy to me!

I am planning to write a very basic API-based system which can act
as a source of truth for many of the above systems; however, before I
do so, I thought I'd run the idea past the community to see what you
all feel would be useful in a system such as this.

At present, I only plan on including the following information:

Asset ID (internal asset tag where appropriate)
Support Service ID (Dell Service Tag/AWS Instance ID etc.)
IP address(es)
Datacentre (Telehouse/Amazon AWS/Rackspace/etc.)
Location (Room/Rack/AWS Zone/etc.)
Owner (the primary user responsible for this system - probably pulled
from LDAP or similar)
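As a rough illustration, the six fields above could be modelled as one record and serialised to JSON for the API. This is a minimal Python sketch; the class name `Asset`, the field names and the sample values are hypothetical, not part of any existing tool.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Asset:
    """Hypothetical source-of-truth record holding the six fields listed above."""
    support_service_id: str              # Dell Service Tag, AWS instance id, etc.
    asset_id: Optional[str] = None       # internal asset tag, where one exists
    ip_addresses: List[str] = field(default_factory=list)
    datacentre: str = ""                 # Telehouse, Amazon AWS, Rackspace, ...
    location: str = ""                   # room/rack, or an AWS availability zone
    owner: str = ""                      # primary responsible user, e.g. an LDAP uid


# Illustrative values only.
example = Asset(
    support_service_id="i-1a2b3c4d",
    ip_addresses=["10.0.0.5"],
    datacentre="Amazon AWS",
    location="eu-west-1a",
    owner="jsmith",
)

# JSON is one plausible wire format for the API that other systems would consume.
print(json.dumps(asdict(example), indent=2))
```

Making `asset_id` optional reflects the "where appropriate" caveat: cloud instances usually have no internal asset tag.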

The idea being that this system would expose information over the API
which could then be plugged in to monitoring, Puppet/Cobbler, Change
Management Systems and all the other wonderful systems that are

What do people think about this? Good idea? Bad Idea? :P

All comments welcome,


Greg Retkowski

Sep 30, 2011, 2:14:33 PM
to devops-toolchain

Greg Retkowski

Sep 30, 2011, 2:20:40 PM
I am one of the main contributors to the Source-of-Truth database at my
current client. Our environment is tens of thousands of nodes. So I've
given some thought to what I'd do if I were to do a green-field
implementation. I'd want to create something with the attributes below:

It'd be a key-value store for each host, with each host identified by a
unique key (perhaps the MAC address of the on-board ethernet port).

There'd be some way for users to perform CRUD operations on a host and its
key/value pairs; probably a web frontend.

There'd be a way for automated processes to put data into the SOTDB; for
example, Facter could populate the host's entry with key/value pairs of
the facts from that host.
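A minimal in-memory sketch of the store described above, assuming the on-board NIC's MAC address is the host key. `SOTDB` and its method names are hypothetical; a real implementation would sit on a persistent database behind the web frontend, with `ingest_facts` as the entry point for automated feeds such as Facter.

```python
from typing import Dict, Optional


class SOTDB:
    """Hypothetical source-of-truth store: one key/value dict per host."""

    def __init__(self) -> None:
        # Host key (e.g. the on-board NIC's MAC address) -> attribute dict
        self._hosts: Dict[str, Dict[str, str]] = {}

    @staticmethod
    def _normalise(host_key: str) -> str:
        # Facter reports MACs in upper case (00:0C:29:...); compare case-insensitively.
        return host_key.strip().lower()

    # --- CRUD, as a web frontend would use it ---
    def create(self, host_key: str) -> None:
        key = self._normalise(host_key)
        if key in self._hosts:
            raise KeyError(f"host {key} already exists")
        self._hosts[key] = {}

    def read(self, host_key: str) -> Optional[Dict[str, str]]:
        record = self._hosts.get(self._normalise(host_key))
        return dict(record) if record is not None else None  # return a copy

    def update(self, host_key: str, **pairs: str) -> None:
        self._hosts[self._normalise(host_key)].update(pairs)

    def delete(self, host_key: str) -> None:
        del self._hosts[self._normalise(host_key)]

    # --- automated population ---
    def ingest_facts(self, host_key: str, facts: Dict[str, str]) -> None:
        """Upsert the host and merge in facts gathered on the machine itself."""
        self._hosts.setdefault(self._normalise(host_key), {}).update(facts)
```

For example, `db.ingest_facts("00:0C:29:5F:EC:D9", {"hostname": "pe-centos5"})` followed by `db.update("00:0c:29:5f:ec:d9", datacenter="pao1")` leaves one record carrying both the discovered and the hand-entered value.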

There'd be multiple ways to get the data out of the SOTDB. It'd have a
template engine to produce config files (Nagios configs, etc.), and a
command-line tool to produce a list of hosts, or host values, matching a
query string, suitable for piping into shell scripts, e.g.:

$ sotls datacenter=pao1 service=webserver
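The `sotls` behaviour could be approximated as follows: each `key=value` argument is a required match, all terms are ANDed together, and matching host names are printed one per line so they pipe cleanly into shell loops. The inventory contents and helper names are hypothetical stand-ins for the SOTDB.

```python
import sys
from typing import Dict, List

# Hypothetical sample inventory; a real tool would read this from the SOTDB.
HOSTS: Dict[str, Dict[str, str]] = {
    "web01": {"datacenter": "pao1", "service": "webserver"},
    "web02": {"datacenter": "iad2", "service": "webserver"},
    "db01": {"datacenter": "pao1", "service": "database"},
}


def parse_terms(args: List[str]) -> Dict[str, str]:
    """Turn ['datacenter=pao1', 'service=webserver'] into a dict of required pairs."""
    terms = {}
    for arg in args:
        key, sep, value = arg.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {arg!r}")
        terms[key] = value
    return terms


def sotls(hosts: Dict[str, Dict[str, str]], terms: Dict[str, str]) -> List[str]:
    """Return the sorted names of hosts whose attributes match every term."""
    return sorted(
        name
        for name, attrs in hosts.items()
        if all(attrs.get(k) == v for k, v in terms.items())
    )


if __name__ == "__main__":
    # One host per line, e.g.: for h in $(sotls service=webserver); do ...; done
    for name in sotls(HOSTS, parse_terms(sys.argv[1:])):
        print(name)
```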

It'd have a web API, REST or whatever. The automation tool (Puppet, Chef,
Facter, etc.) would be able to access data about the host within its own

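A web API of that kind could start very small. The sketch below is a plain WSGI application using only the Python standard library; the routes `/hosts?key=value` and `/hosts/<name>` and the sample data are assumptions for illustration, not an existing interface.

```python
import json
from typing import Callable, Dict, Iterable
from urllib.parse import parse_qsl

# Hypothetical sample data; a real service would query the SOTDB instead.
HOSTS: Dict[str, Dict[str, str]] = {
    "web01": {"datacenter": "pao1", "service": "webserver"},
    "web02": {"datacenter": "iad2", "service": "webserver"},
    "db01": {"datacenter": "pao1", "service": "database"},
}


def app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """GET /hosts?key=value lists matching hosts; GET /hosts/<name> returns one record.

    Read-only sketch: the HTTP method is not checked.
    """
    path = environ.get("PATH_INFO", "")
    name = path[len("/hosts/"):] if path.startswith("/hosts/") else None
    if path == "/hosts":
        terms = dict(parse_qsl(environ.get("QUERY_STRING", "")))
        status, body = "200 OK", [
            host for host, attrs in sorted(HOSTS.items())
            if all(attrs.get(k) == v for k, v in terms.items())
        ]
    elif name in HOSTS:
        status, body = "200 OK", HOSTS[name]
    else:
        status, body = "404 Not Found", {"error": "not found"}
    payload = json.dumps(body).encode("utf-8")
    start_response(status, [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(payload))),
    ])
    return [payload]
```

To try it locally, `wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever()` serves it, after which `curl 'http://127.0.0.1:8000/hosts?datacenter=pao1'` should return `["db01", "web01"]`.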

There'd be some trigger mechanism so that updating a record would kick off
certain actions (regenerate DNS, notify a kickstart server, etc.).
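One way to get that trigger behaviour is a simple observer: actions register with the store and run whenever a record actually changes. In this hypothetical sketch the action renders a Nagios host definition with Python's `string.Template`, standing in for the template engine mentioned above. It assumes your Nagios configuration defines a `generic-host` template, and `TriggeringStore` and `regenerate_nagios` are made-up names.

```python
from string import Template
from typing import Callable, Dict, List

# An action receives the host name and a copy of its updated attributes.
Listener = Callable[[str, Dict[str, str]], None]

# Nagios host stanza; assumes a "generic-host" template exists in your Nagios config.
NAGIOS_HOST = Template(
    "define host {\n"
    "    use        generic-host\n"
    "    host_name  $host_name\n"
    "    address    $address\n"
    "}\n"
)


class TriggeringStore:
    """Hypothetical store that runs registered actions whenever a record changes."""

    def __init__(self) -> None:
        self._records: Dict[str, Dict[str, str]] = {}
        self._listeners: List[Listener] = []

    def on_change(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def put(self, host_name: str, **attrs: str) -> None:
        record = self._records.setdefault(host_name, {})
        before = dict(record)
        record.update(attrs)
        if record != before:  # only fire when something actually changed
            for listener in self._listeners:
                listener(host_name, dict(record))


# Stand-in for writing files: rendered config keyed by host name.
rendered: Dict[str, str] = {}


def regenerate_nagios(host_name: str, attrs: Dict[str, str]) -> None:
    # A real action might write a per-host .cfg file and reload Nagios;
    # Nagios also accepts a resolvable hostname as the address.
    rendered[host_name] = NAGIOS_HOST.substitute(
        host_name=host_name, address=attrs.get("ip", host_name)
    )
```

Registering a second action (say, one that rewrites a DNS zone fragment) is just another `on_change` call; the store does not need to know what the actions do.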

Open issues:

How do you describe dependencies/topology in as generic a manner as
possible, so that it can be used broadly in different contexts?

Which is authoritative? What's discovered on your network or what's in
your SOTDB? If the SOT says a system should be one thing, but the machine
claims it is something else - do you change the machine or the SOT?
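Whichever side is treated as authoritative, the mechanics start the same way: diff the two views. This hypothetical sketch reports where the SOT and the discovered facts disagree and, depending on a policy flag, returns either the values to push to the machine or the values to write back into the SOT. The function names and the flag are illustrative assumptions.

```python
from typing import Dict, Tuple

# key -> (value the SOT asserts, value actually discovered on the machine)
Drift = Dict[str, Tuple[str, str]]


def find_drift(sot: Dict[str, str], discovered: Dict[str, str]) -> Drift:
    """Compare only the keys the SOT asserts; a missing discovered key reads as ''."""
    return {
        key: (wanted, discovered.get(key, ""))
        for key, wanted in sot.items()
        if discovered.get(key) != wanted
    }


def reconcile(
    sot: Dict[str, str], discovered: Dict[str, str], sot_is_authoritative: bool
) -> Tuple[str, Dict[str, str]]:
    """Return which side to change and the values to write to it."""
    drift = find_drift(sot, discovered)
    if sot_is_authoritative:
        # Change the machine (e.g. hand these values to Puppet) to match the SOT.
        return "machine", {key: wanted for key, (wanted, _) in drift.items()}
    # Otherwise the network wins: adopt what was discovered into the SOT.
    return "sot", {key: found for key, (_, found) in drift.items()}
```

Keys that only appear on the machine (such as `kernel`) are ignored here, which is one choice among several; a stricter policy might flag them too.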

I might look at what's already out there and see whether any of it could be
extended. I looked at iClassify at one point; it is a Rails-based key-value
store with a web API, and seemed like it would be a good starting point for
building out something more powerful.

Best Regards,

-- Greg

Aaron Nichols

Oct 1, 2011, 10:20:08 AM
On Thu, Sep 29, 2011 at 12:33 AM, Matthew Macdonald-Wallace wrote:
> I am planning on starting to write a very basic API-based system which
> can act as a source of truth for many of the above systems, however
> before I do so, I thought I'd run the idea past the community to see
> what you all feel would be useful in a system such as this.

I have always believed that Puppet/Chef are good sources of truth, since they ultimately define what ends up on systems. Puppet facts already contain some of the information you listed by virtue of querying the system, and custom facts may be added in a variety of ways. Since Puppet/Chef are apt to drive the configuration of other components, they seem like the right place to source this information. The configuration also lends itself well to being backed up, replicated, distributed and otherwise made available in a resilient way, protecting your truth from being lost. It's not a database per se, but I'm not sure that's a requirement.

I know the concept of a central API to query these facts may have some gaps - but that seems like an easy problem to solve if it isn't already solved.


Chris Mulder

Sep 29, 2011, 9:52:05 AM
Hey Matthew,

I have given the same requirement some thought previously as well, and
have often wondered whether this could be solved through the use of
LDAP. Granted, one would probably have to spend some time extending or
developing additional schemas to satisfy all the requirements. I'd
also be interested to hear what other folks think. My reason for
thinking that LDAP would make a good "single truth" is that most
Unix/Linux flavours have a basic set of tools for querying LDAP, and
most development and scripting languages have fairly well-established
APIs for it.
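On the scripting side, a query against such a directory is just an RFC 4515 search filter. This sketch builds one from key/value pairs and escapes the characters RFC 4515 reserves. The example uses the standard `device` object class and its `l` (locality) and `serialNumber` attributes; any custom schema attributes you add would be your own choice, and the function names are hypothetical.

```python
from typing import Dict


def escape_filter_value(value: str) -> str:
    """Escape an assertion value per RFC 4515: backslash, '*', '(', ')' and NUL."""
    out = []
    for ch in value:
        if ch in "\\*()\x00":
            out.append("\\%02x" % ord(ch))  # e.g. '*' becomes \2a
        else:
            out.append(ch)
    return "".join(out)


def host_filter(object_class: str, attrs: Dict[str, str]) -> str:
    """Build an AND filter such as (&(objectClass=device)(l=pao1))."""
    parts = [f"(objectClass={escape_filter_value(object_class)})"]
    # Sort by attribute name so the same query always yields the same string.
    parts += [f"({name}={escape_filter_value(v)})" for name, v in sorted(attrs.items())]
    return "(&" + "".join(parts) + ")"
```

The resulting string can be passed as the filter argument to `ldapsearch` or to an LDAP client library.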

Any thoughts?


On 29 September 2011 08:33, Matthew Macdonald-Wallace wrote:

Chris Mulder
+27 82 040 6434

Jon Topper

Sep 30, 2011, 5:33:45 AM
to devops-toolchain
It sounds from your mail that what you're attempting to build is a CMDB.  Many have tried and failed at this approach (including a number of "enterprise" vendors), and there seems to be a general feeling that whilst desirable in idealistic terms, a central source of truth is a bit of a pipe dream.  ITSkeptic has some good posts on the subject - is a good example.

Personally, I prefer the idea of configuration discoverability, since the only real single source of truth is what the infrastructure *actually* looks like.  The great work RI Pienaar is doing with MCollective is a good stride in that direction, and I'd invest time in that tool, rather than trying to build yet another glorified spreadsheet.


Miles Fidelman

Oct 1, 2011, 11:32:04 AM
Chris Mulder wrote:
> Hey Matthew,
> I have given the same requirement some thought previously as well, and
> have often wondered if this could not be solved through the use of
> LDAP? Granted that one would probably have to spend some time to
> extend/develop additional schemas to satisfy all the requirements. I'd
> also be interested to hear what other folks think. My reason for
> thinking that LDAP would make a good "single truth" is that most
> unix/linux flavours have a basic set of tools for querying LDAP, and
> that most development and scripting languages have fairly well
> established API's
> Any thoughts?
Well... there's always SNMP.

In theory, there is no difference between theory and practice.
In<fnord> practice, there is. .... Yogi Berra

Miles Fidelman

Oct 1, 2011, 11:37:50 AM
Jon Topper wrote:
> It sounds from your mail that what you're attempting to build is a
> CMDB. Many have tried and failed at this approach (including a number
> of "enterprise" vendors), and there seems to be a general feeling that
> whilst desirable in idealistic terms, a central source of truth is a
> bit of a pipe dream. ITSkeptic has some good posts on the subject -
> is a good example.
> Personally, I prefer the idea of configuration discoverability, since
> the only real single source of truth is what the infrastructure
> *actually* looks like. The great work RI Pienaar is doing with
> MCollective is a good stride in that direction, and I'd invest time in
> that tool, rather than trying to build yet another glorified spreadsheet.

It occurs to me that there's a good model that comes out of the
distributed simulation world: rather than trying to maintain a centralized
"world model," each simulator maintains a local copy, and the copies are
kept synchronized by a publish-subscribe protocol whereby each node
publishes updates and other nodes subscribe to the information streams
they're interested in. DIS and HLA are the primary protocols used; DDS is
similar and has a lot of traction in places like the Navy, for linking
distributed sensors to distributed weapons.
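The shape of that model can be shown with a toy, in-process publish/subscribe bus. It is nowhere near DIS, HLA or DDS, which add networking, discovery and delivery guarantees. Each hypothetical `Replica` (say, monitoring or inventory) keeps its own local model and subscribes only to the topics it cares about; the topic names are illustrative.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Update = Dict[str, str]
Handler = Callable[[str, Update], None]


class Bus:
    """Toy in-process publish/subscribe bus keyed by topic name."""

    def __init__(self) -> None:
        self._subs: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subs[topic].append(handler)

    def publish(self, topic: str, host: str, update: Update) -> None:
        # Deliver synchronously to every subscriber of this topic.
        for handler in self._subs[topic]:
            handler(host, update)


class Replica:
    """Keeps a local copy of the world model, fed only by subscribed topics."""

    def __init__(self, bus: Bus, topics: List[str]) -> None:
        self.model: Dict[str, Update] = {}
        for topic in topics:
            bus.subscribe(topic, self._apply)

    def _apply(self, host: str, update: Update) -> None:
        self.model.setdefault(host, {}).update(update)
```

For example, a monitoring replica subscribed only to `"network"` sees a host's IP address but never its rack location, while an inventory replica subscribed to both topics sees everything.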

James Turnbull

Oct 1, 2011, 11:44:30 AM
Aaron Nichols wrote:
> I know the concept of a central API to query these facts may have some
> gaps - but that seems like an easy problem to solve if it isn't already
> solved.

Puppet has a central API for querying fact data called the inventory
service (introduced in 2.6.7). The documentation is here:

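The inventory service is reached over the master's REST interface. As we understand the 2.6/2.7-era API, a search is a GET on `/<environment>/facts_search/search` with query parameters of the form `facts.<fact>.<comparison>=<value>`; treat that URL shape and the operator list below as assumptions to check against the inventory service documentation for your Puppet version. This sketch only builds the URL; the function name is ours, and 8140 is the master's usual port.

```python
from typing import List, Tuple
from urllib.parse import urlencode

# Comparison operators assumed to be accepted by facts_search (verify for your version).
COMPARISONS = {"eq", "ne", "lt", "le", "gt", "ge"}


def facts_search_url(
    master: str,
    environment: str,
    terms: List[Tuple[str, str, str]],
    port: int = 8140,
) -> str:
    """Build a facts_search URL from (fact, comparison, value) triples."""
    params = []
    for fact, op, value in terms:
        if op not in COMPARISONS:
            raise ValueError(f"unsupported comparison {op!r}")
        params.append((f"facts.{fact}.{op}", value))
    return f"https://{master}:{port}/{environment}/facts_search/search?{urlencode(params)}"
```

Fetching the URL would also need a client certificate that the master's `auth.conf` permits for that path.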

James Turnbull

Matthew Macdonald-Wallace

Oct 5, 2011, 3:02:03 AM
to devops-toolchain,

Thanks for all the responses.

I was going to go down the home-brew route and possibly even start a
project to define a "standard" format for these kinds of things to be
adopted by other open-source solutions; however, it's become clear to
me that this is quite possibly a "horses for courses" issue and there
cannot be a single solution.

I'll check out nventory and the other recommendations made on the
thread and write up a blog post summarising this in the next few days.

Kind regards,


On 29 September 2011 07:33, Matthew Macdonald-Wallace wrote:

Jeff McCune

Oct 5, 2011, 1:23:09 PM
On Sat, Oct 1, 2011 at 8:44 AM, James Turnbull wrote:
> Aaron Nichols wrote:
> > I know the concept of a central API to query these facts may have some
> > gaps - but that seems like an easy problem to solve if it isn't already
> > solved.
>
> Puppet has a central API for querying fact data called the inventory
> service (introduced in 2.6.7).  The documentation is here:

In addition to the inventory service, there's also a more lightweight way to query the list of facts, classes, and node parameters using Puppet.

Using the puppet node command, you can obtain all of this information straight from the Puppet Master. The puppet node command is a Puppet Face that nicely wraps the Puppet REST API in a command-line interface. In the following example, I'm asking the Puppet Master to tell me the node information about the host "pe-ubuntu-lucid". You can use this same command to get information for any node that has a puppet agent checking in to the master.

$ puppet node find pe-ubuntu-lucid --render-as yaml
--- !ruby/object:Puppet::Node
  classes: []
  environment: &id002 production
  expiration: 2011-10-05 10:48:37.609331 -07:00
  name: pe-ubuntu-lucid
  parameters:
    productname: VMware Virtual Platform
    kernelmajversion: "2.6"
    kernelversion: 2.6.18
    clientversion: &id001 2.7.6 (Puppet Enterprise cmdrkeith_rc0-107-gc194a2c)
    rubysitedir: /opt/puppet/lib/ruby/site_ruby/1.8
    clientcert: pe-centos5.localdomain
    fact_stomp_server: puppetmaster
    fact_stomp_port: "61613"
    ps: ps -ef
    fact_is_puppetagent: "true"
    lsbdistcodename: Final
    hardwareisa: i686
    lsbdistrelease: "5.7"
    uniqueid: 10ac86d6
    serialnumber: VMware-56 4d a0 c0 96 7b b9 0d-e2 d9 14 34 a7 5f ec d9
    hostname: pe-centos5
    kernelrelease: 2.6.18-274.3.1.el5
    lsbrelease: ":core-4.0-ia32:core-4.0-noarch:graphics-4.0-ia32:graphics-4.0-noarch:printing-4.0-ia32:printing-4.0-noarch"
    kernel: Linux
    uptime_seconds: "5351"
    facterversion: 1.6.0
    interfaces: "eth0,lo,sit0"
    macaddress_eth0: 00:0C:29:5F:EC:D9
    is_virtual: "true"
    sshdsakey: AAAAB3NzaC1kc3MAAACBAJpC8lj4M2apiR89kuZuVtEao6/U+UIVjI7twjMQ5HhKjXUUOAostWdW0y6kOc3Z++H+S96XofricFRHTb545ulYqFz//pr6DM9Lmw3YNTrVrcEFD5XgRi+qQC6tKk+IimEbXBj0hYxb36NLhg8kv/IJ/nbctfO8/OblX2fYX1KvAAAAFQD9Ma0nP5IPJ+Na9sdd68qgeD9O3wAAAIBFls1gXHE0MlgGCKefLlwhptUFROihxqmrww4mgrI6p9yNcSjjAMrvUuUynQkZRALEFL0VmNkWUOEL//Qf9ITLFKcltl+xPSgNe5J91myfIfKQNE5nYrcCCc/nvpPVlsXEvRuDcZ2Vr+pd92ZZVjrDfAi9CEQyQYhCZR4tih6mdwAAAIAJcTtbVzb8IsXdGUylqSZ/x3yXO2UOZgZaWQn1Tml4Him9JE92cQp7buGoeLu1nevtTDWaz8tzS8u9/EyCPQKuPQzSiLVG+l5x4r/QnpOGQvB1KL5EhkPe7jlsvkxkt9WJiJ01K6W37zDgunK+10qnVR4Tr+PeG1WBnpFsdSbcjQ==
    macaddress: 00:0C:29:5F:EC:D9
    uptime: 1:29 hours
    manufacturer: "VMware, Inc."
    timezone: PDT
    puppetversion: *id001
    path: /usr/local/bin:/opt/csw/bin:/usr/kerberos/sbin:/usr/kerberos/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin
    hardwaremodel: i686
    fqdn: pe-centos5.localdomain
    uptime_days: "0"
    virtual: vmware
    operatingsystemrelease: "5.7"
    processorcount: "2"
    sshrsakey: AAAAB3NzaC1yc2EAAAABIwAAAQEAlfZIsZOxZo12GZYyIJmUHP69pefhZ4MuJ20FeCvtdkYlbE4gt+/5ohQ2TtzQNFWLQCuEmxlWDaCcxoCElF5pFTIv04kZRjR+/6sYD+UR1TbdphJmy/oI5qu9vArH3jfw+i3Uy/0MwHq8ghvGsUNdcRZCdEBfaa1/Gf68v5LzcEKkJe+UzLsx2tA/3Km0KQTlDUl/+NSgwTFmkD1DlrRdfRHC8nFBxTskPzfzVdQOfpliOXd0En8d/uideQi0o8HD2yDLxN7jLvKGwWAGZKYUEzDD6WeagWjYv0BwCXhmLhKaIM4bELxkaDivw/Z8LaobvqOosnhPFj9T7R8hHcCPSw==
    environment: *id002
    memorysize: 502.79 MB
    rubyversion: 1.8.7
    physicalprocessorcount: "2"
    lsbdistid: CentOS
    !ruby/sym _timestamp: Wed Oct 05 10:18:37 -0700 2011
    lsbdistdescription: CentOS release 5.7 (Final)
    fact_is_puppetmaster: "false"
    swapsize: 2.00 GB
    processor0: Intel(R) Core(TM) i7-2820QM CPU @ 2.30GHz
    processor1: Intel(R) Core(TM) i7-2820QM CPU @ 2.30GHz
    architecture: i386
    memoryfree: 377.38 MB
    selinux: "false"
    uptime_hours: "1"
    domain: localdomain
    operatingsystem: CentOS
    type: Other
    id: root
    lsbmajdistrelease: "5"
    swapfree: 2.00 GB
  time: 2011-10-05 10:18:37.344521 -07:00 

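Output like the above already covers part of the field list from the start of the thread. This hedged sketch projects such facts into a source-of-truth record. The field names and the mapping from SOT fields to fact names are our assumptions (on Dell hardware, for instance, the `serialnumber` fact typically reports the Service Tag), and fields that no fact can supply are left as `None` for a person to fill in.

```python
from typing import Dict, Optional

# A few facts copied from the `puppet node find` output above.
FACTS: Dict[str, str] = {
    "fqdn": "pe-centos5.localdomain",
    "macaddress": "00:0C:29:5F:EC:D9",
    "serialnumber": "VMware-56 4d a0 c0 96 7b b9 0d-e2 d9 14 34 a7 5f ec d9",
    "manufacturer": "VMware, Inc.",
    "virtual": "vmware",
}

# Hypothetical mapping from SOT fields to Facter fact names.
DISCOVERABLE = {
    "host_key": "macaddress",              # unique key, as suggested earlier in the thread
    "support_service_id": "serialnumber",  # Service Tag on Dell kit
    "ip_address": "ipaddress",             # a core fact, but absent from the excerpt above
}

# Fields no fact can supply; someone has to enter these by hand.
MANUAL = ("asset_id", "datacentre", "location", "owner")


def sot_record(facts: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Fill discoverable fields from facts and leave manual ones as None."""
    record: Dict[str, Optional[str]] = {
        field: facts.get(fact) for field, fact in DISCOVERABLE.items()
    }
    record.update({field: None for field in MANUAL})
    return record
```

The split between `DISCOVERABLE` and `MANUAL` makes the earlier authority question concrete: facts can keep the first group current, but the second group has to live somewhere people edit.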
Jeff McCune

