Facter 2.0 Architecture document

Luke Kanies

Aug 25, 2010, 2:17:55 PM
to puppe...@googlegroups.com
Hi all,

Rein, Paul, and I had a call today discussing whether we should produce a 1.6 (I said no, unless there are high priority tickets that really need to be worked on), and then what the design goals of 2.0 should be. I took notes on our discussion and attempted to produce a doc capturing it all:

http://projects.puppetlabs.com/projects/facter/wiki/ArchitectureForTwoDotOh

Comments appreciated.

--
Sabbagh's Second Law:
The biggest problem with communication is the illusion that it
has occurred.
---------------------------------------------------------------------
Luke Kanies -|- http://puppetlabs.com -|- +1(615)594-8199


Nigel Kersten

Aug 25, 2010, 4:53:13 PM
to puppe...@googlegroups.com
On Wed, Aug 25, 2010 at 11:17 AM, Luke Kanies <lu...@puppetlabs.com> wrote:
> Hi all,
>
> Rein, Paul, and I had a call today discussing whether we should produce a 1.6 (I said no, unless there are high priority tickets that really need to be worked on), and then what the design goals of 2.0 should be.  I took notes on our discussion and attempted to produce a doc capturing it all:
>
> http://projects.puppetlabs.com/projects/facter/wiki/ArchitectureForTwoDotOh

I have a few thoughts churning around about Facter having native
support for storing fact evaluation history on the client, which ties
into the open feature request for caching fact values, and I noticed
you reference a ttl option in #4565.

Simplifying the DSL for the common case is hugely appealing.


>
> Comments appreciated.
>
> --
> Sabbagh's Second Law:
>    The biggest problem with communication is the illusion that it
>    has occurred.
> ---------------------------------------------------------------------
> Luke Kanies  -|-   http://puppetlabs.com   -|-   +1(615)594-8199
>
>
>
>

> --
> You received this message because you are subscribed to the Google Groups "Puppet Developers" group.
> To post to this group, send email to puppe...@googlegroups.com.
> To unsubscribe from this group, send email to puppet-dev+...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/puppet-dev?hl=en.
>
>

--
nigel

Luke Kanies

Aug 26, 2010, 1:09:16 AM
to puppe...@googlegroups.com
On Aug 25, 2010, at 1:53 PM, Nigel Kersten wrote:

> On Wed, Aug 25, 2010 at 11:17 AM, Luke Kanies <lu...@puppetlabs.com> wrote:
>> Hi all,
>>
>> Rein, Paul, and I had a call today discussing whether we should produce a 1.6 (I said no, unless there are high priority tickets that really need to be worked on), and then what the design goals of 2.0 should be. I took notes on our discussion and attempted to produce a doc capturing it all:
>>
>> http://projects.puppetlabs.com/projects/facter/wiki/ArchitectureForTwoDotOh
>
> I have a few thoughts churning around about Facter having native
> support for storing fact evaluation history on the client, which ties
> into the open feature request for caching fact values, and I noticed
> you reference a ttl option in #4565.
>

It'd be great to hear these.

> Simplifying the DSL for the common case is hugely appealing.

Yeah, although I think external, non-ruby facts will end up being more appealing.

--
A motion to adjourn is always in order. --Robert Heinlein

Nigel Kersten

Aug 26, 2010, 2:27:25 PM
to puppe...@googlegroups.com
On Wed, Aug 25, 2010 at 10:09 PM, Luke Kanies <lu...@puppetlabs.com> wrote:
> On Aug 25, 2010, at 1:53 PM, Nigel Kersten wrote:
>
>> On Wed, Aug 25, 2010 at 11:17 AM, Luke Kanies <lu...@puppetlabs.com> wrote:
>>> Hi all,
>>>
>>> Rein, Paul, and I had a call today discussing whether we should produce a 1.6 (I said no, unless there are high priority tickets that really need to be worked on), and then what the design goals of 2.0 should be.  I took notes on our discussion and attempted to produce a doc capturing it all:
>>>
>>> http://projects.puppetlabs.com/projects/facter/wiki/ArchitectureForTwoDotOh
>>
>> I have a few thoughts churning around about Facter having native
>> support for storing fact evaluation history on the client, which ties
>> into the open feature request for caching fact values, and I noticed
>> you reference a ttl option in #4565.
>>
>
> It'd be great to hear these.

So I feel that it would be immensely useful for Facter to optionally
store a certain amount of historical data about the fact evaluation.

It would be great to be able to simply interrogate info like "when did
the amount of RAM in this machine change?" "what is my kernel version
history?" etc etc.

To get there however, we need a persistent store for facts, which
seems to tie in quite nicely to the idea of having certain facts be
cached, and easily marked as "refresh once per boot" etc.

Facter becomes much more useful as a standalone product with these
capabilities, and ideally we could hook Puppet/Puppet Dashboard into
this to store historical fact data. We could do this at the
Puppet/Dashboard layer, but if we decided to accept the feature
request for caching fact evaluation, then it appears to make more
sense to have Facter support this directly.
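
As a rough illustration of the kind of history query described above, here is a minimal in-memory sketch. FactHistory and the memorysize_mb fact name are hypothetical and are not part of Facter; a real store would also need to persist its log to disk, which is left out here.

```ruby
# Hypothetical helper, not part of Facter: keeps a per-fact list of
# [timestamp, value] pairs and only appends when the value changes.
class FactHistory
  def initialize
    @log = Hash.new { |hash, name| hash[name] = [] }
  end

  # Record an observation; a value equal to the previous one is not stored again.
  def record(name, value, at = Time.now)
    entries = @log[name]
    entries << [at, value] if entries.empty? || entries.last[1] != value
    self
  end

  # Every recorded change for a fact, oldest first.
  def changes(name)
    @log[name].dup
  end

  # Time of the most recent change, or nil if the fact was never recorded.
  def last_changed(name)
    entry = @log[name].last
    entry && entry[0]
  end
end

history = FactHistory.new
history.record('memorysize_mb', 2048, Time.utc(2010, 8, 1))
history.record('memorysize_mb', 2048, Time.utc(2010, 8, 2))  # unchanged, skipped
history.record('memorysize_mb', 4096, Time.utc(2010, 8, 20))

puts history.changes('memorysize_mb').length  # two distinct values were seen
puts history.last_changed('memorysize_mb')    # when the amount of RAM last changed
```

Storing only the changes keeps the log small for facts that rarely move, which is most of them.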


>> Simplifying the DSL for the common case is hugely appealing.
>
> Yeah, although I think external, non-ruby facts will end up being more appealing.

Agreed.

--
nigel

Rein Henrichs

Aug 26, 2010, 3:08:13 PM
to puppet-dev
Excerpts from Nigel Kersten's message of Thu Aug 26 11:27:25 -0700 2010:

> So I feel that it would be immensely useful for Facter to optionally
> store a certain amount of historical data about the fact evaluation.
>
> It would be great to be able to simply interrogate info like "when did
> the amount of RAM in this machine change?" "what is my kernel version
> history?" etc etc.

Yes, this would be very useful. That said, it's not that we need Facter
to store historical data. We need *something* to store historical data.
Probably not Facter. Probably an inventory service. Probably something
that provides a rich query interface, like CouchDB.
--

Rein Henrichs
http://puppetlabs.com

This message was brought to you by Linux, the free unix.
Windows without the X is like making love without a partner.
Sex, Drugs & Linux Rules
win-nt from the people who invented edlin
apples have meant trouble since eden
Linux, the way to get rid of boot viruses
(By mwik...@at8.abo.fi, MaDsen Wikholm)

Luke Kanies

Aug 26, 2010, 3:21:52 PM
to puppe...@googlegroups.com

Interesting. So in this scenario, Facter would develop a decent bit of its own functionality - maybe not a daemon, but at least long-term storage.

Would it be acceptable if, say, the puppet agent provided a simple interface to the server-side fact storage, which will already have this? We're working on designing something like this right now, although it's more mental goo than real ideas right now.

--
A great many people think they are thinking when they are merely
rearranging their prejudices. -- William James

Nigel Kersten

Aug 26, 2010, 3:42:09 PM
to puppe...@googlegroups.com
On Thu, Aug 26, 2010 at 12:08 PM, Rein Henrichs <re...@puppetlabs.com> wrote:
> Excerpts from Nigel Kersten's message of Thu Aug 26 11:27:25 -0700 2010:
>> So I feel that it would be immensely useful for Facter to optionally
>> store a certain amount of historical data about the fact evaluation.
>>
>> It would be great to be able to simply interrogate info like "when did
>> the amount of RAM in this machine change?" "what is my kernel version
>> history?" etc etc.
>
> Yes, this would be very useful. That said, it's not that we need Facter
> to store historical data. We need *something* to store historical data.
> Probably not Facter. Probably an inventory service. Probably something
> that provides a rich query interface, like CouchDB.

But if we're considering adding a ttl/caching for facts, aren't we
talking about Facter storing historical data anyway?

In his next mail, Luke says:

"Would it be acceptable if, say, the puppet agent provided a simple
interface to the server-side fact storage, which will already have
this? We're working on designing something like this right now,
although it's more mental goo than real ideas right now"

This would be ok, but how on earth does this work with load-balancing?
You have no guarantee that you're hitting the same
local-to-the-server fact store when interrogating.

> --
>
> Rein Henrichs
> http://puppetlabs.com
>
> This  message was brought to  you by Linux, the free  unix.
> Windows without the X is like making love without a partner.
> Sex, Drugs & Linux Rules
> win-nt from the people who invented edlin
> apples  have  meant  trouble  since  eden
> Linux, the way to get rid of boot viruses
> (By mwik...@at8.abo.fi, MaDsen Wikholm)
>

Luke Kanies

Aug 26, 2010, 3:43:41 PM
to puppe...@googlegroups.com
On Aug 26, 2010, at 12:42 PM, Nigel Kersten wrote:

> On Thu, Aug 26, 2010 at 12:08 PM, Rein Henrichs <re...@puppetlabs.com> wrote:
>> Excerpts from Nigel Kersten's message of Thu Aug 26 11:27:25 -0700 2010:
>>> So I feel that it would be immensely useful for Facter to optionally
>>> store a certain amount of historical data about the fact evaluation.
>>>
>>> It would be great to be able to simply interrogate info like "when did
>>> the amount of RAM in this machine change?" "what is my kernel version
>>> history?" etc etc.
>>
>> Yes, this would be very useful. That said, it's not that we need Facter
>> to store historical data. We need *something* to store historical data.
>> Probably not Facter. Probably an inventory service. Probably something
>> that provides a rich query interface, like CouchDB.
>
> But if we're considering adding a ttl/caching for facts, aren't we
> talking about Facter storing historical data anyway?

Adding a ttl isn't actually the same as doing the caching - that is, a given fact may want to define its own ttl for downstream users (i.e., Puppet) without actually doing any caching.
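
To make the distinction concrete, here is a toy sketch in which the fact only advertises a ttl and a separate caller decides whether to cache. Fact, CachingCaller and the uptime_checks name are all made up for illustration; none of this is the Facter or Puppet API.

```ruby
# Toy model, not the Facter API: a fact only advertises a ttl (in seconds)
# and never caches anything itself.
Fact = Struct.new(:name, :ttl, :resolver) do
  def value
    resolver.call
  end
end

# Hypothetical downstream consumer (standing in for Puppet) that honours
# the advertised ttl.
class CachingCaller
  def initialize
    @cache = {}
  end

  def fetch(fact, now = Time.now)
    cached = @cache[fact.name]
    return cached[:value] if cached && now - cached[:at] < fact.ttl
    fresh = fact.value
    @cache[fact.name] = { value: fresh, at: now }
    fresh
  end
end

calls = 0
counter = Fact.new('uptime_checks', 60, -> { calls += 1 })
caller_side = CachingCaller.new
start = Time.at(0)

first  = caller_side.fetch(counter, start)       # resolves the fact
second = caller_side.fetch(counter, start + 30)  # within ttl, served from cache
third  = caller_side.fetch(counter, start + 90)  # ttl expired, resolves again
direct = counter.value                           # asking the fact itself always resolves
```

With a ttl of zero the age check never passes, so the caller resolves every time and sees the same thing a direct run would.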

> Luke's next mail he says:
>
> "Would it be acceptable if, say, the puppet agent provided a simple
> interface to the server-side fact storage, which will already have
> this? We're working on designing something like this right now,
> although it's more mental goo than real ideas right now"
>
> This would be ok, but how on earth does this work with load-balancing?
> You have no guarantee that you're hitting the same
> local-to-the-server fact store when interrogating.

The server-side storage would need to be centralized, which has to be done for other reasons anyway.

--
I think that's how Chicago got started. A bunch of people in New York
said, 'Gee, I'm enjoying the crime and the poverty, but it just isn't
cold enough. Let's go west.' --Richard Jeni

Nigel Kersten

Aug 26, 2010, 3:47:29 PM
to puppe...@googlegroups.com
On Thu, Aug 26, 2010 at 12:43 PM, Luke Kanies <lu...@puppetlabs.com> wrote:
> On Aug 26, 2010, at 12:42 PM, Nigel Kersten wrote:
>
>> On Thu, Aug 26, 2010 at 12:08 PM, Rein Henrichs <re...@puppetlabs.com> wrote:
>>> Excerpts from Nigel Kersten's message of Thu Aug 26 11:27:25 -0700 2010:
>>>> So I feel that it would be immensely useful for Facter to optionally
>>>> store a certain amount of historical data about the fact evaluation.
>>>>
>>>> It would be great to be able to simply interrogate info like "when did
>>>> the amount of RAM in this machine change?" "what is my kernel version
>>>> history?" etc etc.
>>>
>>> Yes, this would be very useful. That said, it's not that we need Facter
>>> to store historical data. We need *something* to store historical data.
>>> Probably not Facter. Probably an inventory service. Probably something
>>> that provides a rich query interface, like CouchDB.
>>
>> But if we're considering adding a ttl/caching for facts, aren't we
>> talking about Facter storing historical data anyway?
>
> Adding a ttl isn't actually the same as doing the caching - that is, a given fact may want to define its own ttl for downstream users (i.e., Puppet) without actually doing any caching.

So in that case you may get a different value for a fact when running
the Facter binary than when that same fact is used by Puppet? Doesn't
that seem suboptimal?

>
>> Luke's next mail he says:
>>
>> "Would it be acceptable if, say, the puppet agent provided a simple
>> interface to the server-side fact storage, which will already have
>> this?  We're working on designing something like this right now,
>> although it's more mental goo than real ideas right now"
>>
>> This would be ok, but how on earth does this work with load-balancing?
>> You have no guarantee that you're hitting the same
>> local-to-the-server fact store when interrogating.
>
> The server-side storage would need to be centralized, which has to be done for other reasons anyway.
>
> --
> I think that's how Chicago got started.  A bunch of people in New York
> said, 'Gee, I'm enjoying the crime and the poverty, but it just isn't
> cold enough. Let's go west.'         --Richard Jeni
> ---------------------------------------------------------------------
> Luke Kanies  -|-   http://puppetlabs.com   -|-   +1(615)594-8199
>
>
>
>

Luke Kanies

Aug 26, 2010, 4:02:09 PM
to puppe...@googlegroups.com
On Aug 26, 2010, at 12:47 PM, Nigel Kersten wrote:

> On Thu, Aug 26, 2010 at 12:43 PM, Luke Kanies <lu...@puppetlabs.com> wrote:
>> On Aug 26, 2010, at 12:42 PM, Nigel Kersten wrote:
>>
>>> On Thu, Aug 26, 2010 at 12:08 PM, Rein Henrichs <re...@puppetlabs.com> wrote:
>>>> Excerpts from Nigel Kersten's message of Thu Aug 26 11:27:25 -0700 2010:
>>>>> So I feel that it would be immensely useful for Facter to optionally
>>>>> store a certain amount of historical data about the fact evaluation.
>>>>>
>>>>> It would be great to be able to simply interrogate info like "when did
>>>>> the amount of RAM in this machine change?" "what is my kernel version
>>>>> history?" etc etc.
>>>>
>>>> Yes, this would be very useful. That said, it's not that we need Facter
>>>> to store historical data. We need *something* to store historical data.
>>>> Probably not Facter. Probably an inventory service. Probably something
>>>> that provides a rich query interface, like CouchDB.
>>>
>>> But if we're considering adding a ttl/caching for facts, aren't we
>>> talking about Facter storing historical data anyway?
>>
>> Adding a ttl isn't actually the same as doing the caching - that is, a given fact may want to define its own ttl for downstream users (i.e., Puppet) without actually doing any caching.
>
> So in that case you may get a different value for a fact when running
> the Facter binary than when that same fact is used by Puppet? Doesn't
> that seem suboptimal?

It might be, but in that case you should set your ttl to zero. If you're comfortable with a long ttl, then it shouldn't matter, right?

I guess the question is if there needs to clearly be a direct linkage in facter between that server-side view and the client-side view, and my perspective is, not really.

I've always thought of facter as a means of collecting data and as little else as possible. In fact, the first use of Facter was sticking its output in an ldap db, and I wrote a separate tool (enhost) for that, rather than add additional functionality to Facter.


--
The advantage of a classical education is that it enables you to
despise the wealth that it prevents you from achieving.
-- Russell Green

Nigel Kersten

Aug 26, 2010, 4:21:14 PM
to puppe...@googlegroups.com
On Thu, Aug 26, 2010 at 1:02 PM, Luke Kanies <lu...@puppetlabs.com> wrote:
> On Aug 26, 2010, at 12:47 PM, Nigel Kersten wrote:
>
>> On Thu, Aug 26, 2010 at 12:43 PM, Luke Kanies <lu...@puppetlabs.com> wrote:
>>> On Aug 26, 2010, at 12:42 PM, Nigel Kersten wrote:
>>>
>>>> On Thu, Aug 26, 2010 at 12:08 PM, Rein Henrichs <re...@puppetlabs.com> wrote:
>>>>> Excerpts from Nigel Kersten's message of Thu Aug 26 11:27:25 -0700 2010:
>>>>>> So I feel that it would be immensely useful for Facter to optionally
>>>>>> store a certain amount of historical data about the fact evaluation.
>>>>>>
>>>>>> It would be great to be able to simply interrogate info like "when did
>>>>>> the amount of RAM in this machine change?" "what is my kernel version
>>>>>> history?" etc etc.
>>>>>
>>>>> Yes, this would be very useful. That said, it's not that we need Facter
>>>>> to store historical data. We need *something* to store historical data.
>>>>> Probably not Facter. Probably an inventory service. Probably something
>>>>> that provides a rich query interface, like CouchDB.
>>>>
>>>> But if we're considering adding a ttl/caching for facts, aren't we
>>>> talking about Facter storing historical data anyway?
>>>
>>> Adding a ttl isn't actually the same as doing the caching - that is, a given fact may want to define its own ttl for downstream users (i.e., Puppet) without actually doing any caching.
>>
>> So in that case you may get a different value for a fact when running
>> the Facter binary than when that same fact is used by Puppet? Doesn't
>> that seem suboptimal?
>
> It might be, but in that case you should set your ttl to zero.  If you're comfortable with a long ttl, then it shouldn't matter, right?

It shouldn't matter, but it creates a situation that can make
debugging difficult, unless the plan is for facter --puppet
invocations to give you an accurate view of what Puppet actually sees.

> I guess the question is if there needs to clearly be a direct linkage in facter between that server-side view and the client-side view, and my perspective is, not really.
>
> I've always thought of facter as a means of collecting data and as little else as possible.  In fact, the first use of Facter was sticking its output in an ldap db, and I wrote a separate tool (enhost) for that, rather than add additional functionality to Facter.

I think this restricts the utility of Facter to a point where it may
as well simply be part of Puppet.

Luke Kanies

Aug 26, 2010, 6:32:39 PM
to puppe...@googlegroups.com

I concur, but that's the case in nearly all significant infrastructures, right?

I think you're right that a high priority needs to be on making debugging simple. I've been planning on adding client-side interfaces to the server's data, which should make this straightforward.

>> I guess the question is if there needs to clearly be a direct linkage in facter between that server-side view and the client-side view, and my perspective is, not really.
>>
>> I've always thought of facter as a means of collecting data and as little else as possible. In fact, the first use of Facter was sticking its output in an ldap db, and I wrote a separate tool (enhost) for that, rather than add additional functionality to Facter.
>
> I think this restricts the utility of Facter to a point where it may
> as well simply be part of Puppet.

I think it's exactly the opposite - because it doesn't include any of the features that Puppet wants, it's more generically useful.

But I get what you're saying, and I'm not opposed to an operational mode in Facter that does this. It just doesn't fit enough with how we need to use it that it's worth us spending a lot of time on it. It should be straightforward to add, though, right?

--
That was just a drill of the emergency y2k system. Had this been a
real emergency, we would've also dumped a bucket of spiders on you and
yelled out "civilization is collapsing!"

R.I.Pienaar

Aug 26, 2010, 6:39:30 PM
to puppe...@googlegroups.com

----- "Luke Kanies" <lu...@puppetlabs.com> wrote:

> >
> > I think this restricts the utility of Facter to a point where it may
> > as well simply be part of Puppet.
>
> I think it's exactly the opposite - because it doesn't include any of
> the features that Puppet wants, it's more generically useful.
>
> But I get what you're saying, and I'm not opposed to an operational
> mode in Facter that does this. It just doesn't fit enough with how we
> need to use it that it's worth us spending a lot of time on it. It
> should be straightforward to add, though, right?


I quite like that facter is a reusable little library - obviously I use it with mcollective - but I've even deployed it on cfengine sites and used it to build quick and dirty inventories with.

The fact that it's small, doesn't bring lots of bloat, and doesn't take lots of resources - at least when it's not running - these are all awesome aspects. That doesn't mean we can't use it in larger inventory systems, but I would rather see those as optional, with different ways of transporting the facts to them.

Nigel Kersten

Aug 26, 2010, 7:15:56 PM
to puppe...@googlegroups.com
On Thu, Aug 26, 2010 at 3:39 PM, R.I.Pienaar <r...@devco.net> wrote:
>
> ----- "Luke Kanies" <lu...@puppetlabs.com> wrote:
>
>> >
>> > I think this restricts the utility of Facter to a point where it may
>> > as well simply be part of Puppet.
>>
>> I think it's exactly the opposite - because it doesn't include any of
>> the features that Puppet wants, it's more generically useful.

Damnit. I can't work out whether you've convinced me or not with that. :)

>>
>> But I get what you're saying, and I'm not opposed to an operational
>> mode in Facter that does this.  It just doesn't fit enough with how we
>> need to use it that it's worth us spending a lot of time on it.  It
>> should be straightforward to add, though, right?
>
>
> I quite like that facter is a reusable little library - obviously I use it with mcollective - but I've even deployed it on cfengine sites and used it to build quick and dirty inventories with.
>
> The fact that it's small, doesn't bring lots of bloat, and doesn't take lots of resources - at least when it's not running - these are all awesome aspects. That doesn't mean we can't use it in larger inventory systems, but I would rather see those as optional, with different ways of transporting the facts to them.


After thinking this through for a while, I've realized I really don't
have a strong opinion as to which layer/component a persistent store
should live in, but just really want debugging to be simple.

There is definitely a huge advantage for Facter being simple. I wasn't
suggesting that a local persistent store would be on by default, but
keep flipping back and forth as to where I think it should live.

--
nigel

Daniel Pittman

Aug 28, 2010, 6:44:50 AM
to puppe...@googlegroups.com
Luke Kanies <lu...@puppetlabs.com> writes:

G'day. Having finally gotten free of some crash-priority engineering I have a
chance to look this over.

> Rein, Paul, and I had a call today discussing whether we should produce a
> 1.6 (I said no, unless there are high priority tickets that really need to
> be worked on), and then what the design goals of 2.0 should be. I took
> notes on our discussion and attempted to produce a doc capturing it all:
>
> http://projects.puppetlabs.com/projects/facter/wiki/ArchitectureForTwoDotOh
>
> Comments appreciated.

It looks pretty good to me, and the subsequent discussion has clarified some
of the bits I was uncertain about when it came to our internal use of the
facts.

From my PoV one of the big gains would be making adding a new fact more like
writing a munin plugin[1] than it currently is, although it is fairly simple
and direct right now.


The main thing missing from this documentation, and which has bitten us in
practice, is a lack of "community standards" for how facts should be
presented.

For example, we have a "mem_in_mb" fact to work around the human-focused
values being returned from the default memory fact, or the difficulty in
returning a boolean fact to puppet. (0 and Ruby false are both "true",
apparently. :)
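
A small sketch of both problems, under stated assumptions: the "1.95 GB" style input format and the mem_in_mb helper below are illustrative guesses, not Facter's actual output or API, and the boolean part only shows one plausible reason (values flattened to strings) for what is described above.

```ruby
# Hypothetical normaliser: turns a human-focused size string such as
# "1.95 GB" into an integer number of megabytes.
UNIT_TO_MB = {
  'KB' => 1.0 / 1024,
  'MB' => 1.0,
  'GB' => 1024.0,
  'TB' => 1024.0 * 1024
}.freeze

def mem_in_mb(text)
  match = text.strip.match(/\A(\d+(?:\.\d+)?)\s*([KMGT]B)\z/i)
  raise ArgumentError, "unrecognised size: #{text.inspect}" unless match
  (match[1].to_f * UNIT_TO_MB.fetch(match[2].upcase)).round
end

puts mem_in_mb('512.00 MB')
puts mem_in_mb('2 GB')

# If every fact value is flattened to a string, a boolean loses its meaning:
# false and 0 both become non-empty strings.
flattened = [false, 0].map(&:to_s)
puts flattened.inspect
```

A community standard could simply say "sizes are integers in a fixed unit" and "booleans are real booleans", which would make helpers like this unnecessary.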


I would also be very happy to see more explicitness than is mentioned here
about what sort of data types Facter handles: It sounds like y'all are
thinking of something akin to JSON-level "rich" data structures, which I would
be very happy with, rather than YAML-with-Ruby-classes "rich" data structures.
(...or even plain "any Ruby object is fine" results. :)


WRT the point about grouping of facts, and resolution: to my mind, this would
be a nice place to use a qualified name, and a search path:

com.puppetlabs.memory
net.rimspace.memory

facter search path: rimspace.net, puppetlabs.com

That would allow for qualified and unqualified facts to be used, and
appropriate searching through them; mostly, I would imagine this being
centrally configured in whatever tool was going to reference them.

That same hierarchy lends itself, SNMP-like, to grouping facts as leaf nodes
in a tree of naming.

com.puppetlabs.ipaddress.eth0 => 192.168.1.1
com.puppetlabs.ipaddress.eth2 => [172.16.23.1, 172.16.24.1]
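
A minimal sketch of how such a search path could resolve unqualified names. This is a toy resolver, not an existing Facter feature; the FACTS table, its values, and the lookup and namespace_for helpers are invented for illustration.

```ruby
# Toy fact table keyed by reverse-domain qualified names (values are made up).
FACTS = {
  'com.puppetlabs.memory'         => '1.95 GB',
  'net.rimspace.memory'           => 1997,
  'com.puppetlabs.ipaddress.eth0' => '192.168.1.1'
}.freeze

# Domains in priority order, written the way the search path above writes them.
SEARCH_PATH = %w[rimspace.net puppetlabs.com].freeze

# "rimspace.net" -> "net.rimspace"
def namespace_for(domain)
  domain.split('.').reverse.join('.')
end

# A name that is already a key is treated as fully qualified; otherwise each
# namespace on the search path is tried in order. Returns nil if nothing matches.
def lookup(name, facts = FACTS, path = SEARCH_PATH)
  return facts[name] if facts.key?(name)
  path.each do |domain|
    qualified = "#{namespace_for(domain)}.#{name}"
    return facts[qualified] if facts.key?(qualified)
  end
  nil
end

puts lookup('memory')                 # found under net.rimspace first
puts lookup('ipaddress.eth0')         # falls through to com.puppetlabs
puts lookup('com.puppetlabs.memory')  # qualified names bypass the search path
```

The local override wins for "memory" only because rimspace.net comes first on the path; reversing the path would restore the core value.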


Finally, the new DSL as proposed in the ticket looks good to me. It removes
much of the boilerplate code from the system. The one additional thing that
would be convenient to know would be what assurances of namespace and
execution I had as a developer:

If I use a global variable, am I going to trample anyone else?
When, and how often, will my code be run?

I am thinking of facts like this one:

    Facter.add("grub2") do
      installed = %x{complex and costly command}.match(/installed/)
      setcode { installed ? 'true' : '' }
    end

Knowing when, and how often, that costly command runs would be good, and
having that promise as part of the specification for building facts would make
it easier to be comfortable.

The namespace part is more important when using the DSL that doesn't make the
blocks (and, so, variable scope) so obvious.
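
The timing question can be shown with a toy stand-in for a fact DSL. ToyFact below is deliberately not the real Facter internals; it only assumes that the outer block is evaluated once at definition time and that the setcode block is called on each resolution, which is exactly the kind of guarantee a specification would need to confirm or deny.

```ruby
# Toy stand-in for a fact DSL: the outer block runs once when the fact is
# defined, while the setcode block runs each time the value is asked for.
class ToyFact
  def initialize(&definition)
    instance_eval(&definition)
  end

  def setcode(&block)
    @code = block
  end

  def value
    @code.call
  end
end

runs = { definition: 0, setcode: 0 }

fact = ToyFact.new do
  runs[:definition] += 1  # stands in for the complex and costly command
  installed = true        # local to this block, captured by the closure below
  setcode do
    runs[:setcode] += 1
    installed ? 'true' : ''
  end
end

3.times { fact.value }
puts runs.inspect
```

Under these assumptions the costly part runs once however often the value is read, and the local variable stays private to the block, unlike a global.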

[...]

Luke Kanies <lu...@puppetlabs.com> writes:
> On Aug 26, 2010, at 11:27 AM, Nigel Kersten wrote:
>> On Wed, Aug 25, 2010 at 10:09 PM, Luke Kanies <lu...@puppetlabs.com> wrote:
>>> On Aug 25, 2010, at 1:53 PM, Nigel Kersten wrote:
>>>> On Wed, Aug 25, 2010 at 11:17 AM, Luke Kanies <lu...@puppetlabs.com> wrote:

[...]

>>>> I have a few thoughts churning around about Facter having native
>>>> support for storing fact evaluation history on the client, which ties
>>>> into the open feature request for caching fact values, and I noticed
>>>> you reference a ttl option in #4565.
>>>
>>> It'd be great to hear these.
>>
>> So I feel that it would be immensely useful for Facter to optionally store
>> a certain amount of historical data about the fact evaluation.
>>
>> It would be great to be able to simply interrogate info like "when did the
>> amount of RAM in this machine change?" "what is my kernel version history?"
>> etc etc.
>>
>> To get there however, we need a persistent store for facts, which seems to
>> tie in quite nicely to the idea of having certain facts be cached, and
>> easily marked as "refresh once per boot" etc.

FWIW, I would be worried about any fact that was "refresh once per boot": an
awful lot of things can change dynamically, including hostname, memory
capacity, disk capacity, number of CPUs, and a bunch of other frequently
static things about a host.

(In fact, much of the engineering we did was to give us more capacity to
dynamically change many of those aspects, and part of it was done by renaming
hosts as we rebuild them on the fly. Gotta love emergencies.)

>> Facter becomes much more useful as a standalone product with these
>> capabilities, and ideally we could hook Puppet/Puppet Dashboard into this
>> to store historical fact data. We could do this at the Puppet/Dashboard
>> layer, but if we decided to accept the feature request for caching fact
>> evaluation, then it appears to make more sense to have Facter support this
>> directly.
>
> Interesting. So in this scenario, Facter would develop a decent bit of its
> own functionality - maybe not a daemon, but at least long-term storage.

I think that like R. I. Pienaar I would lean towards making this a third tool,
which used facter but was not part of it. This sort of trending and
historical analysis is a very different use of the system than configuration
management.

> Would it be acceptable if, say, the puppet agent provided a simple interface
> to the server-side fact storage, which will already have this? We're
> working on designing something like this right now, although it's more
> mental goo than real ideas right now.

To me, some sort of API between fact storage and third party applications
would be desirable, but whatever form it took would be pretty much fine.

We don't so much care about talking to facter, as obtaining the information
that is stored in a specific fact, possibly from a specific node.


The only other capability that would be interesting would be to be able to
dynamically query facts from other nodes: at the moment we use mcollective to
query facter facts dynamically on a manual basis, but anything we do manually
is usually a pointer to something that we want to automate eventually...

Daniel

Footnotes:
[1] http://munin-monitoring.org/wiki/HowToWritePlugins

--
✣ Daniel Pittman ✉ dan...@rimspace.net +61 401 155 707
♽ made with 100 percent post-consumer electrons

Luke Kanies

Aug 31, 2010, 2:26:16 PM
to puppe...@googlegroups.com
On Aug 28, 2010, at 3:44 AM, Daniel Pittman wrote:

> Luke Kanies <lu...@puppetlabs.com> writes:
>
> G'day. Having finally gotten free of some crash-priority engineering I have a
> chance to look this over.
>
>> Rein, Paul, and I had a call today discussing whether we should produce a
>> 1.6 (I said no, unless there are high priority tickets that really need to
>> be worked on), and then what the design goals of 2.0 should be. I took
>> notes on our discussion and attempted to produce a doc capturing it all:
>>
>> http://projects.puppetlabs.com/projects/facter/wiki/ArchitectureForTwoDotOh
>>
>> Comments appreciated.
>
> It looks pretty good to me, and the subsequent discussion has clarified some
> of the bits I was uncertain about when it came to our internal use of the
> facts.
>
> From my PoV one of the big gains would be making adding a new fact more like
> writing a munin plugin[1] than it currently is, although it is fairly simple
> and direct right now.

Yeah, I want to make this easier, but especially easier for sysadmins to do in a way they're familiar with, which usually means not writing ruby and not knowing special Facterisms.

> The main thing missing from this documentation, and which has bitten us in
> practice, is a lack of "community standards" for how facts should be
> presented.
>
> For example, we have a "mem_in_mb" fact to work around the human-focused
> values being returned from the default memory fact, or the difficulty in
> returning a boolean fact to puppet. (0 and Ruby false are both "true",
> apparently. :)

Hmm. Yeah, this looks to be missing. What's the best way to fix that?

> I would also be very happy to see more explicitness than is mentioned here
> about what sort of data types Facter handles: It sounds like y'all are
> thinking of something akin to JSON-level "rich" data structures, which I would
> be very happy with, rather than YAML-with-Ruby-classes "rich" data structures.
> (...or even plain "any Ruby object is fine" results. :)

Yeah - this is definitely raw data, not ruby objects. We already support providing Facter output as a YAML hash, but we're going to support hashes of hashes of hashes of arrays of... You get the idea. And it'll all be in multiple formats.
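
As a sketch of what "raw data in multiple formats" could look like, the following serializes one nested structure to JSON and YAML using Ruby's standard libraries. The fact names and values are made up for illustration and do not reflect any agreed Facter 2.0 schema.

```ruby
require 'json'
require 'yaml'

# Illustrative structure only: plain hashes, arrays, strings and integers,
# with no Ruby-specific objects.
facts = {
  'memory' => { 'total_mb' => 2048, 'free_mb' => 512 },
  'interfaces' => {
    'eth0' => { 'addresses' => ['192.168.1.1'] },
    'eth2' => { 'addresses' => ['172.16.23.1', '172.16.24.1'] }
  }
}

as_json = JSON.generate(facts)
as_yaml = facts.to_yaml

# Both formats round-trip back to the same plain data.
puts JSON.parse(as_json) == facts
puts YAML.load(as_yaml) == facts
```

Restricting values to this JSON-level subset is what lets every output format carry the same information.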

> WRT the point about grouping of facts, and resolution: to my mind, this would
> be a nice place to use a qualified name, and a search path:
>
> com.puppetlabs.memory
> net.rimspace.memory
>
> facter search path: rimspace.net, puppetlabs.com
>
> That would allow for qualified and unqualified facts to be used, and
> appropriate searching through them; mostly, I would imagine this being
> centrally configured in whatever tool was going to reference them.
>
> That same hierarchy lends itself, SNMP-like, to grouping facts as leaf nodes
> in a tree of naming.
>
> com.puppetlabs.ipaddress.eth0 => 192.168.1.1
> com.puppetlabs.ipaddress.eth2 => [172.16.23.1, 172.16.24.1]

Hmm. I've never thought of this. I've always kind of hated the reverse-domain style naming, but it's certainly common, and my opinions have often been wrong on this. Anyone else desirous of this?

It wouldn't translate all that well to Puppet variables - $com::puppetlabs::ipaddress::eth0?

I think it would also make overrides and things a bit complicated - is a fact in com.puppetlabs a fact that we shipped the code for, or a fact about a Puppet Labs project? Should a Solaris IP address have a different path than a Red Hat IP? What if you override the default IP fact with a custom one?

However, I can totally see naming the resolution mechanisms that way - we need a unique way of naming and finding resolutions, both for logging and testing, and this is probably a good option. That way we can track where the code is from (i.e., com.puppetlabs fact resolutions are part of the core).

> Finally, the new DSL as proposed in the ticket looks good to me. It removes
> much of the boilerplate code from the system. The one additional thing that
> would be convenient to know would be what assurances of namespace and
> execution I had as a developer:
>
> If I use a global variable, am I going to trample anyone else?

Probably. :)

> When, and how often, will my code be run?

Yeah, this should be clearer.

> I am thinking of facts like this one:
>
> Facter.add("grub2") do
> installed = %x{complex and costly command}.match(/installed/)
> setcode { installed ? 'true' : '' }
> end
>
> Knowing when, and how often, that costly command runs would be good, and
> having that promise as part of the specification for building facts would make
> it easier to be comfortable.

This is why we're adding support for TTL and back-end data collectors. As discussed, it's unlikely for Facter to cache this data, but the caller (usually Puppet) certainly could.
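
To make the caller-side caching idea concrete, here is a minimal Ruby sketch of a TTL cache that a caller such as Puppet could wrap around a costly fact. The class name `TtlCache`, its `fetch` method, and the injectable clock are assumptions for illustration only; nothing like this exists in Facter today.

```ruby
# Hypothetical caller-side cache; not part of Facter or Puppet.
class TtlCache
  # clock is injectable so the behaviour can be tested without sleeping.
  def initialize(clock = -> { Time.now.to_f })
    @clock = clock
    @entries = {}
  end

  # Return the cached value for name if it is younger than ttl seconds;
  # otherwise run the block, store its result, and return that.
  def fetch(name, ttl)
    entry = @entries[name]
    now = @clock.call
    return entry[:value] if entry && now - entry[:stored_at] < ttl

    value = yield
    @entries[name] = { value: value, stored_at: now }
    value
  end
end
```

With something like this, the "complex and costly command" behind the grub2 fact would only run when the caller asks for the value and the previous result has expired, e.g. `cache.fetch("grub2", 3600) { ... }`.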

I have the same worry, but we need some ability to store fact data over time.

>>> Facter becomes much more useful as a standalone product with these
>>> capabilities, and ideally we could hook Puppet/Puppet Dashboard into this
>>> to store historical fact data. We could do this at the Puppet/Dashboard
>>> layer, but if we decided to accept the feature request for caching fact
>>> evaluation, then it appears to make more sense to have Facter support this
>>> directly.
>>
>> Interesting. So in this scenario, Facter would develop a decent bit of its
>> own functionality - maybe not a daemon, but at least long-term storage.
>
> I think that like R. I. Pienaar I would lean towards making this a third tool,
> which used facter but was not part of it. This sort of trending and
> historical analysis is a very different use of the system than configuration
> management.

Yep.

>> Would it be acceptable if, say, the puppet agent provided a simple interface
>> to the server-side fact storage, which will already have this? We're
>> working on designing something like this right now, although it's more
>> mental goo than real ideas right now.
>
> To me, some sort of API between fact storage and third party applications
> would be desirable, but whatever form it took would be pretty much fine.
>
> We don't so much care about talking to facter, as obtaining the information
> that is stored in a specific fact, possibly from a specific node.
>
>
> The only other capability that would be interesting would be to be able to
> dynamically query facts from other nodes: at the moment we use mcollective to
> query facter facts dynamically on a manual basis, but anything we do manually
> is usually a pointer to something that we want to automate eventually...

We'll be having a central inventory (fact) store very soon, which will give you this, but can you provide more about what you do with it? How concerned are you about security? I've been really hesitant to build a system that allows anyone to read and write anyone else's facts.

--
The surest sign that intelligent life exists elsewhere in the universe
is that it has never tried to contact us.
--Calvin and Hobbes (Bill Watterson)

Jeff McCune

Aug 31, 2010, 11:12:59 PM8/31/10
to puppe...@googlegroups.com
On Tue, Aug 31, 2010 at 2:26 PM, Luke Kanies <lu...@puppetlabs.com> wrote:
[snip]

>> That same hierarchy lends itself, SNMP-like, to grouping facts as leaf nodes
>> in a tree of naming.
>>
>>    com.puppetlabs.ipaddress.eth0 => 192.168.1.1
>>    com.puppetlabs.ipaddress.eth2 => [172.16.23.1, 172.16.24.1]
>
> Hmm.  I've never thought of this.  I've always kind of hated the reverse-domain style naming, but it's certainly common, and my opinions have often been wrong on this.  Anyone else desirous of this?

I'm impartial to reverse-domain style naming, but I've always loathed
my inability to quickly determine if a variable is a fact, a parameter
from the ENC, or was defined elsewhere in a manifest.

When I write my own facts, I prefix them with "fact_" and then prefix
again with "org_" so my facts look like "$fact_acme_datacenter" which
helps me read my code quickly. "Ah, this is a fact I wrote while at
acme"

This sort of looks like a reverse domain style, so I guess I'm not as
impartial as I originally thought.

I'm totally on board if the scheme helps me quickly identify where a
variable originated from while scanning through code. It's not a
show-stopper for me, but it'd be nice.

> It wouldn't translate all that well to Puppet variables - $com::puppetlabs::ipaddress::eth0?

I actually don't have a huge issue with this.

$fact::acme::datacenter
$parameter::acme::globalstop

That's not so bad to my eyes. For better or worse my variable names
are often longer and more verbose than the norm, so that might explain
why this looks OK to me.

> I think it would also make overrides and things a bit complicated - is a fact in com.puppetlabs a fact that we shipped the code for, or a fact about a Puppet Labs project?

Both? I don't really see the difference, but maybe it's lack of sleep.

If it's a specific project, I'm fine with being more verbose,
$fact::puppetlabs::dashboard::version which follows the convention,
e.g. com.sun.java

It's long, but helps me understand the code and context quickly.

> Should a Solaris IP address have a different path than a Red Hat IP?

Nope. I think it should be $fact::puppetlabs::ipaddress if we adopt
this model.

> What if you override the default IP fact with a custom one?

If Acme wanted their own ipaddress fact, they could implement
$fact::acme::ipaddress

Scanning Acme's code quickly, I would quickly understand they
implemented their own fact for ipaddress.

> However, I can totally see naming the resolution mechanisms that way - we need a unique way of naming and finding resolutions, both for logging and testing, and this is probably a good option.  That way we can track where the code is from (i.e., com.puppetlabs fact resolutions are part of the core).

Yeah, I'm not 100% sure about the specific name "com.puppetlabs" but I
think we're definitely on the right track regarding the schema.

--
Jeff McCune
http://www.puppetlabs.com/

Jeff McCune

Aug 31, 2010, 11:26:31 PM8/31/10
to puppe...@googlegroups.com
On Tue, Aug 31, 2010 at 2:26 PM, Luke Kanies <lu...@puppetlabs.com> wrote:
[snip]

>> To me, some sort of API between fact storage and third party applications
>> would be desirable, but whatever form it took would be pretty much fine.
>>
>> We don't so much care about talking to facter, as obtaining the information
>> that is stored in a specific fact, possibly from a specific node.

When this happens, I'd also like to make sure we don't hard code it to
lookup against specific nodes, but rather a subset of all nodes.

"Give me the service tag of all hosts in the east coast data center"

If we implement this selection, specific nodes are also implemented

"Give me the service tag of all hosts with a hostname foo.bar.com"

I think simple selections like this will cover the majority of cases.
This begs the question, "Can I specify multiple criteria?" i.e.
complex joins, but that can wait until later because it's a rare use
case as far as I can tell.
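
A rough Ruby sketch of that kind of selection, assuming the central store can be viewed as a hash of hostname to fact hash. The function name `select_fact` and the fact names `servicetag` and `datacenter` are made up for the example and do not describe any existing API.

```ruby
# Hypothetical query helper over an in-memory inventory of the form
#   { "hostname" => { "factname" => value, ... }, ... }
# Returns { hostname => value of `fact` } for every host whose facts
# equal all of the given criteria.
def select_fact(inventory, fact, criteria = {})
  inventory.each_with_object({}) do |(host, facts), result|
    matches = criteria.all? { |name, wanted| facts[name] == wanted }
    result[host] = facts[fact] if matches && facts.key?(fact)
  end
end
```

Selecting a single node is then just the special case `select_fact(inventory, "servicetag", "hostname" => "foo.bar.com")`, and because the criteria are a hash, several equality conditions can be combined without anything as heavy as a join.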

Daniel Pittman

Aug 31, 2010, 11:57:36 PM8/31/10
to puppe...@googlegroups.com
Jeff McCune <je...@puppetlabs.com> writes:
> On Tue, Aug 31, 2010 at 2:26 PM, Luke Kanies <lu...@puppetlabs.com> wrote:
> [snip]
>
>>> To me, some sort of API between fact storage and third party applications
>>> would be desirable, but whatever form it took would be pretty much fine.
>>>
>>> We don't so much care about talking to facter, as obtaining the
>>> information that is stored in a specific fact, possibly from a specific
>>> node.
>
> When this happens, I'd also like to make sure we don't hard code it to
> lookup against specific nodes, but rather a subset of all nodes.

[...]

> I think simple selections like this will cover the majority of cases. This
> begs the question, "Can I specify multiple criteria?" i.e. complex joins,
> but that can wait until later because it's a rare use case as far as I can
> tell.

FWIW, the search features that Chef offers through their integration of
Apache SOLR and the use of JSON for their equivalent of stored configs is
pretty attractive looking.

However, yeah: being able to specify complex criteria and retrieve complex
information would be highly desirable to me, because I miss it every now and
then. Nothing I can't work around, but...

Daniel

Daniel Pittman

Aug 31, 2010, 10:31:37 PM8/31/10
to puppe...@googlegroups.com
Luke Kanies <lu...@puppetlabs.com> writes:
> On Aug 28, 2010, at 3:44 AM, Daniel Pittman wrote:
>> Luke Kanies <lu...@puppetlabs.com> writes:
>>
>> G'day. Having finally gotten free of some crash-priority engineering I have a
>> chance to look this over.
>>
>>> Rein, Paul, and I had a call today discussing whether we should produce a
>>> 1.6 (I said no, unless there are high priority tickets that really need to
>>> be worked on), and then what the design goals of 2.0 should be. I took
>>> notes on our discussion and attempted to produce a doc capturing it all:
>>>
>>> http://projects.puppetlabs.com/projects/facter/wiki/ArchitectureForTwoDotOh
>>>
>>> Comments appreciated.
>>
>> It looks pretty good to me, and the subsequent discussion has clarified some
>> of the bits I was uncertain about when it came to our internal use of the
>> facts.
>>
>> From my PoV one of the big gains would be making adding a new fact more like
>> writing a munin plugin[1] than it currently is, although it is fairly simple
>> and direct right now.
>
> Yeah, I want to make this easier, but especially easier for sysadmins to do
> in a way they're familiar with, which usually means not writing ruby and not
> knowing special Facterisms.

*nod* FWIW, stealing the scoped names below this would work for me:

#!/bin/sh
echo -n "net.rimspace.memtotal="; awk '/^MemTotal:/ { print $2 }' /proc/meminfo
exit 0

In other words: when invoked emit your facts as "(qualified.)name=value" on
STDOUT, exit 0 for success, exit non-zero for failure.
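
On the consuming side, a minimal Ruby sketch of how such output could be read. The helper names `parse_fact_output` and `run_fact_script` are hypothetical and this is not how Facter currently loads facts; it only illustrates the name=value convention described above.

```ruby
# Hypothetical helper: parse "(qualified.)name=value" lines as emitted by
# an external fact script. Only the first "=" separates name from value.
def parse_fact_output(output)
  facts = {}
  output.each_line do |line|
    line = line.chomp
    next if line.empty?

    name, value = line.split("=", 2)
    # Skip lines that have no name or no "=" separator at all.
    next if name.nil? || name.empty? || value.nil?

    facts[name] = value
  end
  facts
end

# Run an executable and return its facts, or nil if it exits non-zero.
def run_fact_script(path)
  output = IO.popen([path], &:read)
  $?.success? ? parse_fact_output(output) : nil
end
```

Malformed lines are silently dropped here; whether they should instead be logged or treated as a failure of the whole script is an open design choice.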

Allowing JSON output would also be nice, perhaps akin to:

#!/bin/sh
# an unlikely JSON result source, and hand written JSON, yay!
echo "com.puppetlabs.encoding=text/x-json" # future proof!
echo '{ "net.rimspace.memtotal": +inf }'
exit 0

>> The main thing missing from this documentation, and which has bitten us in
>> practice, is a lack of "community standards" for how facts should be
>> presented.
>>
>> For example, we have a "mem_in_mb" fact to work around the human-focused
>> values being returned from the default memory fact, or the difficulty in
>> returning a boolean fact to puppet. (0 and Ruby false are both "true",
>> apparently. :)
>
> Hmm. Yeah, this looks to be missing. What's the best way to fix that?

Given that false.to_s == "false", and 0.to_s == "0", I would suggest that
puppet make the gross assumption that something returning a string matching
/^(true|false)$/ means a puppet boolean, and matching /^[0-9]+$/ means a puppet
integer, as though they wrote:

$boolean_fact = false # or = true
$integer_fact = 0 # float is left as an exercise for the reader
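
A small Ruby sketch of that coercion rule, for illustration. The function name `coerce_fact_value` is an assumption, and the patterns are anchored with \A and \z rather than ^ and $ so that a multi-line string cannot match on a single line.

```ruby
# Hypothetical coercion of a fact's string value, following the rule above:
# "true"/"false" become booleans, runs of digits become integers, and
# everything else (including floats) is passed through untouched.
def coerce_fact_value(raw)
  case raw
  when /\A(true|false)\z/ then raw == "true"
  when /\A[0-9]+\z/       then raw.to_i
  else raw
  end
end
```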

>> I would also be very happy to see more explicitness than is mentioned here
>> about what sort of data types Facter handles: It sounds like y'all are
>> thinking of something akin to JSON-level "rich" data structures, which I
>> would be very happy with, rather than YAML-with-Ruby-classes "rich" data
>> structures. (...or even plain "any Ruby object is fine" results. :)
>
> Yeah - this is definitely raw data, not ruby objects. We already support
> providing Facter output as a YAML hash, but we're going to support hashes of
> hashes of hashes of arrays of... You get the idea. And it'll all be in
> multiple formats.

*nod* FWIW, in Perl YAML is literally an order of magnitude slower in either
direction than JSON, mostly (I think) in the encoder startup. So, allowing
JSON there would make Perl plugins a universe nicer to write.

>> WRT the point about grouping of facts, and resolution: to my mind, this would
>> be a nice place to use a qualified name, and a search path:
>>
>> com.puppetlabs.memory
>> net.rimspace.memory
>>
>> facter search path: rimspace.net, puppetlabs.com

[...]

> Hmm. I've never thought of this. I've always kind of hated the
> reverse-domain style naming, but it's certainly common, and my opinions have
> often been wrong on this. Anyone else desirous of this?

I don't much like it either, because it is cumbersome. What I would propose
instead would be:

> It wouldn't translate all that well to Puppet variables -
> $com::puppetlabs::ipaddress::eth0?

Given a search path of 'rimspace.net, puppetlabs.com', the transformation
would be:

net.rimspace.example1
net.rimspace.example2
com.puppetlabs.example1
com.puppetlabs.nested.eth0
com.puppetlabs.nested.eth1

$example1 is the 'net.rimspace.example1' value, because we search my namespace
before yours. $example2 is 'net.rimspace.example2', and $nested is a hash:

{ eth0 => com.puppetlabs.nested.eth0, eth1 => ... }

Anything outside the search path wouldn't get shortened:

com.example.whatever => $com => { example => { whatever => $value } }

...or even where things in the fact search path get imported to variables, but
nothing else does:

# We don't search for com.example.whatever, but we can sure use it!
$alias = $facts["com.example.whatever"] # is this fatal for unset keys?

As long as the result was predictable, rather than too DWIMish then I think
folks would work it out. Some magic around modules having their own facts
in an automatically generated and searched namespace might also help.
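
To check that the transformation is predictable, here is a rough Ruby sketch of it. All names (`namespace_for`, `nest`, `resolve_variables`) are invented for the example, and it assumes the search path is given as domains ("rimspace.net") while facts use the reversed form ("net.rimspace."). It does not handle a name that is both a leaf and a prefix of another name.

```ruby
# Turn a search-path domain ("rimspace.net") into a namespace ("net.rimspace").
def namespace_for(domain)
  domain.split(".").reverse.join(".")
end

# Store value in a nested hash under the given list of keys.
def nest(tree, keys, value)
  *parents, leaf = keys
  node = parents.inject(tree) { |hash, key| hash[key] ||= {} }
  node[leaf] = value
end

# Map qualified fact names to short variable names using the search path.
def resolve_variables(facts, search_path)
  prefixes = search_path.map { |domain| namespace_for(domain) + "." }
  variables = {}

  # Walk the path from lowest to highest priority so earlier entries win.
  prefixes.reverse_each do |prefix|
    scoped = {}
    facts.each do |name, value|
      next unless name.start_with?(prefix)
      nest(scoped, name[prefix.length..-1].split("."), value)
    end
    variables.merge!(scoped)
  end

  # Names outside the search path keep their fully qualified nesting.
  facts.each do |name, value|
    next if prefixes.any? { |prefix| name.start_with?(prefix) }
    nest(variables, name.split("."), value)
  end
  variables
end
```

Note that the merge is per top-level name, so a `nested` hash from a higher-priority namespace replaces a lower-priority one wholesale rather than being combined with it, and a short name such as `com` could collide with an unshortened one. Both are choices a real specification would need to pin down.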

> I think it would also make overrides and things a bit complicated - is a
> fact in com.puppetlabs a fact that we shipped the code for, or a fact about
> a Puppet Labs project?

IMO, it shouldn't matter a lick, although nothing prevents an arbitrary
declaration that 'core.' is a reserved namespace or so.

> Should a Solaris IP address have a different path than a Red Hat IP? What
> if you override the default IP fact with a custom one?

I would encourage qualification based on the *author* of the fact, not the
target of it — but I would also suggest that folks already do have this
problem.

($sd_virtual, ugly hack that I wrote to replace the previous ugly hack, I am
looking at you here ;)

> However, I can totally see naming the resolution mechanisms that way - we
> need a unique way of naming and finding resolutions, both for logging and
> testing, and this is probably a good option. That way we can track where
> the code is from (i.e., com.puppetlabs fact resolutions are part of the
> core).

*nod* Obviously the idea needs some polish, but I think it covers the common
use cases reasonably well and sanely, and without introducing too many new
sharp edges for developers to cut themselves on.

[...]

>> FWIW, I would be worried about any fact that was "refresh once per boot": an
>> awful lot of things can change dynamically, including hostname, memory
>> capacity, disk capacity, number of CPUs, and a bunch of other frequently
>> static things about a host.
>>
>> (In fact, much of the engineering we did was to give us more capacity to
>> dynamically change many of those aspects, and part of it done by renaming
>> hosts as we rebuild them on the fly. Gotta love emergencies.)
>
> I have the same worry, but we need some ability to store fact data over time.

*nod*

>> The only other capability that would be interesting would be to be able to
>> dynamically query facts from other nodes: at the moment we use mcollective to
>> query facter facts dynamically on a manual basis, but anything we do manually
>> is usually a pointer to something that we want to automate eventually...
>
> We'll be having a central inventory (fact) store very soon, which will give
> you this, but can you provide more about what you do with it?

Right now? Nothing. This is a "future" bit of work, not a right now bit of
work. I can tell you about the problem I have and why I think this is the
right solution for it, however:

My problem is that I have a whole bunch of virtual machines on the network,
and we now prefer virtual over physical services.

These live in a set of places: two storage servers, plus one execution node
that runs a KVM virtual machine, which then hosts Linux running OpenVZ to
provide the actual instances of the machine.

I want my Nagios monitoring to reflect that set of dependencies so that when a
KVM OpenVZ host, or the hardware under the KVM, goes down I only get the
lowest level alerts, and the rest stay silent.


Using mcollective I can hook into the virtualization system and trigger a
puppet run on the Nagios servers when these configurations change, so I have
that part sorted out.

What I want is for the Nagios host to query a fact about another machine:

Given host foo, find the host that contains it. (none, or bar)
Given host bar, find the host that contains it. (baz)
...and so forth. Ideally no more than three layers deep and all. :)

foo and bar can't tell me which machine they are hosted on, because they don't
have that visibility, but their host can tell me it contains them.
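
A minimal Ruby sketch of that lookup, assuming each host publishes a list of the machines it contains (a hypothetical "contains" fact; the function names are also invented). Inverting those lists gives the parent of every guest, and walking the parents gives the dependency chain Nagios needs.

```ruby
# contains_by_host: { "host" => ["guest", ...] }, as reported by each host.
# Returns { "guest" => "host" }.
def parent_map(contains_by_host)
  parents = {}
  contains_by_host.each do |host, guests|
    guests.each { |guest| parents[guest] = host }
  end
  parents
end

# Return the hosts that contain `host`, innermost first. Stops when a host
# has no known parent, or if inconsistent data would cause a loop.
def containment_chain(host, parents)
  chain = []
  seen = { host => true }
  while (host = parents[host]) && !seen[host]
    seen[host] = true
    chain << host
  end
  chain
end
```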


So, what I kind of need is to refresh the facts about the host when the
contained machine is moved, in a way that doesn't depend on prior knowledge
about which machines are involved.

(Theoretically I could trigger a puppet refresh on every OpenVZ host, or every
KVM host, and do it that way, but that ... costs.)


> How concerned are you about security?

Not enormously, because this is not security critical information. If it
makes a security difference that I publish this data to an attacker then
I have already lost, I just don't know it yet.

However, I can see an advantage to an ACL system for this sort of information
because some of it might actually be security critical — information about
memory or CPU use could be a side-channel for inferring crypto keys, for
example.

So, if I was developing this as part of puppet I would look to allow ACL
specification based on at least IP ranges vs regexp / glob fact name matches.
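
As a sketch of what such an ACL check might look like in Ruby, using the standard library's `IPAddr` for ranges and `File.fnmatch` for globs. The `AclRule` struct and the `rule`, `fact_matches?`, and `may_read?` helpers are hypothetical, and the default-deny behaviour is my assumption rather than anything decided in this thread.

```ruby
require "ipaddr"

# One rule: clients in `network` may read facts whose name matches `pattern`.
AclRule = Struct.new(:network, :pattern)

def rule(cidr, pattern)
  AclRule.new(IPAddr.new(cidr), pattern)
end

# pattern may be a Regexp or a shell-style glob string.
def fact_matches?(pattern, fact_name)
  if pattern.is_a?(Regexp)
    pattern.match?(fact_name)
  else
    File.fnmatch(pattern, fact_name)
  end
end

# Default deny: reading is allowed only if some rule matches both the
# client's address and the requested fact name.
def may_read?(rules, client_ip, fact_name)
  ip = IPAddr.new(client_ip)
  rules.any? do |r|
    r.network.include?(ip) && fact_matches?(r.pattern, fact_name)
  end
end
```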

> I've been really hesitant to build a system that allows anyone to read and
> write anyone else's facts.

Write would make me worry enormously, but reading them on demand is quite
attractive to me.

Daniel
