The road to Facter 2

Adrien Thebo

Dec 3, 2013, 5:27:35 PM
to puppe...@googlegroups.com

Structured fact support in Facter has often been compared to unicorns - long sought after, greatly desired, and nonexistent. There have been a number of proposed implementations but none have managed to make it into Facter - either the implementation was incomplete, changed too many things, or just didn’t have follow-through. Structured fact support has also been a hard requirement for Facter 2, which has meant that Facter 2 itself has been in limbo for a long time.


We’re hoping to change this state of affairs, and get a release of Facter 2 out that contains structured facts. In order to get something out in short order, the initial implementation will support simple data types like arrays, hashes, strings, and numbers, and will support basic fact composition. The idea is to get the core behavior released into the wild in Facter 2, and add more functionality in the Facter 2.x series*.


Normally the next major release of Facter would be based on master, but doing so would make things more complicated. The master and stable branches have been diverging for over a year, and master is 400 commits ahead of stable. The stable branch has only been receiving bugfixes while the master branch has been receiving features, but since master has been intended to be Facter 2 it’s been receiving breaking features as well as non-breaking features. We know the master branch will definitely break things, but we no longer know exactly what it’s going to break.


Because the state of master is so uncertain we’ve decided to base Facter 2 off of stable. Structured facts will be implemented, but to minimize the changes released in Facter 2 we won’t rewrite all of the core facts. The goal is to base Facter 2 off a known release state and minimize the number of possible breaking changes so that the upgrade from Facter 1 to Facter 2 is as painless as possible. Later in the 2.x series, once the initial release has landed and works, we intend to start writing new core facts that take advantage of structured data.


Basing Facter 2 off of stable means that we will need to do some reordering of the Facter branches. Our CI jobs currently test against the master and stable branches, so in order to test Facter 2 against the rest of our projects the master branch will need to be reset to the state of the stable branch, and the current state of master will be moved to a new name. I propose that we copy the master branch to a ‘next’ branch, and then revert the master branch to stable and start work on Facter 2 from there. This means that pull requests will be targeted at the right branch by default, our CI jobs will continue to behave as normal, and we can synchronize our branch strategy with Puppet and Hiera.


Since there are a number of commits in Facter master that we’ve promised will be in Facter 2, we’ll be backporting a number of changes from the next branch to master. We want to keep the number of backported changes limited to minimize the number of changes in Facter 2, but we realize that there are changes that many of you have been waiting on for a while. If there are specific commits that you really want to see in Facter 2, let us know and we can see about getting them backported. And of course, once Facter 2 is released we’ll be able to start putting out feature releases much more quickly. If things don’t make it into Facter 2 then we can see about getting features released in 2.1.


* Tentative roadmap for upcoming Facter releases:


  • 1.7.4
    • release date: mid-December
    • base: current stable
    • features: current stable + bugfixes for external facts (#22944, #22622)
  • 2.0.0
    • release date: mid-January
    • base: current stable
    • features: structured facts + short list of candidates from (current) master
  • 2.1.0
    • release date: TBD
    • base: future master
    • features: more facts from (current) master, select facts rewritten as structured facts


--
Adrien Thebo | Puppet Labs

Dustin J. Mitchell

Dec 3, 2013, 5:49:11 PM
to puppe...@googlegroups.com
I understand the rename of master -> next and stable -> master, and
the backporting of some features from next, but what's to become of
next in general? If a patch in that branch doesn't come to someone's
attention in 2.0.0 or 2.1.0, will it ever make it into a release?

I'm not terribly invested in the answer to this question, aside from
(I assume) blocking the decision on CFPropertyList, but the answer
isn't clear from your note.

Dustin

Adrien Thebo

Dec 3, 2013, 6:29:26 PM
to puppe...@googlegroups.com
Ah CFPropertyList, the gift that keeps on giving.

With respect to CFPropertyList, we're still planning on pulling it out for Facter 2. Rubygems supports platform specific gems, so we can build a gem for OSX that has the CFPropertyList gem, and not add that dependency on other platforms. On the .pkg front we're going to bundle CFPropertyList in at package build time; it doesn't exist at this very moment but we're doing our best to move in that direction. In addition I'm under the impression that CFPropertyList was actually included in OSX 10.9, and if that's the case then we can just depend on the OS packages.

With respect to the contents of the next branch, we haven't decided what we're ultimately going to do with it. The goal is to get Facter master into a releasable state now, and in the medium to long term incrementally transfer and release the work accumulated in next. We're not bound to just backporting patches to 2.0.0 and 2.1.0; as long as a given patch is backwards compatible we can merge it in the 2.x line. Backwards incompatible patches will have to wait until Facter 3 or be enabled with feature flags, but there's nothing preventing us from releasing that somewhat soon. With luck we'll hit the point where we've managed to move all the behavior from next into master and we can delete the branch. However, we have to kickstart things in some manner, and if the choice is between delaying the release of patches in master and waiting another year or so to release Facter 2, I would prefer the former.





--
You received this message because you are subscribed to the Google Groups "Puppet Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to puppet-dev+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/puppet-dev/CAJtE5vS6R%3D6BnW5%3D-3YB%3D%3D2pPr2ho21_4kuLoOUoWR66zwNT1g%40mail.gmail.com.
For more options, visit https://groups.google.com/groups/opt_out.

Erik Dalén

Dec 19, 2013, 5:56:08 AM
to Puppet Developers
Do you have any draft of what the structure will look like?

Will it be much different from the overall structure of ohai?


--
Erik Dalén

Adrien Thebo

Dec 19, 2013, 3:33:56 PM
to puppe...@googlegroups.com
Is this with respect to the structure of the generated data, or the structure of the Facter API itself? If it's the former, Facter will be able to return simple data types that should be readily serializable in different formats, so the output could be trivially serialized to JSON, YAML, msgpack, and similar. I have a pull request that implements this behavior at https://github.com/puppetlabs/facter/pull/575 .
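
To illustrate what "readily serializable" means here, the following is a minimal sketch using only Ruby's standard `json` and `yaml` libraries. The `SAMPLE_NETWORK_FACT` value is invented sample data, not actual Facter output, and the key names are assumptions made for the example.

```ruby
require 'json'
require 'yaml'

# Hypothetical structured fact value built only from hashes, arrays,
# strings, and numbers. Keys and addresses are made up for illustration.
SAMPLE_NETWORK_FACT = {
  'interfaces' => {
    'eth0' => { 'ipaddress' => '192.0.2.10', 'mtu' => 1500 },
    'lo'   => { 'ipaddress' => '127.0.0.1',  'mtu' => 65536 }
  },
  'nameservers' => ['192.0.2.1', '192.0.2.2']
}.freeze

# Because the value uses only simple types, the stock serializers can
# emit it without any custom encoding logic.
puts JSON.generate(SAMPLE_NETWORK_FACT)
puts YAML.dump(SAMPLE_NETWORK_FACT)
```

The same value can be parsed back from either format and compared with the original, which is the property that makes the choice of output format largely a presentation detail.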

With respect to the API, that's yet to be exactly nailed down. The `Facter.add` syntax will still be supported for compatibility, but we may add a few aliases and methods to make things cleaner. Things get murkier when you want to start composing multiple resolutions like what's described in https://tickets.puppetlabs.com/browse/FACT-65 but I have a few prototypes that make it possible by allowing resolutions to be extended. I've attached an example of that to this email that has generally gotten positive reviews.





--
Adrien Thebo | Puppet Labs
example.rb

Adrien Thebo

Jan 16, 2014, 2:01:03 PM
to puppe...@googlegroups.com
After getting into the actual implementation of Facter 2, it turns out the original branch structure that I proposed would be more work than it's worth. The original approach was to shift the 'master' branch into the 'next' branch and revert master to the current version of stable so that Facter could match the branching strategy that we use in our other projects. Upon attempting this, it turned out that doing so would create a colossal amount of churn in the code base and make it very unclear what was different between master and stable. While it would be nice to be consistent in our branch structure, it looks like we would be better off leaving master as-is.

Instead of renaming master to next and synchronizing master with stable, we've created a new branch for the Facter 2 release, namely 'facter-2'. New features for Facter should be targeted at this branch, and the Facter 2.0 final release will be based off this code. Changes to stable will be merged into facter-2, and facter-2 will be merged up into master to keep the different branches in sync.

In the long run we plan to incrementally backport commits from master to facter-2. When we've backported all the commits that we want to release in Facter 2, we'll revert any remaining changes in master that aren't in facter-2, and then merge facter-2 up into master.

Erik Dalén

Apr 2, 2014, 3:33:04 AM
to Puppet Developers
So, now that Facter 2.0 is released I'm wondering a bit how the rewrite of facts to structured versions will be done in a backwards compatible way? Will all of them get new names?
I guess putting interfaces as network: { interfaces: { eth0: ... } } like ohai does it would make sense instead of reusing the current name.

But then there are facts like is_virtual that return "true" and "false" instead of true/false.

I'm also wondering now when it is time to actually start writing it if you have any thoughts about some overall structure to it before it turns into a mess? For example ohai has some large top level hashes with a bunch of facts under each of them: languages, network, filesystem, counters, kernel & etc. We don't have to copy that structure exactly of course, but it seems to make sense, and apart from kernel none of those names clash with old facts.

--
Erik Dalén

Adrien Thebo

Apr 3, 2014, 1:32:04 PM
to puppe...@googlegroups.com
On Wed, Apr 2, 2014 at 12:33 AM, Erik Dalén <erik.gus...@gmail.com> wrote:

So, now that Facter 2.0 is released I'm wondering a bit how the rewrite of facts to structured versions will be done in a backwards compatible way? Will all of them get new names?

For cases where structured data was represented as dynamic/flat facts it should be straightforward to coalesce these into a single structured fact. For the EC2 facts I've converted the actual fact generation to a single structured fact (http://goo.gl/3tKjDE) which I then unpack into flat dynamic facts for backwards compatibility (http://goo.gl/IrwDIt). This pattern should be readily applicable to most of the other cases of converting dynamic facts to structured facts.
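
The unpacking step can be sketched roughly as follows. This is not the code behind the shortened links above; `flatten_fact` and the `EC2_METADATA_SAMPLE` value are hypothetical, and joining keys with underscores (with hyphens normalized to underscores) is an assumption made for the example.

```ruby
# Recursively walk a structured value and record one flat
# "prefix_key_subkey" entry per leaf, so that consumers of the old
# flat fact names keep working. Hypothetical helper, not Facter's API.
def flatten_fact(prefix, value, flat = {})
  case value
  when Hash
    value.each do |key, child|
      # Assumed naming rule: hyphens in keys become underscores.
      flatten_fact("#{prefix}_#{key.to_s.tr('-', '_')}", child, flat)
    end
  when Array
    value.each_with_index do |child, index|
      flatten_fact("#{prefix}_#{index}", child, flat)
    end
  else
    flat[prefix] = value
  end
  flat
end

# Made-up sample data standing in for a single structured fact.
EC2_METADATA_SAMPLE = {
  'ami-id' => 'ami-00000000',
  'placement' => { 'availability-zone' => 'us-west-2a' },
  'security-groups' => ['default', 'web']
}.freeze

flatten_fact('ec2', EC2_METADATA_SAMPLE).each do |name, value|
  puts "#{name} => #{value}"
end
```

The structured value stays the single source of truth, and the flat names are derived from it, so the two representations cannot drift apart.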
 
I guess putting interfaces as network: { interfaces: { eth0: ... } } like ohai does it would make sense instead of reusing the current name.

But then there are facts like is_virtual that return "true" and "false" instead of true/false.

And this is where things get really fun.

I'm interested in exploring an approach where facts can be versioned to allow rewritten facts to coexist safely. The version of a specific fact, or of all facts, could be determined at runtime so that we can start implementing backwards incompatible changes without breaking APIs. For instance we can implement a new version of is_virtual that returns booleans, and version it such that it's non-default in Facter 2 but can be enabled, and in Facter 3 it is the default but users can opt for the old behavior.
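
As a purely illustrative sketch of that idea (no such API exists in Facter; the `VersionedFacts` class and its `add` and `value` methods are hypothetical), versioned resolutions for one fact name might coexist like this:

```ruby
# Hypothetical registry in which each fact name maps version numbers
# to resolver blocks. This only illustrates the proposal above.
class VersionedFacts
  def initialize
    @resolvers = Hash.new { |hash, name| hash[name] = {} }
    @defaults  = {}
  end

  # Register one implementation of a fact under an explicit version,
  # optionally marking it as the default for that fact.
  def add(name, version:, default: false, &block)
    @resolvers[name][version] = block
    @defaults[name] = version if default
  end

  # Resolve a fact using the requested version when one is given and
  # the registered default otherwise.
  def value(name, version: nil)
    chosen = version || @defaults.fetch(name)
    @resolvers.fetch(name).fetch(chosen).call
  end
end

facts = VersionedFacts.new
facts.add(:is_virtual, version: 1, default: true) { 'false' } # legacy string
facts.add(:is_virtual, version: 2) { false }                  # real boolean

p facts.value(:is_virtual)             # default: the legacy string form
p facts.value(:is_virtual, version: 2) # opt-in: the boolean form
```

Switching the default in a later major release would then be a one-line change in the registration, while callers that pin a version keep the behavior they asked for.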

When working on Facter 2.0.1 we realized that solving this would be something that required some serious time and thought, and so we opted to defer on it for the sake of being able to ship _something_. Because of this we don't yet have a lot of concrete strategies for addressing this. If anyone has more solutions for how we can manage these sorts of challenges I'm all ears. :)
 
I'm also wondering now when it is time to actually start writing it if you have any thoughts about some overall structure to it before it turns into a mess? For example ohai has some large top level hashes with a bunch of facts under each of them: languages, network, filesystem, counters, kernel & etc. We don't have to copy that structure exactly of course, but it seems to make sense, and apart from kernel none of those names clash with old facts.

I agree that it would be good to have better namespacing in Facter, and now is the time to start creating these.

There are a couple of approaches here.

First off, as part of Facter 2 there is a new way of resolving facts in a piece-wise manner, called aggregate resolutions (https://github.com/puppetlabs/facter/blob/master/lib/facter/core/aggregate.rb). This allows a single fact resolution to be built up with multiple blocks that are evaluated and then combined. This is specifically built for things like network information where you may have a very deeply nested structure of interface facts, available networking protocols, and whatnot. Moreover these facts can be extended by custom facts without having to clobber or override existing resolution implementations. The docs at http://docs.puppetlabs.com/facter/2.0/fact_overview.html detail the new functionality that's available.
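
The idea of building one value from several independently evaluated blocks can be sketched without Facter itself. The `ChunkedFact` class below is a hypothetical stand-in, not the implementation in aggregate.rb; its merge rules (nested hashes merged key by key, arrays concatenated, later chunk wins otherwise) are simplifying assumptions for this example.

```ruby
# Stand-in for the concept of a fact assembled from named chunks.
class ChunkedFact
  def initialize
    @chunks = {}
  end

  # Register a named block that produces one piece of the final value.
  def chunk(name, &block)
    @chunks[name] = block
    self
  end

  # Evaluate every chunk and deep-merge the resulting hashes.
  def value
    @chunks.values.map(&:call).reduce({}) { |merged, part| deep_merge(merged, part) }
  end

  private

  # Merge nested hashes key by key and concatenate arrays; for any other
  # conflicting pair the later chunk wins (a simplification in this sketch).
  def deep_merge(left, right)
    left.merge(right) do |_key, old, new|
      if old.is_a?(Hash) && new.is_a?(Hash)
        deep_merge(old, new)
      elsif old.is_a?(Array) && new.is_a?(Array)
        old + new
      else
        new
      end
    end
  end
end

networking = ChunkedFact.new
networking.chunk(:addresses) { { 'eth0' => { 'ipaddress' => '192.0.2.10' } } }
# A custom fact could contribute its own chunk without replacing the first.
networking.chunk(:mtu) { { 'eth0' => { 'mtu' => 1500 } } }

p networking.value
```

Each chunk only needs to know about its own slice of the structure, which is what lets additional chunks extend a fact without overriding the existing ones.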

This was my best effort to solve the problem of namespacing and building up complex structures, but I'm still waiting to see how useful it ultimately is. Depending on how it's used there may be some modifications to make aggregate resolutions easier to understand and create.

The other solution for this problem is to introduce the concept of namespaces into Facter. This has been discussed in the past and there have been some implementations created, but nothing has been merged into core. It is a potential option, and depending on how things go we may implement this to solve the described problem.

But considering the number of unknowns we have in Facter 2, I think that right now we should wait and see how the new features in Facter 2 are used. Facter 2 has been stalled for so long because we tried to solve all problems at once and got stuck aiming for perfection. This time I want to see how the new functionality works, how people want to solve fact compatibility issues, and so forth.

Eric Sorenson

Jun 23, 2014, 7:42:51 PM
to puppe...@googlegroups.com
o hai :) Engaging in a little zombie thread resurrection here as I've been thinking about structured facts a lot lately and this thread touched on a big concern without any specific resolution.

On Thursday, April 3, 2014 10:32:04 AM UTC-7, Adrien Thebo wrote:
On Wed, Apr 2, 2014 at 12:33 AM, Erik Dalén wrote:
I'm also wondering now when it is time to actually start writing it if you have any thoughts about some overall structure to it before it turns into a mess? For example ohai has some large top level hashes with a bunch of facts under each of them: languages, network, filesystem, counters, kernel & etc. We don't have to copy that structure exactly of course, but it seems to make sense, and apart from kernel none of those names clash with old facts.

I agree that it would be good to have better namespacing in Facter, and now is the time to start creating these.

There are a couple of approaches here.

First off, as part of Facter 2 there is a new way of resolving facts in a piece-wise manner, called aggregate resolutions (https://github.com/puppetlabs/facter/blob/master/lib/facter/core/aggregate.rb). This allows a single fact resolution to be built up with multiple blocks that are evaluated and then combined. This is specifically built for things like network information where you may have a very deeply nested structure of interface facts, available networking protocols, and whatnot. Moreover these facts can be extended by custom facts without having to clobber or override existing resolution implementations. The docs at http://docs.puppetlabs.com/facter/2.0/fact_overview.html detail the new functionality that's available.

This was my best effort to solve the problem of namespacing and building up complex structures, but I'm still waiting to see how useful it ultimately is. Depending on how it's used there may be some modifications to make aggregate resolutions easier to understand and create.

The other solution for this problem is to introduce the concept of namespaces into Facter. This has been discussed in the past and there have been some implementations created, but nothing has been merged into core. It is a potential option, and depending on how things go we may implement this to solve the described problem.

I'm not quite sure what 'namespaces' mean here, could you explain that a little more?

I'm +1 for Erik's suggestion above, especially now that I've had an opportunity to play with Ohai a bit more. I think we can adopt the core top-level keys and document the schema for the structures underneath them as "curated" core facts whose implementation can be extended into more OSes over time. (To be clear, I'm not suggesting facter directly enforce the schema, just that we describe it so people know where to use 'keys' vs 'each', and what to expect inside the structures.)

Looking at ohai output on a centos6 vm ( https://gist.github.com/ahpook/4f8a5782de64fa2b0768 ), there are a number of nested fact structures that facter has some overlap with:

* network, w/ an interfaces hash and 'default_interface' / 'default_gateway' keys
* kernel, with a few keys for uname info
* memory, looks like it's parsed straight from /proc/meminfo
* virtualization
* cpu, w/ /proc/cpuinfo
* keys, containing the hosts ssh pub keys

and I would be all for just mimicking the substructure of those unless there's some strong reason not to. Introducing the structured facts at new non-colliding names would be a way to get them in during 2.x feature releases without harming backwards compatibility.

--eric0

Kylo Ginsberg

Jun 30, 2014, 2:09:07 PM
to puppe...@googlegroups.com
On Mon, Jun 23, 2014 at 4:42 PM, Eric Sorenson <eric.s...@puppetlabs.com> wrote:
On Thursday, April 3, 2014 10:32:04 AM UTC-7, Adrien Thebo wrote:
On Wed, Apr 2, 2014 at 12:33 AM, Erik Dalén wrote:

I'm also wondering now when it is time to actually start writing it if you have any thoughts about some overall structure to it before it turns into a mess? For example ohai has some large top level hashes with a bunch of facts under each of them: languages, network, filesystem, counters, kernel & etc. We don't have to copy that structure exactly of course, but it seems to make sense, and apart from kernel none of those names clash with old facts.

Looking at ohai output on a centos6 vm ( https://gist.github.com/ahpook/4f8a5782de64fa2b0768 ), there are a number of nested fact structures that facter has some overlap with:

* network, w/ an interfaces hash and 'default_interface' / 'default_gateway' keys
* kernel, with a few keys for uname info
* memory, looks like it's parsed straight from /proc/meminfo
* virtualization
* cpu, w/ /proc/cpuinfo
* keys, containing the hosts ssh pub keys

and I would be all for just mimicking the substructure of those unless there's some strong reason not to. Introducing the structured facts at new non-colliding names would be a way to get them in during 2.x feature releases without harming backwards compatibility.

I'm plus one on Erik/Eric's comments above also. So assuming we go with the non-colliding top-level names:
* network
* filesystem
* virtualization
* memory
* cpu
* keys

What should we do with the colliding ones? The two that come to mind are:
* kernel
* uptime

In the gist above, ohai doesn't have a structured 'uptime' - it has a top-level 'uptime' and 'uptime_seconds' (and then it also has a top-level 'idletime' and 'idletime_seconds'). Seems like it would be nice to structure that information though. We could go with 'uptime_hash' but that seems awkward, and then in a year or two we'll be explaining why some facts have "_hash" in their name :> Alternately we could go with something generic like 'time' but that might be so generic as to be meaningless.

And then 'kernel' (which is structured in the gist above). We could go with 'os' as a top-level structured name ('os' in the gist above is a flat fact, but that's not a fact name in facter currently). Or ...

Comments? Naming is hard :)

Kylo

Or there may be natural-enough names like, e.g. these might just be intuitive enough without colliding:
* virtualization
* processors or cpu

But my creativity is failing me for, say, "kernel" and "uptime". Naming is hard ;)

Comments?

Kylo