Solaris processorcount fact - what to do

44 views
Skip to first unread message

Alex Harvey

unread,
Dec 10, 2012, 11:44:40 PM12/10/12
to puppe...@googlegroups.com
Hi all,

This follows discussion on puppet users with Andrew Beresford
https://groups.google.com/forum/?fromgroups=&pli=1#!topic/puppet-users/rOj9OszhlQM

I was hoping to get more input from the community; now trying puppet developers for some advice.

Summary of discussion is we've found the processorcount fact counts CPU threads on linux but CPU cores on Solaris.  The behaviour on Solaris violates the principle of least surprise for my colleagues and I.  Our first reaction was it is a bug.  Question is what we can do about it now that these facts are out there in the wild.

Andrew Beresford (who found conversely that the linux behaviour was more surprising to him) suggested we create new facts - processorcorecount and processorthreadcount and then deprecate processorcount.  I'm a bit agnostic on the matter.  Certainly I can't volunteer to implement the new facts on all platforms for lack of both time and access to all the platforms to test on.

Another option presumably is simply to change the Solaris processorcount fact to do the same as what it does on linux.  This could arguably be described as either a bugfix or a non-backwards compatible change depending on your perspective.  Certainly, having the fact count different things on Solaris & Linux appears to be an error.

Anyhow would be good to get the advice of Puppet Labs on this one.

Best regards,
Alex Harvey

Josh Cooper

unread,
Dec 11, 2012, 2:20:54 AM12/11/12
to puppe...@googlegroups.com
Hi Alex,

On Mon, Dec 10, 2012 at 8:44 PM, Alex Harvey <alexh...@gmail.com> wrote:
> Hi all,
>
> This follows discussion on puppet users with Andrew Beresford
> https://groups.google.com/forum/?fromgroups=&pli=1#!topic/puppet-users/rOj9OszhlQM
>
> I was hoping to get more input from the community; now trying puppet
> developers for some advice.
>
> Summary of discussion is we've found the processorcount fact counts CPU
> threads on linux but CPU cores on Solaris. The behavior on Solaris
> violates the principle of least surprise for my colleagues and I. Our first
> reaction was it is a bug. Question is what we can do about it now that
> these facts are out there in the wild.

I feel your pain having done this recently on Windows[1]. On Windows,
processorcount counts logical CPUs (which may include multicore or
hyperthreaded CPUs), and physicalprocessorcount counts physical CPUs
(not cores).

> Andrew Beresford (who found conversely that the linux behavior was more
> surprising to him) suggested we create new facts - processorcorecount and
> processorthreadcount and then deprecate processorcount. I'm a bit agnostic
> on the matter. Certainly I can't volunteer to implement the new facts on
> all platforms for lack of both time and access to all the platforms to test
> on.

I agree that creating a new fact to count CPU cores is preferrable
I'm not sure we need to deprecate processorcount, since it's already
established to mean processorthreadcount, at least on Linux, but
Solaris users may disagree.

>
> Another option presumably is simply to change the Solaris processorcount
> fact to do the same as what it does on linux. This could arguably be
> described as either a bugfix or a non-backwards compatible change depending
> on your perspective. Certainly, having the fact count different things on
> Solaris & Linux appears to be an error.
>

In cases were I find a Windows fact is wrong, e.g. architecture[2],
domain[3], I've taken the route that it's better to fix the bug and
make it consistent with other platforms, than to not do so for fear of
making a backwards incompatible change. With that said, it's good to
think about the impact changing a fact's semantics will have, and
whether the benefit outweighs the pain. In the case of the domain
fact, Windows nodes with a primary dns suffix, but no
connection-specific suffixes were reporting an empty domain. Clearly
fixing the fact was the right thing to do, but it meant users had to
regenerate their windows agent certificates (due to the default
certname being based on the fqdn).

> Anyhow would be good to get the advice of Puppet Labs on this one.
>
> Best regards,
> Alex Harvey
>
> --
> You received this message because you are subscribed to the Google Groups
> "Puppet Developers" group.
> To view this discussion on the web visit
> https://groups.google.com/d/msg/puppet-dev/-/GZEBOO_xeoMJ.
> To post to this group, send email to puppe...@googlegroups.com.
> To unsubscribe from this group, send email to
> puppet-dev+...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/puppet-dev?hl=en.

Josh

[1] http://projects.puppetlabs.com/issues/8439#note-2
[2] http://projects.puppetlabs.com/issues/10261
[3] http://projects.puppetlabs.com/issues/12864

--
Josh Cooper
Developer, Puppet Labs

Alex Harvey

unread,
Dec 16, 2012, 11:53:28 PM12/16/12
to puppe...@googlegroups.com


On Tuesday, December 11, 2012 6:20:54 PM UTC+11, Josh Cooper wrote:
In cases were I find a Windows fact is wrong, e.g. architecture[2],
domain[3], I've taken the route that it's better to fix the bug and
make it consistent with other platforms, than to not do so for fear of
making a backwards incompatible change. With that said, it's good to
think about the impact changing a fact's semantics will have, and
whether the benefit outweighs the pain. In the case of the domain
fact, Windows nodes with a primary dns suffix, but no
connection-specific suffixes were reporting an empty domain. Clearly
fixing the fact was the right thing to do, but it meant users had to
regenerate their windows agent certificates (due to the default
certname being based on the fqdn).

It seems to me that if we agree that we need to fix this eventually then the sooner it's fixed, the less the chance that new manifests will be written using the old, broken behaviour.  That said I'm not the one who faces the Solaris customers when they upgrade facter and potentially find their manifests no longer work.  So with that in mind ... shall I proceed to raise a bug and 'fix' this?

James Polley

unread,
Dec 17, 2012, 12:44:23 AM12/17/12
to puppe...@googlegroups.com
Because this change would be backwards-incompatible, SemVer requires that version with the fix increments the major version number, so this would have to go into 2.0.0.

--
You received this message because you are subscribed to the Google Groups "Puppet Developers" group.
To view this discussion on the web visit https://groups.google.com/d/msg/puppet-dev/-/Yuv6GOFPnqYJ.

Alex Harvey

unread,
Dec 17, 2012, 1:51:57 AM12/17/12
to puppe...@googlegroups.com


On Monday, December 17, 2012 4:44:23 PM UTC+11, James Polley wrote:

Because this change would be backwards-incompatible, SemVer requires that version with the fix increments the major version number, so this would have to go into 2.0.0.

I guess that takes us back to what was discussed earlier in the thread - do we consider this a bugfix or a backwards incompatible change?  Personally, I regard it as a bug - the meaning of the fact is defined in linux and other facts as a count of logical CPUs.  On Solaris it is counting CPU cores, apparently in error.  That said, treating it as a backwards incompatible change makes a certain amount of sense - and happy to proceed this way if that's what Puppet Labs wants.

James Polley

unread,
Dec 17, 2012, 1:55:30 AM12/17/12
to puppe...@googlegroups.com
On Mon, Dec 17, 2012 at 5:51 PM, Alex Harvey <alexh...@gmail.com> wrote:


On Monday, December 17, 2012 4:44:23 PM UTC+11, James Polley wrote:

Because this change would be backwards-incompatible, SemVer requires that version with the fix increments the major version number, so this would have to go into 2.0.0.

I guess that takes us back to what was discussed earlier in the thread - do we consider this a bugfix or a backwards incompatible change? 

I don't think so.

http://semver.org/ is quite specific about this: you can increment the patch version "if only backwards compatible bug fixes are introduced", the minor version "if new, backwards compatible functionality is introduced to the public API" - but "if any backwards incompatible changes are introduced to the public API", "Major version X (X.y.z | X > 0) MUST be incremented"

My reading is that, even if this is just a bugfix, the fact that it's backwards-incompatible requires a major version bump.

Personally, I regard it as a bug - the meaning of the fact is defined in linux and other facts as a count of logical CPUs.  On Solaris it is counting CPU cores, apparently in error.  That said, treating it as a backwards incompatible change makes a certain amount of sense - and happy to proceed this way if that's what Puppet Labs wants.

--
You received this message because you are subscribed to the Google Groups "Puppet Developers" group.
To view this discussion on the web visit https://groups.google.com/d/msg/puppet-dev/-/fI5euuMQG6UJ.

Alex Harvey

unread,
Dec 17, 2012, 2:14:13 AM12/17/12
to puppe...@googlegroups.com


On Monday, December 17, 2012 5:55:30 PM UTC+11, James Polley wrote:

I don't think so.

http://semver.org/ is quite specific about this: you can increment the patch version "if only backwards compatible bug fixes are introduced", the minor version "if new, backwards compatible functionality is introduced to the public API" - but "if any backwards incompatible changes are introduced to the public API", "Major version X (X.y.z | X > 0) MUST be incremented"

My reading is that, even if this is just a bugfix, the fact that it's backwards-incompatible requires a major version bump.

Well, without a documented API you could argue that all bugfixes are backwards-incompatible changes.  A backwards incompatible change is a change to the public API, as you've noted.  In this case the fact in question is apparently supposed to count logical CPUs whereas in reality it counts CPU cores on Solaris.

Is there an actual documented API to refer to?  If not, I don't think SemVer really helps us here - it becomes a matter of opinion as to what the implied API is.  In my opinion, the implied API is that processorcount counts logical CPUs and currently gets it wrong on Solaris.

All said, however, whatever SemVer says I think for purely practical reasons perhaps we should treat this as a backwards incompatible change. :)

Jeff McCune

unread,
Dec 17, 2012, 12:24:38 PM12/17/12
to puppe...@googlegroups.com
This is a great discussion, please keep it up!

On Sunday, December 16, 2012, Alex Harvey wrote:


On Monday, December 17, 2012 5:55:30 PM UTC+11, James Polley wrote:

My reading is that, even if this is just a bugfix, the fact that it's backwards-incompatible requires a major version bump.

Well, without a documented API you could argue that all bugfixes are backwards-incompatible changes.  A backwards incompatible change is a change to the public API, as you've noted.  In this case the fact in question is apparently supposed to count logical CPUs whereas in reality it counts CPU cores on Solaris.

This is true, but even without a documented API we can make changes that we have a high degree of confidence are backwards compatible.  I struggle with this issue all the time and simply try and make a best effort to minimize the disruption to the end user who upgrades.  Often this is the best we can do.

For our part, we're currently working to document as much of the public Facter API as possible. Please see Patrick's pull request from last week.

 
Is there an actual documented API to refer to?  If not, I don't think SemVer really helps us here - it becomes a matter of opinion as to what the implied API is.  In my opinion, the implied API is that processorcount counts logical CPUs and currently gets it wrong on Solaris.

SemVer is still useful even without a public API. it's just a lot more difficult to make decisions like this one.

All said, however, whatever SemVer says I think for purely practical reasons perhaps we should treat this as a backwards incompatible change. :) 
 
In this case I think we must release this in Facter 2 or figure out a way to make it backwards compatible.

-Jeff

Andy Parker

unread,
Dec 17, 2012, 1:46:48 PM12/17/12
to puppe...@googlegroups.com
On Mon, Dec 17, 2012 at 9:24 AM, Jeff McCune <je...@puppetlabs.com> wrote:
This is a great discussion, please keep it up!

On Sunday, December 16, 2012, Alex Harvey wrote:


On Monday, December 17, 2012 5:55:30 PM UTC+11, James Polley wrote:

My reading is that, even if this is just a bugfix, the fact that it's backwards-incompatible requires a major version bump.

Well, without a documented API you could argue that all bugfixes are backwards-incompatible changes.  A backwards incompatible change is a change to the public API, as you've noted.  In this case the fact in question is apparently supposed to count logical CPUs whereas in reality it counts CPU cores on Solaris.

This is true, but even without a documented API we can make changes that we have a high degree of confidence are backwards compatible.  I struggle with this issue all the time and simply try and make a best effort to minimize the disruption to the end user who upgrades.  Often this is the best we can do.

For our part, we're currently working to document as much of the public Facter API as possible. Please see Patrick's pull request from last week.


The documentation so far is for the programming interface, but not what the facts are supposed to be. The docs.puppetlabs.com site tries to say what all of the facts are, but it ends up being pretty vague in a lot of cases (http://docs.puppetlabs.com/facter/1.6/core_facts.html).

 
Is there an actual documented API to refer to?  If not, I don't think SemVer really helps us here - it becomes a matter of opinion as to what the implied API is.  In my opinion, the implied API is that processorcount counts logical CPUs and currently gets it wrong on Solaris.

SemVer is still useful even without a public API. it's just a lot more difficult to make decisions like this one.

All said, however, whatever SemVer says I think for purely practical reasons perhaps we should treat this as a backwards incompatible change. :) 
 
In this case I think we must release this in Facter 2 or figure out a way to make it backwards compatible.

Given the vagueness of the 1.6 facts, I think we should aim to resolve this in 2.0, otherwise we'll just keep having this kind of conversation again and again. Does anyone want to take a stab at putting together clear, concise definitions of the core facts that would resolve these kinds of questions?

There is the other question of guiding Facter to a 2.0 release. We just haven't had enough time to devote to it to make sure that it would be a release that truly advances us, but if anyone wants to give it a shot, speak up.
 

-Jeff

--
You received this message because you are subscribed to the Google Groups "Puppet Developers" group.

Alex Harvey

unread,
Dec 19, 2012, 2:52:56 AM12/19/12
to puppe...@googlegroups.com
Unfortunately I can't volunteer to document all these facts (I'd struggle to justify that one to my manager and/or wife :) but I've raised a bug with target version set to 2.0 and will submit a patch at some point-
https://projects.puppetlabs.com/issues/18215
Reply all
Reply to author
Forward
0 new messages