Trouble with Facter

474 views
Skip to first unread message

Jonathan Gazeley

unread,
Sep 16, 2014, 5:44:32 AM9/16/14
to puppet...@googlegroups.com
Overnight my Facter seems to reporting new errors. As far as I can see,
neither Puppet, Facter, MCollective or the facts in question were
updated, so I'm struggling to find a cause. The problem as exacerbated
because MCollective is emailing me with the same error every 15 minutes.

This error is printed when I run facter:

Error loading fact /var/lib/puppet/lib/facter/warranty.rb: no such file
to load -- facter/util/warranty

The fact is provided by this module (my module, but not my code)

https://forge.puppetlabs.com/jgazeley/dell

The file /var/lib/puppet/lib/facter/warranty.rb does exist on my system
and is readable as is the referenced submodule in
/var/lib/puppet/lib/facter/util/warranty.rb , so I am not sure what is
causing the problem.

I'm using Puppet 3.7.1, Facter 2.2.0, MCollective 2.6.0 and
jgazeley/dell 0.1.2.

Any advice to sort this out?

Thanks,
Jonathan

jcbollinger

unread,
Sep 16, 2014, 9:30:50 AM9/16/14
to puppet...@googlegroups.com


On Tuesday, September 16, 2014 4:44:32 AM UTC-5, Jonathan Gazeley wrote:
Overnight my Facter seems to reporting new errors. As far as I can see,
neither Puppet, Facter, MCollective or the facts in question were
updated, so I'm struggling to find a cause. The problem as exacerbated
because MCollective is emailing me with the same error every 15 minutes.

This error is printed when I run facter:

Error loading fact /var/lib/puppet/lib/facter/warranty.rb: no such file
to load -- facter/util/warranty

The fact is provided by this module (my module, but not my code)

https://forge.puppetlabs.com/jgazeley/dell

The file /var/lib/puppet/lib/facter/warranty.rb does exist on my system
and is readable as is the referenced submodule in
/var/lib/puppet/lib/facter/util/warranty.rb , so I am not sure what is
causing the problem.



My first guess would be a load path problem.  If you're willing to limit your module to Ruby 1.9+, then you could try replacing the offending 'require' call with

    require_relative 'util/warranty'

Even if you don't want that Ruby version constraint, it might be illuminating to see whether 'require_relative' in fact does solve the problem.

For testing only, you could also consider printing the $LOAD_PATH just before executing the 'require'.

And speaking of Ruby version, are you certain that didn't change on you?  Clearly something did.


John

jcbollinger

unread,
Sep 16, 2014, 9:40:14 AM9/16/14
to puppet...@googlegroups.com


On Tuesday, September 16, 2014 4:44:32 AM UTC-5, Jonathan Gazeley wrote:
Overnight my Facter seems to reporting new errors. As far as I can see,
neither Puppet, Facter, MCollective or the facts in question were
updated, so I'm struggling to find a cause. The problem as exacerbated
because MCollective is emailing me with the same error every 15 minutes.

This error is printed when I run facter:

Error loading fact /var/lib/puppet/lib/facter/warranty.rb: no such file
to load -- facter/util/warranty

The fact is provided by this module (my module, but not my code)

https://forge.puppetlabs.com/jgazeley/dell

The file /var/lib/puppet/lib/facter/warranty.rb does exist on my system
and is readable as is the referenced submodule in
/var/lib/puppet/lib/facter/util/warranty.rb , so I am not sure what is
causing the problem.


Alternatively, you may want to check more thoroughly whether /var/lib/puppet/lib/facter/util/warranty.rb is in fact readable to Puppet in the relevant context.  In addition to the ordinary permission bits, it is possible that POSIX ACLs or mandatory access controls (i.e. SELinux) might prevent access to the file from the Puppet agent.  For SELinux, it could be that even though the relevant user (probably root) can read the file under other circumstances, the agent still cannot.  And switching SELinux from permissive to enforcing mode is something that can be done quickly and easily without rebooting, perhaps even automatically, and not necessarily with full comprehension of its effects.


John

Jonathan Gazeley

unread,
Sep 16, 2014, 10:06:34 AM9/16/14
to puppet...@googlegroups.com
Thanks for your detailed response. We're running Ruby 1.8.7, the standard packaged with CentOS 6. We're unwilling/unable to switch to a newer Ruby from another source, especially as we're a university and the freshers are arriving this weekend! The penalty for breaking IT systems in the run-up to the start of term isn't worth thinking about ;)

Permissions on the files in question and their parent directories shows that the files should be world-readable. We're not using POSIX ACLs. We are experienced users of SELinux and I can also confirm that the audit log shows nothing of relevance. Running in Permissive mode doesn't help. The facts that my module installs have the same SELinux contexts as other facts.

I checked over the yum.log and nothing was updated at the time that these symptoms started happening on 3 servers at the same time. Nothing was changed in the puppet.log, and nobody updated any Forge modules etc. The 3 servers are all Dell of the same hardware generation. Other Dell servers in our estate were not affected.

Attempting to use require_relative with Ruby 1.8.7 yields "undefined method `require_relative' for main:Object"

Printing the $LOAD_PATH does not list /var/lib/puppet/lib/facter/ when executing facter or facter -p as my user. It does not print any paths when I execute facter as root.

I don't really know my way around Facter and Ruby so I'm sort of shooting in the dark here trying to understand how it *should* work, and then working out why mine is different.

Thanks,
Jonathan

Trevor Vaughan

unread,
Sep 16, 2014, 3:01:18 PM9/16/14
to puppet...@googlegroups.com
I had a problem similar to this when an error was introduced to my puppet master's served code base.

To troubleshoot I did the following:

1) Restart the puppetmaster process (passenger, whatever you're using)
2) Remove everything under /var/lib/puppet/lib/facter/* on the clients that are having the issue.
3) Re-run the puppet agent on the affected node

One of two things should happen, either your facts will be re-sync'd and everything will be fine or an error will be thrown noting that you have an error somewhere else.

If this *doesn't* happen, check the server logs and see if there's something more insidious happening.

Good luck,

Trevor



--
You received this message because you are subscribed to the Google Groups "Puppet Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to puppet-users+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/puppet-users/54180673.5030003%40bristol.ac.uk.
For more options, visit https://groups.google.com/d/optout.



--
Trevor Vaughan
Vice President, Onyx Point, Inc
(410) 541-6699
tvau...@onyxpoint.com

-- This account not approved for unencrypted proprietary information --

Jonathan Gazeley

unread,
Sep 16, 2014, 4:01:10 PM9/16/14
to puppet...@googlegroups.com
On 16/09/14 20:00, Trevor Vaughan wrote:
> I had a problem similar to this when an error was introduced to my
> puppet master's served code base.
>
> To troubleshoot I did the following:
>
> 1) Restart the puppetmaster process (passenger, whatever you're using)
> 2) Remove everything under /var/lib/puppet/lib/facter/* on the clients
> that are having the issue.
> 3) Re-run the puppet agent on the affected node
>
> One of two things should happen, either your facts will be re-sync'd
> and everything will be fine or an error will be thrown noting that you
> have an error somewhere else.
>
> If this *doesn't* happen, check the server logs and see if there's
> something more insidious happening.
>
> Good luck,
>
> Trevor

Thanks for your advice. I'll try this when I'm back at work in the
morning. For now, allow me to leave you with this video, which seems to
summarise your advice ;)

https://www.youtube.com/watch?v=kb2gzteVNa4

Cheers,
Jonathan

Trevor Vaughan

unread,
Sep 16, 2014, 4:14:56 PM9/16/14
to puppet...@googlegroups.com
*slow clap*

Well played sir, well played.
Trevor


Jonathan

--
You received this message because you are subscribed to the Google Groups "Puppet Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to puppet-users+unsubscribe@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Jonathan Gazeley

unread,
Sep 17, 2014, 9:14:33 AM9/17/14
to puppet...@googlegroups.com
On 16/09/14 21:14, Trevor Vaughan wrote:
*slow clap*

Well played sir, well played.
Trevor

On Tue, Sep 16, 2014 at 4:00 PM, Jonathan Gazeley <Jonathan...@bristol.ac.uk> wrote:
On 16/09/14 20:00, Trevor Vaughan wrote:
I had a problem similar to this when an error was introduced to my puppet master's served code base.

To troubleshoot I did the following:

1) Restart the puppetmaster process (passenger, whatever you're using)
2) Remove everything under /var/lib/puppet/lib/facter/* on the clients that are having the issue.
3) Re-run the puppet agent on the affected node

One of two things should happen, either your facts will be re-sync'd and everything will be fine or an error will be thrown noting that you have an error somewhere else.

If this *doesn't* happen, check the server logs and see if there's something more insidious happening.

Good luck,

Trevor

Thanks for your advice. I'll try this when I'm back at work in the morning. For now, allow me to leave you with this video, which seems to summarise your advice ;)

https://www.youtube.com/watch?v=kb2gzteVNa4

Cheers,
Jonathan

--


Well, I tried your suggestion. Puppet put back all the same facts, exactly as they were and the problem is not solved :(

Anyone have any other suggestions on how to debug facts? Otherwise I'm more-or-less resigned to just remove the offending facts. We don't really use them anyway.

The crux of the issue seems to be that /var/lib/puppet/lib/facter/warranty.rb wants to load
/var/lib/puppet/lib/facter/util/warranty.rb and is unable to. I don't know much Ruby, but would it be sensible to consider merging the two files into one script to avoid the process of loading?

Thanks,
Jonathan

Trevor Vaughan

unread,
Sep 17, 2014, 12:30:06 PM9/17/14
to puppet...@googlegroups.com
Ah, so looking at warranty.rb and then stdlib...puppet_vardir.rb, you may be missing this:

begin
  require 'facter/util/puppet_settings'
rescue LoadError => e
  # puppet apply does not add module lib directories to the $LOAD_PATH (See
  # #4248). It should (in the future) but for the time being we need to be
  # defensive which is what this rescue block is doing.
  rb_file = File.join(File.dirname(__FILE__), 'util', 'puppet_settings.rb')
  load rb_file if File.exists?(rb_file) or raise e
end


Trevor

--
You received this message because you are subscribed to the Google Groups "Puppet Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to puppet-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/puppet-users/5419891F.60301%40bristol.ac.uk.

For more options, visit https://groups.google.com/d/optout.

Jonathan Gazeley

unread,
Sep 18, 2014, 5:36:16 AM9/18/14
to puppet...@googlegroups.com
On 17/09/14 17:29, Trevor Vaughan wrote:
> Ah, so looking at warranty.rb and then stdlib...puppet_vardir.rb, you
> may be missing this:
>
> begin
> require 'facter/util/puppet_settings'
> rescue LoadError => e
> # puppet apply does not add module lib directories to the $LOAD_PATH
> (See
> # #4248). It should (in the future) but for the time being we need to be
> # defensive which is what this rescue block is doing.
> rb_file = File.join(File.dirname(__FILE__), 'util',
> 'puppet_settings.rb')
> load rb_file if File.exists?(rb_file) or raise e
> end
>

Thanks. I added a similar block of code to load util/warranty.rb but it
didn't cure the situation. Something is different though. Running facter
by hand no longer complains, but my 15-minutely emails from MCollective
as it tries to cache facts are still being sent:

Could not retrieve fact='warranty_start', resolution='<anonymous>': undefined method `[]' for nil:NilClass
Could not retrieve fact='warranty_end', resolution='<anonymous>': undefined method `[]' for nil:NilClass
Could not retrieve fact='warranty_end', resolution='<anonymous>': undefined method `[]' for nil:NilClass
Could not retrieve fact='warranty_days_left', resolution='<anonymous>': can't dup NilClass


BTW #4248 is a very old bug, surely it's still not being hit?

https://projects.puppetlabs.com/issues/4248

I'm out of the office for a couple hours but I'll some more time into
this a bit later this afternoon.

Cheers,
Jonathan

jcbollinger

unread,
Sep 18, 2014, 9:27:28 AM9/18/14
to puppet...@googlegroups.com


On Thursday, September 18, 2014 4:36:16 AM UTC-5, Jonathan Gazeley wrote:
On 17/09/14 17:29, Trevor Vaughan wrote:
> Ah, so looking at warranty.rb and then stdlib...puppet_vardir.rb, you
> may be missing this:
>
> begin
>   require 'facter/util/puppet_settings'
> rescue LoadError => e
>   # puppet apply does not add module lib directories to the $LOAD_PATH
> (See
>   # #4248). It should (in the future) but for the time being we need to be
>   # defensive which is what this rescue block is doing.
>   rb_file = File.join(File.dirname(__FILE__), 'util',
> 'puppet_settings.rb')
>   load rb_file if File.exists?(rb_file) or raise e
> end
>

Thanks. I added a similar block of code to load util/warranty.rb but it
didn't cure the situation. Something is different though. Running facter
by hand no longer complains, but my 15-minutely emails from MCollective
as it tries to cache facts are still being sent:

Could not retrieve fact='warranty_start', resolution='<anonymous>': undefined method `[]' for nil:NilClass
Could not retrieve fact='warranty_end', resolution='<anonymous>': undefined method `[]' for nil:NilClass
Could not retrieve fact='warranty_end', resolution='<anonymous>': undefined method `[]' for nil:NilClass
Could not retrieve fact='warranty_days_left', resolution='<anonymous>': can't dup NilClass



That suggests to me that you have moved on to a different issue.  Matching up the code (facter/warranty.rb and facter/util/warranty.rb) with the error messages you report, it seems likely that function Facter::Util::Warranty::get_data() is failing on you.  That will cause both Facter::Util::Warranty::purchase_date() and the Facter::Util::Warranty::warranties() to return nil.  The ::warranty_start, ::warranty_end, and ::warranty_days_left facts all depend on those two functions, assuming without testing that they never return nil, hence the observed errors.

As to why get_data() may be failing, it looks like there are several possibilities:

1) The function relies on calling an HTTP API at Dell to retrieve data (lines 6-12 and 60-72).  It may be that that API has been modified or removed, or that it no longer serves data for the particular machines in question.
2) The get_data() function relies on the ::serialnumber fact.  I am uncertain how that fact is computed, but maybe something changed that affected its result.
3) The get_data() function uses a local cache to avoid calling the Dell API every time.  If that cache has been corrupted but not altogether removed then perhaps get_data() would fail.
 
In addition, the local caching also means that, depending on the cache timeout, the change that triggered the error might have happened some time before the error started appearing.



BTW #4248 is a very old bug, surely it's still not being hit?

https://projects.puppetlabs.com/issues/4248



That's not your issue.  It pertains to plugin classes on the master that are present only in their modules (i.e. they have not yet been plugin-synced to the master itself).  Facts are the one plugin type to which that bug cannot apply.


John

Jonathan Gazeley

unread,
Sep 18, 2014, 10:26:12 AM9/18/14
to puppet...@googlegroups.com
On 18/09/14 14:27, jcbollinger wrote:


On Thursday, September 18, 2014 4:36:16 AM UTC-5, Jonathan Gazeley wrote:


Could not retrieve fact='warranty_start', resolution='<anonymous>': undefined method `[]' for nil:NilClass
Could not retrieve fact='warranty_end', resolution='<anonymous>': undefined method `[]' for nil:NilClass
Could not retrieve fact='warranty_end', resolution='<anonymous>': undefined method `[]' for nil:NilClass
Could not retrieve fact='warranty_days_left', resolution='<anonymous>': can't dup NilClass



That suggests to me that you have moved on to a different issue.  Matching up the code (facter/warranty.rb and facter/util/warranty.rb) with the error messages you report, it seems likely that function Facter::Util::Warranty::get_data() is failing on you.  That will cause both Facter::Util::Warranty::purchase_date() and the Facter::Util::Warranty::warranties() to return nil.  The ::warranty_start, ::warranty_end, and ::warranty_days_left facts all depend on those two functions, assuming without testing that they never return nil, hence the observed errors.

OK



As to why get_data() may be failing, it looks like there are several possibilities:

1) The function relies on calling an HTTP API at Dell to retrieve data (lines 6-12 and 60-72).  It may be that that API has been modified or removed, or that it no longer serves data for the particular machines in question.

A quick test using a web browser suggests that the API is still present, working, and returning valid data


2) The get_data() function relies on the ::serialnumber fact.  I am uncertain how that fact is computed, but maybe something changed that affected its result.

On Dell machines, the ::serialnumber fact returns the service tag of the hardware. I have no idea how it retrieves that data, but it only prints out the fact when run as root. Perhaps this is breaking it, if this fact can't read other facts.


3) The get_data() function uses a local cache to avoid calling the Dell API every time.  If that cache has been corrupted but not altogether removed then perhaps get_data() would fail.

It's a possibility that I will investigate. In the meantime, thanks for your suggestions.

Cheers,
Jonathan

jcbollinger

unread,
Sep 18, 2014, 10:53:36 AM9/18/14
to puppet...@googlegroups.com


On Thursday, September 18, 2014 9:26:12 AM UTC-5, Jonathan Gazeley wrote:
On 18/09/14 14:27, jcbollinger wrote:

 

2) The get_data() function relies on the ::serialnumber fact.  I am uncertain how that fact is computed, but maybe something changed that affected its result.

On Dell machines, the ::serialnumber fact returns the service tag of the hardware. I have no idea how it retrieves that data, but it only prints out the fact when run as root. Perhaps this is breaking it, if this fact can't read other facts.



If Puppet is running as an unprivileged account on the client then that very likely would explain the problem, but that would be unusual.  There could also have been some kind of access control change around the mechanism by which the ::serialnumber is determined, but you already said you weren't seeing anything relevant in your audit log.

 
3) The get_data() function uses a local cache to avoid calling the Dell API every time.  If that cache has been corrupted but not altogether removed then perhaps get_data() would fail.

It's a possibility that I will investigate. In the meantime, thanks for your suggestions.



Any way around, you could consider modifying the implementations of the ::warranty_start, ::warranty_end, and ::warranty_days_left facts to handle this situation more gracefully.  I account them flawed for not doing so now.


John

Jonathan Gazeley

unread,
Sep 18, 2014, 10:57:43 AM9/18/14
to puppet...@googlegroups.com
On 18/09/14 15:53, jcbollinger wrote:


On Thursday, September 18, 2014 9:26:12 AM UTC-5, Jonathan Gazeley wrote:
On 18/09/14 14:27, jcbollinger wrote:

 

2) The get_data() function relies on the ::serialnumber fact.  I am uncertain how that fact is computed, but maybe something changed that affected its result.

On Dell machines, the ::serialnumber fact returns the service tag of the hardware. I have no idea how it retrieves that data, but it only prints out the fact when run as root. Perhaps this is breaking it, if this fact can't read other facts.



If Puppet is running as an unprivileged account on the client then that very likely would explain the problem, but that would be unusual.  There could also have been some kind of access control change around the mechanism by which the ::serialnumber is determined, but you already said you weren't seeing anything relevant in your audit log.

Our Puppet agent is running as root, so this isn't the problem. And yes, the audit log doesn't turn up anything relevant.


 
3) The get_data() function uses a local cache to avoid calling the Dell API every time.  If that cache has been corrupted but not altogether removed then perhaps get_data() would fail.

It's a possibility that I will investigate. In the meantime, thanks for your suggestions.



Any way around, you could consider modifying the implementations of the ::warranty_start, ::warranty_end, and ::warranty_days_left facts to handle this situation more gracefully.  I account them flawed for not doing so now.


I think you are correct and this needs some dev work. However, this isn't my code (I inherited it) and I don't have the time to work on these facts which are not really used in my environment. Therefore, sadly I'm going to remove these facts to silence the warnings and get on with my other work.

Thanks John and Trevor for your help - at least I learned something along the way, and that's never a waste of time.

Cheers,
Jonathan
Reply all
Reply to author
Forward
0 new messages