Jira (FACT-1732) Facter core dump on large integers

2 views
Skip to first unread message

Trevor Vaughan (JIRA)

unread,
Aug 16, 2017, 5:08:02 PM8/16/17
to puppe...@googlegroups.com
Trevor Vaughan created an issue
 
Facter / Bug FACT-1732
Facter core dump on large integers
Issue Type: Bug Bug
Affects Versions: FACT 3.5.1
Assignee: Unassigned
Attachments: facter_core_dump.txt
Components: CLI
Created: 2017/08/16 2:07 PM
Environment:

CentOS 7.4

Priority: Major Major
Reporter: Trevor Vaughan

Facter is core dumping when returning facts with large integers.

How to reproduce:

  • Create test_fact.rb with the following content:

Facter.add("test_fact") do
  setcode do
      # Unsigned 64-bit Integer
      # Sample failure noted on querying 'shmall' in sysctl
   18446744073692774399
 end
end

  • Run facter -p test_fact
Add Comment Add Comment
 
This message was sent by Atlassian JIRA (v6.4.14#64029-sha1:ae256fe)
Atlassian logo

Josh Cooper (JIRA)

unread,
Aug 23, 2017, 1:33:02 AM8/23/17
to puppe...@googlegroups.com
Josh Cooper updated an issue
Change By: Josh Cooper
Team: Platform OS

Branan Riley (JIRA)

unread,
Aug 23, 2017, 1:15:03 PM8/23/17
to puppe...@googlegroups.com
Branan Riley updated an issue
Change By: Branan Riley
Sprint: Platform OS Grooming

Branan Riley (JIRA)

unread,
Aug 23, 2017, 1:15:03 PM8/23/17
to puppe...@googlegroups.com
Branan Riley commented on Bug FACT-1732
 
Re: Facter core dump on large integers

I can reproduce this with Facter 3.8.0 and Ruby 2.4, using the test case in the ticket

Branan Riley (JIRA)

unread,
Aug 23, 2017, 2:02:02 PM8/23/17
to puppe...@googlegroups.com
Branan Riley commented on Bug FACT-1732

This is caused by a couple of things:

1) We treat all integers as signed in the C++ code, so the conversion from ruby fails
2) we don't have proper error handling around the Ruby API here, so we crash hard

2 is easy to fix. 1 might take a bit more work. I don't really want us to support bignum values in CFacter, but unsigned 64-bit values do seem reasonable to me.

Trevor Vaughan (JIRA)

unread,
Aug 29, 2017, 10:42:02 AM8/29/17
to puppe...@googlegroups.com

Branan Riley I get not wanting to handle bignum values, but you don't really get to pick what people stuff into Facter facts and if you start talking about things over 4 Petabytes in Bytes, you're going to have a bad time since Ruby's max Fixnum on a 64 bit infrastructure is 2**62 - 1. This is not unreasonable on large systems even with the native facts.

You might just want to bring in libgmp and call it a day. Alternatively, I guess you could treat them all as strings and mark them with something that says to translate to Integer (or whatever) before you shove it over the wire. Not precise, and more work, but 100% safe.

Branan Riley (JIRA)

unread,
Sep 26, 2017, 1:33:05 PM9/26/17
to puppe...@googlegroups.com
Branan Riley commented on Bug FACT-1732

My preferred solution would be to just represent those large values as strings - the size of a disk in bytes is unlikely to be something that folks are doing math on in their puppet manifests. This avoids bringing in a bignum library until we hit a use-case that really requires it.

Branan Riley (JIRA)

unread,
Sep 26, 2017, 1:34:02 PM9/26/17
to puppe...@googlegroups.com
Branan Riley updated an issue
Change By: Branan Riley
Story Points: 3

Branan Riley (JIRA)

unread,
Sep 26, 2017, 1:35:02 PM9/26/17
to puppe...@googlegroups.com
Branan Riley updated an issue
Change By: Branan Riley
Sprint: Platform OS  Grooming  Ready for Eng.

Trevor Vaughan (JIRA)

unread,
Sep 26, 2017, 1:48:03 PM9/26/17
to puppe...@googlegroups.com
Trevor Vaughan commented on Bug FACT-1732
 
Re: Facter core dump on large integers

Branan Riley So, how exactly does this work? Integers are Integers unless arbitrary decision that they're Strings until such point that facter gets fixed and then they're Integers again and all your code breaks?

That doesn't seem very 'Strong Type' friendly.

I do math on disk sizes quite a bit so this really isn't outside the realm of reality and not knowing whether or not something is a number isn't going to make for a good UX.

Branan Riley (JIRA)

unread,
Sep 26, 2017, 2:35:01 PM9/26/17
to puppe...@googlegroups.com
Branan Riley commented on Bug FACT-1732

It's not strong-type friendly, and it's not ideal. It's a tradeoff.

Fixing a core dump is always worth our time, but investing in the work to switch facter to a bignum library internally is a lot harder to justify, especially without anything coming up through support internally. I also have to think about interop with the rest of the Puppet and PE ecosystem, which includes Javascript - passing large integers around isn't necessarily possible through the entire stack.

So the documentation becomes "we output any number that Javascript can't handle as a string". That's not an awesome limitation, but is at least well-specified and less arbitrary than Ruby's 2**62 internal cutoff being exposed directly.

Trevor Vaughan (JIRA)

unread,
Sep 26, 2017, 4:29:03 PM9/26/17
to puppe...@googlegroups.com

Branan Riley Ok, I get tradeoffs so let's tackle this from another direction.

  1. The core dump should be wrapped in an error message that makes some sense to the user (this ticket)
  2. Ruby facter should have the same limitation imposed so that acceptance and rspec testing doesn't give false successes (this ticket)
  3. Puppet itself should warn if you use something in a variable that is greater than this new PUP-STD-BIGNUM (or whatever it gets called) (new ticket)

I think that would cover everything that I would need as a user to not make me go insane between testing and production.

Would that work for you?

Trevor Vaughan (JIRA)

unread,
Sep 27, 2017, 10:43:02 PM9/27/17
to puppe...@googlegroups.com

Branan Riley Things just got a lot more hairy with this particular issue - to put a fine point on it this could cause cascading failures across the entire puppet infrastructure from a single node's facts

  1. If you want to know exactly why we're grabbing kernel.shmall, we are using it to calculate the default number of shared memory pages in libvirt per optimization documentation that we found at some point
  2. PuppetDB cannot handle this Integer and is spewing errors into /opt/puppetlabs/server/data/puppetdb/stockpile/discard without ever cleaning them up (we had 7.8G which filled our root partition)
    • This was PuppetDB 4.3.0
  3. Puppet Server 2.7.2 works properly, even with the facter error
  4. Puppet Server 2.8.0 crashes completely and will not process catalogs

Given the fact that a number that can be readily plucked from the running system now crashes numerous components, I think that facter needs to either convert the value to a string on exception (should be easy), or should do something else to prevent a cascading environment failure. It needs to be in facter because it's the only gatekeeper to the system as a whole for a 'fail-fast' solution.

Trevor Vaughan (JIRA)

unread,
Sep 27, 2017, 10:43:02 PM9/27/17
to puppe...@googlegroups.com
Trevor Vaughan updated an issue
 
Change By: Trevor Vaughan
Priority: Major Critical

Trevor Vaughan (JIRA)

unread,
Sep 27, 2017, 11:17:02 PM9/27/17
to puppe...@googlegroups.com
 
Re: Facter core dump on large integers

Branan Riley To your point about disk sizes...the default mountpoints fact returns both available_bytes and size_bytes as Integers. The disks fact also returns size_bytes as an Integer.

If you have a representative disk with over 8 PB of space (not completely out of the question) then these facts will crash the system.

My previous math was wrong - Ruby's max hits 4 Exabytes which is stretching it for current technologies, but the JavaScript limit of 2^53-1 is not.

Other items of note (default Facter)

  • gce.instance.id => 4073115147758271314 (no idea how this is not crashing things, perhaps the hash burial is saving it)
  • gce.project.numericProjectId => 1064768239454 (no idea how big this might get)

My point is that you just don't know what any of this is going to be.

Branan Riley (JIRA)

unread,
Sep 28, 2017, 1:23:02 AM9/28/17
to puppe...@googlegroups.com
Branan Riley commented on Bug FACT-1732

I definitely want to get this resolved ASAP if it's causing issues in the larger ecoystem. In the short term that probably stll means stringification, but longer-term I do want to be able to report large numbers correctly. Do the following steps seem reasonable to you?

1) Stringify everything suspicious. This means ruby bignums become strings on the way in to Facter, and anything greater than 2^53 becomes a string on the way out. This is probably overkill, but gets us to a solid baseline for consistent behavior and will avoid gross misbehaviors. (this is a refinement of what I proposed above, essentially).

2) Implement bignums internally. This will fix any numeric overflows in built-in facts (resolving disk sizes as negative, for example). This will also let us remove the stringification of Ruby values that are too large for 64-bit ints. We'll have to spend some time evaluating bignum libraries and getting our chosen one building across our supported platforms, so although this is a purely internal change, it's likely to be "effort", thus a step of its own.

3) Once we're sure the Puppet ecosystem can handle large integers, disable stringification at the output boundaries. At this point everything is correctly typed again, and we can pretend that none of this every happened. This will be a larger cross-team effort internally, and probably will involve some external documentation updates about what our various APIs accept/provide.

As for things like GCE IDs - we should be representing them as strings anyway. They may be made entirely of "digits", but they are logically single identifiers, not numbers. This is something we'll have to fix on a fact-by-fact basis, although turning large ints into strings at the various output boundaries should avoid compatibility issues in the medium term.

If you haven't already, can you file tickets against puppetdb and puppetserver with the issues you've seen in those projects? I will want to round up with the appropriate folks internally in order to develop a larger "bignum plan", and having tickets for the failure cases will make that much easier.

Trevor Vaughan (JIRA)

unread,
Sep 28, 2017, 12:27:03 PM9/28/17
to puppe...@googlegroups.com

Branan Riley I can work on getting tickets in when I have time. We're trying to get a release out so it might be after PuppetConf.

1) Technically, that's everything that's an Integer. We can't dynamically Stringify things that are big because it will destroy logic in the underlying Puppet manifests. With strong typing, either something is an Integer or a String, it won't auto-cast (I think). And, of course, you can't have things be Strings sometimes and Integers other times because users just won't know what to do with the data and manifests will mysteriously break when things get too big (for whatever reason).

I would definitely send out an announcement for people to be starting to watch for this and there's going to need to be some sort of purge script for PuppetDB to get bad data converted internally.

2) Yeah, this is the correct solution really but that's going to cover at least 3 languages. They're all solved problems but implementation and testing takes time.

3) Makes sense, but I still think you're going to have to Stringify all Numbers for the short term. You might be able to make this a Stringification of only custom facts and not core facts, but the disk sizes still pose a problem.

Trevor Vaughan (JIRA)

unread,
Sep 28, 2017, 12:32:02 PM9/28/17
to puppe...@googlegroups.com

Branan Riley I just realized that what I said previously is simply untenable since it would break a lot of existing code.

I guess auto-Stringification of big numbers is all that can be done for right now with a warning that things that get big might mysteriously break until Bignum support is added across the stack.

Branan Riley (JIRA)

unread,
Jan 3, 2018, 6:37:04 PM1/3/18
to puppe...@googlegroups.com
Branan Riley updated an issue
 
Change By: Branan Riley
Sprint: Platform OS  Ready for Eng.  Grooming/Triage
This message was sent by Atlassian JIRA (v7.0.2#70111-sha1:88534db)
Atlassian logo

Branan Riley (JIRA)

unread,
Mar 21, 2018, 7:18:03 PM3/21/18
to puppe...@googlegroups.com
Branan Riley updated an issue
Change By: Branan Riley
Labels: ruby triaged
This message was sent by Atlassian JIRA (v7.7.1#77002-sha1:e75ca93)
Atlassian logo

Geoff Nichols (JIRA)

unread,
May 15, 2018, 5:12:04 PM5/15/18
to puppe...@googlegroups.com
Geoff Nichols updated an issue
Change By: Geoff Nichols
Sprint: Platform OS Grooming/Triage Ready for Eng.

Geoff Nichols (JIRA)

unread,
May 15, 2018, 5:12:04 PM5/15/18
to puppe...@googlegroups.com
Geoff Nichols updated an issue
Change By: Geoff Nichols
Fix Version/s: FACT 4.0.0

Branan Riley (JIRA)

unread,
May 21, 2018, 3:55:35 PM5/21/18
to puppe...@googlegroups.com

Branan Riley (JIRA)

unread,
May 21, 2018, 3:55:39 PM5/21/18
to puppe...@googlegroups.com
Branan Riley updated an issue
Change By: Branan Riley
Sprint: Platform EOL - Blocked Tasks

Branan Riley (JIRA)

unread,
May 21, 2018, 4:01:09 PM5/21/18
to puppe...@googlegroups.com

Henrik Lindberg (JIRA)

unread,
Jul 24, 2018, 6:13:02 AM7/24/18
to puppe...@googlegroups.com
Henrik Lindberg commented on Bug FACT-1732
 
Re: Facter core dump on large integers

This requires use of Rich Data support throughout since JSON/YAML does not specify the precision. While they do permit any number of digits to be present in JSON/YAML text, the bindings to runtime data types per implementation may truncate or give errors. (JavaScript is notoriously bad at this as it isn't even 64 bit signed).
Ruby can do this but at the cost of generating non compliant JSON/YAML (it serializes a Ruby type with type information).

We need a BigNumber data type in the Puppet Type system to make it possible to know how to represent it on the wire (as a string) but still be able to operate on it at runtime.

PDB needs to handle the new data type and store it as a PostgreSQL Decimal (up to 131072 digits before the decimal point; up to 16383 digits after the decimal point) - which we could use as the cap for BigNumber. PDB needs to be able to query the value.

Facter needs to send Rich Data.

When the cap was introduced in the language / type system it was to prevent BigNumbers being mangled or causing errors throughout. At that point we estimated the cost of adding BigNumber support to be quite high (because of the above).

Kenn Hussey (JIRA)

unread,
Sep 10, 2018, 9:31:03 AM9/10/18
to puppe...@googlegroups.com
Kenn Hussey updated an issue
 
Change By: Kenn Hussey
Flagged: Impediment

Geoff Nichols (JIRA)

unread,
Oct 1, 2018, 1:42:08 PM10/1/18
to puppe...@googlegroups.com
Geoff Nichols updated an issue
Change By: Geoff Nichols
Flagged: Impediment

Josh Cooper (Jira)

unread,
Oct 13, 2022, 12:21:02 PM10/13/22
to puppe...@googlegroups.com
Josh Cooper updated an issue
Change By: Josh Cooper
Fix Version/s: FACT 4.0.0
Fix Version/s: FACT 5.x
This message was sent by Atlassian Jira (v8.20.11#820011-sha1:0629dd8)
Atlassian logo
Reply all
Reply to author
Forward
0 new messages