Jira (PUP-9577) Large numbers of facts cause slow performance

34 views
Skip to first unread message

Brian Rak (JIRA)

unread,
Mar 21, 2019, 11:25:06 AM3/21/19
to puppe...@googlegroups.com
Brian Rak created an issue
 
Puppet / Bug PUP-9577
Large numbers of facts cause slow performance
Issue Type: Bug Bug
Assignee: Unassigned
Attachments: cpu_profiles.zip, heap_profiles.zip, mem_profile_6kfacts, mem_profile_defaultfacts
Created: 2019/03/21 8:24 AM
Priority: Normal Normal
Reporter: Brian Rak

Puppet Version: mix of 5.58 and 6.3.0
Puppet Server Version: 6.2.1
OS Name/Version: CentOS 7

Many of our hosts have relatively large numbers of facts (~6k or so).  We've been seeing performance issues on our masters that seem to be related to memory pressure, and believe these to be related to the fact count.

I hacked up the full_catalog benchmark to use an exported copy of facts from our environment, and saw that catalog compilation times went up by 1.1s (about a 50% increase over the unmodified test).

Looking at the memory profile, I see what appears to be a very significant increase in the number of objects allocated.

Is this behavior expected?  I wouldn't really expect 6k facts to cause such a dramatic increase here.

 

I've attached a few things here:

  • mem_profile_defaultfacts -> Memory profile output from an unmodified full_catalog benchmark
  • mem_profile_6kfacts -> Memory profile output from the modified benchmark with a high fact count.
  • heap_profiles.zip -> Output from heap_dump benchmarks, named as above
  • cpu_profiles.zip -> Output from profile benchmarks
  • facts.json -> Sanitized version of our facts

 

I had just changed full_catalog/benchmarker.rb to have `facts = Puppet::Node::Facts.new("testing", JSON.parse(`cat /root/facts`))` ... my ruby skills are terrible.

I should note that we don't care about 99% of these facts, but I don't really see any way of filtering them.  For example, all of the IXXXXX network interfaces are useless to us, along with most of the entries for /dev/mapper/

I don't know if the performance issues are fixable here, or if I'm asking for a regex-based blocklist in facter, or what.  My suspicion is that the number of template calls is adding up:

https://github.com/puppetlabs/puppet/blob/master/lib/puppet/parser/templatewrapper.rb#L82

During benchmarks, this takes 0.01s per template (as compared to 0.00s with a smaller fact count)... multiplied by all the templates we use seems pretty significant.

 

Add Comment Add Comment
 
This message was sent by Atlassian JIRA (v7.7.1#77002-sha1:e75ca93)
Atlassian logo

Brian Rak (JIRA)

unread,
Mar 21, 2019, 11:38:04 AM3/21/19
to puppe...@googlegroups.com
Brian Rak updated an issue
Change By: Brian Rak
Attachment: facts.json

Brian Rak (JIRA)

unread,
Mar 21, 2019, 12:55:03 PM3/21/19
to puppe...@googlegroups.com
Brian Rak commented on Bug PUP-9577
 
Re: Large numbers of facts cause slow performance

So here's why I think this is memory related.  I enabled GC logging on one of my puppet servers, and it certainly seems like it's generating 10GB of garbage every few seconds:

 

2019-03-21T12:52:07.675-0400: 97.148: [GC pause (G1 Evacuation Pause) (young)
[Eden: 9672.0M(9672.0M)->0.0B(9392.0M) Survivors: 152.0M->208.0M Heap: 10.6G(24.0G)->1228.0M(24.0G)]
2019-03-21T12:52:18.699-0400: 108.172: [GC pause (G1 Evacuation Pause) (young)
[Eden: 9392.0M(9392.0M)->0.0B(9336.0M) Survivors: 208.0M->208.0M Heap: 10.4G(24.0G)->1228.0M(24.0G)]
2019-03-21T12:52:30.695-0400: 120.167: [GC pause (G1 Evacuation Pause) (young)
[Eden: 8864.0M(4232.0M)->0.0B(9376.0M) Survivors: 208.0M->240.0M Heap: 10092.0M(24.0G)->1260.0M(24.0G)]
2019-03-21T12:52:42.453-0400: 131.925: [GC pause (G1 Evacuation Pause) (young)
[Eden: 9376.0M(9376.0M)->0.0B(9048.0M) Survivors: 240.0M->280.0M Heap: 10.4G(24.0G)->1300.0M(24.0G)]
2019-03-21T12:52:51.331-0400: 140.804: [GC pause (G1 Evacuation Pause) (young)
[Eden: 9048.0M(9048.0M)->0.0B(9992.0M) Survivors: 280.0M->248.0M Heap: 10.1G(24.0G)->1268.0M(24.0G)]
2019-03-21T12:53:02.207-0400: 151.680: [GC pause (G1 Evacuation Pause) (young)
[Eden: 9992.0M(9992.0M)->0.0B(8488.0M) Survivors: 248.0M->296.0M Heap: 11.0G(24.0G)->1316.0M(24.0G)]
2019-03-21T12:53:10.381-0400: 159.854: [GC pause (G1 Evacuation Pause) (young)
[Eden: 8488.0M(8488.0M)->0.0B(7896.0M) Survivors: 296.0M->296.0M Heap: 9804.0M(24.0G)->1316.0M(24.0G)]
2019-03-21T12:53:17.530-0400: 167.003: [GC pause (G1 Evacuation Pause) (young)
[Eden: 7896.0M(7896.0M)->0.0B(4008.0M) Survivors: 296.0M->304.0M Heap: 9212.0M(24.0G)->1320.0M(24.0G)]
2019-03-21T12:53:22.759-0400: 172.232: [GC pause (G1 Evacuation Pause) (young)
[Eden: 4008.0M(4008.0M)->0.0B(4968.0M) Survivors: 304.0M->304.0M Heap: 5328.0M(24.0G)->1324.0M(24.0G)]
2019-03-21T12:53:26.655-0400: 176.127: [GC pause (G1 Evacuation Pause) (young)
[Eden: 4968.0M(4968.0M)->0.0B(5272.0M) Survivors: 304.0M->320.0M Heap: 6292.0M(24.0G)->1336.0M(24.0G)]
2019-03-21T12:53:32.472-0400: 181.944: [GC pause (G1 Evacuation Pause) (young)
[Eden: 5272.0M(5272.0M)->0.0B(5304.0M) Survivors: 320.0M->328.0M Heap: 6608.0M(24.0G)->1348.0M(24.0G)]

Brian Rak (JIRA)

unread,
Mar 21, 2019, 2:13:03 PM3/21/19
to puppe...@googlegroups.com
Brian Rak updated an issue
Change By: Brian Rak
Attachment: 2019-03-21_12-50-30-gc.log.0.current

Brian Rak (JIRA)

unread,
Mar 21, 2019, 2:14:03 PM3/21/19
to puppe...@googlegroups.com

Brian Rak (JIRA)

unread,
Mar 21, 2019, 5:02:03 PM3/21/19
to puppe...@googlegroups.com
Brian Rak updated an issue
Change By: Brian Rak
Attachment: fsl.patch

Brian Rak (JIRA)

unread,
Mar 21, 2019, 5:02:04 PM3/21/19
to puppe...@googlegroups.com
Brian Rak commented on Bug PUP-9577
 
Re: Large numbers of facts cause slow performance

I was reading through the memory profile, and noticed something weird:

{{ 262822 "_"}}
{{ 262337 /root/puppet/lib/puppet/parser/templatewrapper.rb:84}}

This constant seems to get created hundreds of thousands of times, for no real reason... I enabled frozen_string_literal for this file (and scope.rb, which had a similar issue).  This immediately reduced memory allocations during the benchmark by 13%.  I don't know if this is going to break anything, but even so... freezing that one particular location seems like a really easy win.

Brian Rak (JIRA)

unread,
Mar 21, 2019, 5:41:02 PM3/21/19
to puppe...@googlegroups.com
Brian Rak updated an issue
Change By: Brian Rak
Attachment: test3.patch

Brian Rak (JIRA)

unread,
Mar 21, 2019, 5:41:03 PM3/21/19
to puppe...@googlegroups.com
Brian Rak commented on Bug PUP-9577
 
Re: Large numbers of facts cause slow performance

Allocations reduce significantly by applying fsl in more places, as well as caching the mangled version of names in scope.rb and templatewrapper.rb

Down to Total allocated: 107510429 bytes (1298917 objects), which is a reduces the # of allocations by 41%... I suspect a lot of this is due to the repetition of names within the fact set.

Horrible ruby patch in test3.patch

Henrik Lindberg (JIRA)

unread,
Mar 21, 2019, 7:10:03 PM3/21/19
to puppe...@googlegroups.com

I suspect that # frozen_string_literal: true has an effect on number of allocations (Ruby compiler knows it can reuse one instance), but just calling freeze is not enough - that just takes time freezing an allocated object. Doubt Ruby is smart enough to optimize a "foo".freeze.

Brian Rak (JIRA)

unread,
Mar 22, 2019, 11:58:04 AM3/22/19
to puppe...@googlegroups.com
Brian Rak updated an issue
Change By: Brian Rak
Attachment: cpu_usage.png

Brian Rak (JIRA)

unread,
Mar 22, 2019, 12:05:09 PM3/22/19
to puppe...@googlegroups.com
Brian Rak commented on Bug PUP-9577
 
Re: Large numbers of facts cause slow performance

So test3.patch did not provide sufficient resolution for this issue (despite looking good in the benchmarks).  I applied another patch to our masters after this, which filtered out all the facts we didn't care about right at load time.  This reduced the number of facts per agent from ~6400 to ~1200, and had significant impacts on our master loads (to the degree where I could probably eliminate 6 out of 8 masters and still be fine).

This is a snapshot of the data coming out of puppet server: https://snapshot.raintank.io/dashboard/snapshot/OOdS1Og23IHvHG73kC2FTcWIMGDVX4mp

  • I applied the fact reducing patch to puppetmaster2 at 11:10, and puppetmaster4 at ~11:15 (you can see the request duration drop then).  The # of active requests to these stayed high because they were picking up a significant portion of the load (our load balancer is configured to pick the one with the least connections), note the total requests count graph around then.
  • I patched the rest of the masters around 11:20. The performance increase was immediate, and everything is behaving normally now. 

I've attached the CPU usage graphs from one of our masters, which shows a very significant drop as well.

I should also note that as part of this I made our load balancer cache the results of requests for file_metadatas, static_file_content, file_metadata, and file_content.  This did not have any significant effect on the puppet master loads.

I have attached my fact cleanup patch with the note that this is specific to our environment, and I have minimal ruby skills   It might be helpful if someone else comes across this issue though.

Charlie Sharpsteen (JIRA)

unread,
Mar 22, 2019, 2:04:04 PM3/22/19
to puppe...@googlegroups.com

In addition to improvements in the compiler, we should also take a look at implementing FACT-1575 — which would allow facts to be filtered on the agent.

Jorie Tappa (JIRA)

unread,
Mar 25, 2019, 12:54:04 PM3/25/19
to puppe...@googlegroups.com

Brian Rak (JIRA)

unread,
Mar 25, 2019, 3:48:06 PM3/25/19
to puppe...@googlegroups.com
Brian Rak commented on Bug PUP-9577
 
Re: Large numbers of facts cause slow performance

We left the patched masters running all weekend, and CPU usage is still at the lower level from my last graph, and we're only seeing GC runs every 20-40 seconds (with multiple GB less garbage on average... down to 4-5 GB peak, versus ~20GB every time)

Henrik Lindberg (JIRA)

unread,
Mar 25, 2019, 4:05:04 PM3/25/19
to puppe...@googlegroups.com

Brian Rak is that with filtering out a bunch of facts and the changes in the patch attached to this ticket, or only one set of changes?

Brian Rak (JIRA)

unread,
Mar 25, 2019, 5:25:04 PM3/25/19
to puppe...@googlegroups.com
Brian Rak commented on Bug PUP-9577

That was with both patches, but on Friday we did not see improvement until we started filtering out facts.

Jarret Lavallee (JIRA)

unread,
Apr 2, 2019, 6:39:03 PM4/2/19
to puppe...@googlegroups.com
Jarret Lavallee updated an issue
 
Change By: Jarret Lavallee
CS Priority: Needs Priority

Adam Bottchen (JIRA)

unread,
Apr 4, 2019, 7:32:02 PM4/4/19
to puppe...@googlegroups.com
Adam Bottchen updated an issue
Change By: Adam Bottchen
CS Priority: Needs Priority Reviewed

Adam Bottchen (JIRA)

unread,
Apr 4, 2019, 7:40:03 PM4/4/19
to puppe...@googlegroups.com
Adam Bottchen updated an issue
Change By: Adam Bottchen
CS Priority: Reviewed Major

Adam Bottchen (JIRA)

unread,
Apr 4, 2019, 7:40:03 PM4/4/19
to puppe...@googlegroups.com
Adam Bottchen updated an issue
Change By: Adam Bottchen
Priority: Normal Major

Josh Cooper (JIRA)

unread,
Jan 13, 2020, 2:16:04 PM1/13/20
to puppe...@googlegroups.com
Josh Cooper commented on Bug PUP-9577
 
Re: Large numbers of facts cause slow performance

FWIW I've started on a patch to add the frozen string literal pragma. Although it doesn't solve this issue, it is in general a good thing to reduce string allocation.

Henrik Lindberg (JIRA)

unread,
Jan 13, 2020, 2:31:03 PM1/13/20
to puppe...@googlegroups.com

A large number of facts probably means a lot of work in handling ERB since it binds every visible variable to an instance variable. On Puppet 6 also issue with method-call style on $facts variable.

Josh Cooper (Jira)

unread,
Aug 26, 2020, 3:41:03 PM8/26/20
to puppe...@googlegroups.com
Josh Cooper commented on Bug PUP-9577

Using the attached facts, I can see that frozen string literals do help overall memory usage. However, large numbers of facts cause the scope and templatewrapper to allocate more memory than the fsl improvements save. So it would make sense that the limiting/blocking facts is going to produce the best results.

To see the effect of facts, here's the top N memory allocations before making any fsl changes. When using the current full_catalog benchmark:

Total allocated: 163276004 bytes (2158754 objects)
Total retained:  10916136 bytes (118620 objects)
 
allocated memory by file
-----------------------------------
  65435279  /Users/josh/work/puppet/lib/puppet/pops/parser/lexer2.rb
  10869961  /Users/josh/work/puppet/lib/puppet/resource.rb
   ...
   4133150  /Users/josh/work/puppet/lib/puppet/parser/scope.rb
   ...
   1280784  /Users/josh/work/puppet/lib/puppet/parser/templatewrapper.rb

When using large number of facts:

Total allocated: 207317451 bytes (2960776 objects)
Total retained:  11779693 bytes (128498 objects)
 
allocated memory by file
-----------------------------------
  66391157  /Users/josh/work/puppet/lib/puppet/pops/parser/lexer2.rb
  31855608  /Users/josh/work/puppet/lib/puppet/parser/templatewrapper.rb
  12110152  /Users/josh/work/puppet/lib/puppet/parser/scope.rb
  10943780  /Users/josh/work/puppet/lib/puppet/resource.rb

Total allocation increased by about 40MB. Scope increased by 7.97MB because all facts are added as scope variables. Templatewrapper increased by 30.5MB because every fact is added as an instance variable to the wrapper (so it's visible to the ERB template).

After applying frozen string literals, I see some improvement to overall memory allocations, but scope and templatewrapper still dominate. For small facts:

Total allocated: 149804554 bytes (1822322 objects)
Total retained:  10798846 bytes (116042 objects)
 
allocated memory by file
-----------------------------------
  55276439  /Users/josh/work/puppet/lib/puppet/pops/parser/lexer2.rb
  10869961  /Users/josh/work/puppet/lib/puppet/resource.rb
   8865952  /Users/josh/work/puppet/lib/puppet/pops/model/factory.rb
   ...
   3810952  /Users/josh/work/puppet/lib/puppet/pops/evaluator/runtime3_support.rb
   3419144  /Users/josh/work/puppet/lib/puppet/parser/resource.rb
   2930548  /Users/josh/work/puppet/lib/puppet/parser/scope.rb
   ...
    885384  /Users/josh/work/puppet/lib/puppet/parser/templatewrapper.rb

For large numbers of facts

Total allocated: 183590607 bytes (2367558 objects)
Total retained:  11661668 bytes (125887 objects)
 
allocated memory by file
-----------------------------------
  56096717  /Users/josh/work/puppet/lib/puppet/pops/parser/lexer2.rb
  21346048  /Users/josh/work/puppet/lib/puppet/parser/templatewrapper.rb
  10943780  /Users/josh/work/puppet/lib/puppet/resource.rb
  10906830  /Users/josh/work/puppet/lib/puppet/parser/scope.rb
   9050968  /Users/josh/work/puppet/lib/puppet/pops/model/factory.rb

So memory allocations increased by ~33MB (so fsl saved 7MB). Templatewrapper increased by 20MB (fsl saved 10MB) and scope increased by the same 7.9MB (fsl had no effect). Also the number of facts don't much effect on lexer2.rb, resource.rb, factory.rb as we'd expect.

This message was sent by Atlassian Jira (v8.5.2#805002-sha1:a66f935)
Atlassian logo

Justin Stoller (Jira)

unread,
Aug 27, 2020, 7:04:03 PM8/27/20
to puppe...@googlegroups.com

FWIW, I think users can avoid this situation by writing EPP templates vs ERB ones.

Ruby, afaict, does not provide a hook for instance variable look up like it does for method_missing so there's nothing to do if users reference @variable names besides preloading all possible instance variables. It would be possible to gate preloading facts as instance variables on a new setting, but users would need to validate that their ERB templates only use the other ways of accessing facts eg scope['variable']. At that point, though, I'm not sure if it wouldn't be better to encourage them to move to EPP?

Henrik Lindberg (Jira)

unread,
Aug 28, 2020, 10:05:03 AM8/28/20
to puppe...@googlegroups.com

Justin Stoller you are right, there is no way to cut in on the @variable references unless going the route of transforming the ERB code - i.e. parse it and modify the parsed Ruby AST to get an ERB that uses a different mechanism to look up variables - which would then result in a new variant of Puppet/ERB mapping (that this would use), and in that case, users could just as well be told to switch to that alternative binding... In any case a lot of work. So, as you suggest the short term solution is really to point people to EPP.

I was toying with the idea of replacing @varname access to be @var.varname and bind a value to @var with a missing_method implementation that would lookup the method name as the name of a variable.

Reid Vandewiele (Jira)

unread,
Jun 15, 2021, 6:44:01 PM6/15/21
to puppe...@googlegroups.com

Taking a look at this due to an at-scale customer performance issue.

From what I'm told it sounds like a JRuby implementation detail may be causing a lot of strife when nodes have large numbers of per-node-unique facts which are never used in templates, but which JRuby slowly adds references to on a per-class basis, as they are encountered, which means data in the Puppet::Parser::TemplateWrapper table sticks around long after individual instances of that class are gone, accumulating and impacting catalog runs.

As a triage solution, we talked about modifying how instance variables are added to Puppet::Parser::Template instances such that we only add an instance variable if we do a simple check of the template source material and believe it to reference the variable.

Rough idea expressed in Ruby:

ivar_names_regex = %r{(?<=@)[a-z0-9_][a-zA-Z0-9_]*}m
ivars_used = template_source.scan(ivar_names_regex)
ivars_used.each do |name|
  instance_variable_set("@#{name}", scope.lookupvar(name))
end

Is there an easy way to extend this approach to make it viable enough for a product patch?

This message was sent by Atlassian Jira (v8.13.2#813002-sha1:c495a97)
Atlassian logo

Austin Boyd (Jira)

unread,
Jun 16, 2021, 5:35:01 AM6/16/21
to puppe...@googlegroups.com
Austin Boyd updated an issue
 
Change By: Austin Boyd
Labels: jira_escalated

Austin Boyd (Jira)

unread,
Jun 16, 2021, 5:35:02 AM6/16/21
to puppe...@googlegroups.com
Austin Boyd updated an issue
Change By: Austin Boyd
Zendesk Ticket Count: 1
Zendesk Ticket IDs: 44770

Gareth McGrillan (Jira)

unread,
Jun 16, 2021, 5:39:01 AM6/16/21
to puppe...@googlegroups.com

Justin Stoller (Jira)

unread,
Jun 16, 2021, 1:09:03 PM6/16/21
to puppe...@googlegroups.com

Is there an easy way to extend this approach to make it viable enough for a product patch?

I've looked into this and I don't think it's super viable. For the solution to be robust enough I think you'd need to parse each template file and extract the ruby code blocks in a similar manner to to ERB itself and then run Ripper on it to analyze the code and extract any instance variables for there. Then you can only set the instance variables you need on the scope before actually evaluating the template. I think will make evaluating templates noticeably slower for the majority of users. If that's true we probably would want to check to see the number of facts a customer has before going down that parse-the-erb-before-evaluating-it path...  Unfortunately, that feels like a lot of complexity to add however, the other ideas we have require customer/ecosystem changes.

 

One of the ideas we had were to strongly prefer EPP templates and effectively make them a requirement at scale. The other is to provide a flag that disables referring to facts as instance variables. There are three different ways to refer to facts within a template and the other two (eg scope['fact_name']) don't require pre-setting instance variables on the scope object. (see https://puppet.com/docs/puppet/7/lang_template_erb.html#erb_variables), if a user doesn't have ERB templates that access instance variable style facts then they could set a flag and we could skip that action.

 

I'm unclear the tractability of adding a lot of technical complexity vs getting users to stop using instance variable style fact references.

Justin Stoller (Jira)

unread,
Jun 16, 2021, 1:25:03 PM6/16/21
to puppe...@googlegroups.com

I suspect there's something leaking in the JRuby side related to dynamically adding variables for ERB templates that's causing this massive slowdown.

There's probably two culprits here. One is that large objects may be immediately tenured in Java. I think this is a particular problem with the G1 garbage collector.

 

The other is that JRuby has been moving more and more of Ruby's programming constructs to use the JVM's InvokeDynamic feature. This causes dynamic code to better optimized in the long run but cause more upfront cost and the Java objects that back this process are difficult to garbage collect (there's custom classes and byte code generated by this process). Setting the JAVA_ARG -Djruby.invokedynamic.cache.ivars=false may help there.

Henrik Lindberg (Jira)

unread,
Jun 19, 2021, 8:04:02 AM6/19/21
to puppe...@googlegroups.com

IIRC, the context with variables is created each time an ERB is called - it could be possible to cache this for all the facts (i.e. the global static state), and just mix in the scope specific variables at the time of the ERB call.

Justin Stoller (Jira)

unread,
Jun 30, 2021, 12:21:01 PM6/30/21
to puppe...@googlegroups.com

I think the state we left this was that:

  1. There was a support escalation with SF that was hit by this, we offered GC and JRuby tuning to hopefully help.
  2. We outlined a couple potential "fixes" for this:
    1. Parse ERB templates prior to evaluating them to see what variables we need for the scope. Possible costs in performance for the not-large-facts case and in general maintenance.
    2. Deprecate @ivar fact references and allow users to disable ivars as facts (skipping the slow paths). Probably has a large ecosystem lift? Would love to hear CS input on that.
    3. Share variables between templates, may only incur the facts -> ivars cost once per catalog, may have edge cases.

I'd love to know if either of the performance mitigations were helpful? and how much work CS believes it would be to move users to not using instance variables for facts?

Reid Vandewiele (Jira)

unread,
Jun 30, 2021, 12:40:03 PM6/30/21
to puppe...@googlegroups.com

The customer triage that was put in place, and which decisively solved the issue, was to patch the Puppet::Parser::TemplateWrapper class to filter which ivars are set.

Filter: 

require 'delegate'
 
class ToHashScopeFilterDelegator < SimpleDelegator
  def to_hash(*args)
    __getobj__.to_hash(*args).reject { |k, _| k.start_with?('badfact_') }
  end
end

Patch:

Puppet::Parser::TemplateWrapper.class_eval do
  def scope
    ToHashScopeFilterDelegator.new(@__scope__)
  end
end

The filter used by the customer is specific to their situation and so could be performed by prefix. It is clearly evident that limiting the number of ivars set solved the problem.

David McTavish (Jira)

unread,
Mar 8, 2022, 2:11:03 PM3/8/22
to puppe...@googlegroups.com
David McTavish updated an issue
 
Change By: David McTavish
Priority: Major Normal
This message was sent by Atlassian Jira (v8.20.2#820002-sha1:829506d)
Atlassian logo

Josh Cooper (Jira)

unread,
Oct 26, 2022, 2:58:02 PM10/26/22
to puppe...@googlegroups.com
Josh Cooper updated an issue
Change By: Josh Cooper
Team: Froyo Phoenix
This message was sent by Atlassian Jira (v8.20.11#820011-sha1:0629dd8)
Atlassian logo
Reply all
Reply to author
Forward
0 new messages