Jira (PUP-10628) Performance regression with large hashes using lookup

35 views
Skip to first unread message

Oleksandr Shtepa (Jira)

unread,
Aug 19, 2020, 10:56:03 AM8/19/20
to puppe...@googlegroups.com
Oleksandr Shtepa created an issue
 
Puppet / Bug PUP-10628
Performance regression with large hashes using lookup
Issue Type: Bug Bug
Affects Versions: PUP 6.17.0
Assignee: Unassigned
Components: Hiera & Lookup
Created: 2020/08/19 7:55 AM
Environment:

Test configuration (2000 items hash):

$ cat > hiera.yaml <<EOF
---
version: 5
defaults:
  datadir: .
  data_hash: yaml_data
hierarchy:
  - name: test
    path: test.yaml
EOF
$ echo -e 'hash1:\n  hash2:\n    hash3:' > test.yaml
$ for i in {1..2000}; do echo "      key$i: value$i" >> test.yaml ; done

Priority: Normal Normal
Reporter: Oleksandr Shtepa

Puppet Version: starting from 5.0.0
**

After upgrading our Puppet installation from 4.10 to 6.x I found large catalog compile time increasing (over 300 seconds) for some nodes.

Profiling shows problem with large hashes in Hiera yamls.

 

Good case:

 

$ sudo yum downgrade -y http://yum.puppetlabs.com/el/6x/PC1/x86_64/puppet-agent-1.10.14-1.el6.x86_64.rpm
$ puppet --version
4.10.12
$ time puppet lookup --hiera_config hiera.yaml hash1.hash2.hash3.key1
--- value1
...
real    0m2.195s
user    0m2.010s
sys     0m0.181s

Bad cases:

 

 

$ sudo yum update -y http://yum.puppetlabs.com/puppet5/el/6/x86_64/puppet-agent-5.0.0-1.el6.x86_64.rpm
$ puppet --version
5.0.0
$ time puppet lookup --hiera_config hiera.yaml hash1.hash2.hash3.key1
--- value1
...
real    0m9.837s
user    0m9.623s
sys     0m0.207s

$ sudo yum update -y puppet-agent
$ puppet --version
6.17.0
$ time puppet lookup --hiera_config hiera.yaml hash1.hash2.hash3.key1
--- value1
real    0m9.788s
user    0m9.494s
sys     0m0.296s

My research ended on follow debug points:

 

$ patch /opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/pops/lookup/data_provider.rb <<EOF
--- data_provider.rb
+++ data_provider.rb
@@ -80,11 +80,14 @@
   # @param data_provider [DataProvider] The data provider that produced the hash
   # @return [Object] The data value
   def validate_data_value(value, &block)
+require 'time'
+print "data_provider.rb::validate_data_value::1 ", ::Time.now.to_f, "\n"
     # The DataProvider.value_type is self recursive so further recursive check of collections is needed here
     unless value_is_validated? || DataProvider.value_type.instance?(value)
       actual_type = Types::TypeCalculator.singleton.infer(value)
       raise Types::TypeAssertionError.new("#{yield} has wrong type, expects Puppet::LookupValue, got #{actual_type}", DataProvider.value_type, actual_type)
     end
+print "data_provider.rb::validate_data_value::2 ", ::Time.now.to_f, "\n"
     value
   end
 end
EOF

Run with debug timestamps on 6.17.0:

$ time puppet lookup --hiera_config hiera.yaml hash1.hash2.hash3.key1
data_provider.rb::validate_data_value::1 1597847682.16178
data_provider.rb::validate_data_value::2 1597847689.6889844
--- value1
real    0m9.819s
user    0m9.557s
sys     0m0.261s

Run with debug timestamps on 4.10.12:

$ time puppet lookup --hiera_config hiera.yaml hash1.hash2.hash3.key1
data_provider.rb::validate_data_value::1 1597847876.923722
data_provider.rb::validate_data_value::2 1597847876.9429817
--- value1
...
real    0m2.206s
user    0m2.050s
sys     0m0.151s

Slowdown from 19ms to 7.5s (390 times) on 2000 items.

 

 

For 4000 items - slowndown from 38ms to 31s (800 times)

 

Add Comment Add Comment
 
This message was sent by Atlassian Jira (v8.5.2#805002-sha1:a66f935)
Atlassian logo

Josh Cooper (Jira)

unread,
Aug 20, 2020, 1:47:03 AM8/20/20
to puppe...@googlegroups.com
Josh Cooper updated an issue
Change By: Josh Cooper
Attachment: rbspy-2020-08-20-gAPdCc51RI.flamegraph.svg

Josh Cooper (Jira)

unread,
Aug 20, 2020, 1:48:04 AM8/20/20
to puppe...@googlegroups.com
Josh Cooper commented on Bug PUP-10628
 
Re: Performance regression with large hashes using lookup

Thanks Oleksandr Shtepa! I was able to reproduce using your instructions:

# puppet --version
6.17.0
# cd /etc/puppetlabs/puppet
# cat hiera.yaml
---
version: 5
defaults:
  datadir: .
  data_hash: yaml_data
hierarchy:
  - name: test
    path: test.yaml
# echo -e 'hash1:\n  hash2:\n    hash3:' > test.yaml
# for i in {1..2000}; do echo "      key$i: value$i" >> test.yaml ; done
# time puppet lookup --hiera_config hiera.yaml hash1.hash2.hash3.key1
--- value1
 
real	0m5.324s
user	0m5.066s
sys	0m0.241s

Using rbspy, I generated the following flamegraph:

# rbspy record -- puppet lookup hash1.hash2.hash3.key1
Wrote raw data to /root/.cache/rbspy/records/rbspy-2020-08-20-JLsx3EmDqL.raw.gz
Writing formatted output to /root/.cache/rbspy/records/rbspy-2020-08-20-gAPdCc51RI.flamegraph.svg

I've attached the flamegraph, which shows most of the time spent in DataProvider.value_type.instance?(value), and it seems to walk every element in the hash. I'll continue looking.

Oleksandr Shtepa (Jira)

unread,
Aug 20, 2020, 3:22:03 AM8/20/20
to puppe...@googlegroups.com

Looks like same problem exists for facter code.

On some node I have 270 IP addresses. --profile shows:

[puppetserver] Puppet functions -> net::ipaddresses4_list: 6.164248 s (5 calls)

On a node with 3 IPs:

[puppetserver] Puppet functions -> net::ipaddresses4_list: 0.075847 s (5 calls)

net::ipaddresses4_list only works with fact $::networking:

function net::ipaddresses4_list() >>Array[String] {
  $::networking['interfaces'].filter|$v|{ 'bindings' in $v[1] }.reduce([])|$memo, $v|{ $memo + $v[1]['bindings'].map()|$i|{ $i['address']} }
}

Josh Cooper (Jira)

unread,
Aug 20, 2020, 2:43:03 PM8/20/20
to puppe...@googlegroups.com
Josh Cooper commented on Bug PUP-10628

This can be reproduced using:

require 'puppet'
require 'puppet/pops'
 
Puppet.initialize_settings
Puppet::Util::Log.newdestination(:console)
 
values = 0.upto(2000).inject({}) do |memo, i|
  memo["key#{i}"] = "value#{i}"
  memo
end
 
hash = { 'hash2' => { 'hash3' => values } }
 
Puppet::Util.benchmark(:notice, "Inferred hash is an instance of #{Puppet::Pops::Lookup::DataProvider.value_type.name} in %{seconds}") do
  Puppet::Pops::Lookup::DataProvider.value_type.instance?(hash)
end

$ bx ruby slow_infer.rb
Notice: Inferred hash is an instance of RichData in 3.60

The bulk of the time is spent calculating if the hash is an instance of PTypeSetType. And it does this 3 times, once for each level of the hash. The PTypeSetType was added in PUP-6220 and is included in RichData:

'RichData' => 'Variant[Scalar,SemVerRange,Binary,Sensitive,Type,TypeSet,URI,Object,Undef,Default,Hash[RichDataKey,RichData],Array[RichData]]'

I can't imagine it would make sense for lookup to return TypeSet types? One option might be to check if the hash is an instance of RichData - TypeSet? Also wonder if facts have the same problem in PUP-9577.

Josh Cooper (Jira)

unread,
Aug 20, 2020, 3:12:03 PM8/20/20
to puppe...@googlegroups.com
Josh Cooper commented on Bug PUP-10628

Excluding TypeSet from the list of possible data provider values types is about 300x faster:

diff --git a/lib/puppet/pops/loader/static_loader.rb b/lib/puppet/pops/loader/static_loader.rb
index 3e6b78ce4c..8faca4e84f 100644
--- a/lib/puppet/pops/loader/static_loader.rb
+++ b/lib/puppet/pops/loader/static_loader.rb
@@ -29,6 +29,7 @@ class StaticLoader < Loader
     'Data' => 'Variant[ScalarData,Undef,Hash[String,Data],Array[Data]]',
     'RichDataKey' => 'Variant[String,Numeric]',
     'RichData' => 'Variant[Scalar,SemVerRange,Binary,Sensitive,Type,TypeSet,URI,Object,Undef,Default,Hash[RichDataKey,RichData],Array[RichData]]',
+    'LookupDataValue' => 'Variant[Scalar,SemVerRange,Binary,Sensitive,Type,URI,Object,Undef,Default,Hash[RichDataKey,LookupDataValue],Array[LookupDataValue]]',
 
     # Backward compatible aliases.
     'Puppet::LookupKey' => 'RichDataKey',
diff --git a/lib/puppet/pops/lookup/data_provider.rb b/lib/puppet/pops/lookup/data_provider.rb
index 6fdb9fa757..2446f35fb3 100644
--- a/lib/puppet/pops/lookup/data_provider.rb
+++ b/lib/puppet/pops/lookup/data_provider.rb
@@ -13,7 +13,8 @@ module DataProvider
   def self.register_types(loader)
     tp = Types::TypeParser.singleton
     @key_type = tp.parse('RichDataKey', loader)
-    @value_type = tp.parse('RichData', loader)
+    # exclude TypeSet
+    @value_type = tp.parse('LookupDataValue', loader)
   end
 
   # Performs a lookup with an endless recursion check.

Before

$ bx ruby slow_infer.rb
Notice: Inferred hash is an instance of RichData in 3.23

After

$ bx ruby slow_infer.rb
Notice: Inferred hash is an instance of LookupDataValue in 0.01

Henrik Lindberg (Jira)

unread,
Aug 20, 2020, 3:23:03 PM8/20/20
to puppe...@googlegroups.com

RichData as a format allows for types to be embedded in the serialization of data - that way, a serialization can be self describing. In serialization it is optional to use this, or to simply include types as reference via their name (i.e. "hoping" that the other side has the same definition of the type). Early on in the implementation of rich-data support in the catalog we did include the serialized types - but this proved to be too bulky.

There is currently no other data type (alias) defined for RichData without the Type and Type set types, but one could be created. Don't see it is of much practical use to support Type/TypeSet in hiera data, you would get the same if types were returned as strings and then turned into types dynamically - rather than a hiera backend returning an instance of Type/TypeSet. Not sure if dropping them from RichData would cause problems - probably not that difficult to test.

Speeding up the inference is still of value - there may be some obvious things that can be done to make it faster to discriminate if a hash could possibly represent a type or typeset.

Henrik Lindberg (Jira)

unread,
Aug 20, 2020, 3:27:03 PM8/20/20
to puppe...@googlegroups.com

An optimization that was done in other places were that instead of inferring the type of the value and then matching the inferred against the desired you ask if it is an instance of the desired type - not sure if there is something like that going on here, or if also asking if it is an instance requires a deep recursion.

Josh Cooper (Jira)

unread,
Aug 21, 2020, 7:23:03 PM8/21/20
to puppe...@googlegroups.com
Josh Cooper commented on Bug PUP-10628

Tests pass for me locally when Type and TypeSet are removed, so I submitted a PR https://github.com/puppetlabs/puppet/pull/8294.

It looks like TypeSet#instance? could be optimized to reject the value if the required pcore_version key is missing, but I'm not sure which keys are actually required based on the way __pcore_init_hash optionally parses the hash: https://github.com/puppetlabs/puppet/blob/main/lib/puppet/pops/types/p_type_set_type.rb#L174-L180.

Henrik Lindberg (Jira)

unread,
Aug 22, 2020, 2:28:04 PM8/22/20
to puppe...@googlegroups.com

Josh Cooper I think it is may be as easy as checking if it has the key "pcore_uri" - which should be present and signifies that "this is something that has to do with pcore" - but it could be that actual usage is too relaxed with the formalities of declaring this (or the pcore_version for that matter). Apart from Puppet core and possibly Bolt, I don't think there is any use of `TypeSet` in the wild - if so, the worst case would be that manually authored typeset files would need to be updated with the pcore_uri.

Would need to ask Thomas Hallgren to get all the details.

Josh Cooper (Jira)

unread,
Aug 27, 2020, 6:53:03 PM8/27/20
to puppe...@googlegroups.com
Josh Cooper updated an issue
Change By: Josh Cooper
Fix Version/s: PUP 6.19.0
Fix Version/s: PUP 5.5.22

Josh Cooper (Jira)

unread,
Aug 27, 2020, 6:53:03 PM8/27/20
to puppe...@googlegroups.com

Josh Cooper (Jira)

unread,
Aug 27, 2020, 6:53:04 PM8/27/20
to puppe...@googlegroups.com
Josh Cooper commented on Bug PUP-10628
 
Re: Performance regression with large hashes using lookup

I modified TypeSetType#instance? so inference is skipped if the object isn't a hash or if the pcore_version key is missing. The was merged to 5.5.x in https://github.com/puppetlabs/puppet/commit/d00f2a6a23f6a98daf2b954aea2e1bf01c020c17.

I'm going to file a separate ticket to remove TypeSetType from RichData to be fixed in Puppet 7.

Josh Cooper (Jira)

unread,
Aug 27, 2020, 7:09:03 PM8/27/20
to puppe...@googlegroups.com

Josh Cooper (Jira)

unread,
Aug 27, 2020, 7:10:03 PM8/27/20
to puppe...@googlegroups.com
Josh Cooper updated an issue
Change By: Josh Cooper
Sprint: Platform Core KANBAN

Josh Cooper (Jira)

unread,
Sep 1, 2020, 2:19:03 PM9/1/20
to puppe...@googlegroups.com
Josh Cooper commented on Bug PUP-10628
 
Re: Performance regression with large hashes using lookup

Follow up merged to 5.5.x https://github.com/puppetlabs/puppet/commit/6f168e901fae4a3b8593a342f4df54044fed22dd. We now verify the object is an instance of (or subclass) of PTypeSetType.

Josh Cooper (Jira)

unread,
Sep 1, 2020, 2:22:03 PM9/1/20
to puppe...@googlegroups.com
Josh Cooper updated an issue
Change By: Josh Cooper
Release Notes: Bug Fix
Release Notes Summary: Fixes a performance regression in 5.5.x and up which affected hiera lookups in large hashes.

Josh Cooper (Jira)

unread,
Sep 3, 2020, 1:30:03 PM9/3/20
to puppe...@googlegroups.com

Josh Cooper (Jira)

unread,
Sep 3, 2020, 1:47:05 PM9/3/20
to puppe...@googlegroups.com
Josh Cooper commented on Bug PUP-10628

This also fixes the issue where the hash is validated multiple times for each level of the hash.

Previously PTypeSetType#instance? was three times with the following object o:

 

{'hash2' => {'hash3' =>

{ 'key1' => ..., ... } } }
{'hash3' => { 'key1' => ..., ... }

}

{'key1' => ... }

 

With the new code, the instance? is still called three times, but it returns immediately due to the object not being an instance of PTypeSetType.

Claire Cadman (Jira)

unread,
Oct 12, 2020, 9:18:03 AM10/12/20
to puppe...@googlegroups.com
Reply all
Reply to author
Forward
0 new messages