Jira (PUP-9561) puppet language functions are 11,350x slower than ruby functions

48 views
Skip to first unread message

Ben Ford (JIRA)

unread,
Mar 14, 2019, 2:17:03 PM3/14/19
to puppe...@googlegroups.com
Ben Ford updated an issue
 
Puppet / Task PUP-9561
puppet language functions are 11,350x slower than ruby functions
Change By: Ben Ford
Reporter: Ben Ford Sean Millichamp
Add Comment Add Comment
 
This message was sent by Atlassian JIRA (v7.7.1#77002-sha1:e75ca93)
Atlassian logo

Ben Ford (JIRA)

unread,
Mar 14, 2019, 2:17:04 PM3/14/19
to puppe...@googlegroups.com
Ben Ford created an issue
Issue Type: Task Task
Assignee: Unassigned
Components: Language
Created: 2019/03/14 11:16 AM
Priority: Normal Normal
Reporter: Ben Ford

From Slack: https://puppetcommunity.slack.com/archives/CFD8Z9A4T/p1552586549522900

I wrote a Puppet language function to, given an array of hashes, return a hash that counted duplicate instances of one of the keys in the original hash and then returned all of the values where the count was > 1.

this was the code:

# Find count of records with duplicates:
$id_count = $system_info.reduce({}) |$results, $rec| {
    $id = $rec['system_id']
    $current = $results[$id].lest || { 0 }
    $results + {$id => $current + 1}
}
 
# Find IDs with count > 1:
$duplicate_ids = $id_count.filter |$id, $count| { $count > 1 }.keys

on my set of records, which was around 1800 or so I think, it took more than two minutes to run, per the profile data

I converted it to Ruby and it was, of course, super fast

the ruby I ended up with was:

def find_dupes_system_info(system_info)
    res = Hash.new(0)
    system_info.each { |sys| res[sys['system_id']] += 1 }
    res.reject { |_id, count| count == 1 }.keys
end

155.4891 seconds vs 0.0137 seconds

Ben Ford (JIRA)

unread,
Mar 14, 2019, 2:20:02 PM3/14/19
to puppe...@googlegroups.com
Ben Ford commented on Task PUP-9561
 
Re: puppet language functions are 11,350x slower than ruby functions

I've had similar conversations with two other people in recent weeks.

Ben Ford (JIRA)

unread,
Mar 14, 2019, 3:34:02 PM3/14/19
to puppe...@googlegroups.com
Ben Ford commented on Task PUP-9561

After Puppet 6.3, this could be rewritten as something like (eyeball compiled only)

$system_info.group_by |$rec| { $rec['system_id'] }
                     .filter |$k, $v| { $v.size > 1}
	             .keys

Henrik Lindberg (JIRA)

unread,
Mar 15, 2019, 7:44:04 AM3/15/19
to puppe...@googlegroups.com

The original has exponential running time for the length of the input data, and the rewritten function is linear. For smaller number of entries (I tested with 200) the difference between the two algorithms is that Ben's version is 7x faster. For 500 entries it is 48x faster, and so on.

Henrik Lindberg (JIRA)

unread,
Mar 15, 2019, 8:01:06 AM3/15/19
to puppe...@googlegroups.com

If being on a version where group_by does not exist the function can be written like this:

$keys = $system_info.map |$x| { $x['system_id'] }.sort
$duplicate_ids = $keys.map |$i, $v| { if $v == $keys[$i+1] { $v } }.unique.filter |$x| { $x != undef}

This algorithm sorts they key, and for any duplicate key there will be at least one entry in the mapped result with this key (the rest are undef and filtered out).

This variant is slower than the group_by version by 2x but it is at least linear.

Sean Millichamp (JIRA)

unread,
Mar 18, 2019, 7:47:02 AM3/18/19
to puppe...@googlegroups.com

Thanks for the suggestions. Given that I can't use group_by yet (on 5.5) I'll probably stick with the Ruby I'm using, but Henrik's suggestions were clever and interesting to read.

Would you mind clarifying what part of the original is responsible for the exponential running time? Is it the last line in the reduce that merges the results? Is that because each time through the loop requires allocating a new hash and copying all of the elements in the previous hash, before doing the merge? I believe that is a fairly common pattern I use in reduce, often to manipulate a data structure I can't otherwise modify (such as in an each construct, for example) because of immutable variables.

Henrik Lindberg (JIRA)

unread,
Mar 18, 2019, 12:44:03 PM3/18/19
to puppe...@googlegroups.com

Sean Millichamp You are correct - copying each time in the loop, and each time the hash is a little bigger is what kills performance.

It is a common use case, and for small sized hashes it is not too bad. For bigger hashes, one of the techniques I showed work much faster.

 

Jorie Tappa (JIRA)

unread,
Mar 18, 2019, 12:46:04 PM3/18/19
to puppe...@googlegroups.com

Jorie Tappa (JIRA)

unread,
Mar 18, 2019, 12:46:05 PM3/18/19
to puppe...@googlegroups.com

Henrik Lindberg (JIRA)

unread,
Mar 19, 2019, 11:16:07 AM3/19/19
to puppe...@googlegroups.com
 
Re: puppet language functions are 11,350x slower than ruby functions

I added PUP-9568 as an idea for how we can add a hash builder function (by vamping the existing `merge` function) in stdlib. Maybe do the work there so it can be used with older puppet version.

Henrik Lindberg (JIRA)

unread,
Mar 20, 2019, 8:08:03 AM3/20/19
to puppe...@googlegroups.com

And PUP-9568 was changed to become MODULES-8760 (for which there is now a PR to stdlib).

Henrik Lindberg (JIRA)

unread,
May 9, 2019, 9:41:03 AM5/9/19
to puppe...@googlegroups.com

Sean Millichamp Since the feature for merge is now added to stdlib, maybe you can use that.

And on that note, I am closing this ticket - it shows enough alternatives how to avoid exponential running time for hash construction.

Bartosz Blizniak (Jira)

unread,
Sep 28, 2021, 7:23:02 AM9/28/21
to puppe...@googlegroups.com

Hi, we are currently seeing some performance issues with the "reduce" function performance and "merge".

A customer provided sample code:

$test = generate('/bin/seq', '1', '2000').split("\n") $result = $test.reduce({}) |$memo,$value| { $memo + { "key $value" => "foo" } } 

This took around 3 seconds. Increasing the generate line to 4000 will increase the runtime to 12 seconds which is more than double. 

 

Is this something that we can improve the performance on or is this expected?

This message was sent by Atlassian Jira (v8.13.2#813002-sha1:c495a97)
Atlassian logo

Josh Cooper (Jira)

unread,
Sep 28, 2021, 1:37:02 PM9/28/21
to puppe...@googlegroups.com
Josh Cooper commented on Task PUP-9561

Bartosz Blizniak When using the merge function, the block just needs to return the value for that iteration, so you'll want to do:

$test = generate('/bin/seq', '1', '4000').split("\n")
$result = $test.merge |$memo,$value| { { "key $value" => "foo" } }

That said, it is still slow due to the amount of time puppet spends inferring the data type of the unused $memo parameter. I think there's a simple fix to the merge function for the typical case where the $memo is not needed/wanted:

$result = $test.merge |$value| { { "key $value" => "foo" } }

Doing that drops the time to merge 4000 elements from 13.2 sec to 3.8 sec:

❯ time bx puppet apply merge.pp
Notice: Compiled catalog for localhost in environment production in 9.48 seconds
Notice: Applied catalog in 0.01 seconds
bundle exec puppet apply merge.pp  12.90s user 0.24s system 99% cpu 13.244 total
 
❯ time bx puppet apply merge.pp
Notice: Compiled catalog for localhost in environment production in 0.20 seconds
Notice: Applied catalog in 0.02 seconds
bundle exec puppet apply merge.pp  3.60s user 0.18s system 97% cpu 3.877 total

Josh Cooper (Jira)

unread,
Sep 28, 2021, 7:32:02 PM9/28/21
to puppe...@googlegroups.com
Josh Cooper commented on Task PUP-9561

Actually the type inference is much worse than I expected. Here are 3 manifests, using reduce, merge and merge2 which is the same as merge except the memo isn't passed to the block.

❯ cat reduce.pp 
$test = generate('/bin/sh', '-c', 'seq 1 4000').split("\n")
$result = $test.reduce({}) |$memo,$value| { $memo + { "key $value" => "foo" } }
 
❯ cat merge.pp
$test = generate('/bin/sh', '-c', 'seq 1 4000').split("\n")
$result = merge($test) |$memo,$value| { { "key $value" => "foo" } }
 
❯ cat merge2.pp                      
$test = generate('/bin/sh', '-c', 'seq 1 4000').split("\n")
$result = merge($test) |$value| { { "key $value" => "foo" } }

Using puppet#3828aabe8d32368faa5cdc1a428189d0cb117e52, I get:

❯ time bundle exec puppet apply reduce.pp 
Notice: Compiled catalog for localhost in environment production in 8.21 seconds
Notice: Applied catalog in 0.02 seconds
bundle exec puppet apply reduce.pp  10.02s user 0.17s system 99% cpu 10.266 total
 
❯ time bundle exec puppet apply merge.pp 
Notice: Compiled catalog for localhost in environment production in 8.09 seconds
Notice: Applied catalog in 0.02 seconds
bundle exec puppet apply merge.pp  9.88s user 0.19s system 99% cpu 10.155 total
 
❯ time bundle exec puppet apply merge2.pp
Notice: Compiled catalog for localhost in environment production in 0.19 seconds
Notice: Applied catalog in 0.02 seconds
bundle exec puppet apply merge2.pp  2.04s user 0.21s system 96% cpu 2.337 total

So merge is only a little bit better than reduce since it doesn't copy the hash each iteration. But it's still terrible.

Since the reduce and merge block parameters don't specify a type, they effectively accept PAnyType. Which I think means all of the type inference we're doing is for naught.

I pushed a branch https://github.com/puppetlabs/puppet/compare/main...joshcooper:any_callable_9561 that skips type inference if the parameter types are all PAnyType. With that change, times drop to:

❯ time bundle exec puppet apply reduce.pp
Notice: Compiled catalog for localhost in environment production in 0.29 seconds
Notice: Applied catalog in 0.02 seconds
bundle exec puppet apply reduce.pp  2.13s user 0.19s system 96% cpu 2.403 total
 
❯ time bundle exec puppet apply merge.pp 
Notice: Compiled catalog for localhost in environment production in 0.16 seconds
Notice: Applied catalog in 0.02 seconds
bundle exec puppet apply merge.pp  1.96s user 0.20s system 95% cpu 2.262 total
 
❯ time bundle exec puppet apply merge2.pp
Notice: Compiled catalog for localhost in environment production in 0.15 seconds
Notice: Applied catalog in 0.02 seconds
bundle exec puppet apply merge2.pp  1.92s user 0.21s system 95% cpu 2.214 total

Josh Cooper (Jira)

unread,
Sep 29, 2021, 8:47:03 PM9/29/21
to puppe...@googlegroups.com

Josh Cooper (Jira)

unread,
Sep 29, 2021, 8:48:01 PM9/29/21
to puppe...@googlegroups.com

Josh Cooper (Jira)

unread,
Sep 29, 2021, 8:48:02 PM9/29/21
to puppe...@googlegroups.com

Josh Cooper (Jira)

unread,
Oct 1, 2021, 3:08:02 PM10/1/21
to puppe...@googlegroups.com

Josh Cooper (Jira)

unread,
Oct 1, 2021, 3:08:03 PM10/1/21
to puppe...@googlegroups.com

Josh Cooper (Jira)

unread,
Oct 5, 2021, 12:18:02 PM10/5/21
to puppe...@googlegroups.com

Josh Cooper (Jira)

unread,
Oct 5, 2021, 12:18:03 PM10/5/21
to puppe...@googlegroups.com

Josh Cooper (Jira)

unread,
Oct 6, 2021, 6:10:03 PM10/6/21
to puppe...@googlegroups.com

Josh Cooper (Jira)

unread,
Oct 6, 2021, 6:16:03 PM10/6/21
to puppe...@googlegroups.com
Josh Cooper updated an issue
Change By: Josh Cooper
Release Notes: Enhancement
Release Notes Summary: Greatly speeds up the amount of time taken to type check arguments passed to blocks of iterative functions, such as reduce and merge.

Josh Cooper (Jira)

unread,
Oct 7, 2021, 1:53:02 PM10/7/21
to puppe...@googlegroups.com
Josh Cooper updated an issue
Change By: Josh Cooper
Summary: Type checking arguments passed to a block (reduce puppet language functions are 11 , merge, etc) is very slow 350x slower than ruby functions

Josh Cooper (Jira)

unread,
Oct 8, 2021, 5:39:03 PM10/8/21
to puppe...@googlegroups.com
Josh Cooper updated an issue
Change By: Josh Cooper
Fix Version/s: PUP 7.13.0
Fix Version/s: PUP 7.12.0

Josh Cooper (Jira)

unread,
Oct 8, 2021, 5:40:02 PM10/8/21
to puppe...@googlegroups.com
Josh Cooper commented on Task PUP-9561
 
Re: puppet language functions are 11,350x slower than ruby functions

This was cherry-picked to the release branch for 7.12.0: d3a2ea9f5d825037ec3e5f72c26f6199ad121bd5

Josh Cooper (Jira)

unread,
Oct 8, 2021, 10:48:02 PM10/8/21
to puppe...@googlegroups.com

Ciprian Badescu (Jira)

unread,
Oct 11, 2021, 2:38:02 AM10/11/21
to puppe...@googlegroups.com
Ciprian Badescu updated an issue
 
Change By: Ciprian Badescu
Fix Version/s: PUP 6.26.0
Fix Version/s: PUP 6.25.1

Claire Cadman (Jira)

unread,
Oct 11, 2021, 12:01:03 PM10/11/21
to puppe...@googlegroups.com
Reply all
Reply to author
Forward
0 new messages