Jira (PUP-9561) puppet language functions are 11,350x slower than ruby functions

Ben Ford (JIRA)

unread,

Mar 14, 2019, 2:17:03 PM3/14/19

to puppe...@googlegroups.com

Ben Ford updated an issue

Puppet /

PUP-9561

puppet language functions are 11,350x slower than ruby functions

Change By:	Ben Ford
Reporter:	Ben Ford Sean Millichamp

Add Comment

This message was sent by Atlassian JIRA (v7.7.1#77002-sha1:e75ca93)

Ben Ford (JIRA)

unread,

Mar 14, 2019, 2:17:04 PM3/14/19

to puppe...@googlegroups.com

Ben Ford created an issue

Puppet /

PUP-9561

puppet language functions are 11,350x slower than ruby functions

Issue Type:	Task
Assignee:	Unassigned
Components:	Language
Created:	2019/03/14 11:16 AM
Priority:	Normal
Reporter:	Ben Ford

From Slack: https://puppetcommunity.slack.com/archives/CFD8Z9A4T/p1552586549522900

I wrote a Puppet language function to, given an array of hashes, return a hash that counted duplicate instances of one of the keys in the original hash and then returned all of the values where the count was > 1.

this was the code:

# Find count of records with duplicates:

$id_count = $system_info.reduce({}) |$results, $rec| {

    $id = $rec['system_id']

    $current = $results[$id].lest || { 0 }

    $results + {$id => $current + 1}

# Find IDs with count > 1:

$duplicate_ids = $id_count.filter |$id, $count| { $count > 1 }.keys

on my set of records, which was around 1800 or so I think, it took more than two minutes to run, per the profile data

I converted it to Ruby and it was, of course, super fast

the ruby I ended up with was:

def find_dupes_system_info(system_info)

    res = Hash.new(0)

    system_info.each { |sys| res[sys['system_id']] += 1 }

    res.reject { |_id, count| count == 1 }.keys

end

155.4891 seconds vs 0.0137 seconds

Add Comment

Ben Ford (JIRA)

unread,

Mar 14, 2019, 2:20:02 PM3/14/19

to puppe...@googlegroups.com

Ben Ford commented on

PUP-9561

Re: puppet language functions are 11,350x slower than ruby functions

I've had similar conversations with two other people in recent weeks.

Add Comment

Ben Ford (JIRA)

unread,

Mar 14, 2019, 3:34:02 PM3/14/19

to puppe...@googlegroups.com

Ben Ford commented on

PUP-9561

Re: puppet language functions are 11,350x slower than ruby functions

After Puppet 6.3, this could be rewritten as something like (eyeball compiled only)

$system_info.group_by |$rec| { $rec['system_id'] }

                     .filter |$k, $v| { $v.size > 1}

	             .keys

Add Comment

Henrik Lindberg (JIRA)

unread,

Mar 15, 2019, 7:44:04 AM3/15/19

to puppe...@googlegroups.com

Henrik Lindberg commented on

PUP-9561

Re: puppet language functions are 11,350x slower than ruby functions

The original has exponential running time for the length of the input data, and the rewritten function is linear. For smaller number of entries (I tested with 200) the difference between the two algorithms is that Ben's version is 7x faster. For 500 entries it is 48x faster, and so on.

Add Comment

Henrik Lindberg (JIRA)

unread,

Mar 15, 2019, 8:01:06 AM3/15/19

to puppe...@googlegroups.com

Henrik Lindberg commented on

PUP-9561

Re: puppet language functions are 11,350x slower than ruby functions

If being on a version where group_by does not exist the function can be written like this:

$keys = $system_info.map |$x| { $x['system_id'] }.sort

$duplicate_ids = $keys.map |$i, $v| { if $v == $keys[$i+1] { $v } }.unique.filter |$x| { $x != undef}

This algorithm sorts they key, and for any duplicate key there will be at least one entry in the mapped result with this key (the rest are undef and filtered out).

This variant is slower than the group_by version by 2x but it is at least linear.

Add Comment

Sean Millichamp (JIRA)

unread,

Mar 18, 2019, 7:47:02 AM3/18/19

to puppe...@googlegroups.com

Sean Millichamp commented on

PUP-9561

Re: puppet language functions are 11,350x slower than ruby functions

Thanks for the suggestions. Given that I can't use group_by yet (on 5.5) I'll probably stick with the Ruby I'm using, but Henrik's suggestions were clever and interesting to read.

Would you mind clarifying what part of the original is responsible for the exponential running time? Is it the last line in the reduce that merges the results? Is that because each time through the loop requires allocating a new hash and copying all of the elements in the previous hash, before doing the merge? I believe that is a fairly common pattern I use in reduce, often to manipulate a data structure I can't otherwise modify (such as in an each construct, for example) because of immutable variables.

Add Comment

Henrik Lindberg (JIRA)

unread,

Mar 18, 2019, 12:44:03 PM3/18/19

to puppe...@googlegroups.com

Henrik Lindberg commented on

PUP-9561

Re: puppet language functions are 11,350x slower than ruby functions

Sean Millichamp You are correct - copying each time in the loop, and each time the hash is a little bigger is what kills performance.

It is a common use case, and for small sized hashes it is not too bad. For bigger hashes, one of the techniques I showed work much faster.

Add Comment

Jorie Tappa (JIRA)

unread,

Mar 18, 2019, 12:46:04 PM3/18/19

to puppe...@googlegroups.com

Jorie Tappa updated an issue

Puppet /

PUP-9561

puppet language functions are 11,350x slower than ruby functions

Change By:	Jorie Tappa
Sub-team:	Language

Add Comment

Jorie Tappa (JIRA)

unread,

Mar 18, 2019, 12:46:05 PM3/18/19

to puppe...@googlegroups.com

Jorie Tappa updated an issue

Puppet /

PUP-9561

puppet language functions are 11,350x slower than ruby functions

Change By:	Jorie Tappa
Team:	Server

Add Comment

Henrik Lindberg (JIRA)

unread,

Mar 19, 2019, 11:16:07 AM3/19/19

to puppe...@googlegroups.com

Henrik Lindberg commented on

PUP-9561

Re: puppet language functions are 11,350x slower than ruby functions

I added PUP-9568 as an idea for how we can add a hash builder function (by vamping the existing `merge` function) in stdlib. Maybe do the work there so it can be used with older puppet version.

Add Comment

Henrik Lindberg (JIRA)

unread,

Mar 20, 2019, 8:08:03 AM3/20/19

to puppe...@googlegroups.com

Henrik Lindberg commented on

PUP-9561

Re: puppet language functions are 11,350x slower than ruby functions

And ~~PUP-9568~~ was changed to become MODULES-8760 (for which there is now a PR to stdlib).

Add Comment

Henrik Lindberg (JIRA)

unread,

May 9, 2019, 9:41:03 AM5/9/19

to puppe...@googlegroups.com

Henrik Lindberg commented on

PUP-9561

Re: puppet language functions are 11,350x slower than ruby functions

Sean Millichamp Since the feature for merge is now added to stdlib, maybe you can use that.

And on that note, I am closing this ticket - it shows enough alternatives how to avoid exponential running time for hash construction.

Add Comment

Bartosz Blizniak (Jira)

unread,

Sep 28, 2021, 7:23:02 AM9/28/21

to puppe...@googlegroups.com

Bartosz Blizniak commented on

PUP-9561

Re: puppet language functions are 11,350x slower than ruby functions

Hi, we are currently seeing some performance issues with the "reduce" function performance and "merge".

A customer provided sample code:

$test = generate('/bin/seq', '1', '2000').split("\n") $result = $test.reduce({}) |$memo,$value| { $memo + { "key $value" => "foo" } }

This took around 3 seconds. Increasing the generate line to 4000 will increase the runtime to 12 seconds which is more than double.

Is this something that we can improve the performance on or is this expected?

Add Comment

This message was sent by Atlassian Jira (v8.13.2#813002-sha1:c495a97)

Josh Cooper (Jira)

unread,

Sep 28, 2021, 1:37:02 PM9/28/21

to puppe...@googlegroups.com

Josh Cooper commented on

PUP-9561

Re: puppet language functions are 11,350x slower than ruby functions

Bartosz Blizniak When using the merge function, the block just needs to return the value for that iteration, so you'll want to do:

$test = generate('/bin/seq', '1', '4000').split("\n")

$result = $test.merge |$memo,$value| { { "key $value" => "foo" } }

That said, it is still slow due to the amount of time puppet spends inferring the data type of the unused $memo parameter. I think there's a simple fix to the merge function for the typical case where the $memo is not needed/wanted:

$result = $test.merge |$value| { { "key $value" => "foo" } }

Doing that drops the time to merge 4000 elements from 13.2 sec to 3.8 sec:

❯ time bx puppet apply merge.pp

Notice: Compiled catalog for localhost in environment production in 9.48 seconds

Notice: Applied catalog in 0.01 seconds

bundle exec puppet apply merge.pp  12.90s user 0.24s system 99% cpu 13.244 total

❯ time bx puppet apply merge.pp

Notice: Compiled catalog for localhost in environment production in 0.20 seconds

Notice: Applied catalog in 0.02 seconds

bundle exec puppet apply merge.pp  3.60s user 0.18s system 97% cpu 3.877 total

Add Comment

Josh Cooper (Jira)

unread,

Sep 28, 2021, 7:32:02 PM9/28/21

to puppe...@googlegroups.com

Josh Cooper commented on

PUP-9561

Re: puppet language functions are 11,350x slower than ruby functions

Actually the type inference is much worse than I expected. Here are 3 manifests, using reduce, merge and merge2 which is the same as merge except the memo isn't passed to the block.

❯ cat reduce.pp

$test = generate('/bin/sh', '-c', 'seq 1 4000').split("\n")

$result = $test.reduce({}) |$memo,$value| { $memo + { "key $value" => "foo" } }

❯ cat merge.pp

$test = generate('/bin/sh', '-c', 'seq 1 4000').split("\n")

$result = merge($test) |$memo,$value| { { "key $value" => "foo" } }

❯ cat merge2.pp

$test = generate('/bin/sh', '-c', 'seq 1 4000').split("\n")

$result = merge($test) |$value| { { "key $value" => "foo" } }

Using puppet#3828aabe8d32368faa5cdc1a428189d0cb117e52, I get:

❯ time bundle exec puppet apply reduce.pp

Notice: Compiled catalog for localhost in environment production in 8.21 seconds

Notice: Applied catalog in 0.02 seconds

bundle exec puppet apply reduce.pp  10.02s user 0.17s system 99% cpu 10.266 total

❯ time bundle exec puppet apply merge.pp

Notice: Compiled catalog for localhost in environment production in 8.09 seconds

Notice: Applied catalog in 0.02 seconds

bundle exec puppet apply merge.pp  9.88s user 0.19s system 99% cpu 10.155 total

❯ time bundle exec puppet apply merge2.pp

Notice: Compiled catalog for localhost in environment production in 0.19 seconds

Notice: Applied catalog in 0.02 seconds

bundle exec puppet apply merge2.pp  2.04s user 0.21s system 96% cpu 2.337 total

So merge is only a little bit better than reduce since it doesn't copy the hash each iteration. But it's still terrible.

Since the reduce and merge block parameters don't specify a type, they effectively accept PAnyType. Which I think means all of the type inference we're doing is for naught.

I pushed a branch https://github.com/puppetlabs/puppet/compare/main...joshcooper:any_callable_9561 that skips type inference if the parameter types are all PAnyType. With that change, times drop to:

❯ time bundle exec puppet apply reduce.pp

Notice: Compiled catalog for localhost in environment production in 0.29 seconds

Notice: Applied catalog in 0.02 seconds

bundle exec puppet apply reduce.pp  2.13s user 0.19s system 96% cpu 2.403 total

❯ time bundle exec puppet apply merge.pp

Notice: Compiled catalog for localhost in environment production in 0.16 seconds

Notice: Applied catalog in 0.02 seconds

bundle exec puppet apply merge.pp  1.96s user 0.20s system 95% cpu 2.262 total

❯ time bundle exec puppet apply merge2.pp

Notice: Compiled catalog for localhost in environment production in 0.15 seconds

Notice: Applied catalog in 0.02 seconds

bundle exec puppet apply merge2.pp  1.92s user 0.21s system 95% cpu 2.214 total

Add Comment

Josh Cooper (Jira)

unread,

Sep 29, 2021, 8:47:03 PM9/29/21

to puppe...@googlegroups.com

Josh Cooper assigned an issue to Josh Cooper

Puppet /

PUP-9561

puppet language functions are 11,350x slower than ruby functions

Change By:	Josh Cooper
Assignee:	Josh Cooper

Add Comment

Josh Cooper (Jira)

unread,

Sep 29, 2021, 8:48:01 PM9/29/21

to puppe...@googlegroups.com

Josh Cooper updated an issue

Puppet /

PUP-9561

puppet language functions are 11,350x slower than ruby functions

Change By:	Josh Cooper
Team:	Froyo Coremunity

Add Comment

Josh Cooper (Jira)

unread,

Sep 29, 2021, 8:48:02 PM9/29/21

to puppe...@googlegroups.com

Josh Cooper updated an issue

Puppet /

PUP-9561

puppet language functions are 11,350x slower than ruby functions

Change By:	Josh Cooper
Sprint:	Coremunity Kanban

Add Comment

Josh Cooper (Jira)

unread,

Oct 1, 2021, 3:08:02 PM10/1/21

to puppe...@googlegroups.com

Josh Cooper updated an issue

Puppet /

PUP-9561

puppet language functions are 11,350x slower than ruby functions

Change By:	Josh Cooper
Fix Version/s:	PUP 7.13.0

Add Comment

Josh Cooper (Jira)

unread,

Oct 1, 2021, 3:08:03 PM10/1/21

to puppe...@googlegroups.com

Josh Cooper updated an issue

Puppet /

PUP-9561

puppet language functions are 11,350x slower than ruby functions

Change By:	Josh Cooper
Fix Version/s:	PUP 6.26.0

Add Comment

Josh Cooper (Jira)

unread,

Oct 5, 2021, 12:18:02 PM10/5/21

to puppe...@googlegroups.com

Josh Cooper updated an issue

Puppet /

PUP-9561

puppet language functions are 11,350x slower than ruby functions

Change By:	Josh Cooper
Epic Link:	PUP-5773

Add Comment

Josh Cooper (Jira)

unread,

Oct 5, 2021, 12:18:03 PM10/5/21

to puppe...@googlegroups.com

Josh Cooper updated an issue

Puppet /

PUP-9561

puppet language functions are 11,350x slower than ruby functions

Change By:	Josh Cooper
Epic Link:	PUP- 5773 11073

Add Comment

Josh Cooper (Jira)

unread,

Oct 6, 2021, 6:10:03 PM10/6/21

to puppe...@googlegroups.com

Josh Cooper commented on

PUP-9561

Re: puppet language functions are 11,350x slower than ruby functions

Merged to 6.x in https://github.com/puppetlabs/puppet/commit/5bbb5cf1795418a36b368bca274035bbcdcdaada

Add Comment

Josh Cooper (Jira)

unread,

Oct 6, 2021, 6:16:03 PM10/6/21

to puppe...@googlegroups.com

Josh Cooper updated an issue

Puppet /

PUP-9561

puppet language functions are 11,350x slower than ruby functions

Change By:	Josh Cooper
Release Notes:	Enhancement
Release Notes Summary:	Greatly speeds up the amount of time taken to type check arguments passed to blocks of iterative functions, such as reduce and merge.

Add Comment

Josh Cooper (Jira)

unread,

Oct 7, 2021, 1:53:02 PM10/7/21

to puppe...@googlegroups.com

Josh Cooper updated an issue

Puppet /

PUP-9561

puppet language functions are 11,350x slower than ruby functions

Change By:	Josh Cooper
Summary:	Type checking arguments passed to a block (reduce puppet language functions are 11 , merge, etc) is very slow 350x slower than ruby functions