Jira (PDB-4830) Big performance impact when upgrading to PuppetDB5/6


Nacho Barrientos (Jira)

Jul 27, 2020, 9:24:04 AM
to puppe...@googlegroups.com
Nacho Barrientos created an issue
 
PuppetDB / Bug PDB-4830
Big performance impact when upgrading to PuppetDB5/6
Issue Type: Bug
Assignee: Unassigned
Created: 2020/07/27 6:23 AM
Priority: Normal
Reporter: Nacho Barrientos

Hi,

We're preparing the upgrade of a rather big PuppetDB instance (~40k hosts) from
PuppetDB4 to PuppetDB5.

There's a type of query that we do rather often (directly to the API or via the
PuppetDB query module when compiling catalogs) which is to select certnames for
which a certain fact has a given value. The simplest way to reproduce this
scenario is to issue something like:

GET pdb/query/v4/facts/factname/factvalue

In the 'old' PuppetDB world (<=PuppetDB4), where the fact values were normalised in the facts table, we added an index like this:

"facts_value_string_idx" btree (value_string)

which, for a database of the size above, performed pretty well for nodes with
the fact 'fqdn' set to 'node1.example.org', thanks to the index:

[
 {
  "certname": "node1.example.org",
  "environment": "production",
  "name": "fqdn",
  "value": "node1.example.org"
 }
]

real 0m0.319s

The index is used, avoiding a full table scan, and the performance is acceptable. We normally query for more complex fact combinations involving several string fact values, but for illustration purposes the above is good enough.

However, in the new world (PuppetDB5 and beyond) it seems that the facts table is gone and now the fact values are in two columns in the factsets table (stable and volatile) of
type jsonb.

With the same HW setup, PostgreSQL configuration, OS configuration and a copy
of the data (we're populating both instances at the same time using an extra
submit_only_server_urls in the terminus), the query above produces the same
results as expected but it is much slower:

real 0m7.403s

We could easily reproduce the (bad) ratio PDB4/PDB5 with queries of different complexities.

Looking at the possible indices that this query could use we found this one:

"idx_factsets_jsonb_merged" gin ((stable || volatile) jsonb_path_ops)

However, looking at the generated SQL, PuppetDB seems to be using
jsonb_extract_path() and then comparing the result to the value being looked
for; this approach cannot benefit from the index above.

I have the whole query if needed but to illustrate the issue what I did was to
analyze the query and extract the juiciest subquery that could show the
problem which is:

explain analyze SELECT certname FROM factsets WHERE jsonb_extract_path(stable||volatile, 'fqdn') = '"node1.example.org"'::jsonb;
 QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Seq Scan on factsets (cost=0.00..7777.63 rows=182 width=20) (actual time=2708.687..5292.499 rows=1 loops=1)
 Filter: (jsonb_extract_path((stable || volatile), VARIADIC '{fqdn}'::text[]) = '"node1.example.org"'::jsonb)
 Rows Removed by Filter: 36492
 Execution Time: 5292.528 ms

As you can see it's a full table scan. However, if the query constructed by
PuppetDB used for instance the @> operator which can make use of the GIN index above, the query would be much faster:

explain analyze SELECT certname FROM factsets WHERE stable||volatile @> '{"fqdn": "node1.example.org"}';
 QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on factsets (cost=288.28..425.15 rows=36 width=20) (actual time=0.767..0.768 rows=1 loops=1)
 Recheck Cond: ((stable || volatile) @> '{"fqdn": "node1.example.org"}'::jsonb)
 Heap Blocks: exact=1
 -> Bitmap Index Scan on idx_factsets_jsonb_merged (cost=0.00..288.27 rows=36 width=0) (actual time=0.521..0.521 rows=1 loops=1)
 Index Cond: ((stable || volatile) @> '{"fqdn": "node1.example.org"}'::jsonb)
 Execution Time: 0.827 ms
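
For what it's worth, the containment approach would also extend naturally to multi-fact lookups. The following is a hypothetical sketch (not SQL that PuppetDB actually generates, and fact1/fact2 are made-up fact names): a single @> test with a multi-key jsonb object matches nodes where all the listed facts have the given values, and it can still be served by the same GIN index:

```sql
-- Hypothetical sketch, not PuppetDB-generated SQL: one containment test
-- with a multi-key object expresses "fact1 = x AND fact2 = y" and can
-- still use the idx_factsets_jsonb_merged (jsonb_path_ops) GIN index.
EXPLAIN ANALYZE
SELECT certname
  FROM factsets
 WHERE (stable || volatile) @> '{"fact1": "x", "fact2": "y"}'::jsonb;
```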

I haven't looked very deeply into the code itself but it seems to be the same in HEAD. No idea either how feasible it'd be to patch PuppetDB so that a more efficient query is constructed.

This is a huge blocker for us as we cannot upgrade our systems taking such a
big performance penalty. This would create bottlenecks in PuppetDB making
Puppet runs much slower, increasing the load on the system and surely leading
to time-outs when requesting catalogs.

Do you have any suggestion on how to work around this? I'm happy to provide full query plans, etc.

Thanks.

This message was sent by Atlassian Jira (v8.5.2#805002-sha1:a66f935)

Austin Blatt (Jira)

Jul 28, 2020, 7:14:03 PM
to puppe...@googlegroups.com
Austin Blatt commented on Bug PDB-4830
 
Re: Big performance impact when upgrading to PuppetDB5/6

Nacho Barrientos: Have you checked out the inventory endpoint? It often has better performance for single-fact queries like this.

An example inventory PQL query that should return a similar set of nodes as the facts query above is

inventory[] { facts.<factname> = "<value>" }

In PDB 6.7.0+ you can also restrict the facts that are returned with dot notation:

inventory[certname, facts.os.family] { facts.<factname> = "<value>" }

Nacho Barrientos (Jira)

Jul 29, 2020, 8:56:03 AM
to puppe...@googlegroups.com

Hi Austin Blatt,

Thanks for taking a look at the ticket and for coming back to me.

For the use-case of retrieving arbitrary fact values for simple fact queries, using the inventory is indeed much faster:

# PuppetDB 5
GET raw pdb/query/v4/facts/fqdn/node1.example.org
"node1.example.org"
 
0m8.358s

# PuppetDB 5
GET raw pdb/query/v4/inventory --query 'query=["=", "facts.fqdn", "node1.example.org"]' | jq '.[].certname'
"node1.example.org"
 
0m0.236s

which is comparable to the performance we have in production:

# PuppetDB 4
GET raw pdb/query/v4/facts/fqdn/node1.example.org | jq '.[].certname'
"node1.example.org"
 
0m0.247s

However, I have to confess that the problem explained is just the tip of the iceberg (I was trying to simplify it to get to the heart of the problem). Most of the queries that our PuppetDBs receive come from the masters when compiling catalogs using the puppetdbquery module. This module allows creating arbitrary PuppetDB queries combining several facts, and even selecting the output as you mention, but it consumes everything from the facts endpoint. The module is very convenient as it allows writing queries using a simple language, hiding the complexity behind PuppetDB::Parser.facts_query.

Perhaps the module could be patched to use the /inventory endpoint instead, but I fear that this would not be trivial: for "complex" queries it would have to split them, issue several requests, and post-process the multiple results to join/combine the data on the client side, which is far from ideal in my opinion. I believe this should be the job of the database, as it has been so far.

The functions provided by the module are heavily used in our code base and changing that interface would be very painful. For us it's really important to have a performant /facts endpoint.

Austin Blatt (Jira)

Jul 29, 2020, 12:47:04 PM
to puppe...@googlegroups.com
Austin Blatt commented on Bug PDB-4830

Yeah, that makes sense. The facts endpoint does appear to be underperforming, and I haven't found any queries yet that would actually make use of the index idx_factsets_jsonb_merged, so I'm going to do some investigation in the git history to see whether it was added because someone thought it would optimize queries like these, or for another reason.

If we do go down the path of optimizing the /facts endpoint with the @> operator, it'll need to be for specific types of /facts queries, as it's not as general as the function (which is just the #> operator). Is it important for your use case that only queries like /pdb/query/v4/facts/<FACT NAME>/<VALUE> are performant, or do you also make use of queries that don't provide the value, like /pdb/query/v4/facts/<FACT NAME>?
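
To illustrate the generality gap (an assumption-level sketch against the factsets schema, not SQL from either PuppetDB version): the containment operator needs a concrete value, so it only covers the value-providing form of the query, while a value-less query amounts to a key-existence test, which jsonb_path_ops GIN indexes cannot serve:

```sql
-- @> needs a concrete value, so it can back queries of the form
-- /pdb/query/v4/facts/<FACT NAME>/<VALUE>:
SELECT certname FROM factsets
 WHERE (stable || volatile) @> '{"fqdn": "node1.example.org"}'::jsonb;

-- A value-less /pdb/query/v4/facts/<FACT NAME> query is a key-existence
-- test. The ? operator expresses it, but jsonb_path_ops GIN indexes do
-- not support ? (only the default jsonb_ops opclass does), so this form
-- would still not use idx_factsets_jsonb_merged:
SELECT certname FROM factsets
 WHERE (stable || volatile) ? 'fqdn';
```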

Nacho Barrientos (Jira)

Jul 30, 2020, 4:43:03 AM
to puppe...@googlegroups.com

Thanks once again for your time.

Is it important for your usecase that only the queries like /pdb/query/v4/facts/<FACT NAME>/<VALUE> are performant, or do you also make use of queries that don't provide the value, like /pdb/query/v4/facts/<FACT NAME>?

It'd be a good starting point but to be honest it would not be enough. Most of our queries (generated by the module above) talk directly to /pdb/query/v4/facts passing a query, for example:

"Get the value of the ipaddress fact for nodes whose fact fact1 has x as value, fact fact2 has y and fact fact3 has z":

via the query_nodes() function:

query_nodes('fact1="x" and fact2="y" and fact3="z"',  ipaddress)

translates into:

GET pdb/query/v4/facts --query
'["extract",["value"],["and",["and",["in","certname",["extract","certname",["select_fact_contents",["and",["=","path",["fact2"]],["=","value","y"]]]]],["in","certname",["extract","certname",["select_fact_contents",["and",["=","path",["fact1"]],["=","value","x"]]]]],["in","certname",["extract","certname",["select_fact_contents",["and",["=","path",["fact3"]],["=","value","z"]]]]]],["or",["=","name","ipaddress"]]]]'
# PuppetDB 4
real    0m0.471s
# PuppetDB 5
real    0m16.302s

The heaviest part of the query is the part that looks for certnames matching the search query. As I mentioned above, PuppetDB 4 benefits from the index facts_value_string_idx (our facts table has more than 40M rows), generating a much more efficient query plan, whereas PuppetDB 5 does a full table scan (perhaps several times), plus of course all the calls to jsonb_extract_path(), which presumably have to parse the entire contents of the stable and volatile columns.

The "production" execution times allow us to have those queries as part of catalog compilations. Such a performance drop is a killer because we have users that combine several queries in the same compilation, and that time delta would basically make all those compilations time out.

The point is to make sure that searching certnames by fact values is fast; that way any kind of query interface (the above, or just /pdb/query/v4/facts/<FACT NAME>/<VALUE>) would benefit.

Kamil Swoboda (Jira)

Mar 2, 2021, 7:13:57 PM
to puppe...@googlegroups.com
Kamil Swoboda commented on Bug PDB-4830

Nacho Barrientos, were you able to resolve this issue? We are facing exactly the same problem during a migration from Puppet 4 to Puppet 7. It works fine on PuppetDB 4, but starting from PuppetDB 5 query performance is terrible. My ticket is PDB-5051. It would be really great if you have some information on how to fix this problem, or if you've found some sort of workaround. Thanks!

Nacho Barrientos (Jira)

Mar 3, 2021, 2:25:03 AM
to puppe...@googlegroups.com

Not at all, we're stuck on PuppetDB 4 because of this.

Kamil Swoboda (Jira)

Mar 4, 2021, 5:48:02 AM
to puppe...@googlegroups.com
Kamil Swoboda commented on Bug PDB-4830

Too bad. Thanks for the reply though! If anything changes, it would be great if you'd post here.

Margaret Lee (Jira)

Apr 29, 2021, 6:30:02 PM
to puppe...@googlegroups.com
Margaret Lee updated an issue
 
Change By: Margaret Lee
Epic Link: PE-31891

Nacho Barrientos (Jira)

Jun 19, 2023, 2:19:02 PM
to puppe...@googlegroups.com
Nacho Barrientos commented on Bug PDB-4830
 
Re: Big performance impact when upgrading to PuppetDB5/6

It would have been nice if, after three years of basically zero communication and finally closing this issue as "won't fix", (at least) some pointers to the "current work" had been left in the ticket. I think it's the bare minimum.
