This is a pretty vague report, I know, but I'm filing it anyhow. I'm running PuppetDB (plus puppetserver and puppetexplorer) as Docker containers, managed via pupperware. I just upgraded the PuppetDB container from v7.8.0 to v7.9.0. Since then, queries from PuppetExplorer are really slow, and sometimes they even time out. All the cells in the PuppetExplorer dashboard used to update quickly (in less than 1 second) throughout the last year, across various PuppetDB 7.x.x versions. If I reload the dashboard now, a few cells immediately show a value while the others show the spinning animation, waiting for a response. After about 40 seconds they finally show a value (and sometimes the request times out completely). The PuppetDB container logs no warnings. The PuppetExplorer container logs no warnings either, as long as a response arrives (after ~40 s); otherwise it logs:
puppetexplorer_1 | 03/Feb/2022:12:57:48 +0000 [ERROR 502 /api/pdb/query/v4/nodes] context canceled
puppetexplorer_1 | 03/Feb/2022:12:57:48 +0000 [ERROR 502 /api/pdb/query/v4/nodes] context canceled
puppetexplorer_1 | 03/Feb/2022:12:57:48 +0000 [ERROR 502 /api/pdb/query/v4/nodes] context canceled
puppetexplorer_1 | 03/Feb/2022:12:57:48 +0000 [ERROR 502 /api/pdb/query/v4/nodes] context canceled
puppetexplorer_1 | 03/Feb/2022:12:57:48 +0000 [ERROR 502 /api/pdb/query/v4/nodes] context canceled
puppetexplorer_1 | 03/Feb/2022:12:57:48 +0000 [ERROR 502 /api/pdb/query/v4/nodes] context canceled
puppetexplorer_1 | 03/Feb/2022:12:57:48 +0000 [ERROR 502 /api/pdb/query/v4/nodes] context canceled
puppetexplorer_1 | 03/Feb/2022:12:57:48 +0000 [ERROR 502 /api/pdb/query/v4/nodes] context canceled
puppetexplorer_1 | 03/Feb/2022:12:57:48 +0000 [ERROR 502 /api/pdb/query/v4/nodes] context canceled
puppetexplorer_1 | 03/Feb/2022:12:57:48 +0000 [ERROR 502 /api/pdb/query/v4/nodes] context canceled
puppetexplorer_1 | 03/Feb/2022:12:57:48 +0000 [ERROR 502 /api/pdb/query/v4/nodes] context canceled
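To rule out PuppetExplorer (and its reverse proxy) as the source of the delay, it may help to time the same query directly against PuppetDB. Here is a minimal sketch in Python, assuming PuppetDB's cleartext port is reachable on localhost:8080 (a placeholder; adjust the URL to match your pupperware setup):

```python
#!/usr/bin/env python3
# Time a PuppetDB query directly, bypassing PuppetExplorer, to check
# whether the ~40 s delay originates in PuppetDB itself.
# Assumption: PuppetDB's cleartext HTTP listener is on localhost:8080;
# adjust host/port to your compose file.
import time
import requests

URL = "http://localhost:8080/pdb/query/v4/nodes"

start = time.monotonic()
resp = requests.get(URL, timeout=120)
elapsed = time.monotonic() - start

resp.raise_for_status()
print(f"{len(resp.json())} nodes in {elapsed:.1f} s")
```

If this call alone takes ~40 seconds, the regression is in PuppetDB (or its database queries) rather than in the proxying.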
While watching 'top' on the Docker host, I reload the PuppetExplorer dashboard (which sends a bunch of queries to PuppetDB). In 'top' I can see roughly 20 postgres processes appear, each using about 10% CPU. The processes stay around for ~40 seconds and then disappear; at the same moment my browser finally shows values in all dashboard cells. So my guess is that something in those postgres processes takes 40 seconds before it finally returns.

Meanwhile, the puppet agents talk to the puppetserver as usual: no increase in the time to apply the catalog, and the server replaces facts and stores reports in PuppetDB without problems.

In https://puppet.com/docs/puppetdb/7/release_notes.html#puppetdb-790 I see: "Improved performance of the fact-contents endpoint. Testing against a database of 10,000 mocked nodes, there was an observed 84% decrease in time taken to complete a difficult query. https://tickets.puppetlabs.com/browse/PDB-5259" Could it be this change that degraded the performance for me? Possibly a bug in some library that distributes multiple queries over a pool of postgres connections?
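To see what those postgres processes are actually doing during the 40 seconds, one could sample pg_stat_activity while the dashboard is loading. A minimal sketch, assuming the pupperware postgres container is reachable with the connection parameters below (all placeholders; adjust the DSN to your compose file):

```python
#!/usr/bin/env python3
# Snapshot pg_stat_activity while the dashboard is loading, to capture
# the SQL statements behind the ~20 long-running postgres processes.
# Assumptions: psycopg2 is installed, and the DSN below matches the
# pupperware postgres container (placeholder credentials).
import psycopg2

DSN = "host=localhost port=5432 dbname=puppetdb user=puppetdb password=puppetdb"

with psycopg2.connect(DSN) as conn:
    with conn.cursor() as cur:
        # List non-idle backends, longest-running first.
        cur.execute("""
            SELECT pid,
                   now() - query_start AS runtime,
                   state,
                   left(query, 120) AS query
            FROM pg_stat_activity
            WHERE state <> 'idle'
            ORDER BY runtime DESC
        """)
        for pid, runtime, state, query in cur.fetchall():
            print(pid, runtime, state, query)
```

Running EXPLAIN ANALYZE on whichever statement shows up with a ~40 s runtime should reveal whether the PDB-5259 query changes picked a worse plan on my database.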