Nick Walker (JIRA)

unread,

May 30, 2019, 6:53:03 PM5/30/19

to puppe...@googlegroups.com

Nick Walker created an issue

PuppetDB /

PDB-4386

Improve performance of catalog inserts/updates ( by using an UPSERT )

Issue Type:	Improvement
Assignee:	Unassigned
Attachments:	pgbadger_catalog_resources_slow_query.png
Created:	2019/05/30 3:52 PM
Priority:	Normal
Reporter:	Nick Walker

The Problem

After upgrade from PE 2019.0 to 2019.1 we're seeing a large uptick in the following query from running PGBadger on the postgresql logs.

Where as before the upgrade we were only seeing a few hundred instances of that slow query.

Suggested Solution

Currently we query for the existing catalog_resources and compare to what we have in PDB to determine what we should update in the table. Then we send an update query with just those resources to update.

https://github.com/puppetlabs/puppetdb/blob/efd35d761be93e4a7757f086ffb9030bb2b34133/src/puppetlabs/puppetdb/scf/storage.clj#L470

https://github.com/puppetlabs/puppetdb/blob/efd35d761be93e4a7757f086ffb9030bb2b34133/src/puppetlabs/puppetdb/scf/storage.clj#L636-L649

We can replace that logic with an INSERT INTO ... ON CONFLICT statement so that we send a single request to the database and it determines what to update instead of PuppetDB.

Add Comment

This message was sent by Atlassian JIRA (v7.7.1#77002-sha1:e75ca93)

Zachary Kent (Jira)

unread,

Feb 17, 2021, 12:52:03 PM2/17/21

to puppe...@googlegroups.com

Zachary Kent updated an issue

PuppetDB /

PDB-4386

Improve performance of catalog inserts/updates ( by using an UPSERT )

Change By:	Zachary Kent
Sprint:	HAHA/Grooming

Add Comment

This message was sent by Atlassian Jira (v8.5.2#805002-sha1:a66f935)

Zachary Kent (Jira)

unread,

Apr 21, 2021, 2:43:04 PM4/21/21

to puppe...@googlegroups.com

Zachary Kent updated an issue

PuppetDB /

PDB-4386

Improve performance of catalog inserts/updates ( by using an UPSERT )

Change By:	Zachary Kent
Story Points:	3

Add Comment

This message was sent by Atlassian Jira (v8.13.2#813002-sha1:c495a97)

Zachary Kent (Jira)

unread,

Apr 21, 2021, 2:44:01 PM4/21/21

to puppe...@googlegroups.com

Zachary Kent updated an issue

PuppetDB /

PDB-4386

Improve performance of catalog inserts/updates ( by using an UPSERT )

Change By:	Zachary Kent
Sprint:	HAHA/Grooming HA 2020-05-05

Add Comment

Rob Browning (Jira)

unread,

Apr 23, 2021, 12:15:02 PM4/23/21

to puppe...@googlegroups.com

Rob Browning assigned an issue to Rob Browning

PuppetDB /

PDB-4386

Improve performance of catalog inserts/updates ( by using an UPSERT )

Change By:	Rob Browning
Assignee:	Rob Browning

Add Comment

Rob Browning (Jira)

unread,

Apr 23, 2021, 12:40:03 PM4/23/21

to puppe...@googlegroups.com

Rob Browning updated an issue

PuppetDB /

PDB-4386

Improve performance of catalog inserts/updates ( by using an UPSERT )

Change By:	Rob Browning
Team:	PuppetDB HA

Add Comment

Rob Browning (Jira)

unread,

May 3, 2021, 3:43:01 PM5/3/21

to puppe...@googlegroups.com

Rob Browning updated an issue

PuppetDB /

PDB-4386

Improve performance of catalog inserts/updates (by using an UPSERT)

Summary:
Change By:	Rob Browning

Improve performance of catalog inserts/updates ( by using an UPSERT )

Add Comment

Austin Blatt (Jira)

unread,

May 5, 2021, 2:05:02 PM5/5/21

to puppe...@googlegroups.com

Austin Blatt updated an issue

PuppetDB /

PDB-4386

Improve performance of catalog inserts/updates (by using an UPSERT)

Change By:	Austin Blatt
Sprint:	HA 2020-05-05 , HA 2020-05-19

Add Comment

Rob Browning (Jira)

unread,

May 14, 2021, 5:53:03 PM5/14/21

to puppe...@googlegroups.com

Rob Browning commented on

PDB-4386

Re: Improve performance of catalog inserts/updates (by using an UPSERT)

After some investigation, and if our assumptions are correct, we think it's likely to be preferable to continue diffing the catalogs in PDB. One reason is that at the moment PDB CPU is expected to be a notably less constrained resource than postgres since we have the option of running multiple command processors, and trading PDB CPU for a decreased postgres write load is potentially useful.

In addition, at least with plain SQL, while we could handle the new/changed rows via upsert, we'd still have to arrange for all the obsolete rows to be deleted. Furthermore, any unchanged rows, at least with the straightforward "on conflict update" solution, would still generate dead tuples (the only upsert that doesn't iirc is "do nothing").

Regarding the original problem, we noticed that the resource queries that were causing concern were likely running more slowly because of VM snapshot IO contention on the host.

In any case, whatever we decide in the end, we've taken this opportunity to review some of the other storage code, and identified a number of places where we handle things more efficiently, via upsert, decreasing round trips, etc. cf. PDB-5128

Add Comment

Rob Browning (Jira)

unread,

May 14, 2021, 5:53:04 PM5/14/21

to puppe...@googlegroups.com

Rob Browning updated an issue

PuppetDB /

PDB-4386

Improve performance of catalog inserts/updates (by using an UPSERT)

Change By:	Rob Browning
Story Points:	3 0

Add Comment

Rob Browning (Jira)

unread,

May 21, 2021, 2:59:01 PM5/21/21

to puppe...@googlegroups.com

Rob Browning updated an issue

PuppetDB /

PDB-4386

Improve performance of catalog inserts/updates (by using an UPSERT)

Change By:	Rob Browning
Release Notes:	Not Needed

Add Comment

Nick Walker (Jira)

unread,

May 28, 2021, 1:42:27 PM5/28/21

to puppe...@googlegroups.com

Nick Walker commented on

PDB-4386

Re: Improve performance of catalog inserts/updates (by using an UPSERT)

I'm a little skeptical of the reasoning here. However, even if we don't go with a full UPSERT it appears that we are updating, deleting, and maybe inserting a single row at a time instead of doing them as a single operation.

It seems like at a minimum we could fetch all the records, do an UPSERT as one operation for insert and update ( instead of one operation per row that needs to be updated or inserted ), and then a delete as a single operation.

I suspect it would still be better to write a single temp table that represents the new catalog and then an UPSERT followed by a delete but even without that I think removing all of the single row inserts and updates would be a boost.