Jira (PDB-4386) Improve performance of catalog inserts/updates (by using an UPSERT)


Nick Walker (JIRA)

May 30, 2019, 6:53:03 PM
to puppe...@googlegroups.com
Nick Walker created an issue
 
PuppetDB / Improvement PDB-4386
Improve performance of catalog inserts/updates (by using an UPSERT)
Issue Type: Improvement
Assignee: Unassigned
Attachments: pgbadger_catalog_resources_slow_query.png
Created: 2019/05/30 3:52 PM
Priority: Normal
Reporter: Nick Walker

The Problem

After upgrading from PE 2019.0 to 2019.1, running PGBadger on the postgresql logs shows a large uptick in the slow query captured in the attached screenshot.

Before the upgrade we were only seeing a few hundred instances of that slow query.

Suggested Solution

Currently we query for the existing catalog_resources and compare them to what we have in PDB to determine what we should update in the table. Then we send an update query with just those resources.

https://github.com/puppetlabs/puppetdb/blob/efd35d761be93e4a7757f086ffb9030bb2b34133/src/puppetlabs/puppetdb/scf/storage.clj#L470

https://github.com/puppetlabs/puppetdb/blob/efd35d761be93e4a7757f086ffb9030bb2b34133/src/puppetlabs/puppetdb/scf/storage.clj#L636-L649

We can replace that logic with an INSERT INTO ... ON CONFLICT statement, so that we send a single request to the database and let it determine what to update instead of PuppetDB.
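A minimal sketch of the idea, using Python's sqlite3 (SQLite's ON CONFLICT clause mirrors Postgres's; requires SQLite ≥ 3.24). The table and column names here are illustrative, not PuppetDB's actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE catalog_resources (
        certname TEXT NOT NULL,
        resource TEXT NOT NULL,
        tags     TEXT,
        PRIMARY KEY (certname, resource))""")

# Seed one existing row; the "new catalog" changes it and adds another.
conn.execute("INSERT INTO catalog_resources VALUES ('node1', 'File[/a]', 'old')")

new_catalog = [('node1', 'File[/a]', 'new'),
               ('node1', 'File[/b]', 'fresh')]

# One statement: the database decides which rows are inserts vs. updates,
# instead of PuppetDB selecting the existing rows and diffing them first.
conn.executemany("""
    INSERT INTO catalog_resources (certname, resource, tags)
    VALUES (?, ?, ?)
    ON CONFLICT (certname, resource)
    DO UPDATE SET tags = excluded.tags""", new_catalog)

rows = sorted(conn.execute("SELECT resource, tags FROM catalog_resources"))
print(rows)  # [('File[/a]', 'new'), ('File[/b]', 'fresh')]
```

With this shape, the insert-vs-update decision moves entirely into the database and the whole catalog can be sent as one batch.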

 
This message was sent by Atlassian JIRA (v7.7.1#77002-sha1:e75ca93)

Zachary Kent (Jira)

Feb 17, 2021, 12:52:03 PM
to puppe...@googlegroups.com
Zachary Kent updated an issue
Change By: Zachary Kent
Sprint: HAHA/Grooming

Zachary Kent (Jira)

Apr 21, 2021, 2:43:04 PM
to puppe...@googlegroups.com
Zachary Kent updated an issue
Change By: Zachary Kent
Story Points: 3


Rob Browning (Jira)

May 3, 2021, 3:43:01 PM
to puppe...@googlegroups.com
Rob Browning updated an issue
Change By: Rob Browning
Summary: Improve performance of catalog inserts/updates (by using an UPSERT)

Austin Blatt (Jira)

May 5, 2021, 2:05:02 PM
to puppe...@googlegroups.com
Austin Blatt updated an issue
Change By: Austin Blatt
Sprint: HA 2020-05-05, HA 2020-05-19

Rob Browning (Jira)

May 14, 2021, 5:53:03 PM
to puppe...@googlegroups.com
Rob Browning commented on Improvement PDB-4386
 
Re: Improve performance of catalog inserts/updates (by using an UPSERT)

After some investigation, and if our assumptions are correct, we think it's likely preferable to continue diffing the catalogs in PDB. One reason is that, at the moment, PDB CPU is expected to be a notably less constrained resource than postgres, since we have the option of running multiple command processors, so trading PDB CPU for a decreased postgres write load is potentially useful.

In addition, at least with plain SQL, while we could handle the new/changed rows via upsert, we'd still have to arrange for all the obsolete rows to be deleted. Furthermore, any unchanged rows, at least with the straightforward "on conflict update" solution, would still generate dead tuples (the only upsert that doesn't iirc is "do nothing").
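One common mitigation for the dead-tuple concern (not something this ticket settles on) is a WHERE guard on the DO UPDATE clause, so rows whose values are already identical are left untouched. A sketch in Python's sqlite3 with illustrative names; Postgres would write the guard with IS DISTINCT FROM, while SQLite's null-safe operator is IS NOT:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE catalog_resources (resource TEXT PRIMARY KEY, tags TEXT)")
conn.execute("INSERT INTO catalog_resources VALUES ('File[/a]', 'same')")

before = conn.total_changes
# Upsert a row identical to what is already stored; the WHERE guard on
# DO UPDATE means the conflicting row is not rewritten at all.
conn.execute("""
    INSERT INTO catalog_resources (resource, tags)
    VALUES ('File[/a]', 'same')
    ON CONFLICT (resource) DO UPDATE SET tags = excluded.tags
    WHERE catalog_resources.tags IS NOT excluded.tags""")
updated = conn.total_changes - before
print(updated)  # 0: the row was identical, so no rewrite (and no dead tuple)
```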

Regarding the original problem, we noticed that the resource queries that were causing concern were likely running more slowly because of VM snapshot IO contention on the host.

In any case, whatever we decide in the end, we've taken this opportunity to review some of the other storage code and identified a number of places where we could handle things more efficiently (via upsert, decreasing round trips, etc.); cf. PDB-5128


Nick Walker (Jira)

May 28, 2021, 1:42:27 PM
to puppe...@googlegroups.com
Nick Walker commented on Improvement PDB-4386
 
Re: Improve performance of catalog inserts/updates (by using an UPSERT)

I'm a little skeptical of the reasoning here. However, even if we don't go with a full UPSERT, it appears that we are updating, deleting, and maybe inserting a single row at a time instead of doing them as set-based operations.

It seems like at a minimum we could fetch all the records, do an UPSERT as one operation for inserts and updates (instead of one operation per row that needs to be updated or inserted), and then a delete as a single operation.

I suspect it would still be better to write the new catalog to a single temp table and then do an UPSERT followed by a delete, but even without that, I think removing all of the single-row inserts and updates would be a boost.
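The temp-table variant described above could look roughly like this, again sketched with Python's sqlite3 and illustrative names (in Postgres the temp table would typically be filled with COPY or a batched insert):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE catalog_resources (resource TEXT PRIMARY KEY, tags TEXT);
    INSERT INTO catalog_resources VALUES ('File[/a]', 'old'), ('File[/gone]', 'x');
    CREATE TEMP TABLE new_catalog (resource TEXT PRIMARY KEY, tags TEXT);
""")
# Load the full incoming catalog into the temp table.
conn.executemany("INSERT INTO new_catalog VALUES (?, ?)",
                 [('File[/a]', 'new'), ('File[/b]', 'fresh')])

# One upsert covering every insert and update.
# (SQLite needs the "WHERE true" to disambiguate INSERT..SELECT with ON CONFLICT.)
conn.execute("""
    INSERT INTO catalog_resources (resource, tags)
    SELECT resource, tags FROM new_catalog WHERE true
    ON CONFLICT (resource) DO UPDATE SET tags = excluded.tags""")

# One delete covering every obsolete row.
conn.execute("""
    DELETE FROM catalog_resources
    WHERE resource NOT IN (SELECT resource FROM new_catalog)""")

rows = sorted(conn.execute("SELECT * FROM catalog_resources"))
print(rows)  # [('File[/a]', 'new'), ('File[/b]', 'fresh')]
```

Three set-based statements total, regardless of how many resources changed, rather than one statement per changed row.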
