Jira (PDB-3747) PDB holds summary query transactions open during long syncs


Charlie Sharpsteen (JIRA)

Nov 3, 2017, 5:20:04 PM
to puppe...@googlegroups.com
Charlie Sharpsteen updated an issue
 
PuppetDB / Bug PDB-3747
PDB holds summary query transactions open during long syncs
Change By: Charlie Sharpsteen
Summary: PDB holds summary query transactions open during long syncs
 
This message was sent by Atlassian JIRA (v7.0.2#70111-sha1:88534db)

Owen Rodabaugh (JIRA)

Nov 7, 2017, 11:22:03 AM
to puppe...@googlegroups.com
Owen Rodabaugh updated an issue
Change By: Owen Rodabaugh
CS Priority: Needs Priority Major
CS Impact: When you reconnect a partitioned HA setup, this can cause the PostgreSQL nodes to slow to a crawl a few hours later, as these long-running transactions prevent maintenance.

This can take hours to resolve.
CS Severity: 4 - Major
CS Business Value: 4 - $$$$$
CS Frequency: 2 - 5-25% of Customers

Nick Walker (JIRA)

Apr 10, 2018, 2:47:03 PM
to puppe...@googlegroups.com
Nick Walker commented on Bug PDB-3747
 
Re: PDB holds summary query transactions open during long syncs

Russell Mull, thoughts on this? It seems like a cascading failure: when sync fails and you need to re-sync, you don't want to get into this situation.

This message was sent by Atlassian JIRA (v7.7.1#77002-sha1:e75ca93)

Russell Mull (JIRA)

Apr 10, 2018, 3:03:03 PM
to puppe...@googlegroups.com

Austin Blatt (JIRA)

Oct 10, 2018, 5:31:04 PM
to puppe...@googlegroups.com

Austin Blatt (JIRA)

Oct 17, 2018, 4:41:10 PM
to puppe...@googlegroups.com

Austin Blatt (JIRA)

Oct 17, 2018, 4:41:10 PM
to puppe...@googlegroups.com
Austin Blatt assigned an issue to Unassigned

Adam Bottchen (JIRA)

Dec 6, 2018, 4:04:04 PM
to puppe...@googlegroups.com

Yasmin Rajabi (JIRA)

Feb 26, 2019, 4:45:04 PM
to puppe...@googlegroups.com
Yasmin Rajabi commented on Bug PDB-3747
 
Re: PDB holds summary query transactions open during long syncs

This is currently on the prioritized backlog, according to Rob Browning.

Yasmin Rajabi (JIRA)

Sep 18, 2019, 4:32:05 PM
to puppe...@googlegroups.com
Yasmin Rajabi commented on Bug PDB-3747

Rob Browning, when we last chatted you mentioned this was on the backlog. Do you have an idea where it sits?

Charlie Sharpsteen (JIRA)

Sep 18, 2019, 4:39:03 PM
to puppe...@googlegroups.com

Since the description is fairly technical, a higher-level overview is:

  • The primary and replica become de-synced, and copying the data required to sync back up will take several hours.
  • After a couple of hours, the DB queries that are running for sync start to severely impact the performance of the database that is providing the data.
  • At this point, the PE administrator has to start periodically interrupting the connection between the two PuppetDB instances to force the sync to restart, so that no query runs long enough to cause a performance impact.

Nick Walker (JIRA)

Sep 18, 2019, 5:07:03 PM
to puppe...@googlegroups.com
Nick Walker commented on Bug PDB-3747

It seems like when we're looking for commands to sync, we should do a select columns from table LIMIT <some reasonable number>, sync those commands, commit the change, and keep looping until we don't have sync work left to do. That way we commit changes in small batches and don't leave a long-running transaction open.
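
A rough sketch of that loop, for illustration only. It uses Python's sqlite3 module as a stand-in for PostgreSQL, and the `commands` table, its `synced` column, and the function names are all hypothetical; this is not PuppetDB's actual sync code.

```python
import sqlite3


def make_demo_db(n_rows):
    """Build an in-memory table of pending rows (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE commands ("
        "id INTEGER PRIMARY KEY, synced INTEGER NOT NULL DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO commands (id) VALUES (?)", [(i,) for i in range(n_rows)]
    )
    conn.commit()
    return conn


def sync_in_batches(conn, batch_size=2000):
    """Process pending rows in small batches, committing after each one.

    Returns the number of batches that were committed.
    """
    batches = 0
    while True:
        # Fetch at most batch_size pending rows; fetchall() drains the cursor
        # so no result set stays open across batches.
        rows = conn.execute(
            "SELECT id FROM commands WHERE synced = 0 ORDER BY id LIMIT ?",
            (batch_size,),
        ).fetchall()
        if not rows:
            return batches
        # "Sync" the batch; here that only means flagging the rows as done.
        conn.executemany("UPDATE commands SET synced = 1 WHERE id = ?", rows)
        # End the transaction before starting the next batch.
        conn.commit()
        batches += 1
```

Because each pass ends with a commit, no single transaction lives longer than one batch, which is the property the suggestion is after.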

Austin Blatt is that reasonable?

Austin Blatt (JIRA)

Sep 18, 2019, 5:27:03 PM
to puppe...@googlegroups.com
Austin Blatt commented on Bug PDB-3747

Yes, we already batch the transfer in sets of 2,000, but we leave the summary query open and start the next set. My general plan is to end that query entirely after the first batch and start over. Unfortunately that'll mean we re-run the summary query every 2,000 items transferred, but those are fairly lightweight queries afaik (we should verify that during this work and optimize where necessary if that is not the case).
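
To make the trade-off concrete, here is a minimal in-memory model of ending the summary after each batch and starting over. The two dicts stand in for the two PuppetDB instances, and treating the summary as "keys whose hash differs" is an assumption made for this sketch; the function name and data shape are hypothetical.

```python
def sync_with_restarts(remote, local, batch_size=2000):
    """Copy differing entries from remote to local, one batch per summary run.

    Returns how many times the summary was recomputed.
    """
    summary_runs = 0
    while True:
        summary_runs += 1
        # Fresh, short-lived "summary query": keys whose value differs,
        # capped at batch_size. Nothing is held open between iterations.
        pending = [
            key for key in sorted(remote) if local.get(key) != remote[key]
        ][:batch_size]
        if not pending:
            return summary_runs
        # Transfer this batch, then loop back and recompute the summary.
        for key in pending:
            local[key] = remote[key]
```

In this model, 5,000 differing items with a batch size of 2,000 cost four summary runs: three that find work and a final one that comes back empty.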

Nick Walker (JIRA)

Sep 18, 2019, 6:29:03 PM
to puppe...@googlegroups.com
Nick Walker commented on Bug PDB-3747

Can we just add a LIMIT 2000 to the summary query? That should make it even more lightweight.

Austin Blatt (JIRA)

Sep 18, 2019, 6:51:03 PM
to puppe...@googlegroups.com
Austin Blatt commented on Bug PDB-3747

Yeah, that sounds like a reasonable change.

Rob Browning (JIRA)

Oct 1, 2019, 6:41:03 PM
to puppe...@googlegroups.com
Rob Browning commented on Bug PDB-3747

Yasmin Rajabi It's on our team's list of priorities, and it sounds like it's on others' lists as well, so given that, and what it sounds like may be a more general emphasis on HA, I assume it's likely to be addressed in our next rounds of work – we definitely want to fix it.

Austin Blatt (Jira)

Mar 5, 2020, 8:51:02 PM
to puppe...@googlegroups.com
Austin Blatt updated an issue
 
Change By: Austin Blatt
Release Notes: Not Needed
This message was sent by Atlassian Jira (v8.5.2#805002-sha1:a66f935)