Jira (PDB-4930) Investigate PG deadlocks with latest_report_id

2 views
Skip to first unread message

Zachary Kent (Jira)

unread,
Oct 13, 2020, 1:58:03 PM10/13/20
to puppe...@googlegroups.com
Zachary Kent created an issue
 
PuppetDB / Improvement PDB-4930
Investigate PG deadlocks with latest_report_id
Issue Type: Improvement Improvement
Assignee: Unassigned
Created: 2020/10/13 10:57 AM
Priority: Normal Normal
Reporter: Zachary Kent

One of our PE customers recently had a few instances of deadlocks that postgres detected in their logs. See below for example
 

2020-10-13 04:44:56.786 GMT [db:pe-puppetdb,sess:5f852d78.4f01,pid:20225,vtid:102/1047291,tid:181623009] ERROR:  deadlock detected
2020-10-13 04:44:56.786 GMT [db:pe-puppetdb,sess:5f852d78.4f01,pid:20225,vtid:102/1047291,tid:181623009] DETAIL:  Process 20225 waits for ShareLock on transaction 181623010; blocked by process 21558.
	Process 21558 waits for ShareLock on transaction 181623009; blocked by process 20225.
	Process 20225: UPDATE certnames SET latest_report_id = $1,latest_report_timestamp = $2 WHERE certname = $3AND ( latest_report_timestamp < $4      OR latest_report_timestamp is NULL )
	Process 21558: UPDATE certnames SET latest_report_id = $1,latest_report_timestamp = $2 WHERE certname = $3AND ( latest_report_timestamp < $4      OR latest_report_timestamp is NULL )

These deadlocks appear to be related to contention around latest_reports_id and the unique constraint it uses in its index. In this case postgres detected the deadlock and killed one of the transactions producing the error above. here is an article that explains this issue. It may be possible with sync that we could transfer multiple reports for the same certname close to one another.

We should investigate how PDB ingests reports for the same certname concurrently to see if there is a way to avoid this situation or at least determine that we don't run the risk of a full deadlock that postgres can't detect.

Add Comment Add Comment
 
This message was sent by Atlassian Jira (v8.5.2#805002-sha1:a66f935)
Atlassian logo

Austin Boyd (Jira)

unread,
Oct 14, 2020, 4:17:03 AM10/14/20
to puppe...@googlegroups.com
Austin Boyd updated an issue
Change By: Austin Boyd
Zendesk Ticket Count: 1
Zendesk Ticket IDs: 41188

Austin Boyd (Jira)

unread,
Oct 14, 2020, 4:17:03 AM10/14/20
to puppe...@googlegroups.com
Austin Boyd updated an issue
Change By: Austin Boyd
Labels: jira_escalated
Reply all
Reply to author
Forward
0 new messages