Jira (PDB-4600) Investigate OOM errors during resource_events partitioning migration

Zachary Kent (JIRA)

Dec 12, 2019, 2:05:04 PM
to puppe...@googlegroups.com
Zachary Kent created an issue
 
PuppetDB / Bug PDB-4600
Investigate OOM errors during resource_events partitioning migration
Issue Type: Bug
Assignee: Unassigned
Created: 2019/12/12 11:04 AM
Priority: Normal
Reporter: Zachary Kent

When running the resource_events partition migration (#73), PDB runs out of memory (OOM) after a number of events have been migrated. This issue was first seen when restoring the SLV data and then restarting PDB.

Instructions for restoring the SLV data can be found here

With the SLV data, it appears you can work around this issue by raising the available heap to a sufficient level.

The migration completed on an 8-core, 16GB Platform9 CentOS 7 box with the PDB java_args set to -Xms1588m -Xmx1588m

However, the OOM error still occurs on a 4-core, 8GB Platform9 CentOS 7 box with the PDB java_args set to -Xms782m -Xmx782m

We should investigate whether we're holding on to the head of a seq, or doing something else that causes the OOM error.
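
To illustrate the kind of problem suspected here, the following is a minimal Java sketch, not PuppetDB's actual migration code. It contrasts a migration loop that keeps every row reachable until the end (the effect of holding the head of a lazy seq) with one that only ever keeps one batch reachable. The class name, the eventSource helper, and the batch size are all hypothetical; the "peak" value returned is a simple count of rows held at once, used as a stand-in for live heap.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Consumer;

public class BatchedMigrationSketch {

    // Hypothetical stand-in for lazily reading rows from the old table.
    static Iterator<Long> eventSource(long total) {
        return new Iterator<Long>() {
            private long next = 0;
            @Override public boolean hasNext() { return next < total; }
            @Override public Long next() {
                if (next >= total) throw new NoSuchElementException();
                return next++;
            }
        };
    }

    // Analogue of holding the head of a seq: every row read so far stays
    // reachable, so the live set grows with the size of the table.
    static int migrateRetainingAll(Iterator<Long> rows, Consumer<List<Long>> writer) {
        List<Long> all = new ArrayList<>();
        while (rows.hasNext()) {
            all.add(rows.next());
        }
        writer.accept(all);
        return all.size(); // peak number of rows held at once
    }

    // Bounded alternative: once a batch is written, its list is replaced,
    // so earlier rows become unreachable and can be garbage collected.
    static int migrateInBatches(Iterator<Long> rows, int batchSize,
                                Consumer<List<Long>> writer) {
        List<Long> batch = new ArrayList<>(batchSize);
        int peak = 0;
        while (rows.hasNext()) {
            batch.add(rows.next());
            peak = Math.max(peak, batch.size());
            if (batch.size() == batchSize) {
                writer.accept(batch);
                batch = new ArrayList<>(batchSize); // drop the old batch
            }
        }
        if (!batch.isEmpty()) {
            writer.accept(batch); // flush the final partial batch
        }
        return peak;
    }

    public static void main(String[] args) {
        long[] written = {0};
        Consumer<List<Long>> writer = b -> written[0] += b.size();

        int peakAll = migrateRetainingAll(eventSource(10_000), writer);
        int peakBatched = migrateInBatches(eventSource(10_000), 500, writer);

        System.out.println("peak rows held (retain all): " + peakAll);
        System.out.println("peak rows held (batched):    " + peakBatched);
        System.out.println("total rows written:          " + written[0]);
    }
}
```

If the real migration behaves like the first method, its memory requirement would scale with the number of events, which would be consistent with the migration succeeding at -Xmx1588m and failing at -Xmx782m on the same data. That remains a hypothesis to check against a heap dump.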

Example of the OOM error:

Dec 11 01:18:28 master-el7.test.net puppetdb[23051]: #
Dec 11 01:18:28 master-el7.test.net puppetdb[23051]: # java.lang.OutOfMemoryError: GC overhead limit exceeded
Dec 11 01:18:28 master-el7.test.net puppetdb[23051]: # -XX:OnOutOfMemoryError="kill -9 %p"
Dec 11 01:18:28 master-el7.test.net puppetdb[23051]: #   Executing /bin/sh -c "kill -9 23058"...
Dec 11 01:18:29 master-el7.test.net puppetdb[23051]: /opt/puppetlabs/server/apps/puppetdb/cli/apps/start: line 99: 23058 Killed                  ${JAVA_BIN} ${JAVA_ARGS} -XX:OnOutOfMemoryError="kill -9 %p" -cp "
Dec 11 01:18:29 master-el7.test.net puppetdb[23051]: Background process 23058 exited before start had completed
Dec 11 01:18:29 master-el7.test.net systemd[1]: pe-puppetdb.service: control process exited, code=exited status=1
Dec 11 01:18:29 master-el7.test.net systemd[1]: Failed to start pe-puppetdb Service.
-- Subject: Unit pe-puppetdb.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit pe-puppetdb.service has failed.
-- 
-- The result is failed.


Nick Walker (JIRA)

Dec 12, 2019, 8:58:04 PM
to puppe...@googlegroups.com

Zachary Kent (JIRA)

Dec 13, 2019, 12:40:04 PM
to puppe...@googlegroups.com

Nick Walker (JIRA)

Jan 6, 2020, 4:29:03 PM
to puppe...@googlegroups.com

Zachary Kent (JIRA)

Jan 6, 2020, 4:44:03 PM
to puppe...@googlegroups.com

Zachary Kent (JIRA)

Jan 6, 2020, 4:46:03 PM
to puppe...@googlegroups.com