Jira (PDB-5469) PuppetDB services shutting down due to query performance


Subramanian Kalidass (Jira)

Mar 28, 2022, 2:15:02 AM
to puppe...@googlegroups.com
Subramanian Kalidass created an issue
 
PuppetDB / Bug PDB-5469
PuppetDB services shutting down due to query performance
Issue Type: Bug
Assignee: Unassigned
Components: PuppetDB
Created: 2022/03/27 11:14 PM
Priority: Normal
Reporter: Subramanian Kalidass

Summary:

The customer reported that a node query covering 50 nodes failed with the console error "Error resolving pql query". The query was an arbitrary selection of 50 nodes combined with OR conditions.

Customer Organization Name: Splunk Release Engineering
Did it ever work?

As per the ticket, the query worked for 20 nodes but failed with the PQL error at 50 nodes.

When did the issue first occur?

Not sure. The customer reported this issue as a possible bug.

If applicable, did you do a run with --debug --trace?

N/A

OS & Version: Ubuntu 18.04.6 LTS (Bionic Beaver)

PE Version on affected machine:

2021.5

Master Or Agent Affected:

Master

All In One Or Split Install:

Split install

Steps Taken To Reproduce:

The customer provided the following steps to reproduce this error:

1. Perform an expensive PQL query from the console. In our case, we were able to reproduce the problem with 30+ OR statements in a query:

inventory[certname] { certname ~ "bastion-i-0df61833aa00e6acc" or certname ~ "compile-master-i-05d0fd872d2b1964f" or certname ~ "compile-master-i-061f9a63122235787"}

(and so on, of course, with 30+ nodes)

2. There is no step 2! 

When running this query, I get the error in the console "Error resolving pql query: Server Error". It appears that the UI timeout occurs after 60 seconds, and the user would have no idea they caused a problem.

On the backend, PuppetDB queries start getting slower and slower, and the load on the server in question climbs. In at least one test, if left unchecked, the load runs away and `apport` runs to try to gather a crash dump.

 

2022-03-10T17:26:59.813Z WARN [p.p.h.query] Parsing PQL took 3,705.52 ms: "nodes[certname] { report_timestamp > \"2022-03-01T00:00:00Z\" }"
2022-03-10T17:27:05.470Z WARN [p.p.h.query] Parsing PQL took 9,362.058 ms: "resources[certname] {\n type = 'Class' and\n title = 'Puppet_enterprise::Profile::Master' and\n nodes { deactivated is null and expired is null }\n order by certname\n }"
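Since the ticket notes the query succeeded with 20 nodes but not 50, one possible client-side workaround (a sketch under assumptions, not a confirmed fix) is to batch the certnames and issue several smaller queries, collapsing each batch's chained `or certname ~` clauses into a single anchored regex alternation. The node names and batch size below are illustrative, not taken from the customer environment:

```python
# Sketch: build several smaller PQL queries instead of one 50-clause OR chain.
# Assumes exact-name matching is intended; the original used unanchored ~ matches.

def batched_pql(certnames, batch_size=20):
    """Yield one inventory[certname] PQL query per batch of certnames,
    using a single anchored regex alternation instead of chained 'or'."""
    for i in range(0, len(certnames), batch_size):
        batch = certnames[i:i + batch_size]
        pattern = "^(" + "|".join(batch) + ")$"
        yield f'inventory[certname] {{ certname ~ "{pattern}" }}'

# Example with hypothetical node names:
nodes = [f"node-{n:02d}" for n in range(50)]
queries = list(batched_pql(nodes))
# 50 nodes with batch_size=20 -> 3 queries
```

Each generated query could then be run separately and the results merged client-side, keeping every individual query under the size that is known to work.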

 

Files Acquired:

Support bundle from the primary master and the external puppet database

Relevant Error Messages:

We can see the following error in the PuppetDB logs, which matches the time window in which the customer executed the query.

 

// error from the PuppetDB log
at java.base/java.lang.Thread.run(Thread.java:829)
2022-03-10T17:42:47.401Z INFO [p.p.c.services] Periodic activities halted
2022-03-10T17:42:47.402Z INFO [c.z.h.HikariDataSource] PDBWritePool - Shutdown initiated...
2022-03-10T17:43:03.587Z ERROR [p.p.threadpool] Reporting unexpected error from thread cmd-proc-thread-392 to stderr and log
java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@7294c26[Not completed, task = java.util.concurrent.Executors$RunnableAdapter@22c31f38[Wrapped task = puppetlabs.puppetdb.command$schedule_delayed_message$fn__37749@f8d4060]] rejected from java.util.concurrent.ScheduledThreadPoolExecutor@6d53b1d1[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
at java.base/java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2055)
 
...
 
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
2022-03-10T17:43:03.587Z INFO [c.z.h.HikariDataSource] PDBWritePool - Shutdown completed.
2022-03-10T17:43:03.588Z INFO [c.z.h.HikariDataSource] PDBReadPool - Shutdown initiated...

 

What we have done so far:

While investigating the logs, we can see that the PuppetDB service went down. We recommended increasing the PuppetDB heap as well, but it did not give any improvement.

The customer is using an external PostgreSQL server, and after reviewing its logs, we couldn't find any errors on the PostgreSQL side.

What You Need From Engineering:

Based on our analysis of the logs shared by the customer, all PDB command processing gets slower and the service eventually shuts down following the console query, while PostgreSQL looks fine, with no clear evidence of any issues or performance problems.

 

2022-03-10T17:37:48.318Z INFO  [p.p.command] [26328-1646933566230] [126181 ms] 'replace facts' command processed for compile-master-i-03f903ab084ac8760.cloudworks.stg.splunkcloud.systems
2022-03-10T17:37:48.320Z INFO  [p.p.command] [26329-1646933609444] [113513 ms] 'replace facts' command processed for compile-master-i-0c9498c87b4df2f7a.cloudworks.stg.splunkcloud.systems
2022-03-10T17:38:12.223Z INFO  [p.p.command] [26309-1646933223584] [573506 ms] 'replace catalog' command processed for puppet-lb-i-079cb980f5ef06cac.cloudworks.stg.splunkcloud.systems
2022-03-10T17:38:14.100Z INFO  [p.p.command] [26330-1646933624551] [111323 ms] 'replace facts' command processed for compile-master-i-069283f01ed3a5595.cloudworks.stg.splunkcloud.systems
2022-03-10T17:38:27.020Z INFO  [p.p.command] [26320-1646933440008] [321946 ms] 'replace catalog' command processed for omd-worker-i-00a382be4a2a75a68.vpc-0f6ad8da6f3397703.stg.splunkcloud.systems
2022-03-10T17:38:39.703Z INFO  [p.p.command] [26301-1646933119390] [746750 ms] 'replace catalog' command processed for compile-master-i-0ba1013ac09e65788.cloudworks.stg.splunkcloud.systems
2022-03-10T17:38:45.390Z WARN  [o.e.j.s.HttpChannel] handleException /metrics/v2/read java.io.IOException: Close SendCallback@12ffc3c7[PROCESSING][i=null,cb=org.eclipse.jetty.server.HttpChannel$SendCallback@2f57a924] in state PROCESSING
2022-03-10T17:39:18.246Z INFO  [p.p.command] [26300-1646933117390] [791028 ms] 'replace catalog' command processed for compile-master-i-015d461e6969d4a8e.cloudworks.stg.splunkcloud.systems
2022-03-10T17:39:54.135Z INFO  [p.p.command] [26332-1646933679012] [123931 ms] 'replace facts' command processed for compile-master-i-0616b3f6f5b5df3d2.cloudworks.stg.splunkcloud.systems
2022-03-10T17:40:08.861Z INFO  [p.p.command] [26334-1646933692314] [118524 ms] 'replace facts' command processed for bastion-i-082f5259bf7f0a439.cloudworks.stg.splunkcloud.systems
2022-03-10T17:40:19.807Z INFO  [p.p.command] [26333-1646933682901] [129470 ms] 'replace facts' command processed for compile-master-i-00d667ea3eddbc17b.cloudworks.stg.splunkcloud.systems
2022-03-10T17:40:28.999Z INFO  [p.p.command] [26338-1646933754806] [88645 ms] 'replace facts' command processed for compile-master-i-06195bca0100aa1e6.cloudworks.stg.splunkcloud.systems
2022-03-10T17:40:28.999Z INFO  [p.p.command] [26314-1646933333701] [594698 ms] 'replace catalog' command processed for puppet-lb-i-09e9a7a0fea610107.cloudworks.stg.splunkcloud.systems
2022-03-10T17:40:28.999Z INFO  [p.p.command] [26340-1646933773831] [29397 ms] 'store report' puppet v7.14.0 command processed for bastion-i-082f5259bf7f0a439.cloudworks.stg.splunkcloud.systems
2022-03-10T17:40:34.727Z INFO  [p.p.command] [26323-1646933464378] [301851 ms] 'replace catalog' command processed for puppet-lb-i-03a75bf33b78c50b7.cloudworks.stg.splunkcloud.systems
2022-03-10T17:40:34.727Z INFO  [p.p.command] [26337-1646933754805] [115021 ms] 'replace facts' command processed for compile-master-i-07656bab49614b96b.cloudworks.stg.splunkcloud.systems
2022-03-10T17:40:47.837Z INFO  [p.p.command] [26339-1646933762404] [73629 ms] 'replace facts' command processed for puppet-lb-i-06dad768756f17fc8.cloudworks.stg.splunkcloud.systems
2022-03-10T17:40:47.838Z INFO  [p.p.command] [26306-1646933176796] [747702 ms] 'replace catalog' command processed for compile-master-i-0078fe62a6d520804.cloudworks.stg.splunkcloud.systems
2022-03-10T17:42:06.114Z INFO  [p.p.command] [26341-1646933804421] [108114 ms] 'replace facts' command processed for puppet-lb-i-0e7bcaab7c788836b.cloudworks.stg.splunkcloud.systems
2022-03-10T17:42:08.124Z INFO  [p.p.command] [26342-1646933844687] [99122 ms] 'replace facts' command processed for prometheus-i-0e481f404d2b50fe8.cloudworks.stg.splunkcloud.systems
2022-03-10T17:42:13.949Z INFO  [p.t.internal] Shutting down due to JVM shutdown hook.
2022-03-10T17:42:13.951Z INFO  [p.t.internal] Beginning shutdown sequence
2022-03-10T17:42:23.046Z INFO  [p.p.command] [26345-1646933870203] [104595 ms] 'replace facts' command processed for puppet-lb-i-08fe9b411637fa847.cloudworks.stg.splunkcloud.systems
2022-03-10T17:42:23.046Z INFO  [p.p.command] [26344-1646933850114] [108320 ms] 'replace facts' command processed for compile-master-i-051cc78ceb9fc06e5.cloudworks.stg.splunkcloud.systems
2022-03-10T17:42:24.909Z INFO  [p.p.command] [26343-1646933844687] [110180 ms] 'replace facts' command processed for prometheus-i-04f3c34f4f6c072d7.cloudworks.stg.splunkcloud.systems
2022-03-10T17:42:30.738Z INFO  [p.p.command] [26346-1646933917829] [100994 ms] 'replace facts' command processed for prometheus-i-0c4f9c94aa7072bde.cloudworks.stg.splunkcloud.systems
2022-03-10T17:42:34.463Z INFO  [p.p.command] [26349-1646933925393] [80660 ms] 'replace facts' command processed for compile-master-i-01af1a48629ebaa2f.cloudworks.stg.splunkcloud.systems
2022-03-10T17:42:39.943Z WARN  [p.p.threadpool] Forcing threadpool shutdown after waiting 10,000ms: java.util.concurrent.ThreadPoolExecutor@2b180dfc[Shutting down, pool size = 12, active threads = 12, queued tasks = 0, completed tasks = 26341]
2022-03-10T17:42:39.943Z WARN  [p.p.threadpool] Threadpool forcibly shut down
2022-03-10T17:42:39.944Z INFO  [p.p.command] Periodic activities halted
2022-03-10T17:42:43.694Z INFO  [p.p.command] [26352-1646933975992] [18785 ms] 'store report' puppet v7.14.0 command processed for prometheus-i-04f3c34f4f6c072d7.cloudworks.stg.splunkcloud.systems
2022-03-10T17:42:43.696Z INFO  [p.p.c.services] Shutdown request received; puppetdb exiting.
2022-03-10T17:42:43.696Z ERROR [p.p.command] [26,352] [store report] Retrying after attempt 0 for prometheus-i-04f3c34f4f6c072d7.cloudworks.stg.splunkcloud.systems, due to: java.nio.channels.ClosedByInterruptException [Ljava.lang.Throwable;@1e2678b8
java.nio.channels.ClosedByInterruptException: null
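The per-command processing times in brackets above can be pulled out programmatically to quantify the slowdown; a minimal sketch over log lines in this format (the two sample lines are copied verbatim from the log above):

```python
import re

# Two sample lines copied from the PuppetDB log above.
LOG = """\
2022-03-10T17:38:12.223Z INFO  [p.p.command] [26309-1646933223584] [573506 ms] 'replace catalog' command processed for puppet-lb-i-079cb980f5ef06cac.cloudworks.stg.splunkcloud.systems
2022-03-10T17:42:06.114Z INFO  [p.p.command] [26341-1646933804421] [108114 ms] 'replace facts' command processed for puppet-lb-i-0e7bcaab7c788836b.cloudworks.stg.splunkcloud.systems
"""

def command_times(log_text):
    """Return (command, milliseconds) pairs from p.p.command 'processed' lines."""
    pat = re.compile(r"\[(\d+) ms\] '([^']+)'.*command processed")
    return [(cmd, int(ms)) for ms, cmd in pat.findall(log_text)]

print(command_times(LOG))
# -> [('replace catalog', 573506), ('replace facts', 108114)]
```

Plotting these values over time makes the "slower and slower" trend concrete: catalog replacements here are taking 5 to 13 minutes each before the shutdown.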

 

We are not sure why PuppetDB is shutting down; the customer considers this to be a bug.

For more details on the error, please refer to the attachment from the customer.

 [^Splunk - Teapot Game Day_Customer.pdf]

Zendesk Ticket: https://puppetlabs.zendesk.com/agent/tickets/47705

This message was sent by Atlassian Jira (v8.20.2#820002-sha1:829506d)

Subramanian Kalidass (Jira)

Mar 28, 2022, 2:23:02 AM
to puppe...@googlegroups.com
Subramanian Kalidass updated an issue
Change By: Subramanian Kalidass

Subramanian Kalidass (Jira)

Mar 28, 2022, 2:24:01 AM
to puppe...@googlegroups.com
Subramanian Kalidass updated an issue

Subramanian Kalidass (Jira)

Mar 28, 2022, 2:25:01 AM
to puppe...@googlegroups.com
Subramanian Kalidass commented on Bug PDB-5469
 
Re: PuppetDB services shutting down due to query performance

This issue is a clone of PE-33705 for the sake of customer viewing and support ticket closure.

Selvakumar Azhagarsami (Jira)

Mar 28, 2022, 2:25:02 AM
to puppe...@googlegroups.com
Selvakumar Azhagarsami updated an issue
 
Change By: Selvakumar Azhagarsami
h3. Summary:

The issue reported on running the node query with 50 nodes, it got failed with the console error "Error resolving pql query"  The query used was an arbitrary selection of 50 nodes using the OR condition.

h5. Did it ever work?

It worked for 20 nodes and in the case of 50 nodes, it got failed with the PQL error.
h5. When did the issue first occur?

Not sure. The customer reported this issue as a possible bug.
h5. If applicable, did you do a run with --debug --trace?

N/A
h4. OS & Version:  Ubuntu- 18.04.6 LTS (Bionic Beaver) 
h4. PE Version on the affected machine:

2021.5
h4. Master Or Agent Affected:

Master
h4. All In One Or Split Install:

Split install
h4. Steps Taken To Reproduce:

The customer has provided the below steps to reproduce this error

1. Perform an expensive PQL query from the console. In our case, we were able to reproduce the problem with 30+ OR statements in a query:
{code:java}
inventory[certname] 
\ { certname ~ "bastion-i-0df61833aa00e6acc" or certname ~ "compile-master-i-05d0fd872d2b1964f" or certname ~ "compile-master-i-061f9a63122235787"} {code}
 

(and so on, of course, with 30+ nodes)

2. There is no step 2! 
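For context on why this class of query is expensive, the failing query can be generated mechanically. A minimal sketch (placeholder certnames, not the customer's actual nodes or script) showing how quickly such a query grows as nodes are added:

```python
# Illustrative sketch (not the customer's exact script): build the shape of
# PQL query that triggered the problem -- one regex match per certname,
# chained with "or". The certnames below are hypothetical placeholders.
certnames = [f"node-i-{i:017x}" for i in range(30)]

clauses = " or ".join(f'certname ~ "{c}"' for c in certnames)
query = f"inventory[certname] {{ {clauses} }}"

print(f"{len(certnames)} OR'd regex clauses, {len(query)} characters")
```

Each additional node adds another regex clause that PuppetDB must parse and plan, which is consistent with the "Parsing PQL took ... ms" warnings in the logs.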

When running this query, I get the console error "Error resolving pql query: Server Error". It appears that the UI times out after 60 seconds, and the user would have no idea they caused a problem.

On the backend, PuppetDB queries start getting slower and slower, and the load on the server in question climbs. In at least one test, if left unchecked, the load runs away and `apport` runs to try to gather a crash dump.

 

 
{code:java}
2022-03-10T17:26:59.813Z WARN [p.p.h.query] Parsing PQL took 3,705.52 ms: "nodes[certname] { report_timestamp > \"2022-03-01T00:00:00Z\" }"
2022-03-10T17:27:05.470Z WARN [p.p.h.query] Parsing PQL took 9,362.058 ms: "resources[certname] {\n type = 'Class' and\n title = 'Puppet_enterprise::Profile::Master' and\n nodes{ deactivated is null and expired is null }\n order by certname\n }"
{code}
 

 

h4. Files Acquired:

Support bundle from the primary master and the external PuppetDB/PostgreSQL node.
h4. Relevant Error Messages:

We can see the following error in the PuppetDB logs, which matches the time at which the customer executed the query.

 
{code:java}
// error from the puppetdb log
    at java.base/java.lang.Thread.run(Thread.java:829)
2022-03-10T17:42:47.401Z INFO  [p.p.c.services] Periodic activities halted
2022-03-10T17:42:47.402Z INFO  [c.z.h.HikariDataSource] PDBWritePool - Shutdown initiated...
2022-03-10T17:43:03.587Z ERROR [p.p.threadpool] Reporting unexpected error from thread cmd-proc-thread-392 to stderr and log
java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@7294c26[Not completed, task = java.util.concurrent.Executors$RunnableAdapter@22c31f38[Wrapped task = puppetlabs.puppetdb.command$schedule_delayed_message$fn__37749@f8d4060]] rejected from java.util.concurrent.ScheduledThreadPoolExecutor@6d53b1d1[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
    at java.base/java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2055)
    ...
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
2022-03-10T17:43:03.587Z INFO  [c.z.h.HikariDataSource] PDBWritePool - Shutdown completed.
2022-03-10T17:43:03.588Z INFO  [c.z.h.HikariDataSource] PDBReadPool - Shutdown initiated...
{code}
h4. What we have done so far:

While investigating the logs, we can see that the PuppetDB service went down. We recommended increasing the PuppetDB heap size as well, but that did not give any improvement.
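For reference, the heap increase was attempted through the PE tuning parameter for the PuppetDB service. The value below is illustrative only, not a tuned recommendation; it must be sized to the host:

```yaml
# Illustrative only -- set via Hiera or as a parameter on the
# "PE Database" / PuppetDB node group in the PE console.
puppet_enterprise::profile::puppetdb::java_args:
  Xmx: '4096m'
  Xms: '4096m'
```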

The customer is using an external PostgreSQL server, and after verifying its logs we could not find any errors on the PostgreSQL side.
h3. What You Need From Engineering:

Based on our analysis of the logs shared by the customer, all PuppetDB command processing slows down and the service eventually shuts down after the console query is run, while PostgreSQL looks fine, with no clear evidence of any issues or performance problems.
{code:java}
2022-03-10T17:37:48.318Z INFO  [p.p.command] [26328-1646933566230] [126181 ms] 'replace facts' command processed for compile-master-i-03f903ab084ac8760.cloudworks.stg.splunkcloud.systems
2022-03-10T17:37:48.320Z INFO  [p.p.command] [26329-1646933609444] [113513 ms] 'replace facts' command processed for compile-master-i-0c9498c87b4df2f7a.cloudworks.stg.splunkcloud.systems
2022-03-10T17:38:12.223Z INFO  [p.p.command] [26309-1646933223584] [573506 ms] 'replace catalog' command processed for puppet-lb-i-079cb980f5ef06cac.cloudworks.stg.splunkcloud.systems
2022-03-10T17:38:14.100Z INFO  [p.p.command] [26330-1646933624551] [111323 ms] 'replace facts' command processed for compile-master-i-069283f01ed3a5595.cloudworks.stg.splunkcloud.systems
2022-03-10T17:38:27.020Z INFO  [p.p.command] [26320-1646933440008] [321946 ms] 'replace catalog' command processed for omd-worker-i-00a382be4a2a75a68.vpc-0f6ad8da6f3397703.stg.splunkcloud.systems
2022-03-10T17:38:39.703Z INFO  [p.p.command] [26301-1646933119390] [746750 ms] 'replace catalog' command processed for compile-master-i-0ba1013ac09e65788.cloudworks.stg.splunkcloud.systems
2022-03-10T17:38:45.390Z WARN  [o.e.j.s.HttpChannel] handleException /metrics/v2/read java.io.IOException: Close SendCallback@12ffc3c7[PROCESSING][i=null,cb=org.eclipse.jetty.server.HttpChannel$SendCallback@2f57a924] in state PROCESSING
2022-03-10T17:39:18.246Z INFO  [p.p.command] [26300-1646933117390] [791028 ms] 'replace catalog' command processed for compile-master-i-015d461e6969d4a8e.cloudworks.stg.splunkcloud.systems
2022-03-10T17:39:54.135Z INFO  [p.p.command] [26332-1646933679012] [123931 ms] 'replace facts' command processed for compile-master-i-0616b3f6f5b5df3d2.cloudworks.stg.splunkcloud.systems
2022-03-10T17:40:08.861Z INFO  [p.p.command] [26334-1646933692314] [118524 ms] 'replace facts' command processed for bastion-i-082f5259bf7f0a439.cloudworks.stg.splunkcloud.systems
2022-03-10T17:40:19.807Z INFO  [p.p.command] [26333-1646933682901] [129470 ms] 'replace facts' command processed for compile-master-i-00d667ea3eddbc17b.cloudworks.stg.splunkcloud.systems
2022-03-10T17:40:28.999Z INFO  [p.p.command] [26338-1646933754806] [88645 ms] 'replace facts' command processed for compile-master-i-06195bca0100aa1e6.cloudworks.stg.splunkcloud.systems
2022-03-10T17:40:28.999Z INFO  [p.p.command] [26314-1646933333701] [594698 ms] 'replace catalog' command processed for puppet-lb-i-09e9a7a0fea610107.cloudworks.stg.splunkcloud.systems
2022-03-10T17:40:28.999Z INFO  [p.p.command] [26340-1646933773831] [29397 ms] 'store report' puppet v7.14.0 command processed for bastion-i-082f5259bf7f0a439.cloudworks.stg.splunkcloud.systems
2022-03-10T17:40:34.727Z INFO  [p.p.command] [26323-1646933464378] [301851 ms] 'replace catalog' command processed for puppet-lb-i-03a75bf33b78c50b7.cloudworks.stg.splunkcloud.systems
2022-03-10T17:40:34.727Z INFO  [p.p.command] [26337-1646933754805] [115021 ms] 'replace facts' command processed for compile-master-i-07656bab49614b96b.cloudworks.stg.splunkcloud.systems
2022-03-10T17:40:47.837Z INFO  [p.p.command] [26339-1646933762404] [73629 ms] 'replace facts' command processed for puppet-lb-i-06dad768756f17fc8.cloudworks.stg.splunkcloud.systems
2022-03-10T17:40:47.838Z INFO  [p.p.command] [26306-1646933176796] [747702 ms] 'replace catalog' command processed for compile-master-i-0078fe62a6d520804.cloudworks.stg.splunkcloud.systems
2022-03-10T17:42:06.114Z INFO  [p.p.command] [26341-1646933804421] [108114 ms] 'replace facts' command processed for puppet-lb-i-0e7bcaab7c788836b.cloudworks.stg.splunkcloud.systems
2022-03-10T17:42:08.124Z INFO  [p.p.command] [26342-1646933844687] [99122 ms] 'replace facts' command processed for prometheus-i-0e481f404d2b50fe8.cloudworks.stg.splunkcloud.systems
2022-03-10T17:42:13.949Z INFO  [p.t.internal] Shutting down due to JVM shutdown hook.
2022-03-10T17:42:13.951Z INFO  [p.t.internal] Beginning shutdown sequence
2022-03-10T17:42:23.046Z INFO  [p.p.command] [26345-1646933870203] [104595 ms] 'replace facts' command processed for puppet-lb-i-08fe9b411637fa847.cloudworks.stg.splunkcloud.systems
2022-03-10T17:42:23.046Z INFO  [p.p.command] [26344-1646933850114] [108320 ms] 'replace facts' command processed for compile-master-i-051cc78ceb9fc06e5.cloudworks.stg.splunkcloud.systems
2022-03-10T17:42:24.909Z INFO  [p.p.command] [26343-1646933844687] [110180 ms] 'replace facts' command processed for prometheus-i-04f3c34f4f6c072d7.cloudworks.stg.splunkcloud.systems
2022-03-10T17:42:30.738Z INFO  [p.p.command] [26346-1646933917829] [100994 ms] 'replace facts' command processed for prometheus-i-0c4f9c94aa7072bde.cloudworks.stg.splunkcloud.systems
2022-03-10T17:42:34.463Z INFO  [p.p.command] [26349-1646933925393] [80660 ms] 'replace facts' command processed for compile-master-i-01af1a48629ebaa2f.cloudworks.stg.splunkcloud.systems
2022-03-10T17:42:39.943Z WARN  [p.p.threadpool] Forcing threadpool shutdown after waiting 10,000ms: java.util.concurrent.ThreadPoolExecutor@2b180dfc[Shutting down, pool size = 12, active threads = 12, queued tasks = 0, completed tasks = 26341]
2022-03-10T17:42:39.943Z WARN  [p.p.threadpool] Threadpool forcibly shut down
2022-03-10T17:42:39.944Z INFO  [p.p.command] Periodic activities halted
2022-03-10T17:42:43.694Z INFO  [p.p.command] [26352-1646933975992] [18785 ms] 'store report' puppet v7.14.0 command processed for prometheus-i-04f3c34f4f6c072d7.cloudworks.stg.splunkcloud.systems
2022-03-10T17:42:43.696Z INFO  [p.p.c.services] Shutdown request received; puppetdb exiting.
2022-03-10T17:42:43.696Z ERROR [p.p.command] [26,352] [store report] Retrying after attempt 0 for prometheus-i-04f3c34f4f6c072d7.cloudworks.stg.splunkcloud.systems, due to: java.nio.channels.ClosedByInterruptException [Ljava.lang.Throwable;@1e2678b8
java.nio.channels.ClosedByInterruptException: null
{code}
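The slowdown in the excerpt above can be quantified mechanically. A small sketch (using abbreviated copies of lines from the log, with hostnames shortened) that pulls the "[N ms]" command-processing durations out of the text:

```python
import re

# Abbreviated sample of the PuppetDB log lines above (hostnames shortened).
log = """\
2022-03-10T17:37:48.318Z INFO  [p.p.command] [26328-1646933566230] [126181 ms] 'replace facts' command processed for compile-master-i-03f903ab084ac8760
2022-03-10T17:38:12.223Z INFO  [p.p.command] [26309-1646933223584] [573506 ms] 'replace catalog' command processed for puppet-lb-i-079cb980f5ef06cac
2022-03-10T17:42:43.694Z INFO  [p.p.command] [26352-1646933975992] [18785 ms] 'store report' puppet v7.14.0 command processed for prometheus-i-04f3c34f4f6c072d7
"""

# Only the "[N ms]" duration fields match; the "[id-timestamp]" fields do not.
durations = [int(m.group(1)) for m in re.finditer(r"\[(\d+) ms\]", log)]
print("slowest command:", max(durations), "ms")
```

Run over the full log, this is how the 'replace catalog' commands in the 300,000-800,000 ms range stand out.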
We are not sure why PuppetDB is shutting down; the customer considers this a bug.

 

Austin Boyd (Jira)

Mar 28, 2022, 2:30:01 AM
to puppe...@googlegroups.com
Austin Boyd updated an issue
Change By: Austin Boyd
Zendesk Ticket Count: 1
Zendesk Ticket IDs: 47705

Austin Boyd (Jira)

Mar 28, 2022, 2:30:02 AM
to puppe...@googlegroups.com