Summary: The customer reported the issue on running the node query with 50 nodes, it got failed with the console error "Error resolving pql query" The query used was an arbitrary selection of 50 nodes using the OR condition. Customer Organization Name : Splunk Release Engineering Did it ever work? As per the ticket, it worked for 20 nodes and in the case of 50 nodes, it got failed with the PQL error. When did the issue first occur? Not sure. The customer reported this issue as a possible bug. If applicable, did you do a run with --debug --trace? N/A OS & Version: Ubuntu- 18.04.6 LTS (Bionic Beaver) PE Version on affected machine: 2021.5 Master Or Agent Affected: Master All In One Or Split Install: Split install Steps Taken To Reproduce: The customer has provided the below steps to reproduce this error 1. Perform an expensive PQL query from the console. In our case, we were able to reproduce the problem with 30+ OR statements in a query: inventory[certname] { certname ~ "bastion-i-0df61833aa00e6acc" or certname ~ "compile-master-i-05d0fd872d2b1964f" or certname ~ "compile-master-i-061f9a63122235787"} (and so on, of course, with 30+ nodes) 2. There is no step 2! When running this query, I get the error in the console "Error resolving pql query: Server Error". It appears that the UI timeout occurs after 60 seconds, and the user would have no idea they caused a problem. On the backend, PuppetDB queries start getting slower and slower, and the load on the server in question climbs. In at least one test, if left unchecked, the load runs away and `apport` runs to try to gather a crash dump.
| // 2022-03-10T17:26:59.813Z WARN [p.p.h.query] Parsing PQL took 3,705.52 ms: "nodes[certname] { report_timestamp > \"2022-03-01T00:00:00Z\" }" 2022-03-10T17:27:05.470Z WARN [p.p.h.query] Parsing PQL took 9,362.058 ms: "resources[certname] {\n type = 'Class' and\n title = 'Puppet_enterprise::Profile::Master' and\n nodes |
| { deactivated is null and expired is null } \n order by certname\n }" |
Files Acquired: Support Bundle from Primary master and external puppet database Relevant Error Messages: We can see the following error in the puppetdb logs which is matching with the customer who has tried to execute the query.
| // error from the puppetdb Log |
| at java.base/java.lang.Thread.run(Thread.java:829) |
| 2022-03-10T17:42:47.401Z INFO [p.p.c.services] Periodic activities halted |
| 2022-03-10T17:42:47.402Z INFO [c.z.h.HikariDataSource] PDBWritePool - Shutdown initiated... |
| 2022-03-10T17:43:03.587Z ERROR [p.p.threadpool] Reporting unexpected error from thread cmd-proc-thread-392 to stderr and log |
| java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@7294c26[Not completed, task = java.util.concurrent.Executors$RunnableAdapter@22c31f38[Wrapped task = puppetlabs.puppetdb.command$schedule_delayed_message$fn__37749@f8d4060]] rejected from java.util.concurrent.ScheduledThreadPoolExecutor@6d53b1d1[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0] |
| at java.base/java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2055) |
| |
| ... |
| |
| at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) |
| at java.base/java.lang.Thread.run(Thread.java:829) |
| 2022-03-10T17:43:03.587Z INFO [c.z.h.HikariDataSource] PDBWritePool - Shutdown completed. |
| 2022-03-10T17:43:03.588Z INFO [c.z.h.HikariDataSource] PDBReadPool - Shutdown initiated... |
What we have done so far. While investigating the logs, we can see that the puppetDb services got went down. We recommended increasing the puppet db heap as well and doesn't give any improvement. The customer is using the external postgresSQL and after verifying the logs, we couldn't any errors in the PostgresSQL server. What You Need From Engineering: Based on our analysis from the logs shared by the customer, we checked the following PDB all processes are slower and eventually shutdown because of the console query while the PosgresSQL looks to be fine and no clear evidence of any issues or performance problems.
| // 2022-03-10T17:37:48.318Z INFO [p.p.command] [26328-1646933566230] [126181 ms] 'replace facts' command processed for compile-master-i-03f903ab084ac8760.cloudworks.stg.splunkcloud.systems |
| 2022-03-10T17:37:48.320Z INFO [p.p.command] [26329-1646933609444] [113513 ms] 'replace facts' command processed for compile-master-i-0c9498c87b4df2f7a.cloudworks.stg.splunkcloud.systems |
| 2022-03-10T17:38:12.223Z INFO [p.p.command] [26309-1646933223584] [573506 ms] 'replace catalog' command processed for puppet-lb-i-079cb980f5ef06cac.cloudworks.stg.splunkcloud.systems |
| 2022-03-10T17:38:14.100Z INFO [p.p.command] [26330-1646933624551] [111323 ms] 'replace facts' command processed for compile-master-i-069283f01ed3a5595.cloudworks.stg.splunkcloud.systems |
| 2022-03-10T17:38:27.020Z INFO [p.p.command] [26320-1646933440008] [321946 ms] 'replace catalog' command processed for omd-worker-i-00a382be4a2a75a68.vpc-0f6ad8da6f3397703.stg.splunkcloud.systems |
| 2022-03-10T17:38:39.703Z INFO [p.p.command] [26301-1646933119390] [746750 ms] 'replace catalog' command processed for compile-master-i-0ba1013ac09e65788.cloudworks.stg.splunkcloud.systems |
| 2022-03-10T17:38:45.390Z WARN [o.e.j.s.HttpChannel] handleException /metrics/v2/read java.io.IOException: Close SendCallback@12ffc3c7[PROCESSING][i=null,cb=org.eclipse.jetty.server.HttpChannel$SendCallback@2f57a924] in state PROCESSING |
| 2022-03-10T17:39:18.246Z INFO [p.p.command] [26300-1646933117390] [791028 ms] 'replace catalog' command processed for compile-master-i-015d461e6969d4a8e.cloudworks.stg.splunkcloud.systems |
| 2022-03-10T17:39:54.135Z INFO [p.p.command] [26332-1646933679012] [123931 ms] 'replace facts' command processed for compile-master-i-0616b3f6f5b5df3d2.cloudworks.stg.splunkcloud.systems |
| 2022-03-10T17:40:08.861Z INFO [p.p.command] [26334-1646933692314] [118524 ms] 'replace facts' command processed for bastion-i-082f5259bf7f0a439.cloudworks.stg.splunkcloud.systems |
| 2022-03-10T17:40:19.807Z INFO [p.p.command] [26333-1646933682901] [129470 ms] 'replace facts' command processed for compile-master-i-00d667ea3eddbc17b.cloudworks.stg.splunkcloud.systems |
| 2022-03-10T17:40:28.999Z INFO [p.p.command] [26338-1646933754806] [88645 ms] 'replace facts' command processed for compile-master-i-06195bca0100aa1e6.cloudworks.stg.splunkcloud.systems |
| 2022-03-10T17:40:28.999Z INFO [p.p.command] [26314-1646933333701] [594698 ms] 'replace catalog' command processed for puppet-lb-i-09e9a7a0fea610107.cloudworks.stg.splunkcloud.systems |
| 2022-03-10T17:40:28.999Z INFO [p.p.command] [26340-1646933773831] [29397 ms] 'store report' puppet v7.14.0 command processed for bastion-i-082f5259bf7f0a439.cloudworks.stg.splunkcloud.systems |
| 2022-03-10T17:40:34.727Z INFO [p.p.command] [26323-1646933464378] [301851 ms] 'replace catalog' command processed for puppet-lb-i-03a75bf33b78c50b7.cloudworks.stg.splunkcloud.systems |
| 2022-03-10T17:40:34.727Z INFO [p.p.command] [26337-1646933754805] [115021 ms] 'replace facts' command processed for compile-master-i-07656bab49614b96b.cloudworks.stg.splunkcloud.systems |
| 2022-03-10T17:40:47.837Z INFO [p.p.command] [26339-1646933762404] [73629 ms] 'replace facts' command processed for puppet-lb-i-06dad768756f17fc8.cloudworks.stg.splunkcloud.systems |
| 2022-03-10T17:40:47.838Z INFO [p.p.command] [26306-1646933176796] [747702 ms] 'replace catalog' command processed for compile-master-i-0078fe62a6d520804.cloudworks.stg.splunkcloud.systems |
| 2022-03-10T17:42:06.114Z INFO [p.p.command] [26341-1646933804421] [108114 ms] 'replace facts' command processed for puppet-lb-i-0e7bcaab7c788836b.cloudworks.stg.splunkcloud.systems |
| 2022-03-10T17:42:08.124Z INFO [p.p.command] [26342-1646933844687] [99122 ms] 'replace facts' command processed for prometheus-i-0e481f404d2b50fe8.cloudworks.stg.splunkcloud.systems |
| 2022-03-10T17:42:13.949Z INFO [p.t.internal] Shutting down due to JVM shutdown hook. |
| 2022-03-10T17:42:13.951Z INFO [p.t.internal] Beginning shutdown sequence |
| 2022-03-10T17:42:23.046Z INFO [p.p.command] [26345-1646933870203] [104595 ms] 'replace facts' command processed for puppet-lb-i-08fe9b411637fa847.cloudworks.stg.splunkcloud.systems |
| 2022-03-10T17:42:23.046Z INFO [p.p.command] [26344-1646933850114] [108320 ms] 'replace facts' command processed for compile-master-i-051cc78ceb9fc06e5.cloudworks.stg.splunkcloud.systems |
| 2022-03-10T17:42:24.909Z INFO [p.p.command] [26343-1646933844687] [110180 ms] 'replace facts' command processed for prometheus-i-04f3c34f4f6c072d7.cloudworks.stg.splunkcloud.systems |
| 2022-03-10T17:42:30.738Z INFO [p.p.command] [26346-1646933917829] [100994 ms] 'replace facts' command processed for prometheus-i-0c4f9c94aa7072bde.cloudworks.stg.splunkcloud.systems |
| 2022-03-10T17:42:34.463Z INFO [p.p.command] [26349-1646933925393] [80660 ms] 'replace facts' command processed for compile-master-i-01af1a48629ebaa2f.cloudworks.stg.splunkcloud.systems |
| 2022-03-10T17:42:39.943Z WARN [p.p.threadpool] Forcing threadpool shutdown after waiting 10,000ms: java.util.concurrent.ThreadPoolExecutor@2b180dfc[Shutting down, pool size = 12, active threads = 12, queued tasks = 0, completed tasks = 26341] |
| 2022-03-10T17:42:39.943Z WARN [p.p.threadpool] Threadpool forcibly shut down |
| 2022-03-10T17:42:39.944Z INFO [p.p.command] Periodic activities halted |
| 2022-03-10T17:42:43.694Z INFO [p.p.command] [26352-1646933975992] [18785 ms] 'store report' puppet v7.14.0 command processed for prometheus-i-04f3c34f4f6c072d7.cloudworks.stg.splunkcloud.systems |
| 2022-03-10T17:42:43.696Z INFO [p.p.c.services] Shutdown request received; puppetdb exiting. |
| 2022-03-10T17:42:43.696Z ERROR [p.p.command] [26,352] [store report] Retrying after attempt 0 for prometheus-i-04f3c34f4f6c072d7.cloudworks.stg.splunkcloud.systems, due to: java.nio.channels.ClosedByInterruptException [Ljava.lang.Throwable;@1e2678b8 |
| java.nio.channels.ClosedByInterruptException: null |
We are not sure why the puppetdb is going down and the customer claimed that it is considered to be a bug. For more details on the error, please refer to the attachment from the customer. [^Splunk - Teapot Game Day_Customer.pdf] Zendesk Ticket. https://puppetlabs.zendesk.com/agent/tickets/47705 |