Jira (PDB-5282) PuppetDB command data results in NullPointerException during startup

24 views
Skip to first unread message

Charlie Sharpsteen (Jira)

unread,
Sep 21, 2021, 5:54:03 PM9/21/21
to puppe...@googlegroups.com
Charlie Sharpsteen created an issue
 
PuppetDB / Bug PDB-5282
PuppetDB command data results in NullPointerException during startup
Issue Type: Bug Bug
Affects Versions: PDB 6.17.0
Assignee: Unassigned
Components: PuppetDB
Created: 2021/09/21 2:53 PM
Priority: Normal Normal
Reporter: Charlie Sharpsteen

Under certain unknown circumstances, data stored in the PuppetDB command queue can cause a java.lang.NullPointerException during startup. This exception prevents the PuppetDB service from exiting maintenance mode and responding to read and write requests.

The most relevant errors printed to the logs are:

2021-09-21T17:43:47.616Z INFO  [c.z.h.HikariDataSource] PDBWritePool - Start completed.
2021-09-21T17:43:47.623Z ERROR [p.p.c.services] Reporting unexpected error to stderr and log
java.lang.NullPointerException: null
    at clojure.core$name.invokeStatic(core.clj:1595)
    at clojure.core$name.invoke(core.clj:1589)
    at clojure.core$mapv$fn__8445.invoke(core.clj:6912)
    at clojure.lang.PersistentVector.reduce(PersistentVector.java:343)
    at clojure.core$reduce.invokeStatic(core.clj:6827)
    at clojure.core$mapv.invokeStatic(core.clj:6903)
    at clojure.core$mapv.invoke(core.clj:6903)
    at puppetlabs.puppetdb.metrics.core$keyword__GT_metric_name.invokeStatic(core.clj:26)
    at puppetlabs.puppetdb.metrics.core$keyword__GT_metric_name.invoke(core.clj:20)
    at puppetlabs.puppetdb.command$create_metrics$to_metric_name_fn__36929.invoke(command.clj:169)
    at puppetlabs.puppetdb.command$create_metrics.invokeStatic(command.clj:172)
    at puppetlabs.puppetdb.command$create_metrics.invoke(command.clj:168)
    at puppetlabs.puppetdb.command$create_metrics_for_command_BANG_$fn__36935.invoke(command.clj:230)
    at clojure.lang.Atom.swap(Atom.java:37)
    at clojure.core$swap_BANG_.invokeStatic(core.clj:2352)
    at clojure.core$swap_BANG_.invoke(core.clj:2345)
    at puppetlabs.puppetdb.command$create_metrics_for_command_BANG_.invokeStatic(command.clj:222)
    at puppetlabs.puppetdb.command$create_metrics_for_command_BANG_.invoke(command.clj:215)
    at puppetlabs.puppetdb.command$inc_cmd_depth.invokeStatic(command.clj:237)
    at puppetlabs.puppetdb.command$inc_cmd_depth.invoke(command.clj:233)
    at puppetlabs.puppetdb.queue$message_loader$fn__19252$fn__19253.invoke(queue.clj:475)
    at puppetlabs.stockpile.queue$reduce$fn__7878.invoke(queue.clj:224)
    at clojure.lang.ArrayChunk.reduce(ArrayChunk.java:58)
    at clojure.core.protocols$fn__8154.invokeStatic(protocols.clj:136)
    at clojure.core.protocols$fn__8154.invoke(protocols.clj:124)
    at clojure.core.protocols$fn__8114$G__8109__8123.invoke(protocols.clj:19)
    at clojure.core.protocols$seq_reduce.invokeStatic(protocols.clj:31)
    at clojure.core.protocols$fn__8146.invokeStatic(protocols.clj:75)
    at clojure.core.protocols$fn__8146.invoke(protocols.clj:75)
    at clojure.core.protocols$fn__8088$G__8083__8101.invoke(protocols.clj:13)
    at clojure.core$reduce.invokeStatic(core.clj:6828)
    at clojure.core$reduce.invoke(core.clj:6810)
    at puppetlabs.stockpile.queue$reduce_paths.invokeStatic(queue.clj:133)
    at puppetlabs.stockpile.queue$reduce_paths.invoke(queue.clj:130)
    at puppetlabs.stockpile.queue$reduce.invokeStatic(queue.clj:220)
    at puppetlabs.stockpile.queue$reduce.invoke(queue.clj:213)
    at puppetlabs.puppetdb.queue$message_loader$fn__19252.invoke(queue.clj:466)
    at puppetlabs.puppetdb.cli.services$init_queue$fn__41456.invoke(services.clj:798)
    at clojure.core$binding_conveyor_fn$fn__5754.invoke(core.clj:2030)
    at clojure.lang.AFn.call(AFn.java:18)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
...
2021-09-21T17:43:47.636Z ERROR [p.p.threadpool] Reporting unexpected error from thread cmd-proc-thread-1 to stderr and log
java.lang.NullPointerException: null
    at metrics.meters$mark_BANG_.invokeStatic(meters.clj:76)
    at metrics.meters$mark_BANG_.invoke(meters.clj:72)
    at metrics.meters$mark_BANG_.invokeStatic(meters.clj:74)
    at metrics.meters$mark_BANG_.invoke(meters.clj:72)
    at puppetlabs.puppetdb.command$mark_both_metrics_BANG_.invokeStatic(command.clj:244)
    at puppetlabs.puppetdb.command$mark_both_metrics_BANG_.invoke(command.clj:240)
    at puppetlabs.puppetdb.command$process_message$retry__37302.invoke(command.clj:755)
    at puppetlabs.puppetdb.command$process_message.invokeStatic(command.clj:810)
    at puppetlabs.puppetdb.command$process_message.invoke(command.clj:742)
    at puppetlabs.puppetdb.command$message_handler$fn__37314.invoke(command.clj:820)
    at puppetlabs.puppetdb.threadpool$dochan$fn__36891$fn__36892.invoke(threadpool.clj:116)
    at puppetlabs.puppetdb.threadpool$gated_execute$fn__36853.invoke(threadpool.clj:69)
    at clojure.lang.AFn.run(AFn.java:22)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)

Clearing the command queue allowed startup to proceed:

find /opt/puppetlabs/server/data/puppetdb/stockpile/cmd/q -type f -delete
systemctl restart pe-puppetdb

This resolved the issue, but removed the problematic data from further troubleshooting. From PE support script output, the only item in the queue at the time was a "remove node" command:

273855120    4 -rw-r-----   1 pe-puppetdb pe-puppetdb       68 Sep 16 08:07 /opt/puppetlabs/server/data/puppetdb/stockpile/cmd/q/2410881-1631779671729-252_rm-node_3_.json

On interesting fact about the command is that it appears to be 5 days old.

Add Comment Add Comment
 
This message was sent by Atlassian Jira (v8.13.2#813002-sha1:c495a97)
Atlassian logo

Austin Boyd (Jira)

unread,
Sep 21, 2021, 5:55:02 PM9/21/21
to puppe...@googlegroups.com
Austin Boyd updated an issue
Change By: Austin Boyd
Zendesk Ticket Count: 1
Zendesk Ticket IDs: 45879

Austin Boyd (Jira)

unread,
Sep 21, 2021, 5:55:02 PM9/21/21
to puppe...@googlegroups.com

Austin Blatt (Jira)

unread,
Sep 21, 2021, 6:24:07 PM9/21/21
to puppe...@googlegroups.com
Austin Blatt commented on Bug PDB-5282
 
Re: PuppetDB command data results in NullPointerException during startup

Reproduction case using curl

curl -X POST 'http://localhost:8080/pdb/cmd/v1?command=deactivate%20node&version=3&certname=&producer-timestamp=2019-07-16T09:17:25-0700' \
  -H 'Content-Type:application/json' \
  -d '{ "certname": null, "producer_timestamp": "2020-01-15T15:43:28-08:00" }'

errors with

2021-09-21 15:19:48,093 ERROR [cmd-proc-thread-6] [p.p.threadpool] Reporting unexpected error from thread cmd-proc-thread-6 to stderr and log
java.lang.NullPointerException: null
        at metrics.meters$mark_BANG_.invokeStatic(meters.clj:76)
        at metrics.meters$mark_BANG_.invoke(meters.clj:72)
        at metrics.meters$mark_BANG_.invokeStatic(meters.clj:74)
        at metrics.meters$mark_BANG_.invoke(meters.clj:72)
        at puppetlabs.puppetdb.command$mark_both_metrics_BANG_.invokeStatic(command.clj:244)
        at puppetlabs.puppetdb.command$mark_both_metrics_BANG_.invoke(command.clj:240)
        at puppetlabs.puppetdb.command$process_message$retry__37699.invoke(command.clj:755)
        at puppetlabs.puppetdb.command$process_message.invokeStatic(command.clj:810)
        at puppetlabs.puppetdb.command$process_message.invoke(command.clj:742)
...

Austin Blatt (Jira)

unread,
Sep 21, 2021, 6:25:04 PM9/21/21
to puppe...@googlegroups.com
Austin Blatt commented on Bug PDB-5282

Also sounds like it was reproducible in PE with

puppet node deactivate ''

Austin Blatt (Jira)

unread,
Sep 21, 2021, 6:29:03 PM9/21/21
to puppe...@googlegroups.com
Austin Blatt updated an issue
 
Change By: Austin Blatt
Acceptance Criteria: * When using the "new" command syntax, the HTTP endpoint should reject a command that has the empty string for a certname ie. &certname=& in the URI
* When using the "old" command syntax, the command should be rejected and DLOed without any retries
Summary: Using an empty string as a PuppetDB command certname data results in a NullPointerException during startup

Austin Blatt (Jira)

unread,
Sep 21, 2021, 6:29:03 PM9/21/21
to puppe...@googlegroups.com
Austin Blatt commented on Bug PDB-5282
 
Re: PuppetDB command data results in NullPointerException during startup

One more reproduction case using the old command API

curl -X POST http://localhost:8080/pdb/cmd/v1 \
  -H 'Content-Type:application/json' \
  -d '{"command": "deactivate node", "version": 3, "payload": { "certname": "", "producer_timestamp": "2020-01-15T15:43:28-08:00" } }'

Austin Blatt (Jira)

unread,
Sep 21, 2021, 6:33:02 PM9/21/21
to puppe...@googlegroups.com
Austin Blatt updated an issue
Change By: Austin Blatt
Acceptance Criteria:
* When using the "new" command syntax, the HTTP endpoint should reject a command that has the empty string for a certname ie. &certname=& in the URI
* When using the "old" command syntax, if we read load the command into memory to read the certname in order to store it in stockpile we should be rejected reject it then. Otherwise the command processor should reject it and DLOed put it in the DLO without any retries .

Austin Blatt (Jira)

unread,
Sep 21, 2021, 6:33:03 PM9/21/21
to puppe...@googlegroups.com
Austin Blatt updated an issue
Change By: Austin Blatt
Acceptance Criteria:
* When using the "new" command syntax, the HTTP endpoint should reject a command that has the empty string for a certname ie. &certname=& in the URI
* When using the "old" command syntax, if we read load the command into memory to read the certname in order to store it in stockpile we should reject it then. Otherwise the command processor should reject it and put it in the DLO without any retries.

Charlie Sharpsteen (Jira)

unread,
Sep 21, 2021, 6:39:02 PM9/21/21
to puppe...@googlegroups.com
Charlie Sharpsteen updated an issue
Change By: Charlie Sharpsteen
PuppetDB "commands" are operations to update the data associated with puppet nodes. These commands are identified by certname, and odd things can happen if an empty string or other null value is provided as the certname. One consequence is a {{java.lang.NullPointerException}} during startup. This exception prevents the PuppetDB service from exiting maintenance mode and responding to read and write requests.

h2. Reproduction Case

- Install PE 2019.8.8 on a CentOS 7 node

  - Deactivate an empty certname: {{puppet node deactivate ''}}

  - Re-start the {{pe-puppetdb}} service

h3
. Outcome

The deactivate command results in the following error message being recorded to the PuppetDB log:

{noformat}
2021-09-21T22:20:47.271Z ERROR [p.p.threadpool] Reporting unexpected error from thread cmd-proc-thread-13 to stderr and log

java.lang.NullPointerException: null
at metrics.meters$mark_BANG_.invokeStatic(meters.clj:76)
at metrics.meters$mark_BANG_.invoke(meters.clj:72)
at metrics.meters$mark_BANG_.invokeStatic(meters.clj:74)
at metrics.meters$mark_BANG_.invoke(meters.clj:72)
at puppetlabs.puppetdb.command$mark_both_metrics_BANG_.invokeStatic(command.clj:244)
at puppetlabs.puppetdb.command$mark_both_metrics_BANG_.invoke(command.clj:240)
at puppetlabs.puppetdb.command$process_message$retry__37262.invoke(command.clj:755)

at puppetlabs.puppetdb.command$process_message.invokeStatic(command.clj:810)
at puppetlabs.puppetdb.command$process_message.invoke(command.clj:742)
at puppetlabs.puppetdb.command$message_handler$fn__37274.invoke(command.clj:820)
at puppetlabs.puppetdb.threadpool$dochan$fn__36851$fn__36852.invoke(threadpool.clj:116)
at puppetlabs.puppetdb.threadpool$gated_execute$fn__36813.invoke(threadpool.clj:69)

at clojure.lang.AFn.run(AFn.java:22)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
{noformat}

This also leaves behind a command int the PuppetDB processing queue with an empty certname:

{noformat}
# find /opt/puppetlabs/server/data/puppetdb/stockpile/cmd/q -type f -print -exec cat {} \;

/opt/puppetlabs/server/data/puppetdb/stockpile/cmd/q/50-1632262847254-115_rm-node_3_.json
{"certname":"","producer_timestamp":"2021-09-21T22:20:47.139+00:00"}
{noformat}

Upon restart, the PuppetDB services fails to initialize the command processing pool and never exits maintenance mode:

{noformat}
2021-09-21T22:33:58.072Z ERROR [p.p.threadpool] Reporting unexpected error from thread cmd-proc-thread-1 to stderr and log

java.lang.NullPointerException: null
        at metrics.meters$mark_BANG_.invokeStatic(meters.clj:76)
        at metrics.meters$mark_BANG_.invoke(meters.clj:72)
        at metrics.meters$mark_BANG_.invokeStatic(meters.clj:74)
        at metrics.meters$mark_BANG_.invoke(meters.clj:72)
        at puppetlabs.puppetdb.command$mark_both_metrics_BANG_.invokeStatic(command.clj:244)
        at puppetlabs.puppetdb.command$mark_both_metrics_BANG_.invoke(command.clj:240)
        at puppetlabs.puppetdb.command$process_message$retry__37262.invoke(command.clj:755)

        at puppetlabs.puppetdb.command$process_message.invokeStatic(command.clj:810)
        at puppetlabs.puppetdb.command$process_message.invoke(command.clj:742)
        at puppetlabs.puppetdb.command$message_handler$fn__37274.invoke(command.clj:820)
        at puppetlabs.puppetdb.threadpool$dochan$fn__36851$fn__36852.invoke(threadpool.clj:116)
        at puppetlabs.puppetdb.threadpool$gated_execute$fn__36813.invoke(threadpool.clj:69)

        at clojure.lang.AFn.run(AFn.java:22)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
{noformat}

h3. Expected Outcome

An empty certname in a command submission does not block PuppetDB startup

h3. Recommended Workaround

Clear the PuppetDB command queue and re-start the service:

{code:bash}

find /opt/puppetlabs/server/data/puppetdb/stockpile/cmd/q -type f -delete
systemctl restart pe-puppetdb
{code}

Charlie Sharpsteen (Jira)

unread,
Sep 21, 2021, 6:39:02 PM9/21/21
to puppe...@googlegroups.com
Charlie Sharpsteen updated an issue
Under certain unknown circumstances, data stored in the PuppetDB command queue "commands" are operations to update the data associated with puppet nodes. These commands are identified by certname, and odd things can cause happen if an empty string or other null value is provided as the certname. One consequence is a {{java.lang.NullPointerException}} during startup. This exception prevents the PuppetDB service from exiting maintenance mode and responding to read and write requests.

h2. Reproduction Case

- Install PE 2019.8.8 on a CentOS 7 node

  - Deactivate an empty certname: {{puppet node deactivate ''}}

  - Re-start the {{pe-puppetdb}} service

h3 Outcome

The most relevant errors printed deactivate command results in the following error message being recorded to the logs are PuppetDB log :

{noformat}
2021-09-
21T17 21T22 : 43 20 :47. 616Z INFO  [c.z.h.HikariDataSource] PDBWritePool - Start completed.
2021-09-21T17:43:47.623Z
271Z ERROR [p.p. c.services threadpool ] Reporting unexpected error from thread cmd-proc-thread-13 to stderr and log
java.lang.NullPointerException: null
    at
clojure metrics . core meters $ name mark_BANG_ .invokeStatic( core meters .clj: 1595 76 )
    at
clojure metrics . core meters $ name mark_BANG_ .invoke( core meters .clj: 1589 72 )
    at
clojure.core$mapv$fn__8445.invoke(core.clj:6912)

    at clojure.lang.PersistentVector.reduce(PersistentVector.java:343)
    at clojure.core$reduce.invokeStatic(core.clj:6827)
    at clojure.core$mapv.invokeStatic(core.clj:6903)
    at clojure.core$mapv.invoke(core.clj:6903)
    at puppetlabs.puppetdb. metrics. core meters $ keyword__GT_metric_name mark_BANG_ .invokeStatic( core meters .clj: 26 74 )
    at
puppetlabs.puppetdb. metrics. core meters $ keyword__GT_metric_name mark_BANG_ .invoke( core meters .clj: 20 72 )
    at puppetlabs.puppetdb.command$
create_metrics$to_metric_name_fn__36929 mark_both_metrics_BANG_ . invoke(command.clj:169)
    at puppetlabs.puppetdb.command$create_metrics.
invokeStatic(command.clj: 172 244 )
    at puppetlabs.puppetdb.command$
create_metrics mark_both_metrics_BANG_ .invoke(command.clj: 168 240 )
    at puppetlabs.puppetdb.command$
create_metrics_for_command_BANG_ process_message $ fn__36935 retry__37262 .invoke(command.clj: 230 755 )
    at
clojure.lang.Atom.swap(Atom.java:37)

    at clojure.core$swap_BANG_.invokeStatic(core.clj:2352)
    at clojure.core$swap_BANG_.invoke(core.clj:2345)
    at puppetlabs.puppetdb.command$ create_metrics_for_command_BANG_ process_message .invokeStatic(command.clj: 222 810 )
    at puppetlabs.puppetdb.command$
create_metrics_for_command_BANG_ process_message .invoke(command.clj: 215 742 )
    at puppetlabs.puppetdb.command$
inc_cmd_depth.invokeStatic(command.clj:237)
    at puppetlabs.puppetdb.command
message_handler $ inc_cmd_depth fn__37274 .invoke(command.clj: 233 820 )
    at puppetlabs.puppetdb.
queue threadpool $ message_loader dochan $ fn__19252 fn__36851 $ fn__19253 fn__36852 .invoke( queue threadpool .clj: 475 116 )
    at puppetlabs.
stockpile.queue$reduce$fn__7878.invoke(queue.clj:224)

    at clojure.lang.ArrayChunk.reduce(ArrayChunk.java:58)
    at clojure.core.protocols$fn__8154.invokeStatic(protocols.clj:136)
    at clojure.core.protocols$fn__8154.invoke(protocols.clj:124)
    at clojure.core.protocols$fn__8114$G__8109__8123.invoke(protocols.clj:19)
    at clojure.core.protocols$seq_reduce.invokeStatic(protocols.clj:31)
    at clojure.core.protocols$fn__8146.invokeStatic(protocols.clj:75)
    at clojure.core.protocols$fn__8146.invoke(protocols.clj:75)
    at clojure.core.protocols$fn__8088$G__8083__8101.invoke(protocols.clj:13)
    at clojure.core$reduce.invokeStatic(core.clj:6828)
    at clojure.core$reduce.invoke(core.clj:6810)
    at puppetlabs.stockpile.queue$reduce_paths.invokeStatic(queue.clj:133)
    at puppetlabs.stockpile.queue$reduce_paths.invoke(queue.clj:130)
    at puppetlabs.stockpile.queue$reduce.invokeStatic(queue.clj:220)
    at puppetlabs.stockpile.queue$reduce.invoke(queue.clj:213)
    at puppetlabs. puppetdb. queue threadpool $ message_loader gated_execute $ fn__19252 fn__36813 .invoke( queue threadpool .clj: 466 69 )
    at
puppetlabs.puppetdb.cli.services$init_queue$fn__41456.invoke(services.clj:798)
    at
clojure. core$binding_conveyor_fn$fn__5754.invoke(core.clj:2030)
    at clojure.
lang.AFn. call run (AFn.java: 18 22 )
    at java.base/java.util.concurrent.
FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.
ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)

    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
{noformat}

This also leaves behind a command int the PuppetDB processing queue with an empty certname:

{noformat}
# find /opt/puppetlabs/server/data/puppetdb/stockpile/cmd/q -type f -print -exec cat {} \;

/opt/puppetlabs/server/data/puppetdb/stockpile/cmd/q/50-1632262847254-115_rm-node_3_ . .. json
{"certname":"","producer_timestamp":" 2021-09- 21T17 21T22 : 43 20 :47. 636Z 139+00:00"}

{noformat}

Upon restart, the PuppetDB services fails to initialize the command processing pool and never exits maintenance mode:

{noformat}
2021-09-21T22:33:58.072Z
ERROR [p.p.threadpool] Reporting unexpected error from thread cmd-proc-thread-1 to stderr and log

java.lang.NullPointerException: null
    at metrics.meters$mark_BANG_.invokeStatic(meters.clj:76)
    at metrics.meters$mark_BANG_.invoke(meters.clj:72)
    at metrics.meters$mark_BANG_.invokeStatic(meters.clj:74)
    at metrics.meters$mark_BANG_.invoke(meters.clj:72)
    at puppetlabs.puppetdb.command$mark_both_metrics_BANG_.invokeStatic(command.clj:244)
    at puppetlabs.puppetdb.command$mark_both_metrics_BANG_.invoke(command.clj:240)
    at puppetlabs.puppetdb.command$process_message$ retry__37302 retry__37262 .invoke(command.clj:755)

    at puppetlabs.puppetdb.command$process_message.invokeStatic(command.clj:810)
    at puppetlabs.puppetdb.command$process_message.invoke(command.clj:742)
    at puppetlabs.puppetdb.command$message_handler$ fn__37314 fn__37274 .invoke(command.clj:820)
    at puppetlabs.puppetdb.threadpool$dochan$
fn__36891 fn__36851 $ fn__36892 fn__36852 .invoke(threadpool.clj:116)
    at puppetlabs.puppetdb.threadpool$gated_execute$
fn__36853 fn__36813 .invoke(threadpool.clj:69)

    at clojure.lang.AFn.run(AFn.java:22)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
{noformat}

Clearing h3. Expected Outcome


An empty certname in a command submission does not block PuppetDB startup

h3. Recommended Workaround

Clear the PuppetDB command queue allowed startup to proceed and re-start the service :


{code:bash}
find /opt/puppetlabs/server/data/puppetdb/stockpile/cmd/q -type f -delete
systemctl restart pe-puppetdb
{code}

This resolved the issue, but removed the problematic data from further troubleshooting. From PE support script output, the only item in the queue at the time was a "remove node" command:

{code:bash}

273855120    4 -rw-r-----   1 pe-puppetdb pe-puppetdb       68 Sep 16 08:07 /opt/puppetlabs/server/data/puppetdb/stockpile/cmd/q/2410881-1631779671729-252_rm-node_3_.json
{code}


On interesting fact about the command is that it appears to be 5 days old.

Austin Blatt (Jira)

unread,
Sep 21, 2021, 7:02:02 PM9/21/21
to puppe...@googlegroups.com
Austin Blatt commented on Bug PDB-5282
 
Re: PuppetDB command data results in NullPointerException during startup

When this command is submitted to a PuppetDB set up to sync, startup will hang before the sync phase.

I wonder if it is attempting to process the command in memory but failing to handle the error thrown or DLO/retry logic correctly because it is not using a command processing thread?

Charlie Sharpsteen (Jira)

unread,
Sep 21, 2021, 7:25:03 PM9/21/21
to puppe...@googlegroups.com
Charlie Sharpsteen updated an issue
Change By: Charlie Sharpsteen
PuppetDB "commands" are operations to update the data associated with puppet nodes. These commands are identified by certname, and odd things can happen if an empty string or other null value is provided as the certname. One consequence is a {{java.lang.NullPointerException}} during startup. This   

When PE DR is enabled, this
exception prevents the PuppetDB service from exiting maintenance mode and responding to read and write requests.

h2. Reproduction Case

- Install PE 2019.8.8 on a CentOS 7 node
and add a DR replica

  - Deactivate an empty certname: {{puppet node deactivate ''}}

  - Re-start the {{pe-puppetdb}} service

h3. Outcome

The deactivate command results in the following error message being recorded to the PuppetDB log:

{noformat}
2021-09-
21T22 21T23 : 20 21 : 47 07 . 271Z 566Z ERROR [p.p.threadpool] Reporting unexpected error from thread cmd-proc-thread- 13 2 to stderr and log

java.lang.NullPointerException: null
at metrics.meters$mark_BANG_.invokeStatic(meters.clj:76)
at metrics.meters$mark_BANG_.invoke(meters.clj:72)
at metrics.meters$mark_BANG_.invokeStatic(meters.clj:74)
at metrics.meters$mark_BANG_.invoke(meters.clj:72)
at puppetlabs.puppetdb.command$mark_both_metrics_BANG_.invokeStatic(command.clj:244)
at puppetlabs.puppetdb.command$mark_both_metrics_BANG_.invoke(command.clj:240)
at puppetlabs.puppetdb.command$process_message$retry__37262.invoke(command.clj:755)

at puppetlabs.puppetdb.command$process_message.invokeStatic(command.clj:810)
at puppetlabs.puppetdb.command$process_message.invoke(command.clj:742)
at puppetlabs.puppetdb.command$message_handler$fn__37274.invoke(command.clj:820)
at puppetlabs.puppetdb.threadpool$dochan$fn__36851$fn__36852.invoke(threadpool.clj:116)
at puppetlabs.puppetdb.threadpool$gated_execute$fn__36813.invoke(threadpool.clj:69)

at clojure.lang.AFn.run(AFn.java:22)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
{noformat}

This also leaves behind a command int the PuppetDB processing queue with an empty certname:

{noformat}
# find /opt/puppetlabs/server/data/puppetdb/stockpile/cmd/q -type f -print -exec cat {} \;

/opt/puppetlabs/server/data/puppetdb/stockpile/cmd/q/ 50 20 - 1632262847254 1632266467556 - 115_rm 103_rm -node_3_.json
{"certname":"","producer_timestamp":"2021-09-
21T22 21T23 : 20 21 : 47 07 . 139 453 +00:00"}

{noformat}

Upon restart, the PuppetDB services fails to initialize the command processing pool and never exits maintenance mode:

{noformat}
2021-09- 21T22 21T23 : 33 22 : 58 20 . 072Z 499Z ERROR [p.p.threadpool] Reporting unexpected error from thread cmd-proc-thread-1 to stderr and log

java.lang.NullPointerException: null
        at metrics.meters$mark_BANG_.invokeStatic(meters.clj:76)
        at metrics.meters$mark_BANG_.invoke(meters.clj:72)
        at metrics.meters$mark_BANG_.invokeStatic(meters.clj:74)
        at metrics.meters$mark_BANG_.invoke(meters.clj:72)
        at puppetlabs.puppetdb.command$mark_both_metrics_BANG_.invokeStatic(command.clj:244)
        at puppetlabs.puppetdb.command$mark_both_metrics_BANG_.invoke(command.clj:240)
        at puppetlabs.puppetdb.command$process_message$retry__37262.invoke(command.clj:755)

        at puppetlabs.puppetdb.command$process_message.invokeStatic(command.clj:810)
        at puppetlabs.puppetdb.command$process_message.invoke(command.clj:742)
        at puppetlabs.puppetdb.command$message_handler$fn__37274.invoke(command.clj:820)
        at puppetlabs.puppetdb.threadpool$dochan$fn__36851$fn__36852.invoke(threadpool.clj:116)
        at puppetlabs.puppetdb.threadpool$gated_execute$fn__36813.invoke(threadpool.clj:69)

        at clojure.lang.AFn.run(AFn.java:22)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
2021-09-21T23:22:20.715Z INFO  [p.p.c.services] Finished sweep of stale nodes (threshold: 7 days)
2021-09-21T23:22:20.717Z INFO  [p.p.c.services] Starting purge deactivated and expired nodes (threshold: 14 days)
2021-09-21T23:22:20.760Z INFO  [p.p.c.services] Finished purge deactivated and expired nodes (threshold: 14 days)
2021-09-21T23:22:20.761Z INFO  [p.p.c.services] Starting sweep of stale reports (threshold: 14 days) and resource events (threshold: 14 days)
2021-09-21T23:22:20.849Z INFO  [p.p.c.services] Finished sweep of stale reports (threshold: 14 days) and resource events (threshold: 14 days)
2021-09-21T23:22:20.849Z INFO  [p.p.c.services] Starting gc packages
2021-09-21T23:22:20.852Z INFO  [p.p.c.services] Finished gc packages
2021-09-21T23:22:20.853Z INFO  [p.p.c.services] Starting database garbage collection
2021-09-21T23:22:20.909Z INFO  [p.p.s.storage] Starting sweep of unused fact paths
2021-09-21T23:22:20.932Z INFO  [p.p.s.storage] Finished sweep of unused fact paths
2021-09-21T23:22:20.940Z INFO  [p.p.c.services] Finished database garbage collection
{noformat}

There should be a "PuppetDB finished starting, disabling maintenance mode" message in the above.

h3. Expected Outcome

An empty certname in a command submission does not block PuppetDB startup

h3. Recommended Workaround

Clear the PuppetDB command queue and re-start the service:


{code:bash}
find /opt/puppetlabs/server/data/puppetdb/stockpile/cmd/q -type f -delete
systemctl restart pe-puppetdb
{code}

Charlie Sharpsteen (Jira)

unread,
Sep 21, 2021, 7:30:03 PM9/21/21
to puppe...@googlegroups.com
Reply all
Reply to author
Forward
0 new messages