Jira (PDB-2535) puppetdb stuck in maintenance mode on startup


Daniel Urist (JIRA)

Mar 15, 2016, 5:35:08 PM
to puppe...@googlegroups.com
Daniel Urist created an issue
 
PuppetDB / Bug PDB-2535
puppetdb stuck in maintenance mode on startup
Issue Type: Bug
Affects Versions: PDB 3.2.4
Assignee: Unassigned
Created: 2016/03/15 2:34 PM
Environment:

Debian 8 (Jessie)
puppetdb 3.2.4-1puppetlabs1

Priority: Normal
Reporter: Daniel Urist

puppetdb was running with no issues for weeks, then hung and now logs the following on startup:

2016-03-15 15:22:15,183 INFO  [o.e.j.s.Server] jetty-9.2.z-SNAPSHOT
2016-03-15 15:22:15,257 INFO  [o.e.j.s.h.ContextHandler] Started o.e.j.s.h.ContextHandler@31cc7610{/pdb,null,AVAILABLE}
2016-03-15 15:22:15,283 INFO  [o.e.j.s.ServerConnector] Started ServerConnector@456277f5{HTTP/1.1}{localhost:8080}
2016-03-15 15:22:15,382 INFO  [o.e.j.s.ServerConnector] Started ServerConnector@6ef9942d{SSL-HTTP/1.1}{dev-puppetdb.ucar.edu:8081}
2016-03-15 15:22:15,383 INFO  [o.e.j.s.Server] Started @30300ms
2016-03-15 15:22:15,428 INFO  [p.p.c.services] PuppetDB version 3.2.4
2016-03-15 15:22:15,589 INFO  [p.p.s.migrate] There are no pending migrations
2016-03-15 15:22:15,612 INFO  [c.j.b.BoneCP] Shutting down connection pool...
2016-03-15 15:22:15,616 INFO  [c.j.b.BoneCP] Connection pool has been shutdown.
2016-03-15 15:22:15,619 INFO  [p.p.c.services] Starting broker
 
2016-03-15 15:23:35,088 INFO  [p.p.pdb-routing] HTTP request received while in maintenance mode
2016-03-15 15:23:37,775 INFO  [p.p.pdb-routing] HTTP request received while in maintenance mode
2016-03-15 15:24:01,838 INFO  [p.p.pdb-routing] HTTP request received while in maintenance mode
2016-03-15 15:24:01,848 WARN  [o.e.j.h.HttpParser] badMessage: java.lang.IllegalStateException: too much data after closed for HttpChannelOverHttp@7ed11886{r=1,c=false,a=IDLE,uri=-}
2016-03-15 15:24:04,072 INFO  [p.p.pdb-routing] HTTP request received while in maintenance mode
2016-03-15 15:24:29,776 INFO  [p.p.pdb-routing] HTTP request received while in maintenance mode
2016-03-15 15:24:32,812 INFO  [p.p.pdb-routing] HTTP request received while in maintenance mode

Puppetdb appears to be permanently stuck in maintenance mode.

I've tried bumping up the memory and restarting, going as high as "-Xmx2048m -Xms1024m", but that doesn't make any difference. Since I'm using virtual and exported resources, this breaks my whole infrastructure.
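One quick way to confirm the symptom described above is to count how often the maintenance-mode message shows up in the log. The following is a minimal sketch, not part of the original report; the function name is hypothetical, and the log path in the usage comment is an assumption about the packaged default location.

```shell
# Count "maintenance mode" request lines in a PuppetDB log file.
# Usage (log path is an assumption; adjust to your install):
#   count_maintenance_hits /var/log/puppetlabs/puppetdb/puppetdb.log
count_maintenance_hits() {
    logfile="$1"
    # grep -c prints the number of matching lines. It exits non-zero when
    # nothing matches, so "|| true" keeps the function from failing then.
    grep -c 'HTTP request received while in maintenance mode' "$logfile" || true
}
```

A count that keeps growing long after the "Starting broker" line suggests the service never left maintenance mode, as in the log excerpt above.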

This message was sent by Atlassian JIRA (v6.4.12#64027-sha1:e3691cc)

Russell Mull (JIRA)

Mar 15, 2016, 6:01:03 PM
to puppe...@googlegroups.com
Russell Mull assigned an issue to Russell Mull
Change By: Russell Mull
Assignee: Russell Mull

Russell Mull (JIRA)

Mar 15, 2016, 6:05:03 PM
to puppe...@googlegroups.com
Russell Mull commented on Bug PDB-2535
 
Re: puppetdb stuck in maintenance mode on startup

Daniel Urist Wow, this is quite bad. It's not really apparent from these logs what's going on, but I think we can find out. Can you do the following:
1. Capture stack traces of the hung puppetdb by running jstack <pid> against the hung process
2. Edit puppetdb's logback.xml to change the root log level from "info" to "debug". Then restart puppetdb to get some more useful logs (they will be quite large)

This information should help us figure out what's going on here.
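The stack-trace step above can be sketched as a small helper that takes several dumps a few seconds apart, since comparing consecutive dumps makes a stuck thread easier to spot. This is an illustrative sketch only: the function name, the JSTACK_BIN variable, and the output file naming are hypothetical, and jstack must come from a JDK installed separately.

```shell
# JSTACK_BIN can be overridden to point at a specific JDK's jstack.
JSTACK_BIN="${JSTACK_BIN:-jstack}"

# Take COUNT thread dumps of PID, INTERVAL seconds apart, into OUTDIR.
# Usage: capture_thread_dumps <pid> <outdir> [count] [interval]
capture_thread_dumps() {
    pid="$1"; outdir="$2"; count="${3:-3}"; interval="${4:-5}"
    mkdir -p "$outdir"
    i=1
    while [ "$i" -le "$count" ]; do
        # Capture stderr too, so attach failures end up in the file.
        "$JSTACK_BIN" "$pid" > "$outdir/jstack-$pid-$i.txt" 2>&1
        i=$((i + 1))
        # Pause between dumps, but not after the last one.
        if [ "$i" -le "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, `capture_thread_dumps 9941 /tmp/pdb-dumps` would write three dumps for the process ID shown later in this thread. jstack generally has to run as the same user as the target JVM; the service account name depends on the package, so check which user owns the process before running it.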

Additionally, does this event correlate in any way with a recent software upgrade or any maintenance activity?

Wyatt Alt (JIRA)

Mar 15, 2016, 6:20:02 PM
to puppe...@googlegroups.com
Wyatt Alt commented on Bug PDB-2535

Daniel Urist I'm interested in the following in addition to what Russ mentioned:

  • how many nodes do you have?
  • how long have you observed this hanging for?
  • what's the output of

    du /opt/puppetlabs/server/data/puppetdb/mq
    

  • can we see your complete puppetdb.log (after that debug logging is turned on)
  • are there any messages in /opt/puppetlabs/server/data/puppetdb/mq/discarded
  • if you move /opt/puppetlabs/server/data/puppetdb/mq somewhere else and restart PuppetDB, does the issue go away?

If the issue goes away after moving the mq directory, we'd like to take a look at the content of that directory if possible.
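The last two checks in the list above (looking for discarded messages and moving the mq directory aside) could be scripted roughly as follows. This is a sketch under assumptions: the function names are hypothetical, the backup naming scheme is arbitrary, and stopping and starting the service is left to whatever init tooling the host uses.

```shell
# Count files under <dir>/discarded; prints 0 if that directory is absent.
count_discarded() {
    d="${1%/}/discarded"
    if [ -d "$d" ]; then
        find "$d" -type f | wc -l | tr -d ' '
    else
        echo 0
    fi
}

# Rename a queue directory to <dir>.BAK.<timestamp> and print the new path.
# Stop PuppetDB before calling this and start it again afterwards.
move_queue_aside() {
    dir="${1%/}"
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir" >&2
        return 1
    fi
    backup="$dir.BAK.$(date +%Y%m%d%H%M%S)"
    mv "$dir" "$backup" && echo "$backup"
}

# Example sequence (service commands are an assumption about the host):
#   service puppetdb stop
#   move_queue_aside /opt/puppetlabs/server/data/puppetdb/mq
#   service puppetdb start
```

Renaming rather than deleting keeps the old queue contents available for later inspection, which is what the request above asks for.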

Rob Browning (JIRA)

Mar 15, 2016, 6:21:04 PM
to puppe...@googlegroups.com
Rob Browning commented on Bug PDB-2535

If possible it would be interesting to see what PuppetDB and/or PostgreSQL are up to, even if only generally, i.e. via something like "top", "atop", "iostat -mx 5", "iotop", "jnettop", etc. For example, are they particularly busy with respect to CPU, network, or storage?

And it may not be relevant, but if it's easy, it'd also be interesting to see the queue dir size, perhaps via both "du -sh queue-dir" and "du -sh --apparent-size queue-dir".
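The two du invocations mentioned above can be wrapped so both figures are printed side by side. A minimal sketch, assuming GNU du (the --apparent-size option is a GNU extension); the function name and output labels are hypothetical.

```shell
# Print on-disk and apparent size of a directory, one per line.
# Usage: queue_dir_usage /opt/puppetlabs/server/data/puppetdb/mq
queue_dir_usage() {
    dir="$1"
    # du separates size and path with a tab; cut -f1 keeps only the size.
    printf 'on-disk:  %s\n' "$(du -sh "$dir" | cut -f1)"
    printf 'apparent: %s\n' "$(du -sh --apparent-size "$dir" | cut -f1)"
}
```

The two numbers can differ noticeably when files are sparse or preallocated, which is presumably why both were requested.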

Daniel Urist (JIRA)

Mar 15, 2016, 6:23:03 PM
to puppe...@googlegroups.com
Daniel Urist updated an issue
 

I've attached a gzipped puppetdb.log.

jstack doesn't seem to be in the debian puppetlabs package; I've tried the
version from debian's openjdk-7-jdk package, but I'm not sure this shows
anything useful? Here's the output:

root@dev-puppetdb:/# jstack -F 9941

On Tue, Mar 15, 2016 at 4:05 PM, Russell Mull (JIRA) <
issue-update...@puppetlabs.com> wrote:

Change By: Daniel Urist
Attachment: puppetdb.log.gz

Daniel Urist (JIRA)

Mar 15, 2016, 6:35:04 PM
to puppe...@googlegroups.com
Daniel Urist updated an issue

I have 23 nodes in this environment.

It just started hanging today.

root@dev-puppetdb:/# du /opt/puppetlabs/server/data/puppetdb/mq

There's nothing in /opt/puppetlabs/server/data/puppetdb/mq/discarded; I
don't even have that directory; see above.

Moving the mq directory out of the way worked!

I've attached a gzipped tar archive of the contents.

On Tue, Mar 15, 2016 at 4:20 PM, Wyatt Alt (JIRA) <
issue-updat...@puppetlabs.com> wrote:

Change By: Daniel Urist
Attachment: mq.BAK.tgz

Russell Mull (JIRA)

Mar 15, 2016, 6:35:24 PM
to puppe...@googlegroups.com
Russell Mull commented on Bug PDB-2535
 
Re: puppetdb stuck in maintenance mode on startup

The jdk's jstack is indeed the one you want. You shouldn't need the -F though; what happens if you don't use that?

Russell Mull (JIRA)

Mar 15, 2016, 6:36:03 PM
to puppe...@googlegroups.com
Russell Mull commented on Bug PDB-2535

Ah, I didn't see your most recent update. We'll look at the mq dir.

Daniel Urist (JIRA)

Mar 15, 2016, 6:38:03 PM
to puppe...@googlegroups.com
Daniel Urist commented on Bug PDB-2535

On Tue, Mar 15, 2016 at 4:35 PM, Russell Mull (JIRA) <
issue-update...@puppetlabs.com> wrote:


Wyatt Alt (JIRA)

Mar 15, 2016, 6:53:03 PM
to puppe...@googlegroups.com
Wyatt Alt commented on Bug PDB-2535

Daniel Urist Thanks for that info, and good to hear it's working at least. I can start up PDB with the message queue you provided with no issue, so we still don't know what the problem was. Nothing about the file content looks out of place either, though it can be pretty hard to tell.

Any chance you still have log files from the time when you first observed the hang? Also, now that PDB is running would you mind giving us a screenshot of the dashboard running on localhost:8080 on the PDB host? You should be able to get at it with an ssh tunnel if there's no browser on the PDB host:

ssh -NL 8080:localhost:8080 your.puppetdb.host

and then look at localhost:8080 in your browser.

Daniel Urist (JIRA)

Mar 17, 2016, 5:59:04 PM
to puppe...@googlegroups.com
Daniel Urist updated an issue

Unfortunately I don't have the original log, but since logging was set to
the default level of notice I'm not sure it would help; I don't recall
seeing anything that stood out to me.

I've attached a screenshot of the current puppetdb dashboard.

On Tue, Mar 15, 2016 at 4:53 PM, Wyatt Alt (JIRA) <

Change By: Daniel Urist
Attachment: pdb_dashboard.png

Wyatt Alt (JIRA)

Mar 17, 2016, 6:35:04 PM
to puppe...@googlegroups.com
Wyatt Alt commented on Bug PDB-2535
 
Re: puppetdb stuck in maintenance mode on startup

Daniel Urist Thanks for that. Nothing in that dashboard appears out of the ordinary, so unfortunately I'm at a loss. I'm going to close this ticket as "can't reproduce" for now, but please jump back in if you hit a recurrence.

Daniel Urist (JIRA)

Apr 11, 2016, 11:27:04 AM
to puppe...@googlegroups.com
Daniel Urist commented on Bug PDB-2535

I'm seeing this issue again following a restart of puppetdb, with exactly the same symptoms. Once again, moving the mq directory out of the way cleared it up.

The only odd thing I'm seeing during the startup is the following warning in the puppetdb log:

2016-04-11 09:21:20,141 WARN  [o.a.a.b.BrokerService] Store limit is 102400 mb (current store usage is 0 mb). The data directory: /opt/puppetlabs/server/data/puppetdb/mq/localhost/KahaDB only has 18610 mb of usable space. - resetting to maximum available disk space: 18610 mb
2016-04-11 09:21:20,144 WARN  [o.a.a.b.BrokerService] Temporary Store limit is 51200 mb (current store usage is 0 mb). The data directory: /opt/puppetlabs/server/data/puppetdb/mq/localhost only has 18610 mb of usable space. - resetting to maximum available disk space: 18610 mb
2016-04-11 09:21:20,145 WARN  [o.a.a.b.BrokerService] Job Scheduler Store limit is 51200 mb, whilst the data directory: /opt/puppetlabs/server/data/puppetdb/mq/localhost/scheduler only has 18610 mb of usable space - resetting to 18610 mb.

Surely 18GB should be enough space?


Wyatt Alt (JIRA)

Apr 11, 2016, 4:20:04 PM
to puppe...@googlegroups.com
Wyatt Alt commented on Bug PDB-2535

Daniel Urist that warning is a red herring – it's harmless. You can make it go away by adjusting the store-usage and temp-usage parameters to match what's available, but these parameters will already readjust themselves so there's no danger in leaving it.

https://docs.puppet.com/puppetdb/4.0/configure.html#store-usage

18GB is certainly enough space.
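For anyone who does want to silence the warning, the free space on the queue's filesystem can be measured and turned into candidate values for the two parameters named above. This is a sketch only: the function names and the 50 percent default are hypothetical choices, and the assumption that both values are given in megabytes (and which config section they belong in) should be checked against the linked documentation.

```shell
# Print free space, in MB, on the filesystem that holds DIR.
available_mb() {
    # df -P prints one stable line per filesystem; -k reports 1024-byte
    # blocks, and column 4 is the available space.
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

# Print candidate settings sized to PCT percent of the free space.
# Usage: suggest_usage_settings <dir> [pct]
suggest_usage_settings() {
    dir="$1"; pct="${2:-50}"
    free=$(available_mb "$dir")
    limit=$((free * pct / 100))
    echo "store-usage = $limit"
    echo "temp-usage = $limit"
}
```

Running `suggest_usage_settings /opt/puppetlabs/server/data/puppetdb` prints two lines that could be adapted into the PuppetDB configuration. As noted above, leaving the defaults in place is also harmless.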

As for the issue at hand, I'll reopen the ticket. The next time you hit the issue, would you mind restarting PuppetDB with debug logging enabled and showing us those logs, and also collecting the output of

Thanks for bearing with us.

Daniel Urist (JIRA)

Apr 11, 2016, 5:05:04 PM
to puppe...@googlegroups.com
Daniel Urist commented on Bug PDB-2535

Will do, thanks.

On Mon, Apr 11, 2016 at 2:20 PM, Wyatt Alt (JIRA) <

Susan McNerney (JIRA)

Aug 17, 2016, 2:26:03 PM
to puppe...@googlegroups.com
Susan McNerney assigned an issue to Unassigned
 
Change By: Susan McNerney
Assignee: Russell Mull