The PuppetDB service doesn't shut down after a failed migration, despite log messages indicating that the shutdown cycle has been started. This causes the init service to wait for the full START_TIMEOUT before declaring startup as failed and killing the process. Given we've bumped START_TIMEOUT to 4 hours to allow for migration of large databases, this significantly delays detection of failed upgrades due to migration issues.
-
-
Reproduction Case
Install PuppetDB 5.0.1 on a CentOS 7 VM using the puppetlabs/puppetdb module:
-
Add package repos for version 5 of the Puppet Platform and install the puppet agent:
rpm -Uvh http://yum.puppetlabs.com/puppet5/puppet-release-el-7.noarch.rpm
|
yum install -y puppet agent
|
-
Instal the PuppetDB module
/opt/puppetlabs/bin/puppet module install puppetlabs-puppetdb --version 6.0.1
|
-
Apply the following manifest to install Postgres and PuppetDB. Two applications may be necessary due to a race condition where Puppet resources attempt to enforce state on the database while Postgres is still starting up.
/opt/puppetlabs/puppet/bin/puppet apply << EOF
|
class {'puppetdb::globals': version => '5.0.1'}
|
class {'puppetdb': disable_ssl => true}
|
EOF
|
Add some test data to PuppetDB that will trigger the migration failure described in PDB-3692:
/opt/puppetlabs/puppet/bin/curl -X POST \
|
-H 'Content-Type:application/json' \
|
-H 'Accept:application/json' \
|
-d '{"certname":"test1","environment":"DEV","values":{"myfact": null},"producer_timestamp":"2015-01-01", "producer":"master1"}' \
|
"http://localhost:8080/pdb/cmd/v1?command=replace_facts&version=5&certname=test1"
|
/opt/puppetlabs/puppet/bin/curl -X POST \
|
-H 'Content-Type:application/json' \
|
-H 'Accept:application/json' \
|
-d '{"certname":"test2","environment":"DEV","values":{"myfact": null},"producer_timestamp":"2015-01-01", "producer":"master1"}' \
|
"http://localhost:8080/pdb/cmd/v1?command=replace_facts&version=5&certname=test2"
|
|
# Alter the facts table to trigger PDB-3692 on upgrade
|
sudo -u postgres /usr/bin/psql -d puppetdb -c "UPDATE facts SET value='null' WHERE factset_id=2 AND value_type_id=4"
|
Decrease the timeout for PuppetDB startup and then upgrade PuppetDB to 5.1.1:
echo 'START_TIMEOUT=60' >> /etc/sysconfig/puppetdb
|
|
/opt/puppetlabs/bin/puppet resource --evaltrace package puppetdb ensure=5.1.1
|
-
-
-
Outcome
/var/log/puppetlabs/puppetdb/puppetdb.log shows the migration fails and the service enters shutdown within seconds after starting:
2017-10-16 20:03:52,350 INFO [p.p.c.services] PuppetDB version 5.1.1
|
2017-10-16 20:03:52,351 INFO [c.z.h.HikariDataSource] PDBMigrationsPool - Starting...
|
2017-10-16 20:03:52,353 INFO [c.z.h.HikariDataSource] PDBMigrationsPool - Start completed.
|
2017-10-16 20:03:52,517 INFO [p.p.s.migrate] Applying database migration version 64
|
2017-10-16 20:03:52,518 INFO [p.p.s.migrate] [1/8] Cleaning up unreferenced facts...
|
2017-10-16 20:03:52,527 INFO [p.p.s.migrate] [2/8] Creating new fact storage tables...
|
2017-10-16 20:03:52,543 INFO [p.p.s.migrate] [3/8] Copying unique fact values into fact_values
|
2017-10-16 20:03:52,555 INFO [p.p.s.migrate] [4/8] Reconstructing facts to refer to fact_values...
|
2017-10-16 20:03:52,557 INFO [p.p.s.migrate] [5/8] Cleaning up duplicate null values...
|
2017-10-16 20:03:52,558 INFO [p.p.s.migrate] ... none found
|
2017-10-16 20:03:52,559 INFO [p.p.s.migrate] [6/8] Computing fact value hashes...
|
2017-10-16 20:03:52,611 INFO [p.p.s.migrate] [7/8] Indexing fact_values table...
|
2017-10-16 20:03:52,640 ERROR [p.p.s.migrate] Caught SQLException during migration
|
java.sql.BatchUpdateException: Batch entry 3 ALTER TABLE fact_values ADD CONSTRAINT fact_values_value_hash_key UNIQUE (value_hash) was aborted. Call getNextException to see the cause.
|
...
|
2017-10-16 20:03:52,641 ERROR [p.p.s.migrate] Unravelled exception
|
org.postgresql.util.PSQLException: ERROR: could not create unique index "fact_values_value_hash_key"
|
Detail: Key (value_hash)=(\x2be88ca4242c76e8253ac62474851065032d6833) is duplicated.
|
...
|
2017-10-16 20:03:52,668 INFO [p.t.internal] Shutting down due to JVM shutdown hook.
|
2017-10-16 20:03:52,669 INFO [p.t.internal] Beginning shutdown sequence
|
However, the upgrade takes over a full minute to complete, the init script hangs until START_TIMEOUT kicks in:
# /opt/puppetlabs/bin/puppet resource --evaltrace --verbose package puppetdb ensure=5.1.1
|
Info: /Package[puppetdb]: Starting to evaluate the resource
|
Notice: /Package[puppetdb]/ensure: ensure changed '5.0.1-1.el7' to '5.1.1'
|
Info: /Package[puppetdb]: Evaluated in 132.98 seconds
|
package { 'puppetdb':
|
ensure => '5.1.1-1.el7',
|
}
|
|
# journalctl -u puppetdb
|
...
|
Oct 16 19:51:10 r1lt6c9aq1o5t4r systemd[1]: Starting puppetdb Service...
|
Oct 16 19:51:29 r1lt6c9aq1o5t4r puppetdb[13361]: The [database] classname setting has been retired and will be ignored.
|
Oct 16 19:51:29 r1lt6c9aq1o5t4r puppetdb[13361]: The [database] subprotocol setting has been retired and will be ignored.
|
Oct 16 19:51:33 r1lt6c9aq1o5t4r puppetdb[13361]: The [database] classname setting has been retired and will be ignored.
|
Oct 16 19:51:33 r1lt6c9aq1o5t4r puppetdb[13361]: The [database] subprotocol setting has been retired and will be ignored.
|
Oct 16 19:52:10 r1lt6c9aq1o5t4r puppetdb[13361]: Startup timed out after 60 seconds
|
Oct 16 19:52:10 r1lt6c9aq1o5t4r puppetdb[13361]: Startup script was terminated before completion
|
Oct 16 19:53:11 r1lt6c9aq1o5t4r puppetdb[13361]: Process 13368 not terminated gracefully after 60 seconds
|
Oct 16 19:53:12 r1lt6c9aq1o5t4r puppetdb[13361]: /opt/puppetlabs/server/apps/puppetdb/ezbake-functions.sh: line 79: 13368 Killed ${JAVA_BIN} ${JAVA_ARGS} -Djava.sec
|
Oct 16 19:53:13 r1lt6c9aq1o5t4r puppetdb[13361]: Process 13368 killed after SIGKILL
|
Oct 16 19:53:13 r1lt6c9aq1o5t4r systemd[1]: puppetdb.service: control process exited, code=exited status=1
|
Oct 16 19:53:13 r1lt6c9aq1o5t4r systemd[1]: Failed to start puppetdb Service.
|
-
-
-
Expected Outcome
PuppetDB startup should fail-fast after migration fails and not have to be killed by the init script due START_TIMEOUT.
|