Nomad 0.7.0 (and 0.6.3) templates being re-rendered every few seconds


Chris Stevens

Nov 30, 2017, 2:58:39 PM
to Nomad
See https://github.com/hashicorp/nomad/issues/3197

I continue to see this issue in a test environment and had to roll back to 0.6.0, which does not exhibit it.

Current setup:
  • Consul 1.0.0
  • Nomad 0.7.0
  • Vault 0.9.0
  • Now using the vault database backend instead of the older mysql backend.
  • If it matters functionally, I'm using plugin_name=mysql-aurora-database-plugin for mysql 5.6 / aurora compatibility
  • database role default_ttl = 12h, max_ttl=48h
  • vault_grace = 30s on my test job

Chris Stevens

Nov 30, 2017, 3:37:28 PM
to Nomad
I'll add that the same jobs run well and as expected on Nomad 0.6.0, with the only system change being the rollback of Nomad from 0.7.0.

The tasks that exhibit this issue all use vault and the database (or legacy mysql) backend for credentials. The nginx tasks in the same jobs that do not use vault are unaffected.

The typical task is a php-fpm process with up to 3 template stanzas that use both consul and vault data:

template {
    source        = "local/provision/app/php-fpm.ctmpl"
    destination   = "local/php-fpm.conf"
    change_mode   = "signal"
    change_signal = "SIGUSR2"
    vault_grace   = "30s"
}

template {
    source        = "local/provision/app/config.ctmpl"
    destination   = "local/config.php"
    change_mode   = "signal"
    change_signal = "SIGUSR2"
    vault_grace   = "30s"
}

template {
    source        = "local/provision/app/database.ctmpl"
    destination   = "local/database.php"
    change_mode   = "signal"
    change_signal = "SIGUSR2"
    vault_grace   = "30s"
}
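
A template of this kind might look roughly like the sketch below, written inline with `data` instead of `source` so that it is self-contained. The role path `database/creds/app-role`, the Consul key `app/db/host`, and the PHP variable names are assumptions for illustration only; they are not taken from the actual .ctmpl files. The task would also need a vault stanza with a policy that can read that path.

```hcl
# Hypothetical inline equivalent of database.ctmpl: one Vault secret plus one Consul key.
# Everything between the EOH markers is rendered literally by the template engine.
template {
    data = <<EOH
<?php
{{ with secret "database/creds/app-role" }}
$db_user = "{{ .Data.username }}";
$db_pass = "{{ .Data.password }}";
{{ end }}
$db_host = "{{ key "app/db/host" }}";
EOH
    destination   = "local/database.php"
    change_mode   = "signal"
    change_signal = "SIGUSR2"
    vault_grace   = "30s"
}
```

Because the credentials come from a leased secret, a template like this depends on the task's token being able to renew that lease.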


Chris Stevens

Nov 30, 2017, 3:39:01 PM
to Nomad
We also use this same setup as a single-node development environment that is very similar to the test cluster and have never seen this re-rendering issue in that environment.

Chris Stevens

Nov 30, 2017, 4:49:11 PM
to Nomad
In case it is relevant, we currently upgrade nomad servers and workers in-place. This is done carefully with each server and then the workers and has not been the obvious cause of issues in the past up through nomad 0.6.0.

Chelsea Komlo

Nov 30, 2017, 5:21:07 PM
to Chris Stevens, Nomad
Hi Chris,

Thanks for writing to us about this issue. I see that there is quite a long discussion on the GitHub issue as well. Any relevant debug logs would help us diagnose it.

What happens when you try running your config with the newest version of consul-template? There was a corresponding fix for this on the consul-template side, so it would be good to know whether this is specific to Nomad. See https://github.com/hashicorp/consul-template/issues/1019 for reference.


Thanks,
Chelsea


Chris Stevens

Dec 1, 2017, 8:52:23 AM
to Nomad
Chelsea,

I'll try to do a test with consul-template on the problem cluster. I might have to wait until the next nomad release that appears to vendor consul-template v0.19.4.

I've tried vault_grace up to 30s without a change in this behavior. Would increasing it further help with this issue? How much further? Back to 15 minutes?

I'm curious if there are unit tests in nomad that test with multiple template stanzas that all use vault? I see that other users with a similar cluster setup are not having this problem. If those users only have 1 template that uses vault and mine have 2 or more, that could point to a problem in handling and renewing tokens across those templates or something like that.

Chris Stevens

Dec 1, 2017, 9:43:18 AM
to Nomad
I ran the templates through consul-template versions 0.19.3 and 0.19.4. Both gave the same sys/leases/renew error.
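
For anyone wanting to repeat this kind of standalone check, a consul-template configuration along these lines should be enough. It is only a sketch: the Vault address and the output path are placeholders, and it assumes the token is supplied through the VAULT_TOKEN environment variable and that Consul is reachable at its default local address.

```hcl
# Hypothetical standalone config for rendering one of the job's templates outside Nomad.
vault {
    # Placeholder address; point this at the Vault server the cluster uses.
    address = "https://vault.example.com:8200"

    # Leave token renewal off so the test only exercises secret reads and lease renewals.
    renew_token = false
}

template {
    source      = "local/provision/app/database.ctmpl"
    destination = "/tmp/database.php"
}
```

Running consul-template with `-config` pointing at this file and leaving it running past the first render is what exercises lease renewal.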

It looks like this is going to be a vault default policy issue in my environment.

The default policy for my deployed environments has been around since Vault 0.6 or something.

On our development systems that were not experiencing this issue, vault is loaded fresh and thus the default policy is also fresh.

I vaguely remember a comment or a discussion about the vault default policy changing, with the task token now needing access to sys/leases/renew in vault.
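
If that is the cause, the missing piece would be rules like the ones below. This is a sketch based on the default policy that ships with newer Vault versions, not on the policy in the affected environment; whether to add them to the old default policy or to the policy attached to the task token is an assumption about the fix, not something confirmed here.

```hcl
# Allow a token to renew leases it holds (for example, database credentials).
path "sys/leases/renew" {
    capabilities = ["update"]
}

# Older path for the same operation, kept for compatibility.
path "sys/renew" {
    capabilities = ["update"]
}
```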

