health checks?

45 views
Skip to first unread message

steve christensen

unread,
Jan 22, 2015, 12:46:30 AM1/22/15
to gwi...@googlegroups.com
Did you have anything in particular in mind for health checks?

I was thinking I'd start with Metrics' implementation https://dropwizard.github.io/metrics/3.1.0/manual/healthchecks/ , and follow some of the patterns you'd developed when you cleaned up the services & metrics modules.

probably something along the lines of...
  • modules bind health checks as singletons
  • health checks inject a HealthCheckRegistry (or wrapper around it), and register themselves w/ the registry
  • a health check service that periodically runs the registry's health checks

Jon Stevens

unread,
Jan 22, 2015, 1:05:55 AM1/22/15
to gwi...@googlegroups.com
please don't default them to mandatory. =)

jon


--
You received this message because you are subscribed to the Google Groups "GWizard Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gwizard+u...@googlegroups.com.
To post to this group, send email to gwi...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gwizard/5cef871e-ea1e-41f0-9af6-3f96aacd78c3%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

steve christensen

unread,
Jan 22, 2015, 3:08:34 AM1/22/15
to gwi...@googlegroups.com
That came together quicker than I thought it would. I've submitted a pull request.

And, I even remembered to branch this time.

Jeff Schnitzer

unread,
Jan 22, 2015, 4:03:21 AM1/22/15
to gwi...@googlegroups.com
Neato! Applied. Apparently it went straight to master anyways; every time I think I have git/github fully figured out it does something to surprise me :)

How would you feel about renaming a few things? Since we've pluralized gwizard-metrics and gwizard-services, seems like gwizard-healtchecks/HealthChecksModule would be more consistent. But while it's ok I'm not totally in love - what do you think about HealthModule or DoctorModule? HealthConfig or DoctorConfig? I'm not sure. I guess this is the set of names that I'm leaning towards (but it might be different in the morning):

gwizard-health
HealthConfig
HealthModule
HealthService
HealthChecks (same as it is now)

Also - I will rename HealthChecks.register/unregister to add/remove to be consistent with Services.add().

Jeff


steve christensen

unread,
Jan 22, 2015, 4:18:44 AM1/22/15
to gwi...@googlegroups.com

I kind of like Doctor as the name.

But, I don't have a strong opinion one way or the other about these names.  I'd actually had them pluralized earlier, then renamed them to see how it looked and forgot to switch back.

I'm also realizing I probably shouldn't have made the health service a periodic service. Or, at least it should have had  a much longer interval...

If dropwizard serves as example, the registered health checks don't run until someone hits the health endpoint.

You received this message because you are subscribed to a topic in the Google Groups "GWizard Discussion" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/gwizard/RpXsQRcRuHQ/unsubscribe.
To unsubscribe from this group and all its topics, send an email to gwizard+u...@googlegroups.com.

To post to this group, send email to gwi...@googlegroups.com.

Jeff Schnitzer

unread,
Jan 22, 2015, 4:22:49 AM1/22/15
to gwi...@googlegroups.com
I committed this as HealthChecksModule (etc) for now. I guess it's straightforward enough.

One issue is that HealthConfig.interval is a Java8 class. Which brings up two questions:

1) We currently target Java7. Is it too early to say Java8 only?

2) Does jackson parse a java.time.Duration? [looking] Apparently if we add jackson-datatype-jsr310 we get some options, but it's not totally obvious what the format is from this:


Jeff

Huseyn Guliyev

unread,
Jan 22, 2015, 7:36:55 AM1/22/15
to gwi...@googlegroups.com, je...@infohazard.org
Hi guys,
I see some cool stuff being cooked here.

Just a third opinion here:

Java8 will be too limiting at the moment. Maybe best to use JodaTime for dates for now?

Best,
Huseyn

steve christensen

unread,
Jan 22, 2015, 10:19:07 AM1/22/15
to gwi...@googlegroups.com, je...@infohazard.org

Dropwizard has a nice Duration class that'll be parsed from config. You can enter "1s", or "5min" into the yaml

You received this message because you are subscribed to a topic in the Google Groups "GWizard Discussion" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/gwizard/RpXsQRcRuHQ/unsubscribe.
To unsubscribe from this group and all its topics, send an email to gwizard+u...@googlegroups.com.

To post to this group, send email to gwi...@googlegroups.com.

Jeff Schnitzer

unread,
Jan 22, 2015, 6:50:25 PM1/22/15
to gwi...@googlegroups.com
I like that. I replaced it with the dropwizard Duration, which is conveniently in a util jar. It also has Size, which could be useful.

I'm not quite sure why IDEA didn't warn me about the JDK8ism even though the pom source level is set to 1.7. Mysterious.

Some thoughts:

 * I set the default interval to 10 minutes, but I'm open to any suggestion.

 * Should it run by default? And what if someone wants to disable automatic health checks? (presumably the web interface is what they want)

 * Once nice thing is that gwizard-healthcheck can register some JAXRS resources. If a RestModule (or JerseyModule) is not installed, it does nothing, so there's no hard dependency. This gives us an admin interface. But we need to establish a convention for that (everything under /admin?) and a way to secure it.

 * This calls for an AirbrakeModule or something like that that lets us automatically capture errors in logs. Probably depends on (and extends) LoggingModule.

Jeff

steve christensen

unread,
Jan 22, 2015, 7:41:41 PM1/22/15
to gwi...@googlegroups.com, je...@infohazard.org
Maybe decouple the health check registration in HealthChecks from the service impl in HealthChecksService ?

Running health checks _should_ have minimal impact, but whoever is creating a gwizard app will likely have a strong opinion about how they want them triggered.


The current HealthChecksService could be renamed to "PeriodicHealthChecks", and it's the only thing that needs the health check config. We could put the binding for PeriodicHealthChecks into a separate PeriodicHealthCheckModule Guice module 

Then, only people that want that periodic checking will install the PeriodicHealthCheckModule into their injector.


The HealthChecks.run() method could change return signatures, and just do : 
return healthCheckRegistry.runHealthChecks().entrySet();

Then, if it's the dumb PeriodicHealthCheckService that's running, it'll just use slf4j to log the results. If it's a JAXRS resource, maybe it sticks all the results into JSON


For my possible usage, I'd probably just expose a Gauge metric that runs my health check (like the dumb HealthCheckInJmxToo class in HealthChecksModuleExample.java). Then, whenever jmxtrans or other JMX-scraper queries the Metrics mbeans, it'll trigger a run of the health check. 

steve christensen

unread,
Jan 22, 2015, 11:54:25 PM1/22/15
to gwi...@googlegroups.com, je...@infohazard.org
I've implemented the changes I'd suggested below. Will have a pull request later tonight

Jeff Schnitzer

unread,
Jan 23, 2015, 5:13:35 AM1/23/15
to gwi...@googlegroups.com
I merged this but I actually like having just the single HeathChecksModule and letting the HealthChecksConfig determine whether or not the service starts up. Two modules here feels a bit much for something so simple.

Since the HealthChecksService can inspect the HeathChecksConfig before it calls Services.add(), we can eliminate all the service overhead if we detect that the service shouldn't start. Either checking for null interval or an explicit on/off switch in the config.

Sound good?

I'll try this out in a few mins.

Jeff



Jeff Schnitzer

unread,
Jan 23, 2015, 5:32:47 AM1/23/15
to gwi...@googlegroups.com
Hmmm, no easy way to add a 'null' for the interval Duration in yml.  Some offhand ideas:

---- Option 1: Default to not running

---- Option 2: Separate flag

healthChecks:
    enabled: false   # anything else to call the flag?

---- Option 3: Custom Duration implementation

healthChecks:
    interval: never

steve christensen

unread,
Jan 23, 2015, 10:11:43 AM1/23/15
to gwi...@googlegroups.com

Null check in the config sounds good

Jeff Schnitzer

unread,
Jan 23, 2015, 12:22:37 PM1/23/15
to gwi...@googlegroups.com
The problem with a null check in the config is that I can't seem to put a null in yaml. From the YAML spec it looks like 'null' or '~' should work but jackson is failing - 'null' just produces the default value and '~' is read as a literal. Maybe this is a bug in jackson or snakeyaml.

I tried '!!null' too and it didn't work.

I will ask about this on the jackson mailing list. In the mean time we can default to null or just punt on the problem until it gets fixed.

Jeff


steve christensen

unread,
Jan 23, 2015, 12:45:31 PM1/23/15
to gwi...@googlegroups.com, je...@infohazard.org
You could default to to a ludicrously long Duration (10 years?). And, in the PeriodicHealthCheckService.schedule() don't use a zero initialDelay.


Oh. Hmm. Just thought of a weird interaction. Due to the zero-length initial delay, the health checks might run prior to all services finishing their startup. 

HealthChecks can be added/removed dynamically, throughout the life of the application. Should it be up to each service to register its HealthCheck at an appropriate point in the service's lifecycle?

Or, should the HealthChecks class be a ServiceManager.Listener, and only allow health checks to run once the services have completed their startup?

Jeff Schnitzer

unread,
Jan 23, 2015, 12:51:38 PM1/23/15
to gwi...@googlegroups.com
Any compelling reason not to use the interval as a delay?

If we really want healthchecks to run immediately after startup, we can add an addStartHook() method to Run which lets the healthcheck system do some work after the awaitHealthy() is done. But the simple solution is just to leave a delay.

Jeff

Jon Stevens

unread,
Jan 23, 2015, 12:54:50 PM1/23/15
to gwi...@googlegroups.com, je...@infohazard.org
I vote for only running on services that have finished starting. 

Although, it might be useful to health check services through the whole lifecycle to be able to report how long it takes to start.

Jon

steve christensen

unread,
Jan 23, 2015, 12:55:36 PM1/23/15
to gwi...@googlegroups.com, je...@infohazard.org
Nope, I think just using the interval as the delay makes sense. 

In my mind... any 'real' health issues during initial startup should probably mean a service fails to startup, and the application barfs the errors all over the log

steve christensen

unread,
Jan 23, 2015, 12:59:22 PM1/23/15
to gwi...@googlegroups.com, je...@infohazard.org
Timing startup could also be done by adding a @Timed annotation to your service's startup method (and adding gwizard-metrics' MetricsModule() to your injector)

Jeff Schnitzer

unread,
Jan 23, 2015, 1:02:49 PM1/23/15
to gwi...@googlegroups.com
This is confusing to me. This test fails:


Here's the input:


The YAML spec and the snakeyaml docs suggest this should work, so I'm guessing jackson is to blame. Will know more soon.

Jeff


steve christensen

unread,
Jan 23, 2015, 2:33:44 PM1/23/15
to gwi...@googlegroups.com
Do you need to represent null in the config file?

If the field's default is null in the POJO, just not supplying that field in the YAML input will mean it's null when deserialized from the config.

Jeff Schnitzer

unread,
Jan 23, 2015, 2:36:24 PM1/23/15
to gwi...@googlegroups.com
Right - that just means it's off by default. Which I'm fine with for now, although it seems like there's some merit to DW's "production ready out of the box" mantra.

I'll make that change.

Jeff

Reply all
Reply to author
Forward
0 new messages