to Nomad
Hi there!
Are there recommended best practices for monitoring unhealthy jobs in Nomad? We're currently deploying jobs across regions and datacenters. After deployment, is there a good way to check that all the services are running correctly and to alert us if not? We currently use Prometheus as our monitoring system.
We've looked at using Consul health checks, as many of you do. However, if a task never comes up at all, there is no Consul health check to fail. It would also be great if there were a way to detect jobs that are stopped by accident, for example when someone stops a job that is actually needed.
Thanks in advance.
Matt Veitas
Jul 29, 2018, 6:13:47 PM
I don't think there is anything out of the box, but we spent a day writing a handful of simple Python scripts that run every 30 seconds, query the Consul and Nomad APIs for job and health information, and report the results as metrics to our monitoring system. So far it's working well.
-Matt
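For anyone who wants a starting point, a polling script along the lines described above might look roughly like this. It is only a sketch and not the scripts mentioned in this thread. It assumes a Nomad agent reachable at the default address without ACLs. The job IDs in `EXPECTED_JOBS` and the metric name `nomad_job_running` are made up for illustration. It reads Nomad's `/v1/jobs` endpoint and emits one Prometheus text-format line per job, so a job that was stopped, or an expected job that is missing entirely, shows up as 0.

```python
import json
import urllib.request

# Assumption: local Nomad agent on the default port, no ACL token required.
NOMAD_ADDR = "http://127.0.0.1:4646"
# Hypothetical IDs of jobs that should always be running.
EXPECTED_JOBS = {"web", "api"}


def fetch_jobs(addr=NOMAD_ADDR):
    """Return the list of job stubs from Nomad's /v1/jobs endpoint."""
    with urllib.request.urlopen(f"{addr}/v1/jobs", timeout=5) as resp:
        return json.load(resp)


def job_metrics(jobs, expected):
    """Build Prometheus text-format lines: 1 if a job is running, else 0.

    Jobs in `expected` that Nomad does not report at all also get 0,
    which covers jobs that were stopped and purged.
    """
    running = {j["ID"] for j in jobs if j.get("Status") == "running"}
    seen = {j["ID"] for j in jobs}
    lines = []
    for job_id in sorted(expected | seen):
        value = 1 if job_id in running else 0
        lines.append(f'nomad_job_running{{job="{job_id}"}} {value}')
    return lines
```

Running `print("\n".join(job_metrics(fetch_jobs(), EXPECTED_JOBS)))` from cron or a timer and exposing the output to Prometheus (for example through a textfile collector or a push gateway, whichever you already use) would allow an alert on `nomad_job_running == 0`. A job status of "running" does not guarantee that every allocation is healthy, so Consul health checks are still worth keeping alongside this.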
Shantanu Gadgil
Jul 30, 2018, 1:33:14 PM
Hi,
Are these scripts available as open source somewhere? ☺️😊
Matt Veitas
Jul 31, 2018, 2:02:48 PM
Not yet, but it is something my company might consider in the future.