As we're getting ready to go to production with our k8s-based system,
we're trying to pin down exactly how we're going to do all the needed
monitoring/alerting for it. We can easily collect many of the metrics
we need (using kube-state-metrics to feed into Prometheus, and/or
Datadog) and alert off of those.
However, there's other important k8s-related info about our system that
we need to be able to access, monitor, and alert on, most notably things
like:
* If a container crashes and is restarted by k8s
* If k8s kills a container and restarts it (e.g., due to exceeding cpu
or memory limits, or due to repeated failure of liveness check)
* If k8s kills a container but cannot restart it
* If an entire pod crashes and is restarted by k8s
etc.
How would one go about gaining access to those k8s-related events in
an automated fashion, and setting up monitoring/alerting off of those?
Thanks,
DR
David,
What we do is export the Kubernetes cluster events to Cloud Pub/Sub using Stackdriver Export, and we have SumoLogic set up to ingest the logs from Pub/Sub.
Then we use SumoLogic's Scheduled Search capabilities to send alerts based on certain events.
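
To give a rough idea of what "alerting on certain events" can look like, here is a minimal Python sketch that filters a parsed event list (for example the JSON from `kubectl get events --all-namespaces -o json`) down to the ones worth paging on. The function name and the set of reasons are assumptions for illustration, not our actual search; check which reasons your own cluster emits before relying on them.

```python
# Assumed set of event reasons to alert on (hypothetical choice; verify
# against the events your cluster actually produces).
ALERT_REASONS = {"Killing", "BackOff", "Unhealthy", "Failed"}


def events_to_alert_on(event_list, reasons=ALERT_REASONS):
    """Return a summary dict for every event whose reason is in `reasons`.

    `event_list` is the parsed JSON of a Kubernetes event list, i.e. a
    dict with an "items" key holding individual Event objects.
    """
    hits = []
    for ev in event_list.get("items", []):
        if ev.get("reason") not in reasons:
            continue
        obj = ev.get("involvedObject", {})
        hits.append({
            "namespace": obj.get("namespace"),
            "kind": obj.get("kind"),
            "name": obj.get("name"),
            "reason": ev.get("reason"),
            "message": ev.get("message"),
        })
    return hits
```

You would feed it something like `json.load(sys.stdin)` on the kubectl output and hand the result to whatever does your notifications.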
Punit Agrawal
Site Reliability Engineer, Lead
New Product Development
--
You received this message because you are subscribed to the Google Groups "Kubernetes user discussion and Q&A" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kubernetes-users+unsubscribe@googlegroups.com.
To post to this group, send email to kubernetes-users@googlegroups.com.
We pipe the k8s events into SumoLogic using an HTTP collector and then use SumoLogic alerting.
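
Roughly, the piping step amounts to POSTing each event as JSON to the collector's endpoint. A minimal sketch in Python, assuming a placeholder URL (the real one is unique to your collector) and a hypothetical helper name; this is not our actual forwarder:

```python
import json
import urllib.request

# Placeholder endpoint (assumption): substitute the URL of your own HTTP collector.
COLLECTOR_URL = "https://collector.example.invalid/receiver/PLACEHOLDER"


def build_collector_request(event, url=COLLECTOR_URL):
    """Build (but do not send) a POST request carrying one event as JSON."""
    body = json.dumps(event, sort_keys=True).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it is then `urllib.request.urlopen(build_collector_request(event))`, ideally with a timeout and retries around it.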
punit agrawal
dev-ops lead
new product development
ebay