to Keras-users
Prometheus is a popular software tool for monitoring and alerting. It can gather metrics from applications and infrastructure, and its query language, PromQL, can be used to define alerts that fire when things go wrong.
Chapter 7 of François's "Deep Learning with Python, second edition" covers Keras callbacks and the TensorBoard callback for feedback during model training. I thought it would be fun (and possibly useful) to write a callback for exposing training/testing loss and other metrics (accuracy, mean absolute error, histograms of model weights, etc.) to Prometheus. So I wrote Gangplank (https://github.com/hammingweight/gangplank). TensorBoard is more than a monitoring tool - it's useful for visualizing how a model works and behaves. Gangplank is intended solely for monitoring.
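To give a feel for what "exposing metrics to Prometheus" means, here is a minimal, standard-library-only sketch that turns a Keras-style `logs` dict (the kind passed to a callback's `on_epoch_end`) into gauge samples in the Prometheus text exposition format. This is not Gangplank's code or API: `render_gauges`, the `training_` prefix and the example values are all made up for illustration.

```python
import re


def render_gauges(logs, prefix="training_"):
    """Render a Keras-style logs dict as Prometheus gauge samples.

    Hypothetical helper for illustration; not part of Gangplank.
    """
    lines = []
    for key, value in sorted(logs.items()):
        # Prometheus metric names may only contain [a-zA-Z0-9_:],
        # so replace anything else with an underscore.
        name = prefix + re.sub(r"[^a-zA-Z0-9_:]", "_", key)
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {float(value)}")
    return "\n".join(lines) + "\n"


# Made-up values, shaped like the dict Keras passes to on_epoch_end.
example_logs = {"loss": 0.25, "val_loss": 0.5, "accuracy": 0.875}
exposition = render_gauges(example_logs)
print(exposition)
```

In a real setup a callback would update the gauges at the end of every epoch and a small HTTP endpoint would serve this text for Prometheus to scrape.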
Once I'd implemented the training and testing callbacks, I wanted to get Gangplank to emit inference metrics as well. The two metrics that (almost) every service should emit are the number of requests and the duration per request, so Gangplank exposes those to Prometheus. Additionally, if you have sufficient data to implement a meaningful statistical test for drift (data drift or prediction drift), Gangplank can expose the test's metrics, such as a p-value or test statistic, to Prometheus.
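The request-count and duration idea can be sketched in a few lines of plain Python. Again, this is only an illustration under my own assumptions, not how Gangplank is implemented: the `InferenceMetrics` class, the metric names and the stand-in "model" are hypothetical, and the sketch is not thread-safe.

```python
import time


class InferenceMetrics:
    """Hypothetical request counter and duration tracker (not Gangplank's API)."""

    def __init__(self):
        self.requests_total = 0
        self.duration_seconds_sum = 0.0

    def instrument(self, predict_fn):
        """Wrap a predict function so that every call is counted and timed."""
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return predict_fn(*args, **kwargs)
            finally:
                # Record the call even if the prediction raised.
                self.requests_total += 1
                self.duration_seconds_sum += time.perf_counter() - start
        return wrapper

    def render(self):
        """Render the metrics in the Prometheus text exposition format."""
        return (
            "# TYPE inference_requests_total counter\n"
            f"inference_requests_total {self.requests_total}\n"
            "# TYPE inference_duration_seconds summary\n"
            f"inference_duration_seconds_sum {self.duration_seconds_sum}\n"
            f"inference_duration_seconds_count {self.requests_total}\n"
        )


metrics = InferenceMetrics()
# Stand-in for a model's predict method: doubles every input value.
predict = metrics.instrument(lambda batch: [2 * x for x in batch])

first = predict([1, 2, 3])
second = predict([4])
print(metrics.render())
```

With a sum and a count exposed like this, the average request duration over a window can be computed in PromQL by dividing `rate(inference_duration_seconds_sum[5m])` by `rate(inference_duration_seconds_count[5m])`.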