Hello all,
I would like to open a discussion about a potential enhancement to MicroProfile Health around drain (or soft-stop) semantics.
Today, the specification models health as a binary state (UP / DOWN), mapped to HTTP 200 and 503 respectively. This works well for liveness and readiness checks, but it leaves a gap for graceful traffic draining. For example, an instance under high memory or CPU load is still healthy and can continue serving existing or persistent sessions, but it should temporarily stop receiving new traffic.
Because MicroProfile Health does not allow customizing the HTTP status code or expressing intermediate states, this “healthy but should not receive new traffic” signal is typically implemented today via non-standard endpoints or load-balancer-specific workarounds.
A few open questions worth discussing:
Should MicroProfile Health consider a standardized way to express drain or traffic-deprioritization semantics?
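How can this be done without coupling the specification to a specific load balancer or HTTP status code? For example, HAProxy can already be configured (via its "http-check disable-on-404" option) to drain a server when the health check endpoint returns HTTP 404, but that convention is specific to HAProxy.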
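To make the workaround concrete, here is a minimal sketch of the kind of non-standard endpoint applications tend to write today. It uses only the JDK's built-in HTTP server, not MicroProfile Health. The names DrainEndpointSketch, State, httpStatusFor and the /lb-health path are all hypothetical, and the 404 mapping is an assumption chosen to match the HAProxy behaviour mentioned above.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: none of these names come from MicroProfile Health.
class DrainEndpointSketch {

    // Three-state model; the specification today only distinguishes UP and DOWN.
    enum State { UP, DRAINING, DOWN }

    // Current state of this instance, flipped by the application (e.g. under load).
    static final AtomicReference<State> current = new AtomicReference<>(State.UP);

    // Status mapping assumed for HAProxy-style draining on 404.
    // Another load balancer may expect a different code, which is
    // exactly the coupling problem raised in the question above.
    static int httpStatusFor(State state) {
        switch (state) {
            case UP:       return 200;
            case DRAINING: return 404;
            default:       return 503;
        }
    }

    // Starts a tiny server exposing the non-standard health endpoint.
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/lb-health", exchange -> {
            int status = httpStatusFor(current.get());
            exchange.sendResponseHeaders(status, -1); // -1 means no response body
            exchange.close();
        });
        server.start();
        return server;
    }
}
```

An application would call DrainEndpointSketch.start(8080) at startup and then DrainEndpointSketch.current.set(State.DRAINING) when it wants to stop receiving new traffic. The status mapping lives in application code and is tied to one load balancer's convention, which is the portability gap a standardized mechanism could close.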
The motivation is not to replace existing UP/DOWN semantics, but to complement them with a portable, cloud-friendly mechanism aligned with modern deployment and availability requirements.
I would be very interested in your feedback on whether this use case fits the long-term direction of MicroProfile Health, and what constraints or design principles should guide such an enhancement.
Best regards,
Skander