(I originally posted this message in the Drools Setup group, but I think it may be more relevant here.)
Hi team,
I am running Business Central and KIE Server in an AWS ECS/Fargate cluster.
I used the community Docker images and extended them to run as ECS tasks, so we have:
- One service for the Business Central workbench/controller. It runs only one task (replica) for the moment, and the service is attached to an ALB.
- Another service for KIE Server. This service can run multiple replicas and is also attached to an ALB.
This architecture works quite well, but we are observing an issue that makes the Business Central workbench annoying to use.
Every 60 seconds, the registered KIE Server disappears from the workbench and queries stop working. After 3-4 seconds, the KIE Server appears again in the workbench.
Our ALBs are configured with a 60-second idle timeout, and we have observed that the ALB closes the persistent WebSocket connection opened from the KIE Server to the controller. That connection carries no traffic unless the server template/configuration changes, which only happens occasionally, so the ALB treats it as idle.
We have also seen that the Business Central controller has a health check (implemented in KieServerHealthCheckControllerImpl) that checks the registered KIE Servers every 5 seconds, but it only goes over the network to check KIE Server health when using REST, not when using WebSockets.
We don't want to increase the 60-second idle timeout of our ALBs, so we are wondering if there is any way to send traffic through the persistent WebSocket connection between the controller and KIE Server so that our ALB does not close it as idle.
We think that making KieServerHealthCheckControllerImpl actually check the health of the KIE Server through the WebSocket (perhaps by sending a GetServerInfoCommand) should fix our issue.
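To illustrate the kind of traffic we mean, here is a minimal sketch of a generic keepalive that periodically sends an empty WebSocket Ping frame. It is not KIE code: it uses the JDK 11+ `java.net.http.WebSocket` client rather than whatever client KIE Server actually uses, the `WebSocketKeepAlive` class is hypothetical, pinging at half the idle timeout is our own choice, and it assumes the ALB counts Ping frames as activity on the connection.

```java
import java.net.http.WebSocket;
import java.nio.ByteBuffer;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of KIE: periodically runs a "ping" action so an
// intermediary with an idle timeout (e.g. an ALB) keeps seeing traffic.
final class WebSocketKeepAlive implements AutoCloseable {

    // Single daemon thread, so the keepalive never prevents the JVM from exiting.
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "ws-keepalive");
                t.setDaemon(true);
                return t;
            });

    // Assumption: ping at half the idle timeout (never less than 1 ms).
    static long intervalMillis(long idleTimeoutMillis) {
        return Math.max(1L, idleTimeoutMillis / 2);
    }

    // Runs `ping` every `intervalMillis` until close() is called.
    ScheduledFuture<?> start(Runnable ping, long intervalMillis) {
        return scheduler.scheduleAtFixedRate(() -> {
            try {
                ping.run();
            } catch (RuntimeException e) {
                // An uncaught exception would suppress all later runs of a
                // fixed-rate task, so log it and keep going.
                System.err.println("keepalive ping failed: " + e);
            }
        }, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
    }

    // Adapter for the JDK 11+ client: send an empty Ping frame while output is open.
    // sendPing is asynchronous; failures surface in the returned CompletableFuture.
    static Runnable pingFrame(WebSocket ws) {
        return () -> {
            if (!ws.isOutputClosed()) {
                ws.sendPing(ByteBuffer.allocate(0));
            }
        };
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

Wired to an open client `ws`, this would be `keepAlive.start(WebSocketKeepAlive.pingFrame(ws), WebSocketKeepAlive.intervalMillis(60_000))`, i.e. an empty ping every 30 seconds with our 60-second timeout, which leaves some margin for a delayed ping.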
Any ideas on this? Is there any other approach that could fix this issue?
Thank you in advance,