KIE Server health check from controller using WebSockets

Quique Riesgo

Jan 21, 2022, 6:28:57 AM

to Drools Development

(I originally posted this message in the Drools Setup group, but I am wondering whether it is more relevant here.)

Hi team,

I am running Business Central and KIE Server in an AWS ECS/Fargate cluster.

I used the community Docker images and extended them to run as ECS tasks, so we have:

- One service for Business Central workbench/controller. It is running only one task (replica) for the moment and the service is attached to an ALB.

- Another service for KIE Server. This service can run multiple replicas and is also attached to an ALB.

Since we are running multiple replicas of the KIE Server behind the ALB, we are using WebSockets to connect to the controller, as described here: https://blog.kie.org/2017/08/managed-kie-server-gets-ready-for-the-cloud.html

This architecture works well overall, but we are seeing an issue that makes the Business Central workbench annoying to use.

Every 60 seconds the registered KIE Server disappears from the workbench and queries stop working. After 3-4 seconds the KIE Server appears in the workbench again.

Our ALBs are configured with a 60-second idle timeout. We have observed that the ALB closes the permanent WebSocket connection opened from the KIE Server to the controller, because that connection carries no traffic unless the server template/configuration changes, which happens only from time to time.

We have also seen that the Business Central controller has a health check (implemented in KieServerHealthCheckControllerImpl) that checks the health of the registered KIE Servers every 5 seconds. However, it only actually goes over the network to check the KIE Server when using REST, not when using WebSockets.
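To make the timing concrete, here is a simplified model of the idle timeout, not a description of how the ALB is implemented. The class `IdleTimeoutModel` and its methods are hypothetical names for illustration; it assumes the connection opens at t = 0 and that any message resets the idle timer.

```java
import java.util.OptionalLong;

// Hypothetical model: when would a load balancer with a fixed idle timeout
// close a connection, given the seconds at which traffic is sent?
final class IdleTimeoutModel {

    // trafficTimes must be sorted ascending. Returns the second at which the
    // connection is closed, or empty if it survives until horizonSeconds.
    static OptionalLong closedAt(long[] trafficTimes, long idleTimeoutSeconds, long horizonSeconds) {
        long lastActivity = 0;
        for (long t : trafficTimes) {
            if (t > horizonSeconds) {
                break;
            }
            if (t - lastActivity >= idleTimeoutSeconds) {
                // The gap before this message was already too long.
                return OptionalLong.of(lastActivity + idleTimeoutSeconds);
            }
            lastActivity = t;
        }
        long deadline = lastActivity + idleTimeoutSeconds;
        return deadline <= horizonSeconds ? OptionalLong.of(deadline) : OptionalLong.empty();
    }

    // Traffic at periodSeconds, 2 * periodSeconds, ... up to horizonSeconds.
    static long[] every(long periodSeconds, long horizonSeconds) {
        int count = (int) (horizonSeconds / periodSeconds);
        long[] times = new long[count];
        for (int i = 0; i < count; i++) {
            times[i] = (i + 1) * periodSeconds;
        }
        return times;
    }
}
```

With no traffic and a 60-second timeout, `closedAt(new long[0], 60, 300)` reports a close at second 60, which matches what we see. If the 5-second health check actually used the WebSocket, `closedAt(every(5, 300), 60, 300)` would be empty, meaning the connection stays open.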

We don't want to increase the 60-second idle timeout of our ALBs, so we are wondering whether there is any way to send traffic through the permanent WebSocket connection between the controller and the KIE Server so that our ALB does not close it as idle.

We figured that making KieServerHealthCheckControllerImpl actually check the health of the KIE Server through the WebSocket (perhaps by sending a GetServerInfoCommand) should fix our issue.

Any ideas on this? Is there any other approach that could fix this issue?
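A minimal sketch of that idea in generic terms, using only the JDK. `WebSocketKeepAlive` and the `probe` callback are hypothetical and not part of KIE; the assumption is that `probe` is whatever sends a message over the existing WebSocket (for example a GetServerInfoCommand), and that any such message resets the ALB idle timer.

```java
import java.time.Duration;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of KIE: runs a probe at a fixed rate so the
// connection it writes to never stays silent for a full idle timeout.
final class WebSocketKeepAlive implements AutoCloseable {

    private final ScheduledExecutorService scheduler;

    WebSocketKeepAlive(Runnable probe, Duration interval, Duration idleTimeout) {
        if (interval.isZero() || interval.isNegative()) {
            throw new IllegalArgumentException("interval must be positive");
        }
        if (interval.compareTo(idleTimeout) >= 0) {
            throw new IllegalArgumentException("interval must be shorter than the idle timeout");
        }
        scheduler = Executors.newSingleThreadScheduledExecutor(runnable -> {
            Thread thread = new Thread(runnable, "ws-keepalive");
            thread.setDaemon(true); // do not block JVM shutdown
            return thread;
        });
        scheduler.scheduleAtFixedRate(() -> {
            try {
                probe.run();
            } catch (RuntimeException e) {
                // Swallowed so that one failed probe does not cancel the schedule.
            }
        }, 0, interval.toNanos(), TimeUnit.NANOSECONDS);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

In our case the interval would be 5 seconds against the 60-second idle timeout, so a few failed probes in a row would still leave plenty of margin before the ALB considers the connection idle.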

Thank you in advance,

Quique Riesgo

Jan 21, 2022, 6:35:16 AM

to Drools Development

Quick update on this.

I have locally changed the WebSocketKieServerClient constructor to fetch fresh server info over the WebSocket connection (by sending a GetServerInfoCommand).

I have compiled it and deployed it in my local installation, and it is working fine: I can now see that the WebSocket connection is used every five seconds, so the ALB should no longer close it.

    public WebSocketKieServerClient(String url) {
        this.url = url;
        this.serverInfo = manager.getServerInfoByUrl(url);
        // Changes start here
        logger.debug("Obtained cached server info {} for kie server located at {}", serverInfo, url);
        logger.debug("Forcing fetch of fresh server info from kie server located at {}", url);
        // Goes over the WebSocket, so the connection carries traffic on every health check
        ServiceResponse<KieServerInfo> freshServerInfo = this.getServerInfo();
        // Pass the result directly: the logger formats it and tolerates null
        logger.debug("Obtained FRESH server info {} from kie server located at {}", freshServerInfo.getResult(), url);
        this.serverInfo = freshServerInfo.getResult();
        // Changes end here
    }