Load balancing for singletons / distinct singleton election policies for specific singletons (JBoss EAP 7.3.10)


DocSivia

Sep 14, 2022, 4:43:30 PM
to WildFly
Hi everyone,

I am currently running a multi-node cluster that includes quite a few singleton MDBs, and the number keeps growing as I develop.
I noticed that all of them always run on the same node, and changing the singleton election policy always affected every singleton. I'm foreseeing load-balancing issues in the very near future and would like to split this single group across more nodes.

Is it possible to apply singleton election policies to only a specific subset of singletons? Are there other (better) ways of achieving the same goal?

Any help is more than welcome!

Paul Ferraro

Sep 15, 2022, 8:46:09 AM
to WildFly
Unfortunately no, as this feature was implemented somewhat crudely, with server-level granularity (i.e. all MDBs in the cluster will run on the same cluster member), rather than per-MDB or even per-application granularity.

krysiu

Nov 19, 2022, 1:33:10 PM
to WildFly
But does this election system utilize WildFly's singleton subsystem, and if so, does it simply use the "default" policy of that subsystem? My goal is to guarantee that messages are processed by an MDB that is on the same node as some other cluster-singleton bean that is activated according to some singleton policy (i.e. I'm using org.jboss.msc.Service).

Paul Ferraro

Nov 21, 2022, 9:58:15 AM
to WildFly
On Saturday, November 19, 2022 at 1:33:10 PM UTC-5 kry...@gmail.com wrote:
But does this election system utilize WildFly's singleton subsystem

Yes.
 
and if so, does it simply use the "default" policy of that subsystem?

Yes.
 
My goal is to guarantee that messages are processed by an MDB that is on the same node as some other cluster-singleton bean that is activated according to some singleton policy (i.e. I'm using org.jboss.msc.Service).

So long as your singleton service is installed using the default policy on every cluster member, just like your MDBs, the singleton election results will be the same.
However, since these are separate singleton service installations, the exact timing of the start/stop of each service is not coordinated - thus there might be a very short period of time where one service has not yet stopped while the other has already started.

You can improve the timing of this, such that there is no time where your singleton service is running on a different cluster member than a singleton MDB, by piggy-backing your singleton service on the singleton MDB mechanism itself.  This way your service starts after the singleton MDB is started, and stops before the singleton MDB stops on any given cluster member.
e.g.
ServiceTarget target = ...;
ServiceName name = ...;
Service service = ...;
ServiceBuilder<?> builder = target.addService(name);
builder.requires(ServiceName.parse("org.wildfly.ejb3.clustered.singleton.barrier")); // Piggy-back on the singleton MDB mechanism
builder.setInstance(service)
        .setInitialMode(ServiceController.Mode.PASSIVE)  // Ensures this service starts when the required service starts, and stops before the required service stops
        .install();

Paul

krysiu

Nov 22, 2022, 2:29:42 PM
to WildFly
Thanks for the answer. It's a pity, however, that I can't associate these services the other way round, i.e. so that the MDB can process messages only when the other HA bean is active.

krysiu

Nov 23, 2022, 4:29:27 AM
to WildFly
1. This org.wildfly.ejb3.clustered.singleton.barrier wasn't good, as I ended up with the bean active on both nodes. I changed it to org.wildfly.ejb3.clustered.singleton and it looks OK.
2. My code is a bit different:
ServiceBuilder<?> serviceBuilder = serviceContainer.addService( SERVICE_NAME )
        .setInstance( this )
        .setInitialMode( ServiceController.Mode.PASSIVE );
Supplier<Object> objectSupplier = serviceBuilder.requires( EJB3_SINGLETON );
ServiceController<?> serviceController = serviceBuilder.install();
This objectSupplier variable is not used - am I doing this correctly? I've checked and both beans are on the same node.
3. In my logs I can find:
INFO (MSC service thread 1-8) I am a master
INFO (MSC service thread 1-2) WFLYEJB0475: MDB delivery started: myserver.ear,MySingletonMDB
So the HA bean starts before MDB delivery, but this is most likely a coincidence. Nevertheless, if I handle a message in the MDB and can't find the active HA singleton, I can throw an exception and the message will be redelivered soon. Or I can wait for the HA bean with Thread.sleep ;)
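For example, a rough sketch of the throw-and-redeliver idea (the queue name and the isHaSingletonActive() check below are placeholders, not my real code):

import javax.ejb.ActivationConfigProperty;
import javax.ejb.EJBException;
import javax.ejb.MessageDriven;
import javax.jms.Message;
import javax.jms.MessageListener;

@MessageDriven( activationConfig = {
        @ActivationConfigProperty( propertyName = "destinationLookup", propertyValue = "java:/jms/queue/MyQueue" ) } )
public class MySingletonMDB implements MessageListener {

    @Override
    public void onMessage( Message message ) {
        if ( !isHaSingletonActive() ) {
            // A system exception makes the container roll back the delivery,
            // so the broker will redeliver the message later.
            throw new EJBException( "HA singleton not active on this node yet" );
        }
        // ... normal message processing
    }

    private boolean isHaSingletonActive() {
        return true; // placeholder: consult the HA singleton's own "started" flag
    }
}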

Paul Ferraro

Nov 24, 2022, 12:41:25 PM
to WildFly
On Wednesday, November 23, 2022 at 4:29:27 AM UTC-5 kry...@gmail.com wrote:
1. This org.wildfly.ejb3.clustered.singleton.barrier wasn't good, as I ended up with the bean active on both nodes. I changed it to org.wildfly.ejb3.clustered.singleton and it looks OK.


2. My code is a bit different:
ServiceBuilder<?> serviceBuilder = serviceContainer.addService( SERVICE_NAME )
        .setInstance( this )
        .setInitialMode( ServiceController.Mode.PASSIVE );
Supplier<Object> objectSupplier = serviceBuilder.requires( EJB3_SINGLETON );
ServiceController<?> serviceController = serviceBuilder.install();
This objectSupplier variable is not used - am I doing this correctly? I've checked and both beans are on the same node.

Yes - this looks correct. That service does not provide a value (the supplier is effectively a Supplier<Void>), so you can ignore it.
 
3. In my logs I can find:
INFO (MSC service thread 1-8) I am a master
INFO (MSC service thread 1-2) WFLYEJB0475: MDB delivery started: myserver.ear,MySingletonMDB
So the HA bean starts before MDB delivery, but this is most likely a coincidence. Nevertheless, if I handle a message in the MDB and can't find the active HA singleton, I can throw an exception and the message will be redelivered soon. Or I can wait for the HA bean with Thread.sleep ;)

The MDB component will still be created first, but your service will most likely be started before the full deployment chain completes, and almost certainly before the MDB receives its first callback - though, technically, there is no guarantee of this.
Either way, I'm glad to see that this is workable.

krysiu

Jan 3, 2023, 8:43:19 AM
to WildFly
Hello again. Currently I'm starting my service asynchronously (as described in org.jboss.msc.Service.start()):
  • in the Service.start( StartContext context ) method I call context.asynchronous()
  • then I spawn a new thread obtained from a managedThreadFactory and pass the StartContext to it
  • after everything is started I call context.complete() from that thread
  • however, when an exception is caught inside the new thread, I call context.failed( new StartException( e ) )
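Roughly, in code (a simplified sketch - the managedThreadFactory field and doStart() are placeholders for my real fields and logic):

import javax.enterprise.concurrent.ManagedThreadFactory;

import org.jboss.msc.Service;
import org.jboss.msc.service.StartContext;
import org.jboss.msc.service.StartException;
import org.jboss.msc.service.StopContext;

public class MyHaSingletonService implements Service {

    private final ManagedThreadFactory managedThreadFactory;

    public MyHaSingletonService( ManagedThreadFactory managedThreadFactory ) {
        this.managedThreadFactory = managedThreadFactory;
    }

    @Override
    public void start( StartContext context ) {
        context.asynchronous(); // tell the container that this start completes later
        managedThreadFactory.newThread( () -> {
            try {
                doStart(); // placeholder for the actual startup work
                context.complete(); // asynchronous start succeeded
            } catch ( Exception e ) {
                context.failed( new StartException( e ) ); // asynchronous start failed
            }
        } ).start();
    }

    @Override
    public void stop( StopContext context ) {
        // release whatever doStart() acquired
    }

    private void doStart() throws Exception {
        // placeholder
    }
}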
Question: what is the expected behavior of the container when I call context.failed(...)? Should it try to activate the service again, or does it simply mark the service as failed and refrain from activating dependent services? Is it somehow configurable to make the container try activating the service again? If not, and I want to activate the service myself later (e.g. after some time, say 15 sec) and it succeeds, should I use the StartContext that was passed to me in the very beginning and call context.complete()?

krysiu

Jan 4, 2023, 5:07:08 AM
to WildFly
So I'm calling ServiceController.retry() right after I call StartContext.failed( new StartException( ... ) ), and this seems to work, because right after that Service.start is called by the container. Unfortunately, I've run into an endless service restart loop, even when I call StartContext.complete() once in a while (i.e. I call StartContext.complete() and I DON'T call retry(), but the container still calls Service.start(); I don't know why).

krysiu

Jan 4, 2023, 2:00:22 PM
to WildFly
To sum up and provide the smallest reproducible sample, here's the scenario (WildFly 21.0.2):
  • the container calls Service.start
  • I call startContext.asynchronous()
  • I call startContext.failed(...) - at this point the container does nothing, and this is fine
  • I call serviceController.retry() - the container calls Service.start again (in an MSC service thread), and this is fine
  • in the implementation of Service.start I call startContext.asynchronous()
  • I call startContext.complete() - at this point the container calls Service.start again (this is the third time) - this is not OK, since what I do at singleton service start is load and cache lots of data from the DB. The same happens (i.e. the container calling Service.start) when I call startContext.failed(...) for the second time - the service is started again right after the call to failed(), without the need to call retry()
Since this mechanism is a bit unstable, this gives me at least these two possibilities:
  • keep using startContext.failed() and serviceController.retry(), but also keep an internal flag that is the source of truth for the service state: if the flag says the service has already been started, do not start it when Service.start is called again (see the sketch after this list). Also: ServiceController.getState() cannot be used instead of the flag, because it always says "starting" in the start() method
  • or: never use startContext.failed() and serviceController.retry() - instead use a "while" loop in which I do whatever has to be done to make the service running (i.e. the aforementioned reading of data from the DB), repeat when an exception is caught (i.e. the DB is not accessible, e.g. due to failing over), and only call startContext.complete() after everything is loaded. The problem with this is handling the situation where, in the meantime, the container decides that this node should no longer provide this singleton service - how do I break out of the "while" loop? The container probably won't call Service.stop, because the service isn't started yet. serviceController.getState() can't be used for this purpose either, because the state is always "starting", and only after I call startContext.failed(...) does the state change to "down" (instead of "start_failed"). I think I could rely on the state of the service my service depends on, i.e. org.wildfly.ejb3.clustered.singleton, because I've observed that its mode/state changes from active/up through never/up and never/stopping to never/down.
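For the first option, the start() method from the sketch in my earlier message would roughly become something like this (again a simplified fragment; the started flag and loadAndCacheData() are my own placeholders):

    // java.util.concurrent.atomic.AtomicBoolean - internal source of truth,
    // since ServiceController.getState() only reports "starting" inside start()
    private final AtomicBoolean started = new AtomicBoolean( false );

    @Override
    public void start( StartContext context ) {
        if ( started.get() ) {
            // Already started earlier; returning without calling asynchronous()
            // lets the container treat this extra start attempt as completed successfully.
            return;
        }
        context.asynchronous();
        managedThreadFactory.newThread( () -> {
            try {
                loadAndCacheData(); // the expensive DB load
                started.set( true );
                context.complete();
            } catch ( Exception e ) {
                context.failed( new StartException( e ) );
            }
        } ).start();
    }

    @Override
    public void stop( StopContext context ) {
        started.set( false );
    }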

Paul Ferraro

Jan 5, 2023, 10:22:06 PM
to WildFly
On Tuesday, January 3, 2023 at 8:43:19 AM UTC-5 kry...@gmail.com wrote:
Hello again. Currently I'm starting my service asynchronously (as described in org.jboss.msc.Service.start()):
  • in the Service.start( StartContext context ) method I call context.asynchronous()
  • then I spawn a new thread obtained from a managedThreadFactory and pass the StartContext to it
  • after everything is started I call context.complete() from that thread
  • however, when an exception is caught inside the new thread, I call context.failed( new StartException( e ) )
This sounds generally correct.
You can also leverage the AsyncServiceConfigurator class to install a normal service such that it starts and/or stops asynchronously: https://github.com/wildfly/wildfly/blob/main/clustering/service/src/main/java/org/wildfly/clustering/service/AsyncServiceConfigurator.java 
e.g.
ServiceTarget target = ...;
ServiceName name = ...;
Service service = ...;
ServiceBuilder<?> builder = new AsyncServiceConfigurator(name).build(target);
builder.setInstance(service).install();
 
Question: what is the expected behavior of the container when I call context.failed(...)?

From the perspective of the ServiceContainer, the ServiceController.getState() method of the failed service will return State.START_FAILED and the failure cause will be available via getStartException().
From the perspective of the singleton service mechanism, if the target service fails to start, it will remain in the START_FAILED state until the next singleton election. No dependent services will start, and any on-demand dependencies will return to their previous state.
 
Should it try to activate the service again, or does it simply mark the service as failed and refrain from activating dependent services?

You can certainly retry a failed service; however, the ServiceController<?> instance returned by the singleton service installation is not the controller of your service, but rather the controller for the singleton service mechanism itself. Thus, calling getState() on that ServiceController instance will always return UP.

However, you can reference the actual service using the ServiceName returned by ServiceController.getServiceName().append("singleton").
Thus, you could try something like:

ServiceTarget target = ...;
SingletonPolicy policy = ...;
ServiceName name = ...;
Service service = ...;
long timeout = ...;
TimeUnit unit = ...;

ServiceController<?> controller = policy.createSingletonServiceConfigurator(name).build(target).setInstance(service).install();
controller.getServiceContainer().getService(name.append("singleton")).addLifecycleListener(new LifecycleListener() {
    @Override
    public void handleEvent(ServiceController<?> controller, LifecycleEvent event) {
        if (event == LifecycleEvent.FAILED) {
            int retries = ...;
            try {
                while (retries-- > 0) {
                    controller.getStartException().printStackTrace(System.err);
                    controller.retry();
                    controller.getServiceContainer().awaitStability(timeout, unit); // wait for the retry to settle
                    if (controller.getState() == ServiceController.State.UP) {
                        break; // the retry succeeded
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }
});

Is it somehow configurable to make the container try activating the service again?

Not currently.  Honestly, you are the first person to ask about this.
Perhaps it makes sense to be able to configure a stability duration and a number of retries such that the above logic is executed automatically, something like:

/subsystem=singleton/singleton-policy=default:write-attribute(name=stability-timeout, value=5000) // In milliseconds
/subsystem=singleton/singleton-policy=default:write-attribute(name=start-retries, value=1) // default is 0
 
If not, and I want to activate the service myself later (e.g. after some time, say 15 sec) and it succeeds, should I use the StartContext that was passed to me in the very beginning and call context.complete()?

You should be able to insert a TimeUnit.SECONDS.sleep(15) in the above logic.  If some kind of back-off interval is useful, we could instead express "retries" as a list of back-off intervals.
e.g.
/subsystem=singleton/singleton-policy=default:write-attribute(name=start-retry-intervals, value=[0, 10, 100])  // In milliseconds

Would it be useful if, on ultimate failure (after any retries) of the elected node to start its service, the singleton policy continually re-ran elections (progressively removing failed nodes from the current set of election candidates) until some node is able to start its service, until a quorum can no longer be reached, or until a topology update triggers a new election? e.g.

/subsystem=singleton/singleton-policy=default:write-attribute(name=failover-election, value=true)  // default is false
(I'll try to think of a better name.)

WDYT?