Hello again. Currently I'm starting my service asynchronously (as described in org.jboss.msc.Service.start()) :
- in Service.start( StartContext context ) method I'm calling context.asynchronous()
- then I'm spawning a new thread taken from managedThreadFactory and passing the StartContext there
- after everything is started I'm calling context.complete() from the thread
- however, when an exception is caught inside the new thread, I'm calling context.failed( new StartException( e ) );
This sounds generally correct.
e.g.
ServiceTarget target = ...;
ServiceName name = ...;
Service service = ...;
ServiceBuilder<?> builder = new AsyncServiceConfigurator(name).build(target);
builder.setInstance(service).install();
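For reference, the asynchronous start sequence described in the question can be sketched roughly as follows. To keep the sketch self-contained, it uses a hypothetical AsyncStartContext interface that mirrors only the three StartContext methods mentioned above, and a hypothetical AsyncStarter class. It is not the MSC API itself, where failed(...) takes a StartException.

```java
import java.util.concurrent.ThreadFactory;

// Hypothetical stand-in for the three StartContext methods discussed above.
// The real StartContext.failed(...) takes a StartException.
interface AsyncStartContext {
    void asynchronous();
    void complete();
    void failed(Throwable cause);
}

// Hypothetical helper showing the order of calls during an asynchronous start.
final class AsyncStarter {
    private final ThreadFactory threadFactory; // e.g. the managed thread factory
    private final Runnable startup;            // the actual, possibly slow, start logic

    AsyncStarter(ThreadFactory threadFactory, Runnable startup) {
        this.threadFactory = threadFactory;
        this.startup = startup;
    }

    // Plays the role of Service.start(StartContext): returns before the work is done.
    Thread start(AsyncStartContext context) {
        // Announce that the start will be completed from another thread.
        context.asynchronous();
        Thread thread = threadFactory.newThread(() -> {
            try {
                startup.run();
            } catch (Throwable t) {
                // Report the failure instead of letting the thread die silently.
                context.failed(t);
                return;
            }
            // Reached only when the start logic finished without throwing.
            context.complete();
        });
        thread.start();
        return thread;
    }
}
```

The point of the structure is that every path through the spawned thread ends in either complete() or failed(...), so the container is never left waiting on a start that will not finish.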
Question: what is the expected behavior of the container when I call context.failed(...)?
From the perspective of the ServiceContainer, the ServiceController.getState() method of the failed service will return State.START_FAILED and the failure cause will be available via getStartException().
From the perspective of the singleton service mechanism, if the target service fails to start, it will remain in START_FAILED state until the next singleton election. No dependent services will start, and any on-demand dependencies will return to their previous state.
Should it try activating the service again, or does it simply mark the service as failed and not activate dependent services?
You can certainly retry a failed service. However, the ServiceController<?> instance returned via singleton service installation is not the controller of your service, but rather the controller for the singleton service mechanism itself. Thus, calling getState() on that ServiceController instance will always return UP.
However, you can reference the actual service using the ServiceName returned by ServiceController.getServiceName().append("singleton").
Thus, you could try something like:
ServiceTarget target = ...;
SingletonPolicy policy = ...;
ServiceName name = ...;
Service service = ...;
long timeout = ...;
TimeUnit unit = ...;
ServiceController<?> controller = policy.createSingletonServiceConfigurator(name).build(target).setInstance(service).install();
ServiceContainer container = controller.getServiceContainer();
// Listen on the actual service, not on the controller of the singleton mechanism
container.getService(name.append("singleton")).addLifecycleListener(new LifecycleListener() {
    @Override
    public void handleEvent(ServiceController<?> controller, LifecycleEvent event) {
        if (event == LifecycleEvent.FAILED) {
            int retries = ...;
            try {
                // Stop retrying as soon as the service is no longer in START_FAILED
                while (retries-- > 0 && controller.getState() == ServiceController.State.START_FAILED) {
                    controller.getStartException().printStackTrace(System.err);
                    controller.retry();
                    container.awaitStability(timeout, unit);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }
});
Is it possible to configure the container to try activating the service again?
Not currently. Honestly, you are the first person to ask about this.
Perhaps it makes sense to be able to configure a stability duration and a number of retries such that the above logic is executed automatically, something like:
/subsystem=singleton/singleton-policy=default:write-attribute(name=stability-timeout, value=5000) // In milliseconds
/subsystem=singleton/singleton-policy=default:write-attribute(name=start-retries, value=1) // default is 0
If not, and I want to activate the service myself (e.g. after 15 seconds) and succeed, should I use the StartContext that was passed to me at the very beginning and call context.complete()?
You should be able to insert a TimeUnit.SECONDS.sleep(15) in the above logic. If some kind of back-off interval is useful, we could instead express "retries" as a list of back-off intervals.
e.g.
/subsystem=singleton/singleton-policy=default:write-attribute(name=start-retry-intervals, value=[0, 10, 100]) // In milliseconds
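To make the idea concrete, here is a minimal sketch of how a list of back-off intervals might drive retries. The BackoffRetry class and its run method are hypothetical names, and the reading that each interval is the wait before one further attempt is my assumption about how such an attribute would behave. Nothing like this exists in the subsystem today.

```java
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical helper: one initial attempt, then one retry per back-off interval.
final class BackoffRetry {
    private BackoffRetry() {
    }

    /**
     * Returns the number of attempts made when an attempt succeeds, or -1 if
     * every attempt failed or the calling thread was interrupted while waiting.
     */
    static int run(BooleanSupplier attempt, List<Long> retryIntervalsMillis) {
        int attempts = 1;
        if (attempt.getAsBoolean()) {
            return attempts;
        }
        for (long interval : retryIntervalsMillis) {
            try {
                // Wait for the configured back-off before the next attempt.
                Thread.sleep(interval);
            } catch (InterruptedException e) {
                // Preserve the interrupt for the caller and give up.
                Thread.currentThread().interrupt();
                return -1;
            }
            attempts++;
            if (attempt.getAsBoolean()) {
                return attempts;
            }
        }
        return -1;
    }
}
```

With intervals of [0, 10, 100], this would make at most four attempts: the initial start plus three retries. In the listener logic above, the attempt would be a retry() followed by a check that the service is no longer in START_FAILED.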
Would it be useful if, on ultimate failure (after any retries) of the elected node to start its service, the singleton policy continually ran elections (progressively removing failed nodes from the current set of election candidates) until some node is able to start its service, a quorum can no longer be reached, or a topology update triggers a new election? e.g.
/subsystem=singleton/singleton-policy=default:write-attribute(name=failover-election, value=true) // default is false
(I'll try to think of a better name.)
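Roughly, the election loop I have in mind would look like the sketch below. FailoverElection is a hypothetical name, picking the first remaining candidate stands in for whatever election policy is configured, and treating quorum as the minimum number of remaining candidates is only one possible reading. Topology updates are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical sketch of repeated elections that exclude nodes that failed to start.
final class FailoverElection {
    private FailoverElection() {
    }

    /**
     * Returns the node that started the service, or empty if every candidate
     * failed or too few candidates remain to satisfy the quorum.
     */
    static Optional<String> elect(List<String> members, int quorum, Predicate<String> startsOn) {
        List<String> candidates = new ArrayList<>(members);
        while (!candidates.isEmpty() && candidates.size() >= quorum) {
            // Stand-in for the configured election policy: pick the first candidate.
            String elected = candidates.get(0);
            if (startsOn.test(elected)) {
                return Optional.of(elected);
            }
            // The elected node failed to start, so drop it from the next election.
            candidates.remove(elected);
        }
        return Optional.empty();
    }
}
```

For example, with members a, b and c and a quorum of 2, if only c can start the service, the loop gives up once a and b have failed, because the single remaining candidate no longer satisfies the quorum.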
WDYT?