[WFLY-17287] Cannot persist ejb timers into database


Alessandro Moscatelli

Nov 18, 2022, 5:36:47 AM
to WildFly
[WFLY-17287] Cannot persist ejb timers into database - Red Hat Issue Tracker

Hi everybody,
I opened the issue above a week ago because I can't find a way to use database-data-store for EJB timers starting from WildFly 27.0.0.

I don't know if I am doing something wrong. I understand that WildFly 27 introduces a new way to distribute EJB timers using Infinispan:


But I need to store persistent timers in a database: I work on an autoscalable cluster based on Docker, I don't want to rely on files, and I need to be able to recover timers from the database after rebooting the whole cluster. This is what I have used so far:


Now, no matter how I use the database-data-store element, it always ends up complaining with this error while booting:

10:39:24,915 ERROR [org.jboss.as.controller.management-operation] (Controller Boot Thread) WFLYCTL0013: Operation ("add") failed - address: ([
    ("subsystem" => "ejb3"),
    ("service" => "timer-service"),
    ("database-data-store" => "database-data-store")
]) - failure description: {
    "WFLYCTL0412: Required services that are not installed:" => ["org.wildfly.ejb3.timer-service"],
    "WFLYCTL0180: Services with missing/unavailable dependencies" => ["org.wildfly.ejb3.timer-service.timer-persistence-service.database-data-store is missing [org.wildfly.ejb3.timer-service]"]
}
10:39:24,927 INFO  [org.jboss.as.controller] (Controller Boot Thread) WFLYCTL0183: Service status report
WFLYCTL0184:    New missing/unsatisfied dependencies:
      service org.wildfly.ejb3.timer-service (missing) dependents: [service org.wildfly.ejb3.timer-service.timer-persistence-service.database-data-store]

When using this configuration:

            <timer-service default-transient-timer-management="transient" default-data-store="default-file-store">
                <data-stores>
                    <file-data-store name="default-file-store" path="timer-service-data" relative-to="jboss.server.data.dir"/>
                    <database-data-store name="database-store" datasource-jndi-name="java:/jdbc/db_optoplus" partition="timer"/>
                </data-stores>
            </timer-service>

This makes no sense to me; the error appears as soon as I define the database-data-store, even if I do not reference it in default-data-store.

It seems like database-data-store isn't able to register itself as a service while file-data-store can. I also tried to have a look at the source code, but the capability system seems a little too complex to work with on the spot.

Could somebody please give me a working configuration for storing EJB timers in a database that is compatible with WildFly 27?

Thank you in advance

Alessandro Moscatelli

Nov 18, 2022, 10:14:54 AM
to WildFly
Thanks to Paul Ferraro for guiding me. It was my mistake: I had to remove both of the two new attributes, default-transient-timer-management and default-persistent-timer-management.

So the working configuration is exactly the same as the old one.
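For reference, the timer-service element ends up looking roughly like this, with the same data stores as in my first message, the two new attributes removed, and the old thread-pool-name attribute back in place (I point default-data-store at the database store here since database persistence was the goal; adjust to your own setup):

            <timer-service thread-pool-name="default" default-data-store="database-store">
                <data-stores>
                    <file-data-store name="default-file-store" path="timer-service-data" relative-to="jboss.server.data.dir"/>
                    <database-data-store name="database-store" datasource-jndi-name="java:/jdbc/db_optoplus" partition="timer"/>
                </data-stores>
            </timer-service>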

Thank you 

Stephen Sill II

May 9, 2023, 9:56:23 AM
to WildFly
I'm actually running into this issue going from WildFly 26 to WildFly 28. I build a Docker container based on bitnami/wildfly:28, and part of my setup is that I run a .cli that does the WildFly configuration for the EAR and WAR I'll be deploying.

Did you fix this by manually editing a standalone*.xml file, or were you able to entirely script this?

Thanks,
Stephen

Stephen Sill II

May 9, 2023, 10:28:31 AM
to WildFly
Here's the snippet of the .cli file that worked great on WildFly 26 and now has issues on WildFly 28. The second-to-last command throws the error:


{
    "outcome" => "failed",
    "failure-description" => {

        "WFLYCTL0412: Required services that are not installed:" => ["org.wildfly.ejb3.timer-service"],
        "WFLYCTL0180: Services with missing/unavailable dependencies" => ["org.wildfly.ejb3.timer-service.timer-persistence-service.foo-clustered-timer-store is missing [org.wildfly.ejb3.timer-service]"]
    },
    "rolled-back" => true
}



# snippet

data-source add --name=fooDS --jndi-name=java:jboss/datasources/fooDS --driver-name=${PG_JDBC_JAR_NAME} --connection-url=jdbc:postgresql://${POSTGRES_HOST}:${POSTGRES_PORT}/${POSTGRES_DB} --user-name=${POSTGRES_USER} --credential-reference={store=wfcreds, alias=dbpw} --jta

# configure db connection validation
/subsystem=datasources/data-source=fooDS:write-attribute(name=background-validation-millis, value=10000)
/subsystem=datasources/data-source=fooDS:write-attribute(name=check-valid-connection-sql,value="SELECT 1")
/subsystem=datasources/data-source=fooDS:write-attribute(name=background-validation, value=true)

# configure messaging subsystem to use jdbc for persistence
/subsystem=messaging-activemq/server=default:write-attribute(name=journal-datasource, value=fooDS)
reload

## Clustering config

# Configuration of clustered timer storage (THIS THROWS THE ERROR)
/subsystem=ejb3/service=timer-service/database-data-store=foo-clustered-timer-store:add(allow-execution=true, datasource-jndi-name=java:jboss/datasources/fooDS, refresh-interval=5000, database=postgresql, partition=timer_partition)

# set timer service to use the new cluster timer storage by default
/subsystem=ejb3/service=timer-service:write-attribute(name=default-data-store, value=foo-clustered-timer-store)

Paul Ferraro

May 11, 2023, 12:55:18 PM
to WildFly
WF27 introduced a new timer service implementation based on Infinispan that is configured by default for both persistent and non-persistent timers.

Therefore, if you want to use the legacy implementation instead, you first need to disable the default implementation.
N.B. while the legacy TimerService can be used for non-persistent timers alongside the new implementation for persistent timers, the reverse is not possible.
Therefore, to use the legacy implementation for persistent timers, you also need to use the legacy implementation for non-persistent timers.

e.g.

start-batch
/subsystem=ejb3/service=timer-service:undefine-attribute(name=transient-timer-management) # Disable the new timer service impl for non-persistent timers
/subsystem=ejb3/service=timer-service:write-attribute(name=thread-pool-name, value=default) # Enable the legacy timer service impl for non-persistent timers
/subsystem=ejb3/service=timer-service:undefine-attribute(name=persistent-timer-management) # Disable the new timer service impl for persistent timers
# Add your datasource
# Add your timer service data-store
/subsystem=ejb3/service=timer-service:write-attribute(name=default-data-store, value=foo-clustered-timer-store) # Enable the legacy timer service impl for persistent timers
end-batch
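With the datasource and data-store from your snippet above plugged into those two placeholder comments, the batch would look roughly like this (a sketch reusing your names; the fooDS datasource is assumed to have been added already, earlier in the .cli):

start-batch
/subsystem=ejb3/service=timer-service:undefine-attribute(name=transient-timer-management)
/subsystem=ejb3/service=timer-service:write-attribute(name=thread-pool-name, value=default)
/subsystem=ejb3/service=timer-service:undefine-attribute(name=persistent-timer-management)
/subsystem=ejb3/service=timer-service/database-data-store=foo-clustered-timer-store:add(allow-execution=true, datasource-jndi-name=java:jboss/datasources/fooDS, refresh-interval=5000, database=postgresql, partition=timer_partition)
/subsystem=ejb3/service=timer-service:write-attribute(name=default-data-store, value=foo-clustered-timer-store)
end-batch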

That said, I would highly encourage you to try out the new default TimerService implementation which improves upon the scalability and performance characteristics of the old implementation.
While the new implementation can also be configured to use a shared database, the table structure and binary storage format are not compatible with the old implementation, i.e. you cannot use the new implementation to read timer data written by the old implementation.
See:

Stephen Sill II

Aug 7, 2023, 10:22:37 AM
to WildFly
Hi Paul,
I'm just getting back to this now. Thanks for your reply!

What I'm seeing currently, with the WildFly 28 application deployed via Helm with two replicas, is that some EJB scheduled tasks that should only run every 5 minutes on one of the nodes are in fact running on both. I can recreate this in minikube locally.

So what I'm trying to do is:

A pod comes up, and a singleton EJB asks the TimerService for the jobs that are scheduled. If it doesn't find the two jobs that should be running, it creates them. So every 5 minutes one of those jobs runs, and that singleton EJB adds a task to the JMS queue, which is also configured to persist to a JDBC database.

I'm going to go through your reply above and see if I can get it working with JDBC just to get over the hump, since this issue is causing problems in production and is compounded with more pods/replicas. I agree the Infinispan route seems better.

Thanks!

Stephen Sill II

Aug 7, 2023, 11:24:32 AM
to WildFly
So I decided to dig in and figure out why this wasn't working with the new way of doing things.

I think this is the root cause, and it fits what I'm seeing:

First pod coming up (since this is the first pod, you'd expect it to basically start the Infinispan cluster):
2023-08-07 15:05:52,101 INFO  [org.jboss.ws.common.management] (MSC service thread 1-2) JBWS022052: Starting JBossWS 6.2.0.Final (Apache CXF 4.0.0)
2023-08-07 15:05:52,106 INFO  [org.jgroups.protocols.kubernetes.KUBE_PING] (ServerService Thread Pool -- 86) namespace dev set; clustering enabled
2023-08-07 15:05:52,283 INFO  [org.jboss.as.clustering.jgroups] (ServerService Thread Pool -- 86) WFLYCLJG0032: Connecting 'ee' channel. '89b122cfa3491f5995a9f0' joining cluster 'ejb' via /10.244.0.240:7600
2023-08-07 15:05:52,284 INFO  [org.jgroups.JChannel] (ServerService Thread Pool -- 86) local_addr: 4d443464-77cc-4956-a6fa-fac8f36e6613, name: 89b122cfa3491f5995a9f0
2023-08-07 15:05:52,327 INFO  [org.jgroups.protocols.FD_SOCK2] (ServerService Thread Pool -- 86) server listening on /10.244.0.240:57600
2023-08-07 15:05:52,973 INFO  [org.jboss.as.connector.deployers.jdbc] (MSC service thread 1-7) WFLYJCA0004: Deploying JDBC-compliant driver class org.mariadb.jdbc.Driver (version 2.7)
2023-08-07 15:05:52,983 INFO  [org.jboss.as.connector.deployers.jdbc] (MSC service thread 1-4) WFLYJCA0005: Deploying non-JDBC-compliant driver class org.postgresql.Driver (version 42.6)
2023-08-07 15:05:52,998 INFO  [org.jboss.as.connector.deployers.jdbc] (MSC service thread 1-5) WFLYJCA0018: Started Driver service with driver-name = postgresql-42.6.0.jar
2023-08-07 15:05:53,010 INFO  [org.jboss.as.connector.deployers.jdbc] (MSC service thread 1-5) WFLYJCA0018: Started Driver service with driver-name = mariadb-java-client-2.7.9.jar
2023-08-07 15:05:54,365 INFO  [org.jboss.as.connector.subsystems.datasources] (MSC service thread 1-8) WFLYJCA0001: Bound data source [java:jboss/datasources/guardianDS]
2023-08-07 15:05:54,724 INFO  [org.jgroups.protocols.pbcast.GMS] (ServerService Thread Pool -- 86) 89b122cfa3491f5995a9f0: no members discovered after 2375 ms: creating cluster as coordinator
2023-08-07 15:05:54,738 INFO  [org.jboss.as.clustering.jgroups] (ServerService Thread Pool -- 86) WFLYCLJG0033: Connected 'ee' channel. '89b122cfa3491f5995a9f0' joined cluster 'ejb' with view: [89b122cfa3491f5995a9f0|0] (1) [89b122cfa3491f5995a9f0]
2023-08-07 15:05:55,217 INFO  [org.apache.activemq.artemis.core.server] (ServerService Thread Pool -- 86) AMQ221000: live Message Broker is starting with configuration Broker Configuration (clustered=true,journalDirectory=/opt/bitnami/wildfly/standalone/data/activemq/journal,bindingsDirectory=/opt/bitnami/wildfly/standalone/data/activemq/bindings,largeMessagesDirectory=/opt/bitnami/wildfly/standalone/data/activemq/largemessages,pagingDirectory=/opt/bitnami/wildfly/standalone/data/activemq/paging)


Second pod coming up after a manual scale to 2 replicas (it's not finding the existing cluster):
2023-08-07 15:12:23,580 INFO  [org.jboss.as.clustering.jgroups] (ServerService Thread Pool -- 86) WFLYCLJG0032: Connecting 'ee' channel. 'c3157ac4595fab71f52da2' joining cluster 'ejb' via /10.244.0.241:7600
2023-08-07 15:12:23,580 INFO  [org.jgroups.JChannel] (ServerService Thread Pool -- 86) local_addr: d07b1cfc-eaf2-4969-af71-c8ffc2fa0d0c, name: c3157ac4595fab71f52da2
2023-08-07 15:12:23,683 INFO  [org.jgroups.protocols.FD_SOCK2] (ServerService Thread Pool -- 86) server listening on /10.244.0.241:57600
2023-08-07 15:12:24,344 INFO  [org.jboss.as.connector.deployers.jdbc] (MSC service thread 1-2) WFLYJCA0004: Deploying JDBC-compliant driver class org.mariadb.jdbc.Driver (version 2.7)
2023-08-07 15:12:24,347 INFO  [org.jboss.as.connector.deployers.jdbc] (MSC service thread 1-8) WFLYJCA0005: Deploying non-JDBC-compliant driver class org.postgresql.Driver (version 42.6)
2023-08-07 15:12:24,359 INFO  [org.jboss.as.connector.deployers.jdbc] (MSC service thread 1-4) WFLYJCA0018: Started Driver service with driver-name = mariadb-java-client-2.7.9.jar
2023-08-07 15:12:24,360 INFO  [org.jboss.as.connector.deployers.jdbc] (MSC service thread 1-5) WFLYJCA0018: Started Driver service with driver-name = postgresql-42.6.0.jar
2023-08-07 15:12:25,983 INFO  [org.jboss.as.connector.subsystems.datasources] (MSC service thread 1-6) WFLYJCA0001: Bound data source [java:jboss/datasources/guardianDS]
2023-08-07 15:12:26,260 INFO  [org.jgroups.protocols.pbcast.GMS] (ServerService Thread Pool -- 86) c3157ac4595fab71f52da2: no members discovered after 2527 ms: creating cluster as coordinator


This worked previously with the WildFly 26 version of things.

In the CLI that gets run during pod deployment I'm doing this:

batch

# configure KUBE_PING so that wildfly clustering will work in kubernetes
/subsystem=jgroups/stack=tcp/protocol=MPING:remove()
/subsystem=jgroups/stack=tcp/protocol=kubernetes.KUBE_PING:add(add-index=1, properties={namespace="${K8S_NAMESPACE}", labels="app.kubernetes.io/name=guardian"})
/subsystem=jgroups/channel=ee:write-attribute(name=stack, value=tcp)

run-batch
reload


In addition to the above in the .cli file, the Helm deployment also creates a cluster role, cluster role binding, etc. I'm going to dig into this some more.

Stephen Sill II

Aug 7, 2023, 12:32:00 PM
to WildFly
I believe I have this working!

So as it turns out, the main issue was the argument to labels= in the KUBE_PING configuration: it used to work with the app.kubernetes.io/ prefix, but since upgrading Kubernetes that prefix is not needed, and things work without it. I now see messages in the first pod, when the second pod comes up, indicating that the second pod has joined the cluster!
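Concretely, the KUBE_PING line from the .cli above now looks something like this for me (a sketch: here I simply drop the app.kubernetes.io/ prefix and keep a plain name=guardian selector; the right label key depends on what labels your chart actually puts on the pods, which you can check with kubectl get pods --show-labels):

/subsystem=jgroups/stack=tcp/protocol=kubernetes.KUBE_PING:add(add-index=1, properties={namespace="${K8S_NAMESPACE}", labels="name=guardian"})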


Paul Ferraro

Aug 7, 2023, 3:17:33 PM
to WildFly
On Monday, August 7, 2023 at 10:22:37 AM UTC-4 Stephen Sill II wrote:
Hi Paul,
I'm just getting back to this now. Thanks for your reply!

What I'm seeing currently, with the WildFly 28 application deployed via Helm with two replicas, is that some EJB scheduled tasks that should only run every 5 minutes on one of the nodes are in fact running on both. I can recreate this in minikube locally.

So what I'm trying to do is:

A pod comes up, and a singleton EJB asks the TimerService for the jobs that are scheduled. If it doesn't find the two jobs that should be running, it creates them.

This logic is prone to race conditions.  Such timers should be automatically created, if possible.  See:

Stephen Sill II

Aug 7, 2023, 3:43:48 PM
to WildFly
I was originally using the @Schedule annotation for this stuff, and no JMS.

The problem was that these tasks can take several minutes. What would happen is that the old timer code on WildFly 26 would update the DB to say the timer task was IN_TIMEOUT. If pods got shut down in the middle of this process, you could be left with a task that would never execute again.

So what I came up with was to have the timer task fire and then quickly put a JMS message on a queue that's backed by an MDB pool of size 1 (a sketch of that MDB follows the code below). This allowed the timer task to complete and the actual work to be done in a serial fashion. I can try going back to using the @Schedule annotation instead of having this @PostConstruct method in my singleton scheduler service EJB:

    @PostConstruct
    public void initialize()
    {
        LOG.info(String.format("SchedulerServiceEJB is initializing on node %s", System.getProperty(GuardianAppConstants.SYSTEM_PROP__JBOSS_TX_NODE_ID)));

        boolean foundTaxiiPollTask = false;
        boolean foundMaxmindPollTask = false;

        Collection<Timer> existingTimers = timerService.getAllTimers();
        if (existingTimers != null && !existingTimers.isEmpty())
        {
            for (Timer timer : existingTimers)
            {
                Serializable infoS = timer.getInfo(); // the extra (Timer) cast was redundant, so it is dropped here

                if (TASK__TAXII_POLL_SCHEDULER.equals(infoS))
                {
                    // this task exists and is scheduled
                    foundTaxiiPollTask = true;
                }
                else if (TASK__MAXMIND_POLL_SCHEDULER.equals(infoS))
                {
                    // this task exists and is scheduled
                    foundMaxmindPollTask = true;
                }
            }
        }

        if (!foundTaxiiPollTask)
        {
            LOG.info("Scheduling taxii poll scheduler task");
            //
            // Schedule the polling of taxii feeds
            TimerConfig taxiiPollTimerConfig = new TimerConfig(TASK__TAXII_POLL_SCHEDULER, true);
            Timer taxiiPollTimer = timerService.createIntervalTimer(TAXII_POLL_SCHEDULER__INITIAL_DELAY_MS, TAXII_POLL_SCHEDULER_INTERVAL_MS, taxiiPollTimerConfig);
            LOG.info(String.format("The task %s has been scheduled", (String) taxiiPollTimer.getInfo()));
        }
        else
        {
            LOG.info("Not scheduling taxii poll task since it's already present");
        }

        if (!foundMaxmindPollTask)
        {
            LOG.info("Scheduling maxmind poll scheduler task");
            //
            // Schedule the polling of maxmind feeds
            TimerConfig maxmindPollTimerConfig = new TimerConfig(TASK__MAXMIND_POLL_SCHEDULER, true);
            Timer maxmindPollTimer = timerService.createIntervalTimer(MAXMIND_POLL_SCHEDULER__INITIAL_DELAY_MS, MAXMIND_POLL_SCHEDULER_INTERVAL_MS, maxmindPollTimerConfig);
            LOG.info(String.format("The task %s has been scheduled", (String) maxmindPollTimer.getInfo()));
        }
        else
        {
            LOG.info("Not scheduling maxmind poll task since it's already present");
        }
    }
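For context, the MDB that drains that queue is nothing special; a minimal sketch, assuming a queue named java:/jms/queue/SchedulerWorkQueue (the real destination and class names differ), using maxSession=1 so messages are consumed one at a time, which gives the same serial behavior as a pool of size 1:

import jakarta.ejb.ActivationConfigProperty;
import jakarta.ejb.MessageDriven;
import jakarta.jms.Message;
import jakarta.jms.MessageListener;

// Hypothetical MDB: processes the scheduler's work items serially.
@MessageDriven(activationConfig = {
    @ActivationConfigProperty(propertyName = "destinationLookup", propertyValue = "java:/jms/queue/SchedulerWorkQueue"),
    @ActivationConfigProperty(propertyName = "destinationType", propertyValue = "jakarta.jms.Queue"),
    // limit the consumer to a single session so long-running tasks do not overlap
    @ActivationConfigProperty(propertyName = "maxSession", propertyValue = "1")
})
public class SchedulerWorkMdb implements MessageListener
{
    @Override
    public void onMessage(Message message)
    {
        // do the long-running work here (taxii/maxmind polling, etc.)
    }
}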

This app is normally deployed on GKE (Kubernetes on GCP), so the pods tend to get shut down and moved to other nodes as the cluster upgrades itself, and I'm really trying to simulate normal circumstances to make sure it's pretty bulletproof before I deploy this again. What I've been doing locally in minikube is deploying the app using Helm, letting everything come up, and then scaling the deployment to 2 replicas. I now have this all working, and I see messages in pod A that pod B joined the cluster. This all works great; the every-5-minutes task bounces between the two cluster nodes. However, if I scale the deployment back down to 1, things start getting wonky.

I start seeing messages like this:
2023-08-07 19:07:43,303 WARN  [org.apache.activemq.artemis.core.server] (Thread-28 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@66e92b4f)) AMQ224091: Bridge ClusterConnectionBridge@3b9b18c8 [name=$.artemis.internal.sf.my-cluster.2dd068a6-3551-11ee-9c85-d69ed7020d51, queue=QueueImpl[name=$.artemis.internal.sf.my-cluster.2dd068a6-3551-11ee-9c85-d69ed7020d51, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::name=default], temp=false]@355cf1cb targetConnector=ServerLocatorImpl (identity=(Cluster-connection-bridge::ClusterConnectionBridge@3b9b18c8 [name=$.artemis.internal.sf.my-cluster.2dd068a6-3551-11ee-9c85-d69ed7020d51, queue=QueueImpl[name=$.artemis.internal.sf.my-cluster.2dd068a6-3551-11ee-9c85-d69ed7020d51, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::name=default], temp=false]@355cf1cb targetConnector=ServerLocatorImpl [initialConnectors=[TransportConfiguration(name=http-connector, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?httpUpgradeEndpoint=http-acceptor&activemqServerName=default&httpUpgradeEnabled=true&port=8080&host=10-244-0-251], discoveryGroupConfiguration=null]]::ClusterConnectionImpl@1492564989[nodeUUID=b487d677-3554-11ee-b29b-5ebb19f492d3, connector=TransportConfiguration(name=http-connector, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?httpUpgradeEndpoint=http-acceptor&activemqServerName=default&httpUpgradeEnabled=true&port=8080&host=10-244-0-254, address=jms, server=ActiveMQServerImpl::name=default])) [initialConnectors=[TransportConfiguration(name=http-connector, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?httpUpgradeEndpoint=http-acceptor&activemqServerName=default&httpUpgradeEnabled=true&port=8080&host=10-244-0-251], discoveryGroupConfiguration=null]] is unable to connect to destination. Retrying

I added this in to try to fix that:
# since the pods can be added/removed, we need to let WildFly know not to retry brokers forever
/subsystem=messaging-activemq/server=default/cluster-connection=my-cluster:write-attribute(name=reconnect-attempts, value=1)

I'll go ahead and try the @Schedule annotation thing again to see if that helps.
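Something like this is what I have in mind for that (a minimal sketch; the class name, schedule, and info value are placeholders rather than the real ones):

import jakarta.ejb.Schedule;
import jakarta.ejb.Singleton;
import jakarta.ejb.Timer;

@Singleton
public class PollSchedulerBean
{
    // An automatic timer: the container creates this persistent timer at deployment time,
    // so the manual "look up existing timers and create them if missing" logic (and the
    // race between pods that Paul pointed out) goes away.
    @Schedule(hour = "*", minute = "*/5", persistent = true, info = "TAXII_POLL_SCHEDULER")
    public void pollTaxiiFeeds(Timer timer)
    {
        // hand the work off to the JMS queue, as in the existing design
    }
}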

Thanks for your help,
Stephen

Stephen Sill II

Aug 8, 2023, 1:37:28 PM
to WildFly
I did end up getting this all working, without persisting the timers to the DB. Thanks for your help, Paul! I also went back to using @Schedule annotations.

Stephen
