I was originally using the @Schedule annotation for this stuff, and no JMS.
The problem was that these tasks can take several minutes. What would happen is that the old timer code on WildFly 26 would update the DB to say the timer task was IN_TIMEOUT, and if pods got shut down in the middle of this process you could be left with a task that would never execute again.
So what I came up with was to have the timer task fire and then quickly put a JMS message on a queue that's backed by an MDB pool of size 1. This lets the timer callback complete quickly and the actual work get done serially (there's a rough sketch of that hand-off after the code below). I can try going back to using the @Schedule annotation instead of this @PostConstruct method in my singleton scheduler service EJB:
@PostConstruct
public void initialize()
{
    LOG.info(String.format("SchedulerServiceEJB is initializing on node %s",
            System.getProperty(GuardianAppConstants.SYSTEM_PROP__JBOSS_TX_NODE_ID)));

    boolean foundTaxiiPollTask = false;
    boolean foundMaxmindPollTask = false;

    Collection<Timer> existingTimers = timerService.getAllTimers();
    if (existingTimers != null && !existingTimers.isEmpty())
    {
        for (Timer timer : existingTimers)
        {
            Serializable info = timer.getInfo();
            if (TASK__TAXII_POLL_SCHEDULER.equals(info))
            {
                // this task exists and is scheduled
                foundTaxiiPollTask = true;
            }
            else if (TASK__MAXMIND_POLL_SCHEDULER.equals(info))
            {
                // this task exists and is scheduled
                foundMaxmindPollTask = true;
            }
        }
    }

    if (!foundTaxiiPollTask)
    {
        LOG.info("Scheduling taxii poll scheduler task");
        //
        // Schedule the polling of taxii feeds
        TimerConfig taxiiPollTimerConfig = new TimerConfig(TASK__TAXII_POLL_SCHEDULER, true);
        Timer taxiiPollTimer = timerService.createIntervalTimer(TAXII_POLL_SCHEDULER__INITIAL_DELAY_MS,
                TAXII_POLL_SCHEDULER_INTERVAL_MS, taxiiPollTimerConfig);
        LOG.info(String.format("The task %s has been scheduled", (String) taxiiPollTimer.getInfo()));
    }
    else
    {
        LOG.info("Not scheduling taxii poll task since it's already present");
    }

    if (!foundMaxmindPollTask)
    {
        LOG.info("Scheduling maxmind poll scheduler task");
        //
        // Schedule the polling of maxmind feeds
        TimerConfig maxmindPollTimerConfig = new TimerConfig(TASK__MAXMIND_POLL_SCHEDULER, true);
        Timer maxmindPollTimer = timerService.createIntervalTimer(MAXMIND_POLL_SCHEDULER__INITIAL_DELAY_MS,
                MAXMIND_POLL_SCHEDULER_INTERVAL_MS, maxmindPollTimerConfig);
        LOG.info(String.format("The task %s has been scheduled", (String) maxmindPollTimer.getInfo()));
    }
    else
    {
        LOG.info("Not scheduling maxmind poll task since it's already present");
    }
}
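Roughly, that timer-to-MDB hand-off looks like this (a minimal sketch, not my exact code: the queue name, field names, and MDB class are illustrative, maxSession is just one way to cap the consumer count at 1, and the imports are jakarta.* on newer WildFly / javax.* on WildFly 26):

import jakarta.annotation.Resource;
import jakarta.ejb.ActivationConfigProperty;
import jakarta.ejb.MessageDriven;
import jakarta.ejb.Timeout;
import jakarta.ejb.Timer;
import jakarta.inject.Inject;
import jakarta.jms.JMSContext;
import jakarta.jms.Message;
import jakarta.jms.MessageListener;
import jakarta.jms.Queue;

// In the singleton scheduler EJB: the @Timeout callback just puts the task
// name on the queue, so the timer callback itself returns almost immediately.
@Inject
private JMSContext jmsContext;

@Resource(lookup = "java:/jms/queue/SchedulerTaskQueue") // illustrative queue name
private Queue schedulerTaskQueue;

@Timeout
public void onTimeout(Timer timer)
{
    jmsContext.createProducer().send(schedulerTaskQueue, (String) timer.getInfo());
}

// The MDB that does the real, possibly minutes-long work. maxSession = 1
// means a single consumer, so tasks are processed serially on this node.
@MessageDriven(activationConfig = {
        @ActivationConfigProperty(propertyName = "destinationLookup",
                propertyValue = "java:/jms/queue/SchedulerTaskQueue"),
        @ActivationConfigProperty(propertyName = "destinationType",
                propertyValue = "jakarta.jms.Queue"),
        @ActivationConfigProperty(propertyName = "maxSession", propertyValue = "1")
})
public class SchedulerTaskMDB implements MessageListener
{
    @Override
    public void onMessage(Message message)
    {
        // read the task name from the message body and run the long task here
    }
}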
This app is normally deployed on GKE (k8s on GCP), so pods tend to get shut down and moved to other nodes as the cluster upgrades itself; I'm trying to simulate those normal circumstances to make sure this is pretty bulletproof before I deploy it again. What I've been doing locally in minikube is deploying the app with helm, letting everything come up, and then scaling the deployment to 2 replicas. I now have this all working: I see messages in pod A that pod B joined the cluster, and the every-5-minute task bounces between the two cluster nodes. However, if I scale the deployment back down to 1, things start getting wonky.
I start seeing messages like this:
2023-08-07 19:07:43,303 WARN [org.apache.activemq.artemis.core.server] (Thread-28 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@66e92b4f)) AMQ224091: Bridge ClusterConnectionBridge@3b9b18c8 [name=$.artemis.internal.sf.my-cluster.2dd068a6-3551-11ee-9c85-d69ed7020d51, queue=QueueImpl[name=$.artemis.internal.sf.my-cluster.2dd068a6-3551-11ee-9c85-d69ed7020d51, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::name=default], temp=false]@355cf1cb targetConnector=ServerLocatorImpl (identity=(Cluster-connection-bridge::ClusterConnectionBridge@3b9b18c8 [name=$.artemis.internal.sf.my-cluster.2dd068a6-3551-11ee-9c85-d69ed7020d51, queue=QueueImpl[name=$.artemis.internal.sf.my-cluster.2dd068a6-3551-11ee-9c85-d69ed7020d51, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::name=default], temp=false]@355cf1cb targetConnector=ServerLocatorImpl [initialConnectors=[TransportConfiguration(name=http-connector, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?httpUpgradeEndpoint=http-acceptor&activemqServerName=default&httpUpgradeEnabled=true&port=8080&host=10-244-0-251], discoveryGroupConfiguration=null]]::ClusterConnectionImpl@1492564989[nodeUUID=b487d677-3554-11ee-b29b-5ebb19f492d3, connector=TransportConfiguration(name=http-connector, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?httpUpgradeEndpoint=http-acceptor&activemqServerName=default&httpUpgradeEnabled=true&port=8080&host=10-244-0-254, address=jms, server=ActiveMQServerImpl::name=default])) [initialConnectors=[TransportConfiguration(name=http-connector, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?httpUpgradeEndpoint=http-acceptor&activemqServerName=default&httpUpgradeEnabled=true&port=8080&host=10-244-0-251], discoveryGroupConfiguration=null]] is unable to connect to destination. Retrying
I added this to try to fix that:
# since the pods can be added/removed, we need to let wildfly know not to retry brokers forever
/subsystem=messaging-activemq/server=default/cluster-connection=my-cluster:write-attribute(name=reconnect-attempts, value=1)
I'll go ahead and try the @Schedule annotation thing again to see if that helps.
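Something along these lines is what I have in mind (just a sketch: the method name is made up, it reuses the illustrative JMSContext/queue fields from the earlier snippet plus a jakarta.ejb.Schedule import, and I'm assuming the 5-minute interval I mentioned for the taxii task):

// Declarative replacement for the programmatic interval timer above.
// The container creates @Schedule timers at deployment, so the
// @PostConstruct bookkeeping goes away. info must be a compile-time
// String constant.
@Schedule(hour = "*", minute = "*/5", second = "0", persistent = true, info = TASK__TAXII_POLL_SCHEDULER)
public void taxiiPollSchedule(Timer timer)
{
    // same hand-off as before: drop a JMS message and return quickly
    jmsContext.createProducer().send(schedulerTaskQueue, (String) timer.getInfo());
}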
Thanks for your help,
Stephen