XARecovery of failed commit is not working and I can't understand why

777 views
Skip to first unread message

Olivier Masseau

unread,
Jun 29, 2023, 12:11:17 PM6/29/23
to WildFly
Hello,

I have a distributed transaction looking like this :
  1. - Update DB record (My DB is postgres)
  2. - Send JMS message

If 2PhaseCommit fails after the DB commit (I'm able to simulate an error by forcing the generation of an XAException in XAResourceWrapperImpl.commit) :

1) DB.prepare()
2) JMS.prepare()
3) DB.commit()
-> throws an XAException
4) JMS.commit()


then recovery does not work and the prepared transaction stays forever in Postgres (I can see it by doing a select * from pg_prepared_xacts).

Periodic recovery just displays these messages :

17:22:50,192 WARN  [com.arjuna.ats.arjuna] (Periodic Recovery) Transaction 0:ffffac1001a3:2b87b80f:649d4c47:137b has 1 heuristic participant(s)!

17:22:50,193 WARN  [com.arjuna.ats.jta] (Periodic Recovery) XAResourceRecord restored heuristic instance: XAResourceRecord < resource:XAResourceWrapperImpl@82b3a9f[xaResource=org.jboss.jca.adapters.jdbc.xa.XAManagedConnection@108b77c8 pad=false overrideRmValue=null productName=PostgreSQL productVersion=12.12 jndiName=java:/jdbc/AofDS], txid:< formatId=131077, gtrid_length=29, bqual_length=36, tx_uid=0:ffffac1001a3:2b87b80f:649d4c47:137b, node_name=1, branch_uid=0:ffffac1001a3:2b87b80f:649d4c47:137f, subordinatenodename=null, eis_name=java:/jdbc/AofDS >, heuristic: TwoPhaseOutcome.HEURISTIC_HAZARD, product: PostgreSQL/12.12, jndiName: java:/jdbc/AofDS com.arjuna.ats.internal.jta.resources.arjunacore.XAResourceRecord@b4d8189 >

17:22:50,193 WARN  [com.arjuna.ats.arjuna] (Periodic Recovery) Transaction 0:ffffac1001a3:2b87b80f:649d4c47:137b restored heuristic participant XAResourceRecord < resource:XAResourceWrapperImpl@82b3a9f[xaResource=org.jboss.jca.adapters.jdbc.xa.XAManagedConnection@108b77c8 pad=false overrideRmValue=null productName=PostgreSQL productVersion=12.12 jndiName=java:/jdbc/AofDS], txid:< formatId=131077, gtrid_length=29, bqual_length=36, tx_uid=0:ffffac1001a3:2b87b80f:649d4c47:137b, node_name=1, branch_uid=0:ffffac1001a3:2b87b80f:649d4c47:137f, subordinatenodename=null, eis_name=java:/jdbc/AofDS >, heuristic: TwoPhaseOutcome.HEURISTIC_HAZARD, product: PostgreSQL/12.12, jndiName: java:/jdbc/AofDS com.arjuna.ats.internal.jta.resources.arjunacore.XAResourceRecord@b4d8189 >


As you can see it correctly restored the XAResourceRecord from Wildfly tx-object-store (standalone\data\tx-object-store\ShadowNoFileLockStore\defaultStore\StateManager\BasicAction\TwoPhaseCoordinator\AtomicAction).
The txid correspond to the txid of the prepared transaction found in Postgres.

When debugging the code of AtomicActionRecoveryModule.doRecoverTransaction if can see it calls RecoverAtomicAction.replayPhase2().
RecoverAtomicAction.replayPhase2() calls BasicAction.phase2Commit()
The problem is that pendingList and preparedList are always empty. So no commit or abort is effectively done.
Only the heuristicList contains my XAResourceRecord that needs to be recovered.

As a result the record stays in the tx-objet-store forever and the prepared_statement stays in Postgres forever.

I don't understand why the recovery is not done.

To avoid the never ending messages in the logs I tried to enable JTAEnvironmentBean.xaAssumeRecoveryComplete in standalone.xml but it does nothing :
<property name="JTAEnvironmentBean.xaAssumeRecoveryComplete" value="true"/>

Olivier Masseau

unread,
Jun 29, 2023, 2:23:00 PM6/29/23
to WildFly
Here is another test I did. I threw a SQLException in the DB.commit phase :

1) DB.prepare()
2) JMS.prepare()
3) DB.commit()
-> throws an XAException
4) JMS.commit()


--> No recovery is happening. I expect recovery to do a rollback of the prepared transaction


The full trace (I tried to highlight in bold what seems important):


19:53:42,444 WARN  [com.arjuna.ats.jta] (default task-3 : callProgram.xhtml-ADMIN@fr) ARJUNA016036: commit on < formatId=131077, gtrid_length=29, bqual_length=36, tx_uid=0:ffffac1001a3:-d0caef9:649dc2b0:c52, node_name=1, branch_uid=0:ffffac1001a3:-d0caef9:649dc2b0:c56, subordinatenodename=null, eis_name=java:/jdbc/AofDS > (XAResourceWrapperImpl@2d2c35d4[xaResource=org.jboss.jca.adapters.jdbc.xa.XAManagedConnection@3eaeb01c pad=false overrideRmValue=null productName=PostgreSQL productVersion=12.12 jndiName=java:/jdbc/AofDS]) failed with exception $XAException.XAER_RMERR: org.postgresql.xa.PGXAException: Error committing prepared transaction. commit xid=< formatId=131077, gtrid_length=29, bqual_length=36, tx_uid=0:ffffac1001a3:-d0caef9:649dc2b0:c52, node_name=1, branch_uid=0:ffffac1001a3:-d0caef9:649dc2b0:c56, subordinatenodename=null, eis_name=java:/jdbc/AofDS >, preparedXid=< formatId=131077, gtrid_length=29, bqual_length=36, tx_uid=0:ffffac1001a3:-d0caef9:649dc2b0:c52, node_name=1, branch_uid=0:ffffac1001a3:-d0caef9:649dc2b0:c56, subordinatenodename=null, eis_name=java:/jdbc/AofDS >, currentXid=null
at org.pos...@42.2.20//org.postgresql.xa.PGXAConnection.commitPrepared(PGXAConnection.java:614)
...
Caused by: java.sql.SQLException: ko



19:57:17,629 WARN  [com.arjuna.ats.jta] (default task-3 : callProgram.xhtml-ADMIN@fr) ARJUNA016045: attempted rollback of < formatId=131077, gtrid_length=29, bqual_length=36, tx_uid=0:ffffac1001a3:-d0caef9:649dc2b0:c52, node_name=1, branch_uid=0:ffffac1001a3:-d0caef9:649dc2b0:c60, subordinatenodename=null, eis_name=java:/JmsXA NodeId:f33f4cde-bf55-11ec-a57b-9cb6d0fa80ff > (XAResourceWrapperImpl@49e48f8f[xaResource=org.jboss.activemq.artemis.wildfly.integration.WildFlyActiveMQXAResourceWrapper@66e65497 pad=false overrideRmValue=null productName=ActiveMQ Artemis productVersion=2.0 jndiName=java:/JmsXA NodeId:f33f4cde-bf55-11ec-a57b-9cb6d0fa80ff]) failed with exception code XAException.XAER_RMFAIL: javax.transaction.xa.XAException
at org.apache.ac...@2.16.0//org.apache.activemq.artemis.core.client.impl.ClientSessionImpl.rollback(ClientSessionImpl.java:1880)
...
Caused by: org.apache.activemq.artemis.api.core.ActiveMQInterruptedException: java.lang.InterruptedException
at org.apache.ac...@2.16.0//org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:506)
at org.apache.ac...@2.16.0//org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:434)
at org.apache.ac...@2.16.0//org.apache.activemq.artemis.core.protocol.core.impl.ActiveMQSessionContext.xaRollback(ActiveMQSessionContext.java:667)
at org.apache.ac...@2.16.0//org.apache.activemq.artemis.core.client.impl.ClientSessionImpl.rollback(ClientSessionImpl.java:1858)
... 153 more
Caused by: java.lang.InterruptedException
at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2197)
at org.apache.ac...@2.16.0//org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:504)


19:57:43,370 WARN  [com.arjuna.ats.arjuna] (default task-3 : callProgram.xhtml-ADMIN@fr) ARJUNA012091: Top-level abort of action 0:ffffac1001a3:-d0caef9:649dc2b0:c52 received TwoPhaseOutcome.FINISH_ERROR from com.arjuna.ats.internal.jta.resources.arjunacore.XAResourceRecord


19:58:26,505 WARN  [com.arjuna.ats.jta] (default task-3 : callProgram.xhtml-ADMIN@fr) ARJUNA016006: XAResourceRecord forget failed: - forget threw exception: org.postgresql.xa.PGXAException: Heuristic commit/rollback not supported. forget xid=< formatId=131077, gtrid_length=29, bqual_length=36, tx_uid=0:ffffac1001a3:-d0caef9:649dc2b0:c52, node_name=1, branch_uid=0:ffffac1001a3:-d0caef9:649dc2b0:c56, subordinatenodename=null, eis_name=java:/jdbc/AofDS >
at org.pos...@42.2.20//org.postgresql.xa.PGXAConnection.forget(PGXAConnection.java:631)



19:59:13,492 ERROR [org.jboss.as.ejb3.invocation] (default task-3 : callProgram.xhtml-ADMIN@fr) WFLYEJB0034: Jakarta Enterprise Beans Invocation failed on component TestService for method public void com.real.aof.business.service.TestService.testXA(com.real.aof.session.bean.CurrentUser) throws java.lang.Exception: javax.ejb.EJBTransactionRolledbackException: javax.transaction.RollbackException: ARJUNA016053: Could not commit transaction.
at org.jbos...@23.0.2.Final//org.jboss.as.ejb3.tx.CMTTxInterceptor.endTransaction(CMTTxInterceptor.java:117)
...
Caused by: javax.transaction.RollbackException: ARJUNA016053: Could not commit transaction.
at org.jboss.jts//com.arjuna.ats.internal.jta.transaction.arjunacore.TransactionImple.commitAndDisassociate(TransactionImple.java:1307)
...
Suppressed: org.postgresql.xa.PGXAException: Error committing prepared transaction. commit xid=< formatId=131077, gtrid_length=29, bqual_length=36, tx_uid=0:ffffac1001a3:-d0caef9:649dc2b0:c52, node_name=1, branch_uid=0:ffffac1001a3:-d0caef9:649dc2b0:c56, subordinatenodename=null, eis_name=java:/jdbc/AofDS >, preparedXid=< formatId=131077, gtrid_length=29, bqual_length=36, tx_uid=0:ffffac1001a3:-d0caef9:649dc2b0:c52, node_name=1, branch_uid=0:ffffac1001a3:-d0caef9:649dc2b0:c56, subordinatenodename=null, eis_name=java:/jdbc/AofDS >, currentXid=null
at org.pos...@42.2.20//org.postgresql.xa.PGXAConnection.commitPrepared(PGXAConnection.java:614)
...
Caused by: java.sql.SQLException: ko
at org.pos...@42.2.20//org.postgresql.xa.PGXAConnection.commitPrepared(PGXAConnection.java:591)




Periodic recovery kicks in :

20:00:36,496 WARN  [com.arjuna.ats.arjuna] (Periodic Recovery) Transaction 0:ffffac1001a3:-d0caef9:649dc2b0:c52 has 1 heuristic participant(s)!

20:00:36,496 WARN  [com.arjuna.ats.jta] (Periodic Recovery) XAResourceRecord restored heuristic instance: XAResourceRecord < resource:XAResourceWrapperImpl@2bac8686[xaResource=org.jboss.jca.adapters.jdbc.xa.XAManagedConnection@694092b8 pad=false overrideRmValue=null productName=PostgreSQL productVersion=12.12 jndiName=java:/jdbc/AofDS], txid:< formatId=131077, gtrid_length=29, bqual_length=36, tx_uid=0:ffffac1001a3:-d0caef9:649dc2b0:c52, node_name=1, branch_uid=0:ffffac1001a3:-d0caef9:649dc2b0:c56, subordinatenodename=null, eis_name=java:/jdbc/AofDS >, heuristic: TwoPhaseOutcome.HEURISTIC_ROLLBACK, product: PostgreSQL/12.12, jndiName: java:/jdbc/AofDS com.arjuna.ats.internal.jta.resources.arjunacore.XAResourceRecord@50a8d615 >

20:00:36,496 WARN  [com.arjuna.ats.arjuna] (Periodic Recovery) Transaction 0:ffffac1001a3:-d0caef9:649dc2b0:c52 restored heuristic participant XAResourceRecord < resource:XAResourceWrapperImpl@2bac8686[xaResource=org.jboss.jca.adapters.jdbc.xa.XAManagedConnection@694092b8 pad=false overrideRmValue=null productName=PostgreSQL productVersion=12.12 jndiName=java:/jdbc/AofDS], txid:< formatId=131077, gtrid_length=29, bqual_length=36, tx_uid=0:ffffac1001a3:-d0caef9:649dc2b0:c52, node_name=1, branch_uid=0:ffffac1001a3:-d0caef9:649dc2b0:c56, subordinatenodename=null, eis_name=java:/jdbc/AofDS >, heuristic: TwoPhaseOutcome.HEURISTIC_ROLLBACK, product: PostgreSQL/12.12, jndiName: java:/jdbc/AofDS com.arjuna.ats.internal.jta.resources.arjunacore.XAResourceRecord@50a8d615 >





20:02:46,489 WARN  [com.arjuna.ats.arjuna] (Periodic Recovery) Transaction 0:ffffac1001a3:-d0caef9:649dc2b0:c52 has 1 heuristic participant(s)!

20:02:46,489 WARN  [com.arjuna.ats.jta] (Periodic Recovery) XAResourceRecord restored heuristic instance: XAResourceRecord < resource:XAResourceWrapperImpl@30025a47[xaResource=org.jboss.jca.adapters.jdbc.xa.XAManagedConnection@694092b8 pad=false overrideRmValue=null productName=PostgreSQL productVersion=12.12 jndiName=java:/jdbc/AofDS], txid:< formatId=131077, gtrid_length=29, bqual_length=36, tx_uid=0:ffffac1001a3:-d0caef9:649dc2b0:c52, node_name=1, branch_uid=0:ffffac1001a3:-d0caef9:649dc2b0:c56, subordinatenodename=null, eis_name=java:/jdbc/AofDS >, heuristic: TwoPhaseOutcome.HEURISTIC_ROLLBACK, product: PostgreSQL/12.12, jndiName: java:/jdbc/AofDS com.arjuna.ats.internal.jta.resources.arjunacore.XAResourceRecord@36488d04 >

20:02:46,489 WARN  [com.arjuna.ats.arjuna] (Periodic Recovery) Transaction 0:ffffac1001a3:-d0caef9:649dc2b0:c52 restored heuristic participant XAResourceRecord < resource:XAResourceWrapperImpl@30025a47[xaResource=org.jboss.jca.adapters.jdbc.xa.XAManagedConnection@694092b8 pad=false overrideRmValue=null productName=PostgreSQL productVersion=12.12 jndiName=java:/jdbc/AofDS], txid:< formatId=131077, gtrid_length=29, bqual_length=36, tx_uid=0:ffffac1001a3:-d0caef9:649dc2b0:c52, node_name=1, branch_uid=0:ffffac1001a3:-d0caef9:649dc2b0:c56, subordinatenodename=null, eis_name=java:/jdbc/AofDS >, heuristic: TwoPhaseOutcome.HEURISTIC_ROLLBACK, product: PostgreSQL/12.12, jndiName: java:/jdbc/AofDS com.arjuna.ats.internal.jta.resources.arjunacore.XAResourceRecord@36488d04 >


.... continues forever .....

Martin Choma

unread,
Jun 30, 2023, 3:58:42 AM6/30/23
to WildFly
Hi Olivier,

in your other post you are saying "My webapp (running on Wildfly 23)". Are you observing this behaviour also on WF 23? Could you upgrade to WF 28 if you still see this problem?

Regards,
Martin 

Message has been deleted
Message has been deleted

Martin Choma

unread,
Jun 30, 2023, 4:15:57 AM6/30/23
to WildFly

I see from stacktrace "at org.jbos...@23.0.2.Final" that it really is WF 23. That version is 2 years old and lot of happened since then. Could you reproduce on latest WF 28?

Manuel Finelli

unread,
Jun 30, 2023, 9:02:40 AM6/30/23
to Olivier Masseau, WildFly
Hi Olivier,

The two logs you attached to your emails are different: HEURISTIC_ROLLBACK and HEURISTIC_HAZARD are two different stories. In general though, heuristic transactions should be resolved manually. Off the top of my head, Periodic Recovery and its (default) Recovery Modules are not able to solve heuristic transactions on their own. Having said that, let's go through your (last) logs:

On Thu, 29 Jun 2023 at 19:23, Olivier Masseau <omas...@gmail.com> wrote:
Here is another test I did. I threw a SQLException in the DB.commit phase :

1) DB.prepare()
2) JMS.prepare()
3) DB.commit()
-> throws an XAException
4) JMS.commit()


--> No recovery is happening. I expect recovery to do a rollback of the prepared transaction


The full trace (I tried to highlight in bold what seems important):


19:53:42,444 WARN  [com.arjuna.ats.jta] (default task-3 : callProgram.xhtml-ADMIN@fr) ARJUNA016036: commit on < formatId=131077, gtrid_length=29, bqual_length=36, tx_uid=0:ffffac1001a3:-d0caef9:649dc2b0:c52, node_name=1, branch_uid=0:ffffac1001a3:-d0caef9:649dc2b0:c56, subordinatenodename=null, eis_name=java:/jdbc/AofDS > (XAResourceWrapperImpl@2d2c35d4[xaResource=org.jboss.jca.adapters.jdbc.xa.XAManagedConnection@3eaeb01c pad=false overrideRmValue=null productName=PostgreSQL productVersion=12.12 jndiName=java:/jdbc/AofDS]) failed with exception $XAException.XAER_RMERR: org.postgresql.xa.PGXAException: Error committing prepared transaction. commit xid=< formatId=131077, gtrid_length=29, bqual_length=36, tx_uid=0:ffffac1001a3:-d0caef9:649dc2b0:c52, node_name=1, branch_uid=0:ffffac1001a3:-d0caef9:649dc2b0:c56, subordinatenodename=null, eis_name=java:/jdbc/AofDS >, preparedXid=< formatId=131077, gtrid_length=29, bqual_length=36, tx_uid=0:ffffac1001a3:-d0caef9:649dc2b0:c52, node_name=1, branch_uid=0:ffffac1001a3:-d0caef9:649dc2b0:c56, subordinatenodename=null, eis_name=java:/jdbc/AofDS >, currentXid=null
at org.pos...@42.2.20//org.postgresql.xa.PGXAConnection.commitPrepared(PGXAConnection.java:614)
...
Caused by: java.sql.SQLException: ko

XAER_RMERR -> this is pretty bad. From specs:

An error occurred in committing the work performed on behalf of the transaction branch and the branch’s work has been rolled back. Note that returning this error signals a catastrophic event to a transaction manager since other resource
managers may successfully commit their work on behalf of this branch. This error should be returned only when a resource manager concludes that it can never commit the branch and that it cannot hold the branch’s resources in a prepared
state. Otherwise, [XA_RETRY] should be returned.

I guess you're using byteman at this point and you're testing your code (I recall you wanted to test your code in particular failing scenarios). If this is the case, your testing is successful and the server is telling you that something (big) at the resource manager went wrong during a transaction.



19:57:17,629 WARN  [com.arjuna.ats.jta] (default task-3 : callProgram.xhtml-ADMIN@fr) ARJUNA016045: attempted rollback of < formatId=131077, gtrid_length=29, bqual_length=36, tx_uid=0:ffffac1001a3:-d0caef9:649dc2b0:c52, node_name=1, branch_uid=0:ffffac1001a3:-d0caef9:649dc2b0:c60, subordinatenodename=null, eis_name=java:/JmsXA NodeId:f33f4cde-bf55-11ec-a57b-9cb6d0fa80ff > (XAResourceWrapperImpl@49e48f8f[xaResource=org.jboss.activemq.artemis.wildfly.integration.WildFlyActiveMQXAResourceWrapper@66e65497 pad=false overrideRmValue=null productName=ActiveMQ Artemis productVersion=2.0 jndiName=java:/JmsXA NodeId:f33f4cde-bf55-11ec-a57b-9cb6d0fa80ff]) failed with exception code XAException.XAER_RMFAIL: javax.transaction.xa.XAException
at org.apache.ac...@2.16.0//org.apache.activemq.artemis.core.client.impl.ClientSessionImpl.rollback(ClientSessionImpl.java:1880)
...
Caused by: org.apache.activemq.artemis.api.core.ActiveMQInterruptedException: java.lang.InterruptedException
at org.apache.ac...@2.16.0//org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:506)
at org.apache.ac...@2.16.0//org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:434)
at org.apache.ac...@2.16.0//org.apache.activemq.artemis.core.protocol.core.impl.ActiveMQSessionContext.xaRollback(ActiveMQSessionContext.java:667)
at org.apache.ac...@2.16.0//org.apache.activemq.artemis.core.client.impl.ClientSessionImpl.rollback(ClientSessionImpl.java:1858)
... 153 more
Caused by: java.lang.InterruptedException
at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2197)
at org.apache.ac...@2.16.0//org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:504)

Again, from the specs:

XAER_RMFAIL -> An error occurred that makes the resource manager unavailable.

I guess this is still byteman.



19:57:43,370 WARN  [com.arjuna.ats.arjuna] (default task-3 : callProgram.xhtml-ADMIN@fr) ARJUNA012091: Top-level abort of action 0:ffffac1001a3:-d0caef9:649dc2b0:c52 received TwoPhaseOutcome.FINISH_ERROR from com.arjuna.ats.internal.jta.resources.arjunacore.XAResourceRecord


19:58:26,505 WARN  [com.arjuna.ats.jta] (default task-3 : callProgram.xhtml-ADMIN@fr) ARJUNA016006: XAResourceRecord forget failed: - forget threw exception: org.postgresql.xa.PGXAException: Heuristic commit/rollback not supported. forget xid=< formatId=131077, gtrid_length=29, bqual_length=36, tx_uid=0:ffffac1001a3:-d0caef9:649dc2b0:c52, node_name=1, branch_uid=0:ffffac1001a3:-d0caef9:649dc2b0:c56, subordinatenodename=null, eis_name=java:/jdbc/AofDS >
at org.pos...@42.2.20//org.postgresql.xa.PGXAConnection.forget(PGXAConnection.java:631)

I had to have a look online for "org.postgresql.xa.PGXAException: Heuristic commit/rollback not supported". Apparently, forget returns this exception in PGXAConnection. The reason seems to be "Does nothing, since we don't do heuristics,". I don't know what this means but, from what the Periodic Recovery reports, it looks like postgreSQL rolled back the transaction it was told to commit (that's why you get HEURISTIC_ROLLBACK). Maybe, something is wrong with the configuration of the datasource? Could you share the configuration for your postgreSQL XA datasource?

--
You received this message because you are subscribed to the Google Groups "WildFly" group.
To unsubscribe from this group and stop receiving emails from it, send an email to wildfly+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/wildfly/ca743f75-653f-441b-a31b-788b0d84e231n%40googlegroups.com.

Olivier Masseau

unread,
Jun 30, 2023, 10:42:56 AM6/30/23
to WildFly
Hi Manuel,

Indeed the error was generated with byteman by making PGXAConnection.commit throw a SQLException("ko")

Here is my datasource :

<xa-datasource jndi-name="java:/jdbc/AofDS" pool-name="AofDS_Pool" enabled="true" use-java-context="true" spy="true" use-ccm="true" statistics-enabled="true">
<xa-datasource-property name="URL">
jdbc:postgresql://localhost:5432/RFLOW_BACKEND?currentSchema=real
</xa-datasource-property>
<driver>postgresql</driver>
<transaction-isolation>TRANSACTION_READ_COMMITTED</transaction-isolation>
<xa-pool>
<min-pool-size>10</min-pool-size>
<initial-pool-size>20</initial-pool-size>
<max-pool-size>200</max-pool-size>
<prefill>true</prefill>
<use-strict-min>true</use-strict-min>
<flush-strategy>FailingConnectionOnly</flush-strategy>
</xa-pool>
<security>
<user-name>****</user-name>
<password>****</password>
</security>
<validation>
<check-valid-connection-sql>/* ping */ SELECT 1</check-valid-connection-sql>
<validate-on-match>false</validate-on-match>
<background-validation>true</background-validation>
<background-validation-millis>60000</background-validation-millis>
</validation>
<timeout>
<blocking-timeout-millis>30000</blocking-timeout-millis>
<idle-timeout-minutes>5</idle-timeout-minutes>
<query-timeout>10</query-timeout>
</timeout>
<statement>
<track-statements>true</track-statements>
<prepared-statement-cache-size>100</prepared-statement-cache-size>
<share-prepared-statements>true</share-prepared-statements>
</statement>
</xa-datasource>

Olivier Masseau

unread,
Jun 30, 2023, 10:48:57 AM6/30/23
to WildFly
Hi Martin,

Unfortunatly we are currently stuck with JavaEE 8 so we cannot upgrade to the latest version.
But If I understand there were some modifications made to Wildfly's transaction manager/recovery between these 2 versions ?

I think we could maybe upgrade to Wildfly 26 max (latest version supporting EE 8 if I recall correctly) to see if there is any difference.

Manuel Finelli

unread,
Jun 30, 2023, 11:47:26 AM6/30/23
to Olivier Masseau, WildFly
mmm...your datasource looks good to me :-/ Is it possible to get a reproducer?

Olivier Masseau

unread,
Jun 30, 2023, 4:56:16 PM6/30/23
to WildFly
Hi, thanks for you time helping :)

I could try to get a reproducer.

In the meantime I've managed to have a recovery done by simulating instead a closed DB connection at the time of the commit (Which is maybe more representative of a real world problem that could happen in production than throwing a SQLException during this commit phase) :)
In this case I can see indeed that I get a XAER_RMFAIL instead of getting a XAER_RMERR :

22:18:31,285 WARN  [com.arjuna.ats.jta] (default task-20 : callProgram.xhtml-ADMIN@fr) ARJUNA016036: commit on < formatId=131077, gtrid_length=29, bqual_length=36, tx_uid=0:ffffc0a8b25b:38f936a5:649f24c0:2237, node_name=1, branch_uid=0:ffffc0a8b25b:38f936a5:649f24c0:2242, subordinatenodename=null, eis_name=java:/jdbc/AofDS > (XAResourceWrapperImpl@9895891[xaResource=org.jboss.jca.adapters.jdbc.xa.XAManagedConnection@6a65c512 pad=false overrideRmValue=null productName=PostgreSQL productVersion=12.12 jndiName=java:/jdbc/AofDS]) failed with exception $XAException.XAER_RMFAIL: org.postgresql.xa.PGXAException: Error committing prepared transaction. commit xid=< formatId=131077, gtrid_length=29, bqual_length=36, tx_uid=0:ffffc0a8b25b:38f936a5:649f24c0:2237, node_name=1, branch_uid=0:ffffc0a8b25b:38f936a5:649f24c0:2242, subordinatenodename=null, eis_name=java:/jdbc/AofDS >, preparedXid=< formatId=131077, gtrid_length=29, bqual_length=36, tx_uid=0:ffffc0a8b25b:38f936a5:649f24c0:2237, node_name=1, branch_uid=0:ffffc0a8b25b:38f936a5:649f24c0:2242, subordinatenodename=null, eis_name=java:/jdbc/AofDS >, currentXid=null
at org.pos...@42.2.20//org.postgresql.xa.PGXAConnection.commitPrepared(PGXAConnection.java:614)
...
Caused by: org.postgresql.util.PSQLException: This connection has been closed.
at org.pos...@42.2.20//org.postgresql.jdbc.PgConnection.checkClosed(PgConnection.java:877) -> I've modified the code so that it returns true
at org.pos...@42.2.20//org.postgresql.jdbc.PgConnection.createStatement(PgConnection.java:1727)
at org.pos...@42.2.20//org.postgresql.jdbc.PgConnection.createStatement(PgConnection.java:423)
at org.pos...@42.2.20//org.postgresql.xa.PGXAConnection.commitPrepared(PGXAConnection.java:589)


After that the recovery is done correctly via the Periodic Recovery. Hooray :)
But (yes there is a but :p), something is still problematic to me :

1) The JMS message is still commited in the 2nd phase of 2PC.  This is problematic as it means the JMS message is commited to the queue before the DB commit is done.
This is never the case when I don't simulate a commit error as I can manage to always have the DB resource enlisted in the transaction first and JMS resource enlisted in second position so that 2PC is done this way :
  1.  DB.prepare
  2. JMS.prepare
  3. DB.commit
  4. JMS.commit
If the DB commit fails, shouln't the JMS commit be aborted and instead also be enlisted for later recovery by the Periodic Recovery process, so that the commits can still be performed in the correct order ?
For us it is really important to always have the DB transaction be commited first, then the JMS transaction commited after (as the message listener process needs to read some data that have been commited by the DB transaction).

2) This is a minor rant : Shouldn't we have a log message generated when the recovery of the DB XA record happens ?  Without it there is no way to know there was a recovery and it can be quite disturbing not being sure it was effectively done and, mostly important: when is was done.
Maybe there is a way to have a log message be generated through some configuration ? (As you guess having the log level of com.arjrna.ats in TRACE is not a solution when running in production :))

Thanks a lot.

Manuel Finelli

unread,
Jul 3, 2023, 5:47:17 AM7/3/23
to Olivier Masseau, WildFly
Hi Olivier,

On Fri, 30 Jun 2023 at 21:56, Olivier Masseau <omas...@gmail.com> wrote:
Hi, thanks for you time helping :)

You're welcome :-)

I could try to get a reproducer. 

A reproducer would be great.

In the meantime I've managed to have a recovery done by simulating instead a closed DB connection at the time of the commit (Which is maybe more representative of a real world problem that could happen in production than throwing a SQLException during this commit phase) :)
In this case I can see indeed that I get a XAER_RMFAIL instead of getting a XAER_RMERR :

22:18:31,285 WARN  [com.arjuna.ats.jta] (default task-20 : callProgram.xhtml-ADMIN@fr) ARJUNA016036: commit on < formatId=131077, gtrid_length=29, bqual_length=36, tx_uid=0:ffffc0a8b25b:38f936a5:649f24c0:2237, node_name=1, branch_uid=0:ffffc0a8b25b:38f936a5:649f24c0:2242, subordinatenodename=null, eis_name=java:/jdbc/AofDS > (XAResourceWrapperImpl@9895891[xaResource=org.jboss.jca.adapters.jdbc.xa.XAManagedConnection@6a65c512 pad=false overrideRmValue=null productName=PostgreSQL productVersion=12.12 jndiName=java:/jdbc/AofDS]) failed with exception $XAException.XAER_RMFAIL: org.postgresql.xa.PGXAException: Error committing prepared transaction. commit xid=< formatId=131077, gtrid_length=29, bqual_length=36, tx_uid=0:ffffc0a8b25b:38f936a5:649f24c0:2237, node_name=1, branch_uid=0:ffffc0a8b25b:38f936a5:649f24c0:2242, subordinatenodename=null, eis_name=java:/jdbc/AofDS >, preparedXid=< formatId=131077, gtrid_length=29, bqual_length=36, tx_uid=0:ffffc0a8b25b:38f936a5:649f24c0:2237, node_name=1, branch_uid=0:ffffc0a8b25b:38f936a5:649f24c0:2242, subordinatenodename=null, eis_name=java:/jdbc/AofDS >, currentXid=null
at org.pos...@42.2.20//org.postgresql.xa.PGXAConnection.commitPrepared(PGXAConnection.java:614)
...
Caused by: org.postgresql.util.PSQLException: This connection has been closed.
at org.pos...@42.2.20//org.postgresql.jdbc.PgConnection.checkClosed(PgConnection.java:877) -> I've modified the code so that it returns true
at org.pos...@42.2.20//org.postgresql.jdbc.PgConnection.createStatement(PgConnection.java:1727)
at org.pos...@42.2.20//org.postgresql.jdbc.PgConnection.createStatement(PgConnection.java:423)
at org.pos...@42.2.20//org.postgresql.xa.PGXAConnection.commitPrepared(PGXAConnection.java:589)


After that the recovery is done correctly via the Periodic Recovery. Hooray :)

Yeah, that sounds more like it :-)
 
But (yes there is a but :p), something is still problematic to me :

1) The JMS message is still commited in the 2nd phase of 2PC.  This is problematic as it means the JMS message is commited to the queue before the DB commit is done.
This is never the case when I don't simulate a commit error as I can manage to always have the DB resource enlisted in the transaction first and JMS resource enlisted in second position so that 2PC is done this way :
  1.  DB.prepare
  2. JMS.prepare
  3. DB.commit
  4. JMS.commit
OK, I think/hope I understood your scenario now. 
If the DB commit fails, shouln't the JMS commit be aborted and instead also be enlisted for later recovery by the Periodic Recovery process, so that the commits can still be performed in the correct order ?

Nope. We go through the entire list of prepared resources and try to commit all of them. I guess that "going through all prepared resources even in case of errors" is an attempt to try to get the best outcome from the 2PC protocol and then recover the (hopefully, few) resources that failed to commit.

For us it is really important to always have the DB transaction be commited first, then the JMS transaction commited after (as the message listener process needs to read some data that have been commited by the DB transaction).

Jonathan's blog might be useful here. If that is not helpful:
1) Have you considered sub-transactions? (i.e. REQUIRES_NEW)
2) Could LRCO be an option? (a bit risky)

2) This is a minor rant : Shouldn't we have a log message generated when the recovery of the DB XA record happens ?  Without it there is no way to know there was a recovery and it can be quite disturbing not being sure it was effectively done and, mostly important: when is was done.
Maybe there is a way to have a log message be generated through some configuration ? (As you guess having the log level of com.arjrna.ats in TRACE is not a solution when running in production :))

Strange, you should see a WARN :-/ Is WARN the right logging level for "com.arjuna"
 

Olivier Masseau

unread,
Jul 4, 2023, 5:08:43 AM7/4/23
to WildFly
Hi

Nope. We go through the entire list of prepared resources and try to commit all of them. I guess that "going through all prepared resources even in case of errors" is an attempt to try to get the best outcome from the 2PC protocol and then recover the (hopefully, few) resources that failed to commit.

-> Ok but doesn't that mean that we take the risk to loose consistency (and so going against the ACID principle) if recovery cannot be performed for the DB ?
Wouldn't it be better in term of consistency to not try to commit the remaining resources if a previously enlisted resource commit fails (or maybe give us the choice :)) ? For me 2PC should be atomic AND recovery should be atomic.



Jonathan's blog might be useful here. If that is not helpful:
1) Have you considered sub-transactions? (i.e. REQUIRES_NEW)
2) Could LRCO be an option? (a bit risky)


-> Thanks for the link.
1) You mean having the DB part in a sub-transaction ?
Problem is if the JMS commit fails we still need the DB transaction to be automatically rollbacked.


2) I've checked LRCO. Indeed it will allow us to be sur JMS always commit first. But as you said it is risky.
We still prefer having the JMS message be sent and generating an error after if no data is found in DB that not having the JMS message be sent at all :)


Strange, you should see a WARN :-/ Is WARN the right logging level for "com.arjuna"
-> I've rechecked I can confirm there is no message confirming the recovery in WARN or INFO ;)

Michael Musgrove

unread,
Jul 5, 2023, 5:11:16 PM7/5/23
to WildFly
Heuristic outcomes are generally unrecoverable and you will need to manually tell the database what to do.
If you have access to the WildFly CLI you may, depending upon what kind of heuristic you generated, be able to move it back into the prepare state which will cause the recovery system to have another attempt at the commit call.

Michael Musgrove

unread,
Jul 6, 2023, 5:04:10 AM7/6/23
to WildFly
Olivier,

> -> Ok but doesn't that mean that we take the risk to loose consistency (and so going against the ACID principle) if recovery cannot be performed for the DB ?
> Wouldn't it be better in term of consistency to not try to commit the remaining resources if a previously enlisted resource commit fails (or maybe give us the choice :)) ? For me 2PC should be atomic AND recovery should be atomic.

If the *first* participant returns HEURISTIC_ROLLBACK then we do indeed switch to rollback.
If it returns any other heuristic then we have no option but to continue with the remaining participants.
And once we're passed the first participant there is no going back!

Michael Musgrove

unread,
Jul 6, 2023, 5:16:54 AM7/6/23
to WildFly
You said:

> 2) I've checked LRCO. Indeed it will allow us to be sur JMS always commit first. But as you said it is risky.

But we have a fix for that: we patented a risk free alternative to LRCO a number of years ago and it is called Commit Markable Resourcew: https://docs.wildfly.org/28/wildscribe/subsystem/transactions/commit-markable-resource/index.htm

Michael Musgrove

unread,
Jul 6, 2023, 5:26:06 AM7/6/23
to WildFly
The WildFly link was missing an `l`, the model reference for CMR is https://docs.wildfly.org/28/wildscribe/subsystem/transactions/commit-markable-resource/index.html
And Ondra wrote an excellent article that covers it in depth: https://jbossts.blogspot.com/2018/06/narayana-commit-markable-resource.html

Michael Musgrove

unread,
Jul 6, 2023, 8:09:15 AM7/6/23
to WildFly
I was mistaken about using CMR in this example.

Manuel pointed out to me that it will not help you since CMR is designed for enlisting a single non-XA database resource in an XA/2PC transaction, whereas you are wanting resource ordering.

My apologies for adding a bit of noise to this conversation.

Scott Marlow

unread,
Jul 6, 2023, 9:47:42 AM7/6/23
to Michael Musgrove, WildFly

Fyi, there are existing Jakarta Specification issues that have been discussed for quite a while, like https://github.com/jakartaee/transactions/issues/65 + https://github.com/jakartaee/transactions/issues/58.

My question is whether this resource ordering issue is solvable by application code using PreparePriority/CommitPriority as mentioned in https://github.com/jakartaee/transactions/issues/58#issuecomment-526267697 (or something like that)?

Or should this resource ordering issue be solved automatically by WildFly without the application using something like PreparePriority/CommitPriority?

Scott


Michael Musgrove

unread,
Jul 6, 2023, 12:19:35 PM7/6/23
to WildFly
On Thursday, 6 July 2023 at 14:47:42 UTC+1 Scott Marlow wrote:

Fyi, there are existing Jakarta Specification issues that have been discussed for quite a while, like https://github.com/jakartaee/transactions/issues/65 + https://github.com/jakartaee/transactions/issues/58.
 
I think you are correct Scott, we seem to be getting this request more often now.

Given that there is a known window where resources committing earlier in the list expose their transactional changes before later ones are committed, resource ordering may alleviate some of the ill-effects of this.


My question is whether this resource ordering issue is solvable by application code using PreparePriority/CommitPriority as mentioned in https://github.com/jakartaee/transactions/issues/58#issuecomment-526267697 (or something like that)?

Or should this resource ordering issue be solved automatically by WildFly without the application using something like PreparePriority/CommitPriority?

I would vote for solving it at the spec level.

Michael

Olivier Masseau

unread,
Jul 7, 2023, 9:46:25 AM7/7/23
to WildFly
Hello Michael,

> If it returns any other heuristic then we have no option but to continue with the remaining participants.
> And once we're passed the first participant there is no going back!

But in my message dating from 30 juin 2023, 22:56:16, you can see that I got an XAER_RMFAIL (by simulating a closed connection during the DB commit) and that the periodic recovery has been able to recover it later as it was not an heuristic outcome.

The problem is : We have no garantee that the recovery for the DB transaction can work all the time (The prepared transaction could have in the meantime reached its timeout in the database - if a timeout has been configured for prepared transactions - and so this transaction could have been automatically deleted by the database, meaning the recovery of the transaction if not possible anymore).

Michael Musgrove

unread,
Jul 7, 2023, 2:35:55 PM7/7/23
to WildFly
Is there a reproducer for it?

Michael Musgrove

unread,
Jul 8, 2023, 6:33:12 AM7/8/23
to WildFly
There are known issues with some databases returning invalid XA codes so if you have found one of those it would need reporting to postgres, but they'd need a reproducer to work with if that is indeed the root cause of your problem.

Manuel Finelli

unread,
Jul 11, 2023, 9:16:33 AM7/11/23
to Olivier Masseau, WildFly
Hi Olivier,

On Tue, 4 Jul 2023 at 10:08, Olivier Masseau <omas...@gmail.com> wrote:
Hi

Nope. We go through the entire list of prepared resources and try to commit all of them. I guess that "going through all prepared resources even in case of errors" is an attempt to try to get the best outcome from the 2PC protocol and then recover the (hopefully, few) resources that failed to commit.

-> Ok but doesn't that mean that we take the risk to loose consistency (and so going against the ACID principle) if recovery cannot be performed for the DB ?
Wouldn't it be better in term of consistency to not try to commit the remaining resources if a previously enlisted resource commit fails (or maybe give us the choice :)) ? For me 2PC should be atomic AND recovery should be atomic.

A worse result would be obtained in case a forced ROLLBACK was used. For example, let's suppose we have a distributed transaction enlisting 10 (PREPARED) participants; after the Transaction Manager (TM) has successfully committed the first 5 XA resources, the 6th XA resource throws an XAException; going ahead and sending COMMIT to all other XA resources results in a more manageable scenario; in fact, if we ROLLBACK the distributed transactions the result will be that:
* 5 XA resources committed
* 4 XA resources rolled back
* 1 XA resource threw an exception
On the other hand, if we go ahead and send COMMIT to the other XA resources, we will end up (hopefully) with only one XA resource to recover.

Of course, if the first XA resource fails, there is room for sending ROLLBACK to all other XA resources (as Michael said).



Jonathan's blog might be useful here. If that is not helpful:
1) Have you considered sub-transactions? (i.e. REQUIRES_NEW)
2) Could LRCO be an option? (a bit risky)


-> Thanks for the link.
1) You mean having the DB part in a sub-transaction ?

Sorry about my poor explanation (or lack of explanation). What I had in mind was to use EJB remoting to simulate sub-transactions (I'm not an expert in EJB remoting). I tried this solution on my own to see if this idea works or not. Apparently, EJB remoting doesn't use sub-transactions as JTS does and instead it sticks with JTA, so my idea doesn't work very well. Anyway, here is the code so you can try it out.

Problem is if the JMS commit fails we still need the DB transaction to be automatically rollbacked.


2) I've checked LRCO. Indeed it will allow us to be sur JMS always commit first. But as you said it is risky.
We still prefer having the JMS message be sent and generating an error after if no data is found in DB that not having the JMS message be sent at all :)


Strange, you should see a WARN :-/ Is WARN the right logging level for "com.arjuna"
-> I've rechecked I can confirm there is no message confirming the recovery in WARN or INFO ;)

This is strange. In the code I have linked in this email I can see the WARN about recovering the transaction :-/

Overall, a reproducer would be very helpful to support you further

Reply all
Reply to author
Forward
0 new messages