XA recovery - dist_txn_sync

scott gamble

Nov 1, 2022, 10:13:37 AM
to WildFly
First off, I want to say that I am out of my element here. I am an Oracle DBA trying to help an application team work through an issue with XA recovery. I am mostly involved because of the load it is putting on my database.

Environment: Exadata Cloud at Customer, 2-node RAC

We recently upgraded one of our databases from Oracle 12.1 to 19.13. At the same time, the application team moved to WildFly. If details are needed I can get them; as I mentioned above, the database is much more my area.

Since the upgrade we see a large number of sessions running dist_txn_sync, which is an expensive operation. It ran 1.2 million times in 32 hours. It appears to run every few minutes across a set of sessions in the database, and a SQL trace on one of the databases confirms this.

I understand that this is caused by calls to oracle.jdbc.xa.OracleXAResource.recover(), but there is not a lot of information available on what is happening and why. This appears to be new since the upgrade to 19c and WildFly; our previous environment never saw it. It did show up in testing, but the impact was so small that it wasn't noticed until we went looking for it. Oracle has a support note (Doc ID 2332314.1) that essentially says to tell the application to stop calling it so often.

I have seen an option to turn recovery off completely but there does not appear to be a way to just slow it down and stop it from running so often.

Can anyone offer any insight on this, or ways we might be able to control it without stopping it completely? Stopping it may be an option, but that is not my area.

Michael Musgrove

Nov 1, 2022, 10:46:02 AM
to scott gamble, WildFly
The Oracle docs say that the procedure is called when the TM calls xa_recover, xa_commit and xa_rollback.
We do call xa_recover during each recovery cycle, and the default for that is every two minutes. I'd advise against disabling recovery.
But the figures you quote indicate it is being called about 10 times per second. Does that correspond in any way to the transaction throughput? You could try enabling transaction statistics to get a feel for the actual transaction rates.
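
For reference, the rate of roughly 10 per second follows from the numbers in the original post (1.2 million executions in 32 hours). The per-cycle figure assumes the default two-minute recovery period mentioned above:

```latex
\[
\frac{1.2 \times 10^{6}\ \text{executions}}{32\ \text{h} \times 3600\ \text{s/h}}
  = \frac{1.2 \times 10^{6}}{115\,200\ \text{s}}
  \approx 10.4\ \text{executions per second}
\]
\[
10.4\ \text{s}^{-1} \times 120\ \text{s} \approx 1250\ \text{executions per two-minute recovery cycle}
\]
```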


--
Michael Musgrove

JBoss, by Red Hat


scott gamble

Nov 1, 2022, 11:20:10 AM
to WildFly
It probably is every 2 minutes, but with the number of connection pools they have, it probably comes close to happening constantly.

They have multiple app servers. Each app server has a connection pool per user, and we have a user per warehouse and 60 warehouses, so roughly 60 connection pools per app server.
That is a lot of connection pools being checked.

I was just talking with the application owner, who mentioned that prior to 10/16 they were on JBoss 4 and did not see this behavior. It appears to have started with the move to WildFly on 10/16.
Is there any option to make it run less often than every 2 minutes?

Neal Day

Nov 1, 2022, 12:02:50 PM
to WildFly
I see there is an option to set the recovery user and password on each XA data source. If we were to set that to the same user on all data sources, would that help at all? I guess I'm wondering whether, with the same user, the recovery process would run once for all the data sources or would still run once per data source.

Michael Musgrove

Nov 1, 2022, 12:58:09 PM
to WildFly
Yes, Neal is thinking along the right lines: if you have configured multiple equivalent datasources (in the sense that they refer to the same Oracle instance), then the recommendation is to enable recovery for only one of them. There is a Red Hat solution that discusses the "no-recovery" option when defining datasources [1].

[1] https://access.redhat.com/solutions/1577923

Neal Day

Nov 1, 2022, 1:34:39 PM
to WildFly
I don't have a Red Hat support subscription. Are you able to share a more detailed summary of what's in the link?

I do see the first line though. If I'm understanding that correctly, if we (for example) had 3 XA data sources defined with different database users, but all pointing to the same database (i.e. same host and port), we should have <recovery no-recovery="true"/> set on 2 out of the 3 of them?

And to take that a step further, we have multiple servers in the environment that would have 3 XA data sources defined the same way, so I assume we should follow the same 2 of 3 approach on each server?

Thanks!

Manuel Finelli

Nov 1, 2022, 1:59:37 PM
to Neal Day, WildFly
On Tue, 1 Nov 2022 at 17:34, 'Neal Day' via WildFly <wil...@googlegroups.com> wrote:
> I don't have a Red Hat support subscription. Are you able to share a more detailed summary of what's in the link?
>
> I do see the first line though. If I'm understanding that correctly, if we (for example) had 3 XA data sources defined with different database users, but all pointing to the same database (i.e. same host and port), we should have <recovery no-recovery="true"/> set on 2 out of the 3 of them?

That's exactly what you should do.

> And to take that a step further, we have multiple servers in the environment that would have 3 XA data sources defined the same way, so I assume we should follow the same 2 of 3 approach on each server?

If I understand your configuration correctly, on each server you should (again) set <recovery no-recovery="true"/> on 2 out of the 3 datasources.
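
As a rough illustration of the 2-out-of-3 arrangement, a `standalone.xml` fragment might look like the sketch below. The datasource names, JNDI names, users, passwords and JDBC URL are placeholders invented for the example, and it assumes an Oracle driver is already registered under the name `oracle`. It also assumes that the credentials on the one datasource left in recovery are able to see and resolve in-doubt branches created by the other two users, which is worth confirming with the DBA before relying on it.

```xml
<!-- Sketch only: every name, credential and URL below is a placeholder. -->
<datasources>
    <!-- Recovery stays enabled (the default) on exactly one datasource per database. -->
    <xa-datasource jndi-name="java:jboss/datasources/Wh01XADS" pool-name="Wh01XADS">
        <xa-datasource-property name="URL">jdbc:oracle:thin:@//dbhost.example.com:1521/EXAMPLE</xa-datasource-property>
        <driver>oracle</driver>
        <security>
            <user-name>wh01</user-name>
            <password>changeme</password>
        </security>
    </xa-datasource>

    <!-- Same database, different user: excluded from periodic recovery. -->
    <xa-datasource jndi-name="java:jboss/datasources/Wh02XADS" pool-name="Wh02XADS">
        <xa-datasource-property name="URL">jdbc:oracle:thin:@//dbhost.example.com:1521/EXAMPLE</xa-datasource-property>
        <driver>oracle</driver>
        <security>
            <user-name>wh02</user-name>
            <password>changeme</password>
        </security>
        <recovery no-recovery="true"/>
    </xa-datasource>

    <!-- Same database, different user: excluded from periodic recovery. -->
    <xa-datasource jndi-name="java:jboss/datasources/Wh03XADS" pool-name="Wh03XADS">
        <xa-datasource-property name="URL">jdbc:oracle:thin:@//dbhost.example.com:1521/EXAMPLE</xa-datasource-property>
        <driver>oracle</driver>
        <security>
            <user-name>wh03</user-name>
            <password>changeme</password>
        </security>
        <recovery no-recovery="true"/>
    </xa-datasource>
</datasources>
```

The same pattern would then be repeated on each server, so that each server scans the database through a single datasource per recovery cycle.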
 

scott gamble

Nov 1, 2022, 3:27:01 PM
to WildFly
> We do call xa_recover during each recovery cycle and the default for that is every two minutes [...]

Is there a way to change the default?

Manuel Finelli

Nov 2, 2022, 5:43:48 AM
to scott gamble, WildFly
There is a way :-) Narayana's `RecoveryEnvironmentBean.periodicRecoveryPeriod` should be modified. IIRC, WildFly doesn't offer a direct bridge to modify `periodicRecoveryPeriod` but you can modify this property (as well as other Narayana's properties) using either `jbossts-properties.xml` or system properties.
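
A minimal sketch of the system-property route, assuming the property is declared in `standalone.xml` and that Narayana picks it up at startup under the key `RecoveryEnvironmentBean.periodicRecoveryPeriod` (the bean and property named above). The value 300 is an arbitrary example; the period is understood to be in seconds, so the two-minute default corresponds to 120.

```xml
<!-- Sketch only: placed near the top of standalone.xml, after <extensions>. -->
<system-properties>
    <!-- Seconds between periodic recovery scans; 300 is illustrative, not a recommendation. -->
    <property name="RecoveryEnvironmentBean.periodicRecoveryPeriod" value="300"/>
</system-properties>
```

The same key could presumably be passed on the command line as `-DRecoveryEnvironmentBean.periodicRecoveryPeriod=300`. Either way, the effect should be checked against the Narayana version bundled with the WildFly release in use.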

Michael Musgrove

Nov 2, 2022, 6:19:05 AM
to Manuel Finelli, scott gamble, WildFly
I'd argue that you need to fix it at your end. Extending the periodic recovery period pushes the problem elsewhere (away from where it should be addressed), and that's why we chose not to expose the property. Two minutes should be a good compromise between hitting the RM too often and leaving transaction branches in doubt for longer than they need to be, with the consequence of leaving the data locked and unavailable, which is going to cause performance spikes.

Michael Musgrove

Nov 2, 2022, 6:37:23 AM
to Manuel Finelli, scott gamble, WildFly
The article I referenced just says to use the "no-recovery" option when multiple datasources are configured against the same database. The WildFly datasource model property that controls whether or not the datasource is used during recovery is [1].

[1] https://docs.wildfly.org/23/wildscribe/subsystem/datasources/xa-data-source/index.html#attr-no-recovery

scott gamble

Nov 2, 2022, 11:12:53 AM
to WildFly
I do not disagree with that, but even if they follow the other suggestions here I am still potentially doing 30 of these every 2 minutes.

It's nice to know I have another option if needed.
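
To put the two figures side by side: 30 calls per two-minute cycle works out as below, compared with the rate of about 10.4 per second implied by 1.2 million executions in 32 hours. This assumes each of the 30 calls costs about the same as one of the executions counted originally, which the thread does not establish.

```latex
\[
\frac{30\ \text{calls}}{120\ \text{s}} = 0.25\ \text{calls per second}
\qquad\text{versus}\qquad
\frac{1.2 \times 10^{6}}{32 \times 3600\ \text{s}} \approx 10.4\ \text{per second}
\]
\[
\frac{10.4}{0.25} \approx 42
\]
```

Under that assumption, limiting recovery to one datasource per database would cut the call rate by a factor of roughly 40.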
