Recover and clear transaction heuristic state by mbean management operation


Ondra Chaloupka

Jun 11, 2020, 7:38:52 AM
to narayana-users
Hi,

I have a question about the Narayana management operations exposed via MBeans. I wonder about 'clearHeuristic'[1] in particular. This is the operation that is invoked when the WildFly CLI executes ':recover'[2][3].

Let me explain the scenario that I consider wrong. I came across it while designing a test case for WFTC-85[4]. There are two WildFly servers, where the first one calls a remote EJB on the second server. The transaction context is propagated over the call.
An intermittent network failure occurs at the time commit is called. As the first WildFly server enlisted only the EJB remote call as an XAResource, 1PC is used. On the RMFAIL error (emitted because the connection crashed) the XAResourceRecord is assigned a heuristic error[5]. The transaction is marked as heuristic. Then the user comes to the WildFly console and wants to finish it. He invokes ':recover' but nothing happens. The transaction/participant is left uncommitted.
I found that the reason is that the management operation 'clearHeuristic' is invoked only on the "covering" transaction - BasicAction is modified[6] -> [7] -> [8]. But the XAResourceRecord heuristic state[9] is unchanged. Then, during the recovery commit retry, XAResource.commit is denied[10].
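
A minimal sketch of the gap described above, written as a toy model. The classes `ToyRecord`, `ToyAction` and `ClearHeuristicGapSketch` are hypothetical stand-ins invented for illustration; they are not the Narayana `XAResourceRecord` or `BasicAction`, and the real code paths are more involved. The sketch only shows why clearing the transaction-level state alone can leave the commit retry refused.

```java
// Toy model only: these classes are NOT the Narayana classes discussed in the thread.
enum Outcome { FINISH_OK, HEURISTIC_HAZARD }

class ToyRecord {
    // Stands in for the per-record heuristic state.
    Outcome heuristic = Outcome.FINISH_OK;
    boolean committed = false;

    // Recovery retries commit, but a record still flagged as heuristic refuses.
    boolean retryCommit() {
        if (heuristic != Outcome.FINISH_OK) {
            return false;
        }
        committed = true;
        return true;
    }
}

class ToyAction {
    // Transaction-level heuristic marker.
    boolean markedHeuristic = true;
    final ToyRecord record = new ToyRecord();

    // Models a clear operation that only touches the enclosing transaction.
    void clearHeuristicOnTransactionOnly() {
        markedHeuristic = false;
    }
}

class ClearHeuristicGapSketch {
    static boolean recoverAfterTransactionLevelClear() {
        ToyAction action = new ToyAction();
        // Set when the 1PC commit failed.
        action.record.heuristic = Outcome.HEURISTIC_HAZARD;
        action.clearHeuristicOnTransactionOnly();
        // The record flag was never reset, so the retry is refused.
        return action.record.retryCommit();
    }

    public static void main(String[] args) {
        System.out.println("commit retried: " + recoverAfterTransactionLevelClear());
    }
}
```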

I would like to understand whether this could be considered a flaw in the heuristics management processing, or whether it is by design for some reason.

After some experiments I arrived at a working PoC[11] whose main point is to permit clearing the heuristic decision state from the AbstractRecord, instantiated as the XAResourceRecord in this case. When the heuristic state of the record (XAResourceRecord._heuristic) is cleared and saved by the management operation, the recovery cycle is able to pick it up and replay the processing on it.
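
The idea of the PoC can be sketched roughly as follows. This is an illustration under stated assumptions, not the linked commit: `SketchRecord`, `SketchLog` and all of their methods are hypothetical names, and the `saves` counter merely stands in for writing the updated state back to the object store.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a participant record; not Narayana's AbstractRecord.
class SketchRecord {
    boolean heuristic;
    boolean committed;

    SketchRecord(boolean heuristic) {
        this.heuristic = heuristic;
    }

    // Clears the per-record heuristic decision without touching anything else.
    void clearHeuristicDecision() {
        heuristic = false;
    }

    boolean retryCommit() {
        if (heuristic) {
            return false;
        }
        committed = true;
        return true;
    }
}

// Hypothetical stand-in for the transaction log with its record lists.
class SketchLog {
    final List<SketchRecord> heuristicList = new ArrayList<>();
    final List<SketchRecord> preparedList = new ArrayList<>();
    int saves = 0;

    // Management operation: clear the flag on the record AND move it back to prepared.
    void clearHeuristic(SketchRecord record) {
        if (heuristicList.remove(record)) {
            record.clearHeuristicDecision();
            preparedList.add(record);
            saves++; // stands in for persisting the updated state
        }
    }

    // One recovery pass: retry commit on everything that is prepared.
    int recoveryPass() {
        int committedNow = 0;
        for (SketchRecord record : preparedList) {
            if (record.retryCommit()) {
                committedNow++;
            }
        }
        return committedNow;
    }

    public static void main(String[] args) {
        SketchLog log = new SketchLog();
        SketchRecord record = new SketchRecord(true);
        log.heuristicList.add(record);
        log.clearHeuristic(record);
        System.out.println("committed in recovery pass: " + log.recoveryPass());
    }
}
```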

Would you have some thoughts here?

Thanks a lot
Ondra


[2] /subsystem=transactions/log-store=log-store/transactions=.../participants=...:recover

Mark Little

Jun 17, 2020, 4:58:30 AM
to Ondra Chaloupka, narayana-users
When you say below that the transaction/participant is left uncommitted which do you mean? Transaction or participant? Heuristic decisions aren’t associated with uncommitted states, they are only associated with (potentially wrong) committed states.

Mark.


--
You received this message because you are subscribed to the Google Groups "narayana-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to narayana-user...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/narayana-users/9c2b154f-443e-49f1-a3ad-67aa0724e6ffo%40googlegroups.com.

---
Mark Little
mli...@redhat.com

JBoss, by Red Hat
Registered Address: Red Hat Ltd, 6700 Cork Airport Business Park, Kinsale Road, Co. Cork.
Registered in the Companies Registration Office, Parnell House, 14 Parnell Square, Dublin 1, Ireland, No.304873
Directors:Michael Cunningham (USA), Vicky Wiseman (USA), Michael O'Neill, Keith Phelan, Matt Parson (USA)




Ondra Chaloupka

Jun 17, 2020, 5:54:46 AM
to narayana-users
On Wednesday, June 17, 2020 at 10:58:30 AM UTC+2, Mark Little wrote:
> When you say below that the transaction/participant is left uncommitted which do you mean? Transaction or participant? Heuristic decisions aren’t associated with uncommitted states, they are only associated with (potentially wrong) committed states.

The transaction I refer to is the one that is started on the first WildFly server and propagated with its context to the second WildFly server. The transaction manager treats the EJB remoting call as a standard XA resource. Transactional work is then done on the second server, where two participants are enlisted; in the test case they are only mock XA resources. At commit time (1PC processing), which is driven from the first server, the mock XA resources are prepared on the second server, but then a failure happens at commit (it simulates a network error, such as can happen when e.g. a database JDBC driver tries to connect to a database to commit the prepared resource). At that point the whole transaction is marked as heuristic. It is not clear whether the commit call reached the database and the (possible) database has been committed, or whether it is still only in a prepared state. As said, with this failure the transaction is marked as heuristic, and the EJB remoting XA resource participant on the first WildFly server is marked with a heuristic state as well.
Does my explanation make sense?
 


Michael Musgrove

Jun 17, 2020, 6:32:50 AM
to Ondra Chaloupka, narayana-users
The tooling clears heuristics by putting them back onto the prepared list. The linked PoC is marking the resource as finished which hides the heuristic. As far as I can recall the tooling only allows deleting heuristics (with the usual warnings about doing this) or allows marking them as prepared for another attempt at finishing it.




--
Michael Musgrove

Mark Little

Jun 17, 2020, 7:03:47 AM
to Michael Musgrove, Ondra Chaloupka, narayana-users
Yes, heuristic resolution is a manual process because only an administrator, say, can tell the system which way the transaction was meant to complete and hence which participants went the wrong way. An admin could resolve the heuristic entirely "off line" and then use tooling to clear/delete the heuristic information from the participant or transaction log.

Mark.

Ondra Chaloupka

Jun 18, 2020, 3:46:34 AM
to narayana-users
Thanks Mike and Mark for the feedback.
I agree with everything that was said: the tooling is meant for an admin to resolve the heuristics manually. As said, the tooling currently provides a way to delete or recover (marking them as prepared).

Now, my question is about this particular scenario and the particular way the code handles it.

I tried to explain the code path of the scenario in my first post (the part where I talk about BasicAction and link to the source code). Without going that deep into the code path, the issue I observe is that the transaction (BasicAction) as a whole is marked as heuristic. At that time the XAResourceRecord is marked as heuristic as well (by changing the _heuristic flag). But the WildFly recover functionality (or the clearHeuristic action in the MBean) reverts only the state of the transaction back to prepared. The heuristic state of the resource is not reverted back to "ready to be handled".

This behaviour seems wrong to me and I would like to fix it. I wanted to check that I haven't missed some aspect. I would like to create a JBTM issue and propose a fix that clears the heuristic flag from the resource as well.

@Mike: please, can you elaborate a bit more on the statement "The linked PoC is marking the resource as finished which hides the heuristic"? My intent was the opposite. I'm changing only the '_heuristic' flag (https://github.com/ochaloup/narayana/commit/7ef2af5c469f634f82696bc0ad1120b9dc684097#diff-9613cb84dbca3e6a76d760b2b4626befR792) and I intentionally do not touch the participant/resource state. What have I missed?

Thank you
Ondra


Michael Musgrove

Jun 18, 2020, 5:32:56 AM
to Ondra Chaloupka, narayana-users
For JTA the tooling only supports resource records that are in-lined with the transaction log.
For JTS we support resource records that are located in a separate part of the object store (separate from the actual transaction log).

Does your use case match either of these two strategies?



Ondra Chaloupka

Jun 18, 2020, 6:24:21 AM
to narayana-users
@Mike: this is JTA. I'm not sure what is meant by the term "in-lined with the transaction log". The resource is the XAResourceRecord, which uses the save_state and restore_state[1] methods during recovery to load its state from the transaction log. From that I assume it is in-lined with the transaction log, right? If so, then the handling matches the first strategy.



Michael Musgrove

Jun 18, 2020, 8:12:01 AM
to Ondra Chaloupka, narayana-users
So it should work. Maybe it's just a bug rather than a feature enhancement.

NB in-line just means that the resource records are stored in the same log as the transaction record (whereas JTS stores resource records in a log which is separate from the transaction log).
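
To illustrate what "in-line" means here, a toy sketch of a single log entry that carries both the transaction header and the per-record state. The layout, the class `InlineLogSketch` and its method names are assumptions made up for this example; the real object store format is different.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

class InlineLogSketch {
    // Writes the transaction header and all record states into ONE buffer,
    // i.e. the records are stored in-line with the transaction record.
    static byte[] save(String txUid, int[] recordHeuristics) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(txUid);
            out.writeInt(recordHeuristics.length);
            for (int heuristic : recordHeuristics) {
                out.writeInt(heuristic);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the per-record states back from the same buffer.
    static int[] restoreRecordHeuristics(byte[] log) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(log))) {
            in.readUTF(); // skip the transaction uid
            int count = in.readInt();
            int[] result = new int[count];
            for (int i = 0; i < count; i++) {
                result[i] = in.readInt();
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] log = save("tx-1", new int[] {0, 1});
        System.out.println(Arrays.toString(restoreRecordHeuristics(log)));
    }
}
```

In a layout like this, any tool that resets a heuristic has to rewrite the per-record values inside the transaction's own log entry; with separate logs it would update a different entry instead.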




Michael Musgrove

Jun 18, 2020, 9:01:06 AM
to Michael Musgrove, Ondra Chaloupka, narayana-users
We have different tooling MBeans to handle the various Arjuna record types. If your resource is not one of the usual ones then maybe we need to create a dedicated MBean for it - it depends on your use case.

Mark Little

Jun 18, 2020, 9:05:15 AM
to Michael Musgrove, Michael Musgrove, Ondra Chaloupka, narayana-users
Maybe start by building an example which should work?

Mark.

Ondra Chaloupka

Jun 18, 2020, 11:11:15 AM
to narayana-users
Thanks Mike, I hadn't realized that the expected behaviour is to have MBeans handling the various record types.
But as I examine the code now, I can see that the participant log record is the one managed by LogRecordWrapper. I consider that logical, as the WildFly :recover call is run against the participant, not against the transaction. The 'clearHeuristic' operation is defined in LogRecordWrapper. It then just passes the call up to the AtomicAction. I mean that :recover is called on LogRecord (which, according to its javadoc, works with participants) but the call is then redirected to the AtomicAction, which works with the transaction.

From the discussion I would think it's a bug. I'm going to create a JBTM issue.

The scenario/example can be checked here



Mark Little

Jun 19, 2020, 4:51:57 AM
to Ondra Chaloupka, narayana-users
So you’ve created an example which should work and it doesn’t? I don’t see that answered below.

Mark.



Ondra Chaloupka

Jun 19, 2020, 5:43:01 AM
to narayana-users

Michael Musgrove

Jun 19, 2020, 9:38:19 AM
to narayana-users
Ondra, your first post says that you are propagating a transaction. Since Narayana only supports JTS for transaction propagation, you will be using something external to Narayana (WFTC). Since there is no tooling integration between Narayana and WFTC, I would expect there to be issues with manually resolving these transactions.

Also, the javadoc for the test you reference in your last post says that the client commit uses 1PC but the remote server uses 2PC, so are these two separate transactions? Furthermore, since there is no logging for 1PC transactions, how do you handle the absence of a log in failure scenarios? It sounds like you need some extra tooling support for your use case. That was what I was trying to get at in my first response on this thread, i.e. it sounds like a new use case/feature and not a bug.

Ondra Chaloupka

Jun 19, 2020, 1:54:53 PM
to narayana-users
Ok. If it's a new use case, then let's discuss how it should behave and whether there is a way to change the Narayana codebase or not.

This is about a JTA transaction, and the context is propagated with WFTC. When EJB remoting is used, it is always WFTC, as the middle layer, that processes the propagation. WFTC "simulates" the behaviour of a standard XA resource. In other scenarios (e.g. when 2PC is used) the tooling works fine. As WFTC works as a usual XAResource, :recover is capable of switching the transaction back to prepared so that it is committed later.

In this scenario 1PC is used. The 1PC WFTC XAResource returns RMFAIL on commit[1], which changes the _heuristic flag on the XAResourceRecord. This is probably where the difference lies: the standard handling does not consider that there could be a reason to :recover a 1PC.
But it's not accurate that there is no logging for 1PC. In case of failure the 1PC record is saved[2].

What happens here is that 1PC fails and the transaction outcome is logged in the object store with the heuristic failure. The :recover does not handle such a record, as the heuristic failure updates not only the BasicAction log record but the XAResourceRecord as well (with the mentioned _heuristic flag).

As I go through the code again and again, I think a possible, and maybe better, approach would be to integrate from the WFTC side in such a way that the 1PC error code does not cause the _heuristic flag to be set. If the 1PC record is saved but the _heuristic flag is not touched, then the :recover functionality could work. I think that if WFTC throws a specific XAException.errorCode then the handling goes under the default branch[3]. That would mean that WFTC informs the TM that a "commit retry" is possible, which is normally not expected for 1PC. But WFTC saves data at the client side, so it does have that ability.
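
The proposal can be sketched as a classification of the error code carried by the `XAException`. This is a simplification based only on the description above, not a copy of the Narayana source: the `Handling` enum and the `classify` method are hypothetical, and `XA_RETRY` is used purely as a placeholder, since the thread does not say which error code WFTC would actually use.

```java
import javax.transaction.xa.XAException;

class OnePhaseFailureSketch {
    // Hypothetical outcomes of a failed one-phase commit, for illustration only.
    enum Handling { MARK_HEURISTIC, LEAVE_FOR_RECOVERY_RETRY }

    static Handling classify(XAException e) {
        switch (e.errorCode) {
            case XAException.XAER_RMFAIL:
                // As described in the thread: RMFAIL on a 1PC commit sets the heuristic flag.
                return Handling.MARK_HEURISTIC;
            default:
                // Assumed "default branch": the record is saved without a heuristic flag,
                // so a later recovery pass could retry the commit.
                return Handling.LEAVE_FOR_RECOVERY_RETRY;
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(new XAException(XAException.XAER_RMFAIL)));
        // XA_RETRY is a placeholder choice, not a confirmed WFTC error code.
        System.out.println(classify(new XAException(XAException.XA_RETRY)));
    }
}
```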

What do you think about integrating the 1PC WFTC with a specific error code?


o.

Mark Little

Jun 22, 2020, 4:38:16 AM
to Michael Musgrove, narayana-users
Is this work driven by a customer need?

Mark.



Mark Little

Jun 22, 2020, 4:39:20 AM
to Ondra Chaloupka, narayana-users
Have you looked at how JTS interposition works when there’s a coordinator, a single remote interposed resource and then many resources registered locally to it?

Mark.



Ondra Chaloupka

Jun 22, 2020, 5:40:21 AM
to narayana-users
> Is this work driven by a customer need?

The work is based on fixing a customer case. This particular discussion came from the fact of creating a testcase, reproducing the customer problem.

> Have you looked at how JTS interposition works when there’s a coordinator, a single remote interposed resource and then many resources registered locally to it?

Yes, I've been looking at it. What I found is that the JTA and JTS approaches behave differently. JTA runs with EJB remoting, where the remote call simulates an XAResource and needs to meet the demands that Narayana places on XA processing. JTS works over IIOP calls, where the interaction between Narayana and the calls is interlinked: Narayana uses the interceptors directly on the calls. The participants log their activity to the transaction log immediately on the participant prepare call. There are a few differences in the design of those two approaches.

Mark Little

Jun 23, 2020, 4:44:13 AM
to Ondra Chaloupka, narayana-users

On 22 Jun 2020, at 10:40, Ondra Chaloupka <ocha...@redhat.com> wrote:

>> Is this work driven by a customer need?
>
> The work is based on fixing a customer case. This particular discussion came from the fact of creating a testcase, reproducing the customer problem.

Great. Which customer? Is there a relevant support ticket to link?

>> Have you looked at how JTS interposition works when there’s a coordinator, a single remote interposed resource and then many resources registered locally to it?
>
> Yes, I've been looking at it. What I was able to find there is a different behaviour of the JTA and JTS approach.

There is no interposition with local JTA, so do you mean JTA-over-JTS or did you just mean local JTA?

> The JTA runs with EJB remoting where the remote call simulates the XAResource and needs to meet the demands which the Narayana holds for the XA processing. While the JTS works over IIOP calls where the interaction of the Narayana and the calls is interlinked. Narayana uses the interceptors directly on calls. The participants logged its activity to the transaction log immediately on the participant prepare call. There are few differences in the design of those two approaches.

There are a number of other differences. However, what I specifically wanted you to look at is the behaviour of coordinator->interposed coordinator->locally registered participants. Clue: take a look at how they register and with what protocol, especially for resources/participants and commit.



Ondra Chaloupka

Jun 23, 2020, 6:31:44 AM
to narayana-users

On Tuesday, June 23, 2020 at 10:44:13 AM UTC+2, Mark Little wrote:

> On 22 Jun 2020, at 10:40, Ondra Chaloupka <ocha...@redhat.com> wrote:
>
>>> Is this work driven by a customer need?
>>
>> The work is based on fixing a customer case. This particular discussion came from the fact of creating a testcase, reproducing the customer problem.
>
> Great. Which customer? Is there a relevant support ticket to link?
>
>>> Have you looked at how JTS interposition works when there’s a coordinator, a single remote interposed resource and then many resources registered locally to it?
>>
>> Yes, I've been looking at it. What I was able to find there is a different behaviour of the JTA and JTS approach.
>
> There is no interposition with local JTA, so do you mean JTA-over-JTS or did you just mean local JTA?

I mean local JTA propagated over remote EJB calls. There is a transaction started on the first server and propagated over remote EJB to the second server, where it is imported as a subordinate JTA transaction.

>> The JTA runs with EJB remoting where the remote call simulates the XAResource and needs to meet the demands which the Narayana holds for the XA processing. While the JTS works over IIOP calls where the interaction of the Narayana and the calls is interlinked. Narayana uses the interceptors directly on calls. The participants logged its activity to the transaction log immediately on the participant prepare call. There are few differences in the design of those two approaches.
>
> There are a number of other differences. However, what I specifically wanted you to look at is the behaviour of coordinator->interposed coordinator->locally registered participants. Clue: take a look at how they register and with what protocol, especially for resources/participants and commit.

Sorry for my ignorance, but I'm not sure I got it right. My understanding is that for JTS the ExtendedResourceRecord is used (https://github.com/jbosstm/narayana/blob/5.10.5.Final/ArjunaJTS/jts/classes/com/arjuna/ats/internal/jts/resources/ExtendedResourceRecord.java#L599) and that in case of failure a HeuristicHazard exception is thrown.
If you are talking about registration, then to my knowledge it is IIOP, and the interceptors are bound for the sake of passing the transaction context.



Mark Little

Jun 23, 2020, 6:59:05 AM
to Ondra Chaloupka, narayana-users
Hi Ondra.

I suggest you take a more indepth look at how interposition works. Start at the interceptors and work your way up. You could look at explicit interposition or implicit interposition because the key points around what participants are created, with whom are the registered, how they are registered etc. is pretty much the same. There's a big gap between IIOP and any OTS ResourceRecord which you should understand before evaluating your proposal.

Mark.


On Tue, Jun 23, 2020 at 11:31 AM Ondra Chaloupka <ocha...@redhat.com> wrote:

On Tuesday, June 23, 2020 at 10:44:13 AM UTC+2, Mark Little wrote:


On 22 Jun 2020, at 10:40, Ondra Chaloupka <ocha...@redhat.com> wrote:

 
> Is this work driven by a customer need?

The work is based on fixing a customer case. This particular discussion came from the fact of creating a testcase, reproducing the customer problem.

Great. Which customer? Is there a relevant support ticket to link?


> Have you looked at how JTS interposition works when there’s a coordinator, a single remote interposed resource and then many resources registered locally to it?

Yes, I've been looking at it. What I was able to find there is a different behaviour of the JTA and JTS approach.

There is no interposition with local JTA, so do you mean JTA-over-JTS or did you just mean local JTA?

I mean local JTA propagated over remote EJB calls. There is a transaction started on the first server, propagated over remote EJB to the second server where it's imported as a subordinate JTA transaction.

 

The JTA runs with EJB remoting where the remote call simulates the XAResource and needs to meet the demands which the Narayana holds for the XA processing. While the JTS works over IIOP calls where the interaction of the Narayana and the calls is interlinked. Narayana uses the interceptors directly on calls. The participants logged its activity to the transaction log immediately on the participant prepare call. There are few differences in the design of those two approaches.

There are a number of other differences. However, what I specifically wanted you to look at is the behaviour of coordinator->interposed coordinator->locally registered participants. Clue: take a look at how they register and with what protocol, especially for resources/participants and commit.

Sorry for my ignorance, but I'm not sure I got it right. My understanding is that JTS uses the ExtendedResourceRecord (https://github.com/jbosstm/narayana/blob/5.10.5.Final/ArjunaJTS/jts/classes/com/arjuna/ats/internal/jts/resources/ExtendedResourceRecord.java#L599) and that a HeuristicHazard exception is thrown in case of a failure.
If you are talking about registration, then as far as I know it goes over IIOP and the interceptors are bound in order to pass the transaction context.
 

--
You received this message because you are subscribed to the Google Groups "narayana-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to narayana-user...@googlegroups.com.

---
Mark Little
mli...@redhat.com

JBoss, by Red Hat
Registered Address: Red Hat Ltd, 6700 Cork Airport Business Park, Kinsale Road, Co. Cork.
Registered in the Companies Registration Office, Parnell House, 14 Parnell Square, Dublin 1, Ireland, No.304873
Directors:Michael Cunningham (USA), Vicky Wiseman (USA), Michael O'Neill, Keith Phelan, Matt Parson (USA)




--
You received this message because you are subscribed to the Google Groups "narayana-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to narayana-user...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/narayana-users/a635a452-a6dc-48cf-8460-34cf4ff5c7f2o%40googlegroups.com.

Mark Little

Jun 23, 2020, 8:46:25 AM6/23/20
to Ondra Chaloupka, narayana-users
BTW I'm happy to answer any questions you may have on the architecture or the code as you go through this.

Mark.

Ondra Chaloupka

Jun 23, 2020, 12:10:28 PM6/23/20
to narayana-users
OK then. I understand the JTS processing in the following way. The scenario is still the same: server-to-server communication where the first server calls the second server. The first server has no transactional work of its own, while the second server works with two XA participants.
The first server starts the transaction, and on the remote call a ResourceRecord is registered. This record is then used during one-phase commit to access the ORB stub and communicate over the network.
The second server then works with the participants as ExtendedResourceRecords. When the remote call is received, a ServerTransaction object is created and the transaction is wrapped within it. In our scenario the failure occurs on the commit of one of the XA participants. This means the ServerTransaction was already marked as prepared and the outcome of the transaction as a whole is assumed to be commit. RMFAIL on the XAResource means that transaction recovery should finish it later. Transaction recovery on the second server is able to finish the ServerTransaction with commit because it was already prepared. Thus the first server is informed that the one-phase commit succeeded. The commit of the XA participant will be completed by the recovery manager on the second server when the XA participant is available again.

Could you elaborate on my description if it's not precise, please?
If it's more or less accurate, what would you suggest changing in the EJB remoting XA processing in this regard?

Thank you for your help
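
The distinction drawn in the description above, between a failure that recovery can retry and an outcome that needs an administrator, can be sketched with the standard `javax.transaction.xa.XAException` error codes. This is only an illustration under assumptions: the `CommitFailureSketch` class and its `Next` enum are hypothetical, and the grouping of codes is not Narayana's actual mapping. It models the already-prepared (2PC) case described here; as the first post in this thread notes, in the 1PC EJB remoting case an RMFAIL ended up recorded as a heuristic instead.

```java
import javax.transaction.xa.XAException;

// Hypothetical helper, not part of Narayana or WildFly.
class CommitFailureSketch {
    enum Next { RETRY_BY_RECOVERY, HEURISTIC_NEEDS_ADMIN, OTHER_ERROR }

    // Classifies an XAException thrown from XAResource.commit for a
    // participant that has already been prepared.
    static Next afterCommitFailure(XAException e) {
        switch (e.errorCode) {
            case XAException.XAER_RMFAIL: // resource manager unavailable
            case XAException.XA_RETRY:    // call had no effect and may be reissued
                return Next.RETRY_BY_RECOVERY;
            case XAException.XA_HEURCOM:
            case XAException.XA_HEURRB:
            case XAException.XA_HEURMIX:
            case XAException.XA_HEURHAZ:
                return Next.HEURISTIC_NEEDS_ADMIN;
            default:
                return Next.OTHER_ERROR;
        }
    }

    public static void main(String[] args) {
        System.out.println(afterCommitFailure(new XAException(XAException.XAER_RMFAIL)));
        System.out.println(afterCommitFailure(new XAException(XAException.XA_HEURHAZ)));
    }
}
```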



Mark Little

Jun 24, 2020, 4:54:07 AM6/24/20
to Ondra Chaloupka, narayana-users

On 23 Jun 2020, at 17:10, Ondra Chaloupka <ocha...@redhat.com> wrote:

OK then. I understand the JTS processing in the following way. The scenario is still the same: server-to-server communication where the first server calls the second server. The first server has no transactional work of its own, while the second server works with two XA participants.
The first server starts the transaction, and on the remote call a ResourceRecord is registered.

Which ResourceRecord? Who creates it? Who registers it? Is it on the client side or the server side?

This record is then used during one-phase commit to access the ORB stub and communicate over the network.
The second server then works with the participants as ExtendedResourceRecords. When the remote call is received, a ServerTransaction object is created and the transaction is wrapped within it. In our scenario the failure occurs on the commit of one of the XA participants. This means the ServerTransaction was already marked as prepared and the outcome of the transaction as a whole is assumed to be commit. RMFAIL on the XAResource means that transaction recovery should finish it later. Transaction recovery on the second server is able to finish the ServerTransaction with commit because it was already prepared. Thus the first server is informed that the one-phase commit succeeded. The commit of the XA participant will be completed by the recovery manager on the second server when the XA participant is available again.

Could you elaborate on my description if it's not precise, please?

There’s a lot missing from this which suggests you didn’t trace the code through the filters or interceptors. If you did, which filters/interceptors did you examine?


Ondra Chaloupka

Jun 24, 2020, 8:04:12 AM6/24/20
to narayana-users


On Wednesday, June 24, 2020 at 10:54:07 AM UTC+2, Mark Little wrote:


On 23 Jun 2020, at 17:10, Ondra Chaloupka <ocha...@redhat.com> wrote:

OK then. I understand the JTS processing in the following way. The scenario is still the same: server-to-server communication where the first server calls the second server. The first server has no transactional work of its own, while the second server works with two XA participants.
The first server starts the transaction, and on the remote call a ResourceRecord is registered.

Which ResourceRecord? Who creates it? Who registers it? Is it on the client side or the server side?

The ResourceRecord is the Narayana class (https://github.com/jbosstm/narayana/blob/5.10.5.Final/ArjunaJTS/jts/classes/com/arjuna/ats/internal/jts/resources/ResourceRecord.java). It's created by the interceptor when the IIOP call from the second server is processed. The registration is done on the client side[1][3] (i.e. on the first server) and is initiated by the second server[2].


This record is then used during one-phase commit to access the ORB stub and communicate over the network.
The second server then works with the participants as ExtendedResourceRecords. When the remote call is received, a ServerTransaction object is created and the transaction is wrapped within it. In our scenario the failure occurs on the commit of one of the XA participants. This means the ServerTransaction was already marked as prepared and the outcome of the transaction as a whole is assumed to be commit. RMFAIL on the XAResource means that transaction recovery should finish it later. Transaction recovery on the second server is able to finish the ServerTransaction with commit because it was already prepared. Thus the first server is informed that the one-phase commit succeeded. The commit of the XA participant will be completed by the recovery manager on the second server when the XA participant is available again.

Could you elaborate on my description if it's not precise, please?

There’s a lot missing from this which suggests you didn’t trace the code through the filters or interceptors. If you did, which filters/interceptors did you examine?


I traced the code as far as I was able, but I'm not strong in IIOP.
What I can say is that both sides use the InterpositionClientRequestInterceptorImpl and InterpositionServerRequestInterceptorImpl to work with the context. Parts of the processing look like [4][5].

To correct what I wrote last time: I wrongly understood that the resource registration is triggered by the client on its call to the server. As I understand it now, the registration of the server as a participant on the first server is caused by a call from the server to the client (i.e. the server registers itself with the client).

Still, I'm a bit lost as to what you want to hear from me. If you could explain in more detail how things work, I would be able to grasp the point better.
What is the connection with the initial question of how to proceed with the JTA XA processing for remote EJB calls? Is the concern that it should be done similarly to JTS, with a redesign of the whole processing?
 

If it's more or less accurate, what would you suggest changing in the EJB remoting XA processing in this regard?

Thank you for your help

[1]

 (p: default-threadpool; w: Idle) JavaIdlRCManager: Created reference for tran 0:ffffc0a80026:3b3233e7:5ef33afd:2e = IOR:0000000000000034494...
 (p: default-threadpool; w: Idle) InterpositionClientRequestInterceptorImpl::send_request ( _is_a ) nodeId=1 requestId=10
 (p: default-threadpool; w: Idle) InterpositionClientRequestInterceptorImpl::receive_reply ( _is_a ) nodeId=1 requestId=10
 (p: default-threadpool; w: Idle) ArjunaTransactionImple::register_resource for 0:ffffc0a80026:3b3233e7:5ef33afd:2e - subtransaction aware resource: NO
 (p: default-threadpool; w: Idle) ArjunaTransactionImple 0:ffffc0a80026:3b3233e7:5ef33afd:2e ::register_resource: Simple resource - org.omg.CORBA.BAD_PARAM:   vmcid: 0x0  minor code: 0  completed: No
 (p: default-threadpool; w: Idle) ArjunaTransactionImple::createOTSRecord for 0:ffffc0a80026:3b3233e7:5ef33afd:2e
 (p: default-threadpool; w: Idle) InterpositionClientRequestInterceptorImpl::send_request ( _is_a ) nodeId=1 requestId=11


[2]
 (p: default-threadpool; w: Idle) ServerTransaction::ServerTransaction ( 0:ffffc0a80026:3b3233e7:5ef33afd:2e, Control myParent, 0:0:0:0:0 )
 (p: default-threadpool; w: Idle) ControlImple::createTransactionHandle ()
 (p: default-threadpool; w: Idle) RootOA::objectIsReady (Servant)
 (p: default-threadpool; w: Idle) ServerResource::ServerResource ( 0:ffffc0a80026:3b3233e7:5ef33afd:2e )
 (p: default-threadpool; w: Idle) ServerTopLevelAction::ServerTopLevelAction ( 0:ffffc0a80026:3b3233e7:5ef33afd:2e )
 (p: default-threadpool; w: Idle) RootOA::objectIsReady (Servant)
 (p: default-threadpool; w: Idle) InterpositionClientRequestInterceptorImpl::send_request ( register_resource ) nodeId=2 requestId=5
 (p: default-threadpool; w: Idle) ContextManager::current ()
 (p: default-threadpool; w: Idle) InterpositionServerRequestInterceptorImpl::receive_request_service_contexts ( _is_a ) nodeId=2 requestId=84
 (p: default-threadpool; w: Idle) InterpositionServerRequestInterceptorImpl::receive_request ( _is_a ) nodeId=2 requestId=84
 (p: default-threadpool; w: Idle) InterpositionServerRequestInterceptorImpl::send_reply ( _is_a ) nodeId=2 requestId=84
 (p: default-threadpool; w: Idle) InterpositionServerRequestInterceptorImpl::suspendContext ( _is_a ) nodeId=2 requestId=84
 (p: default-threadpool; w: Idle) InterpositionServerRequestInterceptorImpl::receive_request_service_contexts ( _is_a ) nodeId=2 requestId=85
 (p: default-threadpool; w: Idle) InterpositionServerRequestInterceptorImpl::receive_request ( _is_a ) nodeId=2 requestId=85
 (p: default-threadpool; w: Idle) InterpositionServerRequestInterceptorImpl::send_reply ( _is_a ) nodeId=2 requestId=85
 (p: default-threadpool; w: Idle) InterpositionServerRequestInterceptorImpl::suspendContext ( _is_a ) nodeId=2 requestId=85

[3]
JTS client server - "enlisting" the ResourceRecord on JTS call to the second server
add:303, BasicAction (com.arjuna.ats.arjuna.coordinator)
register_resource:1026, ArjunaTransactionImple (com.arjuna.ats.internal.jts.orbspecific.coordinator)
register_resource:99, ArjunaTransactionPOATie (com.arjuna.ArjunaOTS)
_invoke:171, ArjunaTransactionPOA (com.arjuna.ArjunaOTS)
dispatchToServant:654, CorbaServerRequestDispatcherImpl (com.sun.corba.se.impl.protocol)
dispatch:205, CorbaServerRequestDispatcherImpl (com.sun.corba.se.impl.protocol)
handleRequestRequest:1700, CorbaMessageMediatorImpl (com.sun.corba.se.impl.protocol)
handleRequest:1558, CorbaMessageMediatorImpl (com.sun.corba.se.impl.protocol)
handleInput:940, CorbaMessageMediatorImpl (com.sun.corba.se.impl.protocol)
callback:198, RequestMessage_1_2 (com.sun.corba.se.impl.protocol.giopmsgheaders)
handleRequest:712, CorbaMessageMediatorImpl (com.sun.corba.se.impl.protocol)
dispatch:474, SocketOrChannelConnectionImpl (com.sun.corba.se.impl.transport)
doWork:1237, SocketOrChannelConnectionImpl (com.sun.corba.se.impl.transport)
performWork:490, ThreadPoolImpl$WorkerThread (com.sun.corba.se.impl.orbutil.threadpool)
run:519, ThreadPoolImpl$WorkerThread (com.sun.corba.se.impl.orbutil.threadpool)

[4]
 InterpositionServerRequestInterceptorImpl::receive_request_service_contexts ( commit_one_phase ) nodeId=2 requestId=146
 InterpositionServerRequestInterceptorImpl::receive_request ( commit_one_phase ) nodeId=2 requestId=146
 ServerTopLevelAction::commit_one_phase for 0:ffffc0a80026:-522ce7b5:5ef325fb:2e
 BasicAction::addChildThread () action 0:ffffc0a80026:-522ce7b5:5ef325fb:2e adding Thread[p: default-threadpool; w: Idle,5,ORB ThreadGroup 1]

[5]
 BasicAction::phase2Commit() for action-id 0:ffffc0a80026:-522ce7b5:5ef325fb:2e
 BasicAction::doCommit (com.arjuna.ats.internal.jts.resources.ExtendedResourceRecord@63530cf1)
 ExtendedResourceRecord::topLevelCommit() for 0:ffffc0a80026:7d5002e5:5ef325ff:33
 InterpositionClientRequestInterceptorImpl::send_request ( commit ) nodeId=2 requestId=178
 ContextManager::current ()
 InterpositionClientRequestInterceptorImpl::send_request ( get_txcontext ) nodeId=2 requestId=179
 InterpositionServerRequestInterceptorImpl::receive_request_service_contexts ( get_txcontext ) nodeId=2 requestId=186
 InterpositionServerRequestInterceptorImpl::receive_request ( get_txcontext ) nodeId=2 requestId=186
 ArjunaTransactionImple::get_txcontext - called for 0:ffffc0a80026:-522ce7b5:5ef325fb:2e
 ArjunaTransactionImple::propagationContext for 0:ffffc0a80026:-522ce7b5:5ef325fb:2e
 InterpositionClientRequestInterceptorImpl::send_request ( getParentControl ) nodeId=2 requestId=180
 InterpositionServerRequestInterceptorImpl::receive_request_service_contexts ( getParentControl ) nodeId=2 requestId=187
 InterpositionServerRequestInterceptorImpl::receive_request ( getParentControl ) nodeId=2 requestId=187
 InterpositionServerRequestInterceptorImpl::send_reply ( getParentControl ) nodeId=2 requestId=187
 InterpositionServerRequestInterceptorImpl::suspendContext ( getParentControl ) nodeId=2 requestId=187
 InterpositionClientRequestInterceptorImpl::receive_reply ( getParentControl ) nodeId=2 requestId=180
 TransactionReaper::getRemainingTimeout for IOR:000000000000002b49444c... returning 0
 InterpositionServerRequestInterceptorImpl::send_reply ( get_txcontext ) nodeId=2 requestId=186
 InterpositionServerRequestInterceptorImpl::suspendContext ( get_txcontext ) nodeId=2 requestId=186
 InterpositionClientRequestInterceptorImpl::receive_reply ( get_txcontext ) nodeId=2 requestId=179
 InterpositionClientRequestInterceptorImpl.packPropagationContext ( org.omg.CosTransactions.PropagationContext@30dcefff )
 InterpositionServerRequestInterceptorImpl::receive_request_service_contexts ( commit ) nodeId=2 requestId=188
 InterpositionServerRequestInterceptorImpl::receive_request ( commit ) nodeId=2 requestId=188
 XAResourceRecord.commit for < 131072, 29, 36, 0000000000-1-1-64-88038-83-45247594-1337-50004650, 0000000000-1-1-64-88038125802-2794-1337-10005000000000 >



 

Mark Little

Jun 25, 2020, 4:47:29 AM6/25/20
to Ondra Chaloupka, narayana-users
Hi Ondra.

You’re getting close :) Take a look at the Interposition classes, e.g., OSI, Arjuna, Strict etc. If you understand them then you should have a more complete grasp of how interposition can work in a strict OTS manner as well as a number of alternatives.

Mark.


On 24 Jun 2020, at 13:04, Ondra Chaloupka <ocha...@redhat.com> wrote:

Still, I'm a bit lost as to what you want to hear from me. If you could explain in more detail how things work, I would be able to grasp the point better.


Ondra Chaloupka

Jun 25, 2020, 3:21:06 PM6/25/20
to narayana-users
Hi Mark,

I'm trying, but I'm groping my way towards the destination. In summary, interposition makes the remote coordinator a resource of the originator. The remote coordinator registers itself with the "client". A hierarchy of transactions is created, and the interposition type determines the strategy used. The different strategies (i.e. OSI, Arjuna, Strict) define the top-level or nested transactions inside the interposition hierarchy.
Narayana provides a way to switch the interposition type, but I assume that's not what we want to talk about.

Could you be so kind as to elaborate in a little more detail on what I missed?

Thanks in advance
Ondra

Mark Little

Jun 29, 2020, 7:16:06 AM6/29/20
to Ondra Chaloupka, narayana-users
Hi Ondra.

I hand drew the following to try to help the conversation.

Let's take a look at the first:

IMG_3289.jpg

So in this there are two images. The first shows the general interposition flow:

- transaction originator (Tx), and the context containing a reference to Tx flows from Node 1 to Node 2;

- without interposition, resources R1 and R2 register back to Tx remotely for the transaction termination. Since there's two of them, it's clearly 2PC.

In the next diagram we have interposition:

- when the original context is imported, an interposed coordinator (INTx) is created, which registers itself as a resource back to Tx and becomes the transaction with which R1 and R2 register locally.

- now we have a potential problem because Tx only sees one resource this time and at commit time will do 1PC on INTx, which then runs through 2PC locally. Unfortunately the failure models and return codes of 2PC can be different to 1PC, leading to some interesting recovery issues at least.
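
The two diagrams described above can be approximated with a small toy model. This is a sketch under assumptions, not Narayana code: `Participant`, `LoggingResource`, `Coordinator` and `InterpositionSketch` are hypothetical names, and failure handling is left out. It only shows that Tx picks 2PC when R1 and R2 register with it directly, and 1PC when it sees the single interposed coordinator, which then runs 2PC over its local resources.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these types are hypothetical and are not Narayana classes.
interface Participant {
    boolean prepare();        // vote; true means "ready to commit"
    void commit();
    void rollback();
    void commitOnePhase();    // used when the caller has exactly one participant
}

// A leaf resource (R1, R2) that only records what it was asked to do.
class LoggingResource implements Participant {
    private final String name;
    private final List<String> log;
    LoggingResource(String name, List<String> log) { this.name = name; this.log = log; }
    public boolean prepare() { log.add(name + ".prepare"); return true; }
    public void commit() { log.add(name + ".commit"); }
    public void rollback() { log.add(name + ".rollback"); }
    public void commitOnePhase() { log.add(name + ".commitOnePhase"); }
}

// A coordinator that can also be registered as a participant (the interposed INTx).
class Coordinator implements Participant {
    private final String name;
    private final List<String> log;
    private final List<Participant> participants = new ArrayList<>();
    Coordinator(String name, List<String> log) { this.name = name; this.log = log; }

    void register(Participant p) { participants.add(p); }

    // Terminate with 1PC when there is a single participant, otherwise with 2PC.
    void end() {
        if (participants.size() == 1) {
            log.add(name + " uses 1PC");
            participants.get(0).commitOnePhase();
            return;
        }
        log.add(name + " uses 2PC");
        if (prepare()) commit(); else rollback();
    }

    public boolean prepare() {
        for (Participant p : participants) if (!p.prepare()) return false;
        return true;
    }
    public void commit() { for (Participant p : participants) p.commit(); }
    public void rollback() { for (Participant p : participants) p.rollback(); }
    // Asked for 1PC by its parent, the interposed coordinator terminates its own participants.
    public void commitOnePhase() { end(); }
}

class InterpositionSketch {
    // First diagram: R1 and R2 register directly with Tx.
    static List<String> withoutInterposition() {
        List<String> log = new ArrayList<>();
        Coordinator tx = new Coordinator("Tx", log);
        tx.register(new LoggingResource("R1", log));
        tx.register(new LoggingResource("R2", log));
        tx.end();
        return log;
    }

    // Second diagram: R1 and R2 register with INTx, which registers with Tx.
    static List<String> withInterposition() {
        List<String> log = new ArrayList<>();
        Coordinator intx = new Coordinator("INTx", log);
        intx.register(new LoggingResource("R1", log));
        intx.register(new LoggingResource("R2", log));
        Coordinator tx = new Coordinator("Tx", log);
        tx.register(intx);
        tx.end();
        return log;
    }

    public static void main(String[] args) {
        System.out.println(withoutInterposition());
        System.out.println(withInterposition());
    }
}
```

In the interposed run Tx only ever receives the single result of `commitOnePhase`, so whatever happens inside the local 2PC has to be folded into a 1PC reply, which is where the differing failure models come from.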

Now let's look at the next diagram:

IMG_3290.jpg

Here we have nested transactions in Node 1 when we make a call to Node 2, which means the context that flows can (doesn't have to) contain the entire hierarchy. At Node 2 when we do interposition we can then create an entire interposed hierarchy. This has various advantages, especially in the case of concurrent transactions/threads which may be running in Node 1 and Node 2, e.g., if a thread in Node 1 decides to terminate T2 before T3 and T4 are complete (yes, that is allowed) then the fact T2 is aborted can be told to Node 2 at the same time, ending any work which may be happening there quickly rather than have it continue to run to completion only to be told afterwards that the transaction terminated.

Now what does this have to do with heuristics? Well the types of heuristics which can be generated and are expected by the coordinator can be influenced by whether interposition is there or not, as we can see in the first diagram, where interposition can change the behaviour completely. I'm not sure from some of your previous responses if you were considering adding in new interposition protocols for WF-specific communication or modifying those already there but I'd be wary of doing so without fully understanding what's already there.

Back to your original heuristic question though: it's also important to know that the information stored about heuristics is different depending upon whether you are a coordinator or a participant and this affects what you can do for recovery:

- if you are a coordinator then you retain the overall heuristic decision for the transaction (not the individual resource decisions) and perhaps the list of participants which contributed (this could be IORs or it could be some other identifier).

- if you are a participant then you retain the decision (commit or abort) and nothing else.

Within the transaction engine and protocols it implements there is no requirement for participants to retain state such that they can undo their decision. We're talking about ACID transactions here and the worry about cascading rollbacks etc. What this means is that recovering from a heuristic decision is pushed to a human to resolve. We try to give them as much information as possible, e.g., the transaction id, maybe the database URL if it's retained. But ultimately the administrator is expected to go around and manually resolve any inconsistent state. Once done, they can use a tool, e.g., the one we have, to recreate the transaction and tell it that the heuristic decision has been resolved, at which point it would simply call forget on any participants it has remembered, since only those remembered made a heuristic choice. But there is no new transaction work here. No new updates to database tables, unless of course we were using a database to store the coordinator or participant states.

Only when all participants have indicated they have forgotten the heuristic (basically they just delete their state), can the transaction state be removed. Of course given failures it's possible that some participants never recover, or their response to the forget call gets lost, and we have to keep retrying. Eventually if we can't remove the state we'll move it and expect the admin to delete it once they have confirmed all of the participants have been informed.
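
The forget handling described above can be sketched as a retry loop. This is a minimal illustration with hypothetical names (`ForgetReplaySketch`, `LogState`, and a `BooleanSupplier` standing in for a remote forget call that returns true once acknowledged); it is not how Narayana stores or replays its records, and the fixed number of passes is an assumption made to keep the example finite.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BooleanSupplier;

// Hypothetical sketch, not Narayana code.
class ForgetReplaySketch {
    enum LogState { HEURISTIC, REMOVED, MOVED_FOR_ADMIN }

    // Only participants that made a heuristic choice are remembered.
    private final Map<String, BooleanSupplier> remembered = new LinkedHashMap<>();
    private LogState state = LogState.HEURISTIC;

    void remember(String participantId, BooleanSupplier forgetCall) {
        remembered.put(participantId, forgetCall);
    }

    // Calls forget on every remembered participant, retrying the ones that
    // did not acknowledge. The transaction state is removed only when all
    // have acknowledged; otherwise it is moved aside for the administrator.
    LogState replay(int maxPasses) {
        for (int pass = 0; pass < maxPasses && !remembered.isEmpty(); pass++) {
            Iterator<Map.Entry<String, BooleanSupplier>> it = remembered.entrySet().iterator();
            while (it.hasNext()) {
                if (it.next().getValue().getAsBoolean()) {
                    it.remove(); // participant has deleted its heuristic state
                }
            }
        }
        state = remembered.isEmpty() ? LogState.REMOVED : LogState.MOVED_FOR_ADMIN;
        return state;
    }

    int pending() { return remembered.size(); }

    public static void main(String[] args) {
        ForgetReplaySketch log = new ForgetReplaySketch();
        int[] calls = {0};
        log.remember("A", () -> true);             // acknowledges at once
        log.remember("B", () -> ++calls[0] >= 2);  // first reply is lost
        System.out.println(log.replay(3));
    }
}
```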

Hope this helps.

Mark.


Ondra Chaloupka

Jun 29, 2020, 10:39:51 AM6/29/20
to narayana-users
Hi Mark,

Thank you for the deep-dive explanation. Let's assume that I understand the processing now.

I will try to add my context as best I can.

As far as I remember, let's say since the AS 7.1 release, there have been two ways to run a remote EJB call. It's either an IIOP remote call, which works with JTS and all the Narayana interceptors, or EJB remoting, which uses WildFly's own protocol. The latter works on top of the Narayana JTA classes. It does not support nested transactions and uses the XATerminator interface to import the transaction context as a transaction into the remote JVM. I'm not considering adding a new interposition protocol; I'm just trying to fix an issue in EJB remoting communication with transaction context propagation.

I don't mean that the participant retains the heuristic decision. I mean that the Narayana object store may retain the heuristic failure of the participant.

The administration operation I refer to is invoked manually by an administrator who is expected to understand the transaction state. The ":recover" operation should clear the heuristic state and move the transaction to prepared.
The issue I started with is that the participant which saved the heuristic state is not moved to the prepared state when the administrator asks for it.

My recent idea is to change the return code from the remote server (the issue I'm working on concerns remote EJB communication) so that Narayana does not retain the participant's heuristic failure in the object store.
In addition, from our discussion and the investigation I did in the meantime, I think it could be desirable to allow the recovery module to finish imported transactions that have been prepared.

Does this make sense? Would you (or anybody) have feedback on these ideas for fixing the issue?

Thanks again
Ondra

Tom Jenkinson

Jun 29, 2020, 11:41:35 AM6/29/20
to Ondra Chaloupka, narayana-users
> As the first WildFly server enlisted only the EJB remote call as the XAResource there is used the 1PC

Isn't this a problem? We disabled subordinate transactions doing 1PC in the same server: https://issues.redhat.com/browse/JBTM-2916 but I think this might not be in WFTC and needs to be, if so it would relate to https://groups.google.com/d/msg/wildfly/VivRwQNbVGE/5XJtqkjKAAAJ


Ondra Chaloupka

Jun 29, 2020, 12:33:55 PM6/29/20
to narayana-users
Hi Tom,

JBTM-2916 refers to the dynamic 1PC optimization ("there is an optimization that will cause the second resource to commit during prepare if the first resource returns XA_RDONLY"), while this is a case of standard 1PC processing where only one resource is enlisted and the onePhaseCommit method is called directly (the call chain does not go through "prepare" first as in the JBTM-2916 case). From what I could find (the issue does not explain the statement "This can cause data inconsistency" in much detail), the defect was found in WFLY 10 and it meant that a clean commit was not processed. As WFLY 11 introduced the WFTC component, which changed how context propagation over EJB remoting is done, it could be that the 1PC optimization works without a problem now. Could that be?
The fix for JBTM-2916 is part of Narayana, so it should currently apply to WFTC processing as well. As I understand the fix, it changes the prepare handling and adds the possibility of a heuristic outcome in case of a failure. Is that right?

The forum question https://groups.google.com/d/msg/wildfly/VivRwQNbVGE/5XJtqkjKAAAJ was about WFLY 18 and a different issue. As far as I remember, a WFTC registry file descriptor was not deleted. That problem was fixed in the next WFLY release (WFLY 19). In fact WFTC-85 aims to fix another WFTC registry issue.

Ondra

Mark Little

Jun 30, 2020, 4:36:54 AM
to Ondra Chaloupka, narayana-users
Why “move the transaction to prepared”? It has committed or rolled back (after phase 2) and hence is completed, though there is a heuristic outcome. There is no going back to prepared.


On 29 Jun 2020, at 15:39, Ondra Chaloupka <ocha...@redhat.com> wrote:

The administration operation I refer to is invoked manually by an administrator who is expected to understand the transaction state. The ":recover" operation should clear the heuristic state and move the transaction to prepared.
The issue I started with is that the participant which saved the heuristic state is not moved to the prepared state when the administrator asks for it.


Ondra Chaloupka

Jun 30, 2020, 5:22:33 AM
to narayana-users
To "move the transaction to prepared" is the job of ActionBean.clearHeuristicDecision, which is invoked by the ":recover" CLI call. The processing fails during commit, so it is not completed with a commit or a rollback. It is completed with a heuristic: the 1PC failure marks the BasicAction with a heuristic outcome and saves that to the object store. The :recover operation then shifts the record to prepared (prepared_ok, https://github.com/jbosstm/narayana/blob/5.10.5.Final/ArjunaCore/arjuna/classes/com/arjuna/ats/arjuna/tools/osb/mbean/ActionBean.java#L295) so that periodic recovery is able to replay the commit later.

Mark Little

Jun 30, 2020, 5:26:44 AM
to Ondra Chaloupka, narayana-users
OK, now I understand. It’s just for recovery replaying to send the forget messages to the heuristic participants.

Thanks,

Mark.



Ondra Chaloupka

Jul 22, 2020, 11:16:28 AM
to narayana-users
I would like to finish with this topic and the WFTC remoting issue (https://issues.redhat.com/browse/WFTC-85) for now. I hope all technical details were summarized in the prior discussion.

I would add that I talked with @Tom two weeks ago and we agreed that this is a different issue from JBTM-2916.

As a final point, I would like to know whether it is fine to fix the issue I raised in the way I proposed in the initial message (https://github.com/ochaloup/narayana/commit/7ef2af5c469f634f82696bc0ad1120b9dc684097 - the _heuristic flag is cleared when :recover is explicitly called by a human administrator).

Thanks for your feedback
Ondra

PS. I also experimented with another approach that would make WFTC return a non-standard XAException for one-phase commit (https://github.com/ochaloup/wildfly-transaction-client/commit/8a4037a7ff398c7e94bfb2fa53824c9f6152ab09), but I no longer consider it clean or valid.