Cannot join LRA with Apache Camel


Alex Wood

Jun 16, 2025, 3:00:18 PM
to narayana-users
Hello,

I'm attempting to modify the Saga example provided by Camel Quarkus [1].  Instead of using JMS (which works in the example project), I wanted to communicate using Apache Kafka since that's what the project I work on uses.

I made some very basic modifications, primarily setting the Kafka bootstrap server and changing the Camel `.to()` argument to "kafka:" [2].  After making these changes, however, I'm seeing an error in the LRA Coordinator.  When either of the downstream microservices attempts to join, the following error occurs:

java.util.concurrent.CompletionException: org.apache.camel.RuntimeCamelException: Cannot join LRA
        at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:315)
        at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:320)
        at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:649)
        at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
        at java.base/java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:614)
        at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:844)
        at java.base/java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:483)
        at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373)
        at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182)
        at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655)
        at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622)
        at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165)
Caused by: org.apache.camel.RuntimeCamelException: Cannot join LRA
        at org.apache.camel.service.lra.LRAClient.lambda$join$2(LRAClient.java:142)
        at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:646)
        ... 9 more     

I walked through things with the debugger and it seems like LRA can't acquire a lock in `LRAService.tryLockTransaction()`.  The message "LRA025040: Lock not acquired, enlistment failed: cannot enlist participant, cannot lock transaction" gets printed to the console.  I can't really see why switching from JMS to Kafka would cause this issue though.

Any thoughts or help y'all could offer would be greatly appreciated.
----------
Regards,
Alex

[2] https://github.com/awood/camel-quarkus-examples/commit/0d5716db08ceb6eedad12a1bd9259d243d9cf689 (e.g. the argument becomes "kafka:{{example.services.train}}")
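
For context, the route change is roughly of this shape (the class, route, and property names here are illustrative; the commit in [2] has the real code):

import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.model.SagaPropagation;

public class SagaRoute extends RouteBuilder {
    @Override
    public void configure() {
        // Illustrative only; see the commit in [2] for the actual change.
        from("direct:saga")
            .saga()
                .propagation(SagaPropagation.REQUIRES_NEW)
                .compensation("direct:cancelOrder")
            // previously a jms: endpoint in the upstream example, now the Kafka component
            .to("kafka:{{example.services.train}}")
            .to("kafka:{{example.services.flight}}");
    }
}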

Michael Musgrove

Jun 17, 2025, 6:53:38 AM
to Alex Wood, narayana-users
Failure to obtain the lock is a valid temporary situation, so what happens if you retry the enlistment?

There was a recent change in this area (https://issues.redhat.com/browse/JBTM-3989, "LRA coordinator can't handle parallel participant enlistments") which may be related, but we haven't released the fix yet, so it would be interesting to see if it's possible for you to test against the current repo (https://github.com/jbosstm/lra)?


Alex Wood

Jun 17, 2025, 11:10:59 AM
to narayana-users
Failure to obtain the lock is a valid temporary situation, so what happens if you retry the enlistment?

It fails every time I make the POST request that kicks off the Camel flow when using Kafka, and succeeds consistently when I use JMS.  Given that I'm using the LRA coordinator a bit indirectly via Camel, I'm not sure if there is another way to retry the enlistment.
 
There was a recent change in this area (https://issues.redhat.com/browse/JBTM-3989, "LRA coordinator can't handle parallel participant enlistments") which may be related, but we haven't released the fix yet, so it would be interesting to see if it's possible for you to test against the current repo (https://github.com/jbosstm/lra)?

I tried with the latest code from the main branch, but with the same results.  I also noticed a lot of INFO messages reading "ARJUNA012332: Failed to establish connection to server".  I'm not sure if that's related or not.  There are also a lot of periodic recovery messages.  I'm guessing that previous test runs have persisted data somewhere.  Just so I'm working with the same state every time I run a test, how do I clear that state?  I'm running via the lra-coordinator-quarkus project.
-------------
Regards,
Alex

Tom Jenkinson

Jun 17, 2025, 11:31:01 AM
to Alex Wood, narayana-users
I could be way off the mark, but could it be possible that org.apache.camel.service.lra.LRAClient is somehow incompatible with some update(s) to the Narayana LRA coordinator?


Alex Wood

Jun 17, 2025, 2:30:18 PM
to narayana-users
On Tuesday, June 17, 2025 at 11:31:01 AM UTC-4 Tom Jenkinson wrote:
I could be way off the mark, but could it be possible that org.apache.camel.service.lra.LRAClient is somehow incompatible with some update(s) to the Narayana LRA coordinator?

The last update to that class was in March 2024, so I updated the lra-coordinator-quarkus project to use the BOM from Narayana 6.0.2.Final (released in January 2024, according to mvnrepository.com) and I had some limited success.
At this point, I realized that in my calls to the example I was using the same id over and over, which is used as the LRA ID.  If I use a unique ID each time, Camel will now sometimes work correctly.  Other times (roughly 75% of the time) I'm still seeing the same issue, with Camel saying "Cannot join LRA" and the LRA Coordinator saying it couldn't acquire a lock and that enlistment failed.  So I've made some progress, but I'm still wondering why these lock acquisition failures happen (and what to do about them).  Are these failures just something Camel should be able to handle better, since it's a valid temporary situation?
------------
Regards,
Alex

Tom Jenkinson

Jun 18, 2025, 4:50:13 AM
to Alex Wood, narayana-users
Hi Alex, it's good to know that an update seems to have alleviated some of the difficulties. I am wondering if you might need a version with https://issues.redhat.com/browse/JBTM-3989. I will say that Narayana LRA moved to its own repo a while ago and the versioning has changed a bit: https://github.com/jbosstm/lra/blob/main/pom.xml#L15-L17 (please note the groupId difference and the reset to a lower version number).


Marco Sappe Griot

Jun 30, 2025, 4:04:22 AM
to Alex Wood, Zheng Feng, narayana-users
Hi Alex, do you have any updates on this? 
I would like to add @Zheng Feng here, who might know the Camel/Kafka integration better.

Kind regards,
Marco

Alex Wood

Jul 1, 2025, 1:17:21 PM
to narayana-users
Hi Alex, do you have any updates on this? 
I would like to add @Zheng Feng here, who might know the Camel/Kafka integration better.

Kind regards,
Marco

Sorry, I was out for a bit with a family emergency.

I updated lra-coordinator-quarkus to use the lra-coordinator-jar 1.0.2.Final-SNAPSHOT that I compiled from the lra repo.  Unfortunately, the results are the same.  I still see the warning about "lock not acquired".  One curious thing I've noted, though, is that when I start up the LRA Quarkus app, I get a lot of messages from Periodic Recovery that look like:

2025-07-01 13:13:07,191 TRACE [io.nar.lra] (Periodic Recovery) 2025-07-01T17:13:07.191166724: LRA id: http://localhost:8080/lra-coordinator/0_ffff0ac014b1_a3e5_6864148a_1b, Participant id: <http://localhost:8081/api/lra-participant/compensate?id=1751389387&Camel-Saga-Compensate=direct://cancelPayment>; rel=compensate,<http://localhost:8081/api/lra-participant/complete?id=1751389387&Camel-Saga-Compensate=direct://cancelPayment>; rel=complete, reason: restored, state: Active, accepted: false

which makes me wonder if previous actions are persisted somewhere and are, perhaps, poisoning subsequent requests.  I am too ignorant of the specifics to know if that idea is way off-base or not, but it has happened to me in the past with Kafka messages at least.
I didn't really see anything that screamed "this is a datastore you can clear".  Where is the LRA state data normally stored?
-----------
Regards,
Alex

Michael Musgrove

Jul 7, 2025, 3:55:42 AM
to narayana-users
The TRACE message from Periodic Recovery indicates that the recovery system has reloaded an Active LRA from disk (after a restart of the coordinator) and is in the process of trying to complete it. Recovery will either periodically retry the end phase, which means the participants must be available on the same endpoints as before, or pass it over to the coordinator so that it can handle pre-end-phase actions, such as requests to join the LRA.

Michael Musgrove

Jul 7, 2025, 10:09:50 AM
to narayana-users
I notice that I didn't answer all of your questions.

The object store is where the state of an LRA is stored. The state needs to be persisted reliably in order to provide transactional guarantees.
When an LRA coordinator starts it reloads any pending LRAs. If the LRA is Active then some participant somewhere is responsible for initiating the end phase of the LRA by asking the coordinator to cancel or close the LRA (https://download.eclipse.org/microprofile/microprofile-lra-2.0/microprofile-lra-spec-2.0.html#the-model).

The object store location and type are configured using properties. The underlying config is provided by ArjunaCore (https://www.narayana.io/docs/api/com/arjuna/ats/arjuna/common/ObjectStoreEnvironmentBean.html), but each integration (Quarkus, Camel, WildFly, etc.) will expose a simplified config, so you would need to check the documentation of whatever integration you are using.
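
If it helps to see where that ends up at runtime, here is a minimal sketch (assuming the default file-based store; the exact store type and location depend on how your integration configures ArjunaCore) that just prints the directory the ObjectStoreEnvironmentBean resolves in the current JVM:

import com.arjuna.ats.arjuna.common.arjPropertyManager;

public class ObjectStoreLocation {
    public static void main(String[] args) {
        // Sketch only: reads the objectStoreDir that ArjunaCore resolves in this JVM
        // (from system properties, a Narayana properties file, or its defaults).
        // Assumes the default file-based object store; integrations may override
        // both the store type and its location.
        String dir = arjPropertyManager.getObjectStoreEnvironmentBean().getObjectStoreDir();
        System.out.println("Object store directory: " + dir);
        // Deleting that directory wipes any recorded LRA state, so only do it
        // against throwaway test data.
    }
}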

Tom Jenkinson

Jul 18, 2025, 7:33:43 AM
to narayana-users
Hi Alex,

Please can I ask if the answers provided have helped to understand the cause and address it or whether there is any more help that you might like us to try to provide?

Thank you,
Tom

Alex Wood

Jul 22, 2025, 2:24:46 PM
to narayana-users
On Friday, July 18, 2025 at 7:33:43 AM UTC-4 Tom Jenkinson wrote:
Hi Alex,

Please can I ask if the answers provided have helped to understand the cause and address it or whether there is any more help that you might like us to try to provide?

Thank you,
Tom

On Monday, July 7, 2025 at 3:09:50 PM UTC+1 Michael Musgrove wrote:
I notice that I didn't answer all of your questions.

The object store is where the state of an LRA is stored. The state needs to be persisted reliably in order to provide transactional guarantees.
When an LRA coordinator starts it reloads any pending LRAs. If the LRA is Active then some participant somewhere is responsible for initiating the end phase of the LRA by asking the coordinator to cancel or close the LRA (https://download.eclipse.org/microprofile/microprofile-lra-2.0/microprofile-lra-spec-2.0.html#the-model).

The object store location and type are configured using properties. The underlying config is provided by ArjunaCore (https://www.narayana.io/docs/api/com/arjuna/ats/arjuna/common/ObjectStoreEnvironmentBean.html), but each integration (Quarkus, Camel, WildFly, etc.) will expose a simplified config, so you would need to check the documentation of whatever integration you are using.

Hi Tom,

I did get a chance to take Michael's advice and delete the ObjectStore.  However, I'm still running into the same issues.  I've put my changes to the example projects up at


I run
 QUARKUS_HTTP_PORT=8080 ./mvnw quarkus:dev
to start the lra-coordinator-quarkus

Then in four different terminals in the quarkus-camel-examples/saga directory:
* mvn -f saga-train-service/pom.xml clean quarkus:dev -Dquarkus.http.port=8083
* mvn -f saga-flight-service/pom.xml clean quarkus:dev -Dquarkus.http.port=8082
* mvn -f saga-payment-service/pom.xml clean quarkus:dev -Dquarkus.http.port=8081
* mvn -f saga-app/pom.xml clean quarkus:dev -Dquarkus.http.port=8084

Then finally start the process with

I'm still seeing the "cannot join LRA" messages in the train and flight service logs but nothing that seems relevant in the LRA coordinator logs.  The LRA coordinator logs are all just trace and debug level messages when I run the POST.

Thanks for your help on this, y'all.  I think I'm going to have to set it aside for a while, though, as this proof-of-concept has kind of gotten pushed down in priority.  I'm happy to test out other ideas if people can't replicate my results, but it might just take me a little while.
----------
Regards,
Alex



Tom Jenkinson

Jul 23, 2025, 4:49:23 AM
to Alex Wood, narayana-users
Thank you for the update, Alex, and thank you for sharing the reproducer; that should definitely help someone take a look into this.


Martin Stefanko

Jul 31, 2025, 4:36:07 AM
to Tom Jenkinson, Alex Wood, narayana-users
Hi,

Your LRA has ended before all the participants join. I was able to get better results with `.timeout("1h")`, but not 100%. Can you add to the readme how you run Kafka? It might be that my setup is not aligned with what your app expects.

Cheers,
Martin


Martin Stefanko

Jul 31, 2025, 8:09:55 AM
to Tom Jenkinson, Alex Wood, narayana-users
Hi Alex,


I'm not very familiar with Camel, so you will need to finish it yourself, but your issue was that the main route in the SagaRoute class finished the LRA before Flight and Train were able to enlist in it. It can be fixed in multiple ways. In my commit, you will find either keeping the AUTO completionMode and using a delay, or moving to MANUAL completionMode and, for instance, listening for another event from Kafka in order to know when to cancel. In my commit, you can do it manually like this: `curl -X POST -H "Long-Running-Action: http://localhost:8080/lra-coordinator/0_ffff0a013253_a7e7_688b5a6c_7f" "http://localhost:8084/api/end"`.
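
As a rough sketch of the first option (the route, endpoint, and property names below are illustrative rather than taken from the actual commit), the idea is to keep AUTO completion but hold the exchange open long enough for the participants to enlist:

import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.model.SagaPropagation;

public class SagaRoute extends RouteBuilder {
    @Override
    public void configure() {
        // Illustrative sketch only: keep the AUTO completion mode, give the saga a
        // generous timeout (the .timeout("1h") mentioned earlier), and delay the end
        // of the route so the train and flight services can enlist before the
        // exchange completes and the LRA is closed.
        from("direct:saga")
            .saga()
                .propagation(SagaPropagation.REQUIRES_NEW)
                .timeout("1h")
            .to("kafka:{{example.services.train}}")
            .to("kafka:{{example.services.flight}}")
            .delay(5000); // crude; MANUAL completion with an explicit signal is the cleaner fix
    }
}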

Hope this helps,
Martin

Tom Jenkinson

Sep 1, 2025, 4:31:43 AM
to narayana-users
Hi Alex,

Hopefully Martin's response has helped you here (thank you, Martin!). If you need any more help, please do let us know.

Thanks,
Tom

Alex Wood

Sep 2, 2025, 3:11:48 PM
to narayana-users
Yes it did!  Thank you so much Martin, Tom, Michael, and Marco for all your help on this.  I really appreciate it.

Tom Jenkinson

Sep 5, 2025, 4:35:19 AM
to narayana-users
Great - and thank you, Alex, for your interest in Narayana LRA!