stateMachine.set(...) is hanging


Martin Tauber

May 6, 2025, 12:32:39 PM
to jgroups-raft
Hi Guys,

I would need your help once again ....

I am using the state machine to store my state. I wrote my own class implementing StateMachine and it seems to work fine. But I am running into a situation where

log.info("DO");
byte[] result = raftHandle.set(bytes, 0, bytes.length);
log.info("DONE");

will hang forever :( On the other nodes I see that the apply function was executed correctly, but on the node running the command it just does not return.

I was wondering if there are any threads that are getting blocked ...


Any hints are very welcome!
Thanks

José Bolina

May 6, 2025, 1:43:50 PM
to Martin Tauber, jgroups-raft
Hey, Martin, thanks!

A few questions to help:

* Can you share which version is running and the configuration for the RAFT protocol?
* Is the sender node hanging the leader? That is, is a redirect happening (raft.REDIRECT in the stack)?
* Can you try configuring the RAFT protocol and set `send_commits_immediately=true` just for testing purposes and see if it works?



Cheers,

--
You received this message because you are subscribed to the Google Groups "jgroups-raft" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jgroups-raft...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/jgroups-raft/1bb9e3c9-6856-41f0-be8b-34895b87604an%40googlegroups.com.


--
José Bolina

Bela Ban

May 7, 2025, 2:59:17 AM
to jgroup...@googlegroups.com
`RaftHandle` doesn't have a method `set()`, only `setAsync()`. Did you mean to call the latter method and then call `CompletableFuture.get()`?

Is this called on a Leader or a Follower? In the latter case, do you have `REDIRECT` in your config? What's the config?

Can you reproduce this? If so, can you post a stack trace when running this? If you use vthreads, make sure to use `jcmd <PID> Thread.dump_to_file filename.txt`
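The pattern Bela describes can be sketched with a bounded wait, so a stuck consensus round surfaces as a timeout instead of a forever-blocked caller. This is a minimal, self-contained sketch: the local `setAsync` here is a hypothetical stand-in that deliberately never completes, simulating the hang from this thread; in real code the future would come from `raftHandle.setAsync(bytes, 0, bytes.length)`.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedSet {

    // Hypothetical stand-in for raftHandle.setAsync(buf, offset, length).
    // It never completes, simulating the hang described in this thread.
    static CompletableFuture<byte[]> setAsync(byte[] buf, int offset, int length) {
        return new CompletableFuture<>();
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = "some state".getBytes();
        CompletableFuture<byte[]> future = setAsync(bytes, 0, bytes.length);
        try {
            // Bound the wait instead of blocking forever:
            byte[] result = future.get(2, TimeUnit.SECONDS);
            System.out.println("applied " + result.length + " bytes");
        } catch (TimeoutException e) {
            // The operation is stuck: this is the moment to capture a thread
            // dump (e.g. jcmd <PID> Thread.dump_to_file dump.txt) and inspect it.
            System.out.println("set timed out after 2s");
        }
    }
}
```

With a timeout in place, the hang becomes an observable event the application can log, retry, or fail on, rather than a silently blocked thread.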

-- 
Bela Ban | http://www.jgroups.org

Martin Tauber

May 9, 2025, 1:44:36 AM
to jgroups-raft
Hi José,

The version that I am using is 1.0.14.Final.

The RAFT protocol is configured as follows:

RAFT raft = new RAFT() // Raft protocol
    .members(membersList)
    .setValue("raft_id", nodeId)
    .setValue("log_dir", uvuyoHome + File.separator + "data")
    .setValue("log_prefix", "memdb." + nodeId)
    .setValue("max_log_size", maxLogSize)
    .setValue("send_commits_immediately", true)
    .setValue("log_class", "org.jgroups.protocols.raft.FileBasedLog");

The node that is hanging is not the leader.

After setting send_commits_immediately the leader is permanently dumping the state ... the other two nodes don't.

Kind Regards
Martin


Bela Ban

May 10, 2025, 3:01:10 AM
to jgroup...@googlegroups.com
send_commits_immediately should not be true (default: false); this is only used for unit testing with a (pseudo) synchronous execution model.

Can you boil this down to a small *mono-less* example which reproduces the issue? Ideally run as a unit test, take a look at some of the examples.

Martin Tauber

May 11, 2025, 4:05:11 AM
to Bela Ban, jgroup...@googlegroups.com

Hi Bela,

 

Yes, I can boil this down, but give me some time since it's currently very busy ...

 

Kind Regards

Martin


Bela Ban

May 11, 2025, 4:26:57 AM
to Martin Tauber, jgroup...@googlegroups.com
I suggest making the RPC OOB or async; this might fix the problem. See my other emails on this.



Sent from my Galaxy


-------- Original message --------
From: Martin Tauber <martin...@2yetis.net>
Date: 11.05.25 10:05 (GMT+01:00)

Martin Tauber

Apr 8, 2026, 10:23:13 AM
to jgroups-raft

Hi Bela,

Thank you very much for your prompt reply—it’s highly appreciated!

I’m currently having trouble reproducing this issue since it only occurs occasionally, and since I adjusted the size of the data being transferred, it hasn’t happened again. However, I’m still concerned that it might resurface.

My request:
Do you have any insights or suggestions based on past experiences with similar issues? This would help me narrow down the problem and debug it more effectively.

Next steps on my side:

  • I’ll try to set up jstack on the machine so that if the issue occurs again, I can at least gather more detailed information.
  • If you have any additional debugging tips or tools that could help, I’d be very grateful!

Thank you in advance for your support!

Kind regards,
Martin

Bela Ban

Apr 8, 2026, 10:33:08 AM
to jgroup...@googlegroups.com
Hi Martin

JGroups should be able to handle big workloads with high traffic / large messages. But please understand that, based on the minimal information you gave (no logs, stack traces, etc.), it is hard to say what caused the problem. It might be a misconfiguration, incorrect app code, or a bug in JGroups or jgroups-raft.

Just waiting for the bug to re-occur, or hoping it doesn't, is probably not a good strategy. I can only say it again (one last time :-)): write a reproducer which generates enough stress on the system so that the issue can be reproduced much more quickly.

That shouldn't be too difficult, and I can help you with it.
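One possible shape for such a stress reproducer, sketched here with plain java.util.concurrent so it compiles stand-alone: the local `setAsync` is a hypothetical stand-in that just echoes the payload back through a thread pool; a real reproducer would submit through `raftHandle.setAsync(...)` against a small cluster. The point is the bounded `allOf(...).get(...)`, which turns a hang under load into a clean test failure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class StressReproducer {

    // Hypothetical stand-in for raftHandle.setAsync(buf, 0, buf.length).
    // Here a local thread pool simply returns the payload; a real reproducer
    // would route this through a jgroups-raft cluster instead.
    static CompletableFuture<byte[]> setAsync(ExecutorService pool, byte[] payload) {
        return CompletableFuture.supplyAsync(() -> payload, pool);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        List<CompletableFuture<byte[]>> pending = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            pending.add(setAsync(pool, ("entry-" + i).getBytes()));
        }
        // Bound the overall wait: a hanging set() becomes a TimeoutException
        // (a failing test) instead of a forever-blocked thread.
        CompletableFuture.allOf(pending.toArray(new CompletableFuture[0]))
                         .get(30, TimeUnit.SECONDS);
        System.out.println("all " + pending.size() + " operations applied");
        pool.shutdown();
    }
}
```

Cranking up the iteration count or the payload size is then a matter of changing two constants, which makes it easy to generate the kind of stress Bela describes.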
Cheers

Martin Tauber

Apr 13, 2026, 5:05:54 AM
to jgroups-raft

Hi Bela,

Thank you for your feedback and guidance. I completely understand the importance of having a reproducible test case.

I will focus on creating a reproducer that simulates the stress conditions you mentioned. This should help get more details on the cause of the issue. I’ll keep you posted on my progress and share any findings or logs as soon as I have them.

If I run into any challenges while setting up the reproducer, I’ll reach out for your advice. Thanks again for your support and willingness to help!

Best regards,
Martin

Bela Ban

Apr 13, 2026, 6:30:22 AM
to jgroup...@googlegroups.com
Hi Martin

glad to be of help! Let me know when there are questions.
Cheers