JGroups reconnect without losing state (Android)

Anton Gerdessen

Feb 10, 2026, 10:18:20 AM
to jgroups-dev
Hello,

I'm still working with JGroups on Android.
I'm transferring large files with JGroups to 20+ connected Android devices using multicast.
I noticed you added Util.resetCachedAddresses(true, true).
This helps me a lot; most of my use cases on Android work now.

I use JGroups 5.5.2 and a stack with UDP, PING, FD_ALL3, MERGE3, VERIFY_SUSPECT, BARRIER, MFC, NAKACK4, UNICAST4, GMS and FRAG2.

I have one annoying use case though.

If the user disconnects the (USB-C to ETH) network cable and reconnects it well within the failure detection timeouts, all while a transfer is running, I want to be able to recover from this.
This use case can also be triggered because Android sometimes disconnects the device (USB-C to ETH) by itself and reconnects it a second later, while no one touched the phone.
Why it does this is a mystery to me.

What I tried is:
  • call the reset function
  • disconnect
  • connect
This makes the transfer continue, but any data sent while the cable was disconnected is not recovered. I assume this is because NAKACK4 was reset on the Android side.

I tried not disconnecting and only calling connect, but this does not work either, because JGroups sees the channel as already connected to the cluster while the underlying socket is gone.

I also tried calling stop and start on the protocols themselves; this leads to the same situation as calling disconnect / connect: NAKACK4 loses its state.

I also tried resetting only the transport (UDP), since I assume it holds the link to the sockets, but no success there either.

I turned on trace logging and I can see that my Android JGroups member which was disconnected doesn't send anything anymore. It survives because the suspect handling on the server side keeps it in the cluster, but apart from that it is silent.

Is there a way for me to disconnect/reconnect while keeping the NAKACK4 state, so that retransmits still work?

Any tips are more than welcome,

Regards,

Anton

Bela Ban

Feb 10, 2026, 10:51:48 AM
to jgrou...@googlegroups.com
Hi Anton

As long as pulling the plug / disconnecting the network cable does not trigger a new view, NAKACK4 won't lose its state. From what you describe, I assume that is what happens here: no new view is generated.

So we have to solve these temporary disconnects entirely at the transport level (UDP). I assume the disconnect causes one (or all) of the following:
* The routing table loses its state (multicast route to the interface that was disconnected)
* The interface becomes disabled (perhaps for an extended time) and therefore sending won't succeed
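
To tell these cases apart, and to know when the link is back, the interface state can be polled. Below is a minimal sketch, not tested on Android: LinkWatcher and interfaceProbe are hypothetical names, and the interface name passed in ("eth0" here) is an assumption that has to match the device. It uses only java.net.NetworkInterface; on Android, a ConnectivityManager.NetworkCallback would be the more usual way to get the same signal.

```java
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.function.BooleanSupplier;

/** Hypothetical helper: fires a callback when the link goes from down to up. */
final class LinkWatcher {
    private final BooleanSupplier linkUp;
    private final Runnable onReconnect;
    private boolean wasUp;

    LinkWatcher(BooleanSupplier linkUp, Runnable onReconnect) {
        this.linkUp = linkUp;
        this.onReconnect = onReconnect;
        this.wasUp = linkUp.getAsBoolean();   // remember the initial state
    }

    /** Call periodically; runs the callback only on a down -> up transition. */
    void poll() {
        boolean up = linkUp.getAsBoolean();
        if (up && !wasUp) {
            onReconnect.run();
        }
        wasUp = up;
    }

    /** Probe backed by java.net.NetworkInterface; the interface name is an assumption. */
    static BooleanSupplier interfaceProbe(String ifName) {
        return () -> {
            try {
                NetworkInterface nif = NetworkInterface.getByName(ifName);
                return nif != null && nif.isUp();
            } catch (SocketException e) {
                return false;   // treat lookup failures as "link down"
            }
        };
    }
}

class LinkWatcherDemo {
    public static void main(String[] args) throws InterruptedException {
        LinkWatcher watcher = new LinkWatcher(
                LinkWatcher.interfaceProbe("eth0"),   // assumed interface name
                () -> System.out.println("link is back, reset the transport here"));
        for (int i = 0; i < 10; i++) {                // bounded loop for the demo
            watcher.poll();
            Thread.sleep(200);
        }
    }
}
```

poll() has to run more often than the outage lasts, so for a one-second drop the interval needs to be well below a second. The callback is the place where a socket reset could be triggered.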

The good news is that, as long as the member whose interface was disconnected stays in the cluster, we should be able to save the state. This works because NAKACK4 keeps retransmitting until a message has been received, regardless of what happens at the network level.

I suggest trying the following after a disconnect/reconnect:

* Find the UDP protocol: channel.stack().findProtocol(UDP.class)
* Call udp.stopThreads() and udp.destroySockets(). This is the same as stop() but it doesn't call super.stop()
* Call udp.createSockets() and udp.startThreads(). Same as start() plus CONNECT event down

You'd need to change JGroups and make some of these methods public, or use reflection to call them, e.g. via Util.invoke()
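
The reflection route can be sketched as follows. This is a sketch under assumptions, not JGroups code: TransportRestart and FakeTransport are hypothetical names, FakeTransport only stands in for the real UDP protocol so the example runs without JGroups, and the four methods are assumed to take no arguments. Only the method names and their order come from the steps above.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for calling non-public, no-arg methods via reflection. */
final class TransportRestart {

    /** Finds a no-arg method by name in the class or any of its superclasses. */
    static Method findNoArgMethod(Class<?> type, String name) throws NoSuchMethodException {
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            try {
                return c.getDeclaredMethod(name);
            } catch (NoSuchMethodException ignored) {
                // not declared on this class, keep walking up the hierarchy
            }
        }
        throw new NoSuchMethodException(type.getName() + "." + name + "()");
    }

    static void invokeNoArg(Object target, String name) throws Exception {
        Method m = findNoArgMethod(target.getClass(), name);
        m.setAccessible(true);   // the methods are assumed not to be public
        m.invoke(target);
    }

    /** Tears down and recreates sockets and threads in the order listed above. */
    static void restartSockets(Object transport) throws Exception {
        invokeNoArg(transport, "stopThreads");
        invokeNoArg(transport, "destroySockets");
        invokeNoArg(transport, "createSockets");
        invokeNoArg(transport, "startThreads");
    }
}

/** Stand-in for the transport so the sketch runs; NOT the JGroups UDP class. */
class FakeTransport {
    final List<String> calls = new ArrayList<>();
    protected void stopThreads()    { calls.add("stopThreads"); }
    protected void destroySockets() { calls.add("destroySockets"); }
    protected void createSockets()  { calls.add("createSockets"); }
    protected void startThreads()   { calls.add("startThreads"); }
}

class TransportRestartDemo {
    public static void main(String[] args) throws Exception {
        FakeTransport transport = new FakeTransport();
        TransportRestart.restartSockets(transport);
        System.out.println(transport.calls);   // shows the order of the calls
    }
}
```

With a real channel, the object passed to restartSockets() would be the protocol returned by channel.stack().findProtocol(UDP.class). The CONNECT event mentioned above is not covered here.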

I suspect that we don't even need to go this far; perhaps it was just the receiver thread which stopped on the disconnect.

If you could debug this, you'd see the states of the sockets and whether the receiver thread is still running.
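
Whether the receiver thread survived can also be checked from inside the app, without a debugger. A minimal sketch using only the JDK: ThreadProbe is a hypothetical helper, and the name fragment is a placeholder, since the actual thread names should be taken from a thread dump of the running app.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: lists live threads whose name contains a fragment. */
final class ThreadProbe {

    /** Returns "name [STATE]" for every live thread whose name contains the fragment. */
    static List<String> liveThreadsMatching(String nameFragment) {
        List<String> result = new ArrayList<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.isAlive() && t.getName().contains(nameFragment)) {
                result.add(t.getName() + " [" + t.getState() + "]");
            }
        }
        return result;
    }
}

class ThreadProbeDemo {
    public static void main(String[] args) throws InterruptedException {
        // Stand-in for a receiver thread; the name is a placeholder.
        Thread receiver = new Thread(() -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException e) {
                // interrupted: let the thread end
            }
        }, "demo-receiver");
        receiver.setDaemon(true);
        receiver.start();

        System.out.println("while running: " + ThreadProbe.liveThreadsMatching("demo-receiver"));
        receiver.interrupt();
        receiver.join();
        System.out.println("after stop:    " + ThreadProbe.liveThreadsMatching("demo-receiver"));
    }
}
```

Logging this list before the cable is pulled and again after it is plugged back in would show whether the receiver thread disappeared during the outage.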

Let me know if this works,
Cheers

-- 
Bela Ban | http://www.jgroups.org
