How does a distributed storage system built on Raft filter duplicate requests even after client session expiration?


Yathish

Oct 11, 2020, 1:53:01 PM
to raft-dev

I am trying to understand how a distributed storage system built on Raft filters duplicate requests even after a client's session has expired.

I have gone through section 6.3 of the Raft dissertation, which describes how LogCabin (a distributed storage system built on Raft) filters duplicate requests by maintaining client sessions. Whenever the cluster doesn't hear from a client for some period, say an hour, it expires that client's session.

Let's say a client (id = 1) issued a command in its current active session (sessionId = 123) to increment the count of a product by 3, as below:

{ ProductName = iPod, Count = INC(3) }

The leader received the client's request, replicated it to a majority of the followers, applied it to the state machine, and cached the result, so that if the client issues the same request again the leader can simply return the cached result rather than re-execute the duplicate.

The same client (id = 1) then went inactive for an hour, so the leader expired its session.

The client then issues the same (duplicate) request again in a new session.

Now, how will the cluster still filter the duplicate request when the client joins with a new session and retries a command that was already executed in the previous session?
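The session mechanism from section 6.3 can be sketched roughly like this (a toy Python sketch with made-up names, not LogCabin's actual code): each command carries a client id and a per-client sequence number, and the state machine caches the response for each sequence number it has applied.

```python
# Illustrative sketch of session-based duplicate filtering as described in
# section 6.3 of the Raft dissertation. All names are made up for illustration.

class SessionStateMachine:
    def __init__(self):
        self.counts = {}    # product name -> current count
        self.sessions = {}  # client_id -> {sequence_num: cached_response}

    def register_client(self, client_id):
        # RegisterClient RPC: allocate a fresh session for a new client.
        self.sessions[client_id] = {}

    def apply(self, client_id, seq, product, delta):
        session = self.sessions.get(client_id)
        if session is None:
            # No record of this session: refuse rather than risk re-execution.
            raise KeyError("session expired or unknown")
        if seq in session:
            # Duplicate: return the cached response, do not re-apply.
            return session[seq]
        self.counts[product] = self.counts.get(product, 0) + delta
        session[seq] = self.counts[product]
        return session[seq]

sm = SessionStateMachine()
sm.register_client(1)
first = sm.apply(1, seq=1, product="iPod", delta=3)
retry = sm.apply(1, seq=1, product="iPod", delta=3)  # retried command, same seq
```

Within an active session, the retry returns the cached result and the count stays at 3; once the session entry is removed, the same `apply` call raises an error instead of incrementing again.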

Section 6.3 of the Raft dissertation offers the following as one solution:

The second issue is how to deal with a client that continues to operate after its session was expired. We expect this to be an exceptional situation; there is always some risk of it, however, since there is generally no way to know when clients have exited. One option would be to allocate a new session for a client any time there is no record of it, but this would risk duplicate execution of commands that were executed before the client’s previous session was expired. To provide stricter guarantees, servers need to distinguish a new client from a client whose session was expired. When a client first starts up, it can register itself with the cluster using the RegisterClient RPC. This allocates the new client’s session and returns the client its identifier, which the client includes with all subsequent commands. If a state machine encounters a command with no record of the session, it does not process the command and instead returns an error to the client. LogCabin currently crashes the client in this case (most clients probably wouldn’t handle session expiration errors gracefully and correctly, but systems must typically already handle clients crashing).


I am finding it difficult to understand how this solves the problem of filtering duplicate requests issued by the client in its new session after the previous session expired. I would also like to know how this kind of issue is handled in other distributed systems.


Oren Eini (Ayende Rahien)

Oct 12, 2020, 2:38:32 AM
to raft...@googlegroups.com
One of the things that you keep in the state machine is the list of active sessions.
Each client command must have the session id attached to it.
The first thing you do is check whether the session id is in the list of active sessions; if it isn't, don't process the command and return an error.

Session expiration means removing the expired session id from the list of active sessions, that is all. 
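A minimal sketch of that check (the session id, function names, and error type here are illustrative, not from any real implementation):

```python
# Made-up sketch: the state machine keeps a set of active session ids,
# rejects commands whose session is unknown, and "expiration" is just
# removal from that set.

class SessionExpired(Exception):
    pass

active_sessions = {123}  # sessions currently known to the state machine

def handle_command(session_id, command):
    # First check: is the session still active?
    if session_id not in active_sessions:
        raise SessionExpired(session_id)
    return f"applied {command}"

def expire_session(session_id):
    # Expiration is nothing more than removing the id from the active set.
    active_sessions.discard(session_id)

result = handle_command(123, "INC iPod 3")  # processed while session is active
expire_session(123)
# handle_command(123, ...) would now raise SessionExpired
```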


Yathish

Oct 12, 2020, 12:54:35 PM
to raft-dev
Thanks for the response, Ayende Rahien.

What if the client connects back with a new session after its previous session expired, and issues the same duplicate command in the new session? In that case we still process the duplicate request and don't filter it at all.

The lines from the Raft dissertation that talk about filtering duplicate requests in the client's new session are the ones from section 6.3 quoted in my first message (on distinguishing a new client from a client whose session has expired).

I am trying to understand how we can still filter duplicate client requests in the client's new session.

Oren Eini (Ayende Rahien)

Oct 13, 2020, 3:23:24 AM
to raft...@googlegroups.com
Why would the client do this?
When it negotiates a new session, it should abort all pending requests and initiate new ones. 

A new request with a new session is different, not duplicated. 


Philip O'Toole

Oct 13, 2020, 8:40:29 AM
to raft...@googlegroups.com
On Tue, Oct 13, 2020 at 3:23 AM Oren Eini (Ayende Rahien) <aye...@ayende.com> wrote:
Why would the client do this?

The client initiates a connection and sends a command to increment X. The command is received and the Raft log is changed, but the connection drops due to a network fault before the acknowledgement gets back to the client (imagine someone literally tripping over the cable and pulling it out of the wall). The client realises the connection is broken, is unsure whether the command was received, starts a new connection, and sends the request again.

I do agree that the session is completely new, so semantically it's a new command. But the practical impact is that the increment operation is duplicated.

I can think of two ways to deal with this:

a) client-side generated request IDs, with logic in the server to check for those. This allows the client to signal the same request is being resent, even across broken connections. That said, *now* the client needs to maintain state so it can generate new request IDs as needed.
b) client code signals to the user (whatever that means) that the increment failed and gives up. That lets a higher-level entity decide what to do. For hard-to-address, rare cases (like that outlined above) that's what I would do. 
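Option (a) might look roughly like this (a hedged sketch with made-up names): the client generates a request id once per logical operation and reuses it on every retry, even across new connections, while the server remembers which ids it has already applied.

```python
# Illustrative sketch of client-generated request IDs for deduplication.
# The Server class and its fields are invented for this example.

import uuid

class Server:
    def __init__(self):
        self.x = 0
        self.seen = {}  # request_id -> result already returned for that id

    def increment(self, request_id, amount):
        if request_id in self.seen:
            # Resend of a command we already applied: return the old result.
            return self.seen[request_id]
        self.x += amount
        self.seen[request_id] = self.x
        return self.x

server = Server()
req_id = str(uuid.uuid4())   # generated once, client-side, per logical operation
server.increment(req_id, 1)  # ack lost, connection drops...
server.increment(req_id, 1)  # ...client reconnects and resends with the SAME id
```

The increment is applied once; the trade-off Philip notes is that the client must now persist enough state to reuse the same id across reconnects.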

Oren Eini (Ayende Rahien)

Oct 13, 2020, 10:13:30 AM
to raft...@googlegroups.com
You cannot assume that there is a way to give the client a failure indication.
The simplest option is to generate a GUID per command and keep track of executed commands for a while.

Yathish

Oct 13, 2020, 1:27:24 PM
to raft-dev
It comes back to the same question I asked.

You may store client sessions in the cluster for, say, an hour of client inactivity, after which you remove the inactive client's session.
The client will initiate a connection later and send the same duplicate command to increment X. In this case, how can we still filter that duplicate request?

Jordan Halterman

Oct 13, 2020, 1:35:15 PM
to raft...@googlegroups.com
You can’t. The answer is just longer timeouts. You can only guarantee linearizability within the context of a session. If you want to guarantee it always, you have to keep the sessions always.

Archie Cobbs

Oct 13, 2020, 1:47:50 PM
to raft-dev
In any scenario where you are communicating across an unreliable network, there's no way to reliably know whether an action succeeded or not on the remote side. See https://en.wikipedia.org/wiki/Two_Generals%27_Problem

The only way to entirely eliminate the possibility of accidentally doing "duplicate work" is to make your transactions idempotent. I thought the dissertation mentioned this somewhere but in a quick check I didn't find it.
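A tiny illustration of that point (names are invented): replaying an idempotent "set" leaves the state unchanged, while replaying a non-idempotent "increment" corrupts it.

```python
# Idempotent vs non-idempotent commands under duplicate delivery.

counts = {}

def set_count(product, value):   # idempotent: same result however often applied
    counts[product] = value

def inc_count(product, delta):   # not idempotent: each application changes state
    counts[product] = counts.get(product, 0) + delta

set_count("iPod", 3)
set_count("iPod", 3)             # replay: state unchanged
assert counts["iPod"] == 3

inc_count("iPad", 3)
inc_count("iPad", 3)             # replay: the duplicate executed twice
assert counts["iPad"] == 6
```

If the application can phrase its commands as absolute assignments instead of relative updates, duplicate delivery becomes harmless and no session machinery is needed for correctness.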

In any case, this problem is not unique to Raft.

-Archie

Philip O'Toole

Oct 13, 2020, 1:49:08 PM
to raft...@googlegroups.com
On Tue, Oct 13, 2020 at 1:27 PM Yathish <yathish.m...@gmail.com> wrote:
It comes back to the same question I asked.

You may store client sessions in the cluster for, say, an hour of client inactivity, after which you remove the inactive client's session.
The client will initiate a connection later and send the same duplicate command to increment X. In this case, how can we still filter that duplicate request?

You can't, not unless you keep a record of every request ever made.

Yathish

Oct 14, 2020, 7:27:39 AM
to raft-dev
Thanks Jordan, Archie and Philip for helping to clarify my doubt.