How to commit the log entries of the follower?

沈林

Feb 5, 2017, 8:52:04 PM
to jgroups-raft
Hi, Bela:

I want to use jgroups-raft to build a small distributed storage system for storing the configuration of other systems. (High read, low write.)

Currently, I use guava as local storage, and when the data in guava changes, I use jgroups-raft to notify all the other machines to change as well. So when I read data from guava, I must ensure the data is fresh. In other words, every time I read from guava, I must ensure that the log entries (sent by the leader) on this follower have been committed. But when I looked through the jgroups-raft source code, I couldn't find a good way to do this.

Can you tell me how I can achieve this?

Thanks,
Lin.

Bela Ban

Feb 6, 2017, 9:51:18 AM
to jgroup...@googlegroups.com
Hi Lin,

Using a local guava cache won't work, because there's no relation
between sets and gets on jgroups-raft and updates/reads on the local cache.

If you want the properties of RAFT (committing by majority agreement and
total order of updates), you have to implement the StateMachine [1]
interface, perhaps based on guava for local storage.
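
A minimal sketch of that idea, under stated assumptions: `CommittedEntryHandler` is a hypothetical stand-in for a state-machine callback and is not the jgroups-raft StateMachine interface (whose exact method signatures depend on the version, see [1]), and a `ConcurrentHashMap` stands in for the guava cache so the example is self-contained. The point it illustrates is that the local map is only modified when a committed log entry is applied, never directly by the application.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ConfigStoreDemo {
    public static void main(String[] args) throws IOException {
        ConfigStore store = new ConfigStore();
        // In a real cluster these calls would be made by the Raft layer
        // after an entry has been committed, not by the application.
        store.apply(ConfigStore.encodePut("timeout", "30"));
        store.apply(ConfigStore.encodePut("timeout", "60"));
        System.out.println(store.getLocal("timeout")); // prints 60
    }
}

// Hypothetical callback interface; NOT the jgroups-raft API.
interface CommittedEntryHandler {
    // Invoked once per committed log entry, in log order, on every node.
    byte[] apply(byte[] entry) throws IOException;
}

// Configuration store whose map changes only through apply().
class ConfigStore implements CommittedEntryHandler {
    private final Map<String, String> map = new ConcurrentHashMap<>();

    // Encodes a put command as bytes so it could travel through a replicated log.
    static byte[] encodePut(String key, String value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeUTF(key);
        out.writeUTF(value);
        out.flush();
        return bytes.toByteArray();
    }

    @Override
    public byte[] apply(byte[] entry) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(entry));
        String key = in.readUTF();
        String value = in.readUTF();
        String previous = map.put(key, value);
        // Return the previous value (empty if there was none) as the command result.
        return previous == null ? new byte[0] : previous.getBytes(StandardCharsets.UTF_8);
    }

    // Local read: fast, but stale if this node has not yet applied the latest entries.
    String getLocal(String key) {
        return map.get(key);
    }
}
```

Note that `getLocal()` in this sketch still has the stale-read property: it only reflects the entries this node has applied so far.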

There's a ReplicatedStateMachine sample implementation [2]. This impl
returns values from the local cache on get(), which means you have to
live with stale reads.

If you want reads to be ordered correctly with respect to writes, you
have to treat reads as pseudo-writes, i.e. subject them to the same
append/commit cycle as writes. I've implemented another service,
CounterService [3], which shows how this is done.

So for example, if A invokes put(x=3), put(x=4), get(x) and B invokes
get(x), put(x=10), then all 5 commands are given a term and an ID, which
are unique and assigned by the leader, e.g.
1 A: put(x=3) ID=45
2 A: put(x=4) ID=46
3 B: get(x) ID=47
4 B: put(x=10) ID=48
5 A: get(x) ID=49

This means the commands are globally ordered, and the results are
1: x=3
2: x=4
3: 4
4: x=10
5: 10
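
The ordering above can be reproduced with a toy single-process model. This is only a sketch under stated assumptions: `OrderedLog` and `submit` are hypothetical names, there is no networking, term handling or consensus, and every submitted command is assumed to commit. It shows that once reads go through the same log as writes, each get returns the value of the latest put ordered before it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class OrderedLogDemo {
    public static void main(String[] args) {
        OrderedLog log = new OrderedLog(45);
        // Commands in the order in which the leader sequences them.
        log.submit("A", "put", "x", 3);
        log.submit("A", "put", "x", 4);
        log.submit("B", "get", "x", null);
        log.submit("B", "put", "x", 10);
        log.submit("A", "get", "x", null);
        for (String line : log.results()) {
            System.out.println(line);
        }
    }
}

// Toy model of a leader's log: assigns IDs and applies commands in ID order.
class OrderedLog {
    private final Map<String, Integer> state = new HashMap<>();
    private final List<String> results = new ArrayList<>();
    private int nextId;

    OrderedLog(int firstId) {
        this.nextId = firstId;
    }

    // Assigns the next ID and applies the command (commit is assumed to succeed).
    Integer submit(String client, String op, String key, Integer value) {
        int id = nextId++;
        Integer result;
        if (op.equals("put")) {
            state.put(key, value);
            result = value;
            results.add("ID=" + id + " " + client + ": put(" + key + "=" + value + ")");
        } else {
            // A get is ordered like a write, so it sees every put with a lower ID.
            result = state.get(key);
            results.add("ID=" + id + " " + client + ": get(" + key + ") -> " + result);
        }
        return result;
    }

    List<String> results() {
        return results;
    }
}
```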

Of course, preventing stale reads reduces read performance, as the cost
of reads is now the same as writes (except for the write to the local
storage).
Cheers,


[1]
https://github.com/belaban/jgroups-raft/blob/master/src/org/jgroups/protocols/raft/StateMachine.java

[2]
https://github.com/belaban/jgroups-raft/blob/master/src/org/jgroups/raft/blocks/ReplicatedStateMachine.java

[3]
https://github.com/belaban/jgroups-raft/blob/master/src/org/jgroups/raft/blocks/CounterService.java#L98


--
Bela Ban, JGroups lead (http://www.jgroups.org)

沈林

Feb 8, 2017, 11:01:36 AM
to jgroups-raft, bel...@mailbox.org

Hi, Bela:


Thank you for answering my questions. The two read/write demos you pointed to are very useful to me. My remaining problem is with reads.


As you said:

1) The ReplicatedStateMachine demo has to live with stale reads (because it reads from the local map directly, without involving the leader).

2) The CounterService demo makes reads expensive (because each read must go through the leader. This problem becomes more serious as the cluster grows, e.g. to 10 nodes or more, with a high-read, low-write workload).


As I said previously, I want to build a distributed storage system with high read volume, low write volume, and strong consistency. Can this be achieved? (To ensure read performance, I think reading from local storage, rather than from a centralized cache or from another machine chosen by a hash algorithm, is a good approach, especially in a multi-datacenter environment or on a weak network.)


Another question: the ReplicatedStateMachine demo relies on jgroups-raft when updating the data.

But I think the same could be achieved with plain JGroups (using JGroups to reliably notify all nodes to update the data). So what advantages does jgroups-raft give the ReplicatedStateMachine demo compared to JGroups?


Thanks,
Lin.



Bela Ban

Feb 10, 2017, 1:30:28 AM
to jgroup...@googlegroups.com


On 08/02/17 17:01, 沈林 wrote:
> Hi, Bela:
>
>
> Thank you for answering my questions. The two read/write demos you
> provided are very useful to me. The current problem is reads:
>
>
> As you said:
>
> 1) The ReplicatedStateMachine demo has to live with stale reads
> (because it reads from the local map directly, without the leader).

Right, but note that this is just a demo. One could implement non-stale
reads the way I described.

> 2) The CounterService demo makes reads expensive (because each read
> must go through the leader. This problem becomes more serious when
> the cluster is bigger, e.g. 10 nodes or more, with high read and low
> write).

Correct.


> As I said previously, I want to build a distributed storage system with
> high read volume, low write volume, and strong consistency.
> Can this be achieved? (To ensure read performance, I think reading
> from local storage, rather than from a centralized cache or from
> another machine chosen by a hash algorithm, is a good approach,
> especially in a multi-datacenter environment or on a weak network.)


It depends on what properties you want in your system; in my previous
reply I assumed you wanted to execute reads in exactly the same sequence
as writes. Of course, if you're willing to live with properties that are
less strict, e.g. read-your-writes, performance will be better.

Read-your-writes means the read is sent to the leader, and you would see
your previous write, or a write by someone else. This would involve
neither a disk read nor consensus, so it would be faster.
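
A toy model of the difference between a local read and a read sent to the leader, as a sketch only: `Replica`, `applyCommitted` and `read` are hypothetical names rather than jgroups-raft API, and the whole thing runs in one process with no networking. The follower has not yet applied the latest committed write, so a local read on it is stale, while a read answered from the leader's state sees it.

```java
import java.util.HashMap;
import java.util.Map;

class LeaderReadDemo {
    public static void main(String[] args) {
        Replica leader = new Replica();
        Replica follower = new Replica();

        // x=3 has been committed and applied on both nodes.
        leader.applyCommitted("x", 3);
        follower.applyCommitted("x", 3);

        // x=4 has been committed and applied on the leader,
        // but the follower has not applied it yet.
        leader.applyCommitted("x", 4);

        System.out.println("local read on follower: " + follower.read("x")); // 3 (stale)
        System.out.println("read sent to leader:    " + leader.read("x"));   // 4
    }
}

// One node's in-memory state; only committed entries are ever applied to it.
class Replica {
    private final Map<String, Integer> state = new HashMap<>();

    void applyCommitted(String key, int value) {
        state.put(key, value);
    }

    // Answers from memory: no log append, no disk read, no consensus round.
    Integer read(String key) {
        return state.get(key);
    }
}
```

In a real cluster, a leader that has been deposed without noticing could still answer with old data, which is one reason this is weaker than running the read through the log.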

If you read Diego Ongaro's Raft thesis, he describes how to do reads, but
this part hasn't been implemented in jgroups-raft (yet).

> Another question: the ReplicatedStateMachine demo relies on
> jgroups-raft when updating the data.
>
> But I think the same could be achieved with plain JGroups
> (using JGroups to reliably notify all nodes to update the data).

Sure, but as I said, it depends on your consistency requirements. For
example, are you able to tolerate inconsistent data on network
partitions? If that's the case, and your application is able to merge
data after a partition heals, then use JGroups. If not, use jgroups-raft.

> So what advantages does jgroups-raft give the ReplicatedStateMachine
> demo compared to JGroups?

- Changes are only applied by (majority) agreement; as a consequence,
  data never diverges, even under network partitions
- All changes are applied in total order
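
The majority rule can be illustrated with a few lines of arithmetic. This is a sketch with hypothetical names (`Quorum`, `majority`, `canCommit`), assuming a fixed, known cluster size: when a 5-node cluster splits 3/2, only the side with 3 nodes can still commit changes, which is why the two sides cannot diverge.

```java
class MajorityDemo {
    public static void main(String[] args) {
        int clusterSize = 5;
        // A network partition splits the cluster into 3 and 2 nodes.
        System.out.println(Quorum.canCommit(clusterSize, 3)); // true
        System.out.println(Quorum.canCommit(clusterSize, 2)); // false
    }
}

class Quorum {
    // Smallest number of nodes that forms a majority of the cluster.
    static int majority(int clusterSize) {
        return clusterSize / 2 + 1;
    }

    // A change can be committed only if a majority of the full cluster is reachable.
    static boolean canCommit(int clusterSize, int reachableNodes) {
        return reachableNodes >= majority(clusterSize);
    }
}
```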