[LogCabin] About a two-node cluster


son...@huawei.com

Aug 24, 2015, 10:11:16 PM
to raft-dev

Hi everyone,

I have a special scenario: I have only two servers right now, and may get a third one about half a year later.

I want to use LogCabin to store data, so that later I can expand the cluster from two nodes to three.

But as you know, in a two-node cluster, if one of the servers crashes, the cluster can no longer provide service.

My monitoring system can detect which server crashed. I want to reconfigure the single remaining server into a single-node cluster (using a special reconfiguration, not a normal one).

That way it can continue to provide service. When the crashed server restarts, I can restore the cluster from one node to two nodes (using a normal reconfiguration).

My question is: after these reconfigurations, are the logs and data on the LogCabin cluster still complete and safe?

I know that in a two-node cluster all committed log entries must exist on both servers, but I am still not sure whether this would go well.

If you see any problems, please let me know.

Thank you very much.

Oren Eini (Ayende Rahien)

Aug 25, 2015, 1:28:11 AM
to raft...@googlegroups.com
You can't assume that the only failure mode you'll see is a node down.
A common scenario is a network issue, or perhaps a cron job doing so much I/O that your LogCabin instance stops responding while your monitor still says that everything is fine.

Also, switching to a one-node cluster means that when you add the second node again, you'll need to copy all the data.




Diego Ongaro

Aug 25, 2015, 7:59:09 PM
to raft...@googlegroups.com
As Oren explained, forcing the system down to a one-server cluster is dangerous if there's any possibility that the second server is still alive. That's the key problem, and having another server would solve it (with 3 servers, a majority becomes a meaningful concept). You need another server...what's your monitoring system running on? Steal that one. :)

-Diego

son...@huawei.com

Aug 26, 2015, 4:57:50 AM
to raft-dev

Diego and Oren,

Thank you very much for your replies.

I want to use LogCabin to replace our current storage system, which runs on two nodes and can keep providing service when one of them crashes, but which cannot expand to three or more nodes.

So if LogCabin can run on two nodes and still provide service when one of them crashes, the replacement becomes much easier.

I know that majorities are essential to Raft, and that giving them up is dangerous.

But suppose I can make sure that one of the two nodes is really shut down, and I use the special reconfiguration to turn the remaining node into a single-node cluster. What will happen?

If the remaining node was the leader, some log entries it had received but not yet committed to the state machine will be committed after the special change to a single-node cluster. For those entries, the client may get a timeout, or, if the cluster change happens quickly enough, a success.

If the remaining node was the follower, some entries may have been committed on the leader without the follower knowing. After the special change to a single-node cluster, it becomes leader and commits all the entries it received before the former leader crashed. These include the entries already committed on the former leader, and may also include some that the former leader had not committed. For those uncommitted entries, the client may get a timeout, or, if the cluster change happens quickly enough, a success.

That is my analysis, but I don't think it is mature enough yet.

Do you see any specific problems this would cause?

Thank you.

Oren Eini (Ayende Rahien)

Aug 26, 2015, 5:20:51 AM
to raft...@googlegroups.com
If this is your scenario, you don't need Raft; you need a master/slave system.



Diego Ongaro

Aug 26, 2015, 7:01:32 PM
to raft...@googlegroups.com
If this is your scenario, you don't need Raft; you need a master/slave system.
True that this degrades into a master-slave system, but if Song has plans to expand to a larger cluster soon-ish, it might still be a reasonable move. And it'd be great to get more people using LogCabin, even if not in its intended mode of operation.

If the remaining node was the leader, some log entries it had received but not yet committed to the state machine will be committed after the special change to a single-node cluster. For those entries, the client may get a timeout, or, if the cluster change happens quickly enough, a success.
 
In LogCabin, if a client had a request outstanding when leadership changed (timeout, connection closed, or NOT_LEADER reply), it gives up on that particular command and retries it (all internal to the client library). That's no different from a normal leadership change, it'll work fine.

You're going to need to make one change to make this work: you'll need a way to tell a LogCabin server to force-append a configuration entry to its log. I think I'd suggest doing this as a logcabinctl RPC handler, and you can probably steal some code from the LogCabin --bootstrap path. If you're prepared to work on this, please start an issue for this on GitHub and let's move the discussion about how to implement it there.

I think it'd be a good idea to skip forward a large number of terms when you do this too, say 2**32. The reason is that we need the last log term in the UP server to be greater than the last log term in the DOWN/REPAIRED server to guarantee that the UP server's log will be treated as more up-to-date than the DOWN/REPAIRED server's log, even if the DOWN/REPAIRED server's log was ahead by a few terms at the time it went down.
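
Very roughly, and with names that are only my guesses at the real code (the --bootstrap path is the thing to copy from, so treat everything below as pseudocode rather than a patch), the forced operation would be something like:

    // Sketch only -- approximate names, not the actual LogCabin API.
    // Run inside the consensus module with its lock held: force the local
    // server into a single-server configuration.
    void RaftConsensus::forceSingleServerConfiguration(
            uint64_t serverId, const std::string& addresses)
    {
        // Move to a fresh term so the forced entry's term is greater than
        // anything the other server's log can already contain.
        stepDown(currentTerm + 1);

        // Build a configuration entry that lists only this server
        // (the same shape of entry that --bootstrap writes).
        Protocol::Raft::Entry entry;
        entry.set_term(currentTerm);
        entry.set_type(Protocol::Raft::EntryType::CONFIGURATION);
        Protocol::Raft::Server* self =
            entry.mutable_configuration()
                 ->mutable_prev_configuration()
                 ->add_servers();
        self->set_server_id(serverId);
        self->set_addresses(addresses);

        // Append it directly to the local log (however the local log-append
        // helper is spelled), bypassing the normal joint-consensus
        // reconfiguration machinery.
        append({&entry});
    }

The logcabinctl RPC handler would then just be a thin wrapper around something like that.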

-Diego

son...@huawei.com

Aug 28, 2015, 3:56:56 AM
to raft-dev
If this is your scenario, you don't need Raft; you need a master/slave system.
True that this degrades into a master-slave system, but if Song has plans to expand to a larger cluster soon-ish, it might still be a reasonable move. And it'd be great to get more people using LogCabin, even if not in its intended mode of operation.

[Song] Yes, with Raft it is easier to expand from two nodes to more. If a product uses two nodes in some deployments and three or more nodes in others, it needs both a master-slave system and Raft. If Raft can be used in the two-node scenario, we can use one solution for all of these cases.

 

If the remaining node was the leader, some log entries it had received but not yet committed to the state machine will be committed after the special change to a single-node cluster. For those entries, the client may get a timeout, or, if the cluster change happens quickly enough, a success.
 
In LogCabin, if a client had a request outstanding when leadership changed (timeout, connection closed, or NOT_LEADER reply), it gives up on that particular command and retries it (all internal to the client library). That's no different from a normal leadership change, it'll work fine.

[Song] If we use the Tree's setTimeout to set a nonzero timeout, the client may receive a timeout error. Is that right?

 
You're going to need to make one change to make this work: you'll need a way to tell a LogCabin server to force-append a configuration entry to its log. I think I'd suggest doing this as a logcabinctl RPC handler, and you can probably steal some code from the LogCabin --bootstrap path. If you're prepared to work on this, please start an issue for this on GitHub and let's move the discussion about how to implement it there.

[Song] I tried modifying it, and it seems to work well. I based my change on Examples/Reconfigure: I set request.old_id() to zero, and then in RaftConsensus::setConfiguration I added a branch that handles the (request.old_id() == 0) case by force-appending a configuration entry to the log. I will open an issue for it and make the implementation more formal. The problem is that I cannot submit code to GitHub from my working environment because of network restrictions.
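
In outline (simplified, and with a made-up helper name rather than my exact code), the branch looks like this:

    // Simplified sketch of the extra branch in RaftConsensus::setConfiguration.
    if (request.old_id() == 0) {
        // Special case: old_id == 0 means "force this configuration locally".
        // Skip the usual check against the current configuration id and the
        // catch-up / joint-consensus phases, and just force-append a
        // CONFIGURATION entry listing the servers from the request.
        forceAppendConfiguration(request);   // hypothetical helper
        return;
    }
    // Otherwise fall through to the normal reconfiguration path, unchanged.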


I think it'd be a good idea to skip forward a large number of terms when you do this too, say 2**32. The reason is that we need the last log term in the UP server to be greater than the last log term in the DOWN/REPAIRED server to guarantee that the UP server's log will be treated as more up-to-date than the DOWN/REPAIRED server's log, even if the DOWN/REPAIRED server's log was ahead by a few terms at the time it went down.
[Song] When one of the two nodes crashes, the remaining one will become a candidate and its term will increase. I think that if I force-append a configuration entry to its log at that point, its term must be greater than the last log term of the DOWN/REPAIRED server. I will verify this later.
 

Thank you all for your suggestions.

Best Regards,

Song

 

Diego Ongaro

Sep 15, 2015, 9:48:56 PM
to raft...@googlegroups.com
Sorry for the long delay. I'm trying to catch up...

On Fri, Aug 28, 2015 at 12:56 AM, <son...@huawei.com> wrote:
If this is your scenario, you don't need Raft; you need a master/slave system.
True that this degrades into a master-slave system, but if Song has plans to expand to a larger cluster soon-ish, it might still be a reasonable move. And it'd be great to get more people using LogCabin, even if not in its intended mode of operation.

[Song] Yes, with Raft it is easier to expand from two nodes to more. If a product uses two nodes in some deployments and three or more nodes in others, it needs both a master-slave system and Raft. If Raft can be used in the two-node scenario, we can use one solution for all of these cases.

 

If the remaining node was the leader, some log entries it had received but not yet committed to the state machine will be committed after the special change to a single-node cluster. For those entries, the client may get a timeout, or, if the cluster change happens quickly enough, a success.
 
In LogCabin, if a client had a request outstanding when leadership changed (timeout, connection closed, or NOT_LEADER reply), it gives up on that particular command and retries it (all internal to the client library). That's no different from a normal leadership change, it'll work fine.

[Song] If we use the Tree's setTimeout to set a nonzero timeout, the client may receive a timeout error. Is that right?


Yes. In that case, the client library will only retry up to the timeout time.
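
For example (this is from memory, so double-check the exact names in Client.h; the addresses and path here are made up):

    #include <LogCabin/Client.h>

    using LogCabin::Client::Cluster;
    using LogCabin::Client::Result;
    using LogCabin::Client::Status;
    using LogCabin::Client::Tree;

    int main()
    {
        // Made-up cluster addresses.
        Cluster cluster("192.168.2.1:5254,192.168.2.2:5254");
        Tree tree = cluster.getTree();

        // Give up on any single Tree operation after 2 seconds (in nanoseconds).
        tree.setTimeout(2ULL * 1000 * 1000 * 1000);

        Result result = tree.write("/demo/key", "value");
        if (result.status == Status::TIMEOUT) {
            // The client stopped retrying; the write may or may not
            // have been applied by the cluster.
            return 1;
        }
        return 0;
    }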
 
 
You're going to need to make one change to make this work: you'll need a way to tell a LogCabin server to force-append a configuration entry to its log. I think I'd suggest doing this as a logcabinctl RPC handler, and you can probably steal some code from the LogCabin --bootstrap path. If you're prepared to work on this, please start an issue for this on GitHub and let's move the discussion about how to implement it there.

[Song] I tried modifying it, and it seems to work well. I based my change on Examples/Reconfigure: I set request.old_id() to zero, and then in RaftConsensus::setConfiguration I added a branch that handles the (request.old_id() == 0) case by force-appending a configuration entry to the log. I will open an issue for it and make the implementation more formal. The problem is that I cannot submit code to GitHub from my working environment because of network restrictions.

 
I'll follow up on this on the GitHub issue: https://github.com/logcabin/logcabin/issues/189
 


I think it'd be a good idea to skip forward a large number of terms when you do this too, say 2**32. The reason is that we need the last log term in the UP server to be greater than the last log term in the DOWN/REPAIRED server to guarantee that the UP server's log will be treated as more up-to-date than the DOWN/REPAIRED server's log, even if the DOWN/REPAIRED server's log was ahead by a few terms at the time it went down.
[Song] When one of the two nodes crashes, the remaining one will become a candidate and its term will increase. I think that if I force-append a configuration entry to its log at that point, its term must be greater than the last log term of the DOWN/REPAIRED server. I will verify this later.

Hmm...I think you're right that skipping so many terms is unnecessary, just incrementing the term should be enough. If the last successful election before the crash is E, then the last log term of each server is <= E, and the current term of each server is >= E. If we force append a configuration entry to the UP server at E+1, then the DOWN server has no way to append an entry to its log with term >E without going through a new election, and it can't be elected anymore. (Things get more complicated with more than two servers; I won't even try right now.)
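
To make that concrete with numbers (just re-running the same argument): say the last successful election before the crash was term 5. Then both servers' last log terms are at most 5, and both current terms are at least 5. Force-appending the configuration entry on the UP server at term 6 makes its last log term strictly greater than anything the DOWN server's log can contain, so the UP server, whose log is now more up-to-date, will never grant the DOWN server a vote, and the DOWN server can never win an election or append an entry with a term above 5.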
 
-Diego
