Disaster Scenarios


jonathan....@gmail.com

Feb 14, 2014, 8:38:22 PM
to weed-fil...@googlegroups.com
Hello,

I've been curious about this project for a while; good to see it's still being maintained. I have a couple of questions about disaster scenarios that the documentation doesn't address and that I can't find answers to elsewhere. I'm going to set up a dev environment next week to run some tests and try to figure out some of this myself, but I figured I'd check here first.

1) Correct me if I'm wrong, but it seems like the master node is a single point of failure. Assuming I knew what to back up, if I restored the configuration to a new machine, could I introduce it to the cluster as a new master? Follow-up: what would then happen to files created between the time I made the backup and the failure?
 
2) I realize I could probably play games with how a DC is defined to achieve this, but is there a native way to combine replication modes for a file? (e.g., replication=001010 to replicate once to the same rack and once to different rack in the same DC) 

3) If a volume server node dies, and I replace it with a new one, is data automatically replicated back to the new node?

4) Similarly, if a volume server node dies, and I don't replace it, will data be automatically replicated across the cluster to preserve whatever replication strategy I specified during file creation?

5) If I'm insanely paranoid and want to do tape backups as well, are there any existing solutions to expose the file system as a mount? I could do a full backup of each node, but our tape system doesn't de-dupe.

Sorry for the barrage of questions; I'm curious because data loss in my environment is absolutely unacceptable (and would likely be the end of our entire organization).

Thank you very much,
Jonathan

Chris Lu

Feb 14, 2014, 10:29:12 PM
to weed-fil...@googlegroups.com, weed-fil...@googlegroups.com
Thanks for the questions! 

One important feature in Git HEAD: the master nodes can now form a cluster with automatic failover, so no more SPOF! It's implemented with the Raft protocol.

For other questions, see the answers below.


Chris

On Feb 14, 2014, at 17:38, jonathan....@gmail.com wrote:

1) Correct me if I'm wrong, but it seems like the master node is a single point of failure. Assuming I knew what to back up, if I restored the configuration to a new machine, could I introduce it to the cluster as a new master? Follow-up: what would then happen to files created between the time I made the backup and the failure?
 
2) I realize I could probably play games with how a DC is defined to achieve this, but is there a native way to combine replication modes for a file? (e.g., replication=001010 to replicate once to the same rack and once to different rack in the same DC)
If I remember correctly, use replication=020.



3) If a volume server node dies, and I replace it with a new one, is data automatically replicated back to the new node?

No. 


4) Similarly, if a volume server node dies, and I don't replace it, will data be automatically replicated across the cluster to preserve whatever replication strategy I specified during file creation?


No. The reason is the same for cases 3 and 4: there's no good time to copy the files automatically. But you can copy the files yourself.

5) If I'm insanely paranoid and want to do tape backups as well, are there any existing solutions to expose the file system as a mount? I could do a full backup of each node, but our tape system doesn't de-dupe.

You can back up each volume file directly and avoid backing up the same volume more than once.
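One way to avoid backing up the same volume more than once is to inventory the volume files on every node and keep exactly one copy per volume id, since replicas of a volume hold the same data (assuming they are in sync). The minimal Go sketch below does that; the VolumeFile type, the node names and the paths are all hypothetical and only illustrate the de-duplication step, not how Weed-FS reports its volumes.

```go
package main

import (
	"fmt"
	"sort"
)

// VolumeFile is one on-disk copy of a volume found on some node.
// Hypothetical type for illustration; not part of Weed-FS.
type VolumeFile struct {
	Node string // volume server holding this copy
	VID  int    // volume id
	Path string // path to the volume's data file on that node
}

// pickOnePerVolume keeps the first copy listed for each volume id, so each
// volume is backed up once even though replicas exist on several nodes.
// The result is sorted by volume id.
func pickOnePerVolume(files []VolumeFile) []VolumeFile {
	seen := make(map[int]bool)
	var out []VolumeFile
	for _, f := range files {
		if seen[f.VID] {
			continue // a copy of this volume is already selected
		}
		seen[f.VID] = true
		out = append(out, f)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].VID < out[j].VID })
	return out
}

func main() {
	inventory := []VolumeFile{
		{"node-a", 1, "/data/1.dat"},
		{"node-b", 1, "/data/1.dat"},
		{"node-b", 2, "/data/2.dat"},
		{"node-c", 2, "/data/2.dat"},
		{"node-c", 3, "/data/3.dat"},
	}
	for _, f := range pickOnePerVolume(inventory) {
		fmt.Printf("back up volume %d from %s:%s\n", f.VID, f.Node, f.Path)
	}
}
```

Keeping the first copy listed is the simplest rule; a real backup job might instead prefer a particular node per volume, for example one that is less busy.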



Chris Lu

Feb 15, 2014, 5:31:39 PM
to weed-fil...@googlegroups.com
2) I realize I could probably play games with how a DC is defined to achieve this, but is there a native way to combine replication modes for a file? (e.g., replication=001010 to replicate once to the same rack and once to different rack in the same DC)
If I remember correctly, use replication=020.

Correction: this particular combination, writing once to the same rack and once to a different rack in the same DC, should use replication=011. However, this is not supported in the code yet. I am going to add it very soon.
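To make "011" concrete: reading the three digits as the number of replicas in other data centers, in other racks of the same data center, and on other servers in the same rack, a placement can be checked by counting where each replica sits relative to the primary copy. The minimal Go sketch below does only that counting; the Server type and the dc/rack/server names are hypothetical and this is not Weed-FS's own placement code.

```go
package main

import "fmt"

// Server locates one volume server in the topology (hypothetical type).
type Server struct {
	DC, Rack, Name string
}

// placement counts where the replicas sit relative to the primary copy:
// x = replicas in other data centers,
// y = replicas in other racks of the same data center,
// z = replicas on other servers in the same rack.
// A replica on the primary's own server is not counted.
func placement(primary Server, replicas []Server) (x, y, z int) {
	for _, r := range replicas {
		switch {
		case r.DC != primary.DC:
			x++
		case r.Rack != primary.Rack:
			y++
		case r.Name != primary.Name:
			z++
		}
	}
	return x, y, z
}

func main() {
	primary := Server{"dc1", "rack1", "s1"}
	replicas := []Server{
		{"dc1", "rack1", "s2"}, // same rack, different server
		{"dc1", "rack2", "s3"}, // different rack, same data center
	}
	x, y, z := placement(primary, replicas)
	fmt.Printf("%d%d%d\n", x, y, z) // prints 011
}
```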

ChrisLu

Mar 3, 2014, 4:34:20 AM
to weed-fil...@googlegroups.com
FYI: This specific replication type, replication=011, is in release 0.47 now.

Actually, the replication type has now been enhanced. Here is the section just added to the wiki page:

If the replication type is xyz:

x: number of replicas in other data centers
y: number of replicas in other racks in the same data center
z: number of replicas on other servers in the same rack

x, y, and z can each be 0, 1, or 2. So there are 9 possible replication types, and the scheme can easily be extended. Each replication type physically creates x+y+z+1 copies of the volume data files.



Chris