Looking for some advice on distributed FS: Is BeeGFS the right option for me?


Jones de Andrade

Jul 10, 2018, 12:37:42 PM
to beegfs-user
Hi all.

I'm looking for some information on several distributed filesystems for our application.

It looks like it finally came down to two candidates, BeeGFS being one of them. But there are still a few questions about it that I would really like to clarify, if possible.

Our plan, initially on 6 workstations, is to host a distributed file system that can withstand two simultaneous computer failures without data loss (something resembling RAID 6, but over the network). This file system will also need to be remotely mounted (NFS server with fallbacks) by 5+ other computers. Students will be working on all 11+ computers at the same time (different requirements from different software: some use many small files, others use a few really big files, hundreds of GB each), and absolutely no hardware modifications are allowed. This initial test bed is for undergraduate student use, but if successful it will also be deployed on our small clusters. The connection is a simple GbE.

Our actual concerns are:
1) Data resilience: We have seen that the buddy mirroring option is available. However, can it be used to stripe parity data among three computers for each block? If it can only deal in "pairs", nothing prevents a double failure from happening precisely at those two computers, and that would be a show-stopper for our usage.

2) Metadata resilience: We have seen that we can have more than a single metadata server. However, do they have to be dedicated boxes, or can they share boxes with the data servers? Can they be configured in such a way that even if two metadata server computers fail, the whole system's data will still be accessible from the remaining computers without interruption? Or do they hold different data, aiming only at performance?

3) Compatibility with other software: We have seen that there is NFS compatibility, but is it available in the free version too, or only in the paid, supported version? Also, are there any POSIX issues?

4) No single (or double) point of failure: every single component has to be able to endure a *double* failure (yes, repairs can take time). Does it need a single master server for any of its activities? Can it endure a double failure? How long would any sort of failover take to complete, and would users need to wait to regain access?

I think that covers our initial questions. Sorry if this is the wrong list, however.

Looking forward to any answers or suggestions,

Regards,

Jones

Nick Tan

Jul 11, 2018, 8:31:11 PM
to fhgfs...@googlegroups.com

Hi Jones,

 

There’s no method of 3x parity distribution in BeeGFS. The only way to protect against node failure is buddy mirroring. And as you have pointed out, it will not protect against 2 node failures if the 2 nodes are in the same buddy pair.
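For reference, buddy mirror groups are managed with `beegfs-ctl`. A rough sketch, based on the BeeGFS 7 admin tooling as I recall it (the directory path is illustrative, and you should double-check the exact flags against the docs for your version):

```shell
# Let BeeGFS pair storage targets into buddy mirror groups automatically
# (this assumes an even number of storage targets).
beegfs-ctl --addmirrorgroup --automatic --nodetype=storage

# Do the same for metadata servers, then activate metadata mirroring:
beegfs-ctl --addmirrorgroup --automatic --nodetype=meta
beegfs-ctl --mirrormd

# Enable buddy mirroring for new files under a chosen directory:
beegfs-ctl --setpattern --pattern=buddymirror /mnt/beegfs/mirrored
```

Note that mirroring is still strictly pairwise: each group has one primary and one secondary, which is exactly why a double failure inside one group loses data.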

 

I believe you can mix metadata and storage nodes in the same physical machine.  I’d recommend using different targets though.  Again, metadata can only be mirrored, so if you lose 2 metadata nodes and they are both in the same buddy pair, you will lose access to those files.  I think you’ll still be able to see the files that are served from the other metadata servers though.
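As a sketch of that combined layout, assuming the stock `beegfs-setup-*` helper scripts shipped with the BeeGFS packages (hostnames, paths, and numeric IDs here are made up): each box runs a metadata service on a fast SSD-backed path and a storage service on a separate target.

```shell
# On each combined node: metadata on a small, low-latency SSD partition...
/opt/beegfs/sbin/beegfs-setup-meta -p /ssd/beegfs_meta -s 1 -m mgmt-host

# ...and a storage target on the big disks, registered as its own target ID.
/opt/beegfs/sbin/beegfs-setup-storage -p /data/beegfs_storage -s 1 -i 101 -m mgmt-host
```

Then start the `beegfs-meta` and `beegfs-storage` services on each node as usual.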

 

You can share via NFS using a beegfs-client machine that also runs an NFS server daemon. This isn’t a paid feature; you can do it in the free version. I haven’t come across any POSIX issues with this.

 

I think the only single point of failure would be the management node (please correct me if I’m wrong). I don’t think the management node can be buddy mirrored. However, if you run it in an HA virtual machine cluster you should be fine.

 

Hope this helps.

 

Nick

 


James Burton

Jul 12, 2018, 11:33:53 AM
to fhgfs...@googlegroups.com
Metadata and storage can be mixed in the same physical machine, but are usually different targets. Metadata needs a small amount of low-latency storage (often SSD/NVMe), while storage usually needs big cheap drives. (Big, fast, cheap: Pick any two) If you're using 1GbE, you'll be network-bound even with relatively slow storage.

The beegfs-client automatically mounts the remote filesystem on the client (at /mnt/beegfs by default). If you need NFS, you can export this mounted directory, although funneling all client requests through a single NFS server negates many of the advantages of a parallel filesystem like BeeGFS.
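As a sketch, re-exporting the default /mnt/beegfs mount from one beegfs-client node over NFS (the subnet and export options are illustrative, not a recommendation):

```shell
# On the beegfs-client machine acting as the NFS gateway:
echo '/mnt/beegfs 192.168.1.0/24(rw,no_subtree_check,sync)' >> /etc/exports
exportfs -ra

# On an NFS-only client:
mount -t nfs gateway-host:/mnt/beegfs /mnt/beegfs
```

For the "fallbacks" mentioned in the original question, you would need something like a floating IP over two such gateway nodes; NFS itself does not provide that.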

The management node is a single point of failure, but it's a small service that can be mirrored through standard HA practices. 


Jim


