--" PENAFIAN: E-mel ini dan apa-apa fail yang dikepilkan bersamanya ("Mesej") adalah ditujukan hanya untuk kegunaan penerima(-penerima) yang termaklum di atas dan mungkin mengandungi maklumat sulit. Anda dengan ini dimaklumkan bahawa mengambil apa jua tindakan bersandarkan kepada, membuat penilaian, mengulang hantar, menghebah, mengedar, mencetak, atau menyalin Mesej ini atau sebahagian daripadanya oleh sesiapa selain daripada penerima(-penerima) yang termaklum di atas adalah dilarang. Jika anda telah menerima Mesej ini kerana kesilapan, anda mesti menghapuskan Mesej ini dengan segera dan memaklumkan kepada penghantar Mesej ini menerusi balasan e-mel. Pendapat-pendapat, rumusan-rumusan, dan sebarang maklumat lain di dalam Mesej ini yang tidak berkait dengan urusan rasmi Universiti Malaya adalah difahami sebagai bukan dikeluar atau diperakui oleh mana-mana pihak yang disebut.DISCLAIMER: This e-mail and any files transmitted with it ("Message") is intended only for the use of the recipient(s) named above and may contain confidential information. You are hereby notified that the taking of any action in reliance upon, or any review, retransmission, dissemination, distribution, printing or copying of this Message or any part thereof by anyone other than the intended recipient(s) is strictly prohibited. If you have received this Message in error, you should delete this Message immediately and advise the sender by return e-mail. Opinions, conclusions and other information in this Message that do not relate to the official business of University of Malaya shall be understood as neither given nor endorsed by any of the forementioned. "
You received this message because you are subscribed to the Google Groups "beegfs-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to fhgfs-user+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/fhgfs-user/f117689c-3974-4d55-8125-453d1561fff6n%40googlegroups.com.
https://doc.beegfs.io/latest/license.html - section 3.4
Multiple metadata nodes are for scaling out metadata performance. There are plenty of workloads out there that thrash metadata services, and the more metadata processes you have, the better your chances of handling those bad workloads. You will find that only so many metadata processes can run on a single node before it stops scaling well; adding nodes scales better.
You could probably do metadata failover without buddy mirroring if you have dual-ported drives with two servers connected to them.
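In case a concrete sketch helps: "more metadata processes on a node" is usually done by giving each beegfs-meta instance its own config and metadata directory and starting it through a systemd template unit (the same idea as the multi-mode mentioned later in this thread, and the systemd:beegfs-meta@meta1 resources in the pcs output below). The unit path, the per-instance config location and the instance names here are assumptions for illustration only:

# /etc/systemd/system/beegfs-meta@.service (illustrative template, not the stock unit)
[Unit]
Description=BeeGFS metadata server instance %i
After=network-online.target

[Service]
# each instance gets its own config with a distinct storeMetaDirectory
# and distinct connMetaPortTCP/connMetaPortUDP
ExecStart=/opt/beegfs/sbin/beegfs-meta cfgFile=/etc/beegfs/meta-%i/beegfs-meta.conf runDaemonized=false
Restart=on-failure

[Install]
WantedBy=multi-user.target

On a standalone node you would enable the instances with "systemctl enable --now beegfs-meta@meta1 beegfs-meta@meta2"; under Pacemaker you leave them disabled and let the cluster start them as systemd:beegfs-meta@meta1 style resources instead.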
Cheers,
Greg
I'm also running Meta and Storage in HA mode, basically because we only have 1 Meta device and 1 Storage device, each shared between 2 Meta servers and 2 Storage servers respectively via FC.
Each storage/meta device is divided into 2 pools.
Each server is active for one pool, so both (meta) servers are being utilized, and each is also able to take over the second pool in case of maintenance or a problem.
# pcs status
Cluster name: meta_cluster
Cluster Summary:
  * Stack: corosync
  * Current DC: mgs-1 (version 2.1.2-4.el8_6.2-ada5c3b36e2) - partition with quorum
  * Last updated: Wed Jun 21 13:41:45 2023
  * Last change: Wed Jun 21 13:41:37 2023 by hacluster via crmd on mgs-1
  * 2 nodes configured
  * 8 resource instances configured

Node List:
  * Online: [ mgs-1 mgs-2 ]

Full List of Resources:
  * Resource Group: beegfs_metadata1:
    * VIP-metadata1     (ocf::heartbeat:IPaddr2):      Started mgs-1
    * disk-metadata1    (ocf::heartbeat:Filesystem):   Started mgs-1
    * beegfs-metadata1  (systemd:beegfs-meta@meta1):   Started mgs-1
  * Resource Group: beegfs_metadata2:
    * VIP-metadata2     (ocf::heartbeat:IPaddr2):      Started mgs-2
    * disk-metadata2    (ocf::heartbeat:Filesystem):   Started mgs-2
    * beegfs-metadata2  (systemd:beegfs-meta@meta2):   Started mgs-2
  * ipmi-mgs-1  (stonith:fence_ipmilanplus):   Started mgs-1
  * ipmi-mgs-2  (stonith:fence_ipmilanplus):   Started mgs-2

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
Same for storage.
It's not a commercial solution but open source, and it has been running fine for the last 2 years.
G
Hi John,
That is a bit of a naïve question I hear all the time from management. Yes, it is scratch storage. Now consider the impact of an outage to a scratch storage system, HA or otherwise. Say you have a modest compute cluster of 500 nodes: what do those 500 nodes do while the scratch filesystem is down? Sure, some workloads might be OK if they are not I/O intensive, but effectively HPC production stops. Now add in data loss on the scratch FS, perhaps from jobs that have just completed and whose results are still sitting there; there is now the cost of rerunning all those jobs to recompute the lost results. Finally, I don't believe there is any such thing as "true scratch": the effort and I/O bandwidth involved in moving data on and off scratch mean that in reality data lives there for a while. I certainly want to minimise the copy in/out process, as it takes IOPS that workloads could be using, and I really don't want those 500 nodes spinning waiting for data.
Cheers,
Greg
From: fhgfs...@googlegroups.com <fhgfs...@googlegroups.com> On Behalf Of John Hearns
Sent: Wednesday, June 21, 2023 8:40 PM
To: fhgfs...@googlegroups.com
Subject: Re: [beegfs-user] Metadata Server High Availability
If it is truly scratch storage, why do you require HA?
Failover does indeed work without buddy mirroring. You will need the data available to both servers, though, but it will be mounted on only one server at a time through Pacemaker.
The data could be made available over FC, SAS or DRBD. As soon as one node fails, Pacemaker will bring up the IP, the mountpoint and the BeeGFS service on the standby node.
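As a rough illustration of that takeover (node names borrowed from the pcs output earlier in the thread, so adjust to your own), a manual failover drill could look like:

# put the active node into standby; Pacemaker should move the VIP,
# the filesystem mount and the BeeGFS service to the other node
pcs node standby mgs-1
pcs status          # the resource group should now be running on mgs-2

# bring the node back; whether the group moves back depends on your
# stickiness and location constraints
pcs node unstandby mgs-1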
D.
From: fhgfs...@googlegroups.com <fhgfs...@googlegroups.com> on behalf of Ticonderoga <tico...@gmail.com>
Date: Wednesday, 21 June 2023 00:29
To: fhgfs...@googlegroups.com <fhgfs...@googlegroups.com>
Subject: Re: [beegfs-user] Metadata Server High Availability
Do you use IPoIB on your cluster? You only need the IP address for the initial handshake; after that it talks RDMA between clients and servers. Are the clusters on different subnets from the storage?
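As a quick check, assuming the standard beegfs-utils tools are installed, something like this on a client shows whether the established connections are actually RDMA rather than TCP:

beegfs-net                                                # connection list per node, RDMA vs TCP
beegfs-ctl --listnodes --nodetype=meta --nicdetails       # NICs each metadata node advertises
beegfs-ctl --listnodes --nodetype=storage --nicdetails    # same for the storage nodes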
I have deployed clusters with multi-mode running up to 11 metadata targets shared across two metadata servers, using Pacemaker for HA, and the failover works great.
D.
Hello, Chang,
For metadata we have two Dell R750s connected via Fibre Channel to one Unity storage array. It was designed that way, but I would go with a SAS connection between the servers and the storage instead; it has lower latency.
We have 11 RAID1 pools across 22 SSDs on the Unity storage. Through Pacemaker we run 5 targets on the first server and 6 on the second one.
The management service is tied to the same group as the first beegfs-metadata service, and its data lives inside the volume of the first mount point, so wherever meta1 is running, the mgmtd is also running.
We use one virtual IP managed by Pacemaker for each service (that is, 12 VIPs: one per meta plus one for the mgmtd).
On the storage target nodes we have a very similar setup: there are 32 LUNs running on 4 servers, 8 LUNs per server. Each pair of servers has access to the same storage through SAS and is the failover pair of the other.
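In case it is useful to see the mgmtd part spelled out, tying the management service to the first metadata group could look roughly like this; the resource and group names are borrowed from the pcs output earlier in the thread purely for illustration:

# add the mgmtd as one more resource in the meta1 group, started after
# the first metadata service
pcs resource create beegfs-mgmtd systemd:beegfs-mgmtd \
    --group beegfs_metadata1 --after beegfs-metadata1

# beegfs-mgmtd.conf then points storeMgmtdDirectory at a directory inside
# the meta1 mount point, so the management data follows that filesystem
# wherever the group runs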
Let me know if I can be of any further help.
D.
Hello, Chang,
I would create a set of services for each LUN, like:
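Something along these lines; the IPs, device paths, mount points and the beegfs-storage@lun1 instance name are placeholders, and the BeeGFS resource could just as well be a beegfs-meta@ instance depending on what the LUN holds:

pcs resource create vip-lun1  ocf:heartbeat:IPaddr2    ip=192.168.10.101 cidr_netmask=24
pcs resource create disk-lun1 ocf:heartbeat:Filesystem device=/dev/mapper/lun1 directory=/data/lun1 fstype=xfs
pcs resource create beegfs-lun1 systemd:beegfs-storage@lun1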
Then create a group for each set of services (you will end up with 8 different groups).
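For example, again with placeholder names matching the resources above:

for LUNS in 1 2 3 4 5 6 7 8; do
    pcs resource group add group_${LUNS} vip-lun${LUNS} disk-lun${LUNS} beegfs-lun${LUNS}
done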
Then, on Pacemaker, I would set a location preference constraint for each group, like:
for LUNS in 1 2 3 4; do pcs constraint location group_${LUNS} prefers server01=50 server02=100; done
for LUNS in 5 6 7 8; do pcs constraint location group_${LUNS} prefers server01=100 server02=50; done
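The node with the higher score wins, so with the scores above groups 1-4 would prefer server02 and groups 5-8 would prefer server01. To double-check the constraints and where everything ends up:

pcs constraint    # lists the location preferences for each group
pcs status        # shows which server each group is currently running on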
What do you think?
D.