how to properly recover a failed storage target

Tobias

Oct 9, 2018, 10:57:04 PM
to beegfs-user
Hi all,

We have 3 storage nodes with 6 or 8 storage targets each.
b1 - 6 targets
b2 - 8 targets
b3 - 6 targets

We just recently spun "b3" up to help with a temporary increase in data and were planning to decommission it again fairly soon.

We use mirror groups for all data on beegfs (after we got stung once where the directory was not set up right). 

Now target 17 which was attached to b3 died (complete disk failure). Unfortunately beegfs did not make it easy to identify what was going on. 
While accessing any files with chunks on the failed disk caused I/O errors to come up on the client, beegfs-admon-gui reported "no known issues" and beegfs-ctl said TargetID 17 was "Online" with consistency "Good". 
I checked manually on the host and the mountpoint was definitely not accessible anymore - lots of ugly unhappiness in dmesg and disk confirmed dead when tested on another PC.

I added a new disk to b3 and followed the wiki to make it use the same TargetID and do a full re-sync of the disk. I think this is what I followed: https://www.beegfs.io/wiki/StorageSynchronizationConstruction

Now everything looks happy again. Except the re-sync took only a few seconds and left the disk "empty" (except for the config files). 

# beegfs-ctl --listtargets --nodetype=storage --longnodes --state
TargetID     Reachability  Consistency   NodeID
========     ============  ===========   ======
       1     Online        Good          beegfs-storage b1 [ID: 1]
       2     Online        Good          beegfs-storage b1 [ID: 1]
       3     Online        Good          beegfs-storage b1 [ID: 1]
       4     Online        Good          beegfs-storage b1 [ID: 1]
       5     Online        Good          beegfs-storage b1 [ID: 1]
       6     Online        Good          beegfs-storage b1 [ID: 1]
       7     Online        Good          beegfs-storage b2 [ID: 2]
       8     Online        Good          beegfs-storage b2 [ID: 2]
      10     Online        Good          beegfs-storage b2 [ID: 2]
      11     Online        Good          beegfs-storage b2 [ID: 2]
      12     Online        Good          beegfs-storage b2 [ID: 2]
      13     Online        Good          beegfs-storage b2 [ID: 2]
      14     Online        Good          beegfs-storage b2 [ID: 2]
      15     Online        Good          beegfs-storage b2 [ID: 2]
      16     Online        Good          beegfs-storage b3 [ID: 3]
      17     Online        Good          beegfs-storage b3 [ID: 3]
      18     Online        Good          beegfs-storage b3 [ID: 3]
      19     Online        Good          beegfs-storage b3 [ID: 3]
      20     Online        Good          beegfs-storage b3 [ID: 3]
      21     Online        Good          beegfs-storage b3 [ID: 3]



# beegfs-ctl --listtargets --mirrorgroups
MirrorGroupID MGMemberType TargetID   NodeID
============= ============ ========   ======
          101      primary        1        1
          101    secondary        2        1
          102      primary        3        1
          102    secondary        4        1
          103      primary        5        1
          103    secondary        6        1
          201      primary        7        2
          201    secondary        8        2
          203      primary       11        2
          203    secondary       12        2
          204      primary       13        2
          204    secondary       14        2
          205      primary       15        2
          205    secondary       10        2
          301      primary       16        3
          301    secondary       17        3
          302      primary       18        3
          302    secondary       19        3
          303      primary       20        3
          303    secondary       21        3


Looking at some files which were on the failed storage device (17) things also appear good on the surface.

# beegfs-ctl --getentryinfo --verbose /mnt/beegfs/storage/video-1525866804.mp4
EntryID: 1F-5BAD6B8E-1
Metadata buddy group: 1
Current primary metadata node: mb1 [ID: 1]
Stripe pattern details:
+ Type: RAID10
+ Chunksize: 512K
+ Number of storage targets: desired: 4; actual: 4
+ Storage targets:
  + 1 @ b1 [ID: 1]
  + 8 @ b2 [ID: 2]
  + 11 @ b2 [ID: 2]
  + 17 @ b3 [ID: 3]
Chunk path: u3E9/5BAD/6/1E-5BAD6B8E-1/1F-5BAD6B8E-1
Dentry path: 6B/4F/1E-5BAD6B8E-1/
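For reference, here is the striping arithmetic as I understand it. Assuming plain round-robin striping over the four listed targets with the reported 512K chunk size (my assumption; the RAID10 mirror placement may change which targets actually carry data), a byte offset would map to a target like this:

```python
CHUNK_SIZE = 512 * 1024  # 512K, as reported by --getentryinfo
TARGETS = [1, 8, 11, 17]  # storage targets listed for the file above

def target_for_offset(offset, targets=TARGETS, chunk_size=CHUNK_SIZE):
    """Map a byte offset to the storage target holding it,
    assuming plain round-robin striping (my assumption)."""
    chunk_index = offset // chunk_size
    return targets[chunk_index % len(targets)]

# Under this assumption, every 4th chunk of the file would have
# lived on the dead target 17.
print([target_for_offset(i * CHUNK_SIZE) for i in range(8)])
# -> [1, 8, 11, 17, 1, 8, 11, 17]
```

If that is roughly right, it would explain why the corruption shows up as regularly spaced null regions rather than a single missing tail.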


I can access the file again (e.g. cp /mnt/beegfs/storage/video-1525866804.mp4 /mnt/beegfs/storage/video-1525866804-copy.mp4 without I/O error) and the destination and source files both have a reasonable file size. Running md5sum on the new copy and the existing copy shows identical checksums.

Looking at the content of the files confirms that the success is only superficial.
The video is heavily corrupted.
A hexdump confirms there are big chunks of data that just consist of null bytes.

It appears to me as if the chunks of the file that used to be on target 17 just get filled with blanks (null bytes).

The buddy mirrors were set up on beegfs version 6 and we have since moved to version 7.
I can see that any newly created files / directories with buddy mirror enabled report the stripe pattern as "Buddy Mirror" instead of "RAID10" and files show the mirror groups instead of storage targets.

This overall situation concerns me as it looks like beegfs is eating data / covering up the fact that data is not accessible.

Now if the file is stored in a "RAID10" pattern, it should mean that in the above example two of the storage targets 1,8,11,17 will each have identical chunks of this file. Or it means target 16 (which is the mirror buddy of 17) has a copy of the chunk. 

I have a few questions:
- How can I find out where the mirrored chunks are stored?
- How can I recover the data? (Yes we have a backup but fetching files from tape is slow and we would first need to identify which files are affected, so I would rather restore from beegfs).
- What did I do wrong? (E.g. Why did beegfs not do a failover and recover automatically? Any other beginner's mistakes?)
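On the "which files are affected" question, one approach I am considering is walking the tree and parsing `beegfs-ctl --getentryinfo` output for each file to see whether target 17 appears. A sketch, with the output format taken from the example earlier in this mail (adjust the regex if your version prints differently; `file_uses_target` is a hypothetical wrapper, untested against a live cluster):

```python
import re
import subprocess

SAMPLE = """\
+ Storage targets:
+ 1 @ b1 [ID: 1]
+ 8 @ b2 [ID: 2]
+ 11 @ b2 [ID: 2]
+ 17 @ b3 [ID: 3]
"""

def parse_targets(entryinfo_output):
    """Extract storage-target IDs from --getentryinfo output;
    lines look like '+ 17 @ b3 [ID: 3]'."""
    return [int(m.group(1))
            for m in re.finditer(r"^\+\s*(\d+)\s*@", entryinfo_output, re.M)]

def file_uses_target(path, target_id):
    """Hypothetical wrapper: run beegfs-ctl for one file and check
    whether target_id appears among its storage targets."""
    out = subprocess.run(
        ["beegfs-ctl", "--getentryinfo", "--verbose", path],
        capture_output=True, text=True, check=True).stdout
    return target_id in parse_targets(out)

print(parse_targets(SAMPLE))  # -> [1, 8, 11, 17]
```

Feeding it paths from find would then produce the restore list for the backup system.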

Happy to give any further details if that helps answering my questions.

Many thanks
Tobias

Nick Tan

Oct 10, 2018, 8:48:32 PM
to fhgfs...@googlegroups.com

Hi Tobias,

I am not familiar with the RAID10 stripe type, so I can't help you there, but I can tell you why the failover didn't happen.

BeeGFS buddy mirroring is set up for host failures, not target failures. If you have multiple targets on one host, BeeGFS thinks everything is fine as long as the host is up. If a target fails, you need to stop the beegfs-storage process on that host for the failover to kick in, and then it applies to all the targets on that node rather than just the failed target.

I think BeeGFS was designed to have a single RAID target per storage server, which is why the buddy mirroring behaves as it does. Unfortunately this behaviour isn't very well documented.

Thanks,
Nick


Tobias

Oct 10, 2018, 9:13:02 PM
to beegfs-user
Hi Nick,

I *think* the RAID10 stripe type was in there either as the default or from when I set up buddy mirroring under BeeGFS version 6. Can anyone confirm this?

Thanks for that bit of info regarding a single target per host. I did notice the wiki pages talk about RAID set-ups on hosts, but I never understood that this was due to the way buddy mirroring / BeeGFS was designed (I thought the RAID 5 or 6 was just to get more usable disk space than a full mirror would give).

Wow ... so you're saying my mirror groups that are all set up on a single host (targets 1,2 on host 1 / targets 3,4 on host 1 / up to targets 20,21 on host 3) are actually useless, because they will never fail over unless the host dies, in which case neither of the two mirrors would be available.

I'm a bit disappointed this (crucial) bit of information is not explicitly documented and not even enforced in the management software (beegfs-ctl or beegfs-admon-gui). If this set-up is not supported then beegfs-ctl should not permit buddy groups on a single host without a warning.
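Since beegfs-ctl itself doesn't warn about this, I put together a quick check against the `--listtargets --mirrorgroups` output (format as pasted in my first mail; the excerpt below is from my own listing). It flags any buddy group whose primary and secondary share a NodeID:

```python
import re

# Excerpt pasted from `beegfs-ctl --listtargets --mirrorgroups`
LISTING = """\
101      primary        1        1
101    secondary        2        1
301      primary       16        3
301    secondary       17        3
"""

def same_node_groups(listing):
    """Return mirror group IDs whose primary and secondary live on
    the same NodeID -- i.e. pairs that (per Nick's explanation) cannot
    fail over on a target failure when one beegfs-storage process
    runs per host."""
    nodes = {}  # group id -> set of node ids seen
    for line in listing.splitlines():
        m = re.match(r"\s*(\d+)\s+(?:primary|secondary)\s+\d+\s+(\d+)", line)
        if m:
            nodes.setdefault(int(m.group(1)), set()).add(int(m.group(2)))
    return sorted(g for g, n in nodes.items() if len(n) == 1)

print(same_node_groups(LISTING))  # -> [101, 301]
```

In my case every group comes back flagged, which matches what happened with target 17.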

OK, looks like I have to go and reconfigure my mirrors and shuffle data now - and hope nothing breaks on the way.

If anyone else here can pitch in with some more intel on other quirks or where I might find the mirrored data (if it even exists) on my setup as per below, that would be much appreciated.

Cheers 
Tobias

Tobias

Oct 12, 2018, 12:03:36 AM
to beegfs-user
Hi all, 

Just went back to the wiki and came across a section that does not line up with my experience and Nick's comment below.

Could possibly one of the beegfs team shed some light here?

In general, a storage buddy group could even be composed of two targets that are attached to the same server.
[...]
If the primary storage target [...] of a buddy group is unreachable, it will get marked as offline and a failover to the secondary will be issued. In this case, the former secondary will become the new primary.

As far as I can tell the failover does not happen for targets on the same server and according to Nick having multiple targets on one server is not how the system is designed to operate.

Any input appreciated.

Thanks
Tobias

Nick Tan

Oct 12, 2018, 12:23:32 AM
to fhgfs...@googlegroups.com

That’s interesting that the documentation says storage buddy groups can be two targets on the same server.  If that was the case, failover would never occur when a target fails.  Unless you run multi-mode and have a beegfs-storage process per target.  Failover only starts when communication is lost to the beegfs-storage process (that is my understanding anyway).
