disk failure in a mirror buddy leads to data corruption

Bishoy Mikhael
Feb 21, 2018, 8:43:40 PM
to beegfs-user
Hi All,

I'm testing BeeGFS 6.17 on CentOS 7.3.1611 with kernel 4.4.116.
The setup is a simple two-node cluster plus one client node.
The cluster status (before the failure test) was as follows:

# beegfs-net

mgmt_nodes
=============
localhost.localdomain [ID: 1]
   Connections: TCP: 1 (10.118.33.208:8008);

meta_nodes
=============
localhost2.localdomain [ID: 1]
   Connections: <none>
localhost.localdomain [ID: 2]
   Connections: TCP: 1 (10.118.33.208:8005);

storage_nodes
=============
localhost2.localdomain [ID: 1]
   Connections: TCP: 1 (10.118.33.173:8003);
localhost.localdomain [ID: 3]
   Connections: TCP: 1 (10.118.33.208:8003);


# beegfs-ctl --listnodes --nodetype=storage --details
localhost2.localdomain [ID: 1]
   Ports: UDP: 8003; TCP: 8003
   Interfaces: enp0s3(TCP)
localhost.localdomain [ID: 3]
   Ports: UDP: 8003; TCP: 8003
   Interfaces: enp0s3(TCP)

The buddy mirror group is configured with target 301 as the primary and target 101 as the secondary:
# beegfs-ctl --listtargets --mirrorgroups
MirrorGroupID MGMemberType TargetID   NodeID
============= ============ ========   ======
          100      primary      301        3
          100    secondary      101        1

# beegfs-ctl --listmirrorgroups --nodetype=storage
     BuddyGroupID   PrimaryTargetID SecondaryTargetID
     ============   =============== =================
              100               301               101

The test writes 32 KB zero-filled files in an infinite loop from the client node:

# x=0; while true; do dd if=/dev/zero of=file$x bs=1K count=32; echo $x; x=$((x+1)); done
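The loop above ignores dd's exit status, so it keeps going after a failed write. A variant that stops at the first failed write and reads each file back can be sketched as follows. It assumes bash and GNU coreutils/diffutils (`dd`, `cmp`); the function name `write_and_verify` and its count argument are hypothetical. The immediate read-back may be answered from the client's cache, so it is a sanity check rather than proof that the data reached the storage target.

```shell
#!/usr/bin/env bash
# Hypothetical variant of the test loop (not from the original post):
# stop at the first failed write or read-back instead of looping forever.
# Assumes bash and GNU coreutils/diffutils (dd, cmp).
write_and_verify() {
    local dir=$1 count=$2 x f
    local size=$((32 * 1024))   # bs=1K count=32, as in the original loop
    for ((x = 0; x < count; x++)); do
        f="$dir/file$x"
        # Same write as the original loop, but check dd's exit status.
        if ! dd if=/dev/zero of="$f" bs=1K count=32 status=none; then
            echo "write failed at $f" >&2
            return 1
        fi
        # Read the file back and compare it byte-for-byte with zeros.
        # cmp fails if the file is shorter than $size or not all zeros.
        # Note: this read may be served from the client's cache.
        if ! cmp -s -n "$size" -- "$f" /dev/zero; then
            echo "read-back mismatch at $f" >&2
            return 1
        fi
        echo "$x"
    done
}
```

For example, `write_and_verify /mnt/beegfs/test 5000` run on the client (the count is arbitrary).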


While the files were being written, I failed the primary storage target (301) by removing its device from the SCSI bus:
# echo "scsi remove-single-device 5 0 0 0" > /proc/scsi/scsi

The client immediately started reporting the following write error:
dd: error writing ‘file2000’: Remote I/O error
1+0 records in
0+0 records out
0 bytes (0 B) copied, 0.0186692 s, 0.0 kB/s

Surprisingly, the storage target status still showed target 301 as online, even after I waited a few minutes before querying it:
# beegfs-ctl --listtargets --nodetype=storage --state
TargetID     Reachability  Consistency   NodeID
========     ============  ===========   ======
     101           Online         Good        1
     301           Online         Good        3

BeeGFS shows the following entry information for the file:
# beegfs-ctl --getentryinfo /mnt/beegfs/test/file2000
Path: /test/file2000
Mount: /mnt/beegfs
EntryID: A8-5A8D076C-2
Metadata buddy group: 100
Current primary metadata node: localhost.localdomain [ID: 2]
Stripe pattern details:
+ Type: Buddy Mirror
+ Chunksize: 1M
+ Number of storage targets: desired: 1; actual: 1
+ Storage mirror buddy groups:
  + 100

Meanwhile, the OS shows an empty file!
# ls -lh /mnt/beegfs/test/file2000
-rw-r--r-- 1 root root 0 Feb 20 21:45 /mnt/beegfs/test/file2000

The storage daemon log (/var/log/beegfs-storage.log) shows the following errors:
(0) Feb20 21:44:26 Worker7 [SessionLocalFile (open)] >> Failed to open chunkFile: u0/5A8D/0/0-5A8D02F8-2/37F-5A8D0730-2
(0) Feb20 21:44:26 Worker12 [ChunkStore.cpp:682] >> Failed to create file. chunkFilePathStr: u0/5A8D/0/0-5A8D02F8-2/380-5A8D0730-2; retVal: Internal error (1)
(0) Feb20 21:44:26 Worker12 [SessionLocalFile (open)] >> Failed to open chunkFile: u0/5A8D/0/0-5A8D02F8-2/380-5A8D0730-2
(0) Feb20 21:44:26 Worker10 [ChunkStore.cpp:682] >> Failed to create file. chunkFilePathStr: u0/5A8D/0/0-5A8D02F8-2/381-5A8D0730-2; retVal: Internal error (1)
.
.
.


So my questions are:
1. Why did BeeGFS keep showing the primary target as online for several minutes after the failure?
2. Why was BeeGFS able to create the file but not write to it?
3. Why does the file inspection show that the file has no problem?!


Regards,
Bishoy

Nick Tan
Feb 21, 2018, 9:25:43 PM
to fhgfs...@googlegroups.com

Hi Bishoy,

To answer your question 1: as long as the beegfs-storage process is running, BeeGFS considers the node to be up. So in your case, you would need to either shut down the server or stop the beegfs-storage process for target 301 to show as offline.
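Put differently, the reachability state tracks the daemon, not the disk behind it. To illustrate the gap, a check that does not depend on beegfs-storage can be sketched: write a small file with fsync into a scratch directory on the same filesystem as the target, then read it back. This is a hypothetical helper, not a BeeGFS feature, and the directory passed to it is an assumption about your setup.

```shell
#!/usr/bin/env bash
# Hypothetical health probe (not part of BeeGFS): check that a directory on
# the target's underlying filesystem still accepts writes, independent of
# whether the beegfs-storage process is running.
probe_dir() {
    local dir=$1
    local probe="$dir/.probe.$$"
    local payload
    payload="probe $(date +%s)"
    # Write a small file and fsync it; this should fail with an I/O error
    # if the underlying block device has gone away.
    if ! printf '%s\n' "$payload" | dd of="$probe" conv=fsync status=none 2>/dev/null; then
        echo "FAIL: cannot write to $dir" >&2
        rm -f "$probe" 2>/dev/null
        return 1
    fi
    # Read the data back. This may be served from the page cache, so the
    # fsync'ed write above is the more meaningful part of the check.
    if [ "$(cat "$probe" 2>/dev/null)" != "$payload" ]; then
        echo "FAIL: read-back mismatch in $dir" >&2
        rm -f "$probe" 2>/dev/null
        return 1
    fi
    rm -f "$probe"
    echo "OK: $dir"
}
```

For example, `probe_dir /data/target301/probe` on the storage server (path hypothetical). With the disk removed as in the original test, the fsync'ed write would be expected to fail even though the target is still listed as online.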

Nick

Bishoy Mikhael
Feb 23, 2018, 4:30:12 PM
to beegfs-user
Hi Nick,

That sounds scary!
When a target fails, the filesystem doesn't detect it and mark it as down!
That leads to data corruption: I was able to create new files on the secondary target of the buddy mirror group, but with no content!

That's a serious bug!

alexander.eekhoff
Feb 26, 2018, 3:18:06 AM
to beegfs-user
Hi Bishoy,


To better understand your issue, may I ask what you mean by being able to create new files on the secondary target? In BeeGFS, with its dedicated metadata services, it is normal that you can still create files even without running storage services, as long as you don't write to them, because creating a file only involves metadata server operations. Sending requests to the storage targets for commands like 'ls -l' would create a lot of unnecessary network traffic. So, from your comment I assume you could write files without any I/O error message while the primary storage target was still online but its underlying storage device had 'disappeared'. Is this correct?

Regarding your earlier questions: Nick already answered question 1.
Regarding question 2: the file was created by the metadata server. As long as the metadata servers are running, all metadata operations work normally.
Regarding question 3: 'ls -lh' is also only a metadata operation; no file inspection happens. To inspect a file, you have to actually open and read it, with a command like md5sum or hexdump.
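Following this suggestion, the files written by the test loop can be checked by actually reading them. A minimal sketch, assuming GNU coreutils and the `file*` naming and 32 KiB zero-filled content from the original loop; `verify_dir` is a hypothetical name. Unlike `ls -lh`, md5sum opens each file and reads all of its data, so an empty or unreadable file such as file2000 gets reported.

```shell
#!/usr/bin/env bash
# Hypothetical checker: read every file in a directory with md5sum (which
# opens the file and reads all of its data, unlike `ls -l`) and compare the
# result with the checksum of the 32 KiB zero-filled file the test wrote.
verify_dir() {
    local dir=$1 f out expected bad=0
    # Checksum of 32 KiB of zeros (bs=1K count=32 in the original loop).
    expected=$(head -c 32768 /dev/zero | md5sum)
    expected=${expected%% *}
    for f in "$dir"/file*; do
        [ -f "$f" ] || continue
        # md5sum exits non-zero on a read error; count that as a bad file.
        if ! out=$(md5sum -- "$f" 2>/dev/null); then
            echo "READ ERROR: $f"
            bad=$((bad + 1))
        elif [ "${out%% *}" != "$expected" ]; then
            echo "BAD CONTENT: $f"
            bad=$((bad + 1))
        fi
    done
    echo "$bad bad file(s)"
    [ "$bad" -eq 0 ]
}
```

For example, `verify_dir /mnt/beegfs/test` on the client.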

Best regards,

Alexander