Native cache mode on new kernels


Philipp Falk

Aug 19, 2022, 5:49:10 AM
to BeeGFS User Group
Hello everyone,

shortly after our 7.3.1 release last week, we discovered an issue with the
native cache mode on Linux versions >5.13, which are used in Ubuntu 22.04
and RHEL 9.

Due to a change in the semantics of the 'enum iter_type' in Linux's uio.h,
a check for a specific type of iterator in the BeeGFS code returned a
wrong result, which led to an iterator not being advanced correctly in the
read code path. Depending on the combination of chunksize and I/O size,
this caused some pages in the cache to contain corrupt data.
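
The exact check is not shown in this message, so the following is only a sketch of the general class of bug, not the BeeGFS code or a copy of uio.h. It assumes a type check written as a bitmask test, and uses made-up values: the `OLD_*` names stand for flag-style values (one bit per type), the `NEW_*` names for consecutive integers. The helper functions `is_type_mask` and `is_type_eq` are hypothetical.

```shell
#!/bin/sh
# Hypothetical flag-style values: one bit per iterator type.
OLD_IOVEC=4; OLD_KVEC=8; OLD_BVEC=16
# Hypothetical sequential values: plain consecutive integers.
NEW_IOVEC=0; NEW_KVEC=1; NEW_BVEC=2; NEW_PIPE=3

# Bitmask test: "is any bit of $2 set in $1?"
is_type_mask() { [ $(( $1 & $2 )) -ne 0 ]; }
# Equality test: "is $1 exactly $2?"
is_type_eq() { [ "$1" -eq "$2" ]; }

# With flag values, the mask test tells the types apart correctly.
is_type_mask "$OLD_BVEC" "$OLD_BVEC" && echo "flags: bvec matches bvec"
is_type_mask "$OLD_KVEC" "$OLD_BVEC" || echo "flags: kvec does not match bvec"

# With sequential values, the same mask test gives wrong answers:
# 3 & 2 is non-zero, so a pipe iterator is mistaken for a bvec iterator,
# and 0 & 0 is zero, so an iovec iterator never matches itself.
is_type_mask "$NEW_PIPE" "$NEW_BVEC" && echo "sequential: pipe wrongly matches bvec"
is_type_mask "$NEW_IOVEC" "$NEW_IOVEC" || echo "sequential: iovec fails to match itself"

# An equality test stays correct for sequential values.
is_type_eq "$NEW_PIPE" "$NEW_BVEC" || echo "equality: pipe does not match bvec"
```

A check that silently takes the wrong branch in this way would be consistent with the symptom described above, where an iterator is not advanced and later reads land on the wrong data.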

I have attached a patch to this email that fixes the check and prevents
the cache corruption on new kernels. The patch can be applied as follows,
depending on which version of the client package you use:

For beegfs-client-dkms:

$ cd /usr/src/beegfs-7.3.1
$ patch -p2 < /path/to/native_mode_linux_5.13.patch
$ dkms remove beegfs/7.3.1 -k $(uname -r)
$ dkms install beegfs/7.3.1 -k $(uname -r)

Or if you use the beegfs-client package:

$ cd /opt/beegfs/src/client/client_module_7
$ patch -p2 < /path/to/native_mode_linux_5.13.patch
$ /etc/init.d/beegfs-client rebuild

We advise against using the native cache mode on kernel versions >5.13 with
an unpatched version 7.3.1 of the BeeGFS module.
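
To decide whether a given client node is affected, the running kernel can be compared against 5.13. The following is a minimal sketch, not an official tool: the function name `kernel_newer_than` is hypothetical, and it assumes that ">5.13" means a major.minor version of 5.14 or later, so any 5.13.x release counts as not newer.

```shell
#!/bin/sh
# Succeeds (exit status 0) if kernel release $1 has a major.minor version
# strictly greater than the threshold $2 (given as "major.minor").
kernel_newer_than() {
    rel_major=${1%%.*}            # "5.15.0-46-generic" -> "5"
    rest=${1#*.}                  # "5.15.0-46-generic" -> "15.0-46-generic"
    rel_minor=${rest%%[!0-9]*}    # "15.0-46-generic"   -> "15"
    thr_major=${2%%.*}
    thr_minor=${2#*.}
    [ "$rel_major" -gt "$thr_major" ] && return 0
    [ "$rel_major" -eq "$thr_major" ] && [ "$rel_minor" -gt "$thr_minor" ]
}

if kernel_newer_than "$(uname -r)" 5.13; then
    echo "kernel $(uname -r) is newer than 5.13: patch 7.3.1 before using native cache mode"
else
    echo "kernel $(uname -r) is 5.13 or older"
fi
```

Distribution kernels sometimes backport upstream changes, so the version number alone is only a first indication.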

Best regards
- Philipp

--
Philipp Falk | Head of Engineering | m: philip...@thinkparq.com
ThinkParQ GmbH | Trippstadter Strasse 113 | 67663 Kaiserslautern | Germany
CEO: Frank Herold | COB: Dr. Franz-Josef Pfreundt | Registered: Amtsgericht Kaiserslautern HRB 31565 I VAT-ID-No.:DE 292001792

native_mode_linux_5.13.patch

yunhua li

Sep 23, 2022, 3:03:02 AM
to beegfs-user
Hi, thanks for sharing the solution. We applied the patch on a 5.15 kernel with BeeGFS 7.3.1, but we still get a crash with a stack trace; the trace only changed a little. I have also attached the full log. Any idea? Thanks.
 $ uname -a
Linux mm-idc-cpu-10-50-1-68 5.15.0-46-generic #49~20.04.1-Ubuntu SMP Thu Aug 4 19:15:44 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Sep 22 17:52:36 [  287.737699] Call Trace:
Sep 22 17:52:36 [  287.737702]  <TASK>
Sep 22 17:52:36 [  287.737708]  _copy_to_iter+0x13f/0x6a0
Sep 22 17:52:36 [  287.737718]  ? mlx5_ib_post_send+0xab2/0x18d0 [mlx5_ib]
Sep 22 17:52:36 [  287.737752]  __IBVSocket_recvContinueIncomplete+0xb1/0x1e0 [beegfs]
Sep 22 17:52:36 [  287.737783]  IBVSocket_recvT+0x2f/0x60 [beegfs]
Sep 22 17:52:36 [  287.737810]  _RDMASocket_recvT+0x12/0x20 [beegfs]
Sep 22 17:52:36 [  287.737835]  __commkit_readfile_receive+0xe5/0x150 [beegfs]
Sep 22 17:52:36 [  287.737860]  __commkit_readfile_recvdata+0x8c/0x1d0 [beegfs]
Sep 22 17:52:36 [  287.737884]  ? __cond_resched+0x19/0x40
Sep 22 17:52:36 [  287.737891]  ? mutex_lock+0x13/0x40
Sep 22 17:52:36 [  287.737897]  FhgfsOpsCommkit_communicate+0x4e1/0xf60 [beegfs]


crash.log

yunhua li

Sep 23, 2022, 3:03:02 AM
to beegfs-user
Hi, thanks for sharing the solution. We are using a 5.15 kernel and BeeGFS 7.3.1 and applied your patch, but still got a crash. I have also attached the full log here. Any idea? Thanks.

Sep 22 17:52:36 [  287.737699] Call Trace:
Sep 22 17:52:36 [  287.737702]  <TASK>
Sep 22 17:52:36 [  287.737708]  _copy_to_iter+0x13f/0x6a0
Sep 22 17:52:36 [  287.737718]  ? mlx5_ib_post_send+0xab2/0x18d0 [mlx5_ib]
Sep 22 17:52:36 [  287.737752]  __IBVSocket_recvContinueIncomplete+0xb1/0x1e0 [beegfs]
Sep 22 17:52:36 [  287.737783]  IBVSocket_recvT+0x2f/0x60 [beegfs]
Sep 22 17:52:36 [  287.737810]  _RDMASocket_recvT+0x12/0x20 [beegfs]
Sep 22 17:52:36 [  287.737835]  __commkit_readfile_receive+0xe5/0x150 [beegfs]
Sep 22 17:52:36 [  287.737860]  __commkit_readfile_recvdata+0x8c/0x1d0 [beegfs]
Sep 22 17:52:36 [  287.737884]  ? __cond_resched+0x19/0x40
Sep 22 17:52:36 [  287.737891]  ? mutex_lock+0x13/0x40
Sep 22 17:52:36 [  287.737897]  FhgfsOpsCommkit_communicate+0x4e1/0xf60 [beegfs]
Sep 22 17:52:36 [  287.737924]  FhgfsOpsCommKit_readfileV2bCommunicate+0x34/0x50 [beegfs]
Sep 22 17:52:36 [  287.737948]  ? writefile_nextIter+0x50/0x50 [beegfs]
Sep 22 17:52:36 [  287.738011]  FhgfsOpsRemoting_readfileVec+0x3f8/0x610 [beegfs]
Sep 22 17:52:36 [  287.738041]  FhgfsOpsHelper_readCached+0x13d/0x3b0 [beegfs]
Sep 22 17:52:36 [  287.738073]  ? from_kgid+0x12/0x20
Sep 22 17:52:36 [  287.738081]  ? __FhgfsInode_initOpenIOInfo+0x7e/0x90 [beegfs]
Sep 22 17:52:36 [  287.738113]  read_common+0xd7/0x1b0 [beegfs]
Sep 22 17:52:36 [  287.738144]  FhgfsOps_buffered_read_iter+0x5d/0xb0 [beegfs]

crash.log