Low IOPS Performance over RDMA - Seeking Advice


TennisBowling

Jan 27, 2025, 1:00:01 PM
to beegfs-user
Hello all,

I am experiencing unexpectedly low IOPS performance with my BeeGFS cluster and would greatly appreciate any insights or troubleshooting advice you can offer.

Problem:

My BeeGFS setup is delivering significantly lower IOPS than I expect, especially compared to the direct performance of the NVMe SSDs I am using. I get around 6,000 read and 6,000 write IOPS when running `fio` with a 4KB random read/write workload at iodepth 32 against my BeeGFS mount (`/mnt/beegfs/fast`). In contrast, when testing the NVMe SSDs directly on the storage servers with the same `fio` parameters, I get around 70,000 read and write IOPS. Exporting one of the drives to the other node via NFS over RDMA costs only a little of that, nowhere near the loss I see through BeeGFS.
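For reference, the fio invocation against the BeeGFS mount is roughly the following (single job; the exact file size and ioengine shown here are approximations):

fio --name=randrw-test --directory=/mnt/beegfs/fast --rw=randrw --bs=4k --iodepth=32 --numjobs=1 --size=4g --ioengine=libaio --direct=1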


My Setup:

- Nodes: Two nodes running Ubuntu Server (latest LTS):
    - `node1` (192.168.10.1): Storage Service, Client
    - `node2` (192.168.10.2): Management Service, Metadata Service, Storage Service, Client
- Network:
    - Mellanox ConnectX-3 (mlx4_0) cards on both machines
    - Running in Ethernet mode (RoCE) - `connUseRDMA = true` in BeeGFS configs
    - Two 40Gbps ports bonded as `bond0` in `balance-rr` mode. Intended for 80Gbps aggregate, but iperf only shows about 50Gbps, and I'm fine with that.
- Storage:
    - 3 NVMe SSDs: 1 on node1, 2 on node2
- BeeGFS Configuration:
    - BeeGFS version: 7.4.5 (installed from the official repository)
    - Volume: "fast-pool" using NVMe targets from both servers, mounted at `/mnt/beegfs/fast`
    - Stripe pattern: RAID0, chunk size 128K (for IOPS testing), currently testing with 3 storage targets from the NVMe pool (see the check commands after this list).
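For reference, a couple of checks that should confirm the stripe settings and whether RDMA is actually in use (assuming the standard BeeGFS command-line utilities are installed; the path matches my mount above):

beegfs-ctl --getentryinfo /mnt/beegfs/fast               (stripe pattern, chunk size and number of targets on the directory)
beegfs-net                                               (on a client, shows whether established connections are RDMA or fell back to TCP)
beegfs-ctl --listnodes --nodetype=storage --nicdetails   (shows which NICs the storage servers advertise)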


My Questions:

- Does the low IOPS performance seem unusual for this type of setup?
- What could be the potential bottlenecks in my configuration?
- Are there any specific BeeGFS tuning parameters or best practices for maximizing IOPS with RDMA and NVMe storage that I should consider?
- Any suggestions for further troubleshooting steps or diagnostic tools I can use?
- Has anyone experienced similar IOPS limitations with Mellanox ConnectX-3 cards and BeeGFS over RDMA?

Any help or suggestions would be greatly appreciated. Thank you in advance for your time and expertise.

All The Best,
TennisBowling

Waltar

Jan 27, 2025, 6:52:14 PM
to beegfs-user
BeeGFS is a parallel filesystem made for many I/O jobs running at once, and normally those are not random reads/writes inside a single file. Nevertheless, try your fio run with numjobs=32 while keeping iodepth=32.
For comparison, a 4-year-old 2-node BeeGFS here (each node with 10x SATA SSD in RAID6, IB 100Gb, >90% full) does e.g.: 4k fio with numjobs=1: 10k read, 11k write; 4k fio with numjobs=32: 146k read, 107k write. Good luck.

Waltar

Jan 27, 2025, 6:58:12 PM
to beegfs-user
PS: When running fio with numjobs > 2, add "--group_reporting" so the results are easier to read.
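For example, something roughly like this (the mount path and file size are just placeholders):

fio --name=randrw --directory=/mnt/beegfs/fast --rw=randrw --bs=4k --iodepth=32 --numjobs=32 --size=1g --ioengine=libaio --direct=1 --group_reporting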

LiangZhi Chen

Jan 28, 2025, 8:57:47 PM
to beegfs-user
Hi,

I have low IOPS issues as well. Do you have any advice?

My Setup:

CPU: Intel 5515U
Memory: DDR5 32 GB x 8
SSD: 12x Gen5 NVMe SSDs
RAID: RAID5 (random read 16M IOPS, random write 3M IOPS)
NIC: Mellanox CX7

BeeGFS config:
Version: 7.4.5
Storage nodes:
   Two (multi-node, on XFS)
Meta nodes:
   Four (multi-node, on ext4)

On the client, we achieve just 400K IOPS for random reads and 200K IOPS for random writes with 64 jobs and a queue depth of 64.

That is less than 10% of the local performance through the BeeGFS service. Is there any way to improve it?

On Tuesday, January 28, 2025 at 7:58:12 AM UTC+8, Waltar wrote:

Waltar

Jan 29, 2025, 7:18:21 AM
to beegfs-user
Hi LiangZhi,
that looks fine; again, BeeGFS is an HPC filesystem, normally used for sequential reads/writes of many files in parallel.
For random r/w access of small blocks, the overhead of all the threads involved on the client and server side, together with the communication latency, hurts IOPS throughput.
Download elbencho for testing distributed filesystems (made by an ex-BeeGFS developer) from https://github.com/breuner/elbencho
See "elbencho --help" - you can test from a single node and/or multiple nodes; these examples are for starting on one client:
For metadata performance, e.g.: elbencho -w -d -t 64 -n 64 -N 3200 -s 0 --lat /beegfs  &&  elbencho -r -t 64 -n 64 -N 3200 -s 0 --lat -F -D /beegfs
For throughput performance, e.g.: elbencho -w -d -t 32 -n 128 -N 128 -s 4g -b 1m /beegfs  &&  elbencho -r -t 32 -n 128 -N 128 -s 4g -b 1m -F -D /beegfs