Hello everyone,
I'm seeing latency in my BeeGFS cluster when using the BeeGFS client module on Debian Buster and CentOS 7 servers. I'm running BeeGFS 7.2.5.
When I run a simple "ls" in a folder with 3088 files, it takes from 10 to 30 seconds before all the files are shown. There is also no caching effect when I run a second "ls" right after the first one: the second run takes as long as the first.
I did a network capture (with tcpdump) to see what was going on, and I found an exchange of 37000 packets between the client and the server. Almost all of them, something like 36950, are smaller than 300 bytes. I tried recompiling the beegfs-client to remove the "TCP_NODELAY" and/or "TCP_CORK" options from the BeeGFS client sockets to see if that changed anything, and it doesn't (or it makes things even worse).
I found that my "ls" is currently aliased to "ls --color=tty". If I use "/bin/ls" instead of the alias, the files are shown instantly. I think the issue is the access to each file's metadata, which generates a huge number of packets. The plain ls (just "/bin/ls") only produces an exchange of 172 packets between my BeeGFS server and my BeeGFS client.
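To make the difference easier to reproduce without the alias, here is a minimal sketch, assuming Python 3 is available on the client. It times a plain directory read against one lstat call per entry, which is, as far as I understand, roughly what "ls --color" needs in order to pick a color for each file. The function name time_listing is just something I made up for this test; it is not part of any BeeGFS tooling.

```python
import os
import time


def time_listing(path):
    """Return (entry_count, readdir_seconds, lstat_seconds) for one directory."""
    # Step 1: read the directory entries only, similar to plain /bin/ls.
    start = time.perf_counter()
    names = os.listdir(path)
    readdir_seconds = time.perf_counter() - start

    # Step 2: one metadata lookup per entry, similar to what a colorized
    # listing has to do (assumption: ls --color stats every file).
    start = time.perf_counter()
    for name in names:
        os.lstat(os.path.join(path, name))
    lstat_seconds = time.perf_counter() - start

    return len(names), readdir_seconds, lstat_seconds
```

Calling time_listing() on the slow folder and printing the three values should show whether nearly all of the time is spent in the per-file lstat loop rather than in the directory read itself.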
I'm not using RDMA, and I can't try with RDMA.
Is this expected behavior? Is there a way to mitigate this issue?
Thanks for your help.
Benjamin