The logs show nothing about the underlying fs, only messages about sockets:
Feb 6 13:45:14 server1 beegfs-meta[3748]: 1015:__IBVSocket_createCommContext: Couldn't create QP (Error: -1)
Feb 6 13:45:14 server1 beegfs-meta[3748]: 464:IBVSocket_accept: creation of CommContext failed
Feb 6 13:45:14 server1 beegfs-meta[3748]: 467:IBVSocket_accept: rdma_reject failed
Feb 6 14:06:13 server1 beegfs-meta[3748]: 1015:__IBVSocket_createCommContext: Couldn't create QP (Error: -1)
Feb 6 14:06:13 server1 beegfs-meta[3748]: 464:IBVSocket_accept: creation of CommContext failed
Feb 6 14:06:13 server1 beegfs-meta[3748]: 467:IBVSocket_accept: rdma_reject failed
[..]
Feb 6 18:11:13 server1 rsyslogd: -- MARK --
[..]
Feb 7 13:03:33 server1 beegfs-meta[3748]: IBVSocket_accept:496: rdma_accept failed
Feb 7 13:03:33 server1 beegfs-meta[3748]: 1015:__IBVSocket_createCommContext: Couldn't create QP (Error: -1)
Feb 7 13:03:33 server1 beegfs-meta[3748]: 464:IBVSocket_accept: creation of CommContext failed
Feb 7 13:03:33 server1 beegfs-meta[3748]: 467:IBVSocket_accept: rdma_reject failed
[...]
Feb 7 14:11:18 server1 rsyslogd: -- MARK --
Feb 11 15:37:59 server1 beegfs-meta[208498]: 467:IBVSocket_accept: rdma_reject failed
Feb 11 15:37:59 server1 beegfs-meta[208498]: IBVSocket_accept:598: Ignoring conn manager event (8: RDMA_CM_EVENT_REJECTED)
[...]
Feb 11 19:01:04 server1 beegfs-meta[208498]: 464:IBVSocket_accept: creation of CommContext failed
Feb 11 19:01:04 server1 beegfs-meta[208498]: 467:IBVSocket_accept: rdma_reject failed
Let me know if you want the full dmesg log.
Today I got the same kind of error again:
Feb 12 03:51:44 server1 rsyslogd: -- MARK --
[...] almost nothing but MARK lines.
Feb 14 06:11:57 server1 rsyslogd: -- MARK --
Feb 14 06:17:10 server1 beegfs-meta[208498]: 1015:__IBVSocket_createCommContext: Couldn't create QP (Error: -1)
Feb 14 06:17:10 server1 beegfs-meta[208498]: 464:IBVSocket_accept: creation of CommContext failed
Feb 14 06:17:10 server1 beegfs-meta[208498]: 467:IBVSocket_accept: rdma_reject failed
Feb 14 06:18:09 server1 beegfs-meta[208498]: 1015:__IBVSocket_createCommContext: Couldn't create QP (Error: -1)
Feb 14 06:18:09 server1 beegfs-meta[208498]: 464:IBVSocket_accept: creation of CommContext failed
Feb 14 06:18:09 server1 beegfs-meta[208498]: 467:IBVSocket_accept: rdma_reject failed
Feb 14 06:18:39 server1 beegfs-meta[208498]: 1015:__IBVSocket_createCommContext: Couldn't create QP (Error: -1)
Feb 14 06:18:39 server1 beegfs-meta[208498]: 464:IBVSocket_accept: creation of CommContext failed
Feb 14 06:18:39 server1 beegfs-meta[208498]: 467:IBVSocket_accept: rdma_reject failed
Feb 14 06:31:57 server1 rsyslogd: -- MARK --
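To see how often this happens, the QP errors could be counted per day and per beegfs-meta pid. Below is only a rough sketch in Python, assuming the syslog line format shown in the excerpts above; the function name count_qp_errors is made up for illustration and is not part of BeeGFS.

```python
import re
from collections import Counter

# Matches syslog lines of the form seen above, e.g.:
# Feb 14 06:17:10 server1 beegfs-meta[208498]: 1015:__IBVSocket_createCommContext: Couldn't create QP (Error: -1)
LINE_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+\d{2}:\d{2}:\d{2}\s+"
    r"(?P<host>\S+)\s+beegfs-meta\[(?P<pid>\d+)\]:\s+(?P<msg>.*)$"
)


def count_qp_errors(lines):
    """Count "Couldn't create QP" messages per (month, day, pid).

    Hypothetical helper: it only looks at beegfs-meta lines and ignores
    everything else (MARK lines, "[...]" placeholders, other daemons).
    """
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and "Couldn't create QP" in m.group("msg"):
            key = (m.group("month"), int(m.group("day")), int(m.group("pid")))
            counts[key] += 1
    return counts
```

It takes any iterable of lines, so an open syslog file can be passed directly. Grouping by pid also makes it visible when the daemon was restarted between two bursts of errors.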
I have the same kind of errors on other servers too. We are using 4 servers.
In fact I don't know whether this is normal, since the fs was still available even after these errors.
It has only been down when meta crashed on server1. What I mean by crash is:
the machine was not down.
I ran /etc/init.d/beegfs-meta status and got this: service dead, but /var/run/ pid file exists
I only restarted beegfs-meta with /etc/init.d/beegfs-meta, and the storage was working again.
I haven't seen any segfault or similar in dmesg.
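As far as I understand, that status message means the pid recorded in the pid file no longer belongs to a running process. The same check can be done by hand. This is a minimal sketch in Python, not what the init script actually does; the function name pid_file_state is hypothetical, and the pid file path has to be passed in because I did not note the exact file name.

```python
import os


def pid_file_state(pid_file):
    """Classify a daemon as 'no pid file', 'running' or 'dead but pid file exists'.

    pid_file is the path to the daemon's pid file (an assumption: it
    contains a single integer pid, as init scripts usually write it).
    """
    try:
        with open(pid_file) as fh:
            pid = int(fh.read().strip())
    except FileNotFoundError:
        return "no pid file"
    try:
        # Signal 0 delivers nothing; it only checks that the pid exists.
        os.kill(pid, 0)
    except ProcessLookupError:
        return "dead but pid file exists"
    except PermissionError:
        # The process exists but is owned by another user.
        return "running"
    return "running"
```

A result of "dead but pid file exists" only says the process went away without cleaning up; it does not say why, so the logs are still needed to find the cause.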
HTH