"mkdir -p" fails with Remote I/O error


tilo.bus...@gmail.com

Aug 14, 2014, 6:28:25 AM8/14/14
to fhgfs...@googlegroups.com
Hi,

I encountered a problem when running the command "mkdir -p <some directory>", for which I receive a "Remote I/O error".

Did anyone encounter the same or a similar error? Is there a solution or might this be a new bug?

The system architecture is as follows:

  • 3 storage servers, 3 meta servers on 3 physical servers (ribnode003 to ribnode005)
  • (12 Clients on 12 physical servers, including the three storage/meta servers, ribnode001 to ribnode012)
  • OS is Ubuntu 14.04
  • Software release fhgfs 2014.01.r8.debian7
  • fhgfs is mounted to /mnt/fhgfs_ribdata on all servers

The problematic command is:

root@ribnode005:~# mkdir -p /mnt/fhgfs_ribdata/tools/lmod/foo/bar/
mkdir: cannot create directory ‘/mnt/fhgfs_ribdata/tools/lmod’: Remote I/O error

This is what strace says (abridged):

root@ribnode005:~# strace -e file -f mkdir -p /mnt/fhgfs_ribdata/tools/lmod/foo/bar/
execve("/bin/mkdir", ["mkdir", "-p", "/mnt/fhgfs_ribdata/tools/lmod/fo"...], [/* 32 vars */]) = 0
[...]
mkdir("/mnt", 0755)                     = -1 EEXIST (File exists)
chdir("/mnt")                           = 0
mkdir("fhgfs_ribdata", 0755)            = -1 EEXIST (File exists)
chdir("fhgfs_ribdata")                  = 0
mkdir("tools", 0755)                    = -1 EEXIST (File exists)
chdir("tools")                          = 0
mkdir("lmod", 0755)                     = 0
open("lmod", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_DIRECTORY|O_NOFOLLOW) = 3
mkdir("foo", 0755)                      = 0
open("foo", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_DIRECTORY|O_NOFOLLOW) = -1 EREMOTEIO (Remote I/O error)
[...]

It appears to me that fhgfs creates the new directory, but when it then tries to open it, there is a remote I/O error. Repeating the command succeeds without an error.

Could this be a problem with the communication between the three meta servers?

Does anyone have an idea?

Regards,

Tilo

Tilo Buschmann

Aug 14, 2014, 9:01:09 AM8/14/14
to fhgfs...@googlegroups.com


On Thursday, August 14, 2014 12:28:25 PM UTC+2, Tilo Buschmann wrote:
Hi,

I encountered a problem when running the command "mkdir -p <some directory>", for which I receive a "Remote I/O error".


I should add that there is no other (known) underlying issue in the network or on the machines. All servers are reachable, all machines are pingable, all RAIDs are in best condition, and the ECC RAM works like a charm. The configuration is not unusual in any way; no experimental feature is activated ...

tilo.buschmann@ribnode002:~$ sudo fhgfs-check-servers 
[sudo] password for tilo.buschmann: 
Management
==========
ribnode003 [ID: 1]: reachable at 10.131.17.43:8008 (protocol: TCP)
Metadata
==========
ribnode005 [ID: 4309]: reachable at 10.131.16.45:8005 (protocol: TCP)
ribnode003 [ID: 41034]: reachable at 10.131.16.43:8005 (protocol: TCP)
ribnode004 [ID: 60358]: reachable at 10.131.16.44:8005 (protocol: TCP)
Storage
==========
ribnode004 [ID: 12384]: reachable at 10.131.16.44:8003 (protocol: TCP)
ribnode003 [ID: 24990]: reachable at 10.131.16.43:8003 (protocol: TCP)
ribnode005 [ID: 61323]: reachable at 10.131.16.45:8003 (protocol: TCP)

Christian Mohrbacher

Aug 14, 2014, 10:56:29 AM8/14/14
to fhgfs...@googlegroups.com
Hi,

On 14.08.2014 at 12:28, tilo.bus...@gmail.com wrote:
> Hi,
>
> I encountered a problem when running the command "mkdir -p <some
> directory>", for which I receive a "Remote I/O error".
>
> Did anyone encounter the same or a similar error? Is there a solution
> or might this be a new bug?
>

There is no known bug that could cause this.

> It appears to me, that fhgfs creates the new directory, but when it
> tries to open it, there is a remote I/O error. Repeating the command
> would work without an error.
>
> Could this be a problem with the communication between the three meta
> servers?

Which version are you using? Could you attach the log files of the
client and the metadata servers?

Regards,
Christian


Tilo Buschmann

Aug 14, 2014, 11:32:02 AM8/14/14
to fhgfs...@googlegroups.com, christian....@itwm.fraunhofer.de
Hi Christian,



On Thursday, August 14, 2014 4:56:29 PM UTC+2, Christian Mohrbacher wrote:

> It appears to me, that fhgfs creates the new directory, but when it
> tries to open it, there is a remote I/O error. Repeating the command
> would work without an error.
>
> Could this be a problem with the communication between the three meta
> servers?

> Which version are you using?

We are using 2014.01.r8.debian7 on all servers and clients.
 
> Could you attach the log files of the
> client and the metadata servers?

Sure thing. I restarted all server and client processes, and the attached logs contain only the portion from the client/server start to the end of the test.

Regards,

Tilo
 



fhgfs-client-ribnode003.log
fhgfs-meta-ribnode003.log
fhgfs-meta-ribnode004.log
fhgfs-meta-ribnode005.log
mkdir-ribnode003.log

Tilo Buschmann

Aug 14, 2014, 12:36:13 PM8/14/14
to fhgfs...@googlegroups.com, christian....@itwm.fraunhofer.de
Hi Christian,

I increased the log level. The resulting logs might be more interesting to you. If I were to guess, I would say it is some kind of race condition between two meta servers:

  • Create Dir "2" on Meta Server ribnode003
  • Open Dir "2"
  • Create Dir "3" in Dir "2" on Meta Server ribnode004
  • Open Dir "3"
  • Remote I/O error

Regards,

Tilo
fhgfs-client_run2_ribnode003.log
fhgfs-meta_run2_ribnode003.log
fhgfs-meta_run2_ribnode004.log
fhgfs-meta_run2_ribnode005.log
mkdir_run2_ribnode003.log

Tilo Buschmann

Aug 14, 2014, 3:36:35 PM8/14/14
to fhgfs...@googlegroups.com, christian....@itwm.fraunhofer.de
I debugged the code a bit ("./filesystem/FhgfsOpsInode.c") and found that the last atomicOpen fails because "lookupOutInfo.statRes != FhgfsOpsErr_SUCCESS". I will debug FhgfsOpsRemoting_lookupIntent tomorrow, but in the meantime you may have an idea of what goes wrong here.

Frank Kautz

Aug 15, 2014, 2:51:51 AM8/15/14
to fhgfs...@googlegroups.com
Hi Tilo,

Which kernel version do you use? This is an issue with newer kernels that support atomic open. It is fixed in our source and will be released with the next version.

As a workaround, you can edit the file FhgfsOpsInode.c, beginning at line 54, and remove the following lines:
#ifdef KERNEL_HAS_ATOMIC_OPEN
.atomic_open = FhgfsOps_atomicOpen,
#endif

Then restart the client; this should rebuild the client module. If it does not, run "/etc/init.d/fhgfs-client rebuild" and then restart the client.

Sorry for any inconvenience.

Kind regards,
Frank


On 08/14/2014 at 09:36 PM, Tilo Buschmann wrote:
> I debugged the code a bit ("./filesystem/FhgfsOpsInode.c") and found
> that the last atomicOpen fails because "lookupOutInfo.statRes !=
> FhgfsOpsErr_SUCCESS". I will debug FhgfsOpsRemoting_lookupIntent
> tomorrow, but meanwhile you may have an idea what goes wrong here.
>
> On Thursday, August 14, 2014 6:36:13 PM UTC+2, Tilo Buschmann wrote:
>
> Hi Christian,
>
> I increased the log level. The resulting logs might be more
> interesting to you. If I were to guess, I would say it is some kind
> of race condition between two meta servers.
>
> * Create Dir "2" on Meta Server ribnode003
> * Open Dir "2"
> * Create Dir "3" in Dir "2" on Meta Server ribnode004
> * Open Dir "3"
> * Remote I/O Error
>
> Regards,
>
> Tilo

Tilo Buschmann

Aug 15, 2014, 3:32:10 AM8/15/14
to fhgfs...@googlegroups.com, frank...@itwm.fraunhofer.de
Hi Frank,

I use the newest Ubuntu "stock" kernel:

root@ribnode003:~# uname -a
Linux ribnode003 3.13.0-34-generic #60-Ubuntu SMP Wed Aug 13 15:45:27 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

Your suggestion did the trick! I no longer get a remote I/O error. Thanks a lot. I will deploy the patch to all our systems.

Regards,

Tilo

laurenc...@gmail.com

Jun 22, 2017, 3:10:08 AM6/22/17
to beegfs-user, frank...@itwm.fraunhofer.de
Hi,

We are experiencing the same issue using kernel 3.16.0-77-generic; however, the "patch" does not solve the issue.

FHGFS version: 2014.01.r16.debian7

Any other suggestions for resolving this?

Thanks
Laurence