"qsub -wd /fhgfs-mount" fail with can't stat Error


Franck Tison

Sep 17, 2014, 3:00:14 AM
to fhgfs...@googlegroups.com
We are trying to use an FhGFS partition as the SGE working directory (-wd), but it fails under some conditions.

Shepherd error:

09/16/2014 17:12:01 [0:20360]: can't stat() "/fhgfs-mount/test" as stdout_path: Remote I/O error KRB5CCNAME=none uid=0 gid=0 0 202 20031

If we put a slash at the end of the path, it works:
qsub -wd /fhgfs-mount/test/ zz.sh
but it fails without one:
qsub -wd /fhgfs-mount/test zz.sh
Using the -cwd option (current working directory) fails as well.
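A minimal way to check whether plain stat(2) shows the same trailing-slash asymmetry outside of SGE (a sketch using a local stand-in path; on the cluster, substitute the real path under /fhgfs-mount):

```shell
# Compare stat on the same directory with and without a trailing slash.
# /tmp/wdcheck/test is a local stand-in for /fhgfs-mount/test.
mkdir -p /tmp/wdcheck/test
stat /tmp/wdcheck/test  >/dev/null && echo "no slash: ok"
stat /tmp/wdcheck/test/ >/dev/null && echo "slash: ok"
```

If the bare stat already differs between the two forms on the FhGFS mount, the problem is below SGE; if both succeed, the difference is in how the shepherd builds the path.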

F.


Sven Breuner

Sep 19, 2014, 7:05:00 AM
to fhgfs...@googlegroups.com, tiso...@gmail.com
Hi Franck,

Franck Tison wrote on 09/17/2014 09:00 AM:
> We are trying to use an FhGFS partition as the SGE working directory
> (-wd), but it fails under some conditions.
>
> Shepherd error:
>
> 09/16/2014 17:12:01 [0:20360]: can't stat() "/fhgfs-mount/test" as
> stdout_path: Remote I/O error KRB5CCNAME=none uid=0 gid=0 0 202 20031
>
> If we put a slash at the end of the path, it works:
> qsub -wd /fhgfs-mount/test/ zz.sh
> but it fails without one:
> qsub -wd /fhgfs-mount/test zz.sh
> Using the -cwd option (current working directory) fails as well.

I have no specific idea of what went wrong (and can't try to reproduce it
myself for lack of an SGE setup), so I'll offer some more or less general
questions and hints:

* "Remote I/O error" is a rather generic fhgfs error, and in such cases
more details are typically available in the log files. Do you see
anything related in /var/log/fhgfs-client.log on the host where the
error occurred (I'm not sure whether that's the submit host or the
compute node), or in /var/log/fhgfs-meta.log?

* Which client version are you using? (You can find it in dmesg or in
fhgfs-client.log.)

* Is "test" a normal directory or is it a symlink?
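One quick way to check this on a node with the mount: the first character of the ls -ld output is 'd' for a plain directory and 'l' for a symlink (the path below is a local stand-in; on the cluster use /fhgfs-mount/test):

```shell
# First character of the mode string: 'd' = directory, 'l' = symlink.
# /tmp stands in for /fhgfs-mount/test here.
ls -ld /tmp | cut -c1
```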

* Could you generate an strace of the process that receives the
"Remote I/O error" (the shepherd?) so we can see some context?
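For reference, one way to capture such a trace (a sketch; that the shepherd is forked by sge_execd, and the pgrep usage, are assumptions about the setup):

```shell
# On the compute node, one could attach to the running sge_execd with -f
# so the forked sge_shepherd is traced too (assumption about the setup):
#   strace -f -e trace=stat,lstat -o /tmp/shepherd.trace -p "$(pgrep sge_execd)"
# The stand-in below only traces a plain local stat call, to show the form:
if command -v strace >/dev/null 2>&1; then
  strace -e trace=stat,lstat -o /tmp/stat.trace stat /tmp/ >/dev/null 2>&1 || true
fi
echo trace-done
```

The resulting trace file would show which path string the stat() call actually received and which errno came back.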

Best regards,
Sven Breuner
Fraunhofer