Hi Franck,
Franck Tison wrote on 09/17/2014 09:00 AM:
> we are trying to use an fhgfs partition as the sge working directory (wd),
> but it fails under some conditions
>
> Shepherd error:
>
> 09/16/2014 17:12:01 [0:20360]: can't stat() "/fhgfs-mount/test" as
> stdout_path: Remote I/O error KRB5CCNAME=none uid=0 gid=0 0 202 20031
>
> if we put a slash at the end of the path, it works
> qsub -wd /fhgfs-mount/test/ zz.sh
> but it fails without it
> qsub -wd /fhgfs-mount/test zz.sh
> if we use the -cwd option for the current working directory, it fails too.
I don't have a specific idea of what went wrong (and I can't try to reproduce
it, since I have no sge setup here), so here are some more or less general
questions and hints:
* "Remote I/O error" is a fairly generic fhgfs error, and in these cases
more details are typically available in the log files. Do you see
anything related in /var/log/fhgfs-client.log on the host where the
error occurred (I'm not sure whether that is the machine where the job was
submitted or the compute node) or in /var/log/fhgfs-meta.log?
* Which client version are you using? (You can find it in dmesg or in the
fhgfs-client.log.)
* Is "test" a normal directory or is it a symlink?
* Would it be possible to generate an strace of the process that
receives the "Remote I/O error" (the shepherd?) so we can see some context?
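
In case it helps, the checks above could be run roughly like the sketch below
on the host where the error shows up. This is only a sketch under some
assumptions: path_kind is a made-up helper (not an fhgfs or sge tool), WD
defaults to the path from your qsub call, and the strace line assumes the
execution daemon is named sge_execd on your installation, so adjust as needed.

```shell
#!/bin/sh
# Hypothetical helper: report whether a path is a symlink, a directory,
# something else, or cannot be stat()ed at all.
path_kind() {
    if [ -L "$1" ]; then
        echo "symlink -> $(readlink "$1")"
    elif [ -d "$1" ]; then
        echo "directory"
    elif [ -e "$1" ]; then
        echo "other"
    else
        echo "missing or not stat()-able"
    fi
}

# Working directory from the failing "qsub -wd" call (override via WD=...).
WD=${WD:-/fhgfs-mount/test}

# Is "test" a normal directory or a symlink? Compare with and without
# the trailing slash, since only one of the two forms fails.
echo "$WD:  $(path_kind "$WD")"
echo "$WD/: $(path_kind "$WD/")"

# Recent client log entries, if the log exists on this host.
if [ -r /var/log/fhgfs-client.log ]; then
    tail -n 50 /var/log/fhgfs-client.log
fi

# The client version should show up in the kernel log.
dmesg 2>/dev/null | grep -i fhgfs | head -n 20

# The shepherd is short-lived, so one option (an assumption, untested here)
# is to trace the execution daemon and follow the children it forks:
#   strace -f -tt -o /tmp/sge_execd.strace -p "$(pgrep -o sge_execd)"
```

The with/without trailing slash comparison is there because a symlink is
resolved differently in the two cases, which would fit the behaviour you see.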
Best regards,
Sven Breuner
Fraunhofer