The programs are all written in Fortran. The machine is running
RedHat Enterprise Linux. Disk drive A is an internal EIDE disk
drive, while disk drive B is an external USB disk drive.
User A can run Programs A, B, C, and D on disk drives A and B without
any problems.
User B can run Programs A, B, C, and D on disk drives A and B without
any problems.
User C can run Programs A, B, and C on disk drives A and B without any
problems, but Program D dies right when it tries to write the last two
lines of output to a disk file on Drive B. The WRITE statement
utilizes the IOSTAT= option, so the program exits gracefully and
displays the value of the IOSTAT variable, which is 127. Meanwhile,
if User C runs Program D on Drive A, it works fine.
My first thought was that disk drive B had become full, but I used
df to check the available space, and there was over 60 GB of space
available. My next thought was to look up run-time error 127, but
the ifort version 10 "Building Applications" manual doesn't have a
127 listed among its run-time errors. (Now I'm wondering whether the
person who built the program used g77 instead of ifort.) The third
thought was to recompile the program without IOSTAT on that WRITE
statement so that I could see the error message generated by the
system, but the program required a library that was not immediately
in evidence, so it may require contacting the person who built the
program to finish that approach.
In the meantime, can anybody think of why a program would fail for
one particular user on one particular disk drive (an external USB)
when 95 percent of the WRITE statements work just fine, with the
failure occurring on only the last two lines of output? All users
are running the exact same binary executable on the exact same input
data file, and it works without problems for all but one user, and
it even works for this one user on one disk drive, but not on the
other disk drive (with plenty of available space).
Anybody know what error 127 is? Now it seems less likely that ifort
was used, though available, because ifort doesn't appear to have an
error 127 in the runtime. If the program was built with g77, and if
g95 carried over the same error messages as g77, then 127 is a system
error message. In which case the question is: what is Linux error 127?
Well, you are missing quite a lot. Firstly, because of the way that
I/O is buffered and Unix works, you often get an error message MUCH
later than you should have done, and they may even get delayed until
after the CLOSE (so you miss them). That explains why you see the
error near the end.
Secondly, think quotas and so on. Also remember that civilised systems
(not including Gnome, for example) use text files for configuration, and
can be searched. 'grep 127 /usr/include/*/errno.h' gives you your
answer!
I have no idea what sort of key it is blithering on about, as I am
no USB expert, but solving that is left as an exercise for the
student :-)
Regards,
Nick Maclaren.
Usually the easiest way to get a better error message is to simply
remove the IOSTAT=<variable>. Then the program will abort and a usually
better error message is printed.
(Using newer compilers, one can also pass IOMSG=<character_variable>,
where that variable gets assigned a more or less useful error message;
however, simply removing IOSTAT= is much easier.)
> In which case the question is: what is Linux error 127?
For the latter: Compile and run the following C program:
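    #include <errno.h>
    #include <stdio.h>

    int main(void)
    {
        errno = 127;   /* the IOSTAT value reported by the program */
        perror("");    /* print the C library's message for that errno */
        return 0;
    }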
Here it prints: "Operation not permitted", but your system might have a
different output.
(Using "strace" one can also track the file accesses, which might give
some extra information.)
Tobias
> Well, you are missing quite a lot. Firstly, because of the way that
> I/O is buffered and Unix works, you often get an error message MUCH
> later than you should have done, and they may even get delayed until
> after the CLOSE (so you miss them). That explains why you see the
> error near the end.
In this particular case, I know exactly where the problem occurs,
because there is only one statement that uses that particular error
branch. The statement immediately following the WRITE statement
tests for a nonzero IOSTAT and if so, takes the error branch.
Furthermore, the program both writes the output to a disk file and
displays it on the screen. The screen write succeeds, and the disk
write fails, and the failure is when the program tries to write the
final two lines of the output file.
> Secondly, think quotas and so on.
Already thought about that. If a quota was causing the problem, then
why wouldn't it also cause the same problem on the other disk drive?
> Also remember that civilised systems
> (not including Gnome, for example) use text files for configuration, and
> can be searched. 'grep 127 /usr/include/*/errno.h' gives you your
> answer!
It's the "answer" only if that's the only possible error 127. Without
knowing for sure what compiler was used to build the application, there
would seem to at least be the possibility that the runtime error is
specific to the compiler. What I found for g95 is that error numbers
1 to 199 are from the system, but I know that ifort has some runtime
errors in that range. Maybe they're the same?
> I have no idea what sort of key it is blithering on about, as I am
> no USB expert, but solving that is left as an exercise for the
> student :-)
There are no ideas here either, hence the request to the newsgroup.
> Usually the easiest way to get a better error message is to simply
> remove the IOSTAT=<variable>. Then the program will abort and a usually
> better error message is printed.
>
> (Using newer compilers, one can also pass IOMSG=<character_variable>,
> where that variable gets assigned a more or less useful error message;
> however, simply removing IOSTAT= is much easier.)
I mentioned that we already tried to remove IOSTAT. Unfortunately, to
rebuild the application, we need a library.
>> In which case the question is: what is Linux error 127?
> For the latter: Compile and run the following C program:
>
> #include <errno.h>
> #include <stdio.h>
>
> int main()
> {
> errno = 127;
> perror("");
> return 0;
> }
>
> Here it prints: "Operation not permitted", but your system might have a
> different output.
Here it displays "Unknown error 127".
> (Using "strace" one can also track the file accesses, which might give
> some extra information.)
I'll have to have User C try that, because the program works just fine
for me!
Reread what I said, and look up how I/O is implemented for more details.
The reason is almost certainly what I said.
Fortran WRITE statements have not been implemented as physical writes
for well over 40 years, quite probably over 50 years. And Fortran
is a language where the default OPEN statement is not really supposed
to check for the ability to write data - even if you have specified
ACTION='Write', many systems won't check until you actually do the
write.
>> Secondly, think quotas and so on.
>
>Already thought about that. If a quota was causing the problem, then
>why wouldn't it also cause the same problem on the other disk drive?
Because quotas are per volume :-) And so are device keys.
>> Also remember that civilised systems
>> (not including Gnome, for example) use text files for configuration, and
>> can be searched. 'grep 127 /usr/include/*/errno.h' gives you your
>> answer!
>
>It's the "answer" only if that's the only possible error 127. Without
>knowing for sure what compiler was used to build the application, there
>would seem to at least be the possibility that the runtime error is
>specific to the compiler. What I found for g95 is that error numbers
>1 to 199 are from the system, but I know that ifort has some runtime
>errors in that range. Maybe they're the same?
That is indeed possible. But, given what I know about how such things
are implemented, and the message that comes out when I do the grep,
that's not where the smart money goes.
>> I have no idea what sort of key it is blithering on about, as I am
>> no USB expert, but solving that is left as an exercise for the
>> student :-)
>
>There are no ideas here either, hence the request to the newsgroup.
This is not the right group for the subtleties of USB devices.
Regards,
Nick Maclaren.
Does user C have read and write permissions on drive B (if not,
check the udev rules)? Do you use SELinux? It could easily be an SELinux
problem if you use it. I do not know what the problem is, but I have
encountered some strange access problems with SELinux (check
/etc/selinux/config). If you are using SELinux, try disabling it (if you
can) and see if the program works.
>>> Well, you are missing quite a lot. Firstly, because of the way that
>>> I/O is buffered and Unix works, you often get an error message MUCH
>>> later than you should have done, and they may even get delayed until
>>> after the CLOSE (so you miss them). That explains why you see the
>>> error near the end.
>> In this particular case, I know exactly where the problem occurs,
>> because there is only one statement that uses that particular error
>> branch. The statement immediately following the WRITE statement
>> tests for a nonzero IOSTAT and if so, takes the error branch.
>> Furthermore, the program both writes the output to a disk file and
>> displays it on the screen. The screen write succeeds, and the disk
>> write fails, and the failure is when the program tries to write the
>> final two lines of the output file.
> Reread what I said, and look up how I/O is implemented for more details.
> The reason is almost certainly what I said.
Let's see, the program has a WRITE statement inside a DO loop, and the
IOSTAT variable remains 0 after every write in that loop. The disk
file does indeed contain that output, confirming that the WRITE was
successful, so the 0 value for the IOSTAT variable is confirmed as
being correct. Then after the DO loop is finished, the program tries
to wrap up by writing a couple of housekeeping lines to the output, but
IOSTAT gets set to 127, the output does not appear in the disk file,
and you're "almost certain" that the error message is MUCH later than
it should have been???
Maybe you need to reread what I said, because what you said makes no
sense at all.
> Fortran WRITE statements have not been implemented as physical writes
> for well over 40 years, quite probably over 50 years. And Fortran
> is a language where the default OPEN statement is not really supposed
> to check for the ability to write data - even if you have specified
> ACTION='Write', many systems won't check until you actually do the
> write.
So if the error occurred much earlier than when it appeared, why did
the output from all those WRITE statements appear successfully?
>>> Secondly, think quotas and so on.
>> Already thought about that. If a quota was causing the problem, then
>> why wouldn't it also cause the same problem on the other disk drive?
> Because quotas are per volume :-) And so are device keys.
But no quotas have been imposed on the user. And we checked that by
creating a large file (easily a thousand times larger than the file he
was trying to create), so we're confident that quotas are not the
problem.
>>> Also remember that civilised systems
>>> (not including Gnome, for example) use text files for configuration, and
>>> can be searched. 'grep 127 /usr/include/*/errno.h' gives you your
>>> answer!
>> It's the "answer" only if that's the only possible error 127. Without
>> knowing for sure what compiler was used to build the application, there
>> would seem to at least be the possibility that the runtime error is
>> specific to the compiler. What I found for g95 is that error numbers
>> 1 to 199 are from the system, but I know that ifort has some runtime
>> errors in that range. Maybe they're the same?
> That is indeed possible. But, given what I know about how such things
> are implemented, and the message that comes out when I do the grep,
> that's not where the smart money goes.
>>> I have no idea what sort of key it is blithering on about, as I am
>>> no USB expert, but solving that is left as an exercise for the
>>> student :-)
>> There are no ideas here either, hence the request to the newsgroup.
> This is not the right group for the subtleties of USB devices.
Except that we don't know if the problem is due to the drive being
attached via USB. I stated that up front.
>> I don't know if this is a Fortran problem, a Linux problem, a USB problem,
>> or something else, but it's a real bizarre problem. I'm wondering if
>> anybody here has any bright ideas.
>>
>> The programs are all written in Fortran. The machine is running
>> RedHat Enterprise Linux. Disk drive A is an internal EIDE disk
>> drive, while disk drive B is an external USB disk drive.
>>
>> User A can run Programs A, B, C, and D on disk drives A and B without
>> any problems.
>>
>> User B can run Programs A, B, C, and D on disk drives A and B without
>> any problems.
>>
>> User C can run Programs A, B, and C on disk drives A and B without any
>> problems, but Program D dies right when it tries to write the last two
>> lines of output to a disk file on Drive B. The WRITE statement
>> utilizes the IOSTAT= option, so the program exits gracefully and
>> displays the value of the IOSTAT variable, which is 127.
> Does the user C have read and write permissions on drive B
Yes, as implied by the user's ability to run successfully Programs A, B,
and C, which all require reading an input data file and writing an output
data file.
> (if not, check udev rules), do you use selinux?
The second paragraph above specifies RedHat Enterprise Linux.
> It could easily be selinux
> problem if you use it. I do not know what the problem is but I have
> encountered some strange access problems with selinux (check /etc/
> selinux/config). If you are using selinux try disabling it (if you
> can) and see if the program works.
Do the strange access problems that you've seen include the successful
writing of 95 percent of an output file, followed by a sudden failure
that doesn't occur if a different user runs the program, or if the same
user runs the program on a different disk drive?
>
> In the meantime, can anybody think of why a program would fail for
> one particular user on one particular disk drive (an external USB)
> when 95 percent of the WRITE statements work just fine, with the
> failure occurring on only the last two lines of output?
Check to see what Unix groups the users are members of. Often, there
is a group named "disk" that has special privileges apropos removable
media.
Chip
--
Charles M. "Chip" Coldwell
"Turn on, log in, tune out"
GPG Key ID: 852E052F
GPG Key Fingerprint: 77E5 2B51 4907 F08A 7E92 DE80 AFA9 9A8F 852E 052F
> I don't know if this is a Fortran problem, a Linux problem, a USB problem,
> or something else, but it's a real bizarre problem. I'm wondering if
> anybody here has any bright ideas.
>
> The programs are all written in Fortran. The machine is running
> RedHat Enterprise Linux. Disk drive A is an internal EIDE disk
> drive, while disk drive B is an external USB disk drive.
>
> User A can run Programs A, B, C, and D on disk drives A and B without
> any problems.
>
> User B can run Programs A, B, C, and D on disk drives A and B without
> any problems.
>
> User C can run Programs A, B, and C on disk drives A and B without any
> problems, but Program D dies right when it tries to write the last two
> lines of output to a disk file on Drive B. The WRITE statement
> utilizes the IOSTAT= option, so the program exits gracefully and
> displays the value of the IOSTAT variable, which is 127. Meanwhile,
> if User C runs Program D on Drive A, it works fine.
Another possibility: are users A and B logged onto the console and
user C logged in remotely? There is also a pam_console.so library
that changes the ownership of certain entries in /dev when a person
logs onto the console. The notion being that if you have physical
access to the computer, there's no point in restricting your access to
the floppy drive, etc.
Being able to read files on a disk but not write them screams "Unix
ownership/permissions" to me.
> Another possibility: are users A and B logged onto the console and
> user C logged in remotely?
Just the opposite. Users A and B successfully write to the USB drive
while logged in via ssh connections from other computers. User C is
logged onto the console, and it fails for him.
> Being able to read files on a disk but not write them screams "Unix
> ownership/permissions" to me.
Yes, it would to me as well, except that Programs A, B, and C all
allow User C to write to Disk B, and Program D allows User C to write
to Disk B just fine until the last two lines, then it fails with
IOSTAT=127. I see no reason for a permission to kick in at that time
(and it's very consistent from attempt to attempt).
Well, the problems included a user not being able to access some
devices, while being able to access the others. What is strange in
your case is that the user is clearly able to write for some time.
The Linux system error number 127 is EKEYEXPIRED, and according to the kernel
documentation, among other things it is used in SELinux and for
cryptographic filesystems.
I think you should ask this question on some Linux or Red Hat group.
Good luck.
> Yes, it would to me as well, except that Programs A, B, and C all
> allow User C to write to Disk B, and Program D allows User C to write
> to Disk B just fine until the last two lines, then it fails with
> IOSTAT=127. I see no reason for a permission to kick in at that time
> (and it's very consistent from attempt to attempt).
It might help to show the code, preferably the simplest code
that shows the problem. Do note that the system doesn't know
which are the last two lines until the file is closed.
If you write more or fewer lines before those lines, does the
effect move? The actual timing is complicated by buffering.
-- glen
> (snip)
>> Yes, it would to me as well, except that Programs A, B, and C all
>> allow User C to write to Disk B, and Program D allows User C to write
>> to Disk B just fine until the last two lines, then it fails with
>> IOSTAT=127. I see no reason for a permission to kick in at that time
>> (and it's very consistent from attempt to attempt).
> It might help to show the code, preferably the simplest code
> that shows the problem.
I seriously doubt it in this case, partly because the code itself is
fine. Not only can two other users run the program on the same data
files without any failure, the same source code compiles and runs fine
on both Windows (CVF) and OS/2 (Watcom) platforms.
> Do note that the system doesn't know
> which are the last two lines until the file is closed.
But I know which are the last two lines. The output consists of three
housekeeping lines at the beginning of the file, two housekeeping lines
at the end of the file, and a variable amount of output data lines
between. Failure consistently occurs when the program tries to write
the first of those two trailing housekeeping lines.
> If you write more or fewer lines before those lines, does the
> effect move?
Relative to the start of the file, yes; relative to the end of the
file, no. The number of housekeeping lines in the output is constant;
the number of data lines is variable and depends on the amount of
data in the input data file. We tried the program on a variety of
input data files, and they all worked just fine until the first of
the last two housekeeping lines. Switch to a different drive, and
they all worked completely. Switch to a different user, and they all
worked completely, even on the USB drive.
> The actual timing is complicated by buffering.
But is it possible for Fortran to return an IOSTAT=0 even if an error
really occurred, even though the buffering prevents it from manifesting
itself at that time, and similarly, to return an IOSTAT=127 even if no
error really occurred?
I forget what Fortran dialect the OP was using, but F2003 has a FLUSH
statement, already implemented in quite a few f95 compilers, that
forces an output buffer to be emptied (if it can be) at a time chosen
by the programmer. I seem to remember another compiler that had a
FLUSH(unit_number) subroutine. RTFM.
John Harper
Don't trust it :-( The problem with Fortran FLUSH, C fflush and
all that is that they flush output to the next stage, but the
protocols and interfaces don't have the right primitive - i.e.
a 'flush this through to the far end'. So it may not help.
Anyway, buffering is almost certainly the reason for the OP's
confusion, but he knows better.
Regards,
Nick Maclaren.