I encountered a truly strange problem in my up2date Centos with
kernel 2.6.18-194.8.1.el5 #1 SMP Thu Jul 1 19:04:48 EDT 2010 x86_64
x86_64 x86_64 GNU/Linux:
When reading a certain file, the C function 'fread' would reproducibly
stop in the middle, at a 4K page boundary (address&0xfff = 0), setting
errno to EFAULT. As a workaround, I could simply continue reading from
there, single bytes at a time. So my function to read from a file in
Linux now looks like this: (Needless to say that there is nothing
wrong with the 'mem' pointer, it really points into application
memory):
/* READ FROM FILE AND RETURN NUMBER OF BYTES READ
============================================== */
int dsc_readbytes(void *mem,FILE *file,int size)
{ int i,bytes;
errno=0;
bytes=fread(mem,1,size,file);
#ifdef LINUX
if (bytes<size&&errno==EFAULT)
{ /* WORKAROUND FOR CENTOS KERNEL 2.6.18-194.8.1.el5 PROBLEM: fread
RETURNS EFAULT
WHEN REACHING A CERTAIN PAGE BOUNDARY. CONTINUE READING SINGLE
BYTES, WHICH WORKS */
printf("fread read %d of %d requested bytes, ending at %p\n",bytes,size,mem+bytes);
mem+=bytes;
size-=bytes;
for (i=0;i<size;i++,mem++)
{ if (!fread(mem,1,1,file)) printf("fread failed again!\n");
else bytes++; } }
#endif
return(bytes); }
Upon loading this certain file, the output is:
fread read 4604 of 42932 requested bytes, ending at 0xeedc6000
The file is read correctly thanks to the hack. When I run the program
multiple times, the address where fread stops is always different, but
it's always a 4K page boundary.
Has anyone ever seen anything like that?
Where should I best report this?
Thanks for your help,
Elmar
I see several problems with this piece of code.
a) You shouldn't write (void *) when you actually mean (char unsigned *).
Using a pointer-to-void as an operand to the additive operator is a
constraint violation. (However, %p still needs a pointer-to-void.)
b) "size", "i" and "bytes" should all have type "size_t". Accordingly, you
should change the printf conversion specifiers to "%zu" (under
C99/SUSv[34]), or use "%lu" and cast the "size_t" variables to "long
unsigned" (under C89/SUSv[12]).
c) After you've determined that fread() encountered an error, you
shouldn't retry reading from the stream unless you clear its error
indicator first, e.g. with clearerr() or rewind().
d) When fread() returns with EFAULT, that probably reports to you that the
underlying read() syscall detected (in kernel space) an
out-of-address-space access, or a memory protection violation. I think it
is very unlikely that a bug in glibc's stdio implementation would cause
this.
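A minimal sketch of points (a) to (c) above, for illustration only. The name read_bytes_portable is hypothetical and not part of the original program, and "%zu" assumes a C99 library:

```c
#include <stdio.h>

/* Hypothetical portable variant of the poster's dsc_readbytes().
   Returns the number of bytes stored in mem. */
size_t read_bytes_portable(void *mem, FILE *file, size_t size)
{
    unsigned char *p = mem;                 /* (a) do arithmetic on unsigned char *, not void * */
    size_t got = fread(p, 1, size, file);   /* (b) size_t throughout, no int conversion */

    if (got < size && ferror(file)) {
        /* %p still needs a pointer-to-void; %zu is the C99 specifier for size_t. */
        fprintf(stderr, "fread stored %zu of %zu bytes, stopping at %p\n",
                got, size, (void *)(p + got));
        clearerr(file);                     /* (c) clear the error indicator before any retry */
    }
    return got;
}
```

A short count without ferror() being set simply means end-of-file, so the sketch stays silent in that case.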
Why the single-byte fread()'s might work -- speculation: when you
initially try to read a chunk that is bigger than a page (which is
probably matched by the size of the stream's user-space buffer), the
fread() implementation might try to issue a read() syscall that would
directly transfer data from the file to the buffer you (the client
programmer) specified. Since "mem" is invalid for the size of that
request, the kernel returns with EFAULT.
OTOH, when you try single-byte reads, the fread() implementation might
instead issue a read() call that transfers a pageful of bytes from the
regular file to the stream's glibc-managed stdio buffer. This succeeds.
After that, the fread() implementation might copy single bytes to your own
buffer. I admit, I have no idea why this doesn't result in a SIGSEGV after
a while.
I'd suggest running the program under strace and/or valgrind, or changing
the stream's buffering (with setvbuf()) to unbuffered right after opening
the stream. That way the single-byte reads should directly translate to
read() syscalls.
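The setvbuf() suggestion could look roughly like this. The helper name open_unbuffered is made up for the example; the only real requirement is that setvbuf() is called before any other operation on the stream:

```c
#include <stdio.h>

/* Hypothetical helper: open a file for binary reading with stdio
   buffering turned off. Returns NULL on failure. */
FILE *open_unbuffered(const char *path)
{
    FILE *f = fopen(path, "rb");

    if (f == NULL)
        return NULL;
    /* _IONBF requests an unbuffered stream; buffer and size are ignored. */
    if (setvbuf(f, NULL, _IONBF, 0) != 0) {
        fclose(f);
        return NULL;
    }
    return f;
}
```

With a stream opened this way, each fread() should reach the kernel more or less directly, which makes the strace output easier to match against the calls in the program.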
lacos
[ #ifdef trimmed ]
> return(bytes); }
Not having the source to the rest of the program I can't be sure the
problem is not there.
[...]
This is a gcc extension.
> b) "size", "i" and "bytes" should all have type "size_t".
There is no reason for this: Provided that all values which can
actually occur fit into an int, using int is ok. Additionally, the gcc
misinterpretation of 'exceptional condition' as 'something I gcc
developer sure consider to be exceptional !!1' doesn't cover direct
assignments, where the C-standard explicitly demands that 'the result
is implementation-defined or an implementation-defined signal is
raised' (when trying to assign a value outside the range of valid ints
to an object of type int).
> "Ersek, Laszlo" <la...@caesar.elte.hu> writes:
>> I see several problems with this piece of code.
... I hesitated to write "problems (of different severities)" :)
>> a) You shouldn't write (void *) when you actually mean (char unsigned
>> *). Using a pointer-to-void as an operand to the additive operator is a
>> constraint violation. (However, %p still needs a pointer-to-void.)
>
> This is a gcc extension.
... that breaks portability (standards conformance) with very little
benefit.
>> b) "size", "i" and "bytes" should all have type "size_t".
>
> There is no reason for this: Provided that all values which can actually
> occur fit into an int, using int is ok.
I agree.
My reason was: for sizes, "size_t" is the idiomatic type. fread() returns
"size_t" accordingly, so the unnecessary conversion to "int" (even if it
fits) incurs mental overhead, without benefit. (I realize this is not
c.l.c, so I'll stop.)
> Additionally, the gcc misinterpretation of 'exceptional condition' as
> 'something I gcc developer sure consider to be exceptional !!1' doesn't
> cover direct assignments, where the C-standard explicitly demands that
> 'the result is implementation-defined or an implementation-defined
> signal is raised' (when trying to assign a value outside the range of
> valid ints to an object of type int).
This is the mental load ("I can use 'int' here because...") that I find
superfluous *for the code in question*.
Anyway, I'm fairly sure the OP wrote "int" not after considering possible
ranges, but out of habit. (No offense meant.) If we wish to use the
smallest non-promoted type instead of "size_t", at least let's employ
"unsigned".
Thanks,
lacos
> int dsc_readbytes(void *mem,FILE *file,int size)
> { int i,bytes;
>
> errno=0;
> bytes=fread(mem,1,size,file);
> #ifdef LINUX
> if (bytes<size&&errno==EFAULT)
> { /* WORKAROUND FOR CENTOS KERNEL 2.6.18-194.8.1.el5 PROBLEM: fread
> RETURNS EFAULT
> WHEN REACHING A CERTAIN PAGE BOUNDARY. CONTINUE READING SINGLE
> BYTES, WHICH WORKS */
> The file is read correctly thanks to the hack. When I run the program
> multiple times, the address where fread stops is always different, but
> it's always a 4K page boundary.
>
> Has anyone ever seen anything like that?
> Where should I best report this?
My bet is that 'mem' is not valid for 'size' bytes. You don't normally
detect the problem because you don't normally read that many bytes.
To test my theory, add this line at the very beginning of your
function:
memset(mem, 1, size);
DS
many thanks for your comments on the fread issue.
Let me again confirm that the obvious explanation - an incorrect
pointer 'mem' - does not apply.
I changed the code as follows, adding the 'memset' requested by David
Schwartz and the 'clearerr' requested by lacos (thanks for the hint!),
and found something absolutely amazing:
- if the memset at the beginning is present, fread works normally
- if the memset is not present, fread reports the EFAULT.
I think this is the proof I got a mysterious library/kernel issue,
something like fread complaining about a 4K page which is swapped out.
memset swaps it in again, then fread doesn't complain, or so. It's
reproducible on my main machine, but not on my notebook with a
different Linux distro.
If you know someone in the libc team who might be interested in this,
tell me. Filing an official bug report is going to be difficult, since
there is no minimum example to reproduce, so it's more like someone
looking at the fread code to check the EFAULT.
Here is the current code:
int dsc_readbytes(void *mem,FILE *file,int size)
{ int i,bytes;
/* TEMPORARY CHECK THAT mem IS VALID BY WRITING AND READING */
/* IF memset IS PRESENT, EVERYTHING WORKS, OTHERWISE fread FAILS */
memset(mem,1,size);
errno=i=0;
bytes=fread(mem,1,size,file);
printf("Read %d bytes, errno is %d\n",bytes,errno);
if (bytes<size&&errno==EFAULT)
{ /* WORKAROUND FOR CENTOS KERNEL 2.6.18-194.8.1.el5 PROBLEM: fread
RETURNS EFAULT
WHEN REACHING A CERTAIN PAGE BOUNDARY. CONTINUE READING SINGLE
BYTES, WHICH WORKS */
printf("NOTE: fread read only %d of %d requested bytes, ending with EFAULT at %p. Recovering...\n",bytes,size,mem+bytes);
mem+=bytes;
size-=bytes;
/* CLEAR ERROR INDICATORS AND CONTINUE READING */
clearerr(file);
for (i=0;i<size;i++)
{ if (!fread(mem,1,1,file)) printf("fread failed again!\n");
else
{ mem++;
bytes++; } }
printf("Finished reading %d bytes in second try\n",bytes); }
return(bytes); }
With memset present, the output is:
Read 42932 bytes, errno is 0
Without memset, the output is:
Read 4604 bytes, errno is 14
NOTE: fread read only 4604 of 42932 requested bytes, ending with EFAULT at 0xeed18000. Recovering...
Finished reading 42932 bytes in second try
Concerning the other remarks:
>a) You shouldn't write (void *) when you actually mean (char unsigned *).
>Using a pointer-to-void as an operand to the additive operator is a
>constraint violation. (However, %p still needs a pointer-to-void.)
>b) "size", "i" and "bytes" should all have type "size_t". Accordingly, you
>should change the printf conversion specifiers to "%zu" (under
>C99/SUSv[34]), or use "%lu" and cast the "size_t" variables to "long
>unsigned" (under C89/SUSv[12]).
This is of course all true, but I'm working on a huge project which I
won't manage to finish in this life. So I need to cut corners, and
found that I can save 20% programming time and code size by using GCC
extensions (like void* arithmetic) and other 'tricks', e.g. declaring
everything as 'int' (avoiding size_t and unsigned ints). Portability
is not my concern, since 15% is vector assembler code anyway.
Nevertheless, thanks for your patient warnings.
BTW, you mentioned an unnecessary conversion from "size_t" to "int",
but there is no conversion: fread simply returns the result in rax
(64bit) or eax (32bit) and eax is stored in 'bytes'. In 64bit mode,
you even save five bytes memory: four bytes because 'bytes' takes only
4 instead of 8 bytes storage, and one byte because you don't need the
REX prefix when storing eax.
Best regards,
Elmar
> Let me again confirm that the obvious explanation - an incorrect
> pointer 'mem' - does not apply.
>
> I changed the code as follows, adding the 'memset' requested by David
> Schwartz and the 'clearerr' requested by lacos (thanks for the hint!),
> and found something absolutely amazing:
>
> - if the memset at the beginning is present, fread works normally
> - if the memset is not present, fread reports the EFAULT.
To me that suggests kernel bug rather than libc bug. (Assuming it's
definitely not your bug l-)
Try running your program under 'strace -eraw=read'; if you report the
initial value of 'mem' somewhere that should reveal whether it's reading
into your buffer or its internal buffer, and whether fread() is honoring
'size' correctly.
Have a look at the failing program's /proc/$PID/maps to verify whether
the address you're targeting is genuinely mapped. You'll have to do
this while it's running. (I take your point that you're confident
you're allocating it correctly, but if you suspect a kernel bug then one
question is whether the kernel agrees.)
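Since the maps have to be inspected while the program is running, one option is to let the program check them itself right before the fread(). A rough sketch, assuming Linux and the usual "start-end perms ..." line layout of /proc/self/maps; the function name range_is_writable is invented here:

```c
#include <stdint.h>
#include <stdio.h>

/* Returns 1 if [start, start+len) lies inside a single writable mapping
   listed in /proc/self/maps, 0 if not, -1 if the file cannot be opened.
   A buffer that spans two adjacent mappings is reported as 0. */
int range_is_writable(const void *start, size_t len)
{
    FILE *maps = fopen("/proc/self/maps", "r");
    char line[512];
    char perms[5];
    unsigned long long lo, hi;
    unsigned long long a = (uintptr_t)start;
    unsigned long long b = a + len;
    int found = 0;

    if (maps == NULL)
        return -1;
    while (fgets(line, sizeof line, maps) != NULL) {
        /* Each entry begins with "lo-hi perms", addresses in hex. */
        if (sscanf(line, "%llx-%llx %4s", &lo, &hi, perms) != 3)
            continue;
        if (a >= lo && b <= hi && perms[1] == 'w') {
            found = 1;
            break;
        }
    }
    fclose(maps);
    return found;
}
```

Calling range_is_writable(mem, size) at the top of the read function would show whether the kernel's view of the address space agrees with the allocator's.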
Run 'dmesg' just after a failure to see if the kernel had anything to
say.
> I think this is the proof I got a mysterious library/kernel issue,
> something like fread complaining about a 4K page which is swapped out.
> memset swaps it in again, then fread doesn't complain, or so. It's
> reproducible on my main machine, but not on my notebook with a
> different Linux distro.
>
> If you know someone in the libc team who might be interested in this,
> tell me. Filing an official bug report is going to be difficult, since
> there is no minimum example to reproduce, so it's more like someone
> looking at the fread code to check the EFAULT.
>
> Here is the current code:
>
> int dsc_readbytes(void *mem,FILE *file,int size)
> { int i,bytes;
>
> /* TEMPORARY CHECK THAT mem IS VALID BY WRITING AND READING */
> /* IF memset IS PRESENT, EVERYTHING WORKS, OTHERWISE fread FAILS */
> memset(mem,1,size);
> errno=i=0;
> bytes=fread(mem,1,size,file);
> printf("Read %d bytes, errno is %d\n",bytes,errno);
> if (bytes<size&&errno==EFAULT)
> { /* WORKAROUND FOR CENTOS KERNEL 2.6.18-194.8.1.el5 PROBLEM: fread
> RETURNS EFAULT
I don't know how flexible Centos is about kernel versions but 2.6.18 is
getting on a bit. Perhaps a more recent kernel would be worth a try,
all other things being kept equal?
> Elmar <elmar....@gmail.com> writes:
>
>> Let me again confirm that the obvious explanation - an incorrect
>> pointer 'mem' - does not apply.
>>
>> I changed the code as follows, adding the 'memset' requested by David
>> Schwartz and the 'clearerr' requested by lacos (thanks for the hint!),
>> and found something absolutely amazing:
>>
>> - if the memset at the beginning is present, fread works normally
>> - if the memset is not present, fread reports the EFAULT.
>
> To me that suggests kernel bug rather than libc bug. (Assuming it's
> definitely not your bug l-)
While to me, it sounds like a libc bug (if, again, it isn't the OP's
bug. Enough people have done what he's trying to do without problems
that I have to remain very suspicious....). Anyway, read() isn't
guaranteed to return the requested number of bytes, and it isn't an
error if it doesn't; fread() does have that guarantee. So to me it sounds
like the version of libc he's using isn't continuing to fill his buffer
after read() returns fewer bytes than requested.
One suggestion I'd make, on top of the suggestions others have made, is
that he call ferror() to make sure libc really thinks he's got an
error before he reads errno. Remember that errno can be set without an
error existing.
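In code, the order of checks might look like the sketch below. The enum and the name checked_fread are hypothetical. Note that ISO C does not promise that fread() sets errno at all; POSIX does, so the saved value is only reported when ferror() confirms a real error:

```c
#include <errno.h>
#include <stdio.h>

enum read_status { READ_OK, READ_EOF, READ_ERROR };

/* Hypothetical wrapper: classify the outcome of one fread() call.
   *saved_errno is nonzero only when READ_ERROR is returned. */
enum read_status checked_fread(void *mem, size_t size, FILE *file,
                               size_t *got, int *saved_errno)
{
    int e;

    errno = 0;
    *got = fread(mem, 1, size, file);
    e = errno;              /* capture before another call can overwrite it */
    *saved_errno = 0;

    if (*got == size)
        return READ_OK;     /* errno is meaningless here, whatever it holds */
    if (ferror(file)) {
        *saved_errno = e;   /* only now is errno worth looking at */
        return READ_ERROR;
    }
    return READ_EOF;        /* short count without an error: end of file */
}
```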
The OP has mentioned this is a "file" -- is it really a file, or perhaps
a pipe or something? In that case he could easily get a short read
back. Doesn't seem likely it would consistently land on a page
boundary, though.
Is falling all the way back to single-byte fread()s really necessary?
Maybe just wrapping up his fread() in a loop that keeps trying to get
the rest of the buffer until it completes would work around the problem
while trying to be a bit faster.
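Such a loop could be sketched as follows. The name fread_retry and the max_retries limit are assumptions added for the example, and nothing here is known to cure the EFAULT on the affected machine; it only shows the retry pattern:

```c
#include <stdio.h>

/* Hypothetical retry loop: keep asking fread() for the rest of the buffer
   until it is full, end-of-file is reached, or max_retries consecutive
   attempts fail without storing a single byte. Returns bytes stored. */
size_t fread_retry(void *mem, size_t size, FILE *file, int max_retries)
{
    unsigned char *p = mem;
    size_t total = 0;
    int failures = 0;

    while (total < size) {
        size_t n = fread(p + total, 1, size - total, file);

        total += n;
        if (n > 0)
            failures = 0;               /* progress was made, reset the counter */
        if (total == size || feof(file))
            break;
        if (!ferror(file) || (n == 0 && ++failures > max_retries))
            break;                      /* nothing to recover from, or giving up */
        clearerr(file);                 /* clear the error indicator, then retry */
    }
    return total;
}
```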
> Try running your program under 'strace -eraw=read'; if you report the
> initial value of 'mem' somewhere that should reveal whether it's reading
> into your buffer or its internal buffer, and whether fread() is honoring
> 'size' correctly.
>
> Have a look at the failing program's /proc/$PID/maps to verify whether
> the address you're targeting is genuinely mapped. You'll have to do
> this while it's running. (I take your point that you're confident
> you're allocating it correctly, but if you suspect a kernel bug then one
> question is whether the kernel agrees.)
>
> Run 'dmesg' just after a failure to see if the kernel had anything to
> say.
All good ideas.
>> I think this is the proof I got a mysterious library/kernel issue,
>> something like fread complaining about a 4K page which is swapped out.
>> memset swaps it in again, then fread doesn't complain, or so. It's
>> reproducible on my main machine, but not on my notebook with a
>> different Linux distro.
>>
>> If you know someone in the libc team who might be interested in this,
>> tell me. Filing an official bug report is going to be difficult, since
>> there is no minimum example to reproduce, so it's more like someone
>> looking at the fread code to check the EFAULT.
>>
>> Here is the current code:
>>
>> int dsc_readbytes(void *mem,FILE *file,int size)
>> { int i,bytes;
>>
>> /* TEMPORARY CHECK THAT mem IS VALID BY WRITING AND READING */
>> /* IF memset IS PRESENT, EVERYTHING WORKS, OTHERWISE fread FAILS */
>> memset(mem,1,size);
>> errno=i=0;
>> bytes=fread(mem,1,size,file);
>> printf("Read %d bytes, errno is %d\n",bytes,errno);
>> if (bytes<size&&errno==EFAULT)
>> { /* WORKAROUND FOR CENTOS KERNEL 2.6.18-194.8.1.el5 PROBLEM: fread
>> RETURNS EFAULT
>
> I don't know how flexible Centos is about kernel versions but 2.6.18 is
> getting on a bit. Perhaps a more recent kernel would be worth a try,
> all other things being kept equal?
I wonder if libc is equally old?
> > To me that suggests kernel bug rather than libc bug. (Assuming it's
> > definitely not your bug l-)
I agree.
> While to me, it sounds like a libc bug (if, again, it isn't the OP's
> bug. Enough people have done what he's trying to do without problems
> that I have to remain very suspicious....). Anyway, read() isn't
> guaranteed to return the requested number of bytes, and it isn't an
> error if it doesn't; fread() does have that guarantee. So to me it sounds
> like the version of libc he's using isn't continuing to fill his buffer
> after read() returns fewer bytes than requested.
However, if 'read' returns EFAULT under circumstances where a 'memset'
would not fault, something is very wrong. The soft fault test and the
hard fault test should return identical results and this test is the
kernel's responsibility.
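A starting point for a small test case might be to fread() into memory that has been allocated but never touched. This is only a sketch under assumptions: the name fread_into_untouched is invented, the 0xe04 offset merely mimics the page offset of the reported 'mem' pointer, a large malloc() is typically (not always) backed by fresh pages, and nothing suggests this actually reproduces the reported failure:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical test helper: read size bytes from file into a freshly
   allocated buffer that is never written beforehand.
   Returns 1 if fread() filled it completely, 0 if it came up short,
   -1 if the allocation failed. */
int fread_into_untouched(FILE *file, size_t size)
{
    size_t slack = (size_t)1 << 20;     /* large enough to favour fresh pages */
    unsigned char *block = malloc(size + slack);
    size_t got;

    if (block == NULL)
        return -1;
    /* Deliberately no memset(): the pages are first touched by the read.
       The odd offset makes the target cross several page boundaries. */
    got = fread(block + 0xe04, 1, size, file);
    free(block);
    return got == size;
}
```

On a healthy system this should return 1 for any regular file of at least size bytes; a 0 accompanied by ferror() and EFAULT would be the interesting case.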
DS
Except that errno's value is undefined unless there is an actual error.
If read() just returned fewer bytes than expected (and fread() is, for
some reason, not read()ing again) then errno could easily be left holding
some spurious value such as EFAULT.
I made a few final checks you requested, but I think the conclusion is
that even though I installed the latest Centos only 6 months ago and
made a full 'yum update', kernel and libc are so outdated that it's
probably impolite to bother anyone at the kernel or libc team with it.
E.g. libc is 2.5 (first released in 2006!), while 2.11 is the current
one.
So let's put it this way: if you found this thread via Google because
you have the same problem and if your kernel/libc is reasonably
up2date, then please contact the developers, otherwise use my
workaround ;-)
> One suggestion I'd make, on top of the suggestions others have made, is
> that he call ferror() to make sure libc really thinks he's got an
> error before he reads errno. Remember that errno can be set without an
> error existing.
Has been done (code follows below), ferror returns 1 after fread.
> The OP has mentioned this is a "file" -- is it really a file, or perhaps
> a pipe or something? In that case he could easily get a short read
> back. Doesn't seem likely it would consistently land on a page
> boundary, though.
It's a normal file on the disk.
> Is falling all the way back to single-byte fread()s really necessary?
> Maybe just wrapping up his fread() in a loop that keeps trying to get
> the rest of the buffer until it completes would work around the problem
> while trying to be a bit faster.
I tried larger chunks first, but that also EFAULTed. My suspicion was
that fread only performs the EFAULT check when it crosses a boundary,
and by reading 1 byte at a time, you never cross a boundary ;-)
> > Try running your program under 'strace -eraw=read'; if you report the
> > initial value of 'mem' somewhere that should reveal whether it's reading
> > into your buffer or its internal buffer, and whether fread() is honoring
> > 'size' correctly.
Hihi, when using strace (the -eraw option is not in the help BTW), the
problem doesn't show up. So it seems that strace has the same curing
effect as memset ;-)
> > Run 'dmesg' just after a failure to see if the kernel had anything to
> > say.
Nothing in dmesg
> > Have a look at the failing program's /proc/$PID/maps to verify whether
> > the address you're targeting is genuinely mapped. You'll have to do
> > this while it's running. (I take your point that you're confident
> > you're allocating it correctly, but if you suspect a kernel bug then one
> > question is whether the kernel agrees.)
>
Here are the maps:
(The mem ptr is 0xeecf2e04)
00851000-0086c000 r-xp 00000000 08:03 10063024 /lib/ld-2.5.so
0086c000-0086d000 r-xp 0001a000 08:03 10063024 /lib/ld-2.5.so
0086d000-0086e000 rwxp 0001b000 08:03 10063024 /lib/ld-2.5.so
00870000-009c2000 r-xp 00000000 08:03 10062992 /lib/libc-2.5.so
009c2000-009c4000 r-xp 00152000 08:03 10062992 /lib/libc-2.5.so
009c4000-009c5000 rwxp 00154000 08:03 10062992 /lib/libc-2.5.so
009c5000-009c8000 rwxp 009c5000 00:00 0
009ca000-009cd000 r-xp 00000000 08:03 10063042 /lib/libdl-2.5.so
009cd000-009ce000 r-xp 00002000 08:03 10063042 /lib/libdl-2.5.so
009ce000-009cf000 rwxp 00003000 08:03 10063042 /lib/libdl-2.5.so
009d1000-009f8000 r-xp 00000000 08:03 10063048 /lib/libm-2.5.so
009f8000-009f9000 r-xp 00026000 08:03 10063048 /lib/libm-2.5.so
009f9000-009fa000 rwxp 00027000 08:03 10063048 /lib/libm-2.5.so
009fc000-00a11000 r-xp 00000000 08:03 10062999 /lib/libpthread-2.5.so
00a11000-00a12000 r-xp 00015000 08:03 10062999 /lib/libpthread-2.5.so
00a12000-00a13000 rwxp 00016000 08:03 10062999 /lib/libpthread-2.5.so
00a13000-00a15000 rwxp 00a13000 00:00 0
00b4a000-00b4f000 r-xp 00000000 08:03 8702141 /usr/lib/libXdmcp.so.6.0.0
00b4f000-00b50000 rwxp 00004000 08:03 8702141 /usr/lib/libXdmcp.so.6.0.0
00b83000-00c82000 r-xp 00000000 08:03 8702259 /usr/lib/libX11.so.6.2.0
00c82000-00c86000 rwxp 000ff000 08:03 8702259 /usr/lib/libX11.so.6.2.0
00c88000-00c8a000 r-xp 00000000 08:03 8702146 /usr/lib/libXau.so.6.0.0
00c8a000-00c8b000 rwxp 00001000 08:03 8702146 /usr/lib/libXau.so.6.0.0
00c8d000-00c90000 r-xp 00000000 08:03 8705675 /usr/lib/libXrandr.so.2.0.0
00c90000-00c91000 rwxp 00002000 08:03 8705675 /usr/lib/libXrandr.so.2.0.0
00c98000-00ca1000 r-xp 00000000 08:03 8705725 /usr/lib/libXcursor.so.1.0.2
00ca1000-00ca2000 rwxp 00008000 08:03 8705725 /usr/lib/libXcursor.so.1.0.2
00ca4000-00ca8000 r-xp 00000000 08:03 8705724 /usr/lib/libXfixes.so.3.1.0
00ca8000-00ca9000 rwxp 00003000 08:03 8705724 /usr/lib/libXfixes.so.3.1.0
00cab000-00cba000 r-xp 00000000 08:03 8702346 /usr/lib/libXext.so.6.4.0
00cba000-00cbb000 rwxp 0000e000 08:03 8702346 /usr/lib/libXext.so.6.4.0
00d3f000-00d47000 r-xp 00000000 08:03 8702316 /usr/lib/libXrender.so.1.3.0
00d47000-00d48000 rwxp 00007000 08:03 8702316 /usr/lib/libXrender.so.1.3.0
08048000-08494000 r-xp 00000000 08:11 14221823 /home/elmar/python/myapp/myapp
08494000-0854f000 rwxp 0044b000 08:11 14221823 /home/elmar/python/myapp/myapp
0854f000-093f5000 rwxp 0854f000 00:00 0
0ab2e000-0ada5000 rwxp 0ab2e000 00:00 0 [heap]
eaae1000-ebd5e000 rwxp eaae1000 00:00 0
ebd5e000-ebd65000 r-xs 00000000 08:03 8789030 /usr/lib/gconv/gconv-modules.cache
ebd65000-ec30d000 rwxp ebd65000 00:00 0
ec30d000-ec30f000 rwxp 00000000 00:11 1783 /dev/zero
ec30f000-ec311000 rwxp 00000000 00:11 1783 /dev/zero
ec311000-ec313000 rwxp 00000000 00:11 1783 /dev/zero
ec313000-ec315000 rwxp 00000000 00:11 1783 /dev/zero
ec315000-ec715000 rwxs cf268000 00:11 6106 /dev/nvidia0
ec715000-ec719000 rwxs 97e31000 00:11 6106 /dev/nvidia0
ec719000-ec819000 rwxs cbc95000 00:11 6106 /dev/nvidia0
ec819000-ec81d000 rwxs b7aae000 00:11 6106 /dev/nvidia0
ec81d000-ec821000 rwxs cb5e2000 00:11 6106 /dev/nvidia0
ec821000-ec822000 rwxs e2c06000 00:11 6106 /dev/nvidia0
ec822000-ec862000 rwxs b39e8000 00:11 6106 /dev/nvidia0
ec862000-ec939000 rwxp ec862000 00:00 0
ec939000-ec97c000 rwxp 00000000 00:11 1783 /dev/zero
eca2a000-ecec4000 rwxp eca2a000 00:00 0
ecec4000-ecec5000 rwxs d0002000 00:11 6106 /dev/nvidia0
ecec5000-ecec6000 rwxs 37ca0000 00:11 6106 /dev/nvidia0
ecec6000-ecec7000 rwxs a154d000 00:11 6106 /dev/nvidia0
ecee7000-ecee8000 rwxs e2641000 00:11 6106 /dev/nvidia0
ecee8000-ecee9000 rwxs 210d88000 00:11 6106 /dev/nvidia0
ecee9000-ecf21000 rwxp ecee9000 00:00 0
ecf21000-ecf42000 rwxs 00000000 00:09 3735554 /SYSV00000000 (deleted)
ecf42000-ecf4e000 rwxp ecf42000 00:00 0
ecf4e000-ed14e000 r-xp 00000000 08:03 8706095 /usr/lib/locale/locale-archive
ed14e000-f6817000 rwxp ed14e000 00:00 0
f6817000-f6818000 r-xp 00000000 08:03 10034771 /usr/lib/tls/libnvidia-tls.so.195.36.31
f6818000-f6819000 rwxp 00000000 08:03 10034771 /usr/lib/tls/libnvidia-tls.so.195.36.31
f6819000-f7dc8000 r-xp 00000000 08:03 8700567 /usr/lib/libGLcore.so.195.36.31
f7dc8000-f7e1a000 rwxp 015ae000 08:03 8700567 /usr/lib/libGLcore.so.195.36.31
f7e1a000-f7e2b000 rwxp f7e1a000 00:00 0
f7e2b000-f7ec6000 r-xp 00000000 08:03 8700546 /usr/lib/libGL.so.195.36.31
f7ec6000-f7ee0000 rwxp 0009a000 08:03 8700546 /usr/lib/libGL.so.195.36.31
f7ee0000-f7ef0000 rwxp f7ee0000 00:00 0
f7ef0000-f7ef1000 rwxs e2060000 00:11 6106 /dev/nvidia0
f7ef1000-f7f1d000 rwxp f7ef1000 00:00 0
ff9b4000-ff9d9000 rwxp 7ffffffda000 00:00 0 [stack]
ffffe000-fffff000 r-xp ffffe000 00:00 0
Here is the current code:
int dsc_readbytes(void *mem,FILE *file,int size)
{ int i,bytes;
/* TEMPORARY CHECK THAT mem IS VALID BY WRITING AND READING */
/* IF memset IS PRESENT, EVERYTHING WORKS, OTHERWISE fread FAILS */
//memset(mem,1,size);
errno=i=0;
bytes=fread(mem,1,size,file);
//printf("Read %d bytes, errno is %d\n",bytes,errno);
#ifdef LINUX
if (bytes<size&&errno==EFAULT)
{ /* WORKAROUND FOR CENTOS KERNEL 2.6.18-194.8.1.el5 PROBLEM: fread
RETURNS EFAULT
WHEN REACHING A CERTAIN PAGE BOUNDARY. CONTINUE READING SINGLE
BYTES, WHICH WORKS */
printf("NOTE: fread read only %d of %d requested bytes, ending with EFAULT at %p. Recovering...\n",bytes,size,mem+bytes);
printf("Reading started at address %p\n",mem);
printf("Valid application memory (allocated with single malloc) is %p-%p\n",mem_start,(void*)mem_start+mem_size-1);
printf("Stream error flag is %d\n",ferror(file));
mem+=bytes;
size-=bytes;
/* CLEAR ERROR INDICATORS AND CONTINUE READING */
clearerr(file);
for (i=0;i<size;i++)
{ if (!fread(mem,1,1,file)) printf("fread failed again!\n");
else
{ mem++;
bytes++; } }
printf("Finished reading %d bytes in second try\n",bytes); }
#endif
return(bytes); }
And the output:
NOTE: fread read only 4604 of 42932 requested bytes, ending with EFAULT at 0xeecf4000. Recovering...
Reading started at address 0xeecf2e04
Valid application memory (allocated with single malloc) is 0xed214040-0xf6813fff
Stream error flag is 1
Finished reading 42932 bytes in second try
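The numbers in this output are consistent with each other: from 0xeecf2e04, 508 bytes remain in the first page, one full page of 4096 bytes follows, and 508 + 4096 = 4604, so the first fread stopped exactly where the buffer enters its third page, 0xeecf4000. The arithmetic as a small sketch (helper names are invented, and the 4K page size is the one stated in the thread):

```c
#include <stdint.h>

#define PAGE_BYTES 4096u    /* assumption: 4K pages, as stated in the thread */

/* Number of bytes from addr up to the next page boundary
   (0 if addr is already page-aligned). */
uint64_t bytes_to_page_boundary(uint64_t addr)
{
    return (PAGE_BYTES - (addr & (PAGE_BYTES - 1))) & (PAGE_BYTES - 1);
}

/* 1 if addr sits exactly on a page boundary, else 0. */
int is_page_aligned(uint64_t addr)
{
    return (addr & (PAGE_BYTES - 1)) == 0;
}
```

For the reported start address, bytes_to_page_boundary(0xeecf2e04) gives 508, and 0xeecf2e04 + 4604 is the page-aligned 0xeecf4000.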
Thanks again for your efforts, but don't worry about it any longer.
Elmar
>> Is falling all the way back to single-byte fread()s really necessary?
>> Maybe just wrapping up his fread() in a loop that keeps trying to get
>> the rest of the buffer until it completes would work around the problem
>> while trying to be a bit faster.
>
> I tried larger chunks first, but that also EFAULTed. My suspicion was
> that fread only performs the EFAULT check when it crosses a boundary,
> and by reading 1 byte at a time, you never cross a boundary ;-)
My guess would be that the single-byte reads use the file's internal
buffer, thus bypassing the faulty check on your buffer.
(read() does check the whole buffer 'up front', but following the
complicated path it ends up in for e.g. ext3 does end up with other
places where EFAULT can arise - see mm/filemap.c.)
>> > Try running your program under 'strace -eraw=read'; if you report the
>> > initial value of 'mem' somewhere that should reveal whether it's reading
>> > into your buffer or its internal buffer, and whether fread() is honoring
>> > 'size' correctly.
>
> Hihi, when using strace (the -eraw option is not in the help BTW), the
> problem doesn't show up. So it seems that strace has the same curing
> effect as memset ;-)
Dammit l-)
-e raw=set Print raw, undecoded arguments for the specified set
of system calls. This option has the effect of
causing all arguments to be printed in hexadecimal.
This is mostly useful if you don’t trust the decoding
or you need to know the actual numeric value of
an argument.
(The point being to get it to print the address of the buffer passed to
the underlying read() call.)
> Except that errno's value is undefined unless there is an actual error.
> If read() just returned fewer bytes than expected (and fread() is, for
> some reason, not read()ing again) then EFAULT could easily be set to
> some spurious value.
Yep, you're right. I take that back.
DS