A binary stream need not meaningfully support fseek calls with a whence
value of SEEK_END.
Does this mean that if I fopen a file "r" I can determine its size, but
"rb" I cannot?
--
#include <standard.disclaimer>
_
Kevin D Quitt USA 91387-4454 96.37% of all statistics are made up
Per the FCA, this address may not be added to any commercial mail list
No, I don't think it does. With a few assumptions not guaranteed by
the standard (files are just sequences of bytes, perhaps a couple of
others), you can determine the size of a file by calling
fseek(F, 0, SEEK_END) followed by ftell(F). For binary streams, you
can't depend on the fseek() (for example, all binary files might be
stored as a whole number of disk blocks). For text streams, you can't
depend on the ftell(); the result is unspecified, and can only
reliably be used to invoke fseek() and return to the same position.
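For concreteness, the fseek()/ftell() idiom described above can be sketched as follows. This is a minimal sketch, not a portable size query. The helper name size_via_seek is hypothetical, and the comments restate the two caveats just given.

```c
#include <stdio.h>

/* Size as reported by fseek(SEEK_END) followed by ftell(), or -1L on failure.
   Caveats from the discussion:
   - binary stream: SEEK_END need not be meaningful, and the end may lie
     beyond the last byte written because of null padding;
   - text stream: the ftell() value is an opaque position for fseek(),
     not a count of characters. */
long size_via_seek(FILE *fp)
{
    long saved = ftell(fp);             /* current position, restored below */
    long size;

    if (saved == -1L)
        return -1L;
    if (fseek(fp, 0L, SEEK_END) != 0)   /* may fail outright */
        return -1L;
    size = ftell(fp);
    if (fseek(fp, saved, SEEK_SET) != 0)
        return -1L;
    return size;
}
```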
--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
>Kevin D. Quitt <KQuitt...@IEEIncUNMUNG.com> writes:
>> 7.9.19.2 #3
>>
>> A binary stream need not meaningfully support fseek calls with a whence
>> value of SEEK_END.
>> Does this mean that if I fopen a file "r" I can determine its size, but
>> "rb" I cannot?
>
>No, I don't think it does.
>With a few assumptions not guaranteed by
>the standard (files are just sequences of bytes, perhaps a couple of
>others), you can determine the size of a file by calling
>fseek(F, 0, SEEK_END) followed by ftell(F).
So for "files", fseek and ftell should (but it's not guaranteed) tell me
how big the file is no matter how it's opened. fstat and/or stat are not
part of C.
>For binary streams, you
>can't depend on the fseek() (for example, all binary files might be
>stored as a whole number of disk blocks).
OK, I guess I'm really dense. *Somebody* has to know how big the binary
file is, so that I can receive an EOF. Why would it matter how it's stored?
I can understand these functions not being defined for a non-file stream,
but I cannot understand WHY fseek is allowed not to work.
It seems to me that in a file opened in binary mode, these functions
should work trivially under all circumstances. I guess I'm blind.
>For text streams, you can't
>depend on the ftell(); the result is unspecified, and can only
>reliably be used to invoke fseek() and return to the same position.
So for binary files, fseek can't get you to the end of the file, and for
text files, ftell can't tell you where you are. Wonderful.
OK. What about:
fseek( F, ULONG_MAX, SEEK_SET );
size_t_variable = ftell( F );
Just checked the FAQ again; it can't be done. What an astounding
deficiency in the language!
> So for binary files, fseek can't get you to the end of the file, and for
> text files, ftell can't tell you where you are. Wonderful.
Neither is true. For binary files, fseek may get you *further* than
you think the end of file should be, because of NUL padding. And for
text files, ftell can tell you where you are so that you can get
back there, but it can't tell you how many characters you are from
the beginning of the file, because "characters" are a matter of
interpretation. And this is indeed wonderful because it supports
two important abstractions -- binary and text files -- across an
incredibly broad range of operating systems.
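What ftell() does guarantee for a text stream is a value that fseek() can use to return to the same place. Here is a minimal sketch of that use. The helper peek_line() is hypothetical: it reads one line and then returns to the start of that line.

```c
#include <stdio.h>

/* Reads one line into buf, then repositions the stream to where the line
   began. The ftell() result is used only as an opaque cookie for
   fseek(..., SEEK_SET), which is all the standard promises for text
   streams. Returns 0 on success, -1 on error or end of file. */
int peek_line(FILE *fp, char *buf, int bufsize)
{
    long mark = ftell(fp);              /* opaque position, not a char count */

    if (mark == -1L)
        return -1;
    if (fgets(buf, bufsize, fp) == NULL)
        return -1;
    if (fseek(fp, mark, SEEK_SET) != 0) /* back to the remembered spot */
        return -1;
    return 0;
}
```

Because the stream is back at the start of the line, a subsequent fgets() sees the same line again.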
> OK. What about:
>
> fseek( F, ULONG_MAX, SEEK_SET );
> size_t_variable = ftell( F );
>
> Just checked the FAQ again; it can't be done. What an astounding
> deficiency in the language!
Or a deficiency in the operating systems. Depends on how you look
at it.
P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
>Kevin D. Quitt <KQuitt...@IEEIncUNMUNG.com> writes:
>> 7.9.19.2 #3
>>
>> A binary stream need not meaningfully support fseek calls with a whence
>> value of SEEK_END.
>>
>>
>> Does this mean that if I fopen a file "r" I can determine its size, but
>> "rb" I cannot?
>
>No, I don't think it does. With a few assumptions not guaranteed by
>the standard (files are just sequences of bytes, perhaps a couple of
>others), you can determine the size of a file by calling
>fseek(F, 0, SEEK_END) followed by ftell(F). For binary streams, you
>can't depend on the fseek() (for example, all binary files might be
>stored as a whole number of disk blocks). For text streams, you can't
>depend on the ftell(); the result is unspecified, and can only
>reliably be used to invoke fseek() and return to the same position.
Some operating systems support neither the stream abstraction nor file
sizes measured in bytes: all I/O is by the record, the size allocated may
be much larger than the size used, and the actual size used may only be
determined by sequentially reading through all records in the file to the end.
--
Thanks. Take care, Brian Inglis Calgary, Alberta, Canada
Brian....@CSi.com (Brian dot Inglis at SystematicSw dot ab dot ca)
fake address use address above to reply
It should be noted that POSIX requires that there be
no difference between text and binary streams, that
text lines be delimited by a single newline character
with no end-of-line padding, and that end-of-file
occur immediately after the last written byte. I.e.,
the simplest model.
>On Thu, 24 Jun 2004 19:46:31 GMT, Keith Thompson <ks...@mib.org> wrote:
>
>
>>Kevin D. Quitt <KQuitt...@IEEIncUNMUNG.com> writes:
>>> 7.9.19.2 #3
>>>
>>> A binary stream need not meaningfully support fseek calls with a whence
>>> value of SEEK_END.
>>> Does this mean that if I fopen a file "r" I can determine its size, but
>>> "rb" I cannot?
>>
>>No, I don't think it does.
>
>>With a few assumptions not guaranteed by
>>the standard (files are just sequences of bytes, perhaps a couple of
>>others), you can determine the size of a file by calling
>>fseek(F, 0, SEEK_END) followed by ftell(F).
>
>So for "files", fseek and ftell should (but it's not guaranteed) tell me
>how big the file is no matter how it's opened.
For starters, you should define the concept of file size. Once you
realise the difficulties in providing an implementation-neutral
definition, all your objections are gone.
>fstat and/or stat are not part of C.
And are not necessarily meaningful outside the universe of POSIX.
>>For binary streams, you
>>can't depend on the fseek() (for example, all binary files might be
>>stored as a whole number of disk blocks).
>
>OK, I guess I'm really dense. *Somebody* has to know how big the binary
>file is, so I can receive an EOF.
You'll receive it when hitting the physical end of the binary file, but
this is not *necessarily* the same as the logical end of the binary file.
That's one of the reasons I've recommended to start by defining the
concept of file size. Let's say that you open a binary file in write
mode and write 10 bytes to it. You close it, reopen it in read mode and
start reading. After reading, say, 512 bytes, you get an EOF instead of
a byte value. What is the size of your binary file: 10 or 512 bytes
and why?
Ever used a VMS system? Remember the unit used by DIR to display the
file sizes?
Text files have a different problem. It is perfectly possible to have
a sentinel character that marks the logical end of the file, so
fseek(F, 0, SEEK_END) can be meaningfully supported, but the
correspondence between characters written to the text file and bytes used
by the filesystem is not defined at the C language level. The most
trivial example is MSDOS/Windows, where the newline character needs two
bytes, but there are also record-based filesystems, where the newline
character is not stored at all, because each line is stored in one
variable-sized record, but each record has its own system-specific
overhead. So, the most sensible definition for the size of a text file
would be the number of characters written to it, but all you can easily
obtain is the number of disk bytes it occupies.
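One way to see the gap between characters written and bytes stored is to read the same file to EOF twice, once in text mode and once in binary mode. Below is a sketch using count_chars(), a hypothetical helper. On POSIX systems the two counts agree. On the systems described above they need not.

```c
#include <stdio.h>

/* Reads the named file to EOF with the given fopen() mode and returns how
   many characters getc() delivered, or -1L if the file cannot be opened.
   With "r" this counts characters as the program sees them; with "rb" it
   counts the bytes the implementation presents for the binary stream.
   On POSIX systems the two agree; on MS-DOS/Windows each newline is
   typically stored as two bytes, so the "rb" count is larger. */
long count_chars(const char *path, const char *mode)
{
    FILE *fp = fopen(path, mode);
    long n = 0;

    if (fp == NULL)
        return -1L;
    while (getc(fp) != EOF)
        ++n;
    fclose(fp);
    return n;
}
```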
The C standard's specification of fseek() guarantees only what can be
meaningfully provided by an implementation running on *any* of the
operating systems still in current use. You can easily understand this
once you remove the POSIX model from your mind.
OTOH, if you're restricted to POSIX and POSIX-like systems, you can
safely rely on stat() and you don't need fseek() for the purpose of
determining the file size. But even fseek() will work the way you
expect on such systems.
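On POSIX systems, the stat() approach looks roughly like the following. It is POSIX, not ISO C. The name posix_file_size() is hypothetical. The result is meaningful for regular files. Block-special files, as noted later in the thread, are padded to a block boundary.

```c
#define _POSIX_C_SOURCE 200112L   /* request POSIX declarations */
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

/* POSIX, not ISO C. For a regular file on a POSIX system there is no
   text/binary distinction and end-of-file follows the last byte written,
   so st_size is the number of bytes a program can read. Returns that
   size, or -1 (after reporting the error via perror) if stat() fails. */
long long posix_file_size(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0) {
        perror(path);
        return -1;
    }
    return (long long) st.st_size;
}
```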
Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Dan...@ifh.de
Mostly true, except for block-special files. They are NUL
padded to a block boundary after the last write. We
considered them explicitly when developing the semantics
of text and binary files 20 years ago.
It's hard for me to imagine why an OS would allow me to read more than the
10 bytes I wrote when writing and reading as a stream. That's not to say
there isn't one, of course.
>Ever used a VMS system? Remember the unit used by DIR to display the
>file sizes?
Yeah - and the PDP-10 where the high bit set indicated that the 36-bit
word was a line number for various programming languages, and text was
stored 5 characters to the word, with nul padding. Except I don't
remember getting the nul characters when I read the files.
>Text files have a different problem.
Problems with text files I can understand.
>The C standard's specification of fseek() guarantees only what can be
>meaningfully provided by an implementation running on *any* of the
>operating systems still in current use. You can easily understand this
>once you remove the POSIX model from your mind.
It is a trap I fall into on occasion.
Well, I thank you all for your indulgence, and help with the process of my
de-ignorantification (if I may be so bold).
> On 25 Jun 2004 11:38:22 GMT, Dan...@cern.ch (Dan Pop) wrote:
> >Let's say that you open a binary file in write
> >mode and write 10 bytes to it. You close it, reopen it in read mode and
> >start reading. After reading, say, 512 bytes, you get an EOF instead of
> >a byte value. What is the size of your binary file: 10 or 512 bytes
> >and why?
>
> It's hard for me to imagine why an OS would allow me to read more than the
> 10 bytes I wrote when writing and reading as a stream. That's not to say
> there isn't one, of course.
And in 1983, when the standardization of C began, it was hard to find
an OS that *did* let you read just the bytes you wrote to it. Even
Unix, that paragon of simplicity and elegance, has block-special devices
that violate this rule. It is a mark of the influence of Unix that you
can make a statement like this today.
> "Kevin D. Quitt" <KQuitt...@IEEIncUNMUNG.com> wrote in message
> news:7udmd09t19kbr5gqn...@4ax.com...
>
> > So for binary files, fseek can't get you to the end of the file, and for
> > text files, ftell can't tell you where you are. Wonderful.
>
> Neither is true. For binary files, fseek may get you *further* than
> you think the end of file should be, because of NUL padding.
Wouldn't you expect fgetc() to *also* return those NUL characters? It
seems strange that the standard would allow this inconsistency between
the functions.
--
Barry Margolin, bar...@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
> That's one of the reasons I've recommended to start by defining the
> concept of file size. Let's say that you open a binary file in write
> mode and write 10 bytes to it. You close it, reopen it in read mode and
> start reading. After reading, say, 512 bytes, you get an EOF instead of
> a byte value. What is the size of your binary file: 10 or 512 bytes
> and why?
I'd say 512, since that's how many bytes I can access.
If some of those bytes are not as meaningful as others, that's
presumably an application issue.
There's also an analogy with another part of the C standard: The sizeof
operator includes padding bytes in its result.
I'm not sure why you brought up fgetc. "Where you
expect the end of the file to be" might well be,
for example, immediately after the last byte your
program just explicitly output to that file. PJP
noted that seeking to "end of file" might position
the stream beyond that point. If you try to read
the data (with the equivalent of a sequence of
fgetc calls) you would indeed expect to also read
the padding characters in this case. (There are
dozens of different file organizations, sometimes
several available on a single OS, and they have
varying properties, only some of which have been
alluded to so far in this thread.)
> Barry Margolin wrote:
> > "P.J. Plauger" <p...@dinkumware.com> wrote:
> >>"Kevin D. Quitt" <KQuitt...@IEEIncUNMUNG.com> wrote...
> >>>So for binary files, fseek can't get you to the end of the file, and for
> >>>text files, ftell can't tell you where you are. Wonderful.
> >>Neither is true. For binary files, fseek may get you *further* than
> >>you think the end of file should be, because of NUL padding.
> > Wouldn't you expect fgetc() to *also* return those NUL characters? It
> > seems strange that the standard would allow this inconsistency between
> > the functions.
>
> I'm not sure why you brought up fgetc. "Where you
> expect the end of the file to be" might well be,
> for example, immediately after the last byte your
> program just explicitly output to that file.
I brought up fgetc because if you can't use fseek and ftell to find out
the "size" of the file, the only other possibility is to count how many
times you can call fgetc until you get EOF. It doesn't matter whether
your program wrote the bytes -- if fgetc returns them then they are
effectively in the file and should be counted as part of its size.
You're mapping the file format into a Unix-like notion
instead of dealing with it in its native environment.
Suppose I'm using a fixed-size record of 80 bytes for
some (COBOL source text?) file and write 5 records to
that file, on a system where files are all stored in
some integral number of 512-byte blocks. The size of
the file as known to the record manager component of
the operating system will be 5x80, and the stdio
implementation might or might not return a 6th record
if your program tries to read it. There is a real
chance that you would be able to read a 6th record but
not a 481st byte (since the stdio internal buffering
would be in terms of entire records, and an attempt to
fetch the 7th record would certainly fail). So you
might be able to read *some* but not *all* of the
padding using fgetc, and the number of successful
fgetcs really isn't a very meaningful indication of
file "size".
> Barry Margolin wrote:
> > I brought up fgetc because if you can't use fseek and ftell to find out
> > the "size" of the file, the only other possibility is to count how many
> > times you can call fgetc until you get EOF. It doesn't matter whether
> > your program wrote the bytes -- if fgetc returns them then they are
> > effectively in the file and should be counted as part of its size.
>
> You're mapping the file format into a Unix-like notion
> instead of dealing with it in its native environment.
No, I'm mapping it into C's concept of files, since this is comp.std.c.
> Suppose I'm using a fixed-size record of 80 bytes for
> some (COBOL source text?) file and write 5 records to
> that file, on a system where files are all stored in
> some integral number of 512-byte blocks. The size of
> the file as known to the record manager component of
> the operating system will be 5x80, and the stdio
> implementation might or might not return a 6th record
> if your program tries to read it. There is a real
> chance that you would be able to read a 6th record but
> not a 481st byte (since the stdio internal buffering
> would be in terms of entire records, and an attempt to
> fetch the 7th record would certainly fail). So you
> might be able to read *some* but not *all* of the
> padding using fgetc, and the number of successful
> fgetcs really isn't a very meaningful indication of
> file "size".
As far as a standard C program is concerned, a file's size is the number
of characters you can read with fgetc. It might or might not be
consistent with what some OS-specific call (analogous to Unix's stat())
returns as the file size.
It's kind of interesting that the C standard doesn't say that the
implementation may not read more characters than were written to the
file. This implies that for full portability, a program must provide
its own way of detecting the application-specific EOF.
> As far as a standard C program is concerned, a file's size is the number
> of characters you can read with fgetc. It might or might not be
> consistent with what some OS-specific call (analogous to Unix's stat())
> returns as the file size.
Yep.
> It's kind of interesting that the C standard doesn't say that the
> implementation may not read more characters than were written to the
> file. This implies that for full portability, a program must provide
> its own way of detecting the application-specific EOF.
It's more than interesting -- it's intentional. See:
http://www.dinkumware.com/manuals/reader.aspx?b=c/&h=lib_file.html
particularly the section labeled Text and Binary Streams.
>It's kind of interesting that the C standard doesn't say that the
>implementation may not read more characters than were written to the
>file.
On the contrary, for binary files, it says that it may:
3 A binary stream is an ordered sequence of characters that can
transparently record internal data. Data read in from a binary
stream shall compare equal to the data that were earlier written
out to that stream, under the same implementation. Such a stream
may, however, have an implementation-defined number of null
characters appended to the end of the stream.
>This implies that for full portability, a program must provide
>its own way of detecting the application-specific EOF.
Most "portable" binary file formats have a header/tag/whatever
containing information about the file's contents, anyway.
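A header-based layout of the kind just mentioned might look like the minimal sketch below. The 4-byte big-endian length field and the function names are assumptions for illustration, not any particular file format.

```c
#include <stdio.h>

/* Hypothetical layout: a 4-byte big-endian payload length, then the payload.
   A reader trusts the header rather than the physical EOF, so any null
   padding appended after the payload is simply never read. */
int write_with_header(FILE *fp, const unsigned char *data, unsigned long len)
{
    unsigned char hdr[4];

    hdr[0] = (unsigned char) ((len >> 24) & 0xFF);
    hdr[1] = (unsigned char) ((len >> 16) & 0xFF);
    hdr[2] = (unsigned char) ((len >> 8) & 0xFF);
    hdr[3] = (unsigned char) (len & 0xFF);
    if (fwrite(hdr, 1, 4, fp) != 4)
        return -1;
    if (fwrite(data, 1, (size_t) len, fp) != (size_t) len)
        return -1;
    return 0;
}

/* Stores the payload length from the header in *len.
   Returns 0 on success, -1 on a short read. */
int read_header(FILE *fp, unsigned long *len)
{
    unsigned char hdr[4];

    if (fread(hdr, 1, 4, fp) != 4)
        return -1;
    *len = ((unsigned long) hdr[0] << 24) | ((unsigned long) hdr[1] << 16)
         | ((unsigned long) hdr[2] << 8) | (unsigned long) hdr[3];
    return 0;
}
```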
Another approach is to make each "record" in a binary file start with a
non-zero byte. Once you have read a record starting with a zero
byte, you know you have reached the logical end of the stream.
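The non-zero-first-byte convention can be sketched as follows. The sketch assumes 8-bit bytes and a hypothetical record layout: one length byte (1 to 255) followed by that many data bytes. Because the length byte is never zero, any null padding reads as an end marker.

```c
#include <stdio.h>

/* Reads one record into buf. Returns its length (1..255), 0 at the logical
   end of the stream (a zero length byte, i.e. padding, or physical EOF),
   or -1 on a truncated or malformed record. */
int read_record(FILE *fp, unsigned char buf[255])
{
    int len = getc(fp);

    if (len == EOF || len == 0)
        return 0;
    if (len > 255)                      /* only possible if CHAR_BIT > 8 */
        return -1;
    if (fread(buf, 1, (size_t) len, fp) != (size_t) len)
        return -1;
    return len;
}

/* Writes one record; len must be in 1..255 so the first byte is non-zero.
   Returns 0 on success, -1 on error. */
int write_record(FILE *fp, const unsigned char *data, int len)
{
    if (len < 1 || len > 255)
        return -1;
    if (putc(len, fp) == EOF)
        return -1;
    if (fwrite(data, 1, (size_t) len, fp) != (size_t) len)
        return -1;
    return 0;
}
```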
Sooooo. It is safe for me to use fseek( F, 0, SEEK_END ), then ftell( F )
to determine how much memory I have to allocate to hold the entire binary
stream, albeit there may be some extra NUL bytes that I'm accommodating.
So in spite of the statement "A binary stream need not meaningfully
support fseek calls with a whence value of SEEK_END" in 7.19.9.2 (3), I
can count on not getting a number smaller than the count of data written
to the file?
> On 28 Jun 2004 16:33:03 GMT, Dan...@cern.ch (Dan Pop) wrote:
>
>> 3 A binary stream is an ordered sequence of characters that can
>> transparently record internal data. Data read in from a binary
>> stream shall compare equal to the data that were earlier written
>> out to that stream, under the same implementation. Such a stream
>> may, however, have an implementation-defined number of null
>> characters appended to the end of the stream.
>
> Sooooo. It is safe for me to use fseek( F, 0, SEEK_END ), then ftell( F )
> to determine how much memory I have to allocate to hold the entire binary
> stream, albeit there may be some extra NUL bytes that I'm accommodating.
> So in spite of the statement "A binary stream need not meaningfully
> support fseek calls with a whence value of SEEK_END" in 7.19.9.2 (3), I
> can count on not getting a number smaller than the count of data written
> to the file?
The clause Dan Pop cited doesn't cancel out 7.19.9.2p3; there's still no
guarantee that fseek on a binary stream with a whence value of SEEK_END
will produce meaningful results. However, if you locate the end of the
file by other methods (such as reading bytes from the file until you get
an EOF indication), then what you've said applies to that location.
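Locating the end by reading is also a portable answer to the original allocation question: read the stream to EOF into a buffer that grows as needed. Here is a minimal sketch. The name slurp() is hypothetical, and the doubling growth policy is just one reasonable choice.

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads fp to EOF into a malloc'd buffer that grows as needed, so no
   SEEK_END is involved. *len receives the number of bytes read; for a
   binary stream this may include implementation-appended null padding
   but is never less than what was written. Returns NULL on a read or
   allocation error; the caller frees the result. */
unsigned char *slurp(FILE *fp, size_t *len)
{
    size_t cap = 4096, n = 0, got;
    unsigned char *buf = malloc(cap), *tmp;

    if (buf == NULL)
        return NULL;
    while ((got = fread(buf + n, 1, cap - n, fp)) > 0) {
        n += got;
        if (n == cap) {                 /* buffer full: double it */
            tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }
    if (ferror(fp)) {
        free(buf);
        return NULL;
    }
    *len = n;
    return buf;
}
```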
This may be academic, but does "implementation-defined number of
null characters" preclude an infinite number of null characters?
In other words, would an implementation which never returns EOF
for binary streams but instead returns an endless sequence of null
characters be conforming?
After all, streams in general are not required to have finite length.
-- Niklas Matthies
>This may be academic, but does "implementation-defined number of
>null characters" preclude an infinite number of null characters?
>In other words, would an implementation which never returns EOF
>for binary streams but instead returns an endless sequence of null
>characters be conforming?
>After all, streams in general are not required to have finite length.
There is going to be a problem with binary files opened in append mode:
5 Opening a file with append mode ('a' as the first character in
the mode argument) causes all subsequent writes to the file to be
forced to the then current end-of-file, regardless of intervening
calls to the fseek function. In some implementations, opening a
binary file with append mode ('b' as the second or third character
in the above list of mode argument values) may initially position
the file position indicator for the stream beyond the last data
written, because of null character padding.
Which raises another question: if opening a binary file in append mode
has "well" defined semantics WRT the file position indicator, why can't
fseek(fp, 0, SEEK_END) share the same semantics?
Yes, it's "academic", and infinity is not a "number".
(If you want to make it one, you have to abandon the
normal rules for integer arithmetic.)
Infinity may not be a number in the mathematical sense, but "number
of" has a broader and different meaning. When someone says "an input
stream may consist of any number of characters" this certainly doesn't
by itself imply that those streams are necessarily finite just because
the word "number" was used.
And I'm sure that if there were platforms that lack the concept of EOF
in their native APIs but instead just return zero characters (stranger
things have happened), then this would suddenly stop being academic.
My question aimed at finding out if this may actually be the case, and
hence could explain the (slight) vagueness of the standard in this
matter.
-- Niklas Matthies
Note: mathematicians sometimes do precisely that.
Are you sure? I would be interested in seeing an
input stream that contains an infinite number of characters.
If you can't get an EOF no matter how many characters you read, the
length is not any finite number. On many systems, /dev/zero behaves
exactly that way.
To learn the precise value of infinity, find a
suitable platform and run
#include <stdio.h>

int main(void) {
    FILE *stream = fopen("/dev/zero", "rb");
    if (stream != NULL) {
        unsigned long count = 0;
        while (getc(stream) != EOF)
            ++count;
        printf("infinity = %lu\n", count);
    }
    return 0;
}
CONTEST RULES: Pulling the plug, sending signals, or
otherwise causing the program to cease before normal
termination voids your entry. Removing /dev/zero from
the file system also voids your entry, and probably
your system's viability, too ...
So does /dev/tty if the user has infinite patience.
To be precise, the term for this is "unbounded".
>Wojtek Lerch <Wojt...@yahoo.ca> writes:
>[...]
>> If you can't get an EOF no matter how many characters you read, the
>> length is not any finite number. On many systems, /dev/zero behaves
>> exactly that way.
>
>So does /dev/tty if the user has infinite patience.
All that's required from the user is not to press the eof key ;-)
> In <ln8ye4a...@nuthaus.mib.org> Keith Thompson <ks...@mib.org> writes:
>
> >Wojtek Lerch <Wojt...@yahoo.ca> writes:
> >[...]
> >> If you can't get an EOF no matter how many characters you read, the
> >> length is not any finite number. On many systems, /dev/zero behaves
> >> exactly that way.
> >
> >So does /dev/tty if the user has infinite patience.
>
> All that's required from the user is not to press the eof key ;-)
Well, he also has to continue pressing some *other* keys. Or he could
get a paperweight and put it on the Return key, then walk away (you want
to use Return to avoid problems with the OS's input editing buffer,
which on some OSes will limit you to about 256 characters/line).
>In article <cc0q6v$m3f$2...@sunnews.cern.ch>, Dan...@cern.ch (Dan Pop)
>wrote:
>
>> In <ln8ye4a...@nuthaus.mib.org> Keith Thompson <ks...@mib.org> writes:
>>
>> >Wojtek Lerch <Wojt...@yahoo.ca> writes:
>> >[...]
>> >> If you can't get an EOF no matter how many characters you read, the
>> >> length is not any finite number. On many systems, /dev/zero behaves
>> >> exactly that way.
>> >
>> >So does /dev/tty if the user has infinite patience.
>>
>> All that's required from the user is not to press the eof key ;-)
>
>Well, he also has to continue pressing some *other* keys. Or he could
>get a paperweight and put it on the Return key, then walk away (you want
>to use Return to avoid problems with the OS's input editing buffer,
>which on some OSes will limit you to about 256 characters/line).
Nope. The lack of an end of file condition is enough. The standard
doesn't say how long it must take until an input function returns.
> > The clause Dan Pop cited doesn't cancel out 7.19.9.2p3; there's
> > still no guarantee that fseek on a binary stream with a whence value
> > of SEEK_END will produce meaningful results. However, if you locate
> > the end of the file by other methods (such as reading bytes from the
> > file until you get an EOF indication), then what you've said applies
> > to that location.
>
> This may be academic, but does "implementation-defined number of
> null characters" preclude an infinite number of null characters?
Infinity is not a number.
--
pete