There are many reasons to avoid making the FILE structure public. I
am in such a situation now since I am reimplementing i/o for the 64 bit
version of lcc-win. In stdio.h, I specified FILE as an opaque structure.
Currently, 7.19.1(2) says:
FILE
which is an object type capable of recording all the information needed to control a
stream, including its file position indicator, a pointer to its associated buffer (if any), an
error indicator that records whether a read/write error has occurred, and an end-of-file
indicator that records whether the end of the file has been reached;
Could that be changed to:
FILE
which is an object type or an incomplete object type capable of recording all the
information needed to control a stream, ... etc, the rest is the same
Rationale:
There is actually no function in the interface that needs to know
the layout of the FILE object. The getc() macro can be made a built-in macro
or similar by the implementation that decides to have an opaque structure.
Modifying the FILE structure when some code (in getc() for instance) accesses it
is extremely difficult after many programs have been compiled with a certain
size of it. Making it opaque allows one to update the file i/o and add fields
without making the old programs obsolete.
>Currently, 7.19.1(2) says:
>FILE
>which is an object type capable of recording all the information needed to control a
>stream, including its file position indicator, a pointer to its associated buffer (if any), an
>error indicator that records whether a read/write error has occurred, and an end-of-file
>indicator that records whether the end of the file has been reached;
>Could that be changed to:
>FILE
>which is an object type or an incomplete object type capable of recording all the
>information needed to control a stream, ... etc, the rest is the same
You'll wait with your implementation until the standard has been changed?
>Rationale:
>There is actually no function in the interface that needs to know
>the layout of the FILE object. The getc() macro can be made a built-in macro
>or similar by the implementation that decides to have an opaque structure.
>Modifying the FILE structure when some code (in getc() for instance) accesses it
>is extremely difficult after many programs have been compiled with a certain
>size of it. Making it opaque allows one to update the file i/o and add fields
>without making the old programs obsolete.
There are many different ways to skin this cat.
In 64-bit Solaris, the "FILE" object is defined as:
struct __FILE_TAG {
long __pad[16];
};
(16 * 8 or 128 bytes)
All access to the "FILE" is done through functions and there's
no risk of getc() and other "macros" doing the wrong thing.
The only thing fixed in the ABI is the size of the FILE object.
I.e., you can get what you want without even changing the standard.
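To make this approach concrete, here is a minimal sketch of a padded public type whose real layout stays private to the library. All names (MY_FILE, struct my_file_real, my_open_string, my_getc, my_close) and the private layout are hypothetical, not Sun's code, and the sketch assumes an LP64 platform like 64-bit Solaris, where long and pointers are both 8 bytes.

```c
#include <assert.h>
#include <stdio.h>    /* EOF */
#include <stdlib.h>
#include <string.h>

/* Public type as a client sees it (hypothetical). Only its size and
   alignment are frozen by the ABI; the members carry no meaning. */
typedef struct my_file_tag {
    long pad_[16];
} MY_FILE;

/* Private layout, known only inside the library (hypothetical). */
struct my_file_real {
    const unsigned char *ptr;   /* next unread byte */
    const unsigned char *end;   /* one past the last valid byte */
    int flags;                  /* room for future state */
};

/* The library checks at build time that its private layout still fits. */
_Static_assert(sizeof(struct my_file_real) <= sizeof(MY_FILE),
               "private layout outgrew the public padding");
_Static_assert(_Alignof(struct my_file_real) <= _Alignof(MY_FILE),
               "private layout needs stricter alignment (assumes LP64)");

static struct my_file_real *real_of(MY_FILE *f)
{
    return (struct my_file_real *)(void *)f;
}

/* Open a read-only stream over a string. The storage comes from malloc,
   so it has no declared type and is used only as the private struct. */
MY_FILE *my_open_string(const char *s)
{
    MY_FILE *f = malloc(sizeof *f);
    if (f == NULL)
        return NULL;
    struct my_file_real *r = real_of(f);
    r->ptr = (const unsigned char *)s;
    r->end = r->ptr + strlen(s);
    r->flags = 0;
    return f;
}

/* Every access goes through a real function, never a client-side macro. */
int my_getc(MY_FILE *f)
{
    struct my_file_real *r = real_of(f);
    return r->ptr < r->end ? *r->ptr++ : EOF;
}

void my_close(MY_FILE *f)
{
    free(f);
}
```

A client only ever holds a MY_FILE *, so the library may rearrange struct my_file_real freely as long as both static assertions keep passing; the public size is the one thing it can never change.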
Casper
--
Expressed in this posting are my opinions. They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.
Sure, I will probably do that, but then...
Wouldn't it be better to change the standard and not to have to
write this obscure code?
There are also many reasons to expose it, most notably the ability to
implement getc() and putc() as efficient macros, which typically
requires access to the internals of the FILE structure. With inline
functions, that's not needed so much any more.
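A minimal sketch of the traditional arrangement, assuming a hypothetical stream type XFILE whose layout is public: the fast path is a macro that touches cnt and ptr directly, and only an empty buffer costs a function call. It is modelled loosely on old Unix stdio, not copied from any particular library; the in-memory "file" and the 4-byte buffer are illustrative assumptions.

```c
#include <assert.h>
#include <stdio.h>    /* EOF */
#include <string.h>

/* Hypothetical stream type whose layout is public. */
typedef struct {
    int cnt;                /* bytes left in the buffer */
    unsigned char *ptr;     /* next byte to return */
    const char *src;        /* stand-in for the underlying file */
    size_t srclen;
    size_t srcpos;
    unsigned char buf[4];   /* tiny buffer so refills really happen */
} XFILE;

void x_open_string(XFILE *p, const char *s)
{
    p->cnt = 0;
    p->ptr = p->buf;
    p->src = s;
    p->srclen = strlen(s);
    p->srcpos = 0;
}

/* Slow path: a real call, made only when the buffer is empty. */
int x_fillbuf(XFILE *p)
{
    size_t n = p->srclen - p->srcpos;
    if (n == 0) {
        p->cnt = 0;
        return EOF;
    }
    if (n > sizeof p->buf)
        n = sizeof p->buf;
    memcpy(p->buf, p->src + p->srcpos, n);
    p->srcpos += n;
    p->ptr = p->buf;
    p->cnt = (int)n - 1;    /* one byte is returned right away */
    return *p->ptr++;
}

/* Fast path, expanded in every client: it reads and writes cnt and ptr
   directly, so their offsets end up compiled into client binaries. */
#define x_getc(p) (--(p)->cnt >= 0 ? (int)*(p)->ptr++ : x_fillbuf(p))
```

The macro evaluates p more than once, which the standard explicitly permits for getc. Because the offsets of cnt and ptr are baked into every client, the layout of XFILE is effectively frozen once such binaries exist.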
> Could that [7.19.1p2] be changed to:
>
> FILE
> which is an object type or an incomplete object type capable of recording all the
> information needed to control a stream, ... etc, the rest is the same
That seems like a reasonable request to me.
> There is actually no function in the interface that needs to know
> the layout of the FILE object. The getc() macro can be made a built-in macro
> or similar by the implementation that decides to have an opaque structure.
>
> Modifying the FILE structure when some code (in getc() for instance) accesses it
> is extremely difficult after many programs have been compiled with a certain
> size of it. Making it opaque allows one to update the file i/o and add fields
> without making the old programs obsolete.
Not if you embed code into the client, whether by inline functions or
built-in macros. You can always add fields at the end, but you can't go
changing the layout without preventing any direct access from client
code.
--
Larry Jones
Why can't I ever build character in a Miami condo or a casino somewhere?
-- Calvin
Yes, but that's not a good reason for the standard to require
FILE to be an object type. (You didn't say it was, but I thought
I'd clarify.)
[snip]
I agree that allowing FILE to be an incomplete type is a good idea;
it fixes a minor annoyance with no real cost that I can see (other
than the cost of making *any* change to the standard).
Of course such a change wouldn't be of any help for current
compilers until the next C standard is published (unless it's made
in a Technical Corrigendum, but I don't think that's feasible).
--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
>Not if you embed code into the client, whether by inline functions or
>built-in macros. You can always add fields at the end, but you can't go
>changing the layout without preventing any direct access from client
>code.
Actually, if you use a "FILE object" it is NOT possible to add
bits on the end. (The size is different, and compiled code that copies
the data structure would copy only part of it.) This was somewhat worse
because many of the older implementations used:
FILE __iob[NFILE];
#define stdin &__iob[0]
#define stdout &__iob[1]
#define stderr &__iob[2]
Clearly that breaks when the FILE object grows in size.
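The breakage is plain arithmetic: a client compiled against the old header turns &__iob[1] into the array base plus one old-sized element. A small sketch with two hypothetical releases of the structure (the member names are invented for illustration):

```c
#include <assert.h>
#include <stddef.h>

/* Two hypothetical releases of the same library's FILE. */
struct file_v1 {
    int cnt;
    unsigned char *ptr;
    unsigned char *base;
    int flag;
};

struct file_v2 {
    int cnt;
    unsigned char *ptr;
    unsigned char *base;
    int flag;
    unsigned char *bufend;   /* field added in the new release */
};

/* An old client expands "&__iob[1]" (stdout) using the old element size... */
static const size_t old_client_stdout_offset = 1 * sizeof(struct file_v1);

/* ...but the new library lays out its array with the new element size. */
static struct file_v2 iob_v2[3];

/* Where the old binary's stdout pointer really lands in the new array. */
static unsigned char *old_client_stdout(void)
{
    return (unsigned char *)iob_v2 + old_client_stdout_offset;
}
```

Since the new element is strictly larger, the old binary's stdout points somewhere inside the new stdin entry, which is why the element size of such an array becomes part of the ABI.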
I agree with this.
> There are also many reasons to expose it, most notably the ability to
> implement getc() and putc() as efficient macros, which typically
> requires access to the internals of the FILE structure. With inline
> functions, that's not needed so much any more.
Surely for an inline function you would need the code and therefore the
definition of the struct in the header?
I think this is one instance where it would make sense for the
implementation to use some compiler magic if it wants to have the
benefits of FILE being incomplete and also the benefits of getc/putc
being inlined.
>> Could that [7.19.1p2] be changed to:
>>
>> FILE
>> which is an object type or an incomplete object type capable of recording all the
>> information needed to control a stream, ... etc, the rest is the same
>
> That seems like a reasonable request to me.
Agreed.
Any implementation where users have been making use of the internals of
the struct and where the implementation wants to allow them to continue
would have the freedom to keep the existing definition, so the change
does not in itself introduce any breakage.
>> There is actually no function in the interface that needs to know
>> the layout of the FILE object. The getc() macro can be made a built-in macro
>> or similar by the implementation that decides to have an opaque structure.
>>
>> Modifying the FILE structure when some code (in getc() for instance) accesses it
>> is extremely difficult after many programs have been compiled with a certain
>> size of it. Making it opaque allows one to update the file i/o and add fields
>> without making the old programs obsolete.
>
> Not if you embed code into the client, whether by inline functions or
> built-in macros. You can always add fields at the end, but you can't go
> changing the layout without preventing any direct access from client
> code.
If getc/putc have been implemented as macros then there is a good
argument for the implementation avoiding breaking pre-compiled code
which makes use of any standard library functions which are implemented
as macros (the most obvious being getc/putc, but any of the others could
legally be implemented as macros). However, this is a choice for the
implementation not the standard, and currently implementations could
still break such binary compatibility.
I agree with Jacob that this would be a sensible change to make to the
standard. It would not break the conformance of any existing compiler,
and if implementors use this freedom the only code they could break
would be non-standard code anyway.
--
Flash Gordon
I agree on the first point, and I *almost* agree on the second.
Such a change to the standard could break some existing conforming,
even strictly conforming, code, but I don't think it would break any
code that's not specifically contrived to make this point.
For example, I believe this (rather silly) program is strictly
conforming (ignoring the issue of whether performing output that might
succeed or fail breaks strict conformance):
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
const FILE tmp = *stdin;
*stdin = tmp;
puts("Hello, world");
if (sizeof tmp > 0) exit(EXIT_SUCCESS);
else exit(EXIT_FAILURE);
}
This doesn't affect the argument.
But none of the standard library functions use FILE objects, only
pointers to them; the objects themselves are allocated by library code.
And the standard makes it clear that they might be magic, so you can't
go copying them and expect them to still work. So, I think it is,
officially, "safe".
> This was somewhat worse
> because many of the older implementations used:
>
> FILE __iob[NFILE];
> #define stdin &__iob[0]
> #define stdout &__iob[1]
> #define stderr &__iob[2]
>
> Clearly that breaks when the FILE object grows in size.
And leads to horrible kludges like allocating parallel structures
(__bufendtab[], anyone?) instead.
--
Larry Jones
At times like these, all Mom can think of is how long she was in
labor with me. -- Calvin
What if a FILE contains `volatile' elements?
(As long as we're dealing with contrived code, we may as well
deal with contrived objections ...)
Yeah, ok.
But referring to sizeof(FILE) is sufficient to make the point.
> #include <stdio.h>
> #include <stdlib.h>
> int main(void)
> {
> const FILE tmp = *stdin;
> *stdin = tmp;
> puts("Hello, world");
> if (sizeof tmp > 0) exit(EXIT_SUCCESS);
> else exit(EXIT_FAILURE);
> }
Gnuplot does the assignment bit:
That was the only example I could find where the code wasn't also
manipulating the object:
http://google.com/codesearch?as_q=\sFILE\+\w%2B\s*[%3B%3D]&as_lang=c&as_case=y
Apparently it's not uncommon for people to typedef FILE, so it's difficult
to ascertain how much of a contrivance the code truly is [within the
Google Code Search bot reachable corpus].
That code copies a FILE object *and then makes use of the copy*.
The standard specifically doesn't guarantee that that will work.
(That doesn't make the code in question evil or incorrect, it's
just non-portable, in that it depends on things not guaranteed by
the C standard.)
[snip]
> jacob navia <ja...@nospam.org> wrote:
>>
>> There are many reasons to avoid making the FILE structure public.
>
> There are also many reasons to expose it, most notably the ability to
> implement getc() and putc() as efficient macros, which typically
> requires access to the internals of the FILE structure. With inline
> functions, that's not needed so much any more.
It isn't necessary for FILE to be an object type to implement
getc() and putc() as equally efficient macros. It's not even
really all that difficult -- the appropriate structure can still
be declared in the header file, just not made available through
FILE. Similarly any header-supplied inline functions could cast
any FILE * parameters to the appropriate internal type.
>But none of the standard library functions use FILE objects, only
>pointers to them; the objects themselves are allocated by library code.
>And the standard makes it clear that they might be magic, so you can't
>go copying them and expect them to still work. So, I think it is,
>officially, "safe".
True, but unfortunately many of the existing applications cannot be
fixed in that way.
>> This was somewhat worse
>> because many of the older implementations used:
>>
>> FILE __iob[NFILE];
>> #define stdin &__iob[0]
>> #define stdout &__iob[1]
>> #define stderr &__iob[2]
>>
>> Clearly that breaks when the FILE object grows in size.
>And leads to horrible kludges like allocating parallel structures
>(__bufendtab[], anyone?) instead.
Yep. I've rewritten part of the Solaris 9/10 libc and instead of
having 3 or more different arrays, there's only one additional
structure for the 32 bit implementation:
http://cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/libc/inc/file64.h
struct xFILEdata {
uintptr_t _magic; /* Check: magic number, must be first */
unsigned char *_end; /* the end of the buffer */
rmutex_t _lock; /* lock for this structure */
mbstate_t _state; /* mbstate_t */
int _altfd; /* alternate fd if > 255 */
};
This is used specifically for NFILE; we still have _bufendtab but
that was, unfortunately, badly used; rather than using the index
in _iob[], _bufendtab was indexed by the file descriptor. It
is kept as it was, i.e., we set it, but we don't use it anywhere
in the code.
Originally, finding the additional data required us to search through
all the allocated files. In Solaris 10 (9?) it's a lot easier;
there's an array of "struct xFILEdata _xftab[NFILE]" for the first
NFILEs but the rest of the FILEs are allocated together with xFILEdata.
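The lookup described above can be sketched roughly as follows. This is not the Solaris code (file64.h at the URL above has the real structure); SFILE, struct xdata, s_iob, s_xtab, s_alloc, s_free, s_xdata and NSTATIC are all hypothetical. The linear equality search stands in for whatever address check a real libc uses, since ISO C leaves relational comparison of pointers to unrelated objects undefined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NSTATIC 3   /* hypothetical stand-in for NFILE */

/* Layout frozen by the old ABI: it cannot grow. */
typedef struct {
    int fd;
} SFILE;

/* Per-stream state added in a later release. */
struct xdata {
    unsigned char *end;   /* end of the buffer */
    int altfd;            /* e.g. a descriptor that no longer fits */
};

/* The statically allocated streams and their parallel extension table. */
SFILE s_iob[NSTATIC];
static struct xdata s_xtab[NSTATIC];

/* Streams created later carry their extension in the same allocation. */
struct sfile_ext {
    SFILE f;              /* first member, so &ext->f converts back to ext */
    struct xdata x;
};

SFILE *s_alloc(void)
{
    struct sfile_ext *e = calloc(1, sizeof *e);
    return e != NULL ? &e->f : NULL;
}

void s_free(SFILE *fp)    /* only for streams obtained from s_alloc() */
{
    free((struct sfile_ext *)(void *)fp);
}

/* Find the extension data for any stream. */
struct xdata *s_xdata(SFILE *fp)
{
    for (size_t i = 0; i < NSTATIC; i++)
        if (fp == &s_iob[i])
            return &s_xtab[i];
    /* Not a static stream: it is the first member of a struct sfile_ext. */
    return &((struct sfile_ext *)(void *)fp)->x;
}
```

The old SFILE layout never changes; new state lives either in the parallel table (for the fixed array) or directly behind each dynamically allocated stream.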
>Surely for an inline function you would need the code and therefore the
>definition of the struct in the header?
Inline and macro are both horrid if you want to define a proper
ABI. I would avoid both.
>I think this is one instance where it would make sense for the
>implementation to use some compiler magic if it wants to have the
>benefits of FILE being incomplete and also the benefits of getc/putc
>being inlined.
Or just make the code a little bit slower.
In today's machines, the cost of a function call compared to the i/o
to the disk is so small that this distinction doesn't have any practical
meaning.
The stream operations aren't CPU bound but limited by
o RAM access. The RAM chips are MUCH slower than the CPU
o DISK access. That's even slower, even if you have solid state disks.
In this context, caring about a function call is completely
BOGUS, since in many cases, inlining functions produces more
cache misses since the code is bigger, and slows down the CPU
instead of improving it.
My point anyway, is that I would like the standard to ALLOW
having an opaque structure. Other implementations that have
an object type for FILE are NOT affected since that is obviously
allowed too. There is no change, just allowing something.
>In today's machines, the cost of a function call compared to the i/o
>to the disk is so small that this distinction doesn't have any practical
>meaning.
It's one of the reasons why in the 64 bit Solaris ABIs we don't
use macros for stdio.
>My point anyway, is that I would like the standard to ALLOW
>having an opaque structure. Other implementations that have
>an object type for FILE are NOT affected since that is obviously
>allowed too. There is no change, just allowing something.
The Solaris 64 bit implementation uses an opaque FILE but it
is still a complete type. Using an incomplete type is not
currently allowed and that is what some people want.
In most environments, it would be a *lot* slower. Don't forget that
getc/putc usually just deal with the file buffer; they only do real I/O
when the buffer is empty/full.
--
Larry Jones
My brain is trying to kill me. -- Calvin
That depends on how much inline magic the compiler has. But even with
standard C, the definition of the struct could be at block scope so that
it's not usable outside the function(s).
--
Larry Jones
The real fun of living wisely is that you get to be smug about it. -- Hobbes
But getc/putc don't do I/O to the disk in most cases, they just
manipulate the file buffer (in a fairly simple way).
> In this context, caring about a function call is completely
> BOGUS, since in many cases, inlining functions produces more
> cache misses since the code is bigger, and slows down the CPU
> instead of improving it.
getc/putc are traditionally such simple functions that inlining them
(whether via inline functions or macros) doesn't significantly affect
the code size. And it does (or at least did) significantly improve
performance -- the folks who originally made them macros were not prone
to premature micro-optimization.
--
Larry Jones
Oh, what the heck. I'll do it. -- Calvin
This is up to the implementers to determine. The modification I propose
allows the implementor to decide this stuff, that's all. If in
certain environments, a macro is better, then the implementor can do it
that way.
If in other environments the opaque type is better, with my modification
that would be possible.
The best of all worlds!
Looking just a bit beyond the boundaries of C itself, I think
you'll find that's no longer the case. In a multi-threaded program
(all Solaris programs are multi-threaded nowadays, even when their
thread count happens to be unity), getc/putc cannot muck with the
buffer until they've taken a lock or synchronized in some other
way. By the time you've locked and released a mutex or waited and
posted a semaphore or cobbled together a compare-and-swap loop,
there's enough going on that the time spent getting into and out of
a function starts to dwindle in importance.
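POSIX already acknowledges this cost: every stdio function that takes a FILE * must behave as if it locked the stream, and getc_unlocked() together with flockfile()/funlockfile() lets a caller pay for the lock once per loop instead of once per character. A minimal sketch; these calls are POSIX, not ISO C, and the two counting functions are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>

/* Take the stream lock once, then read with the unlocked variant. */
long count_bytes_locked_once(FILE *fp)
{
    long n = 0;
    flockfile(fp);
    while (getc_unlocked(fp) != EOF)
        n++;
    funlockfile(fp);
    return n;
}

/* Same loop with plain getc(), which must synchronize on every call
   in a thread-safe stdio. */
long count_bytes_per_call(FILE *fp)
{
    long n = 0;
    while (getc(fp) != EOF)
        n++;
    return n;
}
```

Both functions return the same count; the difference is only how many times the lock is taken.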
Making FILE an incomplete type wouldn't require getc and putc to
perform real I/O on each call. They could easily convert the FILE* to
_REAL_FILE* and access the buffer that way.
I think you're losing track here: "make the code a little bit slower" as
in "adding one function call" - but the point is that a getc call does
little enough that function call overhead can indeed be significant.
Anyway, this entire discussion seems thoroughly off track by now -
threads may invalidate the above argument, the trick you mention
invalidates another, and neither is about what Navia proposes anyway
since he's not trying to require FILE to be an incomplete type.
--
Hallvard
As macros, they could only do that if _REAL_FILE were in scope. What's
the point of creating an opaque type if the actual type must be equally
visible? If stdio.h is an actual visible file, then the definition of
_REAL_FILE will be just as visible as that of FILE. If stdio.h is not
an actual, visible file, but implemented by compiler "magic", then the
contents of FILE would be just as inaccessible as the contents of
_REAL_FILE.
Right. Larry was saying that it would be a *lot* slower; I was trying
to refute that claim.
> Anyway, this entire discussion seems thoroughly off track by now -
> threads may invalidate the above argument, the trick you mention
> invalidates another, and neither is about what Navia proposes anyway
> since he's not trying to require FILE to be an incomplete type.
The real benefit of letting FILE be an incomplete type is that it makes it
illegal for programs to try to define their own FILE objects or compute the
size of FILE, and permits compilers to detect and diagnose those obvious
programming errors. Making it harder for programmers to circumvent that and
directly access the contents of whatever structure the FILE pointer really
points to is just a bonus.
YES!!!
and again
YES!!!
Let's keep it 100% clear: I would like to ADD incomplete types to
the possibilities of implementing FILE objects. I am not requesting
that ALL implementations should have an incomplete type.
In implementations where the people writing the library see that
getc() could be significantly speeded up with a macro... then, they
can still go ahead and write that macro!
And I have yet to see anyone who disagrees with your suggestion.
>jacob navia <ja...@nospam.org> wrote:
>> In today's machines, the cost of a function call compared to the i/o
>> to the disk is so small that this distinction doesn't have any practical
>> meaning.
>But getc/putc don't do I/O to the disk in most cases, they just
>manipulate the file buffer (in a fairly simple way).
But almost all getc() and putc() calls correspond to some kind of i/o
operation; just not one-to-one, so it is reasonable to consider how
the function call overhead compares with the (amortised) i/o time.
A fast current disk can read or write on the order of 100MB/s, and
processors run at about 2GHz; that means that getc() and putc() must
take less than about 20 cycles per byte to avoid being a bottleneck.
Of course, programs accessing huge quantities of data usually don't
use putc() and getc().
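The arithmetic behind the 20-cycle figure, using the numbers above and assuming one byte per getc()/putc() call:

```latex
\frac{2 \times 10^{9}\ \text{cycles/s}}{100 \times 10^{6}\ \text{bytes/s}}
  = 20\ \text{cycles per byte}
```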
-- Richard
--
Please remember to mention me / in tapes you leave behind.
> Keith Thompson wrote:
> > Making FILE an incomplete type wouldn't require getc and putc to
> > perform real I/O on each call. They could easily convert the FILE* to
> > _REAL_FILE* and access the buffer that way.
>
> As macros, they could only do that if _REAL_FILE were in scope. What's
> the point of creating an opaque type if the actual type must be equally
> visible?
The point is that FILE _must_ be visible to the user-programmer, but
_REAL_FILE is implementation-specific, and since programmers can't even
rely on it being there at all, they are less likely to assume anything
about its contents.
Technically, that is not an argument, but (I'm almost ashamed to admit)
programmers are humans, and therefore the psychological difference
between the Standard-required (even if not internally Standard-defined)
FILE and the implementation-specific, quite possibly even absent,
_REAL_FILE, _File_Struct_Internal, __filearray, and any number of other
variations, is relevant.
Richard
>struct __FILE_TAG {
> long __pad[16];
>};
>
>(16 * 8 or 128 bytes)
>
>All access to the "FILE" is done through functions and there's
>no risk of getc() and other "macros" doing the wrong thing.
>
>The only thing fixed in the ABI is the size of the FILE object.
Why does the FILE object have to be the right size? Isn't a FILE *
always cast to the real structure type before using it?
>In article <4ac4b315$0$83240$e4fe...@news.xs4all.nl>,
>Casper H.S. Dik <Caspe...@Sun.COM> wrote:
>>struct __FILE_TAG {
>> long __pad[16];
>>};
>>
>>(16 * 8 or 128 bytes)
>>
>>All access to the "FILE" is done through functions and there's
>>no risk of getc() and other "macros" doing the wrong thing.
>>
>>The only thing fixed in the ABI is the size of the FILE object.
>Why does the FILE object have to be the right size? Isn't a FILE *
>always cast to the real structure type before using it?
I'm afraid that they still use:
#define stdin &_iob[0]
#define stdout &_iob[1]
#define stderr &_iob[2]
_iob[] is fixed by the ABI, it cannot be grown.
I also believe, though, that if an object is a complete type,
you cannot grow it anyway, as the size of the object is
also part of the ABI.