64 bit fread/fwrite/fopen etc

jacob navia

Jul 2, 2004, 3:32:12 AM
is there any standard concerning the handling of files bigger than 4GB?

Under Linux there are ftello and fseeko, which use a
64-bit offset type. Under Windows there are lseeki64 and telli64.

Are there any recommendations as to how to implement this feature
in standard C?

Thanks for your time


Richard Bos

Jul 2, 2004, 3:53:22 AM
"jacob navia" <ja...@jacob.remcomp.fr> wrote:

> is there any standard concerning the handling of files bigger than 4GB?

Yes. Use fopen().

> Under Linux there are ftello and fseeko, which use a
> 64-bit offset type. Under Windows there are lseeki64 and telli64.
>
> Are there any recommendations as to how to implement this feature
> in standard C?

Yes. Use an implementation which handles that sized files using normal
fseek(), or fgetpos(). There is no excuse to resort to non-standard
functions when the existing ones can do the job just as well; that is
mere laziness on the part of the implementation's designers.

Richard

Harti Brandt

Jul 2, 2004, 5:37:38 AM
On Fri, 2 Jul 2004, jacob navia wrote:

jn>is there any standard concerning the handling of files bigger than 4GB?
jn>
jn>Under Linux there are ftello and fseeko, which use a
jn>64-bit offset type. Under Windows there are lseeki64 and telli64.
jn>
jn>Are there any recommendations as to how to implement this feature
jn>in standard C?

Well, BSD systems have simply extended all file-size-related types to
64 bits, so you don't need to do anything special, except that they could not
change the return type of ftell() (which is long). So, depending on
whether you have a POSIX system, you have to use f[gs]etpos() (C, no POSIX)
or ftello(), fseeko() (POSIX). Running BSD on a 64-bit platform also lets you
use ftell()/fseek(), but that would be rather poor coding style.

There is also the bug called the Large File Summit, where a couple of companies
decided that they didn't want to break 32-bit apps and made all this a lot
more complicated: all the usual file functions are duplicated with names
that have 64 appended to them. You can either use them directly or compile
your program with -D_LARGEFILE_SOURCE and/or -D_FILE_OFFSET_BITS=64. In
that case the normal calls are mapped to their 64-bit equivalents, which
more or less makes this equivalent to the BSD functionality. If you compile
a 64-bit app on these systems, the 64-bit versions are used by default.
See lf64(5) on your Solaris system.

harti

Antoine Leca

Jul 2, 2004, 5:34:25 AM
In 40e51416....@news.individual.net, Richard Bos wrote:

> "jacob navia" <ja...@jacob.remcomp.fr> wrote:
>> Are there any recommendations as to how to implement this feature
>> in standard C?
>
> Yes. Use an implementation which handles that sized files using normal
> fseek(), [...]

Given that Jacob targets Win32, and so does not have TOTAL freedom in
choosing the width of long, I am not sure this particular advice is
useful.

Or are you suggesting a new set of SEEK_xxx constants, like those of V6?


> or fgetpos().

Now, the problem is that while fgetpos() IS a way to solve the
problem, this information does not appear to have percolated into the heads
of most programmers, who appear largely to prefer fseek() and ftell(). Hence
Jacob's question.


> There is no excuse to resort to non-standard
> functions when the existing ones can do the job just as well; that is
> mere laziness on the part of the implementation's designers.

Tell this to Microsoft when they chose to have long stay 32-bit on Win64.


Antoine

Richard Bos

Jul 2, 2004, 5:57:43 AM
"Antoine Leca" <ro...@localhost.gov> wrote:

> In 40e51416....@news.individual.net, Richard Bos wrote:

> > or fgetpos().
>
> Now, the problem is that while fgetpos() IS a way to solve the
> problem, this information does not appear to have percolated into the heads
> of most programmers, who appear largely to prefer fseek() and ftell().

This is a problem with those programmers, _not_ with C.

> > There is no excuse to resort to non-standard
> > functions when the existing ones can do the job just as well; that is
> > mere laziness on the part of the implementation's designers.
>
> Tell this to Microsoft when they chose to have long stay 32-bit on Win64.

This is a problem with Micro$oft (so what else is new?), _not_ with C.

Richard

jacob navia

Jul 2, 2004, 6:51:33 AM
If we exclude all programmers working under Linux and those
working under Windows, there are not a lot of C programmers left

:-)

Dan Pop

Jul 2, 2004, 9:49:25 AM
In <cc331t$156$1...@news-reader1.wanadoo.fr> "jacob navia" <ja...@jacob.remcomp.fr> writes:

>is there any standard concerning the handling of files bigger than 4GB?

Yup, both C89 and C99 handle the situation where the size of long
limits the usefulness of ftell/fseek: fgetpos/fsetpos support arbitrarily
large files. Simply define fpos_t as appropriate.

>Under Linux there are ftello and fseeko, which use a
>64-bit offset type. Under Windows there are lseeki64 and telli64.
>
>Are there any recommendations as to how to implement this feature
>in standard C?

This feature is not needed in standard C.

The right way of fixing standard C is by making ftell/fseek use something
like ftell_t, a type whose integer conversion rank cannot be smaller than
long's.

Legacy applications (using hardcoded long) will continue to work to
the same extent as they did before the change, because they *have*
to include <stdio.h>, while new applications could simply forget
about fgetpos/fsetpos.

The only change visible to a legacy application is that ftell will not
return -1L when the file position indicator cannot be represented by
the type long (it's not clear if the current standard requires this
behaviour or if it is a case of undefined behaviour). Hopefully,
such applications were written to use fgetpos and not ftell.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Dan...@ifh.de

Antoine Leca

Jul 2, 2004, 11:21:47 AM
In 40e53124....@news.individual.net, Richard Bos wrote:

>
> This is a problem with those programmers, _not_ with C.

I know this is comp.std.c. Still, there is no real reason to discuss some
ethereal conception of a programming language when it clearly diverges
from reality.

And when reality is clearly doing something very different from the
ideal, I believe I have to concentrate on the former, i.e. reality. You
appear to stick with the ideal.


Antoine

Eric Sosman

Jul 2, 2004, 12:04:18 PM

Let's review the original poster's question for a moment,
shall we? Here it is (slightly snipped):

> is there any standard concerning the handling of files bigger than 4GB?

> [...]


> Are there any recommendations as to how to implement this feature
> in standard C?

Pay particular heed to the final phrase "in standard C."
The questioner has said he is not interested in non-Standard
extensions, work-arounds, compatibility schemes, or "reality,"
but in Standard C solutions.

... and the Standard C answer is simply this: The only
Standard I/O facilities are those of the Standard library.
Either they'll work or they won't. If they choke on files
larger than 4GB -- or 2GB or 64KB or 512PB -- that's a quality
of implementation issue, nothing more.

--
Eric....@sun.com

Paul Eggert

Jul 2, 2004, 7:08:05 PM
At Fri, 2 Jul 2004 09:32:12 +0200, "jacob navia" <ja...@jacob.remcomp.fr> writes:

> is there any standard concerning the handling of files bigger than 4GB?

Sure. POSIX (ISO/IEC 9945) standardizes fseeko and ftello as an
extension to the C standard. See:

http://www.opengroup.org/onlinepubs/009695399/functions/fseek.html

The C standard also has fsetpos etc., but it's less useful (and far
less used) in practice, since the file positions it works with are cookies
rather than integers.

Paul Eggert

Jul 2, 2004, 7:11:00 PM
At Fri, 02 Jul 2004 11:18:54 GMT, Richard Kettlewell <inv...@invalid.invalid> writes:

> The way to access large files in Linux (and most/all UNIX platforms) is
> to use -D_LARGEFILE_SOURCE, which makes off_t and fpos_t wide enough for
> large files.

Nope, that's not right. Use -D_FILE_OFFSET_BITS=64 instead (or better
yet, compile on a 64-bit platform :-).

_LARGEFILE_SOURCE controls the visibility of symbols like fseeko, but
it doesn't make off_t wide.

Douglas A. Gwyn

Jul 3, 2004, 12:50:54 AM
jacob navia wrote:
> is there any standard concerning the handling of files bigger than 4GB?
> Under Linux there are ftello and fseeko, which use a
> 64-bit offset type. Under Windows there are lseeki64 and telli64.

ftell and fseek, when the implementation has decided to
make them work for large files.

fgetpos and fsetpos, otherwise.

> Are there any recommendations as to how to implement this feature
> in standard C?

Yes, choose your typedefs more carefully.

Douglas A. Gwyn

Jul 3, 2004, 12:51:48 AM
jacob navia wrote:
> If we exclude all programmers working under Linux and those
> working under Windows, there are not a lot of C programmers left

And a good thing, too.

Antoine Leca

Jul 5, 2004, 6:58:12 AM
In tLKdnWsfwNf...@comcast.com, Douglas A. Gwyn wrote:

Am I to understand that it would be best if nobody used C?


Antoine

jacob navia

Jul 5, 2004, 7:27:39 AM

"Antoine Leca" <ro...@localhost.gov> wrote in message
news:40e934a6$0$23522$626a...@news.free.fr...

Well, it looks like it. Eliminating all Unix and Windows users would leave
C where some people want it to be: a dying language for
old microprocessors


Antoine Leca

Jul 5, 2004, 7:29:44 AM
In 40E58782...@sun.com, Eric Sosman wrote:

> Let's review the original poster's question for a moment,
> shall we? Here it is (slightly snipped):
>
> > Are there any recommendations as to how to implement this feature
> > in standard C?

The fact is, I knew beforehand (or believed I knew) what Jacob was asking for,
and I did not pay due attention to the word "standard" here. As a result, I
misread some comments.

So I apologize for the hard words I used.

(Another problem is that answering an implementer with "Use an
implementation that..." is not particularly useful.)


> Pay particular heed to the final phrase "in standard C."
> The questioner has said he is not interested in non-Standard
> extensions, work-arounds, compatibility schemes, or "reality,"
> but in Standard C solutions.

Well, my mileage varies as to Jacob's real intention.

But I agree you are right to read it this way, and answer correspondingly.

(Now, not considering "reality" at all would be an error, IMHO.)


> ... and the Standard C answer is simply this: The only
> Standard I/O facilities are those of the Standard library.
> Either they'll work or they won't. If they choke on files
> larger than 4GB -- or 2GB or 64KB or 512PB -- that's a quality
> of implementation issue, nothing more.

Please note that I was merely pointing out that fgetpos() is widely
disregarded by programmers. [For example, while I think fgetpos() predates
the LFS, it was fseeko() that made its way into POSIX. That said, I have no
real idea about the relative use of fgetpos() with respect to ftello().]

And I do not consider this lack of acceptance to be only a quality-of-
implementation issue: it has more to do with which features are useful to put
in the standard and which are not. That might be an interesting thread,
but it did not spark.


Furthermore, about quality of implementation: I regard this forum as useful
for people like Jacob who try to implement, with limited resources,
better support for Standard C in their implementations: at least here they
have a good chance of meeting people (like you) who are willing to
share their experience.

But I may easily be wrong, and perhaps this forum should be closed to such
discussions and restricted to discussions about the terms of the
standard and the ways they can be construed.

I hope not.


Antoine

Francis Glassborow

Jul 5, 2004, 7:53:47 AM
In article <ccbdvb$nku$1...@news-reader2.wanadoo.fr>, jacob navia
<ja...@jacob.remcomp.fr> writes

>Well, it looks like it. Eliminating all Unix and Windows users would leave
>C where some people want it to be: a dying language for
>old microprocessors

Not at all; it would leave it as an excellent language for embedded
systems programming (like the 30-odd systems found in almost any modern car).


--
Francis Glassborow ACCU
Author of 'You Can Do It!' see http://www.spellen.org/youcandoit
For project ideas and contributions: http://www.spellen.org/youcandoit/projects

jacob navia

Jul 5, 2004, 2:45:11 PM
Dear friends
I am in a classic situation:
sizeof(int) == 32 bits
sizeof(fpos_t) should be 64 bits, now is 32 bits

Functions like fseek receive a "long int" as parameter.
Impossible to seek more than 2GB (it is a signed long).

fpos_t is not used throughout the standard.

I have to rewrite the standard library for the 64 bit version.
I thought that people here could give me some advice on
how to do that correctly.

In the 64-bit version a long int is still 32 bits. I know this
makes little sense but I have to care about Microsoft standards,
and they decide the size of things.

fseek will not work for files larger than 2GB then.

Concerning fsetpos/fgetpos I can change without any problems
since I can define fpos_t as long long.

ftell has the same problem as fseek, even more since the result is
an int and not an fpos_t.

Is there any work in changing this?

If I define

int fseek(FILE *stream, fpos_t offset,int whence);
and
fpos_t ftell(FILE *stream)

is that OK?

Thanks


Harti Brandt

Jul 5, 2004, 3:15:05 PM
On Mon, 5 Jul 2004, jacob navia wrote:

jn>Dear friends
jn>I am in a classic situation:
jn>sizeof(int) == 32 bits
jn>sizeof(fpos_t) should be 64 bits, now is 32 bits

fpos_t might be a struct if, for example, the underlying file system
supports only record structured files (think of RMS or FILES11) so
sizeof(fpos_t) has nothing to do with the maximum seek range.

jn>Functions like fseek receive a "long int" as parameter.
jn>Impossible to seek more than 2GB (it is a signed long).
jn>
jn>fpos_t is not used throughout the standard.

How so? Read 7.19.9.1, the description of fgetpos().

jn>I have to rewrite the standard library for the 64 bit version.
jn>I thought that people here could give me some advice on
jn>how to do that correctly.

ECANNOTPARSE. What do you mean by rewriting the standard library?

jn>In the 64-bit version a long int is still 32 bits. I know this
jn>makes little sense but I have to care about Microsoft standards,
jn>and they decide the size of things.

What is 64-bit then? Pointers and long longs?

jn>fseek will not work for files larger than 2GB then.

That's true when LONG_MAX is 2G.

If I read the above as 'I'm implementing the standard library' and you
have a compiling environment where a long is 32-bit, then there is no way
around this limitation of fseek/ftell. If my assumption is true, then you
can, of course, define fpos_t as unsigned long long (given that your
underlying system is not record-oriented but treats a file as a stream of
bytes).

jn>Concerning fsetpos/fgetpos I can change without any problems
jn>since I can define fpos_t as long long.
jn>
jn>ftell has the same problem as fseek, even more since the result is
jn>an int and not an fpos_t.

How so? ftell must return a long according to 7.19.9.4.

jn>Is there any work in changing this?
jn>
jn>If I define
jn>
jn>int fseek(FILE *stream, fpos_t offset,int whence);
jn>and
jn>fpos_t ftell(FILE *stream)
jn>
jn>is that OK?

That doesn't make any sense to me. fseek() and ftell() must be defined
just as they are defined in the standard.

harti

jacob navia

Jul 5, 2004, 3:25:57 PM

"Harti Brandt" <bra...@dlr.de> wrote in message
news:2004070521...@beagle.kn.op.dlr.de...

> On Mon, 5 Jul 2004, jacob navia wrote:
>
> jn>Dear friends
> jn>I am in a classic situation:
> jn>sizeof(int) == 32 bits
> jn>sizeof(fpos_t) should be 64 bits, now is 32 bits
>
> fpos_t might be a struct if, for example, the underlying file system
> supports only record structured files (think of RMS or FILES11) so
> sizeof(fpos_t) has nothing to do with the maximum seek range.
>

OK, granted. In my case it would be a long long int.

> jn>Functions like fseek receive a "long int" as parameter.
> jn>Impossible to seek more than 2GB (it is a signed long).
> jn>
> jn>fpos_t is not used throughout the standard.
>
> How so? Read 7.19.9.1, the description of fgetpos().

Yes, fgetpos is OK. But why must fseek have a long int as its
second parameter?

Wouldn't fpos_t be the right type??????

>
> jn>I have to rewrite the standard library for the 64 bit version.
> jn>I thought that people here could give me some advice on
> jn>how to do that correctly.
>
> ECANNOTPARSE. What do you mean by rewriting the standard library?
>

It means that I have to write fread, fopen, fclose, etc. for the 64-bit version.
This prompted me to see if I could get 64-bit file support into the
standard library, *including* fseek.

> jn>In the 64-bit version a long int is still 32 bits. I know this
> jn>makes little sense but I have to care about Microsoft standards,
> jn>and they decide the size of things.
>
> What is 64-bit then? Pointers and long longs?

Exactly

>
> jn>fseek will not work for files larger than 2GB then.
>
> That's true when LONG_MAX is 2G.

Yes, and that will be the case in the 64-bit version: long int == int == 32 bits,
long long == void * == 64 bits

>
> If I read the above as 'I'm implementing the standard library' and you
> have a compiling environment where a long is 32-bit, then there is no way
> around this limitation of fseek/ftell. If my assumption is true, then you
> can, of course, define fpos_t as unsigned long long (given that your
> underlying system is not record-oriented but treats a file as a stream of
> bytes).
>

But why not be CONSISTENT and change the standard for fseek???
why not

int fseek(FILE *, fpos_t, int)


> jn>Concerning fsetpos/fgetpos I can change without any problems
> jn>since I can define fpos_t as long long.
> jn>
> jn>ftell has the same problem as fseek, even more since the result is
> jn>an int and not an fpos_t.
>
> How so? ftell must return a long according to 7.19.9.4.

Yes, you are right. I misread, maybe because in my implementation
long is the same size as int.

jacob


Paul Eggert

Jul 5, 2004, 4:48:08 PM
At Mon, 5 Jul 2004 20:45:11 +0200, "jacob navia" <ja...@jacob.remcomp.fr> writes:

> ftell has the same problem as fseek, even more since the result is
> an int and not an fpos_t.
>
> Is there any work in changing this?

Not in the C Standard, no. It's too late to change this.

But POSIX has standardized a better solution: fseeko and ftello deal
with off_t values, not long values, so they should work just fine on
your platform. You should implement and use them rather than
reinventing the wheel slightly differently.

http://www.opengroup.org/onlinepubs/000095399/functions/fseek.html
http://www.opengroup.org/onlinepubs/000095399/functions/ftell.html

Keith Thompson

Jul 5, 2004, 5:52:09 PM
"jacob navia" <ja...@jacob.remcomp.fr> writes:
> Yes, fgetpos is OK. But why must fseek have a long int as its
> second parameter?

Um, because the standard says so.

If long int is 32 bits, fseek() can't handle offsets greater than 2GB.
If fseek() is going to handle offsets greater than 2GB, long int has to
be larger than 32 bits (which Win64 apparently doesn't allow).

> Wouldn't fpos_t be the right type??????

No, fpos_t isn't necessarily an integer type.

I agree it would have been better for the argument to fseek() to be
some implementation-defined typedef guaranteed to be an integer type,
but it's too late to change the standard.

Since you can't do what you want to do in standard C within Win64, you
might consider mimicking whatever POSIX has done (I'm not familiar
with the details). You'll still have fseek() and ftell() (which will
work on files up to 2GB) and fsetpos() and fgetpos() (which should
work on files of any size).

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

jacob navia

Jul 5, 2004, 6:19:42 PM
Yes, I will follow the POSIX approach. It is a good idea.

Thanks for your time

P.S. I would say they should be part of C 2009.

Keith Thompson

Jul 5, 2004, 9:44:24 PM

I think there's a simpler solution for C 2009. The POSIX fseeko() and
ftello() functions exist only because the C standard fseek() and
ftell() functions specify long arguments, which are now inadequate on
many systems. (Does POSIX require fpos_t to be an integer type?)

For C 2009, I suggest changing the declarations of fseek() and ftell()
to something like:

int fseek(FILE *stream, fseek_t offset, int whence);
fseek_t ftell(FILE *stream);

where fseek_t is a typedef for a signed integer type big enough to
represent file offsets.

This should (I think) eliminate the need for fseeko(), ftello(),
fsetpos(), and fgetpos(), though the latter two would be kept around
for backward compatibility.

Douglas A. Gwyn

Jul 5, 2004, 10:22:45 PM
jacob navia wrote:
> Well, it looks like it. Eliminating all Unix and Windows users would leave
> C where some people want it to be: a dying language for
> old microprocessors

Actually it leaves it the language of choice for
microprocessors. The less bloated the environment,
the more suitable C is for implementing systems.

Douglas A. Gwyn

Jul 5, 2004, 10:38:23 PM
jacob navia wrote:
> ftell has the same problem as fseek, even more since the result is
> an int and not an fpos_t.

ftell returns a long int.

> Is there any work in changing this?
> If I define
> int fseek(FILE *stream, fpos_t offset,int whence);
> and
> fpos_t ftell(FILE *stream)
> is that OK?

Actually it would be kind of stupid, because that
would force fpos_t to be defined as long int in an
environment where you have already said that long
int is too small. Make fpos_t the same as
uint_least64_t.

Douglas A. Gwyn

Jul 5, 2004, 10:48:53 PM
jacob navia wrote:
> Yes, fgetpos is OK. But why must fseek have a long int as its
> second parameter?

Because that is the interface specified by the standard,
which programmers rightfully assume is what the C
implementation provides.

> Wouldn't fpos_t be the right type??????

No. fpos_t is in general a "cookie", which can be
defined as an integral type by some implementations but
which can be defined as a structure type by other
implementations. The right type is what the standard
interface specification says it is.

> It means that I have to write fread, fopen, fclose, etc. for the 64-bit version.
> This prompted me to see if I could get 64-bit file support into the
> standard library, *including* fseek.

You can *extend* fseek by assigning more cases for its
third argument, much as was done for 6th Edition Unix's
seek system call (which could seek to block number or
to byte number). You can also upgrade type long int to
be wider than 32 bits. You can also add functions with
names not reserved for the (C) standard functions, e.g.
fseeko (check the POSIX spec before implementing).

> But why not be CONSISTENT and change the standard for fseek???

Because you are not in a position to change the standard,
and it is unlikely that the keepers of the C standard
would introduce an incompatible change. The best that
you might hope for is for the type of the offset
argument to be made a typedef name instead of a basic
type, which would allow existing implementations to
avoid disrupting their existing user code base while
allowing implementations such as yours to define the
type as something other than long int.

> why not
> int fseek(FILE *, fpos_t, int)

Because we do not want to require fpos_t to be defined
as long int for existing C implementations.

Douglas A. Gwyn

Jul 5, 2004, 10:50:26 PM
Keith Thompson wrote:
> I agree it would have been better for the argument to fseek() to be
> some implementation-defined typedef guaranteed to be an integer type,
> but it's too late to change the standard.

Actually that's the kind of change that can be "phased in",
although it would take many years before it really has the
desired effect. People in need of an immediate solution
have to use something else.

Douglas A. Gwyn

Jul 5, 2004, 10:55:04 PM
Keith Thompson wrote:
> For C 2009, I suggest changing the declarations of fseek() and ftell()
> to something like:
> int fseek(FILE *stream, fseek_t offset, int whence);
> fseek_t ftell(FILE *stream);
> where fseek_t is a typedef for a signed integer type big enough to
> represent file offsets.

While it won't be done by 2009, that's a reasonable suggestion
for the next major revision of the C standard.

> This should (I think) eliminate the need for fseeko(), ftello(),
> fsetpos(), and fgetpos(), though the latter two would be kept around
> for backward compatibility.

Unfortunately, due to linkage compatibility requirements,
many existing implementations would still have to retain
the old function linkage (interface), so new functions
would be needed even if spelled the same (using compiler
magic to link with the new versions). This would have to
be carefully explored and a consensus reached by affected
implementors.

Dan Pop

Jul 6, 2004, 8:45:01 AM
In <ln6591r...@nuthaus.mib.org> Keith Thompson <ks...@mib.org> writes:

>I think there's a simpler solution for C 2009. The POSIX fseeko() and
>ftello() functions exist only because the C standard fseek() and
>ftell() functions specify long arguments, which are now inadequate on
>many systems. (Does POSIX require fpos_t to be an integer type?)
>
>For C 2009, I suggest changing the declarations of fseek() and ftell()
>to something like:
>
> int fseek(FILE *stream, fseek_t offset, int whence);
> fseek_t ftell(FILE *stream);
>
>where fseek_t is a typedef for a signed integer type big enough to
>represent file offsets.

How is this any different from my own proposal, upthread, except for the
fact that I used ftell_t instead of fseek_t? ;-)

>This should (I think) eliminate the need for fseeko(), ftello(),
>fsetpos(), and fgetpos(), though the latter two would be kept around
>for backward compatibility.

Unfortunately, it also breaks some existing programs, as explained in my
proposal...

Dan Pop

Jul 6, 2004, 9:28:40 AM
In <ccc7jn$r87$1...@news-reader4.wanadoo.fr> "jacob navia" <ja...@jacob.remcomp.fr> writes:

>I am in a classic situation:

There is nothing "classic" about your situation.

>sizeof(int) == 32 bits
>sizeof(fpos_t) should be 64 bits, now is 32 bits

Which means that your current implementation cannot handle files over
2 GB, although NTFS has supported them for years, due to your brain-dead
choice for fpos_t...

>Functions like fseek receive a "long int" as parameter.
>Impossible to seek more than 2GB (it is a signed long).

OTOH, fsetpos should not be affected by this limitation, should it?

>fpos_t is not used throughout the standard.

??? The standard is not a C program.

>I have to rewrite the standard library for the 64 bit version.

Whatever changes you're making in this area had better be implemented in
the 32-bit version as well, especially since you're unwilling to change the size of long.

>I thought that people here could give me some advice on
>how to do that correctly.

Indeed, but your common practice is to reject good advice.

>In the 64-bit version a long int is still 32 bits. I know this
>makes little sense but I have to care about Microsoft standards,
>and they decide the size of things.
>
>fseek will not work for files larger than 2GB then.
>
>Concerning fsetpos/fgetpos I can change without any problems
>since I can define fpos_t as long long.

And make the change in the 32-bit version, too.

>ftell has the same problem as fseek, even more since the result is
>an int and not an fpos_t.

Nope, the result is a long. Did you bother to read the specification?

>Is there any work in changing this?
>
>If I define
>
>int fseek(FILE *stream, fpos_t offset,int whence);
>and
>fpos_t ftell(FILE *stream)
>
>is that OK?

It could be sort of OK, *if* you didn't use fpos_t for this purpose, as it
is defined to serve a completely different purpose (and doesn't even need
to be a scalar type).

But you could use a type of your own invention instead of long, let's
say ftell_t (because it's the return type of the function ftell). You would
no longer be standard-conforming, but this is a minor point, as most
existing programs wouldn't tell the difference.

Existing software, using hard coded long, would be still limited to 2 GB,
except that ftell won't be able to return -1 if the current position is
beyond the 2GB limit. This is the only disadvantage of this approach.
It's up to you to decide how important it is.

New software, written for your implementation (we know you couldn't care
less about the portability of software developed for your implementation)
could use ftell_t and be able to fseek on files of arbitrary sizes.

Clueful programmers, caring about portability (I doubt there is one of
them among your user base, but I could be wrong) would use fgetpos/fsetpos
so, the change of the utmost importance is to make these functions work
on files of any size even in your 32-bit implementation.

And, in the highly unlikely event that someone would ever want to port
software developed on your implementation, he would have to typedef
ftell_t as long himself. No big deal, but the ported program will be
limited by the other platform's long capabilities (no problem if the
other platform is a 64-bit Unix).

The biggest advantage, however, is that you'd create existing practice in
this direction, that could lead to the C standard being changed
accordingly. As large files become more and more common, the limitations
of both fseek/ftell and fgetpos/fsetpos should impose a revision in this
part of the standard, in the next version.

Then again, I know you well enough to be sure that you're going to
completely ignore the advice contained in this post...

Antoine Leca

Jul 6, 2004, 10:46:41 AM
In cce9e8$mnr$2...@sunnews.cern.ch, Dan Pop wrote:

> Clueful programmers, caring about portability (I doubt there is one of
> them among your user base, but I could be wrong)

I know of at least two projects where lcc-win32 is used, irregularly I must
admit, to "bring fresh air" and avoid depending on either the GCC or the
Microsoft (/Borland) way of using C in the Win32 environment. And this is
precisely about improving portability.


Antoine

Dan Pop

Jul 6, 2004, 11:44:10 AM

But that's the point: lcc-win32 is used as the alternate compiler, not as
the main development compiler. From my own (short) experience with
lcc-win32, it is horrible for people caring about portability: dirty
headers (in -ansic mode) and misleading diagnostics.

Antoine Leca

Jul 6, 2004, 12:54:18 PM
In ccehca$ial$2...@sunnews.cern.ch, Dan Pop wrote:

> But that's the point: lcc-win32 is used as the alternate compiler,
> not as the main development compiler.

Granted. I missed the fine point.


Antoine

Keith Thompson

Jul 6, 2004, 2:47:08 PM
Dan...@cern.ch (Dan Pop) writes:
> In <ln6591r...@nuthaus.mib.org> Keith Thompson <ks...@mib.org> writes:
[...]

> >For C 2009, I suggest
[...]

> How is this any different from my own proposal, upthread, except for the
> fact that I used ftell_t instead of fseek_t? ;-)

I must have missed that. I didn't mean to be redundant.
Or repetitious. Or, for that matter, redundant.

Keith Thompson

Jul 6, 2004, 2:49:56 PM
Richard Kettlewell <inv...@invalid.invalid> writes:

> Keith Thompson <ks...@mib.org> writes:
> > This should (I think) eliminate the need for fseeko(), ftello(),
> > fsetpos(), and fgetpos(), though the latter two would be kept around
> > for backward compatibility.
>
> fpos_t needs to contain an mbstate_t, so I don't think you can do away
> with it and f{set,get}pos merely by changing fseek/ftell in this way.

Ok, you're probably right. (I don't understand the mbstate_t stuff
very well, so I can't really comment further.)

jacob navia

Jul 6, 2004, 4:48:30 PM

"Antoine Leca" <ro...@localhost.gov> wrote in message
news:40ead99d$0$26347$626a...@news.free.fr...

There are 5000 downloads/month of my software from all
over the world. It is used from China to Latin America,
Russia, Europe, and the US.

It is easy to use, simple, as the language that compiles.

Building a C99 compiler, assembler, linker, IDE and debugger
is a very complex undertaking. My work is far from perfect.

It is used for teaching C, for driving hardware, as a backend
for other languages (Eiffel, Objective-C), as a Windows programming
tool, and for many other applications.

Strict compliance is very difficult to achieve, and I concentrated
on many other parts of the system: the debugger, the IDE. I
added a 350-page tutorial, and many other things. The IDE
will show you any misspellings by underlining the offending word
in red. It is the only IDE that does this in real time.

Still, if strict compliance is difficult, I try my best and ask in this
forum for advice.

Mr Pop can say anything about my work, this is an unmoderated
newsgroup.


Douglas A. Gwyn

Jul 7, 2004, 5:49:31 AM
Keith Thompson wrote:
> Ok, you're probably right. (I don't understand the mbstate_t stuff
> very well, so I can't really comment further.)

mbstate_t holds information about the current "shift state"
and is useful only in environments where there are such
multibyte encodings. The idea of "shift" dates back to
5-level Teletype codes where one would "shift out" to an
alternate graphical interpretation of the code values for
digits and special symbols, and "shift in" to get back to
the alphabetical interpretation of the codes. The shift
state is a "mode" (in more modern terminology), and such
encoding has fairly well-known pros and cons.

To correctly interpret a multibyte text stream upon seeking
into the middle of it, assuming it uses a shift encoding,
it is necessary to also reestablish the proper shift state
for the stream.

Whether any of this is good design, it still has to be
dealt with in places like Japan.
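To make the shift-state idea concrete, here is a small sketch (the helper name count_mbchars is purely illustrative) that walks a multibyte string with the standard mbrtowc(), carrying an mbstate_t from call to call, and uses mbsinit() to report whether the string ended in the initial shift state. That mbstate_t is the extra piece of information a seek into the middle of a shift-encoded stream would have to restore. In the default "C" locale every encoding is stateless, so the state always stays initial there.

```c
#include <string.h>
#include <wchar.h>

/* Count the multibyte characters in s, carrying the conversion state
   in an mbstate_t between calls. Returns (size_t)-1 on an invalid or
   truncated sequence. On success, *ends_initial is set to 1 if the
   string ended in the initial shift state, else 0. */
static size_t count_mbchars(const char *s, int *ends_initial)
{
    mbstate_t st;
    size_t n = 0, len = strlen(s);

    memset(&st, 0, sizeof st);          /* zeroed object = initial state */
    while (len > 0) {
        size_t r = mbrtowc(NULL, s, len, &st);
        if (r == (size_t)-1 || r == (size_t)-2)
            return (size_t)-1;          /* invalid or incomplete sequence */
        if (r == 0)
            break;                      /* null wide character */
        s += r;
        len -= r;
        n++;
    }
    *ends_initial = mbsinit(&st) != 0;
    return n;
}
```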

Richard Bos

Jul 7, 2004, 9:18:29 AM
Paul Eggert <egg...@twinsun.com> wrote:

> At Mon, 5 Jul 2004 20:45:11 +0200, "jacob navia" <ja...@jacob.remcomp.fr> writes:
>
> > ftell has the same problem as fseek, even more since the result is
> > an int and not an fpos_t.
> >
> > Is there any work in changing this?
>
> Not in the C Standard, no. It's too late to change this.
>
> But POSIX has standardized a better solution: fseeko and ftello deal
> with off_t values, not long values, so they should work just fine on
> your platform.

Enlighten me: why is this any better than using the perfectly ISO
f[gs]etpos()?

Richard

Richard Bos

Jul 7, 2004, 9:18:29 AM
"jacob navia" <ja...@jacob.remcomp.fr> wrote:

> "Antoine Leca" <ro...@localhost.gov> wrote in message
> news:40ead99d$0$26347$626a...@news.free.fr...
> > In ccehca$ial$2...@sunnews.cern.ch, Dan Pop wrote:
> > > But that's the point: lcc-win32 is used as the alternate compiler,
> > > not as the main development compiler.
> >
> > Granted. I missed the fine point.
>

> There are 5000 downloads/month of my software from all
> over the world. It is used from China to Latin America,
> Russia, Europe, and the US.

That means nothing. _I_ downloaded it, but I never use it anymore.

> It is easy to use, simple,

Beg to differ... I found it not a patch on Dev-C++.

> as the language that compiles.

Erm... that grammatical not. Sense neither do.

> Building a C99 compiler, assembler, linker, IDE and debugger
> is a very complex undertaking. My work is far from perfect.

Both of these are true. Note: all kudos to you for even writing a
compiler that actually compiles. It is, as you say, complex. Which
leaves me wondering: why on earth are you looking for more trouble than
you need?
For example, in this thread, why do you go looking for an _alternative_
to the ISO C functions, such as fseeko() and lseeki64(), rather than use
the perfectly Standard fgetpos()? It is already part of ISO C. It can do
what you want it to do. Why not implement it the way you want it to
function? What, in short, is _wrong_ with being portable and ISO
compatible, apart from not being MiCr0$oFt, L33NuX, and K3wL?

Richard

Harti Brandt

Jul 7, 2004, 10:33:34 AM
On Wed, 7 Jul 2004, Richard Bos wrote:

RB>Paul Eggert <egg...@twinsun.com> wrote:
RB>
RB>> At Mon, 5 Jul 2004 20:45:11 +0200, "jacob navia" <ja...@jacob.remcomp.fr> writes:
RB>>
RB>> > ftell has the same problem as fseek, even more since the result is
RB>> > an int and not an fpos_t.
RB>> >
RB>> > Is there any work in changing this?
RB>>
RB>> Not in the C Standard, no. It's too late to change this.
RB>>
RB>> But POSIX has standardized a better solution: fseeko and ftello deal
RB>> with off_t values, not long values, so they should work just fine on
RB>> your platform.
RB>
RB>Enlighten me: why is this any better than using the perfectly ISO
RB>f[gs]etpos()?

Well, for some applications fseeko/ftello can do what f[gs]etpos() cannot.
With fsetpos you can only seek to a position in a file you have been
before. With fseeko you can do whatever magic you could do with fseek.
For example create files with holes (at least on unix-like systems) or
efficiently seek around in fixed-length record files.
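A minimal sketch of both uses, assuming a POSIX system where fseeko()/ftello() are available; seek_record and poke_byte are hypothetical helper names. The record offset is computed in off_t rather than long, and writing past the current end of file leaves a gap that POSIX specifies must read back as zero bytes (many Unix file systems store it as a hole).

```c
#define _POSIX_C_SOURCE 200112L   /* expose fseeko()/ftello() */
#include <stdio.h>
#include <sys/types.h>

/* Move to the start of record `index` in a file of fixed-size records.
   The byte offset is computed in off_t, not long; the caller must make
   sure index * record_size does not overflow off_t. */
static int seek_record(FILE *fp, off_t index, off_t record_size)
{
    return fseeko(fp, index * record_size, SEEK_SET);
}

/* Write one byte at `offset`, possibly beyond the current end of file.
   The gap before it reads back as zeros until something is written
   there; on many file systems it occupies no disk blocks. */
static int poke_byte(FILE *fp, off_t offset, int byte)
{
    if (fseeko(fp, offset, SEEK_SET) != 0)
        return -1;
    return fputc(byte, fp) == EOF ? -1 : 0;
}
```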

If he implements the standard library for his compiler, I would recommend
implementing all four of these functions (provided the underlying system
allows him to).

harti

Richard Bos

Jul 7, 2004, 11:01:13 AM
Harti Brandt <bra...@dlr.de> wrote:

> On Wed, 7 Jul 2004, Richard Bos wrote:
>
> RB>Paul Eggert <egg...@twinsun.com> wrote:
> RB>

> RB>> But POSIX has standardized a better solution: fseeko and ftello deal
> RB>> with off_t values, not long values, so they should work just fine on
> RB>> your platform.
> RB>
> RB>Enlighten me: why is this any better than using the perfectly ISO
> RB>f[gs]etpos()?
>
> Well, for some applications fseeko/ftello can do what f[gs]etpos() cannot.
> With fsetpos you can only seek to a position in a file you have been
> before. With fseeko you can do whatever magic you could do with fseek.

Ah, yes. Using SEEK_CUR or SEEK_END, of course.

> efficiently seek around in fixed-length record files.

Mind you, strictly speaking there is nothing stopping fseek() from
allowing this unless the size of the record is larger than LONG_MAX. But
I see what you mean.
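For comparison, the same record arithmetic with nothing but ISO C fseek() works as long as the byte offset fits in a long. A minimal sketch (the helper name seek_record_iso is hypothetical) that refuses the seek instead of overflowing:

```c
#include <limits.h>
#include <stdio.h>

/* Seek to record `index` using only ISO C fseek(). Returns -1 instead
   of overflowing when index * record_size would exceed LONG_MAX, which
   is where a 32-bit long runs out on large files. */
static int seek_record_iso(FILE *fp, long index, long record_size)
{
    if (index < 0 || record_size <= 0 || index > LONG_MAX / record_size)
        return -1;
    return fseek(fp, index * record_size, SEEK_SET);
}
```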

> If he implements the standard library for his compiler, I would recommend
> implementing all four of these functions (provided the underlying system
> allows him to).

Except, of course, that fseeko and ftello are in the user's namespace,
so they should be hidden behind a #define or similar.

Richard

Dan Pop

Jul 7, 2004, 10:42:18 AM

On binary streams, off_t values have well defined semantics and you
can perform arithmetic on them, while fpos_t is an ADT. Do you also
need a picture?
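A small illustration of the difference, assuming a POSIX system: two ftello() results are plain integers that can be subtracted, whereas ISO C defines no arithmetic or comparison on fpos_t at all. The helper name bytes_written is hypothetical.

```c
#define _POSIX_C_SOURCE 200112L   /* expose ftello() */
#include <stdio.h>
#include <sys/types.h>

/* Return how many bytes writing `text` advanced a binary stream, by
   subtracting two off_t positions. The same computation is impossible
   with fgetpos(), because fpos_t is opaque. Returns -1 on error. */
static off_t bytes_written(FILE *fp, const char *text)
{
    off_t before = ftello(fp);
    if (before == -1 || fputs(text, fp) == EOF)
        return -1;
    off_t after = ftello(fp);
    return after == -1 ? -1 : after - before;
}
```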

Dan Pop

Jul 7, 2004, 10:44:36 AM

>Both of these are true. Note: all kudos to you for even writing a
>compiler that actually compiles.

He didn't write it. He merely took lcc and ported it to Win32 and added
a ton of bells and whistles.

Paul Eggert

Jul 7, 2004, 1:02:24 PM
At Wed, 07 Jul 2004 15:01:13 GMT, r...@hoekstra-uitgeverij.nl (Richard Bos) writes:

>> If he implements the standard library for his compiler, I would recommend
>> implementing all four of these functions (provided the underlying system
>> allows him to).
>
> Except, of course, that fseeko and ftello are in the user's namespace,
> so they should be hidden behind a #define or similar.

The POSIX standard provides for this compatibility pedanticism, as
strictly conforming POSIX applications are required to define the
feature test macro _POSIX_C_SOURCE to 200112L before including
<stdio.h>.
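A sketch of what that looks like in practice. _POSIX_C_SOURCE is the POSIX feature test macro described above; _FILE_OFFSET_BITS is an additional large-file convention (honored by glibc, Solaris and some other systems, not part of POSIX itself) that widens off_t to 64 bits on 32-bit builds and is typically harmless where off_t is already 64 bits. The helper file_size is hypothetical.

```c
/* Both macros must appear before the first #include. */
#define _POSIX_C_SOURCE 200112L   /* request fseeko, ftello, off_t */
#define _FILE_OFFSET_BITS 64      /* LFS convention: 64-bit off_t */
#include <stdio.h>
#include <sys/types.h>

/* Return the size of a binary stream in bytes, or -1 on error,
   restoring the original position before returning. */
static off_t file_size(FILE *fp)
{
    off_t here = ftello(fp), size;

    if (here == -1 || fseeko(fp, 0, SEEK_END) != 0)
        return -1;
    size = ftello(fp);
    if (fseeko(fp, here, SEEK_SET) != 0)
        return -1;
    return size;
}
```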

Hardly anybody bothers, though, since the default compilation
environment on most platforms is not pedantic and does not require the
use of feature test macros. (Thank goodness.)

Antoine Leca

Jul 8, 2004, 6:56:13 AM
In arOdncuCqOm...@comcast.com,
Douglas A. Gwyn wrote (Re: 64 bit fread/fwrite/fopen etc):

>
> The idea of "shift" dates back to 5-level Teletype codes

Really?
I would have guessed it dated from the early typewriters, where to print
uppercase letters one had to "shift" the carriage to get the alternate set
of glyphs, which were stamped on the upper part of the typebars.

But both certainly predate me, so I am not sure. I was exposed to
typewriters long before I got access to 5+1-hole tapes, so perhaps this
misled me.


Antoine

Antoine Leca

Jul 8, 2004, 6:58:19 AM
In wwv3c45...@rjk.greenend.org.uk, Richard Kettlewell wrote:

> fpos_t needs to contain an mbstate_t

Does it? Even when all encodings are stateless (Jacob's case)?

I would have guessed not, but I must confess I did not reread that part.


Antoine

Francis Glassborow

Jul 8, 2004, 8:17:45 AM
In article <40ed28b0$0$18106$626a...@news.free.fr>, Antoine Leca
<ro...@localhost.gov> writes

I have even used an ancient typewriter that used two levels of shift:-)


--
Francis Glassborow ACCU
Author of 'You Can Do It!' see http://www.spellen.org/youcandoit
For project ideas and contributions: http://www.spellen.org/youcandoit/projects


Jonathan Thornburg

Jul 8, 2004, 10:36:09 AM
jacob navia wrote:
> Well it looks like. Eliminating all Unix + windows users would leave
> C where some people want it to be: A dying language for
> old microprocessors

And for writing other operating systems. (Take a quick look at the
source tree for {Free,Net,Open}BSD and you'll find a lot of C...)
And for writing application software under other operating systems...

--
-- "Jonathan Thornburg (remove -animal to reply)" <jth...@aei.mpg-zebra.de>
Max-Planck-Institut fuer Gravitationsphysik (Albert-Einstein-Institut),
Golm, Germany, "Old Europe" http://www.aei.mpg.de/~jthorn/home.html
"Washing one's hands of the conflict between the powerful and the
powerless means to side with the powerful, not to be neutral."
-- quote by Freire / poster by Oxfam

Dan Pop

Jul 8, 2004, 10:22:23 AM

You have forgotten the context, which is character set encodings.
Manual typewriters are entirely irrelevant to this context.

Robert Harris

Jul 8, 2004, 11:06:38 AM
The "shift" on telex machines switches between letters and numbers
(there aren't any lower case letters; 5 holes means 32 different
characters).

Robert

Antoine Leca

Jul 8, 2004, 12:41:24 PM
In ccjlav$qdb$1...@sunnews.cern.ch, Dan Pop wrote:
> In <40ed28b0$0$18106$626a...@news.free.fr> "Antoine Leca"

>>>
>>> The idea of "shift" dates back to 5-level Teletype codes
>>
>> I would have guessed it dated from the early typewriters,
>
> You have forgotten the context, which is character set encodings.
> Manual typewriters are entirely irrelevant to this context.

I think not.

Further research showed me that in fact Doug is (almost) right: the idea
of "shifts", used in 5-unit telegraph codes [Baudot 1874], is independent
of, and in fact _predates_, the invention of the "shift key" (and carriage
mechanism) [Remington 1878]. And of course, the real teletype, which is a
fusion of both (a typewriter and a telegraph), came some years later.

But there were quite a few inventions in those years; and of course,
electricity, transmission and mechanics were not as separate as they are
nowadays.


Antoine

Dan Pop

Jul 8, 2004, 1:17:03 PM
In <40ed7998$0$18118$626a...@news.free.fr> "Antoine Leca" <ro...@localhost.gov> writes:

>In ccjlav$qdb$1...@sunnews.cern.ch, Dan Pop wrote:
>> In <40ed28b0$0$18106$626a...@news.free.fr> "Antoine Leca"
>>>>
>>>> The idea of "shift" dates back to 5-level Teletype codes
>>>
>>> I would have guessed it dated from the early typewriters,
>>
>> You have forgotten the context, which is character set encodings.
>> Manual typewriters are entirely irrelevant to this context.
>
>I think not.

Then, what is the relevance of manual typewriters to this context?

Antoine Leca

Jul 9, 2004, 3:47:35 AM
In ccjvif$kie$1...@sunnews.cern.ch, Dan Pop wrote:

> Then, what is the relevance of manual typewriters to this context?

And the point is?


Antoine

Antoine Leca

Jul 9, 2004, 3:50:33 AM
In wwvn02a...@rjk.greenend.org.uk, Richard Kettlewell wrote:
> I was responding to a suggestion that a future revision of the C
> standard redefined fseek/ftell in terms of fpos_t, not to any
> suggestion for fixing Jacob's immediate problem.

Ah yes, sorry, I missed that the scope had been widened that far.

You are right; this is an obstacle to a wholesale replacement.


Antoine

Dan Pop

Jul 9, 2004, 7:22:31 AM

You introduced the manual typewriters into the discussion and now you're
asking ME what is the point? It's your goddam point, not mine!

Douglas A. Gwyn

Jul 9, 2004, 9:54:27 AM
Harti Brandt wrote:
> Well, for some applications fseeko/ftello can do what f[gs]etpos() cannot.
> With fsetpos you can only seek to a position in a file you have been
> before. With fseeko you can do whatever magic you could do with fseek.
> For example create files with holes (at least on unix-like systems) or
> efficiently seek around in fixed-length record files.

However, on systems that have fseeko, the type used as offset
by fsetpos very likely is actually the same as for fseeko and
fsetpos will probably work for setting current file position
to an arbitrary computed byte offset. POSIX could have
mandated this behavior for fsetpos rather than adding a new
function.

Wojtek Lerch

Jul 9, 2004, 12:01:45 PM
Douglas A. Gwyn wrote:
> However, on systems that have fseeko, the type used as offset
> by fsetpos very likely is actually the same as for fseeko and
> fsetpos will probably work for setting current file position
> to an arbitrary computed byte offset. POSIX could have
> mandated this behavior for fsetpos rather than adding a new
> function.

But fsetpos doesn't take an offset; it takes an opaque type that
contains an offset and possibly an mbstate_t. Do you mean POSIX should
have specified the exact contents of fpos_t?

Keith Thompson

Jul 9, 2004, 2:06:56 PM
Dan...@cern.ch (Dan Pop) writes:
> In <40ee4dfe$0$20265$626a...@news.free.fr> "Antoine Leca"
> <ro...@localhost.gov> writes:
> >In ccjvif$kie$1...@sunnews.cern.ch, Dan Pop wrote:
> >> Then, what is the relevance of manual typewriters to this context?
> >
> >And the point is?
>
> You introduced the manual typewriters into the discussion and now you're
> asking ME what is the point? It's your goddam point, not mine!

Yes, by all means, let's have yet another interminable thread arguing
about whether somebody's complaint about somebody's statement about
whether some minor point is relevant is relevant is relevant. They're
so much fun. And relevant.

Brian Inglis

Jul 9, 2004, 3:31:03 PM
On Tue, 06 Jul 2004 01:44:24 GMT in comp.std.c, Keith Thompson
<ks...@mib.org> wrote:

>"jacob navia" <ja...@jacob.remcomp.fr> writes:
>> Yes, I will follow the Posix proposition. It is a good idea.
>>
>> Thanks for your time
>>
>> P.S. I would say they should be part of the C 2009.
>
>I think there's a simpler solution for C 2009. The POSIX fseeko() and
>ftello() functions exist only because the C standard fseek() and
>ftell() functions specify long arguments, which are now inadequate on
>many systems. (Does POSIX require fpos_t to be an integer type?)
>
>For C 2009, I suggest changing the declarations of fseek() and ftell()
>to something like:
>
> int fseek(FILE *stream, fseek_t offset, int whence);
> fseek_t ftell(FILE *stream);
>
>where fseek_t is a typedef for a signed integer type big enough to
>represent file offsets.


>
>This should (I think) eliminate the need for fseeko(), ftello(),
>fsetpos(), and fgetpos(), though the latter two would be kept around
>for backward compatibility.

In non-POSIX filesystem environments, ftell()/fseek() have to encode
fpos_t information (e.g. record number and byte offset encoded as
record number * max record length + byte offset, but sometimes not
even allowing the byte offset, only the start of the record as a
target) into an integer, so are even less useful than on modern POSIX
filesystems.
Maybe it's time to deprecate ftell()/fseek() and recommend
fgetpos()/fsetpos() as the preferred alternative, allowing me to find
my way back to some point in my PB-sized files?

--
Thanks. Take care, Brian Inglis Calgary, Alberta, Canada

Brian....@CSi.com (Brian dot Inglis at SystematicSw dot ab dot ca)
fake address use address above to reply

Paul Eggert

Jul 9, 2004, 5:08:22 PM
At Fri, 9 Jul 2004 13:54:27 GMT, "Douglas A. Gwyn" <DAG...@null.net> writes:

> on systems that have fseeko, the type used as offset by fsetpos very
> likely is actually the same as for fseeko and fsetpos will probably
> work for setting current file position to an arbitrary computed byte
> offset.

Yes, that's common, but your proposed solution would make it a pain to
set the current file position to an arbitrary offset relative to the
current location, or relative to the end. fseeko lets you do that.
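As a small example of seeking relative to the end, here is a tail-style sketch, assuming a POSIX system; read_tail is a hypothetical helper name. Doing the same with fsetpos() would first require having visited that position and saved it with fgetpos().

```c
#define _POSIX_C_SOURCE 200112L   /* expose fseeko() */
#include <stdio.h>
#include <sys/types.h>

/* Read the last n bytes of a binary stream into buf by seeking
   relative to the end with SEEK_END. Returns the number of bytes
   read, or 0 if the seek fails (e.g. the file is shorter than n). */
static size_t read_tail(FILE *fp, char *buf, off_t n)
{
    if (fseeko(fp, -n, SEEK_END) != 0)
        return 0;
    return fread(buf, 1, (size_t)n, fp);
}
```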

There is also a hassle due to the optional mbstate_t component
accessed by fgetpos. Most people don't use fgetpos, but if they did
they'd probably be running into that hassle, too, and having to deal
with narrow and wide streams, etc. fseeko avoids that hassle too.

Douglas A. Gwyn

Jul 10, 2004, 4:18:12 AM
Wojtek Lerch wrote:
> Do you mean POSIX should have specified the exact
> contents of fpos_t?

I'm saying that it *could* have.

Paul Eggert

Jul 10, 2004, 1:14:24 PM

It's unlikely that they'd all agree on the exact contents. More
likely, they'd agree to specify some of the contents, and leave the
rest unspecified. For example, Solaris 9 fpos_t does not have an
mbstate_t component, whereas GNU/Linux fpos_t has it.

While we're on the subject, doesn't the C standard require that fpos_t
have the equivalent of an mbstate_t component if the system (like
Solaris 9) supports multibyte encodings? If so, it appears that the
Solaris 9 multibyte environment doesn't conform to the C Standard with
respect to fsetpos. Hardly anybody uses fsetpos (most programs use
lseek, fseek, or fseeko) so this isn't a high-priority item to fix,
but I suppose Sun should fix it at some point.

Douglas A. Gwyn

Jul 11, 2004, 10:49:21 AM
Paul Eggert wrote:
> While we're on the subject, doesn't the C standard require that fpos_t
> have the equivalent of an mbstate_t component if the system (like
> Solaris 9) supports multibyte encodings?

It didn't originally, but it does now. Note that that
makes fsetpos more generally useful than fseek(o) on
shift-state supporting platforms.

lawrenc...@ugsplm.com

Jul 11, 2004, 6:06:39 PM
Paul Eggert <egg...@twinsun.com> wrote:
>
>
> While we're on the subject, doesn't the C standard require that fpos_t
> have the equivalent of an mbstate_t component if the system (like
> Solaris 9) supports multibyte encodings?

Only if the supported multibyte encodings have shift states.

-Larry Jones

In my opinion, we don't devote nearly enough scientific research
to finding a cure for jerks. -- Calvin

Paul Eggert

Jul 12, 2004, 12:03:26 AM
At Sun, 11 Jul 2004 22:06:39 GMT, lawrenc...@ugsplm.com writes:

> Paul Eggert <egg...@twinsun.com> wrote:
>
>> While we're on the subject, doesn't the C standard require that fpos_t
>> have the equivalent of an mbstate_t component if the system (like
>> Solaris 9) supports multibyte encodings?
>
> Only if the supported multibyte encodings have shift states.

Ah, that explains it; thanks. Apparently Solaris 9 doesn't support
such encodings in the standard C library. (Not that it has to; it
doesn't claim C99 conformance, but I thought it might be an issue for
Solaris 10 whenever it comes out.) I guess that I was confused by the
fact that there is some support for such encodings in the Solaris 9
iconv library but that's a different matter.

Douglas A. Gwyn

Jul 12, 2004, 3:56:13 AM
Paul Eggert wrote:
> Ah, that explains it; thanks. Apparently Solaris 9 doesn't support
> such encodings in the standard C library. (Not that it has to; it
> doesn't claim C99 conformance, but I thought it might be an issue for
> Solaris 10 whenever it comes out.)

Or perhaps Solaris doesn't claim to support shift encodings
in its conforming mode. The C standard leaves it up to the
implementation what set of multibyte encodings it supports.
If an implementation doesn't support shift encodings then
it need not have an mbstate_t in its fpos_t.

David R Tribble

Jul 12, 2004, 7:49:49 PM
Wojtek Lerch wrote:
>> Do you mean POSIX should have specified the exact
>> contents of fpos_t?

Douglas A. Gwyn wrote:
> I'm saying that it *could* have.

The ISO C definition of fpos_t is fairly lame. The only operations you can
safely do with it are:

- Save the current file location: fgetpos().
- Restore a previously saved file location: fsetpos().

You can't do any kind of arithmetic with an fpos_t value like you can with
an ftell() value, because fpos_t has no defined properties other than
storing a file location.
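Within those limits fpos_t is still useful as a bookmark. A minimal ISO C sketch (index_lines is a hypothetical name) that records the start of each line with fgetpos() so that fsetpos() can later jump straight back to any of them; note that it never compares or adds fpos_t values.

```c
#include <stdio.h>

/* Rewind fp and store the position of the start of each line (at most
   max lines) in marks[]. Returns the number of lines indexed. The
   fpos_t values are only saved for later fsetpos() calls. */
static int index_lines(FILE *fp, fpos_t marks[], int max)
{
    int n = 0, c;

    rewind(fp);
    while (n < max && fgetpos(fp, &marks[n]) == 0) {
        c = fgetc(fp);
        if (c == EOF)
            break;                  /* that mark was end-of-file */
        n++;
        while (c != '\n' && c != EOF)
            c = fgetc(fp);          /* skip the rest of the line */
    }
    return n;
}
```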

Conceivably, you can combine the fpos_t operations with fseek(SEEK_CUR)
operations so that you could advance the file pointer in blocks less than
2GB in size, and eventually make your way to the location in the file that
you want to reach (assuming that the location is more than 4GB from the
beginning of the file), but this seems like a kludgey way to do things,
especially if fseek() incurs real I/O overhead.
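The stepping workaround itself can be sketched in a few lines of ISO C; advance_by is a hypothetical name, and unsigned long long assumes a C99 compiler. It is the kludge described above: one fseek(SEEK_CUR) call per long-sized step, each of which may cost real I/O on some implementations.

```c
#include <limits.h>
#include <stdio.h>

/* Advance the file position by `distance` bytes, which may exceed
   LONG_MAX, using only fseek(SEEK_CUR) in steps of at most LONG_MAX.
   Returns 0 on success, -1 if any step fails. */
static int advance_by(FILE *fp, unsigned long long distance)
{
    while (distance > 0) {
        long step = distance > (unsigned long long)LONG_MAX
                        ? LONG_MAX : (long)distance;
        if (fseek(fp, step, SEEK_CUR) != 0)
            return -1;
        distance -= (unsigned long long)step;
    }
    return 0;
}
```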

I tried suggesting, many years ago, adding the ability to add arbitrary
signed increments to an fpos_t value, as well as the ability to compare
fpos_t values, but there did not seem to be any interest.

int fcmppos(const fpos_t *loc1, const fpos_t *loc2);

int faddpos(fpos_t *loc, long long int inc);
-or-
int faddpos(fpos_t *loc, long int nIncrs, size_t incrSize);

-drt

Douglas A. Gwyn

Jul 13, 2004, 3:30:42 AM
David R Tribble wrote:
> You can't do any kind of arithmetic with an fpos_t value ...

You could have under POSIX, if it had been so specified!
The C standard allows such additional requirements to be
imposed by implementations.

Harti Brandt

Jul 13, 2004, 5:03:22 AM
On Fri, 9 Jul 2004, Douglas A. Gwyn wrote:

DAG>Harti Brandt wrote:
DAG>> Well, for some applications fseeko/ftello can do what f[gs]etpos() cannot.
DAG>> With fsetpos you can only seek to a position in a file you have been
DAG>> before. With fseeko you can do whatever magic you could do with fseek.
DAG>> For example create files with holes (at least on unix-like systems) or
DAG>> efficiently seek around in fixed-length record files.
DAG>
DAG>However, on systems that have fseeko, the type used as offset
DAG>by fsetpos very likely is actually the same as for fseeko and
DAG>fsetpos will probably work for setting current file position
DAG>to an arbitrary computed byte offset. POSIX could have
DAG>mandated this behavior for fsetpos rather than adding a new
DAG>function.

I suppose it wouldn't be easy to correctly handle multibyte character sets
in this case with fsetpos. At least on my FreeBSD system the specification
of fseek() and fseeko() requires that the resulting file position point to
the first byte of a multibyte sequence (given a multibyte character set).

I don't know enough about handling of multibyte character sets, but given
that all the read/write functions operate on complete multibyte characters
I see no reason why fpos_t couldn't be off_t. (In fact it is on FreeBSD).

There are also file systems that don't see files as byte streams. In this
case the C library must provide the byte (or character) stream abstraction
based on what is available. This may not be possible without additional
information in an fpos_t. An example would be text files under VMS and
RSX. These files are variable record files with the record length prefixed
to the actual record and the record length padded to an even number of
bytes. The file system allows you to seek to the start of the record only.
In this case fpos_t would probably contain the record number and the
offset into that record or the block number, the block offset and the
record offset (don't remember what parameters the seek required).
Implementing fseeko() on such systems is a hairy task and may require
scanning the file to find the needed record.

harti

Douglas A. Gwyn

Jul 13, 2004, 6:29:42 AM
Harti Brandt wrote:
> On Fri, 9 Jul 2004, Douglas A. Gwyn wrote:
> I suppose it wouldn't be easy to correctly handle multibyte character sets
> in this case with fsetpos. At least on my FreeBSD system the specification
> of fseek() and fseeko() requires that the resulting file position point to
> the first byte of a multibyte sequence (given a multibyte character set).

More likely, the behavior is undefined if you seek to a
location within a multibyte sequence on a text stream.
But that is not a problem if you got the file offset
from a text stream opened in "wide" mode, because it
will have been picking up the multibyte sequences
intact.

The real problem occurs only when encodings involve
shift states. In that case not only does the file
offset need to be between multibyte sequences but also
the stream's internal record of the shift state needs
to be restored.

Platforms that don't support shift encodings don't
need to have an mbstate_t in their fpos_t cookie.

> I don't know enough about handling of multibyte character sets, but given
> that all the read/write functions operate on complete multibyte characters
> I see no reason why fpos_t couldn't be off_t. (In fact it is on FreeBSD).

If shift encodings exist on the platform, then either
the program needs to ensure that seeking is done to an
offset where the stream is in an "initial" shift state
or else the shift state needs to be restored (which
implies something like fpos_t containing mbstate_t).

> There are also file systems that don't see files as byte streams. In this
> case the C library must provide the byte (or character) stream abstraction
> based on what is available. This may not be possible without additional
> information in an fpos_t. An example would be text files under VMS and
> RSX. These files are variable record files with the record length prefixed
> to the actual record and the record length padded to an even number of
> bytes. The file system allows you to seek to the start of the record only.
> In this case fpos_t would probably contain the record number and the
> offset into that record or the block number, the block offset and the
> record offset (don't remember what parameters the seek required).

Yes, but POSIX explicitly requires files to be byte
arrays with no additional structure (somewhat over-
simplified, but you get the drift), so POSIX would
not have to be concerned with that extra baggage in
a file-offset object.

> Implementing fseeko() on such systems is a hairy task and may require
> scanning the file to find the needed record.

Same for fseek() in such an environment.

Harti Brandt

Jul 13, 2004, 9:36:46 AM
On Tue, 13 Jul 2004, Douglas A. Gwyn wrote:

DAG>Harti Brandt wrote:
DAG>> On Fri, 9 Jul 2004, Douglas A. Gwyn wrote:
DAG>> I suppose it wouldn't be easy to correctly handle multibyte character sets
DAG>> in this case with fsetpos. At least on my FreeBSD system the specification
DAG>> of fseek() and fseeko() requires that the resulting file position point to
DAG>> the first byte of a multibyte sequence (given a multibyte character set).
DAG>
DAG>More likely, the behavior is undefined if you seek to a
DAG>location within a multibyte sequence on a text stream.
DAG>But that is not a problem if you got the file offset
DAG>from a text stream opened in "wide" mode, because it
DAG>will have been picking up the multibyte sequences
DAG>intact.
DAG>
DAG>The real problem occurs only when encodings involve
DAG>shift states. In that case not only does the file
DAG>offset need to be between multibyte sequences but also
DAG>the stream's internal record of the shift state needs
DAG>to be restored.
DAG>
DAG>Platforms that don't support shift encodings don't
DAG>need to have an mbstate_t in their fpos_t cookie.
DAG>
DAG>> I don't know enough about handling of multibyte character sets, but given
DAG>> that all the read/write functions operate on complete multibyte characters
DAG>> I see no reason why fpos_t couldn't be off_t. (In fact it is on FreeBSD).
DAG>
DAG>If shift encodings exist on the platform, then either
DAG>the program needs to ensure that seeking is done to an
DAG>offset where the stream is in an "initial" shift state
DAG>or else the shift state needs to be restored (which
DAG>implies something like fpos_t containing mbstate_t).
DAG>
DAG>> There are also file systems that don't see files as byte streams. In this
DAG>> case the C library must provide the byte (or character) stream abstraction
DAG>> based on what is available. This may not be possible without additional
DAG>> information in an fpos_t. An example would be text files under VMS and
DAG>> RSX. These files are variable record files with the record length prefixed
DAG>> to the actual record and the record length padded to an even number of
DAG>> bytes. The file system allows you to seek to the start of the record only.
DAG>> In this case fpos_t would probably contain the record number and the
DAG>> offset into that record or the block number, the block offset and the
DAG>> record offset (don't remember what parameters the seek required).
DAG>
DAG>Yes, but POSIX explicitly requires files to be byte
DAG>arrays with no additional structure (somewhat over-
DAG>simplified, but you get the drift), so POSIX would
DAG>not have to be concerned with that extra baggage in
DAG>a file-offset object.

Does that byte stream abstraction refer to the actual physical format of
the file or just to the abstraction seen by the program?

DAG>
DAG>> Implementing fseeko() on such systems is a hairy task and may require
DAG>> scanning the file to find the needed record.
DAG>
DAG>Same for fseek() in such an environment.

Sure.

harti

Eric Sosman

Jul 13, 2004, 11:46:20 AM

Try to formulate a precise definition of "actual physical
format," and you will discover the answer for yourself.

--
Eric....@sun.com

Paul Eggert

Jul 13, 2004, 5:35:59 PM
At Tue, 13 Jul 2004 06:29:42 -0400, "Douglas A. Gwyn" <DAG...@null.net> writes:

> The real problem occurs only when encodings involve
> shift states.

I don't see why one doesn't have real problems even with encodings
like Shift-JIS that don't have shift states.

In ordinary ASCII, I can write an application like the Unix "look"
command that seeks to an arbitrary byte position in a text file, and
then starts scanning, looking for text.

I can do that sort of thing with UTF-8, too, since the program can
tell by looking at the first byte whether it's in the middle of a
character. But I can't do that with Shift-JIS: if the first byte seen
is an ASCII backslash, the program doesn't know whether it's actually
a backslash or the second byte in a Shift-JIS character.
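Eggert's UTF-8 observation can be sketched in C. Every UTF-8 continuation byte has the bit pattern 10xxxxxx (0x80-0xBF), so code that lands on an arbitrary byte can step forward to the next byte that starts a character. The function name utf8_sync is hypothetical, and the sketch assumes well-formed input; it does no validation.

```c
#include <stddef.h>

/* Return the index of the first byte at or after pos that is not a
   UTF-8 continuation byte (10xxxxxx), i.e. the start of a character.
   Returns len if no such byte remains.  Assumes well-formed UTF-8;
   nothing is validated. */
size_t utf8_sync(const unsigned char *buf, size_t len, size_t pos)
{
    while (pos < len && (buf[pos] & 0xC0) == 0x80)
        pos++;
    return pos;
}
```

For the bytes 61 C3 A9 62 ("a", "é", "b"), starting at index 2 (the continuation byte A9) gives 3, the start of "b".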

Even if POSIX had added primitives for adding numeric offsets to
fpos_t values, I don't see how it could have gotten around this problem.
This may help to explain why the POSIX folks went with fseeko instead.

jacob navia

Jul 13, 2004, 8:37:57 PM

The seek operation implies an array-like structure in the
underlying file.

Suppose a file containing records of type

    struct s {
        double data[50];
    };

After any seek you know exactly where within
a record you are.
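The fixed-size case needs no scanning at all: record i starts at byte i * sizeof(struct s). A minimal ISO C sketch, assuming a binary stream written by the same implementation (same sizeof(struct s) and double representation); read_record is a hypothetical helper. Because fseek() takes a long, this version cannot reach offsets beyond LONG_MAX, about 2 GB where long is 32 bits.

```c
#include <stdio.h>
#include <limits.h>

struct s {
    double data[50];
};

/* Read record number idx (0-based) from a binary stream of struct s.
   The offset is computed directly, so no earlier record is read.
   Returns 1 on success, 0 on failure (including offsets that would
   not fit in the long argument fseek() requires). */
int read_record(FILE *fp, long idx, struct s *out)
{
    if (idx < 0 || idx > LONG_MAX / (long)sizeof(struct s))
        return 0;
    if (fseek(fp, idx * (long)sizeof(struct s), SEEK_SET) != 0)
        return 0;
    return fread(out, sizeof *out, 1, fp) == 1;
}
```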

Now suppose:

    struct s1 {
        int length;
        double data[];
    };

The file organization here is that of a *list*, not of an array.
This is a widely used file organization with many variants.

In this second kind of file the seek operation is simply not
meaningful. The data layout implies a sequential scan of
the records.
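To make the contrast concrete: with the s1 layout the only way to find record k is to read every earlier length prefix. A minimal sketch, assuming each record is stored as a native int count followed by that many doubles (the natural on-disk form of struct s1); skip_to_record is a hypothetical name. The skips are relative seeks, but how far to skip is known only after reading each prefix, so the cost grows with k.

```c
#include <stdio.h>

/* Position fp at the start of record k (0-based) in a binary file of
   records laid out as: int n; double data[n].  Every earlier record
   must be visited to learn its length.  Returns 1 on success, 0 if
   the file ends early or a length prefix is negative. */
int skip_to_record(FILE *fp, long k)
{
    int n;

    if (fseek(fp, 0L, SEEK_SET) != 0)
        return 0;
    for (long i = 0; i < k; i++) {
        if (fread(&n, sizeof n, 1, fp) != 1 || n < 0)
            return 0;
        /* Skip this record's payload; its size was unknown until the
           prefix had been read. */
        if (fseek(fp, (long)n * (long)sizeof(double), SEEK_CUR) != 0)
            return 0;
    }
    return 1;
}
```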

The "shift state" problem is part of a more general problem.

Character data may likewise require a sequential scan if its
encoding assumes one. For the case you mention it suffices to
read always the character at the position and the character just
before it to know if the backslash has a meaning or not. In the
general case this is not possible. In a file organized as a list
of the "s1" structures above, a seek operation is an error.

The standard seek function should not have to care about
situations where its use is not meaningful because of the
underlying file structure. Its task is to provide a basic tool
with which you can *build* file structures.


Douglas A. Gwyn

Jul 14, 2004, 12:51:48 AM

Paul Eggert wrote:
> I don't see why one doesn't have real problems even with encodings
> like Shift-JIS that don't have shift states.

Because for a text stream you would not normally be
seeking to an arbitrary byte location, but only to
one already known to be between m.b. sequences.

> In ordinary ASCII, I can write an application like the Unix "look"
> command that seeks to an arbitrary byte position in a text file, and
> then starts scanning, looking for text.

You can, but you should expect to lose some text near
the start of the scan.

> I can do that sort of thing with UTF-8, too, since the program can
> tell by looking at the first byte whether it's in the middle of a
> character. But I can't do that with Shift-JIS: if the first byte seen
> is an ASCII backslash, the program doesn't know whether it's actually
> a backslash or the second byte in a Shift-JIS character.

Ditto. The scan will synchronize quickly, in many
cases. But the real question is why you would do this.

> Even if POSIX had added primitives for adding numeric offsets to
> fpos_t values, I don't see how it could have gotten around this problem.
> This may help to explain why the POSIX folks went with fseeko instead.

How does fseeko (which is just fseek with a possibly
wider integer type for the offset) solve the problem,
when you insist on seeking into the middle of a m.b.
sequence? You still have a synchronization problem.
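For reference, this is what "fseek with a wider integer type" looks like in practice. A minimal POSIX sketch; seek_relative is a hypothetical helper. It assumes glibc-style feature macros: on 32-bit glibc systems off_t is 64 bits only when _FILE_OFFSET_BITS is 64, and both macros must be defined before any system header is included. Because positions are ordinary integers, the target can be computed and range-checked before seeking, which an opaque fpos_t does not allow.

```c
/* Feature macros must come before any system header. */
#define _POSIX_C_SOURCE 200809L
#define _FILE_OFFSET_BITS 64   /* glibc: 64-bit off_t even on 32-bit hosts */

#include <stdio.h>
#include <sys/types.h>

/* Move fp by delta bytes from its current position.  Positions are
   plain off_t integers, so the target is computed and checked before
   calling fseeko().  Returns 0 on success, -1 on failure.  Overflow
   of here + delta is not checked in this sketch. */
int seek_relative(FILE *fp, off_t delta)
{
    off_t here = ftello(fp);

    if (here == (off_t)-1)
        return -1;
    if (here + delta < 0)
        return -1;
    return fseeko(fp, here + delta, SEEK_SET) == 0 ? 0 : -1;
}
```

Note that this solves only the width problem; as Gwyn says, it does nothing about landing in the middle of a multibyte sequence.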

Also note that I never suggested that POSIX should have
added primitives for arithmetic with fpos_t. I said
that POSIX *could* have required fpos_t to be an integer
type representing offset as number-of-bytes, in which
case no new primitives are needed.

Wojtek Lerch

Jul 14, 2004, 1:11:23 AM

"Douglas A. Gwyn" <DAG...@null.net> wrote in message
news:WuCdnSQnL8x...@comcast.com...

> Also note that I never suggested that POSIX should have
> added primitives for arithmetic with fpos_t. I said
> that POSIX *could* have required fpos_t to be an integer
> type representing offset as number-of-bytes, in which
> case no new primitives are needed.

You did? I thought you were suggesting a structure consisting of an integer
and an mbstate_t. POSIX doesn't forbid state-dependent encodings, does it?


Douglas A. Gwyn

Jul 14, 2004, 5:24:43 AM

Wojtek Lerch wrote:
> You did? I thought you were suggesting a structure consisting of an integer
> and an mbstate_t. POSIX doesn't forbid state-dependent encodings, does it?

Not that I am aware, but it could have, or more likely
it could have required that seeking into a m.b. text
stream be done only at a position where the stream is
in initial shift state. fseeko doesn't set the shift
state properly anyway.
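One way to picture the restriction Gwyn mentions: decode the text with mbrtowc() and note the byte offsets at which mbsinit() reports the initial conversion state. A minimal sketch using only ISO C library calls; initial_state_offsets is a hypothetical name, it decodes in whatever LC_CTYPE locale is current, and it assumes a null character occupies a single byte. In a stateless encoding every character boundary qualifies; in a shift encoding only boundaries where no shift is in effect do.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Walk a multibyte buffer in the current locale and store in points[]
   (at most max_points entries) each byte offset at which the
   conversion state is the initial shift state.  Returns the number
   of offsets stored, or (size_t)-1 on an encoding error or an
   incomplete trailing character. */
size_t initial_state_offsets(const char *buf, size_t len,
                             size_t *points, size_t max_points)
{
    mbstate_t st;
    size_t pos = 0, count = 0;

    memset(&st, 0, sizeof st);          /* start in the initial state */
    while (pos < len) {
        if (mbsinit(&st) && count < max_points)
            points[count++] = pos;
        wchar_t wc;
        size_t r = mbrtowc(&wc, buf + pos, len - pos, &st);
        if (r == (size_t)-1 || r == (size_t)-2)
            return (size_t)-1;
        if (r == 0)
            r = 1;                      /* null character: assumed 1 byte */
        pos += r;
    }
    return count;
}
```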

Since you may have gotten lost as this thread evolved:
It started with questions about how to provide wider
tell/seek functionality. The ftello/fseeko functions
(POSIX) were mentioned as a model, also fgetpos/fsetpos.
There was a complaint that arithmetic could not be done
on fsetpos cookies, and I remarked that POSIX had been
free to require that they be integer types. Then it
was noted that the C standard requires fsetpos to
restore the shift state as well as file offset, and
in further discussion it was noted that many platforms
don't support shift encodings anyway, and that on such
platforms it would still be reasonable to make fpos_t
an integer type.

The main reason fpos_t was specified as a cookie rather
than definitely an integer type for C89 was that there
weren't suitable integer types available (in many cases).
The situation is different for C99, where extended
integer types are allowed (and there is a standard type
guaranteed to be at least 64 bits wide).
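In ISO C terms the cookie interface supports exactly two operations: capture a position with fgetpos() and return to it later with fsetpos(). A minimal sketch; peek_line is a hypothetical helper. It never does arithmetic on the fpos_t value, since the standard defines none.

```c
#include <stdio.h>

/* Read the next line into buf without consuming it: remember the
   position with fgetpos(), read, then restore it with fsetpos().
   The fpos_t is used only as an opaque cookie.  Returns 1 if a line
   was read and the position was restored, 0 otherwise. */
int peek_line(FILE *fp, char *buf, int size)
{
    fpos_t mark;

    if (fgetpos(fp, &mark) != 0)
        return 0;
    int ok = fgets(buf, size, fp) != NULL;
    if (fsetpos(fp, &mark) != 0)       /* also clears end-of-file */
        return 0;
    return ok;
}
```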

It was also possible that the flexibility of the cookie
would be useful on platforms with record managers, such
as VMS. (However, they still had to make fseek work
with byte offset on binary streams.)

When the multibyte extensions were grafted on in 1994,
shift state entered the picture. Somewhere along the
way WG14 has decided that since fpos_t was a cookie
anyway, it should be used to contain the shift state
(for a wide-oriented text stream) in addition to the
original file position. We probably would have come up
with something else had POSIX previously specified that
fpos_t would be an (extended) integer type.

Antoine Leca

Jul 14, 2004, 9:36:35 AM
In 7wwu19e...@sic.twinsun.com, Paul Eggert wrote:

>> Only if the supported multibyte encodings have shift states.
>
> Ah, that explains it; thanks. Apparently Solaris 9 doesn't support
> such encodings in the standard C library. (Not that it has to;

Agreed so far.

> it doesn't claim C99 conformance, but I thought it might be an issue
> for Solaris 10 whenever it comes out.)

As far as I know, C99 has no further _support_ requirements with respect to
multibyte encodings that preceding versions had not.

That is, Solaris 25 can perfectly well continue to "ignore" encodings with
shift-states at the general level of the standard C library, and defer their
support to specialized tools (such as iconv, obviously).


Or am I missing something here?


Antoine

Wojtek Lerch

Jul 14, 2004, 10:45:27 AM

Douglas A. Gwyn wrote:
> When the multibyte extensions were grafted on in 1994,
> shift state entered the picture. Somewhere along the
> way WG14 has decided that since fpos_t was a cookie
> anyway, it should be used to contain the shift state
> (for a wide-oriented text stream) in addition to the
> original file position. We probably would have come up
> with something else had POSIX previously specified that
> fpos_t would be an (extended) integer type.

Ah, I get it now: you meant POSIX could have made its fpos_t an integer
type *before* the mbstate_t stuff was added to C. What I thought you
meant was that POSIX could have mandated fpos_t as a structure containing
an off_t and possibly some other stuff, to allow fsetpos() (defined as
we know it) to be used with computed file offsets.

If POSIX had said that fpos_t is an integer type, it would have forced C to
invent a new interface instead of adding the shift state handling to
fsetpos(), and the final situation would be similar to what we have now
-- fsetpos() would be what fseeko() is now, and the new interface would
probably resemble the fsetpos() we know. The main difference would be
that both functions would be defined in the C standard. Have I missed
something?

Paul Eggert

Jul 18, 2004, 2:49:56 AM

At Wed, 14 Jul 2004 02:37:57 +0200, "jacob navia" <ja...@jacob.remcomp.fr> writes:

> For the case you mention it suffices to read always the character at
> the position and the character just before it to know if the
> backslash has a meaning or not.

No it doesn't. Shift-JIS doesn't have that property. It's possible
for a valid backslash B to be preceded by a byte A (the second half of
the preceding character) such that A-followed-by-B is also a valid
character.
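Eggert's counterexample can be checked with a small C sketch. It assumes the common Shift-JIS lead-byte ranges 0x81-0x9F and 0xE0-0xFC and skips trail-byte validation; the function names are hypothetical. The only reliable test is to decode forward from a known character boundary.

```c
#include <stddef.h>

/* Simplified Shift-JIS lead-byte test (assumed ranges 0x81-0x9F and
   0xE0-0xFC); trail bytes are not validated. */
static int sjis_lead(unsigned char c)
{
    return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

/* Return 1 if buf[i] is the second byte of a double-byte character,
   0 if it starts a character.  Decodes from buf[0], which must be a
   character boundary; looking only at buf[i-1] is not enough. */
int sjis_is_trail(const unsigned char *buf, size_t i)
{
    size_t pos = 0;

    while (pos < i)
        pos += sjis_lead(buf[pos]) ? 2 : 1;
    return pos > i;   /* overshot: buf[i] was consumed as a trail byte */
}
```

In the bytes 83 83 5C (katakana small ya, then a backslash) sjis_is_trail(buf, 2) returns 0, while in 41 83 5C ("A", then katakana so, which is 0x835C) it returns 1, even though both buffers end in the same two bytes 83 5C.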

Paul Eggert

Jul 18, 2004, 3:03:02 AM

At Wed, 14 Jul 2004 00:51:48 -0400, "Douglas A. Gwyn" <DAG...@null.net> writes:

> Paul Eggert wrote:
>> In ordinary ASCII, I can write an application like the Unix "look"
>> command that seeks to an arbitrary byte position in a text file, and
>> then starts scanning, looking for text.
>
> You can, but you should expect to lose some text near
> the start of the scan.

Only if you *want* to lose the text. In ASCII, it's a relatively
simple matter to scan backwards to that line's start, or forwards to
the next line's start, whichever you prefer. This is also true for
UTF-8.
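The forward-resynchronization approach for ASCII and UTF-8 can be sketched as follows. It assumes a binary stream, so that offsets are byte counts, and relies on the fact that the byte 0x0A never appears inside a multibyte UTF-8 character; seek_to_line_start is a hypothetical helper. It moves forward to the next line start; scanning backwards to the current line's start would work equally well.

```c
#include <stdio.h>

/* Seek to an arbitrary byte offset in a binary stream, then advance
   to the start of the next complete line, as a "look"-style search
   might.  Returns the resulting offset, or -1 on error or if no
   newline follows. */
long seek_to_line_start(FILE *fp, long offset)
{
    int c;

    if (offset > 0) {
        /* Start one byte early: if that byte is a newline, offset is
           already a line start and consuming it lands exactly there. */
        if (fseek(fp, offset - 1, SEEK_SET) != 0)
            return -1;
        while ((c = getc(fp)) != EOF && c != '\n')
            ;
        if (c == EOF)
            return -1;
    } else if (fseek(fp, 0L, SEEK_SET) != 0) {
        return -1;
    }
    return ftell(fp);
}
```

With the contents "apple\nbanana\ncherry\n", an offset of 3 yields 6 (the start of "banana"), and an offset of 6 also yields 6 because the preceding byte is a newline.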

> How does fseeko (which is just fseek with a possibly
> wider integer type for the offset) solve the problem,
> when you insist on seeking into the middle of a m.b.
> sequence? You still have a synchronization problem.

Yes: there's no magic here. Any application using fseeko to position
to arbitrary positions within text files has to know what it's doing,
and be prepared to deal with any resulting glitches. But the behavior
is well defined, which is enough to write useful applications. The
practical glitches are no worse than the usual mess when trying to
read multibyte text (which can contain encoding errors, etc.).

Paul Eggert

Jul 18, 2004, 3:08:03 AM

At Wed, 14 Jul 2004 15:36:35 +0200, "Antoine Leca" <ro...@localhost.gov> writes:

> As far as I know, C99 has no further _support_ requirements with respect to
> multibyte encodings that preceding versions had not.

Yes, you're correct: a C99 host need not support multibyte encodings
at all. Or it could support them in some contexts but not others.

The same is true for POSIX, but interestingly enough it's not true for
GNU/Linux: the LSB requires multibyte encodings.

Ross Ridge

Jul 18, 2004, 12:14:01 PM

"jacob navia" <ja...@jacob.remcomp.fr> writes:
> For the case you mention it suffices to read always the character at
> the position and the character just before it to know if the
> backslash has a meaning or not.

Paul Eggert <egg...@twinsun.com> wrote:
>No it doesn't. Shift-JIS doesn't have that property. It's possible
>for a valid backslash B to be preceded by a byte A (the second half of
>the preceding character) such that A-followed-by-B is also a valid
>character.

For that matter, with EUC-JP or ISO-2022-JP you can't always tell if
you're looking at the first or second byte of a multi-byte character by
looking at the previous byte.

Ross Ridge

--
l/ // Ross Ridge -- The Great HTMU
[oo][oo] rri...@csclub.uwaterloo.ca
-()-/()/ http://www.csclub.uwaterloo.ca/u/rridge/
db //
