fseek/ftell for files bigger than 2^32

jacob navia

Jun 10, 2015, 4:27:59 PM

The lcc-win compiler uses long long for fseek/ftell and thus supports
files bigger than 2Gig... commonplace nowadays.

Could it be possible to change the prototype of those functions in the
standard?

This would be a compatible change since any int can be extended to long
long and working code would not change...

The situation now is impossible with POSIX adding fseeko and ftello and
other vendors using various other names (fseek64 and ftell64 being also
used)

Updating those functions would be the easiest way out.

Files bigger than 2 GB are really nothing special, with disk drives
holding several terabytes!

Thanks for your attention

jacob

Kaz Kylheku

Jun 10, 2015, 6:44:17 PM

On 2015-06-10, jacob navia <ja...@jacob.remcomp.fr> wrote:
> The lcc-win compiler uses long long for fseek/ftell and thus supports
> files bigger than 2Gig... commonplace nowadays.

These functions can support files bigger than 2 GB; it just requires
multiple calls.
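
A minimal sketch of the multiple-call idea, assuming the implementation really does track stream positions beyond LONG_MAX internally (the standard does not promise this, and ftell still cannot report such a position). The helper names seek_forward_chunked and seek_forward, and the max_step parameter, are made up for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: advance `stream` by `distance` bytes from its
 * current position using repeated relative seeks of at most `max_step`
 * bytes each. Returns 0 on success, -1 if any fseek call fails. */
static int seek_forward_chunked(FILE *stream, unsigned long long distance,
                                long max_step)
{
    assert(stream != NULL && max_step > 0);
    while (distance > 0) {
        long step = distance > (unsigned long long)max_step
                        ? max_step
                        : (long)distance;
        if (fseek(stream, step, SEEK_CUR) != 0)
            return -1;
        distance -= (unsigned long long)step;
    }
    return 0;
}

/* Convenience wrapper using the largest step a long offset allows. */
static int seek_forward(FILE *stream, unsigned long long distance)
{
    return seek_forward_chunked(stream, distance, LONG_MAX);
}
```

The small max_step parameter exists only so the loop can be exercised on a tiny file; real use would go through seek_forward.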

> Could it be possible to change the prototype of those functions in the
> standard?
>
> This would be a compatible change since any int can be extended to long
> long and working code would not change...

This breaks:

int (*seek_op)(FILE *, long, int) = fseek; // error

If there is a stinky cast, it breaks silently:

int (*seek_op)(void *stream, long, int) = (int (*)(void *, long, int)) fseek;
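
A possible compile-time guard against the silent case, sketched on the assumption of a C11 compiler. It checks that fseek still has the ISO C signature before a pointer to it is stored. FSEEK_HAS_STANDARD_TYPE, seek_fn and seek_via_pointer are illustrative names, not anything from the standard.

```c
#include <assert.h>
#include <stdio.h>

/* Evaluates to 1 if &fseek has the ISO C pointer type, 0 otherwise. */
#define FSEEK_HAS_STANDARD_TYPE \
    _Generic(&fseek, int (*)(FILE *, long, int): 1, default: 0)

/* Fails the build, instead of misbehaving at run time, if an
 * implementation declares fseek with a different offset type. */
_Static_assert(FSEEK_HAS_STANDARD_TYPE,
               "fseek does not take a long offset on this implementation");

typedef int (*seek_fn)(FILE *, long, int);

/* Calls fseek through a correctly typed pointer; no cast is needed. */
static int seek_via_pointer(FILE *stream, long offset)
{
    seek_fn seek_op = fseek;
    assert(stream != NULL);
    return seek_op(stream, offset, SEEK_SET);
}
```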

Kaz Kylheku

Jun 10, 2015, 7:21:00 PM

Which is not to say I want to be a naysayer. In POSIX, transitions from raw
types like int have been managed. For instance, the accept function
used to have an "int *" parameter, and it was transitioned to "socklen_t *".

Transitioning the long parameter of fseek and ftell to a typedef should have
been done in C99, if not earlier.

The typedef lets users and implementors stage their own obsolescence schedule:
choose the point in time at which they break with backward compatibility and
target the typedef away from long to some other type.

Casper H.S. Dik

Jun 11, 2015, 3:31:51 AM

jacob navia <ja...@jacob.remcomp.fr> writes:

>The lcc-win compiler uses long long for fseek/ftell and thus supports
>files bigger than 2Gig... commonplace nowadays.

Clearly not standard-compliant, as the standard defines it as:

int fseek(FILE *stream, long int offset, int whence);

The POSIX standard adds the following function (and similar):

int fseeko(FILE *stream, off_t offset, int whence);

off_t is the type which is big enough for all file positions;
on 32-bit systems it can be 32 bits but also 64 (this might depend on the
compilation environment). Many 32-bit POSIX systems added
a "large file compilation environment" in the '90s.
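
A small sketch of what using that environment can look like, assuming a glibc-style implementation where defining _FILE_OFFSET_BITS as 64 before any system header selects a 64-bit off_t. The function name stream_size is made up for illustration.

```c
/* Feature-test macros must precede every system header. */
#define _FILE_OFFSET_BITS 64
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: return the size of an open stream in bytes,
 * or -1 on error. The original position is restored before returning. */
static off_t stream_size(FILE *stream)
{
    assert(stream != NULL);
    off_t saved = ftello(stream);
    if (saved < 0)
        return -1;
    if (fseeko(stream, 0, SEEK_END) != 0)
        return -1;
    off_t size = ftello(stream);
    if (fseeko(stream, saved, SEEK_SET) != 0)
        return -1;
    return size;
}
```

The same source compiles with or without the _FILE_OFFSET_BITS line; only the width of off_t, and therefore the largest file the helper can measure, changes.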

>Could it be possible to change the prototype of those functions in the
>standard?

No, generally it is not possible to change the prototype to use
incompatible types.

As this problem already existed 20 years ago and was fixed in that time
without modifying the prototype, I guess I am not alone in that opinion.

>This would be a compatible change since any int can be extended to long
>long and working code would not change...

Actually, it wouldn't. You are forgetting binaries compiled with
the old headers.

>The situation now is impossible with POSIX adding fseeko and ftello and
>other vendors using various other names (fseek64 and ftell64 being also
>used)

Which particular systems do you have where this is an issue?

Most 64-bit systems use a 64 bit long and they do not have this
issue; even phones now have > 4GB of memory and disks and now
are getting 64 bit CPUs.

>Updating those functions would be the easiest way out.

Clearly not, because it breaks compatibility. And what type would
you use? What type does the standard have which is at least 64 bits?

Casper

Casper H.S. Dik

Jun 11, 2015, 3:34:03 AM

Kaz Kylheku <k...@kylheku.com> writes:

>Which is not to say I want to be a naysayer. In POSIX, transitions from raw
>types like int have been managed. For instance, the accept function
>used to have an "int *" parameter, and it was transitioned to "socklen_t *".

But generally socklen_t has the same size as int (it is generally defined
as int).

>Transitioning the long parameter of fseek and ftell to a typedef should have
>been done in C99, if not earlier.

It is not a compatible change; it does not allow for binary compatibility.

Casper

Richard Kettlewell

Jun 11, 2015, 4:47:26 AM

Casper H.S. Dik <Caspe...@OrSPaMcle.COM> writes:
> jacob navia <ja...@jacob.remcomp.fr> writes:
>> The situation now is impossible with POSIX adding fseeko and ftello
>> and other vendors using various other names (fseek64 and ftell64
>> being also used)
>
> Which particular systems do you have where this is an issue?

Windows uses _fseeki64/_ftelli64, with an __int64 offset
parameter/return.

The obvious course of action would be for the next C standard to adopt
off_t and fseeko/ftello from POSIX.
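
Until something like that is standardized, portable code tends to hide the difference behind a wrapper. A rough sketch, assuming _fseeki64/_ftelli64 on Windows and fseeko/ftello elsewhere. The names portable_fseek64 and portable_ftell64, and the choice of int64_t as the offset type, are illustrative only.

```c
#if !defined(_WIN32)
/* Feature-test macros must precede every system header. */
#define _FILE_OFFSET_BITS 64
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#if !defined(_WIN32)
#include <sys/types.h>
#endif

/* Hypothetical wrapper: seek with a 64-bit offset on either platform. */
static int portable_fseek64(FILE *stream, int64_t offset, int whence)
{
    assert(stream != NULL);
#if defined(_WIN32)
    return _fseeki64(stream, offset, whence);
#else
    return fseeko(stream, (off_t)offset, whence);
#endif
}

/* Hypothetical wrapper: report the position as a 64-bit value, -1 on error. */
static int64_t portable_ftell64(FILE *stream)
{
    assert(stream != NULL);
#if defined(_WIN32)
    return _ftelli64(stream);
#else
    return (int64_t)ftello(stream);
#endif
}
```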

--
http://www.greenend.org.uk/rjk/

Casper H.S. Dik

Jun 11, 2015, 5:11:08 AM

Richard Kettlewell <r...@greenend.org.uk> writes:

>Casper H.S. Dik <Caspe...@OrSPaMcle.COM> writes:
>> jacob navia <ja...@jacob.remcomp.fr> writes:
>>> The situation now is impossible with POSIX adding fseeko and ftello
>>> and other vendors using various other names (fseek64 and ftell64
>>> being also used)
>>
>> Which particular systems do you have where this is an issue?

>Windows uses _fseeki64/_ftelli64, with an __int64 offset
>parameter/return.

I think at Microsoft they made a stupid decision to make long 32
bits in 64-bit Windows. Originally at Sun we nearly made the
same mistake (we're talking about 20 years ago), as a large
number of people felt that having int and long be the same
size would make porting easier. But as Sun wasn't the first, the
Unix industry voted for long being 64 bits, and Sun followed that lead
when the first 64-bit Solaris came out. Of course, HAL built the
first 64-bit Solaris version a number of years earlier, and they
used Sun/SPARC's first draft V9 ABI (sizeof (int) == sizeof (long))
around 1995. Sun released its first 64-bit CPU around the same time
but did not ship a 64-bit OS until Solaris 7 (end of 1998).


For Windows, fseek/ftell's definition is an issue, but that is a problem
created at Microsoft, at least for 64-bit Windows.

>The obvious course of action would be for the next C standard to adopt
>off_t and fseeko/ftello from POSIX.

Right.

Casper

Richard Kettlewell

Jun 11, 2015, 5:32:29 AM

Casper H.S. Dik <Caspe...@OrSPaMcle.COM> writes:
> Richard Kettlewell <r...@greenend.org.uk> writes:
>>Casper H.S. Dik <Caspe...@OrSPaMcle.COM> writes:
>>> jacob navia <ja...@jacob.remcomp.fr> writes:
>>>> The situation now is impossible with POSIX adding fseeko and ftello
>>>> and other vendors using various other names (fseek64 and ftell64
>>>> being also used)
>>>
>>> Which particular systems do you have where this is an issue?
>
>>Windows uses _fseeki64/_ftelli64, with an __int64 offset
>>parameter/return.
>
> I think at Microsoft they made a stupid decision to make long 32
> bits in 64-bit Windows. Originally at Sun we nearly made the
> same mistake (we're talking about 20 years ago), as a large
> number of people felt that having int and long be the same
> size would make porting easier. But as Sun wasn't the first, the
> Unix industry voted for long being 64 bits, and Sun followed that lead
> when the first 64-bit Solaris came out. Of course, HAL built the
> first 64-bit Solaris version a number of years earlier, and they
> used Sun/SPARC's first draft V9 ABI (sizeof (int) == sizeof (long))
> around 1995. Sun released its first 64-bit CPU around the same time
> but did not ship a 64-bit OS until Solaris 7 (end of 1998).
>
>
> For Windows, fseek/ftell's definition is an issue, but that is a problem
> created at Microsoft, at least for 64-bit Windows.

I don’t think the size of ‘long’ is the key factor here; even if they’d
made long be 64-bits in their 64-bit ABI, there’d still be their 32-bit
ABI to consider.

The POSIX world has mostly addressed this by having (at least) two
32-bit ABIs, distinguished by the size of off_t. If (hypothetically)
Microsoft adopted off_t now, either unilaterally or with ISO’s blessing,
they could make it 64 bits in both their ABIs, avoiding the trouble that
the dual ABIs cause in the POSIX world. Occasionally it’s best to be
late to the party.

--
http://www.greenend.org.uk/rjk/

Casper H.S. Dik

Jun 11, 2015, 5:53:24 AM

Richard Kettlewell <r...@greenend.org.uk> writes:

>I don’t think the size of ‘long’ is the key factor here; even if they’d
>made long be 64-bits in their 64-bit ABI, there’d still be their 32-bit
>ABI to consider.

I'm not sure how much of the Windows market is 32 bit only; this
is likely now much smaller than the 64 bit capable market.

On the latter market there is no direct need for a 32-bit application
which can handle files over 2^31-1 bytes in size. (I seem to remember
that FAT32 allowed for files up to 2^32-1 bytes in size, more than fseek()
or ftell() would support.)


>The POSIX world has mostly addressed this by having (at least) two
>32-bit ABIs, distinguished by the size of off_t. If (hypothetically)
>Microsoft adopted off_t now, either unilaterally or with ISO’s blessing,
>they could make it 64 bits in both their ABIs, avoiding the trouble that
>the dual ABIs cause in the POSIX world. Occasionally it’s best to be
>late to the party.

In POSIX, generally both APIs can be used simultaneously in the
same application.

The question is mostly how much future there is in 32-bit Windows
and whether a new ABI is useful. Or do they have _ftell64 in 32-bit
Windows?


Casper

Richard Kettlewell

Jun 11, 2015, 6:21:22 AM

Casper H.S. Dik <Caspe...@OrSPaMcle.COM> writes:
> Richard Kettlewell <r...@greenend.org.uk> writes:

>>I don’t think the size of ‘long’ is the key factor here; even if
>>they’d made long be 64-bits in their 64-bit ABI, there’d still be
>>their 32-bit ABI to consider.
>
> I'm not sure how much of the Windows market is 32 bit only; this
> is likely now much smaller than the 64 bit capable market.
>
> On the latter market there is no direct need for a 32-bit application
> which can handle files over 2^31-1 bytes in size. (I seem to remember
> that FAT32 allowed for files up to 2^32-1 bytes in size, more than fseek()
> or ftell() would support.)

The market for 32-bit general-purpose computers, as a whole, may well
be tiny by now. But there’s more to life than PCs and even on 64-bit
platforms sometimes there are still practical reasons for deploying
32-bit object code.

>>The POSIX world has mostly addressed this by having (at least) two
>>32-bit ABIs, distinguished by the size of off_t. If (hypothetically)
>>Microsoft adopted off_t now, either unilaterally or with ISO’s blessing,
>>they could make it 64 bits in both their ABIs, avoiding the trouble that
>>the dual ABIs cause in the POSIX world. Occasionally it’s best to be
>>late to the party.
>
> In POSIX, generally both APIs can be used simultaneously in the
> same application.

The problem is the two ABIs, with a B, not APIs.

> The question is mostly how much future there is in 32-bit Windows
> and whether a new ABI is useful. Or do they have _ftell64 in 32-bit
> Windows?

Yes.

--
http://www.greenend.org.uk/rjk/

Casper H.S. Dik

Jun 11, 2015, 7:26:14 AM

Richard Kettlewell <r...@greenend.org.uk> writes:

>> In POSIX, generally both APIs can be used simultaneously in the
>> same application.

>The problem is the two ABIs, with a B, not APIs.

It actually also works for the ABI in POSIX. You should not mix
them for the same file descriptors.

Casper

Richard Kettlewell

Jun 11, 2015, 8:10:34 AM

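Casper H.S. Dik <Caspe...@OrSPaMcle.COM> writes:
> It actually also works for the ABI in POSIX. You should not mix
> them for the same file descriptors.

If it worked, Debian wouldn’t need separate inn2 and inn2-lfs packages
(to pick an example relevant to Usenet that came up recently elsewhere).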

--
http://www.greenend.org.uk/rjk/

Casper H.S. Dik

Jun 11, 2015, 8:44:47 AM

It certainly works on Solaris, and has since Solaris 2.6 (18 years ago),
so it certainly can work. What happens in this case is that the offsets
are encoded in the files (or so it seems). But they also say:

"The old inn2-lfs package does not exist anymore and must be replaced by
the new functionally equivalent inn2 package, which supports large files."

The history database needs to be rebuilt when changing from
a non-LFS inn2 to an LFS-compiled inn2 version.

Casper

Richard Kettlewell

Jun 11, 2015, 9:09:25 AM

Casper H.S. Dik <Caspe...@OrSPaMcle.COM> writes:
> Richard Kettlewell <r...@greenend.org.uk> writes:
>>Casper H.S. Dik <Caspe...@OrSPaMcle.COM> writes:
>>> Richard Kettlewell <r...@greenend.org.uk> writes:
>>>>> In POSIX, generally both APIs can be used simultaneously in the
>>>>> same application.
>>>
>>>>The problem is the two ABIs, with a B, not APIs.
>>>
>>> It actually also works for the ABI in POSIX. You should not mix
>>> them for the same file descriptors.
>
>>If it worked Debian wouldn’t need separate inn2 and inn2-lfs packages
>>(to pick an example relevant to Usenet that came up recently elsewhere).
>
> It certainly works on Solaris as of Solaris 2.6 (18 years ago) and
> certainly can work but what happens in this case is that the offsets
> are encoded in the files (or so it seems) but they also say:

How does Solaris arrange that an application built with 32-bit off_t
works with a library with 64-bit off_t (and that uses off_t in its API),
or vice versa, then?

> "The old inn2-lfs package does not exist anymore and must be replaced by
> the new functionally equivalent inn2 package, which supports large files."
>
> The history database needs to be rebuilt when changing from
> a non-LFS inn2 to an LFS-compiled inn2 version.

That doesn’t mean the general problem’s gone away.

--
http://www.greenend.org.uk/rjk/

Keith Thompson

Jun 11, 2015, 11:05:43 AM

Casper H.S. Dik <Caspe...@OrSPaMcle.COM> writes:
> jacob navia <ja...@jacob.remcomp.fr> writes:
[...]
>>Updating those functions would be the easiest way out.
>
> Clearly not, because it breaks compatibility. And what type would
> you use? What type does the standard have which is at least 64 bits?

long long.

But the type used should be a typedef, similar (or identical) to POSIX's
off_t.

If backward compatibility were not an issue, changing ftell and fseek to
use off_t rather than long would be the obvious solution. Instead, I'd
advocate having ISO C adopt POSIX's fseeko(), ftello(), and off_t.

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Casper H.S. Dik

Jun 11, 2015, 11:13:23 AM

Richard Kettlewell <r...@greenend.org.uk> writes:

>How does Solaris arrange that an application built with 32-bit off_t
>works with a library with 64-bit off_t (and that uses off_t in its API),
>or vice versa, then?

You will need to provide two interfaces and do some magic
depending on whether you are using an LFS compile environment or not.

The C library is one such example; if you have an interface which
makes off_t visible, you will need to do the same.

>> "The old inn2-lfs package does not exist anymore and must be replaced by
>> the new functionally equivalent inn2 package, which supports large files."
>>
>> The history database needs to be rebuilt when changing from
>> a non-LFS inn2 to an LFS-compiled inn2 version.

>That doesn’t mean the general problem’s gone away.

The problem was solved in the middle of the '90s; it needs to
be handled in all new ABIs/APIs. In the case of inn2, it seems to
encode the off_t inside the file, so clearly it either needs to
abstract it or its database will only work with binaries compiled
the same way.
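
One way to "abstract it", sketched here as an assumption and not as a description of what inn2 actually does: store offsets on disk in a fixed 8-byte little-endian form, so the file layout no longer depends on sizeof(off_t) or on host byte order. The function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: write `offset` as 8 little-endian bytes.
 * Returns 0 on success, -1 on a short write. */
static int write_offset_le64(FILE *out, int64_t offset)
{
    unsigned char buf[8];
    uint64_t u = (uint64_t)offset;
    assert(out != NULL);
    for (int i = 0; i < 8; i++)
        buf[i] = (unsigned char)(u >> (8 * i));
    return fwrite(buf, 1, sizeof buf, out) == sizeof buf ? 0 : -1;
}

/* Hypothetical helper: read back a value written by write_offset_le64.
 * Returns 0 on success, -1 on a short read. */
static int read_offset_le64(FILE *in, int64_t *offset)
{
    unsigned char buf[8];
    uint64_t u = 0;
    assert(in != NULL && offset != NULL);
    if (fread(buf, 1, sizeof buf, in) != sizeof buf)
        return -1;
    for (int i = 0; i < 8; i++)
        u |= (uint64_t)buf[i] << (8 * i);
    *offset = (int64_t)u;
    return 0;
}
```

A binary built with a 32-bit off_t can still read such a file; it only has to reject values that do not fit its own off_t.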

Casper

Richard Kettlewell

Jun 11, 2015, 11:52:24 AM

Casper H.S. Dik <Caspe...@OrSPaMcle.COM> writes:
> Richard Kettlewell <r...@greenend.org.uk> writes:
>>How does Solaris arrange that an application built with 32-bit off_t
>>works with a library with 64-bit off_t (and that uses off_t in its API),
>>or vice versa, then?
>
> You will need to provide two interfaces and do some magic
> depending on whether you are using an LFS compile environment or not.
>
> The C library is one such example; if you have an interface which
> makes off_t visible, you will need to do the same.

That’s not solving the problem, that’s leaving it to
developers/integrators to work around.

>>> "The old inn2-lfs package does not exist anymore and must be replaced by
>>> the new functionally equivalent inn2 package, which supports large files."
>>>
>>> The history database needs to be rebuilt when changing from
>>> a non-LFS inn2 to an LFS-compiled inn2 version.
>
>>That doesn’t mean the general problem’s gone away.
>
> The problem was solved in the middle of the '90s; it needs to
> be handled in all new ABIs/APIs. In the case of inn2, it seems to
> encode the off_t inside the file, so clearly it either needs to
> abstract it or its database will only work with binaries compiled
> the same way.

The problem I’m talking about is the use of APIs that contain off_t, not
how it encodes its database.

--
http://www.greenend.org.uk/rjk/

Casper H.S. Dik

Jun 11, 2015, 11:58:05 AM

Richard Kettlewell <r...@greenend.org.uk> writes:

>That’s not solving the problem, that’s leaving it to
>developers/integrators to work around.

That is not correct; whenever you have an ABI/API which uses
off_t you make two available. Under POSIX, you'd then compile
either with $(getconf LFS_CFLAGS) or not and depending on that
you either get one or the other.

It is a bit more work for API writers but generally the same
source code can be used to provide both interfaces. I.e., it
is hardly any additional work.
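
A rough sketch of the "make two available" pattern for a made-up library. Both entry points are always built, and the header picks one by name depending on the compile environment. This is only similar in spirit to what large-file-aware C libraries do; rec_offset32, rec_offset64, rec_offset and rec_off_t are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit entry point: the single real implementation.
 * Computes the byte offset of record `index`. */
static int64_t rec_offset64(uint32_t index, uint16_t record_size)
{
    assert(record_size > 0);
    return (int64_t)index * (int64_t)record_size;
}

/* 32-bit entry point kept for callers built against the old interface:
 * same computation, but reports -1 when the result does not fit. */
static int32_t rec_offset32(uint32_t index, uint16_t record_size)
{
    int64_t wide = rec_offset64(index, record_size);
    return wide > INT32_MAX ? -1 : (int32_t)wide;
}

/* Header-side selection: callers always write rec_offset() and rec_off_t,
 * and the compile environment decides which variant they bind to. */
#if defined(_FILE_OFFSET_BITS) && _FILE_OFFSET_BITS == 64
typedef int64_t rec_off_t;
#define rec_offset rec_offset64
#else
typedef int32_t rec_off_t;
#define rec_offset rec_offset32
#endif
```

Old binaries keep resolving the 32-bit symbol, while newly compiled code can opt into the 64-bit one without any source change.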

>The problem I’m talking about is the use of APIs that contain off_t, not
>how it encodes its database.

That was not the problem why there were two inn2 versions.

Casper

Richard Kettlewell

Jun 11, 2015, 12:14:14 PM

Casper H.S. Dik <Caspe...@OrSPaMcle.COM> writes:
> Richard Kettlewell <r...@greenend.org.uk> writes:
>> That’s not solving the problem, that’s leaving it to
>> developers/integrators to work around.
>
> That is not correct; whenever you have an ABI/API which uses
> off_t you make two available. Under POSIX, you'd then compile
> either with $(getconf LFS_CFLAGS) or not and depending on that
> you either get one or the other.
>
> It is a bit more work for API writers but generally the same
> source code can be used to provide both interfaces. I.e., it
> is hardly any additional work.

If “you” is not a developer or integrator, who is it?

>>The problem I’m talking about is the use of APIs that contain off_t, not
>>how it encodes its database.
>
> That was not the problem why there were two inn2 versions.

The problem ***that I’m actually talking about*** is to do with the ABI
incompatibility.
https://sourceware.org/ml/libc-alpha/2014-03/msg00409.html mentions it.

--
http://www.greenend.org.uk/rjk/

Casper H.S. Dik

Jun 11, 2015, 12:56:08 PM

Richard Kettlewell <r...@greenend.org.uk> writes:

>> It is a bit more work for API writers but generally the same
>> source code can be used to provide both interfaces. I.e., it
>> is hardly any additional work.

>If “you” is not a developer or integrator, who is it?

I suppose I'm one but I was not involved at that time.

>>>The problem I’m talking about is the use of APIs that contain off_t, not
>>>how it encodes its database.
>>
>> That was not the problem why there were two inn2 versions.

>The problem ***that I’m actually talking about*** is to do with the ABI
>incompatibility.
>https://sourceware.org/ml/libc-alpha/2014-03/msg00409.html mentions it.

But that is *not* about APIs, it seems, but rather about
both file formats and hard coded assumptions on the size of off_t.

Casper

Richard Kettlewell

Jun 11, 2015, 1:01:07 PM

Casper H.S. Dik <Caspe...@OrSPaMcle.COM> writes:
> But that is *not* about APIs, it seems, but rather about
> both file formats and hard coded assumptions on the size of off_t.

This is too much like pulling teeth, I’m giving up.

--
http://www.greenend.org.uk/rjk/