fseek on large files

Erik de Castro Lopo

Nov 10, 2001, 12:28:03 AM
Hi all,

I'm using fseek on a platform where sizeof (long) == 4. This makes it
difficult to fseek past the 0x7FFFFFFF th byte of a file larger than
2 Gig.
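
To make the limit concrete: standard fseek() takes a long, so on a platform where long is 32 bits no offset above LONG_MAX (0x7FFFFFFF) can even be passed in. The following is a minimal sketch in standard C of a guard that refuses such offsets instead of letting them be truncated silently. The name seek_checked is hypothetical and not part of any library.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: accept a wide offset, but fail cleanly with
 * ERANGE when it cannot be represented as a long, which is the only
 * offset type that standard fseek() accepts. */
int seek_checked(FILE *fp, long long offset, int whence)
{
    assert(fp != NULL);
    if (offset > (long long)LONG_MAX || offset < (long long)LONG_MIN) {
        errno = ERANGE;
        return -1;
    }
    return fseek(fp, (long)offset, whence);
}
```

On a system with a 64-bit long the range check never fires, which is exactly the difference in behaviour between platforms that the question is about.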

I realise that fsetpos and fgetpos are designed to get around this but
they have their own limitations.

To make matters worse, OpenBSD and some of the other *BSDs have redefined
fseek as:

int fseek( FILE *stream, off_t offset, int whence);

and made off_t a 64 bit value even on machines where long is 32 bits.

The code I'm working on works on Unix, Win32, MacOS, BeOS.

Questions :

1) Is there a portable replacement for fseek?
2) How do people get around this mess?

TIA,
Erik
--
+-----------------------------------------------------------+
Erik de Castro Lopo nos...@mega-nerd.com (Yes it's valid)
+-----------------------------------------------------------+
"It's far too easy to make fun of Microsoft products, but it takes a
real man to make them work, and a god to make them do anything useful"
-- Anonymous

Jinho You

Nov 10, 2001, 12:51:44 AM

Erik de Castro Lopo wrote:
>
> Hi all,
>
> I'm using fseek on a platform where sizeof (long) == 4. This makes it
> difficult to fseek past the 0x7FFFFFFF th byte of a file larger than
> 2 Gig.
>
> I release that fsetpos and fgetpos are designed to get around this but
> they have their own limitations.
>
> To make matters worse, OpenBSD and some of the other *BSDs have redfined
> fseek as:
>
> int fseek( FILE *stream, off_t offset, int whence);
>
> and made off_t a 64 bit value even on machines where long is 32 bits.
>
> The code I'm working on works on Unix, Win32, MacOS, BeOS.
>
> Questions :
>
> 1) Is there a portable replacement for fseek?
> 2) How do people get around this mess?

You can use open64() & lseek64() in *glibc*.

Rich Teer

Nov 10, 2001, 12:56:43 AM
On Sat, 10 Nov 2001, Erik de Castro Lopo wrote:

> To make matters worse, OpenBSD and some of the other *BSDs have redfined
> fseek as:
>
> int fseek( FILE *stream, off_t offset, int whence);

That is the CORRECT definition of fseek. Your code is
broken if you're assuming that an off_t is the same size
as a long. The derived types were designed to get round
exactly this sort of problem.

You may need to compile your program (if your platform
supports it) to be large file aware.

> 1) Is there a portable replacement for fseek?

fseek IS portable.

> 2) How do people get around this mess?

They either upgrade to a 64-bit environment, or use a
large file compilation environment. The availability
of either of these varies on what platform you're running
on.

--
Rich Teer

President,
Rite Online Inc.

Voice: +1 (250) 979-1638
URL: http://www.rite-online.net

Erik de Castro Lopo

Nov 10, 2001, 1:19:28 AM

I don't think I'll find glibc on Solaris, HPUX, AIX, Win32, MacOS and BeOS.
At least one and probably most of these systems won't have it.

On top of that there is the little problem that if these are anything like
open/lseek/read etc, then the read64 version can return a short read when the
process receives a signal even if there is more data.

Erik
--
+-----------------------------------------------------------+
Erik de Castro Lopo nos...@mega-nerd.com (Yes it's valid)
+-----------------------------------------------------------+

Q: How do you stop a Windows NT machine from crashing?
A: Shut it down and switch it off.

Erik de Castro Lopo

Nov 10, 2001, 1:26:24 AM
Rich Teer wrote:
>
> On Sat, 10 Nov 2001, Erik de Castro Lopo wrote:
>
> > To make matters worse, OpenBSD and some of the other *BSDs have redfined
> > fseek as:
> >
> > int fseek( FILE *stream, off_t offset, int whence);
>
> That is the CORRECT definition of fseek.

On many systems I find offset defined as a long. Is this incorrect? When did it change?

> You're code is
> broken if you're assuming that an off_t is the same size
> as a long.

I'm not assuming that. In fact, my wish is that off_t is 64 bits on all
systems, even ones with 32 bit longs.

> The derived types were designed to get round
> exactly this sort of problem.
>
> You may need to compile your program (if your platform
> supports it) to be large file aware.
>
> > 1) Is there a portable replacement for fseek?
>
> fseek IS portable.

Portable maybe, but with different capabilities on different platforms. If off_t
is 64 bits on some platforms and 32 bits on other platforms fseek is not fully
portable with respect to files greater than 2 Gig in length.

> > 2) How do people get around this mess?
>
> They either upgrade to a 64-bit environment, or use a
> large file compilation environment. The availability
> of either of these varies on what platform you're running
> on.

Details?

Erik
--
+-----------------------------------------------------------+
Erik de Castro Lopo nos...@mega-nerd.com (Yes it's valid)
+-----------------------------------------------------------+

The mouse has been moved. Windows must be restarted for the change
to take effect. Reboot now?

Peter Nilsson

Nov 10, 2001, 1:43:50 AM
Erik de Castro Lopo wrote in message <3BECC87E...@mega-nerd.net>...

>Rich Teer wrote:
>>
>> On Sat, 10 Nov 2001, Erik de Castro Lopo wrote:
>>
>> > To make matters worse, OpenBSD and some of the other *BSDs have redfined
>> > fseek as:
>> >
>> > int fseek( FILE *stream, off_t offset, int whence);
>>
>> That is the CORRECT definition of fseek.
>
>On many systems I find offset defined as a long. Is this incorrect? When did it change?

C99 draft:

7.13.9.2 The fseek function

Synopsis
#include <stdio.h>
int fseek(FILE *stream, long int offset, int whence);

K&R2 as well.

--
Peter


Villy Kruse

Nov 10, 2001, 5:21:16 AM


Is there anything in the standard about fseeko and/or fseeko64, or is
that a glibc2 extension on linux? fseeko takes off_t instead of long
for the offset argument.


Villy

Andrew Gierth

Nov 10, 2001, 5:20:52 AM
>>>>> "Rich" == Rich Teer <ri...@rite-group.com> writes:

>> int fseek( FILE *stream, off_t offset, int whence);

Rich> That is the CORRECT definition of fseek.

Actually it's not. fseek() comes from the C standard (as I'm sure the
c.l.c crowd will point out at some stage) and I don't believe the C
standard even _has_ an off_t.

The SUSv2 (and v3 draft 7) defines fseek() as taking an offset of type
'long', and an additional function fseeko() taking an offset of type
'off_t'.
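
As an illustration of the fseeko()/ftello() pair just mentioned, here is a minimal sketch that measures a stream's length with off_t arithmetic only. It assumes a POSIX system. The two feature-test macros at the top are the usual way to request a 64-bit off_t and expose the declarations on glibc-style systems, and other platforms may need different settings. The name stream_length is hypothetical.

```c
#define _FILE_OFFSET_BITS 64      /* request a 64-bit off_t where it is optional */
#define _POSIX_C_SOURCE 200809L   /* expose fseeko() and ftello() */
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: return the length of the stream in bytes, or
 * (off_t)-1 on error. The current position is restored before return. */
off_t stream_length(FILE *fp)
{
    off_t saved, end;

    assert(fp != NULL);
    saved = ftello(fp);
    if (saved == (off_t)-1)
        return (off_t)-1;
    if (fseeko(fp, 0, SEEK_END) != 0)
        return (off_t)-1;
    end = ftello(fp);
    if (fseeko(fp, saved, SEEK_SET) != 0)
        return (off_t)-1;
    return end;
}
```

Because ftello() returns an off_t rather than a long, the result is not capped at LONG_MAX when off_t is wider than long.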

--
Andrew.

comp.unix.programmer FAQ: see <URL: http://www.erlenstar.demon.co.uk/unix/>
or <URL: http://www.whitefang.com/unix/>

Andrew Gierth

Nov 10, 2001, 5:24:20 AM
>>>>> "Erik" == Erik de Castro Lopo <nos...@mega-nerd.net> writes:

Erik> To make matters worse, OpenBSD and some of the other *BSDs have
Erik> redfined fseek as:

Erik> int fseek( FILE *stream, off_t offset, int whence);

Which other *BSDs have done this? On FreeBSD-stable the definition of
fseek() is the standard one (using type 'long'), and an additional
function fseeko() exists which takes an offset of type off_t (as
specified by the relevant Unix standards).

Erik de Castro Lopo

Nov 10, 2001, 7:02:18 AM
Andrew Gierth wrote:
>
> >>>>> "Erik" == Erik de Castro Lopo <nos...@mega-nerd.net> writes:
>
> Erik> To make matters worse, OpenBSD and some of the other *BSDs have
> Erik> redfined fseek as:
>
> Erik> int fseek( FILE *stream, off_t offset, int whence);
>
> Which other *BSDs have done this? On FreeBSD-stable the definition of
> fseek() is the standard one (using type 'long'), and an additional
> function fseeko() exists which takes an offset of type off_t (as
> specified by the relevent Unix standards).

I may be wrong about the others but OpenBSD does have this.


Erik
--
+-----------------------------------------------------------+
Erik de Castro Lopo nos...@mega-nerd.com (Yes it's valid)
+-----------------------------------------------------------+

This is Linux country. On a quiet night, you can hear NT re-boot.

CBFalconer

Nov 10, 2001, 8:25:23 AM
Rich Teer wrote:
>
> On Sat, 10 Nov 2001, Erik de Castro Lopo wrote:
>
> > To make matters worse, OpenBSD and some of the other *BSDs have redfined
> > fseek as:
> >
> > int fseek( FILE *stream, off_t offset, int whence);
>
> That is the CORRECT definition of fseek. You're code is
> broken if you're assuming that an off_t is the same size
> as a long. The derived types were designed to get round
> exactly this sort of problem.

Oh? From N869:

==========

7.19.9.2 The fseek function

Synopsis

[#1]

#include <stdio.h>
int fseek(FILE *stream, long int offset, int whence);

Description

[#2] The fseek function sets the file position indicator for
the stream pointed to by stream. If a read or write error
occurs, the error indicator for the stream is set and fseek
fails.

==============

(or is this another difference between N869 and the final std?)

--
Chuck F (cbfal...@yahoo.com) (cbfal...@XXXXworldnet.att.net)
Available for consulting/temporary embedded and systems.
(Remove "XXXX" from reply address. yahoo works unmodified)
mailto:u...@ftc.gov (for spambots to harvest)

Richard Heathfield

Nov 10, 2001, 8:50:42 AM
CBFalconer wrote:
>
> Rich Teer wrote:
> >
> > On Sat, 10 Nov 2001, Erik de Castro Lopo wrote:
> >
> > > To make matters worse, OpenBSD and some of the other *BSDs have redfined
> > > fseek as:
> > >
> > > int fseek( FILE *stream, off_t offset, int whence);
> >
> > That is the CORRECT definition of fseek. You're code is
> > broken if you're assuming that an off_t is the same size
> > as a long. The derived types were designed to get round
> > exactly this sort of problem.
>
> Oh? From N869:
>
> ==========
>
> 7.19.9.2 The fseek function
>
> Synopsis
>
> [#1]
>
> #include <stdio.h>
> int fseek(FILE *stream, long int offset, int whence);
>
> Description
>
> [#2] The fseek function sets the file position indicator for
> the stream pointed to by stream. If a read or write error
> occurs, the error indicator for the stream is set and fseek
> fails.
>
> ==============
>
> (or is this another difference between N869 and the final std?)

No, your draft hasn't let you down this time. :-)

Here's the final Standard text:

7.19.9.2 The fseek function

Synopsis

1 #include <stdio.h>
  int fseek(FILE *stream, long int offset, int whence);

Description

2 The fseek function sets the file position indicator for the stream
  pointed to by stream. If a read or write error occurs, the error
  indicator for the stream is set and fseek fails.

Looks identical to me, especially wrt the offset arg.


--
Richard Heathfield : bin...@eton.powernet.co.uk
"Usenet is a strange place." - Dennis M Ritchie, 29 July 1999.
C FAQ: http://www.eskimo.com/~scs/C-faq/top.html
K&R answers, C books, etc: http://users.powernet.co.uk/eton

Mark McIntyre

Nov 10, 2001, 9:02:49 AM
On Sat, 10 Nov 2001 13:25:23 GMT, CBFalconer <cbfal...@yahoo.com>
wrote:

>Rich Teer wrote:
>>
>> On Sat, 10 Nov 2001, Erik de Castro Lopo wrote:
>>
>> > To make matters worse, OpenBSD and some of the other *BSDs have redfined
>> > fseek as:
>> >
>> > int fseek( FILE *stream, off_t offset, int whence);
>>
>> That is the CORRECT definition of fseek. You're code is
>> broken if you're assuming that an off_t is the same size
>> as a long. The derived types were designed to get round
>> exactly this sort of problem.
>
>Oh? From N869:
>

>7.19.9.2 The fseek function


> int fseek(FILE *stream, long int offset, int whence);

That's the same as in official C99. You could say that BSD is "broken",
but I suspect that off_t is typedef'ed to a long.


--
Mark McIntyre
CLC FAQ <http://www.eskimo.com/~scs/C-faq/top.html>

Lawrence Kirby

Nov 10, 2001, 10:01:43 AM
In article <Pine.GSO.4.33.0111092152430.1844-100000@grover>
ri...@rite-group.com "Rich Teer" writes:

>On Sat, 10 Nov 2001, Erik de Castro Lopo wrote:
>
>> To make matters worse, OpenBSD and some of the other *BSDs have redfined
>> fseek as:
>>
>> int fseek( FILE *stream, off_t offset, int whence);
>
>That is the CORRECT definition of fseek.

Both C and POSIX require the 2nd argument to fseek() to have type long.
The form above is non-standard.

> You're code is
>broken if you're assuming that an off_t is the same size
>as a long. The derived types were designed to get round
>exactly this sort of problem.

Your code is broken if you assume that the 2nd argument to fseek() is
anything other than long.

>You may need to compile your program (if your platform
>supports it) to be large file aware.
>
>> 1) Is there a portable replacement for fseek?
>
>fseek IS portable.

Not to > 31 bit file offsets.

>> 2) How do people get around this mess?
>
>They either upgrade to a 64-bit environment, or use a
>large file compilation environment. The availability
>of either of these varies on what platform you're running
>on.

This won't change the fact that fseek() is limited to a 32 bit signed value
on systems where long is a 32 bit signed type. To make use of > 32 bit
offsets on systems that support them you have to resort to system-specific
extensions.

--
-----------------------------------------
Lawrence Kirby | fr...@genesis.demon.co.uk
Wilts, England | 7073...@compuserve.com
-----------------------------------------

Rudolf Polzer

Nov 10, 2001, 11:12:43 AM
Lawrence Kirby <fr...@genesis.demon.co.uk> wrote:
[...]
(fseek with long)

> >> 2) How do people get around this mess?
> >
> >They either upgrade to a 64-bit environment, or use a
> >large file compilation environment. The availability
> >of either of these varies on what platform you're running
> >on.
>
> This won't change the fact that fseek() is limited to a 32 bit signed value
> on systems where long is a 32 bit signed type. To make use of > 32bit
> offets on systems that support then you have to resort of system-specific
> extensions.

Really? There's the workaround with calling fseek() multiple times
using SEEK_CUR. But I do _not_ know a workaround for ftell() :(
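
A minimal sketch of that repeated-SEEK_CUR idea, in standard C and intended for a binary stream. Whether a given implementation's fseek() can actually carry the position past LONG_MAX is not guaranteed by the C standard, so this is a workaround to try rather than something to rely on. The name seek_in_steps is hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: reach an absolute position by rewinding and then
 * stepping forward in chunks that each fit in a long.
 * Returns 0 on success, -1 if any fseek() call fails. */
int seek_in_steps(FILE *fp, unsigned long long target)
{
    assert(fp != NULL);
    if (fseek(fp, 0L, SEEK_SET) != 0)
        return -1;
    while (target > 0) {
        long step = (target > (unsigned long long)LONG_MAX)
                        ? LONG_MAX
                        : (long)target;
        if (fseek(fp, step, SEEK_CUR) != 0)
            return -1;
        target -= (unsigned long long)step;
    }
    return 0;
}
```

Nothing comparable exists for ftell(), since its return type is also long and a position above LONG_MAX cannot be reported through it.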

--
#!/usr/bin/perl -W -- WARNING: This copies a random file from
use strict;my$s;my$n=0;for # the current directory to your
(<*>){++$n;int rand$n or$s # signature file. Use at your
=$_};`cp $s ~/.signature`; # own risk! (c) 2001 Rudolf Polzer

Rich Teer

Nov 10, 2001, 1:41:44 PM
On Sat, 10 Nov 2001, Erik de Castro Lopo wrote:

> On many systems I find offset defined as a long. Is this incorrect? When did it change?

Oops, my bad. That's what I get for reading man pages
too late!

> Details?

Since Solaris 7, Solaris has supported a 64-bit compilation
environment. (How you invoke it is compiler dependent.)
I think a lot of other Unices are still only 32-bit, in
which case you might be able to use the transitional
interfaces (fseeko64, etc.).

Erik Max Francis

Nov 10, 2001, 1:51:32 PM
CBFalconer wrote:

> #include <stdio.h>
> int fseek(FILE *stream, long int offset, int whence);

...


> (or is this another difference between N869 and the final std?)

Nope, it's the same prototype (that is, the offset argument being a
long) as in C89 and C99.

--
Erik Max Francis / m...@alcyone.com / http://www.alcyone.com/max/
__ San Jose, CA, US / 37 20 N 121 53 W / ICQ16063900 / &tSftDotIotE
/ \ Laws are silent in time of war.
\__/ Cicero
Esperanto reference / http://www.alcyone.com/max/lang/esperanto/
An Esperanto reference for English speakers.

Tor Rustad

Nov 10, 2001, 1:21:30 PM
"Jinho You" <jh...@chonnam.chonnam.ac.kr> wrote in message

However, that isn't a portable replacement.

--
Tor <torust AT online DOT no>

Tor Rustad

Nov 10, 2001, 1:20:57 PM
"Rich Teer" <ri...@rite-group.com> wrote in message

> On Sat, 10 Nov 2001, Erik de Castro Lopo wrote:
>
> > To make matters worse, OpenBSD and some of the other *BSDs have redfined
> > fseek as:
> >
> > int fseek( FILE *stream, off_t offset, int whence);
>
> That is the CORRECT definition of fseek.

Wrong. The C89/C99 interface is

int fseek(FILE *stream, long int offset, int whence);

Type off_t isn't even defined in C.

Tor Rustad

Nov 10, 2001, 2:20:21 PM
"Villy Kruse" <v...@pharmnl.ohout.pharmapartners.nl> wrote in message

<snip>



> Is there anything in the standard about fseeko and/or fseeko64, or is
> that a glibc2 extension on linux? fseeko takes off_t instead of long
> for the offset argument.

fseeko and fseeko64 aren't provided by the standard C library.

Erik de Castro Lopo

Nov 10, 2001, 8:42:11 PM
Hi all,

In my original post on this matter I stated :

> To make matters worse, OpenBSD and some of the other *BSDs have redfined
> fseek as:
>
> int fseek( FILE *stream, off_t offset, int whence);

This is not actually correct. I now know that OpenBSD defines the offset
parameter as being of type long as per the ISO C standard. I doubt any of
the other BSD flavours do anything different.

However, because the ISO C standard defines fseek () as follows:

int fseek( FILE *stream, long offset, int whence);

this function will behave differently on platforms with 32 and 64 bit longs
on files longer than 2 gigabytes.

Erik
--
+-----------------------------------------------------------+
Erik de Castro Lopo nos...@mega-nerd.com (Yes it's valid)
+-----------------------------------------------------------+

"If you need a piece of paper and a pen to explain it,
then its not bleedin' obvious" -- Erik's First Law

Casper H.S. Dik - Network Security Engineer

Nov 10, 2001, 7:44:55 AM
[[ PLEASE DON'T SEND ME EMAIL COPIES OF POSTINGS ]]

Erik de Castro Lopo <nos...@mega-nerd.net> writes:

>To make matters worse, OpenBSD and some of the other *BSDs have redfined
>fseek as:

> int fseek( FILE *stream, off_t offset, int whence);

Systems that have this definition of fseek are not standards
compliant.

The relevant unix standards have defined

int fseeko( FILE *stream, off_t offset, int whence);

off_t ftello(FILE *stream);


It's true that the standard definitions of fseek/ftell are broken, but
you can't just change them.

Various 32 bit Unixes or 32 bit compilation environments also offer
a choice of 32 bit off_t or 64 bit off_t, so you would need to
specify large file compilation flags which can be retrieved
using getconf on Unix98 compliant systems.
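
One way to catch a build that forgot those flags is a compile-time check on the width of off_t. The sketch below assumes a POSIX system. The _FILE_OFFSET_BITS define stands in for what the large file flags typically set on 32-bit systems, and the names off_t_must_be_64_bits and off_t_bits are hypothetical.

```c
/* Typically supplied by the build instead of the source, for example:
 *     cc $(getconf LFS_CFLAGS) prog.c $(getconf LFS_LDFLAGS) $(getconf LFS_LIBS)
 * It is defined here only so that the sketch stands alone. */
#define _FILE_OFFSET_BITS 64
#include <assert.h>
#include <limits.h>
#include <sys/types.h>

/* C89-style compile-time check: the array size becomes -1, and the
 * build fails, if off_t is narrower than 64 bits. */
typedef char off_t_must_be_64_bits[(sizeof(off_t) * CHAR_BIT >= 64) ? 1 : -1];

/* Hypothetical helper: report the width of off_t in this build. */
int off_t_bits(void)
{
    int bits = (int)(sizeof(off_t) * CHAR_BIT);

    assert(bits >= 64);   /* run-time twin of the check above */
    return bits;
}
```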

Casper
--
Expressed in this posting are my opinions. They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.

Casper H.S. Dik - Network Security Engineer

Nov 10, 2001, 9:23:27 AM
[[ PLEASE DON'T SEND ME EMAIL COPIES OF POSTINGS ]]

Mark McIntyre <ma...@garthorn.demon.co.uk> writes:

Actually, OpenBSD has the proper definition judging by its on-line
manual pages:


int
fseek(FILE *stream, long offset, int whence);

int
fseeko(FILE *stream, off_t offset, int whence);

http://www.openbsd.org/cgi-bin/man.cgi?query=fseek&apropos=0&sektion=0&manpath=OpenBSD+Current&arch=i386&format=html


(off_t is 64 bits in the 4.4BSD derived OS)

Lawrence Kirby

Nov 11, 2001, 8:35:57 AM
In article <slrn9uqkfr.c...@www42.durchnull.de>
zweiund...@durchnull.de "Rudolf Polzer" writes:

>Lawrence Kirby <fr...@genesis.demon.co.uk> wrote:
>[...]
>(fseek with long)
>> >> 2) How do people get around this mess?
>> >
>> >They either upgrade to a 64-bit environment, or use a
>> >large file compilation environment. The availability
>> >of either of these varies on what platform you're running
>> >on.
>>
>> This won't change the fact that fseek() is limited to a 32 bit signed value
>> on systems where long is a 32 bit signed type. To make use of > 32bit
>> offets on systems that support then you have to resort of system-specific
>> extensions.
>
>Really? There's the workaround with calling fseek() multiple times
>using SEEK_CUR. But I do _not_ know a workaround for ftell() :(

That might work, but it is quite possible that implementations of fseek()
simply cannot seek past LONG_MAX bytes even where the underlying filesystem
supports it. And of course if the file is a lot bigger than LONG_MAX then
this approach could be on the slow side.

Mark McIntyre

Nov 11, 2001, 10:17:15 AM
On 10 Nov 2001 14:23:27 GMT, Caspe...@Holland.Sun.Com (Casper H.S.
Dik - Network Security Engineer) wrote:

>[[ PLEASE DON'T SEND ME EMAIL COPIES OF POSTINGS ]]

I hope this wasn't aimed at me, because I didn't. Did you mistakenly
put that comment in?

>Mark McIntyre <ma...@garthorn.demon.co.uk> writes:
>
>>>7.19.9.2 The fseek function
>>> int fseek(FILE *stream, long int offset, int whence);
>
>>thats the same as in official C99. You could say that BSD is "broken"
>>but I suspect that off_t is typedef'ed to a long.
>
>Actually, OpenBSD has the proper definition judging by its on-line
>manual pages:

Glad to hear it.

Logan Shaw

Nov 11, 2001, 5:13:57 PM
In article <ig5tutsr7cibndt68...@4ax.com>,
Mark McIntyre <ma...@garthorn.demon.co.uk> wrote:
>On 10 Nov 2001 14:23:27 GMT, Caspe...@Holland.Sun.Com (Casper H.S.
>Dik - Network Security Engineer) wrote:
>
>>[[ PLEASE DON'T SEND ME EMAIL COPIES OF POSTINGS ]]
>
>I hope this wasn't aimed at me, because i didn't. Did you mistakenly
>put that comment in?

Casper has been putting that comment into every post for years. It's
not directed specifically at you.

- Logan
--
"In order to be prepared to hope in what does not deceive,
we must first lose hope in everything that deceives."

Georges Bernanos

Mark McIntyre

Nov 11, 2001, 6:07:50 PM
On 11 Nov 2001 16:13:57 -0600, lo...@cs.utexas.edu (Logan Shaw) wrote:

>In article <ig5tutsr7cibndt68...@4ax.com>,
>Mark McIntyre <ma...@garthorn.demon.co.uk> wrote:
>>On 10 Nov 2001 14:23:27 GMT, Caspe...@Holland.Sun.Com (Casper H.S.
>>Dik - Network Security Engineer) wrote:
>>
>>>[[ PLEASE DON'T SEND ME EMAIL COPIES OF POSTINGS ]]
>>
>>I hope this wasn't aimed at me, because i didn't. Did you mistakenly
>>put that comment in?
>
>Casper has been putting that comment into every post for years. It's
>not directed specifically at you.

Ah, probably means he's never posted to comp.lang.c before. Did he try
putting it in his sig instead? The usual place for such stuff?

Bill Godfrey

Nov 12, 2001, 6:20:46 AM
Erik de Castro Lopo <nos...@mega-nerd.net> writes:

> I release that fsetpos and fgetpos are designed to get around this but
> they have their own limitations.

Err, in what way does fsetpos/fgetpos fail to meet your needs?

Speaking of which, the unixoid I'm using seems to miss the point, and
supplies fsetpos and fsetpos64.

Oh well.

Bill, a file that big?

Lawrence Kirby

Nov 12, 2001, 11:14:24 AM
In article <i2pwv0w...@cvhf434.gpt.co.uk>
bi...@bacchae.f9.co.uk "Bill Godfrey" writes:

>Erik de Castro Lopo <nos...@mega-nerd.net> writes:
>
>> I release that fsetpos and fgetpos are designed to get around this but
>> they have their own limitations.
>
>Err, in what way does fsetpos/fgetpos fail to meet your needs?

You cannot portably seek to calculated file positions using fsetpos().
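
To illustrate the distinction: an fpos_t is an opaque bookmark, so it can only restore a position previously captured by fgetpos(). It cannot be built from a computed byte count. A minimal sketch in standard C of what the pair is good for, with the hypothetical name peek_byte:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read the next byte without consuming it, by
 * bookmarking the position with fgetpos() and returning to it with
 * fsetpos(). Returns the byte, or EOF at end of file or on error. */
int peek_byte(FILE *fp)
{
    fpos_t mark;
    int c;

    assert(fp != NULL);
    if (fgetpos(fp, &mark) != 0)
        return EOF;
    c = fgetc(fp);
    if (fsetpos(fp, &mark) != 0)
        return EOF;
    return c;
}
```

This works at any position the implementation can represent in an fpos_t, but there is no portable way to say "go to byte N" with it.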

>Speaking of which, the unixoid I'm using seems to miss the point, and
>supplies fsetpos and fsetpos64.

If plain fsetpos() can't cope with 64 bit offsets on platforms where the
underlying system supports them, there is something very wrong.

Norman Black

Nov 12, 2001, 4:23:52 PM
You could switch to using open, open64, close and lseek/lseek64.


The "64" APIs are officially stated as transitional APIs until full 64-bit file
APIs are the norm. In this way the 32-bit and 64-bit can coexist. They seem to
be very portable. The "transition" period will likely exist as long as a
reasonable number of people are running 32-bit operating systems, ... a very
long time.

I personally use open(... O_LARGEFILE) and lseek64. My code runs on Linux and
Solaris.

--
Norman Black
Stony Brook Software
nospam => stonybrk


Andrew Gierth

Nov 12, 2001, 6:20:38 PM
>>>>> "Norman" == Norman Black <nos...@ix.netcom.com> writes:

Norman> You could switch to using open, open64, close and
Norman> lseek/lseek64.

Explicitly using the -64 calls is setting yourself up for trouble if
you ever end up porting to a system that has clean largefile support
(which means any 4.4BSD derivative or any LP-64 system regardless of
origin) rather than the grody LFS backward-compatibility kludges.

The rationale for those functions is simple: if you have existing
object code or executables that assume that off_t is 32 bits, then you
can't just go in and change the definitions of lseek() etc. Moreover,
if you want existing programs to behave safely and (reasonably)
consistently, you want to ensure that they don't get confused by
offset values supplied by other (new) programs. So, the different
function names are used to ensure that existing code links with the
right versions, and that new code can link with the new versions.

Note that none of this is an issue if you don't care about binary
compatibility. It's also not an issue if off_t was _already_ 64 bits,
as it is on the *BSDs or on systems where long is also 64 bits.

The right way to use the -64 calls is not to use them at all, but to
utter the appropriate incantations to the compiler to get off_t defined
as 64 bits and all the new functions used by default. That works both
on LFS-kludged systems and systems where off_t was 64 bits all along,
with no difference other than the compile options.
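
A minimal sketch of that approach, assuming a POSIX system. The source uses only the plain names off_t and lseek(). The _FILE_OFFSET_BITS define shown here is one common incantation on LFS-style systems and would more usually come from the compiler flags, and the name seek_absolute is hypothetical.

```c
#define _FILE_OFFSET_BITS 64      /* usually passed as a compiler flag instead */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: position fd at an absolute byte offset using the
 * plain lseek() name. Returns the new offset, or -1 on failure. */
long long seek_absolute(int fd, long long pos)
{
    off_t result;

    assert(fd >= 0);
    /* Reject negative offsets and offsets that off_t cannot hold. */
    if (pos < 0 || (long long)(off_t)pos != pos)
        return -1;
    result = lseek(fd, (off_t)pos, SEEK_SET);
    if (result == (off_t)-1) {
        perror("lseek");
        return -1;
    }
    return (long long)result;
}
```

On a system where off_t was 64 bits all along, the define is simply redundant and the same source compiles unchanged.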

Norman> The "64" APIs are officially stated as transitional APIs
Norman> until full 64-bit file APIs are the norm. In this way the
Norman> 32-bit and 64-bit can coexist. They seem to be very
Norman> portable.

$ fgrep open64 /usr/include/*
$

you were saying?

Norman> The "transition" period will likely exists as long as a
Norman> reasonable number of people are running 32-bit operating
Norman> systems, ... a very long time.

no, it'll last as long as people are running binaries compiled on
systems with 32-bit off_t. Even then, the -64 APIs aren't necessarily
going to last that long - there are other ways to ensure that old
binaries get the right system calls.

Norman Black

Nov 13, 2001, 4:19:48 PM
My mistake for trying open64. As my message said I personally use open.

--
Norman Black
Stony Brook Software
nospam => stonybrk

"Andrew Gierth" <and...@erlenstar.demon.co.uk> wrote in message
news:87snbjk...@erlenstar.demon.co.uk...

Andrew Gierth

Nov 13, 2001, 6:10:03 PM
>>>>> "Norman" == Norman Black <nos...@ix.netcom.com> writes:

Norman> My mistake for trying open64. As my message said I personally
Norman> use open.

You mentioned you were using O_LARGEFILE and lseek64, both of which are
part of the LFS compatibility kludge rather than things to use in real
programs.

Norman Black

Nov 14, 2001, 3:47:55 PM
> You mentioned you were using O_LARGEFILES and lseek64, both of which are
> part of the LFS compatibility kludge rather than things to use in real
> programs.

1. I do not use C. I use Modula-2.
2. Because of that I call lseek64 because that *is* the real and true name of
the implemented procedure in libc on Linux and Solaris. Those libc.so files have
an lseek and lseek64 and lseek64 is the only one that takes large offsets. While
going through the system C header files I did notice that if the appropriate
defines were in place then lseek was aliased to lseek64.
3. I could call my Modula-2 procedure definition lseek instead of lseek64 and
then change the public symbol name to what is appropriate (currently lseek64)
but then that does not really change anything now does it.

BTW. open64 is the real public name for the "large" open. I just checked the
Solaris libc.so.

I am not sure I would call the "64 transition" extensions a kludge. They seem
like a very reasonable solution to a real problem.

--
Norman Black
Stony Brook Software
nospam => stonybrk

"Andrew Gierth" <and...@erlenstar.demon.co.uk> wrote in message

news:87heryf...@erlenstar.demon.co.uk...

Andrew Gierth

Nov 14, 2001, 6:19:00 PM
>>>>> "Norman" == Norman Black <nos...@ix.netcom.com> writes:

>> You mentioned you were using O_LARGEFILES and lseek64, both of
>> which are part of the LFS compatibility kludge rather than things
>> to use in real programs.

Norman> 1. I do not use C. I use Modula-2.

I guess we'd better remove comp.lang.c from followups, then :-)

Norman> 2. Because of that I call lseek64 because that *is* the real
Norman> and true name of the implemented procedure in libc on Linux
Norman> and Solaris.

it's the name of the visible symbol in libc, so for some purposes it's
the "real" name. But it's defined in no standard, it's not required to
exist, it _won't_ exist on any system that doesn't need legacy support
for 32-bit off_t, it's there purely so that existing executables,
objects and libraries that refer to "lseek" don't get the new
function unexpectedly. The name of the routine intended to be used in
source code is still "lseek", and that's the only name for it defined
in any standard.

Norman> Those libc.so files have an lseek and lseek64 and lseek64 is
Norman> the only one that takes large offsets. While going through
Norman> the system C header files I did notice that if the
Norman> appropriate defines were in place then lseek was aliased to
Norman> lseek64. 3. I could call my Modula-2 procedure definition
Norman> lseek instead of lseek64 and then change the public symbol
Norman> name to what is appropriate (currently lseek64) but then that
Norman> does not really change anything now does it.

It means that porting your code to a system lacking lseek64 (where the
plain "lseek" takes 64-bit offsets) becomes a matter of changing one
definition rather than every use of the function.
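
The same "one definition" idea can be sketched in C for the stdio case from the start of the thread: the program calls a single pair of wrappers and only their bodies differ per platform. This assumes fseeko()/ftello() on POSIX systems and _fseeki64()/_ftelli64() from the Microsoft C runtime on Win32. Other platforms would need their own branch. The names big_off_t, big_seek and big_tell are hypothetical.

```c
#if !defined(_WIN32)
#define _FILE_OFFSET_BITS 64      /* ask for a 64-bit off_t where optional */
#define _POSIX_C_SOURCE 200809L   /* expose fseeko() and ftello() */
#endif

#include <assert.h>
#include <stdio.h>

#if defined(_WIN32)
typedef __int64 big_off_t;        /* Microsoft C runtime */
#else
#include <sys/types.h>
typedef off_t big_off_t;          /* POSIX */
#endif

/* Hypothetical wrappers: the rest of the program calls only these. */
int big_seek(FILE *fp, big_off_t offset, int whence)
{
    assert(fp != NULL);
#if defined(_WIN32)
    return _fseeki64(fp, offset, whence);
#else
    return fseeko(fp, offset, whence);
#endif
}

big_off_t big_tell(FILE *fp)
{
    assert(fp != NULL);
#if defined(_WIN32)
    return _ftelli64(fp);
#else
    return ftello(fp);
#endif
}
```

Porting to a new system then means adding a branch in these two functions, not touching every call site.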

Norman> BTW. open64 is the real public name for the "large" open. I
Norman> just checked the Solaris libc.so.

Same arguments apply as for lseek64.

Norman> I am not sure I would call the "64 transition" extensions a
Norman> kludge. They seem like a very reasonable solution to a real
Norman> problem.

A real problem with, by nature, no possible clean solution other than
completely breaking backward compatibility. Any fix for it would have
been a kludge. But surely, building dependencies on one specific
solution into your own applications (when you don't need to) is just
asking for further trouble in the future.

Norman Black

Nov 15, 2001, 5:41:09 PM
> Norman> 1. I do not use C. I use Modula-2.
>
> I guess we'd better remove comp.lang.c from followups, then :-)

Oops, I had no idea a C newsgroup was in the posting list. If I had noticed this
I would have deleted the reference.

I mentioned I do not use C, but I am not sure I mentioned that I am the
developer of the Modula-2 compiler I use.

> it's the name of the visible symbol in libc, so for some purposes it's
> the "real" name. But it's defined in no standard, it's not required to
> exist

All Unix standards I know of (POSIX, Unix98 and such) are source level only. They
also only state what is guaranteed to be there and by what name, not what is
actually there on a specific implementation (Linux vs SunOS for example). This
obviously means nothing to someone not using C. For example, these standards say
nothing about the actual implementation name. Any implementation could have the
public symbol name of lseek actually be GetWhatThisDoes or anything else but
lseek and still be "standard". What actually exists is what anyone not using the
system header files has to worry about. The system header files handle any
necessary jiggering of the names, usually with defines. I then translate their
jiggering to my Modula-2 interfaces. Obviously C defines mean nothing to
Modula-2 or any non-C language.

> It means that porting your code to a system lacking lseek64 (where the
> plain "lseek" takes 64-bit offsets) becomes a matter of changing one
> definition rather than every use of the function.

1. I could name the procedure "doda" and still have the public symbol name as
whatever else I want (lseek, lseek64 for example). I could easily have my
"lseek64" use a public name of "lseek". This takes a handful of seconds and
requires no changes in any source files other than the definition (like your
system .h files). Any C user could also do the same with something like "#define
lseek64 lseek".

2. I never, ever call the "operating system" directly from any program. Did I
say EVER! I know libc is a "C" library but Unix kernels tend to have many libc
calls as actual kernel entries. There is a blurring between many C and Unix
kernel calls. I consider libc as "operating system" when I think of Unix
systems.

My code always uses encapsulation "libraries" that define their own API
independent of the operating system API. A port of my code simply means porting
the implementation of that library module. For example in my "FileFunc" module
my "lseek" is actually a procedure called "SetFilePos", and it uses a 32-bit
unsigned number, which is why I call lseek64 (when available) on Unix.
SetFilePointer is used on Win32, INT 21h function 42h on DOS, and so on. The
same idea holds true for my GUI programs, which is where the real differences
between the implementations of each encapsulation lie. File I/O is conceptually
quite similar across multiple systems.

Encapsulation is the key to *real* portability. Standards are very good but not
good enough in my mind. Encapsulation takes little time really, except in the
case of GUI encapsulation which can take some time. The payback is huge, IMO.

3. I have given some thought to versioning my "UNIX" interface module with 32
and 64-bit setups as you say. This is good for someone that uses those calls
directly rather than the encapsulation. This lets someone use the "lseek" name
and have it be 32 or 64-bit. There is nothing magic about this, but it does
conform to existing Unix programming documentation, which I do not try to
duplicate in my Modula-2 system.
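
The encapsulation idea from point 2 can be sketched in C roughly as
follows. This is only an illustration, not the FileFunc module itself:
the names file_handle and file_set_pos are hypothetical, a 64-bit
position is assumed instead of the 32-bit unsigned one mentioned above,
and the Win32 branch uses SetFilePointerEx in place of SetFilePointer
for brevity.

```c
#define _FILE_OFFSET_BITS 64   /* POSIX branch: ask for a 64-bit off_t */

#include <stdint.h>

#ifdef _WIN32
#include <windows.h>
typedef HANDLE file_handle;    /* hypothetical handle type */
#else
#include <sys/types.h>
#include <unistd.h>
typedef int file_handle;       /* a plain file descriptor */
#endif

/* Hypothetical wrapper in the spirit of SetFilePos: move to an absolute
   byte position. Returns 0 on success, -1 on failure. Callers never see
   which operating system call is underneath. */
int file_set_pos(file_handle f, uint64_t pos)
{
#ifdef _WIN32
    LARGE_INTEGER target;
    target.QuadPart = (LONGLONG)pos;
    return SetFilePointerEx(f, target, NULL, FILE_BEGIN) ? 0 : -1;
#else
    off_t target = (off_t)pos;
    /* Reject positions that do not survive the conversion to off_t. */
    if (target < 0 || (uint64_t)target != pos)
        return -1;
    return lseek(f, target, SEEK_SET) == (off_t)-1 ? -1 : 0;
#endif
}
```

Porting to another system then means rewriting the body of
file_set_pos only; every caller keeps using the same name and the same
position type.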

--
Norman Black
Stony Brook Software
nospam => stonybrk

"Andrew Gierth" <and...@erlenstar.demon.co.uk> wrote in message
news:87wv0ta...@erlenstar.demon.co.uk...

Victor Wagner

Nov 15, 2001, 12:21:38 PM11/15/01
Norman Black <nos...@ix.netcom.com> wrote:

: 1. I do not use C. I use Modula-2.

: 2. Because of that I call lseek64 because that *is* the real and true
: name of the implemented procedure in libc on Linux and Solaris. Those
: libc.so files have an lseek and lseek64 and lseek64 is the only one
: that takes large offsets. While going through the system C header files
: I did notice that if the appropriate defines were in place then lseek
: was aliased to lseek64.

: 3. I could call my Modula-2 procedure definition lseek instead of
: lseek64 and then change the public symbol name to what is appropriate
: (currently lseek64) but then that does not really change anything now
: does it.

This does change something: the maintainability of your program.

If some future version of libc makes lseek take a 64-bit argument
(and it is explicitly declared that it will at some point, and that the
*64 functions are transitional), you'll have to change only the one
module where the Modula names for libc functions are defined.

With the current approach you'll have to grep all your code for
.*64 names.

Really, this "change" could happen tomorrow if somebody asked you to
port your program to some architecture (such as Linux-Alpha) where
lseek is already 64-bit.


--
"You, sir, are nothing but a pathetically lame salesdroid!
I fart in your general direction!"
-- Randseed on #Linux

Norman Black

Nov 16, 2001, 3:52:56 PM11/16/01
> This does change something: the maintainability of your program.
>
> If some future version of libc makes lseek take a 64-bit argument
> (and it is explicitly declared that it will at some point, and that the
> *64 functions are transitional), you'll have to change only the one
> module where the Modula names for libc functions are defined.
>
> With the current approach you'll have to grep all your code for
> .*64 names.

No, I would not. Only the public symbol name needs to change. The name of the
procedure can differ from the public symbol name used in linking.
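
For C readers, the same separation of source name and link symbol can
be sketched with an asm label. This is a GCC/Clang extension, not
standard C, and the sketch assumes a libc that exports an lseek64
symbol (glibc on Linux does). The names file_seek, file_seek_abs and
SEEK_SYMBOL are made up for illustration.

```c
#include <stdint.h>
#include <stdio.h>   /* only for SEEK_SET */

/* Link-level symbol to bind to. "lseek64" is an assumption that holds
   for glibc on Linux; a system whose plain lseek already takes 64-bit
   offsets would define SEEK_SYMBOL as "lseek" instead. */
#ifndef SEEK_SYMBOL
#define SEEK_SYMBOL "lseek64"
#endif

/* The source-level name is file_seek everywhere; only the label naming
   the public symbol changes from one platform to the next. */
extern int64_t file_seek(int fd, int64_t offset, int whence)
    __asm__(SEEK_SYMBOL);

/* Convenience wrapper: seek to an absolute position.
   Returns the new position, or -1 on error. */
int64_t file_seek_abs(int fd, int64_t pos)
{
    return file_seek(fd, pos, SEEK_SET);
}
```

Every caller uses file_seek or file_seek_abs; retargeting the code to a
different public symbol is a one-line change to SEEK_SYMBOL.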

Anyway, I elaborated more on this in another post, although only in the
comp.unix.programmer group.

--
Norman Black
Stony Brook Software
nospam => stonybrk

"Victor Wagner" <vi...@wagner.rinet.ru> wrote in message
news:9t0tj2$hsl$1...@wagner.wagner.home...
