Is there a way to have both definitions active in their respective ways?
The purpose is building a library which provides certain kinds of wrapping
around system or library calls that do deal with interface changes due to
large file support.
--
|---------------------------------------/----------------------------------|
| Phil Howard KA9WGN (ka9wgn.ham.org) / Do not send to the address below |
| first name lower case at ipal.net / spamtrap-200...@ipal.net |
|------------------------------------/-------------------------------------|
Not through standard interfaces, no.
> For example,
> when large file support is not turned on, type off_t is 32 bits and off64_t
> is not defined. But when LFS is turned on, off64_t does get defined, and
> off_t is redefined as 64 bits.
>
> Is there a way to have both definitions active in their respective ways?
That can be done, but you have to call the 32-bit and 64-bit versions
directly, instead of macro'd through LFS. See lfcompile64(5) on Solaris.
Alternatively, you can nearly simulate this by always calling the
64-bit versions in your library, "promoting" args from 32-bit callers
as needed. The difference then would be that you'd have to emulate
failures related to the 32-bit limits, eg max size of a file.
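
A minimal sketch of the "promote, then emulate the 32-bit limits" idea, assuming the library keeps every offset internally as a 64-bit value. The names promote_off32 and demote_off64 are hypothetical and are not part of any real library; EOVERFLOW is the errno value POSIX uses when a result cannot be represented in the caller's type.

```c
#include <errno.h>
#include <stdint.h>

/* Widening a 32-bit caller's offset never loses information. */
static int64_t promote_off32(int32_t narrow)
{
    return (int64_t)narrow;
}

/*
 * Hand a 64-bit result back to a 32-bit caller.  Returns 0 on success.
 * If the value does not fit, returns -1 with errno set to EOVERFLOW,
 * which is how the 32-bit limit is emulated here.
 */
static int demote_off64(int64_t wide, int32_t *narrow)
{
    if (wide < INT32_MIN || wide > INT32_MAX) {
        errno = EOVERFLOW;
        return -1;
    }
    *narrow = (int32_t)wide;
    return 0;
}
```

A 32-bit entry point would call promote_off32() on its arguments, run the 64-bit code path, and pass each result through demote_off64() before returning.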
> The purpose is building a library which provides certain kinds of wrapping
> around system or library calls that do deal with interface changes due to
> large file support.
If your library is an interposer, you don't need to do anything
special at all though. 32-bit and 64-bit (LFS) library calls are
independent from each other, with the appropriate call invoked via
macros at compile time when building with LFS support. So you just
need 2 versions of each function you are interposing on.
-frank
That's what I want to avoid. I don't want to create a "permanent transition"
like LFS has become. I don't want two different interfaces. I want to make
things work as they should, where the usage of an interface involves using
the correct ADT which is defined to the correct size and time. With the
exception of the fseek and ftell functions, which got replaced by fseeko
and ftello, the whole standard interface _should_ have been done by having
simply redefined all the data types. There should never have been "64"
versions of any syscalls, library functions, data types, structs, etc.
They ended up doing this for various reasons related to legacy programs.
My library has no legacy so I don't want to create any such mess for it.
So I'm just trying to understand the specifics of the existent mess to see
how I might be able to accomplish this (exactly one way to access each
element of the interface, and yet have it always work no matter how the
calling programs are accessing the POSIX interface).
That's not possible, if backwards compatibility is to be retained.
> I want to make
> things work as they should, where the usage of an interface involves using
> the correct ADT which is defined to the correct size and time. With the
> exception of the fseek and ftell functions, which got replaced by fseeko
> and ftello, the whole standard interface _should_ have been done by having
> simply redefined all the data types. There should never have been "64"
> versions of any syscalls, library functions, data types, structs, etc.
> They ended up doing this for various reasons related to legacy programs.
I disagree strongly; it was done the correct way. Supporting
so-called legacy programs is absolutely essential.
That's not to say that the purist in me doesn't appreciate your position.
> My library has no legacy so I don't want to create any such mess for it.
> So I'm just trying to understand the specifics of the existent mess to see
> how I might be able to accomplish this (exactly one way to access each
> element of the interface, and yet have it always work no matter how the
> calling programs are accessing the POSIX interface).
I can't imagine how this could be done.
-frank
Let me clarify. There are two different ways to make reference to having
more than one interface. Clearly if someone compiles my library on a
64-bit machine, that is a truly different interface at the ABI level than
one compiled on a 32-bit machine. Similarly, at the ABI level, interfaces
would be different for different endianness, as well as padding/alignment
issues where structs could be involved, and possibly even the argument
list itself.
The above is one kind of "two different interfaces". The other is having
what might be better termed "overloaded interface" where for each part of
the interface (a function, a struct, a data type, etc), or some subset of
those parts, a corresponding variant definition is created that only makes
sense, or is only needed on, certain categories of architectures. What I
mean here are things like the added "64" functions in LFS, new data types
with "64" in them, and various other things. On a 32 bit machine these are
different. On a 64 bit machine they are pointless.
The kind of thing I want to avoid is the latter.
I see nothing wrong with having a "32-bit pointer / 64-bit file reference"
type of architecture that provides compatibility for legacy programs with
an extra legacy library that provides the ABI for older programs that expect
"32-bit pointer / 32-bit file reference". They could also make an alternate
toolchain to compile programs in the legacy architecture (for example the
legacy library itself which can still need maintenance). But the default
should be the forward architecture.
A very small amount of the POSIX standard failed to have the ability to
"just recompile" to get correctly written programs working on the new
"32-bit pointer / 64-bit file reference" architecture (e.g. why we had
to have fseeko and ftello because of the lack of opaque data types). Programs using
fseeko and ftello in lieu of fseek and ftell should be able to simply be
recompiled for the default architecture.
Too many programs would have to be fixed. So someone decided rather than
recode those programs that were broken, we'd have to recode all programs
to add large file support in the transition. So in an effort to avoid
more work, we have a lot more work. It was a bad idea from the get-go.
Someone called it a transition. Looks like it will be around for many
years, maybe decades.
|> I want to make
|> things work as they should, where the usage of an interface involves using
|> the correct ADT which is defined to the correct size and time. With the
|> exception of the fseek and ftell functions, which got replaced by fseeko
|> and ftello, the whole standard interface _should_ have been done by having
|> simply redefined all the data types. There should never have been "64"
|> versions of any syscalls, library functions, data types, structs, etc.
|> They ended up doing this for various reasons related to legacy programs.
|
| I disagree strongly; it was done the correct way. Supporting
| so-called legacy programs is absolutely essential.
This depends on the definition of support. If you want original binary
programs that source is unavailable for to also work with large files,
then that is quite a stretch. But adding the "64" extensions is in no way
a form of supporting legacy programs, either with or without source.
With two different ABIs, you have legacy support. No need to bloat the
API to do so. You might have 2 library binaries around, and intimate
libraries like libc might well have to be coded to understand which it
is being compiled for (because the kernel might only have the LFS KABI).
| That's not to say that the purist in me doesn't appreciate your position.
But it's more than just a purist issue. It's also a practical issue.
Why force recoding all programs when it would have only been necessary
to recode the broken subset (which I am not convinced is so large).
|> My library has no legacy so I don't want to create any such mess for it.
|> So I'm just trying to understand the specifics of the existent mess to see
|> how I might be able to accomplish this (exactly one way to access each
|> element of the interface, and yet have it always work no matter how the
|> calling programs are accessing the POSIX interface).
|
| I can't imagine how this could be done.
This is my quest.
At this point it seems the way I will do so is to have my API be always
of the "implicit large file" type when either implicit or explicit large
files are used by calling programs. A calling program using implicit
would not really see a difference. A calling program using explicit is
going to have to take notice and deal with API elements with great care.
And those in my library would be the large file type but without the large
file names. It's the ability to be able to call both that will be lost.
So a program might be able to call both lstat() and lstat64() in libc to
reach the kernel in two different ways (and libc may well be translating
lstat() to a 64-bit underlying KABI where lstat64() would be a straight
call). But it would only have the 64-bit API in my library. A program
being compiled entirely without LFS would get the non-LFS API, and have
to link with a non-LFS binary (assuming one gets compiled).
I will have to decide if I want to make the default make configuration
create both 32 bit and 64 bit library binaries, or just the 64 bit.
One thing I think I need to detect on a given platform is if LFS was a
choice at all. If the machine is a true 64-bit machine, that might not
be entirely the obvious case, since there may still be 32-bit program
emulation going on, with support for a 32-bit pointer ABI which could
have either 32-bit or 64-bit file access data sizes, or both.
But I definitely do NOT want to see "64" variations in an API on a true
64-bit machine for programs compiled to the full architecture. Yet with
the POSIX LFS way, we apparently will be stuck with that. It's not so
much a transition as it is a trap.
[...]
> I see nothing wrong with having a "32-bit pointer / 64-bit file reference"
> type of architecture that provides compatibility for legacy programs with
> an extra legacy library that provides the ABI for older programs that expect
> "32-bit pointer / 32-bit file reference". They could also make an alternate
> toolchain to compile programs in the legacy architecture (for example the
> legacy library itself which can still need maintenance). But the default
> should be the forward architecture.
Reusing my example from colds: Could you please provide a single
reason why a DNS-server of some kind, running on a 32-bit
architecture, should use 64-bit variables for its UDP communication
calls?
NB: A reason, meaning "some advantage to be gained by doing so", not
only "I want to process my recorded television shows and don't give a
shit", which is all I have read so far.
No. Why should I provide such an example? I've never suggested that it
is necessary to do this.
| NB: A reason, meaning "some advantage to be gained by doing so", not
| only "I want to process my recorded television shows and don't give a
| shit", which is all I have read so far.
How about a reason for asking for this?
Or is it that you are not making the distinction between 64-bit variables
in general, and the few 64-bit variable data types needed to access file
positions and sizes when files might be (because the system supports it)
larger than 2GB or 4GB?
I was making the erroneous assumption that 'these few variable data
types' would include 64-bit-I/O support as well (ie ssize_t and
size_t) which they (at least according to the headers on the system I
am currently using) don't.
It would be useful to have 64-bit size_t and ssize_t if it were useful to
read in more than 4GB of a file at once. On a 32-bit architecture, that
cannot be done (at least within the scope of normal I/O call interfaces).
I suppose someone might find a cause to do something like that on a 64-bit
architecture, and indeed, you do find size_t and ssize_t (and pointers and
the like) that big (and part of the reason you do see a sudden jump up in
the memory needed when moving from 32-bit to 64-bit architecture).
What was the whole purpose of having opaque data types that could be defined
in a different size on another machine? Was it just as a means to document
what a given variable was trying to be? Or was it a means to be able to
recompile a well written program to another architecture in a language that
was supposed to make this easier to do than say assembly language?
> So I'm just trying to understand the specifics of the existent mess to see
> how I might be able to accomplish this (exactly one way to access each
> element of the interface, and yet have it always work no matter how the
> calling programs are accessing the POSIX interface).
You can't, at least not portably.
As far as POSIX is concerned, ILP32_OFF32 and ILP32_OFFBIG are
separate programming environments, just as ILP32_OFF32 and
LP64_OFF64 are separate. You don't expect to be able to mix
ILP32_OFF32 and LP64_OFF64 object files, so you shouldn't expect to
be able to do it with ILP32_OFF32 and ILP32_OFFBIG either.
The fact that the usual way that ILP32_OFFBIG is implemented is as
a variation of ILP32_OFF32 with some identifiers translated "behind
the scenes" is an internal implementation detail. There is no
reason why a POSIX implementation that offers both programming
environments couldn't have separate headers and libraries for each,
and report errors if you try to mix them.
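
As a rough illustration of treating these as separate environments, the sketch below asks the running system which of them it reports as supported. It assumes the POSIX.1-2001 names _SC_V6_ILP32_OFF32, _SC_V6_ILP32_OFFBIG, _SC_V6_LP64_OFF64 and _SC_V6_LPBIG_OFFBIG are available in <unistd.h>; each one is guarded in case a platform lacks it. The helper names are made up for the example.

```c
#include <unistd.h>

/* 1 if sysconf() reports the given environment as supported, else 0. */
static int env_supported(int sc_name)
{
    return sysconf(sc_name) > 0;
}

/* Count how many of the four POSIX programming environments are offered. */
static int count_supported_envs(void)
{
    int n = 0;
#ifdef _SC_V6_ILP32_OFF32
    n += env_supported(_SC_V6_ILP32_OFF32);   /* 32-bit int/long/ptr, 32-bit off_t */
#endif
#ifdef _SC_V6_ILP32_OFFBIG
    n += env_supported(_SC_V6_ILP32_OFFBIG);  /* 32-bit int/long/ptr, >= 64-bit off_t */
#endif
#ifdef _SC_V6_LP64_OFF64
    n += env_supported(_SC_V6_LP64_OFF64);    /* 64-bit long/ptr, 64-bit off_t */
#endif
#ifdef _SC_V6_LPBIG_OFFBIG
    n += env_supported(_SC_V6_LPBIG_OFFBIG);  /* >= 64-bit long/ptr/off_t */
#endif
    return n;
}
```

A build script could use a check of this kind to decide which library binaries are worth producing on a given machine.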
--
Geoff Clare <net...@gclare.org.uk>
I don't have a problem with more than one programming environment. What
I have a problem with is the mixed programming environment where some of
the calls have added versions specifically for 64-bit file offsets. I do
not want to have some of my functions have 64-bit versions. If my library
is used in a pure 32-bit environment or pure 64-bit environment, even on
the same machine, it should just work fine. The problem is when there is
the mixed environment where a program could call either syscall, such as
both stat() and stat64(). What I am thinking of doing in that case is to
make my functions be only the 64-bit version, but with the normal name.
The complication is that the headers will have to change under this case
so the prototypes use the 64-bit variant names.
On 20 May 2007 13:47:01 GMT phil-new...@ipal.net wrote:
> At this point it seems the way I will do so is to have my API be always
> of the "implicit large file" type when either implicit or explicit large
> files are used by calling programs. A calling program using implicit
> would not really see a difference.
One thing I still don't understand is if your API is a shim or similar
for the POSIX API, or if it's something completely different.
If it's an equivalent/shim/etc, then having implicit large files is a
mistake if your goal is to have existing programs (say, 'diff') use
your library instead of or on top of POSIX. You would find that
(e.g.) programs would happily open 2.1G files instead of failing, and
then get lost as to their position in the file when they go past the
2G mark. Now of course (for this specific example) whether or not
this is a problem depends on the nature of the program, but in general
you'd have to audit any program you want to link with your library.
If a program has to be re-written or modified to use your API, it is
of course a different story.
...
> But I definitely do NOT want to see "64" variations in an API on a true
> 64-bit machine for programs compiled to the full architecture. Yet with
> the POSIX LFS way, we apparently will be stuck with that. It's not so
> much a transition as it is a trap.
POSIX LFS has a large legacy of programs to support! You may not have
that defining handicap.
-frank
> It seems a program cannot access BOTH the old (without large file
> support) and new (with large file support) interfaces at the same time.
> For example, when large file support is not turned on, type off_t is 32
> bits and off64_t is not defined. But when LFS is turned on, off64_t
> does get defined, and off_t is redefined as 64 bits.
Doesn't _just_ having _LARGEFILE64_SOURCE=1 do the right thing here? All
of the (informal) documentation I can find implies it does.
> Is there a way to have both definitions active in their respective ways?
>
> The purpose is building a library which provides certain kinds of
> wrapping around system or library calls that do deal with interface
> changes due to large file support.
Well I did the above for a library and it seems to work, although
probably all of the users want LFS with full 64bits ... but I thought I
tested it with off_t == 32bit as well, a little bit at least.
--
James Antill -- ja...@and.org
It's not exactly a shim. Some of the functions are simplistic changes
to syscalls, such as a version of stat() that takes multiple string
arguments and forms the path by inserting "/" between each argument
and joining that all together as one string for the real syscall. It
is obviously not just a look-alike shim, as the API design is different.
Another is a very different way to do file tree recursion than either
the existing ftw or fts functions. It would be affected in terms of
the struct stat it returns to callers for each object returned (unlike
ftw it returns to the caller with each object).
Examples:
http://libh.slashusr.org/source/io/src/lib/h/lstat_join_sep.c
http://libh.slashusr.org/source/ftr/src/lib/h/ftr_header.h
See macros ftr_stat() and ftr_stat_ptr() in the latter URL. Those macros
and many others like them provide info about the file last found by the
ftr_get() function, for the given ftr object.
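
For readers who do not follow the links, here is a rough sketch of what a multi-argument stat() of the kind described could look like. It is not the code at the URLs above: the name stat_join, the NULL-terminated argument list, and the error handling are all assumptions made for illustration.

```c
#include <errno.h>
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: join a NULL-terminated list of path components
 * with "/" and stat() the result.  Returns stat()'s return value, or -1
 * if the joined path cannot be allocated.
 */
static int stat_join(struct stat *st, const char *first, ...)
{
    va_list ap;
    size_t len = strlen(first) + 1;      /* first component plus NUL */
    const char *part;
    char *path;
    int rc, saved_errno;

    /* First pass: measure the joined path. */
    va_start(ap, first);
    while ((part = va_arg(ap, const char *)) != NULL)
        len += 1 + strlen(part);         /* "/" plus the component */
    va_end(ap);

    path = malloc(len);
    if (path == NULL)
        return -1;

    /* Second pass: build it. */
    strcpy(path, first);
    va_start(ap, first);
    while ((part = va_arg(ap, const char *)) != NULL) {
        strcat(path, "/");
        strcat(path, part);
    }
    va_end(ap);

    rc = stat(path, st);
    saved_errno = errno;                 /* keep stat()'s errno across free() */
    free(path);
    errno = saved_errno;
    return rc;
}
```

Because the function hands a struct stat back to the caller, its binary interface changes with the size of off_t, which is exactly where the large file question enters.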
| If it's an equivalent/shim/etc, then having implicit large files is a
| mistake if your goal is to have existing programs (say, 'diff') use
| your library instead of or on top of POSIX. You would find that
| (e.g.) programs would happily open 2.1G files instead of failing, and
| then get lost as to their position in the file when they go past the
| 2G mark. Now of course (for this specific example) whether or not
| this is a problem depends on the nature of the program, but in general
| you'd have to audit any program you want to link with your library.
It's not a shim in the sense of causing existing programs to effectively
call my library, even if recompiled to do so (as in a macro shim). It
is a different API that requires deliberate programming to use. A lot
of it provides programming convenience (useful for new programs, but
not worth recoding old programs). A lot of it provides new features.
| If a program has to be re-written or modified to use your API, it is
| of course a different story.
It does have to be re-written or modified. It is a different API. But
it does make use of POSIX data types, structs, and functions (some even
in macros that would be compiled at caller compile time).
|> But I definitely do NOT want to see "64" variations in an API on a true
|> 64-bit machine for programs compiled to the full architecture. Yet with
|> the POSIX LFS way, we apparently will be stuck with that. It's not so
|> much a transition as it is a trap.
|
| POSIX LFS has a large legacy of programs to support! You may not have
| that defining handicap.
No, I certainly do not have that legacy. But even if I did I would not
take the approach POSIX LFS did. I would want to make it work for both
old and new programs, even if that meant having 2 or even 3 different
binary libraries (at least one being there for legacy ABI compatibility
for programs w/o source code available). I think 2 would be sufficient,
though. My objective is that none of my API have any "64" functions or
data types of my own creation. Everything would work with large file
POSIX API elements, unless the calling programs don't use LFS. Then my
library might need a legacy binary so that it can properly fail when the
LFS functions would otherwise succeed, such as failing on a 5GB file being
found in a tree recursion, which I believe should be the correct behaviour
when the stat info being returned to the caller is from the legacy API.
If it didn't fail in such a case, the information cannot be properly
returned.
But that is not the only way a calling program might do this.
|> Is there a way to have both definitions active in their respective ways?
|>
|> The purpose is building a library which provides certain kinds of
|> wrapping around system or library calls that do deal with interface
|> changes due to large file support.
|
| Well I did the above for a library and it seems to work, although
| probably all of the users want LFS with full 64bits ... but I thought I
| tested it with off_t == 32bit as well, a little bit at least.
There are at least 8 different variations of doing LFS, based on what can
be selected through CPP definitions. Apparently they fall into 3 major
groups: no LFS, explicit LFS, implicit LFS. I think implicit LFS will
be the easiest. But apparently a lot of programs have chosen to go with
explicit LFS. My thinking at the moment is to just treat explicit the
same as implicit, since there is very little legacy in usage of my library.
That is, if you choose to have both stat() and stat64(), and also use my
library, you've already committed to doing some amount of recoding, and
committed to coding usage of my library, so you must make sure all usage
of my library is consistent with the LFS support you are using. My macros
will probably have to make use of the "64" variants of POSIX symbols in
the explicit cases, but I think that won't be a problem.
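
A sketch of how a header macro might sort the calling translation unit into those three groups. It assumes the usual LFS feature-test macros: _FILE_OFFSET_BITS=64 for implicit and _LARGEFILE64_SOURCE for explicit. The enum and function names are invented for the example, and it does not try to cover every one of the CPP variations mentioned above.

```c
#include <limits.h>
#include <sys/types.h>

enum lfs_mode { LFS_NONE, LFS_EXPLICIT, LFS_IMPLICIT };

/* Classify the translation unit that includes this code. */
static enum lfs_mode lfs_mode_of_this_unit(void)
{
#if defined(_FILE_OFFSET_BITS) && _FILE_OFFSET_BITS == 64
    return LFS_IMPLICIT;    /* off_t itself is 64 bits */
#elif defined(_LARGEFILE64_SOURCE)
    return LFS_EXPLICIT;    /* off64_t, stat64() etc. exist next to off_t */
#else
    return LFS_NONE;        /* only the default off_t interface */
#endif
}

/* Width of off_t as this translation unit sees it. */
static int off_t_bits(void)
{
    return (int)(sizeof(off_t) * CHAR_BIT);
}
```

On a 64-bit ABI off_t is normally 64 bits wide in all three cases, so off_t_bits() is the more telling check there; the macro test matters mainly for 32-bit builds.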
> I don't have a problem with more than one programming environment.
> What I have a problem with is the mixed programming environment where
> some of the calls have added versions specifically for 64-bit file
> offsets. I do not want to have some of my functions have 64-bit
> versions.
This was introduced to allow 32-bit systems to access large files. The
transition from 32-bit systems via 32-bit systems with large file
support, to 64-bit systems was handled with considerable deftness by
the (then relevant) Unix vendors.
The only time you would need a 32 and 64 bit version of a system call
is when you're running a 32-bit machine with large file support.
Most Unix systems -even though all are now 64-bit- continue to support
32 bit programs, and 32-bit programs with large file support. This is a
feature.
If you restrict your library to a 64-bit version only, there is no need
to worry about large file support.
--
Stefaan A Eeckels
> My macros will probably have to make use of the "64" variants of
> POSIX symbols in the explicit cases,...
Why? We're talking about a transitional interface here. The only time
you need to use the lf64 functions is when a 32-bit program needs to
access large files.
There are valid reasons to continue to use 32-bit programs (on SPARC,
they are marginally faster, the program needs to run on old 32-bit
systems, etc), but there are not many compelling reasons to go out of
one's way to support a transitional interface in new developments.
Just go 64-bit, and be happy.
--
Stefaan A Eeckels
For an application, yes. But the OP is trying to avoid having two
interfaces (the LFS one and the non-LFS one) for each function, in a
library. What _LARGEFILE64_SOURCE=1 (or more correctly, `getconf
LFS_CFLAGS`) does is select the LFS interfaces for various functions.
>> Is there a way to have both definitions active in their respective ways?
>>
>> The purpose is building a library which provides certain kinds of
>> wrapping around system or library calls that do deal with interface
>> changes due to large file support.
>
> Well I did the above for a library and it seems to work, although
> probably all of the users want LFS with full 64bits ... but I thought I
> tested it with off_t == 32bit as well, a little bit at least.
There is no [good] way for a function to determine if its caller used
an LFS or non-LFS datatype. Even if you want to do the nasty stuff to
figure it out, LFS programs call different functions than non-LFS
programs; e.g. open64 instead of open. So to work with existing
programs and not require a recompile, a library must provide both
interfaces. Even if you are adding some brand new interface, if you
are using the types that are changed with LFS (off_t et al.) you
must provide 2 interfaces to deal with the different sized args.
libelf is a good example of a library that has LFS problems, but that
isn't itself part of the LFS "standard". This is unfortunate, since
implementors have been reluctant (or just didn't think of it) to
"extend" the LFS interface to libelf, ie to define multiple versions
of the libelf routines based on LFS or non-LFS settings. Therefore
however you or your vendor compiles libelf is the only way your
program that uses libelf can be compiled, regardless of the other
(LFS) functionality you may want. Well, you could write a shim to
translate the data types. Horrible.
-frank
I don't think so. I think it was done the way they did it so they would
not have to undo some misguided attempts to handle large files that had
already been utilized.
| The only time you would need a 32 and 64 bit version of a system call
| is when you're running a 32-bit machine with large file support.
| Most Unix systems -even though all are now 64-bit- continue to support
| 32 bit programs, and 32-bit programs with large file support. This is a
| feature.
I cannot figure out when a program would need both the small file and large
file calls in the very same program.
| If you restrict your library to a 64-bit version only, there is no need
| to worry about large file support.
There is the need to work with those programs that select the explicit LFS
mode, which means stat() is still for small files and stat64() is for large
files (although some systems might only do one of them at the KABI).
And that seems to be widely used.
| There are valid reasons to continue to use 32-bit programs (on SPARC,
| they are marginally faster, the program needs to run on old 32-bit
| systems, etc), but there are not many compelling reasons to go out of
| one's way to support a transitional interface in new developments.
|
| Just go 64-bit, and be happy.
When I'm writing a program, that concept works fine. But for a library
one has to deal with how the programs calling it are written.
That's nice on paper but really most software developers aren't that good
and changing the default (to be LFS-ok) would IMHO have been a big problem.
I can see even just requiring a re-link against a non-LFS library to be
problematic.
Anyway, I'll be very interested to hear of your solution.
-frank
My solution as I would have done POSIX LFS, or my solution for my libraries?
> On Mon, 21 May 2007 22:18:30 -0000 James Antill <james-...@and.org>
> | Doesn't _just_ having _LARGEFILE64_SOURCE=1 do the right thing here?
> | All of the (informal) documentation I can find implies it does.
>
> But that is not the only way a calling program might do this.
Right, I probably wasn't clear. My solution was to do:
if off64_t is available in the environment, all library interfaces
explicitly use off64_t. Otherwise library interfaces use off_t.
This works for:
. 32bit programs where off_t == off64_t
. 32bit programs where off_t == 32bit, and off64_t is defined.
. 64bit programs with just off_t.
...and it only doesn't work for:
. 32bit programs where no LFS support is enabled.
...my feeling was that the latter are broken anyway, so doing the right
thing in all the other cases was enough.
--
James Antill -- ja...@and.org
for your own libraries
How does one tell if that is so? _LARGEFILE64_SOURCE?
> On Tue, 22 May 2007 15:16:21 -0000 James Antill <james-...@and.org> wrote:
> | On Mon, 21 May 2007 22:49:25 +0000, phil-news-nospam wrote:
> |
> |> On Mon, 21 May 2007 22:18:30 -0000 James Antill <james-...@and.org>
> |> | Doesn't _just_ having _LARGEFILE64_SOURCE=1 do the right thing here?
> |> | All of the (informal) documentation I can find implies it does.
> |>
> |> But that is not the only way a calling program might do this.
> |
> | Right, I probably wasn't clear. My solution was to do:
> |
> | if off64_t is available in the environment, all library interfaces
> | explicitly use off64_t. Otherwise library interfaces use off_t.
>
> How does one tell if that is so? LARGEFILE64_SOURCE?
I basically[1] used:
AC_CHECK_TYPE(off64_t, AC_DEFINE(HAVE_OFF64_T),
AC_DEFINE_UNQUOTED(off64_t, off_t))
...you can't really use _LARGEFILE64_SOURCE because that relies on the
user of your lib. defining it. If you don't use autoconf, you could try:
getconf LFS64_CFLAGS
...and see what the output is, but I'm not sure how portable that is.
[1] A little bit of sed post output to give the output a namespace.
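
The same query can be made from C with confstr(), which is what getconf uses for names of this kind. The sketch below assumes the platform defines _CS_LFS64_CFLAGS in <unistd.h> (the LFS specification names it, but it is guarded here in case it is missing); conf_string and lfs64_cflags are hypothetical helper names.

```c
#include <stdlib.h>
#include <unistd.h>

/*
 * Return a malloc'd copy of the confstr() value for `name`, or NULL if
 * the name has no configuration-defined value or memory runs out.
 */
static char *conf_string(int name)
{
    size_t need = confstr(name, NULL, 0);   /* size including the NUL */
    char *buf;

    if (need == 0)
        return NULL;
    buf = malloc(need);
    if (buf == NULL)
        return NULL;
    if (confstr(name, buf, need) == 0) {
        free(buf);
        return NULL;
    }
    return buf;
}

/* Flags the system suggests for the explicit *64 interfaces, if it says. */
static char *lfs64_cflags(void)
{
#ifdef _CS_LFS64_CFLAGS
    return conf_string(_CS_LFS64_CFLAGS);
#else
    return NULL;
#endif
}
```

An empty string or NULL here only means the system did not report any flags; it does not by itself show that large files are unsupported.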
If they don't define it, then how would they end up with off_t defined
larger than "normal" and/or off64_t defined?
| getconf LFS64_CFLAGS
|
| ...and see what the output is, but I'm not sure how portable that is.
It comes up blank on some machines that do have large files. So it seems
to not be reliable. Is it a program possibly compiled with other headers?
> On Wed, 23 May 2007 14:40:53 -0000 James Antill <james-...@and.org>
> |
> | AC_CHECK_TYPE(off64_t, AC_DEFINE(HAVE_OFF64_T),
> | AC_DEFINE_UNQUOTED(off64_t, off_t))
> |
> | ...you can't really use _LARGEFILE64_SOURCE because that relies on the
> | user of your lib. defining it. If you don't use autoconf, you could
> try:
>
> if they don't define it, then how would they end up with off_t defined
> larger than "normal" and/or off64_t defined?
Yes, when you've decided the environment has an off64_t build all your
interfaces to use that ... then when a user of your library comes along,
then before it gets to that piece of code someone will need to have
defined _LARGEFILE64_SOURCE.
What I mean is that this can only really be treated as a result, not
something you should check IMO. The order goes roughly like:
1. Compile of library must know if off64_t exists or not.
2. Library header must know if off64_t exists or not.
3. Someone must define _LARGEFILE64_SOURCE, if off64_t exists.
4. Library should include the headers it needs.
5. Library uses off_t or off64_t in its own definitions.
...in theory you can swap 2 and 3, but then you might be giving out the
wrong interface definitions if the user doesn't do the right thing
(instead of doing a #warning or #error). The above order also means that
the library headers can "do everything", which is much nicer to use,
unless the user requests otherwise.
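
The ordering above can be sketched as a library header. This is only an illustration under stated assumptions: HAVE_OFF64_T is taken to be recorded by the library's configure run and shipped with the installed header, and mylib_off_t and mylib_off_bits are hypothetical names.

```c
/* mylib.h (hypothetical) */
#ifndef MYLIB_H
#define MYLIB_H

/* Steps 1-2: HAVE_OFF64_T is assumed to come from the library's configure run. */
#ifdef HAVE_OFF64_T
/* Step 3: request the explicit 64-bit types before any system header. */
# ifndef _LARGEFILE64_SOURCE
#  define _LARGEFILE64_SOURCE 1
# endif
#endif

/* Step 4: include the system headers the interface needs. */
#include <sys/types.h>

/* Step 5: the library's own definitions use whichever type was chosen. */
#ifdef HAVE_OFF64_T
typedef off64_t mylib_off_t;
#else
typedef off_t mylib_off_t;
#endif

/* Width of the offset type this header exposes, in bits. */
static inline int mylib_off_bits(void)
{
    return (int)(sizeof(mylib_off_t) * 8);
}

#endif /* MYLIB_H */
```

The define in step 3 only takes effect if this header is included before any system header in the caller's source file, which is the hazard of swapping steps 2 and 3.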
They don't need to define _LARGEFILE64_SOURCE. They can choose to. Or
they can choose not to. I want my library to work for both, and correctly.
"Correctly" means presenting exactly the files that the interfaces made
available to them can present. But I also want _my_ library to achieve that by
having no added function names, no added variable types, etc. Because the
underlying functionality will have to be very slightly different for the
two cases I see, there will be two different library files produced. They
will have to link to the correct one.
| What I mean is that this can only really be treated as a result, not
| something you should check IMO. The order goes roughly like:
|
| 1. Compile of library must know if off64_t exists or not.
| 2. Library header must know if off64_t exists or not.
| 3. Someone must define _LARGEFILE64_SOURCE, if off64_t exists.
| 4. Library should include the headers it needs.
| 5. Library uses off_t or off64_t in its own definitions.
|
| ...in theory you can swap 2 and 3, but then you might be giving out the
| wrong interface definitions if the user doesn't do the right thing
| (instead of doing a #warning or #error). The above order also means that
| the library headers can "do everything", which is much nicer to use,
| unless the user requests otherwise.
If they don't define _LARGEFILE64_SOURCE, they are expecting a legacy
interface with legacy semantics. While my library doesn't really have a
legacy interface of its own, per se, it will still need to present the
relevant matching semantics. For example, a failure to get stat info for
a large file is one of those semantics.
What's the advantage of that over having *64 interfaces? Two libraries
with the same interface but differently sized data types may also be
confusing to debug (for users of the library).
-frank
Not only that but it means that shared libraries are much less likely to
use your library (because then _their_ users won't be able to use/not-use
LFS), and it sounds like random users of the library are likely to screw
up compiling the library (I can only hope that the library name they link
against is the same on i386 and x86-64).
No one _needs_ two libraries or two interfaces _unless_ they are mixing
types of programs on the same system. Either of these two approaches is
_supposed_ to be a transition. However, because programs get coded to
use *64 interface, it really is not a true transition. This is a legacy
that will be difficult to get rid of. This is why I do not want to create
it in the first place ... because it will end up being there forever.
I stand by MHO that the whole *64 interface idea (at the POSIX layer) was
a terribly bad idea. At the kernel layer (ABI or even API), it does not
matter so much, as few programs should be touching that layer (system
utilities and the core libc or alternative stub library, and that's all).
A system might be pure 32-bit (in terms of file offset/size referencing
facility). Or a system might be pure 64-bit (regardless of whether the
pointer size is 32-bit or 64-bit). Or a system could have a mix of 32-bit
and 64-bit (on pointer size being 32-bit or 64-bit). The mix would be to
support old programs that have not been, or cannot yet be, converted to
64-bit for some reason (badly coded, lack of source, etc). It is only the
mix systems that would need two libraries (or three or four if it is a
64-bit pointer architecture that also supports 32-bit pointers).
The thing is, a system "in transition" could eventually get itself out of
transition by replacing all programs that need legacy interface sizing with
programs that use the latest interface sizing ... and removing the no longer
needed additional libraries. With *64 interface symbols, it will be next
to impossible to exit from the transition.
Programs are coded to use the LFS *64 interfaces mostly transparently,
through macros. It's entirely possible to come along later and add
*32 interfaces and drop the *64 names. Of course this will break all
existing compiled programs expecting the unadorned names to have
32-bit data types, but that is something you apparently find
acceptable. It just means all existing programs have to be
recompiled.
> I stand by MHO that the whole *64 interface idea (at the POSIX layer) was
[yadda]
You're awful hung up on how bad the so-called LFS transition is. OK,
maybe it sucks! But you are stuck with it.
Since your library depends on LFS types, the users of your library are
stuck with *64 interfaces, regardless of how your library handles it.
Using a macro to select a 32- or 64-bit interface is pretty easy,
especially since users of your library *already* have to do that.
If you have identical interface names for the 2 different cases, how
will users of your library determine which library to link against?
No matter how you do it, it will be extra work on the user's part
and IMHO more confusing to debug. It will also be unreliable; you
won't be able to guarantee that the correct library is linked.
-frank
Yes, I am stuck with it. Yes, I am hung up on how bad it is. But that
doesn't mean I have to further it.
| Since your library depends on LFS types, the users of your library are
| stuck with *64 interfaces, regardless of how your library handles it.
They are stuck with the POSIX names having *64 versions, if they select
such.
| Using a macro to select a 32- or 64-bit interface is pretty easy,
| especially since users of your library *already* have to do that.
_LARGEFILE_SOURCE, _LARGEFILE64_SOURCE, and _FILE_OFFSET_BITS=64 ?
| If you have identical interface names for the 2 different cases, how
| will users of your library determine which library to link against?
| No matter how you do it, it will be extra work on the user's part
| and IMHO more confusing to debug. It will also be unreliable; you
| won't be able to guarantee that the correct library is linked.
Users could be making use of non-*64 names AND not define any of the macros
on a 32-bit only system (has not begun the transition) and on a 64-bit only
system (has completed the transition). What should my library do in these
two cases? I see no reason to export to the calling program any *64 variant
name. I won't be able to use *64 variant POSIX names. Each of these cases
will mean a quite normal compile just as if the LFS strategy had never been
created in the first place. The end result will be programs that expect to
link to dynamic symbols with no *64 variant names, yet have two different
ABIs. They will have to be linked to a library with the correct ABI, which
on these two system cases would be expected to be the only library.
Now take these two dynamically linked executable binaries over to a third
system which is in transition.
I don't have an issue with making my library link to *64 POSIX names.
Well, I do in the sense that I think the whole approach is wrong, but I
can deal with it. I just don't want to create new names to export to
the calling program. I don't want to expand on the POSIX mistake.
I think what you are trying to suggest isn't so much that I have to use
my own *64 names at the ABI layer (to dynamically link with), but rather,
that it is difficult to sort out what library name will be linked with
for systems with both ABIs available. And that will end up being a lot
of systems for a long time because of the lack of an exit strategy in
the POSIX LFS design. So I take it that you are suggesting that what I
need to do is have both filesystem size interfaces in a single library
file, which then requires distinct names for the two different sizes.
If I am correct in that assessment of your position, could you tell me
your feeling about using *32 names for the 32-bit version of symbols,
and leave the 64-bit versions as plain names?
Yes.
> | Using a macro to select a 32- or 64-bit interface is pretty easy,
> | especially since users of your library *already* have to do that.
>
> _LARGEFILE_SOURCE, _LARGEFILE64_SOURCE, and _FILE_OFFSET_BITS=64 ?
It depends on the platform. Generally you would use
`getconf LFS_CFLAGS`. Does autoconf have a built-in check for the
correct flags?
My point was that users have to select the correct macros anyway, so
your use of those macros to switch between plain and *64 interface
names is no extra burden. But I was assuming users correctly choose
the macros to set. Which is not at all a certainty. I've seen lots
and lots of open source software which gets it wrong.
> | If you have identical interface names for the 2 different cases, how
> | will users of your library determine which library to link against?
> | No matter how you do it, it will be extra work on the user's part
> | and IMHO more confusing to debug. It will also be unreliable; you
> | won't be able to guarantee that the correct library is linked.
>
> Users could be making use of non-*64 names AND not define any of the macros
> on a 32-bit only system (has not begun the transition) and on a 64-bit only
> system (has completed the transition). What should my library do in these
> two cases? I see no reason to export to the calling program any *64 variant
> name.
Well, sure (by definition). For the 32-bit non-LFS case and the 64-bit
case, you just use the plain interface and you get 32-bit or 64-bit
data types and file size support.
> I won't be able to use *64 variant POSIX names. Each of these cases
> will mean a quite normal compile just as if the LFS strategy had never been
> created in the first place. The end result will be programs that expect to
> link to dynamic symbols with no *64 variant names, yet have two different
> ABIs.
These are 2 different ABIs because they are 2 different architectures.
> They will have to be linked to a library with the correct ABI, which
> on these two system cases would be expected to be the only library.
Not on a multilib system. In which case the user is still prevented
from linking against the wrong library (32-bit apps can't link against
64-bit libraries, and vice versa).
> Now take these two dynamically linked executable binaries over to a third
> system which is in transition.
And they will work, because on the transitional system, the 32-bit
apps with 64-bit versions of the interface have *64 names.
But if you don't have a different interface name for 64-bit data types
on a 32-bit system, you can't compile on a transitional system and use
the resultant binary on a non-transitional system. Or rather, the
binary will execute but will be broken.
> I don't have an issue with making my library link to *64 POSIX names.
> Well, I do in the sense that I think the whole approach is wrong, but I
> can deal with it. I just don't want to create new names to export to
> the calling program. I don't want to expand on the POSIX mistake.
>
> I think what you are trying to suggest isn't so much that I have to use
> my own *64 names at the ABI layer (to dynamically link with), but rather,
> that it is difficult to sort out what library name will be linked with
> for systems with both ABIs available.
Yes. And impossible to tell if you are linked against the correct
library; which is a bigger problem if you compile on one system
and run on another (as you would typically do with pkg mgmt).
> And that will end up being a lot
> of systems for a long time because of the lack of an exit strategy in
> the POSIX LFS design. So I take it that you are suggesting that what I
> need to do is have both filesystem size interfaces in a single library
> file, which then requires distinct names for the two different sizes.
Yes.
> If I am correct in that assessment of your position, could you tell me
> your feeling about using *32 names for the 32-bit version of symbols,
> and leave the 64-bit versions as plain names?
Sounds great.
-frank
I do not use autoconf and fully intend to avoid it.
| My point was that users have to select the correct macros anyway, so
| your use of those macros to switch between plain and *64 interface
| names is no extra burden. But I was assuming users correctly choose
| the macros to set. Which is not at all a certainty. I've seen lots
| and lots of open source software which gets it wrong.
Given there are 8 different ways to do this, and apparently only 3 of
them make any sense, I can imagine the difficulties.
|> | If you have identical interface names for the 2 different cases, how
|> | will users of your library determine which library to link against?
|> | No matter how you do it, it will be extra work on the user's part
|> | and IMHO more confusing to debug. It will also be unreliable; you
|> | won't be able to guarantee that the correct library is linked.
|>
|> Users could be making use of non-*64 names AND not define any of the macros
|> on a 32-bit only system (has not begun the transition) and on a 64-bit only
|> system (has completed the transition). What should my library do in these
|> two cases? I see no reason to export to the calling program any *64 variant
|> name.
|
| Well, sure (by definition). For the 32-bit non-LFS case and the 64-bit
| case, you just use the plain interface and you get 32-bit or 64-bit
| data types and file size support.
But then there is the library issue. Either you need two libraries or
you need to make one set of interface symbols use variant names at the
ABI layer.
|> I won't be able to use *64 variant POSIX names. Each of these cases
|> will mean a quite normal compile just as if the LFS strategy had never been
|> created in the first place. The end result will be programs that expect to
|> link to dynamic symbols with no *64 variant names, yet have two different
|> ABIs.
|
| These are 2 different ABIs because they are 2 different architectures.
2 different ABIs, or 2 different sets of names in the same ABI?
At least the implementation I see in Linux has 2 different sets
of names and a single ABI (all in one libc.so).
I want to avoid the 2 different sets of names for my library. I do not
want to end up with binary executables that are trying to link to names
that have "64" in them, especially not on machines that have only one
supported way to access the filesystem (machines that are beyond the
LFS transition ... e.g. "LFS pure").
|> They will have to be linked to a library with the correct ABI, which
|> on these two system cases would be expected to be the only library.
|
| Not on a multilib system. In which case the user is still prevented
| from linking against the wrong library (32-bit apps can't link against
| 64-bit libraries, and vice versa).
Is this a confusion with pointer size differences? There are 4 different
sub-architecture possibilities:
32-bit pointer plus 32-bit file offset
32-bit pointer plus 64-bit file offset
64-bit pointer plus 32-bit file offset (doubtful any of these exist)
64-bit pointer plus 64-bit file offset
For the sake of clarity, and because the 64/32 cases probably do not even
exist anywhere, I'll be just discussing 32-bit pointer based architectures
unless I say otherwise.
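To make the four combinations concrete, here is a small sketch that names a pointer-width/offset-width pairing and reports the one the current compile landed in. The function names `subarch_name()` and `current_subarch()` are invented for this illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Describe one pointer-width/offset-width pairing; NULL if it is not one
 * of the four combinations listed above. */
static const char *subarch_name(int pointer_bits, int offset_bits)
{
    assert(pointer_bits > 0 && offset_bits > 0);
    if (pointer_bits == 32 && offset_bits == 32)
        return "32-bit pointer plus 32-bit file offset";
    if (pointer_bits == 32 && offset_bits == 64)
        return "32-bit pointer plus 64-bit file offset";
    if (pointer_bits == 64 && offset_bits == 32)
        return "64-bit pointer plus 32-bit file offset";
    if (pointer_bits == 64 && offset_bits == 64)
        return "64-bit pointer plus 64-bit file offset";
    return NULL;
}

/* The pairing this translation unit was compiled for. */
static const char *current_subarch(void)
{
    return subarch_name((int)(sizeof(void *) * CHAR_BIT),
                        (int)(sizeof(off_t) * CHAR_BIT));
}
```

Compiling the same file with and without -D_FILE_OFFSET_BITS=64 on a 32-bit pointer system would be expected to move `current_subarch()` between the first two entries.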
|> Now take these two dynamically linked executable binaries over to a third
|> system which is in transition.
|
| And they will work, because on the transitional system, the 32-bit
| apps with 64-bit versions of the interface have *64 names.
What do you mean by "32-bit apps with 64-bit versions"?
| But if you don't have a different interface name for 64-bit data types
| on a 32-bit system, you can't compile on a transitional system and use
| the resultant binary on a non-transitional system. Or rather, the
| binary will execute but will be broken.
Link to a different transitional library. But that is hard because no
standard emerged on how to have this set up to make it easy.
|> I don't have an issue with making my library link to *64 POSIX names.
|> Well, I do in the sense that I think the whole approach is wrong, but I
|> can deal with it. I just don't want to create new names to export to
|> the calling program. I don't want to expand on the POSIX mistake.
|>
|> I think what you are trying to suggest isn't so much that I have to use
|> my own *64 names at the ABI layer (to dynamically link with), but rather,
|> that it is difficult to sort out what library name will be linked with
|> for systems with both ABIs available.
|
| Yes. And impossible to tell if you are linked against the correct
| library; which is a bigger problem if you compile on one system
| and run on another (as you would typically do with pkg mgmt).
Yes, that is a problem. And it is a problem because POSIX chose to not
specify a way to identify the libraries (probably because the scope of
what POSIX is about is the API). IMHO, it should never have been an
API issue (aside from fseek and ftell being broken in POSIX, fixed by
the change to fseeko and ftello). It should have been an architecture
selection issue, with a better means to identify sub-architectures on
mixed-sub-architecture systems (mixed during transition from one to
another).
|> And that will end up being a lot
|> of systems for a long time because of the lack of an exit strategy in
|> the POSIX LFS design. So I take it that you are suggesting that what I
|> need to do is have both filesystem size interfaces in a single library
|> file, which then requires distinct names for the two different sizes.
|
| Yes.
|
|> If I am correct in that assessment of your position, could you tell me
|> your feeling about using *32 names for the 32-bit version of symbols,
|> and leave the 64-bit versions as plain names?
|
| Sounds great.
I might do this, then. Since no common facility exists to properly select
the correct library, it would be difficult to ensure that from just one
library. And the other alternative is to forego any 32-bit offset support
altogether.
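A minimal sketch of that naming scheme, assuming a hypothetical library whose plain names carry 64-bit offsets and whose legacy entry points get a 32 suffix. Everything here (`foo_off_t`, `foo_advance`, `foo_advance32`, `FOO_LEGACY_OFFSETS`) is invented for illustration; the arithmetic is a stand-in for whatever the real function would do.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

typedef int64_t foo_off_t;     /* plain name: 64-bit offsets */
typedef int32_t foo_off32_t;   /* legacy width, only behind the *32 names */

/* Plain name, 64-bit data types: the only version that does real work. */
static int foo_advance(foo_off_t pos, foo_off_t delta, foo_off_t *out)
{
    assert(out != NULL);
    if ((delta > 0 && pos > INT64_MAX - delta) ||
        (delta < 0 && pos < INT64_MIN - delta)) {
        errno = EOVERFLOW;
        return -1;
    }
    *out = pos + delta;
    return 0;
}

/* *32 name: promote the arguments, call the plain version, then emulate
 * the 32-bit limit on the way back. */
static int foo_advance32(foo_off32_t pos, foo_off32_t delta, foo_off32_t *out)
{
    foo_off_t wide;
    assert(out != NULL);
    if (foo_advance(pos, delta, &wide) != 0)
        return -1;
    if (wide > INT32_MAX || wide < INT32_MIN) {
        errno = EOVERFLOW;
        return -1;
    }
    *out = (foo_off32_t)wide;
    return 0;
}

/* In the public header, legacy callers would be redirected at compile
 * time. FOO_LEGACY_OFFSETS is a hypothetical switch, not a standard macro. */
#ifdef FOO_LEGACY_OFFSETS
#define foo_advance foo_advance32
#define foo_off_t   foo_off32_t
#endif
```

The design point is that new binaries reference only the plain symbols, so the *32 names can be dropped once no legacy callers remain.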
How do they do this for mixing 32-bit pointer programs and 64-bit pointer
programs on the same machine where it can support both sub-architectures?
Are there different library files? Or do all the syscalls get *64 variant
names, too?
Part of the reason it's difficult is that the Linux folks haven't
really learned of 'getconf LFS_CFLAGS'.
> |> Users could be making use of non-*64 names AND not define any of
> |> the macros on a 32-bit only system (has not begun the transition)
> |> and on a 64-bit only system (has completed the transition). What
> |> should my library do in these two cases? I see no reason to
> |> export to the calling program any *64 variant name.
> |
> | Well, sure (by definition). For the 32-bit non-LFS case and the 64-bit
> | case, you just use the plain interface and you get 32-bit or 64-bit
> | data types and file size support.
>
> But then there is the library issue. Either you need two libraries or
> you need to make one set of interface symbols use variant names at the
> ABI layer.
When you wrote "64-bit only system" I took that to mean a 64-bit
architecture, not a 32-bit architecture using the 64-bit data types.
So most of the rest of my response (and your followup) doesn't make
sense and I'll just elide it.
....
> I might do this, then. Since no common facility exists to properly select
> the correct library, it would be difficult to ensure that from just one
> library. And the other alternative is to forego any 32-bit offset support
> altogether.
Yeah, in your header file you could just have something like
#if defined(_ILP32) && (_FILE_OFFSET_BITS != 64)
#error "32-bit offset (non-LFS) not supported by libfoo"
#endif
> How do they do this for mixing 32-bit pointer programs and 64-bit pointer
> programs on the same machine where it can support both sub-architectures?
> Are there different library files? Or do all the syscalls get *64 variant
> names, too?
Are you asking how does a 64-bit CPU run 32- and 64-bit programs
simultaneously?
First, both the linker and the runtime loader disallow linking between
different architectures (this is done via ELF magic number stuff). So
there are 2 different versions of each library. But because the same
library names are used (e.g. libc.so), this means that 32-bit and
64-bit programs have different DT_RPATH settings (either embedded in
the app or default chosen by the runtime linker).
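The "ELF magic number stuff" can be illustrated with the identification bytes at the start of an ELF file: four magic bytes followed by a class byte that is 1 for 32-bit objects and 2 for 64-bit objects. The sketch below reads only that class byte; a real linker or loader checks more than this (the machine type, for one), and `elf_word_size()` is a made-up helper name.

```c
#include <assert.h>
#include <stddef.h>

enum { ELF_CLASS_INDEX = 4, ELF_CLASS_32 = 1, ELF_CLASS_64 = 2 };

/* Return 32 or 64 for a buffer that starts with a valid ELF
 * identification, 0 for anything else. */
static int elf_word_size(const unsigned char *hdr, size_t len)
{
    assert(hdr != NULL);
    if (len < 5 || hdr[0] != 0x7f || hdr[1] != 'E' ||
        hdr[2] != 'L' || hdr[3] != 'F')
        return 0;
    if (hdr[ELF_CLASS_INDEX] == ELF_CLASS_32)
        return 32;
    if (hdr[ELF_CLASS_INDEX] == ELF_CLASS_64)
        return 64;
    return 0;
}
```

Feeding it the first few bytes of the 32-bit and 64-bit copies of the same library would be expected to give 32 and 64 respectively, which is the distinction that keeps the two sets of libraries apart.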
32-bit programs make syscalls to the 64-bit kernel via a different
entry point than 64-bit programs make (which allows clearing the upper
half of registers, etc). So I would guess that there are 2 different
kernel functions for the 2 different LFS data sizes. Solaris on SPARC
seems to do it that way:
$ dis -F lseek /usr/lib/libc.so
**** DISASSEMBLER ****
disassembly for /usr/lib/libc.so
section .text
lseek()
lseek: 82 10 20 13 mov 0x13, %g1
lseek+0x4: 91 d0 20 08 ta %icc, %g0 + 8
lseek+0x8: 0a bd 85 dc blu __cerror
lseek+0xc: 01 00 00 00 nop
lseek+0x10: 81 c3 e0 08 retl
lseek+0x14: 01 00 00 00 nop
$ dis -F llseek /usr/lib/libc.so
**** DISASSEMBLER ****
disassembly for /usr/lib/libc.so
section .text
llseek()
llseek: 82 10 20 af mov 0xaf, %g1
llseek+0x4: 91 d0 20 08 ta %icc, %g0 + 8
llseek+0x8: 0a bd 85 ec blu __cerror64
llseek+0xc: 01 00 00 00 nop
llseek+0x10: 81 c3 e0 08 retl
llseek+0x14: 01 00 00 00 nop
$
So for lseek, it's syscall #0x13 and for llseek it's syscall #0xaf
(lseek maps to llseek in the Solaris LFS environment).
The only problem with that is that the 64-bit libc calls syscall #0x13
for BOTH lseek() and llseek(). Therefore it would seem to me that the
lseek() syscall does in fact return 64-bit data.
I suppose in this case, since a 32-bit non-LFS app cannot pass 64-bit
data to lseek(), it is only the return value we care about validating.
And if there is a return value >= 2^31, the different syscall entry
point handler can throw the [valid] return from the syscall away as
invalid. I don't know how to trace an app at that level to verify
this one way or the other. That doesn't seem quite right because the
lseek() would still have been done, and the handler would have to know
whether to expect signed or unsigned return values, and that for signed
values -1 is the only acceptable result. Unless of course 2^31-1 is
the largest possible return value for a 32-bit syscall.
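The worry that the seek "would still have been done" can be shown with a user-space sketch that emulates a 32-bit lseek() on top of the 64-bit one and puts the position back when the result does not fit. This is not a claim about what Solaris or any kernel does; `seek32()` is a hypothetical wrapper, and it only demonstrates the check-and-undo idea.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#ifndef _FILE_OFFSET_BITS
#define _FILE_OFFSET_BITS 64   /* request a 64-bit off_t where that is optional */
#endif
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>             /* tmpfile() and fileno(), for trying it out */
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical legacy-style seek: 32-bit offset in, 32-bit position out. */
static int32_t seek32(int fd, int32_t offset, int whence)
{
    off_t before = lseek(fd, 0, SEEK_CUR);   /* remember where we were */
    if (before == (off_t)-1)
        return -1;

    off_t after = lseek(fd, (off_t)offset, whence);
    if (after == (off_t)-1)
        return -1;

    if (after > INT32_MAX) {
        /* The seek already happened, so undo it before reporting failure. */
        lseek(fd, before, SEEK_SET);
        errno = EOVERFLOW;
        return -1;
    }
    return (int32_t)after;
}
```

The two-step seek is not atomic, so another thread sharing the descriptor could observe the intermediate position; a kernel-side check would not have that problem.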
Well, as you can see, I don't really know how the syscall part works.
-frank
================================================================
phil@varuna:/home/phil 916> getconf LFS_CFLAGS
-D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64
phil@varuna:/home/phil 917>
================================================================
It seems to do something. But what is that telling me? I thought that
the macros that provide information from the system to the program had
different names, e.g. _LFS_LARGEFILE and _LFS64_LARGEFILE.
How can getconf know whether a program is written to use 64 bits with
the traditional non-64 names, or is written to use the alternate names?
It certainly cannot if not given any such program to examine.
All that getconf can be expected to do is provide information about what
the system implementation is capable of providing.
|> |> Users could be making use of non-*64 names AND not define any of
|> |> the macros on a 32-bit only system (has not begun the transition)
|> |> and on a 64-bit only system (has completed the transition). What
|> |> should my library do in these two cases? I see no reason to
|> |> export to the calling program any *64 variant name.
|> |
|> | Well, sure (by definition). For the 32-bit non-LFS case and the 64-bit
|> | case, you just use the plain interface and you get 32-bit or 64-bit
|> | data types and file size support.
|>
|> But then there is the library issue. Either you need two libraries or
|> you need to make one set of interface symbols use variant names at the
|> ABI layer.
|
| When you wrote "64-bit only system" I took that to mean a 64-bit
| architecture, not a 32-bit architecture using the 64-bit data types.
Sorry. I'm talking about the opaque data types to deal with file offsets
and file sizes, structs that contain them as members, and the functions
that work with these types and structs.
| So most of the rest of my response (and your followup) doesn't make
| sense and I'll just elide it.
Oh.
| ....
|> I might do this, then. Since no common facility exists to properly select
|> the correct library, it would be difficult to ensure that from just one
|> library. And the other alternative is to forego any 32-bit offset support
|> altogether.
|
| Yeah, in your header file you could just have something like
|
| #if defined(_ILP32) && (_FILE_OFFSET_BITS != 64)
| #error "32-bit offset (non-LFS) not supported by libfoo"
| #endif
But I will support it. If my library gets compiled on a system that has no
LFS support, I want it to work just fine with the size that is available.
|> How do they do this for mixing 32-bit pointer programs and 64-bit pointer
|> programs on the same machine where it can support both sub-architectures?
|> Are there different library files? Or do all the syscalls get *64 variant
|> names, too?
|
| Are you asking how does a 64-bit CPU run 32- and 64-bit programs
| simultaneously?
|
| First, both the linker and the runtime loader disallow linking between
| different architectures (this is done via ELF magic number stuff). So
| there are 2 different versions of each library. But because the same
| library names are used (e.g. libc.so), this means that 32-bit and
| 64-bit programs have different DT_RPATH settings (either embedded in
| the app or default chosen by the runtime linker).
|
| 32-bit programs make syscalls to the 64-bit kernel via a different
| entry point than 64-bit programs make (which allows clearing the upper
| half of registers, etc). So I would guess that there are 2 different
| kernel functions for the 2 different LFS data sizes. Solaris on SPARC
| seems to do it that way:
At the kernel ABI layer, there does not necessarily need to be different
syscalls. With a translating library dynamically linked in, a call to a
32-bit function can still call a 64-bit kernel interface in most cases.
A few cases like open() would still need to inform the kernel so it can
flag the descriptor to limit access to only those files that 32-bit
offsets can work with.
For example, a program expecting struct stat with 32-bit members calls
stat() via being linked to a translating library. The stub code in that
library for stat() will call the kernel interface with a 64-bit expectation
(passes a pointer to a 64-bit struct stat in most implementations). If the
kernel returns an error, the stub returns that error. Else it then tests
the values to see if they properly fit in a 32-bit struct stat. If any do
not, it returns an error. If they all do fit, it fills in the caller's
32-bit struct stat and returns success.
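The narrowing step of such a stub could look roughly like this. The two structs are hypothetical stand-ins that model only two width-sensitive fields, not the real struct stat layouts, and the kernel call itself is left out. EOVERFLOW is the error POSIX specifies for stat() when a value cannot be represented in the caller's structure.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the 64-bit and 32-bit layouts. */
struct wide_stat   { int64_t size; int64_t blocks; };
struct narrow_stat { int32_t size; int32_t blocks; };

static int fits32(int64_t v)
{
    return v >= INT32_MIN && v <= INT32_MAX;
}

/* Copy a 64-bit result into the caller's 32-bit struct. On overflow,
 * fail with EOVERFLOW and leave *out untouched. */
static int narrow_stat_result(const struct wide_stat *in,
                              struct narrow_stat *out)
{
    assert(in != NULL && out != NULL);
    if (!fits32(in->size) || !fits32(in->blocks)) {
        errno = EOVERFLOW;
        return -1;
    }
    out->size = (int32_t)in->size;
    out->blocks = (int32_t)in->blocks;
    return 0;
}
```

Checking every field before writing any of them means a failed call never leaves the caller's struct half filled.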
Thus a kernel ABI could have mostly only 64-bit file offset/size interfaces
and still support programs expecting 32-bit offset/size interfaces by means
of a different library that presents a 32-bit program ABI and talks to the
kernel's 64-bit ABI. A similar method could be done for 32-bit pointer ABI
interfaces as well. Either there would be separate libraries loaded for
each set of functions (the dynamic link would have to know how to do this
as a faking of a single libc), or there would be a combination of libraries
to handle the variant mixes of pointer bits and offset bits that might exist.
That's plausible.
| I suppose in this case, since a 32-bit non-LFS app cannot pass 64-bit
| data to lseek(), it is only the return value we care about validating.
| And if there is a return value >= 2^31, the different syscall entry
| point handler can throw the [valid] return from the syscall away as
| invalid. I don't know how to trace an app at that level to verify
| this one way or the other. That doesn't seem quite right because the
| lseek() would still have been done, and the handler would have to know
| whether to expect signed or unsigned return values, and that for signed
| values -1 is the only acceptable result. Unless of course 2^31-1 is
| the largest possible return value for a 32-bit syscall.
|
| Well, as you can see, I don't really know how the syscall part works.
As long as we are interfacing to a standard API via a library, we should
not have to know in order to achieve correct operation on a system that
has the capability to perform as expected.
I'll eventually dig my old 32-bit Sparc machines out and set them up.
I have an old Solaris 7 to run on them (as well as OpenBSD and Splack).
And I got some Ultra 10's being tossed out from work and could download
a Solaris 10 for those (hopefully it still supports those machines).
Then I'd have at least a couple diverse test points to see how well my
code does, in addition to the x86 Linux systems I have now, with x86-64
coming soon, plus all the emulation architectures in QEMU and Hercules.
Maybe you're thinking of sysconf()?
> How can getconf know whether a program is written to use 64 bits with
> the traditional non-64 names, or is written to use the alternate names?
> It certainly cannot if not given any such program to examine.
>
> All that getconf can be expected to do is provide information about what
> the system implementation is capable of providing.
'getconf LFS_CFLAGS' tells you what to pass to the preprocessor (it
should really be 'getconf LFS_CPPFLAGS') to enable the LFS
environment. My point about Linux is that much software simply makes
up what they think the flags should be, rather than calling getconf to
find out. This makes it difficult for you to correctly write a
portable library which DTRT, since the #ifdef'd flags you use to
enable LFS (and which choose functionality in the user-included header
file for your library), even though correct, may not be the ones a
user of your library uses.
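For anyone who wants the same answer from inside a program rather than from the shell, confstr() is the C-level interface for this kind of configuration string. The sketch below assumes the system headers provide a `_CS_LFS_CFLAGS` constant (glibc does; treat availability elsewhere as an assumption) and falls back to "not available" otherwise. `lfs_cflags()` is a made-up helper name.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Copy the LFS preprocessor flags into buf.
 * Returns 0 on success (buf may be empty when no flags are needed),
 * -1 if the system does not expose the value or buf is too small. */
static int lfs_cflags(char *buf, size_t size)
{
    assert(buf != NULL && size > 0);
    buf[0] = '\0';
#ifdef _CS_LFS_CFLAGS
    size_t need = confstr(_CS_LFS_CFLAGS, buf, size);
    if (need == 0 || need > size) {
        buf[0] = '\0';
        return -1;
    }
    return 0;
#else
    return -1;
#endif
}
```

An empty string with a return of 0 would mean the system says no extra flags are needed, which is different from the value not being available at all.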
> | ....
> |> I might do this, then. Since no common facility exists to properly select
> |> the correct library, it would be difficult to ensure that from just one
> |> library. And the other alternative is to forego any 32-bit offset support
> |> altogether.
> |
> | Yeah, in your header file you could just have something like
> |
> | #if defined(_ILP32) && (_FILE_OFFSET_BITS != 64)
> | #error "32-bit offset (non-LFS) not supported by libfoo"
> | #endif
>
> But I will support it. If my library gets compiled on a system that has no
> LFS support, I want it to work just fine with the size that is available.
You just said, "the other alternative is to forego 32-bit offset
support". I thought you meant you might do that.
-frank
No. I was referring to the macros _LFS_LARGEFILE and _LFS64_LARGEFILE as
defined in the POSIX LFS extensions.
http://www.unix.org/version2/whatsnew/lfs20mar.html
|> How can getconf know whether a program is written to use 64 bits with
|> the traditional non-64 names, or is written to use the alternate names?
|> It certainly cannot if not given any such program to examine.
|>
|> All that getconf can be expected to do is provide information about what
|> the system implementation is capable of providing.
|
| 'getconf LFS_CFLAGS' tells you what to pass to the preprocessor (it
| should really be 'getconf LFS_CPPFLAGS') to enable the LFS
| environment. My point about Linux is that much software simply makes
| up what they think the flags should be, rather than calling getconf to
| find out. This makes it difficult for you to correctly write a
| portable library which DTRT, since the #ifdef'd flags you use to
| enable LFS (and which choose functionality in the user-included header
| file for your library), even though correct, may not be the ones a
| user of your library uses.
I don't think they make it up. I get it from the document I read:
http://www.unix.org/version2/whatsnew/lfs20mar.html
If someone else might be using different flags that are not listed in
the above document, and not in the POSIX standard per se, then they are
beyond my interest in supporting them (unless they can make a good
argument on why I should adopt what they are doing).
|> |> I might do this, then. Since no common facility exists to properly select
|> |> the correct library, it would be difficult to ensure that from just one
|> |> library. And the other alternative is to forego any 32-bit offset support
|> |> altogether.
|> |
|> | Yeah, in your header file you could just have something like
|> |
|> | #if defined(_ILP32) && (_FILE_OFFSET_BITS != 64)
|> | #error "32-bit offset (non-LFS) not supported by libfoo"
|> | #endif
|>
|> But I will support it. If my library gets compiled on a system that has no
|> LFS support, I want it to work just fine with the size that is available.
|
| You just said, "the other alternative is to forego 32-bit offset
| support". I thought you meant you might do that.
I thought about it in terms of the impact. I want to get to a point where
32-bit file offset support is fully deprecated. But that is going to be a
very very long time, especially with embedded systems.
The whole purpose of opaque datatypes was to allow correctly written code
to simply be recompiled in another architecture or subarchitecture (where
a change in file offset size is a subarchitecture) and just work. But too
many people write bad code, and too many people made special hacks to some
systems and became dependent on it, and the standards people fell behind
and ended up being swayed to adopt bad choices in a hurry. That and the
standard itself had some defects that should have been fixed earlier (e.g.
the ftell/ftello and fseek/fseeko issue) like in version 1.
We haven't achieved true opaque datatypes. Many people envision that it
will be quite a while before we need 128 bit file offset types. Do you
think we'll have things fixed by then? That should be plenty of time,
right?
I've periodically imagined putting together my own low level language much
like C (very much like C, actually, but a few important differences) which
would avoid many of the issues C has. That thought keeps bubbling up as
it might also be a way to drive a new cleaner system interface as well.
One thing my language would have is the ability to define data types as
any size you wish (from among what is available) rather easily, but only
by defining a type. You have to define your own type, then define variables
to that type. You can't simply make foo be an integer of 32 bits; you must
define a type of your own as a 32 bit integer then make foo be an instance
of that new type. It will force at least the motion of going about creating
opaque types (and hopefully some programmers would even do it sensibly).
Ah, sure. But those aren't necessarily what you, the programmer, have
to define. Like how _GNU_SOURCE defines other things,
_LARGEFILE_SOURCE defines the right stuff for you. In fact, section
3.3.1 and 3.3.2 seem to say that you (the programmer) should *not*
define _LFS_LARGEFILE, rather unistd.h will define it for you.
-frank
But my point is, _LARGEFILE_SOURCE, _LARGEFILE64_SOURCE, _FILE_OFFSET_BITS
should be defined by the program source. Things like _LFS_LARGEFILE and
_LFS64_LARGEFILE, get defined by the system to indicate what is, or could
be, available. What I have not discovered, yet (too many things to try),
is if _LFS_LARGEFILE and _LFS64_LARGEFILE are defined to say what _could_
be requested, or defined to say what _has_ been requested.
> |> No. I was referring to the macros _LFS_LARGEFILE and _LFS64_LARGEFILE as
> |> defined in the POSIX LFS extensions.
> |>
> |> http://www.unix.org/version2/whatsnew/lfs20mar.html
I'm afraid you may be misunderstanding the status of the above
document. It was a white paper which did two things:
1. It proposed additions to SUS (the Single UNIX Specification) related
to large file support.
2. It proposed some transitional interfaces which implementors could
provide to help convert existing programs to be large file capable.
The transitional interfaces are not, and never have been, included in
POSIX or SUS. I seem to recall that some implementors provided
different transitional interfaces (lf_open() instead of open64()).
The proposed additions to SUS were included in SUSv2 (possibly with
minor modifications). SUSv2 has been superseded by SUSv3/POSIX.1-2001
and some changes to the LFS-related interfaces have occurred.
Thus the white paper is really of no use in writing current
applications; it is purely of historical interest.
> |
> | Ah, sure. But those aren't necessarily what you, the programmer, have
> | to define. Like how _GNU_SOURCE defines other things,
> | _LARGEFILE_SOURCE defines the right stuff for you. In fact, section
> | 3.3.1 and 3.3.2 seem to say that you (the programmer) should *not*
> | define _LFS_LARGEFILE, rather unistd.h will define it for you.
>
> But my point is, _LARGEFILE_SOURCE, _LARGEFILE64_SOURCE, _FILE_OFFSET_BITS
> should be defined by the program source. Things like _LFS_LARGEFILE and
> _LFS64_LARGEFILE, get defined by the system to indicate what is, or could
> be, available. What I have not discovered, yet (too many things to try),
> is if _LFS_LARGEFILE and _LFS64_LARGEFILE are defined to say what _could_
> be requested, or defined to say what _has_ been requested.
None of these things are in POSIX or SUS. You should try to stick to
the standard interfaces rather than using the obsolete, non-standard,
transitional interfaces.
The standard specifies two different programming environments for
32-bit systems. One has a 32-bit off_t, the other has a 64-bit (or
greater) off_t. You can use sysconf() or getconf to query which of
the programming environments are supported, and confstr() or getconf
to find out what compiler flags, linker flags, and libraries are
needed for each environment.
--
Geoff Clare <net...@gclare.org.uk>
So where is a clear and correct document that tells specifically what does
or does not happen with the specific macros defined or not?
|> | Ah, sure. But those aren't necessarily what you, the programmer, have
|> | to define. Like how _GNU_SOURCE defines other things,
|> | _LARGEFILE_SOURCE defines the right stuff for you. In fact, section
|> | 3.3.1 and 3.3.2 seem to say that you (the programmer) should *not*
|> | define _LFS_LARGEFILE, rather unistd.h will define it for you.
|>
|> But my point is, _LARGEFILE_SOURCE, _LARGEFILE64_SOURCE, _FILE_OFFSET_BITS
|> should be defined by the program source. Things like _LFS_LARGEFILE and
|> _LFS64_LARGEFILE, get defined by the system to indicate what is, or could
|> be, available. What I have not discovered, yet (too many things to try),
|> is if _LFS_LARGEFILE and _LFS64_LARGEFILE are defined to say what _could_
|> be requested, or defined to say what _has_ been requested.
|
| None of these things are in POSIX or SUS. You should try to stick to
| the standard interfaces rather than using the obsolete, non-standard,
| transitional interfaces.
|
| The standard specifies two different programming environments for
| 32-bit systems. One has a 32-bit off_t, the other has a 64-bit (or
| greater) off_t. You can use sysconf() or getconf to query which of
| the programming environments are supported, and confstr() or getconf
| to find out what compiler flags, linker flags, and libraries are
| needed for each environment.
What is the value of sysconf() or getconf for each of the cases:
1. If only 32-bit is supported
2. If only 64-bit is supported
3. If both are supported
For my macros that are part of my library, how should they test for which
environment is being used? What if the calling program has selected a
transitional environment?
> |> |> http://www.unix.org/version2/whatsnew/lfs20mar.html
> |
> | I'm afraid you may be misunderstanding the status of the above
> | document. It was a white paper which did two things:
> |
> | 1. It proposed additions to SUS (the Single UNIX Specification) related
> | to large file support.
> |
> | 2. It proposed some transitional interfaces which implementors could
> | provide to help convert existing programs to be large file capable.
> |
> | The transitional interfaces are not, and never have been, included in
> | POSIX or SUS. I seem to recall that some implementors provided
> | different transitional interfaces (lf_open() instead of open64()).
> |
> | The proposed additions to SUS were included in SUSv2 (possibly with
> | minor modifications). SUSv2 has been superseded by SUSv3/POSIX.1-2001
> | and some changes to the LFS-related interfaces have occurred.
> |
> | Thus the white paper is really of no use in writing current
> | applications; it is purely of historical interest.
>
> So where is a clear and correct document that tells specifically what does
> or does not happen with the specific macros defined or not?
http://www.unix.org/version3/online.html
> |> | Ah, sure. But those aren't necessarily what you, the programmer, have
> |> | to define. Like how _GNU_SOURCE defines other things,
> |> | _LARGEFILE_SOURCE defines the right stuff for you. In fact, section
> |> | 3.3.1 and 3.3.2 seem to say that you (the programmer) should *not*
> |> | define _LFS_LARGEFILE, rather unistd.h will define it for you.
> |>
> |> But my point is, _LARGEFILE_SOURCE, _LARGEFILE64_SOURCE, _FILE_OFFSET_BITS
> |> should be defined by the program source. Things like _LFS_LARGEFILE and
> |> _LFS64_LARGEFILE, get defined by the system to indicate what is, or could
> |> be, available. What I have not discovered, yet (too many things to try),
> |> is if _LFS_LARGEFILE and _LFS64_LARGEFILE are defined to say what _could_
> |> be requested, or defined to say what _has_ been requested.
> |
> | None of these things are in POSIX or SUS. You should try to stick to
> | the standard interfaces rather than using the obsolete, non-standard,
> | transitional interfaces.
> |
> | The standard specifies two different programming environments for
> | 32-bit systems. One has a 32-bit off_t, the other has a 64-bit (or
> | greater) off_t. You can use sysconf() or getconf to query which of
> | the programming environments are supported, and confstr() or getconf
> | to find out what compiler flags, linker flags, and libraries are
> | needed for each environment.
>
> What is the value of sysconf() or getconf for each of the cases:
>
> 1. If only 32-bit is supported
> 2. If only 64-bit is supported
> 3. If both are supported
sysconf(_SC_V6_ILP32_OFF32) or getconf _POSIX_V6_ILP32_OFF32
tells you if an "ILP32" (32-bit int, long and pointer) programming
environment with 32-bit off_t is supported.
sysconf(_SC_V6_ILP32_OFFBIG) or getconf _POSIX_V6_ILP32_OFFBIG
tells you if an ILP32 programming environment with 64-bit (or
greater) off_t is supported.
Combining the two results will answer each of your three questions
(unless you specifically wanted 64-bit off_t, not 64-bit or greater).
> For my macros that are part of my library, how should they test for which
> environment is being used?
I don't know of any direct way, nor can I see the need for such a
thing. Perhaps you need to look in a different way at the problem
you are trying to solve.
> What if the calling program has selected a
> transitional environment?
Non-standard environments are (naturally) not covered by the standard.
--
Geoff Clare <net...@gclare.org.uk>
|> So where is a clear and correct document that tells specifically what does
|> or does not happen with the specific macros defined or not?
|
| http://www.unix.org/version3/online.html
I'll give that a look later on.
If off_t is greater, I don't think that will be an issue.
But the problem is, I want to be able to do this test during the
compile using CPP tests like #ifdef so that my macro definitions
can be done correctly.
|
|> For my macros that are part of my library, how should they test for which
|> environment is being used?
|
| I don't know of any direct way, nor can I see the need for such a
| thing. Perhaps you need to look in a different way at the problem
| you are trying to solve.
The macros will need to know which functions to call within my own library
to get the correct behaviour semantics. An example is the lstat() syscall,
which in a 32-bit off_t environment will fail for a file which cannot be
described in the corresponding struct stat.
Otherwise, I would have to go back to the idea of having 2 different .so
files (and probably 2 different .a files) for my library, one for 32-bit
off_t and one for 64-bit off_t.
|> What if the calling program has selected a
|> transitional environment?
|
| Non-standard environments are (naturally) not covered by the standard.
Nevertheless, it seems they are as prevalent as if it had been standardized.
That brings us full circle to the point I made in my first post in
this thread. The only fully portable solution (i.e. to all POSIX
systems, both current and future) is to have two separate libraries.
Mixing ILP32_OFF32 and ILP32_OFFBIG objects may work on (some) current
systems, but as far as POSIX is concerned they are separate programming
environments and implementations can refuse to allow you to mix them.
--
Geoff Clare <net...@gclare.org.uk>
| That brings us full circle to the point I made in my first post in
| this thread. The only fully portable solution (i.e. to all POSIX
| systems, both current and future) is to have two separate libraries.
OK, so suppose I do that. How will you distinguish these libraries?
One problem is there is no real mechanism for handling subarchitectures
with different ABIs in a distinct way.
| Mixing ILP32_OFF32 and ILP32_OFFBIG objects may work on (some) current
| systems, but as far as POSIX is concerned they are separate programming
| environments and implementations can refuse to allow you to mix them.
What's important is the ability to match up binary executables compiled
for each environment (subarchitecture) with the appropriate library.
> | That brings us full circle to the point I made in my first post in
> | this thread. The only fully portable solution (i.e. to all POSIX
> | systems, both current and future) is to have two separate libraries.
>
> OK, so suppose I do that. How will you distinguish these libraries?
That's entirely up to you. One common convention is to put them
in separate directories and use -L to choose the directory, so
that the library can still have its normal name in each case.
(Of course for shared libraries you also need to set rpath in the
executable so the right library version will be loaded at run time.)
--
Geoff Clare <net...@gclare.org.uk>
One issue is that whatever is done to an executable, that executable must
be able to run on any system that supports executables of that pointer
size and that offset size, whether or not other sizes are supported. So,
clearly, if an rpath setting is used for this, it would have to be a broad
standard. This is yet another area that POSIX has stumbled on by not
having established a uniform standard in the first place. I'm sure their
excuse will be that POSIX is only about APIs, not about the underlying
mechanisms that various systems use. But if POSIX doesn't want to take
the lead on this, who should? LSB for Linux? Who for other Unix systems?
Maybe we need to have a standard "ABI connectivity" naming scheme somewhat
like C++ does for their object class interface linkages. But in this case
it would be for libraries. It would need to address both pointer size and
offset size (although I suspect there won't be any real life usage of 64
bit pointer and 32 bit offset, although I would not be 100% sure). How
about names like: libc.32.64.so (first is pointer size, second is offset
size)?
BTW, it's actually possible on the 32-bit x86 to use pointers up to 48 bits
without needing any additional instructions from the 64-bit architecture.
The catch is it involves some segment register juggling and corresponding
instruction generation from the compiler. AFAIK, no toolchain or kernel
has been made to support this hacked mode, so maybe we'll never need to
deal with it. I don't know if any other 32-bit CPU has any such capability
to go beyond 32 bits for pointers and address space.
Applications can still only see 32 bits worth of address space, so this
isn't useful for LFS (e.g. you can't mmap a very large file in).
It's called PAE and is supported by Linux. I think also by Solaris,
but I'm not 100% sure.
-frank
If your goal is to be able to randomly access different parts of a file
without having to call mmap() more than once, then it certainly is an
advantage to be accessing the large file from a 64-bit address space.
But many of the advantages of using mmap() still work when mapping only
a portion of a file at one time. Programs accessing large files in a
32-bit address space would be restricted to that method.
| It's called PAE and is supported by Linux. I think also by Solaris,
| but I'm not 100% sure.
I was referring to using it at the process/VM layer. It could be done.
AFAIK, this isn't implemented in Linux. I don't know about Solaris.
How could it possibly be done at the process layer? The application
expects a contiguous memory layout.
-frank
The process layer ABI would have to not expect a contiguous memory layout.
It would be faked for the C programming layer by having the compiler for
this kind of sub-architecture present the appearance of a flat layout by
the use of code that loads 6 byte or 8 byte pointer types into appropriate
segment and offset registers. It could allow applications to run in a VM
that in theory could be as large as nearly 281474976710656 bytes, at the
expense of being a bit slower due to all the extra instructions handling
pointers in an odd way.
One of the C compiler memory models for the 16-bit real mode 8086 allowed
using a 32-bit pointer with values up to 1MB (or modulo 1MB) by splitting
the pointer bits between a 16-bit segment and a 16-bit address. I wrote
and ran programs with that memory model under MS-DOS.
How can you have a 64-bit integer (long long) on a machine that only has
instructions for integers up to 32 bits? You just use compiler tricks.
There's no reason we can't have int128_t and uint128_t other than no one
has modified the C compilers to do it (yet).
Actually, we could build a completely fake 64-bit pointer size environment
at the C programming level, complete with 64-bit long, 64-bit off_t, and
many other things at the same size as in a real 64-bit machine, and generate
the code strictly for a 32-bit machine. It would have limitations, such as
the largest addressable memory, depending on the ABI layer sub-architecture
choice involved. It wouldn't have a great deal of value, as it would waste
memory for no real benefit, other than as a test realm for making sure code
works correctly for the coming 64-bit machines. Now that the latter really
are here, this benefit is no more. But it could have been done a long time
ago if enough of the right people had thought out of the box.