Questions about pointer comparisons

rap...@gmail.com

unread,

Dec 1, 2009, 6:54:22 AM12/1/09

to

Assuming this program:

#include <stdio.h>

int main( int argc, char *argv[] )
{

int a[100];

int b[100];

int *pa;

int *b_top;

pa = a + 20;

b_top = b + 99;

if( (pa >= b) && (pa<=b_top) )
{
printf( "Pa is pointing to an element in array b\n" );
}
else
{
printf( "Pa is not pointing to an element in array b\n" );
}

}

Will the else branch always execute in theory (it probably will in
practice)?

Does the standard define what comparison between pointers which point
to different arrays do?

In principle, there could be a rule that for all possible pointers,
there must be an ordering.

What if the pointers were void pointers in the comparisons?

i.e. would

((void *)Pa) < ((void *)b_top)

give the same answer as

Pa < b_top

for all Pa and b_top

Jens Thoms Toerring

unread,

Dec 1, 2009, 7:35:29 AM12/1/09

to

rap...@gmail.com <rap...@gmail.com> wrote:
> Assuming this program:

> #include <stdio.h>

> int main( int argc, char *argv[] )
> {
> int a[100];
> int b[100];
> int *pa;
> int *b_top;

> pa = a + 20;
> b_top = b + 99;

> if( (pa >= b) && (pa<=b_top) )
> {
> printf( "Pa is pointing to an element in array b\n" );
> }
> else
> {
> printf( "Pa is not pointing to an element in array b\n" );
> }
> }

> Will the else branch always execute in theory (it probably will in
> practice)?

> Does the standard define what comparison between pointers which point
> to different arrays do?

Yes, the standard is quite clear about this in the section about
"Relational operators":

If the objects pointed to are not members of the same aggregate
or union object, the result is undefined, with the following
exception. If P points to the last member of an array object
and Q points to a member of the same array object, the pointer
expression P+1 compares higher than Q , even though P+1 does
not point to a member of the array object.

I.e. by comparing pointers to different arrays you invoke un-
defined behaviour and the result is meaningless.

> In principle, there could be a rule that for all possible pointers,
> there must be an ordering.

That would require that each machine a C can be written has to
have a flat memory model. And what do you hope to gain by com-
paring pointers to different objects? In your example program
the compiler could put 'a' before 'b' or also the other way
round. And it, in principle, could also put some extra padding
in between. So what would be the benefit of being able to com-
pare pointers to somehwere within 'a' and 'b' (except maybe
figuring out something about how your compiler lays out the
arrays in memory (for the set of options you invoked it with),d
something you can't rely on)?

> What if the pointers were void pointers in the comparisons?

> i.e. would

> ((void *)Pa) < ((void *)b_top)

> give the same answer as

> Pa < b_top

> for all Pa and b_top

Casting to void pointers doesn't change anything about the
basic fact that it invokes undefined behaviour. I would say
it just makes things a bit worse since a comparison like
a < b implies a - b < 0, and arithmetic on void poiters is
also undefined.
Regards, Jens
--
\ Jens Thoms Toerring ___ j...@toerring.de
\__________________________ http://toerring.de

Nick Keighley

unread,

Dec 1, 2009, 7:48:03 AM12/1/09

to

On 1 Dec, 11:54, "raph...@gmail.com" <raph...@gmail.com> wrote:

> Assuming this program:

I've reduced the vertical space (I was tempted to fix the indentation
as well...)

> #include <stdio.h>
>
> int main( int argc, char *argv[] )
> {
> int a[100];
> int b[100];
> int *pa;
> int *b_top;
>
> pa = a + 20;
> b_top = b + 99;
> if( (pa >= b) && (pa<=b_top) )
> printf( "Pa is pointing to an element in array b\n" );
> else
> printf( "Pa is not pointing to an element in array b\n" );
> }
>

> Will the else branch always execute in theory[?]

no

> (it probably will in practice)?

probably, on a typical modern desktop

> Does the standard define what comparison between pointers which point
> to different arrays do?

no (it says the behaviour is undefined)

> In principle, there could be a rule that for all possible pointers,
> there must be an ordering.

there could be such a rule, but there isn't

> What if the pointers were void pointers in the comparisons?

the behaviour still wouldn't be defined

> i.e. would
>
> ((void *)Pa) < ((void *)b_top)
>
> give the same answer as
>
> Pa < b_top
>
> for all Pa and b_top

you don't know, the behaviour is undefined in either case and may not
necessarily give the same behaviour on different implementations.

Pillsy

unread,

Dec 1, 2009, 10:52:31 AM12/1/09

to

On Dec 1, 7:35 am, j...@toerring.de (Jens Thoms Toerring) wrote:
[...]
> And what do you hope to gain by comparing pointers to different objects?
[...]
Perhaps you want to use that ordering as the basis for a set
represented as a binary tree? AIUI, that's how things work in the C++
STL, and it seems reasonable to want to use the same implementation
strategy in C.

Cheers,
Pillsy

Nick Keighley

unread,

Dec 1, 2009, 11:03:18 AM12/1/09

to

On 1 Dec, 15:52, Pillsy <pillsb...@gmail.com> wrote:
> On Dec 1, 7:35 am, j...@toerring.de (Jens Thoms Toerring) wrote:

> > And what do you hope to gain by comparing pointers to different objects?
>

> Perhaps you want to use that ordering as the basis for a set
> represented as a binary tree? AIUI, that's how things work in the C++
> STL, and it seems reasonable to want to use the same implementation
> strategy in C.

I don't think C++ allows you compare pointers to different objects
either. A particular implementation may do this but they are gambling
that they never port to somewhere this assumption is broken. I suppose
on most implementaions you can rely on them being in *some* order and
that distinct objects can be compared and won;t do anything bizzare
like over lapping or interleaving. Some of the mainframe people may be
slightly scared by the idea of trying to get this to work...

Pillsy

unread,

Dec 1, 2009, 11:28:21 AM12/1/09

to

On Dec 1, 11:03 am, Nick Keighley <nick_keighley_nos...@hotmail.com>
wrote:
[...]

> > Perhaps you want to use that ordering as the basis for a set
> > represented as a binary tree? AIUI, that's how things work in the C++
> > STL, and it seems reasonable to want to use the same implementation
> > strategy in C.

> I don't think C++ allows you compare pointers to different objects
> either.

I don't want to wander too deep into the dubiously topical weeds of
how C++ works, but (again AIUI) the STL provides a default ordering on
pointers using comparison using std::less, not that you can use the
"<" operator.

> A particular implementation may do this but they are gambling
> that they never port to somewhere this assumption is broken.

Well, yeah, but that's the point of making it part of the standard
library, isn't it?

> I suppose on most implementaions you can rely on them being in
> *some* order and that distinct objects can be compared and won;t
> do anything bizzare like over lapping or interleaving.

I don't see how even overlapping or interleaving prevents you from
defining an ordering that's sufficient for the purpose at hand
(stuffing things in a binary tree). Then again, I'm not a mainframe
programmer, and it's more than possible that I'm missing something.

Cheers,
Pillsy

Seebs

unread,

Dec 1, 2009, 1:27:09 PM12/1/09

to

On 2009-12-01, rap...@gmail.com <rap...@gmail.com> wrote:
> Will the else branch always execute in theory (it probably will in
> practice)?

I wouldn't say it "always" will in practice -- I'd guess it will fairly
often, on many compilers.

> Does the standard define what comparison between pointers which point
> to different arrays do?

No, but it specifies what it does -- it invokes undefined behavior.

> In principle, there could be a rule that for all possible pointers,
> there must be an ordering.

But this rule would have been extremely hard to implement on some fairly
widespread targets, so it isn't there.

> What if the pointers were void pointers in the comparisons?

No effect.

> ((void *)Pa) < ((void *)b_top)
>
> give the same answer as
>
> Pa < b_top
>
> for all Pa and b_top

Interesting question. I'd guess that the answer is likely that they'd do
the same thing (which might be failing dismally) on essentially every target,
but it's still undefined behavior unless they're both pointers into the same
object.

-s
--
Copyright 2009, all wrongs reversed. Peter Seebach / usenet...@seebs.net
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!

Nick

unread,

Dec 1, 2009, 4:02:18 PM12/1/09

to

Which is something of a shame - although finding out if the array 'a' is
lower in memory than the array 'b' could be meaningless on machines
without a flat memory model, find out if a particular pointer is pointing
to a place within a particular object could, conceivably, be useful in
some rather strange circumstances - the OPs program is asking a not
silly question (I have a pointer to an integer, and an array of
integers, is my pointer pointing to a place inside the array).

I can see why - it's hard to see how you can legitimise it while keeping
if(a < b) outlawed - but I could also see a use for it.
--
Online waterways route planner: http://canalplan.org.uk
development version: http://canalplan.eu

Keith Thompson

unread,

Dec 1, 2009, 4:07:33 PM12/1/09

to

Nick <3-no...@temporary-address.org.uk> writes:
> j...@toerring.de (Jens Thoms Toerring) writes:

[...]

>> I.e. by comparing pointers to different arrays you invoke un-
>> defined behaviour and the result is meaningless.
>
> Which is something of a shame - although finding out if the array 'a' is
> lower in memory than the array 'b' could be meaningless on machines
> without a flat memory model, find out if a particular pointer is pointing
> to a place within a particular object could, conceivably, be useful in
> some rather strange circumstances - the OPs program is asking a not
> silly question (I have a pointer to an integer, and an array of
> integers, is my pointer pointing to a place inside the array).
>
> I can see why - it's hard to see how you can legitimise it while keeping
> if(a < b) outlawed - but I could also see a use for it.

Agreed. (You can achieve the same thing in 100% portable C by
comparing the address for equality to the address of each byte
within the array, but that's not feasible for large arrays -- and
the simpler method, though not guaranteed to work, will *probably*
work on most implementations.)

As for why the standard doesn't guarantee anything, consider a
system with a segmented addressing scheme, where an address consists
of something that identifies a segment plus an offset within the
segment. Each object must be contained within a single segment.
"<" and ">" comparisons on pointers can then compare just the offset
portion of the addresses, which may be significantly more efficient.

The standard could have required "<" and ">" to work sensibly
on pointers to distinct objects, but that would require such
implementations to use more expensive operations for such
comparisons. It would also require some consistent ordering to be
imposed on the segments, which might require still more work.

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Peter Nilsson

unread,

Dec 1, 2009, 4:26:11 PM12/1/09

to

Nick Keighley <nick_keighley_nos...@hotmail.com> wrote:
> > On Dec 1, 7:35 am, j...@toerring.de (Jens Thoms Toerring) wrote:
> > > And what do you hope to gain by comparing pointers to
> > > different objects?

...

> I don't think C++ allows you compare pointers to different

> objects either. ...

The original context was relational operators, but I think it's
worth mentioning that pointers to different objects are often
compared with each other via equality operators.

That a conforming implementation must support this makes me
wonder why the application of relative operators on pointers
to 'different' objects is undefined, rather than unspecified.

The only reason I can think of is to cater for some sort of
interpreted implementation where arrays are implemented as
linked lists. Is there perhaps a more pragmatic reason?

--
Peter

Antoninus Twink

unread,

Dec 1, 2009, 4:45:44 PM12/1/09

to

On 1 Dec 2009 at 21:02, Nick wrote:
> the OPs program is asking a not silly question (I have a pointer to an
> integer, and an array of integers, is my pointer pointing to a place
> inside the array).

I suggest you ignore the idiotic paranoia of the regulars. This
construct will work perfectly on every single desktop platform you're
ever likely to encounter, either now or in the future.

If you're programming in the real world, be a pragmatist. If you want to
debate angels on pinheads with the clc "regulars", then by all means be
a pedantic literalist.

Seebs

unread,

Dec 1, 2009, 4:57:31 PM12/1/09

to

On 2009-12-01, Peter Nilsson <ai...@acay.com.au> wrote:
> That a conforming implementation must support this makes me
> wonder why the application of relative operators on pointers
> to 'different' objects is undefined, rather than unspecified.

> The only reason I can think of is to cater for some sort of
> interpreted implementation where arrays are implemented as
> linked lists. Is there perhaps a more pragmatic reason?

Making !=/== do extra work that's needed to keep things from blowing up
has lower impact than making all the pointer relational operators do that.

When you're working within an object, you're unlikely to compare pointers
for equality, so the extra cost is trivial; when you're comparing across
objects, though, you might well need equality to work.

Nick

unread,

Dec 1, 2009, 5:30:43 PM12/1/09

to

Antoninus Twink <nos...@nospam.invalid> writes:

> On 1 Dec 2009 at 21:02, Nick wrote:
>> the OPs program is asking a not silly question (I have a pointer to an
>> integer, and an array of integers, is my pointer pointing to a place
>> inside the array).
>
> I suggest you ignore the idiotic paranoia of the regulars. This
> construct will work perfectly on every single desktop platform you're
> ever likely to encounter, either now or in the future.

Please don't patronise me. I'm quite capable of working out who gives
sensible advice, and who leaps straight for the "you can't do"
autoresponder. I'm also quite capable of spotting cronies or
sockpuppets of people who never post anything about programming at all.
I'd already mentally put you on the "candidate for plonking" list, along
with your mates. You might be helping to fast-track your application.

For the record, in my time I've programmed in C on a lot of machines
that don't fit this description. I've programmed on non-ascii machines,
I've programmed on machines where, except for chars, everything was 64
bits. I've programmed on machines where the introduction of a standard
compiler blew my software up because suddenly the bits were ordered from
the opposite end.

I also write software now that assumes ascii (and comments on it), and
uses a pile of system specific stuff - nicely partitioned off so it can
be changed if and when the system changes. Indeed, the code is running
on its the third generation of processor and operating system.

> If you're programming in the real world, be a pragmatist. If you want to
> debate angels on pinheads with the clc "regulars", then by all means be
> a pedantic literalist.

But note that when you do go outside the realms of what is standardised,
it's worth knowing that you are doing it.

Keith Thompson

unread,

Dec 1, 2009, 5:46:20 PM12/1/09

to

Nick <3-no...@temporary-address.org.uk> writes:
> Antoninus Twink <nos...@nospam.invalid> writes:
[more of the same]

> Please don't patronise me. I'm quite capable of working out who gives
> sensible advice, and who leaps straight for the "you can't do"
> autoresponder.

[...]

Nick, by all means feel free to waste your own time reading what the
trolls post, or even responding to them by e-mail, but please don't
waste everyone else's time by posting your respnoses to the newsgroup.
Most of us have already killfiled them.

(I see AT uses a fake e-mail address, so I suppose replying by e-mail
isn't an option. Oh well.)

Antoninus Twink

unread,

Dec 1, 2009, 5:53:15 PM12/1/09

to

On 1 Dec 2009 at 22:30, Nick wrote:
> I'd already mentally put you on the "candidate for plonking" list, along
> with your mates. You might be helping to fast-track your application.

Go right ahead - it will certainly earn you plenty of brownie points
with the "regulars".

> For the record, in my time I've programmed in C on a lot of machines
> that don't fit this description.

Good for you.

>> If you're programming in the real world, be a pragmatist. If you want to
>> debate angels on pinheads with the clc "regulars", then by all means be
>> a pedantic literalist.
>
> But note that when you do go outside the realms of what is standardised,
> it's worth knowing that you are doing it.

I don't disagree. In fact, pragmatism precisely means that you should
decide on the right balance between portability and functionality.

Let's imagine I have two large arrays, say size n, one called A and the
other called B. A and B both store struct foos, all struct foos either
live in A or in B, and I need to be able to tell whether a given struct
foo comes from A or B.

I see three choices here:

1) I can add a field to each struct foo saying whether it belongs to A
or B. This will either require some fiddling and complexity in order to
use a spare bit in one of the struct fields for this purpose, or else
I've increased the memory by at least 2n bytes.

2) the Heathfield solution:
int is_in_A(struct foo *p)
{
struct foo *q;
for(q = A; q < A + n; q++)
if(q == p)
return 1;
return 0;
}
This is O(n) work EVERY TIME I want to check whether the struct is in A
or B. But it's standards compliant.

3) the OP's solution:
#define IS_IN_A(p) ((p) >= A && (p) < A + n)
Now it's O(1).

The /only/ circumstances in which a pragmatic programmer should be
prepared to pay the price of 1) or 2) are

* either if they're taking part in an academic exercise (e.g. all of
Heathfield's programming falls into this category),

* or if they have DAMN good reasons to expect that their program will
need to run on some obscure architecture where pointer representations
will screw this up.

Antoninus Twink

unread,

Dec 1, 2009, 5:58:17 PM12/1/09

to

On 1 Dec 2009 at 22:46, Keith Thompson wrote:

> Nick <3-no...@temporary-address.org.uk> writes:
>> Please don't patronise me.
>

> Nick, by all means feel free to waste your own time reading what the
> trolls post, or even responding to them by e-mail, but please don't
> waste everyone else's time by posting your respnoses to the newsgroup.
> Most of us have already killfiled them.

Keith, if I didn't know you were incapable of humor, I'd chalk this up
as comic genius.

This must be the most patronizing response to a request not to be
patronized that it's possible to imagine.

Pillsy

unread,

Dec 1, 2009, 5:59:22 PM12/1/09

to

On Dec 1, 4:07 pm, Keith Thompson <ks...@mib.org> wrote:
[....]

> The standard could have required "<" and ">" to work sensibly
> on pointers to distinct objects, but that would require such
> implementations to use more expensive operations for such
> comparisons. It would also require some consistent ordering to be
> imposed on the segments, which might require still more work.

That work wouldn't necessarily have to be done by the relational
operators. "<" and ">" don't provide lexicographic ordering on
strings, either, but strcmp() does. Unlike strcmp(), implementing
these kinds of pointer comparisons evidently requires an extensive
knowledge of non-portable details of an implementation.

Cheers,
Pillsy

Seebs

unread,

Dec 1, 2009, 5:55:52 PM12/1/09

to

On 2009-12-01, Keith Thompson <ks...@mib.org> wrote:
> (I see AT uses a fake e-mail address, so I suppose replying by e-mail
> isn't an option. Oh well.)

The purpose of replying by email is to communicate with someone. I have
seen nothing to suggest that there is anything to communicate with in
that case, so no real loss.

Flash Gordon

unread,

Dec 1, 2009, 5:50:39 PM12/1/09

to

Pillsy wrote:
> On Dec 1, 11:03 am, Nick Keighley <nick_keighley_nos...@hotmail.com>
> wrote:
> [...]
>>> Perhaps you want to use that ordering as the basis for a set
>>> represented as a binary tree? AIUI, that's how things work in the C++
>>> STL, and it seems reasonable to want to use the same implementation
>>> strategy in C.
>
>> I don't think C++ allows you compare pointers to different objects
>> either.
>
> I don't want to wander too deep into the dubiously topical weeds of
> how C++ works, but (again AIUI) the STL provides a default ordering on
> pointers using comparison using std::less, not that you can use the
> "<" operator.

The C equivalent would be adding a library function to do this, of course.

>> A particular implementation may do this but they are gambling
>> that they never port to somewhere this assumption is broken.
>
> Well, yeah, but that's the point of making it part of the standard
> library, isn't it?

It would be, and if you want to propose it I suggest wandering over to
comp.std.c

>> I suppose on most implementaions you can rely on them being in
>> *some* order and that distinct objects can be compared and won;t
>> do anything bizzare like over lapping or interleaving.
>
> I don't see how even overlapping or interleaving prevents you from
> defining an ordering that's sufficient for the purpose at hand
> (stuffing things in a binary tree). Then again, I'm not a mainframe
> programmer, and it's more than possible that I'm missing something.

I actually suspect that a major reason for it not being defined
originally was small architectures, such as the Intel 8086. On such
architectures each individual object will fit within a segment, but the
program may be using objects in several segments. It is quick to compare
just the offset, but comparing both segment and offset, especially if
you need to normalise the pointers first, is a rather more major task.

As you say, I'm sure an ordering could be defined for even the most
exotic architecture, even if it does not really have an ordering.

If functions were added I would suggest two functions...
ptrcmp(void*,void*)
ptrbtwn(void*,void*,void*)
After all, you might be able to check if a pointer is in a range more
efficiently than doing two individual pointer comparisons.
--
Flash Gordon

Eric Sosman

unread,

Dec 1, 2009, 6:02:00 PM12/1/09

to

There have been computers with multiple address spaces[*],
where relations within a space make sense but cross-space
relations don't. Loose analogy: You may know that "1234 Elm
Street" is farther out of town than "96 Elm Street," but what
can you say about its position relative to "1447 River Drive?"

[*] At least one contemporary chip supports multiple spaces,
although I am not aware of any machine that actually uses them
except to distinguish I/O registers from ordinary memory. Also,
the address space ID is hidden inside the MMU where the code
never sees it as part of a virtual address, so virtual addresses
have a natural ordering even across spaces. But if somebody
finds a way to quadruple the teraflops by exploiting multiple
address spaces, it's likely they'll make use of the idea ...

--
Eric Sosman
eso...@ieee-dot-org.invalid

Keith Thompson

unread,

Dec 1, 2009, 6:13:45 PM12/1/09

to

If the language required "<" and ">" to work consistently for pointer
values (i.e., for any two valid pointers p1 and p2, exactly one of
(p1 < p2), (p1 == p2), and (p1 > p2) is true, along with the other
usual consistency requirements), then the "<" and ">" operators
*would* have to do all that work.

But providing a pointer comparison function in the standard library
that *does* do all the extra work would be an interesting idea.
Perhaps something like:

int ptrcmp(const void *p1, const void *p2);

returning an integer less than, equal to, or greater than zero
depending on whether p1 < p2, p1 == p2, or p1 > p2. It would
return a result consistent with the relational operators where such
a result is defined, and an implementation-defined but consistent
result in other cases. For most implementations, the implementation
could be simple:

int ptrcmp(const void *p1, const void *p2)
{
if (p1 < p2) return -1;
else if (p1 == p2) return 0;
else return +1;
}

Or would it be more useful to provide separate ptrless and ptrgreater
functions (equality is already handled by "==")?

Beej Jorgensen

unread,

Dec 1, 2009, 6:17:52 PM12/1/09

to

Antoninus Twink <nos...@nospam.invalid> wrote:
>I see three choices here:

Another (admittedly contrived) option which is also O(1) and completely
portable is this:

#include <stdio.h>

int main(void)
{
int array[200];
int *a = array;
int *b = array + 100;

int *pa;
int *b_top;
pa = a + 20;
b_top = b + 99;

if ((pa >= b) && (pa <= b_top)) {

printf( "Pa is pointing to an element in array b\n" );
} else {
printf( "Pa is not pointing to an element in array b\n" );
}
}

-Beej

Richard Harter

unread,

Dec 1, 2009, 6:34:38 PM12/1/09

to

On Tue, 01 Dec 2009 14:46:20 -0800, Keith Thompson
<ks...@mib.org> wrote:

>Nick <3-no...@temporary-address.org.uk> writes:
>> Antoninus Twink <nos...@nospam.invalid> writes:
>[more of the same]
>> Please don't patronise me. I'm quite capable of working out who gives
>> sensible advice, and who leaps straight for the "you can't do"
>> autoresponder.
>[...]
>
>Nick, by all means feel free to waste your own time reading what the
>trolls post, or even responding to them by e-mail, but please don't
>waste everyone else's time by posting your respnoses to the newsgroup.
>Most of us have already killfiled them.

I will only say that I find your post to be quite offensive.

Richard Harter, c...@tiac.net
http://home.tiac.net/~cri, http://www.varinoma.com
Infinity is one of those things that keep philosophers busy when they
could be more profitably spending their time weeding their garden.

Peter Nilsson

unread,

Dec 1, 2009, 7:12:13 PM12/1/09

to

On Dec 2, 10:02 am, Eric Sosman <esos...@ieee-dot-org.invalid> wrote:
> Peter Nilsson wrote:
> > Nick Keighley <nick_keighley_nos...@hotmail.com> wrote:
> >>> On Dec 1, 7:35 am, j...@toerring.de (Jens Thoms Toerring) wrote:
> >>>> And what do you hope to gain by comparing pointers to
> >>>> different objects?
> > ...
> >> I don't think C++ allows you compare pointers to different
> >> objects either. ...
>
> > The original context was relational operators, but I think it's
> > worth mentioning that pointers to different objects are often
> > compared with each other via equality operators.
>
> > That a conforming implementation must support this makes me
> > wonder why the application of relative operators on pointers
> > to 'different' objects is undefined, rather than unspecified.
>
> > The only reason I can think of is to cater for some sort of
> > interpreted implementation where arrays are implemented as
> > linked lists. Is there perhaps a more pragmatic reason?
>
> There have been computers with multiple address spaces[*],
> where relations within a space make sense but cross-space
> relations don't.

Yes, but do you know of any systems that trap doing a comparison
instruction on addresses from different spaces?

I read that MIPS had a trap on overflow, but it seems the
comparison and branch instructions didn't have that feature.

--
Peter

Peter Nilsson

unread,

Dec 1, 2009, 7:12:17 PM12/1/09

to

On Dec 2, 8:57 am, Seebs <usenet-nos...@seebs.net> wrote:
> On 2009-12-01, Peter Nilsson <ai...@acay.com.au> wrote:
> > That a conforming implementation must support this makes me
> > wonder why the application of relative operators on pointers
> > to 'different' objects is undefined, rather than unspecified.
> > The only reason I can think of is to cater for some sort of
> > interpreted implementation where arrays are implemented as
> > linked lists. Is there perhaps a more pragmatic reason?
>
> Making !=/== do extra work that's needed to keep things from
> blowing up has lower impact than making all the pointer
> relational operators do that.

What extra work? I don't know of a system where 'robust'
testing for equality requires anything other than a compare
or conditional-branch instruction, neither of which trap.
[Happy to be educated though.]

--
Peter

Seebs

unread,

Dec 1, 2009, 7:21:41 PM12/1/09

to

On 2009-12-02, Peter Nilsson <ai...@acay.com.au> wrote:
>> Making !=/== do extra work that's needed to keep things from
>> blowing up has lower impact than making all the pointer
>> relational operators do that.

> What extra work? I don't know of a system where 'robust'
> testing for equality requires anything other than a compare
> or conditional-branch instruction, neither of which trap.
> [Happy to be educated though.]

Segmented architectures with pointers that don't include segment information
by default. To make !=/== work, you have to figure out how to tag the
pointers with their segments, and compare segments as well as the "basic"
pointer value. Otherwise, the first objects in two segments compare equal,
which is Very Bad.

Kaz Kylheku

unread,

Dec 1, 2009, 7:43:06 PM12/1/09

to

On 2009-12-01, Keith Thompson <ks...@mib.org> wrote:

> If the language required "<" and ">" to work consistently for pointer
> values (i.e., for any two valid pointers p1 and p2, exactly one of
> (p1 < p2), (p1 == p2), and (p1 > p2) is true, along with the other
> usual consistency requirements), then the "<" and ">" operators
> *would* have to do all that work.

Which means that code targetting 8086 machines would be larger and slower,
having to normalize segment:offset values rather than blindly comparing
them as 32 bit words.

When C was standarized, this architecture was widespread, and used
as a target for C programming. Standardizing pointer inequality would have
codified code that didn't work on MS-DOS, at least in some memory models.

This issue is now worth revisiting.

Keith Thompson

unread,

Dec 1, 2009, 7:45:05 PM12/1/09

to

I don't think blowing up is the problem (thought the standard
currently allows it). I think the problem is consistency.

Suppose you have two pointers, p1 and p2, that point to different
objects. Each pointer consists of something that identifies
the segment containing the object, plus a byte offset within that
segment. Comparing the offsets is easy, but there might not be any
defined relationship (other than equality/inequality) between the
segment identifiers. Though it's quite possible that just comparing
the segment ids as if they were (signed or unsigned?) integers
would give consistent results (which could easily vary from one
run of a program to the next, but that's ok).

Another issue is normalization. If segments aren't necessarily
disjoint, you could have two different representations of the same
memory address. Then again, that would affect "==" and "!=" as well,
a problem that implementations already have to solve.

I've never really used a system with a non-linear addressing scheme.
Perhaps someone who has could chime in.

It's plausible that computing (p1 < p2) on certain systems is
significantly easier if the compiler can assume that they point to
the same object (or just past the end of it). Perhaps the standard
left the behavior in other cases undefined just because it would
have been too much effort, with too little payoff, to constrain it
enough to make it unspecified or implementation-defined.

Kaz Kylheku

unread,

Dec 1, 2009, 7:48:18 PM12/1/09

to

On 2009-12-02, Seebs <usenet...@seebs.net> wrote:
> On 2009-12-02, Peter Nilsson <ai...@acay.com.au> wrote:
>>> Making !=/== do extra work that's needed to keep things from
>>> blowing up has lower impact than making all the pointer
>>> relational operators do that.
>
>> What extra work? I don't know of a system where 'robust'
>> testing for equality requires anything other than a compare
>> or conditional-branch instruction, neither of which trap.
>> [Happy to be educated though.]
>
> Segmented architectures with pointers that don't include segment information
> by default.

Plural?

Oh, you must be counting 8088, 8086, 80186, etc, as separate architectures.

Sjouke Burry

unread,

Dec 1, 2009, 8:01:44 PM12/1/09

to

But on an intel segment/offset memory structure, one memory location
can have about 400 different pointers, all pointing at that loc,
and all different.
So before comparing, you have to normalize both pointers.
Or calculate the physical address(16 times seg + offset) or something
like that.
Thats quite some extra work for a comparison.

Keith Thompson

unread,

Dec 1, 2009, 8:03:45 PM12/1/09

to

Agreed (except that I think you meant relational operators, not
inequality).

Seebs

unread,

Dec 1, 2009, 8:30:29 PM12/1/09

to

On 2009-12-02, Keith Thompson <ks...@mib.org> wrote:
> Suppose you have two pointers, p1 and p2, that point to different
> objects. Each pointer consists of something that identifies
> the segment containing the object, plus a byte offset within that
> segment. Comparing the offsets is easy, but there might not be any
> defined relationship (other than equality/inequality) between the
> segment identifiers. Though it's quite possible that just comparing
> the segment ids as if they were (signed or unsigned?) integers
> would give consistent results (which could easily vary from one
> run of a program to the next, but that's ok).

I believe, though I could be wrong, that there has existed at least one
architecture on which there existed more than one segment/offset pair
referring to the same physical byte of memory, meaning that it could be
possible for two pointers with differing segment AND differing offset
values to actually refer to the same location...

Seebs

unread,

Dec 1, 2009, 8:32:15 PM12/1/09

to

On 2009-12-02, Kaz Kylheku <kkyl...@gmail.com> wrote:
> On 2009-12-02, Seebs <usenet...@seebs.net> wrote:
>> Segmented architectures with pointers that don't include segment information
>> by default.

> Plural?

Maybe?

> Oh, you must be counting 8088, 8086, 80186, etc, as separate architectures.

No, I just referred to the category. The category may contain only one
architecture; I have no clue. Although... Hmm. I'm pretty sure I've seen
some relative addressing modes on other systems, which could imply that
there would exist things comparable in effect to "segmenting".

For that matter, consider a 68k with its 24-bit address space implemented
with 32-bit words. On such a system, arguably, 0x10123456 and 0x20123456
were the same pointer... (And emacs relied on this, making a ton of work
for Amiga programmers when the 68020 showed up.)

Eric Sosman

unread,

Dec 1, 2009, 8:45:27 PM12/1/09

to

No. But "it traps" is not what "undefined behavior"
means. Given pointer variables p,q,r, would you be happy
with p<q && q<r && r<p, all without a trap?

Or, even without resorting to segmented architectures and
multiple address spaces, how would you like p<q && q-p==0?
That one's *easy* to achieve, even in a flat address space.

--
Eric Sosman
eso...@ieee-dot-org.invalid

Keith Thompson

unread,

Dec 1, 2009, 9:06:57 PM12/1/09

to

Seebs <usenet...@seebs.net> writes:
> On 2009-12-02, Keith Thompson <ks...@mib.org> wrote:
>> Suppose you have two pointers, p1 and p2, that point to different
>> objects. Each pointer consists of something that identifies
>> the segment containing the object, plus a byte offset within that
>> segment. Comparing the offsets is easy, but there might not be any
>> defined relationship (other than equality/inequality) between the
>> segment identifiers. Though it's quite possible that just comparing
>> the segment ids as if they were (signed or unsigned?) integers
>> would give consistent results (which could easily vary from one
>> run of a program to the next, but that's ok).
>
> I believe, though I could be wrong, that there has existed at least one
> architecture on which there existed more than one segment/offset pair
> referring to the same physical byte of memory, meaning that it could be
> possible for two pointers with differing segment AND differing offset
> values to actually refer to the same location...

And compilers already have to deal with that to support (p1 == p2).

On such an architecture, leaving (p1 < p2) undefined when p1 and p2
don't point into the same object (or ...) doesn't make the compiler's
job significantly easier, it just makes for faster code in some cases.

Peter Nilsson

unread,

Dec 1, 2009, 9:32:06 PM12/1/09

to

On Dec 2, 11:21 am, Seebs <usenet-nos...@seebs.net> wrote:
> On 2009-12-02, Peter Nilsson <ai...@acay.com.au> wrote:
> > > Making !=/== do extra work that's needed to keep things
> > > from blowing up has lower impact than making all the
> > > pointer relational operators do that.
> >
> > What extra work? I don't know of a system where 'robust'
> > testing for equality requires anything other than a compare
> > or conditional-branch instruction, neither of which trap.
> > [Happy to be educated though.]
>
> Segmented architectures with pointers that don't include
> segment information by default. To make !=/== work, you have
> to figure out how to tag the pointers with their segments, and

> compare segments as well as the "basic" pointer value. ...

Wouldn't that presumably apply to the normal case for relational
operators as well...?

extern int foo(const int *a, const int *b) { return a < b; }

--
Peter

Seebs

unread,

Dec 1, 2009, 9:28:01 PM12/1/09

to

On 2009-12-02, Keith Thompson <ks...@mib.org> wrote:
> And compilers already have to deal with that to support (p1 == p2).

Somewhat.

> On such an architecture, leaving (p1 < p2) undefined when p1 and p2
> don't point into the same object (or ...) doesn't make the compiler's
> job significantly easier, it just makes for faster code in some cases.

I'm not sure about this. On such an architecture, equality is pretty well
defined (do they point to the same byte), but relations are less clearly
defined.

Do we just map everything to a flat space and compare there? Is a pointer
larger if it has a larger segment, or if it has a larger offset?

Note that it's not totally obvious, even if it were defined, that the
right answers ought to be completely consistent. I could imagine pointers
such that p1 < p2, p2 < p3, p3 < p4, and p4 < p1 in such an environment.

But I think the big thing is the code efficiency; if you can simplify the
common tests for the case where they basically make sense, life gets a lot
easier and faster.

Peter Nilsson

unread,

Dec 1, 2009, 9:35:37 PM12/1/09

to

Eric Sosman <esos...@ieee-dot-org.invalid> wrote:
> Peter Nilsson wrote:

> > Eric Sosman <esos...@ieee-dot-org.invalid> wrote:
> > > ...

> > > There have been computers with multiple address spaces[*],
> > > where relations within a space make sense but cross-space
> > > relations don't.
> >
> > Yes, but do you know of any systems that trap doing a comparison
> > instruction on addresses from different spaces?
>
> No. But "it traps" is not what "undefined behavior"
> means.

Yielding true or false (rightly or wrongly) fits within
the bounds of unspecified behaviour. The question was, why
is it undefined behaviour?

> Given pointer variables p,q,r, would you be happy
> with p<q && q<r && r<p, all without a trap?

For pointers to 'different' objects, I might be...

<http://en.wikipedia.org/wiki/Nontransitive_dice>

--
Peter

Richard Heathfield

unread,

Dec 1, 2009, 11:48:45 PM12/1/09

to

In <slrnhhbh4l.ihq...@guild.seebs.net>, Seebs wrote:

> On 2009-12-02, Keith Thompson <ks...@mib.org> wrote:
>> Suppose you have two pointers, p1 and p2, that point to different
>> objects. Each pointer consists of something that identifies
>> the segment containing the object, plus a byte offset within that
>> segment. Comparing the offsets is easy, but there might not be any
>> defined relationship (other than equality/inequality) between the
>> segment identifiers. Though it's quite possible that just
>> comparing the segment ids as if they were (signed or unsigned?)
>> integers would give consistent results (which could easily vary
>> from one run of a program to the next, but that's ok).
>
> I believe, though I could be wrong, that there has existed at least
> one architecture on which there existed more than one segment/offset
> pair referring to the same physical byte of memory, meaning that it
> could be possible for two pointers with differing segment AND
> differing offset values to actually refer to the same location...

You're right. MS-DOS is such an architecture. In MS-DOS, segments are
16 bits wide, and offsets are 16 bits wide, but a segment-offset pair
is a 20-bit value formed like this:

pointer_value = (segment << 4) | offset;

It isn't difficult to come up with several segment-offset pairs that
equate to the same pointer value. For example, the seg/off pair
331A:1399 normalises to 333B9. So do 3323:01A9, 3232:33B9, 3233:10A9,
and so on and so on..

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
"Usenet is a strange place" - dmr 29 July 1999
Sig line vacant - apply within

Ian Collins

unread,

Dec 1, 2009, 11:46:40 PM12/1/09

to

Relational operators would be meaningless on segmented architectures.
How would you define them?

> --
> Peter

Your sig. is missing the trailing space.

--
Ian Collins

Nobody

unread,

Dec 2, 2009, 12:29:25 AM12/2/09

to

On Tue, 01 Dec 2009 18:27:09 +0000, Seebs wrote:

>> In principle, there could be a rule that for all possible pointers,
>> there must be an ordering.
>
> But this rule would have been extremely hard to implement on some fairly
> widespread targets, so it isn't there.

No it wouldn't.

The ordering wouldn't have to be meaningful, just satisfy the axioms for a
total ordering, i.e. for all a, b, c:

a == a
a == b => b == a
a > b => b < a
a > b && b > c => a > c

So long as a pointer can be interpreted as a string of bits, you can
define a total ordering for pointers.

C opted to use <, > and == for a partial ordering, as this is sufficient
for many purposes and may be more efficient.

Nick

unread,

Dec 2, 2009, 1:41:38 AM12/2/09

to

Keith Thompson <ks...@mib.org> writes:

> The standard could have required "<" and ">" to work sensibly
> on pointers to distinct objects, but that would require such
> implementations to use more expensive operations for such
> comparisons. It would also require some consistent ordering to be
> imposed on the segments, which might require still more work.

You could get round the second part by saying they can return false if
pointing to different objects. So the OP's example would work, but a<b
and a>b would both fail. Nasty, I agree.

I've been trying to think of a real reason to want to do what the OP was
doing, and I think I've come up with one: caching a number of short
strings in a block of memory. You may be passed a pointer to a string
and if that pointer is already part of the cache you return immediately,
if it isn't you search the cache for the string, only if that fails do
you add it (if there is space).
--
Online waterways route planner: http://canalplan.org.uk
development version: http://canalplan.eu

Phil Carmody

unread,

Dec 2, 2009, 4:11:09 AM12/2/09

to

Define them to work on normalised values. It's been done before -
at least 2 different ways!

Phil
--
Any true emperor never needs to wear clothes. -- Devany on r.a.s.f1

Phil Carmody

unread,

Dec 2, 2009, 4:13:44 AM12/2/09

to

Sjouke Burry <burrynu...@ppllaanneett.nnll> writes:
> Peter Nilsson wrote:
> > On Dec 2, 8:57 am, Seebs <usenet-nos...@seebs.net> wrote:
> >> On 2009-12-01, Peter Nilsson <ai...@acay.com.au> wrote:
> >>> That a conforming implementation must support this makes me
> >>> wonder why the application of relative operators on pointers
> >>> to 'different' objects is undefined, rather than unspecified.
> >>> The only reason I can think of is to cater for some sort of
> >>> interpreted implementation where arrays are implemented as
> >>> linked lists. Is there perhaps a more pragmatic reason?
> >> Making !=/== do extra work that's needed to keep things from
> >> blowing up has lower impact than making all the pointer
> >> relational operators do that.
> > What extra work? I don't know of a system where 'robust'
> > testing for equality requires anything other than a compare
> > or conditional-branch instruction, neither of which trap.
> > [Happy to be educated though.]
> > --
> > Peter
> But on an intel segment/offset memory structure, one memory location
> can have about 400 different pointers, all pointing at that loc,
> and all different.

about 4000

> So before comparing, you have to normalize both pointers.
> Or calculate the physical address(16 times seg + offset) or something
> like that. Thats quite some extra work for a comparison.

Hence all decent compilers supported different models, so that
you could let it do the least work possible.

Kaz Kylheku

unread,

Dec 2, 2009, 4:31:02 AM12/2/09

to

On 2009-12-02, Nick <3-no...@temporary-address.org.uk> wrote:
> Keith Thompson <ks...@mib.org> writes:
>
>> The standard could have required "<" and ">" to work sensibly
>> on pointers to distinct objects, but that would require such
>> implementations to use more expensive operations for such
>> comparisons. It would also require some consistent ordering to be
>> imposed on the segments, which might require still more work.
>
> You could get round the second part by saying they can return false if
> pointing to different objects.

Problem with that is that De Morgan's transformations of expressions
don't behave properly.

Sure, the ptr >= a && ptr < a + n test works (hit test
fails for distinct objects, but a test which should be equivalent.
expressed !(ptr < a || ptr >= a + n) breaks.

That's bad, because compilers should be able to do algebraic
transformations on expressions, which depends on being able
to trust equivalences like a < b == !(a >= b).

> and a>b would both fail. Nasty, I agree.

Indeed.

>
> I've been trying to think of a real reason to want to do what the OP was
> doing, and I think I've come up with one: caching a number of short

One use for this is memory management. Your allocator's free function is
given a pointer, and it has to figure out which block it belongs to (if
any).

A conservative garbage collector has to do a hit test like this to
determine whether some value pulled from the stack might be
a pointer into one of the heap areas.

> strings in a block of memory. You may be passed a pointer to a string
> and if that pointer is already part of the cache you return immediately,
> if it isn't you search the cache for the string, only if that fails do
> you add it (if there is space).

But to make this work, you have to return a different pointer from
the original string, one which points into that area so that it
satisfies the hit test. Strings arriving into the program will
not meet the hit test even if they match something in the table.

You might as well implement interning of strings to objects.

Interning is a conversion of some data which denote an object to that
object, such that two or more occurences of that data map to the
same instance. When a string enters into our program, we intern it to a
pointer to some other type, say ``struct symbol *''. If the same
string occurs twice, the interning function returns the same pointer.
We never have to put a ``struct symbol *'' through an existence
check because we know it's already interned. We can compare two
such pointers for exact equality, in place of comparing strings.

Pillsy

unread,

Dec 2, 2009, 10:02:11 AM12/2/09

to

On Dec 1, 7:12 pm, Peter Nilsson <ai...@acay.com.au> wrote:
> On Dec 2, 8:57 am, Seebs <usenet-nos...@seebs.net> wrote:

[...]

> > Making !=/== do extra work that's needed to keep things from
> > blowing up has lower impact than making all the pointer
> > relational operators do that.

> What extra work? I don't know of a system where 'robust'
> testing for equality requires anything other than a compare
> or conditional-branch instruction, neither of which trap.
> [Happy to be educated though.]

One thing that just occurred to me is that assuming there's a well-
defined ordering on pointers to distinct objects could interfere with
some garbage collection schemes which like to move things around. This
might make it impossible to guarantee that the order of p1 and p2
won't change over time even if you don't touch the values of p1 and p2
directly.

I don't know of anything prohibiting an implementation deciding to use
such a strategy for memory management, but I honestly don't know a
whole hell of a lot about what conforming C implementations are
allowed to do, and limiting the possible garbage collection strategies
for a language that's so wedded to manual memory management is
probably a complete non-issue regardless.

Cheers,
Pillsy

Keith Thompson

unread,

Dec 2, 2009, 11:25:46 AM12/2/09

to

Pillsy <pill...@gmail.com> writes:
> On Dec 1, 7:12 pm, Peter Nilsson <ai...@acay.com.au> wrote:
>> On Dec 2, 8:57 am, Seebs <usenet-nos...@seebs.net> wrote:
> [...]
>> > Making !=/== do extra work that's needed to keep things from
>> > blowing up has lower impact than making all the pointer
>> > relational operators do that.
>
>> What extra work? I don't know of a system where 'robust'
>> testing for equality requires anything other than a compare
>> or conditional-branch instruction, neither of which trap.
>> [Happy to be educated though.]
>
> One thing that just occurred to me is that assuming there's a well-
> defined ordering on pointers to distinct objects could interfere with
> some garbage collection schemes which like to move things around. This
> might make it impossible to guarantee that the order of p1 and p2
> won't change over time even if you don't touch the values of p1 and p2
> directly.

C's semantics already effectively forbid such a garbage collection
scheme.

Relocating an object to which the program has a valid pointer would
invalidate that pointer. Conceivably a GC system could go through
memory and update any pointers, but a program is allowed to depend
on the representation of a pointer. For example, it could write a
pointer value to a file and read it back later in the same execution
of the program.

In practice, conservative GC schemes like Boehm's can be made to work
with C code, but the C code has to observe some restrictions beyond
those imposed by the language. Note that Boehm GC does not, as far
as I know, relocate objects, but it does depend on being able to find
pointer values in memory.

> I don't know of anything prohibiting an implementation deciding to use
> such a strategy for memory management, but I honestly don't know a
> whole hell of a lot about what conforming C implementations are
> allowed to do, and limiting the possible garbage collection strategies
> for a language that's so wedded to manual memory management is
> probably a complete non-issue regardless.

--

Richard

unread,

Dec 2, 2009, 12:02:30 PM12/2/09

to

Antoninus Twink <nos...@nospam.invalid> writes:

> On 1 Dec 2009 at 22:46, Keith Thompson wrote:

>> Nick <3-no...@temporary-address.org.uk> writes:
>>> Please don't patronise me.
>>

>> Nick, by all means feel free to waste your own time reading what the
>> trolls post, or even responding to them by e-mail, but please don't
>> waste everyone else's time by posting your respnoses to the newsgroup.
>> Most of us have already killfiled them.
>

> Keith, if I didn't know you were incapable of humor, I'd chalk this up
> as comic genius.
>
> This must be the most patronizing response to a request not to be
> patronized that it's possible to imagine.
>

Indeed.

--
"Avoid hyperbole at all costs, its the most destructive argument on
the planet" - Mark McIntyre in comp.lang.c

Richard

unread,

Dec 2, 2009, 12:03:10 PM12/2/09

to

Seebs <usenet...@seebs.net> writes:

> On 2009-12-01, Keith Thompson <ks...@mib.org> wrote:

>> (I see AT uses a fake e-mail address, so I suppose replying by e-mail
>> isn't an option. Oh well.)
>
> The purpose of replying by email is to communicate with someone. I have
> seen nothing to suggest that there is anything to communicate with in
> that case, so no real loss.
>
> -s

Twink has given more useful real world C advice than, you preening
idiot.

Kenny McCormack

unread,

Dec 2, 2009, 12:21:59 PM12/2/09

to

In article <hf66kf$97r$4...@news.eternal-september.org>,

Richard <rgr...@gmail.com> wrote:
>Seebs <usenet...@seebs.net> writes:
>
>> On 2009-12-01, Keith Thompson <ks...@mib.org> wrote:
>>> (I see AT uses a fake e-mail address, so I suppose replying by e-mail
>>> isn't an option. Oh well.)
>>
>> The purpose of replying by email is to communicate with someone. I have
>> seen nothing to suggest that there is anything to communicate with in
>> that case, so no real loss.
>>
>> -s
>
>Twink has given more useful real world C advice than, you preening
>idiot.

Right. And that's exactly the problem. Giving someone useful real
world C advice is the very last thing a CLC regs wants to see happening.
It screws up their whole world.

Pillsy

unread,

Dec 2, 2009, 12:51:38 PM12/2/09

to

On Dec 2, 11:25 am, Keith Thompson <ks...@mib.org> wrote:
> Pillsy <pillsb...@gmail.com> writes:
[...]

> > One thing that just occurred to me is that assuming there's a well-
> > defined ordering on pointers to distinct objects could interfere with
> > some garbage collection schemes which like to move things around. This
> > might make it impossible to guarantee that the order of p1 and p2
> > won't change over time even if you don't touch the values of p1 and p2
> > directly.

> C's semantics already effectively forbid such a garbage collection
> scheme.

That's clear now. Thanks.

> Relocating an object to which the program has a valid pointer would
> invalidate that pointer. Conceivably a GC system could go through
> memory and update any pointers, but a program is allowed to depend
> on the representation of a pointer. For example, it could write a
> pointer value to a file and read it back later in the same execution
> of the program.

[...]

Cheers,
Pillsy

Bruce Cook

unread,

Dec 4, 2009, 7:33:30 AM12/4/09

to

Flash Gordon wrote:

> I actually suspect that a major reason for it not being defined
> originally was small architectures, such as the Intel 8086. On such
> architectures each individual object will fit within a segment, but the
> program may be using objects in several segments. It is quick to compare
> just the offset, but comparing both segment and offset, especially if
> you need to normalise the pointers first, is a rather more major task.

And because segments overlap it's possible to have 2 different addresses
that point to the same object, or an address with a higher segment than
another but points to a lower address. (Can't remember what the segment
overlap size was on 80x86 but quickly learned to hate that memory model)

Bruce

Bruce Cook

unread,

Dec 4, 2009, 7:44:16 AM12/4/09

to

Richard Heathfield wrote:

Yep exactly the problem.

On the earlier compilers that directly supported this "extended" address
space (before the time of C standards) there was no guarantee of == working
with pointers. We had a macro PTR_CMP(a, b) which on 80x86 and a few other
architectures would do the magic to accurately compare pointers. mostly we
just avoided it in the first place.

Bruce

Noob

unread,

Dec 4, 2009, 8:54:46 AM12/4/09

to

Bruce Cook wrote:

> And because segments overlap it's possible to have 2 different addresses
> that point to the same object, or an address with a higher segment than
> another but points to a lower address. (Can't remember what the segment
> overlap size was on 80x86 but quickly learned to hate that memory model)

uint16_t segment, offset;
addr = segment*16 + offset; /* 20-bit address */

Thus (0,16) and (1,0) reduce to the same address.
(4,0) "is lower than" (0,80)
(0,80) "is lower than" (8,0)

Flash Gordon

unread,

Dec 4, 2009, 3:37:46 PM12/4/09

to

I remember how the model works. However, if you normalise the pointers
first (which I mentioned might be needed above) the problems you mention
go away (the purpose of normalising is to deal with them).
--
Flash Gordon

Stephen Sprunk

unread,

Dec 5, 2009, 12:30:27 AM12/5/09

to

... but normalization is expensive. That's why real-mode x86 variants
included various pointer classes (e.g. near, far, huge) with different
behavior. By letting the programmer decide what, if any, normalization
was to be done to any given pointer, rather than mandating that it
_always_ be done, the resulting program could be significantly faster.
(And, unless the programmers were superhuman, probably a lot buggier.)

S

--
Stephen Sprunk "God does not play dice." --Albert Einstein
CCIE #3723 "God is an inveterate gambler, and He throws the
K5SSS dice at every possible opportunity." --Stephen Hawking

Flash Gordon

unread,

Dec 5, 2009, 3:23:36 AM12/5/09

to

Stephen Sprunk wrote:
> Flash Gordon wrote:
>> Bruce Cook wrote:
>>> Flash Gordon wrote:
>>>> I actually suspect that a major reason for it not being defined
>>>> originally was small architectures, such as the Intel 8086. On such
>>>> architectures each individual object will fit within a segment, but the
>>>> program may be using objects in several segments. It is quick to compare
>>>> just the offset, but comparing both segment and offset, especially if
>>>> you need to normalise the pointers first, is a rather more major task.
>>> And because segments overlap it's possible to have 2 different
>>> addresses that point to the same object, or an address with a higher
>>> segment than another but points to a lower address. (Can't remember
>>> what the segment overlap size was on 80x86 but quickly learned to hate
>>> that memory model)
>> I remember how the model works. However, if you normalise the pointers
>> first (which I mentioned might be needed above) the problems you mention
>> go away (the purpose of normalising is to deal with them).
>
> ... but normalization is expensive.

I addressed this in my first quoted post above... my last post was just
saying that it was *possible* to define an ordering (by using pointer
normalisation), since Bruce seems to be saying it was not.

> That's why real-mode x86 variants
> included various pointer classes (e.g. near, far, huge) with different
> behavior. By letting the programmer decide what, if any, normalization
> was to be done to any given pointer, rather than mandating that it
> _always_ be done, the resulting program could be significantly faster.
> (And, unless the programmers were superhuman, probably a lot buggier.)

Agreed.
--
Flash Gordon

Nobody

unread,

Dec 5, 2009, 10:16:52 AM12/5/09

to

On Fri, 04 Dec 2009 14:54:46 +0100, Noob wrote:

>> And because segments overlap it's possible to have 2 different addresses
>> that point to the same object, or an address with a higher segment than
>> another but points to a lower address. (Can't remember what the segment
>> overlap size was on 80x86 but quickly learned to hate that memory model)
>
> uint16_t segment, offset;
> addr = segment*16 + offset; /* 20-bit address */

On 80286 and later, wrapping was optional so it could be up to 21 bits:

0xFFFF0 + 0xFFFF = 0x10FFEF (1M+64K-16).

Nobody

unread,

Dec 5, 2009, 10:23:05 AM12/5/09

to

On Sat, 05 Dec 2009 08:23:36 +0000, Flash Gordon wrote:

>>> I remember how the model works. However, if you normalise the pointers
>>> first (which I mentioned might be needed above) the problems you mention
>>> go away (the purpose of normalising is to deal with them).
>>
>> ... but normalization is expensive.
>
> I addressed this in my first quoted post above... my last post was just
> saying that it was *possible* to define an ordering (by using pointer
> normalisation), since Bruce seems to be saying it was not.

It's *possible* to define an ordering without using normalization,
although this would result in different pointer representations for the
same address not comparing equal.

You can have the same issues on modern systems with a flat address
space by e.g. mmap()ing a region twice, resulting in multiple logical
addresses for the same physical address.

rap...@gmail.com

unread,

Dec 9, 2009, 8:31:16 AM12/9/09

to

Thanks for the info.

On Dec 1, 9:07 pm, Keith Thompson <ks...@mib.org> wrote:
> The standard could have required "<" and ">" to work sensibly
> on pointers to distinct objects, but that would require such
> implementations to use more expensive operations for such
> comparisons. It would also require some consistent ordering to be
> imposed on the segments, which might require still more work.

Maybe there could be an operator that returns the "flat consistant
address" for each pointer.

The code would then be something like

if( ( (flat *)pa => (flat *)pb ) && ( (flat *)pa < (flat *)pb +
100 ) )
{
// pa points to an element of pb
}

If you know that pa and pb are the same array, then you can drop the
(flat *) part.

On Dec 2, 5:29 am, Nobody <nob...@nowhere.com> wrote:
> The ordering wouldn't have to be meaningful, just satisfy the axioms for a
> total ordering, i.e. for all a, b, c:
>
> a == a
> a == b => b == a
> a > b => b < a
> a > b && b > c => a > c

Number 2 doesn't follow, presumably you meant

a > b => b < a

> So long as a pointer can be interpreted as a string of bits, you can

> define a total ordering for pointers.

I agree. If the pointer "legitimately" points at an array, then it
will have the same segment as the array base. This assumes that the
array fits in a single segment. However, multi-segment arrays would
have to have the full far address anyway.

A random pointer might point at an element in the array, while being
considered to fall outside the array (due to the segment mismatch),
but that shouldn't happen in practice, unless the segments are
directly modified.

Also, in a segmented system, there is the concept of a far and near
pointer.

The rules would be something like:

If the pointers point at the same (logical) array, then it works as
currently.

If the 2 pointers point at 2 different (logical) arrays, then the
ordering is taken from the ordering of the arrays, which are required
to be consistant.

On Dec 2, 6:41 am, Nick <3-nos...@temporary-address.org.uk> wrote:
> I've been trying to think of a real reason to want to do what the OP was
> doing, and I think I've come up with one: caching a number of short

> strings in a block of memory. You may be passed a pointer to a string
> and if that pointer is already part of the cache you return immediately,
> if it isn't you search the cache for the string, only if that fails do
> you add it (if there is space).

Right that is what I was thinking. Ideally, the search would use a
binary search ... which has the same requirement.

Keith Thompson

unread,

Dec 9, 2009, 11:44:34 AM12/9/09

to

"rap...@gmail.com" <rap...@gmail.com> writes:
> Thanks for the info.
>
> On Dec 1, 9:07 pm, Keith Thompson <ks...@mib.org> wrote:
>> The standard could have required "<" and ">" to work sensibly
>> on pointers to distinct objects, but that would require such
>> implementations to use more expensive operations for such
>> comparisons. It would also require some consistent ordering to be
>> imposed on the segments, which might require still more work.
>
> Maybe there could be an operator that returns the "flat consistant
> address" for each pointer.
>
> The code would then be something like
>
> if( ( (flat *)pa => (flat *)pb ) && ( (flat *)pa < (flat *)pb +
> 100 ) )
> {
> // pa points to an element of pb
> }
>
> If you know that pa and pb are the same array, then you can drop the
> (flat *) part.

This would require the implementation to provide a pointer type flat*
that might have a completely different representation than any other
pointer types, and might not even act like a pointer type at all.

It would make more sense to provide standard library functions that do
"flat" pointer comparison.

> On Dec 2, 5:29 am, Nobody <nob...@nowhere.com> wrote:
>> The ordering wouldn't have to be meaningful, just satisfy the axioms for a
>> total ordering, i.e. for all a, b, c:
>>
>> a == a
>> a == b => b == a
>> a > b => b < a
>> a > b && b > c => a > c
>
> Number 2 doesn't follow, presumably you meant
>
> a > b => b < a

That's exactly what write as item 3. And yes, item 2 is a valid
axiom; equality is commutative.

[...]

>> So long as a pointer can be interpreted as a string of bits, you can
>> define a total ordering for pointers.
>
> I agree. If the pointer "legitimately" points at an array, then it
> will have the same segment as the array base. This assumes that the
> array fits in a single segment. However, multi-segment arrays would
> have to have the full far address anyway.
>
> A random pointer might point at an element in the array, while being
> considered to fall outside the array (due to the segment mismatch),
> but that shouldn't happen in practice, unless the segments are
> directly modified.
>
> Also, in a segmented system, there is the concept of a far and near
> pointer.
>
> The rules would be something like:
>
> If the pointers point at the same (logical) array, then it works as
> currently.
>
> If the 2 pointers point at 2 different (logical) arrays, then the
> ordering is taken from the ordering of the arrays, which are required
> to be consistant.

Yes, that could have been done. The problem is that the compiler
doesn't necessarily know whether two pointers point to the same array.
In the worst case, the compiler might have to generate extra code for
every pointer comparison, even when it's not necessary.

[...]

rap...@gmail.com

unread,

Dec 9, 2009, 12:42:21 PM12/9/09

to

On Dec 9, 4:44 pm, Keith Thompson <ks...@mib.org> wrote:
> That's exactly what write as item 3. And yes, item 2 is a valid
> axiom; equality is commutative.

Opps, read it to quickly.

> Yes, that could have been done. The problem is that the compiler
> doesn't necessarily know whether two pointers point to the same array.
> In the worst case, the compiler might have to generate extra code for
> every pointer comparison, even when it's not necessary.

If the pointer is a far pointer, then the full check would be needed.
The compiler could know which type of pointer.

Another option is to have only near or only far pointers in a given
program.

If the segments are being dealt with transparently by the compiler
(which is required, since far pointers are not actually a valid c
concept?), then there needs to be some way to distinguish between the
2 types.

So,

int a[100]; // could store it in "near" memory
int b[100000]; // must store it in "far" memory (assuming 64k per
segment)

int *c; // should be able to point to elements in a or b

c could be implemented as a 32 bit number with the 16 LSBs as an
offset and the 8 MSBs as a flag to indicate if the pointer is a near
pointer. Otherwise, the 16 MSBs are the segment address.

Near pointers can be compared by checking 3 bytes, which may not be
much of a speed boost, compared to just testing the full 32 bits.

Another option is to have the MSB of the offset used as the flag.
This drops the segment sizes down to 32k. However, it means only 2
bytes need to be checked (but the MSB needs to be checked).