So the standard says, they tell me. It is also one of the more flagrant
violations of the Principle of Least Astonishment I've seen in a
while. In fact, while we're at it, it would seem to violate the idea
that you give the programmer all the rope she asks for, because she
just might be needing it to pull herself out of a bog. Gentlemen
system programmers, surely you too have algorithms that are more
accurately expressed with arrays from other than base zero?
Actually, on a segmented architecture I might be astonished if it
*didn't* bomb. The principle is rather subjective I'm afraid.
: In fact, while we're at it, it would seem to violate the idea
: that you give the programmer all the rope she asks for, because she
: just might be needing it to pull herself out of a bog.
Note that the standard does *not* say that you can't do this, it
just says that it is nonportable. So, unless this bog is a
portable bog, she (Ugh. I prefer s/h/it for a neutered pronoun :-)
won't need a portable rope!
: Gentlemen
: system programmers, surely you too have algorithms that are more
: accurately expressed with arrays from other than base zero?
Well, actually, no. One of the characteristics of being *very*
experienced with a language is that you tend to think of
solutions in terms of what that language most easily supplies.
Hmmmm. Now that I think about it, I do seem to recall some Shell
sort where a zero base made the code more complex.
However, since there *is* a portable way to do this (if you don't
mind the syntax), I'll show it.
func()
{
    int foo_array[SIZE][SIZE];
#define foo(n,m) (foo_array[(n)-1][(m)-1])
    ...
}
Ugly, but it works. And it can be used to make the NR programs
portable.
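For instance, a 1-based algorithm then transcribes almost verbatim.
Just a sketch -- SIZE, init() and the identity-matrix loop body are
mine, not anything from NR:

    #define SIZE 10

    int foo_array[SIZE][SIZE];
    #define foo(n,m) (foo_array[(n)-1][(m)-1])

    init()
    {
        int i, j;

        /* the algorithm thinks in 1..SIZE; the storage is 0..SIZE-1 */
        for (i = 1; i <= SIZE; i++)
            for (j = 1; j <= SIZE; j++)
                foo(i, j) = (i == j);
    }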
---
Bill
novavax!proxftl!bill
Yes, certainly. However, if one wants such code to be portable, one must
be careful how one computes addresses into such arrays. The only fully
portable way to compute a[b] when you want "a" to start at subscript "s"
is a[b-s]. (a-s)[b] certainly is appealing, since it permits doing the
subtraction once rather than every time, but it is *NOT PORTABLE*. Thanks
primarily (but not exclusively) to Intel, it is not safe to back a pointer
up past the beginning of an array and then advance it again. C has never
guaranteed this to work; indeed, there have always been explicit warnings
that once the pointer goes outside the array, all bets are off. X3J11 has
legitimized pointers just past the end of an array, since this is very
common and is cheap to do, even on difficult machines, but the beginning
of an array remains an absolute barrier to portable pointers. This is
simply a fact of life in the portability game.
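To make the distinction concrete (a sketch; the names are mine, not
anything blessed by X3J11):

    #define LOW   1                    /* desired lowest subscript */
    #define HIGH  100                  /* desired highest subscript */

    double a[HIGH - LOW + 1];

    /* Portable: fold the origin in at every reference. */
    #define A(i)  (a[(i) - LOW])

    /*
     * NOT portable: "double *ap = a - LOW;" and then ap[i].
     * The subtraction itself may trap on a segmented machine, even
     * though ap is never dereferenced outside the real array.
     */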
--
Intel CPUs are not defective, | Henry Spencer at U of Toronto Zoology
they just act that way. | uunet!attcan!utzoo!henry he...@zoo.toronto.edu
On a segmented architecture, like 8086's, malloc can and does return
a value that is a pointer to the beginning of a segment. That is, there
is a 16 bit selector and a 16 bit offset, the offset portion is 0 or a
very small number. Thus, subtracting a value from the pointer could result
in a segment wrap. Trouble occurs when you do things like:
array = malloc(MAX * sizeof(array[0]));
for (p = &array[MAX-1]; p >= &array[0]; p--)
...
The >= test never becomes false, because the last p-- underflows the
offset and wraps around, leaving p greater than &array[MAX]! I've
encountered this many times in
porting code from Unix to PCs. The correct way to write the loop is:
for (p = &array[MAX]; p-- > &array[0]; )
or something similar.
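Another way to dodge the wrap entirely is to count with an int index
instead of a pointer (my own variation, not the loop above):

    extern char *malloc();

    #define MAX 100

    main()
    {
        int *array;
        int i;

        array = (int *)malloc(MAX * sizeof(array[0]));
        if (array == 0)
            return 1;

        /* i is a plain int, so no pointer ever drops below &array[0] */
        for (i = MAX - 1; i >= 0; i--)
            array[i] = 0;

        free((char *)array);
        return 0;
    }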
Please, no flames about Intel's architecture. I've heard them all for years.
The best way to learn to write portable code is to be required to port
your applications to Vaxes, 68000s, and PCs. (I have all 3 on my desk!)
And VAX/VMS specifically. Until you've ported to VMS you haven't
ported. Really.
--
Rahul Dhesi UUCP: <backbones>!{iuvax,pur-ee,uunet}!bsu-cs!dhesi
We find that using 68000's (Suns and Macintoshes) and IBM-PC's
(in the various memory models) is sufficient to catch most
portability problems. Especially since we avoid as much as
possible using system provided libraries and do not do terminal
I/O except through standard I/O.
Anybody else have suggestions on sets of systems for checking
portability? And how about portability between different system
libraries and different terminal handling schemes, a problem we
don't (yet) have because we ignore it?
---
Bill
novavax!proxftl!bill
1) floating header files (where is struct mumble defined, where does
header file bletch.h live, etc.)
2) library differences (is it in libc, or libm, is it strchr or
rindex, does it exist or do I have to carry my own version along. . .)
3) Implementation problems (working around bugs in the system. . .)
4) Architecture dependencies due to language design or implementation
decisions left to the implementer. (it worked on my brand x, but
after I tried it on brand y, I went back and read K&R and sure
enough its not legal. . .)
The rude fact is that the first three can't be covered by any small
set of machines, although a fairly good approximation can be made by
trying a System V machine, a BSD machine, an MS-DOS machine, and a non-
un*x machine. (VAX/VMS, Apollo)
As far as the fourth goes, I have been bitten by the following
collection of dumb assumptions:
1) byte ordering
2) word size
3) "all pointers are == char *"
4) calling stack implementation
5) assuming a linear address space without holes
My collection of machines which ring out these problems is fairly
small, but probably not widely accessible. I find that I can get by
with only three machines to cover all of the dumb assumptions: A Vax
11/780, a Cray 2, and a Silicon Graphics Iris 4D. The Cray system
seriously strains dumb assumptions in 1-3 and 5, and the Iris gets 4.
Without the 2, a much larger collection of machines would be needed,
but I suspect that a lot could be accomplished with an IBM PC and a
SUN-3 as an alternative.
Anyway, I don't assume that code I've written is reasonably portable,
until it compiles and appears to work on 16, 32 and 64 bit machines of
varying byte order, memory model and calling stack implementation, and
then I only assume that it is reasonably portable and that I've blown
it for the next class of machines I'm going to see. This is a very
healthy paranoia to develop.
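For what it's worth, here is the flavor of dumb assumption number 1
that the Cray (or any big-endian box) flushes out immediately.  A
made-up fragment, not from any real program:

    #include <stdio.h>

    main()
    {
        long n = 0x01020304L;
        unsigned char *p = (unsigned char *)&n;

        /* Dumb assumption: the low-order byte is the first byte in
           memory.  True on a VAX or an 8086, false on a 68000 or a
           Cray. */
        printf("first byte in memory: %d\n", p[0]);

        /* Portable: pick bytes apart by value, not by position. */
        printf("low-order byte:       %ld\n", n & 0xff);
        return 0;
    }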
Marty
+-+-+-+ I don't know who I am, why should you? +-+-+-+
| fo...@lemming.nas.nasa.gov |
| ...!ames!orville!fouts |
| Never attribute to malice what can be |
+-+-+-+ explained by incompetence. +-+-+-+
You forgot the best one: any '286 running SCO Xenix System V.
I like to refer to my machine as a portability test with a power switch.
--
Chip Salzenberg <ch...@ateng.uu.net> or <uunet!ateng!chip>
A T Engineering My employer may or may not agree with me.
The urgent leaves no time for the important.
int b[4], *bb = &b[-1];
and variations thereof are forbidden, why not use
int bb[5];
Before I am flamed to death for wasting *four* *whole* *bytes* of memory,
I think I can claim exemption under the `speed-vs-space' banner.
Using a pointer as an array probably involves an extra instruction or
CPU cycle somewheres - and `#define bb(x) (b[(x)-1])' does countless
`invisible' subtractions...
pdc
No! That wasn't the problem! (Wish it was, that'd be easy to
avoid!).
The problem is that the authors of Numerical Recipes (NR) observe,
correctly, that many numerical problems are naturally non-zero based.
This gives you the choice between carrying around boatloads of index
arithmetic (inefficient and error-prone), or making non-zero based
arrays. They opt for the latter, in the following way:
float *my_vec; /* this is going to be a vector */
int nl, nh;
...
my_vec = vector( nl, nh );    /* allocates a vector with lowest valid
                                 index nl, and highest valid index nh */
...
my_vec[3] = foo(bar);
...
Where we have:
float *vector( nl, nh )
int nl;
int nh;
{
    float *v;

    v = (float *)malloc( ( nh-nl+1 ) * sizeof(float) );
    if( v == 0 ) nrerror( "Allocation error in vector()" );
    return v - nl;
}
This is quite a bit more disciplined than the example above; it is
also quite a bit more fundamental. Fortunately, as far as I've checked
at least, NR only uses vectors and matrices with either 0 or unit
offset, so on broken architectures you could always do
malloc( (nh + 1 )* sizeof(float) );
return v;
This would waste a float per vector, and a pointer-to-float plus n
floats for an n-by-something matrix. Ugly, but it works. (and we
*are* the throw-away culture after all :-)
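Spelled out, that wasteful-but-portable vector() would look about like
this (sketch only; nrerror() is NR's own error handler):

    extern char *malloc();

    float *vector( nl, nh )     /* portable variant: storage stays 0-based */
    int nl;
    int nh;
    {
        float *v;

        /* allocate elements 0..nh and simply waste the first nl of
           them, so the returned pointer is never backed up below the
           start of the block */
        v = (float *)malloc( (unsigned)(nh + 1) * sizeof(float) );
        if( v == 0 ) nrerror( "Allocation error in vector()" );
        return v;               /* caller still indexes v[nl..nh] */
    }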
Rob Carriere
Sorry, but reality is sometimes astonishing.
That is not an X3J11 invention, just an acknowledgement of the
way the world is. (For example, segmented architectures.)
>Gentlemen system programmers, surely you too have algorithms that are
>more accurately expressed with arrays from other than base zero?
I doubt that even lady system programmers have much trouble with
0-based arrays.
INcorrectly! I've written a lot of array/matrix code in both
Fortran and C, and have found that it normally doesn't matter
and in those cases where it does matter, it doesn't matter much.
I've known mathematicians who have switched over to starting
enumerating at 0 instead of 1. They argued that THAT was "more
natural". One can certainly get used to either convention.
--
-- David Dyer-Bennet
...!{rutgers!dayton | amdahl!ems | uunet!rosevax}!umn-cs!ns!ddb
d...@Lynx.MN.Org, ...{amdahl,hpda}!bungia!viper!ddb
Fidonet 1:282/341.0, (612) 721-8967 hst/2400/1200/300
Trivial refutation time! Surely it is obvious that ``numerical
problems'' forms a (large) superset of ``array/matrix code'' as far as
numerical analysis is concerned?
Believe it or not, but there are *many* algorithms out there where
it's either base-1 indexing or index arithmetic all over the place.
Not with your traditional LU-decomposition stuff and so on, but with
algorithms where the contents or properties of the matrix elements are
computed from the indices.
Rob Carriere
Trivial indeed! If the code does not involve arrays/matrices,
the issue of 0-based or 1-based indexing doesn't even arise.
> Also, while I haven't tried it personally, I remember a LONG string of
>articles years ago in some group with the subject "Porting to PRIME seen
>as a probable negative experience"; I seem to remember it has to do with
>different types of pointers being of different sizes, none of which would
>fit in in an int.
This is somewhat of a bum rap. Prime C at one time had 48 bit
char pointers versus 32 bit word pointers. Currently all pointers are
48 bits. This makes for problems for people who blithely stuff pointers
into ints. But it really isn't a problem for people who port from UNIX
to PRIMOS who run their code through lint (and act on the results).
Primos C does have some oddities. They set the high bit on in
ascii chars. Setting a file pointer to stdin or stdout has to be done
at the top level. Library routines sometimes have different calling
sequences. There is a 128K limit on array sizes (machine architecture).
But it really isn't all that bad; I've never seen the C compiler break
on standard portable C which is more than I can say for VMS C.
--
In the fields of Hell where the grass grows high
Are the graves of dreams allowed to die.
Richard Harter, SMDS Inc.
I have used a Hewlett-Packard HP9000 model 500, which has a very
strange memory architecture (uses non-contiguous memory
segments), and a real stack (which means that uninitialized local variables
contain a zero). Emacs cannot run on it.
There exists a Pr1me system which always has the high bit
of each byte set, so ASCII runs from 128 to 255.
Of course, EBCDIC machines catch most of the programs which
avoided <ctype.h> but use ('a' <= x && x <= 'z') , and segmented
memory machines (like PC's with large model) catch all programs
that (mis)use ints and pointers and longs.
--
Johan
I would think Vaxes, IBM-PCs, Suns, and Macs are somewhat incestuous.
If you have the money and are interested in the market, you might try
some radically different hardware and operating systems like mainframe
IBM, Cray, CDC, Fujitsu,.....
From article <7...@ns.UUCP>, by d...@ns.UUCP (David Dyer-Bennet):
> No portability check is complete until you've tried some word-oriented
> rather than byte-oriented system. Preferrably something with a word-size
> not a multiple of 8 bits (like 60, or 36). CDC, Unisys, Honeywell, and of
> course the DEC PDP-10 series all come to mind.
Actually the 9-bit byte machines are fairly easy to port to: all
sorts of code of varying quality will run on the Honeywell-Bull
DPS-8 using the Waterloo C Compiler. Try a machine with funny
pointer lengths like the DPS-6, though... It's an 8-bit byte machine, but
char pointers are 48 bits and others are 32; if you use too high an
address the system will trap even on loading a register, etc, etc.
--dave (Bell labs had a DPS-8 C compiler many moons ago) c-b
--
David Collier-Brown. |{yunexus,utgpu}!geac!lethe!dave
78 Hillcrest Ave,. | He's so smart he's dumb.
Willowdale, Ontario. | --Joyce C-B
Yes, *all* of 'em.
(At least, all of the "50 Series" proprietary processors.)
--
Roger B.A. Klorese MIPS Computer Systems, Inc.
{ames,decwrl,prls,pyramid}!mips!rogerk 25 Burlington Mall Rd, Suite 300
rog...@mips.COM (rogerk%mips...@ames.arc.nasa.gov) Burlington, MA 01803
I don't think we're in toto any more, Kansas... +1 617 270-0613
But why should it abort? If the address is sr:0, (sr = segment register)
subtract 1 to get (sr-1):ffff [or whatever number of 'f's]. Memory
protection, it seems to me, should not notice attempts to compute addresses
but only attempts to access forbidden addresses.
Of course, this approach levies heavy penalties on segmented architectures.
If you are using the 'small' model (in the 8088 meaning of the word),
sr:0 - 1 = sr:ffff. Now you've got to worry about the model. But doesn't the
philosophy of C say 'programmer knows best'? If you want to diddle with
segmented architectures, you've got to put up with headaches.
So what am I missing?
-Nath
v...@osupyr.mast.ohio-state.edu
A mathematician is one who starts counting at 0 :-)
Historically, people were suspicious of 'nothing', which is why 0 was not
a number by itself (as opposed to being used in place value notation) till
about the 6th century A.D.
As far as indexing goes where one starts makes a difference in terms of
typography :-) More seriously, one may have several things to be indexed,
over a big range (-infinity to infinity even) and each thing is indexed
over some subrange not starting at 0. Changing every origin to 0 is
painful and likely to lead to bugs. Ideally this should be fixed up at the
preprocessor level rather than at code level. Anybody want to write these
macros?
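Here's a first stab, for whatever it's worth.  Entirely a sketch; the
names DECL_ARRAY and AT are made up:

    /* storage for an array indexed lo..hi; the origin is folded in at
       every reference, which is the only fully portable way to do it */
    #define DECL_ARRAY(type, name, lo, hi)  type name[(hi) - (lo) + 1]
    #define AT(name, lo, i)                 ((name)[(i) - (lo)])

    DECL_ARRAY(double, temperature, -40, 120);

    double lookup(i)
    int i;
    {
        return AT(temperature, -40, i);   /* i may run from -40 to 120 */
    }

Having to repeat the origin at every use is the obvious wart; burying
it in a per-array macro (like the foo() example earlier in the thread)
reads better.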
-Nath
v...@osupyr.mast.ohio-state.edu
Regrettably, some architectures prohibit this: (sr-1):ffff may
mean <undefined segment>:ffff, and the loading of the selector into a
selector register will cause a fault. The basic idea here is that
the operating system pre-fetches a page or segment on being informed
that the program is "about" to need it, as indicated by loading its
selector into a distinguished register.
This behavior is possible on the Honeywell DPS-6[1], and certainly
on an Intel machine running a non-DOS operating system.
--dave (@lethe) c-b
[1] I think the compiler writers watch out for this happening, but
I do know that it makes compiler- & debugger-writing **difficult**.
Anyone from SDG want to comment?
--
David Collier-Brown. | yunexus!lethe!dave
.I have used a Hewlett-Packard HP9000 model 500, which has a very
.strange memory architecture (uses non-contiguous memory
.segments), and a real stack (which means that uninitialized local variables
.contain a zero). Emacs cannot run on it.
^^^^^^^^^^^^^^^^^^^^^^^
Nonsense, many people use Emacs on HP9000/500.
--
Dave Caswell
Greenwich Capital Markets uunet!philabs!gcm!dc
There exist machines whose protection philosophy is to prevent you from
even thinking something illegal. In particular, on the Unisys A-series,
the compiler must implement all memory addressing protection--there is
no kernel/user state protection on memory.* A program cannot be allowed
to form an invalid address, as there is nothing to stop it from using it,
and nothing in the hardware to stop you from stomping on another user
if you do. Therefore, the compiler and the operating system would be
written so as to cause an interrupt if computing 'b - 1' were attempted.
The ANSI rules were written to allow C to be implemented on such an
architecture.
Note that there is no C compiler for the A-series today, although one is
rumored. The rumors say that arrays and pointers will not be implemented
this way, however. In order to get around some other problems, and to allow
more old programs to run, a linear-address space machine will be simulated,
using a large array. (Arrays are hardware concepts on the A-series.)
>Of course, this approach levies heavy penalities on segmented architecutres.
On some architectures, it may be an infinite penalty--C could not be
implemented. Or maybe only by simulating a more PDP-11-like machine
(as discussed above).
>If you are using the 'small' model (in the 8088 meaning of the word),
>sr:0 - 1 = sr:ffff. Now you got to worry about the model. But doesn't the
>philosophy of C say 'programmer knows best'. If you want to diddle with
>segmented architectures, you got to put up with headaches.
You sometimes have to, in order to get some benefits (like having your
OS written in a really high-level language, with no assembler, etc.)
>So what am I missing?
A broad education in the corners of the computer architecture world.
>-Nath
>v...@osupyr.mast.ohio-state.edu
* Note that putting the protection in the compiler was also an idea
of Per Brinch-Hansen's in the 1970s, with Concurrent Pascal. Burroughs
had been doing it for many years, even then.
--
Craig Jackson
UUCP: {harvard!axiom,linus!axiom,ll-xn}!drilex!dricej
BIX: cjackson
A quote from the file etc/MACHINES of the GNU 18.50 distribution:
"The [HP9000] series 500 has a seriously incompatible memory architecture
"which relocates data in memory during execution of a program,
"and support for it would be difficult to implement.
Of course, "other" emacses are available for HP9000 model 500
(MicroEmacs, Jove, Scame, Unipress??).
--
Johan
I believe he is probably referring to GNU Emacs which definitely does
not run on the HP9000/500. There are even references to this fact in
the installation/porting documentation.
About 15 months ago I ordered Emacs from Unipress software as they said
an HP9000/500 version was available. `Available' meant that they sent out
a source tape with some comments for the HP9000/500. I spent several days
getting it to a running state, but it was so buggy that everything was boxed
up and returned within the week. Maybe they have it working now...
--
Steve Fullerton Statware, Inc.
scf%statwa...@cs.orst.edu 260 SW Madison Ave, Suite 109
orstcs!statware!scf Corvallis, OR 97333
503/753-5382
In article <8...@osupyr.mast.ohio-state.edu> v...@osupyr.mast.ohio-state.edu
(Vidhyanath K. Rao) asks:
>But why should it abort? If the address is sr:0, (sr = segment register)
>subtract 1 to get (sr-1):ffff [or whatever number of 'f's].
On many machines, addresses are unsigned numbers. The domain and range
of an unsigned 16-bit number is 0..65535. What is the (mathematical)
result of 0 - 1? Answer: -1. Is it in range? No. So what happens?
Integer underflow, which on many machines is a trap.
You can even do this on a VAX, although there you must first enable the
trap (use bispsw or set the appropriate flag in the subroutine entry
mask), and then it only fires on integer computations outside the range
-2 147 483 648..2 147 483 647; so if (for instance) you were to write
main()
{
    char *p;

    p = (char *)0x7fffffff;
    asm("bispsw $0x20");    /* PSL_IV */
    p++;
}
This program, when run, aborts with a `floating exception' (SIGFPE).
It would be legal for the C compiler to set IV in the entry point
of each subroutine, although it would probably break too much code
that expects integer overflow/underflow to be ignored, and the code
that does C's `unsigned' arithmetic would have to turn it off temporarily.
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: ch...@mimsy.umd.edu Path: uunet!mimsy!chris
I am told that the original Fortran/Pascal version is a good book, but
they clearly could have used some help with this version. I
appreciate that they tried to explain and point out the problems of
numerical programming in C, but they often criticize or make uncalled
for assumptions about C due to their ignorance.
For example, the following is a quote from a recent issue of
Micro/Systems Journal which reviewed the book:
They suggest not using switch-case-default construction, ...
because they consider the structure confusing, and also "burdened
by uncertainty, from compiler to compiler about what data types
are allowed in its control expression." It is recommended that
this be replaced by "a more recognizable and translatable if-else
construction."
They [NRC] go on to make many other unfounded remarks, such as "avoid
continue;". Isn't there a better book for numerical programming in C?
Don Libes cme-durer.arpa ...!uunet!cme-durer!libes
What's to stop you from doing the following:
Generate code in an array.
Jump to the beginning of the array. *
Now you've blown the protection. You can do anything. I hope this isn't a
multiuser machine...
* this may involve such things as passing a pointer to an array to a
function that's declared that argument as a pointer to a function, or
even by writing the array out as a file and executing it... I can't see
how you could write a valid 'C' compiler that wouldn't let you violate
this protection.
--
Peter da Silva `-_-' Ferranti International Controls Corporation.
"Have you hugged U your wolf today?" pe...@ficc.uu.net
But nobody says you have to load the selector into a selector register
just to compute an address. Why should the address calculation hardware
be involved at all?
What are the implications of this "relocation of data in memory during
execution" and why would it cause Emacs problems but not problems in
general with C programs which use pointers. (If the HP9000 will move a
block of memory after a pointer to it has been loaded, then the pointer is
now no good... or is it?)
--
|------------Dan Levy------------| THE OPINIONS EXPRESSED HEREIN ARE MINE ONLY
| Bell Labs Area 61 (R.I.P., TTY)| AND ARE NOT TO BE IMPUTED TO AT&T.
| Skokie, Illinois |
|-----Path: att!ttbcad!levy-----|
I may not be the first one to cast a stone at this example, but have
you considered the possibility that a floating point exception is
manifestly the \fBwrong\fP thing to do in your example? There is no
floating-point math in there. Complain to your vendor.
--
...!bikini.cis.ufl.edu!ki4pv!cdis-1!tanner ...!bpa!cdin-1!cdis-1!tanner
or... {allegra killer gatech!uflorida decvax!ucf-cs}!ki4pv!cdis-1!tanner
The version of Gosling Emacs we have is dated 1985.
In article <14...@ficc.uu.net> pe...@ficc.uu.net (Peter da Silva) writes:
-What's to stop you from doing the following:
-
- Generate code in an array.
- Jump to the beginning of the array. *
Whenever the compiler is forced to generate `iffy' code, it also generates
tests such as tags to make sure that you do not do something like this.
The FORTRAN origins of the NRC code are at points quite clear; there
are several routines that could easily have been coded base 0, and
were coded base 1, and so on. The introductory chapter contains
several remarks about C that to the experienced C programmer, and even
to me, sound rather inane. All true.
However, I do not know of any book, in *any* language, that contains
the quality and quantity of numerical material that NRC has. You
should take into account that good numerical analysis is more than 80%
math, so even if the programming job were botched (and it isn't), the
book would be worth its money. Further, apart from the base-0/base-1
array issue that has been beaten to death in this group already, the
actual C code (as opposed to their philosophy about it) is good. The
reason for this is simply that while numerical code may have very
intricate analysis behind it, the actual code tends to be rather
simple -- a couple of for loops and a handful of if's is typical.
Finally, the problem with the base-1 arrays can simply be solved by a
minor change to the vector(), dvector(), ivector(), matrix(),
dmatrix() and imatrix() code, with a corresponding change in the
free_<vector, etc> routines. I posted the change for vector() a while
ago.
In summary, anyone who claims the book to be of little or no value is
not doing it justice.
Rob Carriere
(This is from memory)
When I worked at the Eindhoven University of Technology we had 4 HP9000/500s
and we ported emacs to it.
The real problem (with emacs 16.??) we had was that emacs assumed that a machine
with 32 bit words uses only 24 address bits. The upper 8 were assumed to be free
and available for type tags. In essence, it assumed that all machines have a VAX
like addressing scheme. The HP9000/500 uses all 32 bits. This same problem
occurred every time we tried to port a program which made any assumption on the
format of a pointer.
The "problem" is that the HP9000/500 has a segmented memory architecture. Each
pointer consists of a segment number and a segment offset and then some. At load
time the segments are assigned to the program and each pointer (identified as
such in the load file) is fixed up with the segment number of its associated data.
When you have emacs dump itself to create a faster loadable binary, you would
have to write every pointer back with no segment assigned. For arbitrary pointers
this is a hell of a job.
A colleague of mine succeeded in porting it but without the dump facility. If
you are interested, you can contact him at the following address:
Geert Leon Janssen
Eindhoven University of Technology,
Department of Electrical Engineering,
P.O. Box 513,
5600 MB Eindhoven, The Netherlands.
(UUCP: ...!mcvax!euteal!geert)
--
Hans Zuidam E-Mail: ha...@nlgvax.UUCP
Philips Telecommunications and Data Systems, Tel: +31 40 892288
Project Centre Geldrop, Building XR
Willem Alexanderlaan 7B, 5664 AN Geldrop The Netherlands
That's simple. All the compiler has to do is detect any attempt to
use a data object as a function. The only way to even attempt this in
standard C is via an explicit cast to a function pointer somewhere,
which is where the compiler would enforce the constraint.
As I remember ... a pointer on the HP9000/500 system is
something you cannot treat as a numeric quantity. Of course, you
should not do that anyway.
I recall the following symptoms:
- address space is not contiguous from zero to somewhere,
pointers contain segment numbers and offsets;
- you cannot store a pointer on disk, and read it back in
another run, because your program will probably not be loaded
in the same memory segments;
- you cannot use the highest bits of a pointer for other
purposes (as GNU Emacs does). All 32 bits contain information.
When you use pointers thru C (e.g. "ptr1 - ptr2" or "ptr[index]")
everything goes well, that's why "normal" applications are not
affected.
Another feature of the HP9000/500 is that local variables are
guaranteed to contain 0 (zero) at startup.
--
Johan
In article <70...@cdis-1.uucp> tan...@cdis-1.uucp (Dr. T. Andrews) writes:
>I may not be the first one to cast a stone at this example, but have
>you considered the possibility that a floating point exception is
>manifestly the \fBwrong\fP thing to do in your example? There is no
>floating-point math in there. Complain to your vendor.
What would you have it called? Here are the possible signals:
#define SIGHUP 1 /* hangup */
#define SIGINT 2 /* interrupt */
#define SIGQUIT 3 /* quit */
#define SIGILL 4 /* illegal instruction (not reset when caught) */
#define SIGTRAP 5 /* trace trap (not reset when caught) */
#define SIGIOT 6 /* IOT instruction */
#define SIGABRT SIGIOT /* compatibility */
#define SIGEMT 7 /* EMT instruction */
#define SIGFPE 8 /* floating point exception */
#define SIGKILL 9 /* kill (cannot be caught or ignored) */
#define SIGBUS 10 /* bus error */
#define SIGSEGV 11 /* segmentation violation */
#define SIGSYS 12 /* bad argument to system call */
#define SIGPIPE 13 /* write on a pipe with no one to read it */
#define SIGALRM 14 /* alarm clock */
#define SIGTERM 15 /* software termination signal from kill */
#define SIGURG 16 /* urgent condition on IO channel */
#define SIGSTOP 17 /* sendable stop signal not from tty */
#define SIGTSTP 18 /* stop signal from tty */
#define SIGCONT 19 /* continue a stopped process */
#define SIGCHLD 20 /* to parent on child stop or exit */
#define SIGCLD SIGCHLD /* compatibility */
#define SIGTTIN 21 /* to readers pgrp upon background tty read */
#define SIGTTOU 22 /* like TTIN for output if (tp->t_local<OSTOP) */
#define SIGIO 23 /* input/output possible signal */
#define SIGXCPU 24 /* exceeded CPU time limit */
#define SIGXFSZ 25 /* exceeded file size limit */
#define SIGVTALRM 26 /* virtual time alarm */
#define SIGPROF 27 /* profiling time alarm */
#define SIGWINCH 28 /* window size changes */
#define SIGUSR1 30 /* user defined signal 1 */
#define SIGUSR2 31 /* user defined signal 2 */
You have to squeeze it in somewhere, and in fact the VAX hardware
reports integer overflow and floating exceptions with the same trap /
fault (`arithmetic exception').
> In article <14...@ficc.uu.net> pe...@ficc.uu.net (Peter da Silva) writes:
> -What's to stop you from doing the following:
> -
> - Generate code in an array.
> - Jump to the beginning of the array. *
Chris Torek noted:
> Whenever the compiler is forced to generate `iffy' code, it also generates
> tests such as tags to make sure that you do not do something like this.
So what's to stop me from writing out a load module and subverting
the protection mechanism, as I noted in my (deleted) footnote? I would
think that the perversions necessary to make 'C' safe to run on this machine
would make it sufficiently useless that a little thing like calculating
a pointer to a position before the beginning of an array is a minor
detail...
That is to say, yes... this construct is non-portable. But only to machines
you would have severe problems porting to in the first place.
In article <14...@ficc.uu.net> pe...@ficc.uu.net (Peter da Silva) writes:
>So what's to stop me from writing out a load module and subverting
>the protection mechanism, as I noted in my (deleted) footnote?
The O/S, of course, which cooperates with the compiler as to these tags
or region markers or whatever. In fact, the only way to subvert the
system, if the system is done right, is to take it apart and either
rewire it, or move its disks to another machine and rewrite them, or
something along those lines---i.e., something software is physically
unable to protect against. (I thought this whole line of reasoning was
obvious. [proof by intimidation :-) ])
>I would think that the perversions necessary to make 'C' safe to run
>on this machine would make it sufficiently useless ...
Probably.
Decent memory protection. (There are those of us who believe that
executable and writable memory should be mutually exclusive. (with a
provision to change from one to the other.))
>So what's to stop me from writing out a load module and subverting
>the protection mechanism, as I noted in my (deleted) footnote?
The same type of protection mechanism that makes it impossible
(or hopefully at least difficult) to alter other users' files.
Writing out executable files may be considered a privileged
function reserved to compilers.
(Please note I am not saying that I think that compilers are the proper
place to enforce system security, just that portably written code shouldn't
have undue hardship running on such a machine.)
--
Bob Larson Arpa: Bla...@Ecla.Usc.Edu bla...@skat.usc.edu
Uucp: {sdcrdcf,cit-vax}!oberon!skat!blarson
Prime mailing list: info-prime-request%ai...@ecla.usc.edu
oberon!ais1!info-prime-request
Two things stop this:
1. There's no way to 'say it'; see below.
2. There is a tag field on each word of memory. Data has a tag of 0 or 2;
code has a tag of 3. It is the responsibility of the compiler to make sure
that a user program cannot set its own tags. Only the operator can turn
a program into a compiler, and only a compiler can create an object program.
(There are, of course, holes for people with super-user-like privileges.
Just like Unix.)
>* this may involve such things as passing a pointer to an array to a
>function that's declared that argument as a pointer to a function, or
>even by writing the array out as a file and executing it... I can't see
>how you could write a valid 'C' compiler that wouldn't let you violate
>this protection.
Another feature of this system is a type-checking linker. All functions
must agree in number of arguments and type of arguments with their calls.
The linker, called the binder on the A-series, enforces this. (This makes
varargs be a pain in the behind, BTW. One reason why A-series C most likely
will not fully use the hardware, and therefore be a slow, undesirable
language. Much like their PL/I.)
>Peter da Silva `-_-' Ferranti International Controls Corporation.
-Sho
So what's wrong with "many people use Emacs on HP9000/500"? Nobody
(until your posting) said anything about specifically GNU emacs.
der Mouse
old: mcgill-vision!mouse
new: mo...@larry.mcrcim.mcgill.edu
actually sounds a lot like working on an intel iapx86 microprocessor, doesn't it? the restrictions are much the same.
jim nutt
'the computer handyman'
--
St. Joseph's Hospital/Medical Center - Usenet <=> FidoNet Gateway
Uucp: ...ncar!noao!asuvax!stjhmc!15.11!jim.nutt
I'm getting real tired of the "I wanna do X" comments. First,
there is almost (but not always) a better, standard way to do the
same thing. And second, there seems to be a *major*
misunderstanding of what ANSI says.
ANSI does *not* say that you can't jump into your data. All it
says is that this is not guaranteed to work. In other words,
it's nonportable. OF COURSE it's nonportable. It's just not
going to work on a machine with a different processor, and it may
not work with a different operating system or a different
compiler.
A similar comment applies to a lot of things that people are
complaining about. ANSI rarely says that, at run time, certain
things are not allowed. Instead, it says things like "if you do
X, the results are undefined (or implementation dependent)".
So you *can* write your incremental compiler and have it conform
to ANSI C (though it will not be maximally conforming). You just
can't do that and assume that it will port. And that is what you
would expect.
Let me make that clear: unless the ANSI standard says that the
compiler *must* prohibit something, that thing is *allowed*. If
there is something reasonable to do on your particular system, so
long as doing it doesn't conflict with the do's and don'ts that
are stated in the standard, the compiler can do it. And you can
use that feature to your heart's content, and have a conforming
program. The program won't be maximally conforming, so it won't
necessarily port easily, but that is the cost of using a system
specific feature.
---
Bill
novavax!proxftl!bill
There is another way to treat a data object as a function:
union foo {
char *data;
int (*func)();
};
The compiler would either have to prohibit unions with both text and
data pointers or do runtime bookkeeping to remember what was last
stored in such unions.
Mark Russell
m...@ukc.ac.uk
More to the point: The dpANS says you (the implementor) are _allowed_ to
load the selector into a selector register when computing the address. To
do otherwise could slow down register-intensive pointer manipulation.
--
Chip Salzenberg <ch...@ateng.uu.net> or <uunet!ateng!chip>
A T Engineering My employer may or may not agree with me.
The urgent leaves no time for the important.
But consider what might have happened had dpANS mandated that the computation
of a pointer to x[-1] be a valid operation. Then machines for which the
mandated behavior is slow would not be used by people interested in high
performance. The net effect could be salubrious for the computer industry in
the long run.
Marv Rubinstein
Perhaps. I, for one, would find it useful to be Officially Allowed to
compute &arr[negative_offset]. I already make use of this (nonportably) in
existing code. There is a second pitfall, however.
Consider the following (not strictly conforming) code:
struct array_descriptor {
    int low;      /* lower bound */
    int bound;    /* upper bound - lower bound */
    int *data;    /* pointer to &data[0] */
};

/*
 * Allocate a new array whose subscripts range from [low..high)
 */
struct array_descriptor *new_array(int low, int high) {
    struct array_descriptor *p;
    int bound, *dp;

    /* first get a descriptor */
    if ((p = (struct array_descriptor *)malloc(sizeof(*p))) == NULL)
        return (NULL);
    /* then check for degenerate arrays (no data) */
    p->low = low;
    if ((bound = high - low) <= 0) {
        p->bound = 0;
        p->data = NULL;
    } else {
        /* allocate data */
        if ((dp = (int *)malloc(bound * sizeof(*dp))) == NULL) {
            free((char *)p);
            return (NULL);
        }
        p->bound = bound;
        p->data = &dp[-low];    /* virtual zero point */
    }
    return (p);
}
If the computation `&dp[-low]' does not over- or under-flow, it
produces some pointer. If `low' is positive, it produces a pointer
that does not point to valid data, but as long as that pointer is
used by adding a value in [low..high) before indirecting, things
should work out.
Now consider the free routine:
void free_array(struct array_descriptor *p) {
    if (p->data != NULL)
        free((char *)(&p->data[p->low]));
    free((char *)p);
}
Do you see the hidden assumption here?
if (p->data != NULL)
but p->data is not a `valid' pointer. Maybe we had best write
if (&p->data[p->low] != NULL)
but this is no good either (look again at new_array). At least
if (p->bound)
seems safe. But what *really* happens if, in new_array, &p->data[-low]
turns out to `just happen' to equal NULL?
The approach I used in my own (nonportable) code was to keep the
original pointer around, just in case (and because there was no
p->bound available: allocation of data objects is deferred until they
are needed). This also `just happens' to keep happy some garbage
collecting C runtime systems. The above code, run on such a system,
might fail mysteriously after the garbage collector runs---because the
data pointer computed by &p->data[-100] is outside the region allocated
by malloc. The GC routine would assume it was free, and cheerfully
release it for another malloc().
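Concretely, keeping the original pointer around costs one more member
in the descriptor, something like this (my sketch, extending the
struct above):

    struct array_descriptor {
        int low;      /* lower bound */
        int bound;    /* upper bound - lower bound */
        int *data;    /* biased pointer; data[i] valid for i in [low..high) */
        int *base;    /* exactly what malloc returned, kept for free() */
    };

    void free_array(struct array_descriptor *p) {
        if (p->base != NULL)
            free((char *)p->base);    /* no pointer arithmetic needed */
        free((char *)p);
    }

new_array() then stores dp into p->base before biasing, and the NULL
test is against p->base, which really is a valid (or null) pointer.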
> More to the point: The dpANS says you (the implementor) are _allowed_ to
> load the selector into a selector register when computing the address. To
> do otherwise could slow down register-intensive pointer manipulation.
OK, then, I withdraw my objection to the original message. Since
there is no portable method of declaring non-zero-based arrays in 'C',
and since the code generation task for using such is trivial, they should
be added.
I doubt that any effect on the computer industry would have occurred
other than reduced adherence to the postulated C standard. People
writing portable applications would still not be able to compute
&array[-1], since several compilers would ignore that requirement
(benchmark speed is a far greater driving factor than the desires of
a few sloppy programmers to compute non-existent addresses). What
good would that situation accomplish? Better that the standard be
widely followed and that programmers become better educated about
actual portability considerations, than to encourage false hopes for
availability of features that are difficult or detrimental to provide.
No, it is sufficient if any attempt to use this trick malfunctions badly.
(For example, if the two kinds of pointers are not the same size, it is
almost guaranteed to.)
--
NASA is into artificial | Henry Spencer at U of Toronto Zoology
stupidity. - Jerry Pournelle | uunet!attcan!utzoo!henry he...@zoo.toronto.edu
No. A much more probable result would be widespread rejection of the C
standard, making things worse than before. ANSI does not have the power
to legislate conformance to standards -- that has to be voluntary. If
too many manufacturers, especially big ones, decline to conform to a
standard, it falls into disuse and is forgotten. Let us not forget that
the machine whose segmented architecture causes the biggest headaches for
pointer trickery is also the biggest-selling computer of all time. To get
a standard accepted (by the world, not just by ANSI), it is necessary --
distasteful, but necessary -- to restrain desires for social engineering,
and produce something that will work even on systems one does not like.
Re comments about how x[-1] should be legal and should be in the standard.
>I doubt that any effect on the computer industry would have occurred
>other than reduced adherence to the postulated C standard. People
>writing portable applications would still not be able to compute
>&array[-1], since several compilers would ignore that requirement
>(benchmark speed is a far greater driving factor than the desires of
>a few sloppy programmers to compute non-existent addresses). What
>good would that situation accomplish? Better that the standard be
>widely followed and that programmers become better educated about
>actual portability considerations, than to encourage false hopes for
>availability of features that are difficult or detrimental to provide.
You may be right about reduced adherence, at least in this regard.
However the problem is not simply a matter of "sloppy" programming.
In C a pointer is a fairly anonymous object. What you are saying is
that it is a potential error to add or subtract an integer from a
pointer if the result is out of range. Very well, but what is that
range? Suppose a pointer is passed through a calling sequence. In
the function I have no way of knowing whether &x[n] will break for any
n other than 0. For that matter I have no way of knowing whether
x is a legal pointer!
In principle this is not right -- there is no way to write defensive
code to check on pointer validity. To be sure a "correct" program
never has an invalid pointer and all that but what about the rest of
us poor mortals?
--
In the fields of Hell where the grass grows high
Are the graves of dreams allowed to die.
Richard Harter, SMDS Inc.
Wherever the actual data is, plus a nonexistent extra member just past
the end of an array. But you knew that.
>Suppose a pointer is passed through a calling sequence. In
>the function I have no way of knowing whether &x[n] will break for any
>n other than 0. For that matter I have no way of knowing whether
>x is a legal pointer!
That's not quite the same issue. One presumes that your functions have
interface specifications, which if adhered to guarantee proper operation.
The code defining the function must of course implement those
specifications. The C language does not provide the means to completely
specify all relevant aspects of an interface in the C code itself, nor
does it require run-time enforcement of the interface specifications.
It does permit some degree of compile-time checking of the interface.
>In principle this is not right -- there is no way to write defensive
>code to check on pointer validity. To be sure a "correct" program
>never has an invalid pointer and all that but what about the rest of
>us poor mortals?
I suppose you should all learn ways to produce correct code. Many
such methods were figured out in the 1970s and should be part of
every professional programmer's repertoire by now.
If you ever get the chance to help design a brand-new programming
language, you might consider building in stronger aids for interface
specification checking. Ada tried that but didn't carry it far enough.
Okay, let's imagine: X3J11 says that x[-1] must be valid.
then: int must be 32 bits.
then: address space must be linear.
etc. until only the SPARC is conforming. (no smileys here)
Each time you make a "beneficial" restriction, you're condemning present
users of real, useful computers to the purgatory of enforced non-
conformance. I don't think anyone really wants X3J11 to make decisions
about which hardware will be permitted to run C programs.
In addition, it should be observed that on this issue, X3J11 stuck to its
charter and codified existing practice.
The members of the array that the pointer points into, plus the special
case of just above the end of the array.
>Suppose a pointer is passed through a calling sequence. In
>the function I have no way of knowing whether &x[n] will break for any
>n other than 0. For that matter I have no way of knowing whether
>x is a legal pointer!
That's correct. It is the caller's responsibility to supply a pointer
that is adequate for the function's purposes, and the function writer's
responsibility to document those purposes well enough that the caller
knows what his responsibilities are. There is no way to check this at
runtime in conventional C implementations. That's C for you.
>The members of the array that the pointer points into, plus the special
>case of just above the end of the array.
The question was rhetorical, in that I was pointing out that there is
no way to determine from the pointer itself what its range was. I expect
it is a good thing to repeat the legal answer, since it does seem to be
a complete mystery to some people :-).
>>Suppose a pointer is passed through a calling sequence. In
>>the function I have no way of knowing whether &x[n] will break for any
>>n other than 0. For that matter I have no way of knowing whether
>>x is a legal pointer!
>That's correct. It is the caller's responsibility to supply a pointer
>that is adequate for the function's purposes, and the function writer's
>responsibility to document those purposes well enough that the caller
>knows what his responsibilities are. There is no way to check this at
>runtime in conventional C implementations. That's C for you.
Such as it is :-). In theory this means that there is an entire class
of error checking that one can't do. For the most part this doesn't matter.
However it would be very nice if there were a library routine that would
tell you whether a pointer was legal or not. I am not much a fan of the
"if all the spec's and the interface definitions and the code are all correct
then you don't need error checking" school of programming. I rather like
the "be a skeptic and check and report the errors when you find them"
school of thought.
As a side note, one argument for making x[-1] legal is that it permits
you to use sentinels in both directions. I don't see that this is a
problem, regardless of architecture. All that is required is that nothing
be allocated on a segment boundary. However, as the man says, the way
it is is the way it is. There never was a machine, a language, or an
operating system without arcane restrictions. [Except lisp :-)]
There is a considerable practical difference between this case and the
one for a pointer just past the last member of an array. The [-1] case
can require reservation of an arbitrary amount of unused address space
below an actual object, whereas the other case requires no more than a
byte (or a word, depending on the architecture) of space reserved above.
Good point. [For those who didn't catch it, the size of the object that
a pointer is pointing to can be a structure of (arbitrary) size. To allow
&x[-1] to be legal you have to leave enough space for an instance of the
structure (or array of structures or whatever) before the actual instance.]
Doug's claim that the other case (one past) need only require a word (or byte)
of memory at the end suggests a question that I don't know the answer to.
Suppose that x is an array of structures of length n. As we all know,
&x[n] is legal. But what about &x[n].item? If Doug's assertion is correct,
and I expect it is since he is quite knowledgeable in these matters, then
it would seem to follow that &x[n].item is not guaranteed to be legal.
Excuse my ignorance, but why must an int be 32 bits for the above to work?
> then: address space must be linear.
Does X3J11 say that the contents of x[-1] must be valid?
Mark Jones
The situation unfortunately isn't as symmetrical as it looks, because
a pointer to an array element points to the *beginning* of the array
element. A pointer one past the end of an array points to the byte
(well, the addressing unit, whatever it is) following the array; a
pointer one past the beginning points to the byte (etc.) that is one
array-member-size before the beginning. Computing x[size] without
risk of overflow only requires that there be at least one byte between
the array and the end of a segment; computing x[-1] without risk of
underflow requires an entire array element between the array and the
start of the segment, which can get expensive if the elements are big
(consider multidimensional arrays).
The difference in costs was felt to be sufficient to justify a difference
in treatment. Both practices have been technically illegal all along,
so legitimizing both wasn't vitally necessary. Since x[size] gets used
a lot and is cheap to do, it was legalized. Since x[-1] was rather more
costly and is used less, it wasn't.
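A quick way to see the cost difference (sizes picked only for
illustration):

    struct big { double m[1024][1024]; };    /* 8 megabytes per element */

    struct big x[4];

    /*
     * &x[4]  needs at most one addressing unit of headroom past the
     *        end of x -- cheap on essentially any machine.
     *
     * &x[-1] would need a full sizeof(struct big) -- 8 megabytes here --
     *        of address space reserved *below* x, just so the computation
     *        cannot underflow the segment.  That is the cost X3J11 was
     *        not willing to impose.
     */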
While Pascal lends itself to this somewhat better than C, I learned
some lessons from one compiler that are worth mentioning here.
Grads of the University of Wisconsin at Madison (and other places with
Univac 1100s) remember Charles Fisher's Pascal compiler with great
fondness. Aside from some extensions (separate compilation) that
aren't relevant to this discussion, he worked within the limits of
Jensen and Wirth, but still produced a compiler that did an enormous
amount of run-time checking. (Checking variant tags found several bugs
a day in my CS 701 project.) The niftiest check, though, was sanity
testing pointers with a "lock" and a "key".
(In Pascal, a programmer can only point to objects that have been
allocated, one at a time, from the heap. Think "p=malloc(sizeof t)"
or some such.)
When a "new" object is allocated, some space is reserved at the
beginning for a "lock". This is an integer, usually fairly long
(thirty-two bits). (As an important tweak, the high order bits of this
"lock" are never all zero or all ones.) Every pointer has enough room
for both an address and a "key". When a pointer variable is assigned a
value, both the address and the "key" are assigned. When an object is
deallocated, the "lock" is changed (typically, set to an otherwise
impossible value, such as zero).
When a pointer is used, the compiler first checks the "key" field in
the pointer to the "lock" field in the structure. NOTE THAT THIS IS
COMPLETELY TRANSPARENT TO THE PROGRAMMER! Null pointers can always be
caught by an ambitious compiler. With "locks" and "keys", even
uninitialized and dangling pointers can be caught. That is, if you
allocate some memory, assign the address to a pointer variable, and
then deallocate the memory, this run-time check will complain if you
use the obsolete pointer variable to get at memory that's no longer
safe to access or modify.
Now, what changes would need to be made for C? First, it's possible in
C to allocate not just single objects, but arrays of objects. An
allocated array would need room for both a lock and a length. You
don't need a lock for every element. A "pointer" would be represented
as a base address, an offset, and a key. The compiler would check that
the offset was no greater than the length, and that the key matched the
lock stored (once) at the beginning of the array.
Second, C can have pointers to global and local variables (to the
initialized data section and to the stack). It doesn't make sense to
have a "lock" for every auto or extern short. (It's not too
unreasonable, though, to have a lock for every array and struct.) C
would need some sort of "skeleton key" for small non-malloc'ed objects.
Finally, since we're generating code with "pointers" that are very
different than what most C compilers use, we'd have to be very careful
in linking code (say, libraries) that expect "normal" C pointers.
So, this is tough to do in C. (I think some MS-DOS C interpreters do
at least some of this.) On the other hand, by redefining Object* and
Object->, "lock" and "key" checking would be straightforward to
implement in C++. Any takers?
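For the malloc'ed-object case you can even fake a crude, do-it-by-hand
version of the lock and key in plain C.  Just a sketch of the idea; a
real compiler does all of this transparently, and this toy heap
deliberately never returns storage, so stale locks stay readable:

    #include <stdio.h>
    #include <stdlib.h>

    struct lock_hdr {              /* header glued onto every checked object */
        unsigned long lock;        /* nonzero while the object is alive */
    };

    struct ck_ptr {                /* a "pointer" is an address plus a key */
        char *addr;
        unsigned long key;
    };

    static unsigned long next_lock = 0x5eed0001UL;  /* never zero, never reused */

    struct ck_ptr ck_new(size_t n)
    {
        struct lock_hdr *h = (struct lock_hdr *)malloc(sizeof *h + n);
        struct ck_ptr p;

        if (h == NULL)
            abort();
        h->lock = next_lock++;
        p.addr = (char *)(h + 1);
        p.key = h->lock;           /* key matches lock at allocation time */
        return p;
    }

    void ck_dispose(struct ck_ptr p)
    {
        struct lock_hdr *h = (struct lock_hdr *)p.addr - 1;

        h->lock = 0;               /* "impossible" value marks it dead; the
                                      block is never given back to malloc */
    }

    char *ck_use(struct ck_ptr p)  /* every dereference goes through here */
    {
        struct lock_hdr *h = (struct lock_hdr *)p.addr - 1;

        if (h->lock != p.key) {    /* dangling or uninitialized pointer */
            fprintf(stderr, "lock/key mismatch: bad pointer\n");
            abort();
        }
        return p.addr;
    }

Globals and locals would still need the "skeleton key" described above;
this only covers the heap, and it only works if every access really
does go through ck_use().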
References (CNF = Charles N. Fisher, Computer Sciences Department,
University of Wisconsin at Madison; RJL = Richard J. LeBlanc, then at
the School of Information and Computer Science, Georgia Institute of
Technology):
RJL and CNF, "A case study of run-time errors in Pascal programs",
SOFTWARE-PRACTICE AND EXPERIENCE, v. 12, pp. 825-834 (1982).
CNF and RJL, "Efficient implementation and optimization of run-time
checking in Pascal", SIGPLAN NOTICES, v. 12, #3, p 1924 (March 1977).
CNF and RJL, "The implementation of run-time diagnostics in Pascal",
IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, v. 6, pp. 313-319 (1980).
CNF and RJL, UW-PASCAL REFERENCE MANUAL, Madison Academic Computing
Center, Madison, WI, 1977.
(More recent references may be available. Anyone at uwvax or uwmacc
have anything to contribute?)
--
Paul S. R. Chisholm, ps...@poseidon.att.com (formerly p...@lznv.att.com)
AT&T Bell Laboratories, att!poseidon!psrc, AT&T Mail !psrchisholm
I'm not speaking for the company, I'm just speaking my mind.
--
Send compilers articles to ima!compilers or, in a pinch, to Lev...@YALE.EDU
Plausible paths are { decvax | harvard | yale | bbn}!ima
Please send responses to the originator of the message -- I cannot forward
mail accidentally sent back to compilers. Meta-mail to ima!compilers-request
One problem with this is that on the segmented machines it is the act
of computing such a pointer that is invalid, not the pointer itself.
For example, if P is a pointer that happens to point to offset 0 in a
segment, *computing* P-1 will cause a fault. So, what you need is a
routine that tells you whether a particular offset from a pointer is
legal; something like:
    if (valid_pointer_offset (P, sizeof(*P), -1))
        P--;
Also, in order for this to work on machines with linear address
spaces, all pointers would have to carry around the location and size
of the objects to which they point. This is done in Symbolics C,
since pointers are actually implemented as two Lisp objects, an array
and an offset into the array, and array objects contain their size,
which is checked by the microcode/hardware array-indexing operations.
Each call to malloc returns a new array, and each stack frame can be
treated as an array (this means that it won't detect overflowing from
one local array into another in the same frame, but nothing's
perfect). Expecting C implementations on conventional architectures
to do this is too much.
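Just to make the idea concrete, on a linear-address machine a `fat
pointer' version might look like this (purely hypothetical; no such
type or routine exists in any C library I know of):

    #include <stddef.h>

    struct fat_ptr {
        char   *base;       /* start of the object, as malloc returned it */
        size_t  size;       /* size of that object in bytes */
        char   *p;          /* the current pointer value */
    };

    /* nonzero if p, moved by `offset' elements of `elsize' bytes each,
       would still point at a complete element inside the object */
    int valid_pointer_offset(struct fat_ptr fp, size_t elsize, long offset)
    {
        long byte_off = (long)(fp.p - fp.base) + offset * (long)elsize;

        return byte_off >= 0 && byte_off + (long)elsize <= (long)fp.size;
    }

This is roughly the information the Symbolics representation carries
around anyway; on conventional machines you would be tripling the size
of every pointer, which is exactly why nobody does it.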
Barry Margolin
Thinking Machines Corp.
bar...@think.com
{uunet,harvard}!think!barmar
Consider:
some_big_type x[2];
If sizeof(some_big_type) is half the size of a segment, computing &x[-1] is
no harder (or easier) than computing &x[2]. The standard mandates that
&x[2] be computable but it does not mandate that &x[-1] be computable. I
suspect that a C implementation that allows arrays as large as a
segment is able to compute both addresses.
Note, mandating that &x[-1] be computable does not mean that x[-1] is
referencable. So even if &x[-1] were computable we still could not have a
sentinel at the low end of the array. A sentinel at the low end is easy:
just start the data at x[1].
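In other words (a trivial sketch):

    #define N 100

    int x[N + 1];          /* x[0] is the low-end sentinel; data is x[1..N] */

    find(key)              /* backward sentinel search; 0 means "not found" */
    int key;
    {
        int i;

        x[0] = key;        /* guarantees the loop stops without an i >= 1 test */
        for (i = N; x[i] != key; i--)
            ;
        return i;
    }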
Marv Rubinstein
Yup.
I think if your code has to worry about this, it is ALREADY in trouble.
Pointers should come in two flavors: null (which is easy to test) and
valid (which you should be able to assume when non-null).
This got me thinking about a subtle dpANS wording difference:
struct _whatever *pstruct;
pstruct = (struct _whatever *) malloc (n * sizeof(struct _whatever));
is pstruct[n-1] or pstruct+(n-1) -guaranteed- to be allowed on
-all- dpANS conformant installations?
The dpANS (Jan '88) states that malloc() returns space for an -object-
of size (n * sizeof(struct _whatever)) whereas
calloc(n, sizeof(struct _whatever))
allocates space for an -array- of 'n' objects each of whose size
is sizeof(struct _whatever).
I guess it comes down to this: does dpANS -guarantee- an object is
divisible into sub-objects following Chris Torek's "locally
flat" paradigm, and that pointers produced by arithmetic on
pointers to the sub-objects will be valid.
Simply stated, does dpANS wording imply any difference between
calloc (n, size) and malloc (n*size) ?
\/\/ill
--
St. Joseph's Hospital/Medical Center - Usenet <=> FidoNet Gateway
Uucp: ...{gatech,ames,rutgers}!ncar!noao!asuvax!stjhmc!18.6!will.summers
Excuse the question, but this is the first time I am looking at this
subject and I don't see any reason why x[-a] can't be permitted, mainly
for two basic reasons:
(a) x[a] == *(x+a) therefore x[-1] == *(x-1), which looks
perfectly ok to me.
(b) yacc uses array[-1]. If it is considered invalid, that will mean
that yacc has to be rewriten for the new standard?
Am I missing something?
George Kyriazis
kyri...@turing.cs.rpi.edu
------------------------------
Oh come on now, my code is never in trouble. What, never? But it is
just this kind of thinking "you should be able to assume" that leads to
bug-ridden code which lacks robustness. If one is writing a service
routine one doesn't blissfully assume that the caller hasn't blown it;
one checks the input for validity. If there is trouble in your routine,
that's your problem. But if you don't check your input and it violates
your interface assumptions, anything can happen.
>One problem with this is that on the segmented machines it is the act
>of computing such a pointer that is invalid, not the pointer itself.
I don't understand this. I can understand that on certain wacko
architectures computing it IN A SEGMENT REGISTER would cause
a problem. But why not do the computation in an ordinary
arithmetic register, presumably by casting to an integer type?
DougMcDonald
int array[10], *ip = array;
ip++;
foo(ip[-1]);
I assume when yacc uses array index of -1 it is doing something
similar.
--
Scott Wilson arpa: swi...@sun.com
Sun Microsystems uucp: ...!sun!swilson
Mt. View, CA
You cannot fix the caller's violation of the interface spec in the called
routine.
It often pays to perform thorough parameter validation while debugging an
application, but you should not rely on such micro-tests for robustness.
There's more. Here's some code I ripped directly from a y.tab.c
(yacc output) of a compiler I'm working on:
case 28:
# line 160 "gram.y"
{ Mpc_insert_with_searchdir(yypvt[-3],yypvt[-1]);
yyval = tree(N_INSERT_DECL3);} break;
case 29:
# line 166 "gram.y"
{ yyval= tree(N_CONST_DECL); } break;
case 30:
# line 171 "gram.y"
{ yyval= list(L_CONST_DECL_LIST, (Node*)0, yypvt[-0]); } break;
'Nuff said?
I missed some of the early postings on this subject. Could someone
be kind enough to bring me up to date? Is the committee going to
break this code? If so, why, fer Pete's sake?
The C standard does not say that computing a negative offset off of
a pointer is illegal; it says that computing one that is outside of
the array that the pointer points into (a single object counts as an
array of size 1) is not standard C.
In the case of the YACC example, consider the following:
int a[10];
int *ip;

ip = &a[0];
&ip[-1];        /* implementation-defined behavior */
&ip[0];         /* just fine */
&ip[9];         /* just fine */
&ip[10];        /* just fine */
ip[10];         /* implementation-defined behavior */

ip = &a[5];
&ip[-6];        /* implementation-defined behavior */
&ip[-5];        /* just fine */
&ip[4];         /* just fine */
&ip[5];         /* just fine */
ip[5];          /* implementation-defined behavior */
And that's the way it is.
;-D on ( My goodness and my badness ) Pardo
--
pa...@cs.washington.edu
{rutgers,cornell,ucsd,ubc-cs,tektronix}!uw-beaver!june!pardo
>{ Mpc_insert_with_searchdir(yypvt[-3],yypvt[-1]);
> yyval = tree(N_INSERT_DECL3);} break;
>Is the committee going to
>break this code? If so, why, fer Pete's sake?
This should be OK, I believe. As I understand the internals of
yaccpar, yypvt is a pointer into yyv[], so you're not going outside of
the bounds of the array. The problem is in the initialization of the
stack, where it's pointed at the -1th element, which doesn't exist.
-=-
_________________________________________________________________________
| Christopher Mills | "If you see someone without a smile, |
| mi...@baloo.eng.ohio-state.edu | give them mine - I'm not using it." |
====== My thoughts are not my own--I'm possessed by mailer daemons. ======
I feel like the world has gone through some strange warp. Back when I
was studying numerical analysis the complaint from the mathematicians
and numerical analysts was about how awkward it was to code algorithms
in Fortran-IV because it used origin 1 indexing and origin 0 would
clearly have been so much more "natural".
You should know by now not to believe every raving you hear on the net.
The discussion was about computing out-of-range pointers, then somebody
who hadn't understood the discussion made noises about [-1] array
indexing being outlawed, which is plain silly.
The problem arises on machines with non-linear address spaces. The classic
example is the 80x86 family from Intel. Addresses on these machines consist
of two parts - the segment and the offset. For large arrays, it is handy
to have the array start at the beginning of a segment, and make the segment
large enough to hold the array. Therefore, array arithmetic is done only
with the offset, making the code run much faster. Unfortunately, the
price you pay is that negative absolute indices 'wrap around' the segment,
and give bizarre results. This can be finessed in YACC by declaring
the array and a pointer, and then if YACC has a 'largest' negative index
of 1 or 2, set pointer = array + 1 or 2, and things work fine. I hope
this makes things clearer.
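In code, the finesse might look something like this (a sketch; the array
and constant names are made up, not yacc's own):

#define STACKSIZE 150
#define MAXNEG    2     /* largest negative index the parser ever uses */

static int value_stack[STACKSIZE + MAXNEG];
static int *vp = value_stack + MAXNEG;
/* Now vp[-1] and vp[-2] are value_stack[1] and value_stack[0]:
 * no pointer is ever formed below the start of the real array. */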
Alan M. Carroll car...@s.cs.uiuc.edu
Grad Student / U of Ill - Urbana ...{pur-ee,convex}!uiucdcs!s!carroll
"Too many fools who don't think twice, too many ways to pay the price" - AP&EW
Richard Harter (g-...@XAIT.Xerox.COM) writes:
> >As a side note, one argument for making x[-1] legal is that it permits
> >you to use sentinels in both directions. I don't see that this is a
> >problem, regardless of architecture. All that is required is that nothing
> >be allocated on a segment boundary...
Henry Spencer (he...@utzoo.uucp), no less, replies:
> The situation unfortunately isn't as symmetrical as it looks, because
> a pointer to an array element points to the *beginning* of the array
> element.
He must not have gotten over his cold yet. The correct statement is:
a pointer to an array element *is typically implemented as* pointing to
the beginning of the array element. Depending on the machine architecture,
it might be equally well implementable as a pointer to the *end* of the
array element. Other implementations are also conceivable.
A pointer to anything points to *all* of the thing, at once.
The following code copies all of y over all of x, doesn't it?
(Assuming that x and y have types for which the operations are legal.)
p = &x; q = &y; *p = *q;
I'm feeling rather sensitive about this point just now, because I've been
discussing by email with David Prosser, the editor of the Draft ANSI
Standard for C, the several errors in its descriptions of array and pointer
operations. It appears that he and his predecessors made the same or
similar mistakes.
> Both practices have been technically illegal all along,
> so legitimizing both wasn't vitally necessary. Since x[size] gets used
> a lot and is cheap to do, it was legalized. Since x[-1] was rather more
> costly and is used less, it wasn't.
Rather, since x[size] gets used a lot and x[-1] is used less, *and an
implementation is possible on most or all machines where x[size] is cheap*,
it was appropriate to bless x[size].
Mark Brader "True excitement lies in doing 'sdb /unix /dev/kmem'"
utzoo!sq!msb, m...@sq.com -- Pontus Hedman
No, the committee isn't going to break that code. Those array references
are from a pointer pointing into the middle of an array back into the
earlier part of the array. This is legal. What is not legal is to have
a pointer point back before the beginning of an array, even if it is
not dereferenced, because there are architectures that do bounds checking
on pointer calculations, and there are architectures like the Intel 8086
family for which this may not even be meaningful.
Personally, I think these architectures are the result of severe brain
damage or overdependence on marketing claims. But they're out there, and
have to be accommodated.
--
Peter da Silva `-_-' Ferranti International Controls Corporation.
"Have you hugged U your wolf today?" pe...@ficc.uu.net
[ complaints from programmers ]
> and numerical analysts was about how awkward it was to code algorithms
> in Fortran-IV because it used origin 1 indexing and origin 0 would
> clearly have been so much more "natural".
In most cases 0 is more natural. For some cases 1 is more natural. For other
cases -63 might be more natural, and for others 7 might be the best base.
Fortran now allows these other bases (we use a lot of 0-based arrays here).
'C' doesn't. There is some question whether it should.
More precisely, what it says is that even *computing* a pointer to the
-1th element of an array can send you off into the Twilight Zone. This
is not new -- if you read K&R, that's always been the rule. However,
there is no objection to subtracting 1 from a pointer that already points
to the Nth element of an array, where N > 0. (&a[1])[-1] is legal.
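A concrete illustration of the distinction (nothing here beyond the rule
just stated):

int a[10];
int *p = &a[1];     /* points at the second element */
int x;

x = p[-1];          /* same as a[0]: still inside the array, so it's fine */
/* int *q = &a[0] - 1;   merely computing this steps outside the array */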
Mark, you might be surprised if you study the X3J11 drafts very closely.
Remember, for example, that a pointer to a struct, suitably cast, is
*required* to point to its first member. Okay, you can say that the cast
can involve a conversion... but when you pay careful attention to the
rules about pointers to unions, such schemes start coming unravelled.
When you look very carefully at the combination of the rules for pointers
to structs, pointers to unions, pointers to arrays whose members are also
arrays, compatibility across separately-compiled modules, etc., it's very
hard to do seriously unorthodox things with pointers without breaking a
rule somewhere.
For one reason, on machines with notions of data type at the hardware
level, this may be illegal. For another reason, pointer arithmetic
may be seriously unorthodox, to the point where doing it using integers
may be much more expensive than using the segment registers. (One
obvious way this can happen -- it almost did happen on the 68000 --
is that a pointer might not fit in an integer register.)
We seem to have strayed out of specifics into the area of general software
methodology. The question as I see it is not one of "fixing" caller
interface errors -- it is one of handling them gracefully. In robust code
the effects of an error are predictable and there are damage control measures;
in fragile code the effects of an error are unpredictable and may be
catastrophic. I would say that it pays to perform parameter validation,
not merely while debugging an application, but at all times and that the
specifications should include, as a matter of course, the actions to be
taken when parameters are not valid. My view is that one should never
assume, as a matter of course, that software is correct.
Assuming that malloc did not return NULL, yes. If it did, you would
not have allocated at least n*sizeof(struct _whatever) bytes.
(Incidentally, `_' marks a reserved name space.) Rephrasing the
question as `is &pstruct[n] legally computable', the answer is still
yes.
>I guess it comes down to this: does dpANS -guarantee- an object is
>divisible into sub-objects following Chris Torek's "locally
>flat" paradigm, and that pointers produced by arithmetic on
>pointers to the sub-objects will be valid?
A `malloc'ed object, yes; at least, that was certainly the intention.
>Simply stated, does dpANS wording imply any difference between
>calloc (n, size) and malloc (n*size) ?
Other than that calloc fills the allocated memory with zero bytes,
there is no guaranteed difference. It is possible that one or the
other might be preferable on some architecture (calloc is often just
malloc+memset in disguise, but on some machine calloc might be able to
use less space, since it gets more information), but there is no
predicting which is best where, and either is correct.
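As a sketch of the "malloc+memset in disguise" remark (not how any
particular library actually does it; the overflow check is my own
addition):

#include <stdlib.h>
#include <string.h>

void *my_calloc(size_t nelem, size_t elsize)
{
    void *p;

    if (elsize != 0 && nelem > (size_t) -1 / elsize)
        return NULL;                    /* nelem * elsize would overflow */
    p = malloc(nelem * elsize);
    if (p != NULL)
        memset(p, 0, nelem * elsize);
    return p;
}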
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: ch...@mimsy.umd.edu Path: uunet!mimsy!chris
Another way of stating the question is, "Is sizeof(foo) constrained to
be a multiple of the alignment of foo?"
(I have another question at the bottom of this posting.)
The only copy I have of the proposed ANSI C standard is a pretty early
one. It says, "When applied to a structure or union object, the result
is the total number of bytes in the object considered as a member of an
array..." That indicates that the code above is okay (provided that
your compiler is ANSI C).
When I wrote a storage allocator a while back, I was not quite willing
to believe the guarantee, so I defined a structure,
"struct heap_unit" which could be redefined on various machines if
necessary. All memory allocations were done in multiples of
sizeof(heap_unit). The first, and so far only, implementation
(for Sun3) was as follows:
typedef struct heap_unit {
    struct heap_unit *next;
} Heap_unit;
The "next" field is used to link free-lists together.
...
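The rounding step such an allocator needs looks roughly like this (a
sketch; the function name is mine, and it relies on the Heap_unit
defined above):

/* Round a request up to a whole number of heap_units, so that every
 * block handed out can later be threaded onto a free list through
 * its `next' field. */
static unsigned units_for(unsigned nbytes)
{
    return (nbytes + sizeof(Heap_unit) - 1) / sizeof(Heap_unit);
}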
Now for the other question: Is it guaranteed that the actual memory
allocated (static, automatic, or malloc) for a variable foo is always
at least sizeof(foo)? It would seem that such should be the case,
but I can't find it stated explicitly in my old draft. (I am completely
uninterested in the moral and socioethical considerations of the following
code.)
bar()
{
    char a;
    struct something foo;
    char z;

    a = 'a';
    z = 'z';
    /* Might the following "step on" char a or char z? */
    bzero(&foo, sizeof(foo));
}
According to the standard, sizeof(foo) returns the size which would
be allocated for a struct something in an array. Will this much
necessarily be allocated for foo on the stack, insulating it from
char a and char z?
> >a pointer to an array element *is typically implemented as* pointing to
> >the beginning of the array element...
> Remember, for example, that a pointer to a struct, suitably cast, is
> *required* to point to its first member. Okay, you can say that the cast
> can involve a conversion... but when you pay careful attention to the
> rules about pointers to unions, such schemes start coming unravelled.
> When you look very carefully at the combination of the rules for pointers
> to structs, pointers to unions, pointers to arrays whose members are also
> arrays, compatibility across separately-compiled modules, etc., it's very
> hard to do seriously unorthodox things with pointers without breaking a
> rule somewhere.
Harrumph. The cast not only can, but *does* involve a conversion.
The question is whether the conversion changes any bits. Most of the
examples you talk about also involve conversions, and the same thing
applies. Inter-module compatibility doesn't, but there are very few
requirements here.
Now, as a counterexample to the original assertion, consider a word-
addressed machine where an "int" occupies one word. Do you seriously
want to say that an "int *" points to a particular byte of the word?
A pointer can be implemented as pointing to the object addressed in almost
any fashion, as long as it is *deducible* exactly which bytes constitute
the object.
Well, maybe your code *is* in trouble. Maybe you're being called by
some other slob's function, and he* can't tell '\0' from NULL. Or
maybe you've got mysterious core dumps, and would like to at least
printf( "Goodbye, cruel world!\n" ) before you exit() off that mortal
coil. (Or, who knows, even tell your MS-DOS tester where the software
was right before the PC suddenly froz
Anyway, you do have a few sources of data on your data. Note that
*all* of these are compiler and operating system dependent to
*implement*, but once implemented, could be used in a fairly portable
function.
Everything's either global/static, automatic, or malloc'ed, right?
You may be able (by staring at the output of nm, or at the .MAP files
your linker generates) to find a relationship between some names that
often (always?) show up. Is one symbol always the first or last in
initialized (global) memory? Then you know one limit of the address
range of extern's and static's.
The symbols end, etext, and edata go *way* back in the history of the
UNIX(R) operating system. "The address of etext is the first address
above the program text [instructions; that is, a limit of the range of
function pointers], edata above the initialized data region [extern's
and static's], and end above the uninitialized data region. . . . the
current value of the program break [initially & end] should be
determined by 'sbrk(0)' (see brk(2))." [end(3C), from an old UNIX
system manual.] (This one looks UNIX-system specific, I'm afraid.)
It's very common for at least two of these areas to share a common
boundary. (For example, the stack begins just above the instructions.)
So one number gives you two boundaries.
A final trick is to put a magic auto variable in main(), such that
it's the very first object on the stack (this may be the lexically
first or last variable in main()), and store its address in an extern
for later checking. A checking function can define a local variable of
its own, if only to measure the extent of the stack.
Between this flood of numbers, and some system-specific experimentation
to see how they work together, we could produce the following checking
functions:
valid_fpointer(): Is the argument conceivably a valid function
pointer? (In this case, make sure it's on a valid boundary, too.)
valid_extern(): Is the argument possibly a valid pointer to an
extern or static object?
valid_auto(): Is the argument in the right range to be the address of
a local variable of some active function?
valid_alloc(): Is the argument a value malloc() or one of its cousins
has returned? (There are all sorts of ways of beefing up this one.)
valid_heap(): More lenient than valid_alloc(), is this possibly the
address of an object, or part of an object, allocated off of the heap
by the malloc() family?
#define valid_data( p ) \
( valid_extern( p ) || valid_auto( p ) || valid_heap( p ) )
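A thoroughly system-dependent sketch of two of these, assuming a
traditional UNIX layout (text, then data and bss, then heap) and a
downward-growing stack; stack_base is the "magic auto" address recorded
in main() as described above, and all the names below are mine:

extern char etext, end;     /* linker-defined: just past text, just past bss */

char *stack_base;           /* set early in main() to the address of its first auto */

int valid_extern(char *p)   /* plausibly a static or extern object? */
{
    return p >= &etext && p < &end;
}

int valid_auto(char *p)     /* plausibly the address of some live auto? */
{
    char marker;            /* roughly the current top of the stack */

    return p <= stack_base && p >= &marker;
}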
Paul S. R. Chisholm, ps...@poseidon.att.com (formerly p...@lznv.att.com)
AT&T Bell Laboratories, att!poseidon!psrc, AT&T Mail !psrchisholm
I'm not speaking for the company, I'm just speaking my mind.
UNIX(R) is a registered trademark of AT&T
(*"he": No female programmer would ever do that!-)
True, but in general one must draw the line somewhere. It's almost always
possible to add just one more check. There usually has to be some sort of
balance with efficiency.
Yes.
The source for who contains the following type of thing:
char *str = "hello world"+6;
where printf( "'%c'", str[ -6]);
gives 'h'.
This is used in the "who am i" command.
It works on most C compilers (with one exception still in development). The
exception is due to the method of compilation requiring a "fake node" to
make it work. DMR is quoted by the compiler writer as saying it should work,
even in ANSI C.
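Written out as a complete program (nothing here beyond what the fragment
above already shows):

#include <stdio.h>

int main(void)
{
    char *str = "hello world" + 6;      /* points at the 'w' */

    printf("'%c'\n", str[-6]);          /* prints 'h': still inside the literal */
    return 0;
}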
| Terry Lambert UUCP: ...{ decvax, ihnp4 } ...utah-cs!century!terry |
| @ Century Software OR: ...utah-cs!uplherc!sp7040!obie!wsccs!terry |
| SLC, Utah |
| These opinions are not my companies, but if you find them |
| useful, send a $20.00 donation to Brisbane Australia... |
| 'I have an eight user poetic liscence' - me |
>True, but in general one must draw the line somewhere. It's almost always
>possible to add just one more check. There usually has to be some sort of
>balance with efficiency.
It is true that one always has to take into account the tradeoff
between efficiency and error checking. However that is not normally the
tradeoff involved in parameter validation. Unless the body of a function
is very short and very fast, parameter validation execution times are a
nominal part of the total cost of a function invocation.
The real trade offs are between error checking and code size.
It costs time and money to write error checking code, and the presence
of that code in the source increases both maintenance costs and object
code size.
Furthermore, not all error checking is of equal value, nor is it
always appropriate. If the called routine is essentially a subcomponent
of the calling routine, parameter validation is of less value; the
two routines are part of a common package. On the other hand, if the
called routine is a service routine, it is much more important to do
things like parameter validation.
Note: This is not so much a reply to Henry, who knows all these
things, but a general comment.
Yes, but since C has some specific problems with what you would
like, it is not too far off the subject.
: The question as I see it is not one of "fixing" caller
: interface errors -- it is one of handling them gracefully.
And I see the problem as one of how much code and time one is
willing to spend on "graceful handling". As it happens, I
routinely write code that does not do *any* parameter
validation. And my debugged code almost never breaks in a way
that would have been caught by parameter validation (or, for that
matter, by most of the other tricks of the robust-is-god school).
Now, unless the cost of a failing program is fairly high and the
cost of adding validity checks is low enough, it follows that
such checks do not belong in my code. I don't suppose that I am
really that atypical, so I suppose that the same applies to other
programmers. (Note that I am not speaking of checks used during
the debugging phase; there one needs all the help one can get.
:-)
In fact, the only place that I recommend parameter validation of
any kind is in functions that are callable from the outside world
(entry points to libraries, subsystem entry points, etc.) And
even there, I only recommend this when the cost of a failure (in
the final product) is fairly high.
: I would say that it pays to perform parameter validation,
: not merely while debugging an application, but at all times
Such blanket assertions are almost always wrong, and this one is
certainly wrong, for the reasons I gave above. Could we
please refrain from stating relative judgments as if they were absolutes?
(Remember the flames about "is volatile necessary"? Using
absolutes this way guarantees that kind of flame.)
: and that the
: specifications should include, as a matter of course, the actions to be
: taken when parameters are not valid.
This *is* a good idea when robustness is important.
: My view is that one should never
: assume, as a matter of course, that software is correct.
And my view is that one should balance the cost of software
failure against the cost of making it less likely to fail.
---
Bill
You can still reach me at proxftl!bill
But I'd rather you send to proxftl!twwells!bill
>: The question as I see it is not one of "fixing" caller
>: interface errors -- it is one of handling them gracefully.
>And I see the problem as one of how much code and time one is
>willing to spend on "graceful handling". As it happens, I
>routinely write code that does not do *any* parameter
>validation. And my debugged code almost never breaks in a way
>that would have been caught by parameter validation (or, for that
>matter, by most of the other tricks of the robust-is-god school).
>Now, unless the cost of a failing program is fairly high and the
>cost of adding validity checks is low enough, it follows that
>such checks do not belong in my code. I don't suppose that I am
>really that atypical, so I suppose that the same applies to other
>programmers. (Note that I am not speaking of checks used during
>the debugging phase; there one needs all the help one can get.
>:-)
Debugging??! You mean there are people that actually write
code that has bugs in it? :-). As you imply, there are tradeoffs.
Delivered commercial software, real time systems, software with a
lot of users, software in critical applications, and the like have
a high failure cost. A lot of my work has been in these areas, so
I am biased towards robustness. There are a lot of situations where
this is not the case.
As a general remark, once you get past the initial typos,
etc, most errors are the result of lacunae in the specifications
and analysis. As you suggest, parameter validation is not a high
profit area in the robust software game. However it is fairly easy
and it mostly can be done in a routine fashion. It is also part of
a general mind set which asks as a matter of course, "What will happen
if the assumptions made are violated, and what should I do about it?"
It is this mind set and the systematic routine application of this
approach that reduces the likelihood of these pernicious lacunae.
This is more important in large systems.
>: My view is that one should never
>: assume, as a matter of course, that software is correct.
>And my view is that one should balance the cost of software
>failure against the cost of making it less likely to fail.
Oh, I agree. And when I write a throwaway program I cut a lot
of corners. But even then it often pays to be scrupulous -- the routine
error checking makes debugging much more pleasant.
Picky, picky, picky...
>The question is whether the conversion changes any bits. Most of the
>examples you talk about also involve conversions, and the same thing
>applies. Inter-module compatibility doesn't, but there are very few
>requirements here.
Inter-module compatibility's main effect is that it puts limits on how
tricky the compiler can be within a single module, because certain aspects
of the trickery have to be consistent across modules.
>Now, as a counterexample to the original assertion, consider a word-
>addressed machine where an "int" occupies one word. Do you seriously
>want to say that an "int *" points to a particular byte of the word?
The original discussion was about aggregate objects ("aggregate" is used
loosely, possibly not in the precise sense X3J11 uses) in which the
internal structure is programmer-visible. An int is not a relevant example.
However... I'm coming to think that Mark is right (although it does come
under the heading of "something no sensible implementor would do"). I was
misled by a related problem that I'd studied at some length. So long as
the compiler knows about all the conversions being done, which it should
for a portable program, it can fool around as it pleases internally. So
the implications for out-of-range pointers result from common practice
rather than logical necessity.
--
The meek can have the Earth; | Henry Spencer at U of Toronto Zoology
the rest of us have other plans.|uunet!attcan!utzoo!henry he...@zoo.toronto.edu