Thanks
`size_t' is a type suitable for representing the amount
of memory a data object requires, expressed in units of `char'.
It is an integer type (C cannot keep track of fractions of a
`char'), and it is unsigned (negative sizes make no sense).
It is the type of the result of the `sizeof' operator. It is
the type you pass to malloc() and friends to say how much
memory you want. It is the type returned by strlen() to say
how many "significant" characters are in a string.
Each implementation chooses a "real" type like `unsigned
int' or `unsigned long' (or perhaps something else) to be its
`size_t', depending on what makes the most sense. You don't
usually need to worry about what `size_t' looks like "under the
covers;" all you care about is that it is the "right" type for
representing object sizes.
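A minimal sketch of two of the uses just listed (strlen() and malloc()). The helper name dup_string is made up for illustration; it is not a Standard function.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc()ed copy of s, or NULL if the
   allocation fails. The caller must free() the result. */
char *dup_string(const char *s)
{
    size_t len;
    char *copy;

    assert(s != NULL);
    len = strlen(s);               /* strlen() returns a size_t         */
    copy = malloc(len + 1);        /* malloc() takes a size_t argument  */
    if (copy != NULL)
        memcpy(copy, s, len + 1);  /* copy the terminating '\0' as well */
    return copy;
}
```

Nothing here depends on which "real" type the implementation picked for `size_t'; the same source compiles unchanged either way.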
The implementation "publishes" its own choice of `size_t'
in several of the Standard headers: <stdio.h>, <stdlib.h>,
and some others. If you examine one of these headers (most
implementations have some way of doing this), you are likely
to find something like
#ifndef __SIZE_T
#define __SIZE_T
typedef unsigned int size_t;
#endif
... meaning that on this particular implementation `size_t' is
an `unsigned int'. Other implementations make other choices.
(The preprocessor stuff -- which needn't be in exactly the form
shown here -- ensures that your program will contain only one
`typedef' for `size_t' even if it includes several of the headers
that declare it.)
General guidance: If you want to express the size of something
or the number of characters in something, you should probably use
a `size_t' value to do so. Some people also hold that an array
index is a sort of "proxy" for a size, so `size_t' should be used
for array indices as well; I see merit in the argument but confess
that I usually disregard it.
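As a sketch of that guidance, here is a counting loop whose element count, index, and result are all `size_t'. The function name count_equal is an invention for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count how many of the n elements of a[] equal key. */
size_t count_equal(const int *a, size_t n, int key)
{
    size_t i;
    size_t count = 0;

    assert(a != NULL || n == 0);
    for (i = 0; i < n; i++)      /* the index acts as a "proxy" for a size */
        if (a[i] == key)
            count++;
    return count;
}
```

A caller would typically pass sizeof v / sizeof v[0] as n, which is already a `size_t'.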
--
Eric Sosman
eso...@acm-dot-org.invalid
I'm one of those who recommend using 'size_t' for an array index,
because it will be able to represent any possible index value for
any possible size array on a given implementation.
-Mike
Candice
The other problem is that, assuming garbage values are random, I know that
50% of garbage integers passed to my routine will be negative. So an
"assert( N >= 0)" will have a very high chance of trapping garbage, if N is
declared as an int. Declare "N" as a size_t, and you cannot legitimately do
this test, only "sanity check". Sanity checks are pretty dangerous - who is
to say that in a few years time images of a million by a million pixels
won't be in routine use?
How so?
Do you feel that:
size_t i = 0;
is somehow 'uglier' than e.g.:
int i = 0;
I don't. I think the first form very clearly expresses
the intended usage of 'i'.
> The other problem is that, assuming garbage values are random,
Eh? What garbage values? Prevent garbage by always initializing
your objects. Prevent overflow/underflow/div by zero, etc. by
thinking carefully when writing computational expressions.
>I know that
> 50% of garbage integers passed to my routine will be negative.
No, you can't know that.
>So an
> "assert( N >= 0)" will have a very high chance of trapping garbage, if N is
> declared as an int.
Doing that is far outside of any methodical or coherent way
to trap invalid data.
> Declare "N" as a size_t, and you cannot legitimately do
> this test,
The test itself is what's not legitimate.
> only "sanity check". Sanity checks are pretty dangerous -
They can be, and they can also help, but only as a 'rough'
test, not conclusive.
> who is
> to say that in a few years time images of a million by a million pixels
> won't be in routine use?
What's that got to do with anything?
-Mike
I'm also one of those who use size_t wherever appropriate, not just
because it is correct, but also because it reduces the number of
warnings from lint-like programs.
size_t i = 0; is not ugly, the ugly part IMHO is the missing #include
directive needed to get a definition of the size_t type. size_t should
be defined by the language just like int and long, not by some header file.
<POSIX rant>
It gets even uglier if you use posix functions like read() and write()
and have to mix size_t with its signed cousin, ssize_t. It's pretty
stupid to have a function which accepts a size_t argument whose value
cannot be greater than SSIZE_MAX. Kinda makes me miss K&R C and just using
ints.
</POSIX>
Bjørn
If beauty is not in the eye of the beholder, then the argument
being made seems to be not that it uglifies code, but that the
code is ugly either way.
>The other problem is that, assuming garbage values are random,
>I know that 50% of garbage integers passed to my routine will
>be negative.
Please don't ever use that as a general guide.
>So an
>"assert( N >= 0)" will have a very high chance of trapping garbage,
This tends to be a low level, and perhaps misplaced, trapping....
>if N is declared as an int
...And orchestrated too then in that case.
>Declare "N" as a size_t, and you cannot legitimately do
>this test,
Some would argue that's the point.
>only "sanity check". Sanity checks are pretty dangerous -
Let's assume this is true...
>who is
>to say that in a few years time images of a million by a million pixels
>won't be in routine use?
... If so, your way is neither here nor there about it,
which makes this a red herring argument.
Notwithstanding that, it seems you are prescribing an
insane sanity check then.
--
Greg Comeau / Comeau C++ 4.3.3, for C++03 core language support
Comeau C/C++ ONLINE ==> http://www.comeaucomputing.com/tryitout
World Class Compilers: Breathtaking C++, Amazing C99, Fabulous C90.
Comeau C/C++ with Dinkumware's Libraries... Have you tried it?
IMAGE *create_image(size_t width, size_t height)
(The suggested form)
or
IMAGE *create_image(int width, int height)
(The pre-ANSI form).
Now my caller has allocated a list of image parameters, with malloc(),
thinks he has initialised them to inputs from a file, but in fact due to a
bug in his routine only the first set of parameters are initialised, the
others are set to whatever malloc() happened to return. Happens all the
time.
So in my second function, I write
assert(width >= 0);
assert(height >= 0);
(I might want to allow zero_dimension images).
The caller is calling the function many times with garbage values. We
have to be very unlucky for the assert() not to trigger and tell him what he
has done.
In the first function, width and height are size_t. So the test won't work.
No probs, because the function will still be called with huge garbage
values.
So I can write
assert(width <= 8000);
because an 8000 * 8000 image isn't going to fit in memory. Values of that
size must be corrupt, this is "sanity checking".
However now we have several problems. 8000 is a reasonable value for my
particular machine, but do I know, for instance, that the routine won't be
used on some high-end machine that processes massive images?
The second thing is that I now have to document the behaviour. My caller is
an intelligent man who knows that he can expect bad things to happen if he
tries to create an image with negative dimensions. He might also guess that
there is a limit on image size, but he cannot be expected to know that it is
8000. So I've got to put in a little note saying "dimensions must be 8000 or
less".
Or I could just omit the sanity check and let the allocation routine run out
of memory, in which case caller will waste time wondering whether the values
are wrong or the machine is low on memory.
So these are not huge issues, but we've got something that is slightly less
friendly and easy to use than we had before. The really important point is
that the cumulative effect of such little annoyances is significant in terms
of code quality and reliability.
> IMAGE *create_image(size_t width, size_t height)
> (The suggested form)
> or
> IMAGE *create_image(int width, int height)
> (The pre-ANSI form).
[snip]
> So in my second function, I write
> assert(width >= 0);
> assert(height >= 0);
> (I might want to allow zero_dimension images).
I don't see how that would be *not* equivalent to writing:
assert(width < SIZE_MAX / 2);
in the first function. In this case it happens that whatever value
you get, it is "correct". You have to control the image size anyway,
in this case it would be something similar to:
assert(width < MAX_WIDTH);
whereas in the second case it must be:
assert(width >= 0 && width < MAX_WIDTH);
I just can't see advantage of signed arguments; the amount of work is
the same as in the unsigned case, and _additionally_ you have to
take care for the negative values (ie. fight the problems that you
have created yourself).
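A sketch of why the two tests catch the same garbage: a negative int converted to size_t always lands in the upper half of size_t's range, so an upper-half test rejects it. The function names are hypothetical, and the test is written with <= rather than <; SIZE_MAX comes from the C99 header <stdint.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>   /* SIZE_MAX (C99) */

/* Hypothetical check: accept only values in the lower half of the
   range of size_t. */
int plausible_dimension(size_t v)
{
    return v <= SIZE_MAX / 2;
}

/* Hypothetical use inside a function taking size_t dimensions. */
void check_dimensions(size_t width, size_t height)
{
    assert(plausible_dimension(width));
    assert(plausible_dimension(height));
}
```

For example, (size_t)(-1) is SIZE_MAX, which fails the check, while 0 and 8000 both pass.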
--
Stan Tobias
mailx `echo si...@FamOuS.BedBuG.pAlS.INVALID | sed s/[[:upper:]]//g`
But the int argument is self-documenting. Everyone knows that trying to
create an image of negative dimensions is illegal. It is also probably true
that horrible things will happen if width * height overflows the size of a
size_t, but that check is harder to put in. But it is not inherently illegal
to create a huge image.
Specifically, use of your assert is beginning to dig a hole for yourself.
Why, the calling programmer might ask, is width constrained to be less than
some expression?
You are not "creating problems for yourself" by declaring create_image to
take an integer, and thus opening the possibility of being passed negative
argument. The problem is the calling programmer's and he is passing garbage
to your function. If you can recognise it as garbage, you've done him a
favour.
I'll give you another poser. How would you write the following set of
functions?
/*
Create an image set to black
*/
IMAGE *create_image(mystery_t width, mystery_t height);
/*
set a pixel (COLOUR is a type defined elsewhere that holds a colour value)
Out-of-bounds values to be rejected.
*/
void set_pixel(IMAGE *image, mystery_t x, mystery_t y, COLOUR col);
/*
draw a circle, parts outside the image to be clipped.
*/
void circle(IMAGE *image, mystery_t x, mystery_t y, mystery_t r, COLOUR col);
What would you use for mystery_t, in each case?
No problem here with size_t or int.
> /*
> set a pixel (COLOUR is a type defined elsewhere that holds a colour value)
> Out-of-bounds values to be rejected.
> */
> void set_pixel(IMAGE *image, mystery_t x, mystery_t y, COLOUR col);
If I use size_t, I can replace the checks for >=0 and <width by
one check for <width, analogously for height.
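A sketch of that single check. The IMAGE and COLOUR definitions below are stand-ins invented for this example (the thread never shows them), and the function returns a status instead of void so that a rejection is visible to the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the types mentioned in the thread. */
typedef unsigned char COLOUR;
typedef struct {
    size_t width, height;
    COLOUR *pixels;            /* width * height entries, row by row */
} IMAGE;

/* Returns 1 if the pixel was set, 0 if (x, y) is out of bounds. */
int set_pixel(IMAGE *image, size_t x, size_t y, COLOUR col)
{
    assert(image != NULL);
    if (x >= image->width || y >= image->height)
        return 0;              /* no separate "< 0" test is needed */
    image->pixels[y * image->width + x] = col;
    return 1;
}
```

A caller that accidentally passes a negative int sees it converted to a huge size_t, which the same comparison rejects.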
> /*
> draw a circle, parts outside the image to be clipped.
> */
> void circle(IMAGE *image, mystery_t x, mystery_t y, mystery_t r, COLOUR col);
Ditto.
>
> What would you use for mystery_t, in each case?
size_t, on all counts.
The only critical part is create_image; here we have to put a comment
at the check against SIZE_MAX/2.
Checks against too large image size are as easy as in int and we have
more ways of doing it, e.g.
not only INT_MAX/width<height or SIZE_MAX/width<height but also
(width*height)/height!=width
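The two size_t tests can be sketched as follows. The function names are made up for this example; both need a guard against dividing by zero, which the one-line forms above leave out.

```c
#include <stddef.h>
#include <stdint.h>   /* SIZE_MAX (C99) */

/* Pre-check: decide before multiplying. */
int mul_overflows_pre(size_t width, size_t height)
{
    return width != 0 && SIZE_MAX / width < height;
}

/* Post-check: multiply first (unsigned arithmetic wraps, which is
   well defined), then divide back and compare. */
int mul_overflows_post(size_t width, size_t height)
{
    size_t product = width * height;

    return height != 0 && product / height != width;
}
```

The post-check form is only available for unsigned types; with int operands the multiplication itself would already be undefined on overflow.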
Cheers
Michael
--
E-Mail: Mine is an /at/ gmx /dot/ de address.
I like to use it as well in the appropriate places.
> size_t i = 0; is not ugly, the ugly part IMHO is the missing #include
> directive needed to get a definition of the size_t type.
Well, having to worry about casting to unsigned long or something
else appropriate for size_t variables in printf() (without %z, C99) is
what bothers me most about it aesthetically.
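For illustration, the two ways of printing a size_t side by side. The helper names are hypothetical, and snprintf() is itself a C99 function, used here only so the result can be inspected; in strict C90 the same cast works with printf().

```c
#include <stddef.h>
#include <stdio.h>

/* Cast style: convert to unsigned long and use %lu. The value is
   truncated on implementations where size_t is wider than
   unsigned long. */
int format_size_cast(char *buf, size_t buflen, size_t n)
{
    return snprintf(buf, buflen, "%lu", (unsigned long)n);
}

/* C99 style: the z length modifier matches size_t exactly. */
int format_size_z(char *buf, size_t buflen, size_t n)
{
    return snprintf(buf, buflen, "%zu", n);
}
```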
> size_t should be defined by the language just like int and long,
> not by some header file.
Good point.
--
Randy Howard (2reply remove FOOBAR)
> Malcolm <mal...@55bank.freeserve.co.uk> wrote:
>
>> IMAGE *create_image(size_t width, size_t height)
>> (The suggested form)
>> or
>> IMAGE *create_image(int width, int height)
>> (The pre-ANSI form).
>
> [snip]
>
>> So in my second function, I write
>
>> assert(width >= 0);
>> assert(height >= 0);
>> (I might want to allow zero_dimension images).
>
> I don't see how that would be *not* equivalent to writing:
>
> assert(width < SIZE_MAX / 2);
Make that assert(width <= SIZE_MAX/2); and I'd probably agree with you
subject to a couple of notes
1. size_t doesn't have to have the same size as int
2. even if it does have the same size it doesn't have to have double
(roughly) the range, although typically that is the case.
> in the first function. In this case it happens that whatever value you
> get, it is "correct". You have to control the image size anyway, in
> this case it would be something similar to:
>
> assert(width < MAX_WIDTH);
Even if you don't do this directly the chances are that something else
will report a failure for an oversized image, e.g. memory allocation.
Lawrence
However a naive programmer might try to malloc(width * height *
sizeof(pixel)), not realising that if width * height overflows then he may
ask for the wrong amount of memory, and maybe the function will appear to
succeed when in fact it has failed.
So a more sophisticated "sanity check" is actually a good idea, for this
particular function. The general observation however remains valid; "int"
allows a self-documenting check for garbage, whilst size_t doesn't.
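One possible shape for such a check, sketched with size_t parameters. The name alloc_pixels and its NULL-on-overflow convention are assumptions for this example, not anything proposed in the thread.

```c
#include <stdint.h>   /* SIZE_MAX (C99) */
#include <stdlib.h>

/* Hypothetical helper: allocate room for width * height elements of
   elem_size bytes each. Returns NULL if the total byte count does not
   fit in a size_t, or if malloc() itself fails. */
void *alloc_pixels(size_t width, size_t height, size_t elem_size)
{
    if (width != 0 && height > SIZE_MAX / width)
        return NULL;                    /* width * height overflows  */
    if (elem_size != 0 && width * height > SIZE_MAX / elem_size)
        return NULL;                    /* total byte count overflows */
    return malloc(width * height * elem_size);
}
```

With this convention the caller cannot tell an oversized request from an out-of-memory condition; a version that reported the two cases separately would need an extra status value.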
Using a signed integer, overflow will give undefined behavior.
Using an unsigned integer, overflow gives well-defined behavior,
but an incorrect value. Which is easier to detect?
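A sketch of what detection looks like in each case, using addition for brevity. The function names are hypothetical, and the signed version assumes both operands are non-negative.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Unsigned: add first, test afterwards. The wrapped sum is smaller
   than either operand exactly when wraparound occurred. */
int size_add_wraps(size_t a, size_t b)
{
    size_t sum = a + b;

    return sum < a;
}

/* Signed: the test has to come before the addition, because an
   overflowing a + b is itself undefined behaviour. */
int int_add_would_overflow(int a, int b)
{
    assert(a >= 0 && b >= 0);
    return a > INT_MAX - b;
}
```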
>
> However a naive programmer might try to malloc(width * height *
> sizeof(pixel)), not realising that if width * height overflows then he may
> ask for the wrong amount of memory, and maybe the function will appear to
> succeed when in fact it has failed.
It will have succeeded in performing what was requested of it.
If the request was wrong, it's the coder's fault. Choosing
signed over unsigned can't prevent it.
>
> So a more sophisticated "sanity check" is actually a good idea, for this
> particular function. The general observation however remains valid; "int"
> allows a self-documenting check for garbage,
It allows the possibility of undefined behavior.
>whilst size_t doesn't.
It always has well-defined behavior. (And can represent the
size of any object. No other type provides this guarantee).
-Mike
Also, if create_image() is called with huge parameters, there are two
possibilities. Either they have been entered by a human who genuinely wants
a huge image for some reason, or they are corrupt values (eg random memory).
As humans we know that if the function is called with a demand for an
image 1000000 by 1000000 pixels then it is impossible that such round values
could have arisen by chance, and it must be someone wanting a giant image.
Such a person doesn't want an assertion fail, or to be told that his
parameters are invalid, because an image of a million pixels square is
clearly a logical possibility. He wants to be told "sorry, the computer does
not have enough memory to fulfil your request".
However there is no way a computer can distinguish such a call from a
request for 1352678 by 2511044 pixels, which is typical garbage.
>
> > However a naive programmer might try to malloc(width * height *
> > sizeof(pixel)), not realising that if width * height overflows then he may
> > ask for the wrong amount of memory, and maybe the function will
> > appear to succeed when in fact it has failed.
>
> It will have succeeded in performing what was requested of it.
> If the request was wrong, it's the coder's fault. Choosing
> signed over unsigned can't prevent it.
>
It's a bug in the function. A huge value could overflow to a small value, so
the call to malloc() succeeds, and you get UB when you try to access the pixels.
>
> It allows the possibility of undefined behavior.
> whilst size_t doesn't.
>
If we call malloc(width * height) with huge values, then if width and height
are ints then this is UB. In this case UB is actually good, because it means
the computer is allowed to perform correct behaviour (terminating the
program with an error message). Defined wrong behaviour is far more
dangerous than UB.
>
> It always has well-defined behavior. (And can represent the
> size of any object. No other type provides this guarantee).
>
This is the problem. In my opinion ANSI have dug C into a hole with size_t.
In a narrow technical sense they are right - malloc() can legitimately be
called with a request for more memory than will fit in INT_MAX, so let's
have a special type. But then that means that strings can be longer than
INT_MAX as well, so strlen() has to return a size_t. Then if strings can be
longer than INT_MAX, then an index into a character array must be size_t as
well. And in fact it applies to all objects, so if we represent the number
of employees in a payroll function by an int that is wrong as well, strictly
it must be size_t. So without really realising they were doing it, ANSI made
a fundamental change to the language. And this is when more modern languages
like Java have done away with unsigned types altogether, because of the
problems they cause.