
what is size_t


candy...@yahoo.com

Jan 29, 2005, 9:27:20 AM
I sometimes come across statements that involve the use of
size_t, but I don't know exactly what size_t means. What
I know about it is that it is used to hide platform details. I tried
to find its meaning in the header files but didn't get a good answer. So
can somebody please tell me what size_t means and what
its possible values are?

Thanks

Eric Sosman

Jan 29, 2005, 10:03:14 AM

`size_t' is a type suitable for representing the amount
of memory a data object requires, expressed in units of `char'.
It is an integer type (C cannot keep track of fractions of a
`char'), and it is unsigned (negative sizes make no sense).
It is the type of the result of the `sizeof' operator. It is
the type you pass to malloc() and friends to say how much
memory you want. It is the type returned by strlen() to say
how many "significant" characters are in a string.
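
A minimal sketch of those three uses together, assuming a hypothetical helper named dup_string (not part of the Standard library): strlen() reports a length as a size_t, and that value plus one for the terminator goes straight to malloc().

```c
#include <stdlib.h>
#include <string.h>

/* dup_string is a hypothetical helper (not a standard function):
   it returns a malloc()ed copy of s, or NULL if allocation fails. */
char *dup_string(const char *s)
{
    size_t len = strlen(s);        /* strlen() returns a size_t */
    char *copy = malloc(len + 1);  /* malloc() takes a size_t byte count */

    if (copy != NULL)
        memcpy(copy, s, len + 1);  /* copies the terminating '\0' as well */
    return copy;
}
```

The caller owns the result and passes it to free() when done.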

Each implementation chooses a "real" type like `unsigned
int' or `unsigned long' (or perhaps something else) to be its
`size_t', depending on what makes the most sense. You don't
usually need to worry about what `size_t' looks like "under the
covers;" all you care about is that it is the "right" type for
representing object sizes.

The implementation "publishes" its own choice of `size_t'
in several of the Standard headers: <stdio.h>, <stdlib.h>,
and some others. If you examine one of these headers (most
implementations have some way of doing this), you are likely
to find something like

#ifndef __SIZE_T
#define __SIZE_T
typedef unsigned int size_t;
#endif

... meaning that on this particular implementation `size_t' is
an `unsigned int'. Other implementations make other choices.
(The preprocessor stuff -- which needn't be in exactly the form
shown here -- ensures that your program will contain only one
`typedef' for `size_t' even if it includes several of the headers
that declare it.)
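
One way to see what an implementation chose without reading its headers is to ask the type itself. This is a sketch with hypothetical helper names; it relies only on the rule that converting -1 to an unsigned type yields that type's largest value.

```c
#include <stddef.h>   /* size_t */

/* Hypothetical helpers for inspecting size_t on the current implementation. */

/* Nonzero if size_t is unsigned: -1 converted to an unsigned type
   becomes that type's maximum value, which is greater than zero. */
int size_t_is_unsigned(void)
{
    return (size_t)-1 > 0;
}

/* Largest value a size_t can hold on this implementation. */
size_t size_t_max(void)
{
    return (size_t)-1;
}
```

So the possible values of a size_t run from 0 up to whatever size_t_max() returns. C99 also provides that limit as SIZE_MAX in <stdint.h> and requires it to be at least 65535.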

General guidance: If you want to express the size of something
or the number of characters in something, you should probably use
a `size_t' value to do so. Some people also hold that an array
index is a sort of "proxy" for a size, so `size_t' should be used
for array indices as well; I see merit in the argument but confess
that I usually disregard it.

--
Eric Sosman
eso...@acm-dot-org.invalid

Mike Wahler

Jan 29, 2005, 12:14:03 PM
"Eric Sosman" <eso...@acm-dot-org.invalid> wrote in message
news:O_6dnULM1fc...@comcast.com...

> General guidance: If you want to express the size of something
> or the number of characters in something, you should probably use
> a `size_t' value to do so. Some people also hold that an array
> index is a sort of "proxy" for a size, so `size_t' should be used
> for array indices as well; I see merit in the argument but confess
> that I usually disregard it.

I'm one of those who recommend using 'size_t' for an array index,
because it will be able to represent any possible index value for
any possible size array on a given implementation.
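
As a sketch of what indexing with size_t looks like in practice (the function names are made up for illustration): counting up is unremarkable, while counting down needs a little care, because a test like i >= 0 is always true for an unsigned type.

```c
#include <stddef.h>

/* Sum the n elements of a, indexing with a size_t. */
long sum_ints(const int *a, size_t n)
{
    long total = 0;
    size_t i;

    for (i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Return the index of the last element equal to key, or n if there is none.
   The loop tests i before decrementing it, so it never relies on i >= 0. */
size_t find_last(const int *a, size_t n, int key)
{
    size_t i;

    for (i = n; i-- > 0; )
        if (a[i] == key)
            return i;
    return n;
}
```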

-Mike


candy...@yahoo.com

Jan 29, 2005, 12:29:53 PM

Eric Sosman wrote:
>
>
> `size_t' is a type .................
Thanks a lot for a wonderful description of size_t.

Candice

Malcolm

Jan 29, 2005, 2:11:21 PM

"Mike Wahler" <mkwa...@mkwahler.net> wrote

> I'm one of those who recommend using 'size_t' for an array index,
> because it will be able to represent any possible index value for
> any possible size array on a given implementation.
>
You're right, but it uglifies code.

The other problem is that, assuming garbage values are random, I know that
50% of garbage integers passed to my routine will be negative. So an
"assert( N >= 0)" will have a very high chance of trapping garbage, if N is
declared as an int. Declare "N" as a size_t, and you cannot legitimately do
this test, only "sanity check". Sanity checks are pretty dangerous - who is
to say that in a few years time images of a million by a million pixels
won't be in routine use?


Mike Wahler

Jan 29, 2005, 2:31:41 PM

"Malcolm" <mal...@55bank.freeserve.co.uk> wrote in message
news:ctgmg9$abg$1...@newsg1.svr.pol.co.uk...

>
> "Mike Wahler" <mkwa...@mkwahler.net> wrote
> > I'm one of those who recommend using 'size_t' for an array index,
> > because it will be able to represent any possible index value for
> > any possible size array on a given implementation.
> >
> You're right, but it uglifies code.

How so?

Do you feel that:

size_t i = 0;

is somehow 'uglier' than e.g.:

int i = 0;

I don't. I think the first form very clearly expresses
the intended usage of 'i'.

> The other problem is that, assuming garbage values are random,

Eh? What garbage values? Prevent garbage by always initializing
your objects. Prevent overflow/underflow/div by zero, etc. by
thinking carefully when writing computational expressions.

>I know that
> 50% of garbage integers passed to my routine will be negative.

No, you can't know that.

>So an
> "assert( N >= 0)" will have a very high chance of trapping garbage, if N is
> declared as an int.

Doing that is far outside of any methodical or coherent way
to trap invalid data.

> Declare "N" as a size_t, and you cannot legitimately do
> this test,

The test itself is what's not legitimate.

> only "sanity check". Sanity checks are pretty dangerous -

They can be, and they can also help, but only as a 'rough'
test, not conclusive.

> who is
> to say that in a few years time images of a million by a million pixels
> won't be in routine use?

What's that got to do with anything?

-Mike


Bjørn Augestad

Jan 29, 2005, 2:48:18 PM
Mike Wahler wrote:
> "Malcolm" <mal...@55bank.freeserve.co.uk> wrote in message
> news:ctgmg9$abg$1...@newsg1.svr.pol.co.uk...
>
>>"Mike Wahler" <mkwa...@mkwahler.net> wrote
>>
>>>I'm one of those who recommend using 'size_t' for an array index,
>>>because it will be able to represent any possible index value for
>>>any possible size array on a given implementation.
>>>
>>
>>You're right, but it uglifies code.
>
>
> How so?
>
> Do you feel that:
>
> size_t i = 0;
>
> is somehow 'uglier' than e.g.:
>
> int i = 0;
>
> I don't. I think the first form very clearly expresses
> the intended usage of 'i'.
>

I'm also one of those who use size_t wherever appropriate, not just
because it is correct, but also because it reduces the number of
warnings from lint-like programs.

size_t i = 0; is not ugly, the ugly part IMHO is the missing #include
directive needed to get a definition of the size_t type. size_t should
be defined by the language just like int and long, not by some header file.

<POSIX rant>
It gets even uglier if you use POSIX functions like read() and write()
and have to mix size_t with its signed cousin, ssize_t. It's pretty
stupid to have a function that accepts a size_t argument whose value
cannot be greater than SSIZE_MAX. Kinda makes me miss K&R C and just using
ints.
</POSIX>

Bjørn
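
The mismatch complained about above can be sketched as follows, assuming a POSIX system. read_clamped is a hypothetical wrapper, not a POSIX function: read() takes a size_t count but reports its result as an ssize_t, and POSIX leaves the behaviour implementation-defined when the count exceeds SSIZE_MAX, so the wrapper caps the request first.

```c
#define _POSIX_C_SOURCE 200809L  /* ask for POSIX declarations */
#include <limits.h>              /* SSIZE_MAX (POSIX) */
#include <unistd.h>              /* read(), ssize_t */

/* Hypothetical wrapper: never asks read() for more than SSIZE_MAX bytes,
   so the byte count it returns always fits in an ssize_t.
   Returns what read() returns: bytes read, 0 at end of file, -1 on error. */
ssize_t read_clamped(int fd, void *buf, size_t count)
{
    if (count > (size_t)SSIZE_MAX)
        count = (size_t)SSIZE_MAX;
    return read(fd, buf, count);
}
```

A caller that wants more than SSIZE_MAX bytes has to loop in any case, since read() may return fewer bytes than requested.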

Greg Comeau

Jan 29, 2005, 5:31:51 PM
In article <ctgmg9$abg$1...@newsg1.svr.pol.co.uk>,

Malcolm <mal...@55bank.freeserve.co.uk> wrote:
>
>"Mike Wahler" <mkwa...@mkwahler.net> wrote
>> I'm one of those who recommend using 'size_t' for an array index,
>> because it will be able to represent any possible index value for
>> any possible size array on a given implementation.
>>
>You're right, but it uglifies code.

If beauty is not in the eye of the beholder, then the argument
being made seems to be not that it uglifies code, but that the
code is ugly either way.

>The other problem is that, assuming garbage values are random,
>I know that 50% of garbage integers passed to my routine will
>be negative.

Please don't ever use that as a general guide.

>So an
>"assert( N >= 0)" will have a very high chance of trapping garbage,

This tends to be a low level, and perhaps misplaced, trapping....

>if N is declared as an int

...And orchestrated too then in that case.

>Declare "N" as a size_t, and you cannot legitimately do
>this test,

Some would argue that's the point.

>only "sanity check". Sanity checks are pretty dangerous -

Let's assume this is true...

>who is
>to say that in a few years time images of a million by a million pixels
>won't be in routine use?

... If so, your way is neither here nor there about it,
which makes this a red herring argument.

Notwithstanding that, it seems you are prescribing an
insane sanity check then.
--
Greg Comeau / Comeau C++ 4.3.3, for C++03 core language support
Comeau C/C++ ONLINE ==> http://www.comeaucomputing.com/tryitout
World Class Compilers: Breathtaking C++, Amazing C99, Fabulous C90.
Comeau C/C++ with Dinkumware's Libraries... Have you tried it?

Malcolm

Jan 30, 2005, 5:39:32 AM
"Greg Comeau" <com...@panix.com> wrote

>
> Notwithstanding that, it seems you are prescribing an
> insane sanity check then.
>
Consider this. I'm writing a function to create an image

IMAGE *create_image(size_t width, size_t height)
(The suggested form)
or
IMAGE *create_image(int width, int height)
(The pre-ANSI form).

Now my caller has allocated a list of image parameters, with malloc(),
thinks he has initialised them to inputs from a file, but in fact due to a
bug in his routine only the first set of parameters are initialised, the
others are set to whatever malloc() happened to return. Happens all the
time.

So in my second function, I write

assert(width >= 0);
assert(height >= 0);
(I might want to allow zero_dimension images).

The caller is calling the function many times with garbage values. We
have to be very unlucky for the assert() not to trigger and tell him what he
has done.

In the first function, width and height are size_t. So the test won't work.
No probs, because the function will still be called with huge garbage
values.

So I can write
assert(width <= 8000);

because an 8000 * 8000 image isn't going to fit in memory. Values of that
size must be corrupt, this is "sanity checking".

However now we have several problems. 8000 is a reasonable value for my
particular machine, but do I know, for instance, that the routine won't be
used on some high-end machine that processes massive images?
The second thing is that I now have to document the behaviour. My caller is
an intelligent man who knows that he can expect bad things to happen if he
tries to create an image with negative dimensions. He might also guess that
there is a limit on image size, but he cannot be expected to know that it is
8000. So I've got to put in a little note saying "dimensions must be 8000 or
less".
Or I could just omit the sanity check and let the allocation routine run out
of memory, in which case the caller will waste time wondering whether the values
are wrong or the machine is low on memory.

So these are not huge issues, but we've got something that is slightly less
friendly and easy to use than we had before. The really important point is
that the cumulative effect of such little annoyances is significant in terms
of code quality and reliability.


S.Tobias

Jan 30, 2005, 11:51:17 AM
Malcolm <mal...@55bank.freeserve.co.uk> wrote:

> IMAGE *create_image(size_t width, size_t height)
> (The suggested form)
> or
> IMAGE *create_image(int width, int height)
> (The pre-ANSI form).

[snip]

> So in my second function, I write

> assert(width >= 0);
> assert(height >= 0);
> (I might want to allow zero_dimension images).

I don't see how that would be *not* equivalent to writing:

assert(width < SIZE_MAX / 2);

in the first function. In this case it happens that whatever value
you get, it is "correct". You have to control the image size anyway,
in this case it would be something similar to:

assert(width < MAX_WIDTH);

whereas in the second case it must be:

assert(width >= 0 && width < MAX_WIDTH);

I just can't see advantage of signed arguments; the amount of work is
the same as in the unsigned case, and _additionally_ you have to
take care for the negative values (ie. fight the problems that you
have created yourself).
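
The comparison can be sketched as two predicates, written to return a flag rather than call assert() so they are easy to try out. MAX_WIDTH is left undefined in the post; the value 8000 here is borrowed from the earlier example purely for illustration.

```c
#include <stddef.h>

#define MAX_WIDTH 8000   /* illustrative limit only, not from any real API */

/* Unsigned parameter: one comparison covers everything out of range. */
int width_ok_unsigned(size_t width)
{
    return width < MAX_WIDTH;
}

/* Signed parameter: the negative range has to be excluded separately. */
int width_ok_signed(int width)
{
    return width >= 0 && width < MAX_WIDTH;
}
```

An int of -1 converted to size_t becomes the largest size_t value, so the single unsigned comparison rejects it as well.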

--
Stan Tobias
mailx `echo si...@FamOuS.BedBuG.pAlS.INVALID | sed s/[[:upper:]]//g`

Malcolm

Jan 30, 2005, 1:27:40 PM

"S.Tobias" <si...@FamOuS.BedBuG.pAlS.INVALID> wrote

>
> > So in my second function, I write
>
> > assert(width >= 0);
> > assert(height >= 0);
> > (I might want to allow zero_dimension images).
>
> I don't see how that would be *not* equivalent to writing:
>
> assert(width < SIZE_MAX / 2);
>
> in the first function.
>
> I just can't see advantage of signed arguments; the amount of work is
> the same as in the unsigned case, and _additionally_ you have to
> take care for the negative values (ie. fight the problems that you
> have created yourself).
>
The mistake you're making is to forget that the calling programmer is a
human.
What you are saying is that it is possible to trap exactly the same set of
inputs by using some expression to test the size_t argument.

But the int argument is self-documenting. Everyone knows that trying to
create an image of negative dimensions is illegal. It is also probably true
that horrible things will happen if width * height overflows the size of a
size_t, but that check is harder to put in. But it is not inherently illegal
to create a huge image.

Specifically, by using your assert you are beginning to dig a hole for yourself.
Why, the calling programmer might ask, is width constrained to be less than
some expression?

You are not "creating problems for yourself" by declaring create_image to
take an integer, and thus opening the possibility of being passed a negative
argument. The problem is the calling programmer's and he is passing garbage
to your function. If you can recognise it as garbage, you've done him a
favour.

I'll give you another poser. How would you write the following set of
functions?

/*
Create an image set to black
*/
IMAGE *create_image(mystery_t width, mystery_t height);
/*
set a pixel (COLOUR is a type defined elsewhere that holds a colour value)
Out-of-bounds values to be rejected.
*/
void set_pixel(IMAGE *image, mystery_t x, mystery_t y, COLOUR col);
/*
draw a circle, parts outside the image to be clipped.
*/
void circle(IMAGE *image, mystery_t x, mystery_t y, mystery_t r,
            COLOUR col);

What would you use for mystery_t, in each case?


Michael Mair

Jan 30, 2005, 3:57:51 PM

No problem here with size_t or int.

> /*
> set a pixel (COLOUR is a type defined elsewhere that holds a colour value)
> Out-of-bounds values to be rejected.
> */
> void set_pixel(IMAGE *image, mystery_t x, mystery_t y, COLOUR col);

If I use size_t, I can replace checks for >=0 and <width by
one check for <width, analogously for height.

> /*
> draw a circle, parts outside the image to be clipped.
> */
> void circle(IMAGE *image, mystery_t x, mystery_t y, mystery_t r, COLOUR
> col);

ditto.


>
> What would you use for mystery_t, in each case?

size_t, on all counts.

The only critical part is create_image; here we have to put a comment
at the check against SIZE_MAX/2.
Checks against too large an image size are as easy as with int, and we have
more ways of doing them, e.g.
not only INT_MAX/width<height or SIZE_MAX/width<height but also
(width*height)/height!=width
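
A sketch of two of those checks for size_t operands, with hypothetical function names. The second form multiplies first and divides afterwards, which is only safe because unsigned arithmetic wraps; doing the same with int operands would already have overflowed. Both forms need a guard against dividing by zero, and SIZE_MAX comes from C99's <stdint.h>.

```c
#include <stddef.h>
#include <stdint.h>   /* SIZE_MAX (C99) */

/* Check before multiplying: nonzero if width * height would not fit. */
int area_overflows_pre(size_t width, size_t height)
{
    return height != 0 && width > SIZE_MAX / height;
}

/* Check after multiplying: if the product wrapped, dividing it by
   height can no longer give back width. */
int area_overflows_post(size_t width, size_t height)
{
    size_t area = width * height;   /* wraps modulo SIZE_MAX + 1 */

    return height != 0 && area / height != width;
}
```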


Cheers
Michael
--
E-Mail: Mine is an /at/ gmx /dot/ de address.

Malcolm

Jan 30, 2005, 5:53:44 PM

"Michael Mair"

>
> > I'll give you another poser. How would you write the following set of
> > functions?
> >
> > /*
> > Create an image set to black
> > */
> > IMAGE *create_image(mystery_t width, mystery_t height);
>
> No problem here with size_t or int.
>
> > /*
> > set a pixel (COLOUR is a type defined elsewhere that holds a colour value)
> > Out-of-bounds values to be rejected.
> > */
> > void set_pixel(IMAGE *image, mystery_t x, mystery_t y, COLOUR col);
>
> If I use size_t, I can replace checks for >=0 and <width by
> one check for <width, analogously for height.
>
> > /*
> > draw a circle, parts outside the image to be clipped.
> > */
> > void circle(IMAGE *image, mystery_t x, mystery_t y, mystery_t r, COLOUR
> > col);
>
> ditto.
> >
> > What would you use for mystery_t, in each case?
>
> size_t, on all counts.
>
The question was, of course, designed so that there is a problem with that
answer.

>
> The only critical part is create_image; here we have to put a comment
> at the check against SIZE_MAX/2.
> Checks against too large image size are as easy as in int and we have
> more ways of doing it, e.g.
> not only INT_MAX/width<height or SIZE_MAX/width<height but also
> (width*height)/height!=width
>
Actually you probably need more than one byte per pixel. However with a bit
of care you can come up with a better "sanity check" than testing against
8000. This supposes of course that you are allocating the image in one
block, which is what I would do today, but not in the old 286 memory model
days. The point was never that use of size_t, by itself, will instantly
create a totally unworkable and unmanageable disaster, but that it
introduces little niggly difficulties that have a cumulative effect of
making code harder to maintain.


Randy Howard

Jan 31, 2005, 12:54:25 AM
In article <6MRKd.602$4c.7...@juliett.dax.net>, b...@metasystems.no says...

> I'm also one of those who use size_t wherever appropriate, not just
> because it is correct, but also because it reduces the number of
> warnings from lint-like programs.

I like to use it as well in the appropriate places.

> size_t i = 0; is not ugly, the ugly part IMHO is the missing #include
> directive needed to get a definition of the size_t type.

Well, having to worry about casting to unsigned long or something
else appropriate for size_t variables in printf() (without %z, C99) is
what bothers me most about it aesthetically.
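
Both styles, sketched with hypothetical helper names that format into a caller-supplied buffer (assumed to have room for at least 21 characters, enough for a 64-bit value plus the terminator). The C90 form can lose high bits on an implementation where size_t is wider than unsigned long.

```c
#include <stddef.h>
#include <stdio.h>

/* C90 style: no conversion specifier for size_t exists, so cast to
   unsigned long and print with %lu. buf must be large enough. */
int format_size_c90(char *buf, size_t n)
{
    return sprintf(buf, "%lu", (unsigned long)n);
}

/* C99 style: the z length modifier matches size_t directly. */
int format_size_c99(char *buf, size_t n)
{
    return sprintf(buf, "%zu", n);
}
```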

> size_t should be defined by the language just like int and long,
> not by some header file.

Good point.

--
Randy Howard (2reply remove FOOBAR)

Lawrence Kirby

Jan 31, 2005, 7:14:08 AM
On Sun, 30 Jan 2005 16:51:17 +0000, S.Tobias wrote:

> Malcolm <mal...@55bank.freeserve.co.uk> wrote:
>
>> IMAGE *create_image(size_t width, size_t height)
>> (The suggested form)
>> or
>> IMAGE *create_image(int width, int height)
>> (The pre-ANSI form).
>
> [snip]
>
>> So in my second function, I write
>
>> assert(width >= 0);
>> assert(height >= 0);
>> (I might want to allow zero_dimension images).
>
> I don't see how that would be *not* equivalent to writing:
>
> assert(width < SIZE_MAX / 2);

Make that assert(width <= SIZE_MAX/2); and I'd probably agree with you
subject to a couple of notes

1. size_t doesn't have to have the same size as int

2. even if it does have the same size it doesn't have to have double
(roughly) the range, although typically that is the case.

> in the first function. In this case it happens that whatever value you
> get, it is "correct". You have to control the image size anyway, in
> this case it would be something similar to:
>
> assert(width < MAX_WIDTH);

Even if you don't do this directly the chances are that something else
will report a failure for an oversized image, e.g. memory allocation.

Lawrence

Malcolm

Jan 31, 2005, 4:59:36 PM

"Lawrence Kirby" <lkn...@netactive.co.uk> wrote

>
> Even if you don't do this directly the chances are that something else
> will report a failure for an oversized image, e.g. memory allocation.
>
What you want to happen is for the function to return an out-of-memory
condition if you try to allocate an enormous image (which request may well
be legitimate, if you design posters or something). You want it to assert
fail on invalid parameters if you try to pass it garbage.
My point was that by using ints as parameters, you have a free garbage
detector, because negative values have to be garbage. Using an unsigned
type, you never know whether the request is legitimate or not.

However a naive programmer might try to malloc(width * height *
sizeof(pixel)), not realising that if width * height overflows then he may
ask for the wrong amount of memory, and maybe the function will appear to
succeed when in fact it has failed.

So a more sophisticated "sanity check" is actually a good idea, for this
particular function. The general observation however remains valid; "int"
allows a self-documenting check for garbage, whilst size_t doesn't.


Mike Wahler

Feb 1, 2005, 3:21:46 AM

"Malcolm" <mal...@55bank.freeserve.co.uk> wrote in message
news:ctm93k$ito$1...@newsg4.svr.pol.co.uk...

>
> "Lawrence Kirby" <lkn...@netactive.co.uk> wrote
> >
> > Even if you don't do this directly the chances are that something else
> > will report a failure for an oversized image, e.g. memory allocation.
> >
> What you want to happen is for the function to return an out-of-memory
> condition if you try to allocate an enormous image (which request may well
> be legitimate, if you design posters or something). You want it to assert
> fail on invalid parameters if you try to pass it garbage.
> My point was that by using ints as parameters, you have a free garbage
> detector, because negative values have to be garbage. Using an unsigned
> type, you never know whether the request is legitimate or not.

Using a signed integer, overflow will give undefined behavior.
Using an unsigned integer, overflow gives well-defined behavior,
but an incorrect value. Which is easier to detect?
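
To make the comparison concrete, here is a sketch of overflow-checked addition in both flavours (the function names are hypothetical). The unsigned version may add first and inspect the result, because wraparound is defined; the signed version has to prove the sum fits before computing it.

```c
#include <limits.h>
#include <stddef.h>

/* Unsigned: add, then look. A wrapped sum is always smaller than
   either operand. Returns 1 and stores the sum on success, else 0. */
int checked_add_size(size_t a, size_t b, size_t *out)
{
    size_t sum = a + b;   /* well defined even when it wraps */

    if (sum < a)
        return 0;
    *out = sum;
    return 1;
}

/* Signed: the test must come before the addition, because an
   overflowing a + b is undefined behaviour. */
int checked_add_int(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return 0;
    *out = a + b;
    return 1;
}
```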

>
> However a naive programmer might try to malloc(width * height *
> sizeof(pixel)), not realising that if width * height overflows then he may
> ask for the wrong amount of memory, and maybe the function will appear to
> succeed when in fact it has failed.

It will have succeeded in performing what was requested of it.
If the request was wrong, it's the coder's fault. Choosing
signed over unsigned can't prevent it.

>
> So a more sophisticated "sanity check" is actually a good idea, for this
> particular function. The general observation however remains valid; "int"
> allows a self-documenting check for garbage,

It allows the possibility of undefined behavior.

>whilst size_t doesn't.

It always has well-defined behavior. (And can represent the
size of any object. No other type provides this guarantee).

-Mike


Malcolm

Feb 1, 2005, 5:12:24 PM

"Mike Wahler" <mkwa...@mkwahler.net> wrote

>
> > My point was that by using ints as parameters, you have a free garbage
> > detector, because negative values have to be garbage. Using an unsigned
> > type, you never know whether the request is legitimate or not.
>
> Using a signed integer, overflow will give undefined behavior.
> Using an unsigned integer, overflow gives well-defined behavior,
> but an incorrect value. Which is easier to detect?
>
In this particular example, we probably want to call malloc() with width *
height to create the pixels for our image. So any values of width * height
that overflow SIZE_MAX are potential problems.
And because of the way that ANSI have defined the behaviour of signed and
unsigned types, it is actually easier to do this using unsigned rather than
signed arithmetic, so you have a point, in this particular case.
However if we were to use a different allocation scheme internally, then the
point would no longer hold.

Also, if create_image() is called with huge parameters, there are two
possibilities. Either they have been entered by a human who genuinely wants
a huge image for some reason, or they are corrupt values (eg random memory).
As humans we know that if the function is called with a demand for an
image 1000000 by 1000000 pixels then it is impossible that such round values
could have arisen by chance, and it must be someone wanting a giant image.
Such a person doesn't want an assertion failure, or to be told that his
parameters are invalid, because an image of a million pixels square is
clearly a logical possibility. He wants to be told "sorry, the computer does
not have enough memory to fulfil your request".
However there is no way a computer can distinguish such a call from a
request for 1352678 by 2511044 pixels, which is typical garbage.


>
> > However a naive programmer might try to malloc(width * height *
> > sizeof(pixel)), not realising that if width * height overflows then he may
> > ask for the wrong amount of memory, and maybe the function will
> > appear to succeed when in fact it has failed.
>
> It will have succeeded in performing what was requested of it.
> If the request was wrong, it's the coder's fault. Choosing
> signed over unsigned can't prevent it.
>

It's a bug in the function. A huge value could overflow to a small value, so
the call to malloc() succeeds, and you get UB when you try to access the pixels.
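
A sketch of that failure and one way to guard against it. The function names are hypothetical and SIZE_MAX comes from C99's <stdint.h>. naive_bytes computes the byte count the way the naive programmer would; image_bytes refuses to produce a count that would have wrapped.

```c
#include <stddef.h>
#include <stdint.h>   /* SIZE_MAX (C99) */

/* The naive computation: silently wraps modulo SIZE_MAX + 1. */
size_t naive_bytes(size_t width, size_t height, size_t pixel_size)
{
    return width * height * pixel_size;
}

/* Guarded computation: returns 1 and stores the byte count in *out,
   or returns 0 if either multiplication would wrap. */
int image_bytes(size_t width, size_t height, size_t pixel_size, size_t *out)
{
    size_t pixels;

    if (height != 0 && width > SIZE_MAX / height)
        return 0;
    pixels = width * height;
    if (pixel_size != 0 && pixels > SIZE_MAX / pixel_size)
        return 0;
    *out = pixels * pixel_size;
    return 1;
}
```

For example, naive_bytes(SIZE_MAX / 2 + 1, 2, 1) is 0, a request that malloc() may well satisfy, while image_bytes reports failure for the same arguments.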


>
> It allows the possibility of undefined behavior.
> whilst size_t doesn't.
>

If we call malloc(width * height) with huge values, then if width and height
are ints then this is UB. In this case UB is actually good, because it means
the computer is allowed to perform correct behaviour (terminating the
program with an error message). Defined wrong behaviour is far more
dangerous than UB.


>
> It always has well-defined behavior. (And can represent the
> size of any object. No other type provides this guarantee).
>

This is the problem. In my opinion ANSI have dug C into a hole with size_t.
In a narrow technical sense they are right - malloc() can legitimately be
called with a request for more memory than will fit in INT_MAX, so let's
have a special type. But then that means that strings can be longer than
INT_MAX as well, so strlen() has to return a size_t. Then if strings can be
longer than INT_MAX, then an index into a character array must be size_t as
well. And in fact it applies to all objects, so if we represent the number
of employees in a payroll function by an int that is wrong as well, strictly
it must be size_t. So without really realising they were doing it, ANSI made
a fundamental change to the language. And this is when more modern languages
like Java have done away with unsigned types altogether, because of the
problems they cause.

