Expanding buffer - response to "Determine the size of malloc" query

James Harris

May 30, 2008, 9:12:13 AM
Initial issue: read in an arbitrary-length piece of text.
Perceived issue: handle variable-length data

The code below is a suggestion for implementing a variable length
buffer that could be used to read text or handle arrays of arbitrary
length. I don't have the expertise in C of many folks here so I feel
like I'm offering a small furry animal for sacrifice to a big armour
plated one... but will offer it anyway. Please do suggest improvements
or challenge the premise. It would be great if it could be improved to
become a generally useful piece of code.

Well, here goes. This should be fun. :-?

-

The following utility code is passed a buffer (allocated by the
caller) and maintains it at an appropriate size. The main function
increases the allocation (when necessary) by factors - rather than
fixed amounts - for speed. There is a secondary function to trim a
buffer back to a specific size. An extra byte (one more than is
requested) is always left at the end.

/*
* Expanding buffer
*/

#define EBUF_SIZE_INIT 128
#define EBUF_SIZE_MIN 128
#define EBUF_INCREASE 1.5 /* Factor to increase space by each time */

#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>

int ebuf_full(char **buf, size_t *buf_size, size_t offset) {
    size_t new_size;
    char *new_buf;

    if (*buf_size < offset + 2) { /* NB last pos left empty */
        new_size = *buf_size * EBUF_INCREASE + 1;
        if (new_size < offset + 2) new_size = offset + 2;
        if (new_size < EBUF_SIZE_MIN) new_size = EBUF_SIZE_MIN;
        if ((new_buf = realloc(*buf, new_size)) == NULL) {
            return 1; /* Failed to realloc buffer */
        }
        *buf = new_buf;
        *buf_size = new_size;
    }
    return 0; /* Buffer is big enough (reallocated if necessary) */
}


int ebuf_trim(char **buf, size_t *buf_size, size_t offset) {
    size_t new_size = offset + 2; /* Includes empty char */
    char *new_buf;

    if (new_size < EBUF_SIZE_MIN) new_size = EBUF_SIZE_MIN;
    if (new_size != *buf_size) {
        if ((new_buf = realloc(*buf, new_size)) == NULL) {
            return 1; /* Reallocation failed (unlikely) */
        }
        *buf = new_buf;
        *buf_size = new_size;
    }
    return 0; /* Buffer now has the requested size */
}

James Harris

May 30, 2008, 9:20:21 AM

An example of intended use follows. Note that the routines are coded
to expect buffer and current size as parameters. Despite the error
handling the code is intended to be fast. Because the test "if (offset + 2 >
buf1_size)" is included in the calling code, the function is only called
if the buffer is too small. The cost of one integer comparison is small.

int main() {
char *buf1;
size_t buf1_size = EBUF_SIZE_INIT;
size_t offset;

if ((buf1 = malloc(buf1_size)) == NULL) {
fprintf(stderr, "Buffer initial malloc of %d bytes failed\n",
buf1_size);
exit(1);
}

...

offset = <position in buffer to write to>

...

/* Check buf1 is big enough */
if (offset + 2 > buf1_size && ebuf_full(&buf1, &buf1_size, offset))
{
fprintf(stderr, "Buffer overflow - have %d bytes but need %d bytes",
buf1_size, offset + 2);
exit(1);
}
buf1[offset] = 0;

...

free(buf1);
}


James Harris

May 30, 2008, 9:44:24 AM
On 30 May, 14:12, James Harris <james.harri...@googlemail.com> wrote:

Here's another piece of example code to use the proposed functions.
This one is to read an arbitrary-length line. Hopefully when compared
with a custom line-reading function the code below keeps a far simpler
interface while allowing any necessary options. It should also be fast
in that, again, the function only gets called if there is a need for
more space. Since the function allocates memory in ever-increasing
chunks, most iterations will not call it at all.


#define ENDCHAR '\n'

FILE *infile = stdin;
char *buffer;
size_t bufsize = 100; /* Initial size only */
size_t offset;
int ch; /* int rather than char so that EOF can be detected */

... (allocate buffer)

/* Read up to and including ENDCHAR */
for (offset = 0; (ch = getc(infile)) != EOF; ) {
    if (offset + 2 > bufsize &&
        ebuf_full(&buffer, &bufsize, offset)) {
        fprintf(stderr, "Line too long for memory");
        exit(1);
    }
    buffer[offset++] = ch;
    if (ch == ENDCHAR) break;
}

... (free buffer)

Notably since we invoke getc() we could easily have more than one
termination character such as

if (ch == '\n' || ch == '\0' || ch == ',')

etc. which is intended to be a big advantage over calling a line
reader function.

--
James

Eric Sosman

May 30, 2008, 11:14:23 AM

Only a small, furry animal offering itself up for sacrifice
would use a non-standard header like <malloc.h>. Consider
yourself eaten by the Ravenous Bugblatter Beast.

Also, there seems to be no reason for <stdio.h> in the
buffer-bashing code; it doesn't hurt to #include extraneous
baggage, but it doesn't help either. Everything you need
is in <stdlib.h>.

>> int ebuf_full(char **buf, size_t *buf_size, size_t offset) {
>> size_t new_size;
>> char *new_buf;
>>
>> if (*buf_size < offset + 2) { /* NB last pos left empty */

This magical `2' appears in quite a few places. Maybe
it deserves a #define of its own?

>> new_size = *buf_size * EBUF_INCREASE + 1;

As a small matter of personal preference and prejudice,
I myself would avoid floating-point arithmetic here and do
the calculation in integers. Not a big deal, though.
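
A minimal sketch of the integer-only calculation suggested here, for illustration. The helper name grow_size and the 3/2 growth ratio are assumptions, not part of the posted code, and the wrap-around check is an extra precaution that the original does not have.

```c
#include <stddef.h>

/* Hypothetical helper (not from the posted code): return a capacity of
 * roughly 1.5 times `current`, but never less than `needed`, using
 * integer arithmetic only. */
size_t grow_size(size_t current, size_t needed)
{
    size_t grown = current + current / 2 + 1;   /* about current * 1.5 */

    if (grown < current)            /* unsigned arithmetic wrapped around */
        grown = (size_t)-1;         /* clamp to the largest size_t value */
    return grown < needed ? needed : grown;
}
```

With this, the line under discussion could become something like new_size = grow_size(*buf_size, offset + 2), and no conversion to double takes place.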

>> if (new_size < offset + 2) new_size = offset + 2;
>> if (new_size < EBUF_SIZE_MIN) new_size = EBUF_SIZE_MIN;
>> if ((new_buf = realloc(*buf, new_size)) == NULL) {
>> return 1; /* Failed to realloc buffer */
>> }
>> *buf = new_buf;
>> *buf_size = new_size;
>> }
>> return 0; /* Reallocated successfuly */
>>
>> }
>>
>> int ebuf_trim(char **buf, size_t *buf_size, size_t offset) {
>> int new_size = offset + 2; /* Includes empty char */
>> char *new_buf;
>>
>> if (new_size < EBUF_SIZE_MIN) new_size = EBUF_SIZE_MIN;
>> if (new_size != *buf_size) {
>> if ((new_buf = realloc(*buf, new_size)) == NULL) {
>> return 1; /* Reallocation failed (unlikely) */

There's an interface design decision lurking here: Should
this be considered a "failure," or just an "unsuccessful
attempt to optimize?" Arguments can be made for both points
of view. IMHO you've chosen rightly, because it's possible
that ebuf_trim() could fail in an attempt to *increase* the
size of the buffer, in which case the calling program might
be, er, surprised to discover that the buffer was too small
for the offset.

>> }
>> *buf = new_buf;
>> *buf_size = new_size;
>> }
>> return 0; /* Reallocation succeeded */
>>
>> }
>
> An example of intended use follows. Note that the routines are coded
> to expect buffer and current size as parameters. Despite the error
> handling the code is intended to be fast. Including "if (offset + 2 >
> buf1_size)" in the main code the function should only be called if the
> buffer is too small. The cost of one integer comparison is small.
>
> int main() {

`int main(void)' would be very slightly better.

> char *buf1;
> size_t buf1_size = EBUF_SIZE_INIT;
> size_t offset;
>
> if ((buf1 = malloc(buf1_size)) == NULL) {
> fprintf(stderr, "Buffer initial malloc of %d bytes failed\n",
> buf1_size);

What is the type of buf1_size? Answer: size_t. What type
of operand does the "%d" specifier convert? Answer: int. Are
size_t and int the same? Answer: No. What should you do to
fix the mismatch? Answer: Change "%d" to "%g". (No, wait, I
didn't mean that ...)

For fprintf() and so on, <stdio.h> *is* needed.

> exit(1);

ITYM `exit(EXIT_FAILURE);'. Or `return EXIT_FAILURE;'.

> }

Instead of pre-allocating the initial buffer, why not
set buf1=NULL and buf1_size=0 and just let ebuf_full()
take care of everything?
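
This works because realloc(NULL, n) is specified to behave like malloc(n). A rough sketch of the idea, assuming a hypothetical helper ensure_offset with its own simple growth rule (neither is from the posted code):

```c
#include <stdlib.h>

/* Hypothetical helper: make sure buf[offset] is addressable, growing the
 * allocation if necessary.  It may be called with *buf == NULL and
 * *size == 0, because realloc(NULL, n) acts like malloc(n).
 * Returns 0 on success, 1 on failure (the old buffer is left untouched). */
int ensure_offset(char **buf, size_t *size, size_t offset)
{
    if (offset >= *size) {
        size_t new_size = offset + offset / 2 + 2;  /* some room to spare */
        char *new_buf;

        if (new_size <= offset)         /* size_t wrapped around */
            return 1;
        new_buf = realloc(*buf, new_size);
        if (new_buf == NULL)
            return 1;
        *buf = new_buf;
        *size = new_size;
    }
    return 0;
}
```

The caller then starts with char *buf1 = NULL and size_t buf1_size = 0, and needs no separate malloc() or its error message.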

> ...
>
> offset = <position in buffer to write to>
>
> ...
>
> /* Check buf1 is big enough */
> if (offset + 2 > buf1_size && ebuf_full(&buf1, &buf1_size, offset))
> {
> fprintf(stderr, "Buffer overflow - have %d bytes but need %d
> bytes",
> buf1_size, offset + 2);
> exit(1);
> }
> buf1[offset] = 0;
>
> ...
>
> free(buf1);

... and since main() returns an int value, you should ...?
(C99 introduced a special rule for main() that says falling
off the end is equivalent to returning zero, but IMHO this
should be viewed as a concession to the large amount of sloppy
code already in existence, not as an encouragement to further
sloppiness. Besides, C99 implementations have not exactly
taken the world by storm, and lots of C90 implementations are
still in use.)

> }

It seems to me you understand the basic ideas of how to
use realloc() to grow a buffer (although the fact that you
can reallocate a NULL may have escaped you). There are a
few glitches in the way you've done things, easily fixable.

If you want to package something like this for wider
use as a buffer-managing utility, you might consider putting
the buffer information in a struct and passing a single
struct pointer to the functions. Not only would this make
the interface clearer by reducing the argument count, but
it would also make it easy for you to add further fillips
of functionality later on, just by adding a few elements
to the struct and leaving the calls alone.
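
One possible shape for such a struct-based interface is sketched below. The names dynbuf, dynbuf_reserve and dynbuf_release are invented for illustration and the growth rule is only an assumption; it is not a rewrite of the posted functions.

```c
#include <stdlib.h>

/* Hypothetical buffer descriptor; more fields (a growth factor, a
 * "used" count, ...) could be added later without changing any calls. */
struct dynbuf {
    char  *data;    /* start of the allocation, or NULL */
    size_t size;    /* number of bytes currently allocated */
};

/* Return 0 if b->data[offset] is usable (growing the buffer if needed),
 * or 1 if the buffer could not be expanded. */
int dynbuf_reserve(struct dynbuf *b, size_t offset)
{
    size_t new_size;
    char *new_data;

    if (offset < b->size)
        return 0;                       /* already big enough */
    new_size = offset + offset / 2 + 1; /* about 1.5 * offset */
    if (new_size <= offset)
        return 1;                       /* size_t wrapped around */
    new_data = realloc(b->data, new_size);
    if (new_data == NULL)
        return 1;                       /* old allocation is still valid */
    b->data = new_data;
    b->size = new_size;
    return 0;
}

/* Free the buffer and reset the descriptor to its empty state. */
void dynbuf_release(struct dynbuf *b)
{
    free(b->data);
    b->data = NULL;
    b->size = 0;
}
```

A caller would declare struct dynbuf b = { NULL, 0 }; and write if (dynbuf_reserve(&b, offset)) handle error; before touching b.data[offset].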

Go back to your lair and lick your wounds; I think
they're not life-threatening.

--
Eric....@sun.com

Malcolm McLean

May 31, 2008, 5:05:24 AM

"James Harris" <james.h...@googlemail.com> wrote in message

> Initial issue: read in an arbitrary-length piece of text.
> Perceived issue: handle variable-length data
>
> The code below is a suggestion for implementing a variable length
> buffer that could be used to read text or handle arrays of arbitrary
> length. I don't have the expertise in C of many folks here so I feel
> like I'm offering a small furry animal for sacrifice to a big armour
> plated one... but will offer it anyway. Please do suggest improvements
> or challenge the premise. It would be great if it could be improved to
> become a generally useful piece of code.
>
Firstly, don't worry about the actual code bodies at this stage. Any
reasonably competent C programmer should be able to provide those.

The thing is the interfaces.

The first problem is that if we use char *, the functions will only work on
character arrays. If we use void *s, this problem disappears, but there
might be issues about too many casts to access the actual data.

The second issue is whether to use a structure for the buffer, or, as you
have done, pass in several parameters to represent size and capacity.
There's a nasty C stitch-up if we use void *s with option 2.

ebuf_full(void **buf ...)

char *buffer;
/* this is illegal */
ebuf_full(&buffer)

buffer has to be assigned to a dummy void *first. Which makes the function
unusable.
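
To make the round trip through a temporary void * concrete, here is a small sketch. The function names grow_any and grow_chars are made up for this example; the point is only that a char ** cannot be passed where a void ** is expected, so the pointer has to be copied out and back.

```c
#include <stdlib.h>

/* Hypothetical generic grower taking void **, as in the interface
 * described above.  Returns 0 on success, 1 on failure. */
int grow_any(void **buf, size_t *size, size_t needed)
{
    void *p;

    if (needed <= *size)
        return 0;                   /* nothing to do */
    p = realloc(*buf, needed);
    if (p == NULL)
        return 1;
    *buf = p;
    *size = needed;
    return 0;
}

/* Wrapper for char buffers: passing &buffer (a char **) straight to
 * grow_any() would be a constraint violation, so go via a void *. */
int grow_chars(char **buf, size_t *size, size_t needed)
{
    void *tmp = *buf;               /* char * converts to void * implicitly */
    int failed = grow_any(&tmp, size, needed);

    *buf = tmp;                     /* unchanged if the call failed */
    return failed;
}
```

Another way round the problem is to return the new pointer as a void *, the way realloc() itself does, so that no pointer-to-pointer parameter is needed at all.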

--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm


James Harris

Jul 25, 2008, 6:00:13 PM
On 30 May, 15:14, Eric Sosman <Eric.Sos...@sun.com> wrote:
> James Harris wrote:

Haha - the sacrificial animal of my analogy was the code I was
offering up - rather than me!! But you've raised - and eaten - some
good points. I wasn't aware that malloc.h should be avoided, for example.

> Also, there seems to be no reason for <stdio.h> in the
> buffer-bashing code; it doesn't hurt to #include extraneous
> baggage, but it doesn't help either. Everything you need
> is in <stdlib.h>.

OK

> >> int ebuf_full(char **buf, size_t *buf_size, size_t offset) {
> >> size_t new_size;
> >> char *new_buf;
>
> >> if (*buf_size < offset + 2) { /* NB last pos left empty */
>
> This magical `2' appears in quite a few places. Maybe
> it deserves a #define of its own?

Agreed, it's a bit scabby as it stands. The reason for the +2 is that
there's a +1 to change from an offset to a length - e.g. an offset of
7 means a length of 8 - and I wanted to leave one extra byte after the
specified length. I'd rather avoid the clutter of another defined
constant. I'll rewrite to consistently use offsets rather than lengths
and thus avoid the +2.

> >> new_size = *buf_size * EBUF_INCREASE + 1;
>
> As a small matter of personal preference and prejudice,
> I myself would avoid floating-point arithmetic here and do
> the calculation in integers. Not a big deal, though.

Me too. The reason for including a factor of 1.5 was simply to
demonstrate that we don't need to settle for integer factors.

> >> if (new_size < offset + 2) new_size = offset + 2;
> >> if (new_size < EBUF_SIZE_MIN) new_size = EBUF_SIZE_MIN;
> >> if ((new_buf = realloc(*buf, new_size)) == NULL) {
> >> return 1; /* Failed to realloc buffer */
> >> }
> >> *buf = new_buf;
> >> *buf_size = new_size;
> >> }
> >> return 0; /* Reallocated successfuly */
>
> >> }
>
> >> int ebuf_trim(char **buf, size_t *buf_size, size_t offset) {
> >> int new_size = offset + 2; /* Includes empty char */
> >> char *new_buf;
>
> >> if (new_size < EBUF_SIZE_MIN) new_size = EBUF_SIZE_MIN;
> >> if (new_size != *buf_size) {
> >> if ((new_buf = realloc(*buf, new_size)) == NULL) {
> >> return 1; /* Reallocation failed (unlikely) */
>
> There's an interface design decision lurking here: Should
> this be considered a "failure," or just an "unsuccessful
> attempt to optimize?" Arguments can be made for both points
> of view. IMHO you've chosen rightly, because it's possible
> that ebuf_trim() could fail in an attempt to *increase* the
> size of the buffer, in which case the calling program might
> be, er, surprised to discover that the buffer was too small
> for the offset.

OK

> >> }
> >> *buf = new_buf;
> >> *buf_size = new_size;
> >> }
> >> return 0; /* Reallocation succeeded */
>
> >> }
>
> > An example of intended use follows. Note that the routines are coded
> > to expect buffer and current size as parameters. Despite the error
> > handling the code is intended to be fast. Including "if (offset + 2 >
> > buf1_size)" in the main code the function should only be called if the
> > buffer is too small. The cost of one integer comparison is small.
>
> > int main() {
>
> `int main(void)' would be very slightly better.
>
> > char *buf1;
> > size_t buf1_size = EBUF_SIZE_INIT;
> > size_t offset;
>
> > if ((buf1 = malloc(buf1_size)) == NULL) {
> > fprintf(stderr, "Buffer initial malloc of %d bytes failed\n",
> > buf1_size);
>
> What is the type of buf1_size? Answer: size_t. What type
> of operand does the "%d" specifier convert? Answer: int. Are
> size_t and int the same? Answer: No. What should you do to
> fix the mismatch? Answer: Change "%d" to "%g". (No, wait, I
> didn't mean that ...)

I've no idea how to print these values, then. Would they be better as
unsigned ints? I guess this would mean unsigned ints would have to be
wide enough for any memory offset. Not sure if that can be relied
upon.

> For fprintf() and so on, <stdio.h> *is* needed.
>
> > exit(1);
>
> ITYM `exit(EXIT_FAILURE);'. Or `return EXIT_FAILURE;'.

OK. Was trying to keep the interface light. As such I wanted to reduce
the number of defines. The procedure name is meant to indicate the
meaning of a zero or non-zero return. The function can exist in an if
statement as

if (ebuf_full(....)) handle error


> > }
>
> Instead of pre-allocating the initial buffer, why not
> set buf1=NULL and buf1_size=0 and just let ebuf_full()
> take care of everything?

I didn't know this could be done. Will include in the rewrite.


> > ...
>
> > offset = <position in buffer to write to>
>
> > ...
>
> > /* Check buf1 is big enough */
> > if (offset + 2 > buf1_size && ebuf_full(&buf1, &buf1_size, offset))
> > {
> > fprintf(stderr, "Buffer overflow - have %d bytes but need %d
> > bytes",
> > buf1_size, offset + 2);
> > exit(1);
> > }
> > buf1[offset] = 0;
>
> > ...
>
> > free(buf1);
>
> ... and since main() returns an int value, you should ...?
> (C99 introduced a special rule for main() that says falling
> off the end is equivalent to returning zero, but IMHO this
> should be viewed as a concession to the large amount of sloppy
> code already in existence, not as an encouragement to further
> sloppiness. Besides, C99 implementations have not exactly
> taken the world by storm, and lots of C90 implementations are
> still in use.)

OK. That was a miss on my part.

> > }
>
> It seems to me you understand the basic ideas of how to
> use realloc() to grow a buffer (although the fact that you
> can reallocate a NULL may have escaped you). There are a
> few glitches in the way you've done things, easily fixable.
>
> If you want to package something like this for wider
> use as a buffer-managing utility, you might consider putting
> the buffer information in a struct and passing a single
> struct pointer to the functions. Not only would this make
> the interface clearer by reducing the argument count, but
> it would also make it easy for you to add further fillips
> of functionality later on, just by adding a few elements
> to the struct and leaving the calls alone.

I thought about that but chose against it. Options seem to be

1. Address and size are scalars in the caller
- limits other info that can be stored

2. Struct holding address, size, factor and other parameters
- simplifies calls to ebuf_trim
- requires normal use of pointers to be dereferenced via the struct

3. Struct holding parameters other than the address
- still needs extra parameter to be passed to ebuf_full
- requires ebuf_full to locate parameter block

On balance the first option seemed best. It keeps the system simple
without losing function.

Eric Sosman

Jul 25, 2008, 6:33:52 PM
James Harris wrote:
> On 30 May, 15:14, Eric Sosman <Eric.Sos...@sun.com> wrote:
>> James Harris wrote:
>>> [...]

>>> size_t buf1_size = EBUF_SIZE_INIT;
>>> size_t offset;
>>> if ((buf1 = malloc(buf1_size)) == NULL) {
>>> fprintf(stderr, "Buffer initial malloc of %d bytes failed\n",
>>> buf1_size);
>> What is the type of buf1_size? Answer: size_t. What type
>> of operand does the "%d" specifier convert? Answer: int. Are
>> size_t and int the same? Answer: No. What should you do to
>> fix the mismatch? Answer: Change "%d" to "%g". (No, wait, I
>> didn't mean that ...)
>
> I've no idea how to print these values, then. Would they be better as
> unsigned ints? I guess this would mean unsigned ints would have to be
> wide enough for any memory offset. Not sure if that can be relied
> upon.

If you can count on a C99 implementation, there's a length
modifier "z" for printing size_t values:

printf ("Size = %zu\n", buf1_size);

If you need to live with the more widely available C90
systems, there's no "z" modifier and you need to convert the
size_t to something printf() knows how to handle:

printf ("Size = %u\n", (unsigned int)buf1_size);

or (safer):

printf ("Size = %lu\n", (unsigned long)buf1_size);

or even (extremely safe, extremely unusual):

printf ("Size = %.0f\n", (double)buf1_size);

These will work as well on C99 as they do on C90.

--
Eric....@sun.com

James Harris

Jul 25, 2008, 6:33:37 PM
On 31 May, 09:05, "Malcolm McLean" <regniz...@btinternet.com> wrote:
> "James Harris" <james.harri...@googlemail.com> wrote in message

> > Initial issue: read in an arbitrary-length piece of text.
> > Perceived issue: handle variable-length data
>
> > The code below is a suggestion for implementing a variable length
> > buffer that could be used to read text or handle arrays of arbitrary
> > length. I don't have the expertise in C of many folks here so I feel
> > like I'm offering a small furry animal for sacrifice to a big armour
> > plated one... but will offer it anyway. Please do suggest improvements
> > or challenge the premise. It would be great if it could be improved to
> > become a generally useful piece of code.
>
> Firstly, don't worry about the actual code bodies at this stage. Any
> reasonably competent C programmer should be able to provide those.
>
> The thing is the interfaces.
>
> The first problem is that if we use char *, the functions will only work on
> character arrays. If we use void *s, this problem disappears, but there
> might be issues about too many casts to access the actual data.

AFAIK casts tend to make code less safe and I try to avoid them. Is
there a good solution to this?

> The second issue is whether to use a structure for the buffer, or, as you
> have done, pass in several parameters to represent size and capacity.
> There's a nasty C stitch-up if we use void *s with option 2.
>
> ebuf_full(void **buf ...)
>
> char *buffer;
> /* this is illegal */
> ebuf_full(&buffer)
>
> buffer has to be assigned to a dummy void *first. Which makes the function
> unusable.

Not nice!

Having a separate struct would allow other advantages such as having a
per-buffer size increase factor but I think it would need the pointer
to be dereferenced when it is used normally. On balance I think
address and length (or, better, address and offset) is better.


Here's a rewrite where I've improved the code slightly by simplifying
a few bits of it. It now uses offsets rather than lengths and bases
the increase on the requested offset so eliminating some of the
checks. It does require the factor to be greater than or equal to 1.
I'll include the functions and a sample main in one go.

/*
* Test expanding buffer
*/

#define EBUF_INCREASE 1.5 /* Factor (>= 1) for space increase */

#include <stdio.h>
#include <stdlib.h>

int ebuf_full(char **buf, size_t *buf_limit, size_t offset) {
size_t new_limit;
char *new_buf;

if (offset >= *buf_limit) {
new_limit = offset * EBUF_INCREASE;
if ((new_buf = realloc(*buf, new_limit + 1)) == NULL) {
return 1; /* Failed to realloc */
}
*buf = new_buf;
*buf_limit = new_limit;
}
return 0; /* Realloc succeeded */
}

int ebuf_trim(char **buf, size_t *buf_limit, size_t offset) {
char *new_buf;

if ((new_buf = realloc(*buf, offset + 1)) == NULL) {
return 1; /* Realloc failed */
}
*buf = new_buf;
*buf_limit = offset;
return 0; /* Succeeded */
}


int main(void) {
char *buf1 = NULL;
size_t buf1_limit = 0;
size_t offset;

for (offset = 0; offset < 1000; offset += 200) {
fprintf(stderr, "\n---Checking for offset %d\n", offset);

if (offset >= buf1_limit && ebuf_full(&buf1, &buf1_limit, offset))
{
fprintf(stderr, "-Ebuf overflow %d/%d bytes", buf1_limit,
offset);
exit(1);
}
buf1[offset] = 'x';
}

fprintf(stderr, "\n---Trim from %d to %d\n", buf1_limit, offset);
if (ebuf_trim(&buf1, &buf1_limit, offset)) {
fprintf(stderr, "-Buffer trim to %d failure\n", offset);
exit(1);
}

free(buf1);
return 0;
}

James Harris

Jul 25, 2008, 7:06:52 PM

Rather than use size_t would I be better to use a type of unsigned int
or unsigned long in the first place?

--
James

pete

Jul 25, 2008, 9:16:29 PM

If size_t confuses you, then use long unsigned instead.

--
pete

Eric Sosman

Jul 25, 2008, 9:22:38 PM
James Harris wrote:
> On 25 Jul, 22:33, Eric Sosman <Eric.Sos...@sun.com> wrote:
>> [... how to print a size_t ...]

>
> Rather than use size_t would I be better to use a type of unsigned int
> or unsigned long in the first place?

I think not: size_t is the type for sizes, so the program
cannot go wrong by using it. If you use unsigned int the program
may err, because valid size_t values might exceed the range of
unsigned int on some machines.

Under C90 rules, size_t must be an unsigned integer type,
and the only "integer types" are the flavors of char, short,
int, and long, with long being the widest. Therefore, in C90
it is guaranteed that converting a size_t to unsigned long
loses no information and preserves the value; it's a good
solution for printing. On the other hand, unsigned long might
be overkill: a 64-bit value, say, where a 32-bit value might
suffice. So the wise course is to calculate with size_t
values and convert to unsigned long only for display purposes.

The situation gets stickier in C99, because the repertoire
of "integer types" is expanded and in fact becomes open-ended:
it is now possible that size_t could be wider than unsigned long --
think of a 64-bit size_t with a 32-bit unsigned long. You could
print by converting to unsigned long long or to uintmax_t, but
since these types only exist in C99 and C99 provides the "z"
length modifier, using "z" for display is surely best.

Summary: Calculate with size_t, except perhaps in situations
where you must economize on storage and *know* that the actual
values will fit in something more restricted. For display
purposes, either convert to unsigned long (C90) or rely on
the "z" modifier (C99).
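
As a sketch of that summary, the choice between the two display methods can be made at compile time. The helper name format_size is hypothetical, and the test on __STDC_VERSION__ assumes the compiler's library matches the language level it reports.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: write `value` in decimal into `out` and return
 * the number of characters written.  `out` must be large enough for any
 * size_t (3 * sizeof(size_t) + 1 chars suffices with 8-bit bytes). */
int format_size(char *out, size_t value)
{
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
    return sprintf(out, "%zu", value);                  /* C99 and later */
#else
    return sprintf(out, "%lu", (unsigned long)value);   /* C90 */
#endif
}
```

All arithmetic stays in size_t; the conversion to unsigned long happens only at the point of display, and only on the C90 path.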

--
Eric Sosman
eso...@ieee-dot-org.invalid

pete

Jul 25, 2008, 9:28:06 PM
James Harris wrote:

> #define EBUF_INCREASE 1.5 /* Factor (>= 1) for space increase */

In my get_line function, for reading text files,
http://www.mindspring.com/~pfilandr/C/get_line/get_line.c
I increase the buffer size by only one byte,
each time that the buffer is found to be too small.

Most text files that I've dealt with,
only have line lengths of less than a hundred bytes,
and a hundred calls to realloc in a program
isn't going to add up to any substantial time.

The get_line function is set up so that if you know
that you're going to be dealing
with a file which has significantly long lines,
then you can supply an adequately large original buffer
so that no reallocation will be needed.

--
pete

Peter Nilsson

Jul 26, 2008, 12:22:08 AM
James Harris wrote:
> > #define EBUF_SIZE_INIT 128
> > #define EBUF_SIZE_MIN 128
> > #define EBUF_INCREASE 1.5 /* Factor to increase space by each time */
>
> #define ENDCHAR '\n'

Macros beginning with E followed by another capital are reserved if
<errno.h> is included. Although you don't include it now, you should
not rule out the possibility of future versions including it.

--
Peter

santosh

Jul 26, 2008, 3:16:20 AM
Eric Sosman wrote:

Even more safely:

printf("Size = %Lf\n", (long double)buf1_size);

:-)

santosh

Jul 26, 2008, 3:18:27 AM
pete wrote:

This will break on Windows with objects larger than 4 GB.

Eric Sosman

Jul 26, 2008, 9:10:22 AM
santosh wrote:
> Eric Sosman wrote:
>> [... printing size_t values in C90 ...]

>> or even (extremely safe, extremely unusual):
>>
>> printf ("Size = %.0f\n", (double)buf1_size);
>>
>> These will work as well on C99 as they do on C90.
>
> Even more safely:
>
> printf("Size = %Lf\n", (long double)buf1_size);
>
> :-)

Next time you have an object larger than
10000000000000000000000000000000000000 bytes,
be sure to let us know. Install an extra
swap disk, too. :-)

--
Eric Sosman
eso...@ieee-dot-org.invalid

James Harris

Jul 26, 2008, 3:35:58 PM
On 26 Jul, 01:28, pete <pfil...@mindspring.com> wrote:
> James Harris wrote:
> > #define EBUF_INCREASE 1.5 /* Factor (>= 1) for space increase */
>
> In my get_line function, for reading text files,
> http://www.mindspring.com/~pfilandr/C/get_line/get_line.c
> I increase the buffer size by only one byte,
> each time that the buffer is found to be too small.

The proposed code is NOT specifically for reading lines. It is
intended to be used any time a variable length buffer is needed. The
buffer contents could be generated in a loop, for example.

If the buffer increase factor is set to 1 ebuf_full will degenerate to
allocating only as much space as is needed each time it is called.

> Most text files that I've dealt with,
> only have line lengths of less than a hundred bytes,
> and a hundred calls to realloc in a program
> isn't going to add up to any substantial time.
>
> The get_line function is set up so that if you know
> that you're going to be dealing
> with a file which has significantly long lines,
> then you can supply an adequately large original buffer
> so that no reallocation will be needed.

Ebuf_full allows a buffer of arbitrary size to be pre-allocated, if
preferred. Whether pre-allocated or not, increasing the buffer by
factors allows it to scale.
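
To put rough numbers on the scaling argument, the sketch below counts how many resize operations each policy needs to reach a given size. The function names are invented for the comparison, the geometric rule (size + size / 2 + 1) is an assumed stand-in for a factor of 1.5, and both loops assume a target well below the maximum size_t.

```c
#include <stddef.h>

/* Number of resizes needed to reach `target` bytes from an empty buffer
 * when every resize adds a fixed `step` (which must be non-zero). */
unsigned long resizes_additive(size_t target, size_t step)
{
    unsigned long calls = 0;
    size_t size = 0;

    while (size < target) {
        size += step;
        calls++;
    }
    return calls;
}

/* The same count when every resize grows the buffer by about half its
 * current size (the + 1 lets an empty buffer start growing). */
unsigned long resizes_geometric(size_t target)
{
    unsigned long calls = 0;
    size_t size = 0;

    while (size < target) {
        size = size + size / 2 + 1;
        calls++;
    }
    return calls;
}
```

For a 100-byte line, growing one byte at a time takes 100 resizes against 11 with the geometric rule, which is negligible either way. For a megabyte the additive count is about a million, while the geometric count stays in the dozens.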

--
James

James Harris

Jul 26, 2008, 3:39:57 PM

OK. Perhaps I should call it xbuf instead so we have

#define XBUF_INCREASE 1.5

int xbuf_full(...

int xbuf_trim(...

--
James

pete

Jul 26, 2008, 5:41:51 PM

Floating point types become unsuitable
for the representation of integers,
at the point where two consecutive integers
converted to the floating point type in question,
compare equal.

I don't know how to, at compile time,
calculate the values of the two lowest
consecutive integers where that happens.
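
For what it's worth, <float.h> gives enough to work this out: FLT_RADIX and DBL_MANT_DIG are compile-time constants, and every integer up to FLT_RADIX raised to the power DBL_MANT_DIG is exactly representable in a double. The sketch below builds that value at run time and checks it; the function names are made up, and it assumes the default round-to-nearest mode.

```c
#include <float.h>

/* FLT_RADIX ^ DBL_MANT_DIG: the first integer N for which N + 1 is no
 * longer exactly representable as a double (2^53 for IEEE doubles).
 * volatile forces each step to be stored as a real double, which avoids
 * surprises from extra-precision registers. */
double first_inexact_double(void)
{
    volatile double n = 1.0;
    int i;

    for (i = 0; i < DBL_MANT_DIG; i++)
        n *= FLT_RADIX;
    return n;
}

/* True if n and n + 1 end up as the same double value. */
int collides_with_successor(double n)
{
    volatile double next = n + 1.0;

    return next == n;
}
```

A size_t whose width in bits is no more than DBL_MANT_DIG (with FLT_RADIX equal to 2) therefore converts to double without loss; a 64-bit size_t with a 53-bit mantissa does not, once values pass 2^53.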

--
pete

pete

Jul 26, 2008, 6:03:39 PM

OK, then I guess it's better to learn size_t.

--
pete

James Harris

Jul 26, 2008, 7:00:59 PM

I'm not sure it's a question of learning the meaning of size_t. The
problem was in printing it with printf prior to C99.

The recommendation seems to be to cast to unsigned long or similar but
surely if size_t is wider than unsigned long it will fail to print
correctly. In the absence of C99's "z" length modifier perhaps the best way
is to print it by a function (which hasn't been mentioned so may be
wrong or impossible....).
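
Printing by a function is certainly possible. A minimal sketch, assuming a hypothetical helper size_to_decimal that produces the digits itself so that no printf conversion for size_t is needed, however wide the type is:

```c
#include <stddef.h>

/* Hypothetical helper: write the decimal digits of `value` into `out`,
 * followed by '\0', and return the number of digits.  `out` must hold
 * at least 3 * sizeof(size_t) + 1 chars, which is enough for any size_t
 * assuming 8-bit bytes. */
size_t size_to_decimal(size_t value, char *out)
{
    char tmp[3 * sizeof(size_t) + 1];
    size_t n = 0, i;

    do {                            /* least significant digit first */
        tmp[n++] = (char)('0' + value % 10);
        value /= 10;
    } while (value != 0);

    for (i = 0; i < n; i++)         /* reverse into the caller's buffer */
        out[i] = tmp[n - 1 - i];
    out[n] = '\0';
    return n;
}
```

The resulting string can then be passed to fputs() or printed with "%s", both of which exist in C90.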

--
James

James Harris

Jul 26, 2008, 7:09:30 PM

Here's a new version of the functions hopefully taking on board the
recommended changes and with appropriate documentation. The post is
long but the functions themselves are very short.

How do these look and are they good enough for general use? Apart from
their operation should they be packaged in some way to make them
useful - e.g. by making header and source code?

$ cat xbuf.c
/*
* Expanding buffer
*
* Implement buffer management to semi-automatically expand
* or contract the space allocated to a buffer as needed.
*
* An arbitrary number of buffers may be maintained.
*
* Two functions are provided to manage the buffer space.
* Both return True if they fail to carry out user
* instructions. The intention is that they appear in
* if-constructs as in
*
* if <failure> then handle error
*
* otherwise the buffer may be used. This allows the
* failure test to be added immediately in front of any
* code which uses a particular offset without otherwise
* altering the code.
*
* The functions are
*
* 1. xbuf_full returns True if the buffer is "full" - i.e.
* the buffer is too small _and_ cannot be expanded to the
* required size. (If possible the buffer will be expanded
* and False will be returned to indicate that the buffer
* is not full.) Xbuf_full never reduces the size of the
* buffer. It will only make the buffer larger as needed.
*
* 2. xbuf_trimfail returns True if the buffer size cannot
* be set to match the passed-in offset. (If possible the
* buffer size will be set and False will be returned to
* indicate that the trim operation did not fail.)
* Note that xbuf_trimfail will either shrink or enlarge
* the buffer as needed to exactly match the size needed
* for the supplied offset.
*
* In the above False means zero (0) and True means non-zero,
* specifically, one (1).
*
* Any space added to the buffer on either call will be
* filled with undefined values, not necessarily zeros.
*
* A buffer can initially be set up either by a call to
* malloc or by setting the buffer pointer to NULL (and
* defining a size of zero).
*
* Any existing buffer can be passed to the functions as
* long as it was created by malloc/calloc - and possibly
* resized by realloc.
*
* Client code may resize a buffer by realloc at any time
* without reference to the xbuf functions.
*
* The buffer is at all times 'owned' by the client. As
* well as initially creating the buffer (or defining its
* base pointer as NULL as shown above) the client code is
* responsible for freeing the buffer when it is no longer
* needed.
*
* Either xbuf call may relocate the buffer. Rather than
* holding pointers to within the buffer client code should
* address places within a buffer by offsets from its
* beginning. Offsets do not change (as long as they are
* within the limits of the buffer).
*
* For convenience the user supplies an offset to the calls.
* This is the index that would be used in an array expression
* such as
*
* buffer[offset]
*
* The minimum size the buffer needs to be in order to
* support this call is always
*
* offset + 1
*
* For example, to address offset zero the minimum size of
* the buffer must be 1.
*
* Example xbuf_full call:
*
* if (xbuf_full(&buf, &siz, offset)) {
* error: cannot expand buffer size from siz for offset
* }
* buf[offset] = value;
*
* Example xbuf_trimfail call:
*
* if (xbuf_trimfail(&buf, &siz, offset)) {
* error: cannot change buffer size from siz for offset
* }
* buf[offset] = value;
*
* For performance reasons the xbuf_full function call may be
* prefixed with a simple test to see if the call is needed
* such as
*
* if (offset >= siz && xbuf_full( ... as above ... ))
*
* This saves a function call in most cases and can make the
* xbuf_full function suitable for use even in tight loops.
* Depending on the value given to XBUF_FACTOR most
* iterations will not need to call the function to expand the
* buffer. On a modern CPU branch prediction and speculative
* execution should allow the cost of the test to almost
* disappear.
*
* The proportion of calls executed can be adjusted by means of
* the constant XBUF_FACTOR.
*/


#define XBUF_FACTOR 3 / 2 /* Factor by which to increase space */

/* XBUF_FACTOR may be an expression which is _not_ enclosed
* in parentheses as long as it fits with its single use in
* xbuf_full. See the function code for how XBUF_FACTOR is used.
*
* XBUF_FACTOR must not be less than 1. This is not checked; a
* smaller factor would make the buffer smaller than needed and
* lead to memory access violations.
*
* If XBUF_FACTOR is set to exactly 1 xbuf_full will only
* allocate the exact space needed on each call and will not
* allocate any extra. This may have a harmful effect on
* performance.
*
* Normally, set XBUF_FACTOR to greater than 1 and use
* xbuf_trimfail to reduce the footprint of a buffer when
* expansion is no longer expected.
*/


#include <stdio.h>
#include <stdlib.h>

/*
* Xbuf_full. Check if the buffer is full. If the desired
* offset is beyond the current buffer expand the buffer
* to make it big enough. If expansion is not possible return
* true to indicate the buffer is full. In this case the
* buffer will be unchanged.
*/

int xbuf_full(char **buf, size_t *buf_size, size_t offset) {
    size_t new_size;
    char *new_buf;

    // fprintf(stderr, "xbuf_full check %08x of size %d can take %d\n",
    //         *buf, *buf_size, offset);

    if (offset >= *buf_size) {
        new_size = (offset + 1) * XBUF_FACTOR;

        // fprintf(stderr, "New buffer size is to be %d\n", new_size);

        if ((new_buf = realloc(*buf, new_size)) == NULL) {
            return 1; /* Failed to realloc. Buffer is full */
        }
        *buf = new_buf;
        *buf_size = new_size;

        // fprintf(stderr, "New size %d at %08x\n", *buf_size, *buf);

    }
    return 0; /* Realloc succeeded. Buffer is not full */
}


/*
* Xbuf_trimfail. Adjust the size of the buffer to match the
* offset supplied. Note that this function will expand or
* contract the buffer as needed.
* If the trim operation fails true will be returned and
* the buffer will be unchanged.
*/

int xbuf_trimfail(char **buf, size_t *buf_size, size_t offset) {
    char *new_buf;

    // fprintf(stderr, "Trim buf at %08x of size %d for offset %d\n",
    //         *buf, *buf_size, offset);

    if ((new_buf = realloc(*buf, offset + 1)) == NULL) {
        return 1; /* Trim failed */
    }
    *buf = new_buf;
    *buf_size = offset + 1;

    // fprintf(stderr, "New size %d at %08x\n", *buf_size, *buf);

    return 0; /* Succeeded */
}


int main(void) {
    char *buf1 = NULL;

    size_t buf1_siz = 0;
    size_t offs;

    for (offs = 100; offs < 1000000000; offs += 10000000) {
        /* size_t values are cast to unsigned long for printing; %d is wrong for size_t */
        fprintf(stderr, "\n---Checking for offset %lu\n", (unsigned long) offs);

        if (offs >= buf1_siz && xbuf_full(&buf1, &buf1_siz, offs)) {
            fprintf(stderr, "-Xbuf overflow %lu/%lu bytes",
                    (unsigned long) buf1_siz, (unsigned long) offs);
            exit(1);
        }
        buf1[offs] = 'x';
    }

    fprintf(stderr, "\n\n\n---Trim from %lu for %lu\n",
            (unsigned long) buf1_siz, (unsigned long) offs);
    if (xbuf_trimfail(&buf1, &buf1_siz, offs)) {
        fprintf(stderr, "-Buffer trim for %lu failure\n", (unsigned long) offs);
        exit(1);
    }

    free(buf1);
    fprintf(stderr, "\n");
    return 0;
}

pete

Jul 26, 2008, 7:21:45 PM
James Harris wrote:

> I'm not sure it's a question of learning the meaning of size_t. The
> problem was in printing it with printf prior to C99.
>
> The recommendation seems to be to cast to unsigned long or similar but
> surely if size_t is wider than unsigned long it will fail to print
> correctly. In the absence of C99's %z component perhaps the best way
> is to print it by a function (which hasn't been mentioned so may be
> wrong or impossible....).

In this context, C89 and C99 are two different languages.

For C89, use long unsigned.
size_t isn't bigger than long unsigned, in C89.

You could probably do something with conditional compilation,
based on whether or not LLONG_MAX was defined in <limits.h>,
to gain portability across C89 and C99 platforms,
but it wouldn't be pretty.

--
pete


pete

Jul 26, 2008, 9:18:52 PM
Mark L Pappin wrote:

> pete <pfi...@mindspring.com> writes:
>
>> size_t isn't bigger than long unsigned, in C89.
>
> I don't see that this is guaranteed -

ISO/IEC 9899: 1990

6.1.2.5 Types
There are four signed integer types,
designated as signed char, short int, int, and long int.

For each of the signed integer types,
there is a corresponding (but different) unsigned integer type
(designated with the keyword unsigned)
that uses the same amount of storage
(including sign information) and has the same alignment requirements.

An enumeration comprises a set of named integer constant values.
Each distinct enumeration constitutes a different enumerated type.

The type char, the signed and unsigned integer types,
and the enumerated types are collectively called integral types.

6.1.3.2 Integer constants
The type of an integer constant
is the first of the corresponding list
in which its value can be represented.
Unsuffixed decimal:
int, long int, unsigned long int;
unsuffixed octal or hexadecimal:
int, unsigned int, long int, unsigned long int;
suffixed by the letter u or U:
unsigned int, unsigned long int;
suffixed by the letter l or L:
long int, unsigned long int;
suffixed by both the letters u or U and l or L:
unsigned long int.

7.1.6 Common definitions <stddef.h>
The following types and macros
are defined in the standard header <stddef.h>.
Some are also defined in other headers,
as noted in their respective subclauses.
The types are
ptrdiff_t
which is the signed integral type
of the result of subtracting two pointers:
size_t
which is the unsigned integral type
of the result of the sizeof operator;

--
pete

CBFalconer

Jul 26, 2008, 8:33:17 PM
James Harris wrote:
> pete <pfil...@mindspring.com> wrote:
>
... snip ...

>
>> The get_line function is set up so that if you know that you're
>> going to be dealing with a file which has significantly long
>> lines, then you can supply an adequately large original buffer
>> so that no reallocation will be needed.
>
> Ebuf_full allows a buffer of arbitrary size to be pre-allocated,
> if preferred. Whether pre-allocated or not increasing the buffer
> by factors allows it to scale.

Investigate using ggets, written in purely standard C and released
to the public domain. The whole package is available at:

<http://cbfalconer.home.att.net/download/ggets.zip>

--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>
Try the download section.


santosh

Jul 27, 2008, 3:29:49 AM
James Harris wrote:

<snip>

> I'm not sure it's a question of learning the meaning of size_t. The
> problem was in printing it with printf prior to C99.
>
> The recommendation seems to be to cast to unsigned long or similar but
> surely if size_t is wider than unsigned long

In a C90 conforming implementation size_t cannot be wider than unsigned
long since the latter is the largest integral type specified by the
Standard, and size_t is defined as an unsigned integral type.

BTW, is it conforming for a C90 implementation to implement size_t as an
unsigned integral type larger than unsigned long? Is there a specific
statement in the Standard that forbids this? Won't the "as if" rule
rescue such an implementation?

> it will fail to print correctly.

This is exceedingly unlikely to the point that I don't think you need to
worry about it.

> In the absence of C99's %z component perhaps the best way
> is to print it by a function (which hasn't been mentioned so may be
> wrong or impossible....).

The Standard defined size_t as an unsigned integral type, but the exact
nature of the type is implementation defined. If you know that your
implementation conforms to C99 then the specific format specifier %zu
is the way to go. Otherwise just cast the size_t value to the largest
unsigned integral type that your implementation supports (either
unsigned long or unsigned long long) and print it.

You can set a small macro similar to the ones in inttypes.h for this
purpose, which will expand to the correct (or best) format specifier
for each implementation. Like say:

#if __STDC_VERSION__ >= 199901L
#define PRI_SIZE_T "zu"
#else
#define PRI_SIZE_T "lu"
#endif

You could also test for ULLONG_MAX and change "lu" to "llu", though that
may be overkill.

James Harris

Jul 27, 2008, 3:41:26 AM
On 27 Jul, 00:33, CBFalconer <cbfalco...@yahoo.com> wrote:
> James Harris wrote:
> > pete <pfil...@mindspring.com> wrote:
>
> ... snip ...
>
> >> The get_line function is set up so that if you know that you're
> >> going to be dealing with a file which has significantly long
> >> lines, then you can supply an adequately large original buffer
> >> so that no reallocation will be needed.
>
> > Ebuf_full allows a buffer of arbitrary size to be pre-allocated,
> > if preferred. Whether pre-allocated or not increasing the buffer
> > by factors allows it to scale.
>
> Investigate using ggets, written in purely standard C and released
> to the public domain. The whole package is available at:
>
> <http://cbfalconer.home.att.net/download/ggets.zip>
>

Pete added line reading to the discussion. The code I proposed is NOT
specifically for reading lines. It is intended to be used any time a
variable length buffer is needed. The buffer contents could be
generated in a loop, for example.

As I understand it ggets only works when reading lines.

I believe the xbuf functions have other benefits:

- The user maintains control and can always choose whether to allow
the buffer to expand or not based on whatever criteria the programmer
wishes - for example, when the current size of the buffer reaches a
certain limit.
- Related to this, if using the functions to help read from an input
stream the input can be terminated on any number of specific
characters - for example, carriage return, null, line feed, and/or any
control character.
- The xbuf functions will work with a buffer allocated by the caller,
which tends to keep the malloc and free calls in the same code and,
I hope, makes the programmer more aware of the need to free
any buffer space used.
- The caller can manipulate the buffer at any time with realloc (along
with noting the new length) and the xbuf functions will still work on
the same buffer.
- More than one buffer can be grown at the same time - for example, if
reading two interleaved streams or reading one stream and generating
another buffer.
- Control is not relinquished to the called function as it is with
ggets. If reading a long line over a slow link I understand that ggets
will keep control until the line ends. Multi-threading can address
this but is a very heavy handed approach.

and lastly,

- The names of the functions are intended to make it clear that they
are not part of the standard library. (The name ggets looks too much
like those in standard libraries for my taste - but that is a personal
preference.)

If the xbuf functions fail to do any of the above or can be bettered,
improvements would be welcome.

Harald van Dijk

Jul 27, 2008, 4:03:14 AM
On Sun, 27 Jul 2008 12:59:49 +0530, santosh wrote:
> James Harris wrote:
>
> <snip>
>
>> I'm not sure it's a question of learning the meaning of size_t. The
>> problem was in printing it with printf prior to C99.
>>
>> The recommendation seems to be to cast to unsigned long or similar but
>> surely if size_t is wider than unsigned long
>
> In a C90 conforming implementation size_t cannot be wider than unsigned
> long since the latter is the largest integral type specified by the
> Standard, and size_t is defined as an unsigned integral type.
>
> BTW, is it conforming for a C90 implementation to implement size_t as an
> unsigned integral type larger than unsigned long? Is there a specific
> statement in the Standard that forbids this? Won't the "as if" rule
> rescue such an implementation?

No, it won't. Since the C90 standard says that size_t must be a typedef
for unsigned char, unsigned short, unsigned int, or unsigned long (the
only unsigned integer types), this is a strictly conforming C90 program:

#include <stddef.h>
#include <limits.h>
#define SIZE_MAX ((size_t) -1)
int main(void) {
return SIZE_MAX != UCHAR_MAX
&& SIZE_MAX != USHRT_MAX
&& SIZE_MAX != UINT_MAX
&& SIZE_MAX != ULONG_MAX;
}

It must return 0. If the implementation makes size_t larger than unsigned
long, the program returns 1. Returning 1 where the standard requires 0 is
not allowed by the as-if rule :-)

There are plenty of more convincing correct C90 programs that would be
broken by such an implementation, but converting size_t to unsigned long
and expecting no change in value is one such example, and you weren't
convinced by that. Could you explain why in a bit more detail?

santosh

Jul 27, 2008, 5:32:40 AM
Harald van Dijk wrote:

> On Sun, 27 Jul 2008 12:59:49 +0530, santosh wrote:
>> James Harris wrote:
>>
>> <snip>
>>
>>> I'm not sure it's a question of learning the meaning of size_t. The
>>> problem was in printing it with printf prior to C99.
>>>
>>> The recommendation seems to be to cast to unsigned long or similar
>>> but surely if size_t is wider than unsigned long
>>
>> In a C90 conforming implementation size_t cannot be wider than
>> unsigned long since the latter is the largest integral type specified
>> by the Standard, and size_t is defined as an unsigned integral type.
>>
>> BTW, is it conforming for a C90 implementation to implement size_t as
>> an unsigned integral type larger than unsigned long? Is there a
>> specific statement in the Standard that forbids this? Won't the "as
>> if" rule rescue such an implementation?
>
> No, it won't. Since the C90 standard says that size_t must be a
> typedef for unsigned char, unsigned short, unsigned int, or unsigned
> long (the only unsigned integer types), this is a strictly conforming
> C90 program:

Okay. I do not have access to the C90 standard and I seem to have
misunderstood. I was under the impression that size_t was defined
as "an unsigned integer type" in C90, like it is in C99. So C90
strictly restricts size_t to be an alias for one of unsigned
char/short/int/long.

> #include <stddef.h>
> #include <limits.h>
> #define SIZE_MAX ((size_t) -1)
> int main(void) {
> return SIZE_MAX != UCHAR_MAX
> && SIZE_MAX != USHRT_MAX
> && SIZE_MAX != UINT_MAX
> && SIZE_MAX != ULONG_MAX;
> }
>
> It must return 0. If the implementation makes size_t larger than
> unsigned long, the program returns 1. Returning 1 where the standard
> requires 0 is not allowed by the as-if rule :-)
>
> There are plenty of more convincing correct C90 programs that would be
> broken by such an implementation, but converting size_t to unsigned
> long and expecting no change in value is one such example, and you
> weren't convinced by that. Could you explain why in a bit more detail?

I think my real question was whether C90 requires size_t to be a typedef
for the "fundamental" unsigned integer types that it defines, or
whether it would be conforming for an implementation to define size_t
as /an/ unsigned integer type, but distinct from unsigned
char/short/int/long. But your program above has answered that question.

So I suppose it's impossible to write a fully conforming C90 program
under 64 bit Windows that calls a Windows API function.

Harald van Dijk

Jul 27, 2008, 7:44:36 AM
On Sun, 27 Jul 2008 15:02:40 +0530, santosh wrote:
> Harald van Dijk wrote:
>> On Sun, 27 Jul 2008 12:59:49 +0530, santosh wrote:
>>> James Harris wrote:
>>>
>>> <snip>
>>>
>>>> I'm not sure it's a question of learning the meaning of size_t. The
>>>> problem was in printing it with printf prior to C99.
>>>>
>>>> The recommendation seems to be to cast to unsigned long or similar
>>>> but surely if size_t is wider than unsigned long
>>>
>>> In a C90 conforming implementation size_t cannot be wider than
>>> unsigned long since the latter is the largest integral type specified
>>> by the Standard, and size_t is defined as an unsigned integral type.
>>>
>>> BTW, is it conforming for a C90 implementation to implement size_t as
>>> an unsigned integral type larger than unsigned long? Is there a
>>> specific statement in the Standard that forbids this? Won't the "as
>>> if" rule rescue such an implementation?
>>
>> No, it won't. Since the C90 standard says that size_t must be a typedef
>> for unsigned char, unsigned short, unsigned int, or unsigned long (the
>> only unsigned integer types), this is a strictly conforming C90
>> program:
>
> Okay. I do not have access to the C90 standard and I seem to have
> misunderstood. I was under the impression that size_t was defined as "an
> unsigned integer type" in C90, like it is in C99.

It is. However, ...

> So C90 strictly
> restricts size_t to be an alias for one of unsigned char/short/int/long.

...C90 does not consider any type other than those four an unsigned
integer type. C90's "unsigned integer type" is what C99 calls a "standard
unsigned integer type", and C90 does not recognise what C99 calls
"extended integer types". If the implementation supports some type that
behaves exactly as an integer type would, and is represented exactly the
same way, it still cannot be an integer type according to the C90's
definition.

santosh

Jul 27, 2008, 8:39:03 AM
Harald van Dijk wrote:

Hmm, seems pretty restrictive to me. Glad that that was rectified with
C99. I suppose this same restriction would make ptrdiff_t potentially
unusable with objects greater than LONG_MAX bytes, though of course,
objects of that size aren't guaranteed in the first place? Wouldn't an
unsigned ptrdiff_t have been more suitable than a signed one?

Harald van Dijk

Jul 27, 2008, 11:26:47 AM

Indeed. But there's no guarantee that ptrdiff_t is large enough to hold
the largest difference between two pointers anyway.

> though of course,
> objects of that size aren't guaranteed in the first place?

True, but (assuming size_t is wide enough) there's nothing stopping you
from passing a large value to malloc, and only using an object of that
size if malloc succeeds.

> Wouldn't an
> unsigned ptrdiff_t have been more suitable than a signed one?

(p - 1) - p should evaluate to -1. This is not possible if ptrdiff_t were
unsigned.

pete

Jul 27, 2008, 6:59:55 PM

ptrdiff_t doesn't have that problem with programs that
don't exceed any of the guaranteed minimum environmental limits.

N869
7.18.3 Limits of other integer types
[#2]
-- limits of ptrdiff_t
PTRDIFF_MIN -65535
PTRDIFF_MAX +65535
-- limit of size_t
SIZE_MAX 65535

But, the size of an object doesn't have to exceed LONG_MAX bytes
to have that problem. It only has to exceed 65535 bytes.

N869
6.5.6 Additive operators

[#9] When two pointers are subtracted, both shall point to
elements of the same array object, or one past the last
element of the array object; the result is the difference of
the subscripts of the two array elements. The size of the
result is implementation-defined, and its type (a signed
integer type) is ptrdiff_t defined in the <stddef.h> header.
If the result is not representable in an object of that
type, the behavior is undefined.

> Wouldn't an
> unsigned ptrdiff_t have been more suitable than a signed one?


The POSIX solution is a signed size_t (ssize_t).
http://bytes.com/forum/thread458286.html

--
pete

Chris Thomasson

Jul 27, 2008, 8:12:46 PM
"James Harris" <james.h...@googlemail.com> wrote in message
news:d0f617e1-dedd-4de8...@79g2000hsk.googlegroups.com...
[...]
> Here's a rewrite where I've improved the code slightly by simplifying
> a few bits of it. It now uses offsets rather than lengths and bases
> the increase on the requested offset so eliminating some of the
> checks. It does require the factor to be greater than or equal to 1.
> I'll include the functions and a sample main in one go.
[...]

Three small nit-picks in the test code:


> int main(void) {
> char *buf1 = NULL;

> size_t buf1_limit = 0;
> size_t offset;
>
> for (offset = 0; offset < 1000; offset += 200) {
> fprintf(stderr, "\n---Checking for offset %d\n", offset);
>
> if (offset >= buf1_limit && ebuf_full(&buf1, &buf1_limit, offset))
> {

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If `buf1' is not NULL, and `ebuf_full' returns 1, AFAICT, you're leaking
memory here.


> fprintf(stderr, "-Ebuf overflow %d/%d bytes", buf1_limit,
> offset);
> exit(1);
> }
> buf1[offset] = 'x';
> }
>
> fprintf(stderr, "\n---Trim from %d to %d\n", buf1_limit, offset);
> if (ebuf_trim(&buf1, &buf1_limit, offset)) {
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If `buf1' is not NULL, and `ebuf_trim' returns 1, you're leaking memory here
as well...


> fprintf(stderr, "-Buffer trim to %d failure\n", offset);
> exit(1);
> }
>
> free(buf1);
> return 0;
> }


Also, for printing `size_t', it's probably "better" to use something like:

printf("sizeof(void*) == %lu\n", (unsigned long)sizeof(void*));

[...]

blargg

Jul 28, 2008, 4:11:48 AM
In article <g6hfc4$jre$1...@registered.motzarella.org>, santosh
<santo...@gmail.com> wrote:
> ...

> I think my real question was whether C90 requires size_t to be a typedef
> for the "fundamental" unsigned integer types that it defines, or
> whether it would be conforming for an implementation to define size_t
> as /an/ unsigned integer type, but distinct from unsigned
> char/short/int/long. But your program above has answered that question.
>
> So I suppose it's impossible to write a fully conforming C90 program
> under 64 bit Windows that calls a Windows API function.

Does the Windows API prevent a C90 compiler on 64-bit windows from making
unsigned long 64 bits wide? If long must be 32 bits, could such a compiler
at least make 2^32-1 the maximum object size, so that the 32-bit unsigned
long would be sufficient to hold the size of any object?

CBFalconer

Jul 28, 2008, 3:19:37 PM
blargg wrote:
>
... snip ...

>
> Does the Windows API prevent a C90 compiler on 64-bit windows from
> making unsigned long 64 bits wide? If long must be 32 bits, could
> such a compiler at least make 2^32-1 the maximum object size, so
> that the 32-bit unsigned long would be sufficient to hold the size
> of any object?

No, the minimum size of a long is 32 bits. Emphasize, minimum.

Harald van Dijk

Jul 28, 2008, 4:42:08 PM
On Mon, 28 Jul 2008 15:19:37 -0400, CBFalconer wrote:
> blargg wrote:
>> Does the Windows API prevent a C90 compiler on 64-bit windows from
>> making unsigned long 64 bits wide?
>
> No, the minimum size of a long is 32 bits. Emphasize, minimum.

The C standard allows long to hold more than 32 bits, but that doesn't
mean the Windows API does. It's similar to how you shouldn't expect an
implementation that makes system(NULL) return 0 to conform to POSIX, even
though any POSIX-conforming implementation is also a conforming C
implementation.

santosh

Jul 28, 2008, 10:19:51 PM
blargg wrote:

> In article <g6hfc4$jre$1...@registered.motzarella.org>, santosh
> <santo...@gmail.com> wrote:
>> ...
>> I think my real question was whether C90 requires size_t to be a
>> typedef for the "fundamental" unsigned integer types that it defines,
>> or whether it would be conforming for an implementation to define
>> size_t as /an/ unsigned integer type, but distinct from unsigned
>> char/short/int/long. But your program above has answered that
>> question.
>>
>> So I suppose it's impossible to write a fully conforming C90 program
>> under 64 bit Windows that calls a Windows API function.
>
> Does the Windows API prevent a C90 compiler on 64-bit windows from
> making unsigned long 64 bits wide?

IIRC long is constrained to be 32 bits for code which interfaces with
the Windows API.

> If long must be 32 bits, could such
> a compiler at least make 2^32-1 the maximum object size, so that the
> 32-bit unsigned long would be sufficient to hold the size of any
> object?

It could do so, but then you are giving up one advantage of using a
larger address-space computer.

Honestly, I don't know much about 64 bit Windows, so I expect that I'm
either wrong in what I said above or that there are system specific
workarounds. Please don't take my word for this but ask in a group like
<news:comp.os.ms-windows.programmer.win32> or a suitable group under
the <news:microsoft.public> category.

santosh

Jul 29, 2008, 6:09:42 AM
pete wrote:

So it is. Thanks for that correction.

<snip>

> The POSIX solution is a signed size_t (ssize_t).
> http://bytes.com/forum/thread458286.html

I wonder why ISO C decided not to adopt it? After all, size_t is
specifically meant for counting the bytes of an object, and presumably,
would be suitable for holding pointer offset values too.
