Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

fgets vs gets

147 views
Skip to first unread message

Lily

unread,
May 30, 2003, 11:12:55 AM5/30/03
to
Hello,

I am unclear of how I can use fgets to achieve the same result as using
gets.

I was coding a command line interpreter program, and I used gets. But it
complains the following:

/tmp/ccDLLJkR.o: In function `main':
/tmp/ccDLLJkR.o(.text+0x27): the `gets' function is dangerous and should not
be used.

I just ran the executable and ignore the above message and I could get the
expected and required output.

I was told that using:
fgets(buf, sizeof buf, stdin) would achieve the same result
as gets(buf), but when I try it, it wouldn't.

What/where could I have done a mistake?

Lily


Emmanuel Delahaye

unread,
May 30, 2003, 11:36:59 AM5/30/03
to
In 'comp.lang.c', "Lily" <umwo...@cc.umanitoba.ca> wrote:

> I am unclear of how I can use fgets to achieve the same result as using
> gets.

So you need your light to be shed, because the use of gets() is forbidden in
a real application (with monkey users).



> I was coding a command line interpreter program, and I used gets. But it
> complains the following:
>
> /tmp/ccDLLJkR.o: In function `main':
> /tmp/ccDLLJkR.o(.text+0x27): the `gets' function is dangerous and should
> not be used.

Absolutely.



> I just ran the executable and ignore the above message and I could get
> the expected and required output.

All is fine as long as you use the program in the way it was designed. But if
by chance, a monkey typist types more characters that expected, it produces
an undefined behaviour, because the gets() functions doesn't know the
expected maximum number of characters.

It simply swallows the whole stdin buffer until '\n' and spits the whole
stuff into the buffer which address was passed, regardless the number of
characters it writes. If the buffer is to small, it's the UB. A serious bug,
and also a serious security hole. There is no portable way to prevent it.



> I was told that using:
> fgets(buf, sizeof buf, stdin) would achieve the same result
> as gets(buf), but when I try it, it wouldn't.
>
> What/where could I have done a mistake?

You have forgotten to post the non working code. Also, you should read the
FAQ that explains the dangers of gets() and the correct use of fgets().

--
-ed- emdel at noos.fr
The C-language FAQ: http://www.eskimo.com/~scs/C-faq/top.html
C-library: http://www.dinkumware.com/htm_cl/index.html
FAQ de f.c.l.c : http://www.isty-info.uvsq.fr/~rumeau/fclc/

"Clearly your code does not meet the original spec."
"You are sentenced to 30 lashes with a wet noodle."
-- Jerry Coffin in a.l.c.c++

Hallvard B Furuseth

unread,
May 30, 2003, 12:07:43 PM5/30/03
to
Lily wrote:

> I am unclear of how I can use fgets to achieve the same result as using
> gets.

Declare a big enough buffer, pass the size of that buffer as an argument
to fgets(), remove the trailing newline from the buffer after calling
fgets().

Finally, decide what to do if the input string is too long for the
buffer (which means you'll not receive a trailing newline in the input
buffer). Alternatives:
- Ignore the problem, like gets() did. If someone inputs a too long
line, it will be read as two lines. (While with gets() the program
just misbehaved in some random way.)
- realloc() the buffer to have room for e.g. twice as much data,
and call fgets() again to read the rest of the line - starting where
the previous strings leaves off in the input buffer.
- Ignore the remaining characters on the line:
loop and do getchar() until it returns EOF or '\n'.

--
Hallvard

Eric Sosman

unread,
May 30, 2003, 12:06:55 PM5/30/03
to
Lily wrote:
>
> I was told that using:
> fgets(buf, sizeof buf, stdin) would achieve the same result
> as gets(buf), but when I try it, it wouldn't.

fgets() deposits the trailing '\n' character in
`buf', but gets() discards it. See Questions 12.23
and 7.1 in the comp.lang.c Frequently Asked Questions
(FAQ) list

http://www.eskimo.com/~scs/C-faq/top.html

... and if that doesn't solve your problem, post the
actual code.

--
Eric....@sun.com

CBFalconer

unread,
May 30, 2003, 12:36:18 PM5/30/03
to
Lily wrote:
>
> I am unclear of how I can use fgets to achieve the same result
> as using gets.
>
> I was coding a command line interpreter program, and I used
> gets. But it complains the following:
>
> /tmp/ccDLLJkR.o: In function `main':
> /tmp/ccDLLJkR.o(.text+0x27): the `gets' function is dangerous
> and should not be used.

And it is absolutely right. For a safe and easily used
alternative, try ggets, available at:

<http://cbfalconer.home.att.net/download/ggets.zip>

--
Chuck F (cbfal...@yahoo.com) (cbfal...@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!


Dann Corbit

unread,
May 30, 2003, 3:04:55 PM5/30/03
to
"CBFalconer" <cbfal...@yahoo.com> wrote in message
news:3ED784CA...@yahoo.com...

> Lily wrote:
> >
> > I am unclear of how I can use fgets to achieve the same result
> > as using gets.
> >
> > I was coding a command line interpreter program, and I used
> > gets. But it complains the following:
> >
> > /tmp/ccDLLJkR.o: In function `main':
> > /tmp/ccDLLJkR.o(.text+0x27): the `gets' function is dangerous
> > and should not be used.
>
> And it is absolutely right. For a safe and easily used
> alternative, try ggets, available at:
>
> <http://cbfalconer.home.att.net/download/ggets.zip>

Another alternative:

http://home.att.net/~jackklein/ctips01.html#safe_gets

--
C-FAQ: http://www.eskimo.com/~scs/C-faq/top.html
"The C-FAQ Book" ISBN 0-201-84519-9
C.A.P. FAQ: ftp://cap.connx.com/pub/Chess%20Analysis%20Project%20FAQ.htm

bd

unread,
May 30, 2003, 4:13:29 PM5/30/03
to
On Fri, 30 May 2003 10:12:55 +0000, Lily wrote:

> Hello,
>
> I am unclear of how I can use fgets to achieve the same result as using
> gets.
>
> I was coding a command line interpreter program, and I used gets. But it
> complains the following:
>
> /tmp/ccDLLJkR.o: In function `main':
> /tmp/ccDLLJkR.o(.text+0x27): the `gets' function is dangerous and should not
> be used.

Indeed - you can overrun buffers easily with gets.

> I just ran the executable and ignore the above message and I could get the
> expected and required output.
>
> I was told that using:
> fgets(buf, sizeof buf, stdin) would achieve the same result
> as gets(buf), but when I try it, it wouldn't.
>
> What/where could I have done a mistake?

Try stripping the trailing newline:
if(strlen(buf) && buf[strlen(buf)-1] == '\n')
buf[strlen(buf)-1] == '\0';

Simon Biber

unread,
May 30, 2003, 4:31:27 PM5/30/03
to
"bd" <bdo...@bd-home-comp.no-ip.org> wrote:
> Try stripping the trailing newline:
> if(strlen(buf) && buf[strlen(buf)-1] == '\n')
> buf[strlen(buf)-1] == '\0';

Ick - three calls to strlen when one would suffice.
Gratuitous inefficiency. I don't think it's premature
optimisation to use:
char *p = strchr(buf, '\n');
if(p != NULL) *p = '\0';

--
Simon.


Emmanuel Delahaye

unread,
May 30, 2003, 5:19:22 PM5/30/03
to
In 'comp.lang.c', "bd" <bdo...@bd-home-comp.no-ip.org> wrote:

> Try stripping the trailing newline:
> if(strlen(buf) && buf[strlen(buf)-1] == '\n')
> buf[strlen(buf)-1] == '\0';

I'm very sorry to have to say it, but this is probably the ugliest way I
ever seen of writing:

char *p = strchr (buf, '\n');
if (p)
{
*p = 0;

bd

unread,
May 30, 2003, 5:21:20 PM5/30/03
to

True, but mine reduces the modification to other parts of the code. Also,
it only gets trailing \ns, but come to think of it that's not strictly
required. You could do:
#define CHOMP(s) \
{ size_t s##_temp = strlen(s); \
if(s##_temp && s[s##_temp - 1] == '\n') \
s[s##_temp - 1] = '\0'; \
}

Eric Sosman

unread,
May 30, 2003, 5:29:58 PM5/30/03
to
Emmanuel Delahaye wrote:
>
> In 'comp.lang.c', "bd" <bdo...@bd-home-comp.no-ip.org> wrote:
>
> > Try stripping the trailing newline:
> > if(strlen(buf) && buf[strlen(buf)-1] == '\n')
> > buf[strlen(buf)-1] == '\0';
>
> I'm very sorry to have to say it, but this is probably the ugliest way I
> ever seen of writing:
>
> char *p = strchr (buf, '\n');
> if (p)
> {
> *p = 0;
> }

An even more compact possibility is

buf[strcspn(buf,"\n")] = '\0';

... but I think this is a case where brevity is the
soul of something other than wit.

--
Eric....@sun.com

Simon Biber

unread,
May 30, 2003, 7:14:55 PM5/30/03
to
"bd" <bdo...@bd-home-comp.no-ip.org> wrote:
> On Sat, 31 May 2003 06:31:27 +1000, Simon Biber wrote:
> > "bd" <bdo...@bd-home-comp.no-ip.org> wrote:
> >> Try stripping the trailing newline:
> >> if(strlen(buf) && buf[strlen(buf)-1] == '\n')
> >> buf[strlen(buf)-1] == '\0';
> >
> > Ick - three calls to strlen when one would suffice.
> > Gratuitous inefficiency. I don't think it's premature
> > optimisation to use:
> > char *p = strchr(buf, '\n');
> > if(p != NULL) *p = '\0';
>
> True, but mine reduces the modification to other parts of the code.

What modification? The variable declaration? In C you can declare variables
anywhere -- and if you're still catering for obsolete C compilers, then just
add braces around the two lines. This little pointer is likely to be
optimised into a register, whereas your version cannot generally be
optimised to the same degree, because each function call could have side
effects.

> Also, it only gets trailing \ns, but come to think
> of it that's not strictly required.

Indeed, the string constructed by fgets will only ever contain zero or one
newline character, never more. (If that weren't the case, there is strrchr -
but since that must first find the end of the string and then work back, it
must be slower.)

> You could do:
> #define CHOMP(s) \
> { size_t s##_temp = strlen(s); \
> if(s##_temp && s[s##_temp - 1] == '\n') \
> s[s##_temp - 1] = '\0'; \
> }

The strchr would work equally well in such a macro and IMHO is more elegant
because it has only one test in the if statement.

--
Simon.


CBFalconer

unread,
May 30, 2003, 10:34:24 PM5/30/03
to
Dann Corbit wrote:
> "CBFalconer" <cbfal...@yahoo.com> wrote in message
> > Lily wrote:
> > >
> > > I am unclear of how I can use fgets to achieve the same result
> > > as using gets.
> > >
> > > I was coding a command line interpreter program, and I used
> > > gets. But it complains the following:
> > >
> > > /tmp/ccDLLJkR.o: In function `main':
> > > /tmp/ccDLLJkR.o(.text+0x27): the `gets' function is dangerous
> > > and should not be used.
> >
> > And it is absolutely right. For a safe and easily used
> > alternative, try ggets, available at:
> >
> > <http://cbfalconer.home.att.net/download/ggets.zip>
>
> Another alternative:
>
> http://home.att.net/~jackklein/ctips01.html#safe_gets

I started my ggets development with 'what attracts newbies to
gets', and the answer was 'simplicity'. No worries about \n
detection. No worries about actual length (HAH). Just go and get
that string. So I tried to match those qualities in ggets.

1. It always ends at the next \n. The \n is always removed.
2. It can also be used on non stdin files. ggets<->fggets
3. However, unlike gets<->fgets, \n is always removed,
4. I would have liked to return a pointer, but that runs into
the need for multiple error possibilities (EOF, file error,
memory shortage). So I returned 0 for OK, EOF, or positive.
5. No argument initialization is ever needed. OK, the file
for fggets needs to be open.

What can go wrong? AFAICT the only possibility is that the user
fails to eventually free the returned string and has a memory
leak. The same applies to strdup, when implemented. At any rate
the result is a program failure, not a potential system failure.
Another possibility is that the caller omits the & in front of the
char* pointer passed. Shades of scanf. Only two things to
remember, and the compiler should flag the missing &.

Malcolm

unread,
May 31, 2003, 6:03:16 AM5/31/03
to

"Simon Biber" <sbi...@optushome.com.au> wrote in message

>
> Ick - three calls to strlen when one would suffice.
> Gratuitous inefficiency. I don't think it's premature
> optimisation to use:
> char *p = strchr(buf, '\n');
> if(p != NULL) *p = '\0';
>
Also the newline is there for a reason - just stripping it out is likely to
replace ub on buffer overflow with a defined-behaviour bug as a partial
input is treated as whole, and then the next input function also gets an
invalid line from stdin.
Why is a defined-behaviour bug generally worse than ub? Ub is undefined by
the C standard, but it can be defined by the platform to do something
sensible - for instance detecting illegal memory access and terminating the
program with an error message. Functionally, a crash is no different to the
computer blowing a fuse.
However if we have defined, incorrect behaviour, the platform is forbidden
from detecting it. The program is forced to give the incorrect output,
whether that be a gas bill for $200 for Mrs Bloggs when really she owes $100
(much more annoying than a bill for $234,000,000,000.02), or too big a dose
of radiation in a computerised piece of medical equipment.


Paul Hsieh

unread,
May 31, 2003, 7:07:22 AM5/31/03
to
In article <3ED802C4...@yahoo.com>, cbfal...@yahoo.com says...

> Dann Corbit wrote:
> > "CBFalconer" <cbfal...@yahoo.com> wrote in message
> > > Lily wrote:
> > > > I am unclear of how I can use fgets to achieve the same result
> > > > as using gets.
> > > >
> > > > I was coding a command line interpreter program, and I used
> > > > gets. But it complains the following:
> > > >
> > > > /tmp/ccDLLJkR.o: In function `main':
> > > > /tmp/ccDLLJkR.o(.text+0x27): the `gets' function is dangerous
> > > > and should not be used.
> > >
> > > And it is absolutely right. For a safe and easily used
> > > alternative, try ggets, available at:
> > >
> > > <http://cbfalconer.home.att.net/download/ggets.zip>
> >
> > Another alternative:
> >
> > http://home.att.net/~jackklein/ctips01.html#safe_gets

Presizing just propogates the memory management problem back to the programmer.
Setting a maximum input length just turns a programming problem into an end-
user problem.



> I started my ggets development with 'what attracts newbies to
> gets', and the answer was 'simplicity'. No worries about \n
> detection. No worries about actual length (HAH). Just go and get
> that string. So I tried to match those qualities in ggets.
>
> 1. It always ends at the next \n. The \n is always removed.
> 2. It can also be used on non stdin files. ggets<->fggets
> 3. However, unlike gets<->fgets, \n is always removed,

fgets does not remove the '\n'. Furthermore fgets ignores any encountered '\0'
characters, and only terminates on one of '\n', EOF, or the length limit.

> 4. I would have liked to return a pointer, but that runs into
> the need for multiple error possibilities (EOF, file error,
> memory shortage). So I returned 0 for OK, EOF, or positive.
> 5. No argument initialization is ever needed. OK, the file
> for fggets needs to be open.
>
> What can go wrong? AFAICT the only possibility is that the user
> fails to eventually free the returned string and has a memory
> leak.

You call realloc roughly (strlen(output) / 128) times. Some weaker C library
implementations perform this very badly and will artifically run out of memory
far before all the heap memory is really consumed. You should use increasing
powers of 2, and if you are that worried, realloc at the end to shrink it to
the minimum required memory. This will guarantee that at most
log_2(strlen(output)) calls to realloc are made.

See breadln, or bgets in the "Better String Library" here:

http://bstring.sourceforge.net/

> [...] The same applies to strdup, when implemented.

Well strdup is kind of trivial -- I don't think its one of the "severe problem"
functions in the C library. In any event, these are only two functions, and
neither perform in-place operations. The other C library string functions are
not so easily replaced by analogous methods.

--
Paul Hsieh
http://bstring.sourceforge.net/
http://www.pobox.com/~qed/

pete

unread,
May 31, 2003, 7:21:25 AM5/31/03
to
Paul Hsieh wrote:

> Well strdup is kind of trivial --
> I don't think its one of the "severe problem"
> functions in the C library.

It's not even one of the functions in the standard C library.

--
pete

Simon Biber

unread,
May 31, 2003, 7:39:06 AM5/31/03
to
"Malcolm" <mal...@55bank.freeserve.co.uk> wrote:

> "Simon Biber" <sbi...@optushome.com.au> wrote:
> > Ick - three calls to strlen when one would suffice.
> > Gratuitous inefficiency. I don't think it's premature
> > optimisation to use:
> > char *p = strchr(buf, '\n');
> > if(p != NULL) *p = '\0';
>
> Also the newline is there for a reason - just stripping it
> out is likely to replace ub on buffer overflow with a
> defined-behaviour bug as a partial input is treated as
> whole, and then the next input function also gets an
> invalid line from stdin.

It's easy enough to add


char *p = strchr(buf, '\n');

if(p != NULL) *p = '\0'; else exit(EXIT_FAILURE);

I think that's heaps better than UB.

--
Simon.


Hallvard B Furuseth

unread,
May 31, 2003, 7:42:42 AM5/31/03
to
Simon Biber wrote:

> It's easy enough to add
> char *p = strchr(buf, '\n');
> if(p != NULL) *p = '\0'; else exit(EXIT_FAILURE);

That will fail if the last line in the file does not end with newline.

--
Hallvard

Simon Biber

unread,
May 31, 2003, 7:56:24 AM5/31/03
to

True. It assumes you have a properly formatted text
file, with a newline after every line. The C standard
leaves it implementation-defined whether a text stream
requires a terminating newline character or not.

Perhaps it would work if you changed it to:
while(fgets(buf, sizeof buf, stdin)) {


char *p = strchr(buf, '\n');
if(p != NULL) {
*p = '\0';

} else {
if(!feof(stdin)) {
fputs("Input line too long\n", stderr);
exit(EXIT_FAILURE);
}
}
...
}

Of course, if it is possible to read a line in stages
and process it that way, it is far preferable to the
abort. Or if you need to handle arbitrary-length lines,
perhaps something like CBFalconer's ggets is in order
-- it will read any length line into a realloced buffer.

--
Simon.


CBFalconer

unread,
May 31, 2003, 9:26:09 AM5/31/03
to
Paul Hsieh wrote:
> cbfal...@yahoo.com says...
> > Dann Corbit wrote:
> > > "CBFalconer" <cbfal...@yahoo.com> wrote in message
> > > > Lily wrote:
>
> > > > > I am unclear of how I can use fgets to achieve the same
> > > > > result as using gets.
> > > > >
> > > > > I was coding a command line interpreter program, and I
> > > > > used gets. But it complains the following:
> > > > >
> > > > > /tmp/ccDLLJkR.o: In function `main':
> > > > > /tmp/ccDLLJkR.o(.text+0x27): the `gets' function is
> > > > > dangerous and should not be used.
> > > >
> > > > And it is absolutely right. For a safe and easily used
> > > > alternative, try ggets, available at:
> > > >
> > > > <http://cbfalconer.home.att.net/download/ggets.zip>
> > >
> > > Another alternative:
> > >
> > > http://home.att.net/~jackklein/ctips01.html#safe_gets
>
> Presizing just propogates the memory management problem back to
> the programmer. Setting a maximum input length just turns a
> programming problem into an end-user problem.

>
> > I started my ggets development with 'what attracts newbies to
> > gets', and the answer was 'simplicity'. No worries about \n
> > detection. No worries about actual length (HAH). Just go and
> > get that string. So I tried to match those qualities in ggets.
> >
> > 1. It always ends at the next \n. The \n is always removed.
> > 2. It can also be used on non stdin files. ggets<->fggets
> > 3. However, unlike gets<->fgets, \n is always removed,
>
> fgets does not remove the '\n'. Furthermore fgets ignores any
> encountered '\0' characters, and only terminates on one of '\n',
> EOF, or the length limit.

As I pointed out in 3. Maybe I was unclear.

>
> > 4. I would have liked to return a pointer, but that runs into
> > the need for multiple error possibilities (EOF, file error,
> > memory shortage). So I returned 0 for OK, EOF, or positive.
> > 5. No argument initialization is ever needed. OK, the file
> > for fggets needs to be open.
> >
> > What can go wrong? AFAICT the only possibility is that the user
> > fails to eventually free the returned string and has a memory
> > leak.
>
> You call realloc roughly (strlen(output) / 128) times. Some
> weaker C library implementations perform this very badly and will
> artifically run out of memory far before all the heap memory is
> really consumed. You should use increasing powers of 2, and if
> you are that worried, realloc at the end to shrink it to the
> minimum required memory. This will guarantee that at most
> log_2(strlen(output)) calls to realloc are made.

This is perfectly possible, without changing the interface one
iota. I deliberately chose otherwise because, at least for me,
the normal use is with interactive input or text files, where such
long lines are possible but highly unlikely. I made allowances
for the internal actions of at least one malloc package (mine).
If anything this is (to me) an argument for inclusion in the
standard library package, where the internal action can be suited
to the actual malloc available.

>
> See breadln, or bgets in the "Better String Library" here:
>
> http://bstring.sourceforge.net/

I am changing the subject to better reflect my point.

Malcolm

unread,
May 31, 2003, 11:16:40 AM5/31/03
to

"Simon Biber" <sbi...@optushome.com.au> wrote in message
>
> Or if you need to handle arbitrary-length lines,
> perhaps something like CBFalconer's ggets is in order
> -- it will read any length line into a realloced buffer.
>
That's the real answer. The problem is that, even on a modern PC with 4GB of
swap space, you can't guarantee that your input won't be the one to run the
computer out of memory, so you still have to handle the error case.


karl malbrain

unread,
May 31, 2003, 1:15:04 PM5/31/03
to
That's VIRTUAL MEMORY, not "swap space". We have a ROLLOUT command for
that.
karl m, ACM SIGOPS.

"Malcolm" <mal...@55bank.freeserve.co.uk> wrote in message
news:bbagra$g4j$1...@newsg1.svr.pol.co.uk...

Paul Hsieh

unread,
May 31, 2003, 5:33:03 PM5/31/03
to
In article <3ED8A217...@yahoo.com>, cbfal...@yahoo.com says...

> Paul Hsieh wrote:
> > You call realloc roughly (strlen(output) / 128) times. Some
> > weaker C library implementations perform this very badly and will
> > artifically run out of memory far before all the heap memory is
> > really consumed. You should use increasing powers of 2, and if
> > you are that worried, realloc at the end to shrink it to the
> > minimum required memory. This will guarantee that at most
> > log_2(strlen(output)) calls to realloc are made.
>
> This is perfectly possible, without changing the interface one
> iota. I deliberately chose otherwise because, at least for me,
> the normal use is with interactive input or text files, where such
> long lines are possible but highly unlikely.

Yeah, 640K ought to be enough for anyone. There is no significant disadvantage
for using doubling sizes. An additional issue is that a realloc can cost an
additional memcpy (if the adjacent tail memory is not available.) Since memcpy
is basically O(n), your copy performance can be as bad as:

O (strlen(output) ^ 2 / (2*128))

while the power of 2 solution can only be as bad as:

O (strlen(output) * 2)

For an input that is 50K, for example, my solution would be roughly 100 times
faster in the worst case scenarios of realloc copying.

On the flip side, if your inputs are really small enough, then you are being
wasteful of memory for the average case, meaning that you have to worry about
how many such inputs you accumulate at once.

Using powers of 2 is a no-lose scenario, and you should probably consider it
more seriously. Its not any harder to program. In your case, its just
initialize some counter d, double it with d += d, then use (d - 16) for your
actual length so that it fits nicely with your reallocation scheme.

CBFalconer

unread,
Jun 1, 2003, 12:00:28 AM6/1/03
to
Paul Hsieh wrote:
> In article <3ED8A217...@yahoo.com>, cbfal...@yahoo.com says...
> > Paul Hsieh wrote:
>
> > > You call realloc roughly (strlen(output) / 128) times. Some
> > > weaker C library implementations perform this very badly and will
> > > artifically run out of memory far before all the heap memory is
> > > really consumed. You should use increasing powers of 2, and if
> > > you are that worried, realloc at the end to shrink it to the
> > > minimum required memory. This will guarantee that at most
> > > log_2(strlen(output)) calls to realloc are made.
> >
> > This is perfectly possible, without changing the interface one
> > iota. I deliberately chose otherwise because, at least for me,
> > the normal use is with interactive input or text files, where such
> > long lines are possible but highly unlikely.
>
> Yeah, 640K ought to be enough for anyone. There is no significant
> disadvantage for using doubling sizes. An additional issue is that
> a realloc can cost an additional memcpy (if the adjacent tail memory
> is not available.) Since memcpy is basically O(n), your copy
> performance can be as bad as:
>
... snip ...

>
> Using powers of 2 is a no-lose scenario, and you should probably
> consider it more seriously. Its not any harder to program.
... snip ...

Of course it is no harder to program. If you look at my hashlib
system I do just that.

However there is a quite likely scenario in which the doubling (or
any other ratio) system can be extremely wasteful. It applies
whenever the expanded memory error is non-contiguous with the old,
and the doubling system is the only thing making demands on the
the malloc system. There are also other awkward cases.

The previous allocation has to exist concurrently with the new.
The sum of all the previous releases is of sizes 1, 2, 4, 8, ....
N, which sum up to less than 2N. Thus this space cannot be used
for the new allocation, and must be dumped into the free space
pool. Nothing in the system (the doubler is the only thing making
demands) will be able to use this space. Thus the effective
memory use both expands the actual allocation, and the free space,
by roughly a factor of 2. Things will break much faster.

A sufficiently smart realloc system could work around this by and
large. It would not be a large change to my nmalloc system to
handle it, but the non-contiguous areas will always break things
sooner or later. And most users won't be using my malloc package.

As I said earlier, I designed ggets for the most likely
applications, and a multi-megabyte single text line is not one I
anticipate arising often. strlen efficiency also falls off
rapidly with long lines, yet it seems adequate for the majority of
uses. If you have an application where the ggets realloc actually
is a bind, as shown by measurements, then there is a good reason
to modify it FOR THAT APPLICATION. Meanwhile my every instinct
says my approach is better.

Richard Heathfield

unread,
Jun 1, 2003, 4:15:09 AM6/1/03
to
CBFalconer wrote:

<snip>

> As I said earlier, I designed ggets for the most likely
> applications, and a multi-megabyte single text line is not one I
> anticipate arising often. strlen efficiency also falls off
> rapidly with long lines, yet it seems adequate for the majority of
> uses.

Right, but for really long lines, you'd keep track, right?

> If you have an application where the ggets realloc actually
> is a bind, as shown by measurements, then there is a good reason
> to modify it FOR THAT APPLICATION.

Absolutely. So far, I agree entirely.

> Meanwhile my every instinct
> says my approach is better.

Here, however, I must disagree. I think your way is simpler on the face of
it, but harder to house-keep properly. As you know, I have written a
similar function, although it is modelled more on fgets than on gets, which
makes the housekeeping considerably simpler, at the expense of putting a
couple of odd-looking &s into the args list. Now, I can understand why you
might like ggets more than fgetline -- after all, you wrote ggets -- but I
am not yet convinced that your approach - a new buffer every time, with no
way (IIRC) for the user to limit the size of that buffer - is /better/ than
giving the choices of supplying one's own reallocable buffer pointer and an
upper limit beyond which the user doesn't want the buffer to grow (as a
security measure).

--
Richard Heathfield : bin...@eton.powernet.co.uk
"Usenet is a strange place." - Dennis M Ritchie, 29 July 1999.
C FAQ: http://www.eskimo.com/~scs/C-faq/top.html
K&R answers, C books, etc: http://users.powernet.co.uk/eton

John

unread,
Jun 1, 2003, 5:49:25 AM6/1/03
to

"Richard Heathfield" <dont...@address.co.uk.invalid> wrote in message
news:bbccmc$567$1...@titan.btinternet.com...

<snip>

> might like ggets more than fgetline -- after all, you wrote ggets -- but I
> am not yet convinced that your approach - a new buffer every time, with no
> way (IIRC) for the user to limit the size of that buffer - is /better/
than
> giving the choices of supplying one's own reallocable buffer pointer and
an
> upper limit beyond which the user doesn't want the buffer to grow (as a
> security measure).

Much of what you say makes sense but why an upper limit? I have
one that returns a error if realloc fails, isn't that enough?

John


Richard Heathfield

unread,
Jun 1, 2003, 8:15:40 AM6/1/03
to
John wrote:

>
> "Richard Heathfield" <dont...@address.co.uk.invalid> wrote in message
> news:bbccmc$567$1...@titan.btinternet.com...
>
> <snip>
>
>> might like ggets more than fgetline -- after all, you wrote ggets -- but
>> I am not yet convinced that your approach - a new buffer every time, with
>> no way (IIRC) for the user to limit the size of that buffer - is /better/
> than
>> giving the choices of supplying one's own reallocable buffer pointer and
> an
>> upper limit beyond which the user doesn't want the buffer to grow (as a
>> security measure).
>
> Much of what you say makes sense but why an upper limit?

It gives the programmer (optional) control over the size of input line that
the program will accept. If I don't care, I will pass (size_t)-1 for that
param. Typically, I choose 32768 as a default. In circumstances where it
mattered, I would provide an override option.

> I have
> one that returns a error if realloc fails, isn't that enough?

Well, it's certainly necessary. Whether it is sufficient is, I suppose, a
matter of taste. It's clearly not a cut-and-dried issue that everyone here
agrees on, or Chuck would have addressed it in ggets(), wouldn't he? :-)

CBFalconer

unread,
Jun 1, 2003, 9:04:02 AM6/1/03
to

If ggets fails for memory shortage, it will return a partial line
and a positive error indicator. The user is perfectly free to
free that line (or other things) and carry on.

BTW, in a separate message I justified the linear, rather than
exponential, growth of the buffer FOR THIS APPLICATION.

At the same time it is not a case of use ggets or nothing - your
routine (forget the name offhand) is still available, when those
characteristics are required. One can get input with various
primitives, including getc, fgets, scanf, and even fread if
desired. They are all tools - reach for the appropriate one.
(For the unwary reader, don't reach for gets.)

James Antill

unread,
Jun 1, 2003, 8:09:22 PM6/1/03
to
On Sat, 31 May 2003 21:33:03 +0000, Paul Hsieh wrote:

> In article <3ED8A217...@yahoo.com>, cbfal...@yahoo.com says...
>> Paul Hsieh wrote:
>> > You call realloc roughly (strlen(output) / 128) times. Some
>> > weaker C library implementations perform this very badly and will
>> > artifically run out of memory far before all the heap memory is
>> > really consumed. You should use increasing powers of 2, and if
>> > you are that worried, realloc at the end to shrink it to the
>> > minimum required memory. This will guarantee that at most
>> > log_2(strlen(output)) calls to realloc are made.
>>
>> This is perfectly possible, without changing the interface one
>> iota. I deliberately chose otherwise because, at least for me,
>> the normal use is with interactive input or text files, where such
>> long lines are possible but highly unlikely.
>
> Yeah, 640K ought to be enough for anyone. There is no significant disadvantage
> for using doubling sizes. An additional issue is that a realloc can cost an
> additional memcpy (if the adjacent tail memory is not available.) Since memcpy
> is basically O(n), your copy performance can be as bad as:
>
> O (strlen(output) ^ 2 / (2*128))
>
> while the power of 2 solution can only be as bad as:
>
> O (strlen(output) * 2)

Err, no. If the input is 17 bytes long, and you start at 1, worst case
is...

1 (input, len=1 sz=1)
1 (realloc copy, len=1 sz=2)
1 (input, len=2 sz=2)
2 (realloc copy, len=2 sz=4)
2 (input, len=4 sz=4)
4 (realloc copy, len=4 sz=8)
4 (input, len=8, sz=8)
8 (realloc copy, len=8, sz=16)
8 (input, len=16 sz=16)
16 (realloc copy, len=16, sz=32)
1 (input, len=17 sz=32)

And at the end your length is 17 and you've done 48 copies
(48 being > 17 * 2) -- and this picture only gets _worse_ with bigger
numbers. Note that at the end of it you are wasting 46% of your space
(and the waste percentage _stays constant_ as the numbers increase).
That's also discounting the per call overhead of realloc().

Really the only sane way to do this is to have seperate chunks of memory,
copies are always O(n) and waste is always (chunk size / 2).
If you did it properly you could even share the chunks over multiple
connections.

Someone really needs to write something like that... (see sig. :)

--
James Antill -- ja...@and.org
Need an efficent and powerful string library for C?
http://www.and.org/vstr/

CBFalconer

unread,
Jun 1, 2003, 9:09:17 PM6/1/03
to
James Antill wrote:
>
... snip ...

>
> Err, no. If the input is 17 bytes long, and you start at 1,
> worst case is...
>
> 1 (input, len=1 sz=1)
> 1 (realloc copy, len=1 sz=2)
> 1 (input, len=2 sz=2)
> 2 (realloc copy, len=2 sz=4)
> 2 (input, len=4 sz=4)
> 4 (realloc copy, len=4 sz=8)
> 4 (input, len=8, sz=8)
> 8 (realloc copy, len=8, sz=16)
> 8 (input, len=16 sz=16)
> 16 (realloc copy, len=16, sz=32)
> 1 (input, len=17 sz=32)
>
> And at the end your length is 17 and you've done 48 copies
> (48 being > 17 * 2) -- and this picture only gets _worse_ with
> bigger numbers. Note that at the end of it you are wasting 46%
> of your space (and the waste percentage _stays constant_ as the
> numbers increase). That's also discounting the per call
> overhead of realloc().
>
> Really the only sane way to do this is to have seperate chunks
> of memory, copies are always O(n) and waste is always (chunk
> size / 2). If you did it properly you could even share the
> chunks over multiple connections.

I suggest you read the code in question first. It doesn't work
like that. If it did I would agree that it was foolish.

James Antill

unread,
Jun 1, 2003, 10:56:15 PM6/1/03
to
On Mon, 02 Jun 2003 01:09:17 +0000, CBFalconer wrote:

> I suggest you read the code in question first. It doesn't work
> like that. If it did I would agree that it was foolish.

No, you're code doesn't, but I replied to Paul, who is using power of 2
increments on expanding the buffer (with a minimum of 8, but that
doesn't really matter).
Ggets uses an initial size of 112 and increments of 128.

So for 241 bytes of input, worst case you do...

112 + (n * 128)

112 (input, len=112 sz=112)
112 (realloc copy, len=112 sz=240)
128 (input, len=240 sz=240)
240 (realloc copy, len=240 sz=368)
1 (input, len=241 sz=368)

128
368
128
496
1

...which copies 593 bytes (246% of the work) of data and wastes 35% of
space. Your waste percentage goes down, just like Vstr, however with
just 497 bytes of input ggets copies, worst case, 1713 bytes which is
344% of the work (but only wastes 21% of space).

CBFalconer

unread,
Jun 2, 2003, 6:29:52 AM6/2/03
to
James Antill wrote:
> On Mon, 02 Jun 2003 01:09:17 +0000, CBFalconer wrote:
>
> > I suggest you read the code in question first. It doesn't work
> > like that. If it did I would agree that it was foolish.
>
> No, you're code doesn't, but I replied to Paul, who is using
> power of 2 increments on expanding the buffer (with a minimum of
> 8, but that doesn't really matter).
>
> Ggets uses an initial size of 112 and increments of 128.

Now I KNOW you have read the source :-)

>
> So for 241 bytes of input, worst case you do...
>
> 112 + (n * 128)
>
> 112 (input, len=112 sz=112)
> 112 (realloc copy, len=112 sz=240)
> 128 (input, len=240 sz=240)
> 240 (realloc copy, len=240 sz=368)
> 1 (input, len=241 sz=368)
>
> 128
> 368
> 128
> 496
> 1
>
> ...which copies 593 bytes (246% of the work) of data and wastes
> 35% of space. Your waste percentage goes down, just like Vstr,
> however with just 497 bytes of input ggets copies, worst case,
> 1713 bytes which is 344% of the work (but only wastes 21% of

> space.

This is only true if realloc copies everything each time. This
depends on many things. The copy is about equal in cost to a
strlen call. For a suitable malloc/free/realloc package the
expected number of actual copies may well be zero. For an example
see my nmalloc package for DJGPP.

Also see my earlier posting on the trap of the doubling
mechanism. At any rate, all this discussion is about the internal
implementation, which can be freely altered without affecting the
ggets interface. It was designed that way.

So once your profile runs show that the bind is in ggets, feel
free to replace the implementation. In many cases a suitable
change would be one #define in the ggets source.

Paul Hsieh

unread,
Jun 2, 2003, 8:23:18 AM6/2/03
to
"James Antill" <james-...@and.org> wrote:
> On Sat, 31 May 2003 21:33:03 +0000, Paul Hsieh wrote:
> > Since memcpy is basically O(n), your copy performance can be
> > as bad as:
> >
> > O (strlen(output) ^ 2 / (2*128))
> >
> > while the power of 2 solution can only be as bad as:
> >
> > O (strlen(output) * 2)
>
> Err, no. If the input is 17 bytes long, and you start at 1, worst case
> is...

I don't understand how to read the strange table you posted here. In
any event, its clearly wrong. The sequences of memory allocations for
a 17 character input (to bgets) is as follows:

malloc(8); /* from a cstr2bstr ("") */
realloc(16); /* from a balloc (9) */
realloc(32); /* from a balloc (17) */

The maximum amount of extra copying is 8 + 16 for a total of 24
characters. Note that 24 < 2 * 17. It does not get *worse* for
bigger numbers -- the property of total copying and/or memory
allocations < strlen(output) * 2 is maintained for any sized input.

Of course, unlike other solutions, the bstring library is short,
readable, easy to compile and test, and is known to work on multiple
platforms (hint, hint). Playing around with it takes very little
effort. Besides being documented as having this property, the above
observation is trivial to duplicate (add some printfs in cstr2bstr,
and balloc, and write an dozen line trivial main.)

CBFalconer's solution has the problem that for a 17 byte inputs, 95
bytes additional bytes are allocated, making it 6.5 times more than
necessary. While his memory efficiency improves as the input size
increases, his copying becomes much much worse.

In concrete terms, for inputs of size > 512, ggets will perform more
bytes copied then bgets, and for inputs of size < 64, ggets will
allocate more memory than bgets. For inputs of size > 1024, ggets
will perform *more* allocations than bgets. His best argument is that
asymptotically, the amount of memory allocated is 50% tighter for
ggets over bgets. But asymptotically his algorithm cannot even
complete in a reasonable amount of time. So in comparison to bgets,
ggets only looks favorable for inputs in the range of 64 to 512
characters, and even then bgets is never worse than twice as bad
(except in number of reallocations, but that is never more than 4
extra.)

Personally, I don't think CBFalconer's intuition is justified in light
of this.

Paul Hsieh

unread,
Jun 2, 2003, 8:58:57 AM6/2/03
to
CBFalconer <cbfal...@yahoo.com> wrote:

> Paul Hsieh wrote:
> > Using powers of 2 is a no-lose scenario, and you should probably
> > consider it more seriously. Its not any harder to program.
> ... snip ...
>
> Of course it is no harder to program. If you look at my hashlib
> system I do just that.
>
> However there is a quite likely scenario in which the doubling (or
> any other ratio) system can be extremely wasteful. It applies
> whenever the expanded memory error is non-contiguous with the old,
> and the doubling system is the only thing making demands on the
> the malloc system. There are also other awkward cases.

You neglected to realize the by using powers of 2, there are far
*fewer* actual occurrences of realloc. While each of the reallocs in
ggets has a higher chance of not requiring a memcpy than each that is
in bgets, yours are far more numerous, and thus you will be throwing
dice more often, even though they are somewhat loaded. In the not so
atypical case of the allocation being adjacent to nothing but empty
space (think about it, its a succession of reallocs after a malloc)
the bgets solution just ends up working better.



> The previous allocation has to exist concurrently with the new.
> The sum of all the previous releases is of sizes 1, 2, 4, 8, ....
> N, which sum up to less than 2N. Thus this space cannot be used
> for the new allocation, and must be dumped into the free space
> pool. Nothing in the system (the doubler is the only thing making
> demands) will be able to use this space. Thus the effective
> memory use both expands the actual allocation, and the free space,
> by roughly a factor of 2. Things will break much faster.

Things will break at most twice as fast as any other solution. The
lesson of the last 30 years of computing is that this is about 18
months. You also neglected to point out that in situtations where
this doubled amount of memory usage in bgets is a problem corresponds
to situtations where ggets has serious performance problems.



> A sufficiently smart realloc system could work around this by and
> large. It would not be a large change to my nmalloc system to
> handle it, but the non-contiguous areas will always break things
> sooner or later. And most users won't be using my malloc package.

Me thinks you don't understand all aspects of the issue. The
contiguousness/fragmentation is a problem that bothers all good
allocation schemes with roughly an equal occurrence of degenerate
cases. But the other issue is the sheer *number* of reallocations.
Some poor heap algorithms accumulate strange internal structures that
just get slower and slower as more and more reallocations are made in
the system (two of the compilers on my system have this problem.)

Trust me, your solution will *shred* the heaps on many systems if you
have large amounts of input, whereas mine seems to survive by sheer
virtue of minimizing the number of times it calls realloc on difficult
to deal with sizes.



> As I said earlier, I designed ggets for the most likely
> applications, and a multi-megabyte single text line is not one I
> anticipate arising often.

So you feel ok with publicizing this non-scalable solution as an
answer to other people's problems, for which you cannot possibly know
the size of their input? As I responded to James Antill, your
solution has the additional problem of allocating far too much memory
for many many very small inputs. Your solution only works really well
for a very specific range of input sizes that isn't very good for
either large or small inputs.

> [...] strlen efficiency also falls off


> rapidly with long lines, yet it seems adequate for the majority of
> uses.

Well, I beg to differ on this point. Calling strlen is always the
slowest approach, and any string manipulation limited algorithm will
show this to be a serious performance hit (or strcat, which is really
just a strlen + strcpy.)

> [...] If you have an application where the ggets realloc actually


> is a bind, as shown by measurements, then there is a good reason
> to modify it FOR THAT APPLICATION. Meanwhile my every instinct
> says my approach is better.

Oddly, I would not recommend that bgets be modified for any reason.
Though, you may prefer breadln, which will perform better if you have
a bufferable input stream (isatty() returns false, or you just know
the stream is continuous.)

Dan Pop

unread,
Jun 2, 2003, 10:12:22 AM6/2/03
to
In <3ed8984f$0$17187$afc3...@news.optusnet.com.au> "Simon Biber" <sbi...@optushome.com.au> writes:

>"Hallvard B Furuseth" wrote:
>> Simon Biber wrote:
>>
>> > It's easy enough to add
>> > char *p = strchr(buf, '\n');
>> > if(p != NULL) *p = '\0'; else exit(EXIT_FAILURE);
>>
>> That will fail if the last line in the file does
>> not end with newline.
>
>True. It assumes you have a properly formatted text
>file, with a newline after every line. The C standard
>leaves it implementation-defined whether a text stream
>requires a terminating newline character or not.

Therefore, well written code can never assume that the last line in a
text file is properly terminated.

I really like implementations that ignore the last line if it's not
newline-terminated.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Dan...@ifh.de

Eric Sosman

unread,
Jun 2, 2003, 10:51:31 AM6/2/03
to
CBFalconer wrote:
>
> I started my ggets development with 'what attracts newbies to
> gets', and the answer was 'simplicity'. No worries about \n
> detection. No worries about actual length (HAH). Just go and get
> that string. So I tried to match those qualities in ggets.
> [...]

> 4. I would have liked to return a pointer, but that runs into
> the need for multiple error possibilities (EOF, file error,
> memory shortage). So I returned 0 for OK, EOF, or positive.

My getline() function returns a pointer, or NULL on
EOF or error or memory shortage. This doesn't mean the
caller cannot tell the three conditions apart, just that
a little extra code is necessary:

if ((ptr = getline(stream)) == NULL) {
if (feof(stream))
/* we reached end-of-input */;
else if (ferror(stream))
/* an I/O error occurred */;
else
/* we ran out of memory */;
}

In many instances I find it really doesn't matter
whether a failure occurred because of memory shortage
or because of I/O error; the program's future course
will have one "success" path and one "failure" path.
When that's the case, I wind up writing just a
single error test:

while ((ptr = getline(stream)) != NULL) {
/* digest an input line */
}
if (! feof(stream))
return A_FAILURE_CODE;
/* end-of-input processing */
return A_SUCCESS_CODE;

Like Beauty, Simplicity is the beholder's.

--
Eric....@sun.com

Dan Pop

unread,
Jun 2, 2003, 11:00:21 AM6/2/03
to

Certainly not on systems with lazy swap space allocation, unless they
have a per-process memory usage limit. Digital UNIX was particularly
fun in this respect: when the system started to run out of virtual memory
it started to kill randomly selected processes, to make more memory
available to the process it has foolishly guaranteed that this memory is
available, without even trying to check!

On Linux, things are somewhat better, because malloc() checks that enough
virtual memory is available at the time it is called. However, if the
program doesn't quickly use it, another program may "steal" it and we
have the same problem as above.

The "cure" is eager swap allocation, but it has its own problems, too,
mostly caused by programs overallocating memory. Lazy swap space
allocation (especially without any kind of checking) is ideal for sparse
arrays: you just treat them as ordinary arrays, the space allocated but
not used doesn't consume any resource, except for virtual memory address
space.

John

unread,
Jun 2, 2003, 11:48:55 AM6/2/03
to

"Dan Pop" <Dan...@cern.ch> wrote in message
news:bbfoq5$buk$1...@sunnews.cern.ch...

> In <bbci88$869bb$1...@ID-97861.news.dfncis.de> "John"
<John....@btinternet.com> writes:
>
<snip>

> >
> >Much of what you say makes sense but why an upper limit? I have
> >one that returns a error if realloc fails, isn't that enough?
>
> Certainly not on systems with lazy swap space allocation, unless they
> have a per-process memory usage limit. Digital UNIX was particularly
> fun in this respect: when the system started to run out of virtual memory
> it started to kill randomly selected processes, to make more memory
> available to the process it has foolishly guaranteed that this memory is
> available, without even trying to check!
>
> On Linux, things are somewhat better, because malloc() checks that enough
> virtual memory is available at the time it is called. However, if the
> program doesn't quickly use it, another program may "steal" it and we
> have the same problem as above.
>
> The "cure" is eager swap allocation, but it has its own problems, too,
> mostly caused by programs overallocating memory. Lazy swap space
> allocation (especially without any kind of checking) is ideal for sparse
> arrays: you just treat them as ordinary arrays, the space allocated but
> not used doesn't consume any resource, except for virtual memory address
> space.

Ok, thanks for the explanation. What is a sparse array?

John

CBFalconer

unread,
Jun 2, 2003, 12:00:21 PM6/2/03
to
Paul Hsieh wrote:
> CBFalconer <cbfal...@yahoo.com> wrote:
>
> > ... snip ...

>
> > As I said earlier, I designed ggets for the most likely
> > applications, and a multi-megabyte single text line is not one I
> > anticipate arising often.
>
> So you feel ok with publicizing this non-scalable solution as an
> answer to other people's problems, for which you cannot possibly know
> the size of their input? As I responded to James Antill, your
> solution has the additional problem of allocating far too much memory
> for many many very small inputs. Your solution only works really well
> for a very specific range of input sizes that isn't very good for
> either large or small inputs.

Well, we obviously disagree and are making no progress towards
common ground. Incidentally, ggets present implementation
reallocs at exit, so it does not waste any memory. This is always
a reduction, and a realloc that can't do that efficiently is
broken IMO. At any rate both my and your views are now available
for posterity :-)

CBFalconer

unread,
Jun 2, 2003, 1:13:01 PM6/2/03
to
Dan Pop wrote:
> "John" <John....@btinternet.com> writes:
> >
> > <snip>

> >
> >Much of what you say makes sense but why an upper limit? I have
> >one that returns a error if realloc fails, isn't that enough?
>
> Certainly not on systems with lazy swap space allocation, unless
> they have a per-process memory usage limit. Digital UNIX was
> particularly fun in this respect: when the system started to run
> out of virtual memory it started to kill randomly selected
> processes, to make more memory available to the process it has
> foolishly guaranteed that this memory is available, without even
> trying to check!
>
> On Linux, things are somewhat better, because malloc() checks
> that enough virtual memory is available at the time it is called.
> However, if the program doesn't quickly use it, another program
> may "steal" it and we have the same problem as above.
>
> The "cure" is eager swap allocation, but it has its own problems,
> too, mostly caused by programs overallocating memory. Lazy swap
> space allocation (especially without any kind of checking) is
> ideal for sparse arrays: you just treat them as ordinary arrays,
> the space allocated but not used doesn't consume any resource,
> except for virtual memory address space.

All these problems arise on machines where multiple processes
exist, contrary to the basic assumptions of C. A further
possibility is that the actual use of the allocated memory can
block, just like an i/o operation, until it can be satisfied.
This too would be unexpected, and have its own set of problems.

CBFalconer

unread,
Jun 2, 2003, 1:13:04 PM6/2/03
to
John wrote:
> "Dan Pop" <Dan...@cern.ch> wrote in message
>
... snip ...

> >
> > The "cure" is eager swap allocation, but it has its own problems,
> > too, mostly caused by programs overallocating memory. Lazy swap
> > space allocation (especially without any kind of checking) is
> > ideal for sparse arrays: you just treat them as ordinary arrays,
> > the space allocated but not used doesn't consume any resource,
> > except for virtual memory address space.
>
> Ok, thanks for the explanation. What is a sparse array?

An array where a significant portion is unused. If you have an
array[10000] of something, and only use indices 0..10, 200..210,
9000..9010 for example, there is no need for actual storage for
the areas not yet used. The remainder can be considered to be 0
initialized even if they don't exist by suitable games within the
virtual memory manager.

CP/M implemented sparse files (not arrays). I believe some Unices
do likewise. One more feature that BillyBoy lost in MsDos. Very
useful for data bases.

CBFalconer

unread,
Jun 2, 2003, 1:13:09 PM6/2/03
to

Fair enough. I think the point is that the main line is handled
simply, and the frills are available if needed.

CBFalconer

unread,
Jun 2, 2003, 1:13:11 PM6/2/03