reading entire input using getdelim


Oğuz

Sep 17, 2022, 1:00:40 AM
I'm writing a program that needs to read its input in entirety before
processing it. And since its input is expected to be text (i.e. no NULs
embedded), it does it like this:

    char *buf = NULL;
    size_t n = 0;
    if (getdelim(&buf, &n, 0, stdin) == -1)
        exit(1);

This seems to work just fine, but feels too good to be true. Am I
missing something? Is there anything wrong with it?

Keith Thompson

Sep 17, 2022, 1:20:24 AM
It's POSIX-specific, so it's not 100% portable. If there does happen to
be a null character in the input, it will quietly drop everything after
it. It would be cleaner to have a function that does the same thing
without a delimiter, so it reads everything up to end-of-file, but I'm
not aware that POSIX has such a function (though you could roll your
own).

An implementation is likely to use realloc() internally, which might
waste some memory if the input file is very large.

And if the input is empty, it returns -1.

It's arguably an abuse of getdelim(), and the special treatment of null
characters in the input *could* be an issue, but I don't see any other
problems with it.

--
Keith Thompson (The_Other_Keith) Keith.S.T...@gmail.com
Working, but not speaking, for Philips
void Void(void) { Void(); } /* The recursive call of the void */

Oğuz

Sep 17, 2022, 5:31:05 AM
On 9/17/22 8:20 AM, Keith Thompson wrote:
> It's POSIX-specific, so it's not 100% portable. If there does happen to
> be a null character in the input, it will quietly drop everything after
> it.

The rest of the program doesn't expect a NUL byte either, should have
mentioned that in OP. getdelim returns the number of bytes read into the
buffer so I can make it so that it doesn't quietly drop everything after
a NUL but fail the program instead.
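
A minimal sketch of that check, assuming a POSIX system. The names
slurp_text and enum slurp_status are hypothetical and not from the thread.
getdelim() returns the number of bytes it stored, including the delimiter
if one was found, so a return value that differs from strlen() of the
buffer means the input contained a NUL byte.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical status codes for this sketch. */
enum slurp_status { SLURP_OK, SLURP_EOF_OR_ERROR, SLURP_EMBEDDED_NUL };

/* Hypothetical helper: read all of fp as text into *out (malloc'd).
 * Fails instead of silently truncating when the input contains a NUL. */
static enum slurp_status slurp_text(FILE *fp, char **out)
{
    char *buf = NULL;
    size_t cap = 0;
    ssize_t len;

    assert(fp != NULL && out != NULL);
    *out = NULL;

    len = getdelim(&buf, &cap, '\0', fp);
    if (len == -1) {
        free(buf);              /* the buffer may exist even on failure */
        return SLURP_EOF_OR_ERROR;
    }

    /* If a NUL was read, it is counted in len but ends strlen() early. */
    if ((size_t)len != strlen(buf)) {
        free(buf);
        return SLURP_EMBEDDED_NUL;
    }

    *out = buf;
    return SLURP_OK;
}
```

The caller owns the returned buffer and is expected to free() it.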

> And if the input is empty, it returns -1.

Didn't notice that, thanks. I guess I can detect empty input by setting
errno to 0 before calling getdelim and testing if it changed if getdelim
returns -1.

Ben Bacarisse

Sep 17, 2022, 8:48:25 AM
Oğuz <oguzism...@gmail.com> writes:

> On 9/17/22 8:20 AM, Keith Thompson wrote:
<cut>
>> And if the input is empty, it returns -1.
>
> Didn't notice that, thanks. I guess I can detect empty input by
> setting errno to 0 before calling getdelim and testing if it changed
> if getdelim returns -1.

You can use ferror() to distinguish between the two.
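
A rough sketch of that distinction, assuming a POSIX system. The helper
name slurp_or_empty is hypothetical. It also checks feof(), a conservative
assumption so that a -1 return counts as "empty input" only when
end-of-file was actually reached.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: slurp fp up to the first NUL or end-of-file.
 * Returns a malloc'd string ("" for empty input), or NULL on failure. */
static char *slurp_or_empty(FILE *fp)
{
    char *buf = NULL;
    size_t cap = 0;

    assert(fp != NULL);
    if (getdelim(&buf, &cap, '\0', fp) != -1)
        return buf;

    /* getdelim() may have allocated a buffer even though it returned -1. */
    free(buf);

    if (!ferror(fp) && feof(fp))
        return calloc(1, 1);    /* empty input: hand back "" */
    return NULL;                /* read or allocation failure */
}
```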

Beware of malicious input as there is no way to stop the function from
trying to allocate enough memory for a giant file.
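
Since getdelim() takes no size limit, bounding the allocation means
reading the stream by hand. The following is one possible sketch using
only standard C. The name slurp_limited and the max_bytes parameter are
hypothetical, and the 4096-byte starting capacity is an arbitrary choice.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read all of fp, refusing input larger than
 * max_bytes. Returns a malloc'd, NUL-terminated buffer and stores its
 * length in *len_out, or returns NULL on error or oversized input. */
static char *slurp_limited(FILE *fp, size_t max_bytes, size_t *len_out)
{
    size_t cap = 4096, len = 0;
    char *buf;

    assert(fp != NULL && len_out != NULL);
    assert(max_bytes < (size_t)-1);     /* leave room for the terminator */
    if (cap > max_bytes)
        cap = max_bytes;
    buf = malloc(cap + 1);
    if (buf == NULL)
        return NULL;

    for (;;) {
        len += fread(buf + len, 1, cap - len, fp);
        if (ferror(fp))
            goto fail;
        if (len < cap)
            break;                      /* short read: reached end-of-file */
        if (cap == max_bytes) {
            /* Buffer is at the limit; accept only if no input remains. */
            if (fgetc(fp) != EOF || ferror(fp))
                goto fail;
            break;
        }
        /* Double the capacity, clamped to max_bytes. */
        size_t new_cap = (cap > max_bytes / 2) ? max_bytes : cap * 2;
        char *tmp = realloc(buf, new_cap + 1);
        if (tmp == NULL)
            goto fail;
        buf = tmp;
        cap = new_cap;
    }

    buf[len] = '\0';
    *len_out = len;
    return buf;

fail:
    free(buf);
    return NULL;
}
```

Unlike the getdelim() approach, this keeps embedded NUL bytes, which is
why the length is reported separately.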

--
Ben.

Oğuz

Sep 17, 2022, 9:21:31 AM
On 9/17/22 3:48 PM, Ben Bacarisse wrote:
> You can use ferror() to distinguish between the two.

Ah, that's better. Thank you

>
> Beware of malicious input as there is no way to stop the function from
> trying to allocate enough memory for a giant file.
>

Yes, I wouldn't use this in a production setting

Kenny McCormack

Sep 17, 2022, 10:04:51 AM
In article <slurp-2022...@ram.dialup.fu-berlin.de>,
Stefan Ram <r...@zedat.fu-berlin.de> wrote:
>Oğuz <oguzism...@gmail.com> writes:
>>This seems to work just fine, but feels too good to be true. Am I
>>missing something? Is there anything wrong with it?
>
> Here's my attempt at it with only standard C functions.
> This is the first code that worked, written today and
> not carefully reviewed!

Anyone can "roll their own". That's not the point of this thread.
(And zillions of CLC posters have done so, over the years... Add your name
to the list [if it isn't there already])

The whole point of this thread is to ask if the built-in one is up to the
task. It seems to me the basic answer is "Yes".

--
Trump - the President for the rest of us.

https://www.youtube.com/watch?v=JSkUJKgdcoE

Kaz Kylheku

Sep 17, 2022, 1:43:01 PM
Chances are you're using tools and languages in a production setting
which have "snarf file as string" functions in their run-time libraries
that they are happily using.

The shell syntax "$(cat file)" will grab the entire file and turn it into
a word interpolated into the command line; it takes no argument on
limiting the size, and you see it in system shell scripts all the time.

Just because the code is "in production" doesn't mean that the input is
controlled by a malicious user who is trying to bring down the
application.

If your code processes a stream in its entirety, it may be open to a
DoS, even if it doesn't buffer all of it. It might not run out of
memory, but it will be stuck there reading the stream. The malicious
user just feeds a really large stream, perhaps an infinite one. Or a
small amount of input, but at a glacial pace, like one byte at a time,
with five minutes in between. (Anti-spam "honey pot" mail servers
do this sort of thing to suspected spammer connections.)

To be completely paranoid, you need timeouts everywhere.
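
As an illustration of what such a timeout could look like, here is a
sketch that reads from a file descriptor under an overall time budget,
assuming POSIX poll() and clock_gettime(). The names now_ms and
read_all_with_deadline and the budget_ms parameter are hypothetical. It
is a starting point under those assumptions, not a hardened
implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Current time in milliseconds on the monotonic clock. */
static long long now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Hypothetical helper: read from fd until end-of-file or until buf is
 * full, giving up once budget_ms milliseconds have passed in total.
 * Returns the number of bytes read, or -1 on error or timeout
 * (errno is set to ETIMEDOUT for a timeout). */
static ssize_t read_all_with_deadline(int fd, char *buf, size_t cap,
                                      int budget_ms)
{
    long long deadline = now_ms() + budget_ms;
    size_t len = 0;

    assert(buf != NULL && budget_ms >= 0);
    while (len < cap) {
        long long left = deadline - now_ms();
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        int ready;
        ssize_t got;

        if (left <= 0) {
            errno = ETIMEDOUT;
            return -1;
        }
        ready = poll(&pfd, 1, (int)left);   /* wait for data or hang-up */
        if (ready == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (ready == 0) {                   /* nothing arrived in time */
            errno = ETIMEDOUT;
            return -1;
        }

        got = read(fd, buf + len, cap - len);
        if (got == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (got == 0)
            break;                          /* end-of-file */
        len += (size_t)got;
    }
    return (ssize_t)len;
}
```

The budget covers the whole read rather than each chunk, so a sender
trickling one byte every few minutes still runs into the deadline.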

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal

Kaz Kylheku

Sep 17, 2022, 1:56:53 PM
On 2022-09-17, Stefan Ram <r...@zedat.fu-berlin.de> wrote:
> { while( !feof( file ))
> { if( ferror( file )){ free( buff ); return; }
> int const i = fgetc( file );
> if( !*buff )return;
> if( i < 0 )return;
> append( buff, size, offset,( unsigned char )i ); }}

I've used this sort of Lisp-like formatting for a bunch of code in the past.

The grammar actions in the TXR parser use it, e.g:

json_vals : json_val               { $$ = if3(parser->quasi_level > 0 &&
                                              unquotes_occur($1, 0),
                                              cons($1, nil),
                                              vector(one, $1)); }
          | json_vals ',' json_val { if (consp($1))
                                     { $$ = cons($3, $1); }
                                     else if (parser->quasi_level > 0 &&
                                              unquotes_occur($3, 0))
                                     { val li = list_vec($1);
                                       $$ = cons($3, li); }
                                     else
                                     { vec_push($1, $3);
                                       $$ = $1; } }

Kenny McCormack

Sep 18, 2022, 11:17:15 AM
>r...@zedat.fu-berlin.de (Stefan Ram) writes:
>>( FILE * const file,
>> unsigned char * *buff,
>> size_t *size,
>> size_t *offset )
>>{ while( !feof( file ))
>> { if( ferror( file )){ free( buff ); return; }
>
> "free( buff )" is a bug, it should be "free( *buff )".
>
> In the meantime, I rewrote the function to accept a maximum
> size for the buffer and use "fread" instead of "fgetc".

You're still not accomplishing anything that hasn't been done already
thousands of times.

Nor are you answering OP's question.

--
I've learned that people will forget what you said, people will forget
what you did, but people will never forget how you made them feel.

- Maya Angelou -