Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

sscanf behavior (bug in glibc?)

41 views
Skip to first unread message

Laurent Deniau

unread,
Mar 21, 2002, 1:35:38 AM3/21/02
to
I was testing performance of sscanf vs fscanf and I found that the behavior of
sscanf is quite strange (gcc 2.95.2 i386). It seems that sscanf always parse the
entire string, even if all arguments specified in the format are already
converted. This is rather annoying if we want to parse step by step a very long
string with successive calls to sscanf because in that case, the complexity of
sscanf is O(n^2) while the one of fscanf remains O(n) (correct behavior).

The small program below shows the problem:

./sscanf_test 10000
sum = 10000, long string = no, time = 0.03s
./sscanf_test 10000 yes
sum = 10000, long string = yes, time = 1.27s

The results above show that considering PI with its trailing zero stops sscanf
immediately after the conversion and we get a reasonable time. If we do not
consider the trailing zero, sscanf stops after having read the entire string and
the time becomes not acceptable.

The C99 norm doesn't say anything about this strange behavior, so I conclude to
a bug in the glibc...

Am I wrong?

a+, ld.

----------------------

cat sscanf_test.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define PI "3.14 "
#define CHRONO (((double)t1-t0)/CLOCKS_PER_SEC)

int
main(int argc, char *argv[])
{
size_t const max_item = strtoul(argc == 2 ? argv[1] : "10000", NULL, 0);
size_t const item_size = strlen(PI) + (argc != 3);
size_t const max_size = max_item * item_size;
char str[max_size + 1];
clock_t t0, t1;
size_t sum = 0;
size_t i;
float pi;

for(i=0; i<max_size; i += item_size)
memcpy(str+i, PI, item_size);
str[max_size] = 0;

t0 = clock();
for(i=0; i<max_size; i += item_size)
sum += sscanf(str+i, "%f", &pi);
t1 = clock();

printf("sum = %u, long string = %s, time = %gs\n",
sum, argc != 3 ? "no" : "yes", CHRONO);

return 0;
}

--
[ Laurent Deniau -- Scientific Computing & Data Analysis ]
[ CERN -- The European Laboratory for Particle Physics ]
[ Laurent...@cern.ch - http://cern.ch/Laurent.Deniau ]
[ -- One becomes old when dreams become regrets -- ]
--
comp.lang.c.moderated - moderation address: cl...@plethora.net

those who know me have no need of my name

unread,
Mar 21, 2002, 11:11:53 PM3/21/02
to
[fu-t set]
User-Agent: Gnus/5.090006 (Oort Gnus v0.06) XEmacs/21.1 (Cuyahoga Valley,
i386-redhat-linux)

<clcm-2002...@plethora.net> divulged:

>I was testing performance of sscanf vs fscanf and I found that the
>behavior of sscanf is quite strange (gcc 2.95.2 i386).

given that only the gnu c compiler and library were tested by you the post
should have gone to a group that concentrates on those items.

>It seems that sscanf always parse the entire string, even if all arguments
>specified in the format are already converted.

>The small program below shows the problem:

i modified your test program to display the length of each string in the
buffer (in the argc!=3 case) and the length of the buffer overall (in the
!(argc!=3) case), which i've included below, feeling that this made the
attribute actually being tested clearer, i.e., 10,000 sscanf's of a 5 char
string each containing "3.14 " versus scanning a 50,000 character string
containing 10,000 repetitions of "3.14 ", and while i obtained different
results they are comparable so i've not quoted your results.

| $ ./sscanf_test 10000
| sum = 10000, strlen = 5, time = 0.03s
| $ ./sscanf_test 10000 yes
| sum = 10000, strlen = 50000, time = 0.84s

>The results above show that considering PI with its trailing zero stops
>sscanf immediately after the conversion and we get a reasonable time. If
>we do not consider the trailing zero, sscanf stops after having read the
>entire string and the time becomes not acceptable.
>
>The C99 norm doesn't say anything about this strange behavior, so I
>conclude to a bug in the glibc...

glibc uses `memory based files' to deal with buffers as if they were files,
so that the scanf family uses a common specifier scan routine, which calls
fgetc to obtain the next character to consider and ungetc to put it back
onto the `stream' if it's outside the domain of the specifier. this in
conjunction with leading space elimination appears to have significant
overhead.

testing this on a different type of system, e.g., tru64 unix 5.1a on a
ds-10l w/466mhz ev6 cpu using the compaq supplied c compiler and library
produces ...

| $ ./sscanf_test 10000
| sum = 10000, strlen = 5, time = 0.016666s
| $ ./sscanf_test 10000 yes
| sum = 10000, strlen = 50000, time = 0.116662s

slightly modified version of original program [bug calculating max_item
corrected and display strlen(str) instead of `long string = yes/no']:

| #include <stdio.h>
| #include <stdlib.h>
| #include <string.h>
| #include <time.h>
|
| #define PI "3.14 "
| #define CHRONO (((double)t1-t0)/CLOCKS_PER_SEC)
|
| int
| main(int argc, char *argv[])
| {

| size_t const max_item = strtoul(argc >= 2 ? argv[1] : "10000", NULL, 0);


| size_t const item_size = strlen(PI) + (argc != 3);
| size_t const max_size = max_item * item_size;
| char str[max_size + 1];
| clock_t t0, t1;
| size_t sum = 0;
| size_t i;
| float pi;
|
| for(i=0; i<max_size; i += item_size)
| memcpy(str+i, PI, item_size);
| str[max_size] = 0;
|
| t0 = clock();
| for(i=0; i<max_size; i += item_size)
| sum += sscanf(str+i, "%f", &pi);
| t1 = clock();
|

| printf("sum = %u, strlen = %u, time = %gs\n",
| sum, strlen(str), CHRONO);
|
| return 0;
| }

--
bringing you boring signatures for 17 years

CBFalconer

unread,
Mar 21, 2002, 11:11:59 PM3/21/02
to

Your code is not legal C to start with:

[1] c:\c\malloc>gcc deniau.c
blah.c: In function `main':
blah.c:15: warning: ANSI C forbids variable-size array `str'
blah.c:31: warning: unsigned int format, size_t arg (arg 2)

[1] c:\c\malloc>timerun a 10000 yes
Timer 3 on: 6:33:19
sum = 10000, long string = yes, time = 37.9121s
Timer 3 off: 6:33:58 Elapsed: 0:00:38.39

[1] c:\c\malloc>timerun a
Timer 3 on: 6:34:10
sum = 10000, long string = no, time = 0.274725s
Timer 3 off: 6:34:10 Elapsed: 0:00:00.49

It may well be considered a quality of implementation factor, but
it is not a bug. Similar things show up in the DJGPP
implementation of free in glibc (which I have cured).

The obvious cure is to write your own scanner and to deal with the
input as a stream, rather than as a string.

--
Chuck F (cbfal...@yahoo.com) (cbfal...@XXXXworldnet.att.net)
Available for consulting/temporary embedded and systems.
(Remove "XXXX" from reply address. yahoo works unmodified)
mailto:u...@ftc.gov (for spambots to harvest)

Hans-Bernhard Broeker

unread,
Mar 21, 2002, 11:12:16 PM3/21/02
to
In comp.lang.c.moderated Laurent Deniau <Laurent...@cern.ch> wrote:

> I was testing performance of sscanf vs fscanf and I found that the
> behavior of sscanf is quite strange (gcc 2.95.2 i386).

Please note that the compiler version has nothing to do with the
performance of libc functions, unless those happen to be replaced by
inline versions, by the compiler. Which almost certainly won't ever
be the case for sscanf().

> The C99 norm doesn't say anything about this strange behavior, so I
> conclude to a bug in the glibc...

Definitely not a bug --- it can't be because the documentation doesn't
say *anything* about the timing behaviour of sscanf. Where no
definition of "correct" behaviour exists, no observed behaviour,
however bad it may seem, can be incorrect, either.

So this is a quality-of-implementation issue, at most, and it's not
restricted to glibc. I've just found O(n^2) on another platform with
a different libc, too.

BTW: if I change the call from sscanf() to strtod(), the run time
becomes O(n) in all platforms I cared to test. I.e. the difference
must happen in sscanf() itself, and be caused by the length of the
remainder of the input string following the matched 3.14.

[...]


> {
> size_t const max_item = strtoul(argc == 2 ? argv[1] : "10000", NULL, 0);

A bug in your testing program, I think. You'll want to test for
argc>=2 instead.

> size_t const item_size = strlen(PI) + (argc != 3);
> size_t const max_size = max_item * item_size;
> char str[max_size + 1];

For full ANSI C89 conformity, you'd better change this to use
malloc()'ed storage. Variable-length arrays are a C99-ism.


--
Hans-Bernhard Broeker (bro...@physik.rwth-aachen.de)
Even if all the snow were burnt, ashes would remain.

Dan Pop

unread,
Mar 22, 2002, 9:19:37 AM3/22/02
to
In <clcm-2002...@plethora.net> Laurent Deniau <Laurent...@cern.ch> writes:

>I was testing performance of sscanf vs fscanf and I found that the behavior of
>sscanf is quite strange (gcc 2.95.2 i386). It seems that sscanf always parse the
>entire string, even if all arguments specified in the format are already
>converted.

Such a behaviour is certainly allowed by the standard.

>This is rather annoying if we want to parse step by step a very long
>string with successive calls to sscanf

How do you plan to do that, without knowing where the previous sscanf call
has stopped the conversion process? sscanf only reports the number of
successful conversions performed, with no clue about how many characters
were processed by those conversions. If you have this piece of critical
information, you can trivially shorten the input string, by inserting
a null byte in the right place, before calling sscanf. Of course, you
restore the affected byte before the next sscanf call.

>because in that case, the complexity of
>sscanf is O(n^2) while the one of fscanf remains O(n) (correct behavior).

Why is an O(n^2) behaviour incorrect? Chapter and verse, please.

>The small program below shows the problem:
>
>./sscanf_test 10000
>sum = 10000, long string = no, time = 0.03s
>./sscanf_test 10000 yes
>sum = 10000, long string = yes, time = 1.27s
>
>The results above show that considering PI with its trailing zero stops sscanf
>immediately after the conversion and we get a reasonable time. If we do not
>consider the trailing zero, sscanf stops after having read the entire string and
>the time becomes not acceptable.
>
>The C99 norm doesn't say anything about this strange behavior, so I conclude to
>a bug in the glibc...

Your logic is severely broken: if the standard doesn't say anything on a
particular issue, no matter what an implementation does with that issue,
it can't violate the standard, so its conformance can't be affected.

Additionally, gcc + glibc are not (and do not claim to be) a conforming
C99 implementation. The only language specification relevant to them
is C89. But, since C89 is silent on this issue, too, this doesn't make much of
a difference here.

>Am I wrong?

Your observations are, probably, correct (although your program isn't),
but your conclusions about the implementation's conformance are wrong.

>cat sscanf_test.c
>
>#include <stdio.h>
>#include <stdlib.h>
>#include <string.h>
>#include <time.h>
>
>#define PI "3.14 "
>#define CHRONO (((double)t1-t0)/CLOCKS_PER_SEC)
>
>int
>main(int argc, char *argv[])
>{
> size_t const max_item = strtoul(argc == 2 ? argv[1] : "10000", NULL, 0);

^^^^^^^^^
When argc == 3, you're ignoring argv[1] and use "10000" instead. I doubt
this is the intended behaviour of the program.

> size_t const item_size = strlen(PI) + (argc != 3);
> size_t const max_size = max_item * item_size;
> char str[max_size + 1];

^^^^^^^^^^^^
This definition is not allowed by C89 and your implementation is not a
conforming C99 implementation. Are you sure you've used gcc as a
conforming standard C compiler to compile your program? If this is not
the case, you can't draw any kind of valid conclusions from its
behaviour.

After fixing your program, so that I can try it on other implementations,
I ran it on HPUX, Solaris, IRIX and Digital Unix, using the native
implementations of those systems. The O(n*n) behaviour could be observed
on all of them. E.g.

shift5:~ 4> uname -a
IRIX64 shift5 6.5 07151439 IP27
shift5:~ 5> ./a.out 10000
sum = 10000, long string = no, time = 0.05s
shift5:~ 6> ./a.out 10000 yes
sum = 10000, long string = yes, time = 2.03s

So, it certainly makes sense to temporarily shorten the input string
before calling sscanf on it.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Dan...@ifh.de

those who know me have no need of my name

unread,
Mar 23, 2002, 12:55:45 AM3/23/02
to
<clcm-2002...@plethora.net> divulged:
>Laurent Deniau wrote:

[snip all except the `offending' lines]

>> char str[max_size + 1];

>> printf("sum = %u, long string = %s, time = %gs\n",


>> sum, argc != 3 ? "no" : "yes", CHRONO);

>Your code is not legal C to start with:

>[1] c:\c\malloc>gcc deniau.c
>blah.c: In function `main':
>blah.c:15: warning: ANSI C forbids variable-size array `str'

your compiler implements this construct, but issues a warning only because
it's syntax checker is not using current standard. this does not make the
posted code illegal.

>blah.c:31: warning: unsigned int format, size_t arg (arg 2)

this is a proper criticism, the specifier should have been `%zu'. but,
using `%u' doesn't make the program non-conformant nor does it mean the
program will invoke undefined behavior unless the implementation makes
size_t an alias for something other than unsigned int, so it's a fairly
weak critique since that qualification wasn't included.

so. since these two warnings from your compiler do not prove that the code
is `illegal' it makes your claim that the code is `illegal' unfounded, or
insufficiently qualified so as to be well founded, and i'm finding the
continued habit (by several posters of clc) to (again) be annoying.

all that said ...

chuck, why did you feel that it was necessary to claim that the code is
illegal? i've seen you be clear about k&r vs c89/c90, so what made this
code different enough that decided not to respond similarly (e.g., that it
isn't valid c89 code)? were you just trying to say `if you are going to
ask about potential bugs in gcc and/or glibc you should be posting to some
other group', but still provide some sort of answer?

others, what are your feelings? am i still being unsane (i've brought this
up before), and i should just accept that some people will critique posted
code using whatever version of the standards they wish to use without
qualifying their response so as to include that datum? would it really be
acceptable for someone to declare a program `illegal' merely because it
uses the void type, without mentioning that they used k&r1 as the criteria
or that other compilers might support it since it is part of the language.

i do not find the claim that `there are almost no fully c99 conformant
compilers' to be sufficient reason to disallow any construct added to the
language by c99. an example is the referenced program, which makes use of
`variable length arrays'. a warning seems appropriate, due to the claim
above, but that's all, just a warning, e.g., `that may not be available in
other, pre-c99, compilers'.

--
bringing you boring signatures for 17 years

Andreas Schwab

unread,
Mar 23, 2002, 12:56:07 AM3/23/02
to
Dan...@ifh.de (Dan Pop) writes:

|> >This is rather annoying if we want to parse step by step a very long
|> >string with successive calls to sscanf
|>
|> How do you plan to do that, without knowing where the previous sscanf call
|> has stopped the conversion process? sscanf only reports the number of
|> successful conversions performed, with no clue about how many characters
|> were processed by those conversions.

%n No input is consumed. The corresponding argument shall be a
pointer to signed integer into which is to be written the number
of characters read from the input stream so far by this call to
the fscanf function.

Andreas.

--
Andreas Schwab, SuSE Labs, sch...@suse.de
SuSE GmbH, Deutschherrnstr. 15-19, D-90429 Nürnberg
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."

Daniel Fox

unread,
Mar 23, 2002, 3:02:52 AM3/23/02
to
those who know me have no need of my name <not-a-rea...@usa.net> wrote
in news:clcm-2002...@plethora.net:

> <clcm-2002...@plethora.net> divulged:
>>Laurent Deniau wrote:
>
> [snip all except the `offending' lines]
>
>>> char str[max_size + 1];
>
>>> printf("sum = %u, long string = %s, time = %gs\n",
>>> sum, argc != 3 ? "no" : "yes", CHRONO);
>
>>Your code is not legal C to start with:
>
>>[1] c:\c\malloc>gcc deniau.c
>>blah.c: In function `main':
>>blah.c:15: warning: ANSI C forbids variable-size array `str'
>
> your compiler implements this construct, but issues a warning only because
> it's syntax checker is not using current standard. this does not make the
> posted code illegal.

The OP stated he was using gcc 2.95.2.

From http://www.gnu.org/software/libc/CONFORMANCE:

(Note that -std=c99 is not available in GCC 2.95.2, and that no
version of GCC presently existing implements the full C99 standard.)

So I suppose that is why CBF said what he said.

-Daniel

CBFalconer

unread,
Mar 23, 2002, 8:45:45 AM3/23/02
to
those who know me have no need of my name wrote:
>
... snip ...

>
> chuck, why did you feel that it was necessary to claim that the code is
> illegal? i've seen you be clear about k&r vs c89/c90, so what made this
> code different enough that decided not to respond similarly (e.g., that it
> isn't valid c89 code)? were you just trying to say `if you are going to
> ask about potential bugs in gcc and/or glibc you should be posting to some
> other group', but still provide some sort of answer?

IIRC my main point was that the time performance did not make it a
bug. The 'illegal' was mainly a side point. I know that gcc
2.953 is not C99 conformant, and I was surprised that it accepted
it and executed it. Just sloppy.

Lawrence Kirby

unread,
Mar 23, 2002, 7:59:59 AM3/23/02
to
On 23 Mar, in article <clcm-2002...@plethora.net>
not-a-rea...@usa.net

"those who know me have no need of my name" wrote:

><clcm-2002...@plethora.net> divulged:
>>Laurent Deniau wrote:
>
>[snip all except the `offending' lines]
>
>>> char str[max_size + 1];
>
>>> printf("sum = %u, long string = %s, time = %gs\n",
>>> sum, argc != 3 ? "no" : "yes", CHRONO);
>
>>Your code is not legal C to start with:
>
>>[1] c:\c\malloc>gcc deniau.c
>>blah.c: In function `main':
>>blah.c:15: warning: ANSI C forbids variable-size array `str'
>
>your compiler implements this construct, but issues a warning only because
>it's syntax checker is not using current standard. this does not make the
>posted code illegal.
>
>>blah.c:31: warning: unsigned int format, size_t arg (arg 2)
>
>this is a proper criticism, the specifier should have been `%zu'. but,
>using `%u' doesn't make the program non-conformant

It means that it is not stictly conforming.

>nor does it mean the
>program will invoke undefined behavior unless the implementation makes
>size_t an alias for something other than unsigned int, so it's a fairly
>weak critique since that qualification wasn't included.

It means that the behaviour is undefined unless you know *for certain*
that size_t happens to be defined as unsigned int on the platform being
used. General C code that uses %u to convert as size_t operand is
just plain broken.

>so. since these two warnings from your compiler do not prove that the code
>is `illegal' it makes your claim that the code is `illegal' unfounded,

Right, the fact that a piece of code compiles with warnings proves nothing
about its correctness as C code. The warnings are simply suggestions
by the compiler that you recheck the relevant parts of the code.

--
-----------------------------------------
Lawrence Kirby | fr...@genesis.demon.co.uk
Wilts, England | 7073...@compuserve.com
-----------------------------------------

Dan Pop

unread,
Mar 25, 2002, 8:47:10 AM3/25/02
to
In <3C9C7A76...@yahoo.com> CBFalconer <cbfal...@yahoo.com> writes:

>those who know me have no need of my name wrote:
>>
>... snip ...
>>
>> chuck, why did you feel that it was necessary to claim that the code is
>> illegal? i've seen you be clear about k&r vs c89/c90, so what made this
>> code different enough that decided not to respond similarly (e.g., that it
>> isn't valid c89 code)? were you just trying to say `if you are going to
>> ask about potential bugs in gcc and/or glibc you should be posting to some
>> other group', but still provide some sort of answer?
>
>IIRC my main point was that the time performance did not make it a
>bug. The 'illegal' was mainly a side point. I know that gcc
>2.953 is not C99 conformant, and I was surprised that it accepted
>it and executed it. Just sloppy.

No surprise at all: VLAs have been a GNU C feature for a very long time.
The problem is that GNU C VLAs have different semantics from C99 VLAs
and, therefore, the support for C99 VLAs is marked as "broken" in the
gcc C99 conformance document: http://gcc.gnu.org/c99status.html

jacob navia

unread,
Mar 26, 2002, 1:00:58 PM3/26/02
to
Your code compiles and runs under lcc-wiN32 with no warnings. It is legal
C99 code.

Dan Pop

unread,
Mar 27, 2002, 7:38:23 AM3/27/02
to
In <a7qd0r$g3q$1...@wanadoo.fr> "jacob navia" <ja...@jacob.remcomp.fr> writes:

>Your code compiles and runs under lcc-wiN32 with no warnings. It is legal
>C99 code.

Does lcc-wiN32 claim to be a conforming C99 translator?

Nobody disputed the fact that the code was legal C99 code. But, by the
OP's admission, it was not compiled with a conforming C99 compiler.
Furthermore, the compiler actually used supported that C99 feature
in a non-conforming way.

Laurent Deniau

unread,
Mar 28, 2002, 11:29:33 PM3/28/02
to
Hans-Bernhard Broeker wrote:
>
> In comp.lang.c.moderated Laurent Deniau <Laurent...@cern.ch> wrote:
>
> > I was testing performance of sscanf vs fscanf and I found that the
> > behavior of sscanf is quite strange (gcc 2.95.2 i386).
>
> Please note that the compiler version has nothing to do with the
> performance of libc functions, unless those happen to be replaced by

Yes I know, except that in the glibc you have defines that optimize part of the
code when compiled with gcc instead of another compiler. That was just to fix
part of the context, nothing more.

> inline versions, by the compiler. Which almost certainly won't ever
> be the case for sscanf().

True, functions with va_list argument are never inlined for implementation
reasons.



> > The C99 norm doesn't say anything about this strange behavior, so I
> > conclude to a bug in the glibc...
>
> Definitely not a bug --- it can't be because the documentation doesn't

You will find my conclusions in my other post (not connected to this thread
because this post was not yet visible when I have answered few minutes after).

> say *anything* about the timing behaviour of sscanf. Where no
> definition of "correct" behaviour exists, no observed behaviour,
> however bad it may seem, can be incorrect, either.
>
> So this is a quality-of-implementation issue, at most, and it's not
> restricted to glibc. I've just found O(n^2) on another platform with
> a different libc, too.

True, but users expect that sscanf runs at least as fast as fscanf, which is not
the case on many implementation. For a string of about 1000 characters, sscanf
starts to be slower (on my computer) than fscanf and I don't find it to be an
extremely long string or very specific case. BTW, the norm says that "sscanf is
equivalent to fscanf" with no more precision.

> BTW: if I change the call from sscanf() to strtod(), the run time
> becomes O(n) in all platforms I cared to test. I.e. the difference
> must happen in sscanf() itself, and be caused by the length of the
> remainder of the input string following the matched 3.14.

No. As I said, the scanner itself is the same for both. So if you remove the
exceeding strlen() in sscanf (by providing the length yourself for example), you
get the same performance results. About strtod, the norm specifies that %f
behaves like strtod. So no matter with my problem otherwise it would have been a
non-conformity.

> [...]
> > {
> > size_t const max_item = strtoul(argc == 2 ? argv[1] : "10000", NULL, 0);
>
> A bug in your testing program, I think. You'll want to test for
> argc>=2 instead.

No, argc == 2 is enough for what I wanted to show. It is not intended to be a
reference for other programmers neither a course on C.

> > size_t const item_size = strlen(PI) + (argc != 3);
> > size_t const max_size = max_item * item_size;
> > char str[max_size + 1];
>
> For full ANSI C89 conformity, you'd better change this to use
> malloc()'ed storage. Variable-length arrays are a C99-ism.

The norm is actually C99 since it came out. C89 is obsolete. I know perfectly
that gcc doesn't support C99 at all, but if I can use some features that behaves
for a given context like the norm says, then I do it. Especially here where I
wanted something short like for an example (my real code take care about all
these stuffs). If I had used malloc as you suggest, I would have done it quickly
and dirty and therefore I would have got some posts telling me that malloc must
be cast (or not), malloc must be checked, memory must be freed, this action is
illegal C89, etc, etc...

Thanks for you answer,

a+, ld.

--
[ Laurent Deniau -- Scientific Computing & Data Analysis ]
[ CERN -- The European Laboratory for Particle Physics ]
[ Laurent...@cern.ch - http://cern.ch/Laurent.Deniau ]
[ -- One becomes old when dreams become regrets -- ]

Hans-Bernhard Broeker

unread,
Mar 29, 2002, 11:23:55 PM3/29/02
to
In comp.lang.c.moderated Laurent Deniau <Laurent...@cern.ch> wrote:
> Hans-Bernhard Broeker wrote:
>> Definitely not a bug --- it can't be because the documentation doesn't

> You will find my conclusions in my other post (not connected to this thread
> because this post was not yet visible when I have answered few minutes after).

I don't think any such other post is visible here, right now. OTOH, my post you're answering is a week old... what took you so long?


>> So this is a quality-of-implementation issue, at most, and it's not
>> restricted to glibc. I've just found O(n^2) on another platform with
>> a different libc, too.

> True, but users expect that sscanf runs at least as fast as fscanf, which is not
> the case on many implementation.

Users expect all kinds of strange things. Sometimes illogical or
outright impossible ones, too. Breaking expectations therefore cannot
constitute a bug, all on its own.

> BTW, the norm says that "sscanf is equivalent to fscanf" with no
> more precision.

And you may just have re-dicsovered one of the reasons why it wasn't
specified any more narrowly. It's none of the Standard's business to
restrict implementors' choices of how to implement the Standard
Library functions, so it cannot make any requirements about the speed
of any of those functions. The Standard only can specify what result
you get, but not when you get it.

>> BTW: if I change the call from sscanf() to strtod(), the run time
>> becomes O(n) in all platforms I cared to test. I.e. the difference
>> must happen in sscanf() itself, and be caused by the length of the
>> remainder of the input string following the matched 3.14.

> No. As I said, the scanner itself is the same for both. So if you remove the
> exceeding strlen() in sscanf (by providing the length yourself for example), you
> get the same performance results. About strtod, the norm specifies that %f
> behaves like strtod. So no matter with my problem otherwise it would have been a
> non-conformity.

Sorry, but I can't seem to follow your argument, here. What exactly is it you're
saying "No." to? And why?

What I was trying to point out was that, whatever the implementation's
reason for this admittedly unexpected large runtime of sscanf() really
is, it seems to not happen if I call strtod() instead, and it
definitely is triggered by the 'tail' of the string. This can be
demonstrated even more radically by calling

sscanf(longstring, "%f", &number);

and thus scanning the same string many times instead of letting it
grow shorter and shorter each iteration.

>> [...]
>> > {
>> > size_t const max_item = strtoul(argc == 2 ? argv[1] : "10000", NULL, 0);
>>
>> A bug in your testing program, I think. You'll want to test for
>> argc>=2 instead.

> No, argc == 2 is enough for what I wanted to show. It is not intended to be a
> reference for other programmers neither a course on C.

Sure, but this change does make the program a lot more flexible,
because you can test all combination of max_item and long_string
flags. E.g. the O(N^2) behaviour couldn't actually be demonstrated
with the program as given.

--
Hans-Bernhard Broeker (bro...@physik.rwth-aachen.de)
Even if all the snow were burnt, ashes would remain.

Laurent Deniau

unread,
Apr 2, 2002, 7:33:12 AM4/2/02
to
Hans-Bernhard Broeker wrote:
>
> In comp.lang.c.moderated Laurent Deniau <Laurent...@cern.ch> wrote:
> > Hans-Bernhard Broeker wrote:
> >> Definitely not a bug --- it can't be because the documentation doesn't
>
> > You will find my conclusions in my other post (not connected to this thread
> > because this post was not yet visible when I have answered few minutes after).
>
> I don't think any such other post is visible here, right now. OTOH, my post you're answering is a week old... what took you so long?

I have a lot of problem with this newsgroup (message or thread lost, split or
rather late, some posts reported only after few replies, etc...). I don't know
why, but I suspect the cross post with comp.lang.c.moderated making confusion to
my server/browser (this is the only ng where I have such problems over the 16 I
participate.)

> >> So this is a quality-of-implementation issue, at most, and it's not
> >> restricted to glibc. I've just found O(n^2) on another platform with
> >> a different libc, too.
>
> > True, but users expect that sscanf runs at least as fast as fscanf, which is not
> > the case on many implementation.
>
> Users expect all kinds of strange things. Sometimes illogical or
> outright impossible ones, too. Breaking expectations therefore cannot
> constitute a bug, all on its own.
>
> > BTW, the norm says that "sscanf is equivalent to fscanf" with no
> > more precision.
>
> And you may just have re-dicsovered one of the reasons why it wasn't
> specified any more narrowly. It's none of the Standard's business to
> restrict implementors' choices of how to implement the Standard
> Library functions, so it cannot make any requirements about the speed
> of any of those functions. The Standard only can specify what result
> you get, but not when you get it.

This not true. On the principle, I agree with the fact that the norm does not
specify such thing. But when you use a function and no side effects are
specified, you are expecting something normal or common or logical (however you
call it). Special behavior should be described. For example, if snprintf where
implemented like this:

- open a tmpfile
- use fprintf to format the string
- use read to move the string from the file to the buffer
- close the tmp file (which destroy it at the same time)
- return the buffer

Nothing here is against the specifications of the norm simply because it says
the same as for sscanf (aka snprintf behaves like fprintf). But implementing
snprintf like this is rather a bad way, wasting user and system time plus a file
descriptor during the task. And once all these side effects are know by the
user, do you expect that snprintf will still be used in the same way (like for
converting a single float to a string)? If I discover such thing and if some
simple bench shows a big slow down comparing to expected behavior (like the one
of fprintf), I would probably not use snprintf anymore and write my own one if I
really need it.

So here, the case is the same but with sscanf vs fscanf.

> >> BTW: if I change the call from sscanf() to strtod(), the run time
> >> becomes O(n) in all platforms I cared to test. I.e. the difference
> >> must happen in sscanf() itself, and be caused by the length of the
> >> remainder of the input string following the matched 3.14.
>
> > No. As I said, the scanner itself is the same for both. So if you remove the
> > exceeding strlen() in sscanf (by providing the length yourself for example), you
> > get the same performance results. About strtod, the norm specifies that %f
> > behaves like strtod. So no matter with my problem otherwise it would have been a
> > non-conformity.
>
> Sorry, but I can't seem to follow your argument, here. What exactly is it you're
> saying "No." to? And why?

I mean that the scanner itself is the same in the case of glibc (and most lib
implementation). And considering the norm, what you say is obvious. The only
point that I wanted to underline is that the time lost is not related to the
remaining characters after conversion since the scanner of sscanf and strtod
both stop at the same place in the string, that is when %f does not match
anymore characters and this is the correct behavior for both fscanf and sscanf.
But sscanf computes *first* the length of the string and that is the problem I
mention.

> What I was trying to point out was that, whatever the implementation's
> reason for this admittedly unexpected large runtime of sscanf() really
> is, it seems to not happen if I call strtod() instead, and it
> definitely is triggered by the 'tail' of the string. This can be

Again, not the tail but the length. sscanf scanner has the correct behavior
since most of the time it is the scanner of fscanf...

So to summarize my question and answer, I was just mentionning that sscanf
implemented in term of fscanf makes it O(n^2) instead of O(n) for long string
because fscanf needs to know the buffer length and therefore sscanf has to
compute the string length. If sscanf gets its own scanner or if the relation
between sscanf and fscanf would have been different, then there is no problem to
make sscanf in O(n). That is why my first post (before I found the use of strlen
in sscanf) was expecting a bug in the glibc. And before reporting a possible
bug, I wanted to be sure that I hadn't a misunderstanding of the problem. In the
mid time I found what was the real problem which is definitely not a bug (as I
said in my second post) but an implementation problem (which does not rely on
this ng).

0 new messages