
case-insensitive qsort?


DFS

Jun 22, 2017, 4:09:04 PM

int compare (const void * a, const void * b)
{return strcmp((const char *)a, (const char *)b);}


Is there an easy way to convert the chars to all lower (or upper)?

I couldn't find one online.

Melzzzzz

Jun 22, 2017, 4:15:14 PM
What about strcasecmp?

--
press any key to continue or any other to quit...

DFS

Jun 22, 2017, 4:28:29 PM
On 6/22/2017 4:15 PM, Melzzzzz wrote:
> On 2017-06-22, DFS <nos...@dfs.com> wrote:
>>
>> int compare (const void * a, const void * b)
>> {return strcmp((const char *)a, (const char *)b);}
>>
>>
>> Is there an easy way to convert the chars to all lower (or upper)?
>>
>> I couldn't find one online.
>
> What about strcasecmp?


That's Windows only, as far as I can tell.


Melzzzzz

Jun 22, 2017, 4:34:59 PM
Hm, I use it on Linux all the time...

James R. Kuyper

Jun 22, 2017, 4:37:02 PM
The tolower() and toupper() functions can convert one character at a
time. You'll have to write your own code to loop over all the characters
in a string.
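
A minimal sketch of such a loop, assuming the string may be modified in place (the name str_to_lower is hypothetical, not something from the thread). The cast to unsigned char is there because passing a negative char value to tolower() is not defined:

```c
#include <ctype.h>

/* Lower-case a NUL-terminated string in place and return it.
   The cast to unsigned char matters: passing a negative char
   value (other than EOF) to tolower() is undefined behaviour. */
char *str_to_lower(char *s)
{
    for (char *p = s; *p != '\0'; p++)
        *p = (char)tolower((unsigned char)*p);
    return s;
}
```

Converting a copy rather than the original would preserve the case of the stored strings, at the cost of extra memory.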

James R. Kuyper

Jun 22, 2017, 4:42:15 PM
The man page on my work system says that the version of strcasecmp()
that is currently installed there conforms to 4.4BSD and POSIX.1-2001.


bartc

Jun 22, 2017, 4:46:15 PM
There are tolower() and toupper(). It's easy to create a function based
on those to convert a whole string, if one isn't provided.

But doing this on each compare is time consuming, and you have to find
space to put the converted strings. Even doing it one-time on the whole
table is problematical, assuming you want to preserve case in the result.

Perhaps the simplest approach is to write your own version of strcasecmp
(call it something different), which does a character-at-a-time
conversion before the compare. But it will be slower than a regular strcmp.

Something like this perhaps (not fully tested, and slow):

int strcmp_lc(char* s, char* t) {
char c,d;

while (1) {
c=tolower(*s++);
d=tolower(*t++);

if (c && d) {
if (c<d)
return -1;
else if (c>d) {
return 1;
}
}
else if (c){
return 1;
else if (d)
return -1;
else
return 0;
}
}

--
bartc

Rick C. Hodgin

Jun 22, 2017, 4:58:39 PM
Write your own. I'm not sure what the logic is for comparing strings
of different lengths, but this will compare up to the length of the
minimum value, and use bartc's logic for the final bit:

int stricmp(const char *a, const char *b)
{
char ca, cb;

// Are we valid
if (a && b)
{
// Iterate for each character that matches each string
do
{
// Grab our characters in lower-case
ca = tolower(*a);
cb = tolower(*b);

if (ca == cb) {
// They're equal, no code here, just keep going

} else if (ca < cb) {
return(-1);

} else {
return(1);
}

} while (*a++ && *b++);
// If we get here, they matched so far

// Based on which one remains, the size
if (ca) return(1); // Greater than
else if (cb) return(-1); // Less than
else return(0); // Equal

// Signal equal
return(0);
}

// If we get here, invalid
return(-2);
}

Thank you,
Rick C. Hodgin

Richard Heathfield

Jun 22, 2017, 5:11:34 PM
On 22/06/17 21:09, DFS wrote:
>
> int compare (const void * a, const void * b)
> {return strcmp((const char *)a, (const char *)b);}
>
>
> Is there an easy way to convert the chars to all lower (or upper)?

http://c-faq.com/~scs/cclass/int/sx10b.html

--
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within

Rick C. Hodgin

Jun 22, 2017, 5:12:19 PM
On Thursday, June 22, 2017 at 4:58:39 PM UTC-4, Rick C. Hodgin wrote:
> [snip]

A version without the extra pass:

int stricmp(const char *a, const char *b)
{
char ca, cb;

// Are we valid
if (a && b)
{
// Iterate for each character that matches each string
do
{
// Grab our characters in lower-case
ca = tolower(*a);
cb = tolower(*b);

if (ca == cb) {
// They're equal, no code here, just keep going

} else if (ca < cb) {
return(-1);

} else {
return(1);
}

// Move to next position in both pointers
++a;
++b;

} while (*a && *b);
// If we get here, they matched so far

// Based on final position, their size indicates
if (*a == *b) return(0); // Equal
else if (*a) return(1); // Greater than
else if (*b) return(-1); // Less than
else return(0); // Equal
}

// If we get here, invalid
return(-1);
}

jak

Jun 22, 2017, 5:17:35 PM
stricmp

Keith Thompson

Jun 22, 2017, 5:41:43 PM
stricmp is non-standard. strcasecmp is at least defined by POSIX, but
not by ISO C.

Note that, unless you make some simplifying assumptions like ASCII-only,
mapping a string to upper or lower case can be very complicated (think
about accented letters, the German Eszett (Unicode LATIN SMALL LETTER
SHARP S, 'ß'), and so on).

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

James R. Kuyper

Jun 22, 2017, 5:55:55 PM
On 06/22/2017 04:09 PM, DFS wrote:
>
> int compare (const void * a, const void * b)
> {return strcmp((const char *)a, (const char *)b);}

The casts are unnecessary - the exact same conversions will occur
implicitly if you leave them out, so long as there's a function
prototype for strcmp() in scope.
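
As a sketch of the cast-free form, assuming the table being sorted is a two-dimensional char array with fixed-width rows (NAME_LEN, compare_rows and sort_names are hypothetical names chosen for illustration):

```c
#include <stdlib.h>
#include <string.h>

#define NAME_LEN 16   /* hypothetical fixed row width */

/* qsort passes pointers to elements; each element here is a char[NAME_LEN],
   so a and b already point at the first character of a string.
   void * converts to const char * implicitly: no casts needed. */
static int compare_rows(const void *a, const void *b)
{
    return strcmp(a, b);
}

/* Sort count fixed-width rows in place. */
void sort_names(char names[][NAME_LEN], size_t count)
{
    qsort(names, count, sizeof names[0], compare_rows);
}
```

With an array of char * pointers instead, the comparator would receive pointers to those pointers and would need one dereference before calling strcmp().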

supe...@casperkitty.com

Jun 22, 2017, 6:01:15 PM
On Thursday, June 22, 2017 at 4:41:43 PM UTC-5, Keith Thompson wrote:
> Note that, unless you make some simplifying assumptions like ASCII-only,
> mapping a string to upper or lower case can be very complicated (think
> about accented letters, the German Eszett (Unicode LATIN SMALL LETTER
> SHARP S, 'ß'), and so on.

Unless you make simplifying assumptions like ASCII-only, sorting strings in
"human-readable" order is apt to be a major headache whether or not you try
to merge upper and lower case. ASCII-only case-insensitive comparison
functions can be reasonably practical and efficient, but if support for non-
ASCII strings will be required I'd suggest transforming each string into an
int[] or long[] such that strings that should compare equal map to equal
sequences of numbers, and the first mismatch will indicate which string should
compare first. Otherwise the logic to handle all the weird sort cases on
every comparison would slow things down and make it more complicated.
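
The standard library offers something in the same spirit: strxfrm() transforms a string once into a key such that strcmp() on two keys orders them the way strcoll() would order the originals in the current locale. A rough sketch, assuming keys are built once before sorting (struct keyed, keyed_init and keyed_compare are hypothetical names). Note that strxfrm() follows the locale's collation rules; it does not by itself promise that upper and lower case are merged:

```c
#include <stdlib.h>
#include <string.h>

/* One sortable record: the original text plus its precomputed key. */
struct keyed {
    const char *text;  /* original string, left untouched */
    char *key;         /* strxfrm() output, owned by this record */
};

/* Build the key once per string; returns 0 on success, -1 on failure.
   The caller is responsible for free()ing k->key afterwards. */
int keyed_init(struct keyed *k, const char *text)
{
    size_t need = strxfrm(NULL, text, 0) + 1;  /* length query */
    k->text = text;
    k->key = malloc(need);
    if (k->key == NULL)
        return -1;
    strxfrm(k->key, text, need);
    return 0;
}

/* Comparator for qsort: plain strcmp on the precomputed keys. */
int keyed_compare(const void *a, const void *b)
{
    const struct keyed *x = a, *y = b;
    return strcmp(x->key, y->key);
}
```

The transformation cost is paid once per string rather than once per comparison, which is the point of the suggestion above.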

jak

Jun 22, 2017, 6:07:15 PM
Il 22/06/2017 23:41, Keith Thompson ha scritto:
> jak <ple...@nospam.tnx> writes:
>> Il 22/06/2017 22:09, DFS ha scritto:
>>> int compare (const void * a, const void * b)
>>> {return strcmp((const char *)a, (const char *)b);}
>>>
>>> Is there an easy way to convert the chars to all lower (or upper)?
>>>
>>> I couldn't find one online.
>>
>> stricmp
>
> stricmp is non-standard. strcasecmp is at least defined by POSIX, but
> not by ISO C.
>
> Note that, unless you make some simplifying assumptions like ASCII-only,
> mapping a string to upper or lower case can be very complicated (think
> about accented letters, the German Eszett (Unicode LATIN SMALL LETTER
> SHARP S, 'ß'), and so on.
>
Yes, but I suppose that if the OP could not find strcasecmp, he was probably
working in a Windows environment. I just suggested an obvious alternative :)

DFS

Jun 22, 2017, 6:19:26 PM
All the examples I found online were like that.

This worked without a strcasecmp prototype:

int comparechar (const void * a, const void * b)
{return strcasecmp(a,b);}


TinyC compiler Windows


jak

Jun 22, 2017, 6:26:34 PM
...and qsort works fine? I think you need to write it this way:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int cmp(const void *, const void *);

int main()
{
    char *str[] = {
        "pippo",
        "pluto",
        "paperino",
        "pisolo",
        "bruttolo",
        "nannolo",
        "scemolo",
        "porcolo",
        "paperino",
        "paperone",
        "minni"
    };

    int i, sz = sizeof(str) / sizeof(str[0]);

    for(i = 0; i < sz; i++)
        printf("%s\n", str[i]);

    qsort(str, sz, sizeof(str[0]), cmp);
    printf("---------------------\n");

    for(i = 0; i < sz; i++)
        printf("%s\n", str[i]);

    return 0;
}

int cmp(const void *a, const void *b)
{
    return stricmp(*(const char **)a, *(const char **)b);
}

James R. Kuyper

Jun 22, 2017, 6:43:26 PM
On 06/22/2017 06:20 PM, DFS wrote:
> On 6/22/2017 5:55 PM, James R. Kuyper wrote:
>> On 06/22/2017 04:09 PM, DFS wrote:
>>>
>>> int compare (const void * a, const void * b)
>>> {return strcmp((const char *)a, (const char *)b);}
>>
>> The casts are unnecessary - the exact same conversions will occur
>> implicitly if you leave them out, so long as there's a function
>> prototype for strcmp() in scope.
>
>
> All the examples I found online were like that.

You'll find a lot of bad code online. It is, in general, a lot easier to
find than good code.

> This worked without a strcasecmp prototype:
>
> int comparechar (const void * a, const void * b)
> {return strcasecmp(a,b);}
>
>
> TinyC compiler Windows

In C90, if an unrecognized identifier was used in a context where a
function call was allowed, the identifier was assumed to be a function
returning 'int'. The default argument promotions (6.5.2.2p6) were
performed on the arguments, and it was simply assumed that the result of
those promotions was compatible with the types of the parameters in the
actual function definition. If both of those assumptions turned out to
be correct, such code had well-defined behavior. Otherwise, the behavior
was undefined. This is part of the feature called "implicit int".

I first learned C in 1979, and my teacher was already telling us that
relying on implicit int was a bad idea; I never did so deliberately. It
was removed in C99 - code like that is now a constraint violation. Which
is one of the reasons why you should compile your code using command
line options putting your compiler into C99 mode, if it has any.

The successful execution of your code also relies upon the fact that
char* and void* have exactly the same representation and alignment
requirements (6.2.5p28). Footnote 48 says: "The same representation and
alignment requirements are meant to imply interchangeability as
arguments to functions, return values from functions, and members of
unions.". In practice, that usually works out, as it did in your case.

However, footnotes are not normative text, and it's entirely possible
for an implementation to meet 6.2.5p28's requirements without making
them interchangeable. For example, the platform's calling conventions
could specify that void* arguments are passed in even-numbered
registers, while char* arguments are passed in odd-numbered registers. I
point this out, not because it's likely - there's no good reason I can
think of why any implementation would do anything like that. I point it
out because the standard permits it, and I would prefer the standard be
changed to mandate the interchangeability it mentions in footnote 48. It
also mentions interchangeability in footnotes 41 and 258, and I would
like to see interchangeability be mandated in those cases as well.

Richard Heathfield

Jun 22, 2017, 6:49:07 PM
On 22/06/17 23:20, DFS wrote:
> On 6/22/2017 5:55 PM, James R. Kuyper wrote:
>> On 06/22/2017 04:09 PM, DFS wrote:
>>>
>>> int compare (const void * a, const void * b)
>>> {return strcmp((const char *)a, (const char *)b);}
>>
>> The casts are unnecessary - the exact same conversions will occur
>> implicitly if you leave them out, so long as there's a function
>> prototype for strcmp() in scope.
>
>
> All the examples I found online were like that.

<shrug> The casts are unnecessary, *even though* all the examples you
found online were like that.

> This worked without a strcasecmp prototype:
>
> int comparechar (const void * a, const void * b)
> {return strcasecmp(a,b);}

Even so, it's a good idea to provide a prototype for every function you
use. Otherwise, the compiler can't do its type-checking.

Ike Naar

Jun 22, 2017, 6:53:25 PM
On 2017-06-22, bartc <b...@freeuk.com> wrote:
> Something like this perhaps (not fully tested, and slow):
>
> int strcmp_lc(char* s, char* t) {
> char c,d;
>
> while (1) {
> c=tolower(*s++);
> d=tolower(*t++);
>
> if (c && d) {
> if (c<d)
> return -1;
> else if (c>d) {
> return 1;
> }
> }
> else if (c){
> return 1;
> else if (d)
> return -1;
> else
> return 0;
> }
> }

Definitely not tested.
The curly braces don't match.

bartc

Jun 22, 2017, 7:07:36 PM
C has a problem with curly braces. I had to add some to avoid a dangling
else problem.

And added more when I needed to add a printf between the first 'else if'
and 'return 1'. When printf and extra braces were removed (on the posted
code), one { was left behind. On the 'if (c){' line.
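
For reference, a sketch of the same function with the stray '{' removed so that it compiles. Two further changes are assumptions of this sketch and not part of the posted code: the parameters are const, and the characters are read as unsigned char before being handed to tolower():

```c
#include <ctype.h>

/* strcmp_lc with the stray '{' removed.  Returns -1, 0 or 1. */
int strcmp_lc(const char *s, const char *t)
{
    const unsigned char *p = (const unsigned char *)s;
    const unsigned char *q = (const unsigned char *)t;

    while (1) {
        int c = tolower(*p++);
        int d = tolower(*q++);

        if (c && d) {
            if (c < d)
                return -1;
            else if (c > d)
                return 1;
            /* equal so far: keep going */
        }
        else if (c)
            return 1;    /* t ended first */
        else if (d)
            return -1;   /* s ended first */
        else
            return 0;    /* both ended together */
    }
}
```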

--
bartc

Ben Bacarisse

Jun 22, 2017, 7:56:19 PM
Others have explained that it's not, but if you need to write your own I
think those on offer so far are not ideal.

First off, you need to use unsigned char values because char may be a
signed integer type and tolower is not defined for any negative
arguments other than EOF. Secondly, the examples I've seen seem
determined to avoid writing a simple loop.

Here's one possible version:

int caseless_compare(const void *p1, const void *p2)
{
    const unsigned char *s1 = p1, *s2 = p2;
    int d;
    while ((d = tolower(*s1) - tolower(*s2)) == 0 && *s1) s1++, s2++;
    return d;
}

It needs a little thought to check that it's right (here I am making
myself a hostage to fortune again!), especially the asymmetry of the
test for null bytes, but I find mentally verifying it simpler than other
versions I've seen so far.

--
Ben.

Robert Wessel

Jun 22, 2017, 7:56:27 PM
On Thu, 22 Jun 2017 15:01:07 -0700 (PDT), supe...@casperkitty.com
wrote:
That's not nearly adequate to do a proper job.

Even ignoring locales, does "ant farm" sort before or after
"anteater"? It depends on the application.

In some locales (French, for example), accented letters sort identically
to their base letters, sometimes with a tiebreaker rule if two words are
identical except for accents (and that rule varies: in French it's the
last accented character in a word which determines the tiebreaker).

In other cases an accented character sorts independently, as if it were
a different letter. So in Czech Á sorts just after A, but in
Estonian Ä sorts after the W (along with a few other accented vowels).
The rules for upper and lower case letters may be different.

Things like ligatures (Æ) are treated differently (in English, you'd
sort it as the pair of characters, AE, in some other languages you'd
treat it as a single character, with some specific sort position).

The rules often differ based on application - German usually sorts
accented characters with the base character, but in things like phone
books, they're sorted as the base letter plus an E (IOW Ä sorts as
AE), so that Mr. Müller and Mr. Mueller appear together in the phone
book.

Then you have things like Mac and Mc in names, and dozens, if not
hundreds of other rules specific to the language, location and
application.

bartc

Jun 22, 2017, 8:24:40 PM
On 23/06/2017 00:56, Ben Bacarisse wrote:
> DFS <nos...@dfs.com> writes:

>> That's Windows only, as far as I can tell.
>
> Others have explained that it's not, but if you need to write your own I
> think those on offer so far are not ideal.
>
> First off, you need to use unsigned char values because char may be a
> signed integer type and tolower is not defined for any negative
> arguments other than EOF. Secondly, the examples I've seen seem
> determined to avoid writing a simple loop.
>
> Here's one possible version:
>
> int caseless_compare(const void *p1, const void *p2)
> {
> const unsigned char *s1 = p1, *s2 = p2;
> int d;
> while ((d = tolower(*s1) - tolower(*s2)) == 0 && *s1) s1++, s2++;
> return d;
> }
>
> It needs a little thought to check that it's right (here I am making
> myself a hostage to fortune again!), especially the asymmetry of the
> test for null bytes, but I find mentally verifying it simpler than other
> versions I've seen so far.

It passes the same sorts of tests I did on mine. But the results are a
little funny as it returns -x,0,y instead of -1,0,1.

But while it's shorter, mine was a few % faster when compiled with
gcc-O3. And a few % slower with another compiler. So not much in it.

--
bartc

dfs

Jun 22, 2017, 8:36:07 PM
old school! Coming up on 40 years... wow.



> and my teacher was already telling us that
> relying on implicit int was a bad idea; I never did so deliberately. It
> was removed in C99 - code like that is now a constraint violation. Which
> is one of the reasons why you should compile your code using command
> line options putting your compiler into C99 mode, if it has any.

As of May 2009:
"3.2 ISOC99 extensions
TCC implements many features of the new C standard: ISO C99. Currently
missing items are: complex and imaginary numbers and variable length
arrays."

https://bellard.org/tcc/tcc-doc.html#SEC7

There is no C99 option, but I found:

'-Wimplicit-function-declaration'
Warn about implicit function declaration.

I also noticed tcc has a -Wall option, which I haven't been using but
will begin to (not sure why but I assumed all warnings was the default
on tcc).



I've booted over to Linux, and compiling my program thusly:

gcc pivot.c -Wall -std=c99 -lsqlite3 -o pivot

throws "warning: implicit declaration of function ‘strcasecmp’; did you
mean ‘strncmp’?"

Remove the -std=c99 switch and the warning goes away.

Put the -std=c99 switch back in, and put the prototype in place:
int strcasecmp(const char *s1, const char *s2);

and no warning. As you originally said.



> The successful execution of your code also relies upon the fact that
> char* and void* have exactly the same representation and alignment
> requirements (6.2.5p28). Footnote 48 says: "The same representation and
> alignment requirements are meant to imply interchangeability as
> arguments to functions, return values from functions, and members of
> unions.". In practice, that usually works out, as it did in your case.

Gotcha. Your answers are great, James. I appreciate the time you take
(I appreciate everyone's help, of course).

Unfortunately, I won't remember it all tomorrow :)

Keith Thompson

Jun 22, 2017, 9:15:52 PM
dfs <nos...@dfs.com> writes:
[...]
> I've booted over to Linux, and compiling my program thusly:
>
> gcc pivot.c -Wall -std=c99 -lsqlite3 -o pivot
>
> throws "warning: implicit declaration of function "strcasecmp"; did you
> mean "strncmp"?"
>
> Remove the -c99 switch and the warning goes away.
>
> Put the -c99 switch back in, and put the prototype in place:
> int strcasecmp(const char *s1, const char *s2);
>
> and no warning. As you originally said.

If you use a library function, include the header that declares it.
Don't waste time trying to figure out how to get away with not doing so.
Don't rewrite the declaration yourself.

"man strcasecmp" says that strcasecmp is declared in <strings.h> (not to
be confused with the C standard header <string.h>). POSIX says the
same.
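
A short sketch of that advice applied to the qsort case, assuming a POSIX system and an array of char * elements (compare_nocase and sort_nocase are hypothetical names):

```c
#include <stdlib.h>
#include <strings.h>   /* POSIX header that declares strcasecmp() */

/* Comparator for an array of char * elements: qsort hands over pointers
   to the elements, so each argument is really a const char *const *. */
static int compare_nocase(const void *a, const void *b)
{
    const char *const *x = a;
    const char *const *y = b;
    return strcasecmp(*x, *y);
}

/* Sort an array of string pointers, ignoring case. */
void sort_nocase(const char **items, size_t count)
{
    qsort(items, count, sizeof items[0], compare_nocase);
}
```

With the header included there is no hand-written prototype to keep in sync, and the compiler can check the call.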

Ben Bacarisse

Jun 22, 2017, 9:30:53 PM
bartc <b...@freeuk.com> writes:

> On 23/06/2017 00:56, Ben Bacarisse wrote:
>> DFS <nos...@dfs.com> writes:
>
>>> That's Windows only, as far as I can tell.
>>
>> Others have explained that it's not, but if you need to write your own I
>> think those on offer so far are not ideal.
>>
>> First off, you need to use unsigned char values because char may be a
>> signed integer type and tolower is not defined for any negative
>> arguments other than EOF. Secondly, the examples I've seen seem
>> determined to avoid writing a simple loop.
>>
>> Here's one possible version:
>>
>> int caseless_compare(const void *p1, const void *p2)

(This was written as a function to pass to qsort, hence the use of void *
rather than char *.)

>> {
>> const unsigned char *s1 = p1, *s2 = p2;
>> int d;
>> while ((d = tolower(*s1) - tolower(*s2)) == 0 && *s1) s1++, s2++;
>> return d;
>> }
>>
>> It needs a little thought to check that it's right (here I am making
>> myself a hostage to fortune again!), especially the asymmetry of the
>> test for null bytes, but I find mentally verifying it simpler than other
>> versions I've seen so far.
>
> It passes the same sorts of tests I did on mine. But the results are a
> little funny as it returns -x,0,y instead of -1,0,1.

I'm not sure what's funny about that. That's how strcmp et al. are
specified so it's reasonable to use that specification.
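
If a caller really does want exactly -1, 0 or 1, the result can be normalised afterwards. A tiny sketch (sign_of is a hypothetical helper, not something from the thread):

```c
/* Collapse any strcmp-style result to exactly -1, 0 or 1. */
int sign_of(int d)
{
    return (d > 0) - (d < 0);
}
```

It would be applied as sign_of(caseless_compare(x, y)); qsort itself only looks at the sign, so it needs no such step.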

> But while it's shorter, mine was a few % faster when compiled with
> gcc-O3. And a few % slower with another compiler. So not much in it.

What is yours? Can you post the version you actually used since using
char is problematic and the {}s were peculiar (though I think I can
just correct that)?

--
Ben.

Rick C. Hodgin

Jun 22, 2017, 10:53:55 PM
On Thursday, June 22, 2017 at 9:30:53 PM UTC-4, Ben Bacarisse wrote:
> bartc <b...@freeuk.com> writes:
> > But while it's shorter, mine was a few % faster when compiled with
> > gcc-O3. And a few % slower with another compiler. So not much in it.
>
> What is yours? Can you post the version you actually used since using
> char is problematic and the {}s were peculiar (though I think I can
> just correct that)?

I tested three algorithms that I call: Rick, Bart, Ben. My algorithm
was using the second one I posted above.

The timing is just the 3 or 4 most significant digits of the time
required to iterate 50,000,000 times on five compares using these values:

    rick     sammi    -- less than
    rick     rick     -- equal
    ricker   rick     -- greater than
    rick     ricker   -- less than
    rick     alex     -- greater than

The results below are actually 4.53 seconds, 5.52 seconds, and 4.50
seconds, etc. These are the results on these compilers:

32-bit Watcom C Compiler:
    Release -- Rick =  453
               Bart =  552
               Ben  =  450
    Debug ---- Rick = 6124
               Bart = 7005
               Ben  = 6517

64-bit Microsoft Visual Studio 2015:
    Release -- Rick =  280
               Bart =  335
               Ben  =  326
    Debug ---- Rick =  842
               Bart = 1033
               Ben  =  992

64-bit Microsoft Visual Studio 2010:
    Release -- Rick =  275
               Bart =  314
               Ben  =  340
    Debug ---- Rick = 6501
               Bart = 8350
               Ben  = 6885

32-bit Microsoft Visual Studio 2015:
    Release -- Rick =  293
               Bart =  386
               Ben  =  379
    Debug ---- Rick = 1195
               Bart = 1609
               Ben  = 1376

32-bit Microsoft Visual Studio 2010:
    Release -- Rick =  338
               Bart =  340
               Ben  =  344
    Debug ---- Rick =  945
               Bart = 1239
               Ben  = 1041

Tiny C Compiler Version 0.9.26 (x86-64 Win64)
               Rick = 4301
               Bart = 5521
               Ben  = 4537

-----
Overall average:
    Release -- Rick =  328
               Bart =  385
               Ben  =  368
    Debug ---- Rick = 3121
               Bart = 3847
               Ben  = 3362

FWIW, I'm surprised and amazed.

Pascal J. Bourguignon

Jun 22, 2017, 10:55:02 PM
Well this is a good function, for 1980.

But when you're not back from the future, you need to deal with unicode.
This means, before sorting, you need to convert the vectors of octets
into vectors of wchar_t with mbsrtowcs(). Then you can use wcscoll()
to compare them according to the locale.
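
A rough sketch of that route, assuming the program has already called setlocale(LC_ALL, "") so that the user's locale is in effect (to_wide and compare_wide are hypothetical names). Note that wcscoll() orders strings by the locale's collation rules; whether that ignores case depends on the locale:

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Convert a multibyte string to a newly allocated wide string using the
   current locale's encoding.  Returns NULL on invalid input or allocation
   failure; the caller frees the result. */
wchar_t *to_wide(const char *s)
{
    mbstate_t state;
    const char *src = s;

    memset(&state, 0, sizeof state);
    size_t n = mbsrtowcs(NULL, &src, 0, &state);   /* length query */
    if (n == (size_t)-1)
        return NULL;

    wchar_t *w = malloc((n + 1) * sizeof *w);
    if (w == NULL)
        return NULL;

    memset(&state, 0, sizeof state);
    src = s;
    mbsrtowcs(w, &src, n + 1, &state);             /* real conversion */
    return w;
}

/* qsort comparator for an array of wchar_t * elements. */
int compare_wide(const void *a, const void *b)
{
    const wchar_t *const *x = a;
    const wchar_t *const *y = b;
    return wcscoll(*x, *y);
}
```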

--
__Pascal J. Bourguignon
http://www.informatimago.com

Siri Cruise

Jun 22, 2017, 11:22:26 PM
> > First off, you need to use unsigned char values because char may be a
> > signed integer type and tolower is not defined for any negative
> > arguments other than EOF. Secondly, the examples I've seen seem
> > determined to avoid writing a simple loop.
> >
> > Here's one possible version:
> >
> > int caseless_compare(const void *p1, const void *p2)
> > {
> > const unsigned char *s1 = p1, *s2 = p2;
> > int d;
> > while ((d = tolower(*s1) - tolower(*s2)) == 0 && *s1) s1++, s2++;
> > return d;
> > }


An extension is to use a collation table, which maps each character code to a
sort key. Collation tables are usually implemented as an array of integers with
one entry per character code. Then the loop condition above becomes

while ((d = collation[*s1] - collation[*s2]) == 0 && *s1)

Where for an 8 bit character,

unsigned collation[256] = {

You can sort digits greater than letters, like EBCDIC with entries

['0'] = 1000, ['1'] = 1001, ...,

and case insensitive with

['a'] = 'a', ['b'] = 'b', ...,
['A'] = 'a', ['B'] = 'b', ...,

and equate characters that aren't treated as distinct with

[' '] = ' ', ['\t'] = ' ', ['\n'] = ' ', ...,

};
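
Pulled together into something compilable, the idea might look like the sketch below. It fills the table at run time instead of with designated initialisers, which is a substitution made here for brevity; init_collation and collate_compare are hypothetical names, and the table must be initialised before the first compare:

```c
#include <ctype.h>
#include <limits.h>

static unsigned collation[UCHAR_MAX + 1];

/* Fill the table once: letters fold to lower case, digits get keys above
   every letter, and tab and newline share the key of ' '.  Everything
   else keeps its own character code. */
void init_collation(void)
{
    for (int c = 0; c <= UCHAR_MAX; c++)
        collation[c] = (unsigned)tolower(c);
    for (int c = '0'; c <= '9'; c++)
        collation[c] = 1000u + (unsigned)(c - '0');
    collation['\t'] = ' ';
    collation['\n'] = ' ';
}

/* Compare two strings by collation key instead of raw character code. */
int collate_compare(const char *a, const char *b)
{
    const unsigned char *s1 = (const unsigned char *)a;
    const unsigned char *s2 = (const unsigned char *)b;
    int d;
    while ((d = (int)collation[*s1] - (int)collation[*s2]) == 0 && *s1)
        s1++, s2++;
    return d;
}
```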

> Well this is a good function, for 1980.
>
> But when you're not back from the future, you need to deal with unicode.
> This means, before sorting, you need to convert the vectors of octets
> into vectors of wchar_t with mbsrtowcs(). Then you can use wcscoll()
> to compare them according to the locale.

You can create a collation array for unicode as well. You might also want to
normalise to NFC or NFD before sorting. Where I work has been slapped by
comparing NFC from the MacOSX dirents and NFD from PHP.

--
:-<> Siri Seal of Disavowal #000-001. Disavowed. Denied. Deleted. @
'I desire mercy, not sacrifice.' /|\
Free the Amos Yee one. This post / \
Yeah, too bad about your so-called life. Ha-ha. insults Islam. Mohammed

bartc

Jun 23, 2017, 5:43:29 AM
On 23/06/2017 03:54, Pascal J. Bourguignon wrote:
> Ben Bacarisse <ben.u...@bsb.me.uk> writes:

>> int caseless_compare(const void *p1, const void *p2)
>> {
>> const unsigned char *s1 = p1, *s2 = p2;
>> int d;
>> while ((d = tolower(*s1) - tolower(*s2)) == 0 && *s1) s1++, s2++;
>> return d;
>> }

> Well this is a good function, for 1980.
>
> But when you're not back from the future, you need to deal with unicode.

Does strcmp() deal with Unicode?

If not, then the function discussed here is simply a version of strcmp
that ignores case.

And case conversion is defined, by most of the examples I've seen, by
C's tolower() and toupper() functions.


--
bartc

bartc

Jun 23, 2017, 6:01:15 AM
On 23/06/2017 03:53, Rick C. Hodgin wrote:
> On Thursday, June 22, 2017 at 9:30:53 PM UTC-4, Ben Bacarisse wrote:
>> bartc <b...@freeuk.com> writes:
>>> But while it's shorter, mine was a few % faster when compiled with
>>> gcc-O3. And a few % slower with another compiler. So not much in it.
>>
>> What is yours? Can you post the version you actually used since using
>> char is problematic and the {}s were peculiar (though I think I can
>> just correct that)?
>
> I tested three algorithms that I call: Rick, Bart, Ben. My algorithm
> was using the second one I posted above.
>
> The timing is just the 3 or 4 most significant digits of the time re-
> quired to iterate 50,000,000 times on five compares using these values:
>
> rick sammi -- less than
> rick rick -- equal
> ricker rick -- greater than
> rick ricker -- less than
> rick alex -- greater than

> Overall average:
> Release -- Rick = 328
> Bart = 385
> Ben = 368
> Debug ---- Rick = 3121
> Bart = 3847
> Bin = 3362
>
> FWIW, I'm surprised and amazed.


I made a small change on my version, and it was 2-3 times as fast as
Ben's using gcc-O3 (but only 15% faster with gcc-O0). It replaces
tolower() with a lookup table:

char tolc[256];

void init_tolc(void){
    int i;
    for (i=0; i<256; ++i) tolc[i]=tolower(i);
}

but then needs initialisation before first use:

init_tolc();

My compare function changes to:

int strcmp_lc(char* s, char* t) {
    char c,d;

    while (1) {
        c=tolc[*s++];
        d=tolc[*t++];
        if (c && d) {
            if (c<d)
                return -1;
            else if (c>d) {
                return 1;
            }
        }
        else if (c)
            return 1;
        else if (d)
            return -1;
        else
            return 0;
    }
}

(The one simple test it does (but 100M times) is comparing "bartholomew"
with "bartolomeo", which shows we're both equally devoid of imagination...)

Of course, the fair thing to do would be to apply this tweak to Ben's
version. Which then promptly became twice the speed of mine! Which shows
that tolower() probably had significant overheads on all versions.
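
The tweaked version of Ben's function was not posted; a guess at what it might look like follows. The names lower_table, init_lower_table and caseless_compare_table are hypothetical, and the table is unsigned char here so that every index and every stored value stays in the range 0..UCHAR_MAX:

```c
#include <ctype.h>
#include <limits.h>

static unsigned char lower_table[UCHAR_MAX + 1];

/* Must be called once before caseless_compare_table() is used. */
void init_lower_table(void)
{
    for (int i = 0; i <= UCHAR_MAX; i++)
        lower_table[i] = (unsigned char)tolower(i);
}

/* Ben's loop with tolower() swapped for a table lookup.  Reading the
   strings as unsigned char keeps every index in 0..UCHAR_MAX. */
int caseless_compare_table(const void *p1, const void *p2)
{
    const unsigned char *s1 = p1, *s2 = p2;
    int d;
    while ((d = lower_table[*s1] - lower_table[*s2]) == 0 && *s1)
        s1++, s2++;
    return d;
}
```

The table snapshots whatever tolower() returns at initialisation time, so it would need rebuilding if the program later changes locale.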


--
bartc

Ben Bacarisse

Jun 23, 2017, 7:22:52 AM
More accurately the OP would need to. I was posting about replacing
strcasecmp so *I* don't need to deal with Unicode.

> This means, before sorting, you need to convert the vectors of octets
> into vectors of wchar_t with mbsrtowcs(). Then you can use wcscoll()
> to compare them according to the locale.

Does wcscoll always do case-insensitive compares? Not, I think, in the
"C" locale. Of course, we don't really know what the OP wants here --
controlling the order by using the locale might be exactly right or it
could be entirely wrong.

--
Ben.

Rick C. Hodgin

Jun 23, 2017, 7:24:01 AM
Bart wrote:
> if (c<d)
> return -1;
> else if (c>d)
> return 1;

You are doing two tests on every compare. Try reversing the
test so that if they're equal, it only does one test:

if (c ==d) { }
else if (c<d)
return -1;
else
return 1;

Should be an observable speedup.

David Brown

Jun 23, 2017, 8:07:48 AM
C does not have a problem with curly braces - some C /programmers/ make
problems for themselves with curly braces.

There are several ways to make clear, consistent rules about braces on
"if" (and "while", "for", etc.) statements in a way that makes it easy
to get it right. Unless you have reason to use something else (for
consistency with existing code, for example), you can't go wrong with
the "one true brace style".

bartc

Jun 23, 2017, 8:19:21 AM
But it's still doing two tests? One of them is the ==. Still, it made
about a 1% difference.

However, the original was thrown together as an example for the OP, and
designed to get around the lack of any existing function. It wasn't
meant to be fast.

Looking at the code again, I don't need to treat 0 characters
differently, so it could be reduced to:

int strcmp_lc(char* s, char* t) {
    char c,d;

    do {
        c=tolc[*s++];
        d=tolc[*t++];

        if (c>d) return 1;
        if (c<d) return -1;
    } while (c);
    return 0;
}

This is nearly 50% faster. But this is still bothering to return -1 and
1. Ben said this wasn't necessary, but then working along those lines,
and making use of c-d, the method just eventually morphs into his
version. So I'll keep my version as above as it's clearer.

--
bartc

Rick C. Hodgin

Jun 23, 2017, 8:24:08 AM
I tested this on my computer using 64-bit Visual Studio 2015, and
your algorithm is faster by about 2% !!, which really surprised me.
Trying the same in 32-bit code and it was about 3% slower, which
is what I would've expected.

In 64-bit Visual Studio 2010, it was within 1% up and down on
repeated tests, and the same for 32-bit code. No significant
variation using either form.

And I must note, Windows 10 is a horrible platform for benchmarking.
I'm seeing variations of 5% to 10% on repeated tests with no other
load on the system except whatever Windows is doing in the background
of its own accord, so I must take the average of a larger sample to
get accurate results. :-(
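
The harness used for these numbers was not posted. A minimal sketch of averaging over repeated runs, using only the standard clock() function, could look like this (time_once, time_mean and sample_work are hypothetical names, and sample_work merely stands in for the compare loop under test):

```c
#include <time.h>

/* Time one run of fn(iterations), in seconds as reported by clock(). */
static double time_once(void (*fn)(long), long iterations)
{
    clock_t start = clock();
    fn(iterations);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Repeat the measurement and return the mean, to smooth out noise from
   background activity.  Returns 0.0 if runs is not positive. */
double time_mean(void (*fn)(long), long iterations, int runs)
{
    double total = 0.0;
    for (int r = 0; r < runs; r++)
        total += time_once(fn, iterations);
    return runs > 0 ? total / runs : 0.0;
}

/* Hypothetical workload standing in for the compare loop under test. */
void sample_work(long n)
{
    volatile long sink = 0;
    for (long i = 0; i < n; i++)
        sink += i;
}
```

Taking the minimum over the runs instead of the mean is another common choice when the noise only ever adds time.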

bartc

Jun 23, 2017, 8:24:37 AM
Yes it has. You can choose to use {} everywhere to reduce the problems of
dangling elses, of inserting extra statements into one-statement bodies,
and of an extra or missing { being hidden because of a missing or extra {
elsewhere.

But then the {} themselves become a problem with differing placement
styles, extra clutter, and making an already too-long function look even
longer.

(And using {} everywhere doesn't help with maintaining or debugging
someone else's code that doesn't use that style.)


--
bartc

Kenny McCormack

Jun 23, 2017, 8:33:57 AM
In article <7bd4661e-fa14-46a4...@googlegroups.com>,
Rick C. Hodgin <rick.c...@gmail.com> wrote:
...
>And I must note, Windows 10 is a horrible platform.

Full stop.

--
They say compassion is a virtue, but I don't have the time!

- David Byrne -

David Brown

Jun 23, 2017, 8:48:35 AM
That may give an observable speedup in cases where there are many
matches - and an observable slowdown in cases where the original order
is already sorted. Re-arrangements like that are going to depend on the
type of data used in testing.

You can also get a significant speed up by recording a two-dimensional
cache of the comparison between two lowered characters, rather than just
caching the tolower() values. (That will depend on things like cache
size and speed.)

A very simple speedup of your code can be achieved by making the cache
and the functions "static".

And if you are using this for larger comparisons, you might want to make
use of SIMD instructions in some way.

So the "best" or "fastest" method will depend heavily on the data.

David Brown

Jun 23, 2017, 8:52:30 AM
On 23/06/17 14:24, bartc wrote:
> On 23/06/2017 13:07, David Brown wrote:
>> On 23/06/17 01:07, bartc wrote:
>
>>> C has a problem with curly braces. I had to add some to avoid a dangling
>>> else problem.
>>>
>>> And added more when I needed to add a printf between the first 'else if'
>>> and 'return 1'. When printf and extra braces were removed (on the posted
>>> code), one { was left behind. On the 'if (c){' line.
>>>
>>
>> C does not have a problem with curly braces - some C /programmers/ make
>> problems for themselves with curly braces.
>>
>> There are several ways to make clear, consistent rules about braces on
>> "if" (and "while", "for", etc.) statements in a way that makes it easy
>> to get it right. Unless you have reason to use something else (for
>> consistency with existing code, for example), you can't go wrong with
>> the "one true brace style".
>
> Yes it has. You can choose to use {} everywhere to reduce the problem of
> dangling elses, inserting extra statements into one-statement bodies, or
> avoiding the problem of an extra or missing { being hidden because
> of a missing or extra { elsewhere.

Using "extra" brackets like this does not /reduce/ the problem - it
/eliminates/ the problem.

>
> But then the {} themselves become a problem with differing placement
> styles, extra clutter, and making an already too-long function look even
> longer.
>

The brackets are not a problem unless your functions are too big, or too
deeply nested - in which case you already have problems.

> (And using {} everywhere doesn't help with maintaining or debugging
> someone else's code that doesn't use that style.)
>

Consistently bad style may be clearer than adding new code with a better
style. Working with other people's code always has its challenges.

David Brown

Jun 23, 2017, 9:09:53 AM
In Norwegian, the letter "å" or "Å" is written like an "a" (or "A") with
a circle above it. But it counts as an independent letter, and comes
last in the alphabet. Sometimes, however, it is written as "aa" or "Aa"
- either because you are transcribing it into basic Latin letters (or
ASCII), or just because that is the way a particular word or name is
usually written. So the name "Aase" would be sorted at the end of a
list alongside "Åse". "Aaron", on the other hand, would go at the start
of the list because that is two "a" letters rather than a single "å"
letter written as "aa".

>
> Things like ligatures (Æ) are treated differently (in English, you'd
> sort it as the pair of characters, AE, in some other languages you'd
> treat it as a single character, with some specific sort position).
>
> The rules often differ based on application - German usually sorts
> accented characters with the base character, but in things like phone
> books, they're sorted as the base letter plus an E (IOW Ä sorts as
> AE), so that Mr. Müller and Mr. Mueller appear together in the phone
> book.
>
> Then you have things like Mac and Mc in names, and dozens, if not
> hundreds of other rules specific to the language, location and
> application.
>

Turkish is another fun one for capitals - the capital version of "i" is
"İ" (with a dot above it), and the lower-case version of "I" is "ı"
(with no dot).

And once you've got the Latin scripts sorted out, you can move on
through Arabic towards Chinese...
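
Standard C does offer a hook for locale-aware ordering, strcoll(), though how well it handles cases like "aa" versus "å" depends entirely on the locales installed on the system. The following is only a sketch under that assumption; `cmp_collate` and `sort_names` are made-up names, and the array is assumed to hold `char *` elements.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* qsort() callback: orders two char * elements by the current
   LC_COLLATE rules instead of raw character codes. */
int cmp_collate(const void *a, const void *b)
{
    const char *s = *(const char *const *)a;
    const char *t = *(const char *const *)b;
    return strcoll(s, t);
}

/* Sort an array of strings using whatever collation the environment selects. */
void sort_names(const char **names, size_t count)
{
    /* "" asks for the user's locale; in the default "C" locale
       strcoll() orders strings the same way strcmp() does. */
    setlocale(LC_COLLATE, "");
    qsort(names, count, sizeof names[0], cmp_collate);
}
```

Whether "Aase" then lands next to "Åse" is up to the locale data, not the program, so results will differ between systems.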

Ben Bacarisse

Jun 23, 2017, 9:24:03 AM
David Brown <david...@hesbynett.no> writes:

> On 23/06/17 14:24, bartc wrote:
>> On 23/06/2017 13:07, David Brown wrote:
>>> On 23/06/17 01:07, bartc wrote:
>>
>>>> C has a problem with curly braces. I had to add some to avoid a dangling
>>>> else problem.
<snip>
>>> C does not have a problem with curly braces - some C /programmers/ make
>>> problems for themselves with curly braces.
<snip>
>> Yes it has. You can choose to use {} everywhere to reduce the problem of
>> dangling elses,
<snip>
> Using "extra" brackets like this does not /reduce/ the problem - it
> /eliminates/ the problem.

I don't think there *is* a problem of "dangling elses" any more than
there is a problem of dangling operators.

Every C programmer knows that in 1 + 2 * 3 the * is associated with the
2. If you don't want that you write (1 + 2) * 3 but no one calls the *
a dangling operator. Likewise, every C programmer knows that an else,
in the absence of any {}s, is associated with the nearest if. What's
more, every C editor knows that too and will show you that by
indentation.

<snip>
--
Ben.

bartc

Jun 23, 2017, 9:36:47 AM
On 23/06/2017 14:23, Ben Bacarisse wrote:
> David Brown <david...@hesbynett.no> writes:
>
>> On 23/06/17 14:24, bartc wrote:
>>> On 23/06/2017 13:07, David Brown wrote:
>>>> On 23/06/17 01:07, bartc wrote:
>>>
>>>>> C has a problem with curly braces. I had to add some to avoid a dangling
>>>>> else problem.
> <snip>
>>>> C does not have a problem with curly braces - some C /programmers/ make
>>>> problems for themselves with curly braces.
> <snip>
>>> Yes it has. You can choose to use {} everywhere to reduce the problem of
>>> dangling elses,
> <snip>
>> Using "extra" brackets like this does not /reduce/ the problem - it
>> /eliminates/ the problem.
>
> I don't think there *is* a problem of "dangling elses" any more than
> there is a problem of dangling operators.

Wikipedia seems to have an article about it. So some people think it is a
problem.

> Every C programmer knows that in 1 + 2 * 3 the * is associated with the
> 2. If you don't want that you write (1 + 2) * 3 but no one calls the *
> a dangling operator.

That's only vaguely connected. But expressions don't usually have a
layout superimposed on them that may be at odds with the actual syntax.

The nearest might be writing 1+2 * 3 with the spacing suggesting the 1+2
is done first.

But with if statements, it's more serious; start with this:

if (cond1)
    if (cond2) stmt1;

then much later, someone decides the outer if needs an else clause:

if (cond1)
    if (cond2) stmt1;
else
    stmt2;

Or you start with:

if (cond1)
    if (cond2) stmt1; else stmt2;
else
    stmt3;

and someone decides they don't need the 'else stmt2;' and deletes it,
not realising this now invisibly rearranges all the logic following.
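
To make the first case concrete, this sketch writes out with explicit braces both what the compiler does with the unbraced code and what the layout suggests. The function names and the NOTHING/STMT1/STMT2 return codes are placeholders invented for illustration.

```c
enum { NOTHING = 0, STMT1 = 1, STMT2 = 2 };

/* How the compiler parses
 *
 *     if (cond1)
 *         if (cond2) stmt1;
 *     else
 *         stmt2;
 *
 * The else belongs to the nearest if, i.e. the inner one. */
int as_parsed(int cond1, int cond2)
{
    if (cond1) {
        if (cond2) {
            return STMT1;
        } else {
            return STMT2;
        }
    }
    return NOTHING;
}

/* What the indentation suggests: the else belongs to the outer if. */
int as_intended(int cond1, int cond2)
{
    if (cond1) {
        if (cond2) {
            return STMT1;
        }
    } else {
        return STMT2;
    }
    return NOTHING;
}
```

The two versions disagree whenever exactly one of the conditions is true, which is the silent change in logic described above.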

> Likewise, every C programmer knows that an else,
> in the absence of any {}s, is associated with the nearest if. What's
> more, every C editor knows that too and will show you that by
> indentation.

The language shouldn't have to depend on smart editors.

--
bartc

Ben Bacarisse

Jun 23, 2017, 9:40:05 AM
David Brown <david...@hesbynett.no> writes:

> On 23/06/17 01:57, Robert Wessel wrote:
>> On Thu, 22 Jun 2017 15:01:07 -0700 (PDT), supe...@casperkitty.com
>> wrote:
>>
>>> On Thursday, June 22, 2017 at 4:41:43 PM UTC-5, Keith Thompson wrote:
>>>> Note that, unless you make some simplifying assumptions like ASCII-only,
>>>> mapping a string to upper or lower case can be very complicated (think
>>>> about accented letters, the German Eszett (Unicode LATIN SMALL LETTER
>>>> SHARP S, 'ß'), and so on.
>>>
>>> Unless you make simplifying assumptions like ASCII-only, sorting strings in
>>> "human-readable" order is apt to be a major headache whether or not you try
>>> to merge upper and lower case. ASCII-only case-insensitive comparison
>>> functions can be reasonably practical and efficient, but if support for non-
>>> ASCII strings will be required I'd suggest transforming each string into an
>>> int[] or long[] such that strings that should compare equal map to equal
>>> sequences of numbers, and the first mismatch will indicate which string should
>>> compare first. Otherwise the logic to handle all the weird sort cases on
>>> every comparison would slow things down and make it more complicated.

And in Spanish, before 2010, you had ch, ñ, ll all considered as
separate letters (ch collating just after c, ñ just after n and ll just
after l). But then in 2010 the Real Academia Española (which, like the
Académie Française in France, oversees the language) decided that ch and
ll would no longer be separate letters. I suspect some of the motivation
would have been the fact that it's a pain to do on a computer and was a
rule more often ignored than observed in that context.

What's more, at some time prior to that change (1994?) it was decided
that CH and LL should remain letters but be collated inside the C and L
categories but given a separate heading (if that's relevant) to show the
letter status.

It seems likely that these examples are only the tip of a huge iceberg
of issues to do with accurate linguistic sorting. It's a hard problem
based on many rules but those rules even change over time.
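
The suggestion quoted above, transforming each string once into a form that can be compared cheaply, has a standard C counterpart in strxfrm(). A rough sketch, assuming the locale has already been set with setlocale(); `make_sort_key` is a hypothetical helper, not a library function.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a malloc'd sort key for s under the current LC_COLLATE
   locale, or NULL if allocation fails. The caller frees the key. */
char *make_sort_key(const char *s)
{
    /* With a size of 0, strxfrm() only reports the length needed. */
    size_t needed = strxfrm(NULL, s, 0) + 1;   /* +1 for the terminator */
    char *key = malloc(needed);

    if (key != NULL)
        strxfrm(key, s, needed);
    return key;
}
```

Comparing two such keys with strcmp() gives the same ordering as strcoll() on the original strings, so the locale rules are applied once per string rather than on every comparison. How well any of the Spanish rules are modelled is still down to the locale data.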

<snip>
--
Ben.

Richard Heathfield

Jun 23, 2017, 9:41:48 AM
On 23/06/17 13:48, David Brown wrote:
> On 23/06/17 13:23, Rick C. Hodgin wrote:
>> Bart wrote:
>>> if (c<d)
>>> return -1;
>>> else if (c>d)
>>> return 1;
>>
>> You are doing two tests on every compare. Try reversing the
>> test so that if they're equal, it only does one test:
>>
>> if (c ==d) { }
>> else if (c<d)
>> return -1;
>> else
>> return 1;
>>
>> Should be an observable speedup.
>>
>
> That may give an observable speedup in cases where there are many
> matches - and an observable slowdown in cases where the original order
> is already sorted. Re-arrangements like that are going to depend on the
> type of data used in testing.

It's a premature optimisation, in fact. I prefer to opt for the simple
elegance of:

return (c > d) - (c < d);

unless I have compelling reasons not to. Or, if I don't want to return
straight away in the event of their being equal because I want to make
further comparisons, I just catch the result in an object:

diff = (c > d) - (c < d);

and diff can now be used as part of the loop control.
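
As a sketch of what that might look like in full (the function name `ci_compare` is only for illustration, and tolower() stands in for whatever case mapping is actually wanted):

```c
#include <ctype.h>

/* Case-insensitive compare returning -1, 0 or 1. */
int ci_compare(const char *s, const char *t)
{
    int c, d, diff;

    do {
        /* Cast through unsigned char so tolower() never sees a negative value. */
        c = tolower((unsigned char)*s++);
        d = tolower((unsigned char)*t++);
        diff = (c > d) - (c < d);
    } while (diff == 0 && c != '\0');   /* diff doubles as the loop control */

    return diff;
}
```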

--
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within

David Brown

Jun 23, 2017, 9:52:13 AM
If you have an "else" clause, use brackets. Then there is /no/ problem.

It is true that some people don't write code in a way that is clear and
maintainable. Such people need to be educated, re-educated, or find
themselves a different job. This is not rocket science. It is also not
a problem with C - /every/ language has ways to write code that is not
clear, and appears to do something different from what it actually does.
So if you are going to program in C, you need to use a style that
minimises the difference between what the code looks like, and what the
code does - just as you do in any other language.

>
>> Likewise, every C programmer knows that an else,
>> in the absence of any {}s, is associated with the nearest if. What's
>> more, every C editor knows that too and will show you that by
>> indentation.
>
> The language shouldn't have to depend on smart editors.
>

It doesn't depend on smart editors - it depends on smart programmers.
But smart programmers use smart tools to make their lives easier.


David Brown

Jun 23, 2017, 9:56:35 AM
Does that mean you would have a heading for C covering Ca up to Cg, then
a heading for CH covering Cha up to Chz, and then go back to a second C
heading for Ci up to Cz ?

>
> It seems likely that these examples are only the tip of a huge iceberg
> of issues to do with accurate linguistic sorting. It's a hard problem
> based on many rules but those rules even change over time.
>

Yes - but it keeps people in jobs :-)

And in some ways it has got simpler - no one prints phone books any
more, so there is no need to worry about ordering there!




Ben Bacarisse

Jun 23, 2017, 10:06:24 AM
Yes, that was the recommendation between about 1994 and 2010. Before
that all ch... words sorted after cz... words, and after 2010, I think
you no longer need a heading for ch in the middle of the c words because
ch is no longer a letter.

--
Ben.

Rick C. Hodgin

Jun 23, 2017, 10:23:42 AM
On Friday, June 23, 2017 at 9:41:48 AM UTC-4, Richard Heathfield wrote:
> > On 23/06/17 13:23, Rick C. Hodgin wrote:
> >> Bart wrote:
> >>> if (c<d)
> >>> return -1;
> >>> else if (c>d)
> >>> return 1;
> >>
> >> You are doing two tests on every compare. Try reversing the
> >> test so that if they're equal, it only does one test:
> >>
> >> if (c ==d) { }
> >> else if (c<d)
> >> return -1;
> >> else
> >> return 1;
> >>
> >> Should be an observable speedup.
>
> It's a premature optimisation, in fact. I prefer to opt for the
> simple elegance of:
>
> return (c > d) - (c < d);
>
> unless I have compelling reasons not to.

You're still inside the loop at that point. So long as it's
matching, you want to continue looping and not return.

> Or, if I don't want to return
> straight away in the event of their being equal because I want to make
> further comparisons, I just catch the result in an object:
>
> diff = (c > d) - (c < d);
>
> and diff can now be used as part of the loop control.

Now you're doing Bart's two compares, a new subtract, a new
variable store, and another new compare on the difference.

Try it. The code used is above. Publish the results. Without
taking the time to see what assembly is generated for each by
the various optimizers, I'm honestly not sure which way would
be fastest. I was very surprised Bart's algorithm was faster
in any case.

-----
Here's a faster algorithm if you can guarantee that you'll only
be doing a comparison on alpha text:

// For only alpha text, no numbers, symbols, or anything else
int stricmp_alpha_only(const char *a, const char *b)
{
char ca, cb;

// Iterate for each character that matches each string
do
{
// Grab our characters in lower-case
ca = *a | 0x40;
cb = *b | 0x40;

// See where we are
if (ca == cb) { /* still going */ }
else if (ca < cb) return(-1);
else return(1);

// Increase both pointers
++a;
++b;

} while (*a && *b);
// If we get here, they matched so far

// Based on which one remains, the size
if (*a == *b) return(0); // Equal
else if (*a) return(1); // Greater than
else return(-1); // Less than
}

64-bit Visual Studio 2015 Release results:
Release -- Rick  = 280   // Using tolower()
           Bart  = 335   // Using tolower()
           Ben   = 326   // Using tolower()
           Rick2 = 67    // Using | 0x40;

Big speedup potential if you know your data well.
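
One caveat worth checking before relying on the bit trick: in ASCII the bit that separates 'A' (0x41) from 'a' (0x61) is 0x20, and bit 0x40 is already set in every letter, so `| 0x40` leaves letters unchanged while `| 0x20` is what folds upper case to lower. The sketch below is an assumption-laden variant using 0x20; it is ASCII-only, valid only when every character is a letter, and the name `alpha_casecmp` is made up.

```c
/* ASCII only, and only meaningful for strings made of A-Z and a-z. */
int alpha_casecmp(const char *a, const char *b)
{
    for (;; a++, b++) {
        unsigned char ra = (unsigned char)*a;
        unsigned char rb = (unsigned char)*b;

        /* Test the raw bytes for the terminator: 0 | 0x20 would be a space. */
        if (ra == '\0' || rb == '\0')
            return (ra != '\0') - (rb != '\0');

        /* Setting bit 0x20 maps 'A'..'Z' onto 'a'..'z'. */
        unsigned char ca = (unsigned char)(ra | 0x20);
        unsigned char cb = (unsigned char)(rb | 0x20);
        if (ca != cb)
            return (ca > cb) - (ca < cb);
    }
}
```

The timings for a version like this should be in the same region as the `| 0x40` figures, since the work per character is the same, but that has not been measured here.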

Malcolm McLean

Jun 23, 2017, 10:35:54 AM
Conventionally Mc. is sorted with the "Mac"s. So I was before Mannall in the
register at school.

Robert Wessel

Jun 23, 2017, 11:03:36 AM
For whatever reason, I still get two (different) phonebooks at home
each year. My main usage is tossing last year's edition in the
recycle bin when the new one arrives, and replacing it with the new
one on the bottom bookshelf.

To be fair, at least one of them includes a bunch of coupons, and so
really seems more a vehicle for advertising than a primary phonebook
(although that's always been basically true of the "yellow" pages).

bartc

Jun 23, 2017, 11:44:51 AM
On 23/06/2017 15:23, Rick C. Hodgin wrote:

> Here's a faster algorithm if you can guarantee that you'll only
> be doing a comparison on alpha text:

> 64-bit Visual Studio 2015 Release results:
> Release -- Rick = 280 // Using tolower()
> Bart = 335 // Using tolower()
> Ben = 326 // Using tolower()
> Rick2 = 67 // Using | 0x40;
>
> Big speedup potential if you know your data well.

I get these, based on comparing "bartholomew" with "bartolomeo" 100
million times:

Ben: 3090 ms (using tolower())
Ben: 530 ms (modified to use lookup table)
Bart: 720 ms (shorter version and using lookup table)
Rick: 3070 ms (second version posted with tolower())
Rick/alpha: 750 ms


--
bartc

Rick C. Hodgin

Jun 23, 2017, 11:51:13 AM
What do you get with Ben and Bart when you use the | 0x40; instead
of the lookup table? And what CPU are you on? I'm using an Intel
Core i7 7700HQ at 2.80 GHz.

I'm curious if the speed changes on AMD CPUs.

bartc

Jun 23, 2017, 12:13:37 PM
Then I get:

Ben: 860 ms
Bart: 580 ms

Which is a good result from my point of view...

Machine is Intel at 3.2GHz, dual-core something or other.

However, I'm not sure if the alpha-only version is that useful. Text
with mixed case is likely to also have things like spaces that will
screw things up.

And for comparing identifier names, file names etc which are
case-insensitive, you would just convert to a single case. Also they
would be full of underscores, $ signs and digits.

--
bartc

Rick C. Hodgin

Jun 23, 2017, 12:43:20 PM
I had to modify the test because the optimizer was now seeing that
the calls to the compare functions didn't actually do anything and
then removed them. :-) So, this one now adds the return result to
a cumulative variable which is later printf()'d.

Under those new test conditions, I see these relative scores with
the lookup table:

64-bit Visual Studio 2015
Release -- Bart = 593
           Ben  = 458
           Rick = 572

32-bit Visual Studio 2015
Release -- Bart = 619
           Ben  = 519
           Rick = 897

64-bit Visual Studio 2010
Release -- Bart = 560
           Ben  = 476
           Rick = 574

32-bit Visual Studio 2010
Release -- Bart = 613
           Ben  = 533
           Rick = 1793

> Machine is Intel at 3.2GHz, dual-core something or other.

It's really interesting to see how subtle changes in algorithms
significantly alter the performance through the optimizer.

I'll write a custom assembly version to see if I can do anything
better manually. It's been 15+ years since I've done regular
assembly development, so probably not. :-)

> However, I'm not sure if the alpha-only version is that useful. Text
> with mixed case is likely to also have things like spaces that will
> screw things up.
>
> And for comparing identifier names, file names etc which are
> case-insensitive, you would just convert to a single case. Also they
> would be full of underscores, $ signs and digits.

True. It would only work on text, which you could have in a hash,
for example.

Rick C. Hodgin

Jun 23, 2017, 4:19:22 PM
On Friday, June 23, 2017 at 12:43:20 PM UTC-4, Rick C. Hodgin wrote:
> I'll write a custom assembly version to see if I can do anything
> better manually. It's been 15+ years since I've done regular
> assembly development, so probably not. :-)

I've written the assembly version, but in the process discovered
the compiler is "cheating" with the static text. Because it knows
the data, it's using known-at-compile-time info to alter which
algorithms are used for processing each comparison. The results
are still correct, but it doesn't actually run the comparison algorithm.

I'm going to write a version that reads in two text files and walks
them using a structure like this

file1: 4rick4rick6ricker4rick4rick0
file2: 5sammi4rick4rick6ricker4alex0

I can alter the content then as needed to try various tests, but
this way the compiler won't know what data is there in advance
and will have to code for the actual algorithm.
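
A timing loop along these lines might look like the sketch below. It keeps a running checksum so each call's result is used, and takes the strings as parameters so they can come from a file or the command line. `run_benchmark`, `compare_fn` and `sign_compare` are invented names, and `sign_compare` is only a stand-in comparator so the sketch is complete.

```c
#include <stddef.h>
#include <time.h>

typedef int (*compare_fn)(const char *, const char *);

/* Stand-in comparator: case-sensitive, returns -1, 0 or 1. */
int sign_compare(const char *s, const char *t)
{
    const unsigned char *p = (const unsigned char *)s;
    const unsigned char *q = (const unsigned char *)t;

    while (*p != '\0' && *p == *q) {
        p++;
        q++;
    }
    return (*p > *q) - (*p < *q);
}

/* Calls cmp(a, b) the given number of times and returns the sum of the
   results. If elapsed_ms is not NULL it receives the CPU time used. */
long run_benchmark(compare_fn cmp, const char *a, const char *b,
                   long iterations, double *elapsed_ms)
{
    long checksum = 0;
    clock_t start = clock();

    for (long i = 0; i < iterations; i++)
        checksum += cmp(a, b);   /* the result is used, so the call matters */

    clock_t stop = clock();
    if (elapsed_ms != NULL)
        *elapsed_ms = 1000.0 * (double)(stop - start) / CLOCKS_PER_SEC;
    return checksum;
}
```

Printing the returned checksum is what keeps the loop alive. If the strings are still literals visible at the call site, an optimizer may be able to precompute the result anyway, which is why reading them at run time is the safer test.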

Ian Collins

Jun 23, 2017, 4:47:04 PM
On 06/24/17 12:24 AM, bartc wrote:
> On 23/06/2017 13:07, David Brown wrote:
>> On 23/06/17 01:07, bartc wrote:
>
>>> C has a problem with curly braces. I had to add some to avoid a dangling
>>> else problem.
>>>
>>> And added more when I needed to add a printf between the first 'else if'
>>> and 'return 1'. When printf and extra braces were removed (on the posted
>>> code), one { was left behind. On the 'if (c){' line.
>>>
>>
>> C does not have a problem with curly braces - some C /programmers/ make
>> problems for themselves with curly braces.
>>
>> There are several ways to make clear, consistent rules about braces on
>> "if" (and "while", "for", etc.) statements in a way that makes it easy
>> to get it right. Unless you have reason to use something else (for
>> consistency with existing code, for example), you can't go wrong with
>> the "one true brace style".
>
> Yes it has. You can choose to use {} everywhere to reduce the problem of
> dangling elses, inserting extra statements into one-statement bodies, or
> avoiding the problem of an extra or missing { being hidden because
> of a missing or extra { elsewhere.

Well as David said, that eliminates the problem. Most modern editors
will highlight missing or superfluous braces. Even if they didn't, your
unit tests would fail, wouldn't they?

--
Ian

bartc

Jun 23, 2017, 6:04:56 PM
On 23/06/2017 21:46, Ian Collins wrote:
> On 06/24/17 12:24 AM, bartc wrote:
>> On 23/06/2017 13:07, David Brown wrote:
>>> On 23/06/17 01:07, bartc wrote:
>>
>>>> C has a problem with curly braces. I had to add some to avoid a
>>>> dangling
>>>> else problem.
>>>>
>>>> And added more when I needed to add a printf between the first 'else
>>>> if'
>>>> and 'return 1'. When printf and extra braces were removed (on the
>>>> posted
>>>> code), one { was left behind. On the 'if (c){' line.
>>>>
>>>
>>> C does not have a problem with curly braces - some C /programmers/ make
>>> problems for themselves with curly braces.
>>>
>>> There are several ways to make clear, consistent rules about braces on
>>> "if" (and "while", "for", etc.) statements in a way that makes it easy
>>> to get it right. Unless you have reason to use something else (for
>>> consistency with existing code, for example), you can't go wrong with
>>> the "one true brace style".
>>
>> Yes it has. You can choose to use {} everywhere to reduce the problem of
>> dangling elses, inserting extra statements into one-statement bodies, or
>> avoiding the problem of an extra or missing { being hidden because
>> of a missing or extra { elsewhere.
>
> Well as David said, that eliminates the problem.

Does it? Here's the original code with extra braces:

while (1) {
    c=tolower(*s++);
    d=tolower(*t++);

    if (c && d) {
        if (c<d) {
            return -1;
        } else if (c>d) {
            return 1;
        }
    } else if (c) {
        return 1;
    } else if (d) {
        return -1;
    } else {
        return 0;
    }
}

Or maybe, it should be like this if they really should be used
everywhere, including between 'else' and 'if':

while (1) {
c=tolower(*s++);
d=tolower(*t++);

if (c && d) {
if (c<d) {
return -1;
} else {
if (c>d) {
return 1;
}
}
}
} else {
if (c) {
return 1;
}
}
} else if (d) {
return -1;
} else {
return 0;
}
}

Except that now I'm drowning in the bloody things, and can't easily tell
if the above is even right. I could change the style so the elses are
lined up with the corresponding ifs, but it would be even more spread out.

> Most modern editors
> will highlight missing or superfluous braces.

I don't use a modern editor. But I've just tried Notepad++ and SciTE,
set to 'C' language, and neither tells me about missing braces causing a
dangling else problem. Probably because in C, they are still optional.

> Even if they didn't, your unit tests would fail, wouldn't they?

Why let it get to that point? And part of the reason for using a human
readable language is to impart an algorithm to someone else.


--
bartc

Richard Heathfield

Jun 23, 2017, 7:05:11 PM
On 23/06/17 21:46, Ian Collins wrote:
> Well as David said, that eliminates the problem. Most modern editors
> will highlight missing or superfluous braces. Even if they didn't, your
> unit tests would fail, wouldn't they?

Unit Test? It compiles! PASS.

Integration Test? It links! PASS.

User Acceptance Test? It runs! PASS.

Ian Collins

Jun 23, 2017, 7:11:23 PM
<snip>

Yes...

> Or maybe, it should be like this if they really should be used
> everywhere, including between 'else' and 'if':
>
> while (1) {
> c=tolower(*s++);
> d=tolower(*t++);
>
> if (c && d) {
> if (c<d) {
> return -1;
> } else {
> if (c>d) {
> return 1;
> }
> }
> }
> } else {
> if (c) {
> return 1;
> }
> }
> } else if (d) {
> return -1;
> } else {
> return 0;
> }
> }
>
> Except that now I'm drowning in the bloody things, and can't easily tell
> if the above is even right. I could change the style so the elses are
> lined up with the corresponding ifs, but it would be even more spread out.

Choose a style you are happy with and stick to it.

> > Most modern editors
> > will highlight missing or superfluous braces.
>
> I don't use a modern editor.

There you go, if you choose to live in the dark, don't complain when you
bump into things.

> > Even if they didn't, your unit tests would fail, wouldn't they?
>
> Why let it get to that point?

Why is it "that point"?

> And part of the reason for using a human
> readable language is to impart an algorithm to someone else.

Which is why teams and projects have coding standards.

--
Ian

Ian Collins

Jun 23, 2017, 7:11:39 PM
On 06/24/17 11:05 AM, Richard Heathfield wrote:
> On 23/06/17 21:46, Ian Collins wrote:
>> Well as David said, that eliminates the problem. Most modern editors
>> will highlight missing or superfluous braces. Even if they didn't, your
>> unit tests would fail, wouldn't they?
>
> Unit Test? It compiles! PASS.
>
> Integration Test? It links! PASS.
>
> User Acceptance Test? It runs! PASS.

Ship it!

--
Ian

bartc

Jun 23, 2017, 7:45:04 PM
On 24/06/2017 00:11, Ian Collins wrote:
> On 06/24/17 10:04 AM, bartc wrote:

>> > Most modern editors
>> > will highlight missing or superfluous braces.
>>
>> I don't use a modern editor.
>
> There you go, if you choose to live in the dark, don't complain when you
> bump into things.

Does VS2017 count as a modern editor? I can enter:

printf("%d\n",strlen(6.34));

and it compiles and runs (and crashes). The only warning is about a
mismatch with %d.

If I type this:

if (0)
    if (1) puts("one");
else
    puts("two");

There are no build errors, but I don't get "two" printed. Dangling else
problem. Yes, using {,} might help, but this modern editor doesn't
appear to be much help when you forget to use them.

>> > Even if they didn't, your unit tests would fail, wouldn't they?
>>
>> Why let it get to that point?
>
> Why is it "that point"?

Letting obvious coding errors through so that they need to be picked up
at runtime.

C's design, and the way C compilers work, seem to me to let an awful lot
of stuff through that ought really to be picked up. You shouldn't need
to rely on modern editors (which my tests show don't really work) or
need to waste time constructing tests for things which should not have
got past a compiler in the first place.

--
bartc

Ian Collins

Jun 23, 2017, 7:59:47 PM
On 06/24/17 11:45 AM, bartc wrote:
> On 24/06/2017 00:11, Ian Collins wrote:
>> On 06/24/17 10:04 AM, bartc wrote:
>
>>> > Most modern editors
>>> > will highlight missing or superfluous braces.
>>>
>>> I don't use a modern editor.
>>
>> There you go, if you choose to live in the dark, don't complain when you
>> bump into things.
>
> Does VS2017 count as a modern editor?

It's a good editor but not such a good C compiler, especially if you
don't enable warnings.

> I can enter:
>
> printf("%d\n",strlen(6.34));
>
> and it compiles and runs (and crashes). The only warning is about a
> mismatch with %d.

So what does that have to do with brace placement?

gcc is happy to warn:

gcc x.c
x.c: In function ‘main’:
x.c:6:24: error: incompatible type for argument 1 of ‘strlen’
printf("%d\n",strlen(6.34));
^~~~
In file included from /usr/include/string.h:11:0,
from x.c:2:
/usr/include/iso/string_iso.h:63:15: note: expected ‘const char *’ but
argument is of type ‘double’
extern size_t strlen(const char *);

Sun cc is happy to warn:

cc x.c
"x.c", line 6: argument #1 is incompatible with prototype:
prototype: pointer to const char : "/usr/include/iso/string_iso.h", line 63
argument : double

Sun CC is happy to barf:

CC x.c
"x.c", line 6: Error: Formal argument 1 of type const char* in call to
std::strlen(const char*) is being passed double.
1 Error(s) detected.

> If I type this:
>
> if (0)
>     if (1) puts("one");
> else
>     puts("two");
>
> There are no build errors, but I don't get "two" printed. Dangling else
> problem. Yes, using {,} might help, but this modern editor doesn't
> appear to be much help when you forget to use them.

gcc is happy to warn:

gcc -Wall x.c
x.c: In function ‘main’:
x.c:6:6: warning: suggest explicit braces to avoid ambiguous ‘else’
[-Wparentheses]
if (0)

>>> > Even if they didn't, your unit tests would fail, wouldn't they?
>>>
>>> Why let it get to that point?
>>
>> Why is it "that point"?
>
> Letting obvious coding errors through so that they need to be picked up
> at runtime.
>
> C's design, and the way C compilers work, seem to me to let an awful lot
> of stuff through that ought really to be picked up.

Use a decent compiler with a decent level of warnings.

>You shouldn't need
> to rely on modern editors (which my tests show don't really work)

Editor != compiler

> or
> need to waste time constructing tests for things which should not have
> got past a compiler in the first place.

Tests are there to speed development, not waste time.

--
Ian

bartc

Jun 23, 2017, 8:17:53 PM
On 24/06/2017 00:59, Ian Collins wrote:
> On 06/24/17 11:45 AM, bartc wrote:

>> and it compiles and runs (and crashes). The only warning is about a
>> mismatch with %d.
>
> So what does that have to do with brace placement?

It's about modern editors now.

> gcc is happy to warn:
>
> gcc x.c
> x.c: In function ‘main’:
> x.c:6:24: error: incompatible type for argument 1 of ‘strlen’
> printf("%d\n",strlen(6.34));
> ^~~~
> In file included from /usr/include/string.h:11:0,
> from x.c:2:
> /usr/include/iso/string_iso.h:63:15: note: expected ‘const char *’ but
> argument is of type ‘double’
> extern size_t strlen(const char *);

My code fragment didn't have string.h included. It should be saying
'what the hell is strlen?'.

>> If I type this:
>>
>> if (0)
>>     if (1) puts("one");
>> else
>>     puts("two");

> gcc is happy to warn:
>
> gcc -Wall x.c
> x.c: In function ‘main’:
> x.c:6:6: warning: suggest explicit braces to avoid ambiguous ‘else’
> [-Wparentheses]
> if (0)

My gcc doesn't say that with default options. Even with -Wall, it
doesn't say anything (gcc/tdm/5.1.0)

>> C's design, and the way C compilers work, seem to me to let an awful lot
>> of stuff through that ought really to be picked up.
>
> Use a decent compiler with a decent level of warnings.

Well this is what it comes down to. The language is flawed so it's left
to using particular compilers with specific options (which even then may
not work). It's pot luck.

With the right language design so that a dangling else is impossible (say
that braces are mandatory, and there is a proper else-if clause), then
even with ANY version of ANY crappy compiler with ANY options and using
ANY editor, that error is never going to get through.

>> You shouldn't need
>> to rely on modern editors (which my tests show don't really work)
>
> Editor != compiler
>
>> or
>> need to waste time constructing tests for things which should not have
>> got past a compiler in the first place.
>
> Tests are there to speed development, not waste time.

Sorry, if you have to create tests that a compiler might let through,
then it is wasting time.

--
bartc

Ian Collins

Jun 23, 2017, 8:36:08 PM
On 06/24/17 12:17 PM, bartc wrote:
> On 24/06/2017 00:59, Ian Collins wrote:
>> On 06/24/17 11:45 AM, bartc wrote:
>
>>> and it compiles and runs (and crashes). The only warning is about a
>>> mismatch with %d.
>>
>> So what does that have to do with brace placement?
>
> It's about modern editors now.

So modern editors check code how?

>> gcc is happy to warn:
>>
>> gcc x.c
>> x.c: In function ‘main’:
>> x.c:6:24: error: incompatible type for argument 1 of ‘strlen’
>> printf("%d\n",strlen(6.34));
>> ^~~~
>> In file included from /usr/include/string.h:11:0,
>> from x.c:2:
>> /usr/include/iso/string_iso.h:63:15: note: expected ‘const char *’ but
>> argument is of type ‘double’
>> extern size_t strlen(const char *);
>
> My code fragment didn't have string.h included. It should be saying
> 'what the hell is strlen?'.

How many times do people have to explain the improvements in each
iteration of the standard to you? How many times do people have to make
futile attempts at getting you to understand how to use your tools?

Removing <string.h>:

gcc x.c
x.c: In function ‘main’:
x.c:6:17: warning: implicit declaration of function ‘strlen’
[-Wimplicit-function-declaration]
printf("%d\n",strlen(6.34));

cc x.c
"x.c", line 6: warning: implicit function declaration: strlen

> My gcc doesn't say that with default options. Even with -Wall, it
> doesn't say anything (gcc/tdm/5.1.0)

/opt/gcc4.9/bin/gcc --version
gcc (GCC) 4.9.2

/opt/gcc4.9/bin/gcc x.c
x.c: In function ‘main’:
x.c:6:17: warning: incompatible implicit declaration of built-in
function ‘strlen’
printf("%d\n",strlen(6.34));

>>> C's design, and the way C compilers work, seem to me to let an awful lot
>>> of stuff through that ought really to be picked up.
>>
>> Use a decent compiler with a decent level of warnings.
>
> Well this is what it comes down to. The language is flawed so it's left
> to using particular compilers with specific options (which even then may
> not work). It's pot luck.

No language can protect idiots who refuse to learn how to use their tools.

>> Tests are there to speed development, not waste time.
>
> Sorry, if you have to create tests that a compiler might let through,
> then it is wasting time.

I create tests to help me write my code.

--
Ian

Gareth Owen

Jun 24, 2017, 2:01:57 AM6/24/17
to
Ian Collins <ian-...@hotmail.com> writes:

> How many times do people have to explain the improvements in each
> iteration of the standard to you?

Innumerable, apparently.

> How many times do people have to make futile attempts at getting you
> to understand how to use your tools?

I think the word "futile" gives us a hint here.

Yet, he persisted.

bartc

Jun 24, 2017, 6:40:50 AM6/24/17
to
On 24/06/2017 01:36, Ian Collins wrote:
> On 06/24/17 12:17 PM, bartc wrote:

> So modern editors check code how?

They don't? To do what they do, requires implementing half a C compiler,
so why not?

>> My code fragment didn't have string.h included. It should be saying
>> 'what the hell is strlen?'.
>
> How many times do people have to explain the improvements in each
> iteration of the standard to you? How many times do people have to make
> futile attempts at getting you to understand how to use your tools?

/My/ tools? I now use my own C compiler when compiling small C programs.
That one /does/ report an actual error when compiling that erroneous
strlen fragment, and doesn't let you proceed until it's fixed.

But see below for what the others do.

It can't however do anything about dangling else or code such as
'printf("%s")' because those are language issues. I don't intend to fix
the C language (and I anyway normally use my own, better designed
language where such things, and a dozen others, cannot occur).

> /opt/gcc4.9/bin/gcc x.c
> x.c: In function ‘main’:
> x.c:6:17: warning: incompatible implicit declaration of built-in
> function ‘strlen’
> printf("%d\n",strlen(6.34));

Yes, I get that when compiling:

int main(void) {strlen(6.5);}

But notice it's a warning. Same with Pelles C. Same with lccwin. But
nothing from DMC. Nothing from Tiny C. Nothing from VS2017. And it
doesn't stop a successful link because 'strlen' will be found in the
runtime library.

But that's OK because a proper unit test will pick it up. That is, when
it crashes!

That I might be able to get at least one compiler (gcc) to report an
error is irrelevant; it should do so anyway. And what about the ones
that don't have such an option?

>> Well this is what it comes down to. The language is flawed so it's left
>> to using particular compilers with specific options (which even then may
>> not work). It's pot luck.
>
> No language can protect idiots who refuse to learn how to use their tools.

Which tools?

If I run Notepad and Tiny C then it will not report on that dangling
else. I have to run a certain version of gcc on a certain OS and with
certain options. And then I get a warning which may well be lost when
compiling one module out of dozens in an automatic script.

THE LANGUAGE SHOULD HAVE DONE MORE to avoid all these little problems,
not have to rely on complicated editors and compilers, and even then
they have to be goaded into reporting such errors.


> I create tests to help me write my code.

It sounds like you don't really need your compiler to report any
warnings at all, or even errors, because after all your unit tests will
pick up any problems!

-----------------------------------------

I'm not saying C ought to be fixed now - it's far too late - but at
least people can acknowledge that the language does have these flaws,
and not blame the people trying to use it, for not using gcc and not
using the right editor. Because apparently only the right tools will
paper over the cracks.

Ian Collins

Jun 24, 2017, 7:00:44 AM6/24/17
to
It's the weekend, fun time :)

--
Ian

Ian Collins

Jun 24, 2017, 7:24:57 AM6/24/17
to
On 06/24/17 10:40 PM, bartc wrote:
> On 24/06/2017 01:36, Ian Collins wrote:
>> On 06/24/17 12:17 PM, bartc wrote:
>
>> So modern editors check code how?
>
> They don't? To do what they do, requires implementing half a C compiler,
> so why not?

Most IDEs do just that.

>>> My code fragment didn't have string.h included. It should be saying
>>> 'what the hell is strlen?'.
>>
>> How many times do people have to explain the improvements in each
>> iteration of the standard to you? How many times do people have to make
>> futile attempts at getting you to understand how to use your tools?
>
> /My/ tools? I now use my own C compiler when compiling small C programs.
> That one /does/ report an actual error when compiling that erroneous
> strlen fragment, and doesn't let you proceed until it's fixed.

As does every C compiler I have on my boxes.

>> /opt/gcc4.9/bin/gcc x.c
>> x.c: In function ‘main’:
>> x.c:6:17: warning: incompatible implicit declaration of built-in
>> function ‘strlen’
>> printf("%d\n",strlen(6.34));
>
> Yes, I get that when compiling:
>
> int main(void) {strlen(6.5);}
>
> But notice it's a warning. Same with Pelles C. Same with lccwin. But
> nothing from DMC. Nothing from Tiny C. Nothing from VS2017. And it
> doesn't stop a successful link because 'strlen' will be found in the
> runtime library.

If you want a compiler error, turn warnings into errors or use a C++
compiler.
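
(A minimal sketch, not code posted in this thread. With gcc, for example, -Werror=implicit-function-declaration promotes just that one warning to an error. The fragment itself is fixed by putting the declaration in scope and using a conversion that matches size_t; the helper name describe_length is made up for illustration.)

```c
#include <stdio.h>   /* snprintf */
#include <string.h>  /* strlen, declared as size_t strlen(const char *) */

/* Hypothetical helper: writes the length of s into buf as decimal text
   and returns the character count reported by snprintf. */
int describe_length(const char *s, char *buf, size_t bufsize)
{
    /* %zu is the printf conversion for size_t (C99 and later). */
    return snprintf(buf, bufsize, "%zu", strlen(s));
}
```

With <string.h> included, a call such as strlen(6.34) no longer slips through as an implicit declaration: the argument does not match the prototype, so the compiler is required to diagnose it.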

> But that's OK because a proper unit test will pick it up. That is, when
> it crashes!

No, I'll have warnings set to errors.

> That I might be able to get at least one compiler (gcc) to report an
> error is irrelevant; it should do so anyway. And what about the ones
> that don't have such an option?

I wouldn't use them.

>>> Well this is what it comes down to. The language is flawed so it's left
>>> to using particular compilers with specific options (which even then may
>>> not work). It's pot luck.
>>
>> No language can protect idiots who refuse to learn how to use their tools.
>
> Which tools?
>
> If I run Notepad and Tiny C then it will not report on that dangling
> else. I have to run a certain version of gcc on a certain OS and with
> certain options. And then I get a warning which may well be lost when
> compiling one module out of dozens in an automatic script.

I can't imagine any serious /professional/ programmer using that
combination.

> THE LANGUAGE SHOULD HAVE DONE MORE to avoid all these little problems,
> not have to rely on complicated editors and compilers, and even then
> they have to be goaded into reporting such errors.

Well C++ does, but C compilers can do a decent job if you tell them to.

>> I create tests to help me write my code.
>
> It sounds like you don't really need your compiler to report any
> warnings at all, or even errors, because after all your unit tests will
> pick up any problems!

Yep.

> -----------------------------------------
>
> I'm not saying C ought to be fixed now - it's far too late - but at
> least people can acknowledge that the language does have these flaws,
> and not blame the people trying to use it, for not using gcc and not
> using the right editor. Because apparently only the right tools will
> paper over the cracks.

I look after the compiler warning settings on my current project, so we
soon know if anyone does anything silly...

--
Ian

Malcolm McLean

Jun 24, 2017, 7:41:22 AM6/24/17
to
On Saturday, June 24, 2017 at 12:24:57 PM UTC+1, Ian Collins wrote:
>
> > If I run Notepad and Tiny C then it will not report on that dangling
> > else. I have to run a certain version of gcc on a certain OS and with
> > certain options. And then I get a warning which may well be lost when
> > compiling one module out of dozens in an automatic script.
>
> I can't imagine any serious /professional/ programmer using that
> combination.
>
Whilst I'm a so-called "professional" programmer (in fact we're not a
"profession", there's no professional body with the right to determine
who can and cannot describe themselves as a chartered programmer),
not all the code I write is for professional use. Currently I don't
use a small system, but I can imagine myself using a basic editor
and Tiny C on a Raspberry PI, for example.

bartc

Jun 24, 2017, 7:59:17 AM6/24/17
to
On 24/06/2017 12:24, Ian Collins wrote:
> On 06/24/17 10:40 PM, bartc wrote:

>> If I run Notepad and Tiny C then it will not report on that dangling
>> else. I have to run a certain version of gcc on a certain OS and with
>> certain options. And then I get a warning which may well be lost when
>> compiling one module out of dozens in an automatic script.
>
> I can't imagine any serious /professional/ programmer using that
> combination.

Why not? There can be considerable advantages to using lightweight tools.

I used even more lightweight tools for years. And for work that I was
paid for so presumably that counts as being professional.

(I expect you would only read a novel written by someone using the
latest DTP equipment, and never one written in longhand!)

> I look after the compiler warning settings on my current project, so we
> soon know if anyone does anything silly...

Are your programs written in C, or written in C + stipulated version of
stipulated compiler + stipulated sets of options + stipulated configure
scripts + stipulated makefiles + ... ?

I'm more interested in pure C and generic tools.

--
bartc

Malcolm McLean

Jun 24, 2017, 8:18:46 AM6/24/17
to
On Saturday, June 24, 2017 at 12:59:17 PM UTC+1, Bart wrote:
> On 24/06/2017 12:24, Ian Collins wrote:
>
> > I can't imagine any serious /professional/ programmer using that
> > combination.
>
> Why not? There can be considerable advantages to using lightweight tools.
>
Nothing is more infuriating than having an editor crash out when all
you want to do is examine one short C source file. Or take ages to
fire up.
However an IDE is really the way to go, especially with modern languages
and big libraries. For example the ability to list functions in
a scope is useful for keeping you out of the documentation.

bartc

Jun 24, 2017, 8:58:02 AM6/24/17
to
On 24/06/2017 13:18, Malcolm McLean wrote:
> On Saturday, June 24, 2017 at 12:59:17 PM UTC+1, Bart wrote:
>> On 24/06/2017 12:24, Ian Collins wrote:
>>
>>> I can't imagine any serious /professional/ programmer using that
>>> combination.
>>
>> Why not? There can be considerable advantages to using lightweight tools.
>>
> Nothing is more infuriating than having an editor crash out when all
> you want to do is examine one short C source file. Or take ages to
> fire up.

It always surprises me how slow such things can be. I've just done some
tests on a test file from a few days ago containing 2.5M lines [25MB] of:

a=a*b^c;

Notepad: 24 seconds to load, but then it can be edited fluently (ie. no
lags or delays when editing).

Wordpad: <1 second to load, but delays of several seconds when trying to
edit (maybe it's just slow, maybe it's still loading in background, I
don't know).

Notepad++: <1 second to load, very slow to edit (press a bunch of keys,
and 15 seconds can elapse before the screen updates)

SciTe: <1 second, and no editing delays. So this one passes.

Word: 15 seconds to load, and no editing delays (or no worse than
normal). [Pagination going on in the background, it had gone up to page
34 the short while I was testing, but there are an estimated 40,000+
pages!]

So only one is nippy enough that you'd want to use to edit such a file.
(There is also QED, but that has a 4MB limit.)

Now we come to my own editor: 2 seconds to load, and no editing delays.
As it happens, my editor is an interpreted program, but is still more
practical to use than most of the above!

(If I used an older, non-interpreted editor, then both loading and
editing are instant.

I can't use gedit on Linux at the minute, but that was exasperatingly
slow even editing normal source files, let alone a file this big. So
let's say you would want to avoid it.)

> However an IDE is really the way to go, especially with modern languages
> and big libraries. For example the ability to list functions in
> a scope is useful for keeping you out of the documentation.

Some of those things are nice. But from playing with VS2017, I can see
how implementing a lot of that stuff isn't a big deal and can be done on
a small, informal scale like everything else I do. Except I haven't done
GUI for a long time.


--
bartc

Rick C. Hodgin

Jun 24, 2017, 10:40:35 AM6/24/17
to
I was able to defeat the compiler's ability to know the data set
by simply passing it through pointers:

char* left[5];
char* right[5];

left[0] = "rick";
left[1] = "rick";
left[2] = "ricker";
left[3] = "rick";
left[4] = "rick";

right[0] = "sammi";
right[1] = "rick";
right[2] = "rick";
right[3] = "ricker";
right[4] = "alex";

In doing this, these are the results of my assembly version, and
the other ones relatively using the new code requirements in
this revised version. These are the best results observed on
repeated testing, and were fairly stable iteratively:

Bart = 128
Ben = 131
Rick2 = 133
RickAsm = 107

Here is the assembly algorithm and its requirements for use:

https://pastebin.com/rv8EfxSP

If anyone wants to generate a data set to test, a list of
lefts and rights, then I'll be happy to run it through and
post the results.

Ben Bacarisse

Jun 24, 2017, 11:39:16 AM6/24/17
to
Malcolm McLean <malcolm.ar...@gmail.com> writes:

> On Saturday, June 24, 2017 at 12:59:17 PM UTC+1, Bart wrote:
>> On 24/06/2017 12:24, Ian Collins wrote:
>>
>> > I can't imagine any serious /professional/ programmer using that
>> > combination.
>>
>> Why not? There can be considerable advantages to using lightweight tools.
>>
> Nothing is more infuriating than having an editor crash out when all
> you want to do is examine one short C source file. Or take ages to
> fire up.

I don't recall the last time Emacs crashed on me. It runs continuously
from log-in to log-out (often for weeks at a time), and it opens a new
window on a C file so fast I can't measure the time by eye (even faster
if I open in an existing window). Maybe we've come so far that Emacs
now counts as a lightweight tool?

<snip>
--
Ben.

GOTHIER Nathan

Jun 24, 2017, 12:00:54 PM6/24/17
to
On Sat, 24 Jun 2017 23:24:47 +1200
Ian Collins <ian-...@hotmail.com> wrote:

> I can't imagine any serious /professional/ programmer using that
> combination.

Indeed, only close-minded people can't imagine making money with such a
rudimentary tool set. This explains why western workers are losing their jobs
with the globalization of the economy.

luser droog

Jun 24, 2017, 12:52:32 PM6/24/17
to
Perhaps. Without trying to put words in Ian's mouth, I think a better
point to take away from this discussion is that there are better tools
than Notepad. And it is worthwhile investing a little effort into learning
a better tool.

When I took CS 101 back in the 90s we were given unix accounts and 1 sheet summaries
of both emacs and vi and told to pick one and use it. There was also pico
available, but we were strongly encouraged to learn a 'real' editor.

The process will vary somewhat from person to person, but the way I learned vi
back in the day was to fire up the online tutorial (something like :tutor IIRC)
and spend an hour following the directions. Then spend an evening using it
for your coursework. Then the next evening.

A year or so later I had the great fortune of having John Antognoli for
Hardware & Assembly II. We were learning 8086 assembly and hardware but
he still did everything in his unix shell. And the vi gymnastics he could do!
He would mention something and the cursor would leap across the screen
to the thing he was talking about, after just a few keystrokes.

A few classes in, and one of my classmates had the brilliant idea to start
interrupting him to ask what keys he was pressing. And we learned great
tricks like fF and ;, for jumping to letters in the line. And more crazy
stuff to put in /re-searches/.

Many years later I've made the switch to emacs. Vim has much the same
capabilities but the bare 'vi' that is sometimes available is more spartan.

Anyone using Notepad is strongly advised to investigate better text editors.
The queens are 'emacs' and 'vim'. But even Notepad++ is two +s up.

You can do it. We're rooting for you.

GOTHIER Nathan

Jun 24, 2017, 1:20:12 PM6/24/17
to
On Sat, 24 Jun 2017 09:52:25 -0700 (PDT)
luser droog <luser...@gmail.com> wrote:

> Perhaps. Without trying to put words in Ian's mouth, I think a better
> point to take away from this discussion is that there are better tools
> than Notepad. And it is worthwhile investing a little effort into learning
> a better tool.

There is no better tool than the one you feel comfortable with. Learning to use
any more sophisticated tool won't necessarily make you more productive if you
don't like the development process.

For my own use I prefer spartan tools such as notepad or vi rather than
playskool editors with syntax coloring, autocompletion, etc which consume more
time and memory to provide extra services with more bugs.

bartc

Jun 24, 2017, 3:29:31 PM6/24/17
to
On 24/06/2017 15:40, Rick C. Hodgin wrote:

> These are the best results observed on
> repeated testing, and were fairly stable iteratively:
>
> Bart = 128
> Ben = 131
> Rick2 = 133
> RickAsm = 107
>
> Here is the assembly algorithm and its requirements for use:
>
> https://pastebin.com/rv8EfxSP

I tried an assembly version a couple of days ago, but couldn't get it as
fast as the best C version.

The latest version is below; it passes parameters using globals (the
best C version probably just inlines the code). Your version you posted
today was about 25% slower on my test [on my one test] (but originally
it was much slower, like half the eventual speed; I don't know why).

;version using conventional register names:

mov esi,[ssx]
mov edi,[ttx]
mov ecx,0

L1:
movzx eax,byte [esi+ecx]
movzx ebx,byte [edi+ecx]
inc ecx

mov al,[rax+tolc]
mov bl,[rbx+tolc]

cmp al,bl
jb lessthan
ja morethan
cmp al,0
jnz L1
mov rax,0
ret

lessthan:
mov rax,-1
ret

morethan:
mov rax,1
ret

I found using 'xlat' to do the lower case conversion made no difference.

--
bartc

jak

Jun 24, 2017, 3:36:39 PM6/24/17
to
On 24/06/2017 01:45, bartc wrote:
>
> If I type this:
>
> if (0)
> if (1) puts("one");
> else
> puts("two");
>
> There are no build errors, but I don't get "two" printed. Dangling else
> problem. Yes, using {,} might help, but this modern editor doesn't
> appear to be much help when you forget to use them.
>
Correct me if I'm wrong, but apart from your indentation, this is
exactly what it must do. The "else" belongs to the nearest "if", so
those lines of code will never print anything:

    if(0)
        if(1)
            puts("one");
        else
            puts("two");
or
    if(0)
    {
        if(1)
        {
            puts("one");
        }
        else
        {
            puts("two");
        }
    }
Only in Python does indentation make a difference, if I'm not mistaken.

Pascal J. Bourguignon

Jun 24, 2017, 3:43:01 PM6/24/17
to
then: emacs -Q # is for you! ;-)

--
__Pascal J. Bourguignon
http://www.informatimago.com

jak

Jun 24, 2017, 3:43:47 PM6/24/17
to
Never mind, I understand now. All the same, I can't understand why the
editor should perceive your intentions from the indentation.

bartc

Jun 24, 2017, 3:49:40 PM6/24/17
to
On 24/06/2017 18:16, GOTHIER Nathan wrote:
> On Sat, 24 Jun 2017 09:52:25 -0700 (PDT)
> luser droog <luser...@gmail.com> wrote:
>
>> Perhaps. Without trying to put words in Ian's mouth, I think a better
>> point to take away from this discussion is that there are better tools
>> than Notepad. And it is worthwhile investing a little effort into learning
>> a better tool.
>
> There is no better tool than the one you feel comfortable with.

Exactly.

But I also require a text editor where pressing Left cursor, Backspace
etc at the beginning of a line does nothing (not delete half the
previous line before you've noticed).

Programming editors need to be line-oriented where the beginnings and
ends of lines are hard stops, but very few are.

--
bartc

bartc

Jun 24, 2017, 3:59:45 PM6/24/17
to
On 24/06/2017 20:43, jak wrote:
> Never mind, I understand now. All the same, I can't understand why the
> editor should perceive your intentions from the indentation.

The point I was making is that the C language doesn't handle this well.

Braces can be used or not used. They can use this style or that. The
same closing } brace can close a function, if, while, for, switch, or
do-while block.

The whole thing is error-prone and allows too wide a variety of styles.

Done differently, you wouldn't be able to write this:

if (0)
if (1) puts("one");
else
puts("two");

because the language could require explicit block terminators, and the
error would come to light.

(I'm sure there are tools that can compare indentation with syntax, but
too much of using C seems to depend on using external programs to get
around poor design.)

Rick C. Hodgin

Jun 24, 2017, 4:27:30 PM6/24/17
to
I tried your version converting it to 32-bit assembly:

_declspec(naked)
int bart_stricmp2_asm()
{
    _asm {
        mov     esi,leftptr
        mov     edi,rightptr
        xor     ecx,ecx

    L1:
        movzx   eax,byte ptr [esi+ecx]
        movzx   ebx,byte ptr [edi+ecx]
        inc     ecx

        mov     al,byte ptr lower[eax]
        mov     bl,byte ptr lower[ebx]

        cmp     al,bl
        jb      lessthan
        ja      morethan
        cmp     al, 0
        jnz     L1
        // If we get here, al is zero
    equal:
        // And, eax is also therefore zero
        ret

    lessthan:
        mov     eax,-1
        ret

    morethan:
        mov     eax,1
        ret
    }
}

I switched to a startup model using:

start /AFFINITY 0x1 /REALTIME test.exe

It would then be more symbolic of a real machine running the test.
In those cases I get these scores:

Bart = 120
Ben = 119
Rick2 = 126
BartAsm = 115
RickAsm = 115

I posted a message on the comp.lang.asm.x86 group to see if they
can help me optimize the assembly code. They like me even less
than most people do here, so I don't know if I'll see any replies:

https://groups.google.com/forum/#!topic/comp.lang.asm.x86/5k1HSBIT4-k

Would be nice though.

bartc

Jun 24, 2017, 4:55:10 PM6/24/17
to
On 24/06/2017 21:27, Rick C. Hodgin wrote:
> On Saturday, June 24, 2017 at 3:29:31 PM UTC-4, Bart wrote:

> It would then be more symbolic of a real machine running the test.
> In those cases I get these scores:
>
> Bart = 120
> Ben = 119
> Rick2 = 126
> BartAsm = 115
> RickAsm = 115

That doesn't seem like it would be worthwhile using assembly. But
then, if using gcc-O3 for testing as I've done, such results can be
misleading.

If the strcmp-routines are put into a different module so not visible to
the module that has main(), then the timings are different again.

> I posted a message on the comp.lang.asm.x86 group to see if they
> can help me optimize the assembly code. They like me even less
> than most people do here, so I don't know if I'll see any replies:

I don't think I'm that welcome there either.

I recently posted something about why Nasm was so slow, but that never
appeared. I later discovered that the moderator of that group is one of
the authors of Nasm!

--
bartc

Keith Thompson

Jun 24, 2017, 5:15:34 PM6/24/17
to
bartc <b...@freeuk.com> writes:
> On 24/06/2017 01:36, Ian Collins wrote:
[...]
>> /opt/gcc4.9/bin/gcc x.c
>> x.c: In function "main":
>> x.c:6:17: warning: incompatible implicit declaration of built-in
>> function "strlen"
>> printf("%d\n",strlen(6.34));
>
> Yes, I get that when compiling:
>
> int main(void) {strlen(6.5);}
>
> But notice it's a warning. Same with Pelles C. Same with lccwin. But
> nothing from DMC. Nothing from Tiny C. Nothing from VS2017. And it
> doesn't stop a successful link because 'strlen' will be found in the
> runtime library.

Yes, it's a warning. The lesson you should have learned from that
is to pay attention to warnings.

[...]

> If I run Notepad and Tiny C then it will not report on that dangling
> else. I have to run a certain version of gcc on a certain OS and with
> certain options. And then I get a warning which may well be lost when
> compiling one module out of dozens in an automatic script.

I practically never have problems with dangling elses. I use a set of
coding conventions that prevent them from being an issue.
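
(One convention of that kind, offered here as an assumption since the post does not say which conventions are meant, is to brace every if and else body. A minimal sketch of the two readings of the quoted fragment, with made-up function names that return the text instead of printing it:)

```c
/* How the compiler reads the unbraced fragment:
   the else pairs with the nearest (inner) if. */
const char *else_on_inner(int outer, int inner)
{
    if (outer) {
        if (inner) {
            return "one";
        } else {
            return "two";
        }
    }
    return "";   /* outer is false: nothing is printed */
}

/* What the original indentation suggested:
   the else pairs with the outer if. */
const char *else_on_outer(int outer, int inner)
{
    if (outer) {
        if (inner) {
            return "one";
        }
    } else {
        return "two";
    }
    return "";   /* outer true, inner false: nothing is printed */
}
```

With outer = 0 and inner = 1, as in the fragment, the first version yields nothing and the second yields "two". Once the braces are written out, the choice between the two is visible in the source rather than implied by indentation.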

> THE LANGUAGE SHOULD HAVE DONE MORE to avoid all these little problems,
> not have to rely on complicated editors and compilers, and even then
> they have to be goaded into reporting such errors.

My editor isn't configured to do C syntax checking. (Nothing wrong with
those that do, that's just the way I work.) And I don't run into these
little problems.

To summarize: You and I use the same language, C. You have problems
with dangling elses. I don't. Your conclusion: It must be the
language's fault.

[...]

> I'm not saying C ought to be fixed now - it's far too late - but at
> least people can acknowledge that the language does have these flaws,
> and not blame the people trying to use it, for not using gcc and not
> using the right editor. Because apparently only the right tools will
> paper over the cracks.

Yes, the language has flaws. We have acknowledged that over and over
again. And when we try to tell you how to work around those flaws, you
accuse us of claiming the language is perfect.

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Rick C. Hodgin

Jun 24, 2017, 5:58:42 PM6/24/17
to
On Saturday, June 24, 2017 at 4:55:10 PM UTC-4, Bart wrote:
> On 24/06/2017 21:27, Rick C. Hodgin wrote:
> > On Saturday, June 24, 2017 at 3:29:31 PM UTC-4, Bart wrote:
>
> > It would then be more symbolic of a real machine running the test.
> > In those cases I get these scores:
> >
> > Bart = 120
> > Ben = 119
> > Rick2 = 126
> > BartAsm = 115
> > RickAsm = 115
>
> That doesn't seem like it would be worthwhile using assembly. But
> then, if using gcc-O3 for testing as I've done, such results can be
> misleading.
>
> If the strcmp-routines are put into a different module so not visible to
> the module that has main(), then the timings are different again.

Isn't that weird? I wouldn't expect such a diverse range of samples.
They seem also to go in spells, like for N minutes it will be in a
particular range, and then later it will be in another range for a
period of time, etc. It's like there's background processing taking
place on the CPU (or its hyperthreaded component) that is affecting
the results of the main thread.

> > I posted a message on the comp.lang.asm.x86 group to see if they
> > can help me optimize the assembly code. They like me even less
> > than most people do here, so I don't know if I'll see any replies:
>
> I don't think I'm that welcome there either.
>
> I recently posted something about why Nasm was so slow, but that never
> appeared. I later discovered that the moderator of that group is one of
> the authors of Nasm!

Frank Kotler? I didn't know he worked on that. Makes sense though. He
is the clax moderator. He must have some assembly background. :-)

BTW, there was a reply there from a guru who I consider to be one of the
most skilled low-level developers I've met. He suggests that he's seen
good performance by creating a lower-case and upper-case copy of the
search string, and then allowing either one to match on each character
being tested against.

Depending on your search / comparison set, it could be worth the up-
front hit to create the cased-copies.
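
(A rough sketch of that idea as I understand it from the description, not the code from the other group. The names make_cased_copies and matches_either are made up, and the caller is assumed to supply buffers of at least strlen(key) + 1 bytes.)

```c
#include <ctype.h>   /* tolower, toupper */
#include <stddef.h>  /* size_t */

/* Build a lower-case and an upper-case copy of key, once, up front. */
void make_cased_copies(const char *key, char *lo, char *up)
{
    size_t i;
    for (i = 0; key[i] != '\0'; i++) {
        lo[i] = (char)tolower((unsigned char)key[i]);
        up[i] = (char)toupper((unsigned char)key[i]);
    }
    lo[i] = up[i] = '\0';
}

/* Return 1 if text equals the key ignoring case, otherwise 0.
   Each character of text is allowed to match either cased copy. */
int matches_either(const char *text, const char *lo, const char *up)
{
    size_t i;
    for (i = 0; lo[i] != '\0'; i++) {
        /* If text ends early, text[i] is '\0' and matches neither copy. */
        if (text[i] != lo[i] && text[i] != up[i])
            return 0;
    }
    return text[i] == '\0';   /* text must end where the key ends */
}
```

Note that this only answers equal or not equal, so it suits searching for one key among many strings; a qsort() callback still needs an ordering.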

Rick C. Hodgin

Jun 24, 2017, 6:09:43 PM6/24/17
to
On Saturday, June 24, 2017 at 5:58:42 PM UTC-4, Rick C. Hodgin wrote:
> BTW, there was a reply there from a guru who I consider to be one of the
> most skilled low-level developers I've met. He suggests that he's seen
> good performance by creating a lower-case and upper-case copy of the
> search string, and then allowing either one to match on each character
> being tested against.
>
> Depending on your search / comparison set, it could be worth the up-
> front hit to create the cased-copies.

His algorithm was a variation of the algorithm Ben found:

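int terje_stricmp(const void *p1, const void *p2)
{
    unsigned char c;    /* unsigned, so it is always a valid index into lower[] */
    const unsigned char *s1 = p1, *s2 = p2;
    int d;

    /* The assignment needs its own parentheses, because && binds tighter
       than =. Continue while the folded characters are equal and the
       terminator has not been reached. */
    while ((d = lower[c = *s1++] - lower[*s2++]) == 0 && c)
    {};

    return d;
}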

It produced this run:

Bart = 117
Ben = 116
Rick2 = 123
BartAsm = 104
RickAsm = 105
TerjeC = 96
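
(The functions in this part of the thread index a table called lower that is not shown here. A minimal sketch of how such a table could be built, assuming it simply maps every unsigned char value to its tolower() result; init_lower is a made-up name.)

```c
#include <ctype.h>   /* tolower */
#include <limits.h>  /* UCHAR_MAX */

/* Assumed layout: one entry for each possible unsigned char value. */
static unsigned char lower[UCHAR_MAX + 1];

/* Fill the table once, before the first comparison. */
void init_lower(void)
{
    for (int i = 0; i <= UCHAR_MAX; i++)
        lower[i] = (unsigned char)tolower(i);
}
```

After init_lower() has run, lower[c] replaces a tolower() call in the inner loop, provided c is an unsigned char so that the index is never negative.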

bartc

Jun 24, 2017, 6:13:36 PM6/24/17
to
On 24/06/2017 22:58, Rick C. Hodgin wrote:
> On Saturday, June 24, 2017 at 4:55:10 PM UTC-4, Bart wrote:
>> On 24/06/2017 21:27, Rick C. Hodgin wrote:
>>> On Saturday, June 24, 2017 at 3:29:31 PM UTC-4, Bart wrote:
>>
>>> It would then be more symbolic of a real machine running the test.
>>> In those cases I get these scores:
>>>
>>> Bart = 120
>>> Ben = 119
>>> Rick2 = 126
>>> BartAsm = 115
>>> RickAsm = 115
>>
>> That doesn't seem like it would be worthwhile using assembly. But
>> then, if using gcc-O3 for testing as I've done, such results can be
>> misleading.
>>
>> If the strcmp-routines are put into a different module so not visible to
>> the module that has main(), then the timings are different again.
>
> Isn't that weird? I wouldn't expect such a diverse range of samples.

Well, with caller and callee in different modules, in-lining is not
possible. Or not so easy.

> BTW, there was a reply there from a guru who I consider to be one of the
> most skilled low-level developers I've met. He suggests that he's seen
> good performance by creating a lower-case and upper-case copy of the
> search string, and then allowing either one to match on each character
> being tested against.
>
> Depending on your search / comparison set, it could be worth the up-
> front hit to create the cased-copies.

Don't forget it was the OP who had a need for this, and we don't know if
speed was an issue. So don't spend too much time on it.

I've not often had a need for such a function, and when I have, I use a
different approach because I know the length of the strings, don't need
to preserve original case, etc.

So it's strictly something to keep in a library if you want a quick
non-destructive case-ignoring compare of zero-terminated strings. Before
you implement customised code more suited to your app.
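
One way to read "don't need to preserve original case" is to fold the whole table once and then sort with plain strcmp(). The sketch below assumes fixed-width rows (WIDTH is an invented constant) so that the comparator receives pointers to the text itself, as in the OP's compare(); it is not code from this thread.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define WIDTH 16   /* assumed fixed row width, not from the thread */

/* Plain comparator: rows are char[WIDTH], so a and b point at the text. */
static int row_cmp(const void *a, const void *b)
{
    return strcmp((const char *)a, (const char *)b);
}

/* Lower-case every row once (destroying the original case), then sort.
   The conversion cost is paid n times instead of on every comparison. */
static void sort_folded(char rows[][WIDTH], size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (char *p = rows[i]; *p; p++)
            *p = (char)tolower((unsigned char)*p);
    qsort(rows, n, sizeof rows[0], row_cmp);
}
```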

--
bartc

Rick C. Hodgin

Jun 24, 2017, 6:21:22 PM
On Saturday, June 24, 2017 at 6:13:36 PM UTC-4, Bart wrote:
> On 24/06/2017 22:58, Rick C. Hodgin wrote:
> > On Saturday, June 24, 2017 at 4:55:10 PM UTC-4, Bart wrote:
> >> If the strcmp-routines are put into a different module so not visible to
> >> the module that has main(), then the timings are different again.
> > Isn't that weird? I wouldn't expect such a diverse range of samples.
>
> Well, with caller and callee in different modules, in-lining is not
> possible. Or not so easy.

I disabled in-lining on my optimization settings for that reason. I
wanted it to demonstrate the effects of a full callback, and not
optimization "trickery" which enables things that would not be
possible in a real qsort() callback implementation. It also places
the same burden of calling protocol, epilogue and prologue code on
each function.
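
A portable way to get a similar effect without compiler-specific settings is to call the routine under test through a volatile function pointer, which typically forces a real indirect call on every iteration. This is a sketch only: plain_cmp and run_indirect are hypothetical names, and it is not the harness used for the numbers in this thread.

```c
#include <string.h>

/* Comparator type matching the qsort() callback signature. */
typedef int (*cmp_fn)(const void *, const void *);

/* Hypothetical stand-in for whichever routine is being timed. */
static int plain_cmp(const void *a, const void *b)
{
    return strcmp((const char *)a, (const char *)b);
}

/* Calls cmp reps times through a volatile pointer and returns the sum
   of the result signs, so the calls cannot simply be discarded. */
static long run_indirect(cmp_fn cmp, const char *x, const char *y, long reps)
{
    cmp_fn volatile fp = cmp;   /* must be re-read before each call */
    long sum = 0;

    for (long i = 0; i < reps; i++) {
        int r = fp(x, y);
        sum += (r > 0) - (r < 0);
    }
    return sum;
}
```

Wrapping a call such as run_indirect(plain_cmp, "abc", "abd", 1000000) in clock() readings would then time every candidate under the same calling overhead.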

> Don't forget it was the OP who had a need for this, and we don't know if
> speed was an issue. So don't spend too much time on it.

I've found it an interesting diversion. I haven't done regular
assembly programming for years, just little bits here and there.
It's been nice to have a momentary diversion.

> I've not often had a need for such a function, and when I have, I
> use a different approach because I know the length of the strings,
> don't need to preserve original case, etc.
>
> So it's strictly something to keep in a library if you want a quick
> non-destructive case-ignoring compare of zero-terminated strings.
> Before you implement customised code more suited to your app.

Agreed. I wouldn't worry about it too much if it just came up.

Terje's modification has given me a new line of thinking. I can
remember when I was a welder, I eventually began to realize that
when a customer brought in some piece of machinery that was broken,
it was sometimes more efficient to cut away a functional piece of
the device to gain better and quicker weld-repair access to the
broken part, and then also repair the other part later. In my
mind back then I called it "destructive constructive repair work."

I view Terje's example as something similar. It adds something
more to the algorithm, but gains something greater in return.

bartc

Jun 24, 2017, 6:24:04 PM
On 24/06/2017 23:09, Rick C. Hodgin wrote:
> On Saturday, June 24, 2017 at 5:58:42 PM UTC-4, Rick C. Hodgin wrote:
>> BTW, there was a reply there from a guru who I consider to be one of the
>> most skilled low-level developers I've met. He suggests that he's seen
>> good performance by creating a lower-case and upper-case copy of the
>> search string, and then allowing either one to match on each character
>> being tested against.
>>
>> Depending on your search / comparison set, it could be worth the up-
>> front hit to create the cased-copies.
>
> His algorithm was a variation of the algorithm Ben found:
>
> int terje_stricmp(const void *p1, const void *p2)
> {
> char c;
> const unsigned char *s1 = p1, *s2 = p2;
> int d;
>
> while (d = lower[c = *s1++] - lower[*s2++] && c)

This doesn't seem to give the right answer. You might need to check d is
0 after assigning to it. Without that the timing is not meaningful.
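
A minimal sketch of that fix follows. In the quoted line, && binds tighter than =, so d receives the 0-or-1 result of the whole && expression rather than the character difference. The version below parenthesises the assignment and loops only while d is 0. The name fixed_stricmp and the way the lower table is built are assumptions; the thread never shows the table itself.

```c
#include <ctype.h>
#include <limits.h>

/* Assumed lookup table: maps each unsigned char value to lower case. */
static unsigned char lower[UCHAR_MAX + 1];

/* Fill the table once before the first comparison. */
static void init_lower(void)
{
    for (int i = 0; i <= UCHAR_MAX; i++)
        lower[i] = (unsigned char)tolower(i);
}

/* Hypothetical corrected variant of the quoted loop. As in the thread,
   p1 and p2 are taken to point directly at the characters. */
static int fixed_stricmp(const void *p1, const void *p2)
{
    const unsigned char *s1 = p1, *s2 = p2;
    unsigned char c;   /* unsigned, so it is always a valid table index */
    int d;

    /* Continue only while the folded characters are equal (d == 0)
       and the terminator of the first string has not been reached. */
    while ((d = lower[c = *s1++] - lower[*s2++]) == 0 && c)
        ;

    return d;
}
```

After init_lower(), fixed_stricmp("Hello", "hELLO") returns 0, and a shorter string sorts before a longer one that it prefixes.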

(It's not clear why params have to be copied to s1 and s2, but without
that, it's slower.)

Anyway wasn't that group supposed to give you an ASM solution?

--
bartc

GOTHIER Nathan

Jun 24, 2017, 6:48:16 PM
On Sat, 24 Jun 2017 21:42:53 +0200
"Pascal J. Bourguignon" <p...@informatimago.com> wrote:

> then: emacs -Q # is for you! ;-)

I prefer vi since it's POSIX compliant. :-P

Rick C. Hodgin

Jun 24, 2017, 6:50:20 PM
On Saturday, June 24, 2017 at 6:24:04 PM UTC-4, Bart wrote:
> On 24/06/2017 23:09, Rick C. Hodgin wrote:
> > On Saturday, June 24, 2017 at 5:58:42 PM UTC-4, Rick C. Hodgin wrote:
> >> BTW, there was a reply there from a guru who I consider to be one of the
> >> most skilled low-level developers I've met. He suggests that he's seen
> >> good performance by creating a lower-case and upper-case copy of the
> >> search string, and then allowing either one to match on each character
> >> being tested against.
> >>
> >> Depending on your search / comparison set, it could be worth the up-
> >> front hit to create the cased-copies.
> >
> > His algorithm was a variation of the algorithm Ben found:
> >
> > int terje_stricmp(const void *p1, const void *p2)
> > {
> > char c;
> > const unsigned char *s1 = p1, *s2 = p2;
> > int d;
> >
> > while (d = lower[c = *s1++] - lower[*s2++] && c)
>
> This doesn't seem to give the right answer. You might need to check d is
> 0 after assigning to it. Without that the timing is not meaningful.

True. I have validation code written to check it, but I guess I
missed it. I also found a few tweaks to my assembly version which
improved its score to 102:

(new algorithm at the bottom)
https://pastebin.com/rv8EfxSP

When I modify Terje's to continue so long as it equals zero, it's
identical to Ben's performance. That still makes our assembly
versions the fastest.

> (It's not clear why params have to be copied to s1 and s2, but without
> that, it's slower.)

I think Ben indicated above that the declaration of the function
was the callback of a qsort(), so it has void * parameters.

> Anyway wasn't that group supposed to give you an ASM solution?

Yes. :-)

Ian Collins

Jun 25, 2017, 4:46:39 AM
On 06/24/17 11:59 PM, bartc wrote:
> On 24/06/2017 12:24, Ian Collins wrote:
>> On 06/24/17 10:40 PM, bartc wrote:
>
>>> If I run Notepad and Tiny C then it will not report on that dangling
>>> else. I have to run a certain version of gcc on a certain OS and with
>>> certain options. And then I get a warning which may well be lost when
>>> compiling one module out of dozens in an automatic script.
>>
>> I can't imagine any serious /professional/ programmer using that
>> combination.
>
> Why not? There can be considerable advantages to using lightweight tools.

You complained earlier that I waste time writing tests to find errors,
well I could say you waste time compiling code to find them (or not in
the case of your chosen compiler). Most IDEs will highlight your errors
as you make them, so when you hit compile it compiles.

>> I look after the compiler warning settings on my current project, so we
>> soon know if anyone does anything silly...
>
> Are your programs written in C, or written in C + stipulated version of
> stipulated compiler + stipulated sets of options + stipulated configure
> scripts + stipulated makefiles + ... ?

A mix of C99 (Linux kernel options) and C++14. No extensions. The C++
parts are compiled with three compilers.

> I'm more interested in pure C and generic tools.

Then you should use a compiler that tells you when your C is impure...

--
Ian

bartc

Jun 25, 2017, 6:21:51 AM
On 25/06/2017 09:46, Ian Collins wrote:
> On 06/24/17 11:59 PM, bartc wrote:
>> On 24/06/2017 12:24, Ian Collins wrote:
>>> On 06/24/17 10:40 PM, bartc wrote:
>>
>>>> If I run Notepad and Tiny C then it will not report on that dangling
>>>> else. I have to run a certain version of gcc on a certain OS and with
>>>> certain options. And then I get a warning which may well be lost when
>>>> compiling one module out of dozens in an automatic script.
>>>
>>> I can't imagine any serious /professional/ programmer using that
>>> combination.
>>
>> Why not? There can be considerable advantages to using lightweight tools.
>
> You complained earlier that I waste time writing tests to find errors,
> well I could say you waste time compiling code to find them (or not in
> the case of your chosen compiler). Most IDEs will highlight your errors
> as you make them, so when you hit compile it compiles.

What's the difference exactly when the error is pointed out?

For the IDE to tell you some detail is not right, it needs to know
something about the language to do that. It's doing the same job as the
early stages of a compiler. (So for your compiler to presumably do the
same checks is redundant.)

The distinction between IDE and compiler is blurred.

But I have reservations about how well an IDE can actually tell you what
is wrong, as most of my code is created and maintained chaotically:
jumping between different parts of a module and from module to module,
and it's only towards the end that all the different bits form valid,
compilable code.

An IDE that keeps telling me over my shoulder I'm doing something wrong
would drive me up the wall. I /know/ the code is incomplete!

I suspect that you like to use slow, cumbersome, non-interactive
compilers, which is why you need the instant feedback you get from the
IDE. A fast, lightweight compiler can also give instant results. (The
compiler I was using in the 90s was linked to my editor. After a compile
error, the editor would open on that error line.)

(Out of interest, on your IDE, how many keystrokes does it take to
comment or uncomment the current line? How about this line and the next?
How many keystrokes to duplicate the current line, so that:

x1 = y1;

becomes:

x1 = y1;
x1 = y1;

?)

>> I'm more interested in pure C and generic tools.
>
> Then you should use a compiler that tells you when your C is impure...

You don't get me. By pure C, I mean C, not 1000 lines of C plus 30,000
lines of Bash script. By generic tools, I mean a C compiler, ANY C
compiler, not version 6.2.3 subsection 6 of gcc using these 117
different options.

And even with the C, I'd rather it wasn't bristling with #ifdef
__GNUC__ and #ifdef _MSC_VER blocks.

GOTHIER Nathan

Jun 25, 2017, 9:26:35 AM
On Sun, 25 Jun 2017 20:46:28 +1200
Ian Collins <ian-...@hotmail.com> wrote:

> You complained earlier that I waste time writing tests to find errors,
> well I could say you waste time compiling code to find them (or not in
> the case of your chosen compiler). Most IDEs will highlight your errors
> as you make them, so when you hit compile it compiles.

Most IDEs only highlight typos, not errors.

bartc

Jun 25, 2017, 11:01:54 AM
Finally I have access to a desktop that can run Linux.

Running 'gedit' on my 2.5M-line test file takes about 13 seconds before I
can start editing (this is after doing it a few times to get everything
cached).

But responsiveness is dreadful to the point of being unusable.

This is running under virtual Linux. But if I run my own editor UNDER
THE SAME SYSTEM, that takes only 2.5 seconds to load, and responsiveness
is fine. This despite it being not only interpreted, but non-accelerated
(being Linux, I can only use pure C code for technical reasons).

I tried another editor, 'leafpad', more lightweight. That was much
better, loading in about 2 seconds. And highly responsive. But if
searching for some text at the end of the file, it took 11 seconds. My
editor, about 1 second. (With gedit, it was a fight just to GET to the
end of the file; I managed it once. It's a mess.)

I take it that your Emacs works better than this. But what is the matter
with all these other programs?

--
bartc

David Brown

Jun 25, 2017, 11:02:47 AM
On 24/06/17 01:59, Ian Collins wrote:
> On 06/24/17 11:45 AM, bartc wrote:
>> On 24/06/2017 00:11, Ian Collins wrote:
>>> On 06/24/17 10:04 AM, bartc wrote:
>>
>>>> > Most modern editors
>>>> > will highlight missing or superfluous braces.
>>>>
>>>> I don't use a modern editor.
>>>
>>> There you go, if you choose to live in the dark, don't complain when you
>>> bump into things.
>>
>> Does VS2017 count as a modern editor?
>
> It's a good editor but not such a good C compiler, especially if you
> don't enable warnings.
>
>> I can enter:
>>
>> printf("%d\n",strlen(6.34));
>>
>> and it compiles and runs (and crashes). The only warning is about a
>> mismatch with %d.
>
> So what does that have to do with brace placement?
>
> gcc is happy to warn:
>
> gcc x.c
> x.c: In function ‘main’:
> x.c:6:24: error: incompatible type for argument 1 of ‘strlen’
> printf("%d\n",strlen(6.34));
> ^~~~
> In file included from /usr/include/string.h:11:0,
> from x.c:2:
> /usr/include/iso/string_iso.h:63:15: note: expected ‘const char *’ but
> argument is of type ‘double’
> extern size_t strlen(const char *);
>
> Sun cc is happy to warn:
>
> cc x.c
> "x.c", line 6: argument #1 is incompatible with prototype:
> prototype: pointer to const char : "/usr/include/iso/string_iso.h",
> line 63
> argument : double
>
> Sun CC is happy to barf:
>
> CC x.c
> "x.c", line 6: Error: Formal argument 1 of type const char* in call to
> std::strlen(const char*) is being passed double.
> 1 Error(s) detected.
>
>> If I type this:
>>
>> if (0)
>> if (1) puts("one");
>> else
>> puts("two");
>>
>> There are no build errors, but I don't get "two" printed. Dangling else
>> problem. Yes, using {,} might help, but this modern editor doesn't
>> appear to be much help when you forget to use them.
>
> gcc is happy to warn:
>
> gcc -Wall x.c
> x.c: In function ‘main’:
> x.c:6:6: warning: suggest explicit braces to avoid ambiguous ‘else’
> [-Wparentheses]
> if (0)

And from gcc 6, you also have the "-Wmisleading-indentation" warning
that helps spot problems with indentation and brackets. That is yet
another useful tool to help stop mistakes early.
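
To make the quoted dangling-else case concrete, here is a small sketch with both forms side by side. The function names are hypothetical, and the branches return strings instead of calling puts() so the difference is easy to check. The first function is the shape that the quoted gcc -Wall output complains about.

```c
/* Indentation suggests the else pairs with the outer if,
   but C binds it to the nearest if, i.e. if (inner). */
static const char *dangling(int outer, int inner)
{
    if (outer)
        if (inner) return "one";
    else
        return "two";
    return "none";
}

/* Explicit braces make the else belong to the outer if. */
static const char *braced(int outer, int inner)
{
    if (outer) {
        if (inner) return "one";
    } else {
        return "two";
    }
    return "none";
}
```

With outer = 0, dangling() falls through to "none" while braced() returns "two", which is the surprise described in the quoted example.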
