C can be very literal.

DSF

Nov 1, 2013, 2:53:32 PM

Hello,

A little peek "under the hood."

C can be very literal in its translation from C source to the final
executable code. Whether that's a blessing or a curse depends on what
you're doing. Take this short example:

(The variables were made static to give them some "substance."
Otherwise the compiler would either optimize the whole thing away or,
if I gave the variables "something to do," such as feeding them to
printf, pile them into as many registers as it can.)

#include <stdlib.h>

void foo(void);
void bar(void);

int main()
{
    foo();
    bar();

    return EXIT_SUCCESS;
}

void foo(void)
{
    static int a, b, c, d, e, f, g, h, i;

    a = 0;
    b = 0;
    c = 0;
    d = 0;
    e = 0;
    f = 0;
    g = 0;
    h = 0;
    i = 0;
}

void bar(void)
{
    static int a, b, c, d, e, f, g, h, i;

    a = b = c = d = e = f = g = h = i = 0;
}

Both foo and bar do the same thing, but the resulting code is quite
different.

To keep things non-platform specific, pseudo-code for foo:

set register "A" to 0
store in memory location "a"
set register "B" to 0
store in memory location "b"
set register "C" to 0
store in memory location "c"
set register "A" to 0
store in memory location "d"
set register "B" to 0
store in memory location "e"
set register "C" to 0
store in memory location "f"
set register "A" to 0
store in memory location "g"
set register "B" to 0
store in memory location "h"
set register "C" to 0
store in memory location "i"

Pseudo-code for bar:
set register "A" to 0
store in memory location "a"
store in memory location "b"
store in memory location "c"
store in memory location "d"
store in memory location "e"
store in memory location "f"
store in memory location "g"
store in memory location "h"
store in memory location "i"

This is with size optimization enabled. Note that in foo, the same
three registers are set to zero repeatedly.

So, at least with my current compiler, ganging up equals (variables
of the same type, of course) produces shorter and faster code.

A practical example. I'd written some code to delete leading
whitespace on a line, the end result being moving X characters of a
wide character string downward.

In this code line is the start of the line buffer, pe is the end of
the line buffer+1, and p is the start of the area to be moved down.
These are all of type wchar_t *.

memmove(line, p ((pe - line) * sizeof(wchar_t)) - ((p - line) *
sizeof(wchar_t));

This was my second-to-last version. The resulting pseudo-code:

temp1 = pe - line
temp1 = temp1 / 2 //The pointers represent characters, not bytes.
temp2 = p - line
temp2 = temp2 / 2
temp3 = temp1 - temp2
temp3 = temp3 * 2
(Result in temp3)

It wasn't exactly following my code, but it was doing unnecessary
multiplying and dividing, even if by a power of two. So I changed it
to this: (uint is a typedef'd unsigned int.)

memmove(line, p, (((uint)pe - (uint)line) - ((uint)p - (uint)line)));

This produced:

temp1 = pe - line
temp2 = p - line
temp1 = temp1 - temp2
(Result in temp1)

More typing, but shorter, faster code and fewer temps.

Just my two cents.
"'Later' is the beginning of what's not to be."
D.S. Fiscus

Seebs

Nov 1, 2013, 2:51:59 PM

On 2013-11-01, DSF <nota...@address.here> wrote:
> In this code line is the start of the line buffer, pe is the end of
> the line buffer+1, and p is the start of the area to be moved down.
> These are all of type wchar_t *.

> memmove(line, p ((pe - line) * sizeof(wchar_t)) - ((p - line) *
> sizeof(wchar_t));

... Why not (pe - p) * sizeof(wchar_t)?

> It wasn't exactly following my code, but it was doing unnecessary
> multiplying and dividing, even if by a power of two.

I am not sure that "by a power of two" matters at all in this case.

>So I changed it
> to this: (uint is a typedef'd unsigned int.)
> memmove(line, p, (((uint)pe - (uint)line) - ((uint)p - (uint)line)));

This isn't safe, don't do that. There is no guarantee that a pointer
will fit inside a uint without losing information which could result
in tragic failures.

> More typing, but shorter, faster code and fewer temps.

Also code that no longer makes sense.

Me, I'd suggest:
1. Consider using (pe - p) * sizeof(wchar_t).
2. Consider using wmemmove() and just using (pe - p).
3. Or just use (((char *) pe) - ((char *) p)) as the character count.

-s
--
Copyright 2013, all wrongs reversed. Peter Seebach / usenet...@seebs.net
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
Autism Speaks does not speak for me. http://autisticadvocacy.org/
I am not speaking for my employer, although they do rent some of my opinions.

Eric Sosman

Nov 1, 2013, 3:16:59 PM

On 11/1/2013 2:53 PM, DSF wrote:
> [...]

Also fewer guarantees of correctness. Quoth 6.3.2.3p6:
"Any pointer type may be converted to an integer type. Except
as previously specified [null pointer constant], the result is
implementation-defined." Converting the pointers to integers
and then subtracting is not guaranteed to give the same result
as subtracting the pointers and multiplying by the sizeof.
(Bentley and McIlroy's classic "Engineering a Sort Function"
mentions one machine for which your rewrite would definitely
*not* produce the right result.)

What sort of generated code do you get if you simplify the
size expression to `(pe - p) * sizeof(wchar_t)' instead of
writing out all those unnecessary operators?

Also, if you're so concerned about speed: Are you moving
too much data? Is pe ("the end of the line buffer+1") really
the proper endpoint, or ought you to be using a pointer just
past the terminator and perhaps well short of buffer's end?
Yes, it may take some extra work to find that terminator -- but
that's the sort of thing you might well have already ...

--
Eric Sosman
eso...@comcast-dot-net.invalid

Johannes Bauer

Nov 1, 2013, 4:25:59 PM

Am 01.11.2013 19:53, schrieb DSF:

> So, at least with my current compiler, ganging up equals (variables
> of the same type, of course) produces shorter and faster code.

If your current compiler misses such an obvious optimization, it is
quite frankly a piece of shit.

Just for reference, gcc 4.7 produces the expected:

080484a0 <foo>:
80484a0: c7 05 20 a0 04 08 00 movl $0x0,0x804a020
80484a7: 00 00 00
80484aa: c7 05 24 a0 04 08 00 movl $0x0,0x804a024
80484b1: 00 00 00
80484b4: c7 05 28 a0 04 08 00 movl $0x0,0x804a028
80484bb: 00 00 00
80484be: c7 05 2c a0 04 08 00 movl $0x0,0x804a02c
80484c5: 00 00 00
80484c8: c7 05 30 a0 04 08 00 movl $0x0,0x804a030
80484cf: 00 00 00
80484d2: c7 05 34 a0 04 08 00 movl $0x0,0x804a034
80484d9: 00 00 00
80484dc: c7 05 38 a0 04 08 00 movl $0x0,0x804a038
80484e3: 00 00 00
80484e6: c7 05 3c a0 04 08 00 movl $0x0,0x804a03c
80484ed: 00 00 00
80484f0: c7 05 40 a0 04 08 00 movl $0x0,0x804a040
80484f7: 00 00 00
80484fa: c3 ret
80484fb: 90 nop
80484fc: 8d 74 26 00 lea 0x0(%esi,%eiz,1),%esi

08048500 <bar>:
8048500: c7 05 44 a0 04 08 00 movl $0x0,0x804a044
8048507: 00 00 00
804850a: c7 05 48 a0 04 08 00 movl $0x0,0x804a048
8048511: 00 00 00
8048514: c7 05 4c a0 04 08 00 movl $0x0,0x804a04c
804851b: 00 00 00
804851e: c7 05 50 a0 04 08 00 movl $0x0,0x804a050
8048525: 00 00 00
8048528: c7 05 54 a0 04 08 00 movl $0x0,0x804a054
804852f: 00 00 00
8048532: c7 05 58 a0 04 08 00 movl $0x0,0x804a058
8048539: 00 00 00
804853c: c7 05 5c a0 04 08 00 movl $0x0,0x804a05c
8048543: 00 00 00
8048546: c7 05 60 a0 04 08 00 movl $0x0,0x804a060
804854d: 00 00 00
8048550: c7 05 64 a0 04 08 00 movl $0x0,0x804a064
8048557: 00 00 00
804855a: c3 ret
804855b: 66 90 xchg %ax,%ax
804855d: 66 90 xchg %ax,%ax
804855f: 90 nop

Regards,
Joe

DSF

Nov 4, 2013, 11:00:20 PM

On 01 Nov 2013 18:51:59 GMT, Seebs <usenet...@seebs.net> wrote:

>On 2013-11-01, DSF <nota...@address.here> wrote:
>> In this code line is the start of the line buffer, pe is the end of
>> the line buffer+1, and p is the start of the area to be moved down.
>> These are all of type wchar_t *.
>
>> memmove(line, p ((pe - line) * sizeof(wchar_t)) - ((p - line) *
>> sizeof(wchar_t));

Oops! A comma got lost in translation, after the first p. I'm
surprised no one caught it, since memmove takes three arguments. It
should be:

memmove(line, p, ((pe - line) * sizeof(wchar_t)) - ((p - line) *
sizeof(wchar_t)));

>... Why not (pe - p) * sizeof(wchar_t)?

>> It wasn't exactly following my code, but it was doing unnecessary
>> multiplying and dividing, even if by a power of two.
>
>I am not sure that "by a power of two" matters at all in this case.

Because sizeof(wchar_t) equals two here, the multiplication compiles to
"add register to itself" instead of actually having to use a multiply
instruction.
Higher multipliers that are powers of two can simply shift left.
Division can shift right.

>>So I changed it
>> to this: (uint is a typedef'd unsigned int.)
>> memmove(line, p, (((uint)pe - (uint)line) - ((uint)p - (uint)line)));
>
>This isn't safe, don't do that. There is no guarantee that a pointer
>will fit inside a uint without losing information which could result
>in tragic failures.

In this case, a pointer is the same size as an unsigned int. If I
had to rewrite the function for a different platform, this would be
the least of my problems.

>> More typing, but shorter, faster code and fewer temps.
>
>Also code that no longer makes sense.

I admit it's a truckload of parentheses, but I wouldn't go that far.

>Me, I'd suggest:
>1. Consider using (pe - p) * sizeof(wchar_t).

It's fine, but it bugs me that it becomes: subtract two numbers,
divide the result by two, multiply the result of the division by two.

>2. Consider using wmemmove() and just using (pe - p).

No such thing 'round these here parts as wmemmove.

>3. Or just use (((char *) pe) - ((char *) p)) as the character count.

That one hit the spot! Does it all in one instruction. (#1 takes
five, for all the dividing and multiplying.) And within memmove, can
be trimmed somewhat:

memmove(line, p, (char *)pe - (char *)p);

>
>-s

DSF

Nov 4, 2013, 11:00:46 PM

Portability is a non-issue here. Anyway, I was just trying to
illustrate how closely the C code and final code match.

> What sort of generated code do you get if you simplify the
>size expression to `(pe - p) * sizeof(wchar_t)' instead of
>writing out all those unnecessary operators?

It's fine, but it bugs me that it becomes: subtract two numbers,
divide the result by two, multiply the result of the division by two.

Seebs came up with memmove(line, p, (char *)pe - (char *)p);
This pointer subtraction compiles to one instruction.

> Also, if you're so concerned about speed: Are you moving
>too much data? Is pe ("the end of the line buffer+1") really
>the proper endpoint, or ought you to be using a pointer just
>past the terminator and perhaps well short of buffer's end?
>Yes, it may take some extra work to find that terminator -- but
>that's the sort of thing you might well have already ...

That was a descriptive error on my part. pe points one object (in
this case, a wide character) past the end-of-string terminator, not
to the end of the line buffer. pe's very existence
is to mark that point. Its creation earlier in the function allowed
me to replace a costly strlen() with pointer arithmetic.

DSF

Nov 4, 2013, 11:01:28 PM

On Fri, 01 Nov 2013 21:25:59 +0100, Johannes Bauer
<dfnson...@gmx.de> wrote:

>Am 01.11.2013 19:53, schrieb DSF:
>
>> So, at least with my current compiler, ganging up equals (variables
>> of the same type, of course) produces shorter and faster code.
>
>If your current compiler misses such an obvious optimization, it is
>quite frankly a piece of shit.

I would not disagree as far as code generation goes.

If we're going to be specific, here's what it compiles to for me:

Foo:
00404429 55 PUSH EBP
0040442A 8BEC MOV EBP,ESP
0040442C 33C0 XOR EAX,EAX
0040442E A3 D0264200 MOV DWORD PTR DS:[4226D0],EAX
00404433 33D2 XOR EDX,EDX
00404435 8915 D4264200 MOV DWORD PTR DS:[4226D4],EDX
0040443B 33C9 XOR ECX,ECX
0040443D 890D D8264200 MOV DWORD PTR DS:[4226D8],ECX
00404443 33C0 XOR EAX,EAX
00404445 A3 DC264200 MOV DWORD PTR DS:[4226DC],EAX
0040444A 33D2 XOR EDX,EDX
0040444C 8915 E0264200 MOV DWORD PTR DS:[4226E0],EDX
00404452 33C9 XOR ECX,ECX
00404454 890D E4264200 MOV DWORD PTR DS:[4226E4],ECX
0040445A 33C0 XOR EAX,EAX
0040445C A3 E8264200 MOV DWORD PTR DS:[4226E8],EAX
00404461 33D2 XOR EDX,EDX
00404463 8915 EC264200 MOV DWORD PTR DS:[4226EC],EDX
00404469 33C9 XOR ECX,ECX
0040446B 890D F0264200 MOV DWORD PTR DS:[4226F0],ECX
00404471 5D POP EBP
00404472 C3 RETN

Bar:
00404473 55 PUSH EBP
00404474 8BEC MOV EBP,ESP
00404476 33C0 XOR EAX,EAX
00404478 A3 14274200 MOV DWORD PTR DS:[422714],EAX
0040447D A3 10274200 MOV DWORD PTR DS:[422710],EAX
00404482 A3 0C274200 MOV DWORD PTR DS:[42270C],EAX
00404487 A3 08274200 MOV DWORD PTR DS:[422708],EAX
0040448C A3 04274200 MOV DWORD PTR DS:[422704],EAX
00404491 A3 00274200 MOV DWORD PTR DS:[422700],EAX
00404496 A3 FC264200 MOV DWORD PTR DS:[4226FC],EAX
0040449B A3 F8264200 MOV DWORD PTR DS:[4226F8],EAX
004044A0 A3 F4264200 MOV DWORD PTR DS:[4226F4],EAX
004044A5 5D POP EBP
004044A6 C3 RETN

Johannes Bauer

Nov 5, 2013, 1:45:58 AM

Wow, this is *really* bad code.

Entry points are completely unaligned, and the compiler even forgets that
it cleared the registers *itself* two instructions beforehand. It also
seems not to do well in lifetime analysis of the registers (since it
switches around eax, edx, ecx thinking those values are going to be
needed later on).

Still, you make no mention of what compiler you're using or whether you
have optimizations turned on (I had in my example, obviously). This is the
interesting part. The generation of stack frames leads me to believe that
you haven't turned them on (because any halfway decent compiler does
not generate stack frames at a certain optimization level, leaving one
more register to fiddle around with).

Regards,
Joe

Stephen Sprunk

Nov 5, 2013, 3:10:21 AM

On 04-Nov-13 22:01, DSF wrote:
> On Fri, 01 Nov 2013 21:25:59 +0100, Johannes Bauer
> <dfnson...@gmx.de> wrote:
>> Am 01.11.2013 19:53, schrieb DSF:
>>> So, at least with my current compiler, ganging up equals
>>> (variables of the same type, of course) produces shorter and
>>> faster code.
>>
>> If your current compiler misses such an obvious optimization, it
>> is quite frankly a piece of shit.
>
> I would not disagree as far as code generation goes.
>
>> Just for reference, gcc 4.7 produces the expected:

>> ...

>
> If we're going to be specific, here's what it compiles to for me:

> ...

If you're going to make vague complaints about what an unspecified
compiler produces with unspecified settings, you should expect to see
folks chime in with more specific examples to discuss.

For instance, GCC 4.2.4 for Linux/x86 produces this with -O0:

foo:
pushl %ebp
movl %esp, %ebp
movl $0, a.2059
movl $0, b.2060
movl $0, c.2061
movl $0, d.2062
movl $0, e.2063
movl $0, f.2064
movl $0, g.2065
movl $0, h.2066
movl $0, i.2067
popl %ebp
ret
...
bar:
pushl %ebp
movl %esp, %ebp
movl $0, i.2079
movl i.2079, %eax
movl %eax, h.2078
movl h.2078, %eax
movl %eax, g.2077
movl g.2077, %eax
movl %eax, f.2076
movl f.2076, %eax
movl %eax, e.2075
movl e.2075, %eax
movl %eax, d.2074
movl d.2074, %eax
movl %eax, c.2073
movl c.2073, %eax
movl %eax, b.2072
movl b.2072, %eax
movl %eax, a.2071
popl %ebp
ret

The latter is definitely suboptimal. However, GCC is well-known to
produce ridiculously inefficient (but completely literal) code when
optimization is disabled. When I switch to -O3, which is what I
normally use, I get this:

foo:
pushl %ebp
movl %esp, %ebp
popl %ebp
movl $0, a.2114
movl $0, b.2115
movl $0, c.2116
movl $0, d.2117
movl $0, e.2118
movl $0, f.2119
movl $0, g.2120
movl $0, h.2121
movl $0, i.2122
ret
...
bar:
pushl %ebp
movl %esp, %ebp
popl %ebp
movl $0, i.2134
movl $0, h.2133
movl $0, g.2132
movl $0, f.2131
movl $0, e.2130
movl $0, d.2129
movl $0, c.2128
movl $0, b.2127
movl $0, a.2126
ret

I'm a little curious why the latter has the order reversed, but the
final result (and efficiency) is identical, which is as expected.

S

--
Stephen Sprunk "God does not play dice." --Albert Einstein
CCIE #3723 "God is an inveterate gambler, and He throws the
K5SSS dice at every possible opportunity." --Stephen Hawking

Philip Lantz

Nov 5, 2013, 3:20:27 AM

Presumably because that's the order the assignments appear in the code.
The compiler doesn't have to do them in the same order as in the code,
but in this case there's clearly no reason not to.

The original source was:

James Kuyper

Nov 5, 2013, 7:37:26 AM

On 11/04/2013 11:00 PM, DSF wrote:
> On 01 Nov 2013 18:51:59 GMT, Seebs <usenet...@seebs.net> wrote:
>
>> On 2013-11-01, DSF <nota...@address.here> wrote:

...

>>> So I changed it
>>> to this: (uint is a typedef'd unsigned int.)
>>> memmove(line, p, (((uint)pe - (uint)line) - ((uint)p - (uint)line)));
>>
>> This isn't safe, don't do that. There is no guarantee that a pointer
>> will fit inside a uint without losing information which could result
>> in tragic failures.
>
> In this case, a pointer is the same size as an unsigned int. If I
> had to rewrite the function for a different platform, this would be
> the least of my problems.

There's no guarantee that (uint)pe - (uint)line calculates any
meaningful number. In particular, there's no guarantee that it
calculates the same number calculated by the correct code:

>> 1. Consider using (pe - p) * sizeof(wchar_t).
> It's fine, but it bugs me that it becomes: subtract two numbers,
> divide the result by two, multiply the result of the division by two.

If that kind of thing worries you, and you're not willing to count on
the compiler to optimize it away, you shouldn't be writing in C - it
doesn't provide the level of control you need to stay happy. Try
assembler instead. Keep in mind that there's no guarantee that your code
doesn't also compile to "subtract two numbers, divide by 2, multiply by
2". Only trust in the competence of your compiler's designers allows you
to assume it hasn't been pessimized that way. The level of incompetence
needed to design a compiler that translates (pe-p)*sizeof(*p) into
anything other than the equivalent of (char*)pe - (char*)p is pretty
substantial.
--
James Kuyper

Tim Rentsch

Nov 8, 2013, 11:30:56 AM

Seebs <usenet...@seebs.net> writes:

> On 2013-11-01, DSF <nota...@address.here> wrote:
>> In this code line is the start of the line buffer, pe is the end of
>> the line buffer+1, and p is the start of the area to be moved down.
>> These are all of type wchar_t *.
>
>> memmove(line, p ((pe - line) * sizeof(wchar_t)) - ((p - line) *
>> sizeof(wchar_t));
>
> ... Why not (pe - p) * sizeof(wchar_t)?
>
>> It wasn't exactly following my code, but it was doing unnecessary
>> multiplying and dividing, even if by a power of two.
>
> I am not sure that "by a power of two" matters at all in this case.
>
>>So I changed it
>> to this: (uint is a typedef'd unsigned int.)
>> memmove(line, p, (((uint)pe - (uint)line) - ((uint)p - (uint)line)));
>
> This isn't safe, don't do that. There is no guarantee that a pointer
> will fit inside a uint without losing information which could result
> in tragic failures.
>
>> More typing, but shorter, faster code and fewer temps.
>
> Also code that no longer makes sense.
>
> Me, I'd suggest:

> 1. Consider using (pe - p) * sizeof(wchar_t). [snip 2&3]

Normally I would rather see this as (pe - p) * sizeof *p .

Seebs

Nov 8, 2013, 3:28:12 PM

On 2013-11-08, Tim Rentsch <t...@alumni.caltech.edu> wrote:
>> Me, I'd suggest:
>> 1. Consider using (pe - p) * sizeof(wchar_t). [snip 2&3]

> Normally I would rather see this as (pe - p) * sizeof *p .

... You know, that should have occurred to me, and I think it even did when I
was talking about this later with some other people.

Jorgen Grahn

Nov 9, 2013, 6:03:15 AM

On Tue, 2013-11-05, Johannes Bauer wrote:
...

> Still, you make no mention of what compiler you're using or whether you
> have optimizations turned on (I had in my example, obviously).

He did write "with size optimization enabled". But yeah, I don't see
why the identity of the compiler has to be kept secret ... and as long
as it is, this thread is rather irrelevant from my point of view.

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .

DSF

Nov 12, 2013, 11:36:33 PM

Sorry that I forgot to mention the compiler, I thought I had. It's
Borland C++ 5.01A. Yes, I know it was around when dinosaurs walked
the planet. I should use something newer, but it's like changing word
processors, but worse! I'd still want to use an IDE. So most editing
commands will have changed and half my code will probably need to be
rewritten to even compile, let alone run properly. I'm just
*dreading* the time it will take to get to the comfort/experience
point I'm at now.

I did say that I had optimize for size turned on.

I must say, I find a lot of inefficient code when I'm assembly level
debugging. Some can be blamed on the age, since processor
manufacturers have changed which instructions to optimize over the
years, and efficient code in 1995 isn't necessarily so in 2013. With
that said, I've seen code that sets EAX to 3 and jumps to an
instruction that sets EAX to 3, instead of the following instruction.

I found "Standard Stack Frame" under debugging and turned it off. It
did eliminate the EBP manipulation. Optimization choices are pretty
slim.

On the bright side, the assembly code correlates very closely with
the C code, and sometimes it helps to have the compiler do *exactly*
what you want instead of what its designers think is best. That said, if I
could put on a futuristic "learning cap" and in fifteen minutes
completely understand the modern compiler of my choice, I wouldn't
hesitate for a second! :o)

DSF

Nov 13, 2013, 12:12:55 AM

On Tue, 05 Nov 2013 07:37:26 -0500, James Kuyper
<james...@verizon.net> wrote:

>On 11/04/2013 11:00 PM, DSF wrote:
>> On 01 Nov 2013 18:51:59 GMT, Seebs <usenet...@seebs.net> wrote:
>>
>>> On 2013-11-01, DSF <nota...@address.here> wrote:
>...
>>>> So I changed it
>>>> to this: (uint is a typedef'd unsigned int.)
>>>> memmove(line, p, (((uint)pe - (uint)line) - ((uint)p - (uint)line)));
>>>
>>> This isn't safe, don't do that. There is no guarantee that a pointer
>>> will fit inside a uint without losing information which could result
>>> in tragic failures.
>>
>> In this case, a pointer is the same size as an unsigned int. If I
>> had to rewrite the function for a different platform, this would be
>> the least of my problems.
>
>There's no guarantee that (uint)pe - (uint)line calculates any
>meaningful number. In particular, there's no guarantee that it
>calculates the same number calculated by the correct code:

I assume the above statement is made on the basis of portability,
since on my compiler, it produces the same results (in bytes) as
subtracting two unsigned integers.

>>> 1. Consider using (pe - p) * sizeof(wchar_t).
>> It's fine, but it bugs me that it becomes: subtract two numbers,
>> divide the result by two, multiply the result of the division by two.
>
>If that kind of thing worries you, and you're not willing to count on
>the compiler to optimize it away, you shouldn't be writing in C - it
>doesn't provide the level of control you need to stay happy. Try
>assembler instead. Keep in mind that there's no guarantee that your code
>doesn't also compile to "subtract two numbers, divide by 2, multiply by
>2". Only trust in the competence of your compiler's designers allows you
>to assume it hasn't been pessimized that way. The level of incompetence
>needed to design a compiler that translates (pe-p)*sizeof(*p) into
>anything other than the equivalent of (char*)pe - (char*)p is pretty
>substantial.

The compiler (I thought for sure I mentioned it) is Borland C++
5.01A. For the reason why, see my other post in this thread. So I
*know* it won't optimize it away. :o)

I have stepped through enough assembly code whilst debugging to
state that I cannot trust the compiler to produce an efficient version
of my self-optimized C code. I have duplicated over half of the
string/memory manipulation functions of the RTL. Partly to support
16-bit characters, but also for speed and efficiency. To be honest, a
lot of the code is inefficient and slow because of the changes in
processor design over the years, but not all of it. I'm talking about
assembly here, not C, so their code was written by people, not their
compiler.

To brag a little (as much as one can about besting almost
20-year-old code) one of my string routines (I can't remember which
one, it's been years) had an average timing of 300 times faster than
the RTL code. That's times, not percent. And I have no delusions I
could do the same against a modern compiler's RTL.

Seebs

Nov 13, 2013, 3:04:55 AM

On 2013-11-13, DSF <nota...@address.here> wrote:
> On Tue, 05 Nov 2013 07:37:26 -0500, James Kuyper
><james...@verizon.net> wrote:
>>There's no guarantee that (uint)pe - (uint)line calculates any
>>meaningful number. In particular, there's no guarantee that it
>>calculates the same number calculated by the correct code:

> I assume the above statement is made on the basis of portability,
> since on my compiler, it produces the same results (in bytes) as
> subtracting two unsigned integers.

Yes. But there are lots of platforms where pointers cast to uint will
have lost some of their bits, and where you might occasionally see
strange behavior.

Assume ints are 32-bit, and pointers 64-bit, and consider what you
get from (uint) 0x100000010 - (uint) 0x0FFFFFFF0.

> I have stepped through enough assembly code whilst debugging to
> state that I cannot trust the compiler to produce an efficient version
> of my self-optimized C code.

In that case, I think it's a safe bet that you would spend an order
of magnitude less time converting to a modern compiler than you are
spending writing code that's less maintainable and more likely to
have subtle bugs than what a modern compiler would do.

> To brag a little (as much as one can about besting almost
> 20-year-old code) one of my string routines (I can't remember which
> one, it's been years) had an average timing of 300 times faster than
> the RTL code. That's times, not percent. And I have no delusions I
> could do the same against a modern compiler's RTL.

Which sort of renders the entire exercise a little silly, no?

James Kuyper

Nov 13, 2013, 8:57:21 AM

On 11/13/2013 12:12 AM, DSF wrote:
> On Tue, 05 Nov 2013 07:37:26 -0500, James Kuyper
> <james...@verizon.net> wrote:
>
>> On 11/04/2013 11:00 PM, DSF wrote:
>>> On 01 Nov 2013 18:51:59 GMT, Seebs <usenet...@seebs.net> wrote:
>>>
>>>> On 2013-11-01, DSF <nota...@address.here> wrote:
>> ...
>>>>> So I changed it
>>>>> to this: (uint is a typedef'd unsigned int.)
>>>>> memmove(line, p, (((uint)pe - (uint)line) - ((uint)p - (uint)line)));
>>>>
>>>> This isn't safe, don't do that. There is no guarantee that a pointer
>>>> will fit inside a uint without losing information which could result
>>>> in tragic failures.
>>>
>>> In this case, a pointer is the same size as an unsigned int. If I
>>> had to rewrite the function for a different platform, this would be
>>> the least of my problems.
>>
>> There's no guarantee that (uint)pe - (uint)line calculates any
>> meaningful number. In particular, there's no guarantee that it
>> calculates the same number calculated by the correct code:
>
> I assume the above statement is made on the basis of portability,
> since on my compiler, it produces the same results (in bytes) as
> subtracting two unsigned integers.

I was talking about guarantees provided by the C standard. There's no
limit on the number and variety of guarantees provided by other sources.
Which is one of the reasons I don't see much point in discussing those
other sources of guarantees, except in a forum specific to the
particular source.

> The compiler (I thought for sure I mentioned it) is Borland C++

I can find no mention of it in any of the messages before yesterday.

> 5.01A. For the reason why, see my other post in this thread. So I
> *know* it won't optimize it away. :o)

...

> To brag a little (as much as one can about besting almost

> 20-year-old code) ...

There may be legitimate reasons for worrying about and complaining about
the inadequacies of 20-year old compilers, though those reasons don't
apply to me. However, those inadequacies should be attributed to the age
of the compiler, not to the language it compiles. Your Subject: header
should have been "Borland C++ can be very literal".
--
James Kuyper

DSF

Jan 16, 2014, 2:49:00 PM

On Tue, 05 Nov 2013 02:10:21 -0600, Stephen Sprunk
<ste...@sprunk.org> wrote:

Sorry this is a tiny bit late. Been busy.

The former suffers from an inefficiency as well.
movl $0, a.2059
I'm not proficient in AT&T syntax, but I assume it is the equivalent of:
mov dword ptr a, 0

A move of 0 to a static address translates to:
C705F0D0410000000000 mov [0x41D0F0], 0x00000000
10 bytes per store.

A stack-relative address is a little better:
C7450400000000 mov [ebp+4], 0
7 bytes per store.

As compared to:
33C0 xor eax, eax
A3F0D04100 mov [0x41d0f0], eax
7 bytes for initial store, 5 for each additional store.

DSF

Apr 21, 2014, 1:43:10 PM

On Thu, 16 Jan 2014 14:49:00 -0500, DSF <nota...@address.here>
wrote:

>On Tue, 05 Nov 2013 02:10:21 -0600, Stephen Sprunk
><ste...@sprunk.org> wrote:
>
>Sorry this is a tiny bit late. Been busy.

Busy meaning sick. Sooo...see below.
{snipped}

> The former suffers from an inefficiency as well.
> movl $0, a.2059
> I'm not proficient in AT&T syntax, but I assume it is the equivalent of:
> mov dword ptr a, 0
>
>A move of 0 to a static address translates to:
>C705F0D0410000000000 mov [0x41D0F0], 0x00000000
>10 bytes per store.
>
>A stack-relative address is a little better:
>C7450400000000 mov [ebp+4], 0
>7 bytes per store.
>
>As compared to:
>33C0 xor eax, eax
>A3F0D04100 mov [0x41d0f0], eax
>7 bytes for initial store, 5 for each additional store.

Sorry for the C-syntax hex numbers!

DSF

DSF

Apr 21, 2014, 1:43:13 PM

On 13 Nov 2013 08:04:55 GMT, Seebs <usenet...@seebs.net> wrote:

>On 2013-11-13, DSF <nota...@address.here> wrote:
>> On Tue, 05 Nov 2013 07:37:26 -0500, James Kuyper
>><james...@verizon.net> wrote:
>>>There's no guarantee that (uint)pe - (uint)line calculates any
>>>meaningful number. In particular, there's no guarantee that it
>>>calculates the same number calculated by the correct code:
>
>> I assume the above statement is made on the basis of portability,
>> since on my compiler, it produces the same results (in bytes) as
>> subtracting two unsigned integers.
>
>Yes. But there are lots of platforms where pointers cast to uint will
>have lost some of their bits, and where you might occasionally see
>strange behavior.
>
>Assume ints are 32-bit, and pointers 64-bit, and consider what you
>get from (uint) 0x100000010 - (uint) 0x0FFFFFFF0.

I'm well aware of that. At the time, speed (of finishing the
project, not execution time) was a priority. I don't recall right
now what that code was part of, but I believe it's still a work in
progress. (I was sick for over a month.)

>> I have stepped through enough assembly code whilst debugging to
>> state that I cannot trust the compiler to produce an efficient version
>> of my self-optimized C code.
>
>In that case, I think it's a safe bet that you would spend an order
>of magnitude less time converting to a modern compiler than you are
>spending writing code that's less maintainable and more likely to
>have subtle bugs than what a modern compiler would do.

True. But that is bound to be a major undertaking and I have
several projects in progress that I need yesterday! The time
learning/getting used to a new compiler plus some code conversion
would set me back months.

>> To brag a little (as much as one can about besting almost
>> 20-year-old code) one of my string routines (I can't remember which
>> one, it's been years) had an average timing of 300 times faster than
>> the RTL code. That's times, not percent. And I have no delusions I
>> could do the same against a modern compiler's RTL.
>
>Which sort of renders the entire exercise a little silly, no?

No. Because it wasn't an exercise. I needed 16-bit character
support and converting the 16-bit code to an 8-bit counterpart for
each of the string functions I wrote was a simple matter.

#define TONGUE_IN_CHEEK

Who knows? It might still be incredibly faster than a current
compiler. When I get the two remaining projects done and have some
time to look into finding/learning a new compiler, I'll let you know.

#undef TONGUE_IN_CHEEK

>-s

DSF

DSF

Apr 21, 2014, 1:43:14 PM

On Wed, 13 Nov 2013 08:57:21 -0500, James Kuyper
<james...@verizon.net> wrote:

{much snipped}

>> To brag a little (as much as one can about besting almost
>> 20-year-old code) ...
>
>There may be legitimate reasons for worrying about and complaining about
>the inadequacies of 20-year old compilers, though those reasons don't
>apply to me. However, those inadequacies should be attributed to the age
>of the compiler, not to the language it compiles. Your Subject: header
>should have been "Borland C++ can be very literal".

It should have, but I didn't want all of the responses to be flames
for being off-topic.

There are times when I *WANT* the compiler to be literal. Some
practical reasons: There are some optimizations in BC++ that cannot
be turned off and make debugging difficult. There are occasions where
I want the resulting code to be exactly how I wrote it. They're rare,
but I've heard complaints of errant optimizations altering the intent
of the code under certain conditions.

DSF

James Kuyper

Apr 21, 2014, 1:57:30 PM

On 04/21/2014 01:43 PM, DSF wrote:
> On Wed, 13 Nov 2013 08:57:21 -0500, James Kuyper
> <james...@verizon.net> wrote:
...
>> There may be legitimate reasons for worrying about and complaining about
>> the inadequacies of 20-year old compilers, though those reasons don't
>> apply to me. However, those inadequacies should be attributed to the age
>> of the compiler, not to the language it compiles. Your Subject: header
>> should have been "Borland C++ can be very literal".
>
> It should have, but I didn't want all of the responses to be flames
> for being off-topic.

The best way to avoid getting flamed for an off-topic post is to post it
to a forum where it isn't off-topic (a Borland C++ forum would be such a
one for this issue). Using a misleading "Subject:" header won't help
much; most people pay more attention to the body of the message than to
the "Subject:" header.

> There are times when I *WANT* the compiler to be literal. Some

Then, at those times, you should be using assembler, not C. C was never
intended to specify precisely the generated machine code. It was
intended to specify the resulting behavior, leaving implementations free
to choose how they achieve that behavior.