On 11/05/17 15:59, Jerry Stuckle wrote:
> On 5/11/2017 2:27 AM, Gareth Owen wrote:
>> David Brown <david...@hesbynett.no> writes:
>>
>>> On 10/05/17 23:14, jacobnavia wrote:
>>>> Prefetching, pipeline construction are difficult to do for a given C++
>>>> program. Since the language doesn't offer any way to do that, you rely
>>>> on automatic translation.
>>>>
>>>
>>> And often the compiler can do a better job of it than an assembly
>>> programmer can. Failing that, implementation extensions can help
>>> (like __builtin_prefetch in gcc), and failing that, most compilers
>>> will let you make a small inline function that wraps a piece of inline
>>> assembly.
>>
>> The interesting thing here is that Jacob claims that his hand-rolled
>> assembler is almost always faster than translated code - and the
>> original article makes the opposite claim - that peephole optimisations
>> and knowledge of CPU scheduling issues make the compiler-generated ASM
>> so baroque that no human is likely to generate it.
>>
>
> Compilers are written by humans, and so is the code generated by the
> compilers.
Well, sort of. Compilers are written by humans, yes (though there can
be layers of indirection, such as yacc).

Humans write the code and macros that generate the object code sequences
- they don't write the actual assembly code. When writing the output
macro for translating "x * y", for example, the human-written code will
say something like "Make sure x is in a register - call it rA. Make
sure y is in a register - call it rB. Find a free register rD.
Generate a 'mul rD, rA, rB' instruction. The result is in rD, and the
flag register is updated". The compiler will interlace this with other
instructions, depending on processor scheduling and pipelining. It may
lead to instructions to load a register from memory, it may not. Some
types of code lead to complex object code generation - a switch might
lead to a series of comparisons, a binary tree of comparisons, a jump
table, calculated jumps, or a mixture.
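
To make the "x * y" rule above concrete, here is a toy sketch in C of what such a human-written generator rule might look like. Everything in it is made up for illustration (the CodeGen and Operand types, the "ldr" load mnemonic, the fixed pool of 8 registers); a real compiler back end is far more elaborate, handles spilling, and hands the result to a scheduler.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define NUM_REGS 8              /* hypothetical register file size */

/* Hypothetical operand: a named value that may or may not be in a register. */
typedef struct {
    const char *name;           /* source-level name, e.g. "x" */
    int reg;                    /* register number, or -1 if still in memory */
} Operand;

/* Hypothetical generator state: register usage plus the emitted text. */
typedef struct {
    bool in_use[NUM_REGS];
    char out[512];
    size_t len;
} CodeGen;

/* Append one formatted line of assembly text, truncating if the buffer fills. */
static void emit(CodeGen *cg, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    if (cg->len < sizeof cg->out) {
        int n = vsnprintf(cg->out + cg->len, sizeof cg->out - cg->len, fmt, ap);
        if (n > 0) {
            cg->len += (size_t)n;
            if (cg->len >= sizeof cg->out)
                cg->len = sizeof cg->out - 1;
        }
    }
    va_end(ap);
}

/* "Find a free register": returns -1 if none is left (no spilling here). */
static int alloc_reg(CodeGen *cg)
{
    for (int r = 0; r < NUM_REGS; r++) {
        if (!cg->in_use[r]) {
            cg->in_use[r] = true;
            return r;
        }
    }
    return -1;
}

/* "Make sure the operand is in a register": may or may not emit a load. */
static int ensure_in_reg(CodeGen *cg, Operand *op)
{
    if (op->reg < 0) {
        op->reg = alloc_reg(cg);
        if (op->reg >= 0)
            emit(cg, "ldr r%d, [%s]\n", op->reg, op->name);
    }
    return op->reg;
}

/* The rule for "x * y": returns the result register, or -1 on failure. */
static int gen_mul(CodeGen *cg, Operand *x, Operand *y)
{
    int ra = ensure_in_reg(cg, x);
    int rb = ensure_in_reg(cg, y);
    int rd = alloc_reg(cg);
    if (ra < 0 || rb < 0 || rd < 0)
        return -1;
    emit(cg, "mul r%d, r%d, r%d\n", rd, ra, rb);
    return rd;
}
```

If x is already in r0 and y is still in memory, gen_mul() emits a load for y followed by the mul; if both are already in registers it emits only the mul. The human wrote the rule, not the instruction sequence.
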
The compiler code generation is not just a copy-and-paste of sequences
of hand-written assembly with a few register renames. There was a time,
long ago, when that was the case (and it may still be the case in
simpler compilers), but not now.

The information about what instructions to use is, of course, given by
a human - as is information about timing, pipelines, scheduling, etc.,
that helps the compiler pick between alternative solutions and interlacing.
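
The switch case mentioned earlier is an easy one to try for yourself. Below is a small sketch with two made-up functions, dense_kind() and sparse_kind(), that differ only in how their case labels are spread out. Which strategy a compiler picks for each (jump table, lookup table, comparison tree, or something else) depends on the compiler, its version, the target and the optimisation level, so treat this as something to experiment with rather than a statement of what will be generated.

```c
/* Dense, contiguous case labels: a candidate for a jump table or a
 * data lookup table, though the compiler is free to choose otherwise. */
int dense_kind(int c)
{
    switch (c) {
    case 0: return 10;
    case 1: return 17;
    case 2: return 4;
    case 3: return 99;
    case 4: return 23;
    case 5: return 8;
    case 6: return 61;
    case 7: return 35;
    default: return -1;
    }
}

/* Widely scattered case labels: a table indexed by c would be huge,
 * so a series or tree of comparisons is the more likely outcome. */
int sparse_kind(int c)
{
    switch (c) {
    case 3: return 1;
    case 70: return 2;
    case 900: return 3;
    case 12000: return 4;
    case 500000: return 5;
    default: return -1;
    }
}
```

Compiling this with something like "gcc -O2 -S" and reading the generated assembly for a couple of different targets shows how differently the same source construct can be translated.
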
>
> A good assembler programmer knows the language. An expert assembler
> programmer knows the language *and the processor*.
Agreed.
> He/she can write the
> same code the compiler generates, so his/her code is never slower.
In theory, yes. In practice - no, except for very short sequences or
particular special cases.
> However, the compiler still has constraints on it based on its design.
> The programmer has no such constraints.
In theory, yes - in practice, no. The programmer has constraints -
there are limits to how well he/she can track large numbers of
registers, or processor pipelines, or multi-issue scheduling. Making a
small change in one part of the assembly can have knock-on effects in
the timings in other parts. A human programmer simply cannot keep track
of it all without an inordinate amount of time and effort.

So /sometimes/ an assembler programmer can do better, especially on
short sequences or particular cases that might map well to processor
instructions but poorly to C code. But often it is simply too much
effort - even an expert assembly programmer will not be willing to spend
the time on the tedious detail, and has a high chance of getting at
least something wrong.

In particular, if you insist on writing clear and maintainable code in
an appropriate timeframe, as most professional programmers aim for, then
it is very rare that even an expert assembler programmer will beat a
good compiler. It is perfectly possible to write clear and maintainable
assembly code - but it is rarely the fastest possible result on a modern
chip.
>
> Additionally, the compiler generates code for a specific series of
> processors - i.e. 32 or 64 bit. If the programmer knows the code will
> only run on one specific processor, he/she can write code specific to
> that processor.
Many compilers can generate code specifically for particular target
processors. (Sorry, Jacob, but I must use gcc as the example again - it
is the compiler I know best.) gcc has options to generate code that
will work on a range of cpus (within a family such as x86-32) but have
scheduling optimised for one particular target or subfamily. Or it can
generate code that /requires/ a particular subfamily feature. Or it can
generate a number of implementations for a given function, and pick the
best one at run-time based on the actual cpu that is being used.
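
For reference, in gcc the first two cases correspond to the -mtune= and -march= options, and the third is what the target_clones function attribute automates. The sketch below shows the same idea of run-time selection done by hand in C, using gcc's __builtin_cpu_supports() on x86. The function names (sum, sum_generic, sum_avx2, pick_sum) are invented for this example, and the lazy initialisation is deliberately simplistic.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t (*sum_fn)(const uint32_t *, size_t);

/* Baseline version: built with whatever the command-line options allow. */
static uint64_t sum_generic(const uint32_t *v, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* Same C source, but AVX2 is enabled for this one function only, so the
 * compiler is free to use AVX2 instructions when it vectorises the loop. */
__attribute__((target("avx2")))
static uint64_t sum_avx2(const uint32_t *v, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}
#endif

/* Choose an implementation based on the cpu we are actually running on. */
static sum_fn pick_sum(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return sum_avx2;
#endif
    return sum_generic;
}

/* Public entry point: resolves the implementation on first use.
 * Note: this lazy initialisation is not synchronised between threads. */
uint64_t sum(const uint32_t *v, size_t n)
{
    static sum_fn impl;
    if (!impl)
        impl = pick_sum();
    return impl(v, n);
}
```

On non-x86 targets the preprocessor guards leave only the generic version, so the code still builds and gives the same results.
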
Yes, all of that /can/ be done by an assembler programmer - but it is
unrealistic to think that it /would/ be done, except in extreme cases.
>
> With that said, there are very few assembler programmers with the
> required level of expertise nowadays. But they are out there.
>
Agreed. And they are mostly spending their time doing something
/useful/ with those skills, rather than trying to beat compiler output
by a fraction of a percent on one particular processor. For example,
they are involved in writing compilers :-)