On 2017-09-02 23:54,
rug...@nospicedham.gmail.com wrote:
> On Thursday, August 31, 2017 at 10:49:11 AM UTC-5, Robert Prins wrote:
>>
>> my hacked-about code comes out as:
>>
>> {$ifdef mmx}
>> ...
>> {$else}
>> mov ecx, [ebx + offset lift_list.dtime]
>> mov [esi + offset s_rec.dtime], ecx
>>
>> mov ecx, [ebx + offset lift_list.atime]
>> mov [esi + offset s_rec.atime], ecx
>>
>> mov ecx, [ebx + offset lift_list.wtime]
>> mov [esi + offset s_rec.wtime], ecx
>>
>> mov ecx, [ebx + offset lift_list.itime]
>> mov [esi + offset s_rec.itime], ecx
>> {$endif}
>>
>> I would expect that GCC or the Intel C compiler would both
>> generate at least the code in the {$else} branch above when
>> the original Pascal is replaced by C(++).
>
> So why not rewrite in C? It can't be that hard (famous last words!).
I was brought up on ALGOL 60 and TI-59-ese ;). Around 1984/5 my father
bought Turbo Pascal 2, and in 1985 I started work, with PL/I. The next language
(REXX) followed in early 1992, and I've never had the luck (or is it misfortune?)
to work with C. The oldest ***saved*** version of "LIFT" dates back to 9 April
1994 (a 49k .COM file), and the first time I started using the old TP "inline"
statement was in version 46, on 30 July 1995.
And for history buffs, the last (60th) TP 3.01a version dates back to 5 August
1996. It was followed by the first TP 6.00 version on 2 September 1996, and the
last (53rd) TP6 version saw the light on 6 October 2008, to be followed by the
first VP version on the same day - the current (95th) VP version comes in at a
hefty 96k .EXE. I don't like bloatware. ;)
> Though personally I'd suggest rather fixing to work with FPC,
> that's more useful and important (IMHO).
No, it's not. VP may be dead, adding post-Pentium instructions to it as long DB
sequences is hellish, and debugging them is of course impossible, but the IDE
is still light-years ahead of what FPC offers. My goal is to
eventually convert the program into pure assembler, probably via FASM, and I
will not go back to FPC until it has an IDE that is as smooth as the one used by
VP, in other words: probably never. :(
>> Both VP and FPC seem to make way too much use of EAX...
>
> I'm not an optimization guru. I haven't read Agner Fog's manuals
> closely. Modern cpus probably do heavy register renaming and
> lots of out-of-order (pipelined, superscalar, whatever) stuff.
> I think older ones were pickier about certain things, but I
> don't know if you care about (or test your code on) such machines.
My desktop uses an AMD FX8150, the laptop an Intel quad-core mobile i7. The
Pure Pascal version of the program would likely still run on a 386, but how
useful is that? I don't think I even have anything pre-486, and the 486
probably has DOS on it, as I still hope, one day, to get the TI-95 PC Interface
software working again, which would allow me to find the bugs in a TI-95
emulator by simply testing each (emulated) instruction exhaustively.
> Relying too much on one register is probably a bad idea, but
> they probably just want to simplify register shuffling.
> I had thought I read that alternating registers was a better
> idea, so maybe try not relying too heavily on ECX either.
VP actually does put local variables into EBX/ESI/EDI, but only to a limited
extent. If a procedure uses more than three that are register-able, only
three will be put into registers, even though two (or more) might be more or
less independent and could easily all be aliased to registers. That is what
I'm doing manually, by looking at the overall structure of the code. Although
I doubt that my hand-crafted in-line assembler is as fast as the properly
scheduled code emitted by the GCC or Intel C(++) compilers, I do know that
it's way ahead of what both VP and FPC produce, both size- and speed-wise!
> Actually, your {$else} code seems pretty sequential.
There's only so much you can do in parallel, and I think I've converted most of
what could have been converted to MMX (using XMM instructions, if I may
believe everything I read, causes significant stalls on (some) Intel CPUs when
combined with YMM instructions, which I use for larger moves).
> I would
> just use "push dword[], pop dword[]" (but not for 486) and
> avoid ECX altogether. Maybe I'm naive, but that's a quick
> simplification. No idea if it really helps you, though.
It might be shorter (is it?), it definitely doesn't use registers, but is it
faster? I've got all the Agner Fog files, and I'm not sure.
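
On the "is it shorter?" part, a minimal sketch in C that simply holds the
hand-assembled bytes of one field copy in both forms. The displacement 10h is
a made-up field offset (the real offsets of lift_list and s_rec are not given
here), and the function names are only for illustration. With an 8-bit
displacement both forms come to six bytes per field, so push/pop frees ECX but
saves no space. It says nothing about which one is faster.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 8-bit field offset; the real record layout is not known. */
#define FIELD_DISP8 0x10

/* mov via ECX: two 3-byte instructions. */
static const unsigned char mov_pair[] = {
    0x8B, 0x4B, FIELD_DISP8,   /* mov  ecx, [ebx+10h]   */
    0x89, 0x4E, FIELD_DISP8    /* mov  [esi+10h], ecx   */
};

/* push/pop memory to memory: also two 3-byte instructions. */
static const unsigned char push_pop_pair[] = {
    0xFF, 0x73, FIELD_DISP8,   /* push dword [ebx+10h]  */
    0x8F, 0x46, FIELD_DISP8    /* pop  dword [esi+10h]  */
};

/* Encoded size in bytes of the mov/mov form of one field copy. */
size_t mov_pair_size(void) { return sizeof mov_pair; }

/* Encoded size in bytes of the push/pop form of one field copy. */
size_t push_pop_pair_size(void) { return sizeof push_pop_pair; }

/* Self-check: the two forms occupy the same number of bytes. */
void check_sizes(void) { assert(mov_pair_size() == push_pop_pair_size()); }
```

With 32-bit displacements each instruction grows by three bytes in both
forms, so the sizes stay equal there as well.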