Converting some way-too-clever PL/I code to x86?

Robert Prins

Aug 3, 2017, 5:09:51 PM

I've recently come across some really clever/very nasty PL/I code that would,
theoretically, save CPU by eliminating a conditional jump. It relies on
initializing a BCD-encoded ***integer*** variable ("sum") with -0.1, which
causes, on IBM mainframes, the last nibble of the BCD-encoded value to contain
0xD (rather than the normal 0xC). The author uses this to avoid a costly
(Phuleeze, pass me a bucket!) test, so rather than coding:

sum = -1;

do i = 1 to whatever;
  if a(i) >= 0 then
    if sum <> -1 then
      sum = sum + a(i);
    else
      sum = a(i);
end;

if sum <> -1 then "print sum";

the code can be simplified to

sum = -0.1; /* fraction is discarded, but -sign (0xD) is kept! */

do i = 1 to whatever;
  if a(i) >= 0 then
    sum = sum + a(i);
end;

if last_nibble(sum) <> 0xD then "print sum";

where "last_nibble" is a simplification of using two actual PL/I builtin
functions that actually allow access to the last nibble of a BCD encoded value,
and the addition of any a(i) to "sum", even an a(i) = 0 will cause the CPU to
normalize the last nibble of "sum" to 0xC.
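To make the trick easier to follow for readers without a mainframe at hand, here is a minimal sketch in C of a toy model of it. It is not how packed decimal is stored and it is not the original PL/I; the type `packed_t` and all function names are hypothetical, and the rule that any add rewrites the sign nibble to the preferred code is taken from the description above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of a packed-decimal integer: a magnitude plus a sign nibble.
   Only the two sign codes mentioned in the post are modelled here. */
enum { NIBBLE_PLUS = 0xC, NIBBLE_MINUS = 0xD };

typedef struct {
    int64_t  magnitude;  /* absolute value */
    unsigned sign;       /* last nibble: 0xC or 0xD in this model */
} packed_t;

/* Models "sum = -0.1": the fraction is dropped, the minus sign survives. */
static packed_t packed_minus_zero(void)
{
    packed_t p = { 0, NIBBLE_MINUS };
    return p;
}

/* Add a non-negative value. Assumption from the post: every add rewrites
   the sign nibble, so a zero result comes out as +0 (0xC). */
static packed_t packed_add_nonneg(packed_t p, int64_t a)
{
    int64_t value = (p.sign == NIBBLE_MINUS ? -p.magnitude : p.magnitude) + a;
    packed_t r;
    r.magnitude = value < 0 ? -value : value;
    r.sign = value < 0 ? NIBBLE_MINUS : NIBBLE_PLUS;
    return r;
}

/* Returns true and stores the sum if at least one a[i] >= 0 was added. */
static bool sum_nonneg(const int64_t *a, int n, int64_t *out)
{
    packed_t sum = packed_minus_zero();
    for (int i = 0; i < n; i++)
        if (a[i] >= 0)
            sum = packed_add_nonneg(sum, a[i]);
    if (sum.sign == NIBBLE_MINUS)   /* still -0: nothing was ever added */
        return false;
    *out = sum.magnitude;
    return true;
}
```

The point of the model is only that "-0" and "+0" compare equal as numbers but differ in one nibble, so the nibble can serve as a "never touched" flag.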

Testing this for big "whatever" (in an outer loop, and using a small array "a"
in the inner loop) makes no flipping difference on a Hercules-emulated z/OS
system, which doesn't surprise yours truly one iota. :)

However, I would be curious if there is a way to code something similar in x86
assembler, when using strictly integer values, which implies that sum/eax must
be initialized to -1(?), and the addition is preceded by a "cmp eax, -1(?)" to
set up a carry, but that doesn't seem to work.

Or am I just on a wild goose chase?

Obviously using a "cmp eax, -1" followed by a "sete dl / movzx edx,dl / add
eax,edx / add eax, 'a(i)'" works, but might take a few nano-seconds more than a
pretty much very well predicted conditional jump...

Any thoughts?

Robert
--
Robert AH Prins
robert(a)prino(d)org

Terje Mathisen

Aug 4, 2017, 6:40:41 AM

The problem/idea here is that you initialize the BCD int to an illegal
value which is effectively zero, but which will maintain that flag info
until the first real operation on it, right?

For pure binary integer code there is of course no reserved value
(it should probably have been MININT, i.e. 0x8000/-32768 for a 16-bit
int), so you cannot add any extra info here.

As soon as you reserve a single value as the starting point for your
sum, then you cannot handle arbitrary inputs, and since an input of zero
is legal and should be added, you must use a separate flag:

sum = 0;
added_values = 0;
foreach (a in arr[]) {
    if (a >= 0) {
        sum += a;
        added_values++;
    }
}

;; ESI->array, ECX has count
        xor edx,edx             ; EDX = sum
        xor ebx,ebx             ; EBX = number of values added
next:
        lodsd
        test eax,eax
        jl skip
        add edx,eax
        inc ebx
skip:
        loop next

What's expensive here is the test for >= 0 for each element, not the
separate flag value (in EBX): Updating this is totally free.

If the pattern of valid/invalid values in the input array is
unpredictable, then you could consider CMOV operations:

next:
        lodsd
        xor edi,edi
        test eax,eax

        setge bl
        cmovge edi,eax

        add edx,eax
        or bh,bl
        loop next

The snippet above will take ~5 cycles/iteration while the branchy
version is at least one cycle faster when correctly predicted.
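For comparison, the same branch-free idea can be sketched in C. This is an illustration only: the names `sum_result` and `sum_nonneg_branchless` are made up, and whether a given compiler turns the loop body into SETcc/CMOV or into something else entirely is not guaranteed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    int32_t sum;    /* sum of all elements >= 0 */
    bool    valid;  /* true if at least one element was >= 0 */
} sum_result;

static sum_result sum_nonneg_branchless(const int32_t *arr, size_t len)
{
    uint32_t sum = 0;     /* unsigned, so wrap-around is well defined */
    unsigned valid = 0;

    for (size_t i = 0; i < len; i++) {
        /* keep is all ones when arr[i] >= 0, and zero otherwise */
        uint32_t keep = (uint32_t)0 - (uint32_t)(arr[i] >= 0);
        sum += (uint32_t)arr[i] & keep;  /* adds arr[i] or 0 */
        valid |= keep & 1u;              /* sticky "something was added" flag */
    }

    sum_result r = { (int32_t)sum, valid != 0 };
    return r;
}
```

The flag is kept separately from the sum, so an input of zero and a sum that happens to wrap around are both handled.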

If only positive array elements were OK, then you could initialize the
sum to -1, and at the end check it:

If still -1 then no legal values were found, otherwise increment the sum
and print it.
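A minimal sketch of that last variant in C, assuming only strictly positive elements count and that the total does not overflow (the function name `sum_positive` is hypothetical):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Sum all strictly positive elements, using -1 as the "nothing found" marker.
   Returns false if no element was > 0; otherwise stores the sum in *out. */
static bool sum_positive(const int32_t *arr, size_t len, int32_t *out)
{
    int32_t sum = -1;               /* sentinel: one less than an empty sum */

    for (size_t i = 0; i < len; i++)
        if (arr[i] > 0)
            sum += arr[i];          /* first add already lifts sum to >= 0 */

    if (sum == -1)
        return false;               /* still the sentinel: nothing was added */

    *out = sum + 1;                 /* undo the -1 offset */
    return true;
}
```

This works only because every accepted element is at least 1, so the running value can never return to -1 short of an overflow.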

Terje

- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Steve

Aug 4, 2017, 8:10:49 AM

Hi,

Robert Prins <rob...@nospicedham.prino.org> writes:
>I've recently come across some really clever/very nasty PL/I code, that would,
>theoretically, save CPU by eliminating a conditional jump. It relies on
>initializing a BCD-encoded ***integer*** variable ("sum") with -0.1, which
>results, on IBM mainframes, the last nibble of the BCD encoded value to contain
>0xD (rather than the normal 0xC). The author uses this to avoid a costly
>(Phuleeze, pass me a bucket!) test, so rather than coding:

If I understand Terje's answer, he might have a typo

add edx,eax

should be

add edx, edi

Or maybe I missed something.

Anyway, why not something like:
        MOV [Sum],-1        ; Set sum to illegal value.
        MOV ECX,[WhatEver]  ; Your loop count.
        MOV ESI, OFFSET a   ; Your data array.

Label1:
        LODSD               ; EAX is an array value.
        TEST EAX,EAX
        JL Label2           ; Skip unwanted values.

        MOV [Sum],EAX       ; Sum initial value
        JMP Label3          ; Go process the rest

Label2:
        LOOP Label1
        JMP Label5          ; No valid data, Sum still -1.

Label3:
        LODSD               ; EAX is an array value.
        TEST EAX,EAX
        JL Label4           ; Skip unwanted values.

        ADD [Sum],EAX       ; Update Sum.

Label4:
        LOOP Label3

Label5:                     ; No valid data, Sum still -1.
                            ; Valid data, Sum not -1.

Regards,

Steve N.

Terje Mathisen

Aug 4, 2017, 8:55:54 AM

Hi Steve!

You are of course correct that I had a typo in one of the versions,
thanks for spotting that!

Re the idea to split the loop into first finding a valid input and then
handling the rest, that is of course the fastest solution as long as the
body of the loop isn't too complicated, i.e. like in the current case.

The only real problem is that you have to look out for the additional
fencepost issue, i.e. what happens if the first valid input is the last
entry?

You should probably do a JMP Label4 after finding that first entry, and
since it is perfectly possible for a bunch of additions to wrap around
and end up with -1 as the sum, I would also keep a separate valid flag:

sum = 0;
valid = 0;
int i = 0;
while (i < len && arr[i] < 0) { i++; }
if (i < len) {
    valid = 1;
    sum = arr[i++];
    while (i < len) {
        a = arr[i++];
        if (a > 0) sum += a; // Don't need to add any zeroes!
    }
}

Terje

Robert Prins

Aug 4, 2017, 9:10:56 AM

On 2017-08-04 10:37, Terje Mathisen wrote:
> The problem/idea here is that you initialize the BCD int to an illegal value
> which is effectively zero, but which will maintain that flag info until the
> first real operation on it, right?

No (the value is legal) and yes (it is really zero), BCD integers on system Z
use the last nibble as an indicator for the sign, with A, C, E, and F indicating
a positive value, and B & D a negative one. All six of them are perfectly legal
(0..9 are not and lead to a S0C7, a hardware detected "Data Exception" error).
The only ones used by the system are C, D, and F (which in a grey past meant
unsigned)
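The sign-nibble rules above fit in a few lines of C. This is only a sketch of the classification as described in this post; the enum and function names are hypothetical, and it looks only at the low nibble of the last byte of a packed-decimal field.

```c
/* Classification of the last nibble of a packed-decimal value,
   following the rules given above. */
typedef enum {
    PD_SIGN_INVALID,   /* 0x0..0x9: not a sign code (data exception, S0C7) */
    PD_SIGN_POSITIVE,  /* 0xA, 0xC, 0xE, 0xF */
    PD_SIGN_NEGATIVE   /* 0xB, 0xD */
} pd_sign;

static pd_sign classify_sign_nibble(unsigned char last_byte)
{
    switch (last_byte & 0x0F) {   /* the sign lives in the low nibble */
    case 0xA: case 0xC: case 0xE: case 0xF:
        return PD_SIGN_POSITIVE;
    case 0xB: case 0xD:
        return PD_SIGN_NEGATIVE;
    default:
        return PD_SIGN_INVALID;
    }
}
```

With this, the PL/I test `last_nibble(sum) <> 0xD` corresponds to checking that the low nibble is not exactly 0xD, which is narrower than asking whether the value is negative.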

Obviously this code should never have gone into production. A former colleague
sent it to me, as to him, and further afield, in the outsourcing outfit in the
far-east, it didn't make any sense. It was written before I joined the company,
and my guess is that it was written by a contractor who wrote a lot of tools (in
PL/I and z/OS assembler) that, once he left, entered into a "It works, nobody
really knows how, so use it, but don't touch it!" realm.

I've actually filed an R(equest) F(or) E(nhancement) with IBM

https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=108713

to flag such code with an I(nformational) or W(arning) message, or even to
introduce a new "RULES" option to the PL/I compiler to issue an E(rror) message,
which would, in most z/OS shops, stop the compiler from generating any code.

I've told my ex-colleague to init "sum" with -1 and accept the additional test,
in this day and age those few extra pico-seconds (the z14 runs at 5.2 GHz) don't
really matter. ;)

> For pure binary integer code there is of course no reserved value (should
> probably have been MININT, i.e. 0x8000/-32768 for a 16-bit int), so you cannot
> add any extra info here.

I know, it's just that the z/OS code has piqued my curiosity, be it that I'm
just summing or storing values >= 0, so my summing bins are all initialized to
-1. The stores obviously do not present a problem, but for the summations I need
two paths, in pseudo-code:

if sum = -1 then
  sum = a[i]
else
  sum = sum + a[i];

In most cases there are very few a[i]'s, but as an added complication they are
summed into various "sum" bins, leading, I assume (and not unlikely), to a pretty
high rate of branch misprediction.

So my

        cmp eax, -1
        sete dl
        movzx edx, dl
        add eax, edx
        add eax, "a[i]"

would be another branch-less solution, although I suspect that there are some
interdependencies in the above to slow it down. (Why, oh why didn't the SETcc
commands use the full 32-bit registers? Hey AMD/Intel why not allow a 0x66
prefix on them to do so?)
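In C the same branch-free sequence can be written as below. This is only a sketch: `add_to_bin` is a made-up name, it assumes a[i] >= 0 as in the summing bins described above, and it assumes a real sum never lands on -1 through wrap-around. How a compiler encodes the comparison is up to the compiler.

```c
#include <stdint.h>

/* Add a (>= 0) into a bin that uses -1 as its "empty" marker.
   An empty bin is first bumped from -1 to 0, then a is added. */
static int32_t add_to_bin(int32_t sum, int32_t a)
{
    sum += (sum == -1);   /* comparison yields 1 or 0: -1 becomes 0 */
    return sum + a;
}
```

The comparison result plays the role of the SETE/MOVZX pair: it is 1 exactly when the bin is still empty.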

Robert
--
Robert AH Prins
robert(a)prino(d)org

Terje Mathisen

Aug 4, 2017, 1:47:57 PM

Robert Prins wrote:
> On 2017-08-04 10:37, Terje Mathisen wrote:
>> The problem/idea here is that you initialize the BCD int to an
>> illegal value which is effectively zero, but which will maintain
>> that flag info until the first real operation on it, right?
>
> No (the value is legal) and yes (it is really zero), BCD integers on

I know, I should have said "non-canonical encoding of zero" instead.

I.e. the fact that BCD has spare room in the encodings means that it is
possible to play tricks like this.

> system Z use the last nibble as an indicator for the sign, with A, C,
> E, and F indicating a positive value, and B & D a negative one. All
> six of them are perfectly legal (0..9 are not and lead to a S0C7, a
> hardware detected "Data Exception" error). The only ones used by the
> system are C, D, and F (which in a grey past meant unsigned)

I'm on the ieee754 team which is updating the floating point standard
for 2018, I know a bit about computer arithmetic and encodings. :-)

>
> Obviously this code should never have gone into production. A former
> colleague sent it to me, as to him, and further afield, in the
> outsourcing outfit in the far-east, it didn't make any sense. It was
> written before I joined the company, and my guess is that it was
> written by a contractor who wrote a lot of tools (in PL/I and z/OS
> assembler) that, once he left, entered into a "It works, nobody
> really knows how, so use it, but don't touch it!" realm.

I think it is perfectly OK, as long as the trick is properly documented.

>
> I've actually filed an R(equest) F(or) E(nhancement) with IBM
>
> https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=108713
>
>
>
> to flag such code with an I(nformational) or W(arning) message, or
> even to introduce a new "RULES" option to the PL/I compiler to issue
> an E(rror) message, which would, in most z/OS shops, stop the
> compiler from generating any code.
>
> I've told my ex-colleague to init "sum" with -1 and accept the
> additional test, in this day and age those few extra pico-seconds
> (the z14 runs at 5.2 GHz) don't really matter. ;)
>
>> For pure binary integer code there is of course no reserved value
>> (should probably have been MININT, i.e. 0x8000/-32768 for a
>> 16-bit int), so you cannot add any extra info here.
>
> I know, it's just that the z/OS code has piqued my curiosity, be it
> that I'm just summing or storing values >= 0, so my summing bins are
> all initialized to -1. The stores obviously do not present a problem,
> but for the summations I need two paths, in pseudo-code:
>
> if sum = -1 then sum = a[i] else sum = sum + a[i];
>
> In most cases there are very few a[i]'s, but as an added
> complication they are summed into various "sum" bins, leading,
> assumed, but not unlikely, to a pretty high rate of branch
> misprediction.

Can you post the real code? Maybe that gives more opportunities for
optimization?

Terje

>
> So my
>
> cmp eax, -1
> sete dl
> movzx edx, dl
> add eax, edx
> add eax, "a[i]"
>
> would be another branch-less solution, although I suspect that there
> are some interdependencies in the above to slow it down. (Why, oh why
> didn't the SETcc commands use the full 32-bit registers? Hey
> AMD/Intel why not allow a 0x66 prefix on them to do so?)
>
> Robert

--

Steve

Aug 4, 2017, 2:02:59 PM

Hi Terje,

Terje Mathisen <terje.m...@nospicedham.tmsw.no> writes:
>Hi Steve!
>
>You are of course correct that I had a typo in one of the versions,
>thanks for spotting that!

You are welcome.

>Re the idea to split the loop into first finding a valid input and then
>handling the rest, that is of course the fastest solution as long as the
>body of the loop isn't too complicated, i.e. like in the current case.
>
>The only real problem is that you have to look out for the additional
>fencepost issue, i.e. what happens if the first valid input is the last
>entry?
>
>You should probably do a JMP Label4 after finding that first entry,

Yes, thanks for finding that. As I had it, it will always read an
extra array entry. And that would be a real problem. Oops...

Regards,

Steve N.

Rod Pemberton

Aug 5, 2017, 5:03:52 AM

On Thu, 3 Aug 2017 23:00:16 +0000
Robert Prins <rob...@nospicedham.prino.org> wrote:

> I've recently come across some really clever/very nasty PL/I code,
> that would, theoretically, save CPU by eliminating a conditional
> jump. It relies on initializing a BCD-encoded ***integer*** variable
> ("sum") with -0.1, which results, on IBM mainframes, the last nibble
> of the BCD encoded value to contain 0xD (rather than the normal 0xC).
> The author uses this to avoid a costly (Phuleeze, pass me a bucket!)
> test, so rather than coding:
>
> sum = -1;
>
> do i = 1 to whatever;
> if a(i) >= 0 then
> if sum <> -1 then
> sum = sum + a(i);
> else
> sum = a(i);
> end;
>
> if sum <> -1 then "print sum";
>
> the code can be simplified to

Do you need to print the result when the sum is zero? Why? ...

In other words, can you change "a(i)>=0" to "a(i)>0"? If you can, then
you should be able to use:

sum = 0;

do i = 1 to whatever;
  if a(i) > 0 then
    sum = sum + a(i);
end;

if sum <> 0 then "print sum";

If not, I would probably add a flag to separate out the check as to not
use any special tricks:

sum = 0;
set = 0;

do i = 1 to whatever;
  if a(i) >= 0 then do;
    sum = sum + a(i);
    set = 1;
  end;
end;

if set = 1 then "print sum";

Rod Pemberton
--
Liberals love to point out that vehicles contribute to climate change.
Conservatives should point out that living in skyscrapers does so too.

Robert Prins

Aug 5, 2017, 6:03:57 AM

My rendition of the original problem was just bad, I should have posted
something more akin to the original code.

In essence it sets up a number of bins, initializes them with -0, and then goes
through a list of accounts for various expenditures, adding the various amounts,
or at least those >= 0 to those bins. All negative (balancing) amounts go into
one single separate bin - I really do not know why an expenditure of +0 was
considered significant.

Using the (too) clever init to a perfectly valid and legal -0, the original
author was able to avoid a test at the start of processing to either put
something into an empty bin (like one initialized with -1), or add it to what
was already there. The -0 init means you can just add it, and rather than
testing for an unchanged initial value of -1 at the end, the test was done on
the sign of the result. Remember, this is code from around the mid-1980s, so from
just about the time that IBM started to use branch prediction on its mainframes,
see <https://en.wikipedia.org/wiki/Branch_predictor#History>. The original
programmer is unlikely to have known this.

As for the number of bins? There are 37, and the number of accounts being
processed is in essence unlimited...
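For binary integers, one way to get the same "just add it, check at the end" behaviour for 37 bins is a separate bit mask of touched bins, which fits in one 64-bit word. The sketch below is an assumption-laden illustration, not the original program: the `ledger` and `posting` types and all function names are hypothetical, and the bin number is assumed to be already known and less than 37.

```c
#include <stdint.h>

#define NBINS 37

typedef struct {
    int64_t  bin[NBINS];  /* sums of amounts >= 0, per bin */
    int64_t  negative;    /* all negative (balancing) amounts together */
    uint64_t touched;     /* bit b is set once bin b has received an amount */
} ledger;

typedef struct {
    unsigned bin;         /* assumed to be < NBINS */
    int64_t  amount;
} posting;

static void ledger_init(ledger *l)
{
    *l = (ledger){ 0 };   /* all sums zero, no bin touched */
}

static void ledger_post(ledger *l, posting p)
{
    if (p.amount < 0) {
        l->negative += p.amount;
        return;
    }
    l->bin[p.bin] += p.amount;              /* no "is the bin empty?" test */
    l->touched |= UINT64_C(1) << p.bin;     /* an amount of +0 still counts */
}

static int ledger_bin_used(const ledger *l, unsigned b)
{
    return (int)((l->touched >> b) & 1u);
}
```

The bin update itself is unconditional, just like the add onto a -0 packed-decimal bin, and the one remaining test (amount < 0) is the one the original also needs.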

Robert Prins

Aug 19, 2017, 1:30:24 PM

On 2017-08-04 17:33, Terje Mathisen wrote:
> Robert Prins wrote:
>> On 2017-08-04 10:37, Terje Mathisen wrote:
>>> The problem/idea here is that you initialize the BCD int to an
>>> illegal value which is effectively zero, but which will maintain
>>> that flag info until the first real operation on it, right?
>>
>> No (the value is legal) and yes (it is really zero), BCD integers on
>
> I know, I should have said "non-canonical encoding of zero" instead.
>
> I.e. the fact that BCD has spare room in the encodings means that it is possible
> to play tricks like this.
>
>> system Z use the last nibble as an indicator for the sign, with A, C,
>> E, and F indicating a positive value, and B & D a negative one. All
>> six of them are perfectly legal (0..9 are not and lead to a S0C7, a
>> hardware detected "Data Exception" error). The only ones used by the
>> system are C, D, and F (which in a grey past meant unsigned)
>
> I'm on the ieee754 team which is updating the floating point standard for 2018,
> I know a bit about computer arithmetic and encodings. :-)

That's what's always puzzling me, how or why do you update a standard? What has
changed in floating point formats? Puzzled!

>> Obviously this code should never have gone into production. A former
>> colleague sent it to me, as to him, and further afield, in the
>> outsourcing outfit in the far-east, it didn't make any sense. It was
>> written before I joined the company, and my guess is that it was
>> written by a contractor who wrote a lot of tools (in PL/I and z/OS
>> assembler) that, once he left, entered into a "It works, nobody
>> really knows how, so use it, but don't touch it!" realm.
>
> I think it is perfectly OK, as long as the trick is properly documented.

It obviously wasn't.

>> I've actually filed an R(equest) F(or) E(nhancement) with IBM
>> https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=108713
>> to flag such code with an I(nformational) or W(arning) message, or
>> even to introduce a new "RULES" option to the PL/I compiler to issue
>> an E(rror) message, which would, in most z/OS shops, stop the
>> compiler from generating any code.
>>
>> I've told my ex-colleague to init "sum" with -1 and accept the
>> additional test, in this day and age those few extra pico-seconds
>> (the z14 runs at 5.2 GHz) don't really matter. ;)
>>
>>> For pure binary integer code there is of course no reserved value
>>> (should probably have been MININT, i.e. 0x8000/-32768 for a
>>> 16-bit int), so you cannot add any extra info here.

Should have gone for floating point. ;)

>> I know, it's just that the z/OS code has piqued my curiosity, be it
>> that I'm just summing or storing values >= 0, so my summing bins are
>> all initialized to -1. The stores obviously do not present a problem,
>> but for the summations I need two paths, in pseudo-code:
>>
>> if sum = -1 then sum = a[i] else sum = sum + a[i];
>>
>> In most cases there are very few a[i]'s, but as an added
>> complication they are summed into various "sum" bins, leading,
>> assumed, but not unlikely, to a pretty high rate of branch
>> misprediction.
>
> Can you post the real code? Maybe that gives more opportunities for optimization?

I don't think anyone here would be too pleased with 400+ lines of Pascal or the
754 (now that's a coincidence!) lines of code generated (1,600+ lines in the
listing file) from them, or the 387 (seriously!) lines (of code) I reduced them to.

I'm sure there is some, probably fairly limited, scope to shave off a few more
bytes (without putting some small sections of repeated code in a separate
subroutine), but I doubt that anyone (other than me) would like to "waste" any
time on this.

Robert

PS: What I would love to see is what a really optimizing C compiler would make
of the code when translated from Pascal to C!

George Neuner

Aug 19, 2017, 8:30:53 PM

On Sat, 19 Aug 2017 20:19:23 +0000, Robert Prins
<rob...@nospicedham.prino.org> wrote:

>That's what's always puzzling me, how or why do you update a standard? What has
>changed in floating point formats? Puzzled!

That's being discussed at length in comp.arch right now.

George

Terje Mathisen

Aug 20, 2017, 11:46:44 AM

Robert Prins wrote:
> On 2017-08-04 17:33, Terje Mathisen wrote:
>> I'm on the ieee754 team which is updating the floating point
>> standard for 2018, I know a bit about computer arithmetic and
>> encodings. :-)
>
> That's what's always puzzling me, how or why do you update a
> standard? What has changed in floating point formats? Puzzled!

If you believe any standard to be perfect, then I've got a bridge in
Brooklyn to sell you. :-)

We are simply fixing errors and omissions, e.g. getting rid of the
current definition of max(a,b) and min(a,b) which breaks down badly in
the face of qNaN/sNaNs in an array reduction.

Terje

rug...@nospicedham.gmail.com

Aug 20, 2017, 4:46:59 PM

Hi,

On Saturday, August 19, 2017 at 12:30:24 PM UTC-5, Robert Prins wrote:
>
> I don't think anyone here would be too pleased with 400+ lines
> of Pascal

Maybe not, but you can post it to a Pascal newsgroup (or PasteBin
or whatever).

> PS: What I would love to see is what a really optimizing C compiler
> would make of the code when translated from Pascal to C!

Depends on the Pascal dialect. Clearly you mean GCC or Clang, but
I guess FPC and GPC aren't working for you? Have you tried them?
Those are probably your best bet.

Or try this (ISO 7185) with modern GCC:

https://sourceforge.net/projects/pascal-p5c/files/pascal-p5c-code-35.zip/download

Robert Prins

Aug 21, 2017, 10:48:00 AM

On 2017-08-20 20:40, rug...@nospicedham.gmail.com wrote:
> Hi,
>
> On Saturday, August 19, 2017 at 12:30:24 PM UTC-5, Robert Prins wrote:
>>
>> I don't think anyone here would be too pleased with 400+ lines of Pascal
>
> Maybe not, but you can post it to a Pascal newsgroup (or PasteBin or
> whatever).

- comp.lang.pascal.borland is dead for all intents and purposes
- comp.lang.pascal.iso-ansi is dead for all intents and purposes
- comp.lang.pascal.misc has turned into the personal blog of one Amine Moulay
Ramdane

And just in case anyone would care...

The full source can be found on my Google Drive @ <https://goo.gl/ZN3XAB> in
lift32bit.rar, together with the Virtual Pascal .vpo files I use to compile the
lot. And no, lift.exe does not contain a virus! For what it's worth, I've also
added a "lift-s...@2017-08-21.rar" containing both lift32bit.rar and the two
listing files of lift, lift.p.asm & lift.a.asm, where the .p. version contains
the code generated by VP from the Pure Pascal code, and the .a. version the code
I created by taking an axe (and later a scalpel) to this code. Note that it
contains streaks of code disassembled to DB's as VP has no understanding of
anything post Pentium, and my code contains MMX, SSE3 and even AVX instructions,
which I generate using the excellent on-line assembler @
<https://defuse.ca/online-x86-assembler.htm>

Note that the .EXE files have been renamed to .IXI and that they are the result
of compiling the assemblerised versions. liftdat.rar contains the input file
(lift.dat) required by "lift.exe", and some more info about the programs and the
rather complicated format of the input file can be found on
<https://prino.neocities.org/miscellaneous/pascal.html> and
<https://prino.neocities.org/miscellaneous/keeping-statistics.html> respectively.

>> PS: What I would love to see is what a really optimizing C compiler would
>> make of the code when translated from Pascal to C!
>
> Depends on the Pascal dialect. Clearly you mean GCC or Clang, but I guess
> FPC and GPC aren't working for you? Have you tried them? Those are probably
> your best bet.

The source files contain both Pure Pascal and assemblerised sections, and, with
one small tweak, FPC actually compiles the Pure Pascal version. However, FPC has
decided to throw a few conventions that stood since TP1 overboard, which means
that the code no longer runs. Also, even when compiled -O3, the resulting
assembler output shows code that is hardly better than the code produced by VP
(or even BP). And the thing that really has killed using FPC for me is the
miserable IDE, the developers seem to be hell-bent on adding more features,
units, and targets, but little time on the IDE and the code generation.

GPC is dead, as can be seen on the website, which hasn't changed since 2005.

> Or try this (ISO 7185) with modern GCC:
>
> https://sourceforge.net/projects/pascal-p5c/files/pascal-p5c-code-35.zip/download

Borland killed off iso-pascal with Turbo Pascal. Reverting to a theoretical
standard is not really an option.

Robert

rug...@nospicedham.gmail.com

Aug 22, 2017, 3:49:52 PM

Hi,

On Monday, August 21, 2017 at 9:48:00 AM UTC-5, Robert Prins wrote:
> On 2017-08-20 20:40, rug...@nospicedham.gmail.com wrote:
> >
> > On Saturday, August 19, 2017 at 12:30:24 PM UTC-5, Robert Prins wrote:
> >>
> >> I don't think anyone here would be too pleased with 400+ lines
> >> of Pascal
> >
> > Maybe not, but you can post it to a Pascal newsgroup (or PasteBin or
> > whatever).
>
> - comp.lang.pascal.borland is dead for all intents and purposes
> - comp.lang.pascal.iso-ansi is dead for all intents and purposes
> - comp.lang.pascal.misc has turned into the personal blog of one
> Amine Moulay Ramdane

Yes, but focusing on the Pascal version probably doesn't belong here
in CLAX. So we should migrate to news://comp.lang.pascal.borland
instead.

> The full source can be found on my Google Drive in lift32bit.rar

I only see liftdat.rar (and other bagatela), thus can't find *.PAS .

> >> PS: What I would love to see is what a really optimizing C
> >> compiler would make of the code when translated from Pascal to C!
> >
> > Depends on the Pascal dialect. Clearly you mean GCC or Clang,
> > but I guess FPC and GPC aren't working for you? Have you tried
> > them? Those are probably your best bet.
>
> The source files contains both Pure Pascal and assemblerised sections,
> and, with one small tweak, FPC actually compiles the Pure Pascal version.

Again, I can't see the *.PAS source at all. There is no obvious download
link on your homepage either.

> However, FPC has decided to throw a few conventions that stood
> since TP1 overboard, which means that the code no longer runs.

I find that unlikely. FPC is highly compatible with TP.

> Also, even when compiled -O3, the resulting assembler output
> shows code that is hardly better than the code produced by VP
> (or even BP).

That's impossible. Just avoiding ENTER/LEAVE (which VP favors)
helps a lot. You need -Cp and -Op (and maybe -Si and -Mtp and ...).

> And the thing that really has killed using FPC for me is the
> miserable IDE, the developers seem to be hell-bent on adding
> more features, units, and targets, but little time on the IDE
> and the code generation.

The IDE is older code, but overall the codegen is very good.

> GPC is dead, as can be seen on the website, which hasn't
> changed since 2005.

That's misleading. The website isn't the same as the actual code.
The frontend was last officially updated (AFAIK) in late 2007.
The various GCC backends changed too much for them to worry
with keeping up, but there were working versions for at least
GCC 4.2 and 4.3 (circa 2011).

But here I'm actually referring to old static builds (DOS/DJGPP)
that still work fine. And by "old" I mean GCC 3.4.6 atop DJGPP
2.05 with latest COFF BinUtils. It's not miniscule output size,
but even older GCC is still good with codegen.

> > Or try this (ISO 7185) with modern GCC:
> >
> > https://sourceforge.net/projects/pascal-p5c/files/pascal-p5c-code-35.zip/download
>
> Borland killed off iso-pascal with Turbo Pascal. Reverting to a theoretical
> standard is not really an option.

Here I'm talking more about actual working code translation for
"modern" GCC, with its multitude of optimizations, than anything
else. I wasn't really trying to force the classic dialect on you
(although you should be a little sympathetic, IMHO, due to its
provenance).

Robert Prins

Aug 23, 2017, 5:05:47 AM

On 2017-08-22 19:38, rug...@nospicedham.gmail.com wrote:
> On Monday, August 21, 2017 at 9:48:00 AM UTC-5, Robert Prins wrote:
>> On 2017-08-20 20:40, rug...@nospicedham.gmail.com wrote:
>>>
>>> On Saturday, August 19, 2017 at 12:30:24 PM UTC-5, Robert Prins wrote:
>>>>
>>>> I don't think anyone here would be too pleased with 400+ lines of
>>>> Pascal
>>>
>>> Maybe not, but you can post it to a Pascal newsgroup (or PasteBin or
>>> whatever).
>>
>> - comp.lang.pascal.borland is dead for all intents and purposes -
>> comp.lang.pascal.iso-ansi is dead for all intents and purposes -
>> comp.lang.pascal.misc has turned into the personal blog of one Amine
>> Moulay Ramdane
>
> Yes, but focusing on the Pascal version probably doesn't belong here in
> CLAX. So we should migrate to news://comp.lang.pascal.borland instead.
>
>> The full source can be found on my Google Drive in lift32bit.rar

The file that you need is "lift-s...@2017-08-20.rar". To stop Google (I've
emailed them numerous times about it) from falsely claiming that it contains a
virus (submit lift.ixi to virustotal and you will see that it is safe), it's
encrypted with that safest of safe passwords, "password". ;) It contains three
other files,

- lift32bit.rar, which contains the full sources, .EXE's (renamed to .IXI),
.MAP files and .VPO files to compile the lot

- lift.a.asm - assembler output of the compilation of the assemblerised version
of LIFT

- lift.p.asm - assembler output of the compilation of the Pure Pascal version of
LIFT

> I only see liftdat.rar (and other bagatela), thus can't find *.PAS .

The encrypted "lift-s...@2017-08-20.rar" archive shows up here.

>>>> PS: What I would love to see is what a really optimizing C compiler
>>>> would make of the code when translated from Pascal to C!
>>>
>>> Depends on the Pascal dialect. Clearly you mean GCC or Clang, but I
>>> guess FPC and GPC aren't working for you? Have you tried them? Those are
>>> probably your best bet.
>>
>> The source files contains both Pure Pascal and assemblerised sections,
>> and, with one small tweak, FPC actually compiles the Pure Pascal version.
>
> Again, I can't see the *.PAS source at all. There is no obvious download
> link on your homepage either.

Same link to my Google drive,

Projects

* He is the executive producer of "North", a short movie by Sarah Franke.
* He is the author of a <set of programs> to extract various statistics from
notes made while hitchhiking. The programs are written in Pascal(!) and here
is the fairly comprehensive manual.

>> However, FPC has decided to throw a few conventions that stood since TP1
>> overboard, which means that the code no longer runs.
>
> I find that unlikely. FPC is highly compatible with TP.

No, unlike TP/BP/VP, FPC will not honour the convention that three variables in
a const declaration are kept together as packed, i.e.

const
  lift_ptr: liftptr = nil;
  lift_top: liftptr = nil;
  lift_end: liftptr = nil;

>> Also, even when compiled -O3, the resulting assembler output shows code
>> that is hardly better than the code produced by VP (or even BP).
>
> That's impossible. Just avoiding ENTER/LEAVE (which VP favors) helps a lot.
> You need -Cp and -Op (and maybe -Si and -Mtp and ...).

Been there, done that... And I have no clue as to how you can get VP to generate
ENTER. I've never seen it in any of my listings. LEAVE, yes.

>> And the thing that really has killed using FPC for me is the miserable
>> IDE, the developers seem to be hell-bent on adding more features, units,
>> and targets, but little time on the IDE and the code generation.
>
> The IDE is older code, but overall the codegen is very good.

Will download 3.0.2 (last tried version was 2.6.?), and give it one more, and
absolutely final try.

>> GPC is dead, as can be seen on the website, which hasn't changed since
>> 2005.
>
> That's misleading. The website isn't the same as the actual code. The
> frontend was last officially updated (AFAIK) in late 2007. The various GCC
> backends changed too much for them to worry with keeping up, but there were
> working versions for at least GCC 4.2 and 4.3 (circa 2011).
>
> But here I'm actually referring to old static builds (DOS/DJGPP) that still
> work fine. And by "old" I mean GCC 3.4.6 atop DJGPP 2.05 with latest COFF
> BinUtils. It's not minuscule output size, but even older GCC is still good
> with codegen.

I tried GPC more than a decade ago, actually before I switched to VP. Never
could get it to work.

>>> Or try this (ISO 7185) with modern GCC:
>>>
>>> https://sourceforge.net/projects/pascal-p5c/files/pascal-p5c-code-35.zip/download
>>>
>> Borland killed off iso-pascal with Turbo Pascal. Reverting to a
>> theoretical standard is not really an option.
>
> Here I'm talking more about actual working code translation for "modern"
> GCC, with its multitude of optimizations, than anything else. I wasn't
> really trying to force the classic dialect on you (although you should be a
> little sympathetic, IMHO, due to its provenance).

Maybe, but when the page on SourceForge told me

"The project also contains p5x - pascal with extensions to the standard pascal
language (underscores allowed in identifiers, otherwise in case statement,
constant expressions, etc)"

it made me realize that the effort needed to convert my code would be way over
the top, as my code contains way too many of such "Borland-isms".

And next to that I have little interest in installing GCC...

George Neuner

Aug 23, 2017, 10:51:12 AM

On Wed, 23 Aug 2017 11:59:35 +0000, Robert Prins
<rob...@nospicedham.prino.org> wrote:

>The file that you need is "lift-s...@2017-08-20.rar". To stop Google (I've
>emailed them numerous times about it) from falsely claiming that it contains a
>virus - submit lift.ixi to virustotal and you will see that it is safe, it's
>encrypted with that safest of safe passwords,"password". ;) It contains three
>other files,
>
>- lift32bit.rar, which contains the full sources, .EXE's (renamed to .IXI),
>.MAP files and .VPO files to compile the lot
>
>- lift.a.asm - assembler output of the compilation of the assemblerised version
>of LIFT
>
>- lift.p.asm - assembler output of the compilation of the Pure Pascal version of
>LIFT

Google is a PITA about executables and executable scripts, bootable
ISOs, and an ever growing list of file names that could be confused
with system files [on Windows or Linux].

Circumventing Google's censorship just to transfer files is a part
time job for many people.

George

Robert Prins

Aug 23, 2017, 12:21:19 PM

Encrypting an archive with WinRAR *and* encrypting the filenames seems to be a
solution. I'm using DynDNS and back in Belgium that works OK from a fixed
connection for my FTP site, but for some reason I cannot get it to work here in
Vilnius via our mobile connection. Of course the real problem is the fact that
the idiots at Google still use virus scanners that use heuristics that do not
work, like F-Prot - which I've been using for more than a decade, which I've
emailed at least half a dozen times this year about their false positives, and
who are going to lose me as a customer in the next three weeks!

Robert

Robert Prins

Aug 23, 2017, 1:21:24 PM

On 2017-08-22 19:38, rug...@nospicedham.gmail.com wrote:

>> Also, even when compiled -O3, the resulting assembler output
>> shows code that is hardly better than the code produced by VP
>> (or even BP).
>
> That's impossible. Just avoiding ENTER/LEAVE (which VP favors)
> helps a lot. You need -Cp and -Op (and maybe -Si and -Mtp and ...).

Been there again, done that again, compiling via the IDE,

Syntax: TP compatible
Code: Only I/O checking (like what I do with VP), and all optimizations except
smaller code
Optimization target CPU: COREAVX, Code Generation target CPU: COREAVX
Verbose: all
Browser: No
Assembler: Intel style, Only List Source, Default output (how the flipping hell
do you get Intel output????)

Change a set of LONGINT into WORD, and it compiles.

Optimizations? Utter crap!

From process_times, just one example:

//--------------------------------------------------------
// Departure, arrival, wait, and driving time of ride
//--------------------------------------------------------
s_arr^[1].dtime:= lift_ptr^.dtime;
s_arr^[1].atime:= lift_ptr^.atime;
s_arr^[1].wtime:= lift_ptr^.wtime;
s_arr^[1].itime:= lift_ptr^.itime;

Generated code:

# [1016] s_arr^[1].dtime:= lift_ptr^.dtime;
movl (%esp),%ecx
movl TC_$HHCOMMON_$$_LIFT_PTR,%eax
movl 64(%eax),%eax
movl %eax,4(%ecx)
# [1017] s_arr^[1].atime:= lift_ptr^.atime;
movl (%esp),%ecx
movl TC_$HHCOMMON_$$_LIFT_PTR,%eax
movl 68(%eax),%eax
movl %eax,8(%ecx)
# [1018] s_arr^[1].wtime:= lift_ptr^.wtime;
movl (%esp),%ecx
movl TC_$HHCOMMON_$$_LIFT_PTR,%eax
movl 60(%eax),%eax
movl %eax,12(%ecx)
# [1019] s_arr^[1].itime:= lift_ptr^.itime;
movl (%esp),%ecx
movl TC_$HHCOMMON_$$_LIFT_PTR,%eax
movl 76(%eax),%eax
movl %eax,16(%ecx)

or even worse:

inc(_s);
s_arr^[_s].stype:= lift_ptr^.s_type;
s_arr^[_s].dtime:= lift_ptr^.dtime;
s_arr^[_s].atime:= lift_ptr^.atime;
s_arr^[_s].itime:= lift_ptr^.itime;

# [1048] inc(_s);
addl $1,24(%esp)
# [1049] s_arr^[_s].stype:= lift_ptr^.s_type;
movl (%esp),%ecx
movl 24(%esp),%ebx
shll $5,%ebx
movl TC_$HHCOMMON_$$_LIFT_PTR,%eax
movl 52(%eax),%eax
movl %eax,-32(%ecx,%ebx)
# [1050] s_arr^[_s].dtime:= lift_ptr^.dtime;
movl (%esp),%ecx
movl 24(%esp),%ebx
shll $5,%ebx
movl TC_$HHCOMMON_$$_LIFT_PTR,%eax
movl 64(%eax),%eax
movl %eax,-28(%ecx,%ebx)
# [1051] s_arr^[_s].atime:= lift_ptr^.atime;
movl (%esp),%ebx
movl 24(%esp),%ecx
shll $5,%ecx
movl TC_$HHCOMMON_$$_LIFT_PTR,%eax
movl 68(%eax),%eax
movl %eax,-24(%ebx,%ecx)
# [1052] s_arr^[_s].itime:= lift_ptr^.itime;
movl (%esp),%ecx
movl 24(%esp),%ebx
shll $5,%ebx
movl TC_$HHCOMMON_$$_LIFT_PTR,%eax
movl 76(%eax),%eax
movl %eax,-16(%ecx,%ebx)

How about loading esi and edi with source and destination, lift_ptr and
s_arr^[_s], just once, and (OK, this is advanced) using mmx (or xmm) for the
dtime/atime move (two consecutive 32-bit variables to the same)?
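
Roughly what I mean, as a C sketch rather than Pascal. The record layouts are assumptions: only the field names come from the source above, and the dtime/atime pair is wrapped in a small struct of its own so that it can be copied as one 8-byte unit. Both addresses are worked out once per record instead of once per field:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Two consecutive 32-bit fields, grouped so they can be copied together. */
struct time_pair {
    int32_t dtime;
    int32_t atime;
};

/* Assumed layouts; only the field names are taken from the Pascal source. */
struct lift_rec {
    int32_t wtime;
    struct time_pair da;
    int32_t itime;
};

struct s_rec {
    int32_t stype;
    struct time_pair da;
    int32_t wtime;
    int32_t itime;
};

/* Copy the four time fields from *src to s_arr[idx] (zero-based here). */
static void copy_times(struct s_rec *s_arr, size_t idx,
                       const struct lift_rec *src)
{
    assert(s_arr != NULL && src != NULL);

    struct s_rec *dst = &s_arr[idx];  /* destination address computed once */

    dst->da    = src->da;             /* dtime and atime as one 8-byte copy */
    dst->wtime = src->wtime;
    dst->itime = src->itime;
}
```

Whether a given compiler really turns the struct assignment into a single 64-bit move is not guaranteed, but it is at least free to do so. The FPC listing above instead reloads both pointers for every single field.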

This is AD 198x Turbo Pascal 3 type code.

And of course the code doesn't run, because all the xxxx_end pointers have been
"optimized" away.

George Neuner

Aug 25, 2017, 5:53:40 AM

On Wed, 23 Aug 2017 19:11:15 +0000, Robert Prins
<rob...@nospicedham.prino.org> wrote:

>> Circumventing Google's censorship just to transfer files is a part time job
>> for many people.
>
>Encrypting an archive with WinRAR *and* encrypting the filenames seems to be a
>solution.

Hmm. I don't know about GoogleDrive, but Gmail often bounces password
protected archives. I know that they unpack zip files and scan inside
them. Mangling file names doesn't always work either - they seem to
be able to recognize many types of scripts by content.

Maybe RAR does fly under Google's radar, but I'd hesitate to go that
route because there have been so many problems with RAR over the
years. I can recall several versions that were so buggy as to be
unusable, and I have seen perfectly good archives fail to unpack with
RAR itself, but unpack successfully with, e.g., 7zip.

There also is the issue that so few people have it available. Unzip
at least comes in the box with Windows and Linux, but most people have
no idea what to do with a RAR.

I personally gave up on RAR somewhere around 3.6. Maybe it's better
now, but for better or worse, Zip has all but conquered the world.

YMMV,
George

wolfgang kern

Aug 25, 2017, 12:24:03 PM

George Neuner wrote:
...

> I personally gave up on RAR somewhere around 3.6. Maybe it's better
> now, but for better or worse, Zip has all but conquered the world.

I use POWERARC.EXE ('unzips' almost all incl. rar and tar)
__
wolfgang

Terje Mathisen

Aug 25, 2017, 1:09:07 PM

7zip is integrated with the Windows Explorer and handles everything I've
tried to throw at it.

The fact that it is also open source is just icing on the cake.

Terje

rug...@nospicedham.gmail.com

Aug 25, 2017, 6:09:27 PM

Hi,

On Wednesday, August 23, 2017 at 4:05:47 AM UTC-5, Robert Prins wrote:
> On 2017-08-22 19:38, rugxulo@nospicedham wrote:> On Monday, August 21,

>
> The file that you need is "lift-s...@2017-08-20.rar"

> To stop Google from falsely claiming that it contains a virus

I see lots of sources, but nothing obvious on how to compile with FPC.
(What's the main .PAS ? Where's the diff/patch? .BATs are for VP only.)

You keep saying that FPC won't work, but all I hear is how many
changes you have to make to even get it to compile. It shouldn't
be this hard.

BTW, not to defend bad antivirus heuristics, but it might go easier
if you kept sources (plain text) separate from binary blobs in a
different archive.

> >> The source files contains both Pure Pascal and assemblerised sections,
> >> and, with one small tweak, FPC actually compiles the Pure Pascal version.
> >

> > I find that unlikely. FPC is highly compatible with TP.
>
> No, unlike TP/BP/VP, FPC will not honour the convention that three
> variables in a const declaration are kept together as packed, i.e.
>
> const
> lift_ptr: liftptr = nil;
> lift_top: liftptr = nil;
> lift_end: liftptr = nil;

Not sure why you're relying on that specific layout. All system-
specific code should be separated (or avoided). Otherwise, as
you've noticed, there's little gain in HLLs.

> Will download 3.0.2 (last tried version was 2.6.?), and give it one more, and
> absolutely final try.

I just don't even know where to start looking to compile this, though.

> I tried GPC more than a decade ago, actually before I switched to VP. Never
> could get it to work.

The benefit of standardization (and a public test suite and spec) is that
you shouldn't have this problem with conformant code.

> >>> Or try this (ISO 7185) with modern GCC: (p5c)

> >
> > Here I'm talking more about actual working code translation for "modern"
> > GCC, with its multitude of optimizations, than anything else.
>

> Maybe, but when the page on SourceForge told me
>
> "The project also contains p5x - pascal with extensions to the standard pascal
> language (underscores allowed in identifiers, otherwise in case statement,
> constant expressions, etc)"
>
> it made me realize that the effort needed to convert my code would be way over
> the top, as my code contains way too many of such "Borland-isms".

Well, yes, {$mode tp} is fairly incompatible with ISO 7185. That was
their choice to be non-standard. And similarly few others cared or
agreed on much later on either. Pascal has (too) many variants.

> And next to that I have little interest in installing GCC...

Well, for DOS, it's only five .ZIPs, unpack, set two env. vars, and
you're ready. Extremely easy, even under a VM. And GPC also supports
{$borland-pascal} mode.

Robert Prins

Aug 29, 2017, 1:29:52 PM

On 2017-08-25 22:07, rug...@nospicedham.gmail.com wrote:
> Hi,
>
> On Wednesday, August 23, 2017 at 4:05:47 AM UTC-5, Robert Prins wrote:
>> On 2017-08-22 19:38, rugxulo@nospicedham wrote:> On Monday, August 21,
>>
>> The file that you need is "lift-s...@2017-08-20.rar". To stop Google
>> from falsely claiming that it contains a virus
>
> I see lots of sources, but nothing obvious on how to compile with FPC.
> (What's the main .PAS ? Where's the diff/patch? .BATs are for VP only.)

The main.pas'es are:

chkdat.pas
lift.pas
dayform.pas
h-h2rtf.pas
h-h2html.pas

No diffs, just make the change.

> You keep saying that FPC won't work, but all I hear is how many changes you
> have to make to even get it to compile. It shouldn't be this hard.

Only one hard change, the four longint's in hhcommon.pas (write_time) need to be
changed into word. After that it compiles, but due to FPC optimizing out the
xxxx_top pointers none of the executables actually runs.

> BTW, not to defend bad antivirus heuristics, but it might go easier if you
> kept sources (plain text) separate from binary blobs in a different archive.

Because this way everything is in one place. Will give it some thought.

>>>> The source files contains both Pure Pascal and assemblerised sections,
>>>> and, with one small tweak, FPC actually compiles the Pure Pascal
>>>> version.
>>>
>>> I find that unlikely. FPC is highly compatible with TP.
>>
>> No, unlike TP/BP/VP, FPC will not honour the convention that three
>> variables in a const declaration are kept together as packed, i.e.
>>
>> const
>> lift_ptr: liftptr = nil;
>> lift_top: liftptr = nil;
>> lift_end: liftptr = nil;
>
> Not sure why you're relying on that specific layout. All system- specific
> code should be separated (or avoided). Otherwise, as you've noticed, there's
> little gain in HLLs.

Why this layout? See update_list_pointers in hhcommon.pas. I've basically been
using this since TP 3.01a (and on z/OS, using PL/I) and it should not break!

>> Will download 3.0.2 (last tried version was 2.6.?), and give it one more,
>> and absolutely final try.

As I already wrote, Crash, Boom, Bang, and code that looks as if it came
straight out of TP 1. Thanks, but no thanks, no more!

> I just don't even know where to start looking to compile this, though.
>
>> I tried GPC more than a decade ago, actually before I switched to VP.
>> Never could get it to work.
>
> The benefit of standardization (and a public test suite and spec) is that you
> shouldn't have this problem with conformant code.
>
>>>>> Or try this (ISO 7185) with modern GCC: (p5c)
>>>
>>> Here I'm talking more about actual working code translation for "modern"
>>> GCC, with its multitude of optimizations, than anything else.
>>
>> Maybe, but when the page on SourceForge told me
>>
>> "The project also contains p5x - pascal with extensions to the standard
>> pascal language (underscores allowed in identifiers, otherwise in case
>> statement, constant expressions, etc)"
>>
>> it made me realize that the effort needed to convert my code would be way
>> over the top, as my code contains way too many of such "Borland-isms".
>
> Well, yes, {$mode tp} is fairly incompatible with ISO 7185. That was their
> choice to be non-standard. And similarly few others cared or agreed on much
> later on either. Pascal has (too) many variants.

ISO 7185 is a language to teach programming. Turbo Pascal is a language to
actually write working programs.

>> And next to that I have little interest in installing GCC...
>
> Well, for DOS, it's only five .ZIPs, unpack, set two env. vars, and you're
> ready. Extremely easy, even under a VM. And GPC also supports
> {$borland-pascal} mode.

I don't use DOS, and the whole reason for going from BP7 to VP was to remove any
space constraints, and be able to code in-line assembler without loads of db
overrides.

Robert

rug...@nospicedham.gmail.com

Aug 29, 2017, 8:00:28 PM

Hi,

On Tuesday, August 29, 2017 at 12:29:52 PM UTC-5, Robert Prins wrote:
> On 2017-08-25 22:07, rug...@nospicedham.gmail.com wrote:
> >
> > On Wednesday, August 23, 2017 at 4:05:47 AM UTC-5, Robert Prins wrote:
> >> On 2017-08-22 19:38, rugxulo@nospicedham wrote:> On Monday, August 21,
> >>
> >> The file that you need is "lift-s...@2017-08-20.rar". To stop Google
> >> from falsely claiming that it contains a virus
> >
> > I see lots of sources, but nothing obvious on how to compile with FPC.
> > (What's the main .PAS ? Where's the diff/patch? .BATs are for VP only.)
> The main.pas'es are:
>
> chkdat.pas
> lift.pas
> dayform.pas
> h-h2rtf.pas
> h-h2html.pas
>
> No diffs, just make the change.

If you had {$ifdef}s for FPC (or an actual patch for Patch), it'd
be loads easier. Or even a separate, FPC-only version, that'd be nice.
I can sorta kinda get it compiled now, but ....

> > You keep saying that FPC won't work, but all I hear is how many changes you
> > have to make to even get it to compile. It shouldn't be this hard.
>
> Only one hard change, the four longint's in hhcommon.pas (write_time)
> need to be changed into word. After that it compiles, but due to FPC
> optimizing out the xxxx_top pointers none of the executables actually runs.

Adding "-gl" seems to show that the problem is in "readfile".

Not entirely sure what low-level pointer stuff is going on behind
the scenes (that you're referring to here).

> > BTW, not to defend bad antivirus heuristics, but it might go easier if you
> > kept sources (plain text) separate from binary blobs in a different archive.
>
> Because this way everything is in one place. Will give it some thought.

Heuristics are definitely bad, for the most part, and shouldn't be enabled
by default. That's their bug, not yours. Too many false positives isn't
fair, and honestly I'm also tired of dealing with it.

> >>>> The source files contains both Pure Pascal and assemblerised sections,
> >>>> and, with one small tweak, FPC actually compiles the Pure Pascal
> >>>> version.
> >>>
> >>> I find that unlikely. FPC is highly compatible with TP.
> >>
> >> No, unlike TP/BP/VP, FPC will not honour the convention that three
> >> variables in a const declaration are kept together as packed, i.e.
> >>
> >> const
> >> lift_ptr: liftptr = nil;
> >> lift_top: liftptr = nil;
> >> lift_end: liftptr = nil;
> >
> > Not sure why you're relying on that specific layout. All system- specific
> > code should be separated (or avoided). Otherwise, as you've noticed, there's
> > little gain in HLLs.
>
> Why this layout? See update_list_pointers in hhcommon.pas. I've basically been
> using this since TP 3.01a (and on z/OS, using PL/I) and it should not break!

But TP3 was thirty years ago, back when TP still supported CP/M and
.COM output. TP4 was the new one with .EXE support. So yes, things do
change (obviously).

FPC didn't even have a 16-bit target until two years ago! Here I'm
actually assuming Go32v2 (32-bit), but maybe we should try i8086-msdos
instead?

> >> Will download 3.0.2 (last tried version was 2.6.?), and give it one more,
> >> and absolutely final try.
>
> As I already wrote, Crash, Boom, Bang, and code that looks as if it came
> straight out of TP 1. Thanks, but no thanks, no more!

I think you're overreacting, to say the least. It's still very very good.
But if you can write better pure assembly, go ahead!

> ISO 7185 is a language to teach programming. Turbo Pascal is a language
> to actually write working programs.

ISO 7185 died with either Extended Pascal (ISO 10206) or Modula-2 or
maybe Oberon. It's still good, but only for historical reasons.

Turbo Pascal died with Delphi, which has had dozens of releases (and
tons of changes). Even FPC prefers Delphi these days but still keeps
faithful to {$mode tp} for all the legacy (which is good, IMHO).
FPC's ISO support isn't quite perfect yet.

> >> And next to that I have little interest in installing GCC...
> >
> > Well, for DOS, it's only five .ZIPs, unpack, set two env. vars,
> > and you're ready. Extremely easy, even under a VM. And GPC also
> > supports {$borland-pascal} mode.
>
> I don't use DOS, and the whole reason for going from BP7 to VP was
> to remove any space constraints, and be able to code in-line assembler
> without loads of db overrides.

Okay, but DOS is simple to setup, easy to install atop, free, so that
is my personal preference here.

But since you direly want inline assembly, GCC (GPC) "probably" isn't
going to be your cup of tea. FPC is probably somewhat easier (although
BinUtils has also supported .intel_syntax for almost two decades).

There are no space constraints in 32-bit pmode (DPMI) atop DOS.
So FPC and GPC have no 16-bit real-mode limits. (Heck, I sometimes
run VPC under DOS + HX, too.)

rug...@nospicedham.gmail.com

Aug 29, 2017, 8:00:29 PM

Hi again,

On Wednesday, August 23, 2017 at 12:21:24 PM UTC-5, Robert Prins wrote:
> On 2017-08-22 19:38, rug...@nospicedham.gmail.com wrote:
>
> Been there again, done that again, compiling via the IDE,
>

> Assembler: Intel style, Only List Source, Default output
> (how the flipping hell do you get Intel output????)

From cmdline use, it seems to be "-al -Anasm".

> Optimizations? Utter crap!
>
> From process_times (daytimes.pas), just one example:

> How about loading esi and edi with source and destination,
> lift_ptr and s_arr^[_s] just once

Here's what I get (FPC 3.0.2, Go32v2):

; [1032] s_arr^[1].dtime:= lift_ptr^.dtime;
mov ecx,dword [esp]
mov eax,dword [TC_$HHCOMMON_$$_LIFT_PTR]
mov eax,dword [eax+64]
mov dword [ecx+4],eax
; [1033] s_arr^[1].atime:= lift_ptr^.atime;
mov ecx,dword [esp]
mov eax,dword [TC_$HHCOMMON_$$_LIFT_PTR]
mov eax,dword [eax+68]
mov dword [ecx+8],eax
; [1034] s_arr^[1].wtime:= lift_ptr^.wtime;
mov ecx,dword [esp]
mov eax,dword [TC_$HHCOMMON_$$_LIFT_PTR]
mov eax,dword [eax+60]
mov dword [ecx+12],eax
; [1035] s_arr^[1].itime:= lift_ptr^.itime;
mov ecx,dword [esp]
mov eax,dword [TC_$HHCOMMON_$$_LIFT_PTR]
mov eax,dword [eax+76]
mov dword [ecx+16],eax
jmp ..@j8572
..@j8550:
mov eax,dword [TC_$HHCOMMON_$$_LIFT_PTR]

... and here's a simple attempt using "WITH" keyword:

; [1032] with s_arr^[1] do begin
mov ecx,dword [esp]
; [1033] dtime:= lift_ptr^.dtime;
mov eax,dword [TC_$HHCOMMON_$$_LIFT_PTR]
mov eax,dword [eax+64]
mov dword [ecx+4],eax
; [1034] atime:= lift_ptr^.atime;
mov eax,dword [TC_$HHCOMMON_$$_LIFT_PTR]
mov eax,dword [eax+68]
mov dword [ecx+8],eax
; [1035] wtime:= lift_ptr^.wtime;
mov eax,dword [TC_$HHCOMMON_$$_LIFT_PTR]
mov eax,dword [eax+60]
mov dword [ecx+12],eax
; [1036] itime:= lift_ptr^.itime;
mov eax,dword [TC_$HHCOMMON_$$_LIFT_PTR]
mov eax,dword [eax+76]
mov dword [ecx+16],eax
jmp ..@j8574
..@j8550:
mov eax,dword [TC_$HHCOMMON_$$_LIFT_PTR]

... So not much improvement, but it does save a little.

Robert Prins

Aug 31, 2017, 11:49:11 AM

> ..@j8550:
> mov eax,dword [TC_$HHCOMMON_$$_LIFT_PTR]
>
> ... and here's a simple attempt using "WITH" keyword:

>
> ; [1032] with s_arr^[1] do begin
> mov ecx,dword [esp]
> ; [1033] dtime:= lift_ptr^.dtime;
> mov eax,dword [TC_$HHCOMMON_$$_LIFT_PTR]
> mov eax,dword [eax+64]
> mov dword [ecx+4],eax
> ; [1034] atime:= lift_ptr^.atime;
> mov eax,dword [TC_$HHCOMMON_$$_LIFT_PTR]
> mov eax,dword [eax+68]
> mov dword [ecx+8],eax
> ; [1035] wtime:= lift_ptr^.wtime;
> mov eax,dword [TC_$HHCOMMON_$$_LIFT_PTR]
> mov eax,dword [eax+60]
> mov dword [ecx+12],eax
> ; [1036] itime:= lift_ptr^.itime;
> mov eax,dword [TC_$HHCOMMON_$$_LIFT_PTR]
> mov eax,dword [eax+76]
> mov dword [ecx+16],eax
> jmp ..@j8574

> ..@j8550:
> mov eax,dword [TC_$HHCOMMON_$$_LIFT_PTR]
>
> ... So not much improvement, but it does save a little.

Yes, and TP1 (and VP) generate the same kind of code using the "WITH" fudge, my
hacked-about code comes out as:

{$ifdef mmx}
db $0f,$6f,$43,offset lift_list.dtime // movq mm0, [ebx + offset lift_list.dtime]
db $0f,$7f,$46,offset s_rec.dtime     // movq [esi + offset s_rec.dtime], mm0
db $0f,$6f,$43,offset lift_list.wtime // movq mm0, [ebx + offset lift_list.wtime]
db $0f,$7f,$46,offset s_rec.wtime     // movq [esi + offset s_rec.wtime], mm0
{$else}
mov ecx, [ebx + offset lift_list.dtime]
mov [esi + offset s_rec.dtime], ecx

mov ecx, [ebx + offset lift_list.atime]
mov [esi + offset s_rec.atime], ecx

mov ecx, [ebx + offset lift_list.wtime]
mov [esi + offset s_rec.wtime], ecx

mov ecx, [ebx + offset lift_list.itime]
mov [esi + offset s_rec.itime], ecx
{$endif}
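
For anyone wondering where the db bytes come from: $0f,$6f and $0f,$7f are the two MOVQ opcodes (load into and store from an MMX register), and the third byte is the ModRM byte that selects mm0 together with [ebx + disp8] or [esi + disp8]. A small C sketch that rebuilds those bytes; the helper names are made up for the illustration and the displacements in any call are placeholders for the record offsets:

```c
#include <assert.h>
#include <stdint.h>

/* x86 register numbers as used in the ModRM r/m field. */
enum { RM_EBX = 3, RM_ESP = 4, RM_ESI = 6 };

/* ModRM = mod (2 bits) | reg (3 bits) | r/m (3 bits).
   mod = 1 means "base register plus signed 8-bit displacement". */
static uint8_t modrm(unsigned mod, unsigned reg, unsigned rm)
{
    assert(mod < 4 && reg < 8 && rm < 8);
    return (uint8_t)((mod << 6) | (reg << 3) | rm);
}

/* Encode "movq mmN, [base + disp8]" (opcode 0F 6F /r) into out[0..3].
   base must not be ESP, because r/m = 4 selects a SIB byte instead. */
static void emit_movq_load(uint8_t out[4], unsigned mm, unsigned base,
                           int8_t disp)
{
    assert(base != RM_ESP);
    out[0] = 0x0f;
    out[1] = 0x6f;
    out[2] = modrm(1, mm, base);
    out[3] = (uint8_t)disp;
}

/* Encode "movq [base + disp8], mmN" (opcode 0F 7F /r) into out[0..3]. */
static void emit_movq_store(uint8_t out[4], unsigned mm, unsigned base,
                            int8_t disp)
{
    assert(base != RM_ESP);
    out[0] = 0x0f;
    out[1] = 0x7f;
    out[2] = modrm(1, mm, base);
    out[3] = (uint8_t)disp;
}
```

modrm(1, 0, RM_EBX) gives $43 and modrm(1, 0, RM_ESI) gives $46, which are exactly the third bytes in the db lines.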

And with some additional restructuring of the lift_list and s_rec, I could
probably use db'ed XMM instructions (although that would cause delays on Intel)
as I already use YMMs to initialise the 32-byte s_rec - in fact most of the data
structures in the program are "tuned" to MMX (and later) instructions, which was
facilitated by removing all but one small bit of FPU code, where I could not get
an integer square root to reach the required accuracy.

I would expect that GCC or the Intel C compiler would both generate at least the
code in the {$else} branch above when the original Pascal is replaced by C(++).

Both VP and FPC seem to make way too much use of EAX...

rug...@nospicedham.gmail.com

Sep 2, 2017, 8:08:02 PM

Hi,

On Thursday, August 31, 2017 at 10:49:11 AM UTC-5, Robert Prins wrote:
>
> my hacked-about code comes out as:
>
> {$ifdef mmx}

> ...

> {$else}
> mov ecx, [ebx + offset lift_list.dtime]
> mov [esi + offset s_rec.dtime], ecx
>
> mov ecx, [ebx + offset lift_list.atime]
> mov [esi + offset s_rec.atime], ecx
>
> mov ecx, [ebx + offset lift_list.wtime]
> mov [esi + offset s_rec.wtime], ecx
>
> mov ecx, [ebx + offset lift_list.itime]
> mov [esi + offset s_rec.itime], ecx
> {$endif}
>

> I would expect that GCC or the Intel C compiler would both
> generate at least the code in the {$else} branch above when
> the original Pascal is replaced by C(++).

So why not rewrite in C? It can't be that hard (famous last words!).

Though personally I'd suggest rather fixing to work with FPC,
that's more useful and important (IMHO).

> Both VP and FPC seem to make way too much use of EAX...

I'm not an optimization guru. I haven't read Agner Fog's manuals
closely. Modern cpus probably do heavy register renaming and
lots of out-of-order (pipelined, superscalar, whatever) stuff.
I think older ones were pickier about certain things, but I
don't know if you care about (or test your code on) such machines.

Relying too much on one register is probably a bad idea, but
they probably just want to simplify register shuffling.
I had thought I read that alternating registers was a better
idea, so maybe try not relying too heavily on ECX either.

Actually, your {$else} code seems pretty sequential. I would
just use "push dword[], pop dword[]" (but not for 486) and
avoid ECX altogether. Maybe I'm naive, but that's a quick
simplification. No idea if it really helps you, though.

rug...@nospicedham.gmail.com

Sep 2, 2017, 8:23:05 PM

Hi,

On Friday, August 25, 2017 at 4:53:40 AM UTC-5, George Neuner wrote:
> On Wed, 23 Aug 2017 19:11:15 +0000, Robert Prins
> <rob...@nospicedham.prino.org> wrote:
>
> Maybe RAR does fly under Google's radar, but I'd hesitate to go that
> route because there have been so many problems with RAR over the
> years. I can recall several versions that were so buggy as to be
> unusable, and I have seen perfectly good archives fail to unpack with
> RAR itself, but unpack successfully with, e.g., 7zip.

It's somewhat niche, but some people still swear by RAR.

> There also is the issue that so few people have it available. Unzip
> at least comes in the box with Windows and Linux, but most people have
> no idea what to do with a RAR.

UnRAR isn't exactly Free/libre (which is a small, but noticeable,
annoyance), but it does come freely with sources.

There are also precompiled binaries of it for various platforms:

http://www.rarlab.com/rar_add.htm

> I personally gave up on RAR somewhere around 3.6. Maybe it's better
> now, but for better or worse, Zip has all but conquered the world.

.ZIP has so many variations and additions that it's hardly standard,
but indeed a subset of it is still very popular.

Robert Prins

Sep 3, 2017, 2:24:05 PM

On 2017-09-02 23:54, rug...@nospicedham.gmail.com wrote:
> On Thursday, August 31, 2017 at 10:49:11 AM UTC-5, Robert Prins wrote:
>>
>> my hacked-about code comes out as:
>>
>> {$ifdef mmx}
>> ...
>> {$else}
>> mov ecx, [ebx + offset lift_list.dtime]
>> mov [esi + offset s_rec.dtime], ecx
>>
>> mov ecx, [ebx + offset lift_list.atime]
>> mov [esi + offset s_rec.atime], ecx
>>
>> mov ecx, [ebx + offset lift_list.wtime]
>> mov [esi + offset s_rec.wtime], ecx
>>
>> mov ecx, [ebx + offset lift_list.itime]
>> mov [esi + offset s_rec.itime], ecx
>> {$endif}
>>
>> I would expect that GCC or the Intel C compiler would both
>> generate at least the code in the {$else} branch above when
>> the original Pascal is replaced by C(++).
>
> So why not rewrite in C? It can't be that hard (famous last words!).

I've been brought up with ALGOL 60, and TI-59-ese ;) and around 1984/5 my father
bought Turbo Pascal 2, and in 1985 I started work, with PL/I. The next language
(REXX) followed in early 1992, and I've never had the luck (or is it misfortune)
to work with C. The oldest ***saved*** version of "LIFT" dates back to 9 April
1994 (a 49k .COM file), and the first time I started using the old TP "inline"
statement was in version 46, on 30 July 1995.

And for history buffs, the last (60th) TP 3.01a version dates back to 5 August
1996. It was followed by the first TP 6.00 version on 2 September 1996, and the
last (53rd) TP6 version saw the light on 6 October 2008, to be followed by the
first VP version on the same day - the current (95th) VP version comes in at a
hefty 96k .EXE. I don't like bloatware. ;)

> Though personally I'd suggest rather fixing to work with FPC,
> that's more useful and important (IMHO).

No, it's not. VP may be dead, it's hellish to add post Pentium instructions in
the form of long DB sequences to it, and debugging them is of course impossible,
but the IDE is still light-years ahead of what FPC offers. My goal is to
eventually convert the program into pure assembler, probably via FASM, and I
will not go back to FPC until it has an IDE that is as smooth as the one used by
VP, in other words: probably never. :(

>> Both VP and FPC seem to make way too much use of EAX...
>
> I'm not an optimization guru. I haven't read Agner Fog's manuals
> closely. Modern cpus probably do heavy register renaming and
> lots of out-of-order (pipelined, superscalar, whatever) stuff.
> I think older ones were pickier about certain things, but I
> don't know if you care about (or test your code on) such machines.

My desktop uses an AMD FX8150, the laptop an Intel quadcore mobile i7, and the
Pure Pascal version of the program would likely still run on a 386, but how
useful is that? I don't think I even have anything pre-486, and the 486
probably has DOS on it, as I still, one day, hope to get the TI-95 PC Interface
software working again, which would allow me to find the bugs in a TI-95
emulator, by simply testing each (emulated) instruction exhaustively.

> Relying too much on one register is probably a bad idea, but
> they probably just want to simplify register shuffling.
> I had thought I read that alternating registers was a better
> idea, so maybe try not relying too heavily on ECX either.

VP actually does put local variables into EBX/ESI/EDI, but only to a limited
extent, e.g. if a procedure uses more than three that are register-able, only
three will be put into registers, even though two (or more) might be more or
less independent, and could easily all be aliased to registers, which is what
I'm doing manually, by looking at the overall structure of the code, and
although I doubt that my hand-crafted in-line assembler is as fast as the
properly scheduled code emitted by the GCC or Intel C(++) compilers, I do know
that it's way ahead of what both VP and FPC produce, both size and speed-wise!

> Actually, your {$else} code seems pretty sequential.

There's only so much you can do in parallel, and I think I've converted most of
what could have been converted to MMX (as using XMM instructions, if I may
believe everything I read, causes significant stalls, on (some) Intel CPUs, when
combined with YMM instructions, which I use for larger moves).

> I would
> just use "push dword[], pop dword[]" (but not for 486) and
> avoid ECX altogether. Maybe I'm naive, but that's a quick
> simplification. No idea if it really helps you, though.

It might be shorter (is it?), it definitely doesn't use registers, but is it
faster? I've got all the Agner Fog files, and I'm not sure.
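
On the size question at least, the encodings can simply be counted. A C sketch, assuming 32-bit code and 8-bit displacements as in the {$else} branch above; the displacement values themselves are placeholders, not the real record offsets:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder 8-bit displacements standing in for the record offsets. */
enum { SRC_DISP = 0x40, DST_DISP = 0x04 };

/* mov ecx, [ebx + disp8]   : opcode 8B /r, ModRM 4B
   mov [esi + disp8], ecx   : opcode 89 /r, ModRM 4E */
static const uint8_t via_register[] = {
    0x8b, 0x4b, SRC_DISP,
    0x89, 0x4e, DST_DISP,
};

/* push dword [ebx + disp8] : opcode FF /6, ModRM 73
   pop  dword [esi + disp8] : opcode 8F /0, ModRM 46 */
static const uint8_t via_stack[] = {
    0xff, 0x73, SRC_DISP,
    0x8f, 0x46, DST_DISP,
};

/* Size in bytes of one field copy in each variant. */
static size_t size_via_register(void) { return sizeof via_register; }
static size_t size_via_stack(void)    { return sizeof via_stack; }
```

Both variants come out at six bytes per field, so push/pop frees up ECX but saves no space. The byte count says nothing about which one is faster.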

rug...@nospicedham.gmail.com

Sep 8, 2017, 5:19:59 AM

Hi,

On Wednesday, August 23, 2017 at 11:21:19 AM UTC-5, Robert Prins wrote:
> On 2017-08-23 14:44, George Neuner wrote:
> > On Wed, 23 Aug 2017 11:59:35 +0000, Robert Prins
> > <rob...@nospicedham.prino.org> wrote:
> >
> >> The file that you need is "lift-s...@2017-08-20.rar". To stop Google
> >> (I've emailed them numerous times about it) from falsely claiming that it
> >> contains a virus - submit lift.ixi to virustotal and you will see that it
> >> is safe, it's encrypted with that safest of safe passwords,"password". ;)
> >
> > Google is a PITA about executables and executable scripts, bootable ISOs, and
> > an ever growing list of file names that could be confused with system files
> > [on Windows or Linux].
> >
> > Circumventing Google's censorship just to transfer files is a part time job
> > for many people.
>
> Encrypting an archive with WinRAR *and* encrypting the filenames seems to be a
> solution. I'm using DynDNS, and back in Belgium that works OK from a fixed
> connection for my FTP site, but for some reason I cannot get it to work here in
> Vilnius via our mobile connection. Of course the real problem is the fact that
> the idiots at Google still use virus scanners with heuristics that do not
> work, like F-Prot, which I've been using for more than a decade, which I've
> emailed at least half a dozen times this year about their false positives, and
> who are going to lose me as a customer in the next three weeks!

I don't use Google Drive, but I saw this and thought I should mention it
(since it's barely relevant):

https://www.cnet.com/news/update-now-google-drive-dies-next-march-backup-and-sync/

"Google announced in July that it's replacing Google Drive with
Backup and Sync, which does the same thing as Google Drive but
slurps in more files. On Thursday, Google detailed its transition
plan, saying it'll stop supporting Drive on Dec. 11 and shut it
down completely on March 12, 2018."

"You can install Backup and Sync right now if you want to get the
transition out of the way. After all, you're not going to be able
to postpone it forever."

https://support.google.com/drive/answer/2374987

rug...@nospicedham.gmail.com

Sep 8, 2017, 5:21:01 AM9/8/17

Hi,

On Sunday, September 3, 2017 at 1:24:05 PM UTC-5, Robert Prins wrote:
> On 2017-09-02 23:54, rug...@nospicedham.gmail.com wrote:
> >
> > So why not rewrite in C? It can't be that hard (famous last words!).
>
> I've been brought up with ALGOL 60 and TI-59-ese ;) and around 1984/5 my father
> bought Turbo Pascal 2. In 1985 I started work, with PL/I. The next language
> (REXX) followed in early 1992, and I've never had the luck (or is it misfortune?)
> to work with C.

Well, one could argue that speed is irrelevant, but you seem intent
upon reaching modern GCC's compiled output speed. I'm more sympathetic
to Pascal, but if things like FPC and GPC aren't fast enough, there's
only so much one can do. (Convert to Ada??)

> > Though personally I'd suggest rather fixing to work with FPC,
> > that's more useful and important (IMHO).
>
> No, it's not. VP may be dead, it's hellish to add post-Pentium instructions to
> it in the form of long DB sequences, and debugging them is of course impossible,
> but the IDE is still light-years ahead of what FPC offers. My goal is to
> eventually convert the program into pure assembler, probably via FASM, and I
> will not go back to FPC until it has an IDE that is as smooth as the one used by
> VP, in other words: probably never. :(

I don't use the IDE. I don't see how that's a dealbreaker.

> >> Both VP and FPC seem to make way too much use of EAX...
> >
> > I'm not an optimization guru. I haven't read Agner Fog's manuals
> > closely. Modern cpus probably do heavy register renaming and
> > lots of out-of-order (pipelined, superscalar, whatever) stuff.
> > I think older ones were pickier about certain things, but I
> > don't know if you care about (or test your code on) such machines.
>
> My desktop uses an AMD FX8150, the laptop an Intel quad-core mobile i7, and the
> Pure Pascal version of the program would likely still run on a 386, but how
> useful is that? I don't think I even have anything pre-486, and the 486
> probably has DOS on it.

I didn't realistically expect you to target a 486. Half my point was
that lots of classic optimizations are obsolete.

BTW, I find AVX uninteresting. Of course, I (still) don't have any
AVX-enabled machines! It's a waste of time (almost) chasing the
holy grail of optimizations, so I don't keep up with all the latest
additions.

> ... although I doubt that my hand-crafted in-line assembler is as fast
> as the properly scheduled code emitted by the GCC or Intel C(++) compilers,
> I do know that it's way ahead of what both VP and FPC produce, both size-
> and speed-wise!

I almost forgot: obviously, you can mix code from multiple compilers
(assuming common calling convention, linker format, etc). FPC is good
about that, but I guess you're stuck with VP (IDE) for now.

> > Actually, your {$else} code seems pretty sequential.
>
> There's only so much you can do in parallel, and I think I've converted
> most of what could have been converted to MMX

I meant using the same register several times in a row, with each instruction
depending on the previous result. Old cpus executed that sequentially (slowly),
but newer ones obviously can "sometimes" avoid such stalls.

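To illustrate the dependency point, here is a rough C sketch (not taken from
Robert's program; the names sum_single and sum_four are made up for this
example). The first loop funnels every add through one accumulator, so each add
has to wait for the previous one. The second splits the work over four
independent accumulators, which an out-of-order cpu may be able to overlap.

```c
#include <stddef.h>
#include <stdint.h>

/* One accumulator: every add depends on the result of the previous add. */
uint32_t sum_single(const uint32_t *a, size_t n)
{
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Four accumulators: four independent dependency chains. */
uint32_t sum_four(const uint32_t *a, size_t n)
{
    uint32_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; i++)      /* leftover elements when n is not a multiple of 4 */
        s0 += a[i];

    /* Unsigned arithmetic wraps, so the regrouped sum equals the plain sum. */
    return (s0 + s1) + (s2 + s3);
}
```

Both functions return the same value. Whether the second is actually faster
depends on the cpu and on what the compiler already does (it may unroll or
vectorize either loop on its own), so this needs measuring, not assuming.
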
> > I would just use "push dword[], pop dword[]" (but not for 486)
> > and avoid ECX altogether. Maybe I'm naive, but that's a quick
> > simplification. No idea if it really helps you, though.
>
> It might be shorter (is it?), it definitely doesn't use registers,
> but is it faster? I've got all the Agner Fog files, and I'm not sure.

I'm still not sure how to test this since I can't fully rebuild it
with FPC. I could use VPC, but if you're using AVX a lot, that won't
work here either.