Comparison of compiler generated code AD 1980(ish) v 2010(ish)

Robert AH Prins

May 15, 2012, 6:07:51 PM

Can anyone skilled in the art tell me why a compiler that probably dates back to
the late 1970s or early 1980s generates the following short and sweet code
for a PL/I "BY NAME" assignment, while the not completely new (but still fairly
recent) version of Enterprise PL/I (V3R9) generates the very, very, very
long-winded code below it? Or is this (V3R9) code (that predates the OOO z196
architecture) really faster?

OS PL/I V2.3.0 - OPT(2)
343 1 2 REPT_LINE = REPT_LIST, BY NAME;

* STATEMENT NUMBER 343
002664 58 70 8 268 L 7,REPT_WORK.LINE_PTR
002668 58 60 8 030 L 6,REPT_WORK.REPT_PTR
00266C 58 F0 3 600 L 15,1536(0,3)
002670 D2 03 7 003 F B54 MVC REPT_LINE.TR(4),2900(15)
002676 DE 03 7 003 6 00C ED REPT_LINE.TR(4),REPT_LIST.TR
00267C D2 03 7 00A F B54 MVC REPT_LINE.RI(4),2900(15)
002682 DE 03 7 00A 6 00E ED REPT_LINE.RI(4),REPT_LIST.RI
002688 D2 02 7 011 6 010 MVC REPT_LINE.DA(3),REPT_LIST.DA
00268E 58 E0 3 608 L 14,1544(0,3)
002692 D2 06 4 158 E 5D4 MVC 344(7,4),1492(14)
002698 DE 06 4 158 6 014 ED 344(7,4),REPT_LIST.K+1
00269E D2 05 7 017 4 159 MVC REPT_LINE.K(6),345(4)
0026A4 D2 06 4 158 E 5D4 MVC 344(7,4),1492(14)
0026AA DE 06 4 158 6 01B ED 344(7,4),REPT_LIST.V
0026B0 D2 04 7 028 4 15A MVC REPT_LINE.V(5),346(4)
0026B6 D2 03 7 030 6 026 MVC REPT_LINE.NA(4),REPT_LIST.NA
0026BC D2 03 7 036 6 02A MVC REPT_LINE.TY(4),REPT_LIST.TY
0026C2 D2 03 7 03D 6 02E MVC REPT_LINE.CO(4),REPT_LIST.CO
0026C8 D2 00 7 04B 6 036 MVC REPT_LINE.SP(1),REPT_LIST.SP
0026CE D2 03 7 05F 6 043 MVC REPT_LINE.DATE.YEAR(4),REPT_LIST.DATE.YEAR
0026D4 D2 01 7 064 6 047 MVC REPT_LINE.DATE.MONTH(2),REPT_LIST.DATE.MONTH
0026DA D2 01 7 067 6 049 MVC REPT_LINE.DATE.DAY(2),REPT_LIST.DATE.DAY

Enterprise PL/I for z/OS V3.R9.M0 (Built:20100923) - OPT(3)
3120.0 368 1 2 rept_line = rept_list, by name;

003E40 E350 D340 0624 003120 | STG r5,#SPILL33(,r13,25408)
003E46 E320 D270 0624 003120 | STG r2,#SPILL7(,r13,25200)
003E4C E350 D8FD 0571 003120 | LAY r5,_temp9(,r13,22781)
003E52 E300 D368 0604 003120 | LG r0,#SPILL38(,r13,25448)
003E58 E340 D308 0624 003120 | STG r4,#SPILL26(,r13,25352)
003E5E E310 D4B4 0271 003119 | LAY r1,LINE(,r13,9396)
003E64 E300 D8FC 0550 003120 | STY r0,_temp9(,r13,22780)
003E6A E300 D148 0214 003120 | LGF r0,<a1:d8520:l4>(,r13,8520)
003E70 D278 1000 4D33 003119 | MVC LINE(121,r1,0),REPT_INIT(r4,3379)
003E76 4110 E00C 003120 | LA r1,_shadow21(,r14,12)
003E7A E3E0 D8FC 0571 003120 | LAY r14,_temp9(,r13,22780)
003E80 DE03 E000 1000 003120 | ED _temp9(4,r14,0),_shadow21(r1,0)
003E86 B914 00E0 003120 | LGFR r14,r0
003E8A E300 D368 0604 003120 | LG r0,#SPILL38(,r13,25448)
003E90 4110 E003 003120 | LA r1,#AddressShadow(,r14,3)
003E94 41F0 E00A 003120 | LA r15,#AddressShadow(,r14,10)
003E98 D202 1001 5000 003120 | MVC _shadow21(3,r1,1),_temp9(r5,0)
003E9E 9240 E003 003120 | MVI _shadow21(r14,3),64
003EA2 E310 DF10 0158 003120 | LY r1,<a1:d7952:l4>(,r13,7952)
003EA8 E300 D984 0550 003120 | STY r0,_temp8(,r13,22916)
003EAE E350 D984 0571 003120 | LAY r5,_temp8(,r13,22916)
003EB4 4120 E017 003120 | LA r2,#AddressShadow(,r14,23)
003EB8 4110 100E 003120 | LA r1,_shadow21(,r1,14)
003EBC DE03 5000 1000 003120 | ED _temp8(4,r5,0),_shadow21(r1,0)
003EC2 E310 D985 0571 003120 | LAY r1,_temp8(,r13,22917)
003EC8 4140 E028 003120 | LA r4,#AddressShadow(,r14,40)
003ECC D202 F001 1000 003120 | MVC _shadow21(3,r15,1),_temp8(r1,0)
003ED2 9240 E00A 003120 | MVI _shadow21(r14,10),64
003ED6 E310 DF10 0158 003120 | LY r1,<a1:d7952:l4>(,r13,7952)
003EDC E3F0 D974 0571 003120 | LAY r15,_temp19(,r13,22900)
003EE2 D202 E011 1010 003120 | MVC _shadow21(3,r14,17),_shadow21(r1,16)
003EE8 E310 D238 0604 003120 | LG r1,#SPILL0(,r13,25144)
003EEE D206 F000 14A4 003120 | MVC _temp19(7,r15,0),' ......'(r1,1188)
003EF4 E310 DF10 0158 003120 | LY r1,<a1:d7952:l4>(,r13,7952)
003EFA D203 B95C 1013 003120 | MVC _temp15(4,r11,2396),_shadow18(r1,19)
003F00 E310 D90C 0571 003120 | LAY r1,_temp15(,r13,22796)
003F06 D202 B93C 1001 003120 | MVC _temp11(3,r11,2364),_shadow12(r1,1)
003F0C E310 D8EC 0571 003120 | LAY r1,_temp11(,r13,22764)
003F12 DE06 F000 1000 003120 | ED _temp19(7,r15,0),_temp11(r1,0)
003F18 E310 D975 0571 003120 | LAY r1,_temp19(,r13,22901)
003F1E D205 2000 1000 003120 | MVC _shadow21(6,r2,0),_temp19(r1,0)
003F24 E310 D238 0604 003120 | LG r1,#SPILL0(,r13,25144)
003F2A E320 D96C 0571 003120 | LAY r2,_temp21(,r13,22892)
003F30 D206 2000 14A4 003120 | MVC _temp21(7,r2,0),' ......'(r1,1188)
003F36 E310 DF10 0158 003120 | LY r1,<a1:d7952:l4>(,r13,7952)
003F3C D202 B939 101B 003120 | MVC _temp18(3,r11,2361),_shadow12(r1,27)
003F42 D202 B936 B939 003120 | MVC _temp20(3,r11,2358),_temp18(r11,2361)
003F48 E300 D8E6 0590 003120 | LLGC r0,<a1:d22758:l1>(,r13,22758)
003F4E E300 30EE 0080 003120 | NG r0,=X'00000000 0000000F'
003F54 E310 D8E6 0571 003120 | LAY r1,_temp20(,r13,22758)
003F5A E300 D8E6 0572 003120 | STCY r0,<a1:d22758:l1>(,r13,22758)
003F60 DE06 2000 1000 003120 | ED _temp21(7,r2,0),_temp20(r1,0)
003F66 E320 D96E 0571 003120 | LAY r2,_temp21(,r13,22894)
003F6C D204 4000 2000 003120 | MVC _shadow21(5,r4,0),_temp21(r2,0)
003F72 E310 DF10 0158 003120 | LY r1,<a1:d7952:l4>(,r13,7952)
003F78 E300 1026 0014 003120 | LGF r0,_shadow19(,r1,38)
003F7E 5000 E030 003120 | ST r0,_shadow19(,r14,48)
003F82 E310 DF10 0158 003120 | LY r1,<a1:d7952:l4>(,r13,7952)
003F88 E300 102A 0014 003120 | LGF r0,_shadow19(,r1,42)
003F8E 5000 E036 003120 | ST r0,_shadow19(,r14,54)
003F92 E310 DF10 0158 003120 | LY r1,<a1:d7952:l4>(,r13,7952)
003F98 E300 102E 0014 003120 | LGF r0,_shadow19(,r1,46)
003F9E 5000 E03D 003120 | ST r0,_shadow19(,r14,61)
003FA2 E310 DF10 0158 003120 | LY r1,<a1:d7952:l4>(,r13,7952)
003FA8 4300 1036 003120 | IC r0,_shadow21(,r1,54)
003FAC 4200 E04B 003120 | STC r0,_shadow21(,r14,75)
003FB0 E310 DF10 0158 003120 | LY r1,<a1:d7952:l4>(,r13,7952)
003FB6 E300 1043 0014 003120 | LGF r0,_shadow19(,r1,67)
003FBC 5000 E05F 003120 | ST r0,_shadow19(,r14,95)
003FC0 E310 DF10 0158 003120 | LY r1,<a1:d7952:l4>(,r13,7952)
003FC6 E300 1047 0015 003120 | LGH r0,_shadow20(,r1,71)
003FCC 4000 E064 003120 | STH r0,_shadow20(,r14,100)
003FD0 E310 DF10 0158 003120 | LY r1,<a1:d7952:l4>(,r13,7952)
003FD6 E340 D9A8 0571 003121 | LAY r4,_temp12(,r13,22952)
003FDC E320 D270 0604 000000 | LG r2,#SPILL7(,r13,25200)
003FE2 E300 1049 0015 003120 | LGH r0,_shadow20(,r1,73)
003FE8 4000 E067 003120 | STH r0,_shadow20(,r14,103)

TEN superfluous reloads of R1? AD 2012? How the fluffing H can you call this an
optimizing compiler? How can someone from IBM tell you (i.e. me, two years ago!)
that "we are at least five years ahead of the competition"?

Oh, maybe it's because Enterprise PL/I is a direct descendant of Visual Age
PL/I for OS/2, a compiler that had to work on a CPU with just a dozen available
registers? Let's see what PL/I for Windows generates?

IBM(R) PL/I for Windows 8.0 (Built:20110825)
; 3132 rept_line = rept_list, by name;
mov ecx,[ebp-03680h]; REPT_WORK
mov [ebp-05938h],ecx; _temp67
push offset FLAT:@CBE273
add ecx,03h
mov edi,offset FLAT:@CBE213
mov edx,edi
mov [ebp-05a38h],edi; @CBE390
add eax,0ch
sub esp,0ch
mov edi,dword ptr __imp__IBMPCODP
call edi
mov edx,[ebp-05a38h]; @CBE390
push offset FLAT:@CBE273
mov eax,[ebp-05938h]; _temp67
lea ecx,[eax+0ah]
mov eax,[ebp-038b8h]; REPT_WORK
add eax,0eh
sub esp,0ch
call edi
mov eax,[ebp-05938h]; _temp67
mov edx,[ebp-038b8h]; REPT_WORK
add edx,010h
mov cx,[edx]
mov dl,[edx+02h]
mov [eax+013h],dl
mov [eax+011h],cx
push offset FLAT:@CBE58
lea ecx,[eax+017h]
mov edx,offset FLAT:@CBE224
mov eax,[ebp-038b8h]; REPT_WORK
add eax,013h
sub esp,0ch
call edi
mov eax,[ebp-05938h]; _temp67
push offset FLAT:@CBE27
lea ecx,[eax+028h]
mov edx,offset FLAT:@CBE218
mov eax,[ebp-038b8h]; REPT_WORK
add eax,01bh
sub esp,0ch
call edi
mov eax,[ebp-05938h]; _temp67
mov ecx,[ebp-038b8h]; REPT_WORK
mov ecx,[ecx+026h]
mov [eax+030h],ecx
mov ecx,[ebp-038b8h]; REPT_WORK
mov ecx,[ecx+02ah]
mov [eax+036h],ecx
mov ecx,[ebp-038b8h]; REPT_WORK
mov ecx,[ecx+02eh]
mov [eax+03dh],ecx
mov ecx,[ebp-038b8h]; REPT_WORK
mov cl,[ecx+036h]
mov [eax+04bh],cl
mov ecx,[ebp-038b8h]; REPT_WORK
mov ecx,[ecx+043h]
mov [eax+05fh],ecx
mov ecx,[ebp-038b8h]; REPT_WORK
mov cx,[ecx+047h]
mov [eax+064h],cx
mov ecx,[ebp-038b8h]; REPT_WORK
mov cx,[ecx+049h]
mov [eax+067h],cx

Wow! The code ends with the same six superfluous reloads, as ECX is needlessly
overwritten - why not use EDX?

Again, I'm only the observer, it's you and your companies that are paying for
the extra(?) CPU usage, and maybe a 16-byte three-instruction sequence like

003FC0 E310 DF10 0158 003120 | LY r1,<a1:d7952:l4>(,r13,7952)
003FC6 E300 1047 0015 003120 | LGH r0,_shadow20(,r1,71)
003FCC 4000 E064 003120 | STH r0,_shadow20(,r14,100)

is really faster than the simple 6-byte one-instruction sequence

0026D4 D2 01 7 064 6 047 MVC REPT_LINE.DATE.MONTH(2),REPT_LIST.DATE.MONTH

I always thought that the fastest instructions are those ones that are never
executed...

Robert
--
Robert AH Prins
robert(a)prino(d)org

Thomas David Rivers

May 15, 2012, 5:18:20 PM

Robert AH Prins wrote:

...

>
> TEN superfluous reloads of R1? AD 2012? How the fluffing H can you
> call this an optimizing compiler? How can someone from IBM tell you
> (i.e. me, two years ago!) that "we are at least five years ahead of
> the competition"?
>

I was going to suggest checking the OPT() option setting, but I see in your
original post that you specified OPT(2) in the IBM V2.3.0 compiler and OPT(3)
in the V3R9 compiler. So, that's not it... OPT(3) is about as "OPT" as you can
get...

This leads me to my next question - just what "competition" does IBM point to
in the mainframe PL/I compiler business?

And, we actually were the first compiler vendor with a 64-bit offering... well
before IBM... (in the C/C++ world.) So, IBM is not always the first, or the
best. (Our new v1.96 compiler compares quite favorably to IBM's, we think.)

Lastly - would you care to post the source to your example? Or, at least the
declarations of "rept_line" and "rept_list"... wouldn't mind playing with this
one myself...

- Dave Rivers -

--
riv...@dignus.com Work: (919) 676-0847
Get your mainframe programming tools at http://www.dignus.com

Shmuel Metz

May 15, 2012, 8:47:46 PM

In <a1frcj...@mid.individual.net>, on 05/15/2012

at 10:07 PM, Robert AH Prins <spam...@prino.org> said:

>TEN superfluous reloads of R1? AD 2012?

I guess that peephole optimization is too recent.
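
For readers who have not met the term: a peephole pass looks at a short window of generated instructions and deletes or rewrites the obviously wasteful ones. The sketch below is a toy model in Python, not anything taken from an IBM compiler. The tuple format and the function name are invented for the example, and it assumes that two different location strings never overlap in storage, which a real compiler would have to prove before dropping a reload.

```python
def drop_redundant_loads(code):
    """Remove loads that refetch a value the register already holds.

    Each instruction is a (kind, register, location) tuple, where kind is
    "load" (register <- location) or "store" (location <- register).
    Locations are plain strings such as "7952(r13)". ASSUMPTION: different
    location strings never alias each other.
    """
    held = {}   # register -> location whose current value it is known to hold
    kept = []
    for kind, reg, loc in code:
        if kind == "load":
            if held.get(reg) == loc:
                continue  # same register, same unchanged location: drop it
            # reg is about to change: forget its old contents and every
            # cached value whose address used reg as the base register.
            held.pop(reg, None)
            held = {r: l for r, l in held.items() if f"({reg})" not in l}
            if f"({reg})" not in loc:
                held[reg] = loc
        elif kind == "store":
            # The stored-to location changes: forget registers caching it.
            held = {r: l for r, l in held.items() if l != loc}
        else:
            raise ValueError(f"unknown instruction kind: {kind}")
        kept.append((kind, reg, loc))
    return kept


# Simplified shape of the tail of the Enterprise PL/I listing in this thread.
thread_tail = [
    ("load", "r1", "7952(r13)"),
    ("load", "r0", "38(r1)"),
    ("store", "r0", "48(r14)"),
    ("load", "r1", "7952(r13)"),   # reload of an unchanged value
    ("load", "r0", "42(r1)"),
    ("store", "r0", "54(r14)"),
]
optimized = drop_redundant_loads(thread_tail)
```

In this model the second load of r1 disappears because nothing wrote to r1 or to 7952(r13) in between. Whether the real compiler could prove the same about the stores through r14 is exactly the aliasing question the sketch assumes away.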

>How the fluffing H can you call this an optimizing compiler?
>How can someone from IBM tell you (i.e. me, two years ago!)
>that "we are at least five years ahead of the competition"?

Their lips move.

Have they at least fixed it to use inline code for unaligned bit
strings with constant offsets and lengths, e.g., for SMF records?

--
Shmuel (Seymour J.) Metz, SysProg and JOAT <http://patriot.net/~shmuel>

Unsolicited bulk E-mail subject to legal action. I reserve the
right to publicly post or ridicule any abusive E-mail. Reply to
domain Patriot dot net user shmuel+news to contact me. Do not
reply to spam...@library.lspace.org

robin....@gmail.com

May 16, 2012, 10:07:29 AM

to rob...@prino.org

On Wednesday, 16 May 2012 08:07:51 UTC+10, Robert AH Prins wrote:

> OS PL/I V2.3.0 - OPT(2)
> 343 1 2 REPT_LINE = REPT_LIST, BY NAME;
>
> * STATEMENT NUMBER 343

> Enterprise PL/I for z/OS V3.R9.M0 (Built:20100923) - OPT(3)
> 3120.0 368 1 2 rept_line = rept_list, by name;

> IBM(R) PL/I for Windows 8.0 (Built:20110825)
> ; 3132 rept_line = rept_list, by name;

They are three different programs.

Robert AH Prins

May 16, 2012, 12:37:35 PM

Those of you who have used OS PL/I, Enterprise PL/I and PL/I for Windows know that Enterprise PL/I
now bases the statement numbers of the pseudo-assembler listing on line-numbers and that "if..then"
now counts as two statements rather than one. PL/I for doze also bases its statement numbers on the
line of the source, but on z/OS a version number comment-line is added by the compile procedure, and
the z/OS compile was done with listview(afterall) whereas the doze compilation missed the
(irrelevant) extra comment line and used listview(source).

Anyway, of course this is the same program, but sadly RV seems to enjoy the board for his head too
much to actually investigate the matter, a bold "They are three different programs." is much easier.

In an off-list message I have told him what to do.

Robert AH Prins

May 16, 2012, 12:49:18 PM

On 2012-05-16 16:37, Robert AH Prins wrote:
> On 2012-05-16 14:07, robin....@gmail.com wrote:
>> On Wednesday, 16 May 2012 08:07:51 UTC+10, Robert AH Prins wrote:
>>
>>> OS PL/I V2.3.0 - OPT(2)
>>> 343 1 2 REPT_LINE = REPT_LIST, BY NAME;
>>>
>>> * STATEMENT NUMBER 343
>>
>>
>>> Enterprise PL/I for z/OS V3.R9.M0 (Built:20100923) - OPT(3)
>>> 3120.0 368 1 2 rept_line = rept_list, by name;
>>
>>> IBM(R) PL/I for Windows 8.0 (Built:20110825)
>>> ; 3132 rept_line = rept_list, by name;
>>
>> They are three different programs.
>
> Those of you who have used OS PL/I, Enterprise PL/I and PL/I for Windows know that Enterprise PL/I
> now bases the statement numbers of the pseudo-assembler listing on line-numbers and that "if..then"
> now counts as two statements rather than one. PL/I for doze also bases its statement numbers on the
> line of the source, but on z/OS a version number comment-line is added by the compile procedure, and
> the z/OS compile was done with listview(afterall) whereas the doze compilation missed the
> (irrelevant) extra comment line and used listview(source).

Piggy-backing on myself: using listview(afterall) will produce a ".cod" file where the PL/I code and
generated x86 assembler are completely mashed up, that's why I don't use this option on doze.

I don't expect this to be fixed anytime soon.

Nomen Nescio

May 17, 2012, 4:38:51 AM

That looks very odd to me, almost as if the optimization option didn't get
accepted and it generated pure dumb code. Can you verify the banner shows
that optimization is on and actually got done (however you do that
nowadays). Do you have to specify REORDER even with OPT(x)?

I guess you don't have a real machine to benchmark this on, but if you did,
running that assignment statement in a big loop (if you could write one that
wouldn't get optimized out somehow) might tell you whether it's really faster
to execute that pile of crap or not. Whatever you do, don't extrapolate the
results you got on Hercules to what will happen on a real Z9/Z10/Z196 etc.

> TEN superfluous reloads of R1? AD 2012? How the fluffing H can you call this an
> optimizing compiler? How can someone from IBM tell you (i.e. me, two years ago!)
> that "we are at least five years ahead of the competition"?

Because nobody else sells a PL/I compiler for MVS and if they wanted to it
would take 5 years to write one? ;-) But I have to admit this is disturbing
given IBM's PL/I has always had a good reputation for optimizing. I haven't
used it for work since before ESA but the new stuff is not looking too good!

> PL/I for OS/2, a compiler that had to work on a CPU with just a dozen available
> registers? Let's see what PL/I for Windows generates?

x86 doesn't really have a dozen available registers. Many of the so-called
GPRs are reserved for important stuff. You end up with 4 or 5 usable
registers in any heavy duty x86 code. Lots of thrashing is normal in x86
code but it doesn't seem to hurt performance much for some reason.

> Wow! The code ends with the same six superfluous reloads, as ECX is needlessly
> overwritten - why not use EDX?

At least IBM was smart enough to port the code generator from one platform
to the other. Those guys are no dummies! Somebody probably got a big bonus
for that.

> Again, I'm only the observer, it's you and your companies that are paying for
> the extra(?) CPU usage, and maybe a 16-byte three-instruction sequence like
>
> 003FC0 E310 DF10 0158 003120 | LY r1,<a1:d7952:l4>(,r13,7952)
> 003FC6 E300 1047 0015 003120 | LGH r0,_shadow20(,r1,71)
> 003FCC 4000 E064 003120 | STH r0,_shadow20(,r14,100)
>
> is really faster than the simple 6-byte one-instruction sequence
>
> 0026D4 D2 01 7 064 6 047 MVC REPT_LINE.DATE.MONTH(2),REPT_LIST.DATE.MONTH
>
> I always thought that the fastest instructions are those ones that are never
> executed...

I think that's still a safe bet. Thanks for posting this, it was very
enlightening.

Robert AH Prins

May 17, 2012, 9:22:10 AM

On 2012-05-17 08:38, Nomen Nescio wrote:
> That looks very odd to me, almost as if the optimization option didn't get
> accepted and it generated pure dumb code. Can you verify the banner shows
> that optimization is on and actually got done (however you do that
> nowadays). Do you have to specify REORDER even with OPT(x)?

After my two previous goofs, I check, double-check and check again.

> I guess you don't have a real machine to benchmark this on but if you did

Now, how would you possibly know?

> running that assign statement in a big loop, if you could write one that
> wouldn't get optimized out somehow, might tell you if it's really faster to
> execute that pile of crap or not. Whatever you do, don't extrapolate the
> results you got on Hercules to what will happen on a real Z9/Z10/Z196 etc.

I might give it a try.

>> TEN superfluous reloads of R1? AD 2012? How the fluffing H can you call this an
>> optimizing compiler? How can someone from IBM tell you (i.e. me, two years ago!)
>> that "we are at least five years ahead of the competition"?
>
> Because nobody else sells a PL/I compiler for MVS and if they wanted to it
> would take 5 years to write one? ;-) But I have to admit this is disturbing
> given IBM's PL/I has always had a good reputation for optimizing. I haven't

The old optimizer also had its fair share of quirks...

> used it for work since before ESA but the new stuff is not looking too good!

There are plenty more examples. The "funny" (not!) thing is that the program
that comes up with all this (and is responsible for at least three real APARs)
has absolutely nothing to do with anything business. The original version was
written in Turbo Pascal sometime in the late 1980s and my oldest readable
backups are from April 1994; anything older might be on 5 1/4" floppies and I
have no means of reading them. The oldest of the, currently 34, PL/I versions
dates back to 1993 and went "live" in 1994.

The PL/I versions have been tweaked and tweaked and tweaked (just like the
Pascal versions, in the current version whole procedures have been converted to
inline x86 assembler) and whenever I had the opportunity to use Strobe, I jumped
upon the chance to use it, and when there was no Strobe, the old OS Compiler's
"COUNT" option was immensely helpful in finding hotspots:

Current (V34) version: "Grand total 17,348,949" executed statements
Old (V3) version : "Grand total 992,522,398" executed statements

The ridiculous reduction is mostly thanks to a "Paul Green" who found a much
better way to code one procedure:

Old proc 975,133,412 executed statements
New proc 99,536 executed statements (0.01%, ouch!)

Paul used a completely different algorithm.

However, when working in the Netherlands in 1996, I optimized two CRC routines
and there the savings were a measly 99.3 and 99.5% - this was pre-Enterprise
PL/I and my change was to simply do all intermediate bit-fiddling with ALIGNED
bits, cutting out thousands of calls to the library.

A similar, non-algorithmic, change (also pre-EPLI) was made a few years later:
using "string()" on a structure of CHAR fields in a controlled structure turned
out to munch CPU like there was no tomorrow. Simply changing the structure into
one long CHAR field (and later accessing the fields via SUBSTR) brought the CPU
time used down by almost 90%.

All of this was possible because the generated assembler was, even if not
completely understandable, at least readable and seeing "L 15,A..IBMxxxxx" in
inner loops, combined with a list of what every IBMxxxxx library routine would
be doing could immediately raise red flags.

Of course we now have "LGF r15,=V(IBMxxxxx)(,rX,YY)" but where can we find what
"IBMQOFRD" is actually doing? Google? Try it, and for this one you'll get three
results, none of them making you any wiser.

Another? There is now a whole set of functions to convert dates and times into
other formats, and in the end they (probably) all use LE routines. From a
maintenance point of view this is of course ideal, but performance does suffer: I've got
routines that convert a date to its JDN and back for dates from 1 March 0000 to
well beyond the year 9999, that take into account the Julian-Gregorian
transition from 4 to 15 October 1582, use floating point (ouch! - they can be
recoded to use only FIXED BIN (31)) and are still faster than the PL/I "DAYS"
and "DAYSTODATE" (nice symmetrical names). OK, they require a defined format,
but how many organisations use multiple date and/or time formats anyway.
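
For illustration only, here is what an integer-only date/JDN conversion of the kind described above can look like. It is a Python sketch, not the PL/I routines mentioned in this post, and it uses the widely published integer formulas for the Julian and Gregorian calendars. The function names are made up for the example, and the nonexistent dates 5 to 14 October 1582 are simply treated as Julian dates rather than rejected.

```python
GREGORIAN_START_JDN = 2299161  # 15 October 1582, first day of the Gregorian calendar


def date_to_jdn(year, month, day):
    """Calendar date -> Julian Day Number, using integer arithmetic only."""
    a = (14 - month) // 12          # 1 for January and February, else 0
    y = year + 4800 - a             # years counted from a March-based epoch
    m = month + 12 * a - 3          # March = 0 ... February = 11
    jdn = day + (153 * m + 2) // 5 + 365 * y + y // 4
    if (year, month, day) >= (1582, 10, 15):
        return jdn - y // 100 + y // 400 - 32045   # Gregorian leap-year rules
    return jdn - 32083                             # Julian leap-year rules


def jdn_to_date(jdn):
    """Julian Day Number -> (year, month, day), using integer arithmetic only."""
    if jdn >= GREGORIAN_START_JDN:
        a = jdn + 32044
        b = (4 * a + 3) // 146097   # completed 400-year-cycle quarters
        c = a - 146097 * b // 4
    else:
        b = 0
        c = jdn + 32082
    d = (4 * c + 3) // 1461         # completed years within the cycle
    e = c - 1461 * d // 4           # day of the March-based year
    m = (5 * e + 2) // 153          # March-based month number
    day = e - (153 * m + 2) // 5 + 1
    month = m + 3 - 12 * (m // 10)
    year = 100 * b + d - 4800 + m // 10
    return year, month, day
```

For dates between 1 March 0000 and the end of the year 9999 every intermediate value stays far below 2**31, so the same arithmetic should carry over to FIXED BIN (31) without floating point.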

>> PL/I for OS/2, a compiler that had to work on a CPU with just a dozen available
>> registers? Let's see what PL/I for Windows generates?
>
> x86 doesn't really have a dozen available registers. Many of the so-called
> GPRs are reserved for important stuff. You end up with 4 or 5 usable
> registers in any heavy duty x86 code. Lots of thrashing is normal in x86
> code but it doesn't seem to hurt performance much for some reason.

Typo, should have been half a dozen, as ESP and EBP are not available. (And I
don't see PL/I for Windows using SSE anytime soon)

>> Wow! The code ends with the same six superfluous reloads, as ECX is needlessly
>> overwritten - why not use EDX?
>
> At least IBM was smart enough to port the code generator from one platform
> to the other. Those guys are no dummies! Somebody probably got a big bonus
> for that.

Should we laugh or cry?

>> Again, I'm only the observer, it's you and your companies that are paying for
>> the extra(?) CPU usage, and maybe a 16-byte three-instruction sequence like
>>
>> 003FC0 E310 DF10 0158 003120 | LY r1,<a1:d7952:l4>(,r13,7952)
>> 003FC6 E300 1047 0015 003120 | LGH r0,_shadow20(,r1,71)
>> 003FCC 4000 E064 003120 | STH r0,_shadow20(,r14,100)
>>
>> is really faster than the simple 6-byte one-instruction sequence
>>
>> 0026D4 D2 01 7 064 6 047 MVC REPT_LINE.DATE.MONTH(2),REPT_LIST.DATE.MONTH
>>
>> I always thought that the fastest instructions are those ones that are never
>> executed...
>
> I think that's still a safe bet. Thanks for posting this, it was very
> enlightening.

Thanks!

I'd like to finish with something performance enhancing I've suggested to the
"PL/I powers that be" a long time ago, and I'll leave it up to the collective
readership of this group to judge its usefulness...

More and more XML data is being processed, and that's where the various PLISAX*
builtins come into play. Now, it would not surprise me if the data these parsers
return is put into trees and/or lists. As the format is potentially very free,
it's not unlikely that these trees and lists are being created using based
storage, either using structures with a defined size, or maybe in blobs of
storage grabbed using the ALLOC builtin. Of course one needs to keep track of
what's linked to what and FREEing a linked list or tree needs to be done
correctly to avoid storage leaks. Freeing a tree with 343 leaves or a linked list
with 42 items may be a pretty trivial task, but it takes time and CPU, and doing
it hundreds (or thousands, or millions) of times multiplies that time and the
CPU used.

So what did I suggest?

My suggestion was to modify the ALLOC builtin to allow allocation of a defined
number of bytes (as usual, rounded up to 8n) inside an AREA, while returning a
POINTER (and NOT an OFFSET). By using this functionality, an entire linked
list/binary tree could be built inside an AREA and all that would be needed to
free it in its entirety would be a simple "AREA = EMPTY();" statement, which
just rewrites the 16-byte area header, or in assembler:

area = empty(); -> XC AREA(16,rX,whatever),AREA(rX,whatever)

which is probably quite fast. :D

I am not in a position to turn this into a formal request to IBM, but if you
think it's useful (and the method can be emulated very easily by directly
modifying the 16-byte header that precedes the actual data in an AREA variable),
and you discover that it saves you oodles of CPU, why not ask IBM to add it in a
next version of Enterprise PL/I.
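
To make the idea concrete without claiming anything about how PL/I lays out an AREA internally, here is a minimal sketch of the technique in Python: allocations are carved sequentially out of one block, and "freeing" everything is a single reset. The class and method names are invented for the example, and a byte index into the block stands in for the POINTER the suggestion asks for.

```python
class Area:
    """Toy model of allocate-inside-an-area with one-shot release."""

    def __init__(self, size):
        self.storage = bytearray(size)  # the area's data portion
        self.next_free = 0              # stand-in for the area header

    def alloc(self, nbytes):
        """Reserve nbytes (rounded up to a multiple of 8); return its start index."""
        rounded = (nbytes + 7) // 8 * 8
        if self.next_free + rounded > len(self.storage):
            raise MemoryError("area exhausted")
        start = self.next_free
        self.next_free += rounded
        return start

    def empty(self):
        """Release every allocation at once by resetting the header."""
        self.next_free = 0


# Build a 42-item "linked list" of 24-byte nodes, then drop it in one step.
area = Area(4096)
nodes = [area.alloc(24) for _ in range(42)]
used_before_reset = area.next_free
area.empty()
```

The cost of the release does not depend on how many nodes were allocated, which is the whole point of the suggestion. The price is that individual nodes can no longer be freed on their own.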

Nomen Nescio

May 17, 2012, 11:30:13 AM

Robert AH Prins <spam...@prino.org> wrote:

> On 2012-05-17 08:38, Nomen Nescio wrote:
> > That looks very odd to me, almost as if the optimization option didn't get
> > accepted and it generated pure dumb code. Can you verify the banner shows
> > that optimization is on and actually got done (however you do that
> > nowadays). Do you have to specify REORDER even with OPT(x)?
>
> After my two previous goofs, I check, double-check and check again.
>
> > I guess you don't have a real machine to benchmark this on but if you did
>
> Now, how would you possibly know?

Do you really want me to answer that?

> that comes up with all this (and is responsible for at least three real APARs)
> has absolutely nothing to do with anything business. The original version was
> written in Turbo Pascal sometime in the late 1980s and my oldest readable
> backups are from April 1994; anything older might be on 5 1/4" floppies and I
> have no means of reading them. The oldest of the, currently 34, PL/I versions
> dates back to 1993 and went "live" in 1994.

You can get a copy of 1970s PL/I F and run it on MVS 3.8. Maybe it will be
the best performer of all.

> Of course we now have "LGF r15,=V(IBMxxxxx)(,rX,YY)" but where can we find what
> "IBMQOFRD" is actually doing? Google? Try it, and for this one you'll get three
> results, none of them making you any wiser.

You might have a look in the LE manuals. I'm not saying it's there but it
would be a good place to start. And if that doesn't work and you really need
to know you could dump and trace it and find out what it does.

> Another? There is now a whole set of functions to convert dates and times into
> other formats, and in the end they (probably) all use LE routines. From a
> maintenance point of view this is of course ideal, but performance does suffer: I've got
> routines that convert a date to its JDN and back for dates from 1 March 0000 to
> well beyond the year 9999, that take into account the Julian-Gregorian
> transition from 4 to 15 October 1582, use floating point (ouch! - they can be
> recoded to use only FIXED BIN (31)) and are still faster than the PL/I "DAYS"
> and "DAYSTODATE" (nice symmetrical names). OK, they require a defined format,
> but how many organisations use multiple date and/or time formats anyway.

Yeah that is the thing with LE and now you can't get away from it. One size
fits all whether it fits or not.

> More and more XML data is being processed, and that's where the various PLISAX*
> builtins come into play. Now, it would not surprise me if the data these parsers
> return is put into trees and/or lists. As the format is potentially very free,
> it's not unlikely that these trees and lists are being created using based
> storage, either using structures with a defined size, or maybe in blobs of
> storage grabbed using the ALLOC builtin. Of course one needs to keep track of
> what's linked to what and FREEing a linked list or tree needs to be done
> correctly to avoid storage leaks. Freeing a tree with 343 leaves or a linked list
> with 42 items may be a pretty trivial task, but it takes time and CPU, and doing
> it hundreds (or thousands, or millions) of times multiplies that time and the
> CPU used.
>
> So what did I suggest?

snip

The fast way to do this in assembler is using one unauthorized storage
subpool for all the requests for a tree and then doing a subpool release.
Not knowing if this is possible from PL/I, you could check and if it is you
could use that. If it is not available, then you could write an assist
routine in assembler that does the storage management as mentioned.
Hopefully this will not headbang LE but I don't work in that environment so
I don't know. I can't imagine it would but then IBM has done several things
I can't imagine lately.

>
> My suggestion was to modify the ALLOC builtin to allow allocation of a defined
> number of bytes (as usual, rounded up to 8n) inside an AREA, while returning a
> POINTER (and NOT an OFFSET). By using this functionality, an entire linked
> list/binary tree could be built inside an AREA and all that would be needed to
> free it in its entirety would be a simple "AREA = EMPTY();" statement, which
> just rewrites the 16-byte area header, or in assembler:
>
> area = empty(); -> XC AREA(16,rX,whatever),AREA(rX,whatever)
>
> which is probably quite fast. :D

Is there any reason why you couldn't do this yourself in PL/I? BTW XC is
slow, and if you're clearing large areas it might be better to do it with
paging commands (I think you can) but you'll need to be APF authorized. You
can get pages cleared from VSM with normal GETMAIN but you have to get more
than a page or two, check the Macro Ref. This doesn't require APF
authorization. But that's only when the storage is acquired. It doesn't
help you clear it after you use it. I would guess MVCL would be faster for
clearing large areas than XC. There is probably another way or two but I
haven't needed this since we try not to depend on initialized storage
because it's expensive. If we need to clear it we use MVCL, unless the area
is tiny. Although that does consume a lot of registers...

> I am not in a position to turn this into a formal request to IBM, but if you
> think it's useful (and the method can be emulated very easily by directly
> modifying the 16-byte header that precedes the actual data in an AREA variable),
> and you discover that it saves you oodles of CPU, why not ask IBM to add it in a
> next version of Enterprise PL/I.

We do license PL/I along with everything else but I don't personally use it
and the support and QA guys don't care about performance because they just
use it to reproduce customer issues (we're not a production shop, we're a
software vendor). There should be somebody else who can help with this on
c.l.p though.

John W Kennedy

May 17, 2012, 12:23:22 PM

On 2012-05-17 08:38:51 +0000, Nomen Nescio said:
> That looks very odd to me, almost as if the optimization option didn't get
> accepted and it generated pure dumb code. Can you verify the banner shows
> that optimization is on and actually got done (however you do that
> nowadays). Do you have to specify REORDER even with OPT(x)?

Yes, because REORDER breaks the language. The details of PL/I exception
handling, as the language is defined, are violently hostile to
meaningful optimization, which is a big reason that PL/I was unable to
replace FORTRAN, and the main reason that new languages intended for
compilation avoided exception handling until the 80s, when Ada and C++
introduced the try/catch model.

However, it is possible to install the Enterprise compiler in such a
way that REORDER is the default.

> x86 doesn't really have a dozen available registers. Many of the so-called
> GPRs are reserved for important stuff. You end up with 4 or 5 usable
> registers in any heavy duty x86 code. Lots of thrashing is normal in x86
> code but it doesn't seem to hurt performance much for some reason.

x64, of course, improves it massively.

--
John W Kennedy
Read the remains of Shakespeare's lost play, now annotated!
http://www.SKenSoftware.com/Double%20Falshood

Peter Flass

May 17, 2012, 4:04:34 PM

On 5/17/2012 11:30 AM, Nomen Nescio wrote:
>
> You can get a copy of 1970s PL/I F and run it on MVS 3.8. Maybe it will be
> the best performer of all.

From what I've seen, I doubt it. (F) did a lot with subroutines for
data conversion, etc. where it would be faster to use inline code. Of
course, that's a tradeoff they made for smaller memory vs. faster
execution. I've often thought that I might have done better with less
inlining in Iron Spring PL/I, since it has often come back to bite me, but I
first tried brute-force code and didn't like the way it looked.

>> My suggestion was to modify the ALLOC builtin to allow allocation of a defined
>> number of bytes (as usual, rounded up to 8n) inside an AREA, while returning a
>> POINTER (and NOT an OFFSET). By using this functionality, an entire linked
>> list/binary tree could be built inside an AREA and all that would be needed to
>> free it in its entirety would be a simple "AREA = EMPTY();" statement, which
>> just rewrites the 16-byte area header, or in assembler:
>>
>> area = empty(); -> XC AREA(16,rX,whatever),AREA(rX,whatever)
>>
>> which is probably quite fast. :D
>
> Is there any reason why you couldn't do this yourself in PL/I?

Use a wrapper that converts the offset to a pointer and returns that.

--
Pete

Robert AH Prins

May 17, 2012, 7:09:51 PM

That is of course a solution, but the fundamental problem is the fact that the
ALLOC builtin cannot be used to allocate storage inside an AREA variable.

Fritz Wuehler

May 17, 2012, 8:10:30 PM

John W Kennedy <jwk...@attglobal.net> wrote:

> > x86 doesn't really have a dozen available registers. Many of the so-called
> > GPRs are reserved for important stuff. You end up with 4 or 5 usable
> > registers in any heavy duty x86 code. Lots of thrashing is normal in x86
> > code but it doesn't seem to hurt performance much for some reason.
>
> x64, of course, improves it massively.

Well massively as a percentage but not massively by 1964 OS/360 standards
since Intel (AMD actually) still requires you to throw 2 or 3 registers away
on stack management and other instructions that are so basic to getting
anything worthwhile done (ex. string compares/moves) have implied register
usage. But it doesn't seem like anybody coding on Intel cares. Most of them
haven't a clue and the guys who do have a clue usually haven't coded on
machines with enough registers (S/360, POWER, SPARC) to know what they're
missing. Like I said lots and lots of thrashing but they still get
acceptable performance whatever that means.

robin....@gmail.com

May 17, 2012, 10:16:35 PM

On Friday, 18 May 2012 02:23:22 UTC+10, John W Kennedy wrote:
> On 2012-05-17 08:38:51 +0000, Nomen Nescio said:
> > That looks very odd to me, almost as if the optimization option didn't get
> > accepted and it generated pure dumb code. Can you verify the banner shows
> > that optimization is on and actually got done (however you do that
> > nowadays). Do you have to specify REORDER even with OPT(x)?
>
> Yes, because REORDER breaks the language.

That's nonsense.

> The details of PL/I exception
> handling, as the language is defined, are violently hostile to
> meaningful optimization,

Don't talk rubbish.
All that REORDER does is to allow the compiler to move code out of loops,
and to do things like eliminate common sub-expressions (compute them once),
in situations where the same sub-expression is evaluated in two or more statements,
etc.

Things that can affect optimisation are labels (common to any language)
and presence of ON statements (e.g., code cannot be moved to a place
where it would be executed before an ON statement).

Given typical use of ON statements, they appear at or near the beginning of
a procedure, and thus do not have much influence on optimisation.

That said, it doesn't matter where the code is; if it causes an interrupt,
the exception handler will get control, and, if appropriate, return control
to the point where the interrupt occurred.

> which is a big reason that PL/I was unable to replace FORTRAN,

It had nothing to do with whether or not PL/I replaced FORTRAN.

robin....@gmail.com

May 17, 2012, 10:24:39 PM

to rob...@prino.org

On Thursday, 17 May 2012 23:22:10 UTC+10, Robert AH Prins wrote:

> However, when working in the Netherlands in 1996, I optimized two CRC routines
> and there the savings were a measly 99.3 and 99.5% - this was pre-Enterprise
> PL/I and my change was to simply do all intermediate bit-fiddling with ALIGNED
> bits, cutting out thousands of calls to the library.

The Programmer's Guide from PL/I-F days tells us that BIT strings
are best ALIGNed for speed.

glen herrmannsfeldt

May 17, 2012, 11:08:18 PM

Fritz Wuehler <fr...@spamexpire-201205.rodent.frell.theremailer.net> wrote:

(snip on x64, IA32, and some others)

> Well massively as a percentage but not massively by 1964 OS/360 standards
> since Intel (AMD actually) still requires you to throw 2 or 3 registers away
> on stack management and other instructions that are so basic to getting
> anything worthwhile done (ex. string compares/moves) have implied register
> usage. But it doesn't seem like anybody coding on Intel cares. Most of them
> haven't a clue and the guys who do have a clue usually haven't coded on
> machines with enough registers (S/360, POWER, SPARC) to know what they're
> missing. Like I said lots and lots of thrashing but they still get
> acceptable performance whatever that means.

Well, S/360 requires that you not use some registers, too.

You at least need a base register, which you don't for IA32.

Register 0 has some limits on its use.

The OS/360 linkage registers, 1, 14, and 15 can be used for other
uses if one is careful. I don't know by now how much compilers do that.

-- glen

robin....@gmail.com

May 17, 2012, 11:30:51 PM

On Friday, 18 May 2012 13:08:18 UTC+10, glen herrmannsfeldt wrote:

> Well, S/360 requires that you not use some registers, too.
>
> You at least need a base register, which you don't for IA32.

It's convenient to use a base register, but you don't have to have
one.

> Register 0 has some limits on its use.

It's the instructions that have limits on use of registers.
When zero is specified in the index field or the base field
of an instruction, no register is used.
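
The point about a zero in the base or index field can be made concrete with a small model. The sketch below is Python rather than assembler, and `effective_address` and the register contents are made up for illustration; it assumes RX-format D(X,B) operands and 24-bit addressing as on the original S/360. It also suggests why code that uses no base register at all can only reach the first 4K of storage.

```python
def effective_address(gpr, x, b, d):
    """Effective address of an RX-format operand D(X,B), 24-bit mode.

    A 0 in the index (x) or base (b) field means "no register":
    the contents of GPR 0 are never added in.
    """
    if not 0 <= d <= 0xFFF:
        raise ValueError("displacement must fit in 12 bits")
    addr = d
    if x != 0:
        addr += gpr[x]
    if b != 0:
        addr += gpr[b]
    return addr & 0xFFFFFF  # wrap to 24 bits


gpr = [0] * 16
gpr[0] = 0x123456    # hypothetical contents; ignored for addressing
gpr[12] = 0x002000   # hypothetical base register

# Highest address reachable with neither base nor index register.
no_base = effective_address(gpr, 0, 0, 0xFFF)
# The same kind of operand with register 12 as base.
with_base = effective_address(gpr, 0, 12, 0x010)
print(hex(no_base), hex(with_base))
```

Register 0 can still be loaded, stored and used in arithmetic; only its use in address generation is special.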

The S/360 is archaic 1960s. Try System z.

robin....@gmail.com

May 17, 2012, 11:48:38 PM

to rob...@prino.org

On Thursday, 17 May 2012 02:37:35 UTC+10, Robert AH Prins wrote:

> On 2012-05-16 14:07, robin.vow....@gmail.com wrote:
> > On Wednesday, 16 May 2012 08:07:51 UTC+10, Robert AH Prins wrote:
> >
> >> OS PL/I V2.3.0 - OPT(2)
> >> 343 1 2 REPT_LINE = REPT_LIST, BY NAME;
> >>
> >> * STATEMENT NUMBER 343
> >
> >
> >> Enterprise PL/I for z/OS V3.R9.M0 (Built:20100923) - OPT(3)
> >> 3120.0 368 1 2 rept_line = rept_list, by name;
> >
> >> IBM(R) PL/I for Windows 8.0 (Built:20110825)
> >> ; 3132 rept_line = rept_list, by name;
> >
> > They are three different programs.
>
> Those of you who have used OS PL/I, Enterprise PL/I and PL/I for Windows know that Enterprise PL/I
> now bases the statement numbers of the pseudo-assembler listing on line-numbers and that "if..then"
> now counts as two statements rather than one. PL/I for doze also bases its statement numbers on the
> line of the source, but on z/OS a version number comment-line is added by the compile procedure, and
> the z/OS compile was done with listview(afterall) whereas the doze compilation missed the
> (irrelevant) extra comment line and used listview(source).
>
> Anyway, of course this is the same program,

If it were the same program, we would see the same member names
in the code, but we don't.
In any case, there is an obvious change, in case you hadn't noticed,
namely, the change from upper case source to lower-case.

And you haven't displayed the code as someone already asked.
Others (Tucker, Jalic, and Kewley) have made claims
about PL/I optimisation that proved to be false.

To justify your claim that the programs are the same,
you, Prins, would need to post the versions.
In view of your recent stuff-ups that is the bare minimum.

> but sadly RV seems to enjoy the board for his head too
> much to actually investigate the matter, a bold "They are three different programs." is much easier.

See above. I didn't base that conclusion on the lines shown above.
It was based on the pseudo-assembler.
I omitted the lengthy pseudo-assembler code, as there was no sense in
repeating it.

Nomen Nescio

May 18, 2012, 2:35:59 AM

glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:

> Fritz Wuehler <fr...@spamexpire-201205.rodent.frell.theremailer.net> wrote:
>
> (snip on x64, IA32, and some others)
>
> > Well massively as a percentage but not massively by 1964 OS/360 standards
> > since Intel (AMD actually) still requires you to throw 2 or 3 registers away
> > on stack management and other instructions that are so basic to getting
> > anything worthwhile done (ex. string compares/moves) have implied register
> > usage. But it doesn't seem like anybody coding on Intel cares. Most of them
> > haven't a clue and the guys who do have a clue usually haven't coded on
> > machines with enough registers (S/360, POWER, SPARC) to know what they're
> > missing. Like I said lots and lots of thrashing but they still get
> > acceptable performance whatever that means.
>
> Well, S/360 requires that you not use some registers, too.

That is true. All architectures require you to use *some* registers for
*some* things but in S/360 generally you're free to pick which ones and you
can use them however you want in other contexts.

> You at least need a base register, which you don't for IA32.

True. But you did for 808X. And not just one! And you still do in x86 and
AMD64 when you start up, until you switch modes ;-)

On most Intel implementations you're going to have to use EBP and ESP for
the stack. That's two registers gone. I could write S/360 code that shares a
base register for data and instructions, that's half as many registers.

> Register 0 has some limits on its use.

True, it's not a full GPR in that it can't be used for addressing. Other
architectures have even more severe constraints on some registers. For
example GPR0 in SPARC is always zero, and writing to it is like /dev/null.
At least in S/360 you can read and write GPR0 just like any other register.

> The OS/360 linkage registers, 1, 14, and 15 can be used for other
> uses if one is careful. I don't know by now how much compilers do that.

Those registers are used for linkage but when you're not actually
invoking a service you can use them however you want. If you write code on
S/360 and x86 I think you'll agree x86 feels constrained regarding
registers. I haven't written enough AMD64 to make a decision yet but I still
feel they should have specified more registers when they had a chance. And
btw if you want to talk about linkage registers, look at the AMD64 ABI for
UNIX if you haven't already. What a complicated ugly mess, like everything
Intel...

Robert AH Prins

May 18, 2012, 4:41:21 AM

On 2012-05-18 03:48, robin....@gmail.com wrote:
> On Thursday, 17 May 2012 02:37:35 UTC+10, Robert AH Prins wrote:
>> On 2012-05-16 14:07, robin.vow....@gmail.com wrote:
>>> On Wednesday, 16 May 2012 08:07:51 UTC+10, Robert AH Prins wrote:
>>>
>>>> OS PL/I V2.3.0 - OPT(2)
>>>> 343 1 2 REPT_LINE = REPT_LIST, BY NAME;
>>>>
>>>> * STATEMENT NUMBER 343
>>>
>>>
>>>> Enterprise PL/I for z/OS V3.R9.M0 (Built:20100923) - OPT(3)
>>>> 3120.0 368 1 2 rept_line = rept_list, by name;
>>>
>>>> IBM(R) PL/I for Windows 8.0 (Built:20110825)
>>>> ; 3132 rept_line = rept_list, by name;
>>>
>>> They are three different programs.
>>
>> Those of you who have used OS PL/I, Enterprise PL/I and PL/I for Windows know that Enterprise PL/I
>> now bases the statement numbers of the pseudo-assembler listing on line-numbers and that "if..then"
>> now counts as two statements rather than one. PL/I for doze also bases its statement numbers on the
>> line of the source, but on z/OS a version number comment-line is added by the compile procedure, and
>> the z/OS compile was done with listview(afterall) whereas the doze compilation missed the
>> (irrelevant) extra comment line and used listview(source).
>>
>> Anyway, of course this is the same program,
>
> If it were the same program, we would see the same member names

WTH are you talking about?

> in the code, but we don't.
> In any case, there is an obvious change, in case you hadn't noticed,
> namely, the change from upper case source to lower-case.

You are such an amazing incompetent dork. If you had ever opened the manuals of
the latest versions of Enterprise PL/I, you would have known that there is now a
compiler option "pp(macro('case(asis)'))". But given that you couldn't be
bothered to look at the history of the Wiki pages, I don't expect you to ever
bother opening a 666 page Programming Guide or an 818 page Language Reference,
page numbers are for the PDFs of the EPLI V4.2 manuals - *I* actually read
them!

> And you haven't displayed the code as someone already asked.
> Others (Tucker, Jalic, and Kewley) have made claims
> about PL/I optimisation that proved to be false.
>
> To justify your claim that the programs are the same,
> you, Prins, would need to post the versions.

Version. I am not going to post an 11,565 line program.

> In view of your recent stuff-ups that is the bare minimum.

I apologized for them, when was the last time you publicly apologized for anything?

>> but sadly RV seems to enjoy the board for his head too
>> much to actually investigate the matter, a bold "They are three different programs." is much easier.
>
> See above. I didn't base that conclusion on the lines shown above.
> It was based on the pseudo-assembler.
> I omitted the lengthy pseudo-assembler code, as there was no sense in
> repeating it.

Show the relevant lines or shut up!

glen herrmannsfeldt

May 18, 2012, 6:29:43 AM

Nomen Nescio <nob...@dizum.com> wrote:

(snip, I wrote)

>> Well, S/360 requires that you not use some registers, too.

> That is true. All architectures require you to use *some* registers
> for *some* things but in S/360 generally you're free to pick
> which ones and you can use them however you want in other contexts.

>> You at least need a base register, which you don't for IA32.

> True. But you did for 808X. And not just one! And you still do in
> x86 and AMD64 when you start up, until you switch modes ;-)

> On most Intel implementations you're going to have to use EBP
> and ESP for the stack. That's two registers gone. I could write
> S/360 code that shares a base register for data and
> instructions, that's half as many registers.

Fortran usually shares the base register for code and static
(the only kind in Fortran 66 days) data. PL/I used to share the
register for local data and the save area. (Maybe still does,
I haven't looked lately.)

>> Register 0 has some limits on its use.

> True, it's not a full GPR in that it can't be used for addressing.
> Other architectures have even more severe constraints on some
> registers. For example GPR0 in SPARC is always zero, and writing
> to it is like /dev/null. At least in S/360 you can read and
> write GPR0 just like any other register.

>> The OS/360 linkage registers, 1, 14, and 15 can be used for other
>> uses if one is careful. I don't know by now how much compilers do that.

> Those registers are used for linkage but when you're not actually
> invoking a service you can use them however you want.

Yes. But do compilers actually do that? Since you could have a
function call while evaluating an expression, you might need them.

> If you write code on S/360 and x86 I think you'll agree x86
> feels constrained regarding registers. I haven't written
> enough AMD64 to make a decision yet but I still feel they
> should have specified more registers when they had a chance.

I suppose, but that means more bits to address the registers.

> And btw if you want to talk about linkage registers, look at
> the AMD64 ABI for UNIX if you haven't already. What a
> complicated ugly mess, like everything Intel...

I wonder who actually defined the ABI. Was it really Intel?

-- glen

Shmuel Metz

May 18, 2012, 10:15:36 AM

In <jp4ef2$809$1...@speranza.aioe.org>, on 05/18/2012

at 03:08 AM, glen herrmannsfeldt <g...@ugcs.caltech.edu> said:

>Well, S/360 requires that you not use some registers, too.

Not as many.

>You at least need a base register,

True for S/360 but not for any current processor in its line.

>Register 0 has some limits on its use.

Also, registers 1 and 2 are used by some instructions, e.g., TRT.

>The OS/360 linkage registers, 1, 14, and 15

Don't forget R13.

Shmuel Metz

May 18, 2012, 10:08:40 AM

In <a1l7om...@mid.individual.net>, on 05/17/2012

at 11:09 PM, Robert AH Prins <spam...@prino.org> said:

>That is of course a solution, but the fundamental problem is the
>fact that the ALLOC builtin cannot be used to allocate storage
>inside an AREA variable.

Why not use the ALLOCATE statement with IN, rather than using the
ALLOC function?

Robert AH Prins

May 18, 2012, 1:32:21 PM

On 2012-05-18 14:08, Shmuel (Seymour J.) Metz wrote:
> In<a1l7om...@mid.individual.net>, on 05/17/2012
> at 11:09 PM, Robert AH Prins<spam...@prino.org> said:
>
>> That is of course a solution, but the fundamental problem is the
>> fact that the ALLOC builtin cannot be used to allocate storage
>> inside an AREA variable.
>
> Why not use the ALLOCATE statement with IN, rather than using the
> ALLOC function?

Because the ALLOCATE statement doesn't allow you to allocate a variable number
of bytes like the ALLOC builtin.

Peter Flass

May 18, 2012, 1:32:59 PM

What do you expect from a ripoff of a C function?

--
Pete

Peter Flass

May 18, 2012, 1:38:18 PM

On 5/17/2012 10:16 PM, robin....@gmail.com wrote:
> On Friday, 18 May 2012 02:23:22 UTC+10, John W Kennedy wrote:
>> On 2012-05-17 08:38:51 +0000, Nomen Nescio said:
>>> That looks very odd to me, almost as if the optimization option didn't get
>>> accepted and it generated pure dumb code. Can you verify the banner shows
>>> that optimization is on and actually got done (however you do that
>>> nowadays). Do you have to specify REORDER even with OPT(x)?
>>
>> Yes, because REORDER breaks the language.
>
> That's nonsense.
>
>> The details of PL/I exception
>> handling, as the language is defined, are violently hostile to
>> meaningful optimization,
>
> Don't talk rubbish.
> All that REORDER does is to allow the compiler to move code out of loops,
> and to do things like eliminate common sub-expressions (compute them once),
> in situations where the same sub-expression is evaluated in two or more statements,
> etc.

That does break the language. What if the code moved raises an
exception condition? Presumably it's trapped there, and therefore never
executes the loop head or any other code that might come before it in
the loop.
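
A minimal sketch of the disagreement, in Python rather than PL/I. It assumes `ZeroDivisionError` as a stand-in for the ZERODIVIDE condition and a hand-written "hoisted" version as a stand-in for what a REORDER-style optimisation might produce; it is not output from any PL/I compiler, and the function names are invented for the example.

```python
def original(values, divisor, log):
    """Loop as written: the invariant expression is evaluated inside the loop."""
    for v in values:
        log.append(f"seen {v}")
        scale = 100 // divisor      # loop-invariant, may raise
        log.append(v * scale)
    return log


def hoisted(values, divisor, log):
    """Same loop after the invariant expression has been moved out."""
    scale = 100 // divisor          # now evaluated before the loop is entered
    for v in values:
        log.append(f"seen {v}")
        log.append(v * scale)
    return log


def run(fn, values, divisor):
    """Run one variant and record whether the condition was raised."""
    log = []
    try:
        fn(values, divisor, log)
    except ZeroDivisionError:
        log.append("ZERODIVIDE")
    return log


print(run(original, [], 0), run(hoisted, [], 0))
print(run(original, [1, 2], 0), run(hoisted, [1, 2], 0))
print(run(original, [1, 2], 5), run(hoisted, [1, 2], 5))
```

With an empty list the original loop raises nothing, while the hoisted version raises before the loop is entered. With a non-empty list both raise, but the hoisted version does so before the first "seen" entry is logged. When no condition is raised the two versions behave identically.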

What should the compiler do when converting a character literal to
arithmetic when the literal contains invalid data? I thought about this
- I issue an error message during compilation, but then try to do the
conversion at run-time anyhow, because the programmer might have a
CONVERSION ON-Unit that will fix it up. Far-fetched, but possible.

>
> Things that can affect optimisation are labels (common to any language)
> and presence of ON statements (e.g., code cannot be moved to a place
> where it would be executed before an ON statement).
>
> Given typical use of ON statements, they appear at or near the beginning of
> a procedure, and thus do not have much influence on optimisation.

Usually...

>
> That said, it doesn't matter where the code is; if it causes an interrupt,
> the exception handler will get control, and, if appropriate, return control
> to the point where the interrupt occurred.

See above.

>
>> which is a big reason that PL/I was unable to replace FORTRAN,
>
> It had nothing to do with whether or not PL/I replaced FORTRAN.

What's FORTRAN? ;-)

--
Pete

Peter Flass

May 18, 2012, 1:40:21 PM

On 5/17/2012 11:30 PM, robin....@gmail.com wrote:
> On Friday, 18 May 2012 13:08:18 UTC+10, glen herrmannsfeldt wrote:
>
>> Well, S/360 requires that you not use some registers, too.

But it has a few more to work with.

>>
>> You at least need a base register, which you don't for IA32.
>
> It's convenient to use a base register, but you don't have to have
> one.

As long as all your code resides in the first 4K of memory, in which
case you can use R0.

>
>> Register 0 has some limits on its use.
>
> It's the instructions that have limits on use of registers.
> When zero is specified in the index field or the base field
> of an instruction, no register is used.
>
> The S/360 is archaic 1960s. Try System z.

--
Pete

Peter Flass

May 18, 2012, 1:45:57 PM

I think I had some way to fudge this before the ALLOCATE builtin was
available. I'll have to look and see if I can find what I did.

--
Pete

Robert AH Prins

May 18, 2012, 4:12:30 PM

It's pretty simple to fudge it by updating the header of the area itself:
assuming the tree/list is built without deletions of earlier leaves/items,
it's just a matter of keeping track of the free space in the area (which is not
stored anywhere) and updating the next-available byte. You can completely bypass
the fact that all allocated data in an area is kept in a linked list. I did
this at one client half a decade ago and it worked very well. The code took care
of the AREA condition by allocating another AREA (the areas themselves were
controlled and could be freed by a simple "do while allocn(myarea) > 0"), but
the initial area was large enough for this to happen only very rarely.
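
The scheme described above can be modelled roughly as follows. This is a Python sketch, not PL/I: `Area`, `AreaPool` and their methods are hypothetical names, a `bytearray` stands in for the AREA storage, returning `None` stands in for the AREA condition, and the rounding to a multiple of 8 follows the "rounded up to 8n" remark earlier in the thread. It ignores the real layout of the area header.

```python
def round8(nbytes):
    """Round a byte count up to a multiple of 8."""
    return (nbytes + 7) & ~7


class Area:
    """Fixed-size block that hands out space by advancing a next-free offset."""

    def __init__(self, size):
        self.storage = bytearray(size)
        self.next_free = 0

    def alloc(self, nbytes):
        nbytes = round8(nbytes)
        if self.next_free + nbytes > len(self.storage):
            return None                 # stands in for the AREA condition
        start = self.next_free
        self.next_free += nbytes
        return start

    def empty(self):
        self.next_free = 0              # a single store "frees" everything


class AreaPool:
    """Chain of areas; another one is added when the current one is full."""

    def __init__(self, area_size):
        self.area_size = area_size
        self.areas = [Area(area_size)]

    def alloc(self, nbytes):
        if round8(nbytes) > self.area_size:
            raise MemoryError("request larger than one area")
        offset = self.areas[-1].alloc(nbytes)
        if offset is None:              # current area full: chain a new one
            self.areas.append(Area(self.area_size))
            offset = self.areas[-1].alloc(nbytes)
        return len(self.areas) - 1, offset

    def free_all(self):
        del self.areas[1:]              # drop the overflow areas
        self.areas[0].empty()


pool = AreaPool(32)
first = pool.alloc(10)
second = pool.alloc(10)
third = pool.alloc(10)                  # does not fit, lands in a second area
print(first, second, third)
pool.free_all()
```

Nothing is ever freed individually here, which is what makes the whole-structure release so cheap.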

John W Kennedy

May 18, 2012, 3:24:33 PM

On 2012-05-18 06:35:59 +0000, Nomen Nescio said:

> glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:
>
>> Fritz Wuehler <fr...@spamexpire-201205.rodent.frell.theremailer.net> wrote:
>>
>> (snip on x64, IA32, and some others)
>>
>>> Well massively as a percentage but not massively by 1964 OS/360 standards
>>> since Intel (AMD actually) still requires you to throw 2 or 3 registers away
>>> on stack management and other instructions that are so basic to getting
>>> anything worthwhile done (ex. string compares/moves) have implied register
>>> usage. But it doesn't seem like anybody coding on Intel cares. Most of them
>>> haven't a clue and the guys who do have a clue usually haven't coded on
>>> machines with enough registers (S/360, POWER, SPARC) to know what they're
>>> missing. Like I said lots and lots of thrashing but they still get
>>> acceptable performance whatever that means.
>>
>> Well, S/360 requires that you not use some registers, too.
>
> That is true. All architectures require you to use *some* registers for
> *some* things but in S/360 generally you're free to pick which ones and you
> can use them however you want in other contexts.
>
>> You at least need a base register, which you don't for IA32.
>
> True. But you did for 808X. And not just one! And you still do in x86 and
> AMD64 when you start up, until you switch modes ;-)
>
> On most Intel implementations you're going to have to use EBP and ESP for
> the stack. That's two registers gone. I could write S/360 code that shares a
> base register for data and instructions, that's half as many registers.

Only if you're writing non-reentrant, non-recursive, mid-60s code.

Peter Flass

May 18, 2012, 8:02:17 PM

I was thinking of allocating a controlled variable and passing back its
address. As long as you free the allocated data all at once by emptying
the area this works.

--
Pete

James J. Weinkam

May 19, 2012, 6:47:30 PM

Just got back from a two week trip, so I may be missing something. If I understand it, you want to construct a linked
structure in an AREA but want to use pointers rather than offsets. This allows you to free the entire structure by
emptying the area. If no other freeing of objects in the area is involved, you can dispense with the area entirely and
do the following:

dcl (first,unav,avail) ptr, size bin fixed(31) value(whatever);
first,avail=allocate(size); unav=first+size;

...
alloc: proc(n,b) returns(ptr);
  /* allocate n bytes on boundary b (power of 2) */
  /* n = byte count, b = boundary, pp overlays p */
  dcl p ptr, (n,b,pp def p) bin fixed(31) unsigned;
  p=avail;
  if b>0 then pp=iand(pp+b-1,inot(b-1)); /* round p up to a multiple of b */
  avail=p+n;
  if avail>unav then signal condition(full);
  return(p);
end alloc;

If you don't like that you could use

alloc: proc(a,n) returns(ptr);
  /* allocate n>4 bytes in area a and return a pointer to the allocated bytes */
  dcl
    a area(*),
    n bin fixed(31) unsigned,
    q offset(a),
    1 xx based(q),
      2 s bin fixed(31) unsigned,
      2 x(n-4 refer(s)) char(1);
  allocate xx in(a); return(q);
end alloc;
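
The boundary rounding in the first routine, `iand(pp+b-1,inot(b-1))`, is the usual power-of-two round-up. A small Python sketch of just that step, with `align_up` as an invented name and the power-of-two check added as an assumption about how one might guard it:

```python
def align_up(address, boundary):
    """Round address up to the next multiple of boundary (a power of two)."""
    if boundary <= 0 or boundary & (boundary - 1):
        raise ValueError("boundary must be a positive power of two")
    # Adding boundary-1 carries into the next multiple unless the address is
    # already aligned; the mask then clears the low-order bits.
    return (address + boundary - 1) & ~(boundary - 1)


for addr in (0, 13, 16, 17):
    print(addr, align_up(addr, 8))
```

An address that is already on the boundary is returned unchanged, so repeated calls do not waste space.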

> Robert

Shmuel Metz

May 19, 2012, 9:52:18 PM

In <a1n8bp...@mid.individual.net>, on 05/18/2012

at 05:32 PM, Robert AH Prins <spam...@prino.org> said:

>Because the ALLOCATE statement doesn't allow you to allocate a
>variable number of bytes like the ALLOC builtin.

True, it does not allow you to do it in the error-prone way that the
BIF does, but rather allows you to do it in a safer way. What's wrong
with allowing the compiler and run-time libraries to calculate the
number of bytes in a data structure?

Actually, if I really wanted to calculate the number of bytes myself,
I could still use the ALLOCATE statement, but I would consider doing
so to be bad form.

Fritz Wuehler

May 20, 2012, 7:35:34 AM

glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:

> >> The OS/360 linkage registers, 1, 14, and 15 can be used for other
> >> uses if one is careful. I don't know by now how much compilers do that.
>
> > Those registers are used for linkage but when you're not actually
> > invoking a service you can use them however you want.
>
> Yes. But do compilers actually do that? Since you could have a
> function call while evaluating an expression, you might need them.

I don't know the answer to that since everything I code is in assembler and
I haven't had to look into that in the recent past. The point is you *can*
do that, and the OS and program product code does that and vendor code does
that. And that is a significant portion of code. And it is all in assembler.

I use those registers for temporaries and it helps keep your code clean when
you do because you have to consider them volatile and pay attention on
minimizing thrashing and consider the lifetimes of control blocks and other
storage areas. Mostly, they are used for setting up parm lists for calls and
doing long moves since you need pairs of registers for the latter anyway.

> > If you write code on S/360 and x86 I think you'll agree x86
> > feels constrained regarding registers. I haven't written
> > enough AMD64 to make a decision yet but I still feel they
> > should have specified more registers when they had a chance.
>
> I suppose, but that means more bits to address the registers.

I don't think that is a valid complaint based on the abomination required to
encode Intel instructions. Up to 15 bytes!? Surely it couldn't be worse than
it is now and I don't think anybody would notice the difference.

> > And btw if you want to talk about linkage registers, look at
> > the AMD64 ABI for UNIX if you haven't already. What a
> > complicated ugly mess, like everything Intel...
>
> I wonder who actually defined the ABI. Was it really Intel?

I don't know. The copy I have says it was "edited" by 3 guys from SUSE and
one guy from CodeSourcery. I don't know who "wrote" it. My point was not
necessarily that Intel wrote it, but like everything from Intel, it's an
overcomplicated sloppy mess.

robin....@gmail.com

May 23, 2012, 11:02:08 PM

to rob...@prino.org

On Friday, 18 May 2012 18:41:21 UTC+10, Robert AH Prins wrote:

> On 2012-05-18 03:48, robin.vow...@gmail.com wrote:
> > On Thursday, 17 May 2012 02:37:35 UTC+10, Robert AH Prins wrote:
> >> On 2012-05-16 14:07, robin.vow....@gmail.com wrote:
> >>> On Wednesday, 16 May 2012 08:07:51 UTC+10, Robert AH Prins wrote:
> >>>
> >>>> OS PL/I V2.3.0 - OPT(2)
> >>>> 343 1 2 REPT_LINE = REPT_LIST, BY NAME;
> >>>>
> >>>> * STATEMENT NUMBER 343
> >>>
> >>>
> >>>> Enterprise PL/I for z/OS V3.R9.M0 (Built:20100923) - OPT(3)
> >>>> 3120.0 368 1 2 rept_line = rept_list, by name;
> >>>
> >>>> IBM(R) PL/I for Windows 8.0 (Built:20110825)
> >>>> ; 3132 rept_line = rept_list, by name;
> >>>
> >>> They are three different programs.
> >>
> >> Those of you who have used OS PL/I, Enterprise PL/I and PL/I for Windows know that Enterprise PL/I
> >> now bases the statement numbers of the pseudo-assembler listing on line-numbers and that "if..then"
> >> now counts as two statements rather than one. PL/I for doze also bases its statement numbers on the
> >> line of the source, but on z/OS a version number comment-line is added by the compile procedure, and
> >> the z/OS compile was done with listview(afterall) whereas the doze compilation missed the
> >> (irrelevant) extra comment line and used listview(source).
> >>
> >> Anyway, of course this is the same program,
> >
> > If it were the same program, we would see the same member names
>
> WTH are you talking about?

A member name is something that's part of a PL/I structure.

> > in the code, but we don't.
> > In any case, there is an obvious change, in case you hadn't noticed,
> > namely, the change from upper case source to lower-case.
>
> You are such an amazing incompetent dork.

Of all the dorks, it's you who is the greatest incompetent.

> If you had ever opened the manuals of
> the latest versions of Enterprise PL/I, you would have known that there is now a
> compiler option "pp(macro('case(asis)'))". But given that you couldn't be
> bothered to look at the history of the Wiki pages,

I don't waste my time doing stupid things.
I see that you are still suffering from delusions of grandeur
by even hoping that someone somewhere would want to see some program
that you wrote.

> I don't expect you to ever
> bother opening a 666 page Programming Guide or an 818 page Language Reference,
> page numbers are for the PDFs of the EPLI V4.2 manuals - *I* actually read
> them!

With your eyes shut? Obviously.

> > And you haven't displayed the code as someone already asked.
> > Others (Tucker, Jalic, and Kewley) have made claims
> > about PL/I optimisation that proved to be false.
> >
> > To justify your claim that the programs are the same,
> > you, Prins, would need to post the versions.
>
> Version. I am not going to post an 11,565 line program.

Well, then, you could put it on a web site.
Put up or shut up.

robin....@gmail.com

May 24, 2012, 11:57:17 AM

On Saturday, 19 May 2012 03:38:18 UTC+10, Peter Flass wrote:

> On 5/17/2012 10:16 PM, robin.vow...@gmail.com wrote:
> > On Friday, 18 May 2012 02:23:22 UTC+10, John W Kennedy wrote:
> >> On 2012-05-17 08:38:51 +0000, Nomen Nescio said:
> >>> That looks very odd to me, almost as if the optimization option didn't get
> >>> accepted and it generated pure dumb code. Can you verify the banner shows
> >>> that optimization is on and actually got done (however you do that
> >>> nowadays). Do you have to specify REORDER even with OPT(x)?
> >>
> >> Yes, because REORDER breaks the language.
> >
> > That's nonsense.
> >
> >> The details of PL/I exception
> >> handling, as the language is defined, are violently hostile to
> >> meaningful optimization,
> >
> > Don't talk rubbish.
> > All that REORDER does is to allow the compiler to move code out of loops,
> > and to do things like eliminate common sub-expressions (compute them once),
> > in situations where the same sub-expression is evaluated in two or more statements,
> > etc.
>
> That does break the language. What if the code moved raises an
> exception condition? Presumably it's trapped there, and therefore never
> executes the loop head or any other code that might come before it in
> the loop.

What if it does? It's still part of the loop.
If the error is such that it's continue-able (e.g. underflow, stringrange, etc)
execution will continue normally.
If the error is not continue-able (subscript error, division by zero, etc),
the condition is raised, and either standard system action takes place,
or the error-handler deals with it.
All perfectly normal.

What if there is an ON-unit that directs control to the head of the loop?
The label head of the loop is -- of course -- kept ahead of the moved code
during the optimisation process. The moved code is re-executed, because
that code is part of the loop -- and includes the normal loop head.

In other words, nothing is broken.

> What should the compiler do when converting a character literal to
> arithmetic when the literal contains invalid data? I thought about this
> - I issue an error message during compilation, but then try to do the
> conversion at run-time anyhow, because the programmer might have a
> CONVERSION ON-Unit that will fix it up. Far-fetched, but possible.

Since the conversion error is detected during compilation, the obvious
outcome is a compilation error message, as you have done.

> > Things that can affect optimisation are labels (common to any language)
> > and presence of ON statements (e.g., code cannot be moved to a place
> > where it would be executed before an ON statement).
> >
> > Given typical use of ON statements, they appear at or near the beginning of
> > a procedure, and thus do not have much influence on optimisation.
>
> Usually...
>
> >
> > That said, it doesn't matter where the code is; if it causes an interrupt,
> > the exception handler will get control, and, if appropriate, return control
> > to the point where the interrupt occurred.
>
> See above.
>
> >
> >> which is a big reason that PL/I was unable to replace FORTRAN,
> >
> > It had nothing to do with whether or not PL/I replaced FORTRAN.
>
> What's FORTRAN? ;-)

Well, yes.

robin....@gmail.com

May 24, 2012, 12:02:19 PM

On Saturday, 19 May 2012 03:40:21 UTC+10, Peter Flass wrote:

> On 5/17/2012 11:30 PM, robin.vow...@gmail.com wrote:
> > On Friday, 18 May 2012 13:08:18 UTC+10, glen herrmannsfeldt wrote:
> >
> >> Well, S/360 requires that you not use some registers, too.
>
> But it has a few more to work with.
>
> >>
> >> You at least need a base register, which you don't for IA32.
> >
> > It's convenient to use a base register, but you don't have to have
> > one.
>
> As long as all your code resides in the first 4K of memory, in which
> case you can use R0.

Well, you're not actually using Register 0.
The base register field in the relevant instructions is set to zero.

At any point in a program, you can use something like
BALR 4,0
L 5,8(4,0)
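
The addressing rule behind this exchange can be sketched in a few lines. This is a minimal Python model, not real assembler: the function name effective_address, the register contents, and the location of the BALR are all made up for illustration. It assumes the usual S/360 convention that a base or index field of 0 means "no register contributes" and that the displacement is 12 bits (0 to 4095), and it ignores any high-order status bits that BALR stores.

```python
# Hypothetical model of S/360 base-displacement (RX-format) addressing.
# A zero in the base or index field means "no register", not "use R0".

def effective_address(regs, displacement, index=0, base=0):
    """Return D + (X) + (B), treating register field 0 as contributing zero."""
    if not 0 <= displacement <= 0xFFF:
        raise ValueError("displacement must fit in 12 bits (0..4095)")
    addr = displacement
    if index != 0:
        addr += regs[index]
    if base != 0:
        addr += regs[base]
    return addr


regs = [0] * 16
regs[0] = 0x123456          # whatever is in R0 is ignored by addressing

# With no base and no index register, only the first 4K is reachable.
lowest = effective_address(regs, 0)
highest = effective_address(regs, 0xFFF)

# BALR 4,0 takes no branch and leaves the address of the next instruction in R4.
balr_at = 0x5000            # assumed location of the 2-byte BALR
regs[4] = balr_at + 2

# L 5,8(4,0): displacement 8, register 4 in the index field, base field 0.
target = effective_address(regs, 8, index=4, base=0)
```

In this model the load address is simply 8 bytes past wherever the instruction after the BALR sits, so addressability can be established at any point in the code.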

John W Kennedy

May 24, 2012, 11:25:31 PM

You really don't know a damn thing about computers, do you? Not even,
in the end, about PL/I. (Do you think REORDER was added to the language
for a joke?!)

Peter Flass

May 25, 2012, 7:40:36 AM

Robin, do I *have* to give you an example? Off the top of my head:

do i=1 to 5;
call a(i);
j = <something that raises conversion or error>;
end;

Since the assignment to j is loop-invariant, suppose the compiler moves
it out of the loop. I is then never initialized, so its value will be
unpredictable when the condition is signaled. Also, the first call a(i)
is never executed, so any side-effects it might have won't occur.

All of this may or may not be important for the program - usually it
isn't - but it's something to be aware of.

--
Pete

James J. Weinkam

May 25, 2012, 6:21:32 PM

Peter Flass wrote:
> Robin, do I *have* to give you an example? Off the top of my head:
>
> do i=1 to 5;
> call a(i);
> j = <something that raises conversion or error>;
> end;
>
> Since the assignment to j is loop-invariant, suppose the compiler moves it out of the loop. I is then never initialized,
> so its value will be unpredictable when the condition is signaled.

If the computation of the expression assigned to j is to be "moved out of the loop", it should be moved to a point after
the loop initialization but before the iterative execution of the loop body if it or any condition it might raise refers
to i; if it or any condition it might raise depends on side effects of a, it can't be moved at all. If a or any
procedure it invokes refers to the same generation of j, the assignment to j must take place after the first call to a.
If the optimizer's dependency analysis is unable to figure this out for itself, the ORDER option must be used.
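
The three possible placements can be compared with a small model. This is only a sketch in Python, not PL/I: the names run, a, invariant and Conversion are invented for illustration, None stands for "i not yet initialised", and catching the exception outside the loop is assumed to behave like an ON-unit that transfers control out of the loop.

```python
class Conversion(Exception):
    """Stands in for a PL/I condition; catching it models an ON-unit that leaves the loop."""


def run(placement):
    """Run the loop with the raising statement placed at `placement`.

    Returns what an ON-unit could observe: the value of i and the calls to a made so far.
    """
    state = {"i": None, "calls": []}   # None models "i not yet initialised"

    def a(i):
        state["calls"].append(i)       # the side effect of CALL a(i)

    def invariant():
        raise Conversion()             # the loop-invariant expression always raises

    try:
        if placement == "before_init":
            invariant()                # hoisted ahead of the loop initialisation
        state["i"] = 1                 # DO i = 1 ...
        if placement == "after_init":
            invariant()                # hoisted after initialisation, before the body
        while state["i"] <= 5:         # ... TO 5
            a(state["i"])
            if placement == "in_body":
                invariant()            # original, unmoved position
            state["i"] += 1
    except Conversion:
        pass                           # control has left the loop
    return state["i"], state["calls"]


results = {p: run(p) for p in ("in_body", "after_init", "before_init")}
```

In this model only the unmoved statement lets the handler see both i = 1 and the first call to a. Hoisting to after the initialisation keeps i but loses the call, which is the case where the statement cannot be moved if anything depends on a's side effects; hoisting ahead of the initialisation loses both.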

robin....@gmail.com

May 26, 2012, 10:57:13 PM

On Friday, 25 May 2012 13:25:31 UTC+10, John W Kennedy wrote:

On both counts (computers and PL/I), more than you.

Having written an optimising PL/I compiler, and modified another
to do optimising, I would think that I have some knowledge of the two topics
you mentioned.

robin....@gmail.com

May 26, 2012, 10:50:24 PM

On Friday, 25 May 2012 21:40:36 UTC+10, Peter Flass wrote:

> Robin, do I *have* to give you an example? Off the top of my head:
>
> do i=1 to 5;
> call a(i);
> j = <something that raises conversion or error>;
> end;
>
> Since the assignment to j is loop-invariant, suppose the compiler moves
> it out of the loop. I is then never initialized, so its value will be
> unpredictable when the condition is signaled.

That's irrelevant, on 2 counts:
1. If the condition is a fatal one, no further execution of the code
after the point of interrupt occurs.
2. If the interrupt is not a fatal one and execution resumes,
then J may or may not receive a useful value.
However, variable 'I' then receives its initial value and the loop
is entered.
In case (1), the invariant code can be moved to a point AFTER the
loop initialization and BEFORE the loop body, as Weinkam points out.

> Also, the first call a(i)
> is never executed, so any side-effects it might have won't occur.

That is irrelevant, because the program terminated.
If, however, execution of the invariant code does continue after
the interrupt, the CALL *is* executed. That said, see below.

> All of this may or may not be important for the program - usually it
> isn't - but it's something to be aware of.

Introducing a CALL in a loop that is subject to optimisation
considerably affects optimisation -- even more so than merely
having labels within a loop. It doesn't matter what language
is being used -- it even affects FORTRAN, by the way.

In the event of a call (or a reference to a user-supplied function),
optimisation may be entirely inhibited at that point.

If the called procedure is available at the time of compilation,
the compiler's analysis may reveal whether optimisation
can be sustained through the point of call.

Peter Flass

May 30, 2012, 12:57:12 PM

On 5/26/2012 10:50 PM, robin....@gmail.com wrote:
> On Friday, 25 May 2012 21:40:36 UTC+10, Peter Flass wrote:
>
>> Also, the first call a(i)
>> is never executed, so any side-effects it might have won't occur.
>
> That is irrelevant, because the program terminated.
> If, however, execution of the invariant code does continue after
> the interrupt, the CALL *is* executed. That said, see below.

ON <whatever> GOTO somewhere;

>
>> All of this may or may not be important for the program - usually it
>> isn't - but it's something to be aware of.
>
> Introducing a CALL in a loop that is subject to optimisation
> considerably affects optimisation -- even more so than merely
> having labels within a loop. It doesn't matter what language
> is being used -- it even affects FORTRAN, by the way.

Actually, I guess that's one of the uses of the IRREDUCIBLE attribute -
is this still used?

--
Pete