SIIS "issue" after upgrade to z13 machine.

Philippe Cloarec

Nov 11, 2016, 11:52:14 PM
to ASSEMBL...@listserv.uga.edu
Hi,
As you may know, in some scenarios there is a performance issue caused by SIIS (Store Into Instruction Stream) after an upgrade to a z13 machine. A CPU increase of 30% can be seen in some cases, so it may be good to make the related changes to keep the issue from recurring. I started working on this some time ago. We are talking about code written years ago, hundreds of programs that there is no time to rewrite as RENT, so I am interested in talking with anyone who has some experience with this. More generally, since I am dealing with old code, I would like to change it beyond the SIIS case to improve performance at execution time, I mean by using newer instructions and optimizing the code to save some CPU cycles. I found some interesting documents but want to discuss any scenario we can think of. TTYL then. Philippe (philippe...@gmail.com)

Rob van der Heij

Nov 12, 2016, 2:52:17 AM
to ASSEMBL...@listserv.uga.edu
The example that I have seen was where the customer had a linkage model for
small subroutines that used a static save area and local storage after a
branch at the start of the program. That's painful for small routines
because a lot of the code is in the same cache line and gets hit each time
you store something. If you can't change it all, it would be interesting
to see how much can be undone by some slack space between the save area
and the code to ensure it's in a different cache line. And you would only
have to do it with the 10% of the code that makes up 90% of the overhead.

Rob

Martin Truebner

Nov 12, 2016, 2:58:30 AM
to ASSEMBL...@listserv.uga.edu
Philippe,

>> ... no time to rewrite them as RENT.

Well - if this is the case then I would not invest in using newer
instructions (with the exception of the fairly old "relative &
immediate" instructions).

You need to understand the code to replace instruction sequences.

Making code RENT is far less work.

Have you ever looked at ABO? It only covers compiled languages and
no assembler.

I would invest in converting the code to run baseless. There
is a return on investment (no more hassle with registers covering the
code) and of course SIIS cases are identified and must be fixed.

Once this step is complete (code without base), the next minor step on
the way to RENT is to prepare the initialization of the data areas.

This is in MNSHO the better thing to do, as opposed to looking for use
cases for SIMD or places where you could use FLOGR.

Martin Trübner; everything around "PoOps of z/arch"


Martin Truebner

Nov 12, 2016, 3:01:27 AM
to ASSEMBL...@listserv.uga.edu
Rob,

excellent example, one that would be obvious when converting to
"baseless" (and would be eliminated, to be performance-perfect, even
without going to RENT)

Martin

Philippe Cloarec

Nov 12, 2016, 4:10:05 AM
to ASSEMBL...@listserv.uga.edu
Hi Martin and Rob,

First, thx much for your input. Here are 3 links which describe the "issue" and the common code patterns that fall into a SIIS scenario:
https://www.google.fr/#q=istream_flash_062606_v4
http://s3-us-west-1.amazonaws.com/watsonwalker/ww/wp-content/uploads/2016/03/06173415/18017-The-Cheryl-and-Frank-zRoadshow.pdf
http://conferences.gse.org.uk/attachments/presentations/ibHo4j_1446285934.pdf

Unfortunately I have more than 1000 pgms to review and handle in a short timeframe, so I will have no time to implement "baseless processing". I plan to SAK (Search And Kill) the SIIS occurrences I find for the scenarios described in the links above, and to implement newer instructions such as the immediate ones to eliminate memory references and constants in storage as well.

I will check cases where two instructions can be replaced by one, for example using CIJE in place of an LTR/BZ combination.

Since we are talking about CPU cycle savings here, I will check for AGI cases and their resolution and try to implement instruction grouping as much as I can.

From my humble point of view this is a real topic, and all z13 sites with old production batch programs should take some action.

regards
Philippe

Pieter Wiid

Nov 12, 2016, 4:18:18 AM
to ASSEMBL...@listserv.uga.edu
A good start for baseless is to include the following 2 statements at the top:
         IEABRCX DEFINE
         IEABRCX ENABLE

This would change most branch instructions to relative branches.

Also, use SYSSTATE ARCHLEVEL=n -- check your manual for the correct value.

Pieter
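
A minimal sketch of where those statements could sit, assuming SYS1.MACLIB is in SYSLIB and a z/Architecture target. ARCHLEVEL=2 is only an assumed value (check the manual, as said above), and DEMO, LOOP and DONE are hypothetical labels:

```hlasm
DEMO     CSECT
         SYSSTATE ARCHLEVEL=2      assumed value, check your manual
         IEABRCX DEFINE            set up the branch overrides
         IEABRCX ENABLE            based branches now go relative
LOOP     LTR   R3,R3               anything left to do?
         BZ    DONE                assembled as a relative branch
         BCTR  R3,0                count down
         B     LOOP                relative too, no code base needed
DONE     BR    R14                 register branch, not a candidate
R3       EQU   3
R14      EQU   14
         END   DEMO
```

Note that there is no USING for the code here: once the branches are relative, a reference that still needs a code base should show up as an addressability error at assembly time.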

Martin Truebner

Nov 12, 2016, 4:47:03 AM
to ASSEMBL...@listserv.uga.edu
Pieter,

>> This would change most branch instructions to relative branches.

I say not "most" but "all".

Here is a construct that IEABRC (or its X brother) does not handle
well....

         B     *+4*NUMACONS
         DC    AL4(SOMEWHERE)
         DC    AL4(OVER_THE_RAINBOW)
         .
         .

but these are rare and easy to fix.

Philippe,

don't go for the high-hanging fruit (exploring new instructions) - go for
the approach that is easy and guaranteed to catch all SIIS.

The main advantage of using baseless for code is that you will identify
all cases of sins (SIIS) like

         STC   R,*+5
         MVC   TARGET(*-*),SOURCE
or
         CNOP  4,8
         BAS   R13,*+76            provide new SAVE-AREA
         DC    18F'0'

or
         NOP   NOTFIRST
         OI    *-3,C'0'            close first-time_gate

--- it will also catch weird cases like this:

         ICM   R,B'1000',*         to make register negative

(found in IBM's code!!!)

But these are easy to fix, fast to identify and require only minimal
testing - as opposed to implementing new instructions here and there.

Martin
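
For the STC/MVC and NOP/OI patterns above there are well-known replacements that do not store into the code. A minimal sketch, assuming R2 holds a length from 1 to 80 and R12 has been set up as base by the caller; SIISFIX, MOVEIT, FLAGS and FIRSTDONE are hypothetical names, not taken from any of the programs discussed:

```hlasm
SIISFIX  CSECT
         USING SIISFIX,R12         base assumed loaded by the caller
* Instead of STC R2,*+5 ahead of the MVC: let EX supply the length
         BCTR  R2,0                R2 = length - 1 (machine length)
         EX    R2,MOVEIT           MVC runs with the length from R2
* Instead of NOP / OI *-3: keep the first-time switch in a data byte
         TM    FLAGS,FIRSTDONE     been here before?
         BO    NOTFIRST            yes, skip the one-time setup
         OI    FLAGS,FIRSTDONE     close the first-time gate
NOTFIRST DS    0H
         BR    R14
MOVEIT   MVC   TARGET(0),SOURCE    EX target only, never stored into
* Data below must not share a cache line with the code above
FLAGS    DC    X'00'
FIRSTDONE EQU  X'80'
TARGET   DS    CL80
SOURCE   DS    CL80
R2       EQU   2
R12      EQU   12
R14      EQU   14
         END   SIISFIX
```

In a non-RENT program the stores into FLAGS and TARGET still go into the same CSECT, so this only helps if the data is kept out of the cache lines that hold instructions.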

Rob van der Heij

Nov 12, 2016, 5:48:56 AM
to ASSEMBL...@listserv.uga.edu
On 12 November 2016 at 10:10, Philippe Cloarec <philippe...@gmail.com>
wrote:

>
> Since we are talking about CPU cycle savings here, I will check for AGI cases
> and their resolution and try to implement instruction grouping as much as I can.
>
> From my humble point of view this is a real topic, and all z13 sites with old
> production batch programs should take some action.
>

I would reduce expectations of the benefits of "typical" things like
loading registers early and avoiding AGI. Our CPU is not really that
typical. Since a lot of our instructions have been coded in the Language of
our Fathers, we can't simply recompile and take advantage of such tricks
(even if customers had the source code and wished to accept the business risk
of recompiles). Instead, our CPU does very well at figuring out those things
on its own with Out-of-Order Execution. Even better if you can take
advantage of SMT.

On z/OS you should be able to use hardware profiling to find the pieces
that are worth a closer look (since z/VM does not virtualize that support,
I had to write my own profiler in software). I have had numerous cases
where I expected low-hanging fruit and found that I could not outrun our
CPU. For some critical parts it does help to unroll a loop a little bit,
but something extreme with 16-fold and swapping registers actually made it
slower. Hopefully you find a spot that is worth spending some time to
optimize. If you're looking at touching all code to scrape off 10% it may
be wiser to look higher up in the stack.

I will be the first to admit that you can achieve impressive results by
carefully coding a critical part using the right instructions. In one case
we had an end-user transaction take 700 ms - when I was done it was down to
7 ms. This was somewhat unique in that it did modular multiplication for
cryptography. The code had been written with the assumption that operations
on words twice as long take 4 times more time. But on our CPU it takes just
log(2) times longer, so going from 16-bit multiply to 64-bit saves you a
lot.

Rob

Philippe Cloarec

Nov 12, 2016, 7:17:26 AM
to ASSEMBL...@listserv.uga.edu
Hi Rob and Martin,
Thx for your input. My primary purpose is to detect SIIS scenarios across the code I have to scan and to fix them. The code is not complicated; I mean you have basic I/O processing against files, and some character and numeric fields are processed.
There are some EX (Execute) instructions to review, and some save-area field settings to review as well. Yes, the macros used should be reviewed, including the IBM ones or the way they are used.
I will be careful when changing code at the instruction level to optimize it. Thx much for your recommendations. TTYL Philippe

Gibney, Dave

Nov 12, 2016, 11:53:41 AM
to ASSEMBL...@listserv.uga.edu
Just what are you running now? This was an issue when we moved from Multiprise to z800.

Philippe Cloarec

Nov 12, 2016, 12:00:08 PM
to ASSEMBL...@listserv.uga.edu
Apparently this is needed for some... for sure SIIS is NOT new, BUT the performance issue was clearly revealed after the upgrade
to a z13, so please do read the documents referenced in previous updates. Have a good evening.

John Dravnieks

Nov 13, 2016, 9:44:53 PM
to ASSEMBL...@listserv.uga.edu
Hello Philippe

A comment about replacing LTR/BZ combinations with CIJE - the compare
immediate instructions do NOT set the condition code so you need to make
sure that the BZ is the only instruction testing the condition code.
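
To illustrate that caveat, here is a minimal sketch with hypothetical labels ZERO and NEGATIVE. It is a fragment to drop into existing code, not a complete program, and it assumes a machine level that has the compare-and-branch instructions:

```hlasm
* Case 1: only the BZ looks at the condition code, safe to replace
         LTR   R3,R3               sets the CC
         BZ    ZERO
* becomes
         CIJE  R3,0,ZERO           compare with 0, jump, CC untouched
*
* Case 2: a later branch still depends on the CC from the LTR
         LTR   R3,R3
         BZ    ZERO
         BM    NEGATIVE            would see a stale CC after a CIJE
* so every test of that CC has to be converted, for example
         CIJE  R3,0,ZERO
         CIJL  R3,0,NEGATIVE       R3 < 0
```

Code further down that tests the condition code left by the LTR has to be checked in the same way before the LTR is removed.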

Also, earlier you made a comment that you need to review the Execute
instructions as well - my understanding is that Execute modifies only
the copy of the target instruction that it executes, not the instruction
in storage, so it does not attempt to update the instruction cache and
does not cause the subsequent delays. Of course the instruction that is
being Executed may itself be a cause of the problem.

If you do have non-rent programs with embedded save areas, then you should
see a benefit from making sure that the save area is in a different cache
line - the judicious use of ORG to place code and data onto specific
boundaries may help here. For example, ORG ,256 will move the
location counter to the next 256 byte boundary - you will need to use the
assembler options GOFF and SECTALGN for this to work, as well as binding
the module so that it is loaded onto the correct boundary.
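
A minimal sketch of such a layout, assuming a 256-byte cache line and a non-RENT program with an embedded save area. PGM and SAVEAREA are hypothetical names, the ORG is written with an explicit * as its first operand, and the assembler options are assumed to be given outside the source (for example GOFF and SECTALGN(256) in the assembly PARM):

```hlasm
*        Assemble with GOFF and SECTALGN(256), bind to match
PGM      CSECT
         STM   R14,R12,12(R13)     save caller's registers
         LARL  R15,SAVEAREA        address our save area, no base
         ST    R13,4(,R15)         back chain
         ST    R15,8(,R13)         forward chain
         LR    R13,R15             we now run on our own save area
*        ... program logic goes here ...
         L     R13,4(,R13)         back to caller's save area
         LM    R14,R12,12(R13)     restore caller's registers
         BR    R14
         ORG   *,256               pad to the next 256-byte boundary
SAVEAREA DS    18F                 stores land in their own cache line
R12      EQU   12
R13      EQU   13
R14      EQU   14
R15      EQU   15
         END   PGM
```

The padding costs some storage per program, which is why it makes sense to apply it only where profiling shows the stores actually hurt.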

Kind Regards

John

Internet: dr...@au.ibm.com
Phone: +61 8 926 18473 (xtn 18473; Tie-Line: 701 8473)

Philippe Cloarec

Nov 14, 2016, 12:05:40 AM
to ASSEMBL...@listserv.uga.edu
Hi John,
Thx much for your input. Yes, I was planning to use newer instructions, but seeing the current design of the applications I am reviewing, I realized this would require extra time to be sure all is OK, and I cannot afford that in the current context. Implementing baseless programs or HLASM for readability purposes would have been great, but here as well there is no time and there are no related resources, data or people to redo tests for the changed applications, and for sure there is no budget for this :) .
Yes, I was aware of the point about the needed alignment to a 256-byte boundary and of the options needed at link-edit time to ensure alignment.
Besides user code, I have to check all macros, including IBM ones, for SIIS occurrences. We can change user macro code, but I am not sure IBM plans to fix SIIS cases in delivered macros in a short timeframe, or even ever. There is also the case of user macros that include IBM ones within their code. I was given an arbitrary list of programs to review, and at this time I do NOT have the listings but only the source code. Basically the SIIS audit phase may be larger and more time consuming than expected, so I will inform the people involved in the project about this point. Have a great day, regards Philippe

Pieter Wiid

Nov 14, 2016, 1:14:43 AM
to ASSEMBL...@listserv.uga.edu
For IBM macros that do SIIS, use the MF=E/L or SF=E/L variants.
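
A minimal sketch of the list and execute forms, using WTO purely as an illustration. Which macros actually store into their inline parameter list is not stated here, and WTOLIST and the message text are hypothetical:

```hlasm
*        In the code: execute form, points at a list kept elsewhere
         WTO   MF=(E,WTOLIST)
*        In the data area, away from the code: list form, no code
WTOLIST  WTO   'PROGRAM STARTED',MF=L
```

In a non-RENT program the list still lives in the same module, so it only helps if the list is kept out of the cache lines that hold instructions.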


Philippe Cloarec

Nov 14, 2016, 1:23:26 AM
to ASSEMBL...@listserv.uga.edu
Hi Pieter,
Yes, I did figure out that using MF=E/L or SF=E/L will help to avoid the SIIS scenario for IBM macros.
Thx much for your input have a good day Philippe