Article on IBM's z196 Mainframe Architecture

David Kanter

Jan 16, 2012, 3:37:39 AM

As a group dedicated to mainframe assembly, hopefully the topic of
IBM's zArchitecture requires no introduction.

I recently wrote an article on IBM’s z196, which is the first
mainframe with an out-of-order CMOS microprocessor, and also the first
with an integrated L3 cache. These two innovations are largely
responsible for a 30-40% improvement in performance over the previous
generation z10.

http://www.realworldtech.com/page.cfm?ArticleID=RWT010312153140

Hopefully you all find this an interesting and enjoyable read.

Regards,

David

John W Kennedy

Jan 16, 2012, 12:35:57 PM

Early mainframes had bipolar chips, forsooth? Pardon me while I crawl
to my grave.

--
John W Kennedy
"You can, if you wish, class all science-fiction together; but it is
about as perceptive as classing the works of Ballantyne, Conrad and W.
W. Jacobs together as the 'sea-story' and then criticizing _that_."
-- C. S. Lewis. "An Experiment in Criticism"

swe...@earthlink.nospam.net

Jan 17, 2012, 12:52:45 PM

John W Kennedy <jwk...@attglobal.net> writes:

> On 2012-01-16 08:37:39 +0000, David Kanter said:
>
>> As a group dedicated to mainframe assembly, hopefully the topic of
>> IBM's zArchitecture requires no introduction.
>>
>> I recently wrote an article on IBM’s z196, which is the first
>> mainframe with an out-of-order CMOS microprocessor, and also the first
>> with an integrated L3 cache. These two innovations are largely
>> responsible for a 30-40% improvement in performance over the previous
>> generation z10.
>>
>> http://www.realworldtech.com/page.cfm?ArticleID=RWT010312153140
>>
>> Hopefully you all find this an interesting and enjoyable read.
>
> Early mainframes had bipolar chips, forsooth? Pardon me while I crawl
> to my grave.

No, the *real* ones were ECL...

--
Stephen H. Westin
Software Engineer
Central New York

Allodoxaphobia

Jan 17, 2012, 11:05:30 PM

On Tue, 17 Jan 2012 12:52:45 -0500, swe...@earthlink.nospam.net wrote:
> John W Kennedy <jwk...@attglobal.net> writes:
>> On 2012-01-16 08:37:39 +0000, David Kanter said:
>>
>>> As a group dedicated to mainframe assembly, hopefully the topic of
>>> IBM's zArchitecture requires no introduction.
>>>
>>> I recently wrote an article on IBM’s z196, which is the first
>>> mainframe with an out-of-order CMOS microprocessor, and also the first
>>> with an integrated L3 cache. These two innovations are largely
>>> responsible for a 30-40% improvement in performance over the previous
>>> generation z10.
>>>
>>> http://www.realworldtech.com/page.cfm?ArticleID=RWT010312153140
>>>
>>> Hopefully you all find this an interesting and enjoyable read.
>>
>> Early mainframes had bipolar chips, forsooth? Pardon me while I crawl
>> to my grave.
>
> No, the *real* ones were ECL...

No, again. The *REAL* ones were SLT.

http://www-03.ibm.com/ibm/history/exhibits/vintage/vintage_4506VV3081.html

Jonesy -- who was there.

swe...@earthlink.nospam.net

Jan 18, 2012, 9:34:59 AM

Allodoxaphobia <knock_you...@example.net> writes:

> On Tue, 17 Jan 2012 12:52:45 -0500, swe...@earthlink.nospam.net wrote:
>> John W Kennedy <jwk...@attglobal.net> writes:
>>> On 2012-01-16 08:37:39 +0000, David Kanter said:
>>>
>>> Early mainframes had bipolar chips, forsooth? Pardon me while I crawl
>>> to my grave.
>>
>> No, the *real* ones were ECL...
>
> No, again. The *REAL* ones were SLT.
>
> http://www-03.ibm.com/ibm/history/exhibits/vintage/vintage_4506VV3081.html
>
> Jonesy -- who was there.

Well, I suppose *early* mainframes had 10,000 tubes or so :)

John W Kennedy

Jan 18, 2012, 11:31:56 AM

On 2012-01-16 08:37:39 +0000, David Kanter said:

Since my little jest seems to have hijacked the thread, let me remark
that the article is an interesting one (to the extent that I could
follow it -- engineering is not my field). I had memories of reading
the 360/91 Functional Characteristics manual, yonks ago. I haven't
touched a mainframe for about 15 years, now, but I'd leap at it if IBM
were to produce a new successor to the XT/370; Intel just isn't the
same.

hanc...@bbs.cpcn.com

Feb 1, 2012, 1:59:32 PM

On Jan 16, 3:37 am, David Kanter <dkan...@gmail.com> wrote:
> I recently wrote an article on IBM’s z196, which is the first
> mainframe with an out-of-order CMOS microprocessor, and also the first
> with an integrated L3 cache. These two innovations are largely
> responsible for a 30-40% improvement in performance over the previous
> generation z10.
> http://www.realworldtech.com/page.cfm?ArticleID=RWT010312153140

Could someone explain in layman's terms...

5.2GHz -- is that 5.2 billion operations per second? (In comparison,
would anyone know the comparable speed of a System 360 model 65 (or
any other S/360 model)? The model 65 was the common 'big boy' of its
day though there were a few faster models.)

quad-core -- well, I assume they're not using magnetic cores for
memory. What does this mean?

3 issue -- ???

out-of-order execution design -- ???

Thanks!

John W Kennedy

Feb 1, 2012, 3:15:46 PM

On 2012-02-01 18:59:32 +0000, hanc...@bbs.cpcn.com said:

> On Jan 16, 3:37 am, David Kanter <dkan...@gmail.com> wrote:
>> I recently wrote an article on IBM’s z196, which is the first
>> mainframe with an out-of-order CMOS microprocessor, and also the first
>> with an integrated L3 cache. These two innovations are largely
>> responsible for a 30-40% improvement in performance over the previous
>> generation z10.
>> http://www.realworldtech.com/page.cfm?ArticleID=RWT010312153140
>
> Could someone explain in layman's terms...
>
> 5.2GHz -- is that 5.2 billion operations per second? (In comparison,
> would anyone know the comparable speed of a System 360 model 65 (or
> any other S/360 model)? The model 65 was the common 'big boy' of its
> day though there were a few faster models.)

It's the basic "clock ticks per second" in the processor. 360s were
more commonly rated in terms of the speed of main storage. The 65 was
0.75 microseconds per doubleword. But a more proper comparison would be
based on the internal cycle of the 65, which appears to have been 0.2
microseconds, which would give a rate of 5MHz, or 0.005GHz.

> quad-core -- well, I assume they're not using magnetic cores for
> memory. What does this mean?

In the world of Intel, etc., the word "processor" came to mean "a kind
of chip". So when they started putting more than one of what a
mainframe person would call a "processor" on a single chip, that would
produce "a processor with two processors", which sounded stupid, so the
old word "core" was co-opted to make it "a processor with two cores".
Even the latest iPhone has two cores, and my wife's new Apple laptop
has four.

> 3 issue -- ???
>
> out-of-order execution design -- ???

Wherever possible, if the system can see that it is possible, and there
are electronic resources available, an instruction can be executed
before the instruction before it has finished. This goes back at least
to the 360/91. It can also, wherever there is a conditional branch, try
to follow both ways, eventually discarding the results of the path that
wasn't followed.

In other words, each core of a z196 is about one thousand times as fast
as a 65. A single z196 unit has between 15 and 80 working cores, so a
top z196 can be compared to about 80,000 65s.

Of course, there are many differences between a 65 and a z196, so this
kind of comparison can only be very, very vague.
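
For anyone who wants to check the arithmetic, here is a minimal sketch in Python. It uses only the figures quoted in this post (0.2 microsecond cycle, 5.2GHz, 80 cores); the variable and function names are made up for illustration. It compares raw clock ticks only and says nothing about how much work each machine does per tick, so treat the result as the same very rough estimate.

```python
# Back-of-the-envelope clock comparison using only figures quoted in this thread.
CYCLE_TIME_65_S = 0.2e-6   # 360/65 internal cycle: 0.2 microseconds
CLOCK_Z196_HZ = 5.2e9      # z196 clock: 5.2 GHz
MAX_CORES_Z196 = 80        # top of the "15 to 80 working cores" range


def clock_rate_hz(cycle_time_s):
    """Clock rate is the reciprocal of the cycle time."""
    return 1.0 / cycle_time_s


clock_65_hz = clock_rate_hz(CYCLE_TIME_65_S)
per_core_ratio = CLOCK_Z196_HZ / clock_65_hz
whole_machine_ratio = per_core_ratio * MAX_CORES_Z196

print(f"360/65 clock:        {clock_65_hz / 1e6:.1f} MHz")
print(f"z196 core vs 65:     about {per_core_ratio:.0f}x")
print(f"80-core z196 vs 65:  about {whole_machine_ratio:.0f}x")
```

The per-core ratio comes out a little over 1,000, which is where the "about 80,000 65s" figure for a fully configured machine comes from.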

hanc...@bbs.cpcn.com

Feb 2, 2012, 10:43:03 AM

On Feb 1, 3:15 pm, John W Kennedy <jwke...@attglobal.net> wrote:

> It's the basic "clock ticks per second" in the processor. 360s were
> more commonly rated in terms of the speed of main storage. The 65 was
> 0.75 microseconds per doubleword. But a more proper comparison would be
> based on the internal cycle of the 65, which appears to have been 0.2
> microseconds, which would give a rate of 5MHz, or 0.005GHz.

O M G.

Thanks for the explanation. (I can't help but wonder how long our
S/360-40's daytime workload would take to process. Of course, using a
2415 tape drive didn't help much <g>)

It just amazes me that such super fast and complex machines are able
to keep all the multiple tasks of CPU and I/O separate without
clashing and running inefficiently.

I once had sole use of a mainframe (a S/370-135, IIRC) and
experimented with wallclock time running several jobs either singly or
in multiple. It turned out that due to I/O clashing, it was actually
slightly faster to run the jobs singly (back to back) rather than
together.

(I also learned that it was most efficient for COBOL programs to use
binary for subscripts but packed decimal in other numeric fields.
Binary was not good in plain numeric fields of a business application--
the convert instructions apparently took a lot of time, much more than
the pack).

> > out-of-order execution design -- ???
>
> Wherever possible, if the system can see that it is possible, and there
> are electronic resources available, an instruction can be executed
> before the instruction before it has finished. This goes back at least
> to the 360/91. It can also, wherever there is a conditional branch, try
> to follow both ways, eventually discarding the results of the path that
> wasn't followed. . . .

Does such parallel processing really save that much CPU time in
business applications, such as a batch accounting job and an on-line
CICS system? That is, a programming mix written by ordinary (sloppy)
programmers as opposed to specialists in parallel processing?

I could understand in a bank of MOVE statements or a complex math
formula (evaluate the terms in parallel) there would be a time
saving. But it seems to me so much processing is dependent on the
results before. Doing both paths and a discard seems wasteful.

I believe parallel processing was a big part of the design and
expected speed of IBM's Stretch computer. But in actual practice it
didn't work as well and the effective throughput wasn't as good as
expected, so IBM had to reduce the price and take some embarrassment.

Thanks again for the info.

John W Kennedy

Feb 2, 2012, 11:45:38 AM

On 2012-02-02 15:43:03 +0000, hanc...@bbs.cpcn.com said:
> On Feb 1, 3:15 pm, John W Kennedy <jwke...@attglobal.net> wrote:
>> It's the basic "clock ticks per second" in the processor. 360s were
>> more commonly rated in terms of the speed of main storage. The 65 was
>> 0.75 microseconds per doubleword. But a more proper comparison would be
>> based on the internal cycle of the 65, which appears to have been 0.2
>> microseconds, which would give a rate of 5MHz, or 0.005GHz.

> O M G.

> Thanks for the explanation. (I can't help but wonder how long our
> S/360-40's daytime workload would take to process. Of course, using a
> 2415 tape drive didn't help much <g>)

> It just amazes me that such super fast and complex machines are able
> to keep all the multiple tasks of CPU and I/O separate without
> clashing and running inefficiently.

> I once had sole use of a mainframe (a S/370-135, IIRC) and
> experimented with wallclock time running several jobs either singly or
> in multiple. It turned out that due to I/O clashing, it was actually
> slightly faster to run the jobs singly (back to back) rather than
> together.

The situation is different on modern systems, because:
Files are not allocated by tracks or cylinders,
Disk drives have buffers,
Disk I/O is performed through a single pool of shared buffers
in the operating system.

> (I also learned that it was most efficient for COBOL programs to use
> binary for subscripts but packed decimal in other numeric fields.
> Binary was not good in plain numeric fields of a business application--
> the convert instructions apparently took a lot of time, much more than
> the pack).

Yes indeed. But the new mainframes include decimal floating-point,
which I suspect may be even faster.

>>> out-of-order execution design -- ???

>> Wherever possible, if the system can see that it is possible, and there
>> are electronic resources available, an instruction can be executed
>> before the instruction before it has finished. This goes back at least
>> to the 360/91. It can also, wherever there is a conditional branch, try
>> to follow both ways, eventually discarding the results of the path that
>> wasn't followed. . . .

> Does such parallel processing really save that much CPU time in
> business applications, such as a batch accounting job and an on-line
> CICS system?

> That is, a programming mix written by ordinary (sloppy)
> programmers as opposed to specialists in parallel processing?

> I could understand in a bank of MOVE statements or a complex math
> formula (evaluate the terms in parallel) there would be a time
> saving. But it seems to me so much processing is dependent on the
> results before. Doing both paths and a discard seems wasteful.

Not if you have extra adder circuits, etc.

It's true that you don't gain so much when doing packed-decimal
arithmetic, but most of the work on a modern mainframe is in handling
I/O and dispatching transactions. The actual accounting arithmetic
doesn't usually amount to much. And now that virtually all code is
compiled, you can rely on the compiler to write code that will
parallelize nicely, just as we've relied for years on having the
compiler allocate registers sensibly.
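
To make the point about dependent versus independent instructions concrete, here is a toy model in Python. It is a sketch under loud assumptions, not a description of the z196 pipeline: the five-instruction program, the latencies, the `Instr` class and the `run` function are all invented for illustration, execution units are assumed unlimited, and only true data dependencies are modeled. The issue width of 3 is borrowed from the "3 issue" figure mentioned earlier in the thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    latency: int       # cycles from issue until the result is available
    deps: tuple = ()   # indices of earlier instructions whose results are needed


def run(program, width, out_of_order):
    """Return the cycle on which the last result becomes available."""
    done_at = {}                          # instruction index -> cycle result is ready
    pending = list(range(len(program)))   # not yet issued, oldest first
    cycle = 0
    while pending:
        issued = 0
        for i in list(pending):
            if issued == width:
                break
            ready = all(d in done_at and done_at[d] <= cycle
                        for d in program[i].deps)
            if ready:
                done_at[i] = cycle + program[i].latency
                pending.remove(i)
                issued += 1
            elif not out_of_order:
                break   # in-order: a stalled instruction blocks everything behind it
        cycle += 1
    return max(done_at.values())


# Two independent load/add chains whose results are combined at the end.
PROGRAM = [
    Instr("load A", 4),
    Instr("B = A + 1", 1, (0,)),
    Instr("load C", 4),
    Instr("D = C + 1", 1, (2,)),
    Instr("E = B + D", 1, (1, 3)),
]

in_order_cycles = run(PROGRAM, width=3, out_of_order=False)
out_of_order_cycles = run(PROGRAM, width=3, out_of_order=True)
print("in-order:    ", in_order_cycles, "cycles")
print("out-of-order:", out_of_order_cycles, "cycles")
```

In this model the in-order machine waits for the first load before it will even start the second one, while the out-of-order machine overlaps the two loads. The final add still has to wait for both chains, which is the "dependent on the results before" limit raised in the question.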

> I believe parallel processing was a big part of the design and
> expected speed of IBM's Stretch computer. But in actual practice it
> didn't work as well and the effective throughput wasn't as good as
> expected, so IBM had to reduce the price and take some embarrassment.

The Stretch attempted to be 100 times as fast as the state of the art
at the time it was announced. That was too much at one bite, and they
got only about 75 times. But we've learned a lot since then. Remember,
the z196 may be much faster than a 360/65, but so is an iPhone. Heck,
the /screen/ of an iPhone, all by itself, is a bigger, faster computer
than a 360/65, and a single USB 2.0 port can outperform all the
channels of a maxed-out 65 put together.

Nomen Nescio

Feb 2, 2012, 2:36:16 PM

> got only about 75 times. But we've learned a lot since then. Remember,
> the z196 may be much faster than a 360/65, but so is an iPhone. Heck,
> the /screen/ of an iPhone, all by itself, is a bigger, faster computer
> than a 360/65, and a single USB 2.0 port can outperform all the
> channels of a maxed-out 65 put together.

Maybe on paper but not in reality. I have seen numerous companies run *all*
their workload (online and nightly reporting) on a 370 but I haven't seen
anybody run a business on an iPhone or a USB 2.0 port ;-)

John W Kennedy

Feb 2, 2012, 5:47:27 PM

Yes, and I've seen companies run all their business on a 1401. And I've
worked for a company that, only a few years before, had run all their
business on a 402.

hanc...@bbs.cpcn.com

Feb 2, 2012, 9:59:56 PM

On Feb 2, 5:47 pm, John W Kennedy <jwke...@attglobal.net> wrote:

> Yes, and I've seen companies run all their business on a 1401. And I've
> worked for a company that, only a few years before, had run all their
> business on a 402.

A former employer, a hospital, managed to do most of its work on a
1401 (emulation).

In my own opinion, one difference between the old days and now is that
programmers used to spend a lot of their time "shoe horning" the
application to work in the physical space available of the computer
they were using. For instance, using all eight bits of a byte for
eight separate flags instead of using one byte for a flag. Also,
converting dates to packed to save space and speed processing.
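
For readers who never had to do this, the two tricks look roughly like the following Python sketch. The flag names and helper functions are hypothetical, made up for illustration. The packed format shown is the usual packed-decimal layout (two digits per byte, sign in the last half-byte, 0xC for plus and 0xD for minus), here applied to a six-digit YYMMDD date.

```python
# Hypothetical flag names, one bit each; up to eight fit in a single byte.
FLAG_ACTIVE = 0x01
FLAG_OVERDUE = 0x02
FLAG_TAX_EXEMPT = 0x04


def set_flag(byte, flag):
    return byte | flag


def clear_flag(byte, flag):
    return byte & ~flag & 0xFF   # mask keeps the result within one byte


def has_flag(byte, flag):
    return bool(byte & flag)


def pack_decimal(digits, positive=True):
    """Pack a digit string: two digits per byte, sign in the final nibble."""
    nibbles = [int(c) for c in digits] + [0xC if positive else 0xD]
    if len(nibbles) % 2:
        nibbles.insert(0, 0)     # pad on the left to fill whole bytes
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))


def unpack_decimal(packed):
    """Return the integer value held in a packed-decimal field."""
    nibbles = []
    for b in packed:
        nibbles += [b >> 4, b & 0x0F]
    sign = nibbles.pop()
    value = int("".join(str(n) for n in nibbles))
    return -value if sign == 0xD else value


status = set_flag(set_flag(0, FLAG_ACTIVE), FLAG_TAX_EXEMPT)
packed_date = pack_decimal("120202")   # YYMMDD for Feb 2, 2012

print(f"status byte: {status:#04x}")
print(f"packed date: {packed_date.hex()} ({len(packed_date)} bytes instead of 6)")
```

Saving two bytes per date and seven bytes per group of flags sounds trivial now, but multiplied over every record on a tape it was the difference between fitting and not fitting.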

On my last mainframe application development, the boss said not to
bother packing the date fields or other traditional shortcuts. As a
result our record lengths are long, but the throughput remains
extremely fast because today the machines are so damn powerful.
(Plus, it's a relatively small file).

And of course today we use 4GLs and other resource hog systems.

Years ago we programmers often had to come in during off hours to do
testing because the day shift was just too bogged down with
production. Not done today.

Nonetheless, it's still amazing how much work they could shove
through a 1401 with only 16k (and many 1401s were even smaller.)

hanc...@bbs.cpcn.com

Feb 2, 2012, 10:03:54 PM

On Feb 2, 5:47 pm, John W Kennedy <jwke...@attglobal.net> wrote:

> Yes, and I've seen companies run all their business on a 1401. And I've
> worked for a company that, only a few years before, had run all their
> business on a 402.

P.S.

One thing that has changed is the nature of work to be processed.
Years ago my phone bill was a small single slip of paper--service and
equipment on one side, long distance on the other. Today my phone
bill--even with unlimited long distance and no itemization--is
multiple pages on double sided paper. There are many line items of
only $0.03 for certain oddball charges and taxes.

Our electric bill was even smaller: a postcard. Name and address on
the right, KWH used and payment due on the left. Today the electric
bills have graphs of consumption.

Many of our old applications track far more information than they did
when they were first developed.

Nomen Nescio

Feb 3, 2012, 3:53:44 AM

John W Kennedy <jwk...@attglobal.net> wrote:

> On 2012-02-02 19:36:16 +0000, Nomen Nescio said:
>
> >>
> >> got only about 75 times. But we've learned a lot since then. Remember,
> >> the z196 may be much faster than a 360/65, but so is an iPhone. Heck,
> >> the /screen/ of an iPhone, all by itself, is a bigger, faster computer
> >> than a 360/65, and a single USB 2.0 port can outperform all the
> >> channels of a maxed-out 65 put together.
> >
> > Maybe on paper but not in reality. I have seen numerous companies run *all*
> > their workload (online and nightly reporting) on a 370 but I haven't seen
> > anybody run a business on an iPhone or a USB 2.0 port ;-)
>
> Yes, and I've seen companies run all their business on a 1401. And I've
> worked for a company that, only a few years before, had run all their
> business on a 402.

We're in furious agreement then? IBM gets the job done, the expensive toys
don't. The difference is today people can still run significant corporations
on one System Z box, but there is no other single design that can support a
business on one box.

John W Kennedy

Feb 3, 2012, 11:46:45 AM

That's not strictly true. A practical zEnterprise system requires a PC
at every seat to get actual work done.

Nomen Nescio

Feb 4, 2012, 4:15:05 PM

> That's not strictly true. A practical zEnterprise system requires a PC
> at every seat to get actual work done.

Not as far as I know. You could use 3270s, a PC is just cheaper but it isn't required.

John W Kennedy

Feb 4, 2012, 4:42:40 PM

Only if you steal them from museums.

--
John W Kennedy
"The blind rulers of Logres
Nourished the land on a fallacy of rational virtue."
-- Charles Williams. "Taliessin through Logres: Prelude"

hanc...@bbs.cpcn.com

Feb 4, 2012, 5:40:07 PM

On Feb 4, 4:42 pm, John W Kennedy <jwke...@attglobal.net> wrote:
> On 2012-02-04 21:15:05 +0000, Nomen Nescio said:
>
>
>
> >> That's not strictly true. A practical zEnterprise system requires a PC
> >> at every seat to get actual work done.
>
> > Not as far as I know. You could use 3270s, a PC is just cheaper but it
> > isn't required.
>
> Only if you steal them from museums.

I don't know if they still make 3270 equivalent terminals and
controllers and the like, but I'm sure a quantity order of stripped-
down PCs to act like a 3270 could be ordered at a good price--cheaper
than a regular PC. (What are they, "thin clients"?)

Waldek Hebisch

Feb 5, 2012, 12:35:33 PM

The "computer" above was clearly about CPU, to run business
you need terminals and mass storage. I am not sure about
iPhone, but there are cheap gizmos on the market with USB 2.0
port. Connect to that USB hub, ethernet interface and disks.
Put Linux on your box. Such box can easily handle about
100 simultaneous telnet connections -- get network gear
and 100 terminals with built-in telnet protocol. For
bigger enterprise write 3270 emulator in Javascript and
require Javascript enabled Web browser at client side.
Web server on your box should easily handle 50 hits per
second -- which means 50 screen updates per second.
Assuming think time of about 30 sec (IIUC typical for 360
sites) you are able to handle 1500 clients. For even
bigger scale go to proprietary terminal protocol...

So, technically you can run business on such box, but
economically it makes no sense -- external gear would
cost much more than central computer. Even in 100
connection case small productivity gain due to extra
features and possibly saving in wait time will justify
much bigger server.
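
The capacity estimate above is simple enough to write down as a short Python sketch. The function names are made up for illustration, and the model assumes each user sends one request per think-time interval and that the server's own response time is negligible next to 30 seconds.

```python
def supported_clients(requests_per_second, think_time_seconds):
    """Users a server can keep up with if each sends one request per think time."""
    return requests_per_second * think_time_seconds


def required_rate(clients, think_time_seconds):
    """Requests per second generated by a given number of users."""
    return clients / think_time_seconds


web_clients = supported_clients(50, 30)   # 50 hits per second, 30 sec think time
telnet_rate = required_rate(100, 30)      # load from the 100-terminal telnet case

print(f"web server case: {web_clients} clients")
print(f"100 telnet users: about {telnet_rate:.1f} requests per second")
```

The second figure shows why the 100-connection telnet case is easy: it amounts to only a few screen updates per second.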

--
Waldek Hebisch
heb...@math.uni.wroc.pl

John W Kennedy

Feb 5, 2012, 1:45:57 PM

On 2012-02-05 17:35:33 +0000, Waldek Hebisch said:

> Nomen Nescio <nob...@dizum.com> wrote:
>>> got only about 75 times. But we've learned a lot since then. Remember,
>>> the z196 may be much faster than a 360/65, but so is an iPhone. Heck,
>>> the /screen/ of an iPhone, all by itself, is a bigger, faster computer
>>> than a 360/65, and a single USB 2.0 port can outperform all the
>>> channels of a maxed-out 65 put together.
>>
>> Maybe on paper but not in reality. I have seen numerous companies run *all*
>> their workload (online and nightly reporting) on a 370 but I haven't seen
>> anybody run a business on an iPhone or a USB 2.0 port ;-)
>>
>
> The "computer" above was clearly about CPU, to run business
> you need terminals and mass storage. I am not sure about
> iPhone, but there are cheap gizmos on the market with USB 2.0
> port.

The iPhone uses the 30-pin Apple Dock Connector, which includes a
number of pins for other functions, such as analog audio and video
output. But among the pins it includes full USB 2.0 function.

> Connect to that USB hub, ethernet interface and disks.
> Put Linux on your box. Such box can easily handle about
> 100 simultaneous telnet connections -- get network gear
> and 100 terminals with built-in telnet protocol. For
> bigger enterprise write 3270 emulator in Javascript and
> require Javascript enabled Web browser at client side.
> Web server on your box should easily handle 50 hits per
> second -- which means 50 screen updates per second.
> Assuming think time of about 30 sec (IIUC typical for 360
> sites) you are able to handle 1500 clients. For even
> bigger scale go to proprietary terminal protocol...
>
> So, technically you can run business on such box, but
> economically it makes no sense -- external gear would
> cost much more than central computer. Even in 100
> connection case small productivity gain due to extra
> features and possibly saving in wait time will justify
> much bigger server.

Of course. It would be ridiculous to /use/ an iPhone as a central
server, but in terms of mere MIPS and FLOPS and I/O throughput, it
leaves even the top supercomputers of the 60s in the dust. Heck, my
HDTV outperforms any mainframe of the 60s.

John W Kennedy

Feb 7, 2012, 3:07:12 PM

3270-over-Telnet software still exists, I think, but actual hardware,
including coax-equipped PCs, is long gone, and if such special
PCs as you describe exist, IBM doesn't seem to sell them. The norm for
provisioning today is Windows and a web browser, and if there are still
3270 applications to be used, virtual 3270s are created to be
screen-scraped and relayed as HTML, just as, in systems with DEC
heritage, virtual VT-100s are used. It's not what IBM was planning 20
years ago, which was to put the application in the PC (running OS/2),
talking to the mainframe with a DB2-to-DB2 SNA link, but the Web
screwed up everyone's plans.