http://www.theregister.co.uk/2009/12/17/eds_mainframe/
Two z10s crashed in the UK due to lack of microcode maintenance. The first one crashed, which caused a DR rollover to the second one, which then also crashed. I don't know how an application can cause a microcode problem; this is likely a misstatement due to lack of knowledge.
2009/12/17 McKown, John <John....@healthmarkets.com>
--
Peter Nuttall
J&LR AMS Group 2
TATA Technologies House | Prospect Way | London Luton Airport | Luton |
Bedfordshire | LU2 9QH
Office +44 121 7008284
email: pnut...@jaguarlandrover.com | Peter....@tatatechnologies.com
website: http://www.tatatechnologies.com
HP managers are reaping the harvest of their deep cost-cutting at EDS,
in the form of a massive mainframe failure that crippled some very large
clients, including the taxpayer-owned bank RBS.
An IBM Z10 at EDS's Stockley Park site, west of London, fell over this
week after vital microcode fixes had not been applied, because all the
qualified staff had been fired.
Previously the updates would have been applied by the Stockley park
hardware team, who have all been made redundant.
When EDS' disaster recovery plan kicked in, switching processes to
another Z10 at Mitcheldean in Gloucestershire, a similar lack of
maintenance scuppered the stand-in machine.
<snip>
They perform their own microcode updates?
I would have thought there were IBM CE's for that as part of 'normal'
maintenance charges.
Or maybe they just didn't give IBM the machine time?
What sort of microcode fix (if not applied) causes an otherwise working
machine to crash?
The way this was written kind of negates the 'Mainframes never crash'
(from a hardware perspective) idea.
-----Original Message-----
McKown, John
I've been expecting something like this for years, having seen first hand all the cuts EDS made to the mainframe support organizations _before_ being acquired by HP. I'm actually surprised it took this long. I guess the difference before was that we actually still had people working like crazy to keep things running, and now there simply aren't any in some locations.
Mark Post
>http://www.theregister.co.uk/2009/12/17/eds_mainframe/
That's a poorly written article. I seriously doubt the "Stockley park hardware team" applies microcode updates. More likely they simply failed to schedule IBM to perform maintenance. The mention of Connect Direct bears no relevance to the hardware problem.
Bob Shannon
Rocket Software
These are low-level programs, still shipped in source (at least partially),
that run authorized and perform device control and other non-specialized
processes.
The article seems to have a bunch of information assembled in a random
fashion. Either the writer does not understand check processing or the
person who provided the information was not clear on the details.
It is not clear how microcode fits into this unless they are talking about
the 3890 check sorters (unit record devices).
Just not enough information to make any sense of what actually occurred.
Sam
What sort of microcode fix (if not applied) causes an otherwise working
machine to crash?
The way this was written kind of negates the 'Mainframes never crash'
(from a hardware perspective) idea.
>>
It's an evolving species! Stuff mutates. Mixed-vendor environments
are especially challenging. You have to get past the finger pointing and
designated blame game to pin it down. The z10, like its predecessors,
downloads ECs and fixes as they are discovered and tested. It is
designed to do concurrent maintenance on all but the most critical
aspects. On the software side, SMP/E provides HOLDDATA for EC- or
hardware-related actions. Usually these are level sets for future
enhancements or features.
It's up to management to schedule service time and provide a testing
environment for changes to include staffing and training.
Rex
<snip>
They perform their own microcode updates?
I would have thought there were IBM CE's for that as part of 'normal'
maintenance charges.
Or maybe they just didn't give IBM the machine time?
What sort of microcode fix (if not applied) causes an otherwise working
machine to crash?
The way this was written kind of negates the 'Mainframes never crash'
(from a hardware perspective) idea.
<SNIP>
Perhaps there is a TCP/IP microcode patch that needs to be put on their system? The type that if you don't put it on, you wind up with corrupted data or sockets that hang/block?
--Sent from my Dick Tracy Two-Way TV Wrist-Watch --
Bill Janulin
Mgr Tech Support & Product Dev.
ASPG, Inc.
I read that modern processors can be configured to automatically download and apply patches on the fly. Some prefer that the patches be staged until someone pulls the trigger in a suitable window. Our CE does that for us, but the process is pretty straightforward and easy enough for mere mortals. A large shop might want its own team going around pulling triggers, and it follows that eliminating that team would leave triggers unpulled.
Wonder if we'll ever know what really happened?
<snip>
> On 12/17/2009 at 9:25 AM, "McKown, John" <John....@HEALTHMARKETS.COM> wrote:
>> From The Register (Vulture Central).
>>
>> http://www.theregister.co.uk/2009/12/17/eds_mainframe/
>>
>> Two z10s crashed in the UK due to lack of microcode maintenance. The first
>> one crashed. This caused a DR roll over to the second one, which then also
>> crashed.
>
> I've been expecting something like this for years, having seen first hand
> all the cuts EDS made to the mainframe support organizations _before_
> being acquired by HP. I'm actually surprised it took this long. I guess
> the difference before was that we actually still had people working like
> crazy to keep things running, and now there simply aren't any in some
> locations.
>
>
After Ross Perot left, EDS went straight downhill. In 1990, EDS owned
outsourcing. By 2000 IGS had eaten EDS for breakfast and spit it out. Sad,
really, to watch a once-great company that invented the business become an
also-ran.
Regards,
Tom Conley
... and in support of their ignorance they will surely refer to
some glamorous consultant reports like those that recently
recommended "... now being the time to get ..." (to somewhere
else :-)
<rant>
With an ass in the right place, shit is bound to come out. (With
apologies to the animals)
</rant>
--
Peter Hunkeler
Credit Suisse
Since when is it by Sterling?
When I worked at a Canadian bank, it was an IBM product using 3890 cheque processors.
All the presentations and support were by IBM.
Unless what I loosely call my mind has failed (again).
-
Too busy driving to stop for gas!
- One mainframe crashes because a critical MCL (HIPER?) was not
  applied/activated
- Naturally, all LPARs on this mainframe are then unavailable
- Connect Direct doesn't run because it ran in one of the unavailable
  LPARs
- If the cheque clearing system runs in an LPAR, then it's unavailable
  too
- If the cheque clearing system runs elsewhere, using a Connect Direct
  connection to one of the LPARs, it doesn't receive input and can't send
  output, so it no longer functions
- Everything switches to the DR site
- This mainframe also crashes because the critical MCL (HIPER?) was
  not applied/activated
- Naturally, all LPARs on this mainframe are then unavailable
- Connect Direct doesn't run because it ran in one of the unavailable
  LPARs
- If the cheque clearing system runs in an LPAR, then it's unavailable
  too
- If the cheque clearing system runs elsewhere, using a Connect Direct
  connection to one of the LPARs, it doesn't receive input and can't send
  output, so it no longer functions
So the crash was not caused by the software, but the unavailability of
the check clearing system was a result of the crash.
Or am I missing something?
----------------------------------------------------------------------------
This reminds me of SEGMENTATIONOFFLOAD, which crashed our OSAs with a
domino effect.
All LPARs that used those OSAs were unreachable through the network.
P.S. We also have a hardware team that activates the disruptive MCLs
(OSA MCLs, among others).
P.P.S. A nice horror scenario to show all those people who try to reject
MCL changes because an update went wrong somewhere in prehistoric times.
--
Maarten
-----Original Message-----
From: IBM Mainframe Discussion List [mailto:IBM-...@BAMA.UA.EDU] On Behalf Of
McKown, John
Sent: Thursday, 17 December 2009 15:26
To: IBM-...@BAMA.UA.EDU
Subject: EDS mainframe goes <elided>, crashes RBS cheque system
http://www.theregister.co.uk/2009/12/17/eds_mainframe/