checking internal disks on a RX3600 running OpenVMS

Joukj

Aug 15, 2016, 8:19:47 AM

Hi All,

My RX3600, with a RAID controller for the internal disk drives (how do I
find out the type?), was recently moved from a room in my office to a
central server center, so I can no longer see whether all disks are
working properly just by looking at the LEDs. How can I check the disks
while the machine is running OpenVMS 8.4?

Regards
Jouk

RobertsonEricW

Aug 15, 2016, 9:49:25 AM

RobertsonEricW

Aug 15, 2016, 9:54:16 AM

On Monday, August 15, 2016 at 8:19:47 AM UTC-4, Joukj wrote:

It may depend on the type of RAID controller you are using, but you can try running MSA$UTIL. This is a command-line utility; the first command is always SET CONTROLLER, which selects the RAID controller that subsequent commands are directed to. Once the target RAID controller is set, SHOW CONTROLLER/FULL displays all of the important information from the controller's perspective, including the individual disks currently connected to the controller. The SHOW UNIT command displays the logical RAID containers currently configured within the RAID controller. Hope this helps.

Eric

Ian Miller

Aug 15, 2016, 9:55:15 AM

I parse the output of the following in DCL to check for problems:

$ MCR MSA$UTIL
SET CONTROLLER PKA
SHOW CONTROLLER
SHOW UNIT
EXIT
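A rough Python equivalent of that DCL parsing, offered only as a sketch: it assumes you capture the MSA$UTIL output to text yourself, it takes the controller name PKA from the example above, and its keyword list is a hypothetical placeholder, because the exact wording MSA$UTIL prints for a failed or rebuilding drive varies by controller and is not shown in this thread.

```python
from typing import Iterable, List

# Hypothetical keywords: an assumption, not MSA$UTIL's documented vocabulary.
# Compare against real output from your own controller and adjust.
DEFAULT_KEYWORDS = ("FAIL", "DEGRADED", "REBUILD", "MISSING")


def msa_util_input(controllers: Iterable[str]) -> str:
    """Build the MSA$UTIL command input shown above for one or more controllers."""
    lines: List[str] = []
    for ctrl in controllers:
        lines.append(f"SET CONTROLLER {ctrl}")  # select the controller first
        lines.append("SHOW CONTROLLER")
        lines.append("SHOW UNIT")
    lines.append("EXIT")
    return "\n".join(lines) + "\n"


def flag_lines(output: str, keywords: Iterable[str] = DEFAULT_KEYWORDS) -> List[str]:
    """Return the lines of captured output that mention any keyword (case-insensitive)."""
    keys = [k.upper() for k in keywords]
    return [line.rstrip() for line in output.splitlines()
            if any(k in line.upper() for k in keys)]
```

For example, msa_util_input(["PKA"]) reproduces the four command lines above. Tune DEFAULT_KEYWORDS against real output before trusting an empty result from flag_lines.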

Stephen Hoffman

Aug 15, 2016, 10:37:48 AM

Via MSA$UTIL and ANALYZE /ERROR /ELV, most commonly.

But details depend on the storage controllers installed, and the
particular failing hardware.

No single error tool supports all of the hardware that can be present,
unfortunately.

http://labs.hoffmanlabs.com/node/295

--
Pure Personal Opinion | HoffmanLabs LLC

erga...@gmail.com

Aug 15, 2016, 10:38:16 AM

On Monday, 15 August 2016 14:54:16 UTC+1, RobertsonEricW wrote:

> It may depend on the type of RAID controller you are using. But you can try running MSA$UTIL.

SAS$UTIL works very similarly for low-end controllers.

David Froble

Aug 15, 2016, 12:56:26 PM

Isn't that disgusting? That you have to remember to run the utility, and treat
the output in such a manner. Makes you wonder if it was written by some point
and click weenie.

Such utilities should be running all the time, making periodic checks of the
disks, and having some manner of activating the electrodes in the system
manager's chair should some disk start having problems.
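The always-running check described above could look something like this minimal Python sketch. check and alert are hypothetical callables you would supply (for instance, one wrapping the MSA$UTIL parsing above and one that sends mail or an OPCOM message); the point it illustrates is alerting only when a disk's status changes, so the system manager is not woken every few minutes for the same fault.

```python
import time
from typing import Callable, Dict, Optional


def watch(check: Callable[[], Dict[str, str]],
          alert: Callable[[str, str, str], None],
          interval_s: float = 300.0,
          rounds: Optional[int] = None,
          sleep: Callable[[float], None] = time.sleep) -> Dict[str, str]:
    """Poll check() every interval_s seconds; call alert() only on status changes.

    check() returns {disk_name: status}; a disk seen for the first time is
    compared against "OK", so anything unhealthy is reported on the first round.
    alert(disk, old, new) is a hypothetical notifier.
    rounds=None loops forever; a number is handy for testing.
    Returns the last observed state.
    """
    last: Dict[str, str] = {}
    n = 0
    while rounds is None or n < rounds:
        current = check()
        for disk, status in current.items():
            old = last.get(disk, "OK")
            if status != old:
                alert(disk, old, status)
        # A disk that vanished from the report entirely deserves an alert too.
        for disk in last.keys() - current.keys():
            alert(disk, last[disk], "MISSING")
        last = current
        n += 1
        if rounds is None or n < rounds:
            sleep(interval_s)
    return last
```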

Bob Koehler

Aug 15, 2016, 1:51:49 PM

I'd start with SHOW ERROR.

Then maybe I'd try doing a directory of [000000] on each disk.

Then maybe another SHOW ERROR.
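That before/after approach boils down to diffing two error-count snapshots. A minimal Python sketch, assuming you have already turned each SHOW ERROR listing into a {device: error count} dictionary (the parsing is omitted, since the listing layout is not given here). Note the limitation raised later in the thread: unchanged counts say nothing about RAID member drives that the controller is hiding from OpenVMS.

```python
from typing import Dict, List, Tuple


def new_errors(before: Dict[str, int], after: Dict[str, int]) -> List[Tuple[str, int]]:
    """Devices whose error count rose between two snapshots, with the increase.

    A device absent from `before` is treated as having had 0 errors.
    """
    rises = [(dev, count - before.get(dev, 0))
             for dev, count in after.items()
             if count > before.get(dev, 0)]
    return sorted(rises)
```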

IanD

Aug 15, 2016, 6:59:44 PM

On Tuesday, August 16, 2016 at 2:56:26 AM UTC+10, David Froble wrote:
> Ian Miller wrote:

<snip>

>
> Isn't that disgusting? That you have to remember to run the utility, and treat
> the output in such a manner. Makes you wonder if it was written by some point
> and click weenie.
>
> Such utilities should be running all the time, making periodic checks of the
> disks, and having some manner of activating the electrodes in the system
> manager's chair should some disk start having problems.

+1 and lol, 'point and lick weenie' :-)

System management in OpenVMS needs revamping, from error / event reporting up to automated repair (and we need a decent scripting language for this to happen).

We all keep saying this but where is the definitive list of what needs doing / fixing / nice to have?

Sending emails to VSI and having that disappear into a non-reflective void isn't the answer to my way of thinking.

A drop box for suggestions, or at least a Google Docs spreadsheet or something similar, would be nice: somewhere suggestions could be collated, viewed, commented on by everyone, and tracked.

Twenty years ago, when I worked at a place where we did application development, we had a PIN system. PIN stands for Product Improvement Note: a very simple concept that allowed people to PIN an improvement against a known aspect of the application.

Even something as low-tech as this would be a good idea, versus endlessly having people post on a forum such as this one, which is ill-equipped for tracking such suggestions. I'm sure I have posted ideas that have been brought up before, but because Google Groups isn't an issue tracker, I'm sure I'm destined to repeat my evil posting ways time and time again.

I'm really, really hoping that at the conference VSI addresses the whole community involvement / feedback mechanism, which is severely lacking IMO.

While it may be fine to electrocute the system manager (and there are some I would love to have given a lethal dose to over the years), a framework for automating a disk replacement or whatever would be good too.

It could be sold as an add-on to OpenVMS *groan*.

The forward-thinking model for OpenVMS should be that anything short of a major configuration change falls into a framework where it is handled automagically.
I'm over having to muck around with bringing in extra shadowset members so that disks can be swapped out, or having to wade through diagnostics logs to see that it was a simple soft read error. I'd like to see it automated the way EMC and the like do it: they have a pool of standby disks that get automatically swapped in, the old disk is marked for replacement, and the technician comes out and simply swaps the bad one for a good one.

The system manager needs to become the VMS architect in the future, not the firefighter of old (which was a lot of fun, I must admit, but if all you are doing is fighting fires then you're never getting the chance to stop the blazes occurring in the first place).

Joukj

Aug 16, 2016, 2:19:03 AM

Thanks, that worked.

Jouk

Joukj

Aug 16, 2016, 2:20:47 AM

That checks the file system as OpenVMS sees it. That way I cannot see
whether a disk in the RAID array has failed and needs to be replaced.

Jouk

RobertsonEricW

Aug 16, 2016, 9:18:20 AM

The SHOW UNIT command in MSA$UTIL will show failed disks even when the logical RAID container is still functional. You will then know which disk(s) need to be replaced.
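To make that concrete, here is a hedged Python sketch of the triage Eric describes: a unit that is still online but has a failed member is "degraded" and needs a drive swap, while an offline unit has failed outright. The UnitStatus fields and drive names are hypothetical; you would fill them in from SHOW UNIT output by hand or with your own parser.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class UnitStatus:
    """Hypothetical summary of one logical RAID unit."""
    name: str
    online: bool               # is the logical container still serving I/O?
    members: Dict[str, bool]   # physical drive id -> True if healthy


def triage(units: List[UnitStatus]) -> Dict[str, List[str]]:
    """Sort units into OK, DEGRADED (online, but a member failed) and FAILED (offline)."""
    result: Dict[str, List[str]] = {"OK": [], "DEGRADED": [], "FAILED": []}
    for u in units:
        bad = sorted(d for d, healthy in u.members.items() if not healthy)
        if not u.online:
            result["FAILED"].append(u.name)
        elif bad:
            # The container still works, but these drives should be replaced.
            result["DEGRADED"].append(f"{u.name}: replace {', '.join(bad)}")
        else:
            result["OK"].append(u.name)
    return result
```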

Eric

Stephen Hoffman

Aug 16, 2016, 10:35:00 AM

On 2016-08-15 22:59:38 +0000, IanD said:

> System management in OpenVMS needs revamping, from error / event
> reporting, up to automated repair (and we need a decent scripting
> language for this to happen)

OPCOM and operator communications and automated interfaces for same,
distributed logging and security mechanisms, logical volume management,
predictive repairs, the whole implementation is problematic.

That's ignoring that pretty much everything has to be managed from the
command line, and that the system manager has to "help" OpenVMS adapt
to even moderate changes in applications or loads (now imagine how
"app stacking" works in this context, adding or removing apps) through
the associated set of system parameter changes and AUTOGEN incantations
and quota settings and the rest, whether to add or remove apps, or to
contend with changes in system load or otherwise.

> We all keep saying this but where is the definitive list of what needs
> doing / fixing / nice to have?

On Clair Grant's whiteboard, right below the list of changes and
features (and the random weird stuff that paying customers can and do
ask for) that are expected to produce revenue.

> I'm really really hoping that at the conference VSI address the whole
> community involvement / feedback mechanism that is severely lacking IMO.

The buzz phrase for that is "social": "social media", "social
marketing", et al. It used to be called customer engagement. VSI is
doing a whole lot of what used to be called customer engagement;
they're just doing it at small events (boot camp is a small event, too)
and in individual contacts with end-users and ISVs.

> Anything less than major configuration changes should fall into a
> framework where-by they are handled automagically should be the forward
> think model for OpenVMS.
> I'm over having to muck around with bringing in
> extra shadowset members so that disks can be swapped out or having to wade
> through diagnostics logs to see that it was a simple soft read error.
> I'd like to see it automated like what EMC and the likes do, they have
> a pool of standby disks that get automatically swapped in and the old
> one is marked for replacement and the technician comes out and simply
> swaps out the bad for a good one.
> The system manager needs to become the VMS architect in the future, not
> the firefighter of old (which was a lot of fun I must admit but if all
> you are doing is fighting fires then you're never getting the chance to
> stop the blazes occurring in the first place).

No end-user customer wants to be the OpenVMS architect, nor to train
for and become an OpenVMS firefighter, or otherwise. Pragmatically,
end-users and ISVs don't want to fulfill those roles for even their own
bespoke applications and tools and procedures, though those tasks are
necessary when you're maintaining and developing your own code. The
less OpenVMS gets in the way, and the more OpenVMS helps, the more
likely folks will continue to use and new users will choose it.

Stephen Hoffman

Aug 16, 2016, 11:12:25 AM

On 2016-08-16 06:20:55 +0000, Joukj said:

> Bob Koehler wrote:
>>
>> I'd start with SHOW ERROR.
>>
>> Then maybe I'd try doing a directory of [000000] on each disk.
>>
>> Then maybe another SHOW ERROR.
>>
> That checks the filesystem as it is seen by OpenVMS. I cannot see in
> this way if a disk in the raid array failed and needs to be replaced.

Ayup. Among other devices, DQ IDE devices will log almost no errors
(and some of the few errors those devices do log are expected and
normal). Various soft memory errors don't show up in SHOW ERROR. More
than a few folks have encountered disk failures that never showed up in
SHOW ERROR because the controller masked the problems all the way up to
the point where the array fell offline. And disks and SSDs can be
approaching the threshold for available replacement sectors and still
look fine to SHOW ERROR, too.

SHOW ERROR is a palliative, at best.

IanD

Aug 21, 2016, 8:53:54 AM

On Wednesday, August 17, 2016 at 12:35:00 AM UTC+10, Stephen Hoffman wrote:

<snip>

>
> No end-user customer wants to be the OpenVMS architect, nor to train
> for and become an OpenVMS firefighter, or otherwise.

<snip>

The point in saying that the system manager must move up the food chain was that the day-to-day running of OpenVMS needs to be automated away.

Jan-Erik Soderholm

Aug 21, 2016, 9:12:08 AM

Today, many companies read "automated away" as "moved to India"... :-)

Kerry Main

Aug 21, 2016, 10:20:05 AM

Just to clarify, the four best-practice guiding principles of
ALL IT have not changed in decades.

- automation (workload, processes)
- standardization (does not mean one platform, but rather
standardization within each of the different platforms that exist.
Companies often make the mistake of moving to one platform
and spending millions to get there even though the
business case does not support it)
- virtualization (shadowing is just another form of
storage virtualization just as clustering is another form
of server virtualization)
- rationalization

What changes over time is industry / vendor / analyst hype
and technologies that support each of these guiding
principles.

What does not change is that each of these principles needs
to be applied to make IT more efficient and cost effective,
and to provide their companies with more competitive
products or services.

I expect these guiding principles will apply to the
next few decades in IT as well.

Regards,

Kerry Main
Kerry dot main at starkgaming dot com

David Froble

Aug 21, 2016, 11:12:17 AM

From my perspective, there is no "day to day" running of VMS, at least by user
attention. Most days it just runs and does the jobs assigned to it.

As for automation of some periodic tasks, maybe in some environments, but
definitely NOT in some other environments.

While I'm aware that there are some areas where VMS needs some improvement,
examples being TCPIP, security, the stupidity of the design of MSA$UTIL, and
things like the recent discussion on SHOW NET, I haven't seen any VMS systems
lately where daily "hands on" is required.

Are there issues? Yes. But "don't throw out the baby with the bath water"
seems an appropriate concept.

Stephen Hoffman

Aug 21, 2016, 2:06:22 PM

On 2016-08-21 15:12:15 +0000, David Froble said:

> From my perspective, there is no "day to day" running of VMS, at least
> by user attention. Most days it just runs and does the jobs assigned
> to it.
>
> As for automation of some periodic tasks, maybe in some environments,
> but definitely NOT in some other environments.
>
> While I'm aware that there are some areas where VMS needs some
> improvement, examples being TCPIP, security, the stupidity of the
> design of MSA$UTIL, and things like the recent discussion on SHOW NET,
> I haven't seen any VMS systems lately where daily "hands on" is
> required.
>
> Are there issues? Yes. But "don't throw out the baby with the bath
> water" seems an appropriate concept.

Making sure that backups start and finish and verify. Making sure
you're not heading for core files with ;32767 versions. At least, not
unintentionally. Making sure server logs are collected and processed.
Log rotation, for that matter, is a concept entirely missing on
OpenVMS and something experienced system managers write code to manage.
Looking for aberrant network traffic, odd DNS traffic. Distributed
monitoring for security attacks, and collecting data on breaches.
Checking patch revisions and firmware. Tracking serial numbers and
FRUs and the rest. Watching for application crashes, stuck critical
processes and application-level errors. Sure. Nothing here is magic.
All of which is simple on a single server, and starts to get more
interesting as servers are added. All of which can be locally coded,
and tools to watch the processing locally implemented. Some folks
monitor some of this, some all of it, and some don't bother and deal
with the problems as they arise. Needs and expectations vary.
Increased automation, however, can and must and will happen.
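Two of those chores, stopping log files before they run into the ;32767 version limit and rotating old versions away, are easy to sketch. The Python below is illustrative only: it assumes file specifications with an explicit ;version suffix, keep and warn_at are hypothetical policy knobs, and it only plans what to remove (the same versions PURGE /KEEP would delete) rather than deleting anything.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# OpenVMS file version numbers top out at 32767.
MAX_VERSION = 32767


def split_version(spec: str) -> Tuple[str, int]:
    """Split 'DKA0:[LOGS]SERVER.LOG;123' into ('DKA0:[LOGS]SERVER.LOG', 123).

    Assumes an explicit ';version' suffix is present.
    """
    name, _, ver = spec.rpartition(";")
    return name.upper(), int(ver)


def plan_rotation(files: Iterable[str], keep: int = 5,
                  warn_at: int = MAX_VERSION - 1000) -> Tuple[List[str], List[str]]:
    """Return (versions to purge, files nearing the version limit).

    Keeps the `keep` highest versions of each file, like PURGE /KEEP=keep,
    and warns about any file whose highest version is at or above warn_at.
    """
    by_name: Dict[str, List[int]] = defaultdict(list)
    for spec in files:
        name, ver = split_version(spec)
        by_name[name].append(ver)
    purge: List[str] = []
    near_limit: List[str] = []
    for name in sorted(by_name):
        versions = sorted(by_name[name], reverse=True)  # newest first
        purge.extend(f"{name};{v}" for v in versions[keep:])
        if versions[0] >= warn_at:
            near_limit.append(f"{name};{versions[0]}")
    return purge, near_limit
```

The version-limit warning matters because a file already at ;32767 cannot get a new version, so the application writing it will start failing; renaming or resetting such files is a site decision the sketch leaves open.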

David Froble

Aug 21, 2016, 3:54:16 PM

And I for one would be very happy with some help to ease those items listed, and
more. What helps is a notification system that can be used when there are
problems. We've got one in Codis. Yes, it's "home grown" and the programmer
must ensure it is implemented. Not going to claim 100% compliance, who can?
But we know when there are problems with BACKUP, apps, and such.

> Looking for aberrant network traffic, odd DNS traffic. Distributed monitoring
> for security attacks, and collecting data on breaches.

Now these, we do not have, and I'd be very happy if a good implementation were
available. Actually, some may be available from third parties. And there is
the problem. Some customers don't want to pay for what they get. They want it
provided as part of the OS. I can see their point. I can also see the issue of
supporting third parties, if you want extra goodies.

Kerry Main

Aug 21, 2016, 7:35:04 PM

I would love to see much more work done on the native
tools on OpenVMS (SNMPv3 is a critical and massive hole),
but very few customers depend solely on the different
platform-supplied support utilities.

Case in point: one could likely count on one's fingers the
number of worldwide customers who use the native Windows
backup utility... (OK, perhaps there are some really small
ones, but you get the point)

Same goes for things like anti-virus scanning products.

Enterprise management solutions are also a major political
battle in most medium to large shops, because the network
teams have their favorite tools, storage teams have theirs,
server teams theirs, app teams theirs, etc. One of the big
challenges in any medium to large shop is tools
consolidation, i.e. deciding which tools will be used for
each tier, how to integrate the events, alerts etc., what
escalation paths will be in place, how events will be
correlated, and, when critical events are missed, how to
ensure a continuous improvement cycle is in place so it
does not happen again.

While it might seem relatively simple, the whole area of
proactive (not reactive like today) enterprise management
in medium to large shops is a much bigger beast than most
people who have not lived in, or carried pagers for, 24x7
Operations Support shops really grasp. It is a whole
different world from managing a few (<50?) Dev or prod
systems.

johnwa...@yahoo.co.uk

Aug 22, 2016, 9:38:06 AM

Surely you're not suggesting that what people need is an
Enterprise Management Architecture and the interfaces,
protocols, tools, products, and procedures that go with
it?

That would be stretching the "20th century technology in a
21st century world" vision a little too far.

See e.g.
https://www.highbeam.com/doc/1G1-10990728.html

"Digital Equipment Corp., Maynard, Mass., has strengthened
its efforts in the systems software arena with the Polycenter
operations management architecture.

Polycenter, unveiled earlier this year (Software Magazine,
May), is a major part of DEC's overall systems software
blueprint, the Enterprise Management Architecture (EMA),
which was first outlined in 1988.

Since 1988, DEC has broadened EMA acceptance by making
available without charge the interfaces for database access,
data transfer and the user interface. Several third-party
vendors, including some traditional IBM systems software
developers, have agreed to utilize the interfaces in making
packages for DEC systems."

Largely it went nowhere, especially after it was Palmerised,
though some of the larger telcos thought the vision and the
products were the best thing since sliced bread (and then
they bought vendor-specific management products instead).

Some subset of the subject now seems to be called DevOps;
other parts are likely being re-invented from scratch,
frequently by people with no knowledge of the prior state of
the art (even if it's only to learn what mistakes not to
repeat).

Since redefining terms seems to be industry standard practice
these days perhaps DevOps could become DEVOps (Digital
Equipment VMS Ops) or maybe just miskeyed as DECops.

Kerry Main

Aug 22, 2016, 11:55:05 AM

[snip..]

You are correct that EMA was Digital's kick at tackling the
difficult and challenging world of how to create a
proactive management strategy that resolves issues before
they impact the business. Tivoli (IBM), OpenView (HP, now
more or less retired in favour of OneView) and CA are
examples of more recent offerings.

Fwiw, you could take the EMA architecture from Digital,
change a few slides, terms and technologies and you would
have a modern strategy for how to proactively manage
private clouds.

DevOps is mostly industry hype, similar to SOA, SDN and
other more recent terms. It has some core ideas that are
OK, but when you peel back the covers there is a lot of
vaporware, because at its core it requires huge changes in
the way multiple different Developer and Operations
groups work in a company. Getting a few developer / OPS
groups to agree on new workflows, strategies and tools
might be possible, but any more groups than that is next
to impossible.

The proper OPS strategy has not changed in 30+ years, but
the huge difficulty remains that IT shops think buying
cool products is the answer to all of this. It's not. At
the core, every site is different: you need to look at
each CI device, determine which alerts are critical or not,
and forward them to an Operations Bridge for screening,
filtering, escalating etc. Then, with a smart ticketing
strategy in place, you can define how tickets are created
and escalated and how group communications are handled.

This is really tough stuff politically because you need
the Network, Server, Storage, App, DB, facility (security
as well in some companies) groups all agreeing on a
strategy that ensures things like -
- when there is a network issue, a ticket is created and
assigned to the network group, but communications are sent
to other groups that a network issue may be impacting
their service and is being worked. The common approach
today is that a network issue happens, impacts everyone, and
everyone jumps on their favorite tools to determine whether
the incident is their problem or not.
- if a router fails but it is a paired router for HA,
then notify the Operations group screen via a yellow alert
that a router is down: the service is still up, but there
is something that needs to be addressed. If the same
router is not paired, then the same alert shows up on the
OPS screen as red.
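The two example policies above can be written down as a small routing rule. This Python sketch is only an illustration: the Event and Ticket types and the group names are hypothetical, the yellow/red mapping is modelled on the router example, and nothing here reflects any particular ticketing product.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Event:
    component: str          # hypothetical CI name, e.g. "router-a"
    owner: str              # group that owns the component, e.g. "Network"
    paired: bool            # True if a redundant partner still carries the service
    affected: List[str] = field(default_factory=list)  # groups whose services depend on it


@dataclass
class Ticket:
    assignee: str
    severity: str           # "yellow" or "red", as in the router example
    notify: List[str]


def route(event: Event) -> Ticket:
    """Assign the ticket to the owning group, pick a colour, and inform dependent groups."""
    # Paired (HA) component down: service still up, so yellow; otherwise red.
    severity = "yellow" if event.paired else "red"
    # Tell the other affected groups it is being worked, instead of having
    # everyone chase the same incident with their own tools.
    notify = sorted(g for g in set(event.affected) if g != event.owner)
    return Ticket(assignee=event.owner, severity=severity, notify=notify)
```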

This is all custom coding and custom workflows on each
site. There are also tools that overlap between server,
storage and network management - which ones will be used
for each potential failure scenario? Different App groups
will have different monitoring/custom tools as well.

Again, needs planning and consensus building.

To do this properly, the old saying goes, for every $1
spent on a mgmt. product, $3 should be spent on
the services (internal or external) to properly integrate
that tool.

And yes, similar to why OpenView, Tivoli etc. are tough to
implement, this was a big reason why EMA did not take off.
Technically EMA is a very sound strategy. HP's recent
OneView architecture is actually not bad (with the exception
that it focuses on x86-64 support only)
http://www8.hp.com/h20195/v2/GetDocument.aspx?docname=4AA5-3811ENW

A pretty good video on why you need an Operations Bridge
can be found here: (HP marketing video, but has good,
solid technical strategy)
https://www.youtube.com/watch?v=ZLwfNtBu1sg