NonStop Monitoring

wbreidbach

Jun 7, 2010, 5:10:59 AM

We are going to use the open-source tool Nagios to monitor our NonStop
systems. I have created a couple of programs monitoring things like
CPU, I/O, Pathway servers, files, lines and so on. The functionality is
growing rapidly. All the programs collect the "things to monitor"
themselves as far as possible, because I am a lazy guy and want to
avoid lots of configuration. The whole thing is database-based and
uses a bunch of SQL/MP tables; one of these tables contains only the
messages that are currently valid, like "File $A.B.C is 95% full". I
want to get these messages to the Nagios server, which is running on a
Linux box. Our Linux people do not want to use a database connection
like ODBC or JDBC, so we build a flat file that is picked up by the
Nagios system. That works, but I am not really happy with this
solution.
My question now: Has anybody had experience doing something like that,
or does anybody have an idea for a better solution?
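The flat-file approach stays simple if every run writes the complete set of active messages and replaces the previous file in one step. A minimal sketch in Python, for illustration only (the real programs run on the NonStop against SQL/MP); the record layout, the tab-separated line format and the function names are all hypothetical. Writing to a temporary name and then renaming is meant to keep the pick-up on the Linux side from ever seeing a half-written file.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass(frozen=True)
class ActiveMessage:
    # One row of a hypothetical "currently valid messages" table.
    host: str      # logical system name, e.g. "PROD"
    service: str   # monitored object, e.g. "FILE $A.B.C"
    severity: int  # 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN (Nagios codes)
    text: str      # human-readable message


def write_snapshot(path: str, messages) -> None:
    """Write ALL currently active messages to `path`, replacing the old file.

    Data goes to a temporary file in the same directory first and is then
    renamed over the target, so a reader never sees a partly written file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".snapshot-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            for m in messages:
                # Tabs separate the fields, so remove tabs/newlines from free text.
                text = m.text.replace("\t", " ").replace("\n", " ")
                f.write(f"{m.host}\t{m.service}\t{m.severity}\t{text}\n")
        os.replace(tmp, path)  # atomic rename on POSIX (same filesystem)
    except BaseException:
        os.unlink(tmp)
        raise
```

Because each file is a full snapshot, a message that has been solved simply no longer appears in the next file.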

mustlearntandem

Jun 7, 2010, 9:46:06 AM

On Jun 7, 5:10 am, wbreidbach <wolfgang.breidb...@bv-

I don't suppose you are already using a database replication product
like Shadowbase for some other application, are you? It could
replicate the SQL table.

wbreidbach

Jun 7, 2010, 10:40:02 AM

On 7 Jun., 15:46, mustlearntandem <mustlearntan...@netscape.net>
wrote:

The Nagios product is running on a Linux system with none of the common
databases installed. I am not sure what Nagios uses internally, but a
database connection is not possible. Just a bit more information: If
you send a message to Nagios, it shows that message and forgets it;
there is no feature for escalation and de-escalation. So if you want to
provide Nagios with a new message, you have to send all your active
messages. So you see, Nagios has some disadvantages, but it has one big
advantage: It is open source! And our main goal is to get all of our
many Linux, Unix and Windows systems into one monitoring system
without paying large amounts of money. The previously mentioned way with
the flat file works, but in my opinion it is not "state of the art".

mustlearntandem

Jun 7, 2010, 10:57:04 AM

On Jun 7, 10:40 am, wbreidbach <wolfgang.breidb...@bv-

By "active" messages do you mean live messages out of EMS? I guess
the best solution would be a simple TCP/IP server monitoring an EMS
distributor on the Tandem, with a Linux client passing the information
to the Nagios system. But even though it is simple, it would still
require development. Just daydreaming here... I wonder if there is a
telnet emulator that can capture and pipe its output. That way you
could start a TACL session from Linux, start an EMS distributor and
get all live messages piped to Nagios. I often use PuTTY as an
emulator, and it has a logging feature that can filter for printable
output, but I don't know if that output can be redirected; perhaps on
Linux it can. At least there would be no development!

wbreidbach

Jun 7, 2010, 11:06:56 AM

On 7 Jun., 16:57, mustlearntandem <mustlearntan...@netscape.net>

Probably there was a misunderstanding concerning the "active"
messages. Given the example "file $a.b.c is 95% full", this message is
active until someone has taken care of it and usage is again beneath,
let's say, 80%. So if I send another message to Nagios, I have to send
that "old" message, too. I am handling all the escalation and
de-escalation in the application, so the previously mentioned table (and
the flat file) contains all the messages someone has to take care of,
like "file full", "line down", "process missing", "audit trail not
dumped within time" and perhaps lots of others (hopefully none). In
addition we prepare performance data, like CPU usage, queue length and
so on.
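The "95% full until back below 80%" rule described above is a hysteresis: the message is raised at one threshold and cleared at a lower one. A minimal Python sketch of that rule; the two thresholds come from the example in the post, while the function names and the dictionary/set representation are hypothetical.

```python
RAISE_AT = 95.0     # percent full at which a "file full" message becomes active
CLEAR_BELOW = 80.0  # percent full below which the message is cleared again


def update_file_alert(active: bool, percent_full: float) -> bool:
    """Return the new active/inactive state of one file's message.

    Between CLEAR_BELOW and RAISE_AT the previous state is kept (hysteresis),
    so a file hovering around 95% does not flap between raised and cleared.
    """
    if percent_full >= RAISE_AT:
        return True
    if percent_full < CLEAR_BELOW:
        return False
    return active


def active_files(readings: dict, previously_active: set) -> set:
    """Given {file_name: percent_full} and the set of files whose message was
    active last cycle, return the set whose message is active now. This whole
    set is what has to be (re)sent to Nagios each time."""
    return {
        name
        for name, pct in readings.items()
        if update_file_alert(name in previously_active, pct)
    }
```

For example, a file at 85% stays active if it was already active, but does not become active if it was not.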

wbreidbach

Jun 7, 2010, 11:11:03 AM

On 7 Jun., 17:06, wbreidbach <wolfgang.breidb...@bv-

Just as additional information: I did a presentation on that last
year:

http://www.nonstopevent09.gtug.de/download/Presentation/Breidbach_NonStop_to_Nagios.pdf

Meanwhile a lot of functions have been added, but the presentation
gives an impression of what I am doing.

Keith Dick

Jun 7, 2010, 5:42:36 PM

Perhaps I should not comment, since I have zero knowledge of Nagios. Please keep that in mind as you read this, and realize that I am only guessing about things.

I find it very surprising that a product intended for system monitoring does not have what I would consider an essential capability: Keeping track of the state of the objects that it is monitoring. It seems very odd that you must save the history in your SQL table and resend it whenever you send a new message.

Are you certain that you are not somehow overlooking object state tracking capability in Nagios? I hesitate to ask, because I do not want to make it seem that I am criticizing you about this (especially since I know nothing about Nagios and you certainly have been studying it). It is just that I find it very surprising that Nagios does not have a way to do what I think you want it to do.

I took a very quick look at the Nagios documentation. When I look at the Overview in the Nagios documentation, I see that Nagios includes an optional web interface "for viewing current network status". So it appears that Nagios does keep track of the state of the objects it is monitoring. Also, I get the impression that both the notification escalation feature and the state stalking feature require that Nagios has a way to keep information about the state of the objects it is monitoring. Maybe I'm misinterpreting what I see in the documentation, but I wanted to raise the question, since if Nagios does have the ability to keep track of the objects' state, that ought to make your job easier.

Maybe I'm focusing on the wrong thing (see my first paragraph, above). Maybe you have figured out how Nagios remembers state, everything you are doing is necessary, and the point of your question is just to find out how to send the appropriate messages from the NonStop system to the Linux system on which Nagios is running. I looked at the Nagios documentation with this in mind, and it sort of seems to me that what you are doing is what they call a passive check, discussed at this page:

http://nagios.sourceforge.net/docs/3_0/passivechecks.html

Near the bottom of that page, it specifically talks about submitting passive check results from remote hosts. It says that the NSCA addon was intended for that purpose. NSCA consists of a part that runs on the Nagios host and a part that runs on the remote host. I assume that NSCA is open source, and so you ought to be able to adapt the part for the remote host to run on the NonStop system. I imagine NSCA uses TCP to communicate between the remote host and the Nagios host, which the NonStop system certainly can use, so I imagine you could get it working without very much effort. (That, of course, raises the question of how you communicate a failure of the TCP subsystem, but maybe they have a solution for that, too.)
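For reference, send_nsca reads passive check results from standard input, one result per line, with tab-separated fields for service checks: host name, service description, return code, plugin output (this is the format described in send_nsca's own usage text; check it against the NSCA version you install). A hedged Python sketch of just the formatting step; host and service names are hypothetical and would have to match objects defined in the Nagios configuration.

```python
from enum import IntEnum


class State(IntEnum):
    # Standard Nagios plugin / passive-check return codes.
    OK = 0
    WARNING = 1
    CRITICAL = 2
    UNKNOWN = 3


def nsca_service_line(host: str, service: str, state: State, output: str,
                      delim: str = "\t") -> str:
    """Format one passive service-check result as send_nsca input:
    host<delim>service<delim>return_code<delim>plugin_output<newline>."""
    fields = [host, service, str(int(state)), output]
    for f in fields:
        # A delimiter or newline inside a field would corrupt the record.
        if delim in f or "\n" in f:
            raise ValueError(f"field contains delimiter or newline: {f!r}")
    return delim.join(fields) + "\n"
```

The resulting lines would then be piped into something like `send_nsca -H <nagios-host> -c send_nsca.cfg`, where the host and configuration file are placeholders.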

If you are already using NSCA to send messages to the Nagios system, then I would think that you have already integrated the NonStop system into Nagios in the way the designers expected. If that is the case, is the problem that you just aren't satisfied with using the normal way NSCA communicates and are looking for a different way to send the messages? Assuming NSCA is open source, you ought to be able to adapt both the Nagios host part and the NSCA remote host part to use whatever communications channel you like, though I cannot immediately think of a good alternative to TCP for the communication.

If you have not already tried this, it might help to find an online forum specifically for Nagios users and post a question about this. The Nagios web site has a page that names some mailing lists that seem to be for this purpose:

http://wiki.nagios.org/index.php/Mail_Lists

It also has a page that lists a few online discussion forums:

http://wiki.nagios.org/index.php/Forums

I also went to groups.google.com, typed nagios into the "Search for a group" box (the one in the middle of the page, not the one at the top), and it shows several groups that seem to be for discussing Nagios. They don't seem to be newsgroups, as comp.sys.tandem is, but something else I'm not familiar with.

dimandja

Jun 7, 2010, 6:51:48 PM

My curiosity piqued, I went looking for information on Nagios. What I
found may not be too thrilling for Tandem users, but what do I know
after a few minutes of studying this thing.

The first problem is that Nagios currently seems to support only 3
platforms: Unix variants, Windows, and network appliances. No Tandem
or other mainframes.

But I looked at how Nagios "monitors" those entities to try to solve
the Tandem issue.

What I found is that the straightforward way of using Nagios to
monitor a platform is to use a plug-in on the Nagios host and
install an agent on the monitored machine. Also, a (DDL-like) map
defining the tracked objects should be made available to Nagios.
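Whatever runs on the monitored machine, a Nagios check ultimately reports an exit code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN) and one line of output, optionally followed by performance data after a `|` in the form `label=value[UOM];warn;crit;min;max`. A minimal Python sketch of such output for one Tandem CPU; the 80%/95% thresholds, the label and the function name are hypothetical, and how the busy figure is obtained on the NonStop is not shown.

```python
def cpu_check(cpu: int, busy_pct: float, warn: float = 80.0,
              crit: float = 95.0) -> tuple:
    """Return (exit_code, output_line) in Nagios plugin format for one CPU.

    Thresholds are hypothetical defaults; the perfdata after '|' follows the
    plugin convention 'label=value[UOM];warn;crit;min;max'.
    """
    if busy_pct >= crit:
        code, word = 2, "CRITICAL"
    elif busy_pct >= warn:
        code, word = 1, "WARNING"
    else:
        code, word = 0, "OK"
    perf = f"cpu{cpu}_busy={busy_pct:g}%;{warn:g};{crit:g};0;100"
    return code, f"CPU {word} - CPU {cpu} is {busy_pct:g}% busy | {perf}"
```

An active plug-in would print the line and exit with the code; a passive sender would put both into its check result instead.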

I think it is possible to cheat a bit here by piping Tandem data
through resources already available for, say, Windows.

For example, configure Nagios for Windows and install "check_nt", the
Windows plugin, on the Linux machine. Use, and modify to taste, the
already included Windows service definitions.

The only remaining challenge will be to code a 'Nagios agent' on the
Tandem. There is already an agent available for Windows: the NSClient++
addon. You may be able to obtain the source files here
(http://sourceforge.net/projects/nscplus/) and modify them to suit Tandem.

You're right, sending flat files to Nagios will not do what Nagios was
intended to do.

Harald

Jun 8, 2010, 1:11:54 AM

On 7 Jun., 11:10, wbreidbach <wolfgang.breidb...@bv-

The usual way we connect heterogeneous systems is TCP/IP socket
communication.
It's simple enough, all systems support it, and it's a synchronous way
of communication.
I understand that you have programmed the Nagios part on NonStop,
so it should be fairly easy to integrate the socket communication.
I don't know the Nagios server part.

If you need them, I have sample programs for socket client/server
communication, even for COBOL Pathway servers.
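Harald's sample programs are not shown in the thread; as a stand-in, here is a minimal sketch of a line-based TCP exchange in Python (3.8+ for `socket.create_server`). The protocol (one message per line, the client half-closes when done, the server answers `ACK <count>`) is invented for illustration; on the NonStop side the same pattern would be written against the system's socket interface in C or COBOL.

```python
import socket
import threading


def serve_once(listener: socket.socket, received: list) -> None:
    """Accept one connection, collect newline-terminated messages until the
    client half-closes, then reply with 'ACK <number of messages>'."""
    conn, _addr = listener.accept()
    with conn, conn.makefile("r", encoding="utf-8") as reader:
        count = 0
        for line in reader:  # ends when the client shuts down its sending side
            received.append(line.rstrip("\n"))
            count += 1
        conn.sendall(f"ACK {count}\n".encode("utf-8"))


def send_messages(host: str, port: int, messages: list) -> str:
    """Send each message as one line, half-close, and return the server's reply."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall("".join(m + "\n" for m in messages).encode("utf-8"))
        sock.shutdown(socket.SHUT_WR)  # signal end of data to the server
        with sock.makefile("r", encoding="utf-8") as reader:
            return reader.readline().rstrip("\n")


if __name__ == "__main__":
    # Local demo: server on an ephemeral port in a thread, client in main thread.
    listener = socket.create_server(("127.0.0.1", 0))
    port = listener.getsockname()[1]
    got = []
    t = threading.Thread(target=serve_once, args=(listener, got))
    t.start()
    print(send_messages("127.0.0.1", port, ["line $LX1 down"]))
    t.join()
    listener.close()
```

The explicit acknowledgement gives the sender a simple way to notice that a batch did not arrive, which a fire-and-forget file transfer does not.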

wbreidbach

Jun 8, 2010, 4:23:50 AM

Thank you for the interesting replies. In fact, Nagios was not created
for use with NonStop. But anyway, the decision has been made to use
Nagios for all the systems, including NonStop, and now I am fighting
with that problem. By the way, on the NonStop itself I did not have
big problems; there is a lot of information available to collect.
I did not create a Nagios-specific monitoring on the NonStop; the idea
was:
Check everything important
If something is wrong, try to repair it (like a Pathway server that is not running)
If you cannot repair it immediately, create a table entry
If it is repaired, mark the table entry as "solved"
My primary purpose was to monitor lots of things on the NonStop,
collect lots of information for documentation, and do as much as
possible automatically, like reloads of tables/files. As I mentioned, I
am a lazy guy and I do not like that everyday work. I prefer the more
interesting things, like this monitoring tool.
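The check / repair / record / solve cycle listed above can be sketched in a few lines. This is a hypothetical Python illustration, not the actual NonStop programs: each check is a pair of callables (a probe and a repair attempt), and the SQL table is replaced by an in-memory dictionary of message states.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Check:
    name: str                   # hypothetical object name, e.g. "SERVER-X"
    is_ok: Callable[[], bool]   # probe: True if the object is healthy
    repair: Callable[[], bool]  # repair attempt: True if it succeeded


@dataclass
class MessageTable:
    # Stand-in for the SQL message table: object name -> "open" or "solved".
    rows: Dict[str, str] = field(default_factory=dict)


def run_cycle(checks, table: MessageTable) -> None:
    for c in checks:
        # Repair is only attempted when the probe reports a problem.
        healthy = c.is_ok() or c.repair()
        if healthy:
            if table.rows.get(c.name) == "open":
                table.rows[c.name] = "solved"  # problem has gone away
        else:
            table.rows[c.name] = "open"        # could not repair immediately
```

Note that a problem repaired in the same cycle never creates a table entry, so only the messages that really need attention reach Nagios.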

Our very first version was to start processes from the Nagios server,
like "pathcom $ABC;status server *", which required lots of additional
configuration on the Nagios server and produced many process
creations. In addition, every change of the NonStop configuration had
to be made on the Nagios server, too.

At the moment the flat file mentioned is picked up by Nagios via SCP
and processed by a Linux script.
I thought about a TCP/IP socket connection myself; another
proposed way would be to use SNMP traps. I will have to talk with the
Linux people about that.
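One way such a Linux script could hand the file to Nagios is to turn each line into a `PROCESS_SERVICE_CHECK_RESULT` external command. A minimal Python sketch, assuming a hypothetical file format of one message per line with tab-separated host, service, return code (0 to 3) and text; the host and service names must already exist in the Nagios configuration.

```python
import time


def to_external_commands(lines, now=None) -> list:
    """Convert 'host<TAB>service<TAB>code<TAB>text' lines (hypothetical
    format) into Nagios PROCESS_SERVICE_CHECK_RESULT external commands."""
    ts = int(time.time() if now is None else now)
    commands = []
    for raw in lines:
        raw = raw.rstrip("\n")
        if not raw:
            continue  # ignore blank lines
        host, service, code, text = raw.split("\t", 3)
        if code not in ("0", "1", "2", "3"):
            raise ValueError(f"bad return code {code!r} in line {raw!r}")
        commands.append(
            f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{code};{text}\n"
        )
    return commands
```

The returned lines would then be written to the Nagios external command file (often named nagios.cmd; its location depends on the installation).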

dimandja

Jun 8, 2010, 8:28:31 AM

> Our very first version was to start processes from the Nagios server
> like "pathcom $ABC;status server *", what required lots of additional
> configuration on the nagios server and produced many process
> creations. In addition every change of the NonStop configuration had
> to be done on the Nagios server, too.

I dunno. This sounds like too much redundant coding and
synchronization.

At any rate, it will still require a lot of code on the Tandem side to
provide a live feed of Tandem events to Nagios; flat files can only
offer stale info.

Instead of hardcoding every PATHCOM server config in Nagios, you
should invest effort in Tandem-based Subsystem Programmatic
Interface processes. It is messy and unreliable to keep duplicating
configs.

Nagios should never need prior knowledge of system
configurations on Tandem: it defeats monitoring fundamentals.
Monitoring means you don't know what is going on, but you would like
to know when something happens.

If, for example, you tell Nagios in advance that the monitored host
only has 2 CPUs and two servers in Pathway, Nagios will never detect
the extra hardware and software someone may install on purpose or by
accident.

Nagios should depend solely on the Tandem agent to feed it information
on every resource available or lost on Tandem, not the other way
around.
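Letting the Tandem agent discover resources, rather than configuring them in Nagios, amounts to comparing successive discovery snapshots. A minimal Python sketch, assuming each discovery run yields a set of resource names; how they are collected (for example via SPI) is not shown, and the names below are made up.

```python
def diff_inventory(previous: set, current: set) -> tuple:
    """Compare two discovery snapshots and return (added, lost), each sorted,
    so resources that appear or vanish are reported instead of pre-configured."""
    added = sorted(current - previous)
    lost = sorted(previous - current)
    return added, lost


def inventory_messages(previous: set, current: set) -> list:
    """Turn the difference into human-readable messages (hypothetical wording)."""
    added, lost = diff_inventory(previous, current)
    return [f"NEW: {r}" for r in added] + [f"LOST: {r}" for r in lost]
```

With this, a CPU or Pathway server that nobody told Nagios about still shows up as "NEW" on the next discovery run.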

By the way, Nagios works well with TCP/IP.

wbreidbach

Jun 8, 2010, 8:46:13 AM

I agree, it does not make any sense to maintain the NonStop
configuration parameters on the Nagios server; I just described our
very first steps into Nagios monitoring. I do not want to have any
NonStop configuration parameters (except IP address and logical
name) on the Nagios server. My concept includes automatic
configuration wherever possible, like collecting all the lines, all
the Pathway monitors and servers, all RDF configurations, all TCP/IP
configuration and so on. I do not like manual configuration at all.

The NonStop programs use the SPI interfaces wherever possible
(RDF does not yet have an SPI interface). Unfortunately not all of the
SPI documentation has been published, but you can use SCF to find most
of that information; it is a bit tricky, but it works.

Next thing we will try is using SNMP.

TEP

Jun 8, 2010, 8:55:30 AM

Hello Herr Breidbach,

I am assuming that Nagios supports SNMP objects? I suspect the best
approach to this problem is to use the SNMP agent on the NSK to
translate EMS messages into traps to send to the Nagios network
management host. You can use OMF to create EMS messages for events like
disk full.

There is a MIB provided with the NSK SNMP software to describe objects
on the machine. The MIB can be extended, and can be exported between
machines. That way, the network management host "knows" about NSK objects.
We did something like this for the Citibank machines several years back.

Regards
T Peel

wbreidbach

Jun 8, 2010, 9:39:25 AM

Hello Mr. Peel,

that is exactly what I want to try next. I am just reading the manual.
The necessary EMS events have already been created.

Tom P

Jun 9, 2010, 5:37:27 AM

You have a PM.
Regards

wbreidbach

Jun 11, 2010, 8:52:18 AM

Now I have implemented the SNMP solution on the NonStop; this was not a
big problem. The only thing left to do is to handle the
events on the Nagios server.
Thanks everybody for a lot of valuable input.
I do not think this "Nagios agent for NonStop" is going to be
something like a product because we are not a vendor.
Anyway, if somebody is interested in some more details or has some
good ideas concerning things to monitor feel free to contact me.
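On the Nagios side, net-snmp's snmptrapd can pass each received trap to a `traphandle` program on standard input: the sender's host name on the first line, its transport address on the second, then one `OID value` pair per line. A hedged Python sketch of such a handler that turns a trap into a passive check result. The OID carrying the EMS text (`EXAMPLE-MIB::nsMessageText.0`), the service name and the fixed CRITICAL code are hypothetical; depending on snmptrapd's output options, values may carry type prefixes, which this sketch does not handle (it only strips surrounding quotes).

```python
TEXT_OID = "EXAMPLE-MIB::nsMessageText.0"  # hypothetical OID for the EMS text


def parse_traphandle_input(text: str) -> tuple:
    """Parse traphandle stdin: line 1 = hostname, line 2 = transport address,
    then one 'OID VALUE' varbind per line. Returns (host, address, varbinds)."""
    lines = text.splitlines()
    if len(lines) < 2:
        raise ValueError("expected at least hostname and address lines")
    hostname, address = lines[0].strip(), lines[1].strip()
    varbinds = {}
    for line in lines[2:]:
        if not line.strip():
            continue
        oid, _sep, value = line.partition(" ")
        varbinds[oid] = value.strip().strip('"')
    return hostname, address, varbinds


def trap_to_passive_result(text: str, now: float, text_oid: str = TEXT_OID,
                           service: str = "EMS", code: int = 2) -> str:
    """Build a Nagios PROCESS_SERVICE_CHECK_RESULT command from one trap.
    The trap's host name must match a host defined in Nagios."""
    host, _address, varbinds = parse_traphandle_input(text)
    message = varbinds.get(text_oid, "trap without message text")
    return (f"[{int(now)}] PROCESS_SERVICE_CHECK_RESULT;"
            f"{host};{service};{code};{message}")
```

In a real handler the input would come from `sys.stdin.read()` and the timestamp from `time.time()`, with the result written to the Nagios command file.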

wbreidbach

Apr 11, 2013, 7:24:36 AM

Just to give an update on this really old but probably interesting theme:
Meanwhile we have developed a complete set of monitoring tools. This toolbox is able to take care of most of the regular daily work. In addition, it is able to act as a client to Nagios and other monitoring servers. For those interested, have a look at the March Availability Digest:

http://www.availabilitydigest.com

Robert Hutchings

Apr 11, 2013, 9:14:01 AM

We used NetCool on a UNIX server for our event monitoring. This was 2000-2001 time frame. We had to use a special library to send messages from applications to NetCool.

It puzzled me why we were using a UNIX server to process NonStop events, but I guess the suits in the mahogany offices thought it was a great idea :)

Roberto Veldhoven

Apr 12, 2013, 5:34:20 AM

We are using Nagios here to present the status of many systems/applications (Nagvis is also being used, my colleague says). But because the interface is limited and poor (IMO), we have created a gSoap web server on the Linux box running Nagios, and a gSoap client on the NonStop host to send over the statuses. Essentially this leaves all the poor implementation issues (like the limited amount of data that can be transferred) on the Linux server.

So, in Nagios-terms, it is only passive checks being done.

Its use is growing here too, one recent project started is to send over the result of each backup-job. So it is fairly strategic (I do not expect approval to share any code).

wbreidbach

Apr 12, 2013, 8:57:33 AM

We have written our own monitoring, which acts proactively where possible. We are using a pretty simple TCP/IP interface to deliver all information to a Nagios server. This tool has taken over most of the boring daily work, like reloading files, restarting processes, checking backup listings, creating SQL statistics and a lot more. In addition, we collect most of the system configuration and store the information in SQL tables. Last but not least, we collect lots of statistical data.
The functionality is growing continuously. For us this is really strategic, and my management has decided that we would be willing to license this tool to others for a monthly fee. An Enscribe version is being tested at the moment; it was developed because I wanted to verify whether this is possible. We are not a software vendor, so we are not going to "sell" the software the way a vendor would.