I don't suppose you are already using a database replication product
like Shadowbase for some other application, are you? It could
replicate the SQL table.
The Nagios product is running on a Linux system with none of the common
databases installed. I am not sure what Nagios uses internally, but a
database connection is not possible. Just a bit more information: if
you send a message to Nagios, it shows that message and forgets it;
there is no escalation or de-escalation feature. So if you want to
provide Nagios with a new message, you have to send all your active
messages. So you see, Nagios has some disadvantages, but it has one big
advantage: it is open source! And our main goal is to get all of our
many Linux, Unix, and Windows systems into one monitoring system
without paying large amounts of money. The flat-file approach I
mentioned earlier works, but in my opinion it is not "state of the art".
By "active" messages, do you mean live messages from EMS? I guess
the best solution would be a simple TCP/IP server monitoring an EMS
distributor on the Tandem, with a Linux client passing the information
to the Nagios system. But even though it is simple, it would still
require development. Just daydreaming here... I wonder if there is a
telnet emulator that can capture and pipe its output. That way you
could start a TACL session from Linux, start an EMS distributor, and
get all live messages piped to Nagios. I often use PuTTY as an
emulator, and it has a logging feature that can filter for printable
output, but I don't know if that output can be redirected; perhaps it
can on Linux. At least there would be no development!
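Even with a captured session, the raw output would still carry terminal control sequences. A minimal Python sketch of a Linux-side filter, assuming the emulator's output arrives on stdin and uses VT100-style escape sequences; the script name and whatever consumes its output are hypothetical:

```python
import re
import sys

# Matches common VT100/ANSI control sequences such as "\x1b[2J" or "\x1b[1m".
# Assumption: the emulator emits VT100-style sequences; other formats would
# need extra patterns.
ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;?]*[A-Za-z]")


def printable_lines(stream):
    """Yield cleaned, non-empty text lines from a captured terminal stream."""
    for raw in stream:
        line = ANSI_ESCAPE.sub("", raw)
        # Drop any remaining control characters (CR, LF, bell, tab, ...).
        line = "".join(ch for ch in line if ch.isprintable())
        line = line.strip()
        if line:
            yield line


if __name__ == "__main__":
    # Hypothetical usage: <capturing terminal client> | python3 ems_filter.py | <forwarder>
    for text in printable_lines(sys.stdin):
        print(text, flush=True)
```

Something would still have to map each cleaned line to a Nagios host and service, which is where the "no development" hope runs out.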
Probably there was a misunderstanding concerning the "active"
messages. Take the example "file $a.b.c is 95% full": this message
stays active until someone has taken care of it and the file is back
below, let's say, 80%. So if I send another message to Nagios, I have
to send that "old" message, too. I handle all the escalation and
de-escalation in the application, so the previously mentioned table
(and the flat file) contain all the messages someone has to take care
of, like "file full", "line down", "process missing", "audit trail not
dumped within time", and perhaps lots of others (hopefully none). In
addition, we prepare performance data, like CPU usage, CPU queue
length, and so on.
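The "raise at 95%, clear below 80%" behaviour is a hysteresis rule. A minimal Python sketch, assuming fill-level readings arrive as percentages; the class name is hypothetical, and the `text | perfdata` suffix follows the usual Nagios plugin output convention (`label=value[UOM];[warn];[crit]`):

```python
class FillLevelAlarm:
    """Alarm with hysteresis: raise at `raise_at` percent, clear below `clear_below`.

    The 95 % / 80 % defaults come from the "file $a.b.c" example in the text.
    """

    def __init__(self, raise_at=95.0, clear_below=80.0):
        if clear_below >= raise_at:
            raise ValueError("clear_below must be lower than raise_at")
        self.raise_at = raise_at
        self.clear_below = clear_below
        self.active = False

    def update(self, percent_full):
        """Feed one reading; return True while the message is still active."""
        if not self.active and percent_full >= self.raise_at:
            self.active = True
        elif self.active and percent_full < self.clear_below:
            self.active = False
        return self.active

    def plugin_output(self, name, percent_full):
        """Status text plus performance data (no warn value, crit = raise_at)."""
        return (f"{name} is {percent_full:g}% full"
                f" | pct_full={percent_full:g}%;;{self.raise_at:g}")
```

Between 80% and 95% the alarm keeps whatever state it already has, which is exactly why the message must be resent while it is still active.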
Just as additional information: I gave a presentation on that last
year:
http://www.nonstopevent09.gtug.de/download/Presentation/Breidbach_NonStop_to_Nagios.pdf
Since then, many functions have been added, but the presentation
gives an impression of what I am doing.
Perhaps I should not comment, since I have zero knowledge of Nagios. Please keep that in mind as you read this, and realize that I am only guessing about things.
I find it very surprising that a product intended for system monitoring does not have what I would consider an essential capability: Keeping track of the state of the objects that it is monitoring. It seems very odd that you must save the history in your SQL table and resend it whenever you send a new message.
Are you certain that you are not somehow overlooking object state tracking capability in Nagios? I hesitate to ask, because I do not want to make it seem that I am criticizing you about this (especially since I know nothing about Nagios and you certainly have been studying it). It is just that I find it very surprising that Nagios does not have a way to do what I think you want it to do.
I took a very quick look at the Nagios documentation. When I look at the Overview in the Nagios documentation, I see that Nagios includes an optional web interface "for viewing current network status". So it appears that Nagios does keep track of the state of the objects it is monitoring. Also, I get the impression that both the notification escalation feature and the state stalking feature require that Nagios has a way to keep information about the state of the objects it is monitoring. Maybe I'm misinterpreting what I see in the documentation, but I wanted to raise the question, since if Nagios does have the ability to keep track of the objects' state, that ought to make your job easier.
Maybe I'm focusing on the wrong thing (see my first paragraph, above). Maybe you have figured out how Nagios remembers state, everything you are doing is necessary, and the point of your question is just to find out how to send the appropriate messages from the NonStop system to the Linux system on which Nagios is running. I looked at the Nagios documentation with this in mind, and it sort of seems to me that what you are doing is what they call a passive check, discussed at this page:
http://nagios.sourceforge.net/docs/3_0/passivechecks.html
Near the bottom of that page, it specifically talks about submitting passive check results from remote hosts. It says that the NSCA addon was intended for that purpose. NSCA consists of a part that runs on the Nagios host and a part that runs on the remote host. I assume that NSCA is open source, and so you ought to be able to adapt the part for the remote host to run on the NonStop system. I imagine NSCA uses TCP to communicate between the remote host and the Nagios host, which the NonStop system certainly can use, so I imagine you could get it working without very much effort. (That, of course, raises the question of how you communicate a failure of the TCP subsystem, but maybe they have a solution for that, too.)
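If NSCA were used, its remote part (`send_nsca`) reads one tab-separated line per service check result: host name, service description, return code, plugin output. A small Python sketch that formats such lines, assuming the host and service names are placeholders that would have to match objects defined in Nagios:

```python
# Nagios passive-check return codes.
RETURN_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def nsca_line(host, service, state, output):
    """Format one service check result in send_nsca's tab-delimited input format."""
    code = RETURN_CODES[state]
    for value in (host, service, output):
        # Tabs or newlines inside a field would corrupt the record.
        if "\t" in value or "\n" in value:
            raise ValueError(f"field contains tab or newline: {value!r}")
    return f"{host}\t{service}\t{code}\t{output}\n"
```

The resulting lines would be piped to something like `send_nsca -H <nagios-host> -c send_nsca.cfg`; whether `send_nsca` itself can be ported to the NonStop, or replaced by an equivalent TCP client, is the open question above.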
If you are already using NSCA to send messages to the Nagios system, then I would think that you have already integrated the NonStop system into Nagios in the way the designers expected. If that is the case, is the problem that you just aren't satisfied with using the normal way NSCA communicates and are looking for a different way to send the messages? Assuming NSCA is open source, you ought to be able to adapt both the Nagios host part and the NSCA remote host part to use whatever communications channel you like, though I cannot immediately think of a good alternative to TCP for the communication.
If you have not already tried this, it might help to find an online forum specifically for Nagios users and post a question about this. The Nagios web site has a page that names some mailing lists that seem to be for this purpose:
http://wiki.nagios.org/index.php/Mail_Lists
It also has a page that lists a few online discussion forums:
http://wiki.nagios.org/index.php/Forums
I also went to groups.google.com, typed nagios into the "Search for a group" box (the one in the middle of the page, not the one at the top), and it shows several groups that seem to be for discussing Nagios. They don't seem to be newsgroups, as comp.sys.tandem is, but something else I'm not familiar with.
The first problem is that Nagios currently seems to support only three
platforms: Unix variants, Windows, and network appliances. No Tandem
or other mainframes.
But I looked at how Nagios "monitors" those entities to try to solve
the Tandem issue.
What I found is that the straightforward way of using Nagios to
monitor a platform is to use a plug-in on the Nagios host and
install an agent on the monitored machine. Also, a (DDL-like) map
defining the tracked objects must be made available to Nagios.
I think it is possible to cheat a bit here by piping Tandem data
through resources already available for, say, Windows.
For example, configure Nagios for Windows and install check_nt, the
Windows plugin, on the Linux machine. Use, and modify to taste, the
already included Windows service definitions.
The only remaining challenge will be to code a 'Nagios agent' on
Tandem. There is already an agent available for Windows: the
NSClient++ addon. You may be able to obtain the source here
(http://sourceforge.net/projects/nscplus/) and modify it to suit Tandem.
You're right, sending flat files to Nagios will not do what Nagios was
intended to do.
The usual way we connect heterogeneous systems is to use TCP/IP socket
communication.
It's simple enough. All systems support it. It's a synchronous way of
communication.
I understand that you have programmed the Nagios part on NonStop,
so it should be fairly easy to integrate the socket communication.
I don't know the Nagios server part.
If you need them, I have sample programs for socket client/server
communication, even for COBOL Pathway servers.
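A minimal Python sketch of such a line-oriented exchange, assuming a made-up protocol in which the NonStop side sends one message per line and the Linux side answers `ACK` for each. The protocol and all names are hypothetical, and the Python client only stands in for a NonStop client (COBOL, C, or whatever is used there):

```python
import socket
import socketserver
import threading

received = []               # messages collected by the handler (illustration only)
received_lock = threading.Lock()


class LineHandler(socketserver.StreamRequestHandler):
    """Linux side: read newline-terminated messages, acknowledge each one."""

    def handle(self):
        for raw in self.rfile:
            line = raw.decode("utf-8", errors="replace").rstrip("\r\n")
            if line:
                with received_lock:
                    received.append(line)   # a real receiver would forward to Nagios
            self.wfile.write(b"ACK\n")


def start_server(host="127.0.0.1", port=0):
    """Start a threaded TCP server in the background; port 0 picks a free port."""
    server = socketserver.ThreadingTCPServer((host, port), LineHandler)
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def send_messages(host, port, messages):
    """Stand-in for the NonStop client: send each message, wait for its ACK."""
    acks = 0
    with socket.create_connection((host, port), timeout=5) as sock, \
            sock.makefile("rwb") as stream:
        for message in messages:
            stream.write(message.encode("utf-8") + b"\n")
            stream.flush()
            if stream.readline().strip() == b"ACK":
                acks += 1
    return acks
```

The per-line acknowledgement is what makes the exchange synchronous: the sender knows each message arrived before it sends the next one.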
Thank you for the interesting replies. In fact, Nagios was not created
for use with NonStop. But anyway, the decision has been made to use
Nagios for all systems, including NonStop, and now I am fighting
with that problem. By the way, on the NonStop itself I did not have
big problems; there is a lot of information available to collect.
I did not create Nagios-specific monitoring on the NonStop; the idea
was:
- Check everything important.
- If something is wrong, try to repair it (like a Pathway server that
  is not running).
- If you cannot repair it immediately, create a table entry.
- When it is repaired, mark the table entry as "solved".
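The four steps above, together with the "resend all active messages" requirement, can be sketched in Python as follows. This is an in-memory stand-in for the SQL table, assuming each check is a plain function returning `None` when all is well or a problem text otherwise; every name here is hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Problem:
    key: str            # hypothetical identifier, e.g. "pathway:$ABC:SERVER-A"
    text: str           # message text shown in Nagios
    solved: bool = False


class ProblemTable:
    """In-memory stand-in for the SQL table described in the thread."""

    def __init__(self):
        self._rows = {}

    def open(self, key, text):
        row = self._rows.get(key)
        if row is None or row.solved:
            self._rows[key] = Problem(key, text)   # new or recurring problem
        else:
            row.text = text                        # still open: refresh the text

    def resolve(self, key):
        row = self._rows.get(key)
        if row is not None:
            row.solved = True

    def active(self):
        """All unsolved messages: the full set that has to be sent every time."""
        return [row.text for row in self._rows.values() if not row.solved]


def run_cycle(table, checks):
    """One pass over (key, check, repair) triples.

    check() returns None when everything is fine, else a problem text.
    repair() returns True if the problem could be fixed immediately.
    """
    for key, check, repair in checks:
        problem = check()
        if problem is None:
            table.resolve(key)           # fine (again): mark any entry "solved"
        elif repair is not None and repair():
            table.resolve(key)           # repaired at once: nothing left to report
        else:
            table.open(key, problem)     # could not repair: create a table entry
    return table.active()
```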
My primary purpose was to monitor lots of things on the NonStop,
collect lots of information for documentation, and automate as much
as possible, like reloads of tables/files. As I mentioned, I am a
lazy guy and I do not like that everyday routine work. I prefer the
more interesting things, like this monitoring tool.
Our very first version started processes from the Nagios server,
like "pathcom $ABC;status server *", which required lots of additional
configuration on the Nagios server and produced many process
creations. In addition, every change to the NonStop configuration had
to be made on the Nagios server, too.
At the moment the flat file mentioned is picked up by Nagios via SCP
and processed by a Linux script.
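A Linux-side script of that kind could turn each flat-file record into a Nagios external command (`PROCESS_SERVICE_CHECK_RESULT`) and append it to the Nagios command file. A minimal Python sketch, assuming a hypothetical `host;service;state;text` record layout and a typical command-file path (check `command_file` in your nagios.cfg):

```python
import time

# Nagios passive-check return codes.
RETURN_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def to_external_commands(lines, now=None):
    """Turn flat-file records into Nagios external command lines.

    Assumed (hypothetical) record layout: host;service;state;text
    Malformed records raise ValueError.
    """
    timestamp = int(time.time() if now is None else now)
    commands = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        host, service, state, text = line.split(";", 3)
        code = RETURN_CODES.get(state.strip().upper(), 3)   # unknown state -> UNKNOWN
        commands.append(
            f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{code};{text}"
        )
    return commands


def submit(commands, command_file="/usr/local/nagios/var/rw/nagios.cmd"):
    """Append commands to the Nagios command file (the path is an assumption)."""
    with open(command_file, "a", encoding="utf-8") as pipe:
        for command in commands:
            pipe.write(command + "\n")
```

One thing to keep in mind: Nagios normally keeps the last submitted state of a passive service, so when an entry disappears from the file, an OK result would also have to be submitted for that service, or the old problem stays visible.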
I thought about the TCP/IP socket connection myself; another
proposed option would be to use SNMP traps. I will have to talk with
the Linux people about that.
I dunno. This sounds like too much redundant coding and
synchronization.
At any rate, it will still require a lot of code on the Tandem side to
provide a live feed of Tandem events to Nagios; flat files can only
offer stale info.
Instead of hardcoding every Pathcom server config in Nagios, you
should invest effort in Tandem-based Subsystem Programmatic
Interface processes. It is messy and unreliable to keep duplicating
configs.
Nagios should never need prior knowledge of system
configurations on Tandem: that defeats monitoring fundamentals.
Monitoring means you don't know what is going on, but you would like
to know when something happens.
If, for example, you tell Nagios in advance that the monitored host
only has two CPUs and two Pathway servers, Nagios will never detect
the extra hardware and software someone may install on purpose or by
accident.
Nagios should depend solely on the Tandem agent to feed it information
on every resource available or lost on Tandem, not the other way
around.
By the way, Nagios works well with TCP/IP.
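The "agent tells Nagios what exists" idea boils down to comparing each discovery pass with the previous one and reporting the differences. A minimal Python sketch, assuming resources are identified by plain strings; the resource names in the example are made up:

```python
def diff_inventory(known, discovered):
    """Compare the last cycle's resources with this cycle's discovery.

    Returns (appeared, disappeared), each sorted for stable output.
    """
    known, discovered = set(known), set(discovered)
    return sorted(discovered - known), sorted(known - discovered)


def inventory_messages(known, discovered):
    """Messages the agent would send; Nagios needs no prior configuration."""
    appeared, lost = diff_inventory(known, discovered)
    return ([f"new resource: {name}" for name in appeared]
            + [f"resource lost: {name}" for name in lost])
```

With this approach, a third CPU or an extra Pathway server shows up as a "new resource" message instead of going unnoticed.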
I agree, it does not make any sense to maintain the NonStop
configuration parameters on the Nagios server; I just described our
very first steps into Nagios monitoring. I do not want to have any
NonStop configuration parameters (except the IP address and logical
name) on the Nagios server. My concept includes automatic
configuration wherever possible, like collecting all the lines, all
the Pathway monitors and servers, all RDF configurations, all TCP/IP
and so on. I do not like manual configuration at all.
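Automatic configuration could extend to the Nagios side as well: generate passive service definitions from whatever the NonStop programs discovered, so only the IP address and logical name are maintained by hand. A Python sketch, assuming the `generic-service` template from the Nagios sample configuration and a locally defined command object named `check_dummy` (both are assumptions about your setup):

```python
def service_definition(host, service, check_command="check_dummy!3"):
    """Render a Nagios service object for a passively checked NonStop resource.

    Assumptions: a "generic-service" template exists, and "check_dummy" is a
    command object you have defined (it only runs if freshness checks fire).
    """
    directives = [
        ("use", "generic-service"),
        ("host_name", host),
        ("service_description", service),
        ("active_checks_enabled", "0"),
        ("passive_checks_enabled", "1"),
        ("check_command", check_command),
    ]
    body = "".join(f"    {name:<24}{value}\n" for name, value in directives)
    return "define service {\n" + body + "}\n"


def definitions_for(host, discovered):
    """One service definition per discovered resource, in sorted order."""
    return "\n".join(service_definition(host, name) for name in sorted(discovered))
```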
The NonStop programs use the SPI interfaces wherever possible
(RDF does not yet have an SPI interface). Unfortunately, not all of the
SPI documentation has been published, but you can use SCF to find most
of that information; it is a bit tricky, but it works.
Next thing we will try is using SNMP.
Hello Herr Breidbach,
I assume that Nagios supports SNMP objects? I suspect the best
approach to this problem is to use the SNMP agent on the NSK to
translate EMS messages into traps to send to the Nagios network
management host. You can use OMF to create EMS messages for events like
disk full.
There is a MIB provided with the NSK SNMP software to describe objects
on the machine. The MIB can be extended, and can be exported between
machines. That way, the network management host "knows" about NSK objects.
We did something like this for the Citibank machines several years back.
Regards
T Peel
Hello Mr. Peel,
that is exactly what I want to try next. I am just reading the manual.
The necessary EMS-events are already created.
You have a PM.
Kind regards
I have now implemented the SNMP solution on the NonStop; this was not a
big problem. The only remaining task is to handle the events on the
Nagios server.
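On the Nagios server, Net-SNMP's `snmptrapd` can hand each trap to a program through a `traphandle` line in snmptrapd.conf (for example `traphandle default /usr/local/bin/nsk_trap_to_nagios.py`, script name hypothetical). The program receives the sender's host name, the transport address, and then one OID/value pair per line on stdin. A minimal Python sketch that turns one varbind into a passive check result; the OID `HYPOTHETICAL-MIB::emsText.0`, the service name, and the command-file path are placeholders, not real NonStop MIB objects:

```python
import sys
import time


def parse_trap(text):
    """Split snmptrapd traphandle input into (hostname, address, varbinds)."""
    lines = text.splitlines()
    if len(lines) < 2:
        raise ValueError("truncated trap input")
    varbinds = []
    for line in lines[2:]:
        parts = line.strip().split(None, 1)
        if parts:
            varbinds.append((parts[0], parts[1] if len(parts) > 1 else ""))
    return lines[0].strip(), lines[1].strip(), varbinds


def trap_to_command(text, text_oid, service, now=None, code=2):
    """Build a PROCESS_SERVICE_CHECK_RESULT line from the varbind named text_oid.

    Assumes the trap's host name matches a host_name defined in Nagios.
    """
    host, _address, varbinds = parse_trap(text)
    for oid, value in varbinds:
        if oid == text_oid:
            message = value.strip().strip('"')
            timestamp = int(time.time() if now is None else now)
            return (f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
                    f"{host};{service};{code};{message}")
    return None  # trap without the expected varbind: ignore it


if __name__ == "__main__":
    # Placeholder OID and path: adjust to the real MIB and nagios.cfg settings.
    command = trap_to_command(sys.stdin.read(), "HYPOTHETICAL-MIB::emsText.0", "EMS events")
    if command:
        with open("/usr/local/nagios/var/rw/nagios.cmd", "a", encoding="utf-8") as pipe:
            pipe.write(command + "\n")
```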
Thanks everybody for a lot of valuable input.
I do not think this "Nagios agent for NonStop" is going to become a
product, because we are not a vendor.
Anyway, if somebody is interested in some more details or has some
good ideas concerning things to monitor feel free to contact me.