I'm wondering why most of the applications available on Tandem are
not programmed as nonstop programs. Why is the most talked-about
feature of Tandem (NonStop) not used by the programs running under
it? All the programs we have here are ordinary programs written in
TAL, C or COBOL, running under Pathway.
Regards
Sajid
Dave
Jim Volstad
jvol...@nonstopguru.com
HP Nonstop Consultant
"Sajid Azami" <saji...@hotmail.com> wrote in message
news:8545b7e3.03062...@posting.google.com...
Your question is a reasonable one. The answer lies in the fact that
designing, writing, and debugging NonStop process pairs is not an
easy task, and without a considerable amount of experience, most
people will do it incorrectly. There are many, many failure scenarios
that can occur such as CPU failure, an operator stopping the "wrong"
PID, or just the application calling ABEND or abending due to a
logic error. Ensuring that the backup process is in a proper state
to correctly handle the takeover requires quite a bit of analysis
even for simple applications since, for example in a CPU failure
scenario, other processes that your application might be using or
depending on might have also been wiped out, disk processes might
also be undergoing an ownership switch, and communication lines or
other network resources might either be gone completely or be
undergoing takeover too.
Consequently, for the majority of "usual" *application* programs,
as opposed to "system" programs, it is preferable to design
applications to be *restartable*, rather than run as a process
pair. That way, an application program is always started in a
known, initial state and does not have a backup process to create,
update its state, and monitor. This, in combination with facilities
like TMF to clean up incomplete disk operations and a NonStop
"mother" process such as Pathmon, gives all the advantages of
robust, clean transaction handling in a much simpler way.
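To make the restartable idea concrete, here is a toy C sketch. Every name in it is hypothetical and none of it is a Guardian, TMF or Pathway API: a "database" that keeps only committed state, tx_begin/tx_commit/tx_backout standing in loosely for what TMF does, and a supervisor playing the Pathmon role by backing out the orphaned work and starting a fresh server in its initial state while the requester resubmits.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy "database": only committed state survives a crash. */
typedef struct {
    long committed_balance;   /* durable, survives server death */
    long pending_balance;     /* work inside the current transaction */
    bool in_tx;
} ToyDb;

static void tx_begin(ToyDb *db) {
    assert(!db->in_tx);
    db->pending_balance = db->committed_balance;
    db->in_tx = true;
}

static void tx_commit(ToyDb *db) {
    assert(db->in_tx);
    db->committed_balance = db->pending_balance;
    db->in_tx = false;
}

/* Roughly what a TMF-style backout amounts to: discard uncommitted work. */
static void tx_backout(ToyDb *db) {
    db->pending_balance = db->committed_balance;
    db->in_tx = false;
}

/* A restartable server keeps no state between requests, so a freshly
   started copy is indistinguishable from the one that died. */
typedef struct { int requests_handled; } Server;

static Server server_start(void) {
    Server s = { 0 };   /* always the same known, initial state */
    return s;
}

/* Handle one request; crash_midway simulates the server abending after
   doing some work but before committing. Returns true if committed. */
static bool server_handle(Server *s, ToyDb *db, long amount, bool crash_midway) {
    tx_begin(db);
    db->pending_balance += amount;
    if (crash_midway) {
        return false;           /* process dies; transaction never committed */
    }
    tx_commit(db);
    s->requests_handled++;
    return true;
}

/* Supervisor (the Pathmon role in this toy): after a failure, back out the
   orphaned transaction and start a new server; the requester resubmits. */
static long run_with_restart(ToyDb *db, long amount) {
    Server s = server_start();
    if (!server_handle(&s, db, amount, true)) {
        tx_backout(db);         /* TMF-like cleanup of incomplete work */
        s = server_start();     /* fresh instance, known initial state */
        server_handle(&s, db, amount, false);  /* requester retries */
    }
    return db->committed_balance;
}
```

The point of the design is that nothing in Server needs to be reconstructed after a failure; all the recovery work lives in the transaction layer and the supervisor.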
Oz
By the way, having a backup process does NOT constitute nonstop
coding. You need to be checkpointing to the backup process to ensure
that the transaction will get processed by the backup if the primary
dies.
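A minimal sketch of the checkpointing pattern, assuming a hypothetical three-step transaction. The checkpoint() function here is only a memcpy into a struct standing in for the backup process's memory; on NonStop that role is played by the CHECKPOINT-family calls sending state to the backup in another CPU. The primary checkpoints after each step, so a backup taking over resumes at the last recorded step instead of starting over. Real code must also handle the window where the primary completes a step but dies before checkpointing it (for example by checkpointing intent first, or making the step safe to repeat); this toy skips that.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical per-transaction state that must survive a takeover. */
typedef struct {
    int  txn_id;
    int  step;          /* 0 = received, 1 = debited, 2 = replied */
    long amount;
} TxnState;

typedef struct {
    TxnState state;
    bool     valid;
} Backup;

/* Stand-in for a CHECKPOINT-style call: copy state to the backup. */
static void checkpoint(Backup *b, const TxnState *s) {
    memcpy(&b->state, s, sizeof *s);
    b->valid = true;
}

/* Primary: do a step, then checkpoint it, before moving on.
   fail_after_step simulates a CPU failure (-1 means no failure).
   Returns the last step the primary completed. */
static int primary_run(Backup *b, int txn_id, long amount, int fail_after_step) {
    TxnState s = { txn_id, 0, amount };
    checkpoint(b, &s);
    for (int next = 1; next <= 2; next++) {
        if (s.step == fail_after_step) return s.step;  /* primary dies */
        s.step = next;                                   /* do the work */
        checkpoint(b, &s);                               /* then tell the backup */
    }
    return s.step;
}

/* Backup on takeover: resume from the last checkpointed step, so the
   transaction is neither lost nor restarted from scratch. */
static int backup_takeover(Backup *b) {
    assert(b->valid);
    while (b->state.step < 2) {
        b->state.step++;   /* finish the remaining steps */
    }
    return b->state.step;
}
```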
Julian
Dave Bossi <dave...@C.O.M.C.A.S.T.net> wrote in message news:<3EFB40E3...@C.O.M.C.A.S.T.net>...
Note that we are discussing application processes, not system processes. Fault
tolerant disk and communication controllers require fault-tolerant
managers. I confess to developing my last new nonstop process over 5
years ago (i.e. beyond available recent memory) simply because the
improvements in transactional facilities and (I guess my) design
techniques reduced the need to do the extra work - it is simply cheaper
to develop and maintain persistent code.
Regarding in-flight EFT transactions, I would be curious to know when
the last completely new nonstop process needed to be developed for an
EFT service that makes use of transaction management tools - although
I'm not sure any Tandem-based EFT systems actually use TMF, much less a
two-phase system so that probably isn't fair.
Dave
As far as I know, BASE24, the most popular application on Tandem,
uses neither TMF nor fault-tolerant programming techniques.
The application we use does use TMF and runs under Pathway, so
I guess the simple answer to my query is that you need fault-tolerant
programming only for system programs, not for application programs.
Or am I missing something ...
Thanks
Sajid
If your entire world revolves around data persistence, then TMF (or Tuxedo,
or whatever) is going to do it. However, life isn't simple. In some
applications, there's communication-link persistence, and as Dave pointed
out, transaction persistence. I think Dave's point was that the need for
varying degrees of fault-tolerance is business and application dependent.
Yes, of course, there are often non-fault tolerant solutions to many
problems, but they may not be viable in a production environment. Pseudo
real-time manufacturing systems can have too high restart times after a
failure and can require true fault tolerant integration points (go figure
that one out ;-)).
In an EFT classroom, you might be correct. But life is not EFT. Nor does it
have a reset button.
"Sajid Azami" <saji...@hotmail.com> wrote in message
news:8545b7e3.03062...@posting.google.com...
>Relying on application persistence (such as Pathway) will keep the
>application up and running. However - if you have an application that
>absolutely, positively, CANNOT drop a transaction (such as an EFT
>transaction), then you really need to be using non-stop coding.
>Having Pathway restart a server will not continue to process the
>message that was "in-flight" when the server died.
>
Actually, nonstop programming avails you little or nothing in
the handling of in-flight transactions. Having done more than
one EFT/ATM/POS system, I can say that in-flight transactions
are properly handled via a non-TMF-audited transaction log.
If one absolutely, positively cannot drop a transaction, then
one has to design for complete application/system loss and a
properly checkpointed backup process won't help because it's
gone too.
New transactions should be journalled at key points with appropriate
state flags so that, even after a complete loss of the application/
system, incomplete transactions can be rolled back by the restarted
application and/or reversal transactions generated to external
authorisers (which may have to be repeated for several hours until
the external authoriser gives a response to adjudicate the
incomplete transaction).
The transaction log should be non-TMF-audited so that TMF doesn't
destroy the last state of the in-flight transaction during
recovery.
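A rough sketch of the journalling approach described above, with hypothetical state names and an in-memory array standing in for the non-audited journal file. The restarted application scans the journal: entries that never left the system are rolled back locally, while anything that may have reached the external authoriser is marked for reversal. Reversals are then resent on each retry pass until the authoriser acknowledges them (modeled here as a list of acknowledged transaction ids, which is an assumption for illustration).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical journal states, written at each key point. */
typedef enum {
    J_RECEIVED,          /* request logged, nothing sent externally */
    J_SENT_TO_AUTH,      /* forwarded to the external authoriser */
    J_AUTH_RESPONDED,    /* authoriser answered, reply not yet delivered */
    J_COMPLETED,         /* reply delivered, transaction finished */
    J_ROLLED_BACK,       /* undone locally during recovery */
    J_REVERSAL_PENDING,  /* reversal must be (re)sent to the authoriser */
    J_REVERSED           /* authoriser acknowledged the reversal */
} JState;

typedef struct {
    int    txn_id;
    JState state;
} JournalEntry;

/* Run once by the restarted application after a total loss.
   Returns the number of reversals that need to be sent. */
static size_t recover(JournalEntry *j, size_t n) {
    assert(j != NULL || n == 0);
    size_t reversals = 0;
    for (size_t i = 0; i < n; i++) {
        switch (j[i].state) {
        case J_RECEIVED:
            j[i].state = J_ROLLED_BACK;       /* never left this system */
            break;
        case J_SENT_TO_AUTH:
        case J_AUTH_RESPONDED:
        case J_REVERSAL_PENDING:
            j[i].state = J_REVERSAL_PENDING;  /* authoriser may have acted */
            reversals++;
            break;
        default:
            break;                            /* completed or already undone */
        }
    }
    return reversals;
}

static bool acked(const int *ids, size_t n, int id) {
    for (size_t i = 0; i < n; i++)
        if (ids[i] == id) return true;
    return false;
}

/* One retry pass: an entry leaves REVERSAL_PENDING only when the
   authoriser has acknowledged it. Returns how many remain pending. */
static size_t retry_reversals(JournalEntry *j, size_t n,
                              const int *acked_ids, size_t n_acked) {
    size_t still_pending = 0;
    for (size_t i = 0; i < n; i++) {
        if (j[i].state != J_REVERSAL_PENDING) continue;
        if (acked(acked_ids, n_acked, j[i].txn_id)) j[i].state = J_REVERSED;
        else still_pending++;
    }
    return still_pending;
}
```

Because the journal is not TMF-audited, a backout during recovery cannot erase the last state written, which is what lets recover() tell "never sent" apart from "possibly authorised".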
Oz
What EFT/ATM/POS system doesn't support some variant of a
restart/timeout/reversal process? The NonStop system isn't the weakest link
in the chain; the ATM is. Chances are, either the ATM is going down,
possibly because somebody drove off with the unit, or the POS device is
going to die because somebody dropped it.
As far as losing a data center goes, there's nothing to prevent you from
implementing a cross-node checkpoint, using the latest C-style checkpoints
to supplement the traditional CHECKPOINTMANYX calls. Your approach to
journaling isn't going to buy you anything more than TMF - and probably a
lot less, especially considering that you can decouple TMF from your
financial transactions, if you really need to (not recommended by the
author, just possible). There's also RDF and GoldenGate to handle cross-node
transaction persistence.
There is a fundamental difference between the TMF transaction concept and a
business transaction. Business transactions, when finally reconciled, can
span many TMF transactions, when you take reversals into account. Are you
trying to build a transaction monitor on your own that reflects business
transactions? If so, I think you may be missing the point of data-level
transactions.
But to the original poster's question, checkpointing is a very well
established and reliable pattern, regardless of whether you are in a
traditional NSK CHECKPOINT space, or cross system. The closest alternatives,
from a statistical standpoint, are to run: a) full hot standby; or b)
arbitrated parallel. Checkpointing is required when connection-level and decision-level
reliability cannot be compromised (as with persistent TCP connections or
Space Shuttle-type situations). EFT/ATM/POS have relatively low
fault-tolerance requirements compared with industrial automation and embedded
systems monitoring and control, so citing that as the definitive example is
going to end up being fairly self-limiting in a discussion on the topic.
XPNet performs functions similar to a PathMon process: it starts the
satellite and sends a startup message. However, XPNet is also
responsible for routing messages between satellite processes (i.e.,
all application messages pass through the XPNet process). The obvious
problem with this is that the XPNet process could become a bottleneck,
but the advantage is that if a satellite process goes down, XPNet
keeps the last message it sent to it (and queues up any new messages)
and resends it when the application process comes back up, with a flag
indicating the message may be a possible duplicate (from the
application's perspective), allowing the application process to deal
with it in whatever way is appropriate.
From the satellite process's perspective, this is all automatic
(assuming it uses an XPNet interface library to initialize its access
to the XPNet process).
That way, XPNet satellite processes don't have to be coded as nonstop
process pairs. This has been a facet of BASE24 since the very
beginning (all the way back to the days of its predecessor application
ACI/NET and with XPNet's predecessor Spannet).
XPNet does a lot of other things, too, such as dealing with the I/O
ports for remote devices like ATMs and links to interchanges,
configuration control, etc., but the point is that application
software fault tolerance is provided via XPNet.
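The message-retention behaviour described for XPNet can be sketched roughly as follows. This is a hypothetical model, not XPNet's actual interface or data structures: the router keeps the last message sent to a satellite until the satellite replies, queues new messages while the satellite is down, and on restart resends the retained message with a possible-duplicate flag before draining the queue. The satellite decides what a duplicate means; here it assumes (hypothetically) that it records the last message id it fully processed in durable storage.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QCAP 8

/* Hypothetical message; pdup marks a possible duplicate on resend. */
typedef struct {
    int  msg_id;
    bool pdup;
} Msg;

/* Hypothetical router-side view of one satellite process. */
typedef struct {
    bool   up;
    bool   has_last;
    Msg    last_sent;    /* kept until the satellite's reply arrives */
    Msg    queue[QCAP];  /* messages held while the satellite is down */
    size_t qlen;
} SatelliteLink;

/* Satellite side; last_done_id is assumed to live in durable storage. */
typedef struct {
    int  last_done_id;
    int  processed;       /* messages actually acted on */
    bool die_after_next;  /* test hook: process, then die before replying */
} Satellite;

/* Returns true if the satellite replied (i.e. did not die). */
static bool satellite_receive(Satellite *sat, Msg m) {
    if (!(m.pdup && m.msg_id <= sat->last_done_id)) {  /* skip known dups */
        sat->processed++;
        sat->last_done_id = m.msg_id;
    }
    if (sat->die_after_next) {
        sat->die_after_next = false;
        return false;                 /* died before replying */
    }
    return true;
}

static void route(SatelliteLink *l, Satellite *sat, Msg m) {
    if (!l->up) {
        assert(l->qlen < QCAP);
        l->queue[l->qlen++] = m;      /* hold until the satellite is back */
        return;
    }
    l->last_sent = m;
    l->has_last = true;
    if (satellite_receive(sat, m)) l->has_last = false;  /* reply received */
    else l->up = false;                                  /* satellite down */
}

/* Called when the satellite comes back up. */
static void satellite_restarted(SatelliteLink *l, Satellite *sat) {
    l->up = true;
    if (l->has_last) {
        Msg again = l->last_sent;
        again.pdup = true;            /* application decides how to handle it */
        if (!satellite_receive(sat, again)) { l->up = false; return; }
        l->has_last = false;
    }
    Msg pending[QCAP];                /* copy so route() can re-queue safely */
    size_t n = l->qlen;
    for (size_t i = 0; i < n; i++) pending[i] = l->queue[i];
    l->qlen = 0;
    for (size_t i = 0; i < n; i++) route(l, sat, pending[i]);
}
```

The bottleneck mentioned above is visible even in this toy: every message passes through route(), so one router process serialises all traffic to its satellites.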
BTW: some newer BASE24 products do use TMF (and NonStop SQL).
```
Joe
(remove "q.w.e.r.t.y" to reply by email)
On 27 Jun 2003 12:33:01 -0700, saji...@hotmail.com (Sajid Azami)
spewed forth:
You have missed the point....it was Julian who brought up EFT as a
potential example of an application where nonstop coding was needed.
I was refuting that.
The other examples you cite, RDF, GoldenGate, etc. are examples of
system level frameworks and not examples of application level coding.
I had already stated elsethread that such system-level functionality
may be an appropriate area for nonstop coding.
Try reading for comprehension next time.
Oz
Now? Many years older, wiser (??) and grayer.
Hi Sajid,
I once heard a Tandem consultant say that when Spannet was designed,
there was no Pathway or TMF yet. Spannet and Pathway are alike. Take a
look at the programs: you call initialize in a Pathway program and
Netinit in a BASE24 program, etc. There is another way (other than using
TMF) to achieve consistency in your database: readlock all the records
you need before updating. Instead of reading and updating CAF and then
PBF and other files, you can readlock the records in CAF and PBF,
then update PBF and then CAF. You can read the Guardian Programmer's Guide
on this. BTW, BASE24 release 6 does use TMF.
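Here is a minimal single-process sketch of "lock everything first, then update", using a hypothetical in-memory Record with a lock-owner field rather than real Guardian file locking. The update touches PBF and CAF only after both locks are held; if either record is locked by someone else, it backs off without changing anything. Note the limit of this approach: locking keeps other processes from seeing or interleaving with a half-done update, but unlike TMF it does not undo the PBF change if the process dies before updating CAF.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a keyed record that supports an exclusive lock. */
typedef struct {
    const char *name;
    long value;
    int  lock_owner;   /* 0 = unlocked, otherwise the owning process id */
} Record;

static bool lock_record(Record *r, int pid) {
    if (r->lock_owner != 0 && r->lock_owner != pid) return false;
    r->lock_owner = pid;
    return true;
}

static void unlock_record(Record *r, int pid) {
    if (r->lock_owner == pid) r->lock_owner = 0;
}

/* Lock every record first; update only if all locks were obtained.
   Returns false (and changes nothing) if any record is held elsewhere. */
static bool locked_update(Record *caf, Record *pbf, int pid,
                          long pbf_delta, long caf_delta) {
    assert(caf != pbf);
    if (!lock_record(caf, pid)) return false;
    if (!lock_record(pbf, pid)) {
        unlock_record(caf, pid);   /* back off: nothing was changed */
        return false;
    }
    pbf->value += pbf_delta;       /* update PBF, then CAF */
    caf->value += caf_delta;
    unlock_record(pbf, pid);
    unlock_record(caf, pid);
    return true;
}
```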
Please try plonking for comprehension next time.
Oz