NonStop Mode Programming?

Sajid Azami

Jun 26, 2003, 2:37:12 PM

Hi,

I'm wondering why most of the applications available on Tandem are
not programmed to be NonStop programs. Why is the most talked-about
feature of Tandem (NonStop) not used by the programs running on
it? All the programs we have here are ordinary programs written in
TAL, C or COBOL running under Pathway.

Regards

Sajid

Dave Bossi

Jun 26, 2003, 2:52:19 PM

Many key programs - such as the Pathway product you mentioned - ARE
written NonStop. The key is that well developed applications only use
expensive (developer/resources) nonstop coding techniques where
absolutely required, relying on application persistence (such as
provided by Pathway) where it will suffice. It's all about proper
application design.

Dave

Jim Volstad

Jun 26, 2003, 6:33:45 PM

If they are running as PATHWAY applications, they most likely are "Nonstop".
Are you using TMF? Is the server app running with a backup process? Batch
processing? Who cares. Just do a restart on the job.

Jim Volstad
jvol...@nonstopguru.com

HP Nonstop Consultant

"Sajid Azami" <saji...@hotmail.com> wrote in message
news:8545b7e3.03062...@posting.google.com...

ozbear

Jun 26, 2003, 7:56:51 PM

On 26 Jun 2003 11:37:12 -0700, saji...@hotmail.com (Sajid Azami)
wrote:

Your question is a reasonable one. The answer lies in the fact that
designing, writing, and debugging NonStop process pairs is not an
easy task, and without a considerable amount of experience, most
people will do it incorrectly. There are many, many failure scenarios
that can occur such as CPU failure, an operator stopping the "wrong"
PID, or just the application calling ABEND or abending due to a
logic error. Ensuring that the backup process is in a proper state
to correctly handle the takeover requires quite a bit of analysis
even for simple applications since, for example in a CPU failure
scenario, other processes that your application might be using or
depending on might have also been wiped out, disk processes might
also be undergoing an ownership switch, and communication lines or
other network resources might either be gone completely
or be undergoing takeover too.

Consequently, for the majority of "usual" *application* programs,
as opposed to "system" programs, it is preferable to design
applications to be *restartable*, rather than run as a process
pair. That way, an application program is always started in a
known, initial state and does not have a backup process to create,
keep updated, and monitor. This, in combination with facilities
like TMF to clean up incomplete disk operations, and a NonStop
"mother" process, such as Pathmon, gives all the advantages of
robust, clean transaction handling in a much simpler way.
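
[Editor's sketch] The restartable design described above can be illustrated roughly as follows. This is a toy model in Python, not Pathway or Guardian code: `supervise`, `make_server`, and `ServerCrash` are hypothetical names, and the sketch assumes that any half-done work has already been backed out elsewhere (the role TMF plays in the post), so a request can simply be retried by a freshly started server.

```python
class ServerCrash(Exception):
    """Simulated failure of a server process."""


def make_server(fail_on):
    """Return a fresh server; every restart begins from a known initial state."""
    def serve(request):
        if request in fail_on:
            fail_on.discard(request)  # fail once, then succeed after a restart
            raise ServerCrash(request)
        return f"done:{request}"
    return serve


def supervise(requests, fail_on, max_restarts=3):
    """Hypothetical stand-in for a persistent monitor process.

    On a crash it starts a new server and retries the request, assuming
    incomplete work was already backed out by a transaction facility.
    """
    server = make_server(fail_on)
    restarts = 0
    results = []
    for request in requests:
        while True:
            try:
                results.append(server(request))
                break
            except ServerCrash:
                restarts += 1
                if restarts > max_restarts:
                    raise
                server = make_server(fail_on)  # restart in the initial state
    return results, restarts
```

The server itself keeps no backup and no checkpoint logic; all of the recovery policy lives in the supervisor, which is the simplification the post argues for.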

Oz

Julian

Jun 27, 2003, 12:39:03 AM

Relying on application persistence (such as Pathway) will keep the
application up and running. However - if you have an application that
absolutely, positively, CANNOT drop a transaction (such as an EFT
transaction), then you really need to be using non-stop coding.
Having Pathway restart a server will not continue to process the
message that was "in-flight" when the server died.

By the way, having a backup process does NOT constitute non-stop
coding. You need to be checkpointing to the backup process to ensure
that the transaction will get processed by the backup if the primary
dies.
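
[Editor's sketch] To make the distinction concrete, here is a minimal in-memory model of a checkpointing pair, written in Python. It is not the Guardian checkpoint API; `Primary`, `Backup`, and `take_over` are hypothetical names, and real process pairs must also handle message-system and I/O state that this sketch ignores.

```python
class Backup:
    """Holds the last state the primary checkpointed."""
    def __init__(self):
        self.state = None

    def receive_checkpoint(self, state):
        self.state = dict(state)  # copy, so later primary changes do not leak in


class Primary:
    def __init__(self, backup):
        self.backup = backup
        self.state = {"request": None, "step": "idle"}

    def _checkpoint(self):
        self.backup.receive_checkpoint(self.state)

    def begin(self, request):
        self.state = {"request": request, "step": "received"}
        self._checkpoint()  # the backup now knows the request exists

    def finish(self):
        self.state["step"] = "completed"
        self._checkpoint()


def take_over(backup):
    """What a backup might do after the primary dies: resume from the checkpoint."""
    state = backup.state
    if state is None or state["step"] in ("idle", "completed"):
        return None              # nothing was in flight
    return state["request"]      # the in-flight request still to be finished
```

A backup that never receives a checkpoint has nothing to resume from, which is the point being made: the pair is only as useful as the state it has been sent.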

Julian

Dave Bossi <dave...@C.O.M.C.A.S.T.net> wrote in message news:<3EFB40E3...@C.O.M.C.A.S.T.net>...

Dave Bossi

Jun 27, 2003, 8:08:15 AM

In all fairness to the importance of EFT transactions, other than
monitor and switch processes, which I grant you should still be
nonstopped, in this age of two-phased commits across heterogeneous
platforms, there are few transactions that - correctly implemented -
aren't retryable.

Note that we are discussing application, not system processes. Fault
tolerant disk and communication controllers require fault-tolerant
managers. I confess to developing my last new nonstop process over 5
years ago (i.e. beyond available recent memory) simply because the
improvements in transactional facilities and (I guess my) design
techniques reduced the need to do the extra work - it is simply cheaper
to develop and maintain persistent code.

Regarding in-flight EFT transactions, I would be curious to know when
the last completely new nonstop process needed to be developed for an
EFT service that makes use of transaction management tools - although
I'm not sure any Tandem-based EFT systems actually use TMF much less a
two-phase system so that probably isn't fair.

Dave

Sajid Azami

Jun 27, 2003, 3:33:01 PM

Hi Dave,

As far as I know, Base24, the most popular application on Tandem,
neither uses TMF nor uses fault-tolerant programming techniques.

The application we use does use TMF and is running under Pathway, so
I guess the simple answer to my query is that you need fault-tolerant
programming only for system programs, not for application programs.

Or am I missing something?

Thanks

Sajid

s b e c k e r _ n o s p a m@nexbridge_nospam.com Randall S. Becker

Jun 27, 2003, 3:43:59 PM

You're missing something.

If your entire world revolves around data persistence, then TMF (or Tuxedo,
or whatever), is going to do it. However, life isn't simple. In some
applications, there's communication-link persistence, and as Dave pointed
out, transaction persistence. I think Dave's point was that the need for
varying degrees of fault-tolerance is business and application dependent.
Yes, of course, there are often non-fault tolerant solutions to many
problems, but they may not be viable in a production environment. Pseudo
real-time manufacturing systems can have too high restart times after a
failure and can require true fault tolerant integration points (go figure
that one out ;-)).

In an EFT classroom, you might be correct. But life is not EFT. Nor does it
have a reset button.

"Sajid Azami" <saji...@hotmail.com> wrote in message
news:8545b7e3.03062...@posting.google.com...

ozbear

Jun 27, 2003, 8:06:40 PM

On 26 Jun 2003 21:39:03 -0700, JFre...@myrealbox.com (Julian) wrote:

>Relying on application persistence (such as Pathway) will keep the
>application up and running. However - if you have an application that
>absolutely, positively, CANNOT drop a transaction (such as an EFT
>transaction), then you really need to be using non-stop coding.
>Having Pathway restart a server will not continue to process the
>message that was "in-flight" when the server died.
>

Actually, non-stop programming avails you little or nothing as far as
the handling of in-flight transactions goes. Having done more than
one EFT/ATM/POS system, I can say the handling of in-flight transactions
is properly done via a non-TMF-audited transaction log.

If one absolutely, positively cannot drop a transaction, then
one has to design for complete application/system loss and a
properly checkpointed backup process won't help because it's
gone too.

New transactions should be journalled at key points with appropriate
state flags so that even after a complete loss of the application/
system, incomplete transactions can be rolled back by the restarted
application and/or reversal transactions generated to external
authorisors (which may have to be repeated for several hours until
the external authorisor gives a response to adjudicate the
incomplete transaction).

The transaction log should be non-TMF-audited so that TMF doesn't
destroy the last state of the in-flight transaction during
recovery.
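
[Editor's sketch] The journalling approach described above might look roughly like this in Python. An ordinary append-only file stands in for the non-audited transaction log; the state names (`RECEIVED`, `COMPLETED`, `REVERSAL_SENT`, `REVERSED`) and all function names are assumptions made for illustration, not taken from any EFT product.

```python
import json
import os
import tempfile


def journal_append(path, txn_id, state):
    """Append one record and force it to disk before continuing."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps({"txn": txn_id, "state": state}) + "\n")
        f.flush()
        os.fsync(f.fileno())


def incomplete_transactions(path):
    """Replay the journal and return ids whose last state is not final."""
    last_state = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            last_state[record["txn"]] = record["state"]
    final = {"COMPLETED", "REVERSED"}
    return [txn for txn, state in last_state.items() if state not in final]


def recover(path):
    """On restart, record that a reversal was sent for each incomplete transaction."""
    pending = incomplete_transactions(path)
    for txn in pending:
        journal_append(path, txn, "REVERSAL_SENT")
    return pending


def demo():
    """Journal two transactions, leave one unfinished, then run recovery."""
    path = os.path.join(tempfile.mkdtemp(), "txn.log")
    journal_append(path, "t1", "RECEIVED")
    journal_append(path, "t1", "COMPLETED")
    journal_append(path, "t2", "RECEIVED")
    return recover(path)
```

In this sketch `REVERSAL_SENT` is deliberately not a final state, so a later recovery pass picks the transaction up again until a `REVERSED` record is journalled, mirroring the repeated reversals mentioned in the post.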

Oz

s b e c k e r _ n o s p a m@nexbridge_nospam.com Randall S. Becker

Jun 27, 2003, 10:48:00 PM

"ozbear" <ozb...@no.spam.bigpond.com> wrote in message
news:3efcd9cf.665026167@news-server...

What EFT/ATM/POS system doesn't support some variant of a
restart/timeout/reversal process? The NonStop system isn't the weakest link
in the chain; the ATM is. Chances are, either the ATM is going down,
possibly because somebody drove off with the unit, or the POS device is
going to die because somebody dropped it.

As far as losing a data center, there's nothing to prevent you from
implementing a cross-node checkpoint, using the latest C-style checkpoints
to supplement the traditional CHECKPOINTMANYX calls. Your approach to
journaling isn't going to buy you anything more than TMF - and probably a
lot less, especially considering that you can decouple TMF from your
financial transactions, if you really need to (not recommended by the
author, just possible). There's also RDF and GoldenGate to handle cross-node
transaction persistence.

There is a fundamental difference between the TMF transaction concept and a
business transaction. Business transactions, when finally reconciled, can
span many TMF transactions when you take reversals into account. Are you
trying to build a transaction monitor on your own that reflects business
transactions? If so, I think you may be missing the point of data-level
transactions.

But to the original poster's question, checkpointing is a very well
established and reliable pattern, regardless of whether you are in a
traditional NSK CHECKPOINT space, or cross system. The closest alternatives,
from a statistical standpoint, are to run: a) full hot standby; or b)
arbitrated parallel. It is required when connection-level and decision-level
reliability cannot be compromised (like in persistent TCP connections, or
Space Shuttle-type situations). EFT/ATM/POS have relatively low levels of
fault-tolerance requirements compared with industrial automation and embedded
systems monitoring and control, so citing that as the definitive example is
going to end up being fairly self-limiting in a discussion on the topic.

QjoeW.Ep...@tmindspring.ycom

Jun 28, 2003, 2:37:17 PM

FWIW, the BASE24 application executes as satellite processes of a
"middleware" package called XPNet. XPNet processes *do* run as nonstop
process pairs.

XPNet performs functions similar to a PathMon process: it starts the
satellite and sends a startup message. However, XPNet is also
responsible for routing messages between satellite processes (i.e.,
all application messages pass through the XPNet process). The obvious
problem with this is that the XPNet process could become a bottleneck,
but the advantage is that if a satellite process goes down, XPNet
keeps the last message it sent to it (and queues up any new messages)
and resends it when the application process comes back up, with a flag
indicating the message may be a possible duplicate (from the
application's perspective), allowing the application process to deal
with it in whatever way is appropriate.
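
[Editor's sketch] The retain-and-resend behaviour described above can be modelled in a few lines of Python. This is only an illustration of the idea; `Router` and its methods are hypothetical and are not the XPNet interface, and the sketch assumes one satellite and one outstanding message at a time.

```python
from collections import deque


class Router:
    """Hypothetical message router; not the XPNet API."""
    def __init__(self):
        self.queue = deque()     # messages not yet sent
        self.last_sent = None    # last message sent and not yet acknowledged
        self.satellite_up = True

    def submit(self, message):
        self.queue.append(message)

    def dispatch(self):
        """Hand the next message to the satellite, or return None if it cannot."""
        if not self.satellite_up or self.last_sent is not None or not self.queue:
            return None
        self.last_sent = self.queue.popleft()
        return {"body": self.last_sent, "possible_duplicate": False}

    def acknowledge(self):
        self.last_sent = None

    def satellite_down(self):
        self.satellite_up = False

    def satellite_restarted(self):
        """Resend the unacknowledged message, flagged as a possible duplicate."""
        self.satellite_up = True
        if self.last_sent is None:
            return None
        return {"body": self.last_sent, "possible_duplicate": True}
```

The router cannot know whether the satellite finished the message before it died, which is why the resend carries a flag rather than a guarantee; deciding what to do with a possible duplicate stays with the application.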

From the satellite process's perspective, this is all automatic
(assuming it uses an XPNet interface library to initialize its access
to the XPNet process).

That way, XPNet satellite processes don't have to be coded as nonstop
process pairs. This has been a facet of BASE24 since the very
beginning (all the way back to the days of its predecessor application
ACI/NET and with XPNet's predecessor Spannet).

XPNet does a lot of other things, too, such as dealing with the I/O
ports for remote devices like ATMs and links to interchanges,
configuration control, etc., but the point is that application
software fault tolerance is provided via XPNet.

BTW: some newer BASE24 products do use TMF (and NonStop SQL).
Joe
(remove "q.w.e.r.t.y" to reply by email)

On 27 Jun 2003 12:33:01 -0700, saji...@hotmail.com (Sajid Azami)
spewed forth:

ozbear

Jun 28, 2003, 8:12:13 PM

You have missed the point....it was Julian who brought up EFT as a
potential example of an application where nonstop coding was needed.
I was refuting that.

The other examples you cite, RDF, GoldenGate, etc. are examples of
system level frameworks and not examples of application level coding.
I had already stated elsethread that such system level functionality
may be an appropriate area for nonstop coding.

Try reading for comprehension next time.

Oz

Jim Volstad

Jun 29, 2003, 3:21:27 PM

Checkpoints. Brings back memories of my first job out of college back in
1983. NS II hardware, no TMF, and COBOL74. Then there were those massive
65xx terminals we used.

Now? Many years older, wiser (??) and grayer.

Randy at Home

Jun 29, 2003, 3:36:32 PM

"ozbear" <ozb...@no.spam.bigpond.com> wrote in message

news:3efe2e26.752153539@news-server...

Plonk.

Charlie Lee

Jun 29, 2003, 9:06:04 PM

saji...@hotmail.com (Sajid Azami) wrote in message news:<8545b7e3.03062...@posting.google.com>...

Hi Sajid,

I once heard a Tandem consultant say that when Spannet was designed,
there was no Pathway or TMF yet. Spannet and Pathway are alike. Take a
look at the programs: you call initialize in a Pathway program and
Netinit in a Base24 program, and so on. There is another way (other than
using TMF) to achieve consistency in your database: readlock all the
records you need before updating. Instead of reading and updating CAF, then
following with PBF and other files, you can readlock the records in CAF and
PBF, then update PBF and then CAF. You can read the Guardian Programmer's
Guide on this. BTW, Base24 rel 6 does use TMF.
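
[Editor's sketch] The lock-everything-first idea can be shown with a small Python model. `LockedFile` and `withdraw` are hypothetical names, in-memory dictionaries stand in for the CAF and PBF files, and the record contents (an amount used per card, a balance per account) are assumptions for illustration. Note that locking alone protects against concurrent updaters; unlike TMF, it does not back out the first update if the process dies before the second.

```python
import threading


class LockedFile:
    """Hypothetical in-memory stand-in for a keyed file with record locks."""
    def __init__(self, records):
        self.records = dict(records)
        self._locks = {key: threading.Lock() for key in records}

    def read_lock(self, key):
        self._locks[key].acquire()
        return self.records[key]

    def update_unlock(self, key, value):
        self.records[key] = value
        self._locks[key].release()

    def unlock(self, key):
        self._locks[key].release()


def withdraw(caf, pbf, card, account, amount):
    """Lock both records first, then update PBF and then CAF."""
    used = caf.read_lock(card)        # lock order is always CAF, then PBF
    balance = pbf.read_lock(account)
    if balance < amount:
        pbf.unlock(account)           # nothing was changed; just release
        caf.unlock(card)
        return False
    pbf.update_unlock(account, balance - amount)
    caf.update_unlock(card, used + amount)
    return True
```

Taking the locks in the same order everywhere is what keeps two concurrent updaters from deadlocking on each other's records.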

ozbear

Jul 15, 2003, 6:21:02 AM

Please try plonking for comprehension next time.

Oz