"Active" backup programming seems to be usable only with ANSI C file
streams, according to the Guardian C Library Calls Reference Manual.
"Passive" backup programming seems to be usable only with TAL,
according to the Guardian Procedure Calls Reference Manual.
So if I have a program written in C that does things like FILE_OPEN_,
and I want to make it into a backup process pair, too bad, I'm out of
luck?
I can't necessarily convert the FILE_OPEN_ calls to fopen-like calls,
because (for example) I might want to do nowaited I/O.
So I have to rewrite the program in TAL?
Or perhaps just write C prototype wrappers for the various Guardian
"passive backup" functions? But if it's just as simple as that, why
wouldn't there already be such wrappers in CEXTDECS, just like for all
the other Guardian functions?
Am I missing something here?
Thanks in advance.
Bob Vesterman.
You cannot write a successful and general C nonstop program using the
passive backup (e.g. CHECKMONITOR) functions.
"Robert William Vesterman" <bob....@vesterman.com> wrote in message
news:pjvd7015tr93ksqfb...@4ax.com...
>It is possible to write an active backup in both C and TAL. You might raise
>a case if you believe the manuals say otherwise.
>Look at the primary-processhandle argument of FILE_OPEN_(). Also look at
>FILE_GETSYNCINFO_() and FILE_SETSYNCINFO_(). Of course, how you get the
>open and sync information to the backup is entirely up to you. In fact,
>nearly *everything* you do in an active backup program is up to you.
Thank you for the response. I think that I am misunderstanding
something here, though.
"Active" backup programming, as defined by the Guardian Programmer's
Guide, seems to have to do with calls such as _ns_fopen_special,
_ns_backup_fopen, and so forth. According to the Guardian C Library
Calls Reference Manual, these are applicable only for ANSI-C style
file streams (i.e. FILE * streams, not Guardian FILE_OPEN_-based I/O).
"Passive" backup programming, on the other hand, is CHECKMONITOR,
CHECKPOINT and so forth. These functions, according to the Guardian
Procedure Calls manual, are "not supported in C programs".
A look at CEXTDECS, however, seems to indicate that some of those
calls - e.g. CHECKMONITOR - are supported (or, at least, have
prototypes), and that some of them - e.g. CHECKPOINT - are supported
only in TNS, not in TNS/R. So seemingly, contrary to what the manual
says, CHECKMONITOR is okay, and CHECKPOINT is okay unless you compile
with NMC?
Now, let's look at your examples: "primary-processhandle" of
FILE_OPEN_, and FILE_GETSYNCINFO_:
In the manual's FILE_OPEN_ description, the "primary-processhandle"
blurb contains the following statement:
"This option is used only when the backup process is the caller. It
is more common for the primary to perform this operation by a call to
FILE_OPEN_CHECKPT_".
But then FILE_OPEN_CHECKPT_ says that "this passive backup procedure
is not supported in C programs". So is not the
"primary-processhandle" parameter of FILE_OPEN_ related to passive,
rather than active, backup?
Then FILE_GETSYNCINFO_ says, "Typically, FILE_GETSYNCINFO_ is not
called directly by application programs. Instead, it is called
indirectly by CHECKPOINT".
And again, CHECKPOINT is related to passive - not active - backup, and
isn't supported by C (according to the manual), or by TNS/R (according
to CEXTDECS).
So maybe you're saying that "active" backup programming is actually
some combination of stuff like _ns_backup_fopen along with some, but
not all, of the calls typically associated with passive backup
programming? That is, with, for example, FILE_OPEN_CHECKPT_, but not
CHECKPOINT?
If so, the Guardian Programmers' Guide doesn't seem to contain any
examples of how to mix the two things. It only contains a couple of
far-too simplistic examples using only the _ns_* calls. There seems
to be basically no information in there about how to use the "old"
calls, let alone the old calls in combination with the new calls.
In case it's not clear from the above, I'm not arguing; I'm just
laying out some of the things that are confusing me. I am totally new
to both active and passive backup programming.
Thanks again,
Bob Vesterman.
>I am currently reading about programming "backup" process pairs.
>
>"Active" backup programming seems to be usable only with ANSI C file
>streams, according to the Guardian C Library Calls Reference Manual.
>
>"Passive" backup programming seems to be usable only with TAL,
>according to the Guardian Procedure Calls Reference Manual.
You can also write process pairs using passive backup in NonStop
COBOL.
Jeff Lanam HP NonStop COBOL Project INCITS/J4
NonStop Enterprise Division
Hewlett-Packard
HP/Tandem had to invent __ns_backup_fopen and the like either because the
function has a signature not defined by HP (e.g., fopen) or because a C
runtime specific function was needed (e.g., __ns_fget_file_state). On the
other hand, FILE_OPEN_ was always intended to support NonStop processes and
so has specific parameters for that reason.
The passive backup model (CHECKPOINT in the primary and CHECKMONITOR in the
backup) works so long as all the state information is visible to the
application programmer or is handled by CHECKPOINT (e.g., file sync id).
Two examples of hidden information are file stream information and metadata
for dynamic memory allocation (e.g., pool tags). To put it another way, the
C runtime maintains information and so checkpointing the stack, buffers, and
Guardian file information is not sufficient. (COBOL has built checkpointing
into the library and so is different.)
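
To make the "hidden information" point concrete, here is a minimal sketch in
portable C. It makes no Guardian calls: checkpoint_bytes() is a hypothetical
stand-in for copying a range of memory to the backup, and the structure and
names are invented for illustration. It shows that copying the bytes of a
structure carries a pointer value across, but not the heap data behind it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Application state as the programmer sees it. */
struct app_state {
    int   counter;
    char *name;   /* points into the heap, which the C runtime manages */
};

/* Hypothetical stand-in for checkpointing a byte range to the backup:
   here it simply copies the bytes into a local "backup image". */
static void checkpoint_bytes(void *backup_image, const void *src, size_t len)
{
    memcpy(backup_image, src, len);
}

/* Returns 1 if the backup image holds its own copy of the name,
   0 if it merely holds the primary's pointer value, -1 on malloc failure. */
int backup_owns_name(void)
{
    struct app_state primary;
    struct app_state backup_image;
    int owns;

    primary.counter = 7;
    primary.name = malloc(8);
    if (primary.name == NULL)
        return -1;
    strcpy(primary.name, "orders");

    checkpoint_bytes(&backup_image, &primary, sizeof primary);

    /* The counter came across, but the name did not: the image only
       holds the same address, which means nothing in another process. */
    assert(backup_image.counter == 7);
    owns = (backup_image.name != primary.name);

    free(primary.name);
    return owns;
}
```

backup_owns_name() reports that the image does not own the name. The heap's
own bookkeeping is one level further removed, since the application has no
address for it at all.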
A reason to write a TAL active backup is that CHECKPOINT is a blocking
call. With care, it is sometimes possible for the primary of an active
backup to send asynchronous checkpoints and for the backup to reply before
it has completed processing the incoming message. Also, a carefully
constructed active backup needs less information than a typical passive
backup. An example of a pTAL active backup is PATHTCP2.
The "Guardian Programmer's Guide" addresses *only* active backups for C
programs. The statement "An active backup program must run in the Common
Run-Time Environment (CRE)." is true because this appears in a section
devoted to Fault-Tolerant Programming in C. It is not meant to mean that
*all* active backups must use the CRE but only that all C active backups
must use the CRE.
The primary-processhandle of FILE_OPEN_ is for use in an active backup. The
primary of an active backup would have to send a message to the backup,
instructing it to open a file with certain options. The backup would have
explicit code to receive the message and act on it. A passive backup would
use FILE_OPEN_CHECKPT_ and the backup would open the file "magically". (Of
course, there is no magic, so the backup open is done inside CHECKMONITOR.)
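
As a rough sketch of the active-backup case just described, the primary
builds an "open this file" message and the backup has explicit code that
acts on it. Everything here is an assumption made for illustration: the
message layout, the CKPT_* codes, the field names and the file name are
invented, and no Guardian procedure is called. Where the comment says so, a
real backup would do its own open, supplying the primary's process handle.

```c
#include <assert.h>
#include <string.h>

#define MAX_NAME 64

enum { CKPT_OPEN = 1, CKPT_STATE = 2 };   /* hypothetical message codes */

struct ckpt_msg {
    int  type;                  /* one of the CKPT_* values */
    char file_name[MAX_NAME];   /* used by CKPT_OPEN */
    int  nowait_depth;          /* used by CKPT_OPEN */
    long counter;               /* used by CKPT_STATE */
};

struct backup_state {
    char open_file[MAX_NAME];
    int  nowait_depth;
    int  file_is_open;
    long counter;
};

/* Primary side: build the message that tells the backup to open a file. */
void build_open_msg(struct ckpt_msg *m, const char *name, int nowait_depth)
{
    memset(m, 0, sizeof *m);
    m->type = CKPT_OPEN;
    strncpy(m->file_name, name, MAX_NAME - 1);   /* memset left the NUL */
    m->nowait_depth = nowait_depth;
}

/* Backup side: explicit code that receives a message and acts on it.
   Returns 0 on success, -1 for an unknown message type. */
int backup_handle(struct backup_state *b, const struct ckpt_msg *m)
{
    switch (m->type) {
    case CKPT_OPEN:
        /* A real backup would perform its own open of the file here. */
        strncpy(b->open_file, m->file_name, MAX_NAME - 1);
        b->open_file[MAX_NAME - 1] = '\0';
        b->nowait_depth = m->nowait_depth;
        b->file_is_open = 1;
        return 0;
    case CKPT_STATE:
        b->counter = m->counter;
        return 0;
    default:
        return -1;
    }
}
```

How the message travels from primary to backup is left out on purpose; in an
active backup that transport is also the programmer's responsibility.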
Also see my inline comments.
"Robert William Vesterman" <bob....@vesterman.com> wrote in message
news:5m3e70lb9dgp0rpbp...@4ax.com...
> On Fri, 09 Apr 2004 20:54:04 GMT, "Ben Voris \(replace
> myfirstname.mylastname in email\)" <myfirstname...@hp.com>
> wrote:
>
> >It is possible to write an active backup in both C and TAL. You might raise
> >a case if you believe the manuals say otherwise.
> >Look at the primary-processhandle argument of FILE_OPEN_(). Also look at
> >FILE_GETSYNCINFO_() and FILE_SETSYNCINFO_(). Of course, how you get the
> >open and sync information to the backup is entirely up to you. In fact,
> >nearly *everything* you do in an active backup program is up to you.
>
> Thank you for the response. I think that I am misunderstanding
> something here, though.
>
> "Active" backup programming, as defined by the Guardian Programmer's
> Guide, seems to have to do with calls such as _ns_fopen_special,
> _ns_backup_fopen, and so forth. According to the Guardian C Library
> Calls Reference Manual, these are applicable only for ANSI-C style
> file streams (i.e. FILE * streams, not Guardian FILE_OPEN_-based I/O).
This is because this manual discusses only active backups written in C.
This is not meant to imply that active backups can only be written in C.
The __ns* functions are applicable only to a CRE C program (which includes
native C). Active backups have been written in TAL for a long time.
Because TAL has no runtime, they are entirely "roll your own."
>
> "Passive" backup programming, on the other hand, is CHECKMONITOR,
> CHECKPOINT and so forth. These functions, according to the Guardian
> Procedure Calls manual, are "not supported in C programs".
They do not work correctly for C programs. Too much of the state of a C
program is invisible to CHECKPOINT, etc.
>
> A look at CEXTDECS, however, seems to indicate that some of those
> calls - e.g. CHECKMONITOR - are supported (or, at least, have
> prototypes), and that some of them - e.g. CHECKPOINT - are supported
> only in TNS, not in TNS/R. So seemingly, contrary to what the manual
> says, CHECKMONITOR is okay, and CHECKPOINT is okay unless you compile
> with NMC?
CHECKPOINT is *not* OK for a C program.
>
> Now, let's look at your examples: "primary-processhandle" of
> FILE_OPEN_, and FILE_GETSYNCINFO_:
>
> In the manual's FILE_OPEN_ description, the "primary-processhandle"
> blurb contains the following statement:
>
> "This option is used only when the backup process is the caller. It
> is more common for the primary to perform this operation by a call to
> FILE_OPEN_CHECKPT_".
>
> But then FILE_OPEN_CHECKPT_ says that "this passive backup procedure
> is not supported in C programs". So is not the
> "primary-processhandle" parameter of FILE_OPEN_ related to passive,
> rather than active, backup?
The primary-processhandle parameter of FILE_OPEN_ is applicable only to an
active backup. FILE_OPEN_CHECKPT_ can only be successfully called by the
primary of a passive backup.
>
> Then FILE_GETSYNCINFO_ says, "Typically, FILE_GETSYNCINFO_ is not
> called directly by application programs. Instead, it is called
> indirectly by CHECKPOINT".
>
> And again, CHECKPOINT is related to passive - not active - backup, and
> isn't supported by C (according to the manual), or by TNS/R (according
> to CEXTDECS).
>
> So maybe you're saying that "active" backup programming is actually
> some combination of stuff like _ns_backup_fopen along with some, but
> not all, of the calls typically associated with passive backup
> programming? That is, with, for example, FILE_OPEN_CHECKPT_, but not
> CHECKPOINT?
See above.
>
> If so, the Guardian Programmers' Guide doesn't seem to contain any
> examples of how to mix the two things. It only contains a couple of
> far-too simplistic examples using only the _ns_* calls. There seems
> to be basically no information in there about how to use the "old"
> calls, let alone the old calls in combination with the new calls.
>
There are no examples of mixing the two models because it doesn't work.
You are not missing anything. The HP NonStop Server documentation,
however, is missing a lot, and tends to make up for a lack of system
understanding by making up things that simply aren't true. Truly, the
current "NonStop" documentation sucks... The C/C++ Programmer's Guide
(Section 4, page 8) describes the "active" backup model as
follows:
Quote
In active backup programming, processes run in pairs: a primary
process that performs the tasks of the underlying application, and a
backup process that is ready to take over execution from the primary
process if the primary process or processor fails. Active backup
programs have the following characteristics:
• Active backup uses process pairs to achieve fault tolerance.
• The primary process sends state information to the backup process.
State information is information about the run-time environment that
is required for the backup process to take over for the primary
process.
• The backup process receives state information from the primary
process, detects a failed primary process or CPU, and takes over
execution.
An active backup program executes as a primary and backup process pair
running the same program file. The primary and backup processes
perform interprocess communication. The primary process sends
critical data to the backup process. This critical data serves two
purposes: to provide sufficient information to enable the backup
process to resume application processing (file state and application
state information), and to indicate to the backup process where it
should logically resume application.
End quote
This looks good at first glance, until one realizes that all one has
to do in order to have this section of text describe the so called
"passive backup" model is: replace the word "active" with the word
"passive"! What the above actually describes are the basic principles
for all fault tolerance processing, never mind if the "passive" or
"active" model is used.
I have been programming fault tolerant TNS process pairs since 1976.
The above is a perfectly good description of (a small part of) what
all fault tolerant process pairs must do. What the
neo-faulttolerantists call a "passive" backup (that is, a process
using the Guardian checkpointing facilities) is just as "active" as
one that does not; both still have to do exactly the same thing.
To say that the passive model uses checkpointing and that the "active
model" uses inter-process communication to keep the backup ready for
take-over simply reveals that the writer doesn't have a clue as to
what s/he's writing about: the Guardian checkpointing facilities
themselves use inter-process communication to get the checkpoint data
across.
F.Y.I., the term "active backup" was coined to describe how processes
like DP2 get away with doing one thing in the primary (DP2: disk file
access management) and another in the backup (DP2: TMF related stuff,
plus of course handling the checkpoint information that the primary
regularly sends across). The "passive model" is only "passive" in the
sense that it never does any application specific processing, it only
processes checkpoint and system messages that indicate a primary
failure. That is, with an "active" backup you can do really nifty
things, like letting the backup do co-processing, in a true parallel
fashion.
Worse still: in almost all manuals dealing with any aspect of fault
tolerance, there is a statement that is repeated as many times as
there are references to fault tolerance, namely: "Refer to the
Guardian Programmer's Guide for complete details on active backup
programming in C." Complete details, that is such a joke: the
examples given are way too simplistic (keeping a counter in sync?
Wow!), and no guidelines at all are given as to what else a fault
tolerant process pair must do, such as backup process management,
requester management, message management, properly handling duplicate
messages (detecting, upon a takeover, whether a failed-in-midstream
request was completed or not in the primary before it failed),
misrouted, and obsolete messages, and lots of other details. The art
of fault tolerance programming has been considered, and maybe with
some justification, rather difficult. But with good documentation, it
doesn't have to be!
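
One of the items above, detecting after a takeover whether a request had
already been completed, can be sketched in a few lines of portable C. This
is an illustration under stated assumptions, not how any particular product
does it: the per-requester sequence number, the requester_slot record and
do_work() are all hypothetical, and the checkpoint itself is only marked by
a comment.

```c
#include <assert.h>

/* Per-requester record; the primary checkpoints it before replying. */
struct requester_slot {
    long last_seq;     /* last completed request; 0 means none yet */
    int  last_reply;   /* reply that was sent for that request */
};

static int work_count;   /* how many times the real work was performed */

int get_work_count(void) { return work_count; }

/* Placeholder for the application's real, non-repeatable work. */
static int do_work(int input)
{
    work_count++;
    return input * 2;
}

/* Handle one request and return the reply. A request whose sequence
   number matches the last completed one is a retry: resend the saved
   reply instead of doing the work a second time. */
int handle_request(struct requester_slot *slot, long seq, int input)
{
    assert(seq > 0);
    if (seq == slot->last_seq)
        return slot->last_reply;

    slot->last_reply = do_work(input);
    slot->last_seq = seq;
    /* A real primary would checkpoint *slot to the backup here,
       and only then reply to the requester. */
    return slot->last_reply;
}
```

If the backup holds a copy of the slot when it takes over, a retried request
with the same sequence number gets the saved reply back and the work is not
repeated.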
The manual Availability Guide For Application Design (Section 6 Page
1) offers the following sparkling gem:
Quote
This section provides an overview of process pairs in application
design and some guidelines about when it might be appropriate to use
them. Two models are described:
• Passive backup, in which the backup process passively waits to be
activated in the event that the primary process fails
• Active backup, in which the backup process actively updates its
state while listening for system messages
Most process pairs involve copying state information from the primary
process to the backup process. Because the mechanisms for doing this
are significantly different, different terminology is used to describe
each mechanism. For passive backup process pair, we talk about
checkpointing information to the backup. For active backup process
pairs, we talk about updating state information in the backup.
End quote
First: a "passive backup" does not "passively wait" for the primary to
fail. It receives state information from its primary, in checkpoint
messages, which it uses to actively update its own process state.
Second: just like a "passive" backup, an "active" backup also receives
state information from its primary, which it uses to actively update
its own process state (however, it does not do this "while listening
for system messages"; it serially processes one message at a time
from the receive queue).
Third: in both cases, after updating the backup process state with
process state information received in a checkpoint message, the backup
goes back to waiting for system messages, and for more checkpoint
messages.
Fourth: the "mechanisms for doing this" (that is, for copying state
information from the primary to the backup), are *exactly the same*
(not even slightly different), namely use WRITE() or WRITEREAD() in
the primary, to send state information to the backup, and in the
backup use READUPDATE() to read the information off the process
message queue, apply the process state updates (whatever), then call
REPLY() to tell the primary that it can safely proceed. The only
"significant difference" is that in the passive model, this backup
processing is done inside CHECKMONITOR(), and in the active model it
must be programmed by the user; it is still utilizing exactly the same
mechanism.
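
The read, apply, reply cycle described above can be sketched as a loop. This
is a simulation in portable C, not Guardian code: read_next() and reply()
are hypothetical stand-ins for reading the receive queue and replying to the
primary, the queue is just an array, and the message kinds are invented for
the example.

```c
#include <assert.h>
#include <stddef.h>

enum msg_kind { MSG_CHECKPOINT, MSG_PRIMARY_DOWN };

struct message {
    enum msg_kind kind;
    long counter;        /* state carried by a checkpoint message */
};

struct queue {
    const struct message *msgs;
    size_t count;
    size_t next;
    size_t replies;      /* number of replies sent to the primary */
};

/* Stand-in for reading the next message from the receive queue. */
static const struct message *read_next(struct queue *q)
{
    return q->next < q->count ? &q->msgs[q->next++] : NULL;
}

/* Stand-in for replying so that the primary can safely proceed. */
static void reply(struct queue *q)
{
    q->replies++;
}

/* Backup loop: one message at a time, apply the update, then reply.
   Returns 1 when the backup must take over, 0 if the simulated queue
   drained without a failure notification. */
int backup_loop(struct queue *q, long *state)
{
    const struct message *m;

    while ((m = read_next(q)) != NULL) {
        if (m->kind == MSG_PRIMARY_DOWN)
            return 1;
        *state = m->counter;   /* apply the process state update */
        reply(q);
    }
    return 0;
}
```

In the passive model this loop is, in effect, what happens inside
CHECKMONITOR(); in the active model the application supplies it.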
Again, the above quoted text implies that there is some magic
difference between checkpointing information and updating process
state information, but in fact there is no difference whatsoever!
Checkpoint information *is* process state information, and to get that
information from the primary to the backup, inter-process
communication is used. (Sheesh!)
For people new to the TNS platform, this *has* to be very confusing
(as your questions also indicate!). Part of the reason for this is,
probably, that today it may be rather difficult to find an
analyst/programmer (or technical writer ;)) that understands fault
tolerant programming.
So let me try to give you the real story on what can and cannot be
done in C!
First of all: the only difference between the so called "active" and
"passive" models is that the "active" model (which is, by the way,
what most Guardian system processes use) does not use the Guardian
checkpointing facilities. Instead, the backup reads its receive queue
by calling READUPDATE(), just as the primary does, and the programmer
must implement the code that handles user messages received by the
backup (that is, the checkpoint messages, containing process state
information; assuming that it has been properly coded, no other user
messages are ever processed by a "passive" backup).
When the Guardian checkpointing facilities are used, the backup never
exits procedure CHECKMONITOR() unless there is an error condition (the
primary sends a checkpoint message before doing the initial stack
checkpoint) or the process has just taken over as primary.
CHECKMONITOR() handles the process state updates and associated system
messages, whereas in the "active backup" model, the work done by
CHECKMONITOR() must be implemented by user code.
The key to understanding the difference is to know *why*
CHECKMONITOR() is not supported in C (the manuals don't tell us this):
the reason is that by the time the function main() is given control by
the C runtime library (or the Common Run-Time Environment (CRE), as
the case may be), $receive has already been opened (by the RTE),
messages have already been read, and it is simply *too late* to call
CHECKMONITOR()!
However, this does *not at all* mean that you can't code fault
tolerant process pairs using the Guardian checkpointing facilities
from C.
All you need to do is to write a diminutive TAL procedure with the
attribute MAIN, and call CHECKMONITOR() from there. After this,
program away in C, and feel free to use all the "not supported"
functions (that label is a blatant cop-out: the writer didn't want to
learn enough about the system to give the customers, and internal HP
programmers, correct information!). As you have already noted, the
headers for the Guardian checkpointing facility functions are all
there (in $system.system.cextdecs), and that's pretty good proof that
they are, indeed, supported!
There is one restriction, however, if you wish to use native mode C
(nmc): in this case, you cannot use a TAL MAIN procedure, as native
mode requires a C main() function.
So, as a consequence, for native mode C (nmc) the "active" model is
*required*.
Your observations and questions tell me that although new to fault
tolerance, you are quite well versed in TNS system "navigation," and
you ask (not at all in an argumentative way, btw) good and well
pointed questions, all arising from the lack of proper documentation
(and outright misleading ditto).
Even though I know it may be frowned upon by some, I'll throw in a
little "commercial":
My company, MicroTech Consulting, has developed a commercially
available software package called "The C/TAL Non Stop Environment with
Multi Threading Support" (NSE/MT for short). This fault tolerance
application development tool allows you to write fault tolerant
applications in C/nmc or TAL/pTAL (NSE/MT comes with two libraries
containing exactly the same functions, one used for TNS mode (nselib)
and one used for native mode (nselibr)), with the only restriction
being: when using nmc, the function named main() must be written in
nmc, and no function may have the attribute MAIN.
Furthermore, with NSE/MT you can code fault tolerant C applications
with no need to know anything about esoteric proprietary Guardian
stuff: just hire a C programmer, and with NSE/MT, that programmer will
quickly become productive!
NSE/MT handles *all* aspects of fault tolerance, except of course the
checkpoint messages—these must still be put in their proper places
(and receive the proper information!) by the "user code" programmer.
On our website (URL www.microtechnonstop.com) you will find all the
information you need. Simply send us an email with the manufacturing
serial number of your NonStop Server system—you'll find it in the EDIT
file $system.sysnn.rlseid (where nn is the current load id)—and we'll
send you a time-restricted (60 days, unless we negotiate something
else) NSE/MT license, so you can evaluate the usefulness of the
software in your own environment.
Good Luck!
Henry Norman
MicroTech Consulting
www.microtechnonstop.com
PS. With a name like Vesterman, I'm willing to bet that one of your
ancestors came from Roslagen, the northern part of the Baltic Sea
archipelago outside Stockholm… True or false?
>Hello Robert!
>
<snip mostly good stuff>
>The key to understanding the difference is to know *why*
>CHECKMONITOR() is not supported in C (the manuals don't tell us this):
>the reason is that when the function main() gets control by the C
>runtime library (or common runtime library (CRE), as the case may be),
>$receive has already been opened (by the RTE), messages have already
>been read, and it is simply *too late* to call CHECKMONITOR()!
>However, this does *not at all* mean that you can't code fault
>tolerant process pairs using the Guardian checkpointing facilities
>from C.
>
>All you need to do is to write a diminutive TAL procedure with the
>attribute MAIN, and call CHECKMONITOR() from there. After this,
>program away in C, and feel free to use all the "not supported" (a
>blatant cop-out, the writer didn't want to learn enough about the
>system to enable her/him to give the customers (and internal HP
>programmers!) correct information!). As you have already noted, the
>headers for the Guardian checkpointing facility functions are all
>there (in $system.system.cextdecs), and that's pretty good proof that
>they are, indeed, supported!
>
>There is one restriction, however, if you wish to use native mode C
>(nmc): in this case, you cannot use a TAL MAIN procedure, as native
>mode requires a C main() function.
>
>So, as a consequence, for native mode C (nmc) the "active" model is
>*required*.
This is incorrect. One can use a small pTAL routine as the main
program and link with nmc. Both pTAL and nmc create code 700 files
and the nld linker will be quite happy to join them into an
executable.
The keys to successfully using the passive mode backups with C are
1) Link a small TAL or pTAL main program which doesn't do anything
except call your first C function where you logically want program
execution to start.
2) Because of (1), no run time environment has been set up that
some C RTL functions might need, particularly the fprintf
family of routines.
3) Because of (2), you must not use any C standard library routines
that might depend on the environment having been created. This
means no C fopen, fclose or other C-file i/o calls or memory
management functions (malloc/calloc, etc). Besides, even if you
did manage to get the environment created, you wouldn't have the
addresses of what to CHECKPOINT or know when to CHECKPOINT.
4) Stay with standard Guardian I/O calls (FILE_OPEN_, READX, WRITEX,
etc) just as you would with a TAL program and CHECKPOINT your
i/o buffers and stack just as you would normally.
5) Allocate your own memory segments (flat or selectable) and use
jacketed DEFINEPOOL/GETPOOL/PUTPOOL or your own allocator
routines for dynamic memory management.
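
To illustrate point 5, here is a minimal sketch of an allocator whose whole
state lives in one contiguous block, so that the block can be copied to the
backup as a single byte range. It is written in portable C and is only an
assumption about how such an allocator might look: the names pool_alloc()
and pool_ptr() are invented, it is a simple "bump" allocator with no way to
free individual items, and it does not use DEFINEPOOL/GETPOOL/PUTPOOL or a
real memory segment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define POOL_BYTES 4096
#define POOL_NONE  ((size_t)-1)

/* All allocator state is inside this one structure; zero it before use. */
struct pool {
    size_t used;                      /* bytes handed out so far */
    unsigned char data[POOL_BYTES];
};

/* Allocate n bytes. Returns an offset into pool->data, or POOL_NONE if
   the pool is full. Offsets, unlike pointers, stay valid in a copy. */
size_t pool_alloc(struct pool *p, size_t n)
{
    size_t rounded = (n + 7u) & ~(size_t)7u;   /* keep 8-byte spacing */
    size_t off;

    if (rounded < n || rounded > POOL_BYTES - p->used)
        return POOL_NONE;
    off = p->used;
    p->used += rounded;
    return off;
}

/* Turn an offset into a pointer within a particular copy of the pool. */
void *pool_ptr(struct pool *p, size_t off)
{
    assert(off < POOL_BYTES);
    return p->data + off;
}
```

Because items are referred to by offset, a byte-for-byte copy of the
structure in another process is immediately usable there, which is the
property that makes it possible to checkpoint.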
If you have any specific questions, feel free to ask.
Oz