Intracare Implementation Log Episode 8: Power outage restart.

Ignacio Valdes

Aug 14, 2008, 6:49:15 PM
to hardhats
Hello all, We had a power outage that took the server completely down.
Here is a terminal session dump of what we had to do to get Taskman
restarted. The only thing that isn't as it is below is that one enters
't' and gets Taskman up and running. The session shows several useful
commands, such as how to find out where GT.M is installed, and shows how
I was under the right user ID but in the wrong directory, /home/ivaldes
instead of /home/vista/EHR.

login as: ivaldes
ivaldes@IP address password:
Last login: Thu Aug 14 16:59:22 2008 from south124.ich.local
[ivaldes@vista ~]$ su
Password:
[root@vista ivaldes]# su vista
[vista@vista ivaldes]$ gtm

GTM>S DUZ=9

GTM>D ^XUP

Setting up programmer environment
GTM>W $ZS
150374954,XUP+4^XUP,%GTM-E-REQRUNDOWN, Error accessing database /home/vista/EHR/g/mumps.dat. Must be rundown on cluster node vista.ich.local.
GTM>h
[vista@vista ivaldes]$ echo $GTM_DIST

[vista@vista ivaldes]$ alias GTM
alias GTM='/usr/local/gtm/mumps -direct'
[vista@vista ivaldes]$ alias gtm
alias gtm='/usr/local/gtm/mumps -direct'
[vista@vista ivaldes]$ /usr/local/gtm/mupip rundown
[vista@vista ivaldes]$ /usr/local/gtm/mupip rundown -r "*"
%GTM-I-MUFILRNDWNSUC, File /home/vista/EHR/g/mumps.dat successfully rundown
[vista@vista ivaldes]$ gtm

GTM>S DUZ=9

GTM>D ^XUP

Setting up programmer environment
This is a TEST account.

Terminal Type set to: C-VT320

Select OPTION NAME: EVE
1 EVE Systems Manager Menu
2 EVENT CAPTURE (ECS) EXTRACT AU ECX ECS SOURCE AUDIT Event Capture
(ECS) Extract Audit
3 EVENT CAPTURE DATA ENTRY ECENTER Event Capture Data Entry
4 EVENT CAPTURE EXTRACT ECXEC Event Capture Extract
5 EVENT CAPTURE MANAGEMENT MENU ECMGR Event Capture Management Menu
Press <RETURN> to see more, '^' to exit this list, OR
CHOOSE 1-5: 1 EVE Systems Manager Menu

WARNING -- TASK MANAGER DOESN'T SEEM TO BE RUNNING!!!!

Select Systems Manager Menu Option: taskman Management

WARNING -- TASK MANAGER DOESN'T SEEM TO BE RUNNING!!!!

Select Taskman Management Option: taskman Management Utilities

Select Taskman Management Utilities Option: r
1 Remove Taskman from WAIT State
2 Restart Task Manager
CHOOSE 1-2: 2 Restart Task Manager
ARE YOU SURE YOU WANT TO RESTART TASKMAN? NO//YES (YES)
Restarting...%GTM-E-JOBFAIL, JOB command failure
%GTM-I-TEXT, Error redirecting stdout (creat) to _ZTM0.mjo
%SYSTEM-E-ENO13, Permission denied

%GTM-E-JOBFAIL, JOB command failure
%GTM-I-TEXT, Failed to set STDIN/OUT/ERR for the job

GTM>h
[vista@vista ivaldes]$ whoami
vista
[vista@vista ivaldes]$ pwd
/home/ivaldes
[vista@vista ivaldes]$ cd /home/vista
[vista@vista ~]$ cd log
bash: cd: log: No such file or directory
[vista@vista ~]$ echo $gtmgbldir
/home/vista/EHR/g/mumps.gld
[vista@vista ~]$ cd EHR
[vista@vista EHR]$ ls
env2 g logs o r WVEHR-gui WVEHR-gui.log WVEHR-VOE1.0-GTM-Routines.tgz
[vista@vista EHR]$ cd logs
[vista@vista logs]$ ls
XWBTCPL.mje XWBTCPL.mjo
[vista@vista logs]$ gtm

GTM>S DUZ=9

GTM>D ^XUP

Setting up programmer environment
This is a TEST account.

Terminal Type set to: C-VT320

Select OPTION NAME: EVE
1 EVE Systems Manager Menu
2 EVENT CAPTURE (ECS) EXTRACT AU ECX ECS SOURCE AUDIT Event Capture
(ECS) Extract Audit
3 EVENT CAPTURE DATA ENTRY ECENTER Event Capture Data Entry
4 EVENT CAPTURE EXTRACT ECXEC Event Capture Extract
5 EVENT CAPTURE MANAGEMENT MENU ECMGR Event Capture Management Menu
Press <RETURN> to see more, '^' to exit this list, OR
CHOOSE 1-5: 1 EVE Systems Manager Menu

WARNING -- TASK MANAGER DOESN'T SEEM TO BE RUNNING!!!!

Select Systems Manager Menu Option: taskman Management

WARNING -- TASK MANAGER DOESN'T SEEM TO BE RUNNING!!!!

Select Taskman Management Option: taskman Management Utilities

Select Taskman Management Utilities Option: r
1 Remove Taskman from WAIT State
2 Restart Task Manager
CHOOSE 1-2: 2 Restart Task Manager
ARE YOU SURE YOU WANT TO RESTART TASKMAN? NO//YES (YES)
Restarting...TaskMan restarted!


Select Taskman Management Utilities Option: mtm Monitor Taskman


Checking Taskman. Current $H=61222,63435 (Aug 14, 2008@17:37:15)
RUN NODE=61222,63423 (Aug 14, 2008@17:37:03)
Taskman is current..
Checking the Status List:
Node weight status time $J
EHR:vista RUN T@17:37:03 3863 Main Loop

Checking the Schedule List:
Taskman has 1 task scheduled.
It is not overdue.

Checking the IO Lists:
There are no tasks waiting for devices.

Checking the Job List:
There are no tasks waiting for partitions.
For EHR:CACHEWEB there is 1 tasks. Out Of Service

Checking the Task List:
There are no tasks currently running.
On node EHR:vista there is 1 free Sub-Manager(s). Status: Run

Enter monitor action: UPDATE// ^

Select Taskman Management Utilities Option: halt

Do you really want to halt? YES//


Logged out at Aug 14, 2008 5:37 pm
GTM>h
[vista@vista logs]$
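
The JOB failure in the middle of the session (Permission denied while creating _ZTM0.mjo) went away once GT.M was started from a directory the vista user could write to. A minimal sketch of a pre-flight check, assuming the JOB command writes its .mjo/.mje files into the current directory as the error text suggests; `check_job_dir` is a made-up helper name, not part of GT.M or VistA:

```shell
# check_job_dir DIR: succeed only if DIR exists and the current user can
# create files in it (hypothetical helper, for illustration only).
check_job_dir() {
    if [ -d "$1" ] && [ -w "$1" ]; then
        echo "ok: $(whoami) can write to $1"
    else
        echo "cannot write to $1; cd to a writable directory first" >&2
        return 1
    fi
}
```

Something like `check_job_dir "$PWD"` before starting GT.M would have flagged /home/ivaldes for the vista user in the session above.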

K.S. Bhaskar

Aug 14, 2008, 8:10:11 PM
to Hard...@googlegroups.com
Ignacio --

You really ought to consider journaling. See how it's set up on the
latest Toasters, for example, and see how simple it is. The Toaster has
a small shell script that automatically recovers the database from the
journal file on boot up and even starts up Taskman. Of course, if you
like to practice typing... 8-)

Regards
-- Bhaskar

On 08/14/2008 06:49 PM, Ignacio Valdes wrote:
>
> Hello all, We had a power outage with the server going totally down.
> Here is a terminal session dump of what we had to do to get the
> taskman re-started. The only thing that isn't as it is below is that
> one enters 't' and gets taskman up and running. This shows several
> useful commands such as how to find out where gtm exists as well as
> showing how I was in the right id but in the wrong user space
> /home/ivaldes instead of being in /home/vista/EHR
>

[KSB] <...snip...>

_____________

The information contained in this message is proprietary and/or confidential. If you are not the
intended recipient, please: (i) delete the message and all copies; (ii) do not disclose,
distribute or use the message in any manner; and (iii) notify the sender immediately. In addition,
please be aware that any message addressed to our domain is subject to archiving and review by
persons other than the intended recipient. Thank you.
_____________

I, Valdes

Aug 15, 2008, 9:15:26 AM
to Hardhats
Many years as a software engineer before medical school ruined the joy
of typing as well as video games for me... Can you please post the
script to this thread? -- IV

K.S. Bhaskar

Aug 15, 2008, 9:50:11 AM
to Hard...@googlegroups.com
Ignacio --

You can adapt the following to your needs. You will need to turn on
before-image journaling.

The script /etc/init.d/wvehrvoe10 is automatically executed by the
system when it is booted or shut down:
------------------------------------------------------------------------
#! /bin/bash
### BEGIN INIT INFO
# Provides: wvehrvoe10
# Required-Start: $local_fs
# Required-Stop: $local_fs
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: PIP V0.1
# Description: Starts and Stops WorldVistA EHR VOE/ 1.0
### END INIT INFO

# Author: K.S. Bhaskar <bha...@worldvista.org>

# Do NOT "set -e"

NAME=wvehrvoe10
PATH=/sbin:/usr/sbin:/bin:/usr/bin
DESC="WorldVistA EHR VOE/ 1.0"
SCRIPTNAME=/etc/init.d/$NAME


#
# Function that starts WorldVistA EHR VOE/ 1.0
#
do_start()
{
    su -c /opt/wvehrvoe10/gtm_V5.3-001_i686/wvehrstart wvehr
}

#
# Function that stops WorldVistA EHR VOE/ 1.0
#
do_stop()
{
    su -c /opt/wvehrvoe10/gtm_V5.3-001_i686/wvehrstop wvehr
}

case "$1" in
    start)
        do_start
        ;;
    stop)
        do_stop
        ;;
    restart|force-reload)
        do_stop
        do_start
        ;;
    *)
        echo "Usage: $SCRIPTNAME {start|stop|restart|force-reload}" >&2
        exit 3
        ;;
esac

:
------------------------------------------------------------------------

It calls the script /opt/wvehrvoe10/gtm_V5.3-001_i686/wvehrstart to
recover the database (effectively a no-op if it was shut down cleanly),
start Taskman, and remove journal files that are more than three days
old (this is for a demo; adjust to your needs):
------------------------------------------------------------------------
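#!/bin/bash
# Work from the directory that holds this script
cd `dirname $0`
# Clear leftover output/error files from earlier JOB commands
rm -f tmp/*.mj[oe]
source ./env
# Recover from the journal, re-enable before-image journaling, start Taskman
$gtm_dist/mupip journal -recover -backward g/mumps.mjl \
&& $gtm_dist/mupip set -journal="enable,on,before" -file g/mumps.dat \
&& ./run START^ZTMB
# Quote the pattern so the shell passes it to find unexpanded
find g -iname 'mumps.mjl_*' -mtime +3 -exec rm -v {} \;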
------------------------------------------------------------------------
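
Before letting a startup script delete old journal files, it can help to list what the `find` expression would match. A minimal sketch, assuming the same `g` directory and `mumps.mjl_*` naming as the script above; `list_old_journals` is a hypothetical helper name, and it uses the portable, case-sensitive `-name` test:

```shell
# list_old_journals DIR DAYS: print prior-generation journal files under DIR
# that were last modified more than DAYS days ago, without deleting anything.
list_old_journals() {
    find "$1" -name 'mumps.mjl_*' -mtime +"$2" -print
}
```

Running `list_old_journals g 3` from the environment directory shows the candidates; replacing `-print` with `-exec rm -v {} \;` gives the deleting form used in the script.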

The script /opt/wvehrvoe10/gtm_V5.3-001_i686/wvehrstop stops Taskman and
attempts a clean shut down (not always possible):
------------------------------------------------------------------------
#!/bin/bash
cd `dirname $0`
source ./env
./run STOP^ZTMKU <<EOF
y
y
h
EOF
sleep 5
ps -ef | grep mumps | grep -v grep | awk '{print $2}' | xargs kill 2>/dev/null
------------------------------------------------------------------------
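
The last line of wvehrstop is a plain text filter over `ps -ef` output. A small sketch of the same filter, run against canned input so it can be tried without any GT.M processes; `mumps_pids` is a hypothetical name and the sample process lines are invented for illustration:

```shell
# mumps_pids: read `ps -ef`-style lines on stdin and print the PID (second
# column) of every line mentioning "mumps", skipping the grep command itself.
mumps_pids() {
    grep mumps | grep -v grep | awk '{print $2}'
}

# Invented sample of `ps -ef` output, for illustration only.
sample='vista     3863     1  0 17:37 ?        00:00:00 /usr/local/gtm/mumps -direct
vista     3901  3800  0 17:38 pts/1    00:00:00 grep mumps
root       812     1  0 09:00 ?        00:00:00 /usr/sbin/sshd'

printf '%s\n' "$sample" | mumps_pids   # prints 3863
```

In the script the resulting PIDs go to `xargs kill`. With GNU xargs, adding `-r` avoids running `kill` with no arguments when nothing matches.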

I use a small script /opt/wvehrvoe10/gtm_V5.3-001_i686/env to set
environment variables:
------------------------------------------------------------------------
# env - file to be sourced to create VistA environment
#
# This temporary version of the commands to set up the VistA
# environment assumes that the parent and child use the same
# version of GT.M.

export gtmver=`basename $PWD`
if [[ -d ../parent ]] ; then
    pushd ../parent/$gtmver 1>/dev/null
    source ./env
    popd 1>/dev/null
fi

tmp=`dirname $PWD`
tmp0="$PWD/o($PWD/p $PWD/r $tmp/p $tmp/r)"

# If there is an existing $routines, this environment comes before it
if [[ -n $routines ]] ; then
    export routines="$tmp0 $routines"
else
    export routines="$tmp0"
fi

# If a mumps.dat exists (vs. mumps.dat.gz) then this a usable environment
if [[ -f $PWD/g/mumps.dat ]] ; then export vista_home=$tmp ; fi

source gtm/gtmprofile
export gtmgbldir=$PWD/g/mumps.gld
export gtmroutines="$routines $gtm_dist"
------------------------------------------------------------------------
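
To see what the env file puts into the routine search list, here is a small sketch that builds the same string for an arbitrary directory. It assumes the layout used above (an object directory `o` followed by source directories in parentheses); `build_routines` is a hypothetical helper, not part of GT.M:

```shell
# build_routines DIR [EXISTING]: print a routine search list for the
# environment rooted at DIR, placed ahead of any EXISTING list.
build_routines() {
    parent=$(dirname "$1")
    entry="$1/o($1/p $1/r $parent/p $parent/r)"
    if [ -n "${2:-}" ]; then
        echo "$entry $2"
    else
        echo "$entry"
    fi
}

build_routines /opt/wvehrvoe10/gtm_V5.3-001_i686
```

Passing an existing list as the second argument mirrors the parent/child case in the env file, where the child's directories are searched first.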

The net of this is that when the Toaster boots, the database is
recovered, and Taskman started. It doesn't matter whether the system
was shut down cleanly or whether it crashed. I suggest that production
VistA environments, especially in non-ASP environments, be set up along
the lines of the Toaster.

Regards
-- Bhaskar

On 08/15/2008 09:15 AM, I, Valdes wrote:
>
> Many years as a software engineer before medical school ruined the joy
> of typing as well as video games for me... Can you please post the
> script to this thread? -- IV
>

Nancy Anthracite

Aug 15, 2008, 10:30:47 AM
to Hard...@googlegroups.com
Note that using the script to start and stop VistA itself is not
recommended.

The menu system should be used for starting the system, and if you insist on
using a script, Expect would be preferable as it would use the menu system.
Currently, the correct routine that runs with the option used for Taskman in
the menu system is RESTART^ZTMB.

By using the menu system, you know as best as is possible that patches and
checks and balances will be taken into account.

There is a similar startup routine circulating for use with Cache that
directly calls routines for starting VistA.

Doing things the "easy way" looks great when you want to do a demo, but for
production systems, think seriously about using the menu system. You can
consolidate several items in the menu system into one menu if that would make
it easier for you, but please don't circumvent the checks and balances.


--
Nancy Anthracite

K.S. Bhaskar

Aug 15, 2008, 10:35:02 AM
to Hard...@googlegroups.com
Nancy --

Whether for production or for demo purposes, the reason to script
Taskman startup is to facilitate the packaging of VistA as an
appliance. Are you saying that the wvehrstart script should use
RESTART^ZTMB instead of START^ZTMB?

Regards
-- Bhaskar

Nancy Anthracite

Aug 15, 2008, 10:57:24 AM
to Hard...@googlegroups.com, K.S. Bhaskar
RESTART instead of START, yes.


--
Nancy Anthracite

kdt...@gmail.com

Aug 15, 2008, 6:51:03 PM
to Hardhats
Bhaskar,

I was looking through this script. It looks to me like you are
preloading responses for the mumps routine. I was trying to figure
out how to do this a year ago and never got a good answer.

So what are you doing here? It looks like you are redirecting
standard input. What does that EOF do?

Thanks
Kevin


#!/bin/bash
cd `dirname $0`
source ./env
./run STOP^ZTMKU <<EOF
y
y
h
EOF
sleep 5
ps -ef | grep mumps | grep -v grep | awk '{print $2}' | xargs kill
2>/dev/null




K.S. Bhaskar

Aug 15, 2008, 11:42:49 PM
to Hard...@googlegroups.com
The bash construct (which works on many shells) is, when there is a
command such as:

grvb -mbg kvtz <<GLZNOP
oinad
mnjbz
GLZNOP

it means run the command grvb -mbg kvtz, and as its STDIN (standard
input) feed the lines oinad and mnjbz. The GLZNOP on the command line
tells it the marker to look for, and the GLZNOP on a line by itself is a
marker that says no more input is available for the command. EOF is
just slightly more readable to programmers than GLZNOP, but the shell
doesn't care - it just matches the word after the << and the word on a
line by itself.
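
A minimal runnable sketch of the same construct. The `confirm` function is a made-up stand-in for a prompt-driven program such as the MUMPS routine in the stop script; it simply reads two answers from standard input:

```shell
# confirm: stand-in for an interactive program; it reads two answers from
# standard input, the way a prompt-driven routine would.
confirm() {
    read -r first
    read -r second
    echo "first answer: $first, second answer: $second"
}

# Everything between the two GLZNOP markers becomes confirm's standard input.
confirm <<GLZNOP
y
h
GLZNOP
```

This prints "first answer: y, second answer: h". One detail worth knowing: with an unquoted marker the shell still expands variables inside the fed lines, while quoting the marker (<<'GLZNOP') passes them through literally.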

Regards
-- Bhaskar

On 08/15/2008 06:51 PM, kdt...@gmail.com wrote:
>
> Bhaskar,
>
> I was looking through this script. It looks to me like you are
> preloading responses for the mumps routine. I was trying to figure
> out how to do this a year ago and never got a good answer.
>
> So what are you doing here? It looks like you are redirecting
> standard input. What does that EOF do?
>
> Thanks
> Kevin
>
>
> #!/bin/bash
> cd `dirname $0`
> source ./env
> ./run STOP^ZTMKU <<EOF
> y
> y
> h
> EOF
> sleep 5
> ps -ef | grep mumps | grep -v grep | awk '{print $2}' | xargs kill
> 2>/dev/null
>
>

kdt...@gmail.com

Aug 16, 2008, 9:54:00 AM
to Hardhats
VERY helpful! Thanks. This opens all kinds of possibilities....

Thanks again,
Kevin

Branden Tanga

Aug 17, 2008, 7:50:52 AM
to Hardhats
Hello,

While using GT.M journaling is a good idea, that doesn't necessarily
mean that you can always recover your VistA database. This is due to
the fact that GT.M journals on the GT.M level, which is sets and
kills. VistA operates at the Fileman and business logic level, where
one Fileman command is made up of multiple sets and kills.
Unfortunately, neither VistA nor Fileman has journaling at its own level.

So let's say that you have a task in taskman that is executing a
Fileman command, which in turn is made up of 10 GT.M sets. Your server
dies in the middle of that command, at GT.M set 5. GT.M journaling
will allow you to recover to GT.M set 5, but your Fileman call never
finished, and you cannot automatically roll back past GT.M set 1
because Fileman has no journal record of its own marking set 1. You
can manually roll back GT.M past set 1, but that means that YOU the
programmer have to know what was being executed, and know which GT.M
set you have to roll back to.

Now imagine if you have multiple tasks running concurrently when your
server goes down. GT.M will recover happy as a clam, but you will have
multiple Fileman calls in various states of completion. What if
rolling back past one Fileman call puts another Fileman call in an
invalid state? To my knowledge, you cannot roll forward or back
through a GT.M journal file based on process id (please correct me if
I am wrong here). So all your sets and kills across all your processes
are interspersed with each other in the GT.M log.

So what do you do? When I have lost a server and ended up with the
results of an incomplete Fileman call, I had to find the incomplete
globals and edit them appropriately. Luckily, for my close calls the
end user was available to tell me what they were doing. That made it
much easier to find what globals were affected. Thus I have never
rolled back through a GT.M journal as a result of server failure, I
have only moved forward fixing errors as I find them.

Apologies if you already knew this, but I'm not sure how many people
have thought of the ramifications caused by VistA not having a
journaling system of its own.


Branden Tanga

P.S. I know that GT.M has the capabilities for an application to
leverage its journal file, in essence bringing the journal file to the
level of your business logic. Unfortunately, VistA does not take
advantage of anything like that, and the VistA or Fileman routines
would have to be edited.

K.S. Bhaskar

Aug 18, 2008, 9:47:22 AM
to Hard...@googlegroups.com
Branden, this is not a GT.M issue, but rather, as you note, a
VistA/Fileman design issue, in that while the database engine can
provide recovery of database state, without the use of transaction
processing features by the application code, you are not guaranteed that
the database state is Consistent (referring to the ACID transaction
properties of Atomicity, Consistency, Isolation and Durability). I
don't know what a transaction might be in the health care arena, but
consider transferring $100 from your checking account to your savings
account that is implemented by subtracting $100 from your checking
account balance and adding $100 to your savings account balance. In the
event of a system crash, either both the subtraction and addition
operations should be reflected in the state of the database, or neither
should be reflected. It is not acceptable for one to be reflected and
the other not to be reflected. The MUMPS language provides TStart and
TCommit commands that you can bracket your code with and which provide
Atomicity. Thus, if the application logic is correct (in our example,
the transfer is implemented as a subtraction from one account and an
addition of the same amount to the other account), we have Consistency.
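
As a loose shell analogy of the all-or-nothing idea (not MUMPS TStart/TCommit, and not how GT.M implements transactions), the sketch below applies both halves of the transfer to a temporary copy and then renames it over the original in one step. The file format and the `transfer` name are assumptions made up for illustration:

```shell
# transfer FILE AMOUNT: move AMOUNT from checking to savings in FILE, which
# holds two lines of the form "checking 500" and "savings 100".
# The new balances are written to a temporary file and renamed over the
# original, so a reader sees either both changes or neither.
transfer() {
    tmpfile="$1.tmp.$$"
    awk -v amt="$2" '
        $1 == "checking" { $2 -= amt }
        $1 == "savings"  { $2 += amt }
        { print }
    ' "$1" > "$tmpfile" && mv "$tmpfile" "$1"
}
```

If the process dies before the `mv`, the original file is untouched; the rename is the single commit point, provided the temporary file is on the same filesystem as the original.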

As you note, VistA/Fileman does not use MUMPS transaction processing
commands, and therefore, when a database state is recovered from a
crash, it can, and likely will, be Inconsistent. Since VistA has been
designed this way, and has operated for years, my guess is that either
(a) from an application point of view, transaction Consistency is not
important - for example, if a system crashes during registration,
perhaps an incomplete registration means that the patient has to be
re-registered, and the consequence is simply an unused serial number,
or (b) there is application logic to search for and correct Inconsistencies.

It would be good to hear from some application experts on this topic.
Thank you very much.

Regards
-- Bhaskar

fred trotter

Aug 18, 2008, 11:12:00 AM
to Hard...@googlegroups.com
Is it a true statement that ACID compliance for VistA could be
implemented entirely in FileMan? Or would it require more fundamental
changes in other places?

The problem with Branden's story is that his workaround for a non-ACID
crash was to leverage extensive knowledge of how VistA works to figure
out where it was broken. Essentially these kinds of efforts prevent
the "kernelization" of VistA. Important details of how VistA/MUMPS
works are required in order to fix this type of problem. Issues like
these ensure that VistA usage grows only as fast as VistA "kernel"
expertise, and that grows slowly indeed.

If the VistA project cannot find a way past these kinds of issues it
will be eclipsed by other FOSS projects. Either by VistA-based efforts
like WebVistA (knowing that it is difficult to tell what that looks
like) or by other efforts like OpenMRS, Tolven and ClearHealth proper.

It seems clear that Bhaskar has done his part. He has exposed an API
from GTM to handle this issue.

What now?

--
Fred Trotter
http://www.fredtrotter.com

K.S. Bhaskar

Aug 18, 2008, 11:58:25 AM
to Hard...@googlegroups.com
Fred --

You are thinking like a programmer and not like a business person.
Remember that things like ACID properties (and more esoteric things like
two phase commit) are technologies intended to assist in business
continuity in the face of unplanned events. As a geek at heart, I keep
reminding myself that technology is only a means to an end, and not an
end unto itself. VistA (at least DHCP) existed well before ACID
properties and seems to run well. So, I think the questions to ask
(before imposing a requirement of ACIDity) are:

Do the business processes of health care require ACID transaction
properties or are the business processes inherently robust in the face
of non-Atomicity and non-Consistency? [Isolation and Durability are not
at issue here.] If this is the case, is a requirement of ACIDity like
requiring brake fluid for restaurants?

If the answer is that the business processes of health care (at least as
addressed by VistA) are not inherently robust in the face of
non-Atomicity and non-Consistency, then what mechanisms currently exist
in VistA that provide these requirements?

Until we look at the above questions first, looking at ACIDity is like
putting the cart before the horse. Branden was not the first to
experience a VistA system crash. Let's find out what others have done
before him after recovering from a crash.

Regards
-- Bhaskar

George Timson

Aug 18, 2008, 12:44:51 PM
to Hardhats


Fred Trotter asks:
> ...
> Is it a true statement that ACID compliance for VistA could be
> implemented entirely in FileMan? Or would it require more fundamental
> changes in other places?
> ...
No, it is not a true statement, because other VistA code changes the
database without going thru FileMan calls.

Fred comments:
> ...
> It seems clear that Bhaskar has done his part. He has exposed an API
> from GTM to handle this issue.
> ...

What Bhaskar exposed was transaction-processing syntax that has been
in the MUMPS Standard for a long time, but which the VA chose not to
use. GTM of course is to be commended for implementing the MUMPS
Standard! ;-)

Fred asks:
> ...
> What now?
> ...

Well, if someone wants to fund a man-year of retrofitting all VA code
with the TS and TC commands, maybe the VA would be willing to change
their (SAC) standard, and test and distribute hundreds of transaction-
processing changes to their code. But I doubt it, when they don't
even take bug-fixes and functionality enhancements from the outside.

Woodhouse Gregory

Aug 18, 2008, 12:47:36 PM
to Hard...@googlegroups.com
Production VistA systems normally use journalling. Other measures include the use of RAID and UPS devices. For historical reasons (lack of uniform support across MUMPS implementations) VistA systems have not used transactions. This is no longer the case, but there is plenty of legacy code out there that does not use transactions. Instead, it was/is necessary to restore journaled globals explicitly.

In response to Fred's question: Fileman does not provide ACID support directly: this needs to be handled by the underlying MUMPS system. The role of Fileman is to provide a higher level abstraction than MUMPS globals, and to provide various tools (import/export, reporting, query and update, etc.) Screenman and the Classic APIs also provide (character based) UI support.

fred trotter

Aug 18, 2008, 12:54:55 PM
to Hard...@googlegroups.com
On Mon, Aug 18, 2008 at 10:58 AM, K.S. Bhaskar <ks.bh...@fnis.com> wrote:
>
> Fred --
>
> You are thinking like a programmer and not like a business person.

No, exactly the opposite.


> As a geek at heart, I keep
> reminding myself that technology is only a means to an end, and not an
> end unto itself. VistA (at least DHCP) existed well before ACID
> properties and seems to run well.

Under the care and feeding of highly trained experts who do nothing else.

My point is not at all that we need ACID, my point is this:

If system crashes require in-depth knowledge of MUMPS/FileMan/VistA to
fix, then users cannot treat VistA as a "kernel". By "kernel" I mean a
reliable platform whose internal workings can safely be ignored if
certain requirements are respected (i.e. the right hardware, MUMPS
implementation, etc etc.)

It would be entirely fine for me to have the VistA community say

"Backup VistA every hour. If the system crashes, reinstall the most
recent good backup, and send an alert that 1 hour's worth of data has
been potentially lost"

That's not great... ACID would be better but that is what you had to
do with MySQL for a long time and is an acceptable work-around.

An unacceptable answer is

"Use your extensive understanding of VistA internal state to correct
the values of Globals that were in use at the time of the crash"

That answer implies that you must be a MUMPS expert to support VistA,
which is intractable. I am not a C expert but I use the C-based Linux
kernel all the time.

I am talking about a business problem in the context of one technical
solution, but my concern is about the business problem.

Woodhouse Gregory

Aug 18, 2008, 1:11:17 PM
to Hard...@googlegroups.com
On Aug 18, 2008, at 9:44 AM, George Timson wrote:




> Fred Trotter asks:
> > ...
> > Is it a true statement that ACID compliance for VistA could be
> > implemented entirely in FileMan? Or would it require more fundamental
> > changes in other places?
> > ...
> No, it is not a true statement, because other VistA code changes the
> database without going thru FileMan calls.

This is a perennial problem with VistA code. I've long argued that developers should
resist the urge to manipulate Fileman globals directly, but even if everyone stopped
today, there would still be plenty of code that bypasses Fileman. Another, perhaps
more insidious, problem is that developers and systems personnel often manipulate globals to
correct errors ("crashes").


> ...
>
> Well, if someone wants to fund a man-year of retrofitting all VA code
> with the TS and TC commands, maybe the VA would be willing to change
> their (SAC) standard, and test and distribute hundreds of transaction-
> processing changes to their code. But I doubt it, when they don't
> even take bug-fixes and functionality enhancements from the outside.


The SAC has been revised to allow the use of TS and TC, but that doesn't address the legacy code
problem (the issue you address above).

"Judge a man by his questions not 
his answers."   --Voltaire





Steven McPhelan

Aug 18, 2008, 1:35:06 PM
to Hard...@googlegroups.com
George stated "...if someone wants to fund a man-year of retrofitting all VA code with the TS and TC commands.."  I understand that George was making a different point.  I do not think that one man year is even close to sufficient time to rewrite all the existing VA code to be TP compliant.  To make the changes, QA it, and release it would be a very large task indeed.  Then it does no good as George implied to undertake such a task and to not put in place the structure to mandate and enforce that all new code from that point forward would only use TP procedures.
 
All of this is predicated upon the assumption that load testing of such rewritten code to be TP compliant shows that there is no decrease in the number of the transactions filed per time period without the requirement to upgrade the hardware to handle TP vs non-TP processing.  I won't get into the practical issues of how the existing code would handle TP rollbacks because the filing failed.  For good or bad, many VistA programs file data and proceed on with no checks to see if the filing of the data was indeed successful.


--
Steve
"Rest satisfied with doing well, and leave others to talk of you as they please." - Pythagoras

fred trotter

Aug 18, 2008, 1:45:35 PM
to Hard...@googlegroups.com
>
> So what do you do? When I have lost a server and ended up with the
> results of an incomplete Fileman call, I had to find the incomplete
> globals and edit them appropriately. Luckily, for my close calls the
> end user was available to tell me what they were doing. That made it
> much easier to find what globals were affected. Thus I have never
> rolled back through a GT.M journal as a result of server failure, I
> have only moved forward fixing errors as I find them.

Ok,
I will make my question more specific. Is this paragraph
illustrative of how to handle a crash moving forward? If this is how
crashes are handled, then this is a problem. If there is another
procedure that can be followed, then it is important enough to have a
description on the WorldVistA wiki. Or to have a link from the wiki to
an already published solution. To help, I have created the page:

http://vistapedia.net/index.php?title=Restoring_a_VistA_installation


HTH,
-FT

fred trotter

Aug 18, 2008, 2:07:19 PM
to Hard...@googlegroups.com
Going on to discuss the pure technical issue:

Is there no way to do this on a meta level? What about executing TS
and TC commands before and after every routine, so that at a minimum
you know roughly in which routine the failure took place?

Perhaps you could have some "named idle journal". So that you could
automatically roll back to a time when at the least nothing was
happening on the system.

Any time I suggest something like this I usually get back that
something like this already happens, or Bhaskar tells me that GTM
already does something like this. I know I am way way over my head
with regards to how MUMPS works....

Woodhouse Gregory

Aug 18, 2008, 2:15:40 PM
to Hard...@googlegroups.com

On Aug 18, 2008, at 8:58 AM, K.S. Bhaskar wrote:

> Do the business processes of health care require ACID transaction
> properties or are the business processes inherently robust in the face
> of non-Atomicity and non-Consistency? [Isolation and Durability are not
> at issue here.] If this is the case, is a requirement of ACIDity like
> requiring brake fluid for restaurants?
>
> If the answer is that the business processes of health care (at least as
> addressed by VistA) are not inherently robust in the face of
> non-Atomicity and non-Consistency, then what mechanisms currently exist
> in VistA that provide these requirements?


This is interesting. It seems uncontroversial that database integrity is a requirement for health information systems (for example, we wouldn't want a penicillin allergy to be "lost"). In the ACID model, I would be hard pressed to say which of the four properties (atomicity, consistency, isolation and durability) can be dispensed with. But what is less obvious is that the ACID approach is the only route to database integrity. The latest ACM Queue takes this on with a little column whimsically entitled "BASE: an alternative to ACID".


Results like the CAP theorem have interested me for some time, given that I am interested in (developing) alternatives to heavy-handed approaches to database consistency like message ordering (frequently employed in HL7).

Anyway, the CAP theorem is just another version of a well-known dilemma in database programming: in choosing between the 2-phase and 3-phase commit, you are forced to choose between an algorithm that can fail, even when updating the database is safe, and one that can block indefinitely.

"It is never too late to become reasonable
and wise; but if the insight comes too late,
there is always more difficulty in starting 
the change." -- Immanuel Kant




Woodhouse Gregory

Aug 18, 2008, 2:23:46 PM
to Hard...@googlegroups.com

On Aug 18, 2008, at 8:58 AM, K.S. Bhaskar wrote:


Free associating a bit, I can't help but think of a famous result in (mathematical) model theory called Löb's Theorem. It states that a system cannot assert its own soundness without being inconsistent. 

Woodhouse Gregory

Aug 18, 2008, 2:33:19 PM
to Hard...@googlegroups.com
Basically, you're running into the legacy code problem. Modern MUMPS implementations do support ACID transactions, but this facility was not available when the bulk of VistA was developed. This has led to a controversy between people arguing that it is not feasible to build transaction support into VistA, and people (like me) who argue that it is essential to do so. Unfortunately, this generally mutates into a highly emotional debate over the use of MUMPS, which is not the point at all.

"Mathematics is the science of patterns."
--Lynn Arthur Steen, 1988





Woodhouse Gregory

Aug 18, 2008, 2:49:15 PM
to Hard...@googlegroups.com
On Aug 18, 2008, at 11:07 AM, fred trotter wrote:

> Is there no way to do this on a meta level? What about executing TS
> and TC commands before and after every routine. So that at a minimum
> you know roughly in which routine the failure took place.


This is a good question. It shouldn't be difficult to write a meta-interpreter of the type you describe, though I'm unsure what the performance implications would be. 
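
One way to picture the wrapper idea, as a toy model in Python rather than M: each routine runs between a snapshot (standing in for TS) and a commit (TC), and a failure restores the snapshot and reports the routine's name. The `Database` class, the routine names and the sample data are all made up for illustration; nothing here is GT.M's or VistA's actual API, and the sketch says nothing about the performance question.

```python
import copy


class Database:
    """Toy stand-in for a store of MUMPS globals; not GT.M itself."""

    def __init__(self):
        self.globals = {}


def run_in_transaction(db, routine_name, routine):
    """Run one routine as a single unit of work.

    Returns None on success. On failure, undoes the routine's partial
    updates and returns the routine name so the caller knows where
    the failure took place.
    """
    snapshot = copy.deepcopy(db.globals)  # stand-in for TS (TSTART)
    try:
        routine(db)
    except Exception:
        db.globals = snapshot             # stand-in for a rollback
        return routine_name
    return None                           # stand-in for TC (TCOMMIT)


def register_patient(db):
    # Hypothetical routine that completes normally.
    db.globals[("DPT", 1, 0)] = "DOE,JOHN"


def add_allergy(db):
    # Hypothetical routine that fails after a partial update.
    db.globals[("ALLERGY", 1, 0)] = "PENICILLIN"
    raise RuntimeError("simulated crash halfway through the routine")


db = Database()
routines = [("REGISTER", register_patient), ("ALLERGY", add_allergy)]
failed = [name for name, fn in routines if run_in_transaction(db, name, fn)]
print(failed)
```

In this model the completed routine's update survives, the failed routine's partial update is gone, and `failed` names the routine to look at.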

fred trotter

Aug 18, 2008, 3:33:08 PM
to Hard...@googlegroups.com
I agree that ACID vs no ACID is probably a waste of time. Any
practical suggestions for workarounds for VistA rebuilding?

Woodhouse Gregory

Aug 18, 2008, 3:39:32 PM
to Hard...@googlegroups.com
It's close - far too close for my comfort. Production systems should always be journaled, but I suspect 
many people here who may be developers, or who may be just "kicking the tires", may not enable journaling.

"Think globally, act locally."
--René Dubos





Chris Richardson

Aug 18, 2008, 5:23:35 PM
to Hard...@googlegroups.com
Well, guys, there is nothing left to do but contact your congressmen about
this and start a grass-roots effort to get this funding. It would be
embarrassing if a foreign government might pay for our software to be
properly updated.

Branden Tanga

Aug 21, 2008, 11:27:34 PM
to Hardhats
Sorry to bring up a seemingly dead topic, but I haven't kept up with
this thread over the past few days.



On Aug 18, 7:11 am, Woodhouse Gregory <gregory.woodho...@gmail.com>
wrote:
> On Aug 18, 2008, at 9:44 AM, George Timson wrote:
>
>
>
> > Fred Trotter asks:
> >> ...
> >> Is it a true statement that ACID compliance for VistA could be
> >> implemented entirely in FileMan? Or would it require more fundamental
> >> changes in other places?
> >> ...
> > No, it is not a true statement, because other VistA code changes the
> > database without going thru FileMan calls.
>
> This is a perennial problem with VistA code. I've long argued that
> developers should resist the urge to manipulate Fileman globals
> directly, but even if everyone stopped today, there would still be
> plenty of code that bypasses Fileman. Another, perhaps more insidious,
> problem is that developers and systems personnel often manipulate
> globals to correct errors ("crashes").

I don't see code that directly edits globals as the major issue. The
main problem, as I see it, is that having transactional processing
built into Fileman is not good enough to be able to safely roll back
and forward through a VistA log. In the same way that a single Fileman
call is made up of multiple Mumps sets and kills, a single VistA
transaction can be made up of multiple Fileman calls. So you would
need code in VistA itself that defines what a "transaction" is.
Likely, this definition would be different for each module in VistA.
There is no way for a pure programmer like me to denote VistA
transactions; you would need domain experts for each module to mark
which action or group of actions is a transaction.
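
A toy Python model of that layering, offered only as a sketch: `file_entry` stands in for one Fileman call (several sets, atomic on its own) and `place_order` for a module-level action built from two such calls. The `Store` class, the function names, the file numbers and the values are assumptions made up for the example, not VistA or Fileman code.

```python
import copy
from contextlib import contextmanager


class Store:
    """Toy in-memory database with snapshot-based transactions."""

    def __init__(self):
        self.data = {}

    @contextmanager
    def transaction(self):
        """Treat every update made inside the block as one unit."""
        snapshot = copy.deepcopy(self.data)
        try:
            yield
        except Exception:
            self.data = snapshot  # undo everything since the block began
            raise


def file_entry(store, file_no, ien, value):
    """Hypothetical stand-in for one Fileman call: two sets, atomic alone."""
    with store.transaction():
        store.data[(file_no, ien, 0)] = value
        store.data[(file_no, "B", value, ien)] = ""  # cross-reference node


def place_order(store, fail_midway):
    """Hypothetical module-level action made of two Fileman-style calls."""
    file_entry(store, 100, 1, "ORDER")
    if fail_midway:
        raise RuntimeError("simulated failure between the two calls")
    file_entry(store, 52, 1, "PRESCRIPTION")


# Transactions only inside each call: the first call's data survives alone.
per_call = Store()
try:
    place_order(per_call, fail_midway=True)
except RuntimeError:
    pass
half_done = (100, 1, 0) in per_call.data and (52, 1, 0) not in per_call.data

# Boundary drawn around the whole action: nothing is left behind.
per_action = Store()
try:
    with per_action.transaction():
        place_order(per_action, fail_midway=True)
except RuntimeError:
    pass
clean = per_action.data == {}
print(half_done, clean)
```

The point of the sketch is only where the outer `with` block goes: that placement is the per-module decision that needs a domain expert.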

Branden Tanga

Aug 21, 2008, 11:48:26 PM
to Hardhats
I totally agree, my solution is not optimal. When I had a server
failure, I was faced with 2 options:

1. Figure out where in the GT.M journal to roll back to
2. Figure out how to fix the globals manually, and move on.

Because of the risk that rolling back through the journal may cause
other Fileman calls to be incomplete, and the ridiculous amount of
time it would take to figure out which exact GT.M set or kill I needed
to roll back to, I chose #2. I talked to my end users to figure out
what they were doing, edited the necessary globals, and if their
actions were "finished", then I considered the database recovery as
complete as possible. In short, I had to choose the lesser of 2 evils,
which was to fix the globals manually and move on.

Skip Ormsby

Aug 22, 2008, 7:01:30 AM
to Hard...@googlegroups.com
If my creaky old brain remembers correctly, one of the reasons for non-transaction processing is code like this (from before the prevention of unsubscripted kills and the New command were implemented, although there are still plenty of times the Kill is used):
S DIC=4,DIC(0)="AEMQZ" D ^DIC
;Now being a good developer since I am making a Classic call I need to do local variable clean up
K ^DIC ; Ahh oops

Generally the favorites were ^DD, ^DIC, and ^DPT in no particular order.  Solution - read the journal until you find the unsubscripted Kill and clip it out.  It may take X amount of time before you actually notice that something has disappeared, so you would have journal activity that needs to be applied post the unsubscripted Kill.
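
A rough sketch of that clip-and-replay step, in Python and under loud assumptions: the record tuples below are a made-up format, not what a GT.M or any other journal file actually contains, and the global contents are invented sample data. It only shows the logic of skipping the unsubscripted Kill while still applying the activity that came after it.

```python
def replay(backup, journal, protected=("^DD", "^DIC", "^DPT")):
    """Apply journal records to a copy of the backup, clipping bad kills.

    Each record is a tuple in a hypothetical format:
      ("SET", name, subscripts, value) or ("KILL", name, subscripts)
    An empty subscripts tuple on a KILL means an unsubscripted kill.
    """
    db = dict(backup)
    clipped = []
    for rec in journal:
        op, name, subs = rec[0], rec[1], rec[2]
        if op == "KILL" and subs == () and name in protected:
            clipped.append(rec)  # clip it out instead of applying it
            continue
        if op == "SET":
            db[(name, subs)] = rec[3]
        elif op == "KILL":
            # Remove the node and all of its descendants.
            doomed = [k for k in db
                      if k[0] == name and k[1][:len(subs)] == subs]
            for key in doomed:
                del db[key]
    return db, clipped


backup = {("^DIC", (4, 0)): "INSTITUTION", ("^DPT", (1, 0)): "DOE,JOHN"}
journal = [
    ("SET", "^DPT", (2, 0), "ROE,JANE"),
    ("KILL", "^DIC", ()),                 # the accidental K ^DIC
    ("SET", "^DPT", (3, 0), "POE,ANN"),   # later work that must still apply
    ("KILL", "^DPT", (2,)),               # a legitimate subscripted kill
]
restored, clipped = replay(backup, journal)
print(sorted(restored), clipped)
```

Here ^DIC survives, the later sets and the subscripted kill are still applied, and `clipped` records what was removed from the replay so it can be reviewed.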
-skip
"we have met the enemy and he is us." - Pogo

Steven McPhelan

Aug 22, 2008, 8:29:59 AM
to Hard...@googlegroups.com
That is how I always handled this problem in the past, though that was
long ago; I have not had to do it in years. That is the purpose of the
journal: to bring a backup copy up to date with all the transactions
since that backup by dejournaling. If there was something I knew I did
not want to happen (Skip's K ^DIC example), we would edit the journal
file to remove the offending code and then proceed with the normal
dejournaling procedures. Of course, if you are not journaling then you
do not have this option.

I have not looked in years; are the journal files still just text
files or have they been "updated and improved"?

--

K.S. Bhaskar

Aug 22, 2008, 10:08:17 AM
to Hard...@googlegroups.com
Branden --

There was some off-list discussion of this topic. To summarize, when
the VA runs VistA, they rely on the MUMPS implementation to restore the
database to the state it was in just before the crash (of course, they
use computer hardware and operating systems that don't make a habit of
crashing). A combination of their business processes and VistA
application logic is such that after a crash, they don't usually need to
go in and make changes - in other words, their business processes are in
a good state when the MUMPS system recovers the database and VistA is
restarted.

I speculate that when units of work need to be done, they are put in
queues (in the database) for Taskman background processes to handle, and
the design of Taskman is such that when the database is recovered, it
picks up unfinished work from the queues. But this is just a guess.

Regards
-- Bhaskar

_____________

Steven McPhelan

Aug 22, 2008, 4:11:28 PM
to Hard...@googlegroups.com
If the taskman globals are journaled, they will be recovered and Taskman will start where he left off. However, any existing jobs running at the time of the crash will not be restarted.
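
To make that distinction concrete, here is a very small Python model. It is not how Taskman stores or schedules tasks; the status names ("QUEUED", "RUNNING", "DONE") and the task numbers are assumptions for illustration only.

```python
def recover_tasks(tasks):
    """Split recovered task records into what resumes and what does not.

    `tasks` maps a task number to a status string, as restored from the
    journaled task data. Queued work is picked up again; work that was
    running at the time of the crash is not restarted by itself.
    """
    will_run = sorted(t for t, s in tasks.items() if s == "QUEUED")
    needs_attention = sorted(t for t, s in tasks.items() if s == "RUNNING")
    return will_run, needs_attention


tasks = {101: "DONE", 102: "RUNNING", 103: "QUEUED", 104: "QUEUED"}
will_run, needs_attention = recover_tasks(tasks)
print(will_run, needs_attention)
```

In this model the second list is the one somebody has to look at by hand after the restart.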

Skip Ormsby

Aug 22, 2008, 8:15:58 PM
to Hard...@googlegroups.com
As long as the subject is power outages: at the hospital I was at, we had no-break power for 8 PDP 11-44s that would run the critters for 15-20 minutes, which was long enough for either the main generator to kick in or for one of us to gracefully shut the systems down. When we went to the MSM/486 configuration, we made sure that all of the parallel bars were zip-tied and the plugs were zip-tied so they didn't accidentally come out of the socket. And of course the no-break power would last a very long time, but we never pushed it past 1/2 hour. The biggest problem was more in line with a disk controller going nuts, a NIC that would go berserk, or bad memory, which in turn would put crud into the database. For us it was a case of fix it and forget it, because there were other fish to fry. We never did have a real power outage on the computer circuit, even when the room was flooded from a broken pipe in the ceiling. We lost the lights, etc., but the computers kept right on humming until we shut them down in a speedy, graceful shutdown.

-skip
"we have met the enemy and he is us." - Pogo


ivaldes

Oct 9, 2015, 10:18:32 AM
to Hardhats, Hard...@googlegroups.com
The methods below are interesting from a historical standpoint but are now obsolete: the Astronaut VistA distribution uses the vistastart.sh command instead. -- IV

Nancy Anthracite

Oct 9, 2015, 11:42:17 AM
to hard...@googlegroups.com, ivaldes, Hard...@googlegroups.com

Note that if you have been journaling properly, a rundown is NOT what should initially be done. I am sure Bhaskar will comment here soon.

 

--

Nancy Anthracite

Bhaskar, K.S

Oct 9, 2015, 2:19:00 PM
to hard...@googlegroups.com
Nancy is evidently a good astrologer in foreseeing that Bhaskar would comment here soon! And yes, journaling is the way to configure systems to recover after an unclean shutdown.

Regards
-- Bhaskar
--
GT.M - Rock solid. Lightning fast. Secure. Pick any three.