Jeff Wilson
If you are restricting the system and no one is using the files,
don't even bother with save while active. Nothing should be active.
The main reason I want to use Save While Active is the TIME
savings. As I remember it, it takes a snapshot of your system and then
lets you unrestrict the system after taking that picture, often after only
15-20 minutes. We used this at my last company to
substantially reduce downtime for the users. Our backup currently takes
almost 4 hours, during which the users are off the system. If I can
reduce that to 30 minutes, it would be a huge help.
Jeff Wilson
We were in a similar situation. Although we don't drop down to a
restricted condition, I do hold the job queues, shut down the subsystem with
the QZDASOINIT jobs and the application subsystems, then use the *SYNCLIB
option to get a sync point across multiple libraries.
There is another parameter (SAVACTMSGQ) to specify which message queue should
get the "complete" messages. We send it to QSYSOPR.
Currently, we just start everything back up 40 minutes after the backups
start, which gives us a buffer of about 10-15 minutes. We've been bitten a
couple of times when the backups ran behind, but 99% of the time it works fine.
If you know how to trigger a command when a certain message ID shows up in a
certain message queue, this could be much cleaner (but I don't know how to
do it).
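One way to do what Tom describes, sketched here as a hypothetical CL example rather than anything posted in this thread: point SAVACTMSGQ at a dedicated queue (QGPL/SWA_BACKUP below is an assumed name), submit a small program that waits on that queue just before the save starts, and have it restart work when the checkpoint message arrives. It assumes SAVACT(*SYNCLIB), for which IBM documents CPI3712 as the checkpoint-complete message (with *LIB, CPI3710 is sent per library); check the message IDs on your release. QBATCH and QINTER are placeholders for your own job queues and subsystems.

```cl
PGM
/* Hypothetical sketch: wait for the save-while-active checkpoint  */
/* message, then release work. Assumes the save was run with       */
/* SAVACT(*SYNCLIB) SAVACTMSGQ(QGPL/SWA_BACKUP); the queue, job    */
/* queue and subsystem names are placeholders, not real settings.  */
             DCL        VAR(&MSGID) TYPE(*CHAR) LEN(7)

 LOOP:       RCVMSG     MSGQ(QGPL/SWA_BACKUP) WAIT(*MAX) RMV(*YES) +
                          MSGID(&MSGID)
/* CPI3712 = checkpoint complete for all libraries (*SYNCLIB).     */
/* Any other message is removed and we go back to waiting.         */
             IF         COND(&MSGID *NE 'CPI3712') THEN(GOTO +
                          CMDLBL(LOOP))

/* Checkpoint reached: let users back on while the save finishes.  */
             RLSJOBQ    JOBQ(QBATCH)
             STRSBS     SBSD(QINTER)
ENDPGM
```

Using a dedicated queue rather than QSYSOPR keeps RMV(*YES) from removing operator messages. As written, the program waits forever if the save fails before reaching the checkpoint; a finite WAIT value (in seconds) plus a check for a blank &MSGID would let it time out and alert someone instead.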
We have 4 LPARs, all using SWA (except for the weekend backups, when we do
a *ALLUSR save, and a monthly *SAVSYS).
Tom
"Jeff Wilson" <JeffW...@aol.com> wrote in message
news:72bf9d04.03081...@posting.google.com...
Thanks for the input. If you are not using BRMS, I would suggest that you
use it. If you do use it, you can just run a program after
the line that does the backup. It's very easy to set up, and then you
won't have the problem you are having now. If you are using BRMS, let
me know and I can send you an example.
Jeff Wilson
We've used SWA for years. I have four CL programs that save designated
libraries totaling approx. 160GB. Operations ensures no batch or
interactive jobs are processing, holds the job queues and fires off the save
(which submits the four CL programs). It only takes us about 10 minutes (at
most) to reach "checkpoint processing complete", at which time another
CL program is called that releases everything for normal operations.
Recovery has been successfully tested (several times during Disaster
Recovery testing).
I've been a big advocate of SWA for years. If you ever go to COMMON, Al
Barsa puts on a great presentation on SWA that is a MUST attend!
There are actually 2 methods to do SWA and, contrary to what your system
admin says, both are very reliable.
The first method, and the most complicated one, is to leave your users on
while you do SWA. After all, "while active" means users are active. This
method requires you to journal all of your files and save the journal at the
end. It also complicates recovery tremendously. I don't recommend this
unless you absolutely have to have your users on 24/7.
The second method is the one most commonly used and the one that we use
here. We end the interactive subsystem, hold the jobqs and get a
"checkpoint" on the data. It does NOT, as others have suggested, create a
snapshot of the data. It just flags each object with a date/time stamp of
the checkpoint. After the checkpoint has occurred, you can let users back
on the system and backup continues by backing up the database files. The
process of obtaining a complete checkpoint actually takes about 6 minutes on
our 250gb system. Your downtime will vary based upon the number of objects
you have. Size of objects is not a factor in how long the checkpoint takes.
Should anyone change, update or delete data after the checkpoint has been
established, the data changes are placed in what IBM calls a "set aside"
file. Any read from data files is attempted from the set aside file first,
therefore the users are always seeing the most current data all of the time.
Once the backup has completed, the set aside files are merged with the
original database.
It's simple, painless and takes only minor modification in your backup
routine (assuming you use your own CL program for backup.)
During the down time we start a subsystem which has a different signon
screen that informs the users that backup is in progress. This signon
screen protects the user id and password fields (and we hid them to boot) so
that users can't log on. We end that subsystem and restart normal
interactive subsystems when the checkpoint has completed.
I recommend this process highly. We were able to do away with SAVCHGOBJ
entirely and now backup entire libraries daily. Having been through a
recovery recently I know this makes recovery MUCH simpler.
chuck
Opinions expressed are not necessarily those of my employer.
"Jeff Wilson" <JeffW...@aol.com> wrote in message
news:72bf9d04.0308...@posting.google.com...
Yes, we use SAVLIB. Here's an example:
SAVLIB LIB(LIB1 LIB2 LIB3 LIB4) +
DEV(TAP07) ENDOPT(*LEAVE) CLEAR(*ALL) +
PRECHK(*YES) SAVACT(*SYNCLIB) +
SAVACTWAIT(0) SAVACTMSGQ(SWA_BACKUP) +
ACCPTH(*YES) DTACPR(*DEV)
Kim asked: "Are you backing up all of your libraries?"
We back up all of our production data libraries with SWA and then immediately
follow with the program object and development libraries. The program objects
and development libraries aren't part of SWA since no programmer is in the
house when the backup runs.
Kim asked: "Of the 250gb, how much is used (how much are you backing up)?"
About 50%
Kim asked: "What kind of tape units are you backing up to?"
We use an Ultrium LTO drive, IBM 3580. Everything fits on one tape and
complete backup takes about an hour.
Thanks for your comments. Someone here was nice enough to send me Al
Barsa's presentation and it was very interesting. I agree with your
suggestion regarding the total system quiesce. We have no need to be
24/7 and I know it makes things tons easier to have a static system
checkpoint. My sys admin called IBM, who assured him that Save While
Active was safe, although he said "it should only be used as a last
resort". I think he made this last comment up to save face. Thanks
again!
Jeff