Before we get to them, though, I'm about to begin writing my
analysis of storage management for the Best Practices Report, and I
thought I'd put down some of my conclusions here for comment:
To begin with, here are two assumptions which I believe must form
the basis for any serious attempt to deal with backup on an enterprise level:
1. Backup must be considered a subset of storage management.
The amount and diversity of data on networks in large companies is growing
at 50-100% per year. This can be expected to continue--perhaps not in a
linear way. A professor of IT I spoke to insists that, with the
advent of serious workgroup apps and multimedia, we can expect the average
company to be dealing with hundreds of terabytes and perhaps up to
petabytes of data by the turn of the century.
Whether or not this forecast turns out to be true, it's clear that storage
needs are going to be immense in the future, and that piecemeal backup
will not suffice. Backup must be integrated into an overall storage
management strategy which is flexible enough to easily integrate new
technologies as they become available, yet uniform across an organization.
One solution gaining popularity is to combine a backup front-end with an
HSM back-end to allow migration of backup files to less expensive media, as
in traditional HSM. Managers are also increasingly insisting that backup
costs be included in any purchase of additional storage, thus passing
through the costs of backup to the customers.
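To put the 50-100% growth figure above in concrete terms, here is a rough
compound-growth calculation in Python. The 1 TB starting size and the
six-year horizon are arbitrary assumptions chosen for illustration, and
project_storage is just a name made up for this sketch.

```python
def project_storage(start_tb, annual_growth, years):
    """Return the projected size in TB for each year, compounding annually."""
    sizes = [start_tb]
    for _ in range(years):
        sizes.append(sizes[-1] * (1 + annual_growth))
    return sizes


# Assumed starting point: 1 TB today, projected six years out.
low = project_storage(1.0, 0.50, 6)   # 50% growth per year
high = project_storage(1.0, 1.00, 6)  # 100% growth per year
for year, (lo, hi) in enumerate(zip(low, high)):
    print(f"year {year}: {lo:6.2f} TB to {hi:6.2f} TB")
```

At 100% a year the data doubles annually, so six years means a 64-fold
increase; at 50% a year it is a little over 11-fold.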
2. Effective storage management depends primarily on strategic planning.
That is, don't expect the tools out there to save you. Vendors are out
there today hawking HSM, database backup solutions, and all sorts of other
nifty solutions which are guaranteed to be scalable, flexible, robust,
open, automated, and every other buzzword their marketers can think of.
Alas, the truth is that there is currently no off-the-shelf product
capable of true enterprise storage management.
This is partly due to product limitations--there are still all
sorts of problems to be worked out in HSM for open systems, for example,
even among companies such as Epoch and OpenVision which actually offer
good HSM functionality. It is also due, however, to the fact that
"enterprise storage management" will mean different things to a Wall
Street trading firm and a national retailer.
Like all systems management, storage management must seek to
complement a company's overall business strategy. How much is your
company's data worth? How much of it must be online or nearline? How
important is security? How frequently will data be restored? How much
power over backup and recovery do you want to give your end-users? What
kinds of applications will you be backing up? What is your window for
backup, if any? The answers to these questions lie in the strategic goals
your company/organization is emphasizing; and those answers will determine
your storage management strategy.
The folks I've been interviewing have had differing degrees of
success in backing up and managing storage in their distributed systems.
They've tried all different products--Epoch, Legato, OpenVision,
CA-Unicenter, etc.--and all different infrastructures (HSM, departmental
server-based backup, centralized backup from servers in the data center,
mirrored databases, etc, etc.), but almost universally, they have stressed
the importance of planning ahead, establishing policies, and then seeking
out the technologies and the vendors who can best help you implement those
policies. No matter which product you go with, you will have to make
compromises. Your strategic planning will tell you which compromises are
acceptable.
In addition to the above questions, here are some others which
users have suggested you ask while developing a storage mgmt strategy:
1. How much disk space do I have now? In two years?
2. What is the value of my data? (How much would it cost to replace?)
3. What hardware/OS platforms must the solution support now and in
the future?
4. What storage devices must the solution support now and in the future?
5. Is the solution scalable, not only in the amount of storage it supports,
but in its ability to allow automated, centralized management of
that increasing storage load?
6. What kinds of data will I be storing?
7. How long must data be stored? How frequently are stored files
accessed?
8. Is the solution customizable? Will it implement rules-based
storage management?
One last thing, which I suppose should have come earlier: What does
"enterprise backup" mean, anyway? I ask this question partly because of a
letter I received from Delta Microsystems claiming that their BudTool
product represents the closest thing to an enterprise solution around.
By their definition (which you can read in full below), they may have a
point. They claim that "enterprise backup" means a product which can back
up every computer in the enterprise. Their criteria for enterprise backup
include amount of total disk backed up, cost, and ease of restore. By
these standards, BudTool is certainly an enterprise backup tool--BudTool
can back up many different kinds of computers, since it functions as a
graphical front-end for the client's backup utility. It stores files in
standard format, making restores easy. I've had some complaints about the
price tag, but relative to competitors in the UNIX world, it's not that
bad.
However, the Best Practices Report defines enterprise backup differently.
By our analysis, an enterprise backup solution is distinguished from a
domain or departmental backup solution by:
-Degree of automation
-Ability to centrally control distributed servers
-Ability to implement rules-based storage management
-Extent of library and volume management functionality
-Scalability to support multiple hierarchical levels of storage management
-Open-endedness to support developing manager-agent systems management
functions such as those being developed by EcoSystems, Tivoli, etc.
BudTool is an excellent product for backing up departmental servers,
especially those with large amounts of data. However, for large
corporations which need to implement uniform backup policies across
heterogeneous environments encompassing hundreds or thousands of nodes, I
don't see the product as a strategic solution.
Of course, no company can claim to offer all the functionality I've just
listed at present. However, several companies I've talked to--especially
OpenVision, IBM Adstar, and Epoch--have articulated sound visions for
moving to meet these requirements for true enterprise storage management.
Anyway, those are my thoughts on storage management. Even though my
analysis in the Best Practices Report will be coming out the first week in
March, I encourage people to respond either by posting or mailing me
directly--I'll keep posting summaries as long as there's enough interest.
And now, more comments from the net. Again, I've
removed names to protect the innocent. If you'd like to contact one of
the respondents, just mail me and I'll pass your request along.
MORE FREEWARE:
Another freeware backup system is the OSU backup system, which I
wrote. It is in use at several dozen sites around the world. We use
it to back up 163 file systems containing 69.2 gigabytes on 48 hosts.
The system is written in Perl, so it is very portable. Here's a list
of features:
Uses a tape management system, which supports off site tape storage,
tape identification so you don't write or read the wrong tape,
supports different tape types, very flexible tape labeling, multiple
tape storage and drive locations, and automates tape aging.
Built on top of standard archive tools, like BSD dump or GNU tar.
Designed to be easy to add support for other archive programs, like
cpio if you wanted. We added simple support for backing up sybase
databases in a morning.
Doesn't maintain a list of which files are backed up on which tapes,
but it does maintain a database of which backups are on what tapes, so
it simplifies restores by telling you what tapes to load, and skips to
the correct file on each tape and starts the restore program for you.
Supports parallel backups like UMD's Amanda, though it isn't as
optimized (most of the folks that use the OSU system are using
multiple tape drives for various reasons). One difference is that the
OSU system supports multiple holding disks, so that you can make use
of existing free disk space and parallelize the backups to the holding
disks.
Backups can be scheduled in the future with at or cron, and the system
supports backups in single user mode. Has a tape checker that can be run at
the end of the day to check to see that the tapes for the next morning
are loaded. Also includes a backup checker that reports on backups
that are overdue, file systems that aren't being backed up and so on.
Uses a flexible scheduling system so you can specify what levels of
backups to do how often, and how long to retain them before reusing
the tapes.
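As a rough illustration of how a per-backup database like the one described
above can drive a restore, here is a sketch in Python (the OSU system itself
is written in Perl, and this is not its code). It assumes BSD dump-style
levels, where a level-N backup holds everything changed since the most
recent lower-level backup. The Backup record, its fields, and the tape
labels are all hypothetical.

```python
from collections import namedtuple

# Hypothetical catalog record: one row per backup, not one row per file.
Backup = namedtuple("Backup", "filesystem level day tape file_no")


def restore_plan(catalog, filesystem, as_of_day):
    """Return the backups to load, oldest first, to rebuild a filesystem."""
    candidates = sorted(
        (b for b in catalog
         if b.filesystem == filesystem and b.day <= as_of_day),
        key=lambda b: b.day,
    )
    plan = []
    wanted_below = None  # the next (older) backup must be below this level
    for backup in reversed(candidates):
        if wanted_below is None or backup.level < wanted_below:
            plan.append(backup)
            wanted_below = backup.level
            if backup.level == 0:
                break  # a full backup is the base of the chain
    return list(reversed(plan))


catalog = [
    Backup("/home", 0, 1, "TAPE01", 3),
    Backup("/home", 5, 2, "TAPE02", 1),
    Backup("/home", 5, 3, "TAPE03", 1),
    Backup("/home", 2, 4, "TAPE04", 2),
    Backup("/home", 5, 5, "TAPE05", 1),
]
for b in restore_plan(catalog, "/home", 5):
    print(f"load {b.tape}, skip to file {b.file_no}, restore level {b.level}")
```

For the sample catalog, a restore as of day 5 needs only the level 0, the
level 2, and the last level 5; the two earlier level 5 backups are
superseded by the level 2.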
The software is available for anonymous ftp at ftp.cis.ohio-state.edu
in /pub/backup. It can also be retrieved through anonymous uucp -
send mail to uu...@cis.ohio-state.edu for more information. There are
mailing lists for discussions about the system and for announcements
about new releases - send mail to backup-...@cis.ohio-state.edu or
backup-anno...@cis.ohio-state.edu to subscribe to either.
A MODERATELY HAPPY EPOCH EXPERIENCE:
We are using Epoch's EpochBackup. Initially we had
great difficulty in getting this to run properly but things stabilized
once we installed Epoch migration software. Right now I have to say that
I am happy with the software. Epoch was recently bought out by EMC
and EMC is a company showing tremendous growth. This
growth was recognized recently by the Boston Globe, which
stated that stockholders' investment has increased by 177%. So hopefully
some of EMC's financial strength will result in money invested in
Epoch. Epoch does seem to be recruiting quite extensively in the area
recently. There has not been a major problem yet that they have not
been able to solve for us.
We did try BudTool about
two years ago and discontinued using the software. The problem with BudTool
for us was inaccurate logfile reporting (which we consider crucial).
We are currently backing up about 80 clients with EpochBackup.
Platforms for our backups are DEC, IBM RS6000, SGIs and Suns. I
believe that it is good software. This is not to say that we have not
had problems.
Most of our problems seem to have gone away when we had migration
installed on the backup server. Without migration we continually had
problems managing our /catalog filesystem which manages the client
catalogs.
We had problems with the volume manager which keeps track of the
tape media. This was resolved with a new mqmdaemon (mqmdaemon manages
the volumes). Initially we had difficulty getting data off of tape
for restores.
The load balancing feature did not work well. Epoch is working
on this for us. We are not running load balancing now. With the exception
of the load balancing problem Epoch has been able to fix all our needs.
With the problems that we have had Epoch and our operations staff
have both shouldered some of the blame. We had difficulty adjusting
to the new software and there were some problems on Epoch's end which
were resolved. Right now I have to say that the software is running
well for us. Below I have given some advantages/disadvantages as well
as some of the features of EpochBackup. Hope you find it useful.
Advantages of EpochBackup:
1) Backup of live filesystems, no downtime needed. Handles
skipping of active files.
2) Centralized backup configuration and control
3) On-line backup catalogs aid in file recovery.
4) Backup data management automated - self expiring backup
catalogs, saveset records and media. Storage media maximized to
fullest potential.
5) System administrators can restore any files readable to their
own accounts.
Disadvantages of EpochBackup:
1) System administrators have to notify operations in advance
of file system changes or data will not be backed up.
2) Installation of clients requires temporary access to /.rhosts.
3) Special administrative account (~rbadmin) has permanent .rhosts
entry.
4) Backup and recover run set UID to root.
5) Client disaster recovery requires temporary space. You need
to use a spare disk or make the client an NFS client.
Features of EpochBackup:
1) Maximizes throughput through simultaneous backups. Our
configuration is for 12 simultaneous backups. 4 full backups and
8 incremental backups can be run simultaneously. This allows for
faster backups and works well. This also improves the transfer rate
to tape. The transfer rate is actually fastest when you're running
multiple backups rather than just one.
2) The system automatically retries failed backups. It will
try to backup a machine 3 times throughout a backup shift. So if
a machine is down while EpochBackup is scheduled to back it up
the software will go back later and try again.
3) It's relatively easy to add and manage new clients.
4) Comprehensive reporting on backup coverage, media and
client history reports that are available online.
5) Unattended operation.
6) Complete volume management which provides electronic
tracking of backup media, location and contents.
7) A true image recovery. The on-line backup catalogs
contain the necessary information which allows for the viewing
and reconstruction of a filesystem on any given day.
8) Automatic or Custom Scheduling features.
a) Automatic - schedules one full backup per
rotation period and the rest are incrementals. Our rotation period
is 14 days which is suggested by Epoch. You can also have 7 and 28
day rotation periods.
b) Custom Scheduling - allows one to specify when
and what level backups occur on a given day. For example, on Saturdays
we run weekly and monthly backups along with the dailies. The weekly
and monthly backups are for filesystems that do not change on a frequent
basis. This saves on tape media as well as cataloging space on our
backup server.
9) Migration - older on-line backup information is staged out
back to tape.
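The automatic scheduling described in item 8a (one full backup per rotation
period, incrementals on the other days) can be sketched roughly as follows
in Python. This is not Epoch's algorithm: the round-robin staggering of
fulls across the rotation is an assumption made for illustration, and
automatic_schedule and the client names are made up.

```python
def automatic_schedule(clients, rotation_days=14):
    """Return one dict per day mapping each client to 'full' or 'incremental'."""
    schedule = []
    for day in range(rotation_days):
        plan = {}
        for index, client in enumerate(clients):
            # Assumption: each client's full falls on day (index mod rotation),
            # which spreads the fulls evenly across the rotation period.
            is_full_day = index % rotation_days == day
            plan[client] = "full" if is_full_day else "incremental"
        schedule.append(plan)
    return schedule


clients = [f"client{n:02d}" for n in range(80)]  # hypothetical client names
schedule = automatic_schedule(clients)
for day, plan in enumerate(schedule, start=1):
    fulls = sum(1 for level in plan.values() if level == "full")
    print(f"day {day:2d}: {fulls} full, {len(plan) - fulls} incremental")
```

Staggering the fulls this way keeps any one night from carrying every
client's full backup, which matters when the number of simultaneous backups
is capped.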
A VERY UNHAPPY ARCSERVE EXPERIENCE:
Okay, onward to The Things We Hate About Arcserve:
Number One: CHEYENNE. If the product were perfect, I'd never need to call
Cheyenne for help, and I'd have nothing to complain about. But, of course,
that's not the case, and I've had to call their "tech support" group
several times. They are, hands down, the worst support team I've ever had
to work with. First, their telephone system queues calls for absurd
amounts of time without warning. I have NEVER waited for LESS than 40
minutes to talk with the first human possible. That's not acceptable.
But, it gets worse. I've usually waited much longer, and once gave up
after two hours. Twice, after finally getting a human, I was inadvertently
disconnected as my call was "transferred" to another department. When your
server is down and you're waiting for help to restore files, that's not
acceptable. A technician once gave me their BBS number, so that I could
download a patch I needed for a bug repair. Unfortunately, the line was
never open, and I tried for three hours to connect before giving up.
Frankly, I've had more success getting timely, accurate help from other
users via the Internet than I have through Cheyenne. Even if a major
revision of Arcserve were released next month to address my complaints
about the software, I'd have serious reservations about continuing to use a
Cheyenne product.
Second: Scheduling. Arcserve has terribly inflexible scheduling features.
What I'd really like to do is a complete backup once a month, followed by
differential backups daily. To the best of my knowledge, there is simply no
way to do it, even manually each day. Arcserve has two auto-scheduling
options. One is "repeat" or something similar; it allows you to run the
same job each day, such as a full backup. I believe there isn't a way to
make it do differential or incremental jobs, since the task is simply
repeated each day, without reference to an ongoing database that might tell
the application when files were last backed up. Also, it doesn't know to
append the data to a loaded tape. It's as if it's running a new job every
day, and it wants a new tape. The other scheduling feature, Autopilot, is
frustratingly inflexible. It requires you to save tapes for a period and
then reuse them. It requires you to do a full backup at least weekly. It
requires you to run a complete backup monthly and archive it. Now,
Cheyenne thought that their grandfather-father-son scheme was a smart one,
and they're right, but it's not fair to impose that system upon everyone.
I have a unique setup, for instance, and that schedule doesn't suit my
needs well. I have a server to back up only, no clients (they're all Macs,
which I back up with Retrospect, a wonderful product). The server, though,
is full of medical images-- some 20GB. And I have one DAT drive to back it
up. Therefore, a full backup takes longer than a full business day; it's
not something that I can do weekly.
Third: Tape rotation. I'd like to have two complete sets of backups all
the time, and update each one every other day. That would let me keep a
complete, up-to-date set off-site every night for security. I conduct the
Retrospect (Macintosh) backups that way. In fact, I have three sets: Sun,
Snow, and Rain. Retrospect cycles the sets and conducts incremental
backups to each of them. So, when I created Sun on a Monday, it made a
complete backup. It made a complete backup to Snow on Tuesday, and to Rain
on Wednesday. Thursday, it asked for the Sun tape, and added all the
changes since Monday. Friday, it asked for Snow, and added all the changes
since Tuesday, etc. So, each set is complete within one or two days.
Arcserve, though, isn't smart enough to do that. If I do a backup to one
tape Monday and to a different tape Tuesday, then insert the Monday tape on
Wednesday and ask it to do an incremental backup, it only records changes
since TUESDAY, because it checks the archive bit. I called Cheyenne,
waited all day on hold, and finally explained what I wanted to do. I
talked to two people. They both told me that it was impossible with
Arcserve. Worse, they acted like I was some odd-ball weirdo who wanted to
do something absurd. I think, though, that it's absurd for the software
developers to think that the single scheme they choose to implement will
meet all users' needs.
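The difference between the two approaches above can be shown with a small
simulation in Python. It is only a model: archive_bit_backup mimics an
incremental that relies on a single archive bit per file, and per_set_backup
mimics one that compares against what each tape set already holds, which is
one way to get the rotation behavior described for Retrospect. Neither
function is a claim about how either product is actually implemented, and
all names are made up.

```python
def archive_bit_backup(files, tape_set):
    """Incremental keyed on one archive bit per file, shared by every set."""
    full = not tape_set  # an empty set gets a full backup
    for name, meta in files.items():
        if full or meta["archive_bit"]:
            tape_set[name] = meta["version"]
        meta["archive_bit"] = False  # cleared no matter which set ran


def per_set_backup(files, tape_set):
    """Incremental keyed on what this particular tape set already holds."""
    for name, meta in files.items():
        if tape_set.get(name) != meta["version"]:
            tape_set[name] = meta["version"]


def change(files, name):
    """Simulate a user modifying a file."""
    files[name]["version"] += 1
    files[name]["archive_bit"] = True


def simulate(backup):
    """Return True if Monday's set is complete after Wednesday's run."""
    files = {n: {"version": 1, "archive_bit": True} for n in ("a", "b")}
    monday_set, tuesday_set = {}, {}
    backup(files, monday_set)    # Monday: full backup to the first set
    change(files, "a")
    backup(files, tuesday_set)   # Tuesday: full backup to the second set
    change(files, "b")
    backup(files, monday_set)    # Wednesday: incremental to the first set
    current = {name: meta["version"] for name, meta in files.items()}
    return monday_set == current


print("archive bit keeps Monday set complete:", simulate(archive_bit_backup))
print("per-set catalog keeps Monday set complete:", simulate(per_set_backup))
```

In the archive-bit run, Tuesday's backup clears the bit on the file changed
after Monday, so Wednesday's incremental to the Monday set never picks it
up. The per-set comparison catches it because it ignores the shared bit.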
And finally: Recovery. The process is tedious, unintuitive, and
unacceptably slow. To simply find out whether or not you even HAVE on tape
the file you'd like to recover, Arcserve can conduct a query that takes
literally hours, and always a minimum (for me) of 25 minutes. That query
should instead take seconds, and the entire recovery shouldn't take longer
than five minutes, start to finish. That should be time to check an index
on hard disk, load and wind any tape to any position, and recover the file.
Most of that time should be spent winding to the correct tape position,
and the max time for a 120m DAT is about 2 minutes. Half an hour to
recover a 200k image is not acceptable. Two hours to tell me which tape to
load is... it's asinine. I can't imagine what in the hell is taking so
long. It says "building an index." Is it building from tape? If so, what
the heck is the database on disk for? Also, that database is now 26MB, and
grows daily. That's 26MB of database for a 20GB server; seems a very large
file, to me.
Oh! Updates! I almost forgot. I'm using v5.0, which has several nasty,
known bugs. (The worst is a bug which prevents jobs from showing in the
job queue until they're running; the result is that you can't modify the
job without waiting for it to run, and then terminating it.) So, I ordered
the 5.01 upgrade, and found TO MY HORROR that there is not an installer,
there is not a patch, and there is not an updater application; you have to
RE-INSTALL THE WHOLE DAMNED PROGRAM FROM SCRATCH. That, of course, is a
nightmare; you have to tell it again where your interface card is, what
hardware you're using, set the interrupts; it's a NIGHTMARE. But wait! It
gets worse! The DATABASES ARE INCOMPATIBLE! That's right: you lose your
old databases, which pertain to the backups you've been running all this
time. Sure, you can re-build new ones for 5.01-- from TAPE, but each one
of my backups now spans four or more tapes; to rebuild each one would take
about 10 hours. One set per wasted weekend. Come on; this is not
acceptable.
I just can't say enough bad things about Arcserve. I hope you'll spread
the word to potential victims. Feel free to use my name, pass on my
address or phone number, or to contact me personally if I can provide any
more details for you. Arcserve is simply a poorly conceived, poorly
executed product from a company with a terrible record for supporting its
customers.
I've been looking into the biggest competing product, Legato Networker, but
I've gotten mail from unhappy Networker users, too. Their complaints,
though, were with older versions of Networker; most had not upgraded yet...
one had switched to Arcserve (heaven forbid). So I'm not sure that I've
heard any legitimate complaints about the latest Networker release. I HAVE
heard GOOD things from every user of Palindrome (what's it called?) TNA?
And I'm about to research that product. But, again, I've heard from three
users so far, and they've all been very happy with that product. One user
worked with all three applications (and still does); he said the Palindrome
product was "hands down" the best. I hadn't ever heard of it before,
though, and I still know almost nothing about it.
AND NOW, A WORD FROM DELTA MICROSYSTEMS...
An "enterprise backup solution" is a single backup solution which is capable
of backing up all of the computers on a user's network. Legato claims that
NetWorker is an "enterprise backup solution" on the basis of being able to
back up most of the popular UNIX computers and Novell networks. However,
neither NetWorker nor any other backup solution can legitimately claim to be
an "enterprise backup solution".
No backup package comes close to supporting all of the computers a customer
may have on their network. Therefore, if anyone can lay claim to having
an "enterprise backup solution", perhaps it should be those who can
back up the most different types of computers. BudTool can back up any
client that meets the following criteria:
1. The backup server can initiate a connection to the client.
2. The client has a backup utility installed.
3. A command or sequence of commands will result in data being
written to "standard output".
Therefore, BudTool can back up ANY UNIX client and a large assortment of non-
UNIX clients. By this definition, BudTool has the best case for being the
best "enterprise backup solution". However, the number of kinds of computers
which can be backed up is a poor way to rank "enterprise backup solutions".
I think a better (but still poor) method of ranking "enterprise backup
solutions" would be based on the percentage of a customer's disk space which
can be backed up. This is also much harder to determine, since not only the
type of computer and the amount of disk on it are important, but also whether
the backup utility has sufficient performance. By this measure, BudTool
still finishes ahead of any of the other software packages which you have
mentioned.
A good method of ranking "enterprise backup solutions" has to include a
measurement of what percentage of a customer's disk space can be backed up,
the cost (including product, maintenance, hardware (including the disk
consumed by any databases), and administration), reliability of backup, and
ease and reliability of retrieval. I could of course claim that BudTool
wins by this definition as well, but the only way to substantiate such a
claim is via customers' experiences.
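The three client criteria in the letter above boil down to "run a command
and capture whatever it writes to standard output." A minimal sketch of that
idea in Python follows; it is not BudTool's code, capture_backup_stream is a
made-up name, and the stand-in command simply prints a few bytes so the
example runs anywhere. In practice the command would be the client's own
backup utility writing to standard output, for example tar cf - /home run on
the client over a remote shell.

```python
import os
import shutil
import subprocess
import sys
import tempfile


def capture_backup_stream(command, destination):
    """Run command and copy everything it writes to stdout into destination."""
    with open(destination, "wb") as out:
        proc = subprocess.Popen(command, stdout=subprocess.PIPE)
        shutil.copyfileobj(proc.stdout, out)  # stream, do not buffer it all
        proc.stdout.close()
        status = proc.wait()
    if status != 0:
        raise RuntimeError(f"backup command exited with status {status}")
    return destination


# Stand-in for a real backup utility: any program that writes to stdout.
stand_in = [sys.executable, "-c", "import sys; sys.stdout.write('file data')"]
target = os.path.join(tempfile.mkdtemp(), "client.img")
capture_backup_stream(stand_in, target)
print(os.path.getsize(target), "bytes captured")
```

Because the captured stream is whatever the native utility produced, the
result can be read back with that same utility, which is the "standard
format" point made earlier.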
DEC's NETWORK SAVE AND RESTORE
We have been using DEC's "Network Save and Restore"
for some time. This is a rebadged Legato Networker, with apparently very
few changes from the original product apart from minor cosmetic fiddling.
The product's core functionality is fine - it is easy to set up clients,
and restoring a crashed system to its previous state is now much easier
(in fact, the filesystem recovery is the easiest part, while re-installing
the OS is still far too time-consuming). Recovery of individual files is
also easy, especially as it is the user who drives the recovery process,
eliminating transcription errors. It is also very fast.
The problems we have had with DECnsr have mostly been the sort of annoying
bugs and difficulties you find with any new product. Certain things are
badly documented, or you have to do things in what seems a perverse way.
To take an example: in order to recycle a backup tape, you have to re-label
it. Even if all the save sets it previously contained have expired, the
space they take up on the tape will not be re-used unless the tape is
explicitly re-labelled. This is not documented anywhere, and it took several
weeks for Digital Support to work out what we were "doing wrong".
However, the core functionality is fine and I am sure the minor problems
will be cleared up. Overall, we are quite pleased with the product.
We use a DECStation 5000/240 with two DAT drives - for backing up about 30
UNIX machines, a mix of Suns, Irises, and ULTRIX DECStations. Total backup
volume is about 50 Gbytes.
A HAPPY BUDTOOL USER:
We have been using BudTool for the past 1 1/2 years to back up all
our servers. It has been very easy to manage and maintain current
backups.
Some of the reasons we chose BudTool are:
It does NOT use a proprietary tape format. The tapes can be read using
standard Unix utilities. Even if the BudTool application is not
available for any reason, the tapes can still be read.
It operates on any Unix system.
Only the Media server must be licensed, instead of the server and all
clients.
BudTool gives the option of using dump, tar, cpio or writing your own
application for backups.
Our Environment:
I have BudTool with the 10I jukebox installed on a SLC. This system is
used as a workstation during the day and runs backups at night.
BudTool keeps three different histories: media, request, and file. The file
histories get rather large during the month. Normally I keep the current
history files in BudTool's history directory, then move them to the temphist
directory at the end of the month and create new history files. The restore
functions allow using either history or temphist for restore. A single file
needed from a backup older than two or three months can be restored by
looking at the request file to determine the partition location on the tape,
manually loading the tape, forwarding to the appropriate location, then
running restore manually. This occurs at our site maybe twice a year.
The restore function also allows for returning files to their original location,
a new location, or a staging directory.
Access is fast enough that from one BudTool media server, I back up 12
servers with a nightly incremental level 5 dump and full level 0 dumps
over Saturday and Sunday. No operator is required.
The only problems we have encountered have been hardware -- the exabyte
drive decided not to write on the tapes and jammed one tape in the drive.
Support from Delta Micro has been responsive and the technical support
people helpful.
[Poster's Note: The quality of Delta Microsystem support has been borne
out by several folks I've interviewed. OpenVision also gets high marks
for support. Legato and Cheyenne both rate near the bottom...]
--
Jon Bines (jonb...@panix.com) ^ "I don't want to achieve immortality ^
NSM Best Practices Rept. ^ through my work, I want to achieve it ^
203 1st Ave #1 NY NY 10003 ^ through not dying." ^
Phone/Fax 212-254-7064 ^ -W. Allen ^