Why am I saving this backup?

Tony Gravagno

Oct 22, 2017, 4:32:07 PM
to Pick and MultiValue Databases
It seems almost every day I'm hitting this annoying issue, so I thought I'd toss it out here.

On client sites and my own systems there are backup files with .VTF and .D3P extensions. They have names like save1, FS_yymmdd, backup, etc. Many of these are multiple gigabytes in size, and I'm just running out of room for them.

But what's in these files? Why do I care anymore?
Was there some major update made that day?
Did I do a huge purge? Resize files?
Was this supposed to be a backup in case I screwed something up on that day, or was it intended to serve as a reference a year later when someone said "it never used to work like that"?
Was there some valuable code or data in there that I forgot to backup elsewhere?
Now I have hundreds of gigabytes of these files. Why?

And a lot of those files have been shuffled to different systems and different hard drives because I keep running out of space to put them.

This really comes down to a lack of record keeping - there is no metadata to accompany these files. There's no log of (pseudo-tape) backup files: the names, the folders where they are saved, expiration dates, etc. We can't look at the system and say "restore the dev account that has my web service code from before I made that stupid change in August". What's the process to find that account? We randomly search through systems, hard drives, and folders for some randomly named file. We attach a candidate (maybe after moving the file onto the right system), try a few account-restores, and hope the data we want is in there. No? Delete-account, repeat as required. Watta hassle.
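
To make this concrete: the missing metadata is no more than one record per backup file. A minimal sketch (the field names here are my own suggestion, not any standard) that would answer "which file holds the dev account from before August?" without the attach-and-guess routine:

```python
import datetime

def make_record(path, host, accounts, reason, created_by, expires):
    """Build one catalog entry describing a backup file."""
    return {
        "path": path,              # where the .VTF/.D3P file lives
        "host": host,              # which system it was moved to
        "accounts": accounts,      # list of accounts contained in the save
        "reason": reason,          # why the save was made
        "created_by": created_by,
        "created_on": datetime.date.today().isoformat(),
        "expires": expires,        # when it can safely be deleted
    }

def find(catalog, account):
    """Return every catalog entry whose save contains the given account."""
    return [r for r in catalog if account in r["accounts"]]
```

A flat JSON/CSV log of these records (or an MV file of them) replaces the random search across drives with a one-line query.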

Regular file-saves have many accounts including user data, app code, utilities, test/demo accounts, and random experiment accounts that we forgot to delete. We don't separate this stuff out but maybe we should.

When we do a save we get a chance to add a bit of text to the header to explain what it is. But I don't know anyone who uses this anymore, and with all of the stuff in a save, what do we say in 80 bytes? It's only readable if we attach the media and do a t-read; it's not visible from the OS unless we're using a binary editor.

For regular backups we all have some kind of save policy and retention policy, but how do we really know what's actually in the various backups?
Even if we do periodic account-saves, we end up with filenames that are date-stamped but with no indication of why they're important.

And this isn't just about individual files. When I'm concerned about the integrity of the system I'll copy entire account folders from the D3 FSI folder tree. But again, with no metadata I can look at these huge folders months later and wonder if there was something in them that I wanted to preserve for recovery or an audit.

My question is: Is there any really good solution for this that many of us can use? Does anyone have what they consider to be a Best Practice for backup management with MV? This applies to all MV sites and platforms. The "right" solution would include consistent handling for full file-saves, period-end saves, one-off account-saves, t-dumps, and OS-level item exports.

Yes, we can make use of version-control systems. We can front-end backup programs to prompt for and save metadata (who, when, and why the media is being created) and implement procedures to track where files go and their retention policy. There are many one-off answers to each of these problems, and everyone can create their own system. Only rarely will I put files into a folder with a name like "DELETE_AFTER_180101", or add a README.txt file with some notes about what was on my mind when creating the backups. But that's all manual, random, and of no use to others who might follow.
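
That front-end needn't be elaborate. A sketch of the sidecar file it could write next to each backup (the filename convention and field names are my own assumption):

```python
import datetime

def sidecar_text(who, why, delete_after):
    """Render the metadata that would sit beside a backup file."""
    return (
        f"created: {datetime.date.today().isoformat()}\n"
        f"by: {who}\n"
        f"why: {why}\n"
        f"delete-after: {delete_after}\n"
    )

def sidecar_name(backup_path):
    """Name the sidecar after the backup it describes, e.g. save1.vtf.README.txt."""
    return backup_path + ".README.txt"
```

A save wrapper would prompt for the who/why fields, run the real save, then drop this file alongside the media so the metadata travels with it.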

What about regulatory compliance? Do any VARs or end-users here have rules that need to be followed for SOX or HIPAA?

Of course this concept isn't MV-specific but because our save format is binary and proprietary, I think we need a similarly unique method of tracking the media.

Let the brainstorms begin!

Thanks.
T

Peter McMurray

Oct 23, 2017, 6:44:05 PM
to Pick and MultiValue Databases
We don't have any 24/7 D3 sites, so we set up a nightly save of the system and then use a standard Windows system save that includes the dated D3 save.
At month end we give the user a program that creates a pseudo floppy with a month end date that only includes the active data accounts and becomes part of the Windows standard save.
The Windows save that I prefer is a zip to a removable drive that emails me a success or fail message. On the odd occasion that something needs examining I simply load the data account required under a name including the date and take it off when finished. Alternatively I get them to send me the data and I check it on our system. I have trouble getting through to some support suppliers that a mirror image is a waste of time and basically useless so I insist that the zip is done first. In 40 years I have never had a mirror restore but have moved everything from Reality half inch to Internet transfer zips across to sundry machines and several operating systems.
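
In outline, that nightly zip-and-notify routine is no more than the following; the SMTP host, addresses, and paths here are placeholders, not the real configuration:

```python
import os
import smtplib
import zipfile
from email.message import EmailMessage

def zip_backup(src_dir, dest_zip):
    """Zip everything under src_dir into dest_zip; return True on success."""
    try:
        with zipfile.ZipFile(dest_zip, "w", zipfile.ZIP_DEFLATED) as zf:
            for root, _dirs, files in os.walk(src_dir):
                for name in files:
                    full = os.path.join(root, name)
                    zf.write(full, os.path.relpath(full, src_dir))
        return True
    except OSError:
        return False

def notify(ok, dest_zip, smtp_host="mail.example.com",
           to_addr="support@example.com"):
    """Email a success/fail message about the backup (placeholder addresses)."""
    msg = EmailMessage()
    msg["Subject"] = ("Backup OK: " if ok else "Backup FAILED: ") + dest_zip
    msg["From"] = "backup@example.com"
    msg["To"] = to_addr
    with smtplib.SMTP(smtp_host) as s:
        s.send_message(msg)
```
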
As for those individual work files I call them SAV...... and knock them off when done.
The VMWARE sites all have their own unique way of saving so I just tell them which pseudo I want and they put it up for me.
Oh! I should say that I make sure that the saves always go into a properly set up Windows directory called Backups, under whatever administrative Windows account I am running D3 on.

Tony Gravagno

Oct 25, 2017, 2:17:36 PM
to Pick and MultiValue Databases
All very nice, thanks. But it doesn't respond to the OP.
We all do backups like you do. That's the problem.
The question is: How do we actually know the purpose of a backup after it's in a file? What kind of metadata do people keep? How do you know which backup to go to if something fails?
If the answer is: "I just load up everything I have until I find something that looks right", again, that's what we all do. I'm looking for a better solution.

I/we can easily solve this problem with the tools available. I'm just surprised that after so many decades we're still fumbling with files named "SAV...".

HTH
T

Rex Gozar

Oct 25, 2017, 4:47:58 PM
to mvd...@googlegroups.com
I tend to make few backups, and those I do make I treat as throwaways.
I'll use my name in the backup filename, e.g. REX-AR-171025.zip, so
anyone who finds it later knows to come to me to ask its purpose. If I
forget why I made it, I'll just delete it. Anything more than a few
days old has probably outlived its usefulness and gets deleted.

I'll make temporary copies of accounts for development and testing,
then throw them away after I'm done.

I tend to use source code control to backup dictionaries, procs, and
programs -- it's easier to compare and restore specific versions. It
also makes it easier to track modifications; just snapshot before you
modify and after, then compare the snapshots (or better yet just build
a patch).

Of course, all production accounts get backed up every night - those
are stored on our SAN by date. I don't co-mingle throwaway and
production backups.

rex

Peter McMurray

Oct 25, 2017, 5:22:09 PM
to Pick and MultiValue Databases
How do we know the purpose of a backup?
Rex has answered exactly the same as me 
"Of course, all production accounts get backed up every night - those 
are stored on our SAN by date. I don't co-mingle throwaway and 
production backups. "
To which I add a period-end backup that goes in its own folder. That way we can reproduce the exact situation when there is a court query or a tax investigation.
We always used to have paper backup that was required to be kept for 7 years. Nowadays we keep PDFs of invoices and statements, again backed up and maintained off site.
Also we have an audit file of master data changes. I realised long ago, like many others, that simply using a key to a master file was fraught with danger.
An excellent customer changes her name by marriage, for example, or begins trading as WEGOTBIG. We wish to keep a good trail of the credit and order history without deleting the account, as some stupid accountants tried to insist. Deleting the account is a definite no-no under any circumstances; marking it COBD, or Closed, is the obvious answer.
As for things named SAV.... well, obviously they are transitory; as Rex says, just dump 'em.
I feel that you are looking for an answer to something that does not require a solution.

Wols Lists

Oct 25, 2017, 5:56:37 PM
to mvd...@googlegroups.com
On 25/10/17 21:47, Rex Gozar wrote:
> I tend to make few backups, and those I do make I treat as throwaways.
> I'll use my name in the backup filename, e.g. REX-AR-171025.zip, so
> anyone who finds it later knows to come to me to ask its purpose. If I
> forget why I made it, I'll just delete it. Anything more than a few
> days old has probably outlived its usefulness and gets deleted.

I had a tape backup cycle that backed up the server. Any dumps or
backups on disk that were over a month old could safely be deleted
because they were backed up on "for ever" backup tapes.

I can't remember the details, but because I'd read all the stuff about
how long tapes last :-) I designed a cycle that would, if followed,
guarantee each tape was taken out of circulation long before its design
life was reached.

Five daily tapes per set, three sets. Each week you re-used four tapes
and swapped one out for the weekly backup. Something along the lines of
"start with set 1, swap it for set 2, next week swap set 2 for 3, then 3
for 1, etc". But every fourth week, the tape that was swapped out had
its write-protect tab flipped, and was thrown in the fire-safe to be
replaced by a new tape. That meant that every tape was replaced after 60
weeks, and was only used 20 times. We had a daily backup going back a
week, a weekly backup going back a month, and a monthly backup going
back for ever.
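
The retention side of that cycle (dailies kept a week, weeklies a month, monthlies for ever) is simple enough to express as a keep/discard check; the "kind" label here is an assumption about what's recorded on the tape or in a log:

```python
import datetime

# Retention windows per backup kind; None means keep for ever.
RETENTION = {
    "daily": datetime.timedelta(days=7),
    "weekly": datetime.timedelta(days=31),
    "monthly": None,
}

def keep(kind, made_on, today):
    """True if a backup of this kind, made on made_on, is still in retention."""
    limit = RETENTION[kind]
    if limit is None:
        return True
    return today - made_on <= limit
```
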

On the few occasions we needed a backup, this was pretty reliable. We
went from DDS-1 tapes at the start to DDS-4 tapes at the end as I
remember it, which will date me a bit ... :-)

Cheers,
Wol

Rex Gozar

Oct 26, 2017, 8:01:17 AM
to mvd...@googlegroups.com
Getting back to the OP, why keep the backup? What's in it? Why was it
made in the first place?

I'll keep production backups as a safety net in case of corrupted or
lost files (think disk crash). That's pretty much it.

Any other backup I make I'll treat as a temporary, throwaway backup in
the off chance I'll need something in it. I don't need to log the
purpose or content - if I forget what's in it or why I made it, I'll
just delete it.

"The system used to work this way..." -- I keep source code in CVS or
GIT; it's a lot easier to compare versions (versus restoring a
backup). I find "archiving" source code in a backup is ineffective;
source code control is a more useful way to archive/detect changes.

rex

Wols Lists

Oct 26, 2017, 8:32:19 AM
to mvd...@googlegroups.com
On 26/10/17 13:01, Rex Gozar wrote:
> Getting back to the OP, why keep the backup? What's in it? Why was it
> made in the first place?
>
> I'll keep production backups as a safety net in case of corrupted or
> lost files (think disk crash). That's pretty much it.

We produced statistical data. While I don't think they did it much, our
production systems only kept current data, and the ability to go through
old data sets can be useful ...
>
> Any other backup I make I'll treat as a temporary, throwaway backup in
> the off chance I'll need something in it. I don't need to log the
> purpose or content - if I forget what's in it or why I made it, I'll
> just delete it.

It was tapes - tapes wear out :-) Would you rather throw serviceable
tapes out, or discover that a tape was worn out just when you were
trying to recover something crucial? And how much room does a box of 10
DDS tapes take up, anyway? Probably less than one old 9-track!
>
> "The system used to work this way..." -- I keep source code in CVS or
> GIT; it's a lot easier to compare versions (versus restoring a
> backup). I find "archiving" source code in a backup is ineffective;
> source code control is a more useful way to archive/detect changes.

This was loonnggg before git. CVS might have been around, but we didn't
have it. And this was also the company server, so the backup regime
simply covered everything.

(Plus, pretty much everywhere I worked, before I turned up it was
"Backups, what are those?". I was responsible for implementing basic
"safe computing" practices almost everywhere :-)

I'm a great believer in "if the computer can do it, let it".

Cheers,
Wol

Kevin Powick

Oct 26, 2017, 1:48:58 PM
to Pick and MultiValue Databases
As evidenced by some of the replies, the most metadata people seem to keep is a naming convention for the backup file itself.

I suspect that many backup processes are thoroughly automated to avoid such tedium. From creation to file naming, archival, and even off-site storage, our own backup system is completely hands-off. Of course, this is the daily process. For special, one-off backups, I name them and/or drop them in a folder along with any relevant metadata (readme.txt).

The daily, automatic process takes care of any compliance requirements. Managing the "one-offs" has never been such a burden that I've even thought about a "solution".

Even if there were/are better tools for describing the contents and purpose of a backup, it would rely on people to create this information in the first place.  Would many bother?

--
Kevin Powick 

Tony Gravagno

Oct 27, 2017, 12:45:59 PM
to Pick and MultiValue Databases
As Peter says, I may be looking for a solution where there isn't a generally perceived problem.
With thanks for other responses, I'm not looking for a backup cycling strategy, or for code version control.
My question is prompted by scenarios where Kevin is using a README file, and yes, where most of us wouldn't bother to keep notes - that's the problem. I'm replying to Kevin because so far I think he's closest to understanding the scenario.

I've apparently not done well with describing the problem to be solved. Let's take another shot.

I occasionally get clients with old systems that aren't maintained in a DBA sense. I try to archive accounts to reduce processing time for many challenges. So there needs to be a standard "archive place". Does that place get a single file-save or multiple account-saves? For this purpose I'd think individual account-saves. How are they labelled? "Final prior to removal"? Sure, that makes sense. Or maybe the folder is "DBMS Accounts Prior to Removal" with files named with the account-name. In hindsight that problem is solved with some revision. But in our industry there is no consistency about how this is done.

Another example: When doing major re-organization for a client, like centralizing common code or data, or doing a purge of old transactions that are not critical to the app or regulatory compliance, we'll get a backup. The file might be labelled "Full-save before purge 2010 non-critical transactions". That's a mouthful and we can all think of shorter versions of that. But as a filename that's pretty poor. It begs for a README or some other metadata that links a more concise filename with details about who made the save and perhaps how the purge was done so that recovery might be facilitated if ever required.

Yes, in the above case, "ever" is probably a limit of a couple years. So where's the table that tells us that it's time to purge old files?
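
In sketch form, that table needs nothing more than an expiry date per file; given catalog entries shaped like the ones below (the field names are an assumption, not a standard), finding what's due for purging is one query:

```python
import datetime

def purge_due(catalog, today):
    """Return paths of backups whose expiry date has passed.

    Each catalog entry is a dict with at least "path" and an
    optional "expires" ISO date string (None means keep for ever).
    """
    return [r["path"] for r in catalog
            if r.get("expires")
            and datetime.date.fromisoformat(r["expires"]) < today]
```

Run periodically, this turns "do I still need this 20GB blob?" into a scheduled, answerable question instead of a guess.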

Another example: When making sweeping changes for a client, like adjusting a program that now puts attributes into another file of their own, I tend to get a special backup. What if we find some months later that we missed moving some attribute to that new file, but it was removed from the old? Sure, in hindsight "you should have done the QA immediately" - but we all know crap happens, end-users might OK a project without fully checking data, and a lot of time can go by before we know there's a problem. So where does that backup go? And how do we know when we can purge that extra 20GB of data ... and all of those other 20GB blobs that get shifted around to different systems because we quickly run out of periodic allocations of "just another 500GB" to storage space?

Final example: In the 1980s I was saving code to 5 1/4" floppies, 9-track, 1/4", and 3 1/2", then CD and DVD, and now I use external disk and cloud for everything. When I archive one of my retired products (NebulaShip, NebulaPay, NebulaManager ... you guys have seen the pile), where is that data? Again, in hindsight it's obvious to suggest version control and file naming conventions. In the real world all of this evolves over time - and I have no doubt many of you have followed the exact same pattern of evolution, patient with your former selves for not being better organized each time you try to find that one bit of cool code written so long ago.

I have the same issue with all code and data, whether Visual Studio or Java project, LAMP, Android app, email, screenshots, contacts, or recipe. Metadata has application to everything we do in life. For some people metadata can be as simple as a rule like "if it's 'old' destroy it". We constantly make decisions about the rigor associated with maintenance of any single bit of information, where to keep it and how to label it. Where do we keep a record of those decisions? In our head? For this thread I'm just focused on backups for file records, accounts, and full-saves.

Chatty Friday...
Thanks.
T