
AIX, JFS 2, Snapshots and power loss

Richard Price

Jul 1, 2009, 7:09:02 AM

I have recently been tasked with improving the backup strategy of a
legacy server supporting 150 users via a terminal-based interface. At
present the server has a single backup taken at 2am, and it needs to
be unused during this time because of the nature of the application
suite and languages involved: each data file is discrete and there is
no enforced referential integrity between data files, but a record can
be spread across multiple data files. The application suite writes to
each data file sequentially, so one file can be updated while another
is not, creating an inconsistent record.

As such, we stand to lose a significant amount of work if the server
were to fail in some fashion at the end of the working day, but prior
to the backup being taken in the early morning.

As the server is running AIX 5.x, I have decided to implement JFS2
snapshots on the file systems that require backup, which means I can
reduce the 'time off system' in the early morning backup to just that
required to actually take the backup. This will be our 'guaranteed
backup'.

However, I also wish to try to mitigate the risk of a full day's data
loss by taking two 'non-guaranteed backups' during the day, without
removing users from the system.

The justification here is that, if we were to encounter a full power
loss on the server, significant portions of the data files would be
corrupted - this occurred a month ago (the UPS blew the protected
circuit, taking down the server - one of those things that is never
supposed to happen). However, the act of taking a snapshot will not
result in corrupted data files within the snapshot, just the potential
for corrupted records that were being worked on at the time. Or, in
other words, controllable, manageable corruption levels that can be
checked for, provided everyone understands that the corruption exists
in the first place.
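
A check of that kind could look something like the minimal Python
sketch below. It rests on assumptions that are not stated anywhere in
this thread: that each data file holds one record per line, that the
first whitespace-separated field is a record key shared across files,
and that a complete record has an entry in every file. The function
names are made up for illustration. It reports keys that appear in some
files but not in others, which is what a record caught mid-update would
look like in a snapshot.

```python
def record_keys(path):
    """Return the set of record keys found in one data file.

    Assumption: one record per line, key is the first
    whitespace-separated field. The real file format may differ.
    """
    keys = set()
    with open(path, "r", encoding="ascii", errors="replace") as handle:
        for line in handle:
            fields = line.split()
            if fields:
                keys.add(fields[0])
    return keys


def find_partial_records(paths):
    """Map each key missing from at least one file to the files lacking it."""
    keys_by_file = {str(p): record_keys(p) for p in paths}
    all_keys = set()
    for keys in keys_by_file.values():
        all_keys |= keys
    partial = {}
    for key in sorted(all_keys):
        missing = sorted(
            name for name, keys in keys_by_file.items() if key not in keys
        )
        if missing:
            partial[key] = missing
    return partial
```

Run against the data files inside a mounted snapshot, an empty result
would suggest no record was split across the snapshot point, under the
assumptions above.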

So, the question I need to ask is:

How well do JFS2 snapshots handle a complete power loss? In the
incident we had last month, we lost approx 60% of our data through
corruption, but how would a snapshot of that partition have fared?
Would it also have suffered corruption, or would it have been OK?

For example, I have /mydata and I snapshot it to /mysnapshot at 6pm.
At 7pm we encounter the 'worst case scenario' and /mydata is left
significantly corrupt. Will the snapshot also be corrupt? How do AIX
and JFS2 handle this in the background? Will the snapshot be usable?

I hasten to add that there are also tape and remote file copy backups
being taken during the 2am window, so we are not relying on snapshots
as the actual backup, just a means to an end of improving the backup.
The extra snapshots during the day are a certain nicety rather than
anything we would be reliant on.


Cheers
Richard Price

Hajo Ehlers

Jul 1, 2009, 12:51:47 PM

A snapshot helps against local user errors - like deleting the whole
content of a FS ;-(
It does not guarantee file system integrity after a crash.
You should back up the snapshot, so that after a crash that leaves the
fs defective you can restore from the main backup and then add just the
delta from the snapshot backup.
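
The "main backup plus delta" idea above could be sketched roughly as
follows in Python. This is only an illustration, assuming both backups
have been restored or mounted as ordinary directory trees; the function
name and directory arguments are hypothetical. It lists files that are
new or whose contents differ in the snapshot backup, and it does not
detect files deleted since the main backup.

```python
import filecmp
import os


def delta_files(main_backup_dir, snapshot_backup_dir):
    """List relative paths that are new or changed in the snapshot backup."""
    changed = []
    for root, _dirs, files in os.walk(snapshot_backup_dir):
        for name in files:
            snap_path = os.path.join(root, name)
            rel_path = os.path.relpath(snap_path, snapshot_backup_dir)
            main_path = os.path.join(main_backup_dir, rel_path)
            # New file, or same name with different contents
            # (shallow=False forces a byte-by-byte comparison).
            if not os.path.isfile(main_path) or not filecmp.cmp(
                main_path, snap_path, shallow=False
            ):
                changed.append(rel_path)
    return sorted(changed)
```

Only the paths returned would need to be copied over the restored main
backup.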

hth
Hajo

Richard Price

Jul 2, 2009, 4:05:48 AM

Thanks, that's what I thought.

I was looking to see whether the intermediate snapshots (the ones
taken during the day) could be relied upon up to a point, with the
safety of the guaranteed backup behind them.

I can't back the day snapshots up, as that increases the IO workload
on the server significantly, impacting the day-to-day operations of
the server :/

Thanks for the reply.

Regards
Richard

Jose Pina Coelho

Jul 7, 2009, 1:15:49 PM

Richard Price <richar...@gmail.com> wrote in news:2661f9f2-563b-42e6-
be53-097...@r25g2000vbn.googlegroups.com:

> Thanks, thats what I thought.
>
> I was looking to see if I could rely on the intermediate snapshots
> (the ones taken during the day) were enough to be relied upon up to a
> point, with the safety of the guaranteed backup there.
>
> I can't back the day snapshots up, as that increases the IO workload
> on the server significantly, impacting the day-to-day operations of
> the server :/

This looks like one hell of a mess for what looks like a critical machine.


Does the application write a transaction log on the side that can be used
to recover the information?

As Hajo says, a snapshot is a point-in-time copy.
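
To make the point-in-time idea concrete, here is a toy Python model of
a copy-on-write snapshot. It is a sketch under the assumption that the
snapshot keeps only the before-images of blocks changed since it was
taken and reads everything else from the live file system, which is my
understanding of how JFS2 snapshots are documented to behave. It is not
JFS2's actual implementation, and every class and method name here is
invented.

```python
class ToySnapshot:
    """Holds before-images of blocks changed after the snapshot was taken."""

    def __init__(self, source):
        self.source = source
        self.saved = {}

    def preserve(self, index, old_data):
        # Keep only the first before-image: that is the point-in-time value.
        self.saved.setdefault(index, old_data)

    def read(self, index):
        # Unchanged blocks are still read from the live volume.
        if index in self.saved:
            return self.saved[index]
        return self.source.blocks[index]


class ToyVolume:
    """A live file system modelled as a list of blocks."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshots = []

    def snapshot(self):
        snap = ToySnapshot(self)
        self.snapshots.append(snap)
        return snap

    def write(self, index, data):
        # Normal write: save the old block to every snapshot first.
        for snap in self.snapshots:
            snap.preserve(index, self.blocks[index])
        self.blocks[index] = data

    def corrupt(self, index, garbage):
        # Damage that bypasses the file system (e.g. a crash mid-write):
        # no before-image is saved, so snapshots see the garbage too.
        self.blocks[index] = garbage


volume = ToyVolume(["a", "b", "c"])
snap = volume.snapshot()
volume.write(0, "A")       # ordinary update after the snapshot
volume.corrupt(1, "??")    # simulated corruption of an unchanged block
print(snap.read(0), snap.read(1), snap.read(2))
```

In this model the snapshot protects against ordinary changes (block 0
still reads as it did at snapshot time) but shares the fate of any
block damaged behind the file system's back, which fits Hajo's advice
to back the snapshot up elsewhere.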
