restoring possibly damaged logs

Tam McLaughlin

Dec 20, 1999, 3:00:00 AM

We had a situation where there was a power cut; the UPS kicked in and
shut down Informix.
On its way back up, a chunk from a dbspace was reported corrupt/bad/down
(I am not at work, so I can't remember the exact error). This was with
IDS 7.3 on SCO UnixWare 7.01. I shut down and restarted the engine, but
the chunk was still "not sane" (I think), so Informix support
recommended doing a restore.
At this point we had our level 0 and level 1 ontape archives handy and a
tape of current logical logs (ontape -c).
When the system restarted after the power was restored, the logical log
files that had been written were still on disk, ready for backup to tape.
At the start of the ontape -r restore, I chose not to back up the
existing logs to tape, as I believed we should only restore to the last
logical log on the tape from before Informix was shut down when the UPS
stopped. Anything after this may have been corrupted by the bad chunk,
so I did not see the point in backing up and restoring the logical logs
written after the chunk went offline, as I could not guarantee the
integrity of the system.
Was this the correct choice to make?
Would the bad chunk still exist and be reported from the logical logs,
or could it be that the level 0/1 restore would have brought the chunk
online (as it did) and it would still have been sane, as the logical
logs would only have rolled forward transactions and know nothing about
what happened to the chunk?

Any ideas on this would be appreciated.

Art S. Kagel

Dec 20, 1999, 3:00:00 AM

In article <385DFDD6...@net.ntl.com>,

Unless the bad chunk was part of the logical log dbspace you should
have backed up the logs. The logs are written independently of the
data and so will normally not become corrupted by problems with other
chunks.
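
That rule of thumb can be written down as a tiny Python sketch. The function name and the dbspace names are hypothetical and are not part of any Informix tool; the code only encodes the reasoning above, not an actual check against the server.

```python
def should_salvage_logs(bad_chunk_dbspace, logical_log_dbspaces):
    """Return True when the on-disk logical logs are worth backing up.

    Encodes the rule of thumb only: the logs are suspect just when the
    damaged chunk belongs to a dbspace that holds logical logs.
    """
    return bad_chunk_dbspace not in logical_log_dbspaces


# Hypothetical layout: logs live in "logdbs", the damaged chunk is in "datadbs1".
print(should_salvage_logs("datadbs1", {"logdbs"}))  # True
print(should_salvage_logs("logdbs", {"logdbs"}))    # False
```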

Art S. Kagel


Yuri Dovgart

Dec 20, 1999, 3:00:00 AM

Tam McLaughlin <tam.mcl...@net.ntl.com> wrote in message
news:385DFDD6...@net.ntl.com...

> We had a situation where there was a power cut, UPS kicked in and shut
> down informix.

Was the shutdown clean? I mean, Informix must perform a checkpoint before
shutting down. Did you see that?

> On its way back up, a chunk from a dbspace was reported corrupt/bad/down
> (I am not at work, so I can't remember the exact error). This was with
> IDS 7.3 on SCO UnixWare 7.01. I shut down and restarted the engine, but
> the chunk was still "not sane" (I think), so Informix support
> recommended doing a restore.

OK. Can you send us the log fragment covering the Informix startup?

> At this point we had our level 0 and level 1 ontape archives handy and a
> tape of current logical logs (ontape -c).
> When the system restarted after the power was restored, the logical log
> files that had been written were still on disk, ready for backup to tape.
> At the start of the ontape -r restore, I chose not to back up the
> existing logs to tape, as I believed we should only restore to the last
> logical log on the tape from before Informix was shut down when the UPS
> stopped.

Generally speaking, you are wrong. If your chunk is corrupted (for example,
one of its pages), that points to an engine problem, not a transaction
problem. Anyway, it depends on what happened to your chunk. In your case you
will lose all the transactions that are in your logs on disk.

> Anything after this
> may have been corrupted by the bad chunk, so I did not see the point in
> backing up and restoring the logical logs written after the chunk went
> offline, as I could not guarantee the integrity of the system.
> Was this the correct choice to make?

As I said above, I think it was not correct. You can't corrupt internal
Informix structures on disk with any SQL statement. That's why you MUST
replay all transactions up to the crash.

> that the level 0/1 restore
> would have brought the chunk online (as it did) and it would still have
> been sane, as the logical logs would only have rolled forward transactions
> and know nothing about what happened to the chunk?

Correct.
You need a restore with a full logical restore of all transactions.

HTH.
-------------------------------------------------
With best regards, Yuri Dovgart
SAP R/3, Informix technical consultant,
Informix Certified Professional,
Senior System Consultant
System Architecture and High Availability Systems,
'Telecominvest' company
Email y_do...@tci.ukrtel.net
ICQ 39284285

Rudy Fernandes

Dec 20, 1999, 3:00:00 AM

Yuri Dovgart wrote:

> Tam McLaughlin <tam.mcl...@net.ntl.com> wrote in message
> news:385DFDD6...@net.ntl.com...
> > We had a situation where there was a power cut, UPS kicked in and shut
> > down informix.

> ...

>
> Generally speaking, you are wrong. If your chunk is corrupted (for example,
> one of its pages), that points to an engine problem, not a transaction
> problem. Anyway, it depends on what happened to your chunk. In your case you
> will lose all the transactions that are in your logs on disk.

My understanding is slightly different. ontape -c would have copied all full
logical logs to tape. The log that was current at the time of the crash would
be the only one missing from the tape. Consequently, all transactions that
were committed in that log would be lost (in addition, of course, to the ones
that were still open at the time of the crash!).

The correct procedure, of course, would have been to salvage that last log
(and others, to play safe!) onto a different tape for use during the logical
restore.

Rudy

Tam McLaughlin

Dec 20, 1999, 3:00:00 AM

Rudy Fernandes wrote:

Sorry if I have not explained this properly. From the continuous log backup
(ontape -c) we were able to restore to the last log on tape before the crash.
After the crash, the server came back up and automatically started Informix,
but not the continuous log backup. When I came in in the morning, some cron
jobs had been running during this time (2:30 am - 9 am). I shut down the
engine and eventually did a restore. The programs running after the crash
would have generated transactions and filled up a few logical log files
despite there being a corrupt chunk. At the restore I had the option to back
up these current logs for use later in the restore, but I did not roll forward
the current logs (which would have been on a second logical log tape), as I
thought it best to restore to the logical log file just before the crash. This
way I definitely knew there would be no corrupt chunks. But would there still
have been a corrupt chunk, or could I have been 100% positive that the chunk
was OK and the integrity of the system was OK if I had restored the logical
logs completed while there was a bad offline chunk?

Rudy Fernandes

Dec 20, 1999, 3:00:00 AM

to tam.mcl...@net.ntl.com

Tam McLaughlin wrote:

> Sorry if I have not explained this properly. From the continuous log backup
> (ontape -c) we were able to restore to the last log on tape before the crash.
> After the crash, the server came back up and automatically started Informix,
> but not the continuous log backup. When I came in in the morning, some cron
> jobs had been running during this time (2:30 am - 9 am). I shut down the
> engine and eventually did a restore. The programs running after the crash
> would have generated transactions and filled up a few logical log files
> despite there being a corrupt chunk. At the restore I had the option to back
> up these current logs for use later in the restore, but I did not roll forward
> the current logs (which would have been on a second logical log tape), as I
> thought it best to restore to the logical log file just before the crash. This
> way I definitely knew there would be no corrupt chunks. But would there still
> have been a corrupt chunk, or could I have been 100% positive that the chunk
> was OK and the integrity of the system was OK if I had restored the logical
> logs completed while there was a bad offline chunk?

OK, so let's confirm what happened, in chronological sequence:

1. Power failure. UPS brings down Informix (and ontape -c).
2. Power restored. Informix brought back up. ontape -c NOT restarted.
3. Chunk failure occurs, possibly during Informix startup, but not noticed.
4. Cron jobs run, modifying data and creating logical log records.
5. 9:00 am. Down chunk noticed. Informix tech support suggests a restore.
Restore carried out, with the logical restore using the logs that were backed
up by the earlier ontape -c process. Newer logs were not salvaged.

If the above is how events transpired, here's the analysis

As mentioned earlier, all transactions which were committed in logs which were not
restored will be lost. Cron jobs can be rerun, but transactions that were committed
or open in the log that was current at the time of power failure will be lost.

Correct Method: As Yuri mentioned, chunk failure and logical log records are
(almost always) unrelated. The correct method would have been to (a) investigate
external reasons for the chunk failure (e.g. disk corruption) and correct them,
and (b) restore the instance, salvaging the logs when prompted at the start of
the restore and then using all logs since the last level 1 during the logical
restore. You would then have the database restored to its 9 am state.
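
The difference between the two outcomes can be illustrated with a small Python model. It is only a sketch: the log numbers, transaction names, and the `LogFile` and `recovered_transactions` helpers are invented for illustration and do not correspond to real ontape output or to the actual logs in this incident.

```python
from dataclasses import dataclass, field


@dataclass
class LogFile:
    log_id: int
    committed: list = field(default_factory=list)  # transactions committed in this log
    on_tape: bool = False                           # already backed up by ontape -c?


def recovered_transactions(logs, salvage):
    """Transactions replayed by the logical restore.

    Logs already on tape are always available; logs still on disk are
    available only if they were salvaged at the start of the restore.
    """
    recovered = []
    for log in sorted(logs, key=lambda entry: entry.log_id):
        if log.on_tape or salvage:
            recovered.extend(log.committed)
    return recovered


# Hypothetical timeline: logs 10-11 were full and on tape before the power cut,
# log 12 was current at the power cut, log 13 was filled by the cron jobs.
logs = [
    LogFile(10, ["t1", "t2"], on_tape=True),
    LogFile(11, ["t3"], on_tape=True),
    LogFile(12, ["t4"]),
    LogFile(13, ["cron1", "cron2"]),
]

without_salvage = recovered_transactions(logs, salvage=False)
with_salvage = recovered_transactions(logs, salvage=True)
lost = [t for t in with_salvage if t not in without_salvage]

print(without_salvage)  # ['t1', 't2', 't3']
print(with_salvage)     # ['t1', 't2', 't3', 't4', 'cron1', 'cron2']
print(lost)             # ['t4', 'cron1', 'cron2']
```

In this toy timeline the cron work (`cron1`, `cron2`) can be rerun, but `t4`, committed in the log that was current at the power failure, is only recoverable from a salvaged log.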

HTH
Rudy

Art S. Kagel

Dec 21, 1999, 3:00:00 AM

to tam.mcl...@net.ntl.com

Tam McLaughlin wrote:
>
> Rudy Fernandes wrote:
>
> > Yuri Dovgart wrote:
> >
> > > Tam McLaughlin <tam.mcl...@net.ntl.com> wrote in message
> > > news:385DFDD6...@net.ntl.com...
> > > > We had a situation where there was a power cut, UPS kicked in and shut
> > > > down informix.
> > > ...
> > >
> > > Generally speaking, you are wrong. If your chunk is corrupted (for example,
> > > one of its pages), that points to an engine problem, not a transaction
> > > problem. Anyway, it depends on what happened to your chunk. In your case you
> > > will lose all the transactions that are in your logs on disk.
> >
> > My understanding is slightly different. ontape -c would have copied all full
> > logical logs to tape. The log that was current at the time of the crash would
> > be the only one missing from the tape. Consequently, all transactions that
> > were committed in that log would be lost (in addition, of course, to the ones
> > that were still open at the time of the crash!).
> >
> > The correct procedure, of course, would have been to salvage that last log
> > (and others, to play safe!) onto a different tape for use during the logical
> > restore.
> >
> > Rudy
>

> Sorry if I have not explained this properly. From the continuous log backup
> (ontape -c) we were able to restore to the last log on tape before the crash.
> After the crash, the server came back up and automatically started Informix,
> but not the continuous log backup. When I came in in the morning, some cron
> jobs had been running during this time (2:30 am - 9 am). I shut down the
> engine and eventually did a restore. The programs running after the crash
> would have generated transactions and filled up a few logical log files
> despite there being a corrupt chunk. At the restore I had the option to back
> up these current logs for use later in the restore, but I did not roll forward
> the current logs (which would have been on a second logical log tape), as I
> thought it best to restore to the logical log file just before the crash. This
> way I definitely knew there would be no corrupt chunks. But would there still
> have been a corrupt chunk, or could I have been 100% positive that the chunk
> was OK and the integrity of the system was OK if I had restored the logical
> logs completed while there was a bad offline chunk?

Except that the logical logs could NOT have contained ANY transactions
against the down chunk, since IDS will not read or write a bad chunk. So the
logical log records would not have contained transactions corrupted by the
damaged chunk, ONLY valid transactions against other chunks. I have to agree
with Rudy et al. and again recommend that you should have saved and restored
the current log and any other logs created after the crash.
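
As a rough illustration of that argument, here is a toy Python model. It is not how IDS is implemented; `ToyEngine`, `ChunkDownError`, and the chunk and job names are all made up. It assumes only what is stated above: I/O against a down chunk is refused, so nothing touching that chunk is ever logged.

```python
class ChunkDownError(Exception):
    """Raised when an operation targets a chunk that is marked down."""


class ToyEngine:
    def __init__(self, chunks):
        self.chunk_up = dict(chunks)  # chunk name -> True if online
        self.logical_log = []         # (transaction, chunk) records

    def write(self, txn, chunk):
        # Refuse I/O against a down chunk before anything is logged.
        if not self.chunk_up[chunk]:
            raise ChunkDownError(chunk)
        self.logical_log.append((txn, chunk))


engine = ToyEngine({"chunk1": True, "chunk2": False})  # chunk2 is the bad one
for txn, chunk in [("cron1", "chunk1"), ("cron2", "chunk2"), ("cron3", "chunk1")]:
    try:
        engine.write(txn, chunk)
    except ChunkDownError:
        pass  # the job fails; no record reaches the log

print(engine.logical_log)  # [('cron1', 'chunk1'), ('cron3', 'chunk1')]
```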

Art S. Kagel