Media recovery when using an old controlfile

Jurjen Oskam

unread,

Jun 25, 2003, 4:40:32 AM6/25/03

to

Hi everybody,

I'm experimenting with several options regarding the backup and recovery of an
Oracle 8.1.7 database.

On the production machine, I first created a copy of the current controlfile.
After some seconds, I did the equivalent of a SHUTDOWN ABORT.[1]

Then, I copied *all* the files over to another machine, along with the copy of
the controlfile. I changed the control_files parameter on the backup machine
to the copy of the controlfile.

If I understand correctly, this results in (on the second machine) datafiles
that need media recovery, with a not-current controlfile.

The situation is that I cannot start this database. First, I did a STARTUP
MOUNT. Then, I did RECOVER DATABASE USING BACKUP CONTROLFILE UNTIL CANCEL.
This results in the following:

ORA-00279: change 1702722594 generated at 06/25/2003 08:22:23 needed for thread 1
ORA-00289: suggestion : /oracle/app/oracle/admin/prod1/arch/arch1_49248.rdo
ORA-00280: change 1702722594 for thread 1 is in sequence #49248
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
CANCEL
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/dbentry/prod1/div1_500_01'

However, sequence #49248 was not generated yet at the time of the SHUTDOWN
ABORT of the original database. And indeed, when I try to open the database
with RESETLOGS, I get the error I was warned about above.

Is it possible to do media recovery with an older version of a controlfile?
And if it's possible, is it sensible? :-)

Right after the above failure to start the database with the backup
controlfile, I changed the control_files parameter to the controlfile that
was current at the time of the SHUTDOWN ABORT. With this controlfile,
I was able to do a STARTUP without problems at all.

[1] I didn't really SHUTDOWN ABORT the production database. What I did
was atomically make a copy of the disks the database is on, and this
copy was presented to the second host.
--
Jurjen Oskam

PGP Key available at http://www.stupendous.org/

Norman Dunbar

unread,

Jun 25, 2003, 4:57:41 AM6/25/03

to

Morning Jurjen,

I've never had to do one, and as far as I can remember, I've never even
tried one, but assuming you are running in archivelog mode, then
attempting a recovery with an OLDER version of the control file is going
to leave you somewhat bolloxed !

The controlfile(s) should never be restored if you wish to recover
database because the current controlfile has the SCN number that is the
current one. The old control file will have the old one. I've no idea
whether Oracle will let you actually do this (and to be totally honest,
I've got no time to try it out !)

On the other hand, maybe I missed the point of your question !

Cheers,
Norman.

Arcangelo

unread,

Jun 25, 2003, 9:13:00 AM6/25/03

to

"Norman Dunbar" <Norman...@lfs.co.uk> wrote in message
news:E2F6A70FE45242488C86...@lnewton.leeds.lfs.co.uk...

I've missed the subtleties of the original post, but a couple of points:

Oracle does of course allow you to use an old controlfile to perform a
recovery. What else would you expect to be able to do if a disk failure took
out all your controlfiles and a datafile?

(Of course, multiplexing the controlfile should stop this happening, but
I've seen many sites where you are told 'of course we multiplex them', only
to discover that one copy is on C: drive, one is on D: drive, and both C and
D are partitions on the same physical disk!).

The command 'recover database using backup controlfile' allows you to
perform a complete recovery with an old controlfile. 'recover database until
cancel using backup controlfile' allows you to do an incomplete recovery
using an old controlfile.

In either case, the 'using backup controlfile' clause simply tells your
server process not to bother reading the controlfile's SCN to determine the
'age' of the database, but to look at the datafiles themselves to find the
highest possible SCN... that will be the point to which a complete recovery
is then performed, for example.

Despite some posts I've seen here, the 'using backup controlfile' clause
never needs to be supplied unless you've actually restored a binary backup
of that file prior to starting recovery. But in a total loss of controlfiles
scenario, it may well be that that's exactly what you have to do.

As to the original poster's question, I can't tell. He says he did a 'sort
of' shutdown abort, but that actually consisted of copying the datafiles!!
Don't even pretend to know how a copy can look like a shutdown abort.

Anyway: the error 'but OPEN RESETLOGS would get error below ORA-01194: file
1 needs more recovery to be consistent' invariably means that not all
datafiles were restored for the purposes of an incomplete recovery.
Therefore, a (or possibly several) datafile(s) is/are at a time ahead of
where the incomplete recovery is halted, and since they can't be rolled
backwards in time, the database is doomed to be forever inconsistent. I
suppose you could try a select from v$datafile_header to see what the SCNs
are at, and check the copying procedures to make sure you haven't forgotten
something.

;-*

Richard Foote

unread,

Jun 25, 2003, 9:33:28 AM6/25/03

to

"Jurjen Oskam" <jos...@quadpro.stupendous.org> wrote in message
news:slrnbfio00...@calvin.quadpro.stupendous.org...

Hi Jurjen

OK, let's see if this makes sense ;)

When you copied the datafiles across on the fly, they're inconsistent as a
checkpointed shutdown was not perform. Therefore, the headers of the files
have various and differing SCNs littered among them. Assuming that the last
completed checkpoint was performed during the previous log switch, any
inconsistencies can be resolved within the current, online redo log. By that
I mean that any changes to uncommited blocks written to disk (but recorded
within the undo segments) and any committed blocks found within the buffer
cache (but not written to disk) *must* all be found within this current
online redo log. Therefore in the case of an instance failure, or a shutdown
abort, or indeed your "inconsistent" copy of the datafile, Oracle will
require this current online redo log in order to apply all the committed
changes and clean out all the uncommited changes. That is recover the
database.

One word of warning here though is that because you didn't (I assume) put
the tablespaces into backup mode, your copy of the database doesn't quite
equate to a shutdown abort scenario. This is because a block as it's being
copied could have a portion of it copied but then have DB Writer come and
overwrite the remaining portion of the block. When this remaining portion
gets copied we now have an inconsistent (or "fractured" block) which Oracle
will *not* be able to recover. Oracle will require a consistent copy of this
block from which to recover, that which is recorded by Oracle in it's redo
stream when a tablespace is in backup mode (one of the key reasons why doing
so is so important for hot backups).

Assuming we don't have any fractured blocks let's move on. You now have a
set of inconsistent datafiles and a old or previous copy of the control
file. This control file is not going to be an issue for us, *providing* it
still describes the physical details of the current database, which I assume
it does.

When you quite correctly run your recovery statement, you have stated to
Oracle that the controlfile is not current and hence can't be used to police
the required SCN of the *current* database. In other words you wish to
perform an incomplete recovery and recover the controlfile as well.
Therefore Oracle investigates the headers of the datafiles and determines
which is the oldest, non read-only datafile as described in the control
file. This is the point (and the redo log sequence) from which we wish to
recover the database. Assuming that the last completed checkpoint was
performed during the previous log switch, this means it's the current online
redo that Oracle requires. That is, log 49428 is the current online redo log
and as such can not possibly be archived.

But you don't allow Oracle to perform any recovery, you simply issue the
cancel command. That means that the data files remain inconsistent. There is
currently a datafile that has not be written to before the SYSTEM datafile,
has an SCN previous to that recorded in the SYSTEM datafile and hence is at
a future point in time to that guaranteed to that in SYSTEM. Therefore
Oracle is complaining that SYSTEM (datafile 1) must be further recovered
before Oracle will allow you to open the database. Your incomplete recovery
was successful (as such) but you must perform more recovery and ensure that
*all* datafiles are consistent until Oracle is going to be happy with you.

To recover the database, you must therefore copy the current online redo log
across and *completely* apply all changes associated with this redo log.
Then all your datafiles *must* be consistent and when it now asks for the
next redo (which Oracle will do BTW because the control file will not know
it's reached the end, it's not the *current* redo log remember) you can
safely issue the cancel command.

Generally you perform an incomplete recovery by restoring all the necessary
datafiles and recovering forward to the required point in time (or redo
log). In your scenario, because all the datafiles are kinda current, they
can only become consistent once all the generated redo logs have been
applied (importantly, including the current one).

Make sense ?

Richard

Jurjen Oskam

unread,

Jun 25, 2003, 10:36:08 AM6/25/03

to

In article <3ef99fde$0$5429$afc3...@news.optusnet.com.au>, Arcangelo wrote:

> As to the original poster's question, I can't tell. He says he did a 'sort
> of' shutdown abort, but that actually consisted of copying the datafiles!!

I'll try to describe what I did:

* On machine A, a database named prod1 runs, using raw logical
volumes. Each raw logical volume consists of three copies on separate
disks in the storage system. All the datafiles, controlfiles, online
redo logs, etc. are on raw logical volumes.

* A copy of the controlfile is generated in /db/copy-controlfile

* The storage system splits off a mirror, reducing the 3-way mirror to
a 2-way mirror. It does this by suspending I/O to all database volumes
at the same time, splitting off the third mirror, and then resuming
I/O to the remaining database volumes. This takes about a quarter of a
second, during which the database keeps on running.

* The split off mirror is attached to machine B. This means that all the
database files as they were at the time of the split are now available on
machine B.

* The backup of the controlfile generated in step 2 is copied from
machine A to machine B.

* On machine B, the control_file parameter is set to the backup controlfile
from the previous step.

As I understand it, this results in datafiles that are further in time than
the controlfile. I'd like to start the database on machine B using the backup
controlfile. I failed to do this, with the symptoms as described in my
original post.

> Don't even pretend to know how a copy can look like a shutdown abort.

You're right, I shouldn't. But, as far as the split-off datafiles are
concerned, they could just as well be the result of a system crash: at time
T-1, they were being written to/read from by the Oracle instance, at time
T they were suddenly (and all at the same time) stopped being written to/read
from by the Oracle instance. From the split-off copy's viewpoint, the
Oracle instance just disappeared. My assumption was that this is equivalent
to a SHUTDOWN ABORT. Since I don't *exactly* know what SHUTDOWN ABORT does,
this assumption was not justified.

> Anyway: the error 'but OPEN RESETLOGS would get error below ORA-01194: file
> 1 needs more recovery to be consistent' invariably means that not all
> datafiles were restored for the purposes of an incomplete recovery.
> Therefore, a (or possibly several) datafile(s) is/are at a time ahead of
> where the incomplete recovery is halted, and since they can't be rolled
> backwards in time, the database is doomed to be forever inconsistent.

As I said above, I failed to start the database using the backup controlfile
created in step 2 above. However, after I changed the control_file parameter
(on machine B) to the split-off controlfile (the result from step 3 above),
the database could be started with a simple STARTUP command, nothing else
needed. With the backup controlfile the database failed to start up, with
the current controlfile no problems whatsoever were encountered, with
exactly the same datafiles.

Norman Dunbar

unread,

Jun 25, 2003, 10:23:16 AM6/25/03

to

I stand corrected - thanks.

I'm going back to boat building - I'm no good at this Oracle stuff :o)

Cheers,
Norman.

Jurjen Oskam

unread,

Jun 25, 2003, 11:17:34 AM6/25/03

to

In article <WdhKa.1167$p8.4...@newsfeeds.bigpond.com>, Richard Foote wrote:

[ excellent explanation snipped ]

> One word of warning here though is that because you didn't (I assume) put
> the tablespaces into backup mode, your copy of the database doesn't quite
> equate to a shutdown abort scenario. This is because a block as it's being
> copied could have a portion of it copied but then have DB Writer come and
> overwrite the remaining portion of the block. When this remaining portion

This is point I admittedly oversimplified in my original post. As you may
have read in my previous post, the storage system prevents this situation:
all I/O to the entire database is frozen, the copy is made (in the form
of a mirror being split off), and then I/O continues. From the DB Writer's
point of view, the copy of the entire database is made atomically.

> When you quite correctly run your recovery statement, you have stated to
> Oracle that the controlfile is not current and hence can't be used to police
> the required SCN of the *current* database. In other words you wish to
> perform an incomplete recovery and recover the controlfile as well.
> Therefore Oracle investigates the headers of the datafiles and determines
> which is the oldest, non read-only datafile as described in the control
> file. This is the point (and the redo log sequence) from which we wish to
> recover the database. Assuming that the last completed checkpoint was
> performed during the previous log switch, this means it's the current online
> redo that Oracle requires. That is, log 49428 is the current online redo log
> and as such can not possibly be archived.

That is what I understood as well.

> To recover the database, you must therefore copy the current online redo log
> across and *completely* apply all changes associated with this redo log.

The online redo logfiles were part of the copy-by-splitting-off-mirror, so
those are available on the second machine. Are you saying that instead of
pointing to log 49428 (which doesn't exist as an archived log yet, but only
in an online redo log), I should point to /dev/rredo_log_volume? Will that
work? How can I tell which online redolog volume to use? In that particular
database, there are nine online redolog files. Will Bad Things(tm) happen
if I try to apply the wrong online redolog?

> Make sense ?

Yes, very much so! Thank you, and thanks to everybody who takes part in this
thread. It's really nice of all of you to provide such helpful and detailed
explanations. I'm not a DBA, but I am responsible for backup/recovery so all
this is excellent information with respect to Oracle backup and recovery.

Tanel Poder

unread,

Jun 25, 2003, 12:59:43 PM6/25/03

to

Hi!

Some thoughts:

1) If your array supports write caching, are the cached writes written to
all splits?
2) Do you split *entire* database atomically, that means all datafiles,
redos and controlfiles? If half of db is taken at one time and second half
is taken 1/100 seconds later you could easily have inconsistencies.
3) I've always put datafiles to backup mode before splitting. There is a
command alter database suspend / resume which should disallow any writes
during set, but it has several problems (such corruptions causesd by direct
writes), so I don't recommend it. One good thing with splitting files in
backup mode, you can use them for recovery later on (if copied away before
opening)

Tanel.

"Jurjen Oskam" <jos...@quadpro.stupendous.org> wrote in message

news:slrnbfjf8d...@calvin.quadpro.stupendous.org...

Jurjen Oskam

unread,

Jun 25, 2003, 1:25:25 PM6/25/03

to

In article <3ef9d...@news.estpak.ee>, Tanel Poder wrote:

> 1) If your array supports write caching, are the cached writes written to
> all splits?

As I understand it, yes. The freezing and thawing of I/O happens at the
AIX device driver level; from a certain point of time no new I/O actions
are accepted, and current I/O actions are completed. The split is done, after
which new I/O actions are allowed again. The storage array is an EMC Symmetrix
with TimeFinder enabled, and PowerPath on the host.

> 2) Do you split *entire* database atomically, that means all datafiles,
> redos and controlfiles? If half of db is taken at one time and second half
> is taken 1/100 seconds later you could easily have inconsistencies.

The entire database is split atomically.

> 3) I've always put datafiles to backup mode before splitting. There is a
> command alter database suspend / resume which should disallow any writes

With these kind of splits, I have always been able to split off a restartable
database. It was only when non-current controlfiles started to get involved
that I ran into problems (the main problem being my limited Oracle - or
databases in general - knowledge :-) ).

Tanel Poder

unread,

Jun 25, 2003, 7:38:30 PM6/25/03

to

Ok, now I actually read your original post through.

Check this: the log #49248 is actually online redolog, but recovery process
doesn't know it, because of old controlfile.
Try to feed any of your (splitted) online logs to it. If you feed wrong one,
no harm is done, recovery process reads log# from the file and aborts if
gets the wrong one.

Tanel.

"Jurjen Oskam" <jos...@quadpro.stupendous.org> wrote in message

news:slrnbfjmo5...@calvin.quadpro.stupendous.org...

Jurjen Oskam

unread,

Jun 26, 2003, 1:54:25 AM6/26/03

to

In article <3efa3286$1...@news.estpak.ee>, Tanel Poder wrote:

> Try to feed any of your (splitted) online logs to it. If you feed wrong one,
> no harm is done, recovery process reads log# from the file and aborts if
> gets the wrong one.

Thanks, I'll try this today.

Tanel Poder

unread,

Jun 26, 2003, 3:48:25 AM6/26/03

to

Please let me know the results then.

Cheers,
Tanel.

"Jurjen Oskam" <jos...@quadpro.stupendous.org> wrote in message

news:slrnbfl2kh...@calvin.quadpro.stupendous.org...

Jurjen Oskam

unread,

Jun 26, 2003, 6:00:50 AM6/26/03

to

In article <3efaa55b$1...@news.estpak.ee>, Tanel Poder wrote:

> Please let me know the results then.

It works! In the log below, it asks for sequence 49366, but that isn't generated
yet. Trying to apply the online redo logs worked:

SVRMGR> recover database using backup controlfile until cancel;
ORA-00279: change 1703294757 generated at 06/26/2003 10:17:08 needed for thread 1
ORA-00289: suggestion : /oracle/app/oracle/admin/prod1/arch/arch1_49366.rdo
ORA-00280: change 1703294757 for thread 1 is in sequence #49366

Specify log: {<RET>=suggested | filename | AUTO | CANCEL}

/dbentry/prod1/redo1_100_01
ORA-00310: archived log contains sequence 49362; sequence 49366 required
ORA-00334: archived log: '/dbentry/prod1/redo1_100_01'

ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/dbentry/prod1/div1_500_01'

[snip several other wrong redologs]
SVRMGR> recover database using backup controlfile until cancel;
ORA-00279: change 1703294757 generated at 06/26/2003 10:17:08 needed for thread 1
ORA-00289: suggestion : /oracle/app/oracle/admin/prod1/arch/arch1_49366.rdo
ORA-00280: change 1703294757 for thread 1 is in sequence #49366

Specify log: {<RET>=suggested | filename | AUTO | CANCEL}

/dbentry/prod1/redo2_100_01
Log applied.
Media recovery complete.
SVRMGR> alter database open resetlogs;
Statement processed.
SVRMGR>

Unfortunately, I mindlessly opened the database with RESETLOGS. It would
have been nice to see NORESETLOGS also work, since the database is
recovered to the most recent condition.

Tanel Poder

unread,

Jun 26, 2003, 6:33:44 AM6/26/03

to

> It works! In the log below, it asks for sequence 49366, but that isn't
generated
> yet. Trying to apply the online redo logs worked:

Great!

Tanel.

Richard Foote

unread,

Jun 26, 2003, 8:46:04 AM6/26/03

to

"Jurjen Oskam" <jos...@quadpro.stupendous.org> wrote in message

news:slrnbflh2i...@calvin.quadpro.stupendous.org...

> In article <3efaa55b$1...@news.estpak.ee>, Tanel Poder wrote:
>
> > Please let me know the results then.
>
> It works! In the log below, it asks for sequence 49366, but that isn't
generated
> yet. Trying to apply the online redo logs worked:
>

Hi Jurjen,

I had every confidence in you ;)

Cheers

Richard

Brian Peasland

unread,

Jun 26, 2003, 9:09:33 AM6/26/03

to

I believe (haven't tried it myself), that since you recovered using a
backup controlfile that you would have had to open with RESETLOGS
anyway. You were able to recover everything, but the system probably
needed RESETLOGS to open. To the system, you still did incomplete
recovery even though you got everything. So you now have a new
incarnation of the database, and hopefully, you've backed it up by now.
Remember, when you open with RESETLOGS, you effectively wipe out what is
in the online redo logs so you won't be able to use the same recovery
method again.

HTH,
Brian

--
===================================================================

Brian Peasland
oracle_dba@remove_spam.peasland.com

Remove the "remove_spam." from the email address to email me.

"I can give it to you cheap, quick, and good. Now pick two out of
the three"

Jurjen Oskam

unread,

Jun 26, 2003, 10:32:51 AM6/26/03

to

In article <3EFAF08D.79FF7B6B@remove_spam.peasland.com>, Brian Peasland wrote:

> I believe (haven't tried it myself), that since you recovered using a
> backup controlfile that you would have had to open with RESETLOGS
> anyway. You were able to recover everything, but the system probably
> needed RESETLOGS to open. To the system, you still did incomplete

That's no big loss - I do not intend do to restores that often. :-) I
just needed to make sure that this method works. I'm currently working
on the exact procedure. When I'm done and tested it thoroughly, I'll
post it here if there's any interest.