Ideas of how to get 100% no data loss?
Thanks
Chris
Switch to 9i. It allows you to configure Data Guard with no data loss (at
the cost of a performance penalty).
Ronald
-----------------------
http://ronr.nl/unix-dba
With 8i, this is the big issue with the use of a Standby DB.
If the "disaster" doesn't allow time for a final switch logfile, then yes,
potentially data will be lost. In some scenarios, you may have the
opportunity for an 'alter system switch logfile' command and the logfile's
subsequent archive, in some scenarios you may not.
In a site I have been working at, they issue a log switch every half hour to
resync their standby DB to minimise possible data loss.
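If you want to automate that, a minimal sketch (cron entry plus script;
paths, SID and schedule are all made up) might look like:

  # crontab entry on the primary -- force a log switch every 30 minutes
  0,30 * * * * /home/oracle/bin/switch_logfile.sh

  # switch_logfile.sh
  #!/bin/sh
  ORACLE_HOME=/u01/app/oracle/product/8.1.7; export ORACLE_HOME
  ORACLE_SID=PROD; export ORACLE_SID
  PATH=$ORACLE_HOME/bin:$PATH; export PATH
  sqlplus -s "/ as sysdba" <<EOF
  ALTER SYSTEM SWITCH LOGFILE;
  EOF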
If this possible data loss is a real issue for you, I would recommend
either:
1) Move to 9i and make use of Data Guard which, through the use of the LGWR
process to transport redo to the standby DB, can guarantee no loss of data.
The cost of this is a probable performance hit on the Prod DB, but there
are various options/settings that can be used to "balance" performance and
protection as appropriate (see the sketch below).
or
2) Hire DBAs who put the well-being of the database ahead of personal
protection. You need DBAs made of the right stuff who, during an
earthquake, rather than fleeing the building as it comes crashing down and
as flames from burst gas pipes engulf the place, will think, "Must ... get
... to ... console (cough cough), must ... switch .... logfile ........".
However, I'm a rare breed and option 1 might be the way to go ;)
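For the record, the LGWR transport in option 1 boils down to a couple of
init.ora parameters on the primary, something like the following (service
name and paths are invented, so check the Data Guard docs for your exact
release):

  # init.ora on the primary -- ship redo with LGWR, synchronously,
  # and wait for the standby to acknowledge the disk write (AFFIRM)
  log_archive_dest_1 = 'LOCATION=/u01/arch MANDATORY'
  log_archive_dest_2 = 'SERVICE=stby LGWR SYNC AFFIRM'
  log_archive_dest_state_2 = enable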
Cheers
Richard
"Chris Forbis" <chris...@yahoo.com> wrote in message
news:f2dc430d.02101...@posting.google.com...
I had an idea, but I'm not sure about it...
Is it possible, by looking at the alert log, to see which redo log is the
current one, then look at the archive log numbers, copy the current redo
log over to archive_##### (whatever number is next), let the standby apply
it till it is done, then restart the DB?
I know this assumes that you can get to the redo on the broken server,
but I figure that unless something truly crazy has happened, if you have a
few copies on different drive sets there should be a way...
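Something like the following is what I have in mind, on the standby, after
copying the surviving online redo log across (file names invented, and
completely untested):

  ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
  RECOVER STANDBY DATABASE;
  -- when recovery prompts for the archived log that never made it,
  -- supply the copied online redo log instead, e.g.
  --   /standby/rescue/redo01.log
  -- then open for business:
  ALTER DATABASE ACTIVATE STANDBY DATABASE;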
Thoughts?
Ronald Rood <ro...@ronr.nl> wrote in message news:<0001HW.B9D16458...@news.cis.dfn.de>...
As did Ronald. You should also do a cost/benefit analysis of how much you
care about that 4 minutes' worth of data in the case of your 'disaster'.
Let's assume that your datacentre has been wiped out by some actual
disaster (say a terrorist attack, because you work for a defence agency).
This attack has also taken out your data management staff. Which is going
to cost your business more: 4 minutes of data, or the loss of the staff and
facilities?
A more reasonable cost/benefit analysis might say that using Data Guard (or
some other 100% availability technology) costs us a 5% performance hit for
every moment the system is running. Do we think that this is worth it to
ensure we really, really don't ever lose any data? Then of course there is
the question as to how much of your data is accurate in the first place :(.
I think what I am saying is that getting, say, five 9s of availability
costs big time. The question is: is it worth it?
--
Niall Litchfield
Oracle DBA
Audit Commission UK
*****************************************
Please include version and platform
and SQL where applicable
It makes life easier and increases the
likelihood of a good answer
******************************************
- the standby server cannot guarantee 100% no data loss, even with
Data Guard, because the primary host can go down before Data Guard
has copied the changes to the standby.
- there are other options besides a standby server. For example, a
VCS cluster has a better chance of providing 100% no data loss because
even if the host does go down, the filesystems can be mounted from
another box and Oracle can do instance recovery.
- you will have to do a lot of automation work if you are not using
Data Guard. First of all, managed recovery switches itself off if it
encounters any problem, e.g. a network failure. It does not switch
back on, and archived log files generated during the period it was
off need to be manually transferred to the standby and applied. You will
need a mechanism to watch it and transfer files (see the sketch after
this list). What you might find yourself doing is ignoring managed
recovery and writing your own scripts to transfer files. On that route,
you will encounter a lot of obstacles if you try to get close to what
Data Guard does.
- the DataGuard (and 9iR2) software is relatively new and is probably
full of bugs. If your system is critical, you may prefer not to test
out Oracle's bugs on it. The 8i software has plenty of bugs as well,
but at least they are KNOWN :).
- when planning a failover implementation, you need to know your
requirements. There are other things to consider besides data loss,
for example how quickly the system needs to become available. You
also need to define the types of disaster you are trying to protect your
system from, and to get an idea of what it's going to cost you (your
company).
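To give a feel for the scripting involved, here is a bare-bones version of
the watcher mentioned above (host names, paths and the timeout are all
invented):

  #!/bin/sh
  # hypothetical watcher, run from cron on the standby host
  # 1. pull across any archived logs the primary has produced
  rcp primary:'/u01/arch/*.arc' /u01/arch 2>/dev/null
  # 2. (re)start managed recovery; TIMEOUT makes the session return
  #    if no new log turns up within 30 minutes, so the next cron
  #    run can pick up where this one left off
  sqlplus -s "/ as sysdba" <<EOF
  ALTER DATABASE RECOVER MANAGED STANDBY DATABASE TIMEOUT 30;
  EOF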
I have seen several posts mentioning the performance impact of
DataGuard. I am pretty sure there's got to be some, but maybe somebody
could elaborate on this and let me know what this performance impact
is due to?
Another question I am trying to get answered: how does DG manage
TWO standby databases? What happens with redo data if one of them
is temporarily unavailable?
Thank you for any answers.
Regs
AK
This is not correct when you are using DataGuard in either instant or
guaranteed mode. In each of these, LGWR writes synchronously to the standby
redo logs. Of course, this also has the greatest performance impact, so you
have to balance your performance requirements with your disaster recovery
requirements.
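For what it's worth, the standby redo logs mentioned above are created on
the standby with something like this (paths and group numbers invented; the
sizes should match the primary's online logs):

  ALTER DATABASE ADD STANDBY LOGFILE GROUP 4
      ('/u02/oradata/stby/srl04.log') SIZE 100M;
  ALTER DATABASE ADD STANDBY LOGFILE GROUP 5
      ('/u02/oradata/stby/srl05.log') SIZE 100M;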
>
>- there are other options besides a standby server. For example, a
>VCS cluster has a better chance of providing 100% no data loss because
>even if the host does go down, the filesystems can be mounted from
>another box and Oracle can do instance recovery.
>
>- you will have to do a lot of automation work if you are not using
>Data Guard. First of all, managed recovery switches itself off if it
>encounters any problem, e.g. a network failure. It does not switch
>back on, and archived log files generated during the period it was
>off need to be manually transferred to the standby and applied. You will
>need a mechanism to watch it and transfer files. What you might find
>yourself doing is ignoring managed recovery and writing your own
>scripts to transfer files. On that route, you will encounter a lot of
>obstacles if you try to get close to what Data Guard does.
>
>- the DataGuard (and 9iR2) software is relatively new and is probably
>full of bugs. If your system is critical, you may prefer not to test
>out Oracle's bugs on it. The 8i software has plenty of bugs as well,
>but at least they are KNOWN :).
DataGuard has been available since June last year, so I think you'll find most
of the bugs have been ironed out by now.
>
>- when planning a failover implementation, you need to know your
>requirements. There are other things to consider besides data loss,
>for example how quickly the system needs to become available. You
>also need to define the types of disaster you are trying to protect your
>system from, and to get an idea of what it's going to cost you (your
>company).
>
>I have seen several posts mentioning the performance impact of
>DataGuard. I am pretty sure there's got to be some, but maybe somebody
>could elaborate on this and let me know what this performance impact
>is due to?
The performance impact is directly related to the fact that there is more
work to do. Depending on the amount of redo generation and the type of
network transfer mode (synchronous or asynchronous), this could be a
greater or lesser impact. It's impossible to make generic statements like
"DataGuard will cost you x% in performance" for these reasons. As always,
the only valid measurement is to test it yourself with the application
you're intending to use it with.
>
>Another question I am trying to get answered: how does DG manage
>TWO standby databases? What happens with redo data if one of them
>is temporarily unavailable?
Depends on how you've set it up. You may have it set up as one mandatory
and one optional destination, for example, and then the impact is less if
the optional one dies.
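By way of illustration, the mandatory/optional distinction is just an
attribute on the archive destinations, along these lines (service names
invented):

  # init.ora on the primary: stby1 must receive the redo before an
  # online log can be reused; stby2 is best-effort, retried every
  # 60 seconds after a failure
  log_archive_dest_2 = 'SERVICE=stby1 MANDATORY REOPEN=60'
  log_archive_dest_3 = 'SERVICE=stby2 OPTIONAL REOPEN=60'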
>
>Thank you for any answers.
No problems.
Pete
>
>Regs
>AK
>
>
>Chris Forbis wrote:
>
>> I am looking into ways of recovering in the case of a major failure on
>> a primary server. I have the idea to create a standby server, and
>> this seems to take care of much of the work. The problem I see is
>> when the primary system fails and a log switch has not happened in
>> the last 4 minutes, it seems I lose that 4 minutes of data because the
>> log has not been archived and moved.
>>
>> Ideas of how to get 100% no data loss?
>>
>> Thanks
>>
>> Chris
>
HTH. Additions and corrections welcome.
Pete
SELECT standard_disclaimer, witty_remark FROM company_requirements;
Not if you configure it so that a commit doesn't count as a commit UNTIL
the redo has been sent to the standby, and an acknowledgement of its
successful transmission has been received, which is exactly how you *can*
configure Data Guard if you so choose.
> - there are other options besides a standby server. For example, a
> VCS cluster has a better chance of providing 100% no data loss because
> even if the host does go down, the filesystems can be mounted from
> another box and Oracle can do instance recovery.
Configured in its toughest way, Data Guard *does* provide 100% no data loss.
> - you will have to do a lot of automation work if you are not using
> Data Guard. First of all, managed recovery switches itself off if it
> encounters any problem, e.g. a network failure. It does not switch
> back on, and archived log files generated during the period it was
> off need to be manually transferred to the standby and applied. You will
> need a mechanism to watch it and transfer files. What you might find
> yourself doing is ignoring managed recovery and writing your own
> scripts to transfer files. On that route, you will encounter a lot of
> obstacles if you try to get close to what Data Guard does.
>
> - the DataGuard (and 9iR2) software is relatively new and is probably
> full of bugs. If your system is critical, you may prefer not to test
> out Oracle's bugs on it. The 8i software has plenty of bugs as well,
> but at least they are KNOWN :).
Honestly! Everyone's entitled to their own opinion, of course. But I do
rather wish fewer people would post their unsubstantiated opinions here as
if they counted for anything. Have you actually tested Data Guard? It's not
exactly difficult to do, and then you'd be able to post some real facts
about it. It's not "full of bugs", and neither is the product "relatively"
new, having been in the marketplace for over 12 months now.
> - when planning a failover implementation, you need to know your
> requirements. There are other things to consider besides data loss,
> for example how quickly the system needs to become available. You
> also need to define the types of disaster you are trying to protect your
> system from, and to get an idea of what it's going to cost you (your
> company).
>
> I have seen several posts mentioning the performance impact of
> DataGuard. I am pretty sure there's got to be some, but maybe somebody
> could elaborate on this and let me know what this performance impact
> is due to?
Uh, well (and I'm trying to be polite about this), your first statement in
this entire post is incorrect, as mentioned above, precisely because Data
Guard can be configured to use *LGWR* to synchronously transport redo to
the standby. Now the source of potential performance impacts should be
rather obvious: anything that slows down LGWR potentially causes grief on
the production system, and Data Guard potentially slows it down an awful
lot, charging it, as it does, with the responsibility of shipping redo to a
fistful of standby databases. Added to that, you can configure things so
that the failure to ship to any or all of these standbys causes the primary
database to be summarily shut down.
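To make that concrete, guaranteed protection in 9iR1 is switched on with
something like this (issued on the primary, with a LGWR SYNC AFFIRM
destination already configured; 9iR2 reworded the syntax along the lines of
SET STANDBY DATABASE TO MAXIMIZE PROTECTION, so check your docs):

  ALTER DATABASE SET STANDBY DATABASE PROTECTED;
  -- from then on, a failure to ship redo to the standby brings the
  -- primary down rather than letting the two databases diverge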
> Another question I am trying to get answered: how does DG manage
> TWO standby databases? What happens with redo data if one of them
> is temporarily unavailable?
It depends. You can configure some standbys to be 'must send to'
destinations, and others to be 'desirable to send to'. The point about Data
Guard is that it's up to you how you configure it. The failure to archive
to 'must send to' destinations *can* cause the primary to shut down, or
might cause nothing very much to happen in the interim, followed by a
massive catch-up operation when transmission becomes possible again.
HJR
A really big snip, where Howard put paid to a few of those ideas ;)
> > Another question I am trying to get answered: how does DG manage
> > TWO standby databases? What happens with redo data if one of them
> > is temporarily unavailable?
>
> It depends. You can configure some standbys to be 'must send to'
> destinations, and others to be 'desirable to send to'. The point about
> Data Guard is that it's up to you how you configure it. The failure to
> archive to 'must send to' destinations *can* cause the primary to shut
> down, or might cause nothing very much to happen in the interim, followed
> by a massive catch-up operation when transmission becomes possible again.
A nice feature with Data Guard is the ability for a Standby Database to
automatically realise that some redo logs are missing, and to automatically
go off looking for where (hopefully) they might be. Setting a couple of
parameters on the Standby DB (FAL_CLIENT and FAL_SERVER) basically makes
the Standby DB behave in this manner:
The Standby DB is reading the paper waiting for the next log to arrive. It
should be log 1234, but boy, it's taking its time. Eventually log 1238
arrives. "Shit" says the Standby DB, "Crystal Palace have lost again". It
then puts the paper down and says "Shit, what happened to logs 1234, 1235,
1236 and 1237" !!
It then thinks, "wait a minute, I've got the phone number (make that
service name) of my mate Standby 2 DB. I'm a bit jealous of him, to be
honest, as he's more *important* than me, being mandatory while I'm just
optional. Still, he has his uses, and being the mandatory snob that he is,
he should have the missing redo logs".
A quick tap on the shoulder of the mandatory standby DB (who indeed does
have all the missing logs and is identified by the FAL_SERVER parameter),
and he politely delivers them to the poor optional standby DB (who is
identified to the mandatory DB via the FAL_CLIENT parameter).
Note the Primary DB could also have been used for this purpose.
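In init.ora terms on the optional standby, that conversation amounts to a
couple of lines like these (net service names invented):

  fal_server = stby_mand   # who to ask for the missing logs
  fal_client = stby_opt    # the name by which the FAL server
                           # should address us when shipping them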
I think it's pretty neat, despite all the bugs ;)
Richard
Objection sustained :). My knowledge is a year behind. I heard this "no
data loss" phrase but thought it was just marketing. I'll catch up with the
manuals.
There is probably no need to explain the performance impact, because it is
clear where its source is. Although I'll have to learn the technical
details, it does not look to me upfront that VCS should be completely
thrown out of the window.
>
> >
> > - the DataGuard (and 9iR2) software is relatively new and is probably
> > full of bugs. If your system is critical, you may prefer not to test
> > out Oracle's bugs on it. The 8i software has plenty of bugs as well,
> > but at least they are KNOWN :).
>
> Honestly! Everyone's entitled to their own opinion, of course. But I do
> rather wish fewer people would post their unsubstantiated opinions here
> as if they counted for anything. Have you actually tested Data Guard?
> It's not exactly difficult to do, and then you'd be able to post some
> real facts about it. It's not "full of bugs", and neither is the product
> "relatively" new, having been in the marketplace for over 12 months now.
>
No, I have not tested DG! At our site, we are just beginning all this
testing.
So far we have happily lived with our scripts to ship and apply the log
files to both standby databases. We don't even use managed recovery, just
because you need to watch it. 3 minutes of allowed data loss, and
everything works perfectly.
I don't think I said anything incorrect, though. Please notice the word
"critical". It means that if there is some bug (maybe even just one) that
does something to the databases I am responsible for, my head will be on
the chopping block. This kind of bumps up my requirements for stability.
To give an example: last week, during testing, we created a 9i listener to
service two 8i databases and one 9i database. After bouncing one of the 8i
databases, the listener crashed. This fact was communicated to Oracle via a
TAR; they said "oops, sorry". Remember, all this is about a listener, part
of software that has been in production for 12 months or more. It seems to
me that DG is a much bigger thing, and the chances are ...
Anyway, we'll test it, and maybe there will be actual bugs to share with
the community. In the meantime, maybe you will post some details about how
you tested it, if you have? Please list the machine type, recovery mode and
configuration.
>
> Uh, well (and I'm trying to be polite about this), your first statement
> in this entire post is incorrect, as mentioned above, precisely because
> Data Guard can be configured to use *LGWR* to synchronously transport
> redo to the standby. Now the source of potential performance impacts
> should be rather obvious: anything that slows down LGWR potentially
> causes grief on the production system, and Data Guard potentially slows
> it down an awful lot, charging it, as it does, with the responsibility of
> shipping redo to a fistful of standby databases. Added to that, you can
> configure things so that the failure to ship to any or all of these
> standbys causes the primary database to be summarily shut down.
It really is obvious now, but thanks for explaining anyway.
Hie thee to VMWare.com and download the relevant 3.2 software. That gives
you the ability to create two virtual machines which run within the
confines of a single physical PC. With two virtual PCs to hand, testing
Data Guard is a piece of cake. On Linux, on Windows (pick a flavour). Given
that it's a non-standard testing platform, I would expect the bugs to come
crawling out of the woodwork. They don't.
Regards
HJR