I am researching the pros and cons of using Data Guard to keep Site2
ready for disaster recovery, versus letting the SAN manage keeping the
two sites in sync.
What are your opinions on these 2 methods?
Thank you!!
-mac
Hi mac,
My first question would be: does the SAN technology allow you to use
Site2 for anything while it is not the "primary" site? I ask because I
consider that a major plus for Data Guard.
Regards,
Steve
Guessing you are comparing against the physical standby flavour of Data
Guard. A big factor to consider is the SAN replication licensing cost
versus the additional Oracle licensing required for Data Guard. SAN
replication is easier to set up (just my opinion) and requires less
monitoring/administration. With Data Guard you have the ability to open
the standby in read-only mode any time you want, while with SAN
replication (and with no extra Oracle licensing) you are limited to
validating for 10 days a year.
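For reference, the read-only validation I mean is roughly this from
SQL*Plus on a 10g physical standby; just a sketch, assuming the standby
is currently running managed recovery:

  -- Stop redo apply, then open the standby read-only for validation/reporting
  ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
  ALTER DATABASE OPEN READ ONLY;

  -- When finished, return the standby to managed recovery
  SHUTDOWN IMMEDIATE
  STARTUP MOUNT
  ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;

Redo apply stops while the standby is open, so it falls behind until you
put it back into recovery.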
-Madhu S
There is a place in an Oracle shop for both SAN-based replication and
Data Guard, but SAN-based replication will NEVER better Data Guard
if the point is to have a DR site. Here are just a few of the reasons:
1. Data Guard checks what it replicates for internal consistency. A SAN
will happily replicate corrupt blocks.
2. Data Guard can guarantee zero data loss even when the data has not
been written to a data file (a configuration sketch follows this list).
You can not make SAN replication synchronous.
3. If SAN replication fails your primary will happily keep right on
running transactions. Data Guard can be configured to bring things
to a halt until the issue is resolved.
4. Data Guard, interestingly enough, is more efficient. What is being
replicated is the redo for the transactions themselves, not operating
system blocks, so you are shipping less data.
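A rough sketch of the configuration behind point 2, assuming a standby
already exists and using made-up names (standby_tns for the TNS service,
boston for the DB_UNIQUE_NAME):

  -- Primary: ship redo synchronously and wait for the standby's
  -- acknowledgement (AFFIRM) before the commit completes
  ALTER SYSTEM SET log_archive_dest_2 =
    'SERVICE=standby_tns LGWR SYNC AFFIRM VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=boston'
    SCOPE=BOTH;

  -- Raise the protection mode (the standby needs standby redo logs;
  -- MAXIMIZE PROTECTION must be set while the primary is mounted, not open)
  ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE PROTECTION;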
--
Daniel A. Morgan
Oracle Ace Director & Instructor
University of Washington
damo...@x.washington.edu (replace x with u to respond)
Puget Sound Oracle Users Group
www.psoug.org
Thank you all for the great input.
In the last post, can you elaborate on #2? I understand how Data Guard
can guarantee zero data loss when it is configured for maximum protection
(essentially, the transaction would not go through if the primary
database cannot get the confirmation back). However, can you elaborate
on your statement: "You can not make SAN replication synchronous"?
Thanks again!
snip
> Thank you all for the great input.
> In the last post, can you elaborate on #2? I understand how Data Guard
> can guarantee zero data loss when it is configured for maximum protection
> (essentially, the transaction would not go through if the primary
> database cannot get the confirmation back). However, can you elaborate
> on your statement: "You can not make SAN replication synchronous"?
Almost all (probably all?) of the major storage vendors support
storage-based replication technologies in both synchronous and
asynchronous modes.
Synchronous replication carries a performance hit that you will see in
wait events like "log file sync"; the impact varies somewhat depending
on volume and how commit-intensive the applications are.
There is absolutely no sense in the statement "You can not make SAN
replication synchronous"; it sounds like either someone who doesn't
have relevant experience or someone who meant something else unrelated
to storage-based replication between multiple sites.
With synchronous redo transport, Data Guard will not commit locally
until it has a guarantee that at least one remote copy of the shipped
redo has been successfully received; not necessarily applied, but at
least received.
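For what it's worth, it is easy to check on the primary whether that
guarantee is actually in force; a quick sketch using standard views
(destination 2 assumed here):

  -- Configured vs. currently enforced protection mode
  SELECT protection_mode, protection_level FROM v$database;

  -- Health of the remote redo destination
  SELECT dest_id, status, error FROM v$archive_dest WHERE dest_id = 2;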
SANs are configured to host multiple applications from different
vendors simultaneously. They have no way of understanding Oracle
traffic versus any other traffic.
Daniel Morgan
Oracle Ace Director
> There is absolutely no sense in the statement "You can not make SAN
> replication synchronous"
Actually there is: Oracle is not going to rollback a transaction
because the SAN can not replicate it. If you would like the name
and contact information for the product manager at Oracle, contact
me off-line.
This does not make sense. SAN-based replication is done only when a
physical write occurs. Since DG is pushing the logs to the secondary to
achieve replication, it is replicating every change to a page. Unless
Oracle is flushing every page to disk as it is updated, a SAN-based
solution should be much more efficient than pushing the logs to the
secondary.
Also consider the case with hot pages, such as index pages. DG will be
forced to send each update to the page to the secondaries while SAN
based replication will only replicate the page as it is flushed to disk.
The only logical way that DG could be more efficient would be if the
Oracle database flushes every dirty page to disk as it is updated. I can
see the logs being flushed immediately, but the data and index pages????
Is that the case?
> Actually there is: Oracle is not going to rollback a transaction
> because the SAN can not replicate it.
Hm, I would say that the SAN reports an I/O error in sync mode if it cannot
replicate.
Shouldn't this abort the Oracle transaction?
You got it wrong. DG is only replicating the logs, while the db
writer can do its thing at any time later (even never, in some cases:
while several things can, and do, signal the writer to write, there
are cases where Oracle doesn't even bother; google delayed block
cleanout). The SAN would have to replicate the logs _and_ the db
writer writes as they happen, that's way more to do over the critical
network resource. The logs are then applied on the other end in a
continuous recovery. Yes, there is a trade-off between network and
local bandwidth. Which is cheaper? How do you define efficient?
Doing less stuff in the critical path usually leads to better
performance.
>
> Also consider the case with hot pages, such as index pages. DG will be
> forced to send each update to the page to the secondaries while SAN
> based replication will only replicate the page as it is flushed to disk.
No, read the Oracle Concepts manual, available online at
tahiti.oracle.com. DG doesn't know jack about hot pages, doesn't
care. The redo logs are the secret. The Achilles' heel, for that
matter. You need to understand recovery to understand how this works
with DG.
>
> The only logical way that DG could be more efficient would be if the
> Oracle database flushes every dirty page to disk as it is updated. I can
> see the logs being flushed immediately, but the data and index pages????
> Is that the case?
You REALLY need to learn the architecture. The database writer
flushes dirty blocks to disk at its leisure. It's the redo buffers
that are critical for being flushed to the logs. Since the data being
changed can be a lot less than a block, that's a lot less data to deal
with.
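If you want to see the relative volumes on your own system, here is a
rough sketch of the comparison; note 'redo size' is reported in bytes
while 'physical writes' is in database blocks, so multiply the latter by
your block size for an apples-to-apples figure:

  -- Redo generated vs. data blocks written since instance startup
  SELECT name, value
    FROM v$sysstat
   WHERE name IN ('redo size', 'physical writes');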
jg
--
@home.com is bogus.
http://www.oracle.com/webapps/events/EventsDetail.jsp?p_eventId=84954&src=6664513&src=6664513&Act=27
No. The local commit has already taken place. The write to the
local redo log file has already taken place. Otherwise there
would be nothing written to disk for the SAN to ship.
--
Daniel A. Morgan
Oracle Ace Director & Instructor
University of Washington
damo...@x.washington.edu (replace x with u to respond)
You are assuming all Data Guard activities involve log file shipping:
They do not. Synchronous Data Guard would never function if a commit
required waiting for a log file switch. I should have been clearer.
> You REALLY need to learn the architecture. The database writer
> flushes dirty blocks to disk at its leisure. It's the redo buffers
> that are critical for being flushed to the logs. Since the data being
> changed can be a lot less than a block, that's a lot less data to deal
> with.
>
> jg
> --
> @home.com is bogus.
> http://www.oracle.com/webapps/events/EventsDetail.jsp?p_eventId=84954&src=6664513&src=6664513&Act=27
Alas poor Madison lives in the house of IBM.
He will soon enough. But it will be awhile before the marketplace
forces his hand. <g>
Yet with DG, those same things would have to be created on the
secondary, wouldn't they? And with SAN replication, those would never
be replicated since they were never physically written to disk, were they?
> The SAN would have to replicate the logs _and_ the db
> writer writes as they happen, that's way more to do over the critical
> network resource.
If that is the critical resource....
> The logs are then applied on the other end in a
> continuous recovery. Yes, there is a trade-off between network and
> local bandwidth. Which is cheaper? How do you define efficient?
> Doing less stuff in the critical path usually leads to better
> performance.
>
>> Also consider the case with hot pages, such as index pages. DG will be
>> forced to send each update to the page to the secondaries while SAN
>> based replication will only replicate the page as it is flushed to disk.
>
> No, read the Oracle Concepts manual, available online at
> tahiti.oracle.com. DG doesn't know jack about hot pages, doesn't
> care. The redo logs are the secret. The Achilles' heel, for that
> matter. You need to understand recovery to understand how this works
> with DG.
You miss my point about hot pages.
Yeah, the updates are sent via the logs. But if the primary has a hot
page, the primary will eventually perform a flush of that page, but
not until there have been many updates to that hot page. The network
cost might increase (by one additional page), but on the secondary
there would be no additional activity besides the update of a page on
disk. However, with DG you'd be loading that page into memory, updating
it from the logs, and then eventually writing it back to disk. All of
this creates additional work on the secondary copy, which can create back
flow conditions. Not only that, the same process would have to be
performed potentially several times, depending on the luck of the draw.
>
>> The only logical way that DG could be more efficient would be if the
>> Oracle database flushes every dirty page to disk as it is updated. I can
>> see the logs being flushed immediately, but the data and index pages????
>> Is that the case?
>
> You REALLY need to learn the architecture. The database writer
> flushes dirty blocks to disk at its leisure. It's the redo buffers
> that are critical for being flushed to the logs. Since the data being
> changed can be a lot less than a block, that's a lot less data to deal
> with.
Like I said, with a hardware solution the hot page on the secondary
would be updated by the same trickle of writes as on the primary. Yet
with DG, the page would be updated multiple times, which may or may not
require additional I/O on the secondary, which may or may not impact the
delivery of logs to the secondary, which may or may not impact the
availability of the secondary.
I'll buy that. And also don't forget about the impact of avoiding
full/informational logging.
Why Dan - how considerate... ;-)
Why don't you read up on synchronous replication?
Oracle tries to write something (say a log file block, or anything
else) ... storage device A receives the write ... it transmits it
to storage device B ... storage device B acknowledges back to A that
it has the write ... and only then is the I/O acknowledged back to Oracle.
It is done block by block and has nothing to do with Oracle
transactions.
Sounds like you are just making up stuff.
> Sounds like you are just making up stuff.
Sounds like you weren't in San Francisco a week ago sitting in on the
excellent briefing Mark Townsend gave the Oracle Ace Directors. <g>
--
Daniel A. Morgan
Oracle Ace Director & Instructor
University of Washington
damo...@x.washington.edu (replace x with u to respond)
I don't think this is an Oracle issue. Oracle is just like any other
app (I know it hurts to hear that) when it comes to stuff being
physically written to physical disk.
The disk subsystem controls what is physically written, when, and
where. Oracle merely twiddles its thumbs until it gets a return code
from the storage subsystem. The subsystem can do what it likes in the
meantime. If the write hasn't made it to both subsystems, it will return
an error condition to the application doing the write (Oracle in this
case).
I know for a fact that EMC SRDF does it this way, as we use it.
Palooka
Thanks to all for the valuable/insightful thoughts on my original
post.
It sounds like (in summary) that using DG will save network bandwidth
compared to relying on SAN-based replication. However, CPU
utilization on the secondary (disaster-recovery) database will most
likely be higher when using DG.
Also, there is a quote in "Oracle Data Guard in Oracle Database 10g -
Disaster Recovery for the Enterprise" that reads:
"Data Guard allows the administrator to choose whether this redo
data is sent synchronously or asynchronously to a standby site."
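For reference, I gather the asynchronous option is just a change of
attributes on the redo transport destination; a sketch with made-up
names (standby_tns, boston):

  -- Asynchronous redo transport: the commit on the primary does not wait
  -- for the standby's acknowledgement, so some data loss is possible
  ALTER SYSTEM SET log_archive_dest_2 =
    'SERVICE=standby_tns LGWR ASYNC NOAFFIRM VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=boston'
    SCOPE=BOTH;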
Can anyone think of any instance when the disaster-recovery database
would not be usable in the event of an emergency if you were relying
on SAN-based replication??
Thanks again.
Well, don't forget there are also other options, like a logical standby.
Oracle used to recommend the standby db be on hardware identical to the
primary, probably for this reason, and also because the standby has to
have the capacity to handle the load when it is actually flipped over to.
>
> Also, there is a quote in "Oracle Data Guard in Oracle Database 10g -
> Disaster Recovery for the Enterprise" that reads:
> "Data Guard allows the administrator to choose whether this redo
> data is sent synchronously or asynchronously to a standby site."
>
> Can anyone think of any instance when the disaster-recovery database
> would not be usable in the event of an emergency if you were relying
> on SAN-based replication??
>
> Thanks again.
I've seen the FAL process get confused due to unknown errors, probably
network transmission errors. How the SAN would handle these sorts of
situations is beyond me. Does it sit there and retransmit
continuously? Would stopping your primary database because of a
problem on the network or the standby be an emergency? I know my
damagement thought that wasn't too cool...
jg
--
@home.com is bogus.
The Greater Depression: http://www.signonsandiego.com/uniontrib/20081005/news_lz1n5dean.html
I believe both technologies have their strengths and weaknesses.
I have been using Data Guard as the primary DR solution. It works
great in situations where everything (or most of it) that you worry
about is the database. The switchover is reasonably fast and, depending
on configuration, can provide you with minimal or no data loss after the
failover. The problem starts if you have to worry about other, non-
database stuff that is also needed on the recovery site. FTP,
Samba, the application server, etc. are good examples. Although these are
usually seldom changed, there has to be a mechanism put in place that
will synchronise their data too. Although it seems simple at first, it is
a real pain in the ass.
We use the Data Guard configuration with Oracle E-Business Suite, as
the solution recommended by Oracle to provide the DR capabilities. The
database configuration is easy, but the synchronisation, refreshes and
preconfiguration of the application server are a nightmare. We were
running the DR tests and, although the database is switched over in 10
minutes, the whole switch to the backup site takes about 3 hours thanks
to various application refreshes, reconfigurations, etc.
All this makes the whole switchover a very time-consuming and, most
importantly, complicated experience. The documentation of the process
runs to about 60 pages and needs an experienced DBA to make it
happen.
With storage or filesystem replication all of this goes away, as
neither the database nor the application is aware of it. Simplicity is
preserved, which is a huge bonus. The only thing I see as dangerous is
that an "rm -Rf" will replicate to the standby site just fine,
damaging both sites. With a standby database you can still fail over.
To sum things up: with simple things (database + end clients) go
with the standby database. With anything more complicated I would
choose storage or filesystem replication.
Greetings
Remigiusz Boguszewicz