Introduction [Contents]
At the end of the day one Monday, I ran one of my scripts to show our
web usage for that day, but, instead of the listing I expected, it
gave me a batch of UGLY Oracle error messages:
ORA-01115: IO error reading block from file 13 (block # 66771)
ORA-01110: data file 13: '/ndxs/oradata/PROD/medi_PROD_01.dbf'
ORA-27091: skgfqio: unable to queue I/O
ORA-27072: skgfdisp: I/O error
IBM AIX RISC System/6000 Error: 5: I/O error
Additional information: 66770
which meant that it got a read error on one of our index datafiles
(file 13, at that - a bit of triskaidekaphobia!). I did an "ls"
command on the directory containing our indexes, and got an equally
frightening message:
$ ls -ltr /ndxs/oradata/PROD
ls: /ndxs/oradata/PROD: There is an input or output error.
total 0
which indicated that it couldn't find ANY of our index datafiles
("total 0"). Apparently our RAID drive with the indexes directory had
gone south, so I shut down the database to await diagnostics and
repair. To top it off, our payroll group was scheduled to begin their
run to get the deposits and checks out by the end of the week. The
next day when there was no quick resolution, contingency plans started
being talked about to do the payroll checks by hand. WHAT DO WE DO
NOW? Round up some typewriters? I don't think so! I'll give you our
resolution at the end, but, in the mean time ...
How vulnerable is your Oracle database? What can you do to protect it?
And, if those efforts fail, how do you get your Oracle database back
in working order with minimal loss of data if disaster strikes? This
presentation shows you step-by-step how to bulletproof your database
as much as possible against data loss, gives you the what's and how's
of database backups and restores, and takes you through disaster
recovery scenarios if you encounter those heart-stopping Oracle errors
on database startup, such as datafile corruption or loss, disk drive
loss, and other nasties. This is geared for Oracle 9i (and 8i for the
most part) and UNIX installations.
Bulletproofing
Yes, you really can bulletproof your database so that it is always up
and available and you never lose any data from almost any disaster
scenario - for a price, whether in dollars, storage, servers, and/or
bandwidth. You can do things such as running parallel servers in
separate locations with standby databases for quick switchovers using
Oracle's Data Guard package, along with using other options that
Oracle and others provide. For the rest of us with or without large
budgets and staff, there are things that we, too, can do to lessen the
possibility of data loss, minimize downtime and make our databases
more bulletproof, such as those described below:
1. Enable archivelog mode (to reapply the changes during recovery;
which is required by most of the disaster recovery scenarios detailed
below).
2. Separate the archive logs from the redo logs (allocating them to
separate drives; likewise for the following items).
3. Separate the redo logs and archive logs from the datafiles.
4. Multiplex (mirror) the redo log groups and members.
5. Multiplex (mirror) the control file.
6. Multiplex (mirror) the archive log files.
Enable archivelog mode so that the old redo logs, which contain the
changes to the database that can be reapplied during datafile
recovery, will be saved to another disk directory after a redo log
switch before Oracle overwrites them when it cycles back around. Most
of the disaster recovery scenarios require that archiving is turned on
and that those archive logs are available on-line. Doing hot backups
while the database is running also requires archiving.
Separate the archive logs directory from the redo logs on different
disk drives to reduce disk contention and speed up writing when the
old redo logs are being archived while the next redo logs are being
written so that cycling back around doesn't freeze the database
momentarily til the archiving has completed, preferrably with those
disk drives being on separate drive controllers. (Having more than two
redo log groups will also reduce the possibility of freezing.)
Separate the redo logs and archive logs from the datafiles on
different disk drives to protect from data loss on a single disk crash
(a media failure), also on separate controllers. If the datafiles'
disk is lost, today's changes are still in the archive logs. If the
archive logs' disk is lost, today's changes are still in the
datafiles.
Have multiple redo log group members on separate disk drives (called
mirroring or multiplexing) to keep multiple copies of the changes
currently being made to the database so that, if one disk copy is
lost, the other disk copy(s) will be used to keep the database running
while you fix that lost disk, with those drives also on separate
controllers.
Have multiple control files, which contain the list of datafile
pathnames and the current system change number to synchronize all of
those datafiles, mirrored on separate disk drives, to allow you to
restart the database if one of those copies is lost, with those drives
also on separate controllers. Keeping a separate up-to-date textual
control file, produced by "alter database backup controlfile to
trace;", will also allow you to restart the database in case all
control files are lost. Note that Oracle only reads the first control
file listed in the init.ora file (or as listed in the "show parameter
control_files" results, for when you are using an spfile (server
parameter file) instead of an init.ora file), although it updates all
of them in parallel as changes are made.
Have multiple archive log directories, mirroring them on separate disk
drives, to protect the set of archive logs from media failure
(multiplexing of archive logs can be done through Oracle in 8i and
beyond). If you had a dedicated tape drive, you could have a cron job
write those new files to tape periodically. This is not the same as
archiving directly to tape, but could be used to get around the
problems inherent with direct tape archiving.
Some of the above is discussed in the Oracle 9i Dababase
Administrator's Guide (chapters 6 through 8, and chapter 12).
Shutting Down and Starting Up Your Database
If you've never worked as an Oracle Database Administrator before,
some basics that you need to know are how to position to the correct
database and how to shut down and start up your database. Each Oracle
database is identified by an "Oracle SID" (System Identifier), such as
PROD or TEST. To position to the PROD database in UNIX (shown with a
dollar prompt here), you would enter:
$ . oraenv (that's a dot and a space before oraenv)
PROD (when prompted for the ORACLE_SID)
Or, in NT (shown with a c:\> prompt here), you would enter:
c:\> set ORACLE_SID=PROD
c:\> set ORACLE_HOME=d:\oracle\v9204 (wherever your Oracle product
directory is located)
To shut down or start up your database, you will first have to connect
to your database as a system database administrator (sysdba) user
(from a UNIXor NT dba group user login, such as oracle). Prior to
Oracle 9i, you could use Server Manager (using the "svrmgrl" command
or version equivalent), or the line-mode SQL*DBA (using the "sqldba
lmode=y" command or version equivalent), but, in Oracle 8.1.5 and
later, Oracle moved the Server Manager commands to SQL*Plus (using the
"sqlplus" command or version equivalent). So, for Oracle 8.1.5 and
above on UNIX, to connect to your database as sysdba, you would enter:
$ sqlplus "/ as sysdba" (or, if already in sqlplus: SQL> connect /
as sysdba)
Or, prior to Oracle 9i, to use Server Manager, you would enter:
$ svrmgrl
SVRMGRL> connect internal
For the remaining examples, Oracle 9i on UNIX will be used.
To shut down the database (connected as sysdba), you would enter the
following, which kills any active sessions and rolls back any pending
database changes:
SQL> shutdown immediate
If "shutdown immediate" fails or hangs, bring up another sysdba
session and enter:
SQL> shutdown abort
SQL> startup
SQL> shutdown immediate
When an Oracle database starts up, it goes through three stages. The
nomount stage reads the init.ora file (or spfile), allocates SGA
memory, and starts the background processes, such as pmon (process
monitor). The mount stage opens the control files. The open stage
opens the datafiles and online redo logs, and performs crash recovery
if needed. Most times you will be just issuing a startup command,
which goes through the three stages and opens the database. However,
various database recovery scenarios and database administration tasks
require commands to be entered at a certain state. To get to or step
through each state (connected as sysdba), you would enter the
following:
nomount state:
SQL> startup nomount (from shutdown state)
mount state (use just one of these):
SQL> startup mount (from shutdown state)
SQL> alter database mount; (from nomount state)
open state (use just one of these):
SQL> startup [open] (from shutdown state)
SQL> alter database open; (from nomount or mount state)
Notes:
"$" assumes that you are at the unix prompt, and have exited out of
svrmgr or sqlplus (by typing "exit" at their prompts) if the prior
command was a svrmgr or sqlplus command.
"SVRMGR>" assumes that you are in the line-mode version of Server
Manager, which is entered by typing "svrmgrl" at the unix prompt.
Similar commands can be executed from the line-mode version of
SQL*DBA, which is entered by typing "sqldba lmode=y" at the unix
prompt. References to "svrmgr" mean through the "svrmgrl" command.
"SQL>" assumes that you are in SQL*PLUS, usually either connected as
sysdba (connect / as sysdba) or logged in as user "system", which is
entered by typing "sqlplus" at the unix prompt (but, if you don't know
that, you should take a basic course in SQL before attempting the
instructions given here).
"!" is the same thing as "host" at the SQL> prompt or SVRMGR> prompt,
so, for example, you could use either "!mv" or "host mv" to move a
file while you are in sqlplus or Server Manager on a UNIX system. In
NT, you would use "$" at the SQL> prompt instead of "!" to execute a
host command.
In all of these examples, replace "PROD" with your own Oracle SID
name, and replace the data file names with your own names, and, for
the most part, run these commands from the unix userid "oracle".
Also, replace "password" with the user id's password on your site.
When using the export command, be aware that changes to related tables
during the export may cause inconsistencies. To prevent this,
especially during full database exports, either make sure that no
users are on and "alter system enable restricted session;" has been
entered, or use the "consistent = Y" parameter to the exp line mode
command.
Finally, ALWAYS make sure you have a good current FULL BACKUP of your
database before using these scripts and commands. Some of them have
the potential to delete or corrupt large amounts of your precious data
if you don't do them just right. Don't say you haven't been warned!
Enabling Archiving
To enable archivelog mode so that Oracle saves the old redo logs to
disk after a log switch before overwriting them when it cycles back
around, perform the following commands. Oracle requires at least two
redo logs (log groups) in order to accommodate possible archiving,
since one could be being spooled during archiving to the archivelogs
directory while changes currently being made to the database are being
written to the other by the log writer.
$ vi $ORACLE_HOME/dbs/initPROD.ora
Edit init.ora file to contain the archive parameters (directory name
and file format and the start flag):
log_archive_dest = /u01/oradata/PROD/archivelogs/
log_archive_format = arch_PROD_%S.arc
log_archive_start = true
("archive log start" can also be used when connected as sysdba to
start the arch process, if you don't want it automatically done
through init.ora; but, neither of these turn on archivelog mode - they
just start the arch process itself.)
$ mkdir /u01/oradata/PROD/archivelogs
$ sqlplus "/ as sysdba"
SQL> shutdown immediate
SQL> startup mount
SQL> alter database archivelog;
Turns on archivelog mode for the database.
SQL> alter database open;
SQL> archive log list
Shows that you are now in archivelog mode, that archiving is enabled,
and shows you the current log sequence numbers. You can "alter system
switch logfile;" a few times to see archivelog files going into the
log_archive_dest directory.
SQL> select * from v$database;
Also shows that you are in archivelog mode.
To disable archivelog mode, similar commands (when connected as
sysdba) are used, except for using the "alter database noarchivelog;"
command instead (these commands are also performed after the database
is shut down). The log_archive_start parameter in the init.ora file
would also need to be set to false to permanently disable the arch
process itself.
Moving Datafiles
You can move a datafile to another disk, or change it's name (such as
for restoring a copy of it from your backup tape to that other
location in the event of a disk failure or moving it to another disk
that has more room for it) by making the move after shutting down the
database, and using the "alter database rename file" command to tell
Oracle where the file has been moved to, as in the following example,
which moves one of the development datafiles from disk u03 to disk
u04:
$ sqlplus "/ as sysdba"
SQL> shutdown immediate
SQL> !mv /u03/oradata/PROD/devl_PROD_01.dbf /u04/oradata/PROD
SQL> startup mount
SQL> alter database rename file '/u03/oradata/PROD/devl_PROD_01.dbf'
to '/u04/oradata/PROD/devl_PROD_01.dbf';
SQL> alter database open;
SQL> select * from v$datafile;
Lists the current datafiles, showing your changes.
This can also be done without shutting down the database, but taking
the associated tablespace offline first (which prevents others from
accessing that tablespace's tables, indexes, and other data):
$ sqlplus "/ as sysdba"
SQL> alter tablespace development offline;
SQL> !mv /u03/oradata/PROD/devl_PROD_01.dbf /u04/oradata/PROD
SQL> alter database rename file '/u03/oradata/PROD/devl_PROD_01.dbf'
to '/u04/oradata/PROD/devl_PROD_01.dbf';
SQL> alter tablespace development online;
SQL> select * from v$datafile;
Finally, the textual control file from the "alter database backup
controlfile to trace;" command can be used to change the datafile
locations, such as to move all datafiles from one disk to another (u03
to u04 in the example below). The textual control file is created in
the database's udump directory, which is found by using either of the
following commands:
SQL> select value from v$parameter where name = 'user_dump_dest';
SQL> show parameter user_dump
with a name like /u00/oracle/admin/PROD/udump/prod_ora_16060.trc. (A
sample textual control file is shown in the Appendix.)
SQL> connect / as sysdba
SQL> alter database backup controlfile to trace;
SQL> show parameter user_dump
SQL> shutdown immediate
SQL> exit
$ mv /u03/oradata/PROD/*PROD*.dbf /u04/oradata/PROD
$ cd /u00/oracle/admin/PROD/udump
$ ls -ltr *.trc
To find the name of the textual control file just created (the last
one in the ls listing, such as prod_ora_16060.trc).
$ vi prod_ora_16060.trc
Edit the textual control file and remove all lines before the STARTUP
NOMOUNT line, remove the lines after ALTER DATABASE OPEN, or, if
present, ALTER TABLESPACE TEMP ADD TEMPFILE, change the names in the
DATAFILE section to match the new disk names such as from "u03" to
"u04" (DO NOT change the order of the file names!), comment out (put #
in front of) the RECOVER command, and, for Oracle 9i and above, and
change all comment lines to start with dashes. You may also want to
change the TEMPFILE location, which is not in the DATAFILE section for
Oracle 9i+. The vi commands to do these (for 9i+; except for
TEMPFILE) are usually:
:1,/STARTUP NOMOUNT/-1d
:/ALTER TABLESPACE TEMP/+2,$d
:g/^DATAFILE/,/;/s/\/u03/\/u04/
:/RECOVER DATABASE/s/^/# /
:1,$s/^#/--/
:wq
[Be sure to read the NOTES below.]
$ sqlplus "/ as sysdba"
SQL> @prod_ora_16060.trc
Recreates the control files with the new disk volume names.
SQL> select * from v$datafile;
NOTES: Be aware that the 9i "alter database backup controlfile to
trace;" command creates a textual control file with two "create
database" commands. You will use the first set (the one with
NORESETLOGS) and edit out the second set, while you are changing the
directory and/or file names in the textual control file. Also, be sure
to include the ADD TEMPFILE command that may be generated following
the OPEN command - that is easy to overlook and get edited out
accidentally.
Adding Redo Log Members
To add another member on another disk drive and separate controller to
a redo log group, in order to mirror the logs in the group for media
failure protection, you will need to specify the new file name and the
current group number, as shown below. The new redo log file will be
created the same size as all the other redo log files in the group.
SQL> alter database add logfile member
'/u04/oradata/PROD/log_PROD_1C.rdo' to group 1;
SQL> select * from v$logfile;
Lists the current log files, showing your changes.
To add more members than maxlogmembers to a redo log group, you will
need to recreate the control files with that new maximum by creating a
textual control file, editing the maxlogmembers parameter in it, and
running that SQL when connected as sysdba after shutting down the
database:
$ sqlplus "/ as sysdba"
SQL> alter database backup controlfile to trace;
SQL> !ls -ltr /u00/oracle/admin/PROD/udump
To find the name of the textual control file just created.
SQL> !vi /u00/oracle/admin/PROD/udump/prod_ora_16060.trc
Edit the textual control file and remove all lines before the STARTUP
NOMOUNT line, remove the lines after ALTER DATABASE OPEN, or, if
present, ALTER TABLESPACE TEMP ADD TEMPFILE, change the maxlogmembers
value from the default 2 to 3 or 4, comment out (put # in front of)
the RECOVER command, and, for Oracle 9i and above, and change all
comment lines to start with dashes. The vi commands to do these (for 9i
+) are usually:
:1,/STARTUP NOMOUNT/-1d
:/ALTER TABLESPACE TEMP/+2,$d
:/MAXLOGMEMBERS/s/2/3/
:/RECOVER DATABASE/s/^/# /
:1,$s/^#/--/
:wq
SQL> shutdown immediate
SQL> @/u00/oracle/admin/PROD/udump/prod_ora_16060.trc
Recreates the control files with the new maxlogmembers value.
Adding Redo Log Groups
Additional redo log groups can be added to lessen the possibility of
the database freezing while waiting on archiving to free up the next
log group. The new group number along with the list of new redo logs
and the allocation size for those member files is specified as shown:
SQL> alter database add logfile group 4
('/u00/oradata/PROD/log_PROD_4A.rdo',
'/u01/oradata/PROD/log_PROD_4B.rdo') size 500K;
SQL> select * from v$logfile;
Multiplexing Control Files
Multiplexing the control files involves just shutting down the
database, copying the current control file to the other disks and
renaming them, editing the init.ora file to include the new control
file names, and restarting the database. From that point, all datafile
header updates and datafile additions and migrations will be reflected
in all of the identical multiplexed control files, and the other
remaining control file(s) can be used to bring the database back up if
one or more of the control files are lost.
$ sqlplus "/ as sysdba"
SQL> shutdown immediate
SQL> host
$ cp -p /u03/oradata/PROD/ctrl_PROD_01.ctl /u01/oradata/PROD/
ctrl_PROD_02.ctl
$ cp -p /u03/oradata/PROD/ctrl_PROD_01.ctl /u00/oradata/PROD/
ctrl_PROD_03.ctl
Copies the original control file to the other disks, renaming them to
match the control_files parameter.
$ vi $ORACLE_HOME/dbs/initPROD.ora
Edit init.ora to add other control files on separate disks to the
control_files parameter, using their full pathnames, separated by
commas, such as:
control_files = (/u03/oradata/PROD/ctrl_PROD_01.ctl,
/u01/oradata/PROD/ctrl_PROD_02.ctl,
/u00/oradata/PROD/ctrl_PROD_03.ctl)
$ exit
SQL> startup
SQL> select * from v$controlfile;
Lists the current control files, showing your changes.
Multiplexing Archive Log Files
I haven't done this yet, but, multiplexing archive log files is
available in Oracle in 8i and beyond (see chapter 8 in Oracle9i
Database Administrator's Guide). For earlier versions, you could set
up a cron job to run every half-hour or so to copy the archive log
files not currently in the mirror directory into it. Make sure that
the mirror directory is on a different disk than the archive logs
directory; otherwise, you defeat the purpose if the archive logs disk
goes out.
Backups
The DBA's primary job is to make sure that the data is available and
accessable by the users during those times that they need it, which
means that a complete and well-tested backup and recovery procedure is
in place and functioning. This section covers what files need to be
included in that backup, the types of backups (cold backups and hot
backups), and other processing that you could perform during the
backup for proactive maintenance.
What To Back Up
The following types of files making up the database and providing
recovery capabilities should be backed up during one backup cycle,
such as all of them during a cold backup, or all or subsets of them
during a hot backup cycle, assuming the entire set is eventually
backed up during that cycle (see Oracle9i User-Managed Backup and
Recovery Guide, chapter 2):
· Datafiles (for all tablespaces)
· Control Files (binary and textual versions)
· Redo Log Files (cold backups only, not hot backups)
· Archive Log Files (archived redo logs, if archivelog mode is
enabled)
· Parameter Files (init.ora; like $ORACLE_HOME/dbs/initPROD.ora)
· Password Files (like $ORACLE_HOME/dbs/orapwdPROD, if used)
The most basic way to get the names of the datafiles, control files,
and redo log files through SQL is shown below, which can be used in
both the startup mount state and the database open state:
SQL> select name from v$datafile;
SQL> select name from v$tempfile;
SQL> select name from v$controlfile;
SQL> select member from v$logfile;
You can also get the tablespace name with the datafile and temp file
names using:
SQL> select tablespace_name,file_name from dba_data_files order by
tablespace_name;
SQL> select tablespace_name,file_name from dba_temp_files;
The list of archive log files which are currently on-line cannot be
found through SQL. However, you can get the directory containing those
archive logs, the format of the file names, and whether or not
archiving was turned on in the init.ora parameter file using:
SQL> select name,value from v$parameter where name in
('log_archive_dest',
'log_archive_format','log_archive_start');
These archiving parameters can also be found in an sqldba session by
issuing either of the following commands (but the first one might
truncate the dest directory name):
SQL> show parameter archive
SQL> archive log list
To get the list of archive log files for the last 5 days (the date is
when Oracle started writing to that redo log, not when it was copied
to the archivelogs directory), which may or may not still be on-line,
you can query the v$archived_log table in Oracle 8.x and above:
SQL> select name from v$archived_log
where trunc(completion_time) >= trunc(sysdate)-5;
Or, you can query the v$log_history table in Oracle 7.x:
SQL> select archive_name from v$log_history
where trunc(to_date(substr(time,1,8),'MM/DD/YY')) >=
trunc(sysdate)-5;
The name of the parameter file is not kept in the database, but is
usually $ORACLE_HOME/dbs/initPROD.ora (substitute your Oracle SID).
Since you can't find the currently existing archive log files or the
init.ora file or the password file through SQL, the best way to get
the list of database files to back up (assuming you are using a
standard naming convention for all of your files related to a database
instance, such as using the Oracle SID in the names of those files or
in the name of one level of the directory hierarchies containing your
datafiles, control files, redo log files, and archive log files) is by
using the unix find command to get the list of file pathnames that
contain that Oracle SID, as shown below (which is redirect to a file
named backemup.dat):
$ find / -name '*PROD*' ! -type d 2>/dev/null >backemup.dat
The addendums to this presentation contain the scripts backup_list.shl
and backup_list.sql, which are sample unix and SQL scripts used to
create the list of files to be backed up, assuming a standard naming
convention. More information on naming conventions can be found in
papers on Oracle's OFA, which is referred to as either Oracle Flexible
Architecture or Optimal Flexible Architecture.
Cold Backups
Cold backups are done while the database is shut down, usually after a
"shutdown immediate" is issued, and contain the most complete backup
snapshot, since all database files, including control files and redo
log files, are backed up from the same point in time.
During the time that a cold backup is in progress, the database is not
available for querying or updating while those files are being copied
to tape or to some other disk drive or directory (possibly compressing
(zipping) them at the same time). Cold backups can be done with the
database either in archivelog mode or in noarchivelog mode.
To perform cold backups, you will first need to get the list of files
to be backed up (putting the file names into backemup.dat in the
examples below), including datafiles, control files, and redo log
files, which you can get either by querying the database itself while
the database is still up (see "What To Back Up") or by using the unix
"find" command to locate files following a standard naming convention,
preferrably after you have shut down the database (see below), since
the archive log files and other database files can also be picked up
at that time by that "find" command. Be sure to get a fresh list of
files for each backup in order to accommodate structural changes in
the database (such as new datafiles) as well as additional archive log
files.
Then, you should include the parameter file and password file (if
used) in the list. If you are using the "find" command, those files
could also be picked up by "find", since they both usually contain the
SID name. It would also be a good idea to include a textual control
file in the list, which you would create before shutting down the
database for the backup, as shown below (which contains the SID name
as part of the directory name and, so, could also be picked up by
"find"):
$ sqlplus "/ as sysdba"
SQL> alter database backup controlfile to trace;
SQL> shutdown immediate
SQL> exit
$ ls -ltr /u00/oracle/admin/PROD/udump/* | tail -1 | sed 's/^.* \//
\//' >>backemup.dat
Puts the name of the textual control file just created in the udump
directory (the last one in the ls listing, such as prod_ora_16060.trc)
into the list of files to be backed up.
Finally, after shutting down the database, you will need to get the
list of the archive log files to back up (the shutdown may cause new
archive log files to be written out). You could do this either as part
of the "find" command at this point, along with the datafiles, control
files, and redo log files, or by just getting the fully-qualified
listing of the file pathnames in the archivelogs directory. You could
also combine the two and get the list of non-archive files for the
backup, followed by the list of archive files (zipped followed by not
zipped here), as shown below:
$ find / -name '*PROD*' ! -type d 2>/dev/null | grep -v 'arc$' | grep -
v 'gz$' >>backemup.dat
$ ls /u01/oradata/PROD/archivelogs/*.arc.gz >>backemup.dat
$ ls /u01/oradata/PROD/archivelogs/*.arc >>backemup.dat
After the list of files has been created (in backemup.dat) and the
database has been shut down, you can do any of the following to back
up the files or make copies for the backup. The first two options
allow you to start up the database before writing the copied files to
tape, which usually results in a shorter down time for the database.
· Copy the files to another staging (backup) directory (such as /u03/
oradata/prod1 here) to be written to tape later, and restart the
database, or,
· Compress (using GNU Zip here) the files to another staging directory
to be written to tape later, and restart the database, or
· Copy the files directly to tape (no staging directory), then,
restart the database.
These three options can be performed as shown below (do only one of
them):
$ cat backemup.dat | sed 's/\(^.*\)\/\(.*\)$/cp -p \1\/\2 \/u03\/
oradata\/prod1\/\2/' | sh
Copies the files to another staging (backup) directory.
$ cat backemup.dat | sed 's/\(^.*\)\/\(.*\)$/gzip -cv1 \1\/\2 >\/u03\/
oradata\/prod1\/\2.gz;
touch -r \1\/\2 \/u03\/oradata\/prod1\/\2.gz/' | sh
Compresses (zips) the files to another staging directory (the touch
command must be on the same line as the sed command).
$ cat backemup.dat | cpio -ovC64 >/dev/rmt0
Copies the files directly to tape (without using a staging directory).
Then, restart the database after either of the three above commands:
$ sqlplus "/ as sysdba"
SQL> startup
SQL> exit
Be sure to copy the backup disk directory files to tape after you have
done the startup for the first two options, such as shown below. (See
"Backup/Restore Using Unix cpio Command" for cpio description.)
$ ls /u03/oradata/prod1/* | cpio -ovC64 >/dev/rmt0
Hot Backups
Hot backups are done while the database is still up and running and
being accessed and updated. This is used when continuous 24-hour-per-
day / 7-day-per-week operations are required. If you have even a half-
hour window when your database can be down, you can probably do a cold
backup using the staging directory method above.
Doing hot backups requires that archivelog mode is turned on. It is
more taxing on redo logs and archivelogs when the backups are being
performed, since entire database blocks are written to the redo logs
for the changes that are made, instead of just the individual changes
(transaction data).
You must NEVER backup the online redo log files when doing hot
backups. Restoring a backup of an online redo log file would cause
corruption during database recovery. The online redo log files are the
most vulnerable items in the hot backup scheme. However, forcing log
switches can catch the pending datafile updates in the redo logs and
give you some coverage in this area.
You cannot backup the online control files themselves, but you can
backup a special copy of them, either a binary copy or a textual copy,
as shown below.
To perform hot backups, you will only back up one tablespace's
datafiles at a time. In order to reduce the time that the tablespace
is in backup mode, it is best to copy its datafiles to a backup disk
directory, instead of to tape at the current time. When all copies
have been done, you can then copy that backup disk directory to tape
while the database is running normally. You must use the "begin
backup" and "end backup" commands, as shown below; otherwise, your
backups will be corrupted and useless. Note that the data in the
tablespace remains available to the users while in backup mode, and
any transactions can still be performed on that data. (See section on
Disaster Recovery Scenarios for info on database crashes during hot
backups.)
For each tablespace, perform the following steps while connected as
sysdba (the DEVELOPMENT tablespace is shown here - change the
tablespace name and the datafile names as appropriate for each
tablespace backup):
SQL> alter tablespace development begin backup;
SQL> !cp -p /u03/oradata/PROD/devl_PROD_*.dbf /u03/oradata/prod1
Copies the tablespace's datafiles to a backup disk directory (/u03/
oradata/prod1 here), using standard naming conventions.
SQL> alter tablespace development end backup;
Force a log switch to archive the current log to catch all of the
committed transactions in it. Then, copy the archive log files (zipped
and not zipped) to your backup disk directory, as shown below:
SQL> alter system switch logfile;
$ ls /u01/oradata/PROD/archivelogs/*.arc.gz >logfile.lst
$ ls /u01/oradata/PROD/archivelogs/*.arc >>logfile.lst
Generates the list of archivelog files to be copied (don't just copy
the entire directory - you might catch an archivelog file in mid-
creation, but, do keep the archiving process running).
$ sleep 5
Waits for a few seconds to allow current archivelog writes to
complete.
$ cat logfile.lst | sed "s/\(.*\/\)\([^\/].*\)/cp -p \1\2 \/u03\/
oradata\/prod1\/\2/" >logfile.shl
$ sh logfile.shl
The above commands copy the archivelog files to your backup disk
directory.
Create a control file to back up (binary and textual) AFTER all other
files have been copied:
SQL> alter database backup controlfile to '/u03/oradata/prod1/
controlfile.ctl';
SQL> alter database backup controlfile to trace;
$ ls -ltr /u00/oracle/admin/PROD/udump
To find the name of the textual control file just created (the last
one in the ls listing, such as prod_ora_16060.trc).
$ cp -p /u00/oracle/admin/PROD/udump/prod_ora_16060.trc /u03/oradata/
prod1
Copies that textual control file to your backup disk directory.
After all datafiles and log files have been copied to your backup disk
directory, back up those copies in your backup disk directory to
tape. If you wanted to zip those files first with a utility such as
GNU Zip to backup the compressed versions (but see below), you would
do that before the cpio copy command, as shown below.
$ cd /u03/oradata/prod1
$ gzip -vN1 *
$ find /u03/oradata/prod1 | cpio -ovC64 >/dev/rmt0
Partial hot backups, in which the tablespaces are backed up at
separate times, possibly across several days for the entire set of
tablespaces, are possible, but not recommended.
NOTE: Only put one tablespace at a time into hot backup mode, and,
then, for as short a time as possible. Hot backups are hard on your
redo logs, since entire database blocks are written to them for each
change, instead of just the individual changes themselves, so, your
redo logs tend to get filled up fast, resulting in much more frequent
log switches while your tablespaces are in hot backup mode. A checksum
is written to the beginning and to the end of the database block
whenever a change is made. In Oracle7, during recovery, if those two
checksums are different (meaning a change occurred in that block while
it was being written out during a hot backup), the changed block is
retrieved from the archivelogs. In Oracle8 (from what I remember from
my Oracle class), those two checksums are compared during the hot
backup, and the archivelog block is written out instead of the
database block if a change is detected. It is for this reason that
entire changed blocks are archived during hot backups, instead of just
the changed bytes.
Instead of doing compression after you have copied the files into the
backup disk directory, you can use a UNIX pipe (or other operating
system equivalent, if available) to compress the files on the fly as
you copy or create them, such as in this export example, in which exp
writes to the pipe while gzip reads from the pipe (that's a double
quote/single quote after exp and a single quote/double quote after
sysdba in the export (exp) command):
$ mknod /tmp/exp_pipe p
$ gzip -cNf </tmp/exp_pipe >prod.dmp.gz &
$ exp "'/ as sysdba'" file=/tmp/exp_pipe full=y compress=n
log=prod.dmp.log
$ rm -f /tmp/exp_pipe
Other Nightly Processing
In addition to backups, other nightly processing can also be done to
guard against data loss and allow for proactive problem detection and
correction, such as the following, which are described below:
1. Create and keep current copies of your textual control files.
2. Create and keep current copies of your textual init.ora file if you
are using an spfile.
3. Perform full database exports for quick table restores and to check
datafile integrity.
4. Generate definitions for all of your tables and indexes.
5. Gather statistics on datafile and index space usage and table
extent growth for proactive maintenance.
Keep current copies of the textual control files for each of your
database instances, in case the control file is corrupted and you have
to rebuild it. For each of the database instances (such as SEED, TRNG,
PPRD, and PROD), switch to that instance (use ". oraenv" and enter
SEED, or, use "export ORACLE_SID=SEED", to switch to SEED in unix),
and enter the following in sqlplus or svrmgr to create the textual
control file in the instance's user_dump_dest directory specified in
the init.ora file:
SQL> alter database backup controlfile to trace;
Keep a current copy of the textual init.ora file (initialization
parameter file) if you are using an spfile (server parameter file)
instead of an init.ora file to hold your initialization parameters.
If the binary spfile becomes corrupt, the init.ora file can be used in
its place to restart your database (after deleting the spfile, since
Oracle looks for the spfile first). To create the init.ora file
(which is placed in the $ORACLE_HOME/dbs directory on UNIX systems, or
the $ORACLE_HOME\database directory on NT systems):
SQL> create pfile from spfile;
Create an export file of the entire database for individual table
restores. This will also detect any block corruption in the table
data, since a full table scan is performed. Block corruptions are not
detected in physical block copies during tape backups, so you wouldn't
know that you were propagating bad blocks to all of your backups until
a user tried to access data in that block. During export, if a bad
data block is encountered, the export will fail.
$ exp "'/ as sysdba'" file=/u03/oradata/prod1.dmp full=y \
log=/u03/oradata/prod1.dmp.log >/dev/null 2>/dev/null
Create an indexfile of that export for table and index definitions.
The indexfile can also be used to separate tables into product-based
tablespaces, or to split the indexes off into their own tablespace.
$ imp "'/ as sysdba'" file=/u03/oradata/prod1.dmp \
indexfile=/u03/oradata/prod1.idx full=y \
log=/u03/oradata/prod1.idx.log >/dev/null 2>/dev/null
Gather statistics on space usage so that you will know when to
reallocate tables to give them larger extents before they run out of
extents to allocate (using gurrddl or gurnddl along with export/
import), or to add datafiles to tablespaces that are about to fill up
(alter tablespace ... add datafile ...), or to recreate indexes that
currently have unused or wasted space due to deletions from their
associated tables. Below are some sample commands you can use to see
potential space problems for tables and tablespaces and to see if an
index has a lot of wasted space (and should be recreated at a time of
no activity):
SQL> select segment_name,segment_type,extents,max_extents
from sys.dba_segments where extents + 5 > max_extents
or extents > 50 order by segment_name,segment_type desc;
SQL> col "Free" format 9,999,999,999
SQL> col "Tot Size" format 9,999,999,999
SQL> select tablespace_name,sum(bytes) "Tot Size"
from dba_data_files group by tablespace_name;
SQL> select tablespace_name,sum(bytes) "Free"
from dba_free_space group by tablespace_name;
For each index, find the number of deleted rows (may rebuild it
later):
SQL> validate index posnctl.nhrfinc_key_index;
SQL> select name,del_lf_rows_len from index_stats;
The above pair are used together (validate clears out the index_stats
table before putting its stats into it).
Archiving To Tape
The best method for periodically writing the archive log files to tape
is to send them to disk first and have a cron job backup those archive
logs to tape later on. Never archive directly to tape, since that
would require a dedicated tape drive just for the archivelogs, and
since the backup would need to complete before cycling back to that
online redo log (otherwise, the database will "freeze" until that redo
log is available for reuse - the same thing that happens if your
archivelog directory gets filled up). If you did have a tape drive
that could be dedicated to the archivelogs, they could be backed up
more frequently than a normal backup procedure, such as every 10
minutes. In that case, you should only backup the new files which
haven't been placed on the tape yet.
Disaster Recovery Scenarios
Backups are no good unless you know how to use them to restore your
database up to and including the most recent change made to the data.
Oracle gives those capabilities for up-to-the-instant recoveries to
you, as long as you know how to use them. This section describes
Oracle's recovery capabilities and how to use them for general
recovery of datafiles, tablespaces, or the entire database, and for
specific disaster recovery scenarios, which are variations on the
themes of the general recovery procedures.
Things To Check
Messages displayed during database startup will usually indicate what
kind of problems the database is experiencing. After you get an error
message during your initial startup, you will most likely need to do a
shutdown before proceeding with database recovery.
Alert Log in the database's bdump directory (you can find bdump using
"select value from v$parameter where name = 'background_dump_dest';"
or "show parameter background_dump"), named something like /u00/oracle/
admin/PROD/bdump/alert_PROD.log).
Other recent trace (.trc) files in the database's bdump
(background_dump_dest) and udump (user_dump_dest) directories.
Oracle processes, using "ps -ef | grep ora" at the unix command
line. Should see entries like ora_smon_PROD, ora_pmon_PROD,
ora_dbwr_PROD, ora_lgwr_PROD, ora_arch_PROD, and other ora_xxxx_PROD
processes for each database instance, and oraclePROD for jobsub and
other users.
The most recent .lis files in the job submission directory (such as /
home/jobsub), at the end of the "ls -ltr /home/jobsub" listing, which
may indicate why a user's run crashed or why it filled up a table or
archive log directory.
What To Restore
Only restore the datafile that has gone bad if you are recovering up
to the current time.
Restore ALL datafiles if you are recovering up to an earlier time than
now but after those datafiles were backed up (for point-in-time
recovery or cancel-based recovery). Since Oracle requires that all
datafiles be synchronized at the same SCN (System Change Number), just
recovering one datafile to an earlier time would cause that datafile
to have a lower SCN than all the other datafiles, which Oracle
wouldn't accept.
Restore all archivelog files that will be needed in the recovery (for
the time range between the backup and the point-in-time that you are
recovering to), if they are not currently available on-line or if
those on-line archivelog files have been corrupted.
NEVER restore control files unless all copies are lost. See "Loss of
Control Files" for information about restoring control files.
NEVER restore online redo log files from a hot backup (you shouldn't
have backed them up in the first place); it is OK to restore them when
doing cold backups, but ONLY IF you are restoring up to the time of
that backup or doing a partial restore where those old logs won't be
used during the restore. Restoring a backup of an active redo log
file would cause corruption during database recovery. If the redo
logs are deleted or damaged, just remove the damaged redo log files
and reset the logs on open, and Oracle will recreate the redo log
files for you (be sure you backup immediately after any resetlogs
command, since Oracle can't recover datafiles restored from a backup
made prior to resetting the logs):
$ sqlplus "/ as sysdba"
SQL> startup mount
SQL> alter database open resetlogs;
Backup/Restore Using Unix cpio Command
Here is a quick overview of using the unix cpio command for tape
backups and restores. The tape drive for our system is /dev/rmt0. (A
blocksize of 32K, indicated by C64, is used by these cpio commands.
You can change the blocksize of your tape drive in AIX from user root
through the smit utility, or by using the chdev command, as in "chdev -
l 'rmt0' -a block_size='32768'", which will speed up the writing to
that tape drive from the default 512-byte blocksize.)
Saving files to backup:
$ cat backemup.dat | cpio -ovC64 >/dev/rmt0
Listing tape contents:
$ cpio -itvC64 </dev/rmt0
Restoring files to same pathnames:
$ cpio -imvC64 </dev/rmt0
Restoring files individually to new pathnames, entering the new
pathname when prompted (also use "u" flag to overwrite newer files if
they exist as newpathname; for fullpathname, enclose patterns in
double quotes):
$ cpio -irmC64 fullpathname </dev/rmt0
newpathname (or, period (".") for same name, or, enter to
skip file)
Disaster Recovery Overview
The first thing you should ALWAYS do when your database crashes is to
make a BACKUP of your datafiles, control files, archive log files, and
initialization parameters file (init.ora) for your database that
crashed, either copying those files to tape or to another directory,
or zipping them with something like GNU Zip, or making a tar file of
them. DO NOT TRY TO RESTART YOUR DATABASE BEFORE BACKING UP THOSE
FILES. Otherwise, you might make the situation worse. If all else
fails, the people at SCT or Oracle may be able to get your database
going again from that initial backup, even though the crash may have
corrupted some files.
After a crash occurs and you've made a backup, do a STARTUP when
connected as sysdba. If there are problems encountered during startup,
Oracle will display an error message about the first problem, which
will usually give you an idea about what type of recovery scenario you
will need to apply. In most cases, you will need to do a shutdown
after this initial startup attempt before beginning the recovery.
There are three primary recovery options for media failure scenarios.
The three recovery options are:
1) Recover Database 2) Recover Datafile 3) Recover
Tablespace
Most of the other recovery scenarios are just variations on these
three themes. Recovery steps for specific disaster recovery scenarios
are given later on; however, in a generic recovery:
1. The database is shut down or the affected datafile(s) or tablespace
is taken offline.
2. The affected datafile(s) is restored from a backup along with the
archived redo logs created since that backup.
3. A recover command is issued.
4. The tablespace or datafile(s) is brought back online.
You can see the Oracle9i User-Managed Backup and Recovery Guide for
more information on Oracle database disaster recovery.
Basic Recover Database Recovery Option
The Recover Database option is used to recover all datafiles needing
recovery, either up to the point of failure, called complete recovery,
or up to a point in the past before the failure, called incomplete
recovery, which is only available using this option. Recover Database
is performed from the MOUNT state, so, the database will have to be
SHUT DOWN before using this. All datafiles must be ONLINE to be
recovered, in contrast to the other two options where the datafiles
are offline in most cases. The basic steps are: restore the datafiles,
mount, online the datafiles, recover database, and open, as shown
below:
$ cp -p /u03/oradata/prod1/devl_PROD_01.dbf /u03/oradata/PROD
Restores the datafiles from backup, such as devl_PROD_01.dbf here.
$sqlplus "/ as sysdba"
SQL> startup mount
SQL> select * from v$datafile;
Shows status of datafiles, indicating those that are offline.
SQL> alter database datafile '/u03/oradata/PROD/devl_PROD_01.dbf'
online;
Online the offline datafiles shown above, such as devl_PROD_01.dbf
here.
SQL> set autorecovery on
Tells recovery not to ask which redo logs to use (in
log_archive_dest).
SQL> recover database;
Recovers all datafiles, as needed, up to the current point in time.
SQL> alter database open;
Opens the database for user access.
You may also need to drop and recreate the TEMP tablespace after a
database recovery. See Loss of TEMP Datafile on how to do this.
Recover Database can also be used to perform an incomplete recovery up
to some point in time before the failure occurred, or before the
database was last shut down. To do this, ALL datafiles must be
restored from the backup before doing the incomplete recovery (not
control files or online redo log files), and ALL datafiles are
recovered back to the same point in time - you can't do a partial
incomplete recovery.
There are three incomplete recovery options available: time-based,
cancel-based, and change-based recovery. Time-based recovery brings
the datafiles up to a given date and time. Cancel-based recovery
applies entire archive log files and online redo log files one at a
time until the word CANCEL is typed in, so you can't use autorecovery
with it. Change-based recovery applies all archive log files and
online redo log files up to, but NOT including, a given system change
number. If you have added datafile(s) since the last backup, and you
want to recover up to a point before the datafile(s) were added, you
will need to restore the control files and use the "using backup
controlfile" option of the recover command (see Loss of Control
Files). (I don't know of a way to recover up to a point between
datafile additions - it's all or none.) The basic steps are similar
to the complete recover database option, except for the incomplete
recovery commands, which are shown below (only use one of these):
SQL> recover automatic database until time '2005-02-14:15:45:00';
Automatic is similar to Set Autorecovery On. Use 1 second before the
time of failure or of the log to exclude, which, for Oracle 8.x and
above, could be found using:
select to_char(a.time,'YYYY-MM-DD:HH24:MI:SS') as time,
a.name from
(select first_time as time, substr(name,1,58) as name, first_change#
as change from v$archived_log union select first_time, 'Redo Log Group
' || to_char(group#), first_change# from v$log) a order by change;
Or, for Oracle 7.x, could be found using:
select to_char(a.time,'YYYY-MM-DD:HH24:MI:SS') as time,
a.name from
(select time, substr(archive_name,1,58) as name, low_change# as change
from v$log_history union select first_time, 'Redo Log Group ' ||
to_char(group#), first_change# from v$log) a order by change;
SQL> recover database until cancel;
Accept logs until you type in CANCEL.
SQL> recover automatic database until change 43047423;
The system change numbers (SCN) contained in each archive log file is
shown in Oracle 8.x and above using: select
name,first_change#,next_change# - 1 from v$archived_log;
Or, for Oracle 7.x, using: select name,low_change#,high_change# from v
$log_history;
For the online redo logs, you could use "select max(first_change# - 1)
from v$log;" to find the SCN to use for applying all but the current
online redo log file.
After an incomplete recovery, you must open the database with the
RESETLOGS option, as shown below, and then immediately shut down the
database and make a BACKUP, since the old backup is unuseable after
RESETLOGS is used.
SQL> alter database open resetlogs;
Basic Recover Datafile Recovery Option
The Recover Datafile option is used to recover the specified datafile
up to the point of failure, synchronizing it with the other datafiles
(complete recovery only). Recover Datafile is performed either from
the MOUNT state (after a shutdown), with the datafile ONLINE or
OFFLINE, or from the OPEN state, with the datafile OFFLINE. A bad
datafile must be taken offline before the database can be opened.
Since the SYSTEM tablespace CANNOT be taken offline, you cannot use
Recover Datafile from the OPEN state on SYSTEM.
The basic steps for the MOUNT state are: restore the datafile, mount,
recover datafile (using automatic here instead of autorecovery),
online the datafile if needed, and open, as shown below:
$ cp -p /u03/oradata/prod1/devl_PROD_01.dbf /u03/oradata/PROD
$ sqlplus "/ as sysdba"
SQL> startup mount
SQL> recover automatic datafile '/u03/oradata/PROD/devl_PROD_01.dbf';
SQL> select * from v$datafile;
SQL> alter database datafile '/u03/oradata/PROD/devl_PROD_01.dbf'
online;
SQL> alter database open;
The basic steps for Recover Datafile from the OPEN state (except for
SYSTEM) while the rest of the database is still up and running, are:
offline the datafile, restore the datafile, recover datafile, and
online the datafile, as shown below:
$ sqlplus "/ as sysdba"
SQL> alter database datafile '/u03/oradata/PROD/devl_PROD_01.dbf'
offline;
SQL> !cp -p /u03/oradata/prod1/devl_PROD_01.dbf /u03/oradata/PROD
SQL> recover automatic datafile '/u03/oradata/PROD/devl_PROD_01.dbf';
SQL> alter database datafile '/u03/oradata/PROD/devl_PROD_01.dbf'
online;
Doing a Recover Datafile from the OPEN state (except for SYSTEM) can
also be used to bring up your database for use before doing a
recovery. The basic steps are: mount, offline the datafile, open,
restore the datafile, recover datafile, and online the datafile, as
shown below:
$ sqlplus "/ as sysdba"
SQL> startup mount
SQL> alter database datafile '/u03/oradata/PROD/devl_PROD_01.dbf'
offline;
SQL> alter database open;
The database is now available to the users, except for that datafile.
SQL> !cp -p /u03/oradata/prod1/devl_PROD_01.dbf /u03/oradata/PROD
SQL> recover automatic datafile '/u03/oradata/PROD/devl_PROD_01.dbf';
SQL> alter database datafile '/u03/oradata/PROD/devl_PROD_01.dbf'
online;
You can still access data in tables in a tablespace which has an
offlined datafile, just so long as the data is in one of the other
datafiles of the tablespace and the table header is not in the
offlined datafile.
Basic Recover Tablespace Recovery Option
The Recover Tablespace option is used to recover all datafiles needing
recovery in a tablespace up to the point of failure, synchronizing
them with the other datafiles (complete recovery only). Recover
Tablespace is performed from the OPEN state after taking the
tablespace OFFLINE (which brings its datafiles offline). Since the
SYSTEM tablespace CANNOT be taken offline, you cannot use Recover
Tablespace on SYSTEM. The basic steps while the database is open are:
offline the tablespace, restore the datafiles, recover tablespace, and
online the tablespace, as shown below:
$ sqlplus "/ as sysdba"
SQL> alter tablespace development offline immediate;
Using Immediate rolls back currently pending transactions.
SQL> !cp -p /u03/oradata/prod1/devl_PROD* /u03/oradata/PROD
SQL> recover automatic tablespace development;
SQL> alter tablespace development online;
Doing a Recover Tablespace after opening the database can be used to
bring up your database for use before doing a recovery. The basic
steps are: mount, offline the bad datafiles, open, offline the
tablespace, restore the datafiles, recover tablespace, and online the
tablespace, as shown below:
$ sqlplus "/ as sysdba"
SQL> startup mount
SQL> alter database datafile '/u03/oradata/PROD/devl_PROD_01.dbf'
offline;
SQL> alter database open;
SQL> alter tablespace development offline;
SQL> !cp -p /u03/oradata/prod1/devl_PROD* /u03/oradata/PROD
SQL> recover automatic tablespace development;
SQL> alter tablespace development online;
If a write error occurs on one of the datafiles when you offline a
tablespace, use "alter tablespace tsname offline temporary;". Then,
you can run a recovery on the datafile. If all datafiles in an
offlined tablespace have write errors, use "alter tablespace tsname
offline immediate;". Then, you can run a recovery on the tablespace.
Archivelogs Disk Volume Filled Up
Symptoms: If the disk containing the directory for the archive log
files fills up, Oracle will stop all transactions until the archiver
process is able to continue writing to that directory. Doing a "df -k"
shows that there is no room left for more archivelogs in that
directory's disk. Current users' sessions will freeze, and users
trying to log on will get "ERROR: ORA-00257: archiver error. Connect
internal only, until freed.", since the archiver process is still
waiting to archive that redo log group. (Note: Use the oerr utility
in UNIX to get the error description, such as "oerr ora 257".) To
verify this conclusion:
$ sqlplus "/ as sysdba"
SQL> select value from v$parameter where name =
'background_dump_dest';
Shows the pathname of the bdump directory.
SQL> !tail -200 /u00/oracle/admin/PROD/bdump/alert_PROD.log
Shows "ORA-00272: error writing archive log", indicating that the redo
log group can't be written.
SQL> !ls -ltr /u00/oracle/admin/PROD/bdump/arch*
Get the latest archiver trace file in the bdump directory, such as
arch_13106.trc (the last one listed).
SQL> !cat /u00/oracle/admin/PROD/bdump/arch_13106.trc
Also shows "ORA-00272: error writing archive log".
Action: You will need to free up some space on that disk volume for
Oracle to continue, either by moving files off of that volume, or,
more likely, by deleting old archive log files out of that directory
which you have already backed up to tape. You must not delete archive
log files that have not been backed up, since you wouldn't be able to
recover transactions in those files if your database crashes. The
following script deletes the old log files earlier than the given
number of days (24-hour periods). If you back up nightly, 1 day is
the smallest number you should enter, or 3 on Monday's if you don't
back up on weekends. You have to be logged in as userid oracle or as
root to remove the archive log files.
# File: remove_old_logs.shl
echo "You must be logged in as user Oracle to run this script,"
echo "which removes all archivelog files older than X days."
echo "Enter number of days to keep: \c"
read DAYS_KP; export DAYS_KP
find /u01/oradata/PROD/archivelogs -name '*.arc' -mtime +$DAYS_KP -
exec rm {} \;
find /u01/oradata/PROD/archivelogs -name '*.arc.gz' -mtime +$DAYS_KP -
exec rm {} \;
echo "Results after deletions:"
du -k
df -k
If you are on a Windows NT-based system (including Windows XP and
Windows 2000) or on Windows 2003 (added 6/13/06), I've written a
similar script using DOS Batch commands, which you can download by
clicking on this: remove_old_logs.bat. This script uses extended DOS
commands in Windows NT and Windows 2003, which aren't available in
Windows 98 and before. It defaults to Windows 2003, so, if you want
to use it on Windows NT, change the osver variable in the script to NT
instead of 2003.
If you can't free up any space on the archive log disk, you can
redirect the archives to another disk until you are able to free up
space on it, such as shown below. This redirection goes away if you
restart your database, but you can make it permanent by updating the
server parameter file (spfile), also shown below, or the init.ora file
before restarting, according to which one of those that you are
using. I haven't tried this, but just saw it on the web in a
presentation by Chris Lawson on
dbspecialists.com.
$ sqlplus "/ as sysdba"
SQL> alter system archive log start to '<new archive log path name>';
SQL> alter system set log_archive_dest='<new archive log path name>'
scope=spfile;
Loss of Control Files
Symptoms: May be none until you try to shutdown and startup the
database. On shutdown, if the control files were deleted, you would
get "ORA-00210: cannot open control file '/u03/oradata/PROD/
ctrl_PROD_01.ctl'", or, if the control files were overwritten, you
would get "ORA-00201: control file version incompatible with ORACLE
version" along with their names. On startup for both cases, you would
get "ORA-00205: error in identifying control file '/u03/oradata/PROD/
ctrl_PROD_01.ctl', along with "ORA-07366: sfifi: invalid file, file
does not have valid header block." if overwritten.
Action: If you have a recent up-to-date textual control file (no added
datafiles) from "alter database backup controlfile to trace;" in your
user_dump_dest directory, you can just edit out the control file
header and run it:
$ sqlplus "/ as sysdba"
SQL> shutdown abort
SQL> !ls -ltr /u00/oracle/admin/PROD/udump/*.trc
Get the latest textual control file in user_dump_dest directory, such
as prod_ora_31494.trc (the last one listed).
SQL> !vi /u00/oracle/admin/PROD/udump/prod_ora_31494.trc
Edit the textual control file and remove all lines before the STARTUP
NOMOUNT line, remove the lines after ALTER DATABASE OPEN, and, for
Oracle 9i and above, and change all comment lines to start with
dashes. The vi commands to do these (for 9i+) are usually:
:1,/STARTUP NOMOUNT/-1d
:/ALTER DATABASE OPEN/+1,$d
:1,$s/^#/--/
:wq
SQL> @/u00/oracle/admin/PROD/udump/prod_ora_31494.trc
If you don't have a textual control file, you'll need to restore all
datafiles and control files, but, not online redo log files, from the
last backup and do a recovery "using backup controlfile":
$ sqlplus "/ as sysdba"
SQL> shutdown abort
At this point, restore ALL datafiles AND control files from the last
backup, along with any archivelogs that are needed since that time,
but, NOT the online redo log files.
SQL> connect / as sysdba
SQL> startup mount
SQL> recover automatic database using backup controlfile;
Then, AUTO on "ORA-00308: cannot open archived log ...".
SQL> alter database open resetlogs;
Immediately shut down the database and make a BACKUP, since the old
backup is unuseable after RESETLOGS is used.
Note that you can combine the recovery options depending on what's
needed. For example, I needed to recover up to a point in time before
a datafile was added, so, I did the steps above, including deleting
the added datafile after shutting down the database, and substituting
the following recover command (you won't enter AUTO as above, and,
when it's finished the partial recovery, it will show "Media recovery
cancelled."):
SQL> recover automatic database using backup controlfile until time
'2001-07-16:13:15:00';
Loss of TEMP Datafile
Symptoms: On large sorts (select distinct, order by, group by, union)
that can't be done in memory, the sort will fail with "ORA-01157:
cannot identify data file 3 - file not found" if the loss happened in
the middle of the sort, or "ORA-01116: error in opening database file
3" if the loss happened before the sort started, along with the file
name "ORA-01110: data file 3: '/u03/oradata/PROD/temp_PROD_01.dbf'".
Nothing is put into the alert.log file, and no trace files will be
generated for this.
You may also need to drop and recreate the TEMP tablespace after a
database recovery, which may be indicated by an error message like
"ORA-25153: temporary tablespace is empty" when you first log into
your application (such as SCT Banner), although you probably won't get
that message just logging into sqlplus.
Action: All you need to do is to take the datafile offline and drop
and recreate the TEMP tablespace, which can be done either while the
database is still running (from sqlplus) or beginning at the mount
state (when connected as sysdba). Here, we are using a locally
managed temporary tablespace.
$ sqlplus "/ as sysdba"
SQL> alter database datafile '/u03/oradata/PROD/temp_PROD_01.dbf'
offline;
Then, if from mount state, do an "alter database open;" here.
SQL> select file_name,bytes/1024 kbytes from dba_temp_files;
Shows the size of the TEMP datafile(s) in Kbytes.
SQL> select initial_extent/1024 kbytes from dba_tablespaces
where tablespace_name = 'TEMP';
Shows the uniform size of the extents in Kbytes.
SQL> drop tablespace temp;
SQL> !rm /u03/oradata/PROD/temp_PROD_01.dbf
SQL> create tablespace temp
tempfile '/u03/oradata/PROD/temp_PROD_01.dbf' size 40064K
extent management local uniform size 640K;
TEMP Datafile Offline
Symptoms: Similar to Loss of TEMP Datafile, except for the message
"ORA-00376: file 3 cannot be read at this time", along with the file
name message.
Action: You can offline the datafile and drop and recreate the
tablespace, as above, or you can do a datafile recovery with the
database open and bring the datafile online.
SQL> recover automatic datafile '/u03/oradata/PROD/temp_PROD_01.dbf';
SQL> alter database datafile '/u03/oradata/PROD/temp_PROD_01.dbf'
online;
Loss of INACTIVE Online Redo Log Group (which had already been
archived)
Symptoms: The database crashes when an attempt is made to access that
redo log group during a logfile switch. Current users will be kicked
out with "ORA-01092: ORACLE instance terminated. Disconnection
forced", and users trying to log on will get "ERROR: ORA-03114: not
connected to ORACLE" and "ERROR: ORA-00472: PMON process terminated
with error". Nothing will be placed in the alert.log file; however,
there will be pmon (process monitor), dbwr (database writer), and lgwr
(log writer) trace files generated in the background dump dest
directory. Check these from the unix command line, since the database
is not up to check them from:
$ grep background_dump_dest /u00/oracle/product/v723/dbs/initPROD.ora
Shows the pathname of the bdump directory.
$ cd /u00/oracle/admin/PROD/bdump
$ ls -ltr *.trc
Get the latest trace files in the bdump directory, such as
pmon_13612.trc and lgwr_32306.trc (the last ones listed).
$ cat pmon_13612.trc
Shows "ORA-00470: LGWR process terminated with error" (also in dbwr).
$ cat lgwr_32306.trc
Shows "ORA-00313: open failed for members of log group 3 of thread 1"
and messages containing the missing log file names like "ORA-00312:
online log 3 thread 1: '/u03/oradata/PROD/log_PROD_3B.rdo'".
Action: You will need to startup the database in the mount state, drop
the missing logfile group (and remove the group's member files, if
they are not already gone), and add the logfile group back, at which
time the database can be opened for use.
$ sqlplus "/ as sysdba"
SQL> startup
Shows "ORA-01081: cannot start already-running ORACLE - shut it down
first".
SQL> startup force
Or, you could have done a "shutdown abort" followed by a "startup".
Shows "Database mounted." and then "ORA-00313: open failed for members
of log group 3 of thread 1" with additional messages containing the
log file names.
SQL> select bytes/1024 from v$log where group# = 3;
Shows the size in K of the missing group's members.
SQL> select member from v$logfile where group# = 3;
Shows the member (file) names of the missing group.
SQL> alter database drop logfile group 3;
SQL> alter database add logfile group 3
('/u03/oradata/PROD/log_PROD_3A.rdo',
'/u03/oradata/PROD/log_PROD_3B.rdo') size 500K;
Recreates all member files for that group of the given size.
SQL> alter database open;
If just one member of group is lost, the other member(s) will take
over and allow Oracle to keep running with no apparent problem.
However, alert_PROD.log will show errors such as "ORA-00313: open
failed for members of log group 3 of thread 1" and "ORA-00312: online
log 3 thread 1: '/u03/oradata/PROD/log_PROD_3A.rdo'" on each log
switch for that group. In this case, drop and recreate the member:
SQL> select * from v$log where group# = 3;
If the status is active or current for that group, do an "alter system
switch logfile;"
SQL> alter database drop logfile member
'/u03/oradata/PROD/log_PROD_3A.rdo';
SQL> alter database add logfile member
'/u03/oradata/PROD/log_PROD_3A.rdo' to group 3;
Loss of CURRENT Online Redo Log Group (which needed to be archived)
Symptoms: This behaves like the database freezing when the archivelogs
disk volume is filled up; however, doing a "df -k" shows that there is
plenty of room for more archivelogs. Current users' sessions will
freeze, and users trying to log on will get "ERROR: ORA-00257:
archiver error. Connect internal only, until freed.", since the
archiver process is still waiting to archive that redo log group after
Oracle has cycled through the other groups and needs to use that group
again. To verify this conclusion:
$ sqlplus "/ as sysdba"
SQL> select value from v$parameter where name =
'background_dump_dest';
Shows the pathname of the bdump directory.
SQL> !tail -200 /u00/oracle/admin/PROD/bdump/alert_PROD.log
Shows "ORA-00286: No members available, or no member contains valid
data", indicating that the redo log group is missing or corrupt, as
well as messages such as "ORACLE Instance PROD - Can not allocate log,
archival required", "Thread 1 cannot allocate new log, sequence 21",
and "All online logs needed archiving".
SQL> !ls -ltr /u00/oracle/admin/PROD/bdump/arch*
Get the latest archiver trace file in the bdump directory, such as
arch_22882.trc (the last one listed).
SQL> !cat /u00/oracle/admin/PROD/bdump/arch_22882.trc
Also shows "ORA-00286: No members available, or no member contains
valid data".
SQL> shutdown abort
SQL> startup
Shows "ORA-00313: open failed for members of log group 2 of thread 1",
or whatever group number has the problem.
Then, if you had tried to just drop and recreate the redo log group
from this mount state (as in loss of inactive online redo log group),
you would have gotten:
SQL> alter database drop logfile group 2;
Fails with "ORA-00350: log 2 of thread 1 needs to be archived".
Action: This requires an incomplete recovery to just before that group
was used, since Oracle can't continue without doing a successful
archive, and since you can't drop and recreate the redo log group
while the archiver has it marked for archiving. Note that all changes
in the lost redo log group will be lost (yuck!). An incomplete time-
based recovery is shown below:
SQL> connect / as sysdba
SQL> shutdown abort
At this point, restore ALL datafiles from the last backup, any
archivelogs that are needed since that time, AND the lost online redo
log group's files (this is the only case in which you would restore
those redo logs), but, NOT the control files.
SQL> startup mount
SQL> select group#, sequence#, bytes, first_change#,
to_char(first_time,'DD-MON-YY HH24:MI:SS'), status from v$log;
Get the first time for changes in that missing group, subtract 1
second from it, and use that value in the recover command below.
SQL> recover automatic database until time '2005-02-14:12:59:59';
SQL> alter database open resetlogs;
Then, immediately shut down the database and make a BACKUP, since the
old backup is unuseable after RESETLOGS is used.
Failure During Hot Backup
If you have a failure while you are doing a hot backup on a
tablespace, besides doing any recovery that is needed for the
particular failure, you will also need to bring those tablespace
datafiles back out of hot backup mode. To do this, while the database
is in a mount state, do an "end backup" on each of those datafiles
before opening the database. This is available in Oracle 7.2 and
above - before that, you would have to restore all the datafiles and
do an incomplete recovery to the time before the hot backup was
started on that tablespace.
$ sqlplus "/ as sysdba"
SQL> startup mount
SQL> select
df.name,bk.time from v$datafile df,v$backup bk
where df.file# = bk.file# and bk.status = 'ACTIVE';
Shows the datafiles currently in a hot backup state.
SQL> alter database datafile
'/u03/oradata/PROD/devl_PROD_01.dbf' end backup;
Do an "end backup" on those listed hot backup datafiles.
SQL> alter database open;
Loss of Server Parameter File (spfile)
If your server parameter file (spfile) becomes corrupt, and you
haven't been creating a textual init.ora parameter file as a backup,
you can pull the parameters from it using the strings command in UNIX
to create an init.ora file. You will need to edit the resulting file
to get rid of any garbage characters in it (but don't worry about the
"*." characters at the beginning of the lines) and make any
corrections to it before using it to start your database, but, at
least you will have something to go by:
$ cd $ORACLE_HOME/dbs
$ strings spfilePROD.ora >initPROD.ora
If you have been saving off a textual init.ora parameter file as a
backup, you can restore that init.ora file to the $ORACLE_HOME/dbs
directory in UNIX (or $ORACLE_HOME\database directory in NT). You
will need to delete the corrupt spfile before trying to restart your
database, since Oracle looks for the spfile first, and the init.ora
file last, to use as the parameter file when it starts up the database
(or, you could leave the spfile there and use the pfile option in the
startup command to point to the init.ora file). Then, once your
database is up, you can recreate the spfile using the following (as
sysdba):
SQL> create spfile from pfile;