dbwr and lgwr don't shut down

NetComrade

Mar 31, 2005, 1:10:22 PM

This is a weird issue.

We have a machine (v880/solaris8/latest recommended patch cluster),
where a particular Standard Edition 9.2.0.5 database shuts down
'cleanly' yet leaves two processes behind, which we cannot kill without
a reboot. The same database (storage attached to multiple machines) on a
solaris7/e4500 machine comes up/shuts down fine. (I tried to shut down
the rest with aborts, as you can see from the logs; it didn't help.)

Any ideas?
Here's a log from the shutdown on the v880:

Thu Mar 31 10:28:50 2005
Shutting down instance: further logons disabled
Shutting down instance (immediate)
License high water mark = 4
Waiting for dispatcher 'D000' to shutdown
Waiting for dispatcher 'D001' to shutdown
Waiting for dispatcher 'D002' to shutdown
Waiting for dispatcher 'D003' to shutdown
All dispatchers and shared servers shutdown
Thu Mar 31 10:28:54 2005
ALTER DATABASE CLOSE NORMAL
Thu Mar 31 10:28:54 2005
SMON: disabling tx recovery
SMON: disabling cache recovery
Thu Mar 31 10:28:54 2005
Shutting down archive processes
Archiving is disabled
Archive process shutdown avoided: 0 active
Thread 1 closed at log sequence 1015
Successful close of redo thread 1
Thu Mar 31 10:28:54 2005
Completed: ALTER DATABASE CLOSE NORMAL
Thu Mar 31 10:28:54 2005
ALTER DATABASE DISMOUNT
Completed: ALTER DATABASE DISMOUNT
ARCH: Archiving is disabled
Shutting down archive processes
Archiving is disabled
Archive process shutdown avoided: 0 active
ARCH: Archiving is disabled
Shutting down archive processes
Archiving is disabled
Archive process shutdown avoided: 0 active
.......
We use Oracle 8.1.7.4 and 9.2.0.5 on Solaris 2.7 boxes
remove NSPAM to email

NetComrade

Mar 31, 2005, 1:36:20 PM

Well, 'cleanly' wasn't really the right word.

Shutdown immediate hangs on database dismount.

As you can see, I tried more than once:
[TEST@v880 oracle]$ ps -ef | egrep 'lgw|dbw'
oracle 13033 1 0 13:30:28 ? 0:00 ora_dbw0_TEST
oracle 3906 1 0 13:19:39 ? 0:00 ora_dbw0_TEST
oracle 3908 1 0 13:19:39 ? 0:00 ora_lgwr_TEST
oracle 13035 1 0 13:30:28 ? 0:00 ora_lgwr_TEST
oracle 13432 4325 0 13:32:31 pts/3 0:00 egrep lgw|dbw
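
For anyone wanting to track the stragglers across repeated shutdown attempts, the listing above can be filtered per SID. This is only a rough sketch: it assumes the `ps -ef` column layout shown above (PID in column 2, PPID in column 3, command name last), and the function name `leftover_bg` is made up for illustration. On Solaris the stock `awk` may not accept `-v`, so `nawk` would probably be needed there.

```shell
# List leftover DBWn/LGWR background processes for one instance.
# Reads `ps -ef` output on stdin and prints "PID PPID COMMAND" per match.
# leftover_bg is a hypothetical helper name, not an Oracle-supplied tool.
leftover_bg() {
    sid="$1"
    awk -v sid="$sid" '
        # Background process names look like ora_dbw0_<SID> or ora_lgwr_<SID>;
        # anchoring on the last field skips the egrep line itself.
        $NF ~ ("^ora_(dbw[0-9a-z]|lgwr)_" sid "$") { print $2, $3, $NF }
    '
}

# Typical use (not run here): ps -ef | leftover_bg TEST
```

Comparing the PIDs from one run to the next shows whether a fresh startup/shutdown cycle left another pair behind, as seems to have happened above (3906/3908 and 13033/13035).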

HansF

Mar 31, 2005, 1:45:51 PM

On Thu, 31 Mar 2005 18:10:22 +0000, NetComrade wrote:

>
> This is a weird issue.
>
> We have a machine (v880/solaris8/latest recommended patch cluster),
> where a particular Standard Edition 9.2.0.5 database shuts down
> 'cleanly' yet leaves two processes behind, which we cannot kill unless
> reboot. The same database (storage attached to multiple machines) on a
> solaris7/e4500 machine comes up/shuts down fine. (I tried to shut down
> the rest with aborts, as you can see from the logs, didn't help)

Let me verify by paraphrasing:

a) Two machines, V880 & E4500, set up to talk to same 9.2.0.5SE DB
instance (controlled so it's not concurrent), with NAS or SAN (not sure
which).

Database is started on one machine, then shutdown.

b) When E4500 (Solaris 7) is used, shutdown instance goes down properly;

c) When V880 (Solaris 8) is used, all shuts down except the two processes
that are supposed to write (and wait for write verification) to the shared
disk.

If this is true, perhaps you could indicate a bit more on the disk
subsystem.

/Hans

NetComrade

Mar 31, 2005, 2:04:04 PM

After a while this appears:

waiting for detached processes to terminate

NetComrade

Mar 31, 2005, 2:49:33 PM

The binaries live on the disk array together with the data.
The disk array is an A5200.

Apologies for not making that clear.

I tried trussing both processes; they just didn't seem to be doing
anything (perhaps I didn't wait long enough, though).

.......

HansF

Mar 31, 2005, 3:20:13 PM

On Thu, 31 Mar 2005 19:49:33 +0000, NetComrade wrote:

> the binaries live on the disk array together with the data
> the disk array is a5200.
>
> apologies for not making it clear
>
> I tried trussing either processes, they just didn't seem to be doing
> anything (perhaps I didn't wait long enough though..)
>

1) Are you saying this is NFS and not SAN or NAS? That leads to a
possible area of interest.

2) If you insist on top-posting, I will not respond further. It's not
worth MY time to flip between top and bottom, and I use bottom (as do
many, if not most, traditional newsgroup posters).

Top posting is a traditionally accepted method for email, and embedded
bottom posting (with trim of unneeded info) is the accepted method for
newsgroups.

Reason - one to one communication in email assumes remembering previous
communication, and therefore scrolling down is just used as a memory jog.
Newsgroup posting assumes that anyone in the world will jump in as
appropriate without prior interaction, so they should be able to follow
the thread segment from top to bottom where the latest info resides.

/Hans

Sybrand Bakker

Mar 31, 2005, 3:34:37 PM

On Thu, 31 Mar 2005 20:20:13 GMT, HansF <News...@telus.net> wrote:

>1) Are you saying this is NFS and not SAN or NAS? That leads to a
>possible area of interest.

Isn't SAN or NAS always running on top of NFS? The older versions of
NFS use the UDP protocol, which by definition is unsafe, as packet
receipt is not acknowledged by the receiving party.

--
Sybrand Bakker, Senior Oracle DBA

HansF

Mar 31, 2005, 3:44:53 PM

On Thu, 31 Mar 2005 22:34:37 +0200, Sybrand Bakker wrote:

> On Thu, 31 Mar 2005 20:20:13 GMT, HansF <News...@telus.net> wrote:
>
>>1) Are you saying this is NFS and not SAN or NAS? That leads to a
>>possible area of interest.
>
> Isn't SAN or NAS always running on top of NFS? The older versions of

That's not quite the way I learned it, but I won't argue (too much) -
except with the 'always' word. <g>

> NFS use the UDP protocol, which by definition is unsafe, as packet
> receipt is not acknowledged by the receiving party.

This is heading in the same direction I was heading - the current SAN/NAS
protocols tend to be safe, whereas pure NFS has been known to cause
'write-confirmation' issues on occasion.

In addition, I note there are different versions of Solaris. I have no
idea whether the filesystem [drivers] are identical.

No matter which way I look, this points at a filesystem issue, not an
Oracle issue. I'd need to eliminate that first before pointing back at
Oracle.

/Hans

HansF

Mar 31, 2005, 3:53:50 PM

On Thu, 31 Mar 2005 20:44:53 +0000, HansF wrote:

>
> No matter which way I look, this points at a filesystem issue, not an
> Oracle issue. I'd need to eliminate that first before pointing back at
> Oracle.

And that leads to the other question - NetComrade mumbled about an A5200
array, but has not indicated what the file system is, or which clusterware
is used. In my experience, clusterware tends to be very version-sensitive.

/Hans

Fabrizio

Mar 31, 2005, 3:58:48 PM

HansF wrote:
> On Thu, 31 Mar 2005 22:34:37 +0200, Sybrand Bakker wrote:
>
>
>>On Thu, 31 Mar 2005 20:20:13 GMT, HansF <News...@telus.net> wrote:
>>
>>
>>>1) Are you saying this is NFS and not SAN or NAS? That leads to a
>>>possible area of interest.
>>
>>Isn't SAN or NAS always running on top of NFS? The older versions of
>
>
> That's not quite the way I learned it, but I won't argue (too much) -
> except with the 'always' word. <g>
>

SAN is on top of SCSI or Fiber Channel.
There is an implementation using TCP called iSCSI.

>>NFS use the UDP protocol, which by definition is unsafe, as packet
>>receipt is not acknowledged by the receiving party.
>
>
> This is heading in the same direction I was heading - the current SAN/NAS
> protocols tend to be safe, whereas pure NFS has been known to cause
> 'write-confirmation' issues on occasion.
>

NFS can be used on top of TCP instead of UDP, and the modern
implementations are much safer than the old ones (but I'm not advocating
the use of NFS for critical applications: I have seen too many processes
blocked in D state).

Regards

--
Fabrizio Magni

fabrizi...@mycontinent.com

replace mycontinent with europe

NetComrade

Mar 31, 2005, 5:44:52 PM

I apologize. Let me restate the problem:

a) a v880 and 2 e4500s (nodes) are connected to the same storage via fibre
hubs in Arbitrated Loops, so this is an old SAN
b) each node controls different disks via Veritas Volume Manager
c) each node is capable of running the others' databases.
Databases/disks are grouped, and groups can be swung over by Veritas
Cluster Server
d) the e4500s run solaris 7
e) the v880 is running solaris 8 (it can't run 7)
f) the file system is vxfs

This is a 'cluster', but there is always only one instance of a
database. The only 'cluster' thing about it is that instances can
fail over if a node/host fails.

The e4500s have no problem opening/closing the TEST database
(oracle_sid=TEST). The V880 has problems, but didn't have them
earlier. This started after we upgraded the OBP on the server, but the
machine had too many issues prior to that, so we are currently
rebuilding the box.

On Thu, 31 Mar 2005 20:20:13 GMT, HansF <News...@telus.net> wrote:

.......

HansF

Mar 31, 2005, 11:35:10 PM

On Thu, 31 Mar 2005 22:44:52 +0000, NetComrade wrote:

> I apologize.. Let me re-state the problem
>
> a) v880 and 2 e4500 (nodes) are connected to the same storage via fibre

>>

>>2) If you insist on top-posting, I will not respond further. It's not
>>worth MY time to flip between top and bottom, and I use bottom (as do

.... plonk

Paul

Apr 1, 2005, 1:27:16 AM

netcomr...@bookexchange.net (NetComrade) wrote:

>I apologize.. Let me re-state the problem

Did you miss this bit of Hans' post

>2) If you insist on top-posting, I will not respond further. It's not
>worth MY time to flip between top and bottom, and I use bottom (as do
>many, if not most, traditional newsgroup posters).

Check my sig also. There is a *_REASON_* why people don't like
top-posting - it makes messages and threads difficult to follow.

People who help out here are doing it of their own free will and
without payment of any kind. The *_LEAST_* you can do is make things
easy for them by making your responses as clear and as legible as
possible.

Paul...

--

plinehan __at__ yahoo __dot__ __com__

XP Pro, SP 2,

Oracle, 9.2.0.1.0 (Enterprise Ed.)
Interbase 6.0.2.0;

When asking database related questions, please give other posters
some clues, like operating system and version of db being used.
Thanks.

Furthermore, As a courtesy to those who spend
time analyzing and attempting to help, please
do not top post.

HansF

Apr 1, 2005, 7:28:29 AM

On Fri, 01 Apr 2005 07:27:16 +0100, Paul interested us by writing:

> Did you miss this bit of Hans' post
>
>>2) If you insist on top-posting, I will not respond further. It's not
>>worth MY time to flip between top and bottom, and I use bottom (as do
>>many, if not most, traditional newsgroup posters).
>
>
> Check my sig also. There is a *_REASON_* why people don't like

Funny way to demonstrate the 'comrade' part of his handle... Oh well, now
they have more justification for accusing us of arrogance.

--
Hans Forbrich
Oracle training and consulting in Canada
mailto: Fuzzy.GreyBeard _at_ gmail.com
or: `echo "News.Hans@Telus_NOSPAM.net" | sed s/_NOSPAM//g`

NetComrade

Apr 1, 2005, 3:43:04 PM

I see what you mean now. I apologize. In about 12 years of using Usenet
this is the first complaint I've seen; I had never even heard a complaint
about top-posting.

I've used a newsreader client called "Free Agent" for the past 8-10
years, and by default the cursor is placed at the top of the post.

Apologies once again.

-a

Sybrand Bakker

Apr 1, 2005, 4:55:19 PM

On Fri, 01 Apr 2005 20:43:04 GMT, netcomr...@bookexchange.net
(NetComrade) wrote:

>I've used a newsreader client called "Free Agent" in the past 8-10
>years, and the default of the cursor is at the top of the post.

Sure, and one single click with your mouse or pressing ctrl-end brings
you to the bottom. Computers make people lazy!

Joel Garry

Apr 1, 2005, 5:58:39 PM

Hey, that's not in the charter!

:-) sorry, resisted it in that other thread, could resist no longer.

Actually, I agree, and read oracle-l in digest format, where the
concatenated top-posts just drive me nuts (so in fact I sometimes don't
read it for that very reason, it's just a PITA). But of course, it is
email based and not a traditional usenet group.

But you must be too young to remember the rule about keeping .signature
files to no more than 4 lines, 78 characters per line. :-)

jg
--
@home.com is bogus. "Early to bed and early to rise makes a man
healthy, wealthy, and wise. " - Benjamin Franklin, who liked to sleep
late, and wrote a whimsical letter to the Journal de Paris suggesting
daylight-saving time.

HansF

Apr 1, 2005, 6:10:33 PM

On Fri, 01 Apr 2005 14:58:39 -0800, Joel Garry interested us by writing:

>> Furthermore, As a courtesy to those who spend
>> time analyzing and attempting to help, please
>> do not top post.
>
> Hey, that's not in the charter!

The charters were created back when courtesy and community spirit were
common. (Now they seem to be demanded - of the other person.)

I'm fed up with it, and going to do something about it - see my sig. And,
as a side effect, it'll reduce the number of Windows-related questions I
get involved with. :-Q

(BTW: OP has been plonked for 30 days. Punting it to anyone else!)

/Hans
--
Hans Forbrich
Canada-wide Oracle training and consulting
mailto: Fuzzy.GreyBeard_at_gmail.com
*** I no longer assist with top-posted newsgroup queries ***

Paul

Apr 3, 2005, 10:53:50 AM

netcomr...@bookexchange.net (NetComrade) wrote:

>I see what you mean now.. I apologize.. in about 12years using UseNet
>this is the first complaint I've seen.. I never even heard a complaint
>on 'top posting.
>I've used a newsreader client called "Free Agent" in the past 8-10
>years, and the default of the cursor is at the top of the post.

I've used Free Agent for years now (and will consider purchasing it,
when you don't have to open multiple instances for different
newsservers) and I have no problem at all not top-posting.

Most computers have a thingy called a mouse that you can use to scroll
down to place the text near to the spot where it is most relevant - an
item called a keyboard can be used to do the same thing (with the
arrow and page down/up keys).

Paul...

>-a

--

plinehan __at__ yahoo __dot__ __com__

XP Pro, SP 2,

Oracle, 9.2.0.1.0 (Enterprise Ed.)
Interbase 6.0.2.0;

When asking database related questions, please give other posters
some clues, like operating system and version of db being used.
Thanks.

HansF

Apr 3, 2005, 1:21:55 PM

On Sun, 03 Apr 2005 15:53:50 +0100, Paul interested us by writing:

I don't see OP's posts, but do see responses. Based on that, I assume
OP's question has not yet been answered.

(I'd first eliminate the idea that the problem is related to the file
system, possibly clusterware, and potentially NFS issues that differ
between the versions of Solaris. Probably something to chase in a
Solaris-related group.)

> Most computers have a thingy called a mouse that you can use to ...

(I suspect OP got the point by now.)

NetComrade

Apr 4, 2005, 6:01:18 PM

On Sun, 03 Apr 2005 17:21:55 GMT, HansF <News...@telus.net> wrote:

>On Sun, 03 Apr 2005 15:53:50 +0100, Paul interested us by writing:
>
>I don't see OP's posts, but do see responses. Based on that, I assume
>OP's question has not yet been answered.
>
>(I'd first eliminate idea that problem is related to file system, possibly
>clusterware and potentially NFS issues that have differences between the
>versions of Solaris. Probably something to chase in a Solaris-related
>group.)
>
>> Most computers have a thingy called a mouse that you can use to ...
>
>(I suspect OP got the point by now.)

Thanks Hans.. :) I get it.

NetComrade

Apr 4, 2005, 6:05:01 PM

On Thu, 31 Mar 2005 18:10:22 GMT, netcomr...@bookexchange.net
(NetComrade) wrote:

>This is a weird issue.
>
>We have a machine (v880/solaris8/latest recommended patch cluster),
>where a particular Standard Edition 9.2.0.5 database shuts down
>'cleanly' yet leaves two processes behind, which we cannot kill unless
>reboot. The same database (storage attached to multiple machines) on a
>solaris7/e4500 machine comes up/shuts down fine. (I tried to shut down
>the rest with aborts, as you can see from the logs, didn't help)
>
>Any idea?
>here's a log from shutdown on v880

The issue has been narrowed down to DISK_ASYNCH_IO=true.
When it is set to false, shutdown works; however, disk performance
degrades 4-5x.
It has been further narrowed down to the QIO (Quick I/O) feature of Veritas.
When QIO is turned off, shutdown also works, but performance still
degrades 4-5x.

Veritas analyzed the savecores and suggested upgrading to a supported
version (3.5+) to see what happens.

We can live with the performance degradation for now, since this is an
OLTP db (99%+ memory reads).

We'll look for proper upgrade paths...

NetComrade

Apr 6, 2005, 11:46:59 AM

The workaround is to set dbwr_io_slaves, btw.
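
Put together, the findings in this thread would amount to something like the parameter-file change below. Treat it as a sketch under several assumptions: a text init.ora (pfile) rather than an spfile, a POSIX-style shell such as ksh or bash, and a made-up helper name `apply_workaround`. The slave count of 4 is purely a placeholder, since no value is given in the thread, and the thread also doesn't say whether dbwr_io_slaves was set together with disk_asynch_io=false or on its own.

```shell
# Append the workaround parameters to a text pfile, unless already present.
# apply_workaround, the pfile path and the slave count are illustrative
# assumptions, not values taken from the thread.
apply_workaround() {
    pfile="$1"
    slaves="${2:-4}"   # assumed default; the thread does not give a number
    for line in "disk_asynch_io=false" "dbwr_io_slaves=$slaves"; do
        name=${line%%=*}
        # Skip parameters that are already set so no duplicates are created.
        if grep -i "^ *$name *=" "$pfile" >/dev/null 2>&1; then
            echo "already set: $name" >&2
        else
            echo "$line" >> "$pfile"
        fi
    done
}

# Typical use (hypothetical path): apply_workaround /path/to/initTEST.ora 4
```

As far as I know both parameters are static, so the instance would have to be restarted for the change to take effect.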