Judson Valeski (or any other embedding guru): Please Help

Ed Burns

Jul 13, 2001, 4:17:46 PM
I have a long-standing embedding bug,
<http://bugzilla.mozilla.org/show_bug.cgi?id=64332>: webclient freezes
on WINNT after a brief period of use.

I've determined that for some reason the Mozilla event queue stops
firing. The Java event queue does continue to fire, however. If anyone
is experienced with how the event queue works for embedding apps, I'd
really appreciate any guidance.

Thanks,

Ed

--
Remove REMOVE_THIS from email address before replying.
These are my views, and may not be the same as Sun Microsystems Inc.

Ed Burns

Jul 13, 2001, 7:16:30 PM
Ed Burns <ed.burnsR...@sun.com> writes:

> I have a long standing embedding bug,
> <http://bugzilla.mozilla.org/show_bug.cgi?id=64332>, webclient freezes
> on WINNT after brief period of use.
>
> I've determined that for some reason the mozilla event queue stops
> firing. The java event queue does continue to fire however. If anyone
> is experienced with how the event queue works for embedding apps, I'd
> really appreciate any guidance.

After writing some debugging code, I found that the last MSG.message
processed by the Mozilla event queue is 0xC138. What is this message
value? It's always the last one before the freeze.

Judson Valeski

Jul 16, 2001, 11:50:37 AM
to Ed Burns, mozilla-...@mozilla.org, mozilla...@mozilla.org
Are you calling NS_DoIdleEmbeddingStuff() and NS_HandleEmbeddingEvent()
as necessary?

Jud

Ed Burns

Jul 16, 2001, 5:13:21 PM
Judson Valeski <val...@netscape.com> writes:

> Are you calling NS_DoIdleEmbeddingStuff() and NS_HandleEmbeddingEvent()
> as necessary?

Thanks Jud,

I didn't know about these. I'm currently running my stress test to see
if it works. It's working so far.

Is there a way to programmatically disable all assertions for a debug
build? My automated test is hampered by the assertions that continually
pop up about Mork and such.

Thanks again. It looks like it's working.

Ed Burns

Jul 16, 2001, 5:32:32 PM
Ed Burns <ed.burnsR...@sun.com> writes:

> Judson Valeski <val...@netscape.com> writes:
>
> > Are you calling NS_DoIdleEmbeddingStuff() and NS_HandleEmbeddingEvent()
> > as necessary?
>
> Thanks Jud,
>
> I didn't know about these. I'm currently running my stress test to see
> if it works. It's working so far.
>
> Is there a way to programmatically disable all assertions for a debug
> build? My automated test is hampered by the assertions that continually
> pop up about Mork and such.
>
> Thanks again. It looks like it's working.

Grr. Lamentably, incorporating these calls in my event loop in the same
manner as used in winEmbed did not fix the problem.

This time the last event processed by the msg queue is 0xC16F. Any ideas?

Jud, when's the next time you are coming to the Bay Area?
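
For what it's worth, both values reported so far (c138 and 0xC16F) fall in the 0xC000 through 0xFFFF range, which Win32 sets aside for string messages registered at run time with RegisterWindowMessage. Identifiers in that range are handed out by the system when a string is registered, so the number alone does not say which component owns the message. A minimal sketch of the range check, in portable C++ with no Windows headers; the enum and function names are made up for illustration:

```cpp
#include <cstdint>

// Win32 message-identifier ranges. The boundaries are the documented ones;
// the enum and function names are hypothetical.
enum class MessageRange {
    System,      // 0x0000 .. 0x03FF, below WM_USER
    WindowClass, // 0x0400 .. 0x7FFF, WM_USER range
    Application, // 0x8000 .. 0xBFFF, WM_APP range
    Registered,  // 0xC000 .. 0xFFFF, RegisterWindowMessage strings
    Reserved     // above 0xFFFF
};

// Map a message identifier to the range it falls in.
constexpr MessageRange classifyMessage(std::uint32_t id) {
    return id < 0x0400u  ? MessageRange::System
         : id < 0x8000u  ? MessageRange::WindowClass
         : id < 0xC000u  ? MessageRange::Application
         : id <= 0xFFFFu ? MessageRange::Registered
                         : MessageRange::Reserved;
}

// The two values seen just before the freeze, checked at compile time.
static_assert(classifyMessage(0xC138u) == MessageRange::Registered, "0xC138");
static_assert(classifyMessage(0xC16Fu) == MessageRange::Registered, "0xC16F");
```

Knowing the range only narrows the search: finding the owner would still mean locating the RegisterWindowMessage call that produced the value.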

Ed Burns

Jul 16, 2001, 6:36:12 PM
Ed Burns <ed.burnsR...@sun.com> writes:

> Grr. Lamentably, incorporating these calls in my event loop in the same
> manner as used in winEmbed did not fix the problem.
>
> This time the last event processed by the msg queue is 0xC16F. Any ideas?

I modified prmon.c to print out a message on monitor enter and exit like
this:

fprintf(msgFile, "Enter Monitor: %p\n", mon);
fflush(msgFile);

I analyzed the output and found that the monitor with pointer value
0x056BB390 was "Entered" 5153 times and "Exited" 5150 times before the
freeze. Also, monitor 0x063FA9C0 was "Entered" 11 times and "Exited" 9 times.

Could this be causing deadlock?

Here's the full output of my test data:

The number after the pointer is the number of times that monitor was
entered or exited.

Enter Monitor: 051D2170 37
Exit Monitor: 051D2170 37
Enter Monitor: 051D4420 6
Exit Monitor: 051D4420 6
Enter Monitor: 051D4CF0 117
Exit Monitor: 051D4CF0 117
Enter Monitor: 051D6880 4327
Exit Monitor: 051D6880 4327
Enter Monitor: 052A66C0 14
Exit Monitor: 052A66C0 14
Enter Monitor: 056566B0 3182
Exit Monitor: 056566B0 3182
Enter Monitor: 056BB390 5153
Exit Monitor: 056BB390 5150
Enter Monitor: 056BED70 9
Exit Monitor: 056BED70 9
Enter Monitor: 056D36A0 55
Exit Monitor: 056D36A0 55
Enter Monitor: 06382750 178
Exit Monitor: 06382750 178
Enter Monitor: 063B8D10 2
Exit Monitor: 063B8D10 2
Enter Monitor: 063C1030 7
Exit Monitor: 063C1030 7
Enter Monitor: 063C1420 5
Exit Monitor: 063C1420 5
Enter Monitor: 063C2D50 4
Exit Monitor: 063C2D50 4
Enter Monitor: 063F0E90 16
Exit Monitor: 063F0E90 16
Enter Monitor: 063F1A60 21
Exit Monitor: 063F1A60 21
Enter Monitor: 063FA9C0 11
Exit Monitor: 063FA9C0 9

Judson Valeski

Jul 16, 2001, 7:52:38 PM
to Ed Burns
There's definitely a lock/monitor imbalance here. I'm at a complete loss as to what the cause could be, though (esp. now that you're "doing embedding idle stuff").

Jud

Ed Burns

Jul 16, 2001, 7:55:31 PM
Ed Burns <ed.burnsR...@sun.com> writes:

> Ed Burns <ed.burnsR...@sun.com> writes:
>
> > Grr. Lamentably, incorporating these calls in my event loop in the same
> > manner as used in winEmbed did not fix the problem.
> >
> > This time the last event processed by the msg queue is 0xC16F. Any ideas?
>
> I modified prmon.c to print out a message on monitor enter and exit like
> this:
>
> fprintf(msgFile, "Enter Monitor: %p\n", mon);
> fflush(msgFile);

I have refined the test data some more, writing a Perl program that
takes the monitor enter and exit output and prints only the cases
where numEnters != numExits.
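
The Perl program itself was not posted. A rough C++ equivalent might look like the sketch below. It assumes the log holds one "Enter Monitor: <pointer>" or "Exit Monitor: <pointer>" line per call (the exit format is inferred from the tallies shown earlier), and the struct and function names are hypothetical:

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Enter and exit counts for one monitor.
struct MonitorCounts {
    long enters = 0;
    long exits = 0;
};

// Read the instrumentation log and return only the monitors whose
// enter count differs from their exit count, keyed by pointer text.
std::map<std::string, MonitorCounts> findImbalances(std::istream& log) {
    std::map<std::string, MonitorCounts> counts;
    std::string line;
    while (std::getline(log, line)) {
        std::istringstream fields(line);
        std::string action, keyword, address;
        // Skip anything that is not "<action> Monitor: <address>".
        if (!(fields >> action >> keyword >> address)) continue;
        if (keyword != "Monitor:") continue;
        if (action == "Enter") {
            ++counts[address].enters;
        } else if (action == "Exit") {
            ++counts[address].exits;
        }
    }

    std::map<std::string, MonitorCounts> unbalanced;
    for (const auto& entry : counts) {
        if (entry.second.enters != entry.second.exits) {
            unbalanced.insert(entry);
        }
    }
    return unbalanced;
}
```

Feeding it the log through a std::ifstream and printing each returned entry would give the same kind of summary as the CASE listings: one pointer per suspicious monitor, with its two counts.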

I ran the program in the debugger until it froze and collected the data,
along with the stack traces on the threads mentioned in the data. I did
this twice.

Hypothesis: I believe that deadlock is occurring and I have a hunch that
there are some valuable clues in the data below. Can someone please
look at the stack traces and see if they can spot the deadlock? Is this
the right forum for this kind of information? I'm really stuck here.
My MO is to collect enough information for someone who is an expert to
gain insight.

CASE 1
------

Enter Monitor: 056BBD70: 229949.
Exit Monitor: 056BBD70: 229945.
Enter Monitor: 08AFFA80: 14.
Exit Monitor: 08AFFA80: 12.

Thread A
========

PR_EnterMonitor(PRMonitor * 0x056bbd70) line 87 + 14 bytes
util_PostEvent(WebShellInitContext * 0x051d53d0, PLEvent * 0x08ae7404) line 49 + 21 bytes
Java_org_mozilla_webclient_wrapper_1native_NavigationImpl_nativeStop(JNIEnv_ * 0x0086faa0, _jobject * 0x070dfebc, long 85808080) line 295 + 13 bytes

Thread B
========

PR_EnterMonitor(PRMonitor * 0x08affa80) line 87 + 14 bytes
nsAutoMonitor::nsAutoMonitor(PRMonitor * 0x08affa80) line 184 + 13 bytes
nsSocketTransport::Dispatch(nsSocketRequest * 0x08aff790) line 1288
nsSocketRequest::Cancel(nsSocketRequest * const 0x08aff790, unsigned int 2152398850) line 2527
nsHttpConnection::OnTransactionComplete(unsigned int 2152398850) line 247
nsHttpTransaction::Cancel(nsHttpTransaction * const 0x08afadd0, unsigned int 2152398850) line 598
nsHttpChannel::Cancel(nsHttpChannel * const 0x08afccb0, unsigned int 2152398850) line 1563
nsLoadGroup::Cancel(nsLoadGroup * const 0x063765b0, unsigned int 2152398850) line 239 + 16 bytes
nsDocLoaderImpl::Stop(nsDocLoaderImpl * const 0x06376620) line 278 + 31 bytes
nsURILoader::Stop(nsURILoader * const 0x06376e40, nsISupports * 0x06376638) line 536 + 23 bytes
nsDocShell::Stop(nsDocShell * const 0x06376e90) line 2211
wsStopEvent::handleEvent() line 355 + 18 bytes
handleEvent(PLEvent * 0x08aff844) line 48 + 11 bytes
PL_HandleEvent(PLEvent * 0x08aff844) line 590 + 10 bytes
processEventLoop(WebShellInitContext * 0x051d53d0) line 439 + 9 bytes
Java_org_mozilla_webclient_wrapper_1native_NativeEventThread_nativeProcessEvents(JNIEnv_ * 0x0083e840, _jobject * 0x0544febc, long 85808080) line 242 + 9 bytes

Thread C
========

PR_EnterMonitor(PRMonitor * 0x056bbd70) line 87 + 14 bytes
PL_PostEvent(PLEventQueue * 0x056bb740, PLEvent * 0x051d5eac) line 251 + 10 bytes
nsEventQueueImpl::PostEvent(nsEventQueueImpl * const 0x056bb1a0, PLEvent * 0x051d5eac) line 251 + 16 bytes
nsMemoryImpl::FlushMemory(const unsigned short * 0x05141db4, int 0) line 432 + 30 bytes
MemoryFlusher::Run(MemoryFlusher * const 0x051d6d60) line 177 + 43 bytes
nsThread::Main(void * 0x051d6bb0) line 105 + 26 bytes
_PR_NativeRunThread(void * 0x051d6990) line 399 + 13 bytes
_threadstartex(void * 0x051d67e0) line 212 + 13 bytes

Thread D
========

PR_EnterMonitor(PRMonitor * 0x056bbd70) line 87 + 14 bytes
PL_PostEvent(PLEventQueue * 0x056bb740, PLEvent * 0x08aff324) line 251 + 10 bytes
nsEventQueueImpl::PostEvent(nsEventQueueImpl * const 0x056bb1a0, PLEvent * 0x08aff324) line 251 + 16 bytes
nsRequestObserverProxy::FireEvent(nsARequestObserverEvent * 0x08aff320) line 244 + 35 bytes
nsRequestObserverProxy::OnStartRequest(nsRequestObserverProxy * const 0x08afb630, nsIRequest * 0x08afadd0, nsISupports * 0x00000000) line 185 + 12 bytes
nsStreamListenerProxy::OnStartRequest(nsStreamListenerProxy * const 0x08afcee0, nsIRequest * 0x08afadd0, nsISupports * 0x00000000) line 224
nsHttpTransaction::HandleContent(char * 0x07a90b88, unsigned int 0, unsigned int * 0x05cbfdc8) line 466 + 41 bytes
nsHttpTransaction::Read(nsHttpTransaction * const 0x08afadd4, char * 0x07a90b88, unsigned int 0, unsigned int * 0x05cbfdc8) line 709 + 23 bytes
nsReadFromInputStream(nsIOutputStream * 0x08afb5c4, void * 0x08afadd4, char * 0x07a90b88, unsigned int 0, unsigned int 4096, unsigned int * 0x05cbfdc8) line 831
nsPipe::nsPipeOutputStream::WriteSegments(nsPipe::nsPipeOutputStream * const 0x08afb5c4, unsigned int (nsIOutputStream *, void *, char *, unsigned int, unsigned int, unsigned int *)* 0x050b5530 nsReadFromInputStream(nsIOutputStream *, void *, char *, unsigned int, unsigned int, unsigned int *), void * 0x08afadd4, unsigned int 16384, unsigned int * 0x05cbfe5c) line 704 + 29 bytes
nsPipe::nsPipeOutputStream::WriteFrom(nsPipe::nsPipeOutputStream * const 0x08afb5c4, nsIInputStream * 0x08afadd4, unsigned int 16384, unsigned int * 0x05cbfe5c) line 839
nsStreamListenerProxy::OnDataAvailable(nsStreamListenerProxy * const 0x08afcee0, nsIRequest * 0x08afadd0, nsISupports * 0x00000000, nsIInputStream * 0x08afadd4, unsigned int 0, unsigned int 16384) line 283 + 38 bytes
nsHttpTransaction::OnDataReadable(nsIInputStream * 0x08afc3b0) line 214 + 72 bytes
nsHttpConnection::OnDataAvailable(nsHttpConnection * const 0x08afaa10, nsIRequest * 0x08aff790, nsISupports * 0x00000000, nsIInputStream * 0x08afc3b0, unsigned int 0, unsigned int 8192) line 631 + 15 bytes
nsSocketReadRequest::OnRead() line 2670 + 57 bytes
nsSocketTransport::doReadWrite(short 1) line 991 + 14 bytes
nsSocketTransport::Process(short 1) line 477 + 13 bytes
nsSocketTransportService::Run(nsSocketTransportService * const 0x056b9fb4) line 419 + 13 bytes
nsThread::Main(void * 0x056bd950) line 105 + 26 bytes
_PR_NativeRunThread(void * 0x056bd730) line 399 + 13 bytes
_threadstartex(void * 0x056bd580) line 212 + 13 bytes

---------------------------------------------------------------------------

CASE 2
------

Enter Monitor: 056BBD70: 5828.
Exit Monitor: 056BBD70: 5825.
Enter Monitor: 063EA6B0: 6.
Exit Monitor: 063EA6B0: 3.

Thread A
========

PR_EnterMonitor(PRMonitor * 0x056bbd70) line 87 + 14 bytes
util_PostEvent(WebShellInitContext * 0x051d53d0, PLEvent * 0x063ebe64) line 49 + 21 bytes
Java_org_mozilla_webclient_wrapper_1native_NavigationImpl_nativeStop(JNIEnv_ * 0x0086f4c0, _jobject * 0x0705fe98, long 85808080) line 295 + 13 bytes

Thread B
========

PR_EnterMonitor(PRMonitor * 0x056bbd70) line 87 + 14 bytes
PL_PostEvent(PLEventQueue * 0x056bb740, PLEvent * 0x063ea180) line 251 + 10 bytes
nsEventQueueImpl::PostEvent(nsEventQueueImpl * const 0x056bb1a0, PLEvent * 0x063ea180) line 251 + 16 bytes
nsProxyObject::Post(unsigned int 4, nsXPTMethodInfo * 0x06dc2d6c, nsXPTCMiniVariant * 0x05cbfd4c, nsIInterfaceInfo * 0x05296c10) line 470
nsProxyEventObject::CallMethod(nsProxyEventObject * const 0x063ea940, unsigned short 4, const nsXPTMethodInfo * 0x06dc2d6c, nsXPTCMiniVariant * 0x05cbfd4c) line 463 + 52 bytes
PrepareAndDispatch(nsXPTCStubBase * 0x063ea940, unsigned int 4, unsigned int * 0x05cbfdfc, unsigned int * 0x05cbfdec) line 100 + 31 bytes
SharedStub() line 124
nsHttpConnection::OnStatus(nsHttpConnection * const 0x063eaf48, nsIRequest * 0x063ea470, nsISupports * 0x063eaf40, unsigned int 2152398851, const unsigned short * 0x05cbfe50) line 666
nsSocketTransport::OnStatus(nsSocketRequest * 0x063ea470, nsISupports * 0x063eaf40, unsigned int 2152398851) line 1772 + 63 bytes
nsSocketTransport::OnStatus(unsigned int 2152398851) line 1787
nsSocketTransport::Process(short 0) line 462
nsSocketTransportService::ProcessWorkQ() line 243 + 10 bytes
nsSocketTransportService::Run(nsSocketTransportService * const 0x056b9fb4) line 446 + 11 bytes
nsThread::Main(void * 0x056bd950) line 105 + 26 bytes
_PR_NativeRunThread(void * 0x056bd730) line 399 + 13 bytes
_threadstartex(void * 0x056bd580) line 212 + 13 bytes

Thread C
========

PR_EnterMonitor(PRMonitor * 0x063ea6b0) line 87 + 14 bytes
nsAutoMonitor::nsAutoMonitor(PRMonitor * 0x063ea6b0) line 184 + 13 bytes
nsSocketTransport::OnFound(nsSocketTransport * const 0x063ea804, nsISupports * 0x00000000, const char * 0x063ea350, nsHostEnt * 0x06db8e74) line 1337
nsDNSRequest::FireStop(unsigned int 0) line 271 + 62 bytes
nsDNSLookup::CompleteLookup(unsigned int 0) line 702 + 18 bytes
nsDNSService::ProcessLookup(HWND__ * 0x005502d8, unsigned int 1024, unsigned int 1, long 64) line 849 + 22 bytes
nsDNSEventProc(HWND__ * 0x005502d8, unsigned int 1024, unsigned int 1, long 64) line 869 + 27 bytes

Thread D
========

PR_EnterMonitor(PRMonitor * 0x063ea6b0) line 87 + 14 bytes
nsAutoMonitor::nsAutoMonitor(PRMonitor * 0x063ea6b0) line 184 + 13 bytes
nsSocketTransport::AsyncRead(nsSocketTransport * const 0x063ea800, nsIStreamListener * 0x063eaf40, nsISupports * 0x00000000, unsigned int 0, unsigned int 4294967295, unsigned int 3, nsIRequest * * 0x063eaf60) line 1420
nsHttpConnection::ActivateConnection() line 382 + 65 bytes
nsHttpConnection::SetTransaction(nsHttpTransaction * 0x063e9300) line 154 + 8 bytes
nsHttpHandler::InitiateTransaction(nsHttpTransaction * 0x063e9300, nsHttpConnectionInfo * 0x063e7110, int 0) line 387 + 12 bytes
nsHttpChannel::Connect(int 1) line 242
nsHttpChannel::AsyncOpen(nsHttpChannel * const 0x063e71c0, nsIStreamListener * 0x063e8d80, nsISupports * 0x00000000) line 1802 + 10 bytes
nsDocumentOpenInfo::Open(nsIChannel * 0x063e71c0, int 0, nsISupports * 0x06376e80) line 184 + 18 bytes
nsURILoader::OpenURIVia(nsURILoader * const 0x06376e40, nsIChannel * 0x063e71c0, int 0, nsISupports * 0x06376e80, unsigned int 0) line 521 + 20 bytes
nsURILoader::OpenURI(nsURILoader * const 0x06376e40, nsIChannel * 0x063e71c0, int 0, nsISupports * 0x06376e80) line 483
nsDocShell::DoChannelLoad(nsIChannel * 0x063e71c0, int 0, nsIURILoader * 0x06376e40) line 4667 + 24 bytes
nsDocShell::DoURILoad(nsIURI * 0x063e5ba0, nsIURI * 0x00000000, nsISupports * 0x00000000, int 0, nsIInputStream * 0x00000000, nsIInputStream * 0x00000000) line 4456 + 36 bytes
nsDocShell::InternalLoad(nsDocShell * const 0x06376e80, nsIURI * 0x063e5ba0, nsIURI * 0x00000000, nsISupports * 0x00000000, int 1, int 0, const unsigned short * 0x0544fab4, nsIInputStream * 0x00000000, nsIInputStream * 0x00000000, unsigned int 1, nsISHEntry * 0x00000000) line 4275 + 43 bytes
nsDocShell::LoadURI(nsDocShell * const 0x06376e80, nsIURI * 0x063e5ba0, nsIDocShellLoadInfo * 0x00000000, unsigned int 0) line 559 + 72 bytes
nsDocShell::LoadURI(nsDocShell * const 0x06376e90, const unsigned short * 0x063e1d60, unsigned int 0) line 2161 + 31 bytes
wsLoadURLEvent::handleEvent() line 70 + 33 bytes
handleEvent(PLEvent * 0x063e1f84) line 48 + 11 bytes
PL_HandleEvent(PLEvent * 0x063e1f84) line 590 + 10 bytes
processEventLoop(WebShellInitContext * 0x051d53d0) line 439 + 9 bytes
Java_org_mozilla_webclient_wrapper_1native_NativeEventThread_nativeProcessEvents(JNIEnv_ * 0x0083e840, _jobject * 0x0544febc, long 85808080) line 242 + 9 bytes

Judson Valeski

Jul 17, 2001, 11:47:53 AM
to Ed Burns, Rick Potts
Rick just landed (on the trunk) a rewrite of the proxy object code to
prevent some crashing. I haven't looked at the code, but it's quite
possible that monitors/locks were reworked in the process and may have
a positive impact here.

The socket transport thread shows up here too (no surprise)... could be
some bad interaction there.

Jud

Judson Valeski

Jul 17, 2001, 12:05:01 PM
to Ed Burns
You're clearly banging on this hard :-/. I'm going to be out there next week for a few days, but we're in the middle of shipping a few products (NS6, and a couple of embedding products). On top of that, I'll be giving a presentation on embedding in San Diego on Wednesday of next week. In short, I'm totally slammed for the foreseeable future.

I'm going to be out in late August (which I'm sure is too late for you anyway)... at which point things should have settled down.

I'm really sorry you're hitting this, and especially sorry I can't help anytime soon.

Are you in a position to try the trunk (see my previous posting about the proxy object code changing)?

Jud

Ed Burns wrote:
> On 16 July 17:52:38, Judson Valeski wrote:
> > There's definitely a lock/monitor imbalance here. I'm at a complete
> > loss as to what the cause could be, though (esp. now that you're "doing
> > embedding idle stuff").
>
> Could we sit together sometime, next time you're out?
>
> Ed

Rick Potts

Jul 17, 2001, 5:08:13 PM
to Judson Valeski, Ed Burns
hey ed,

I looked at your thread stack traces and here's my "wild ass guess" as to what's happening :-)

It appears that the basic deadlock is between the UI thread and the socket transport thread, in a classic A-B-B-A deadlock. The two monitors appear to be the PLEventQ monitor and the socket transport monitor.

In order for this to happen, a couple of things must be true:
  1. Your version of nsSocketTransport.cpp is older than rev 2.206. In rev 2.206 darin added a patch to release the socket transport monitor *before* calling OnRead(...).
  2. The function processEventLoop(...) must be holding on to the PLEventQ monitor when PL_HandleEvent(...) is called.

Your second case (CASE 2) appears to expose another potential deadlock in nsSocketTransport, because the socket transport lock is *not* released before OnStatus(...) is called. I think that it probably should be...

ed, do these ramblings make any sense to you?
-- rick
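
The usual cure for this kind of A-B-B-A inversion is to drop your own lock before calling into code that takes another one, which is what Rick describes the rev 2.206 patch doing for OnRead(...). A minimal sketch of that pattern, using std::mutex rather than PRMonitor; EventQueue, Transport, and their members are hypothetical stand-ins, not the real Necko or NSPR classes:

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <queue>

// Stand-in for the event queue: posting takes the queue's own lock.
class EventQueue {
public:
    void post(int event) {
        std::lock_guard<std::mutex> guard(lock_);
        events_.push(event);
    }

    std::size_t size() {
        std::lock_guard<std::mutex> guard(lock_);
        return events_.size();
    }

private:
    std::mutex lock_;
    std::queue<int> events_;
};

// Stand-in for the socket transport, which has its own lock.
class Transport {
public:
    explicit Transport(EventQueue& queue) : queue_(queue) {}

    // Update state under the transport lock, then release it BEFORE
    // posting, so this thread never holds both locks at once.
    void onStatus(int status) {
        {
            std::lock_guard<std::mutex> guard(lock_);
            lastStatus_ = status;
        }  // transport lock released here
        queue_.post(status);  // only the queue lock is taken now
    }

    int lastStatus() {
        std::lock_guard<std::mutex> guard(lock_);
        return lastStatus_;
    }

private:
    std::mutex lock_;
    EventQueue& queue_;
    int lastStatus_ = 0;
};
```

With this shape, a thread that holds the queue lock and then asks for the transport lock can no longer be blocked by a transport thread that is itself waiting on the queue. The other half of the inversion (dispatching events while still holding the queue lock) would need the same treatment on the event-loop side.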
