Serious regression: Win2K3 SP1 kills Timers
Newsgroups: microsoft.public.dotnet.framework.clr
From: "Oran" <odenni...@gmail.com>
Date: 28 Jun 2005 13:57:58 -0700
Local: Tues, Jun 28 2005 4:57 pm
Subject: Serious regression: Win2K3 SP1 kills Timers
Windows Server 2003 Service Pack 1 causes the System.Threading.Timer to
not fire, sometimes immediately and sometimes after a while.  Once a
timer dies, it will never fire again.

Jamus Sprinson posted this first with a simple repro app at
http://groups-beta.google.com/group/microsoft.public.dotnet.framework...

Jamus's repro app causes this problem almost immediately by using a
large number of timers.  We encountered this problem in a larger
production app that only had 7 timers.  Eventually some timers would
stop firing, never to fire again, while other timers continued to fire.

A variation on Jamus's repro app is to change the Timer period from
Timeout.Infinite to something like 15000 (15 seconds).  This allows you
to see that sometimes all timers will fire the first time around, but
on subsequent firings some timers will start dying off.  Sometimes all
timers will fire repeatedly for quite a while, and then system load
appears to cause them to drop off.  We did some other tests to verify
that this isn't a problem with ThreadPool.QueueUserWorkItem or
ThreadPool.UnsafeQueueUserWorkItem.  It happens with both Debug and
Release builds.
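
For readers who want to try the periodic variation described above, here is a minimal sketch of such a harness. It is not Jamus's repro app (that code is only linked, not quoted, in this thread); the class name TimerRepro and its methods are hypothetical, and the timer count and period are whatever the caller passes in.

```csharp
using System;
using System.Threading;

// Hypothetical harness, NOT Jamus's original repro app.
// Starts many periodic System.Threading.Timer instances that share one
// callback delegate, and counts how often each one fires.
public class TimerRepro
{
    private readonly int[] fireCounts;  // fireCounts[i] = callbacks seen for timer i
    private readonly Timer[] timers;    // kept in a field so the timers stay referenced

    public TimerRepro(int timerCount, int dueMs, int periodMs)
    {
        fireCounts = new int[timerCount];
        timers = new Timer[timerCount];
        TimerCallback callback = new TimerCallback(OnTick);
        for (int i = 0; i < timerCount; i++)
        {
            // The boxed index comes back to OnTick as its state argument.
            timers[i] = new Timer(callback, i, dueMs, periodMs);
        }
    }

    private void OnTick(object state)
    {
        Interlocked.Increment(ref fireCounts[(int)state]);
    }

    // Copy of the current per-timer counts.
    public int[] Snapshot()
    {
        int[] copy = new int[fireCounts.Length];
        for (int i = 0; i < copy.Length; i++)
        {
            // CompareExchange(x, 0, 0) is used here as an atomic read.
            copy[i] = Interlocked.CompareExchange(ref fireCounts[i], 0, 0);
        }
        return copy;
    }

    // Timers that have never fired.
    public int CountSilent()
    {
        int silent = 0;
        int[] now = Snapshot();
        for (int i = 0; i < now.Length; i++)
        {
            if (now[i] == 0) silent++;
        }
        return silent;
    }

    // Timers whose count has not moved since an earlier snapshot,
    // i.e. candidates for having "died off".
    public int CountStalledSince(int[] earlier)
    {
        int stalled = 0;
        int[] now = Snapshot();
        for (int i = 0; i < now.Length; i++)
        {
            if (now[i] == earlier[i]) stalled++;
        }
        return stalled;
    }

    public void Stop()
    {
        for (int i = 0; i < timers.Length; i++) timers[i].Dispose();
    }
}
```

To mirror the variation above you would construct it with a 15000 ms period, take a Snapshot(), wait a few periods, and then call CountStalledSince() with that snapshot. A nonzero result only suggests that some timers stopped; under heavy load a late callback can look the same, so wait well beyond one period before drawing conclusions.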

This problem only occurs on Windows Server 2003 with Service Pack 1.
We tested several OS and Service Pack variants including the .NET
Framework 1.1 with and without the .NET Framework 1.1 SP1.  The culprit
is very clearly Windows Server 2003 SP1.  No other OS exhibits this
behavior.

Our partial workaround is to implement our own timers using a dedicated
thread.  This is insufficient, however, because we also use classes in
the .NET Framework that themselves use System.Threading.Timer.  Classes
that use System.Threading.Timer include:

System.Data.SqlClient.ConnectionPool
System.Data.SqlClient.TdsParser
System.Data.SqlClient.Lifetime.LeaseManager

System.Timers.Timer

System.Web.Caching.CacheExpires
System.Web.Caching.CacheInternal.StartCacheMemoryTimers
System.Web.HttpRuntime
System.Web.RequestQueue
System.Web.RequestTimeoutManager
System.Web.SessionState
System.Web.Util.ResourcePool

As you can see from the list, this has a fairly serious impact on
important pieces of the .NET Framework.  We have also reproduced this
problem using the System.Timers.Timer, and I assume you could find ways
of reproducing it with other classes listed above.
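
The dedicated-thread workaround mentioned above was not posted, so the following is only a sketch of what such a timer might look like. The class name DedicatedThreadTimer and its constructor parameters are hypothetical. It reuses the TimerCallback delegate type for convenience but does not touch System.Threading.Timer or the thread pool.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for the "own timers on a dedicated thread" workaround.
// Each instance owns one background thread that invokes the callback
// every periodMs milliseconds until Dispose is called.
public class DedicatedThreadTimer : IDisposable
{
    private readonly TimerCallback callback;
    private readonly object state;
    private readonly int periodMs;
    private readonly ManualResetEvent stopSignal = new ManualResetEvent(false);
    private readonly Thread worker;

    public DedicatedThreadTimer(TimerCallback callback, object state, int periodMs)
    {
        this.callback = callback;
        this.state = state;
        this.periodMs = periodMs;
        worker = new Thread(new ThreadStart(Run));
        worker.IsBackground = true;  // do not keep the process alive
        worker.Start();
    }

    private void Run()
    {
        // WaitOne returns false on timeout (time to tick) and true once
        // Dispose has signalled the event, which ends the loop.
        while (!stopSignal.WaitOne(periodMs, false))
        {
            try
            {
                callback(state);
            }
            catch (Exception)
            {
                // Assumption: a failing callback should not kill the timer thread.
            }
        }
    }

    public void Dispose()
    {
        stopSignal.Set();
        // Joining from inside the callback would deadlock, so skip it there.
        if (Thread.CurrentThread != worker) worker.Join();
    }
}
```

The trade-offs are the ones you would expect: one thread per timer, and the callback runs on that thread, so a slow callback delays the next tick. As noted above, it also does nothing for Framework classes that use System.Threading.Timer internally.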

One difference we saw from what Jamus reported is that this problem
reproduced for us quite easily on single-processor Windows Server 2003
machines.

The problem appears to be worse under load.  I took a look at the .NET
performance counters while the repro app was running, and the only
difference I noticed between runs that lost timers and runs that didn't
was an extra GC on the successful runs.  I can't see how that would be
significant, but the following comment in the Rotor source for
AddTimerCallbackEx in comthreadpool.cpp makes me nervous:

// NOTE: there is a potential race between the time we retrieve the app domain pointer,
// and the time which this thread enters the domain.
//
// To solve the race, we rely on the fact that there is a thread sync (via GC)
// between releasing an app domain's handle, and destroying the app domain.  Thus
// it is important that we not go into preemptive gc mode in that window.

Another bit of weirdness we saw while debugging our production app with
7 timers is that for TimerCallback delegates pointing to different
instances of the same exact type of object, the TimerCallback's
_methodPtr field was sometimes the same as the MethodDesc table's Entry
value which points to the beginning of the method's instructions, while
at other times the _methodPtr field pointed to an instruction that does
a jmp to the beginning of the method referenced by the MethodDesc
table.  I was able to see this with WinDbg and SOS using !dumpmt -MD
and !u.  This seemed pretty weird since I was under the impression that
delegate signatures were "equal" if _target and _methodPtr matched.
Perhaps delegate pointers aren't always being fixed up during a GC?
However this doesn't match with what we see in Jamus's repro app that
only uses a single TimerCallback delegate for all Timers and only some
die, so this may be yet another issue, or more likely a
misunderstanding on my part.
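
As a point of reference for the delegate-equality question above, this small sketch shows what equality looks like through the public API only. The Worker and DelegateEqualityDemo names are hypothetical. It does not inspect _methodPtr or any other internal field, so it says nothing about the stub-versus-entry-point difference seen in WinDbg.

```csharp
using System;
using System.Threading;

// Hypothetical type standing in for the objects that own the timer callbacks.
public class Worker
{
    public void OnTick(object state) { }
}

public class DelegateEqualityDemo
{
    // Two delegates created separately over the same instance and method.
    public static bool SameInstanceEqual()
    {
        Worker w = new Worker();
        TimerCallback a = new TimerCallback(w.OnTick);
        TimerCallback b = new TimerCallback(w.OnTick);
        return a.Equals(b);  // true: same target, same method
    }

    // Same method, but each delegate targets a different instance.
    public static bool DifferentInstancesEqual()
    {
        TimerCallback a = new TimerCallback(new Worker().OnTick);
        TimerCallback b = new TimerCallback(new Worker().OnTick);
        return a.Equals(b);  // false: the targets differ
    }
}
```

So delegates over different instances of the same type are never "equal" at this level, whatever their _methodPtr values happen to be.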

There is also another unresolved report of slightly different
System.Threading.Timer flakiness on Windows Server 2003 here
http://groups-beta.google.com/group/microsoft.public.dotnet.languages...

Does anyone know of a hotfix or better workaround for this issue?

Oran


Newsgroups: microsoft.public.dotnet.framework.clr
From: "Jamus" <jamus...@earthlink.net>
Date: 29 Jun 2005 09:38:49 -0700
Local: Wed, Jun 29 2005 12:38 pm
Subject: Re: Serious regression: Win2K3 SP1 kills Timers

Oran wrote:
> One difference we saw from what Jamus reported is that this problem
> reproduced for us quite easily on single-processor Windows Server 2003
> machines.

Apologies; this was due to poor testing on our part.  We had tested on
three machines, and only the multi-processor machine exhibited the
problem.  This was before we noticed that it was also the only machine
that had been updated to SP1.  Later tests showed that all machines with
SP1 seemed to exhibit this behavior.

> The problem appears to be worse under load.  I took a look at the .NET
> performance counters while the repro app was running, and the only
> difference I noticed between runs that lost timers and runs that didn't

Running several copies of the app I made will cause more of the timers
to fail, particularly on the later executions.  For instance, running
10 copies simultaneously will often give 100% success on the first one
or two copies, then progressively lower rates for each subsequently
launched process.  Similar behavior could be generated, however, by
running a process on a higher priority thread which did garbage
calculations, creating artificially high CPU load.  I suspect that
downward trend in running 10 copies of the app is due to the small
delay between each process actually starting, which allowed earlier
copies to create their timers with less CPU load (or less timers?) on
the system than later instances, which had to compete with the earlier
instances.

I hope that made sense..

As for the GC: we had a wrapper class around our System.Threading.Timer
instances so that we could have some special error reporting and timer
tracking in our application.  Our application was EXTREMELY heavy on
threading timers, with cases of many timers being generated on the same
instance of an object, and also of many timers being generated on unique
instances.  There does not seem to be any difference in performance
between the two cases.  Also, a finalizer notified us if the object (and
therefore its System.Threading.Timer reference) was released before the
timer fired; this was to ensure that the timers were not somehow either
being GCed before firing, or failing and then being GCed.  What we found
is that those timers that don't fire are NEVER collected.  If timers are
being continually generated, this is particularly evident by watching
the memory charge on the process, which continues to climb as timers are
allocated and never destroyed.
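
The wrapper class itself was not posted, so here is a rough sketch of the kind of tracking described above. Everything in it (the TrackedTimer name, the static counters, the one-shot design) is an assumption made for illustration.

```csharp
using System;
using System.Threading;

// Hypothetical wrapper, NOT the class used in Jamus's application.
// Wraps a one-shot System.Threading.Timer and records whether it fired,
// and whether the wrapper was finalized without ever firing.
public class TrackedTimer
{
    private static int created;
    private static int fired;
    private static int finalizedUnfired;

    private readonly TimerCallback userCallback;
    private readonly Timer timer;
    private int hasFired;  // 0 = not yet fired, 1 = fired

    public TrackedTimer(TimerCallback userCallback, object state, int dueMs)
    {
        this.userCallback = userCallback;
        Interlocked.Increment(ref created);
        // Timeout.Infinite as the period makes this a one-shot timer.
        timer = new Timer(new TimerCallback(OnFire), state, dueMs, Timeout.Infinite);
    }

    private void OnFire(object state)
    {
        // Count each wrapper at most once, then run the user's callback.
        if (Interlocked.Exchange(ref hasFired, 1) == 0)
        {
            Interlocked.Increment(ref fired);
        }
        userCallback(state);
    }

    ~TrackedTimer()
    {
        // Collected without ever having fired.
        if (hasFired == 0) Interlocked.Increment(ref finalizedUnfired);
    }

    public static int Created { get { return Interlocked.CompareExchange(ref created, 0, 0); } }
    public static int Fired { get { return Interlocked.CompareExchange(ref fired, 0, 0); } }
    public static int FinalizedUnfired { get { return Interlocked.CompareExchange(ref finalizedUnfired, 0, 0); } }

    // Wrappers that have neither fired nor been collected.
    public static int Outstanding { get { return Created - Fired - FinalizedUnfired; } }
}
```

An Outstanding value that keeps growing long after the due times have passed would match the "never fired, never collected" behavior described above, though the counters alone cannot say why.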

We made similar tests on the threadpool functions in an effort to
narrow down the problem, but the threadpool does not seem to be the
culprit.

We're watching with interest, hoping something comes of this soon; it
seems that the only thing which corrects this behavior is to uninstall
SP1, and sometimes even that does not work.

Jamus


Newsgroups: microsoft.public.dotnet.framework.clr
From: "gavinjo...@gmail.com" <gavinjo...@gmail.com>
Date: 3 Jul 2005 10:18:32 -0700
Local: Sun, Jul 3 2005 1:18 pm
Subject: Re: Serious regression: Win2K3 SP1 kills Timers
I went to report this bug on MSDN, but could not find anywhere to post
bugs for non-beta software. Has anyone else reported this to Microsoft?

Newsgroups: microsoft.public.dotnet.framework.clr
From: toddr...@online.microsoft.com (Todd Reifsteck)
Date: Thu, 07 Jul 2005 21:31:50 GMT
Local: Thurs, Jul 7 2005 5:31 pm
Subject: RE: Serious regression: Win2K3 SP1 kills Timers
This problem can appear for a number of reasons. It often appears because of deadlocks in an application.

If the code has been reviewed and it is clear that no deadlocks exist, the issue may be caused by a Timer issue identified in the following KB article.
http://support.microsoft.com/?kbid=900822

Thanks!
Todd Reifsteck


Newsgroups: microsoft.public.dotnet.framework.clr
From: "Oran" <odenni...@gmail.com>
Date: 8 Jul 2005 15:09:51 -0700
Local: Fri, Jul 8 2005 6:09 pm
Subject: Re: Serious regression: Win2K3 SP1 kills Timers
The KB article referenced by Todd is the fix for this, but it has a
misleading title.  It claims this is a fix for "a Windows Forms-based
application" using the System.Threading.Timer class.  Our tests and the
repro app mentioned above have reproduced this problem in a Windows
Service and a Console app.  It will also happen in an ASP.NET app,
anything that uses Timers directly, or anything that uses the other
classes listed above.

In addition, the KB article claims that this problem is with the .NET
Framework in general.  In fact, the problem described above only
happens when Windows Server 2003 Service Pack 1 is applied, and the
hotfix that fixes this is the one for Windows Server 2003 titled
240661_ENU_i386_zip.exe, not the one titled 240388_ENU_i386_zip.exe
which is for the rest of the .NET Framework OSs.  I don't doubt there
were other problems with the Timer class that are fixed by the second
hotfix, but the problem described above only occurs on Windows Server
2003 SP1 and therefore is fixed by the 2003 hotfix.

Anyway, Microsoft won't charge you for the call if you call them to get
this hotfix, and their responsiveness to this bug has been great.

In my correspondence with Microsoft on this issue, they referenced
another KB article number that doesn't exist as of this writing but may
appear at some point: 903091.  This one is in relation to the Windows
Server 2003 SP1 hotfix.  Hopefully it will contain more accurate
information on this bug and its fix.


Newsgroups: microsoft.public.dotnet.framework.clr
From: "gyurisc" <gyurisc.nospam at smartwombat.net>
Date: Sun, 10 Jul 2005 19:09:02 +0200
Local: Sun, Jul 10 2005 1:09 pm
Subject: Re: Serious regression: Win2K3 SP1 kills Timers
Hello,

Can you see if this KB applies to your problem by any chance?

FIX: When a Windows Forms-based application uses the System.Threading.Timer
class, the timer event may not be signaled in the .NET Framework 1.1 SP1
http://support.microsoft.com/?id=900822

"Jamus" <jamus...@earthlink.net> wrote in message

news:1120063129.330358.122590@g43g2000cwa.googlegroups.com...


Newsgroups: microsoft.public.dotnet.framework.clr
From: "Oran" <odenni...@gmail.com>
Date: 11 Jul 2005 12:51:56 -0700
Local: Mon, Jul 11 2005 3:51 pm
Subject: Re: Serious regression: Win2K3 SP1 kills Timers
I talked with someone at Microsoft about changing the title of the
KB900822 article to reflect the fact that it has nothing to do with
Windows Forms, and they said they had made a request to have the KB
article edited to more clearly indicate that the problem isn't
application-specific, but is a problem with the Timer class in Windows
Server 2003 SP1.

Newsgroups: microsoft.public.dotnet.framework.clr
From: "dixy" <d...@windvale.com>
Date: 20 Jul 2005 05:07:06 -0700
Local: Wed, Jul 20 2005 8:07 am
Subject: Re: Serious regression: Win2K3 SP1 kills Timers
I have been troubled by this issue too, and have tried every suggestion
I could find in forum posts on the net, with no luck.  Finally, just
now, it seems that I have found a workaround.

I have a dedicated Elapsed event handler for every Timer object.  After
adding the following line at the top of every handler, all of the Timer
objects now seem to keep working without stopping.

System.Threading.Thread.Sleep(0);

As we already know, calling Sleep with an argument of 0 lets other
waiting threads run, and I think it simply lets any 'stopped' Timer
object get "back in business" again.
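
In context, the workaround described above would look something like the sketch below. The Poller class and its members are hypothetical. The only part taken from this post is the Thread.Sleep(0) call at the top of the Elapsed handler, and whether it really revives stalled timers is this poster's observation, not something confirmed elsewhere in the thread.

```csharp
using System;
using System.Threading;
using System.Timers;

// Hypothetical class showing where the Sleep(0) line goes.
public class Poller
{
    // Fully qualified because both System.Threading and System.Timers
    // define a type named Timer.
    private readonly System.Timers.Timer timer;
    private int ticks;

    public Poller(double intervalMs)
    {
        timer = new System.Timers.Timer(intervalMs);
        timer.Elapsed += new ElapsedEventHandler(OnElapsed);
    }

    public void Start() { timer.Start(); }
    public void Stop() { timer.Stop(); }

    // Number of Elapsed events handled so far.
    public int Ticks
    {
        get { return Interlocked.CompareExchange(ref ticks, 0, 0); }
    }

    private void OnElapsed(object sender, ElapsedEventArgs e)
    {
        // The workaround: give up the rest of this thread's time slice
        // before doing any real work in the handler.
        Thread.Sleep(0);

        Interlocked.Increment(ref ticks);
    }
}
```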

--
dixy

