Newsgroups: microsoft.public.dotnet.framework.clr
From: "Oran" <odenni...@gmail.com>
Date: 28 Jun 2005 13:57:58 -0700
Local: Tues, Jun 28 2005 4:57 pm
Subject: Serious regression: Win2K3 SP1 kills Timers
Windows Server 2003 Service Pack 1 causes System.Threading.Timer to
stop firing, sometimes immediately and sometimes after a while.  Once a
timer dies, it never fires again.

Jamus Sprinson posted this first with a simple repro app at
http://groups-beta.google.com/group/microsoft.public.dotnet.framework...

Jamus's repro app causes this problem almost immediately by using a
large number of timers.  We encountered this problem in a larger
production app that only had 7 timers.  Eventually some timers would
stop firing, never to fire again, while other timers continued to fire.

A variation on Jamus's repro app is to change the Timer period from
Timeout.Infinite to something like 15000 (15 seconds).  This allows you
to see that sometimes all timers will fire the first time around, but
on subsequent firings some timers will start dying off.  Sometimes all
timers will fire repeatedly for quite a while, and then system load
appears to cause them to drop off.  We did some other tests to verify
that this isn't a problem with ThreadPool.QueueUserWorkItem or
ThreadPool.UnsafeQueueUserWorkItem.  It happens with both Debug and
Release builds.
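
For illustration, here is a minimal sketch of what such a periodic variation might look like.  This is not Jamus's program: the class name, the timer count of 200, and the CountStalled helper are all assumptions made for the example.  It starts many timers with a 15-second period that share one callback delegate, then reports every two periods how many timers did not fire in that window.

```csharp
using System;
using System.Threading;

// Hypothetical repro variation; this is NOT Jamus's original program.
class TimerRepro
{
    const int TimerCount = 200;   // assumed value for "a large number of timers"
    const int PeriodMs = 15000;   // 15-second period, as suggested above

    static readonly int[] fireCounts = new int[TimerCount];

    // Timer callback: the state object carries the index of the timer that fired.
    static void OnTick(object state)
    {
        Interlocked.Increment(ref fireCounts[(int)state]);
    }

    // Counts the timers whose fire count did not change between two snapshots.
    public static int CountStalled(int[] previous, int[] current)
    {
        int stalled = 0;
        for (int i = 0; i < current.Length; i++)
            if (current[i] == previous[i]) stalled++;
        return stalled;
    }

    static void Main()
    {
        // One shared delegate for all timers.
        TimerCallback callback = new TimerCallback(OnTick);
        Timer[] timers = new Timer[TimerCount];
        for (int i = 0; i < TimerCount; i++)
            timers[i] = new Timer(callback, i, PeriodMs, PeriodMs);

        int[] previous = new int[TimerCount];
        while (true)
        {
            // Two periods: every healthy timer should have fired at least once.
            Thread.Sleep(2 * PeriodMs);
            int[] current = new int[TimerCount];
            for (int i = 0; i < TimerCount; i++)
                current[i] = Interlocked.CompareExchange(ref fireCounts[i], 0, 0);
            Console.WriteLine("{0:T}  stalled timers: {1} of {2}",
                DateTime.Now, CountStalled(previous, current), TimerCount);
            previous = current;
            GC.KeepAlive(timers);   // keep the timers reachable so the GC cannot collect them
        }
    }
}
```

On a healthy system the stalled count should stay at 0.  A count that becomes non-zero and never recovers would match the behavior described here.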

This problem only occurs on Windows Server 2003 with Service Pack 1.
We tested several OS and Service Pack variants including the .NET
Framework 1.1 with and without the .NET Framework 1.1 SP1.  The culprit
is very clearly Windows Server 2003 SP1.  No other OS exhibits this
behavior.

Our partial workaround is to implement our own timers using a dedicated
thread.  However, this is insufficient, since we also use classes in the
.NET Framework that rely on System.Threading.Timer internally.  Classes
that use System.Threading.Timer include:

System.Data.SqlClient.ConnectionPool
System.Data.SqlClient.TdsParser
System.Data.SqlClient.Lifetime.LeaseManager

System.Timers.Timer

System.Web.Caching.CacheExpires
System.Web.Caching.CacheInternal.StartCacheMemoryTimers
System.Web.HttpRuntime
System.Web.RequestQueue
System.Web.RequestTimeoutManager
System.Web.SessionState
System.Web.Util.ResourcePool

As you can see from the list, this has a fairly serious impact on
important pieces of the .NET Framework.  We have also reproduced this
problem using the System.Timers.Timer, and I assume you could find ways
of reproducing it with other classes listed above.
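
As a rough sketch of the dedicated-thread workaround mentioned above (not the production implementation; DedicatedThreadTimer and its members are hypothetical names), a periodic timer can be built from one background thread that waits on an event with a timeout, so neither the ThreadPool nor System.Threading.Timer is involved:

```csharp
using System;
using System.Threading;

// Hypothetical helper: one background thread per timer, no ThreadPool involvement.
public class DedicatedThreadTimer : IDisposable
{
    private readonly TimerCallback callback;
    private readonly object state;
    private readonly int periodMs;
    private readonly ManualResetEvent stop = new ManualResetEvent(false);
    private readonly Thread thread;

    public DedicatedThreadTimer(TimerCallback callback, object state, int periodMs)
    {
        this.callback = callback;
        this.state = state;
        this.periodMs = periodMs;
        thread = new Thread(new ThreadStart(Run));
        thread.IsBackground = true;   // do not keep the process alive
        thread.Start();
    }

    private void Run()
    {
        // WaitOne returns false on timeout (time to fire) and true once Dispose signals stop.
        while (!stop.WaitOne(periodMs, false))
        {
            try
            {
                callback(state);
            }
            catch (Exception ex)
            {
                // Log and continue so that one failing callback does not kill the timer.
                Console.Error.WriteLine(ex);
            }
        }
    }

    public void Dispose()
    {
        stop.Set();
        // Joining our own thread would block forever, so skip it when called from the callback.
        if (Thread.CurrentThread != thread)
            thread.Join();
    }
}
```

Note that the period here is measured from the end of one callback to the start of the next, so long callbacks make the timer drift.  It also costs one thread per timer, which is tolerable for a handful of timers but does nothing for the Framework classes listed above.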

One difference we saw from what Jamus reported is that this problem
reproduced for us quite easily on single-processor Windows Server 2003
machines.

The problem appears to be worse under load.  I took a look at the .NET
performance counters while the repro app was running, and the only
difference I noticed between runs that lost timers and runs that didn't
was an extra GC on the successful runs.  I can't see how that would be
significant, but the following comment in the Rotor source for
AddTimerCallbackEx in comthreadpool.cpp makes me nervous:

// NOTE: there is a potential race between the time we retrieve the app domain pointer,
// and the time which this thread enters the domain.
//
// To solve the race, we rely on the fact that there is a thread sync (via GC)
// between releasing an app domain's handle, and destroying the app domain.  Thus
// it is important that we not go into preemptive gc mode in that window.

Another bit of weirdness we saw while debugging our production app with
7 timers: for TimerCallback delegates pointing to different instances of
the exact same type, the TimerCallback's _methodPtr field was sometimes
the same as the MethodDesc table's Entry value, which points to the
beginning of the method's instructions, while at other times _methodPtr
pointed to an instruction that does a jmp to the beginning of the method
referenced by the MethodDesc table.  I was able to see this with WinDbg
and SOS using !dumpmt -MD and !u.  This seemed pretty weird, since I was
under the impression that delegates were "equal" if _target and
_methodPtr matched.  Perhaps delegate pointers aren't always being fixed
up during a GC?  However, this doesn't match what we see in Jamus's
repro app, which uses a single TimerCallback delegate for all Timers and
still only some die, so this may be yet another issue, or more likely a
misunderstanding on my part.
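
One way to check the equality assumption from managed code, without reading _methodPtr in the debugger, is to compare delegates through the public Delegate.Equals API.  The sketch below uses hypothetical Worker and DelegateEqualityCheck classes.  It only shows how equality behaves at the API level and says nothing about what the runtime stores in _methodPtr.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for the object type whose method the timers call.
class Worker
{
    public void OnTick(object state) { }
}

class DelegateEqualityCheck
{
    // True when both delegates refer to the same method on the same target instance.
    public static bool SameCallback(TimerCallback a, TimerCallback b)
    {
        return a.Equals(b);
    }

    static void Main()
    {
        Worker w1 = new Worker();
        Worker w2 = new Worker();

        TimerCallback a = new TimerCallback(w1.OnTick);
        TimerCallback b = new TimerCallback(w1.OnTick);   // same instance, same method
        TimerCallback c = new TimerCallback(w2.OnTick);   // different instance, same method

        Console.WriteLine("same instance:      {0}", SameCallback(a, b));
        Console.WriteLine("different instance: {0}", SameCallback(a, c));
        Console.WriteLine("same method:        {0}", a.Method.Equals(c.Method));
    }
}
```

If two delegates that wrap the same method on the same instance compare equal here even when the debugger shows different _methodPtr values, the difference is more likely an internal stub detail than a GC fix-up problem.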

There is also another unresolved report of slightly different
System.Threading.Timer flakiness on Windows Server 2003 here
http://groups-beta.google.com/group/microsoft.public.dotnet.languages...

Does anyone know of a hotfix or better workaround for this issue?

Oran

