Re: [blink-dev] setImmediate?


James Robinson Jul 29, 2013 1:54 AM
Posted in group: blink-dev
Thanks for bringing this up, Tony.  I'm not sure a concise explanation is possible here, but I can try to explain my perspective.  Unfortunately much has been written about this topic based on incorrect information about how timers and/or setImmediate work and some of the issues are rather subtle.  We should definitely strive to explain this area more correctly and precisely.

Nearly everything I've read starts off with an incorrect idea about how timer clamping works or why it's in place.  The way timer clamping works [1] is that every task has an associated timer nesting level.  If the task originates from a setTimeout() or setInterval() call, the nesting level is one greater than the nesting level of the task that invoked setTimeout() or the task of the most recent iteration of that setInterval(); otherwise it's zero.  The 4ms clamp only applies once the nesting level is 4 or higher.  Timers set from within an event handler, an animation callback, or a timer that isn't deeply nested are not subject to the clamping.  This detail wasn't reflected in the HTML spec until recently (and there's an off-by-one error in it right now), but it has been true in WebKit/Blink since 2006 [2] and in Gecko since 2009 [3].  The practical effect of this is that setTimeout(..., x) means exactly what you think it would, even for x in [0, 4), so long as the nesting level isn't too high.
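
Concretely, the nesting rule plays out something like this (a sketch; exact thresholds have shifted between implementations and spec drafts):

// Called from a non-timer task (nesting level 0), each nested
// setTimeout() creates a task one level deeper than the current one.
setTimeout(function () {        // this callback runs at nesting level 1
  setTimeout(function () {      // level 2
    setTimeout(function () {    // level 3
      setTimeout(function () {  // level 4: deep enough that the 0ms
                                // delay below is clamped to 4ms
      }, 0);
    }, 0);                      // at levels 1-3, 0ms really means 0ms
  }, 0);
}, 0);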

Back before the initial public launch of Chrome, we ran a number of experiments with this clamping, including removing it completely.  Removing the clamp made us quickly aware that too many pages leak infinitely running zero-delay timers in all sorts of creative ways [4], most of which do nothing useful.  These sorts of errors are not immediately obvious to the page author, since the page is generally visually and functionally fine but chews through the user's CPU.  Unclamped, these timers consumed 100% of the CPU, but with even a fairly small clamp the CPU usage decreased significantly.  If each iteration of the timers takes Xms, then a clamp of Yms results in a CPU usage of roughly X/Y, which is very small when X is very small - for example, 0.1ms of work per 4ms clamp is only 2.5% of a core.  Modern JS engines can give us very small values for X, so we dropped the clamp from 10ms down to 4ms but kept the nesting level behavior unchanged.

With a better understanding of timer clamping, let's consider the possible use cases for scheduling asynchronous work.  One is to time work relative to display updates.  Obviously requestAnimationFrame() is the right API to use for animations, but it's also the right choice for one-off uses.  To perform work immediately before the next display - for example, to batch up graphical updates as data is streamed in off the network - just make a one-off requestAnimationFrame() call.  To perform work immediately -after- a display, just call setTimeout() from inside the requestAnimationFrame() handler.  The nesting level is zero within a rAF() callback, so the timeout parameter will not be clamped to 4ms.
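
A sketch of both patterns, where drawUpdates and afterDisplayWork are hypothetical stand-ins for the page's own work:

// Work just before the next display: a one-off rAF call.
window.requestAnimationFrame(function () {
  drawUpdates();  // hypothetical: flush batched graphical updates

  // Work just after the display: schedule it from inside the rAF
  // handler.  The nesting level is zero here, so 0ms is honored.
  window.setTimeout(afterDisplayWork, 0);
});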

Another common use case is to run a portion of an event handler asynchronously.  If it's important that the asynchronous work happens after a display - for instance, to make sure the user sees the visual effects of the synchronous part of the event handler - schedule the work using requestAnimationFrame().  Otherwise, setTimeout() works just fine, even if the work happens in a few chunks.  The clamping only applies if the work executes in a large number of batches.
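
For example (a hedged sketch; updateUI and followUpWork are hypothetical):

button.addEventListener('click', function (event) {
  updateUI();  // hypothetical: synchronous visual part of the handler

  // If the async part should be timed relative to the display, tie it
  // to the display pipeline:
  window.requestAnimationFrame(followUpWork);

  // If display timing doesn't matter, a plain timeout works just as
  // well (the handler's nesting level is zero, so no clamp applies):
  // window.setTimeout(followUpWork, 0);
});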

A third case is when a large amount of work - more than can reasonably be performed in 2-3 batches - has to be executed without janking up the main thread.  Ideally, such work would be performed on a web worker, where it can simply run to completion without interfering with the main thread, but that's not feasible for many use cases with the current web platform.  That leaves running the work on the main thread in small-ish chunks.  One way to structure this (as seen in several of the setImmediate demos from Microsoft, such as http://ie.microsoft.com/testdrive/Performance/setImmediateSorting/Default.html) is this:

// Performs one atomic unit of work, returns true if there is more work to do.
function doAtom() { ... }

function tick() {
  if (doAtom())
    window.setImmediate(tick);  // or setTimeout
}

This looks valid at a glance but is actually a deeply unfortunate way to structure the code, since it means jumping in and out of the JS VM for every atom of work.  Exiting and entering the VM is far more expensive than running small amounts of script in modern JS engines.  The Microsoft demo does a few bits of math and swaps two elements within an array in each callback.  Perversely, the faster JS implementations become, the more time is (proportionally) wasted on overhead, since each iteration, with its relatively fixed VM overhead, takes less time to do the same amount of work.  It's much better to batch the work up into reasonably-sized chunks before yielding:

var CHUNK_SIZE_MS = ...;
function tick() {
  var id = window.setTimeout(tick, CHUNK_SIZE_MS);
  var start = now();  // Remember to use a good time function!
  while (now() - start < CHUNK_SIZE_MS) {
    if (!doAtom()) {
      // Ah, we're all done!  Clear the scheduled timer and return.
      window.clearTimeout(id);
      return;
    }
  }
  // We must have exceeded the timeout without finishing our work, so return
  // control to the browser until we get called back.
}

This avoids jumping in and out of the VM on every iteration and actually gets more efficient as JS engines get faster since more work is executed per tick.  Additionally, so long as CHUNK_SIZE_MS is >= 4ms, timer clamping has no effect on this code.  By the time the tick() function exits, either the work is done or enough time has elapsed that the timer is ready to fire.
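
As a hedged sketch of how the missing pieces might be filled in (getWorkItems and processItem are hypothetical; performance.now() is one reasonable choice for a good time function):

var items = getWorkItems();  // hypothetical: a large list of work items
var index = 0;

// performance.now() is monotonic and sub-millisecond; fall back to
// Date.now() where it isn't available.
function now() {
  return window.performance ? window.performance.now() : Date.now();
}

// Performs one atomic unit of work, returns true if there is more work to do.
function doAtom() {
  processItem(items[index++]);  // hypothetical per-item work
  return index < items.length;
}

tick();  // start draining the work in CHUNK_SIZE_MS-sized batches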

Of course, being ready to fire and firing are not the same thing.  This gets to the heart of what I suspect a lot of the discussion and misinformation is really trying to address, which is the problem of queuing and scheduling on the main JS thread.  JavaScript is both single-threaded and run-to-completion, and it has to compete with plenty of other tasks that must run on the main thread.  Since the browser can't interrupt a task to start running another without violating run-to-completion semantics, the only real choice the browser has when scheduling work is picking which task to run when the event loop is idle.  This is a very short part of the HTML spec [5] but one of the trickiest parts of a browser implementation.  We've spent a lot of time trying to improve this and have a lot of work left to do.

The problem statement from a browser's perspective is this: given a set of runnable tasks, pick one and run it.  Not all pending tasks can be run at any time - pending tasks are ordered into task queues that must be drained in FIFO order.  If a page posts two messages to a frame, the browser has to deliver them in the order posted, although it can interleave other tasks between the two messages.  Once a task is picked, the thread is blocked until that task completes.  The goal is to get whatever the user wants done as quickly as possible, but it's not always easy to determine what the user wants or whether draining a particular task queue will get them closer to that goal.  We do what we can with the information we have.  We make sure that display-related tasks such as firing animation callbacks happen as regularly as we can and as close to in sync with the physical display as possible, with throttling when we detect bottlenecks in the graphics pipeline.  We dispatch input-related tasks as quickly as we can to minimize latency, but we also have some sophisticated throttling and batching logic to make sure high-frequency inputs like mouse moves don't end up flooding the system.  We're just starting to tweak how we deliver progress events for network loads while the user is interacting with the page, to try to help some situations we've seen on mobile browsers.  For task sources where we don't have a strong signal, or that we just haven't paid much attention to yet, we default to a FIFO-ish ordering.
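
To make the FIFO constraint concrete (a sketch; frame is a hypothetical iframe reference):

// These two messages must be delivered to the frame in the order they
// were posted, though the browser may interleave unrelated tasks
// (input, timers, rAF callbacks) between the two deliveries.
frame.contentWindow.postMessage('first', '*');
frame.contentWindow.postMessage('second', '*');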

Timers are a bit of a black box in this system.  In contrast with input event handlers, which are tied to the input event and thus the user's perceived latency, or requestAnimationFrame callbacks, which are tied in to the display pipeline, timers carry no semantic meaning beyond being some opaque work that the page wants to do at some point in time.  This means there isn't much a browser can do other than interleave timer tasks with other work as often as possible.  The big disappointment of setImmediate() for me is that it provides no additional information that the browser can use to service tasks more efficiently.  If it produced tasks that contained some information about what they were trying to accomplish, then we could try to service them at a better time.  As it is, it doesn't provide anything new except for avoiding timer clamping - and the clamp only applies in exactly the cases where it's really useful.

For non-browser contexts using JS, such as NodeJS, a different set of constraints apply, and I wouldn't be at all surprised if a different set of APIs made sense.  Node is a server technology, so the UI responsiveness considerations do not apply.
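
Node does in fact ship its own setImmediate(), specified against its event-loop phases rather than a display pipeline.  A minimal sketch of that API (an illustration, not a browser recommendation):

// In Node, setImmediate() queues the callback for the "check" phase of
// the event loop, after pending I/O callbacks, with no 4ms-style clamp.
setImmediate(function () {
  console.log('runs after I/O callbacks for this loop turn');
});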

I want to be really clear here that I don't think we are doing the best job we could in minimizing queuing delays.  In fact, we have a very long way to go to get there.  One thing we can do to reduce delays is to move as much work as possible off of the JS thread.  We've made a lot of progress here with our threaded HTML parsing and graphics stack, but we have a lot more to do, both in terms of implementation and in providing appropriately designed platform capabilities.  Another is to simply speed up JS and JS-related operations by adding more optimizations and primitives to the language and by moving work like garbage collection off the main thread.  Last, but certainly not least, we need to improve the way we choose tasks to run.  Today, it's not all that hard to change the performance characteristics of a page by sending work through different task sources, depending on the vagaries of how those sources are scheduled in different browser implementations.  I don't think we batch up network and input tasks the way we should, perform style+layout work at the times we should, or service timers at the right frequency relative to other task sources.  These are all things we have been working on and will continue to work on.  What's important to get from the platform is enough information about what tasks are trying to accomplish to know when to run them.

The spec is, as PhistucK notes, in roughly the same shape today as it was a year ago.  The spec's quality is not very high - the non-normative text contradicts the normative text, and the normative text is not very good.  It doesn't specify a task source at all for the tasks it generates.  Given that the spec hasn't received any traction from browser vendors outside of IE, I expect that a year from today it will still be in the same state.

As for the polyfill, it puzzles me.  It's many hundreds of lines of code, yes, but most of this is to support older browsers like IE8 that will never have a newly proposed API.  Other than that, I don't really understand what the purpose of it is in the first place.  setTimeout(..., 0) will generally do what callers want unless they're already doing something inefficient, but that needs to be handled by looking at the caller.  I could believe that manipulating the task source changes the performance characteristics of a page, or especially a synthetic microbenchmark, as this is one of the most complex areas of browser implementations, but without a semantic underpinning any wins this produces will be fleeting.

- James


[4] One example was from a very popular newspaper website that included a fairly old version of a JavaScript library.  This library had a polyfill for a load event that used polling with setInterval().  Unfortunately, the script stored the interval ID in a global, and the embedding page included the script multiple times from different widgets, which clobbered the ID for all setInterval() calls but the last one.  The page then had multiple zero-delay setInterval()s running for the entire lifetime of the page, each doing nothing useful except attempting to cancel itself.