Re: A way for rAF callbacks to be perfectly VSYNC aligned and work around Windows Sleep() accuracy...

Brian C. Anderson

Mar 12, 2015, 2:35:24 PM
to Jerry Jongerius, Brandon Jones, schedu...@chromium.org
+scheduler-dev since other people are more familiar with the guts of base::MessageLoop than I am.

The base::MessageLoop has to manage a sorted list of delayed tasks and sleep until the next delayed task is due, but in a way that lets it still wake up if an earlier task/event comes in. So we don't just call Sleep().
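
As a rough illustration of that waiting pattern, here is a minimal portable C++ sketch. It is not Chromium's base::MessageLoop: the names DelayedTaskQueue, Post and RunNext are hypothetical, and a std::condition_variable stands in for whatever wait primitive the platform message pump really uses.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <utility>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical stand-in for a message loop's sorted list of delayed tasks.
class DelayedTaskQueue {
 public:
  // Schedule `fn` to run `delay` from now, then wake the waiting thread so
  // it can re-check whether this task is now the earliest one.
  void Post(std::chrono::milliseconds delay, std::function<void()> fn) {
    assert(delay.count() >= 0);
    {
      std::lock_guard<std::mutex> lock(mu_);
      tasks_.push(Task{Clock::now() + delay, std::move(fn)});
    }
    cv_.notify_one();
  }

  // Block until the earliest task is due, then run it. A Post() that arrives
  // during the wait interrupts it, so an earlier task is not slept through.
  void RunNext() {
    std::unique_lock<std::mutex> lock(mu_);
    for (;;) {
      if (tasks_.empty()) {
        cv_.wait(lock);
      } else if (Clock::now() >= tasks_.top().when) {
        break;
      } else {
        // Copy the deadline: top() may change while the lock is released.
        const Clock::time_point deadline = tasks_.top().when;
        cv_.wait_until(lock, deadline);
      }
    }
    std::function<void()> fn = tasks_.top().fn;
    tasks_.pop();
    lock.unlock();
    fn();
  }

 private:
  struct Task {
    Clock::time_point when;
    std::function<void()> fn;
    bool operator>(const Task& other) const { return when > other.when; }
  };
  std::mutex mu_;
  std::condition_variable cv_;
  // std::greater turns the priority queue into a min-heap on `when`.
  std::priority_queue<Task, std::vector<Task>, std::greater<Task>> tasks_;
};
```

The timed wait is the part whose precision matters for this thread: the loop can only be as punctual as the underlying wait primitive allows.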

Following the code on code search, it looks like the delay is passed as whole milliseconds into MsgWaitForMultipleObjectsEx from MessagePumpForUI::WaitForWork.

From the documentation it isn't clear if MsgWaitForMultipleObjectsEx's delay precision depends on timeBeginPeriod or not, but I'm under the impression that it does. Regardless, the API limits us to at least whole milliseconds.

What approach are you using to get such good vsync alignment that doesn't depend on timeBeginPeriod? If there's a way to add it as something MsgWaitForMultipleObjectsEx can wait on without a delay, it might work very well.

Do you happen to be using IDXGIOutput::WaitForVBlank? If we could call that from a dedicated thread and then post an immediate task to the MessageLoop, that should wake up the MsgWaitForMultipleObjectsEx.

-Brian


On Thu, Mar 12, 2015 at 9:27 AM, Jerry Jongerius <jer...@duckware.com> wrote:

This is going to work really well.  Is Google interested?

 

From: Jerry Jongerius [mailto:jer...@duckware.com]
Sent: Wednesday, March 11, 2015 8:35 PM


To: 'Brian C. Anderson'
Subject: RE: A way for rAF callbacks to be perfectly VSYNC aligned and work around Windows Sleep() accuracy...

 

I know what follows may not make a lot of sense, but it is output from my prototype that proves this can be done (see the rows after "turn fix ON" for killer vsync alignment when the test code is activated).  Hopefully that would translate to Chrome as well, and Chrome could perfectly align rAF callbacks to VSYNC, even if timeBeginPeriod() was never called, and regardless of what sleep accuracy there is on the system (like 10ms on my Win7 machine and 15.625ms on my Win8 machine).  Regardless of all of this, under the fix, Sleeps will wake up VSYNC aligned.

 

Everyone points to a frame budget of 16.66ms – but on Windows, it is really 1ms less on my Win7 computer, and 2ms less on my Win8 computer (due to sleep inaccuracy).  And as I assume you know, when on battery power, accuracy reduces even more, meaning the frame budget is reduced even more – all for no good reason.

 

===========================================================
***** 218562  16.719949
qpf = 1948750
---vsync interval test---
  vNext     vNow   LATE  delta         ms      align  num-sleep-spins
 413860   420637   6777  39029  20.027710   6.201854  1
 446443   459610  13167  38973  19.998974   7.397968  1
 479026   479088     62  19478   9.995125   7.995765  1
 511609   518097   6488  39009  20.017447   9.192984  1
 544192   557039  12847  38942  19.983066  10.388147  1
 576775   596044  19269  39005  20.015394  11.585244  1
 609358   615502   6144  19458   9.984862  12.182426  1
 641941   654476  12535  38974  19.999487  13.378572  1
 674524   693478  18954  39002  20.013855  14.575576  1
 707107   712937   5830  19459   9.985375  15.172789  1
++++++++++++turn fix ON
 739690   740041    351  27104  13.908403  16.004634  1
 772273   772632    359  32591  16.724054  17.004880  2
 804856   805200    344  32568  16.712251  18.004419  1
 837439   837733    294  32533  16.694291  19.002885  2
 870022   870342    320  32609  16.733291  20.003683  2
 902605   902859    254  32517  16.686081  21.001657  1
 935188   935433    245  32574  16.715330  22.001381  2
 967771   968092    321  32659  16.758948  23.003714  2
1000354  1000666    312  32574  16.715330  24.003437  1
1032937  1033264    327  32598  16.727646  25.003898  2
++++++++++++turn fix OFF
1065520  1083193  17673  49929  25.621039  26.536261  2
1098103  1102714   4611  19521  10.017191  27.135377  1
1130686  1141774  11088  39060  20.043618  28.334162  1
1163269  1180703  17434  38929  19.976395  29.528926  1
===========================================================
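
For readers decoding the log above: the columns appear to be performance-counter ticks, with qpf being the counter frequency in ticks per second. That reading is an assumption, since the message does not spell it out, but the numbers agree with it. The first row's delta of 39029 ticks gives the 20.027710 in the ms column, successive vNext values differ by 32583 ticks (16.719949 ms, the value on the ***** line), and LATE is vNow minus vNext. A small C++ sketch of that arithmetic, with hypothetical function names:

```cpp
#include <cassert>
#include <cstdint>

// Convert a tick count to milliseconds, given the counter frequency in
// ticks per second (assumed to be the "qpf" value printed in the log).
double TicksToMs(std::int64_t ticks, std::int64_t ticks_per_second) {
  assert(ticks_per_second > 0);
  return 1000.0 * static_cast<double>(ticks) /
         static_cast<double>(ticks_per_second);
}

// How late a wakeup was relative to the targeted vsync, in ticks
// (assumed meaning of the LATE column: vNow - vNext).
std::int64_t LateTicks(std::int64_t v_next, std::int64_t v_now) {
  return v_now - v_next;
}
```

On this reading, the wakeups with the fix on are roughly 250 to 360 ticks late (under 0.2 ms), while without the fix they range from 62 up to 19269 ticks (nearly 10 ms).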

 

It is rather hard to follow the Chrome code for someone like me who does not look at the source code every day, but where does DelayBasedTimeSource ultimately terminate (under Windows) in a wait/sleep/whatever?  I assume it is an actual Sleep() or WaitForXXX?

 

I was able to prototype something that really works in an afternoon, and have tested it on Win7/Win8 computers.  Hopefully your answer to the above question will tell me if this idea would be crazy easy to add into Chrome (or actually take a little work).

 

 

- Jerry

 

 

From: Jerry Jongerius [mailto:jer...@duckware.com]
Sent: Wednesday, March 11, 2015 3:28 PM
To: 'Brian C. Anderson'
Subject: RE: A way for rAF callbacks to be perfectly VSYNC aligned and work around Windows Sleep() accuracy...

 

In other words, on Windows, the inter-frame times (green) and the offset from true vsync at which the rAF callback runs (brown) form a sawtooth pattern, due to the Windows Sleep() precision of 1ms.

 

        [image001.png: chart of inter-frame times (green) and offset from true vsync (brown)]

There is a way around that, causing these lines to become virtually flat, that does not involve spin waiting.

 

- Jerry

 

 

From: Jerry Jongerius [mailto:jer...@duckware.com]
Sent: Wednesday, March 11, 2015 2:47 PM
To: 'Brian C. Anderson'
Subject: RE: A way for rAF callbacks to be perfectly VSYNC aligned and work around Windows Sleep() accuracy...

 

Is that able to sleep to microsecond precision?

 

Chrome release/Canary is currently bound (under Windows) to the OS Sleep() precision of 1ms.  Meaning that for a 120Hz (8.3ms) display, up to 1ms of that is essentially wasted.
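
To put a number on that, a small C++ sketch of the frame-budget arithmetic (the function names are hypothetical, and the 1ms figure is simply the sleep precision quoted above):

```cpp
#include <cassert>

// Length of one refresh interval in milliseconds.
double FrameBudgetMs(double refresh_hz) {
  assert(refresh_hz > 0.0);
  return 1000.0 / refresh_hz;
}

// Fraction of the frame budget lost if a wakeup can be up to
// `sleep_error_ms` late.
double WorstCaseBudgetLoss(double refresh_hz, double sleep_error_ms) {
  return sleep_error_ms / FrameBudgetMs(refresh_hz);
}
```

At 120Hz the budget is about 8.33ms, so up to 1ms of lateness costs about 12% of the frame; at 60Hz the same 1ms is about 6%.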

 

There is a technique to get around that, and get back (for vsync alignment at least) microsecond sleeps under Windows.

 

- Jerry

 

 

 

From: Brian C. Anderson [mailto:brian...@google.com]
Sent: Wednesday, March 11, 2015 1:55 PM
To: Jerry Jongerius
Cc: Brandon Jones
Subject: Re: A way for rAF callbacks to be perfectly VSYNC aligned and work around Windows Sleep() accuracy...

 

Are you talking about using a different approach than what DelayBasedTimeSource uses?

 

 

 

On Wed, Mar 11, 2015 at 9:16 AM, Jerry Jongerius <jer...@duckware.com> wrote:

... Do you/Google already have a solution for this that does not involve
spin waiting?  Because I think I just thought of a way to do it, if you want
to know about it...  Please let me know...

Already been prototyping some code and it works.

- Jerry

 



Jerry Jongerius

Mar 13, 2015, 7:13:02 AM
to Brian C. Anderson, Brandon Jones, schedu...@chromium.org

Brian, and team,

 

The way any wait/sleep/etc. under Windows typically works is that no matter when (in the OS-allocated time slice) the call was issued, Windows treats the wait/sleep/etc. (for timeout purposes) as having been issued when the OS time slice began.  This means that, depending on how long your timeout is and when in a time slice it was issued, the actual timeout can be anywhere from one time slice too early to one time slice too late.
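
The effect can be illustrated with a deliberately simplified model in C++. The model is an assumption made for illustration, not documented Windows behaviour: the timeout is counted from the start of the current time slice, and the thread can only wake on a slice boundary. ModelWaitUs is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// All times are in microseconds. Returns how long the caller actually
// waited under the simplified model described in the lead-in.
std::int64_t ModelWaitUs(std::int64_t now_us, std::int64_t timeout_us,
                         std::int64_t slice_us) {
  assert(now_us >= 0 && timeout_us > 0 && slice_us > 0);
  // The timeout is treated as if issued when the current slice began.
  const std::int64_t slice_start = (now_us / slice_us) * slice_us;
  const std::int64_t deadline = slice_start + timeout_us;
  // Wakeups only happen on slice boundaries: round the deadline up.
  const std::int64_t wake = ((deadline + slice_us - 1) / slice_us) * slice_us;
  return wake - now_us;
}
```

With a 15.625ms slice, a 15.625ms timeout issued 15ms into a slice returns after only 0.625ms (almost a full slice early), while a 16ms timeout issued 0.1ms into a slice returns after 31.15ms (almost a full slice late).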

 

I noticed in Chrome that Sleep(ms) is implemented in a loop, possibly sleeping more than once, to make up for the ‘too early’ case.
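
The general shape of such a loop, as a portable C++ sketch rather than Chrome's actual code: SleepAtLeast is a hypothetical name, and the clock and sleep functions are passed in so the loop can be exercised with a fake clock.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

using Ms = std::chrono::milliseconds;

// Keep sleeping until at least `duration` has elapsed, sleeping again for
// the remainder whenever the underlying sleep returns early.
// A real caller would pass a monotonic clock and the OS sleep function.
// Returns how many times `sleep` was called.
int SleepAtLeast(Ms duration, const std::function<Ms()>& now,
                 const std::function<void(Ms)>& sleep) {
  assert(duration.count() >= 0);
  const Ms end = now() + duration;
  int calls = 0;
  for (Ms t = now(); t < end; t = now()) {
    sleep(end - t);
    ++calls;
  }
  return calls;
}
```

The loop only corrects the 'too early' case; a sleep that returns late is not helped by it.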

 

If we changed Sleep(ms) to WaitForSingleObject(hEvent,ms), and then behind the scenes could ‘poke’ the hEvent at some higher accuracy, then Windows time slice errors could be greatly reduced.

 

So the entire trick is: how do we “behind the scenes ‘poke’ hEvent”?  The beauty is that if we don’t poke, we have changed nothing.  But if we do poke, all we have done is increase accuracy.  So, the ‘poke’ happens from another dedicated thread, one that runs at the absolute highest priority we can (like THREAD_PRIORITY_TIME_CRITICAL), sleeps almost all the time, but smartly wakes up.
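
A portable C++ sketch of the idea, using std::condition_variable as a stand-in for a Win32 auto-reset event. PokeEvent, Poke and WaitFor are hypothetical names; on Windows the equivalents would be an event handle, SetEvent and WaitForSingleObject.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Portable stand-in for a Win32 auto-reset event.
class PokeEvent {
 public:
  // Wake one waiter now instead of letting it run out its timeout.
  void Poke() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      poked_ = true;
    }
    cv_.notify_one();
  }

  // Analogue of WaitForSingleObject(hEvent, ms): returns true if poked,
  // false if the timeout expired first. If nobody ever pokes, this behaves
  // like a plain timed sleep, so nothing changes for existing callers.
  bool WaitFor(std::chrono::milliseconds timeout) {
    assert(timeout.count() >= 0);
    std::unique_lock<std::mutex> lock(mu_);
    const bool poked = cv_.wait_for(lock, timeout, [this] { return poked_; });
    poked_ = false;  // auto-reset, ready for the next wait
    return poked;
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  bool poked_ = false;
};
```

The sleeping thread calls WaitFor() where it used to sleep, and the dedicated high-priority thread calls Poke() whenever it detects the moment the sleeper should wake.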

 

I was kind of using WaitForVBlank.  For my quick prototype, I was using DirectDraw’s WaitForVerticalBlank.  And it actually works fantastic.  The only show stopper problem was that WaitForVerticalBlank is implemented via a spin wait.  After extensive searches online, it appears that any/all of the ‘vertical blank’ wait calls in Windows are implemented via spin wait (I never tried IDXGIOutput::WaitForVBlank since I found comments online stating that it also spin waits).

 

BUT, Windows’ own Desktop Window Manager IS waiting for vertical blank somehow, without spin waiting.  How that happens is not yet known, but I assume that, given the size of Google, Google would have some developer contacts at Microsoft to ask how that is done.

 

Until that answer is known, the second best solution is to use DwmFlush().  On my systems, it appears to trigger at a relatively steady offset (around 0.25ms) after vertical blanking starts, and before vertical blanking ends (around 0.6ms after it starts).  Here is my prototype code:

 

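#include <windows.h>

void triggerSleepsToWakeup();  // defined elsewhere: signals the event that sleepers wait on

DWORD WINAPI vsyncAlignerThread(LPVOID lpParam) {
    typedef HRESULT (WINAPI *LPFNDWNFLUSH)(void);
    // Look up DwmFlush at run time; this only succeeds if dwmapi.dll is already loaded.
    HMODULE hDWMAPI = GetModuleHandleA("dwmapi");
    LPFNDWNFLUSH lpfnDwmFlush = hDWMAPI ? (LPFNDWNFLUSH)GetProcAddress(hDWMAPI, "DwmFlush") : NULL;
    if (lpfnDwmFlush) {
        SetThreadPriority( GetCurrentThread(), THREAD_PRIORITY_TIME_CRITICAL );
        while (true) {
            // DwmFlush blocks until the next DWM present, without spin waiting.
            HRESULT result = (*lpfnDwmFlush)();
            if (S_OK==result) {
                triggerSleepsToWakeup();  // poke event
                }
            else {
                Sleep(100);  // DwmFlush failed (seen briefly at startup on Win7); retry later
                }
            }
        }
    return 0;
    }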

 

Actually, there is a subtle but incredibly critical advantage to using DwmFlush. The vertical blank timing information obtained from DwmGetCompositionTimingInfo() is not exact, but is itself sampled information.  That means that once Chrome computes a time in microseconds that it wants to wake up for vsync work, that time itself has subtle error (micro-jank).  If we switched to using a trigger that is ‘precisely’ vsync aligned, we would absolutely need to address that subtle error.  But as it stands now, the delay in DwmFlush() after true vsync is larger than the vsync wakeup time’s micro-jank, meaning that we don’t need to adjust anything at all.

 

Hopefully there are no show stopper issues with using DwmFlush().  The only crazy thing I did run into on Win7 is that during the first couple of seconds of my application starting up, DwmFlush fails (but then starts working).  Which is why on an error, I Sleep and retry later.  I did not run into a problem on Win8.

 

Anyone there willing to give this a quick try in Canary, replacing triggerSleepsToWakeup() with code that wakes up MsgWaitForMultipleObjectsEx, and see if rAF callback inter-frame times improve a lot?


Jerry Jongerius

Mar 14, 2015, 7:23:49 PM
to Brian C. Anderson, Brandon Jones, schedu...@chromium.org

Brian and team,

 

I just figured it out.  I can now wait for the exact vsync alignment, and wake up, without spin waiting, even when Sleep(1) won’t wake up for another 15.625ms.  This must be the technique that the Windows Desktop Window Manager is using.  It should work on Vista and later.  Tested (and works) on Win7 and Win8.1.

 

Now it will be up to you guys to figure out how best to work around the micro-jitter issue.  Namely, if the head timer object is a vsync timer object, and wants to wake up at “T”, but the new code above wakes up at “T” minus 0.03ms, the timer T has not expired and will not fire -- even though it is a vsync timer and should fire.
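
The issue can be pictured with a tiny C++ sketch. The strict check mirrors the situation described above; the variant with slack is only one conceivable workaround, not something proposed in this thread, and both function names and the slack value are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Times are in microseconds. Strict check: a timer fires only at or after
// its target time, so a wakeup 30us early finds nothing to run.
bool IsDueStrict(std::int64_t now_us, std::int64_t target_us) {
  return now_us >= target_us;
}

// Hypothetical variant: also accept a wakeup that is at most `slack_us`
// early. The slack value is an assumption for illustration only.
bool IsDueWithSlack(std::int64_t now_us, std::int64_t target_us,
                    std::int64_t slack_us) {
  assert(slack_us >= 0);
  return now_us + slack_us >= target_us;
}
```

With a target of T = 16667us and a wakeup at T minus 30us, the strict check says the timer is not due, while a check with 100us of slack would let it fire.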

 

Brian, please follow up for details…

 

- Jerry

 

 

From: Jerry Jongerius [mailto:jer...@duckware.com]
Sent: Friday, March 13, 2015 7:13 AM
To: 'Brian C. Anderson'
Cc: 'Brandon Jones'; 'schedu...@chromium.org'
Subject: RE: A way for rAF callbacks to be perfectly VSYNC aligned and work around Windows Sleep() accuracy...

 

Brian, and team,

Brian C. Anderson

Mar 16, 2015, 2:58:53 PM
to Jerry Jongerius, Brandon Jones, schedu...@chromium.org
Jerry,

Thanks for figuring all this out. I've responded to https://code.google.com/p/chromium/issues/detail?id=467617. Let's move the conversation there.

-Brian
