This is going to work really well. Is Google interested?
From: Jerry Jongerius [mailto:jer...@duckware.com]
Sent: Wednesday, March 11, 2015 8:35 PM
To: 'Brian C. Anderson'
Subject: RE: A way for rAF callbacks to be perfectly VSYNC aligned and work around Windows Sleep() accuracy...
I know what follows may not make a lot of sense, but it is output from my prototype that proves this can be done (see the rows between "turn fix ON" and "turn fix OFF" for the killer vsync alignment when the test code is activated). Hopefully that would translate to Chrome as well, so that Chrome could perfectly align rAF callbacks to VSYNC, even if timeBeginPeriod() was never called, and regardless of the system's sleep accuracy (10ms on my Win7 machine and 15.625ms on my Win8 machine). Whatever the sleep accuracy, under the fix, Sleeps will wake up VSYNC aligned.
Everyone points to a frame budget of 16.66ms, but on Windows it is really 1ms less on my Win7 computer and 2ms less on my Win8 computer (due to sleep inaccuracy). And, as I assume you know, on battery power the accuracy drops even further, so the frame budget shrinks even more, all for no good reason.
===========================================================
***** 218562 16.719949
qpf = 1948750
---vsync interval test---
vNext vNow LATE delta ms align num-sleep-spins
413860 420637 6777 39029 20.027710 6.201854 1
446443 459610 13167 38973 19.998974 7.397968 1
479026 479088 62 19478 9.995125 7.995765 1
511609 518097 6488 39009 20.017447 9.192984 1
544192 557039 12847 38942 19.983066 10.388147 1
576775 596044 19269 39005 20.015394 11.585244 1
609358 615502 6144 19458 9.984862 12.182426 1
641941 654476 12535 38974 19.999487 13.378572 1
674524 693478 18954 39002 20.013855 14.575576 1
707107 712937 5830 19459 9.985375 15.172789 1
++++++++++++turn fix ON
739690 740041 351 27104 13.908403 16.004634 1
772273 772632 359 32591 16.724054 17.004880 2
804856 805200 344 32568 16.712251 18.004419 1
837439 837733 294 32533 16.694291 19.002885 2
870022 870342 320 32609 16.733291 20.003683 2
902605 902859 254 32517 16.686081 21.001657 1
935188 935433 245 32574 16.715330 22.001381 2
967771 968092 321 32659 16.758948 23.003714 2
1000354 1000666 312 32574 16.715330 24.003437 1
1032937 1033264 327 32598 16.727646 25.003898 2
++++++++++++turn fix OFF
1065520 1083193 17673 49929 25.621039 26.536261 2
1098103 1102714 4611 19521 10.017191 27.135377 1
1130686 1141774 11088 39060 20.043618 28.334162 1
1163269 1180703 17434 38929 19.976395 29.528926 1
===========================================================
It is rather hard to follow the Chrome code for someone like me who does not look at the source code every day, but where does DelayBasedTimeSource ultimately terminate (under Windows) in a wait/sleep/whatever? I assume it is an actual Sleep() or WaitForXXX?
I was able to prototype something that really works in an afternoon, and have tested it on Win7/Win8 computers. Hopefully your answer to the above question will tell me if this idea would be crazy easy to add into Chrome (or actually take a little work).
- Jerry
From: Jerry Jongerius [mailto:jer...@duckware.com]
Sent: Wednesday, March 11, 2015 3:28 PM
To: 'Brian C. Anderson'
Subject: RE: A way for rAF callbacks to be perfectly VSYNC aligned and work around Windows Sleep() accuracy...
In other words, on Windows, the inter-frame times (green) and the offset from true vsync (brown) at which the rAF callback is called both form a sawtooth pattern, due to the 1ms precision of Windows Sleep().
There is a way around that, which makes these lines virtually flat and does not involve spin waiting.
- Jerry
From: Jerry Jongerius [mailto:jer...@duckware.com]
Sent: Wednesday, March 11, 2015 2:47 PM
To: 'Brian C. Anderson'
Subject: RE: A way for rAF callbacks to be perfectly VSYNC aligned and work around Windows Sleep() accuracy...
Is that able to sleep with microsecond precision?
Chrome release/Canary is currently bound (under Windows) to the OS Sleep() precision of 1ms. This means that for a 120Hz display (8.3ms per frame), up to 1ms of each frame is essentially wasted.
There is a technique to get around that, and get back (for vsync alignment at least) microsecond sleeps under Windows.
- Jerry
From: Brian C. Anderson [mailto:brian...@google.com]
Sent: Wednesday, March 11, 2015 1:55 PM
To: Jerry Jongerius
Cc: Brandon Jones
Subject: Re: A way for rAF callbacks to be perfectly VSYNC aligned and work around Windows Sleep() accuracy...
Are you talking about using a different approach than what DelayBasedTimeSource uses?
On Wed, Mar 11, 2015 at 9:16 AM, Jerry Jongerius <jer...@duckware.com> wrote:
... Do you/Google already have a solution for this that does not involve spin waiting? Because I think I just thought of a way to do it, if you want to know about it... Please let me know. I have already been prototyping some code and it works.
- Jerry
Brian, and team,
Here is how any wait/sleep/etc. under Windows typically works: no matter when (within the OS-allocated time slice) the call is issued, Windows treats the wait/sleep/etc. (for timeout purposes) as if it had been issued when the OS time slice began. This means that, depending on how long your timeout is and when in a time slice it was issued, the actual timeout can be anywhere from one time slice too early to one time slice too late.
I noticed in Chrome that Sleep(ms) is implemented in a loop, possibly sleeping more than once, to make up for the ‘too early’ case.
If we changed Sleep(ms) to WaitForSingleObject(hEvent,ms), and then behind the scenes could ‘poke’ the hEvent at some higher accuracy, then Windows time slice errors could be greatly reduced.
So the entire trick is: how do we "poke" hEvent behind the scenes? The beauty is that if we don't poke, we have changed nothing. But if we do poke, all we have done is increase accuracy. The "poke" comes from another dedicated thread that runs at the highest priority we can get (such as THREAD_PRIORITY_TIME_CRITICAL), sleeps almost all the time, but wakes up smartly.
For my quick prototype, I was using DirectDraw's WaitForVerticalBlank (effectively a wait for vblank), and it actually works fantastically. The only showstopper was that WaitForVerticalBlank is implemented via a spin wait. After extensive searches online, it appears that any/all of the "vertical blank" wait calls in Windows are implemented via spin waits (I never tried IDXGIOutput::WaitForVBlank, since I found comments online stating that it also spin waits).
BUT, Windows' own Desktop Window Manager IS waiting for vertical blank somehow, without spin waiting. How it does that is not yet known, but given the size of Google, does Google have developer contacts at Microsoft who could be asked how that is done?
Until that answer is known, the second-best solution is to use DwmFlush(). On my systems, it appears to trigger at a relatively steady offset (around 0.25ms) after vertical blanking starts, and before vertical blanking ends (around 0.6ms after it starts). Here is my prototype code:
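#include <windows.h>

// Prototype helper (defined elsewhere): pokes the event that sleepers wait on.
void triggerSleepsToWakeup(void);

typedef HRESULT (WINAPI *LPFNDWMFLUSH)(void);

DWORD WINAPI vsyncAlignerThread(LPVOID lpParam) {
    // Load dwmapi.dll explicitly: GetModuleHandle() returns NULL if it is not
    // already loaded. DwmFlush only exists on Vista and later.
    HMODULE hDWMAPI = LoadLibraryA("dwmapi.dll");
    LPFNDWMFLUSH lpfnDwmFlush = hDWMAPI ? (LPFNDWMFLUSH)GetProcAddress(hDWMAPI, "DwmFlush") : NULL;
    (void)lpParam;
    if (lpfnDwmFlush) {
        // Highest priority, so the poke is issued as soon as DwmFlush returns.
        SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_TIME_CRITICAL);
        while (TRUE) {
            // Blocks until the next DWM present (shortly after vertical blank).
            HRESULT result = lpfnDwmFlush();
            if (S_OK == result) {
                triggerSleepsToWakeup(); // poke event
            }
            else {
                Sleep(100); // DwmFlush can fail briefly at startup; retry later
            }
        }
    }
    return 0;
}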
Actually, there is a subtle but incredibly critical advantage to using DwmFlush. The vertical blank timing information obtained from DwmGetCompositionTimingInfo() is not exact; it is itself sampled information. That means that once Chrome computes a time in microseconds at which it wants to wake up for vsync work, that time itself has subtle error (micro-jank). If we switched to a trigger that is "precisely" vsync aligned, we would absolutely need to address that subtle error. But as it stands now, the delay of DwmFlush() after true vsync is larger than the micro-jank in the computed vsync wakeup times, so we don't need to adjust anything at all.
Hopefully there are no showstopper issues with using DwmFlush(). The only crazy thing I did run into on Win7 is that during the first couple of seconds after my application starts up, DwmFlush fails (but then starts working). That is why, on an error, I Sleep() and retry later. I did not run into this problem on Win8.
Is anyone there willing to give this a quick try in Canary, replacing triggerSleepsToWakeup() with code that wakes up MsgWaitForMultipleObjectsEx, and see whether rAF callback inter-frame times improve a lot?
Brian and team,
I just figured it out. I can now wait for exact vsync alignment and wake up without spin waiting, even when Sleep(1) won't wake up for another 15.625ms. This must be the technique that the Windows Desktop Window Manager is using. It should work on Vista and later. Tested (and works) on Win7 and Win8.1.
Now it will be up to you guys to figure out how best to work around the micro-jitter issue. Namely, if the head timer object is a vsync timer object that wants to wake up at "T", but the new code above wakes up at "T" minus 0.03ms, then timer T has not expired and will not fire, even though it is a vsync timer and should fire.
Brian, please follow up for details…
- Jerry
From: Jerry Jongerius [mailto:jer...@duckware.com]
Sent: Friday, March 13, 2015 7:13 AM
To: 'Brian C. Anderson'
Cc: 'Brandon Jones'; 'schedu...@chromium.org'
Subject: RE: A way for rAF callbacks to be perfectly VSYNC aligned and work around Windows Sleep() accuracy...
Brian, and team,