To reduce non-determinism during record/replay, we have developed a
number of "best effort" approaches, both for Blink and for V8. For
example, for V8 we record/replay the values returned by Math.random()
and Date(). We do this at the level of V8's platform (e.g., we "wrap"
the platform to hook OS::TimeCurrentMillis). We do more "advanced"
things for JS-driven asynchronous requests, etc. I could share more
technical details privately, if you are interested (we have a full
academic paper under submission, and would prefer not to disclose all
the details openly until it becomes public).
BTW, you can see WebCapsule in action here (you can find other videos by
searching for "WebCapsule project" on YouTube):
https://www.youtube.com/watch?v=K1CwIwcTgbE
Thanks,
Roberto
On Wed, May 20, 2015 at 11:20 AM, Rick Byers <rby...@chromium.org> wrote:
>
> On Wed, May 20, 2015 at 10:57 AM, Roberto Perdisci
> <roberto....@gmail.com> wrote:
>>
>> Dear Blink-dev list,
>>
>> I'm trying to figure out if there is a specific past Git commit ID for
>> Blink's code that contains a full implementation of the Blink scheduler
>> described in this document:
>> https://docs.google.com/document/d/11N2WTV3M0IkZ-kQlKWlBcwkOkKTCuLXGVNylK5E2zvc/edit#heading=h.3ay9sj44f0zd
>
>
> +scheduler-dev.
>
>> I understand that there is a heavy refactoring in progress for the Blink
>> scheduler, as outlined in this other document:
>> https://docs.google.com/document/d/16f_RIhZa47uEK_OdtTgzWdRU0RFMTQWMpEWyWXIpXUo/edit#heading=h.srz53flt1rrp
>>
>> However, my understanding is that this refactoring is far from complete,
>> and the code is being heavily changed.
>>
>> Just to give a bit of background about the above question: we have
>> developed a system called WebCapsule that is able to record web browsing
>> traces, offload the recorded data, and then seamlessly replay the recorded
>> browsing activities in a separate isolated environment with no new user
>> input or network resources (we had a poster about WebCapsule at this year's
>> Usenix NSDI conference: http://goo.gl/RrJRDZ). This is done via a
>> self-contained instrumentation of Blink (no changes to any code outside of
>> Blink), and by leveraging DevTools. Our current replay strategy takes a
>> "best effort" approach to cope with non-determinism introduced by thread
>> scheduling. While our current approach works quite well in practice, we are
>> planning to instrument the Blink scheduler to get closer to fully
>> deterministic replay. As we are not currently interested in all the UI-level
>> optimizations that seem to have motivated the Blink scheduler refactoring,
>> my thinking is that we can work off the previous Blink scheduler
>> implementation to achieve our goals (or get really close to them).
>
>
> Note that "UI-level optimizations" are the primary reason for the existence
> of the blink scheduler in the first place (e.g., to try to get smooth
> scrolling during page load). Perhaps rather than find an old version to
> use, you just want to disable the scheduler with
> --disable-blink-features=BlinkScheduler?
>
> BTW, your system sounds interesting. We rely heavily on "web page replay"
> for our 'telemetry' performance testing, but it doesn't attempt to replay
> user input - just network traffic. Adding user input record and replay
> seems like it could be valuable for both perf and functional testing. If
> there are other places where you've successfully reduced non-determinism,
> I'd love to hear the details (perhaps we can bake it more directly into
> Chrome or Telemetry); non-determinism can be a huge pain for our
> performance testing.
>
>> Any help would be greatly appreciated.
>>
>> Thank you,
>> regards
>>
>>
>> Roberto
>>
>>
>
https://www.amazon.com/empty.gif?1424608322667
where 1424608322667 is a timestamp (in milliseconds). During replay, that timestamp may differ, making it difficult to match the requested URL with its recorded response.
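To illustrate why such URLs are hard to match, here is a minimal sketch of one possible normalization. It is a hypothetical heuristic, not something the thread says WebCapsule does. The sketch treats any standalone run of exactly 13 digits in the query string as a millisecond timestamp and masks it, so that the recorded and replayed URLs compare equal. The replay-time URL below is made up for illustration.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Hypothetical heuristic: a standalone run of exactly 13 digits in the query is
# treated as a millisecond timestamp (13 digits covers roughly 2001 to 2286).
_MS_TIMESTAMP = re.compile(r"(?<!\d)\d{13}(?!\d)")


def mask_ms_timestamps(url: str) -> str:
    """Replace timestamp-like numbers in the query with a fixed placeholder."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(query=_MS_TIMESTAMP.sub("<ts>", parts.query)))


recorded = "https://www.amazon.com/empty.gif?1424608322667"
replayed = "https://www.amazon.com/empty.gif?1424608391042"  # hypothetical replay URL
assert mask_ms_timestamps(recorded) == mask_ms_timestamps(replayed)
```

A digit-count rule like this is cheap but will also mask unrelated 13-digit IDs, so it is at best one signal among several.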
In WebCapsule we use a number of approaches to try to solve this type of problem:
1) We record the return values of calls to CurrentTime and MonotonicallyIncreasingTime from both Blink's and V8's platform APIs. During replay, we attempt to re-sync the "replay clock" to the recorded timeline (this is a purely best-effort approach, but in practice it brings us close to what we want).
2) We record JS calls to Math.random(), and during replay we attempt to return the same values seen during recording (in case the URL embeds a parameter derived from a random number).
3) For every network request, we record the current JavaScript call stack. If we are not able to match a response with the previous methods, we can attempt to match the JS call stack during replay with the one seen during recording, to identify the correct response. Essentially, the JS call stack becomes a key into the table of network responses. Combined with other information (the URL's domain/structure/timestamp), this can actually help a lot.
4) One thing that we have not yet implemented but are planning to do is approximate matching of URLs. Again, the idea is to try to identify the correct response, even if the URL requested during replay is slightly different from the URL seen during recording.
Methods 1) and 2) aim to "force" Blink/V8 to re-generate the very same URLs seen during recording. Methods 3) and 4) aim to handle the cases in which 1) and 2) fail for some reason.
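A minimal sketch of how the fallback order described in 1) through 4) could be combined when looking up a recorded response. It assumes the following, none of which is WebCapsule's actual design. A call stack is represented as a tuple of frame strings and combined with the URL's domain to form the key. Python's `difflib.SequenceMatcher` ratio stands in for the approximate URL matching that is described as planned. The 0.9 similarity floor is an arbitrary illustrative choice. All class and field names are hypothetical.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Tuple
from urllib.parse import urlsplit

Stack = Tuple[str, ...]  # hypothetical: JS frames as strings, innermost first


@dataclass(frozen=True)
class RecordedExchange:
    url: str
    js_stack: Stack
    response: bytes


def _similarity(a: str, b: str) -> float:
    # Ratio in [0, 1]; 1.0 means the two URLs are identical.
    return SequenceMatcher(None, a, b).ratio()


class ResponseMatcher:
    """Hypothetical lookup: exact URL, then (call stack, domain), then fuzzy URL."""

    def __init__(self, exchanges: List[RecordedExchange], min_similarity: float = 0.9):
        self.min_similarity = min_similarity
        # If a URL was requested more than once, the last recording wins here.
        self.by_url: Dict[str, RecordedExchange] = {e.url: e for e in exchanges}
        self.by_stack: Dict[Tuple[Stack, str], List[RecordedExchange]] = {}
        for e in exchanges:
            key = (e.js_stack, urlsplit(e.url).netloc)
            self.by_stack.setdefault(key, []).append(e)

    def match(self, url: str, js_stack: Stack) -> Optional[RecordedExchange]:
        # Exact URL: clock/random replay reproduced the same request.
        exact = self.by_url.get(url)
        if exact is not None:
            return exact
        # Same JS call stack and same domain: the most similar recorded URL wins.
        candidates = self.by_stack.get((js_stack, urlsplit(url).netloc), [])
        if candidates:
            return max(candidates, key=lambda e: _similarity(e.url, url))
        # Approximate URL matching over all recorded URLs, with a similarity floor.
        best = max(self.by_url.values(), key=lambda e: _similarity(e.url, url), default=None)
        if best is not None and _similarity(best.url, url) >= self.min_similarity:
            return best
        return None
```

Returning `None` rather than a low-similarity guess makes a failed match visible. Silently serving the wrong response would be harder to debug.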
The other part of WebCapsule that I think may be helpful to Telemetry is the recording and replay of user-browser interactions (key-presses, clicks, mouse movements, taps, gestures, page scrolls, etc.).
I'm not familiar with Tactile. I searched online, but found several possibly relevant results. Could you point me more specifically to the Tactile you are referring to?
Thank you very much,
regards
Roberto
Hi Roberto, sorry, I didn't realize you are not at Google; apologies for using an internal nickname :) Let's start from the beginning... Say, Google Maps: do a search, wait for the results to display on the map, click on one of the results, and wait for the result popup to render. There is a lot of semi-randomness hidden underneath. Have you tried that with WebCapsule?
Michael
Sent from my mobile device
When you say modifications, does that involve building a custom binary of the browser with WebCapsule enabled?
Michael
Also, when replaying, what drives the browser: WebCapsule, or could it be a separate automated test framework such as Telemetry or WebDriver? If WebCapsule drives the browser on replay, how do you deal with situations where, in the recording phase, the test framework was waiting for certain page elements to render before proceeding? How would WebCapsule know to do the same waits on replay? Page loading is never fully deterministic (and forcing it to be fully deterministic kind of undermines the performance testing use case), so I'd imagine it's hard to infer such waits just by observing browser traces.
Michael
Think WebPageTest.org, with scripted multi-step tests, not just URL loads. For example, use Speed Index computed from the recorded video as the main performance metric, with network conditions simulated via dummynet.
https://sites.google.com/a/webpagetest.org/docs/system-design/mobile-testing
https://sites.google.com/a/webpagetest.org/docs/system-design/webpagetest-relay
https://sites.google.com/a/webpagetest.org/docs/using-webpagetest/scripting
https://sites.google.com/a/webpagetest.org/docs/private-instances/node-js-agent/async-js
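For reference, Speed Index (as documented by WebPageTest) is the area above the visual-progress curve, SI = ∫ (1 − VC(t)/100) dt from navigation start to visual completeness. Here VC(t) is the percentage of visual completeness derived from the recorded video frames. Lower is better. The following is a minimal sketch of that integral. It assumes per-frame completeness values are already available, and that the first sample is at time 0. Completeness is treated as a step function between samples. The function name is hypothetical.

```python
from typing import List, Tuple


def speed_index(frames: List[Tuple[float, float]]) -> float:
    """Approximate Speed Index (ms) from (time_ms, visual_completeness_pct) samples.

    Samples must be sorted by time, starting at navigation start (t = 0).
    Each sample's completeness is held until the next sample; the last sample
    is assumed to be the moment the page became visually complete.
    """
    if not frames:
        raise ValueError("need at least one frame")
    si = 0.0
    for (t0, vc0), (t1, _) in zip(frames, frames[1:]):
        if t1 < t0:
            raise ValueError("frames must be sorted by time")
        # Area of the "not yet complete" fraction over this interval.
        si += (t1 - t0) * (1.0 - vc0 / 100.0)
    return si


# Blank for 1 s, then half rendered for 1 s, then complete: 1000 + 500 ms.
print(speed_index([(0, 0), (1000, 50), (2000, 100)]))  # 1500.0
```

Because the metric integrates over the whole loading timeline, a replay that misses content (e.g., a map tile) can shift the result even if the test itself does not fail.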
Reliable and complete replay for multistep test scenarios is very important: everything must load correctly, e.g. no missed map tiles on replay, even if missing tiles wouldn't fail the test per se. This is in contrast with a different use case where a test harness measures page load for 10000 pages and doesn't care if 5% of them fail to replay properly. We need 100%, and we deal mostly with handwritten tests for a given web app, not measuring across 10000 URLs.
We also seriously care about non-Chrome browsers, in particular Mobile Safari. But if there is a much better replay for Chrome, we would certainly consider integrating it in addition to WebPageReplay, which is browser-agnostic but more complex to set up and maintain.
Michael
Thank you, Michael.
The links you sent provided us with very useful information; I really
appreciate it.
I have one more question to further clarify if this may be one of the
use cases that Telemetry/WPR is interested in:
Let's assume we have recorded a browsing session of a (simulated?)
user that visits and interacts with Google Maps (e.g., the user
searches for a location, clicks on the map, moves/zooms the map,
etc.). Now, we want to replay this browsing session. Specifically, we
want to replay the user inputs and all the resulting network
requests/responses generated by the browser.
In essence, the UI inputs and network traffic could be seen as a
"constant" at this point. What would vary, and is therefore the
important thing to measure, is the Speed Index (or similar metric)
related to replaying the same browsing trace on multiple devices and
browsers.
The net result is that, all other conditions being equal, we can
directly compare the performance of rendering Google Maps on different
browsers, or the same browser but on different devices/platforms.