Background script streaming thread for async scripts


Mark Hahnenberg

Nov 19, 2015, 4:17:06 PM
to v8-dev, mhahn...@fb.com
Hello v8 folks, 

I filed this Chromium issue (https://code.google.com/p/chromium/issues/detail?id=557466) a couple days ago, but I attended the Chrome Dev Summit in Mountain View yesterday and was directed to this list for V8-related internal issues. So here I am! 

I've been poking around at the background script streaming thread logic for the past week or so to see if I could increase the amount of async script parsing that we do off the main thread when loading facebook.com (disclaimer: I work for Facebook). Facebook sends a pretty sizable pile of JS down to the browser on page load, most of which is not very hot (e.g. in Safari, ~87% of the ~6500 executed functions never make it past the bytecode interpreter). When running Instruments on a trunk release build of Chromium I see ~20% of overall execution time spent parsing JS and generating code on the main thread while I see < 1% CPU time spent on the script streaming thread. According to some console-fu, it seems that ~70% of the script tags in the document on home.php are marked with the async attribute, so given how much time we spend on the main thread parsing JS I figured that there should also be plenty of work for the script streaming thread.

I think I've improved utilization of the script streaming thread slightly with a few tweaks including:

(1) reducing the small script threshold from 30 KB, which was preventing the background thread from even attempting to parse many scripts.
(2) forcing the script streaming thread to perform eager rather than lazy parsing in hopes of reducing the total amount of parsing happening on the main thread.
(3) allowing the main thread to retry posting parsing tasks to the background thread at a later time if the background thread is currently busy.

These tweaks may have moved the needle slightly (I'm not 100% convinced since I haven't done any rigorous measurements), but I'm still seeing a suspiciously high amount of CPU time spent on the main thread parsing JS along with a suspiciously low amount of parsing activity on the background thread when looking at both Instruments profiles and Chrome traces recorded with chrome://tracing. In a simplified test page that tries loading multiple async scripts concurrently, I noticed that even when the script streaming thread parses a script, it then sends a task back to the main thread which then proceeds to do a bunch more lazy parsing! I realize that the background thread isn't allowed to generate code so it must give it back to the main thread, but this additional parsing time surprised me.

I tried adding some more tracing events to get a better picture of what's going on at various times. For example, it seems like we receive JS scripts on the main thread in clusters rather than in an evenly spaced manner which, due to the fact that the background thread synchronously parses a single JS file at a time, causes the other scripts to have to wait to be parsed. A pool of worker threads might help some, but probably won't be sufficient on devices with low core counts. It might be better if we could incrementally parse JS and multiplex multiple JS files on one or two parsing threads. Incremental parsing sounds like a big, potentially invasive effort though.
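The multiplexing idea could be sketched as a toy model like this (everything here is invented for illustration; V8 has no incremental per-chunk parse API today, and `ParseJob`/`parseChunk()` are hypothetical names):

```javascript
// Toy model of multiplexing several incremental parse jobs on one
// background thread. ParseJob and parseChunk() are invented for
// illustration; this is not V8 code.
class ParseJob {
  constructor(name, totalChunks) {
    this.name = name;
    this.remainingChunks = totalChunks;
  }
  // Parse one chunk of the script; return true when the script is done.
  parseChunk() {
    this.remainingChunks -= 1;
    return this.remainingChunks === 0;
  }
}

class ParseScheduler {
  constructor() {
    this.queue = [];
    this.finished = [];
  }
  addJob(job) {
    this.queue.push(job);
  }
  // Round-robin over pending scripts so one large script can't starve
  // the scripts queued behind it, unlike a one-script-at-a-time model.
  run() {
    while (this.queue.length > 0) {
      const job = this.queue.shift();
      if (job.parseChunk()) {
        this.finished.push(job.name);
      } else {
        this.queue.push(job); // not done yet; rotate to the back
      }
    }
  }
}

const scheduler = new ParseScheduler();
scheduler.addJob(new ParseJob('big.js', 3));
scheduler.addJob(new ParseJob('small.js', 1));
scheduler.run();
console.log(scheduler.finished); // small.js finishes before big.js
```

The point of the sketch is just the scheduling property: with round-robin slicing, a small script that arrives behind a large one no longer has to wait for the large one to finish parsing.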

I realize that forcing more parsing onto the background thread could cause adverse side effects related to page interactivity, but I'd like to be able to experiment with different amounts of parsing on either the main thread or the background thread. I was wondering, given the steps I've taken, if I've missed or misunderstood anything. I think off-main-thread parsing is a great idea, so any help debugging/understanding/improving the script streaming thread would be very much appreciated :-) And while the script streaming stuff was just the first weird thing I stumbled across, given the importance of parsing in general on complex, JS-heavy sites like Facebook, any additional info related to how V8 does parsing and how sites might be able to take advantage of that would be very helpful.

Thanks!
-Mark Hahnenberg

Daniel Vogelheim

Nov 23, 2015, 1:35:37 PM
to v8-...@googlegroups.com, mhahn...@fb.com
Hi there,

On Thu, Nov 19, 2015 at 10:17 PM, Mark Hahnenberg <mhah...@gmail.com> wrote:
> Hello v8 folks,
>
> I filed this Chromium issue (https://code.google.com/p/chromium/issues/detail?id=557466) a couple days ago, but I attended the Chrome Dev Summit in Mountain View yesterday and was directed to this list for V8-related internal issues. So here I am!
>
> I've been poking around at the background script streaming thread logic for the past week or so to see if I could increase the amount of async script parsing that we do off the main thread when loading facebook.com (disclaimer: I work for Facebook). Facebook sends a pretty sizable pile of JS down to the browser on page load, most of which is not very hot (e.g. in Safari, ~87% of the ~6500 executed functions never make it past the bytecode interpreter). When running Instruments on a trunk release build of Chromium I see ~20% of overall execution time spent parsing JS and generating code on the main thread while I see < 1% CPU time spent on the script streaming thread. According to some console-fu, it seems that ~70% of the script tags in the document on home.php are marked with the async attribute, so given how much time we spend on the main thread parsing JS I figured that there should also be plenty of work for the script streaming thread.
>
> I think I've improved utilization of the script streaming thread slightly with a few tweaks including:
>
> (1) reducing the small script threshold from 30 KB, which was preventing the background thread from even attempting to parse many scripts.

We should try this. Which threshold did you choose?
 
> (2) forcing the script streaming thread to perform eager rather than lazy parsing in hopes of reducing the total amount of parsing happening on the main thread.

I expect this to be _very_ bad for memory consumption. The V8 AST - the result of a parse - is unfortunately rather large, so that not parsing everything upfront conserves a lot of memory. I'm skeptical this change would be beneficial in general and think this would require some rather careful benchmarking across different websites and device types.

That said, I've recently run across several situations - usually large, framework-y web apps - where excessive lazy- and then re-parsing was a problem. I'm quite certain our lazy parse heuristic could be improved; I don't really have a plan for how. If you have any suggestions, I'd be eager to listen.

Generally speaking, the heuristic tries to "guess" whether a given function will be called during the initial script evaluation. This gets called a lot and hence needs to be fast, and since it needs to decide before the function is even parsed, it only has the function header to go on.
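To illustrate what a header-only signal can look like (this is my reading of V8's publicly described behavior and is hedged accordingly): a parenthesis immediately before `function` suggests an IIFE that will run during initial evaluation, so the parser can choose eager parsing without seeing the body.

```javascript
// Illustration of a header-only eagerness signal. The parser only sees
// the tokens around the function header when it has to decide.

// Lazily parsed by default: the body is skipped now and parsed fully
// on the first call.
function helper(x) {
  return x * 2;
}

// The leading '(' is visible before the body, hinting that this IIFE
// will almost certainly execute immediately, so it can be parsed
// eagerly up front.
var result = (function init() {
  return helper(21);
})();

console.log(result); // 42
```

Whether a given V8 version actually applies this exact hint is an implementation detail that can change; the example just shows the kind of pre-body information the heuristic has to work with.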
 
> (3) allowing the main thread to retry posting parsing tasks to the background thread at a later time if the background thread is currently busy.

This might make sense, but I guess it's difficult to do, since there isn't really a good point when that should get done. See below on task scheduling, though.

> These tweaks may have moved the needle slightly (I'm not 100% convinced since I haven't done any rigorous measurements), but I'm still seeing a suspiciously high amount of CPU time spent on the main thread parsing JS along with a suspiciously low amount of parsing activity on the background thread when looking at both Instruments profiles and Chrome traces recorded with chrome://tracing. In a simplified test page that tries loading multiple async scripts concurrently, I noticed that even when the script streaming thread parses a script, it then sends a task back to the main thread which then proceeds to do a bunch more lazy parsing! I realize that the background thread isn't allowed to generate code so it must give it back to the main thread, but this additional parsing time surprised me.

Not sure what you're describing here. There are two things:

- There's a 'finishing' step to background parsing that needs to happen on the main thread. Basically, background parsing can't modify any global state while the main thread might mutate it, so the background parser builds up a separate data structure, which the main thread will then need to patch into the heap. This happens for every background parse, and I don't see a chance to avoid this without a major rewrite.

- The background parse thread uses the same lazy/eager parse heuristic as the main thread. If it guesses wrongly, then the main thread will have to do a lot of synchronous parsing while it executes.

You should be able to distinguish these in chrome://tracing, as the first happens before any execute (for that script), while the second happens during an execute.

> I tried adding some more tracing events to get a better picture of what's going on at various times. For example, it seems like we receive JS scripts on the main thread in clusters rather than in an evenly spaced manner which, due to the fact that the background thread synchronously parses a single JS file at a time, causes the other scripts to have to wait to be parsed. A pool of worker threads might help some, but probably won't be sufficient on devices with low core counts. It might be better if we could incrementally parse JS and multiplex multiple JS files on one or two parsing threads. Incremental parsing sounds like a big, potentially invasive effort though.

Hmm. There are some changes to Chromium's task scheduling in progress that might help us here, and with (3) above. That work is still ongoing, though, and I'm not terribly familiar with it, so I'd need to find the right people to talk to.

> I realize that forcing more parsing onto the background thread could cause adverse side effects related to page interactivity, but I'd like to be able to experiment with different amounts of parsing on either the main thread or the background thread. I was wondering, given the steps I've taken, if I've missed or misunderstood anything. I think off-main-thread parsing is a great idea, so any help debugging/understanding/improving the script streaming thread would be very much appreciated :-) And while the script streaming stuff was just the first weird thing I stumbled across, given the importance of parsing in general on complex, JS-heavy sites like Facebook, any additional info related to how V8 does parsing and how sites might be able to take advantage of that would be very helpful.

Generally, we might look at (code) caching. The code cache - if successful - skips the parse entirely. How effective that is of course depends on how often you update how much of your code; and possibly on how exactly you do that.

Another, more complex thing (where I'm not sure about the details) might be Service Workers, since they allow a site to be explicit about what it wants cached, etc. This is not merely an implementation detail, though, and may require some greater rework on your side. Honestly, I'm not sure what the current status of ServiceWorkers is, though.
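As a rough sketch of that Service Worker angle (the bundle paths are hypothetical; the Cache API calls are the standard ones, and registration is guarded because `self` and `caches` only exist in a Service Worker context):

```javascript
// Minimal Service Worker sketch for explicitly caching script bundles.
// Bundle paths below are hypothetical placeholders.
const CACHE_NAME = 'script-cache-v1';
const SCRIPT_BUNDLES = ['/js/runtime.js', '/js/feed.js']; // hypothetical

function onInstall(event) {
  // Pre-populate the cache with the core bundles at install time.
  event.waitUntil(
    caches.open(CACHE_NAME).then(cache => cache.addAll(SCRIPT_BUNDLES))
  );
}

function onFetch(event) {
  // Serve from the cache when possible, falling back to the network.
  event.respondWith(
    caches.match(event.request).then(hit => hit || fetch(event.request))
  );
}

// Only register the handlers when running inside a worker context.
if (typeof self !== 'undefined' && typeof self.addEventListener === 'function') {
  self.addEventListener('install', onInstall);
  self.addEventListener('fetch', onFetch);
}
```

Note this gives the site explicit control over *resource* caching; whether the browser also keeps compiled code for those cached responses is a separate, engine-level decision.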


A silly question: How do we reproduce this? https://facebook.com, and I suspect I need to be logged in? Anything else?


Thanks,
Daniel



Mark Hahnenberg

Dec 9, 2015, 5:34:42 PM
to v8-dev, mhahn...@fb.com, voge...@google.com
Apologies for the delayed response :-)


On Monday, November 23, 2015 at 10:35:37 AM UTC-8, Daniel Vogelheim wrote:

>> (1) reducing the small script threshold from 30 KB, which was preventing the background thread from even attempting to parse many scripts.

> We should try this. Which threshold did you choose?
 
I lowered it to 1 byte in an effort to get the script streaming thread to parse as many scripts as possible regardless of size. I realize that this might not be a good strategy since the thread synchronously parses whichever script it gets first, so it actually might decrease overall throughput/utilization. I can try some other settings (e.g. 1 KB), which might give a better signal that the script has a higher probability of being fully downloaded.
 
 
>> (2) forcing the script streaming thread to perform eager rather than lazy parsing in hopes of reducing the total amount of parsing happening on the main thread.

> I expect this to be _very_ bad for memory consumption. The V8 AST - the result of a parse - is unfortunately rather large, so that not parsing everything upfront conserves a lot of memory. I'm skeptical this change would be beneficial in general and think this would require some rather careful benchmarking across different websites and device types.

> That said, I've recently run across several situations - usually large, framework-y web apps - where excessive lazy- and then re-parsing was a problem. I'm quite certain our lazy parse heuristic could be improved; I don't really have a plan for how. If you have any suggestions, I'd be eager to listen.

> Generally speaking, the heuristic tries to "guess" whether a given function will be called during the initial script evaluation. This gets called a lot and hence needs to be fast, and since it needs to decide before the function is even parsed, it only has the function header to go on.

Makes sense. I'm mostly seeing how far I can push scripts to parse in the background, so memory wasn't my main concern. But yes, benchmarking to find sensible heuristics seems like a good plan to me :-)

Re-parsing is an interesting aspect to this. I'll look into this some more to see if we're getting hit by lots of re-parsing.
 
 
>> (3) allowing the main thread to retry posting parsing tasks to the background thread at a later time if the background thread is currently busy.

> This might make sense, but I guess it's difficult to do, since there isn't really a good point when that should get done. See below on task scheduling, though.

>> These tweaks may have moved the needle slightly (I'm not 100% convinced since I haven't done any rigorous measurements), but I'm still seeing a suspiciously high amount of CPU time spent on the main thread parsing JS along with a suspiciously low amount of parsing activity on the background thread when looking at both Instruments profiles and Chrome traces recorded with chrome://tracing. In a simplified test page that tries loading multiple async scripts concurrently, I noticed that even when the script streaming thread parses a script, it then sends a task back to the main thread which then proceeds to do a bunch more lazy parsing! I realize that the background thread isn't allowed to generate code so it must give it back to the main thread, but this additional parsing time surprised me.

> Not sure what you're describing here. There are two things:

> - There's a 'finishing' step to background parsing that needs to happen on the main thread. Basically, background parsing can't modify any global state while the main thread might mutate it, so the background parser builds up a separate data structure, which the main thread will then need to patch into the heap. This happens for every background parse, and I don't see a chance to avoid this without a major rewrite.

I think this is what I was referring to. I saw some references to this step in some of the comments.
 

> - The background parse thread uses the same lazy/eager parse heuristic as the main thread. If it guesses wrongly, then the main thread will have to do a lot of synchronous parsing while it executes.

> You should be able to distinguish these in chrome://tracing, as the first happens before any execute (for that script), while the second happens during an execute.

Cool, I'll check that out.
 

>> I tried adding some more tracing events to get a better picture of what's going on at various times. For example, it seems like we receive JS scripts on the main thread in clusters rather than in an evenly spaced manner which, due to the fact that the background thread synchronously parses a single JS file at a time, causes the other scripts to have to wait to be parsed. A pool of worker threads might help some, but probably won't be sufficient on devices with low core counts. It might be better if we could incrementally parse JS and multiplex multiple JS files on one or two parsing threads. Incremental parsing sounds like a big, potentially invasive effort though.

> Hmm. There are some changes to Chromium's task scheduling in progress that might help us here, and with (3) above. That work is still ongoing, though, and I'm not terribly familiar with it, so I'd need to find the right people to talk to.

Understanding how Chrome scheduling works would be useful even outside of all this parsing stuff :-)
 

>> I realize that forcing more parsing onto the background thread could cause adverse side effects related to page interactivity, but I'd like to be able to experiment with different amounts of parsing on either the main thread or the background thread. I was wondering, given the steps I've taken, if I've missed or misunderstood anything. I think off-main-thread parsing is a great idea, so any help debugging/understanding/improving the script streaming thread would be very much appreciated :-) And while the script streaming stuff was just the first weird thing I stumbled across, given the importance of parsing in general on complex, JS-heavy sites like Facebook, any additional info related to how V8 does parsing and how sites might be able to take advantage of that would be very helpful.

> Generally, we might look at (code) caching. The code cache - if successful - skips the parse entirely. How effective that is of course depends on how often you update how much of your code; and possibly on how exactly you do that.

I've also looked into the V8 code cache a little bit. We seem to have a decent hit rate with a warm cache, but we still spend a non-trivial chunk of time in parsing/code-gen-related work according to Instruments (the exact numbers escape me right now). I've been wanting to dig into this a bit more to get a sense of the heuristics the cache uses w.r.t. things like size, eviction policy, etc.
 

> Another, more complex thing (where I'm not sure about the details) might be Service Workers, since they allow a site to be explicit about what it wants cached, etc. This is not merely an implementation detail, though, and may require some greater rework on your side. Honestly, I'm not sure what the current status of ServiceWorkers is, though.

Facebook is actively investigating using ServiceWorkers more widely for a variety of things, so this might be a possibility.
 


> A silly question: How do we reproduce this? https://facebook.com, and I suspect I need to be logged in? Anything else?

Yep: just refresh the news feed while logged in and wait until the page has stopped changing/loading.

Thanks a bunch!
-Mark