Re: CFI: new (limited) launch is coming soon

6 views
Skip to first unread message

Kentaro Hara

unread,
Aug 17, 2016, 10:36:07 PM8/17/16
to Ivan Krasin, platform-architecture-dev, Elliott Sprehn, Matt Sheets, Kostya Serebryany, Peter Collingbourne, Brett Wilson, Annie Sullivan, Nico Weber, Elliott Friedman, Primiano Tucci, Bruce Dawson, Project TRIM, Nat Duca
+platform-architecture-dev

(Context: Ivan is planning to launch CFI for virtual function calls. See here for the full context.)

Is there any update about the performance regression you'd been observing in blink_perf.layout?

From the Blink perspective, I support this change assuming that the following benchmarks don't regress:

- blink_perf.*
- dromaeo.dom*
- speedometer
- animometer



On Thu, Aug 18, 2016 at 8:52 AM, Ivan Krasin <kra...@google.com> wrote:
Hello everyone,

we're back. :)
In the past month three things happened bringing us closer to no-perf-regressions state:

1. I have landed a few changes to LLVM that allow us to see which virtual call sites / methods have been devirtualized and why. Please, find this wonderful list of 51491 devirtualized site calls pointing to 23149 virtual methods:

Many of the call sites have been devirtualized due to a single implementation of a virtual method. Usually, such methods are virtual, only because there's a real and a test implementation, where Chrome binary only has the real one, and LinkTimeOptimization can see that removing the virtual call. Others have been eliminated by virtual const propagation. All in all, the list looks good, and we know how to devirtualize even more methods, which is a matter of an additional engineering effort.

2. After profiling the observed slowdown, I have identified ~130 methods which are either too hot, or have too many CFI checks, so they affect the overall performance, see https://crbug.com/634139
I plan to add an attribute on top of each such method to disable CFI on them, and that bring us to an overhead that I could not measure. See https://storage.googleapis.com/cfi-stats/2016-08-15/results.html

While 130 seems a lot, that's only ~0.1% of all the methods in Chrome => we'll have it enabled on 99.9%, which is pretty good already.

For the methods which have too many CFI checks, there're ideas how to significantly reduce their number, see, for example: https://crbug.com/638056. That optimization alone should cover 6 out of 8 methods in V8 which we'll need to disable CFI on. That will take time to implement, though, and does not seem like a launch blocker.

3. I have created a launch bug as suggested by Brett: https://crbug.com/638779

krasin

On Sat, Jul 16, 2016 at 3:35 PM, Ivan Krasin <kra...@google.com> wrote:
Sure!
And thank you for your rigor review from the perf side: I was impressed seeing that you went through our open bugs to see what's there.

krasin

On Sat, Jul 16, 2016 at 3:32 PM, Elliott Sprehn <esp...@chromium.org> wrote:

Thanks so much for staying on top of this! Let me know if you need help.

- E


On Jul 16, 2016 11:02 AM, "Ivan Krasin" <kra...@google.com> wrote:

Thank you everyone for the input and help. I hope to come back to this thread, if we find a way to speed this up.





--
Kentaro Hara, Tokyo, Japan

Ivan Krasin

unread,
Aug 18, 2016, 2:30:34 AM8/18/16
to Kentaro Hara, platform-architecture-dev, Elliott Sprehn, Matt Sheets, Kostya Serebryany, Peter Collingbourne, Brett Wilson, Annie Sullivan, Nico Weber, Elliott Friedman, Primiano Tucci, Bruce Dawson, Project TRIM, Nat Duca
Hi Kentaro,

On Wed, Aug 17, 2016 at 7:35 PM, Kentaro Hara <har...@chromium.org> wrote:
+platform-architecture-dev

(Context: Ivan is planning to launch CFI for virtual function calls. See here for the full context.)

Is there any update about the performance regression you'd been observing in blink_perf.layout?
This is literally the point #2 in my update (see above). Sorry if that was not clear. :)
 

From the Blink perspective, I support this change assuming that the following benchmarks don't regress:

- blink_perf.*
 

- dromaeo.dom*
Will collect stats for that tomorrow.
 
- speedometer
- animometer
For these two benchmarks (speedometer and animometer) there was no slowdown even in our previous try.

Kentaro Hara

unread,
Aug 18, 2016, 4:30:10 AM8/18/16
to Ivan Krasin, platform-architecture-dev, Elliott Sprehn, Matt Sheets, Kostya Serebryany, Peter Collingbourne, Brett Wilson, Annie Sullivan, Nico Weber, Elliott Friedman, Primiano Tucci, Bruce Dawson, Project TRIM, Nat Duca
On Thu, Aug 18, 2016 at 3:30 PM, Ivan Krasin <kra...@google.com> wrote:
Hi Kentaro,

On Wed, Aug 17, 2016 at 7:35 PM, Kentaro Hara <har...@chromium.org> wrote:
+platform-architecture-dev

(Context: Ivan is planning to launch CFI for virtual function calls. See here for the full context.)

Is there any update about the performance regression you'd been observing in blink_perf.layout?
This is literally the point #2 in my update (see above). Sorry if that was not clear. :)
 

From the Blink perspective, I support this change assuming that the following benchmarks don't regress:

- blink_perf.*
 

- dromaeo.dom*
Will collect stats for that tomorrow. 
 
- speedometer
- animometer
For these two benchmarks (speedometer and animometer) there was no slowdown even in our previous try. 

Regarding Dromaeo, I think you already collected the data before. At that point there was no regression.

Sorry, reading your email, I thought that you've made some substantial changes to LLVM since the last update, which would have changed (hopefully improved) performance. So I just wanted to say that if you don't observe any regression on the listed benchmarks, I'm fine with shipping it (from the Blink perspective).

Ivan Krasin

unread,
Aug 18, 2016, 4:33:43 AM8/18/16
to Kentaro Hara, platform-architecture-dev, Elliott Sprehn, Matt Sheets, Kostya Serebryany, Peter Collingbourne, Brett Wilson, Annie Sullivan, Nico Weber, Elliott Friedman, Primiano Tucci, Bruce Dawson, Project TRIM, Nat Duca
The substantial changes are related to the visibility of devirtualization. That allows us to both keep track of regressions and understand what else can be devirtualized. No new optimizations have been implemented just yet, but there's a good list of what might be the next.

Yuta Kitamura

unread,
Aug 18, 2016, 5:03:15 AM8/18/16
to Ivan Krasin, Kentaro Hara, platform-architecture-dev, Elliott Sprehn, Matt Sheets, Kostya Serebryany, Peter Collingbourne, Brett Wilson, Annie Sullivan, Nico Weber, Elliott Friedman, Primiano Tucci, Bruce Dawson, Project TRIM, Nat Duca
Just curious, if you apply the optmization without any CFI checks, how much perf gain will you get?

--
You received this message because you are subscribed to the Google Groups "platform-architecture-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to platform-architecture-dev+unsub...@chromium.org.
To post to this group, send email to platform-architecture-dev@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/platform-architecture-dev/CAOei5RezWnUZQ1oDgMZDdrbGG4J9CQoahXsw6EYxx7G17M7cxw%40mail.gmail.com.

Ivan Krasin

unread,
Aug 18, 2016, 1:55:52 PM8/18/16
to Yuta Kitamura, Kentaro Hara, platform-architecture-dev, Elliott Sprehn, Matt Sheets, Kostya Serebryany, Peter Collingbourne, Brett Wilson, Annie Sullivan, Nico Weber, Elliott Friedman, Primiano Tucci, Bruce Dawson, Project TRIM, Nat Duca
Hi Yuta,

this is exactly what currently happens in the official Chrome on Linux x86-64: devirtualization without CFI. When we turned it on, the overall the effect was small (~1%, may be even less), but there had been some blink micro-benchmarks which got faster up to 8%. See more details on https://crbug.com/580389 and https://crbug.com/617283 (the latter is Perf team runs bisects on a few selected microbenchmarks to confirm the speedups)

What we didn't get with turning on devirtualization is an automated report from the Perf dashboard about XX benchmarks improved. Even for these few large speedups, we had to manually look into the graphs to request the bisects. And if if 8% was easy to spot, even 3% is much harder as the level of noise is usually 3%-5% on the perf graphs.

The best thing about devirtualization is that we're not done yet. While not every method could potentially be devirtualized, there's still a number of interesting cases which are shown as hot in the profile.

krasin


On Thu, Aug 18, 2016 at 2:02 AM, Yuta Kitamura <yu...@chromium.org> wrote:
Just curious, if you apply the optmization without any CFI checks, how much perf gain will you get?
To unsubscribe from this group and stop receiving emails from it, send an email to platform-architecture-dev+unsubsc...@chromium.org.

To post to this group, send email to platform-architecture-dev@chromium.org.

Ivan Krasin

unread,
Aug 19, 2016, 7:35:51 PM8/19/16
to Yuta Kitamura, Kentaro Hara, platform-architecture-dev, Elliott Sprehn, Matt Sheets, Kostya Serebryany, Peter Collingbourne, Brett Wilson, Annie Sullivan, Nico Weber, Elliott Friedman, Primiano Tucci, Bruce Dawson, Project TRIM, Nat Duca
Hi everyone,

this is just an FYI.

I made the second attempt to enable CFI on Linux x86-64: https://codereview.chromium.org/2259293002/
This is still not a launch, unless we got our launch bug approved (https://crbug.com/638779), the CL will be reverted next week. The intent is to reassess perf impact and make one-two iterations on adding more hot methods to the blacklist if needed, but the chances for that need are low; I was pretty aggressive already).

So far, it's only the binary size (4.9% larger), in line with the predictions. I don't expect any more regressions reported (unless they are unrelated to the launch and caused by adjacent changes from others), and to see if we regress by 2%-3% on some microbenchmarks will require to look the graphs in a day or so: such small changes are impossible to spot given just a couple of points.

krasin

Ivan Krasin

unread,
Aug 21, 2016, 3:19:05 PM8/21/16
to Yuta Kitamura, Kentaro Hara, platform-architecture-dev, Elliott Sprehn, Matt Sheets, Kostya Serebryany, Peter Collingbourne, Brett Wilson, Annie Sullivan, Nico Weber, Elliott Friedman, Primiano Tucci, Bruce Dawson, Project TRIM, Nat Duca
There're no real regressions happened: https://chromeperf.appspot.com/report?sid=f29b95ffd471515fe48e6a8dd5a13592611997bbf5c4163454a3bcf14a3eb232
Initially (r413252), there was a spike caused by my mistake in defining the blacklist, which I have since fixed: https://codereview.chromium.org/2267543002/ and then a I added a few more methods to the list (just to be on the safe side). 

On a related note, once the new Clang toolchain was rolled, the performance of large-table-with-collapsed-borders-and-colspans-wider-than-table.html (proved to be the most challenging last time) was improved by ~4%. The improvement is independent of our devirtualization efforts (as it got faster on Mac too), but it shows how noisy this microbenchmark is.

Anyway, I am pretty happy regarding to the Perf impact. I would appreciate if someone took a critical look at the graphs and confirm / refute my statement. I would not be surprised some graphs to show ~2% degradation, and I am willing to look into them and add more things to the blacklist, but it would be bad and feels unlikely, if something is 10% slower.

krasin

Kentaro Hara

unread,
Aug 21, 2016, 9:06:24 PM8/21/16
to Ivan Krasin, Yuta Kitamura, platform-architecture-dev, Elliott Sprehn, Matt Sheets, Kostya Serebryany, Peter Collingbourne, Brett Wilson, Annie Sullivan, Nico Weber, Elliott Friedman, Primiano Tucci, Bruce Dawson, Project TRIM, Nat Duca
I don't see any substantial regression in the graph. Congrats!

What's the effect on the binary size? (Sorry if you have mentioned that earlier -- this thread is getting a bit too messy to understand the latest status :-)




You received this message because you are subscribed to the Google Groups "Project TRIM" group.
To unsubscribe from this group and stop receiving emails from it, send an email to project-trim+unsubscribe@chromium.org.
To post to this group, send email to projec...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/project-trim/CAOei5RdJiM4RbHPmLi3QkB7zNJJ_7WT09G8ZN%3DA5K88zDyVXow%40mail.gmail.com.

Ivan Krasin

unread,
Aug 21, 2016, 10:03:48 PM8/21/16
to Kentaro Hara, Yuta Kitamura, platform-architecture-dev, Elliott Sprehn, Matt Sheets, Kostya Serebryany, Peter Collingbourne, Brett Wilson, Annie Sullivan, Nico Weber, Elliott Friedman, Primiano Tucci, Bruce Dawson, Project TRIM, Nat Duca
Hi Kentaro,

thank you for looking into this. The binary size increased by 4.9%.

If Chrome adopts relative ABI for vtables, that could bring up to 9% in binary size savings: https://bugs.chromium.org/p/chromium/issues/detail?id=589384#c2 but it's not yet clear if it's a win from the perf perspective. We're looking into this, but I am not terribly optimistic.

krasin
Reply all
Reply to author
Forward
0 new messages