base::FeatureParam<T> optimization with a static local cache

9 views
Skip to first unread message

Takashi Toyoshima

unread,
Aug 8, 2024, 9:25:38 PM8/8/24
to binar...@chromium.org, anth...@chromium.org, Alexei Svitkine, Gabriel Charette
Hi, binary-size members!

I'm now working on a CL that introduces macros to provide a function local static cache for base::FeatureParam<T>.

In my POC, https://chromium-review.googlesource.com/c/chromium/src/+/5557049/63, it shows 1-5ms speed up on each speedometer3 benchmark, but results in 14.5 KiB APK size increase.

Is this acceptable, or should we consider applying the cache only for effective cases to minimize the binary size impact?

Currently, I think 3 choices;
1) enforce the cache to all use cases; 14.5KiB increase
2) apply the cache only to commonly effective cases, i.e. just exclude std::string cases
3) apply the cache manually to minimum cases

1) is most beneficial from the viewpoint of performance, but it would be a balance between the gain vs binary size. I want to decide the best choice in this mail list.


Thanks.


--
Takashi Toyoshima
Software Engineer, Google

Andrew Grieve

unread,
Aug 8, 2024, 9:46:11 PM8/8/24
to Takashi Toyoshima, binar...@chromium.org, anth...@chromium.org, Alexei Svitkine, Gabriel Charette
I don't think hand-selecting params to cache will age well, so I think #3 is not meaningfully different from the status quo.

I don't have a good feel on whether #2 is easy to do or how much it would save, but it seems like a nice idea if it still gets you the wins we're hoping for.

Ultimately, I think those CC'ed would have a better idea of what's best than the binary size team.

--
You received this message because you are subscribed to the Google Groups "binary-size" group.
To unsubscribe from this group and stop receiving emails from it, send an email to binary-size...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/binary-size/CAFWCB1kLKUdA6-F4kU-ruPLuRA-Zg9tZnMCQorZwTivBCdPD-g%40mail.gmail.com.

Takashi Toyoshima

unread,
Aug 8, 2024, 9:59:40 PM8/8/24
to Andrew Grieve, binar...@chromium.org, anth...@chromium.org, Alexei Svitkine, Gabriel Charette, Scott Violet
On Fri, Aug 9, 2024 at 10:46 AM Andrew Grieve <agr...@chromium.org> wrote:
I don't think hand-selecting params to cache will age well, so I think #3 is not meaningfully different from the status quo.

I don't have a good feel on whether #2 is easy to do or how much it would save, but it seems like a nice idea if it still gets you the wins we're hoping for.

Ultimately, I think those CC'ed would have a better idea of what's best than the binary size team.

OK, so let me also invite sky@ to the explicit CC list.
He drives the performance efforts and will know well how to interpret the benchmark result.

On Thu, Aug 8, 2024 at 9:25 PM Takashi Toyoshima <toyo...@chromium.org> wrote:
Hi, binary-size members!

I'm now working on a CL that introduces macros to provide a function local static cache for base::FeatureParam<T>.

In my POC, https://chromium-review.googlesource.com/c/chromium/src/+/5557049/63, it shows 1-5ms speed up on each speedometer3 benchmark, but results in 14.5 KiB APK size increase.

Is this acceptable, or should we consider applying the cache only for effective cases to minimize the binary size impact?

Currently, I think 3 choices;
1) enforce the cache to all use cases; 14.5KiB increase
2) apply the cache only to commonly effective cases, i.e. just exclude std::string cases
3) apply the cache manually to minimum cases

1) is most beneficial from the viewpoint of performance, but it would be a balance between the gain vs binary size. I want to decide the best choice in this mail list.


Thanks.


--
Takashi Toyoshima
Software Engineer, Google

--
You received this message because you are subscribed to the Google Groups "binary-size" group.
To unsubscribe from this group and stop receiving emails from it, send an email to binary-size...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/binary-size/CAFWCB1kLKUdA6-F4kU-ruPLuRA-Zg9tZnMCQorZwTivBCdPD-g%40mail.gmail.com.

Takashi Toyoshima

unread,
Aug 8, 2024, 10:27:38 PM8/8/24
to Andrew Grieve, binar...@chromium.org, anth...@chromium.org, Alexei Svitkine, Gabriel Charette, Scott Violet
I also put another data point into the design doc.
In my attempt, 135 FeatureParams were adopted.
So, roughly said the cost is extra 100 bytes per FeatureParam instance with a cache.

When we implement a feature, we need more binary size to implement the feature itself, and extra 100 bytes would not dominate the binary size increase in the long run. That's my expectation.

Thanks,

Scott Violet

unread,
Aug 9, 2024, 1:27:57 PM8/9/24
to Takashi Toyoshima, Andrew Grieve, binar...@chromium.org, anth...@chromium.org, Alexei Svitkine, Gabriel Charette
On Thu, Aug 8, 2024 at 6:59 PM Takashi Toyoshima <toyo...@chromium.org> wrote:


On Fri, Aug 9, 2024 at 10:46 AM Andrew Grieve <agr...@chromium.org> wrote:
I don't think hand-selecting params to cache will age well, so I think #3 is not meaningfully different from the status quo.

I don't have a good feel on whether #2 is easy to do or how much it would save, but it seems like a nice idea if it still gets you the wins we're hoping for.

Ultimately, I think those CC'ed would have a better idea of what's best than the binary size team.

OK, so let me also invite sky@ to the explicit CC list.
He drives the performance efforts and will know well how to interpret the benchmark result.

I ran a pinpoint for the change using the mac bot here: https://pinpoint-dot-chromeperf.appspot.com/job/167a6624210000 . It is not showing the gains shown on the android bot. For the competitive benchmark effort we're largely focused on the mac, and since the patch didn't move the needle on the mac, the decision is largely up to the android owners. That said, I believe we have an OKR of improving speedometer on large screen android devices. If this patch impacts large screen android in a similar way, then it may be worth pursuing based on that OKR.

  -Scott

Alexei Svitkine

unread,
Aug 9, 2024, 5:07:34 PM8/9/24
to Scott Violet, Takashi Toyoshima, Andrew Grieve, binar...@chromium.org, anth...@chromium.org, Gabriel Charette
I strongly feel that if we do add this, it should be the "2) apply the cache only to commonly effective cases, i.e. just exclude std::string cases".

The reason is that caching these values also incurs additional memory use, not just binary size, and some string params can be quite large (e.g. a few KB) and may not benefit from any caching since they might just be parsed once.

Takashi Toyoshima

unread,
Aug 15, 2024, 10:40:53 AM8/15/24
to Alexei Svitkine, Scott Violet, Andrew Grieve, binar...@chromium.org, anth...@chromium.org, Gabriel Charette
Hi, sorry for a blank period, but now I'm back from a vacation.

> Scott,
The last patchset already excluded some params. So, I ran the benchmark set with multiple configurations for comparison.

This applies the cache to most params in Blink, but except for enum cases
APK Size: +19.1KiB

M1 mac: 1.04ms improvement in total (20 cases), 0.046 point score up
Android: 18.11ms improvement in total (20 cases), 0.021 point score up

So, the M1 mac gain of the running time is much smaller than the Android case as you said.
But it seems the score has increased more on mac (though I'm not sure why, and how the score is calculated).

Also, our team's OKR focuses also on Android. Actually we use a mid-range Android phone as a reference.
So, for my team, this is still a big win, and I want to seek a common ground to land it.

> Alexei
Here is another try, to apply the cache to major params in Blink except for enum and std::string cases
APK Size: +13.7KiB
Android try jos is still running

APK Size diff seems to be affected only by applied params count and it matches my assumption.
But as you said, runtime memory use would be different.
If you feel the APK Size diff is OK, but have a concern on runtime memory use, I think we can run a finch experiment as std::string would not be initialized until the first cache code run, and the memory use can be gated by the finch.

Or if the perf result is almost the same with the previous one, just excluding std::string cases would be the best.
It's likely to be true that most std::string use cases call it only once, like they already have a local cache, or parsed the string once. Also they were not costly originally compared to other types that need to be parsed.

Scott Violet

unread,
Aug 15, 2024, 12:26:01 PM8/15/24
to Takashi Toyoshima, Alexei Svitkine, Andrew Grieve, binar...@chromium.org, anth...@chromium.org, Gabriel Charette
On Thu, Aug 15, 2024 at 7:40 AM Takashi Toyoshima <toyo...@chromium.org> wrote:
Hi, sorry for a blank period, but now I'm back from a vacation.

Welcome back. I hope you enjoyed your vacation. 

> Scott,
The last patchset already excluded some params. So, I ran the benchmark set with multiple configurations for comparison.

This applies the cache to most params in Blink, but except for enum cases
APK Size: +19.1KiB

M1 mac: 1.04ms improvement in total (20 cases), 0.046 point score up
Android: 18.11ms improvement in total (20 cases), 0.021 point score up

So, the M1 mac gain of the running time is much smaller than the Android case as you said.
But it seems the score has increased more on mac (though I'm not sure why, and how the score is calculated).

Also, our team's OKR focuses also on Android. Actually we use a mid-range Android phone as a reference.
So, for my team, this is still a big win, and I want to seek a common ground to land it.

I'm glad to hear this is a win for both. Yay! My primary focus is on the mac side of things. I definitely appreciate any gain (at the moment we are behind on Mac, so any gain is most definitely appreciated). That said, we have to be mindful of other priorities as well. I believe agrieve is the decision maker for android binary size and can better weigh on if the increase in size is acceptable.

  -Scott

Takashi Toyoshima

unread,
Aug 16, 2024, 12:53:18 AM8/16/24
to Scott Violet, Alexei Svitkine, Andrew Grieve, binar...@chromium.org, anth...@chromium.org, Gabriel Charette
Thank you for your comment, Scott!

Andrew, can you make the final decision, based on our performance discussion?
I'd like to try the 2) based approach.

The result of no string case shows less gains. So, I feel I need to take a look at each case because this result indicates a few cases may affect performance a lot.

I will land the infra change first, then continue to land follow-up CLs that enable the cache for manually selected tens of effective cases per CL. I will track the APK size diff for each CL, and pay attention to the total diff of the series of CLs.

Andrew Grieve

unread,
Aug 16, 2024, 9:32:29 AM8/16/24
to Takashi Toyoshima, Scott Violet, Alexei Svitkine, Andrew Grieve, binar...@chromium.org, anth...@chromium.org, Gabriel Charette
Yes, the APK size gain is fine. Hurray for faster!

Takashi Toyoshima

unread,
Aug 19, 2024, 2:55:32 AM8/19/24
to Andrew Grieve, Scott Violet, Alexei Svitkine, binar...@chromium.org, anth...@chromium.org, Gabriel Charette
Thanks.

I'm landing the base change.

Also, I created this bug entry to track all related works that will enable the cache for chosen params.
Reply all
Reply to author
Forward
0 new messages