SECURITY_DCHECK to SECURITY_CHECK in casting.h ?

304 views
Skip to first unread message

Daniel Vogelheim

unread,
Feb 15, 2023, 7:58:08 AM2/15/23
to Kent Tamura, Nico Weber, Kentaro Hara, platform-architecture-dev, Alex Gough
Hello, friends & owners of platform/wtf/casting.h,

I help out in Chrome's VRP program, and we noticed a steady stream of memory corruption bugs that trip over the SECURITY_DCHECKs in casting.h (e.g. in ASAN). In release builds, execution would continue since these are DCHECKs. That makes many of these bugs exploitable.

In 2022 alone, the VRP rewarded 62.5 k$ to 16 memory corruption bugs that the checks in casting.h would have caught. There's 3 fresh bugs for 2023 already.

I wonder why these are DCHECK in the first place, and whether this could be turned into a proper CHECK. If these were to crash the tab in release builds, the issues would become unexploitable, or certainly much more difficult to exploit. Also, the issues would gain more visibility in crash reports, making it more likely that the underlying root cause would get fixed. The default reason for DCHECK vs CHECK seems to be performance concerns. A trial CL shows a low but measurable effect.

I am not sure how to weigh this. Would it make sense to upgrade the two checks in casting.h to SECURITY_CHECKs? 

Sincerely,
Daniel



dan...@chromium.org

unread,
Feb 15, 2023, 9:12:11 AM2/15/23
to Daniel Vogelheim, Peter Boström, Kent Tamura, Nico Weber, Kentaro Hara, platform-architecture-dev, Alex Gough
+Peter Boström

In general we are shifting our take on DCHECKs, and leaning toward CHECKs, though wholesale transition without care risks stability problems. Not that it means we should not convert DCHECKs to CHECKs but that we want to put effort into finding the mistakes they would catch before they catch them in stable.

Some very good context here: https://www.youtube.com/watch?v=YDCNICYiq08

So, yes, it would make these CHECKs. It would be worth attempting to flush out bugs and fix them first with DumpWithoutCrashing. You can sync up with pbos@ for how to go about this cleanly.


--
You received this message because you are subscribed to the Google Groups "platform-architecture-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to platform-architect...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/platform-architecture-dev/CALG6KPPDK7gDaguRaMvRW%2B_NuYPQWL%2Boq-qrC0aO7H3pZqzwuQ%40mail.gmail.com.

Peter Boström

unread,
Feb 15, 2023, 9:23:42 AM2/15/23
to dan...@chromium.org, Daniel Vogelheim, Kent Tamura, Nico Weber, Kentaro Hara, platform-architecture-dev, Alex Gough
It's hard to tell without more blink expertise, but it looks to me like that code may be a hotspot and if so then checking IsA<T> may be a performance issue as it doesn't trivially look trivial.

I don't think I've seen that ::To signature show up ever in our DCHECK builds, so I would assume it's fairly rare and that the slower DumpWithoutCrashing rollout would be unnecessary. If I'm wrong then hand-wavy prefer crash over the UB just about to happen?

If you can show that we can take the performance hit or that this doesn't move the benchmarks, then I think you should just replace this with a SECURITY_CHECK without the flush-out attempt.

Peter Boström

unread,
Feb 15, 2023, 9:32:01 AM2/15/23
to dan...@chromium.org, Daniel Vogelheim, Kent Tamura, Nico Weber, Kentaro Hara, platform-architecture-dev, Alex Gough
Sorry I only read Dana's email not your original email. I'd run a Speedometer job against the M1 bots as well, with ~100 repeat count. That CL shows a 1.2% increase in Speedometer runtime, which if it holds over more repeats is seen as significant. I've had CLs reverted in that range.

"The regression was ~1% (maybe a bit less) on the overall score of Speedometer2. That's pretty substantial and usually takes up many person weeks to recover"

Not sure about the security gains here. This also rules out rolling this out with base::debug::DumpWithoutCrashing() which has at least comparable performance overhead to CHECK.

dan...@chromium.org

unread,
Feb 15, 2023, 9:43:16 AM2/15/23
to Peter Boström, Daniel Vogelheim, Kent Tamura, Nico Weber, Kentaro Hara, platform-architecture-dev, Alex Gough
If the performance is an issue we can consider introducing two paths.

A commonly used path that does CHECK and a clearly-marked dangerous path that does DCHECK and is only used in places that are extremely hot. And we should then make effort to encapsulate the pointers that go through the DCHECK path such that it's possible to actually reason thru their types.

Daniel Vogelheim

unread,
Feb 15, 2023, 9:46:02 AM2/15/23
to Peter Boström, dan...@chromium.org, Kent Tamura, Nico Weber, Kentaro Hara, platform-architecture-dev, Alex Gough
On Wed, Feb 15, 2023 at 3:31 PM Peter Boström <pb...@chromium.org> wrote:
Sorry I only read Dana's email not your original email. I'd run a Speedometer job against the M1 bots as well, with ~100 repeat count.

In progress. I'll report back when I have results.

Peter Boström

unread,
Feb 15, 2023, 9:49:31 AM2/15/23
to dan...@chromium.org, Daniel Vogelheim, Kent Tamura, Nico Weber, Kentaro Hara, platform-architecture-dev, Alex Gough
That seems reasonable to me (and way better than Just Giving Up). ::ToUnsafe<T> would not be bad naming if you need ideas.

Daniel Vogelheim

unread,
Feb 15, 2023, 1:02:03 PM2/15/23
to Peter Boström, dan...@chromium.org, Kent Tamura, Nico Weber, Kentaro Hara, platform-architecture-dev, Alex Gough

The perf delta is uneven across platforms: M1: ~0.3%; x64/Win10: ~1.5%

Peter Boström

unread,
Feb 15, 2023, 1:08:46 PM2/15/23
to Daniel Vogelheim, dan...@chromium.org, Kent Tamura, Nico Weber, Kentaro Hara, platform-architecture-dev, Alex Gough
I'd say the ~1.5% regression is probably significant enough that you want to figure out what's up and see if you can have an unsafe version of ::To that only DCHECKs (if that regression is attributable to only a few call sites that can be well audited). 0.3% is probably "not nothing" either.

Dave Tapuska

unread,
Feb 15, 2023, 1:16:44 PM2/15/23
to Peter Boström, Daniel Vogelheim, dan...@chromium.org, Kent Tamura, Nico Weber, Kentaro Hara, platform-architecture-dev, Alex Gough
I know this is a public forum, so I'm just discussing in terms of percentages. From my understanding 84% of the bugs were found via clusterfuzz weren't they? So that already has the DCHECK on and enabling this wouldn't change that source of bugs. Leaving roughly ~11% of real bugs coming from external reports. 

dave.

--
You received this message because you are subscribed to the Google Groups "platform-architecture-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to platform-architect...@chromium.org.

dan...@chromium.org

unread,
Feb 15, 2023, 1:30:19 PM2/15/23
to Dave Tapuska, Peter Boström, Daniel Vogelheim, Kent Tamura, Nico Weber, Kentaro Hara, platform-architecture-dev, Alex Gough
On Wed, Feb 15, 2023 at 1:16 PM Dave Tapuska <dtap...@chromium.org> wrote:
I know this is a public forum, so I'm just discussing in terms of percentages. From my understanding 84% of the bugs were found via clusterfuzz weren't they? So that already has the DCHECK on and enabling this wouldn't change that source of bugs. Leaving roughly ~11% of real bugs coming from external reports. 

I think the real problem is the rest of the bugs that are not represented in either number as they are not known but are being shipped in chromium. The high quantity here suggests there's many others that aren't known.

Alex Gough

unread,
Feb 15, 2023, 1:56:36 PM2/15/23
to dan...@chromium.org, Dave Tapuska, Peter Boström, Daniel Vogelheim, Kent Tamura, Nico Weber, Kentaro Hara, platform-architecture-dev
I'm going to push back a bit here on labelling clusterfuzz bugs as less real - those are exactly the ones that well resourced adversaries can also find by fuzzing.

Thanks Daniel for giving this a go - 0.5-1.0% is exactly in the difficult area of "is the security worth the perf" but getting that down a bit should be enough to land some hardening.

Alex

Daniel Cheng

unread,
Feb 15, 2023, 5:19:49 PM2/15/23
to Alex Gough, dan...@chromium.org, Dave Tapuska, Peter Boström, Daniel Vogelheim, Kent Tamura, Nico Weber, Kentaro Hara, platform-architecture-dev
To give some context here, I've long wanted to change those to be CHECK instead of DCHECK. The main reason they haven't been changed is because of the potential perf regressions (as noted in this thread).

Ideally, we would profile and see where the hotspots are and relax (hopefully) just a few places to use UnsafeTo<> and then allow To<> to safely CHECK() by default.

Daniel

Dave Tapuska

unread,
Feb 15, 2023, 5:29:12 PM2/15/23
to Daniel Cheng, Alex Gough, dan...@chromium.org, Peter Boström, Daniel Vogelheim, Kent Tamura, Nico Weber, Kentaro Hara, platform-architecture-dev
We should likely change the DynamicTo to use UnsafeTo to avoid two virtual calls to IsA<...> in any prototype. 

I wasn't saying clusterfuzz bugs weren't real, I was just saying changing this wouldn't impact that source of bugs since DCHECK was already on there. But I do agree with Dana that there could be a source of undiscovered bugs that these could catch. 

dave.

Kentaro Hara

unread,
Feb 15, 2023, 6:46:37 PM2/15/23
to Dave Tapuska, Daniel Cheng, Alex Gough, dan...@chromium.org, Peter Boström, Daniel Vogelheim, Kent Tamura, Nico Weber, platform-architecture-dev
My bet:

Replace DCHECK with CHECK in all directories other than core/{dom,css,style,html}, and measure performance :D


--
Kentaro Hara, Tokyo

Daniel Vogelheim

unread,
Feb 16, 2023, 7:44:04 AM2/16/23
to Kentaro Hara, Dave Tapuska, Daniel Cheng, Alex Gough, dan...@chromium.org, Peter Boström, Kent Tamura, Nico Weber, platform-architecture-dev
Hi all,

I was going to take Kentaro's bet, but then noticed that the two To<.> variants in question have around 300 usages each, but that very many of them are in generated code. So my first try is allowing unsafe use from code generators only. (Patch set 2 of 4249403.) I'll report back when I have results.

In the meantime: Would anyone have an explanation why the performance penalty would be greater on one platform vs another? If I take the pinpoint numbers at face value, then the checks are 5x more expensive on x64/Win vs M1/Mac (+1.5% vs +0.3% increase; relative to the base performance). I find that rather strange.

Daniel

Daniel Cheng

unread,
Feb 16, 2023, 8:15:14 AM2/16/23
to Daniel Vogelheim, Kentaro Hara, Dave Tapuska, Alex Gough, dan...@chromium.org, Peter Boström, Kent Tamura, Nico Weber, platform-architecture-dev
On Thu, 16 Feb 2023 at 04:44, Daniel Vogelheim <voge...@google.com> wrote:
Hi all,

I was going to take Kentaro's bet, but then noticed that the two To<.> variants in question have around 300 usages each, but that very many of them are in generated code. So my first try is allowing unsafe use from code generators only. (Patch set 2 of 4249403.) I'll report back when I have results.

This sounds like a reasonable approach to me. I would actually feel bad about excluding so much non-generated code otherwise, because I'd wonder how many type confusion bugs we've had in those directories given their heavy use of our manual RTTI hierarchy.
 

In the meantime: Would anyone have an explanation why the performance penalty would be greater on one platform vs another? If I take the pinpoint numbers at face value, then the checks are 5x more expensive on x64/Win vs M1/Mac (+1.5% vs +0.3% increase; relative to the base performance). I find that rather strange.

I don't know exactly the underlying reasons, but others have made observations along similar lines: when running Meet and sharing a window, benchmarks on non-M1 platforms have a fairly sizable drop in score while a M1 Pro was apparently minimally affected.

K. Moon

unread,
Feb 16, 2023, 11:44:44 AM2/16/23
to Daniel Cheng, Daniel Vogelheim, Kentaro Hara, Dave Tapuska, Alex Gough, dan...@chromium.org, Peter Boström, Kent Tamura, Nico Weber, platform-architecture-dev
One guess: M1 has a pretty highly optimized memory system. One source claims this is due to a high level of parallelism in issuing memory accesses:
https://lemire.me/blog/2021/01/06/memory-access-on-the-apple-m1-processor/

Daniel Vogelheim

unread,
Feb 16, 2023, 1:32:12 PM2/16/23
to Kentaro Hara, Dave Tapuska, Daniel Cheng, Alex Gough, dan...@chromium.org, Peter Boström, Kent Tamura, Nico Weber, platform-architecture-dev
On Thu, Feb 16, 2023 at 1:43 PM Daniel Vogelheim <voge...@google.com> wrote:
I was going to take Kentaro's bet, but then noticed that the two To<.> variants in question have around 300 usages each, but that very many of them are in generated code. So my first try is allowing unsafe use from code generators only. (Patch set 2 of 4249403.) I'll report back when I have results.


So about 1/3 of the cost was generated code.

I found some more generated usages and am running those, too, but I'm not expecting those to move the needle by a lot.

Michael Lippautz

unread,
Feb 20, 2023, 5:00:47 AM2/20/23
to Daniel Vogelheim, Kentaro Hara, Dave Tapuska, Daniel Cheng, Alex Gough, dan...@chromium.org, Peter Boström, Kent Tamura, Nico Weber, platform-architecture-dev
Thanks for checking performance on these changes upfront. 

You can use crossbench to find the hot paths that are affected by e.g. Speedometer2. The tool can spit out a pprof profile, similar to this one (sorry Google-only), which you can nicely search for bottlenecks. This can give you hints for where to place exceptions for hot paths in the DOM. Just make sure to compile with `symbol_level=2` to have symbols for inlined frames.

A regression for more usage of `IsA<>()` is not surprising to me as these already show up in the profile today as something that we spend significant time in.

-Michael 

Daniel Vogelheim

unread,
Feb 21, 2023, 7:46:58 AM2/21/23
to Michael Lippautz, Kentaro Hara, Dave Tapuska, Daniel Cheng, Alex Gough, dan...@chromium.org, Peter Boström, Kent Tamura, Nico Weber, platform-architecture-dev
On Mon, Feb 20, 2023 at 11:00 AM Michael Lippautz <mlip...@chromium.org> wrote:
Thanks for checking performance on these changes upfront. 

You can use crossbench to find the hot paths that are affected by e.g. Speedometer2. The tool can spit out a pprof profile, similar to this one (sorry Google-only), which you can nicely search for bottlenecks. This can give you hints for where to place exceptions for hot paths in the DOM. Just make sure to compile with `symbol_level=2` to have symbols for inlined frames.

A regression for more usage of `IsA<>()` is not surprising to me as these already show up in the profile today as something that we spend significant time in.

Thanks, I didn't know about crossbench. 
A cursory look at the profile suggests there's a handful of large users. I tried excluding those from the checks, but that hasn't moved the needle. (Still at ~+1% for x64/Win.)

I think I've about exhausted the easy options. The initial VRP analysis suggests to me that solving this would be worth several SWE-months. I'll bring this up with my manager (currently ooo) to see whether my team has an interest in taking this on.

Will Harris

unread,
May 23, 2023, 5:17:56 PM5/23/23
to platform-architecture-dev, Daniel Vogelheim, Kentaro Hara, Dave Tapuska, Daniel Cheng, Alex Gough, danakj, Peter Boström, Kent Tamura, Nico Weber, platform-architecture-dev, Michael Lippautz
this same SECURITY_DCHECK would have caught another security bug see https://bugs.chromium.org/p/chromium/issues/detail?id=1448032

Given we've made quite a few improvements in perf through other projects, can we "spend" some of these "perfcoins" and change the occurrence in casting.h to a CHECK?


I think this would be a meaningful security improvement. Who needs to approve this sort of change?

Will 

On Tuesday, February 21, 2023 at 4:46:58 AM UTC-8 Daniel Vogelheim wrote:
On Mon, Feb 20, 2023 at 11:00 AM Michael Lippautz <mlip...@chromium.org> wrote:
Thanks for checking performance on these changes upfront. 

You can use crossbench to find the hot paths that are affected by e.g. Speedometer2. The tool can spit out a pprof profile, similar to this one (sorry Google-only), which you can nicely search for bottlenecks. This can give you hints for where to place exceptions for hot paths in the DOM. Just make sure to compile with `symbol_level=2` to have symbols for inlined frames.

A regression for more usage of `IsA<>()` is not surprising to me as these already show up in the profile today as something that we spend significant time in.

Thanks, I didn't know about crossbench. 
A cursory look at the profile suggests there's a handful of large users. I tried excluding those from the checks, but that hasn't moved the needle. (Still at ~+1% for x64/Win.)

I think I've about exhausted the easy options. The initial VRP analysis suggests to me that solving this would be worth several SWE-months. I'll bring this up with my manager (currently ooo) to see whether my team has an interest in taking this on.

Daniel


dave.

To unsubscribe from this group and stop receiving emails from it, send an email to platform-architecture-dev+unsub...@chromium.org.

--
You received this message because you are subscribed to the Google Groups "platform-architecture-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to platform-architecture-dev+unsub...@chromium.org.


--
Kentaro Hara, Tokyo

--
You received this message because you are subscribed to the Google Groups "platform-architecture-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to platform-architecture-dev+unsub...@chromium.org.

dan...@chromium.org

unread,
May 24, 2023, 11:41:22 AM5/24/23
to Will Harris, platform-architecture-dev, Daniel Vogelheim, Kentaro Hara, Dave Tapuska, Daniel Cheng, Alex Gough, Peter Boström, Kent Tamura, Nico Weber, Michael Lippautz
On Tue, May 23, 2023 at 5:18 PM Will Harris <w...@chromium.org> wrote:
this same SECURITY_DCHECK would have caught another security bug see https://bugs.chromium.org/p/chromium/issues/detail?id=1448032

Given we've made quite a few improvements in perf through other projects, can we "spend" some of these "perfcoins" and change the occurrence in casting.h to a CHECK?


I think this would be a meaningful security improvement. Who needs to approve this sort of change?

There is no one who can approve it, which is a problem.

We keep having security bugs so we should feel empowered to do this IMO.

Preventing security bugs are of huge importance to our product, for each bug that happens we should determine a way to prevent that same bug from ever happening again in another place. This is our way to do it for these casts, and we don't know any better way.
 

Will 

On Tuesday, February 21, 2023 at 4:46:58 AM UTC-8 Daniel Vogelheim wrote:
On Mon, Feb 20, 2023 at 11:00 AM Michael Lippautz <mlip...@chromium.org> wrote:
Thanks for checking performance on these changes upfront. 

You can use crossbench to find the hot paths that are affected by e.g. Speedometer2. The tool can spit out a pprof profile, similar to this one (sorry Google-only), which you can nicely search for bottlenecks. This can give you hints for where to place exceptions for hot paths in the DOM. Just make sure to compile with `symbol_level=2` to have symbols for inlined frames.

A regression for more usage of `IsA<>()` is not surprising to me as these already show up in the profile today as something that we spend significant time in.

Thanks, I didn't know about crossbench. 
A cursory look at the profile suggests there's a handful of large users. I tried excluding those from the checks, but that hasn't moved the needle. (Still at ~+1% for x64/Win.)

I think I've about exhausted the easy options. The initial VRP analysis suggests to me that solving this would be worth several SWE-months. I'll bring this up with my manager (currently ooo) to see whether my team has an interest in taking this on.

Daniel


dave.

To unsubscribe from this group and stop receiving emails from it, send an email to platform-architect...@chromium.org.

--
You received this message because you are subscribed to the Google Groups "platform-architecture-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to platform-architect...@chromium.org.


--
Kentaro Hara, Tokyo

--
You received this message because you are subscribed to the Google Groups "platform-architecture-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to platform-architect...@chromium.org.

Peter Boström

unread,
May 24, 2023, 12:19:31 PM5/24/23
to dan...@chromium.org, Will Harris, platform-architecture-dev, Daniel Vogelheim, Kentaro Hara, Dave Tapuska, Daniel Cheng, Alex Gough, Kent Tamura, Nico Weber, Michael Lippautz
This seems to me like either an ATL question or something that ATL should be able to help with an approval path for.

Will Harris

unread,
May 24, 2023, 1:18:06 PM5/24/23
to platform-architecture-dev, Peter Boström, Will Harris, platform-architecture-dev, Daniel Vogelheim, Kentaro Hara, Dave Tapuska, Daniel Cheng, Alex Gough, Kent Tamura, Nico Weber, Michael Lippautz, danakj
Okay that seems reasonable. I put a CL up for review and sent it to rbyers@ who seems like the ATL most responsible for web platform.

Thanks,

Will

Will 


Daniel


dave.

To unsubscribe from this group and stop receiving emails from it, send an email to platform-architecture-dev+unsub...@chromium.org.

--
You received this message because you are subscribed to the Google Groups "platform-architecture-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to platform-architecture-dev+unsub...@chromium.org.


--
Kentaro Hara, Tokyo

--
You received this message because you are subscribed to the Google Groups "platform-architecture-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to platform-architecture-dev+unsub...@chromium.org.

Rick Byers

unread,
May 24, 2023, 2:03:31 PM5/24/23
to platform-architecture-dev, Will Harris, Peter Boström, platform-architecture-dev, Daniel Vogelheim, Kentaro Hara, Dave Tapuska, Daniel Cheng, Alex Gough, Kent Tamura, Nico Weber, Michael Lippautz, danakj, Scott Violet, chrome-atls-discuss, Mike Wittman
Hey folks,
I agree with the principle that we should be willing to trade some perf for concrete security wins (as we obviously do with sandboxing etc.). I find the security argument here pretty compelling given the VRP payouts and I like that metric as a concrete way to get a rough measure of security value to be traded off against other sorts of value.

But to do that tradeoff properly I think we need to dive a little deeper into the perf costs. +Scott has asked for updated pinpoint jobs. Scott, once we have that, can you own putting the perf costs in perspective (eg. relative to the benchmark gains y'all have gotten)? I wonder if we also need to try to do some analysis of in-the-wild perf costs too? Eg. if this meaningfully regresses LCP for a non-trivial fraction of the web then that would be a tough cost to eat also. +wittman for any thoughts. Any chance we could finch this, even though that would also add a bunch of overhead? Or is a binary A/B test the only practical option?

Also do we know if anyone has gone searching for micro-optimization opportunities in the IsA path yet? Best case we find some way to speed that up which more than offsets the cost here.

Rick