Hi blink-dev,We all love asserts; they make our assumptions explicit and get the machine to check them for us. However, IMO, our current approach or using ASSERT/DCHECK is helping us find only the most blatant of bugs since they're basically only checked on our own developer binaries and tests. I propose we start using release asserts more aggressively throughout our code. This will help us know when our assumptions are untrue and make it easier to fix issues. It has a number of advantages:1. Crash early - A crash at an assert is much easier to debug and less likely to cause unexpected behavior down the line. I'm currently debugging a crash that started off as a mysterious access violation. It took several hours of poking at disassembly and minidumps to figure out what's actually wrong. That pointed me to an ASSERT which I turned into a RELEASE_ASSERT in canary triggering the problem. Having a crash at the assert would have saved me hours of work and is a much clearer signal of what's wrong.
2. More visibility - A violated release assert will show up in the crash dashboard and is likely to get attention. If we're violating an assertion, it's likely we're going to crash at some point anyway (or worse, do something unexpected and compromise security) so we're not losing much. At least now the crashes will be grouped together and we'll know that ASSERT is being violated.3. Easier assignment - It's hard to assign blame to a random stack trace. It's much easier to find who added a triggering assert. At the least, that person will have context into what's wrong.4. Confidence in our code - I'm trying to find the root cause of the crash mentioned above and I'm looking very suspiciously at the surrounding ASSERTs. If these were included in release builds I'd have much more confidence in what can and can't happen. More productive debugging means more crashes fixed means better stability for our users.There's some potential drawbacks I can think of:Performance: Obviously, we don't want to slow down our release builds. This should only apply to simple, quick checks. Anything that iterates over a tree or similar should stay debug only.Increased end-user crashes: These could potentially trip for end users. I suspect in many cases we'd crash somewhere else instead anyway; however, to mitigate this we would deal with it as we do regular crashes - fixing them aggressively in pre-stable channels or, barring that, a failing release assert can be quickly demoted to a debug assert with a FIXME added and a bug filed. At least now we know that an ASSERT is potentially invalid. I don't think we're any worse off.What do you think Blink-dev?
To unsubscribe from this group and stop receiving emails from it, send an email to blink-dev+...@chromium.org.
We already have RELEASE_ASSERTs in cases where it's a disaster/security bug to fail the assert, or we really want to know the cause and stamp it out. I think it's ok to add some more of them if they do not affect performance, but it's still a good idea to limit their use to important invariants.
On Mon, Oct 5, 2015 at 7:40 AM, David Bokan <bo...@chromium.org> wrote:1. Crash early - A crash at an assert is much easier to debug and less likely to cause unexpected behavior down the line. I'm currently debugging a crash that started off as a mysterious access violation. It took several hours of poking at disassembly and minidumps to figure out what's actually wrong. That pointed me to an ASSERT which I turned into a RELEASE_ASSERT in canary triggering the problem. Having a crash at the assert would have saved me hours of work and is a much clearer signal of what's wrong.I usually compile the debug version when debugging, for this reason. There is also a way to compile release mode with asserts, which will have some performance advantage if that is needed.
On Monday, October 5, 2015 at 12:27:33 PM UTC-4, Chris Harrelson wrote:We already have RELEASE_ASSERTs in cases where it's a disaster/security bug to fail the assert, or we really want to know the cause and stamp it out. I think it's ok to add some more of them if they do not affect performance, but it's still a good idea to limit their use to important invariants.My argument here is that we should change our practices/culture such that if we're inserting a new assert the default should be to make it release unless there's a good reason not to (e.g. performance hit). As complex as Blink is, invalid debug-only asserts in edge cases aren't likely to be tripped so at best they're adding little value, at worst they're misleading. Flipping our existing debug asserts to release is probably impractical since I expect we'd get swamped with crash reports. However, starting to use release asserts in new CLs should be straightforward since we'd find out about invalid asserts via crash reports on dev/canary/beta channels and can fix them quickly, rather than a contextless top-crasher a few months later.
On Mon, Oct 5, 2015 at 7:40 AM, David Bokan <bo...@chromium.org> wrote:1. Crash early - A crash at an assert is much easier to debug and less likely to cause unexpected behavior down the line. I'm currently debugging a crash that started off as a mysterious access violation. It took several hours of poking at disassembly and minidumps to figure out what's actually wrong. That pointed me to an ASSERT which I turned into a RELEASE_ASSERT in canary triggering the problem. Having a crash at the assert would have saved me hours of work and is a much clearer signal of what's wrong.I usually compile the debug version when debugging, for this reason. There is also a way to compile release mode with asserts, which will have some performance advantage if that is needed.Right, but we'll never catch more than the mainline issues this way since we have <1000 (<100?) people running this way, following very specific usage patterns as they test their specific features. Things have gotten a bit better recently, but I've been put off running Debug mode in the past since I'd be tripping over various unrelated ASSERTs just trying to navigate to test my feature.
-David
On Mon, Oct 5, 2015 at 10:36 AM, David Bokan <bo...@chromium.org> wrote:On Monday, October 5, 2015 at 12:27:33 PM UTC-4, Chris Harrelson wrote:We already have RELEASE_ASSERTs in cases where it's a disaster/security bug to fail the assert, or we really want to know the cause and stamp it out. I think it's ok to add some more of them if they do not affect performance, but it's still a good idea to limit their use to important invariants.My argument here is that we should change our practices/culture such that if we're inserting a new assert the default should be to make it release unless there's a good reason not to (e.g. performance hit). As complex as Blink is, invalid debug-only asserts in edge cases aren't likely to be tripped so at best they're adding little value, at worst they're misleading. Flipping our existing debug asserts to release is probably impractical since I expect we'd get swamped with crash reports. However, starting to use release asserts in new CLs should be straightforward since we'd find out about invalid asserts via crash reports on dev/canary/beta channels and can fix them quickly, rather than a contextless top-crasher a few months later.Chromium has a documented policy for DCHECK vs CHECKs : http://dev.chromium.org/developers/coding-style#TOC-CHECK-DCHECK-and-NOTREACHED-Is there a blink-specific reason to deviate from that policy?I haven't studied the cost of a RELEASE_ASSERT in blink, but for Chromium, the cost is actually a lot more than "just an if test":- the generated code is very non-trivial. typically ~10 assembly instructions plus the actual test. About 40 bytes of .text (plus the test) per CHECK on x86-64. This makes the code bigger, and has a direct effect on i-cache effectiveness.- the function calls in the generated code prevent compiler reordering, which can limit optimizations.- the CHECK makes a string that ends up in .rodata. Typically a few 10-20 of bytes, but can be a lot more for more complex tests.In isolation this seems a small price to pay, but in aggregate over the code base, it becomes measurable. With the policy, there are ~6000 CHECK and ~55000 DCHECK of some kind in the codebase (rough measurements excluding tests). If DCHECKs are turned on on Chrome, .text increases by 11% (22.3 to 24.8MB), and .rodata by 7% (5.6 to 6.0MB). This is close to the cost of a on-by-default CHECKs policy. Death by a 1000 cuts.On Mon, Oct 5, 2015 at 7:40 AM, David Bokan <bo...@chromium.org> wrote:1. Crash early - A crash at an assert is much easier to debug and less likely to cause unexpected behavior down the line. I'm currently debugging a crash that started off as a mysterious access violation. It took several hours of poking at disassembly and minidumps to figure out what's actually wrong. That pointed me to an ASSERT which I turned into a RELEASE_ASSERT in canary triggering the problem. Having a crash at the assert would have saved me hours of work and is a much clearer signal of what's wrong.I usually compile the debug version when debugging, for this reason. There is also a way to compile release mode with asserts, which will have some performance advantage if that is needed.Right, but we'll never catch more than the mainline issues this way since we have <1000 (<100?) people running this way, following very specific usage patterns as they test their specific features. Things have gotten a bit better recently, but I've been put off running Debug mode in the past since I'd be tripping over various unrelated ASSERTs just trying to navigate to test my feature.On the waterfall, we compile Release with dcheck_always_on=1, which forces DCHECKs to be on for tests. I'm sure we can have a similar behavior for blink, to make sure DCHECKs don't become stale because people don't build/run/test in debug.
OTOH, forcing crashes on users (so that we can see them on crash/) is a super heavy hammer, that leads to a poor experience. Not all asserts are created equal. While I agree that some check conditions that would cause other crashes when false, others may lead to much more mundane issues, from invisible (invalid or overeager assert) to things like elements not rendering properly, at which point crashing is the wrong thing to do. We should expect that the product we ship will have bugs, but we should aim at minimizing the impact. Queue in users first / product excellence....I agree that in some cases forcing a crash on canary / dev / beta, and even sometimes stable is the right thing to do to fix a high impact bug. This is allowed by the chromium policy (as a temporary measure).
On Mon, Oct 5, 2015 at 7:40 AM, David Bokan <bo...@chromium.org> wrote:1. Crash early - A crash at an assert is much easier to debug and less likely to cause unexpected behavior down the line. I'm currently debugging a crash that started off as a mysterious access violation. It took several hours of poking at disassembly and minidumps to figure out what's actually wrong. That pointed me to an ASSERT which I turned into a RELEASE_ASSERT in canary triggering the problem. Having a crash at the assert would have saved me hours of work and is a much clearer signal of what's wrong.I usually compile the debug version when debugging, for this reason. There is also a way to compile release mode with asserts, which will have some performance advantage if that is needed.Right, but we'll never catch more than the mainline issues this way since we have <1000 (<100?) people running this way, following very specific usage patterns as they test their specific features. Things have gotten a bit better recently, but I've been put off running Debug mode in the past since I'd be tripping over various unrelated ASSERTs just trying to navigate to test my feature.
Chromium has a documented policy for DCHECK vs CHECKs : http://dev.chromium.org/developers/coding-style#TOC-CHECK-DCHECK-and-NOTREACHED-Is there a blink-specific reason to deviate from that policy?
I haven't studied the cost of a RELEASE_ASSERT in blink, but for Chromium, the cost is actually a lot more than "just an if test":- the generated code is very non-trivial. typically ~10 assembly instructions plus the actual test. About 40 bytes of .text (plus the test) per CHECK on x86-64. This makes the code bigger, and has a direct effect on i-cache effectiveness.- the function calls in the generated code prevent compiler reordering, which can limit optimizations.- the CHECK makes a string that ends up in .rodata. Typically a few 10-20 of bytes, but can be a lot more for more complex tests.In isolation this seems a small price to pay, but in aggregate over the code base, it becomes measurable. With the policy, there are ~6000 CHECK and ~55000 DCHECK of some kind in the codebase (rough measurements excluding tests). If DCHECKs are turned on on Chrome, .text increases by 11% (22.3 to 24.8MB), and .rodata by 7% (5.6 to 6.0MB). This is close to the cost of a on-by-default CHECKs policy. Death by a 1000 cuts.
OTOH, forcing crashes on users (so that we can see them on crash/) is a super heavy hammer, that leads to a poor experience. Not all asserts are created equal. While I agree that some check conditions that would cause other crashes when false, others may lead to much more mundane issues, from invisible (invalid or overeager assert) to things like elements not rendering properly, at which point crashing is the wrong thing to do. We should expect that the product we ship will have bugs, but we should aim at minimizing the impact. Queue in users first / product excellence....
All tests on bots, release or debug, should be running with all assertions enabled. The coverage is thus not just the manual usage of a few hundred developers, it's our whole test coverage. If that's still grossly insufficient to find bugs, then I suggest that implicates our test coverage.In other words, instead of encouraging people to default to release-mode assertions, I think we should encourage people to use debug-only assertions wherever appropriate, and when adding any, to try to have some form of test that would verify each such assertion.
Yeah, the current policy is you can insert a bunch of RELEASE_ASSERTs in non-Stable to try to find a bug. I'm not sure we should be adding too many more asserts, can you give some specific examples bokan@ ? Naively just switching our current ASSERTs, even if we assume they never trip, has a fairly negative impact on performance and binary size.
On Mon, Oct 5, 2015 at 10:36 AM, David Bokan <bo...@chromium.org> wrote:On Mon, Oct 5, 2015 at 7:40 AM, David Bokan <bo...@chromium.org> wrote:1. Crash early - A crash at an assert is much easier to debug and less likely to cause unexpected behavior down the line. I'm currently debugging a crash that started off as a mysterious access violation. It took several hours of poking at disassembly and minidumps to figure out what's actually wrong. That pointed me to an ASSERT which I turned into a RELEASE_ASSERT in canary triggering the problem. Having a crash at the assert would have saved me hours of work and is a much clearer signal of what's wrong.I usually compile the debug version when debugging, for this reason. There is also a way to compile release mode with asserts, which will have some performance advantage if that is needed.Right, but we'll never catch more than the mainline issues this way since we have <1000 (<100?) people running this way, following very specific usage patterns as they test their specific features. Things have gotten a bit better recently, but I've been put off running Debug mode in the past since I'd be tripping over various unrelated ASSERTs just trying to navigate to test my feature.All tests on bots, release or debug, should be running with all assertions enabled. The coverage is thus not just the manual usage of a few hundred developers, it's our whole test coverage. If that's still grossly insufficient to find bugs, then I suggest that implicates our test coverage.
To be clear, the release trybots are compiled with assertions enabled. The release continuous (waterfall) bots are not, because we want them to run what we ship (and we have debug continuous coverage to catch the assertion failures). Hopefully that's what you meant?
PK
Couldn't we release the binaries for Canary/Dev (and maybe Beta) with dcheck_always_on?That way we would catch more asserts than we currently do without impacting stable.