Getting automated tests for WebRTC

Henrik Skupin

unread,

Nov 19, 2012, 7:43:58 PM11/19/12

to mozilla-...@lists.mozilla.org

Hi all,

There is a lot of work happening on the WebRTC code and I have been
asked a while back to get automated tests written. So our team has
started with some plain Mochitests about a month ago but failed to run
them due to a lot of crash related bugs. As result I have started to get
some crashtests into the tree, but we were never able to enable them due
to leaks in the WebRTC code. Given all those problems and a high demand
for us on the WebAPI project we stepped away for a while.

Now that I'm back on the WebRTC project the situation is probably better
now. So I'm going to analyze various problems we had before to get those
solved and new tests implemented. That said we have to cover two types
of tests which are crashtests and mochitests. As best we want to see
those running on tbpl for mozilla-central builds. But there are some
roadblockers in the way we should try to get fixed over the next couple
of days (or weeks). I think as earlier as better. So I propose the
following:

1. Lets get the crashtests enabled on m-c by fixing all the leaks and
failures: This probably doesn't take that long given that only a single
crashtest is causing leaks at the moment (bug 813339). But I can't tell
how much work is necessary to get this fixed. Also there is one
remaining test failure for Cipc tests (bug 811873). For myself it's most
likely only the latter which is remaining to get investigated.

2. Create more crashtests: Seeing all those crashes in the WebRTC code
we definitely have to create crashtests in time when the patches land.
Otherwise it's getting harder to get them reproduced and to verify their
stability later, because other check-ins could have caused other crashes
meanwhile. I have seen it a couple of times in the last weeks.

3. Create a basic set of mochitests: With more stable code we probably
can create some basic mochitests now. Therefore I would ask Jason to
give me a good list of those tests he most likely has in his mind. Those
should be added to the WebRTC QA page for an easier reference. I will
then pick items, get those filed as bugs, and work on the
implementation. Those I'm currently not working on I will mark as
mentored so our wonderful community can pick up tasks too.

Just some more notes:

- When I find leaks for any type of tests I will investigate those and
get them filed. I wish that we can get those fixed in a short interval.
Would that be possible?

- From my judgement I would create crashtests and mochitests on a 3:1
ratio, which indeed depends on the amount of remaining crashtests to
write or to transform from an already given testcase.

- Keep in mind that those created Mochitests can leak! We will not be
able to check those into mozilla-central until the leak has been fixed.
Means we could land those temporarily on the alder branch so we get at
least coverage across platforms.

- Any mochitest and crashtest we are running gets implemented with faked
media streams. We will never be able to use real devices. That means we
still need manual testcases.

- It would be a great help for me to get a good list of possible
mochitests (as mentioned above). That way I will be able to get started
as soon as possible on their implementation.

Could be that I missed something. So please ask if something is unclear
or needs a discussion. I really would like to see green results on tbpl
for webrtc tests in the near future and a good process is crucial here.

Thanks!

--
Henrik

jsmith....@gmail.com

unread,

Nov 19, 2012, 11:04:14 PM11/19/12

to mozilla....@googlegroups.com, mozilla-...@lists.mozilla.org

Comments inline.

On Monday, November 19, 2012 4:44:08 PM UTC-8, Henrik Skupin wrote:
> Hi all,
>
>
>
> There is a lot of work happening on the WebRTC code and I have been
>
> asked a while back to get automated tests written. So our team has
>
> started with some plain Mochitests about a month ago but failed to run
>
> them due to a lot of crash related bugs. As result I have started to get
>
> some crashtests into the tree, but we were never able to enable them due
>
> to leaks in the WebRTC code. Given all those problems and a high demand
>
> for us on the WebAPI project we stepped away for a while.
>
>
>
> Now that I'm back on the WebRTC project the situation is probably better
>
> now. So I'm going to analyze various problems we had before to get those
>
> solved and new tests implemented. That said we have to cover two types
>
> of tests which are crashtests and mochitests. As best we want to see
>
> those running on tbpl for mozilla-central builds. But there are some
>
> roadblockers in the way we should try to get fixed over the next couple
>
> of days (or weeks). I think as earlier as better. So I propose the
>
> following:
>
>
>
> 1. Lets get the crashtests enabled on m-c by fixing all the leaks and
>
> failures: This probably doesn't take that long given that only a single
>
> crashtest is causing leaks at the moment (bug 813339). But I can't tell
>
> how much work is necessary to get this fixed. Also there is one
>
> remaining test failure for Cipc tests (bug 811873). For myself it's most
>
> likely only the latter which is remaining to get investigated.

That seems like the right first step. I agree.

>
>
>
> 2. Create more crashtests: Seeing all those crashes in the WebRTC code
>
> we definitely have to create crashtests in time when the patches land.
>
> Otherwise it's getting harder to get them reproduced and to verify their
>
> stability later, because other check-ins could have caused other crashes
>
> meanwhile. I have seen it a couple of times in the last weeks.
>

Agreed here as well. In fact, this is probably more critical at this point in the WebRTC code then getting the mochitests up and running for some basic tests due to the point you've made - the crash bug count is quite high and still climbing.

>
>
> 3. Create a basic set of mochitests: With more stable code we probably
>
> can create some basic mochitests now. Therefore I would ask Jason to
>
> give me a good list of those tests he most likely has in his mind. Those
>
> should be added to the WebRTC QA page for an easier reference. I will
>
> then pick items, get those filed as bugs, and work on the
>
> implementation. Those I'm currently not working on I will mark as
>
> mentored so our wonderful community can pick up tasks too.
>

A bunch of this work was already taken care of during the brainstorm Rob and I had a while back. There's already a few bugs already on file on the WebRTC side for tests to get started with. For the tests that weren't filed in bugs, the etherpad here - https://etherpad.mozilla.org/automation-webrtc already calls a bunch of others to go after. There's also the various amount of bugs already on file that could be used probably more functional-based mochitest automation.

For more formal test runs, getUserMedia (not the full stack) has a bunch of ideas that could be turned into automation based off of my defined test cases here - https://wiki.mozilla.org/Platform/Features/Camera_API_-_Phase_2_%28getUserMedia%29/Test_Plan#Test_Cases. For the full stack (e.g. peer connection), the tests I have defined are more smoke tests off of the etherpad + reusing the existing test case pages. I'll plan on expanding that as I go along, but with those docs, that will probably keep you busy for a good while.

>
>
> Just some more notes:
>
>
>
> - When I find leaks for any type of tests I will investigate those and
>
> get them filed. I wish that we can get those fixed in a short interval.
>
> Would that be possible?

I agree it's important to get these leaks fixed. However, there are many "bad" bug types on file right now all competing for development resources such as:

- sec-critical bugs
- crash bugs
- memory leak bugs
- basic functional bustage of the webrtc stack

We need to balance the priority across all of those areas carefully. Probably the right thing to do if you hit a memory leak bug that has you blocked is to clearly raise it in a comment/whiteboard of some sorts to indicate that you are blocked. That'll help indicate that the bug needs a higher priority.

>
>
>
> - From my judgement I would create crashtests and mochitests on a 3:1
>
> ratio, which indeed depends on the amount of remaining crashtests to
>
> write or to transform from an already given testcase.

Hmm...I'd actually put that ratio on the higher end even more for crashtests right now given the amount of crash bugs coming in. That's my opinion though.

>
>
>
> - Keep in mind that those created Mochitests can leak! We will not be
>
> able to check those into mozilla-central until the leak has been fixed.
>
> Means we could land those temporarily on the alder branch so we get at
>
> least coverage across platforms.
>
>
>
> - Any mochitest and crashtest we are running gets implemented with faked
>
> media streams. We will never be able to use real devices. That means we
>
> still need manual testcases.

Right. The getUserMedia pieces are fully covered with a bunch of manual test cases. WebRTC full stack pretty much has some high level basics defined from the brainstorm Rob & I did, although I'll be working out more details as I go along.

>
>
>
> - It would be a great help for me to get a good list of possible
>
> mochitests (as mentioned above). That way I will be able to get started
>
> as soon as possible on their implementation.

See the comment above - take advantage of the existing bugs on file + the brainstorming etherpad.

Eric Rescorla

unread,

Nov 19, 2012, 11:33:08 PM11/19/12

to Henrik Skupin, mozilla-...@lists.mozilla.org

TL;DR:
I think this is almost 180 degrees opposed from the right strategy.
What we need right now is coverage of the things that work so that
when we make changes we know we're not making the situation worse.
*All* other testing work should wait on this. In particular, crashtests
are not useful at this point.

Analysis follows:
Currently, the state of the code can be best described as brittle. The
sunny day cases work and work fairly reliably, but as you deviate from
them, you start to get crashes. This isn't surprising; it's a fairly normal
state of affairs for software that was developed in a hurry with an
emphasis on features rather than testing, which is what this is.

There are two major classes of problems in the code:

- Functional defects (stuff which should work and does not).
- Crashes (i.e., memory errors, whether security relevant or not.)
- Memory and other resource leaks

There are a lot of all of these, and a number of them require fairly
significant refactoring to fix. The result of this is that:

1. It's not going to be possible to *run* automated unit tests for some
time (this is partly due to deficiencies in the testing framework, but you
go to the release with the testing framework you have, not the one you
wish you had.)

2. It's quite common to have multiple reports of problems due to the same
underlying defect. Even when that's not true, we're often going to
accidentally
fix one defect while working on another.

3. It's quite easy to break one piece of functionality while refactoring
another.
We've already had a number of instances on this.

For the reasons above, building crashtests for each problem really isn't
that
useful to development because it's quite likely that those problems will go
away
on their own for reason #2. I'm obviously not saying that one shouldn't
report
the bugs or *eventually* produce a crash test, but it's not useful to us
*now*.

Rather, what is useful is to develop unit tests for the stuff that does
work so
that when we are fixing one thing we don't accidentally break another.I
realize
that these cannot currently be run automatically on m-c, but they can be run
either (a) on a developer's own system or (b) on alder under the more
reslient
test framework Ted is hacking up. It already takes far too long to validate
manually
that your changes haven't broken stuff and we don't have anything like full
coverage in JS tests of the functionality that works.

If we want people not to break the system, it needs to be reasonably easy
to verify that one has not done so.

The nice thing about this level of test is that it's easy to develop.
Indeed, we
already have a number of tests that test one variant manually. All we need
is someone to stamp out a bunch of tests that (a) are automatic and (b)
cover more of the space. If you find bugs doing that, then great. Otherwise,
we have a regression harness and you can move on to crashtests.

-Ekr

On Mon, Nov 19, 2012 at 4:43 PM, Henrik Skupin <hsk...@gmail.com> wrote:

> Hi all,
>
> There is a lot of work happening on the WebRTC code and I have been
> asked a while back to get automated tests written. So our team has
> started with some plain Mochitests about a month ago but failed to run
> them due to a lot of crash related bugs. As result I have started to get
> some crashtests into the tree, but we were never able to enable them due
> to leaks in the WebRTC code. Given all those problems and a high demand
> for us on the WebAPI project we stepped away for a while.
>
> Now that I'm back on the WebRTC project the situation is probably better
> now. So I'm going to analyze various problems we had before to get those
> solved and new tests implemented. That said we have to cover two types
> of tests which are crashtests and mochitests. As best we want to see
> those running on tbpl for mozilla-central builds. But there are some
> roadblockers in the way we should try to get fixed over the next couple
> of days (or weeks). I think as earlier as better. So I propose the
> following:
>
> 1. Lets get the crashtests enabled on m-c by fixing all the leaks and
> failures: This probably doesn't take that long given that only a single
> crashtest is causing leaks at the moment (bug 813339). But I can't tell
> how much work is necessary to get this fixed. Also there is one
> remaining test failure for Cipc tests (bug 811873). For myself it's most
> likely only the latter which is remaining to get investigated.
>

> 2. Create more crashtests: Seeing all those crashes in the WebRTC code
> we definitely have to create crashtests in time when the patches land.
> Otherwise it's getting harder to get them reproduced and to verify their
> stability later, because other check-ins could have caused other crashes
> meanwhile. I have seen it a couple of times in the last weeks.
>

> 3. Create a basic set of mochitests: With more stable code we probably
> can create some basic mochitests now. Therefore I would ask Jason to
> give me a good list of those tests he most likely has in his mind. Those
> should be added to the WebRTC QA page for an easier reference. I will
> then pick items, get those filed as bugs, and work on the
> implementation. Those I'm currently not working on I will mark as
> mentored so our wonderful community can pick up tasks too.
>

> Just some more notes:
>
> - When I find leaks for any type of tests I will investigate those and
> get them filed. I wish that we can get those fixed in a short interval.
> Would that be possible?
>

> - From my judgement I would create crashtests and mochitests on a 3:1
> ratio, which indeed depends on the amount of remaining crashtests to
> write or to transform from an already given testcase.
>

> - Keep in mind that those created Mochitests can leak! We will not be
> able to check those into mozilla-central until the leak has been fixed.
> Means we could land those temporarily on the alder branch so we get at
> least coverage across platforms.
>
> - Any mochitest and crashtest we are running gets implemented with faked
> media streams. We will never be able to use real devices. That means we
> still need manual testcases.
>

> - It would be a great help for me to get a good list of possible
> mochitests (as mentioned above). That way I will be able to get started
> as soon as possible on their implementation.
>
>

> Could be that I missed something. So please ask if something is unclear
> or needs a discussion. I really would like to see green results on tbpl
> for webrtc tests in the near future and a good process is crucial here.
>
> Thanks!
>
> --
> Henrik

> _______________________________________________
> dev-media mailing list
> dev-...@lists.mozilla.org
> https://lists.mozilla.org/listinfo/dev-media
>

Randell Jesup

unread,

Nov 20, 2012, 6:11:43 AM11/20/12

to dev-...@lists.mozilla.org

On 11/19/2012 11:33 PM, Eric Rescorla wrote:
> TL;DR:
> I think this is almost 180 degrees opposed from the right strategy.
> What we need right now is coverage of the things that work so that
> when we make changes we know we're not making the situation worse.
> *All* other testing work should wait on this. In particular, crashtests
> are not useful at this point.

I'll note that "crashtests" is a term being used loosely here.
Crashtests are nothing more than "when I load this nothing crashes",
while mochitests are "when I load this it calls Ok or No Ok"
(effectively). So really, there isn't a lot of difference there beyond
(for most WebRTC things) "we got to the normal end of the test".

The way some people are using "crashtests" is really "we once crashed
here, capture that as a crashtest", with the assumption you're
protecting against that (exact) regression.

In active development of a new feature, tests for real-world(ish) cases
are most important. This is (as ekr points out) that the first order of
things is "don't break existing functionality". Testing for that on each
checkin is far more important than checking if we regressed a minimized
testcase from a specific bug.

Once we have the main functionality tests in place (both "sunny day" and
"normal failure" cases), then more-specific regression tests begin to
rise in importance, though they still should usually be as broad a test
as possible that captures the failure instead of all minimized
testcases. The reason is that for it to have value for the future, it
should not just capture someone undoing the bugfix, but should also
capture any similarly-provoked failures as much as is possible, since
most future errors will not precisely duplicate an old bug, but will
arise from similar stress on the component.

An example might be a bug triggered by having >10 DataChannels open.
The minimized crashtest testcase might stop there, and it *would*
prevent the specific bug from being regressed. But if some new checkin
caused >16 DataChannels to fail, this crashtest wouldn't catch it, so a
good (IMO) test would create an unreasonable number of DataChannels, do
something basic with them to verify they all work, and then return
success. Note that this ends up being a Mochitest, not a crashtest!
So, don't think "this was a crash bug, it needs a crashtest". I would
use crashtests only where there's no reasonable way to report success or
failure. For example, if we changed an API, a crashtest would happily
stay green, while as a mochitest it would flag that we broke the test.

This does mean almost all tests should be mochitests. That's not a bad
thing, it's a good thing! But the bigger takeaway is that tests should
start with functionality (as the feature is useless without that), then
add general stressors (create/destroy fast, create a lot, evil
arrangements, obvious edge cases, etc), then (especially as code matures
and stabilizes) add very specific tests. If you focus on the very
specific tests too early, you end up with a ton of those many of which
no longer are relevant due to architectural changes and refactors (and
may spend a lot of time revising them when an API is still in flux!)
This isn't an absolute, and it doesn't mean that a crash bug doesn't get
morphed into a (mochi)test - many still will, but again those should be
broader tests wherever possible, or it should be worked into an existing
broad test.

That's another point: a few broad tests are a lot simpler to revise when
we have an unstable API than a thousand specific-instance testcases.
Given the W3C (and IETF) are still changing the API, we should be
cautious about this. If a large, generic sunny-day test that hits much
of the API fails, that's generally of (much) more use to me today the
the same set of features being tested in 100 small separate tests, if
for no other reason than the maintenance effect (and there are other
reasons, per above.)

> Analysis follows:
> Currently, the state of the code can be best described as brittle. The
> sunny day cases work and work fairly reliably, but as you deviate from
> them, you start to get crashes. This isn't surprising; it's a fairly normal
> state of affairs for software that was developed in a hurry with an
> emphasis on features rather than testing, which is what this is.
>
> There are two major classes of problems in the code:
>
> - Functional defects (stuff which should work and does not).
> - Crashes (i.e., memory errors, whether security relevant or not.)
> - Memory and other resource leaks
>
> There are a lot of all of these, and a number of them require fairly
> significant refactoring to fix. The result of this is that:
>
> 1. It's not going to be possible to *run* automated unit tests for some
> time (this is partly due to deficiencies in the testing framework, but you
> go to the release with the testing framework you have, not the one you
> wish you had.)
>
> 2. It's quite common to have multiple reports of problems due to the same
> underlying defect. Even when that's not true, we're often going to
> accidentally
> fix one defect while working on another.

Not just accidentally, but often on purpose as we notice a logical hole
that can or has caused a number of similar failures. A perfect example
is the current "big lock" patch, which causes numerous semi-random
failures related to timing.

> 3. It's quite easy to break one piece of functionality while refactoring
> another.
> We've already had a number of instances on this.

Yes; that's my number one concern for tests at this point. We're
checking in code with the only true cross-check being ourselves and
other devs running manual tests and using the feature. While doing that
is good, tests (automated or run-by-hand-but-scripted) would greatly
reduce the "you broke calls" problem.

> Rather, what is useful is to develop unit tests for the stuff that does
> work so
> that when we are fixing one thing we don't accidentally break another.I
> realize
> that these cannot currently be run automatically on m-c, but they can be run
> either (a) on a developer's own system or (b) on alder under the more
> reslient
> test framework Ted is hacking up. It already takes far too long to validate
> manually
> that your changes haven't broken stuff and we don't have anything like full
> coverage in JS tests of the functionality that works.
>
> If we want people not to break the system, it needs to be reasonably easy
> to verify that one has not done so.

Agreed.

> The nice thing about this level of test is that it's easy to develop.
> Indeed, we
> already have a number of tests that test one variant manually. All we need
> is someone to stamp out a bunch of tests that (a) are automatic and (b)
> cover more of the space. If you find bugs doing that, then great. Otherwise,
> we have a regression harness and you can move on to crashtests.

Agreed, and honestly a basic set of such functionality tests should be
fairly quick to develop and provide significant help to the project
quickly. And they don't (in my mind) have to be a lot of specific small
functionality tests. For example, I'd rather have one test that covered
all the basic call scenarios (audio-only, video-only, audio/video,
one-way audio, one-way video, etc) - because they are likely to be
easier to revise if the API changes, and if it fails, we can find out
why quickly. Not saying it has to be that way, but I might prefer that,
especially if I can build a "basic call test framework" I can then
internally automate. See things like the test_URIs xpcshell test for
the URI code where it has a framework and then runs a bazillion
test/edge cases through it, all in one "test".

--
Randell Jesup

Randell Jesup

unread,

Nov 20, 2012, 6:22:47 AM11/20/12

to dev-...@lists.mozilla.org

On 11/20/2012 6:11 AM, Randell Jesup wrote:
> This does mean almost all tests should be mochitests. That's not a
> bad thing, it's a good thing! But the bigger takeaway is that tests
> should start with functionality (as the feature is useless without
> that), then add general stressors (create/destroy fast, create a lot,
> evil arrangements, obvious edge cases, etc), then (especially as code
> matures and stabilizes) add very specific tests.

One addendum: another high-priority test type (perhaps right after broad
functionality tests) are broad touch-all-the-API-points tests, which
again protect us from regressions of the API and serve to help verify
what we've implemented.

--
Randell Jesup

jsmith....@gmail.com

unread,

Nov 21, 2012, 3:28:57 AM11/21/12

to mozilla....@googlegroups.com, dev-...@lists.mozilla.org

>From what I'm getting from Randell and Ekr's feedback, it sounds like getting the smoke tests for the full stack of webrtc is the priority, not the crashtests. I think that makes logical sense as the primary focus area, although I should note that given the high bug rates for crashes, I do think we need to continuously burn through expanding the crashtest automation to protect ourselves carefully from crash regressions. From the QA side, what I'd like to see is a combination of both ideas proposed to achieve the joint balance of effective sunny day automation + protection against crash regressions to achieve the benefits of both areas.

My suggestion I would go for is something along the lines of this:

1. Any work Henrik already started for certain crashtests (since I know he's got a few in queue) - finish those off
2. Henrik shifts focus towards smoke test automation for the full stack, I'll finish off the gUM smoke test automation (which is 90% done).
3. We establish a review policy that any patches that can definitely have a crashtest associated with them has to be included on check-in by the developer building the patch. If the patch comes in without a crashtest that can indeed possibly have one, the patch gets a review- until the test is included. That will balance both validating that the patch actually does work at check-in, but also starts to burn through progressively building out a crashtest regression automation suite.

Does that work? I do understand the concerns Ekr and Randell raise, but I also don't want to take risks on crash regressions too much given the high crash bug rate, even with the feature currently prefed off.

Eric Rescorla

unread,

Nov 21, 2012, 11:52:32 AM11/21/12

to jsmith....@gmail.com, dev-...@lists.mozilla.org, mozilla....@googlegroups.com

On Wed, Nov 21, 2012 at 12:28 AM, <jsmith....@gmail.com> wrote:

> From what I'm getting from Randell and Ekr's feedback, it sounds like
> getting the smoke tests for the full stack of webrtc is the priority, not
> the crashtests. I think that makes logical sense as the primary focus area,
> although I should note that given the high bug rates for crashes, I do
> think we need to continuously burn through expanding the crashtest
> automation to protect ourselves carefully from crash regressions. From the
> QA side, what I'd like to see is a combination of both ideas proposed to
> achieve the joint balance of effective sunny day automation + protection
> against crash regressions to achieve the benefits of both areas.
>
> My suggestion I would go for is something along the lines of this:
>
> 1. Any work Henrik already started for certain crashtests (since I know
> he's got a few in queue) - finish those off
>

I think it depends on how long this takes. We are already suffering from
this now, so if it means that
we won't see a start of functional testing till next week (I'm assuming
that Henrik isn't taking TG off)
then I would prefer to just suspend the crashtests.

2. Henrik shifts focus towards smoke test automation for the full stack,
> I'll finish off the gUM smoke test automation (which is 90% done).
> 3. We establish a review policy that any patches that can definitely have
> a crashtest associated with them has to be included on check-in by the
> developer building the patch. If the patch comes in without a crashtest
> that can indeed possibly have one, the patch gets a review- until the test
> is included. That will balance both validating that the patch actually does
> work at check-in, but also starts to burn through progressively building
> out a crashtest regression automation suite.
>

> I don't really understand what this means. I agree that if a crashtest
exists, it should pass before
we submit the patch. Do you mean something else?

-Ekr

> Does that work? I do understand the concerns Ekr and Randell raise, but I
> also don't want to take risks on crash regressions too much given the high
> crash bug rate, even with the feature currently prefed off.

Randell Jesup

unread,

Nov 21, 2012, 3:33:07 PM11/21/12

to dev-...@lists.mozilla.org

On 11/21/2012 11:52 AM, Eric Rescorla wrote:
> On Wed, Nov 21, 2012 at 12:28 AM, <jsmith....@gmail.com> wrote:
>
>> From what I'm getting from Randell and Ekr's feedback, it sounds like
>> getting the smoke tests for the full stack of webrtc is the priority, not
>> the crashtests. I think that makes logical sense as the primary focus area,
>> although I should note that given the high bug rates for crashes, I do
>> think we need to continuously burn through expanding the crashtest
>> automation to protect ourselves carefully from crash regressions. From the
>> QA side, what I'd like to see is a combination of both ideas proposed to
>> achieve the joint balance of effective sunny day automation + protection
>> against crash regressions to achieve the benefits of both areas.
>>
>> My suggestion I would go for is something along the lines of this:
>>
>> 1. Any work Henrik already started for certain crashtests (since I know
>> he's got a few in queue) - finish those off
>>
> I think it depends on how long this takes. We are already suffering from
> this now, so if it means that
> we won't see a start of functional testing till next week (I'm assuming
> that Henrik isn't taking TG off)
> then I would prefer to just suspend the crashtests.

I don't have a problem with this if it's relatively short (a couple of
days), as you ask.

> 2. Henrik shifts focus towards smoke test automation for the full stack,
>> I'll finish off the gUM smoke test automation (which is 90% done).
>> 3. We establish a review policy that any patches that can definitely have
>> a crashtest associated with them has to be included on check-in by the
>> developer building the patch. If the patch comes in without a crashtest
>> that can indeed possibly have one, the patch gets a review- until the test
>> is included. That will balance both validating that the patch actually does
>> work at check-in, but also starts to burn through progressively building
>> out a crashtest regression automation suite.
>>
> I don't really understand what this means. I agree that if a crashtest exists, it should pass before
> we submit the patch. Do you mean something else?

I do think he means something else. I think he means blocking checkin for any crash-bug patch without an associated crashtest (if it's possible to build one).

At this point in the project, we have *critical* needs for smoketests/mochitests of the full stack, plus some obvious stressors (and the more generic the stressors, generally the better). I think some crash-bug-derived mochitests make sense right now, especially if they cover a fair bit of the stack or cover an important edgecase. True anti-specific-regression crashtests don't show much value to me at this point. Many of them would likely either break quietly or become irrelevant due to API changes or refactorings. Note: many crash bugs are currently coming from fairly "normal" use of the APIs, and so smoketests and sunny-day/common-failure mochitests should do a good job with them, and testcases for them can be turned into or added to those tests.

Where it's easy to write a test, I wouldn't stop anyone (crashtest or not), though I'd suggest mochitests always make more sense if there's any way to know we finished the test (connected a call, captured media, etc). And instead of tightening down the crashtest to the tightest case that makes it fail, I'd widen them to the widest case that we can easily do that triggers the bug, at least currently.

Please re-read my message. That lays out what I think as module owner should be the priorities for tests, especially at this point. I'm open to persuasion, but please indicate what the practical improvement will be relative to what I suggested. Otherwise, that's the way we should proceed.

p.s. a few concrete examples of how I want us to approach test creation:

A very useful mochitest/stressor would be deleting a PeerConnection (or navigating away or reloading) immediately after creation, and repeated with progressively longer periods (run as a modification to a baseline test that starts a loopback call). This would cover a huge range of possible failures and race conditions (and sunny-day paths).

An example of a stressor that is of less practical use (though still important for security reasons perhaps) would be something that quickly creates a very large number of (fake) mediastreams, or peerconnections, datachannels, etc.

An example of a crashtest that's of less use right now would be bug 799191 (we used the number of video devices as the loop limit for looking at audio devices). Yes, we can test this, but this specific bug is unlikely to recur, and we would end up testing for that from here to eternity on every push - and we have more important tests to get in. And it's very likely that other standard smoketests/mochitests/sunny-day tests would cover this path well anyways.

An example of a reasonably useful "crashtest" (which should be coded as a mochitest) would be bug 798873: support for arbitrary-length SDP strings, though I doubt it would regress. This might best be tested as part of a suite of SDP creation/manipulation tests (and that probably can only be fully tested from C++).

An even better example (because it covers a bunch of sunny-day cases) would be bug 802376 (selecting a video device other than the first one crashes), though it may be tough to test, or NSS startup, or bug 780790 (which really should be part of an API-conformance mochitest). To be honest, I had trouble coming up with a *great* example here, though as we clean more into the corners I'm sure we'll find some. Probably the best current examples would come from fuzz bugs.

>> Does that work? I do understand the concerns Ekr and Randell raise, but I
>> also don't want to take risks on crash regressions too much given the high
>> crash bug rate, even with the feature currently prefed off.

--
Randell Jesup, Mozilla Corp

Henrik Skupin

unread,

Nov 23, 2012, 4:34:17 AM11/23/12

to jsmith....@gmail.com, dev-...@lists.mozilla.org, mozilla....@googlegroups.com

jsmith....@gmail.com wrote on 11/21/12 9:28 AM:

> 1. Any work Henrik already started for certain crashtests (since I
> know he's got a few in queue) - finish those off

The primarily work I wanted to do here is done now. I have investigated
our two blockers which prevents us from turning on crashtests on
mozilla-central. Those are now in developers hands. That means today I
will get some of the crashtests written for still broken cases. I will
not spend more than half a day on it.

> 2. Henrik shifts focus towards smoke test automation for the full
> stack, I'll finish off the gUM smoke test automation (which is 90%
> done).

To combine what everyone else said, what do you want me to start on
first? Are these gUM or PeerConnection tests? I think with Jasons basic
gUM tests we have a good first start once landed, and I should do a
similar thing for the peerconnection piece.

> 3. We establish a review policy that any patches that can definitely
> have a crashtest associated with them has to be included on check-in
> by the developer building the patch. If the patch comes in without a
> crashtest that can indeed possibly have one, the patch gets a review-
> until the test is included. That will balance both validating that
> the patch actually does work at check-in, but also starts to burn
> through progressively building out a crashtest regression automation
> suite.

I would love that because of:

* it helps devs to check that the crash has been fixed across platforms
by pushing it to try (via the alder branch for now) instead of only
testing it on their own platform work happens.

* crashtests are easy to create from the already attached testcases.

> Does that work? I do understand the concerns Ekr and Randell raise,
> but I also don't want to take risks on crash regressions too much
> given the high crash bug rate, even with the feature currently prefed
> off.

That's my concern too and I will see soon what has been changed for
basic PeerConnection tests since I tried it the last time.

--
Henrik

Henrik Skupin

unread,

Nov 23, 2012, 4:34:17 AM11/23/12

to jsmith....@gmail.com, dev-...@lists.mozilla.org, mozilla....@googlegroups.com

jsmith....@gmail.com wrote on 11/21/12 9:28 AM:

> 1. Any work Henrik already started for certain crashtests (since I
> know he's got a few in queue) - finish those off

The primarily work I wanted to do here is done now. I have investigated
our two blockers which prevents us from turning on crashtests on
mozilla-central. Those are now in developers hands. That means today I
will get some of the crashtests written for still broken cases. I will
not spend more than half a day on it.

> 2. Henrik shifts focus towards smoke test automation for the full
> stack, I'll finish off the gUM smoke test automation (which is 90%
> done).

To combine what everyone else said, what do you want me to start on
first? Are these gUM or PeerConnection tests? I think with Jasons basic
gUM tests we have a good first start once landed, and I should do a
similar thing for the peerconnection piece.

> 3. We establish a review policy that any patches that can definitely
> have a crashtest associated with them has to be included on check-in
> by the developer building the patch. If the patch comes in without a
> crashtest that can indeed possibly have one, the patch gets a review-
> until the test is included. That will balance both validating that
> the patch actually does work at check-in, but also starts to burn
> through progressively building out a crashtest regression automation
> suite.

I would love that because of:

* it helps devs to check that the crash has been fixed across platforms
by pushing it to try (via the alder branch for now) instead of only
testing it on their own platform work happens.

* crashtests are easy to create from the already attached testcases.

> Does that work? I do understand the concerns Ekr and Randell raise,
> but I also don't want to take risks on crash regressions too much
> given the high crash bug rate, even with the feature currently prefed
> off.

Henrik Skupin

unread,

Nov 23, 2012, 4:42:08 AM11/23/12

to jsmith....@gmail.com, dev-...@lists.mozilla.org, mozilla....@googlegroups.com

jsmith....@gmail.com wrote on 11/21/12 9:28 AM:

> 3. We establish a review policy that any patches that can definitely
> have a crashtest associated with them has to be included on check-in
> by the developer building the patch. If the patch comes in without a
> crashtest that can indeed possibly have one, the patch gets a review-
> until the test is included. That will balance both validating that
> the patch actually does work at check-in, but also starts to burn
> through progressively building out a crashtest regression automation
> suite.

We need a policy for in-testsuite? in general. Lots of bugs depend on
our judgement right now. I would love if devs would take care of this
flag and set it appropriately after the landing on inbound or m-c. That
would give us a perfect way to track upcoming and left-to-do work. Shall
we only do that for crashtests and mochitests? I would think so because
unit tests should be landed together with the actual patch.

--
Henrik

Henrik Skupin

unread,

Nov 23, 2012, 4:42:08 AM11/23/12

to jsmith....@gmail.com, dev-...@lists.mozilla.org, mozilla....@googlegroups.com

jsmith....@gmail.com wrote on 11/21/12 9:28 AM:

> 3. We establish a review policy that any patches that can definitely
> have a crashtest associated with them has to be included on check-in
> by the developer building the patch. If the patch comes in without a
> crashtest that can indeed possibly have one, the patch gets a review-
> until the test is included. That will balance both validating that
> the patch actually does work at check-in, but also starts to burn
> through progressively building out a crashtest regression automation
> suite.

Henrik Skupin

unread,

Nov 23, 2012, 4:42:08 AM11/23/12

to jsmith....@gmail.com, dev-...@lists.mozilla.org, mozilla....@googlegroups.com

jsmith....@gmail.com wrote on 11/21/12 9:28 AM:

> 3. We establish a review policy that any patches that can definitely
> have a crashtest associated with them has to be included on check-in
> by the developer building the patch. If the patch comes in without a
> crashtest that can indeed possibly have one, the patch gets a review-
> until the test is included. That will balance both validating that
> the patch actually does work at check-in, but also starts to burn
> through progressively building out a crashtest regression automation
> suite.

Eric Rescorla

unread,

Nov 23, 2012, 10:45:44 AM11/23/12

to Henrik Skupin, dev-...@lists.mozilla.org, mozilla-...@lists.mozilla.org, mozilla....@googlegroups.com, jsmith....@gmail.com

On Fri, Nov 23, 2012 at 1:34 AM, Henrik Skupin <hsk...@gmail.com> wrote:

> jsmith....@gmail.com wrote on 11/21/12 9:28 AM:
>

> > 1. Any work Henrik already started for certain crashtests (since I
> > know he's got a few in queue) - finish those off
>

> The primarily work I wanted to do here is done now. I have investigated
> our two blockers which prevents us from turning on crashtests on
> mozilla-central. Those are now in developers hands. That means today I
> will get some of the crashtests written for still broken cases. I will
> not spend more than half a day on it.
>

> > 2. Henrik shifts focus towards smoke test automation for the full
> > stack, I'll finish off the gUM smoke test automation (which is 90%
> > done).
>

> To combine what everyone else said, what do you want me to start on
> first? Are these gUM or PeerConnection tests? I think with Jasons basic
> gUM tests we have a good first start once landed, and I should do a
> similar thing for the peerconnection piece.

PC. What we need is something analagous to the tests we have
in C++ (signaling_unittests.cpp) though perhaps not quite as
aggressive about testing a pile of SDP variants.

> > 3. We establish a review policy that any patches that can definitely
> > have a crashtest associated with them has to be included on check-in
> > by the developer building the patch. If the patch comes in without a
> > crashtest that can indeed possibly have one, the patch gets a review-
> > until the test is included. That will balance both validating that
> > the patch actually does work at check-in, but also starts to burn
> > through progressively building out a crashtest regression automation
> > suite.
>

> I would love that because of:
>
> * it helps devs to check that the crash has been fixed across platforms
> by pushing it to try (via the alder branch for now) instead of only
> testing it on their own platform work happens.
>
> * crashtests are easy to create from the already attached testcases.

I agree with Randell that while this may be good practice in many
cases, it's not a reasonable general policy as a gating factor on
checkin. As I said, to the extent to which test cases are already
available (or Test develops them) I agree that they should pass,
but I don't think it's a requirement that the developers write them
for every crash.

-Ekr

Eric Rescorla

unread,

Nov 23, 2012, 10:45:44 AM11/23/12

to Henrik Skupin, dev-...@lists.mozilla.org, mozilla-...@lists.mozilla.org, mozilla....@googlegroups.com, jsmith....@gmail.com

On Fri, Nov 23, 2012 at 1:34 AM, Henrik Skupin <hsk...@gmail.com> wrote:

> jsmith....@gmail.com wrote on 11/21/12 9:28 AM:
>

> > 1. Any work Henrik already started for certain crashtests (since I
> > know he's got a few in queue) - finish those off
>

> The primarily work I wanted to do here is done now. I have investigated
> our two blockers which prevents us from turning on crashtests on
> mozilla-central. Those are now in developers hands. That means today I
> will get some of the crashtests written for still broken cases. I will
> not spend more than half a day on it.
>

> > 2. Henrik shifts focus towards smoke test automation for the full
> > stack, I'll finish off the gUM smoke test automation (which is 90%
> > done).
>

> To combine what everyone else said, what do you want me to start on
> first? Are these gUM or PeerConnection tests? I think with Jasons basic
> gUM tests we have a good first start once landed, and I should do a
> similar thing for the peerconnection piece.

PC. What we need is something analagous to the tests we have
in C++ (signaling_unittests.cpp) though perhaps not quite as
aggressive about testing a pile of SDP variants.

> > 3. We establish a review policy that any patches that can definitely
> > have a crashtest associated with them has to be included on check-in
> > by the developer building the patch. If the patch comes in without a
> > crashtest that can indeed possibly have one, the patch gets a review-
> > until the test is included. That will balance both validating that
> > the patch actually does work at check-in, but also starts to burn
> > through progressively building out a crashtest regression automation
> > suite.
>

Randell Jesup

unread,

Nov 23, 2012, 4:56:25 PM11/23/12

to dev-...@lists.mozilla.org

On 11/23/2012 10:45 AM, Eric Rescorla wrote:
> On Fri, Nov 23, 2012 at 1:34 AM, Henrik Skupin <hsk...@gmail.com> wrote:
>

>>> 3. We establish a review policy that any patches that can definitely
>>> have a crashtest associated with them has to be included on check-in
>>> by the developer building the patch. If the patch comes in without a
>>> crashtest that can indeed possibly have one, the patch gets a review-
>>> until the test is included. That will balance both validating that
>>> the patch actually does work at check-in, but also starts to burn
>>> through progressively building out a crashtest regression automation
>>> suite.

>> I would love that because of:
>>
>> * it helps devs to check that the crash has been fixed across platforms
>> by pushing it to try (via the alder branch for now) instead of only
>> testing it on their own platform work happens.
>>
>> * crashtests are easy to create from the already attached testcases.
>
> I agree with Randell that while this may be good practice in many
> cases, it's not a reasonable general policy as a gating factor on
> checkin. As I said, to the extent to which test cases are already
> available (or Test develops them) I agree that they should pass,
> but I don't think it's a requirement that the developers write them
> for every crash.

Most of us can test cross-platform if we want or feel a need to. And a
fair number of the crashes are ASAN failures, which we can't easily
test. As very few of our bugs are platform dependent (a few in gum are,
for example), cross-platform testing doesn't add much to our tests.
More than nothing, but not a ton at this stage in development.

I agree with Jason (and Henrik?) that developers for WebRTC should try
to indicate using in-testsuite?/+/- what they think the appropriate use
of tests is on checkin. (I'm guessing that the admonition about "only
QA actively working on testcases in this component should use this
keyword" is not actually the case?) But even if a bug should have a
test, if we don't have one yet, I'd mark it in-testsuite?, and comment
on the bug about what type of test (I'd generally prefer mochitests!)
should be done and how broad it should be (right now, I like broad if it
also hits the bug).

I'll be clear: we will not at this point block r+'s for bugs that
can/should have a test.

If a crash fix should have a test, I'd love to get a test with the
patch, but at this point it's more important to fix the bug - please
mark it "in-testsuite?" and we'll try to get a test in place ASAP. Also,
please look at the existing tests and see if one can be easily/safely
modified to include the new bug.

jsmith....@gmail.com

unread,

Nov 24, 2012, 12:42:14 AM11/24/12

to mozilla-...@lists.mozilla.org, dev-...@lists.mozilla.org

Hi Randell,

That sounds like a good proposal. The more help we can clearly get a picture of what bugs are good candidates for automation, the better we'll know what automation to write.

That also makes sense to not block patches right now for required automation. In order to aid in a clearer use of in-testsuite?, what I think we should do based on the above feedback is something like:

1. During review, if the reviewer notices that X bug is a good candidate for a test, immediately flip the in-testsuite? flag and indicate what type of test would be worthwhile to get in as a result of the bug (e.g. XPC Shell, Mochitest, etc). We might want to throw a whiteboard tag in there to indicate what type of test for clarification.
2. During the weekly triage, we include as part of the triage to immediately flag in-testsuite to a value if we know whether it will or won't be a good candidate for automation if we know up-front, along with specifying what type of test in the whiteboard
3. We might want to take a quick skim during a triage session of existing bugs to flag any bugs in the bug list that haven't been triaged for in-testsuite to an appropriate value to help aid in automation definition

I think that process aligns with the proposal you have above.

Oh and as an extension to this discussion (since it already started in the bugs as a trend) - any time automation is blocked, we can use the whiteboard [automation-blocked]. That'll help raise awareness if automation development is getting stuck (which right now, it kinda is true due to plethora of memory leaks).

Sincerely,
Jason Smith