Re-enabling "net_perftests" on perf bots


Helen Li

Mar 14, 2017, 3:20:03 PM
to net-dev
For some reason, there isn't any bot running the "net_perftests" target. I think this test target may have been accidentally dropped during a migration. I am re-enabling it on Linux first (https://codereview.chromium.org/2748073003/), followed by other platforms if there aren't any major issues.

The following tests will be run:
       "base/mime_sniffer_perftest.cc",
       "cookies/cookie_monster_perftest.cc",
       "disk_cache/disk_cache_perftest.cc",
       "extras/sqlite/sqlite_persistent_cookie_store_perftest.cc",
       "proxy/proxy_resolver_perftest.cc",
       "socket/udp_socket_perftest.cc"
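For background on what these binaries emit: perftests of this vintage typically printed their measurements as plain-text log lines that the perf infrastructure scraped, roughly of the form `RESULT measurement: trace= value units`. A minimal illustration of that pattern in Python (the function and metric names here are hypothetical, not Chromium APIs):

```python
import time

def report_result(measurement, trace, value, units):
    # Emit a perf-log line in the scrape-friendly shape
    # "RESULT <measurement>: <trace>= <value> <units>".
    print("RESULT %s: %s= %s %s" % (measurement, trace, value, units))

def average_time_ms(op, iterations=1000):
    # Time `op` over many iterations and return the mean cost per call.
    start = time.perf_counter()
    for _ in range(iterations):
        op()
    return (time.perf_counter() - start) * 1000.0 / iterations

avg = average_time_ms(lambda: sorted(range(100), reverse=True))
report_result("sort_cost", "warm", round(avg, 4), "ms")
```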

Let me know if there's anything I need to be aware of.

Ryan Sleevi

Mar 14, 2017, 4:14:07 PM
to Helen Li, net-dev
I don't believe these were ever run.

Running them is only valuable if the metrics they provide are valuable, and if they are monitored. I don't believe enabling them on bots would help with either, but perhaps there are changes in the performance monitoring infrastructure that make these tests valuable.

Are you sure these haven't bitrotted - and are they measuring the right things? Historically, I'm only aware of these being used for one-off validations of substantive changes. 

Helen Li

Mar 14, 2017, 4:49:20 PM
to rsl...@chromium.org, net-dev
> Are you sure these haven't bitrotted - and are they measuring the right things? Historically, I'm only aware of these being used for one-off validations of substantive changes. 

How do we determine what are "substantive changes"? I think the point with perf tests is that they can be run continuously and we don't need to make the distinction between a substantive change and a minor change. Not running them continuously makes these tests hard to maintain. For example, CookieMonsterTest.TestAddCookieOnManyHosts is no longer passing.  If we only use them for one-off validations, we can write metrics to prove "improvements" and not care about what happens later, which seems to be a bad idea. 

As you said, these tests might not be relevant, since most of them are more than 4 years old and unowned. I am adding more tests, so I would like to reuse this test target. I will keep an eye on these tests in the meantime. If a test isn't doing anything meaningful, we can consider excluding it. WDYT?

Ryan Sleevi

Mar 14, 2017, 5:16:25 PM
to Helen Li, Ryan Sleevi, net-dev
On Tue, Mar 14, 2017 at 4:49 PM, Helen Li <xunj...@chromium.org> wrote:
> > Are you sure these haven't bitrotted - and are they measuring the right things? Historically, I'm only aware of these being used for one-off validations of substantive changes. 

> How do we determine what are "substantive changes"? I think the point with perf tests is that they can be run continuously and we don't need to make the distinction between a substantive change and a minor change. Not running them continuously makes these tests hard to maintain. For example, CookieMonsterTest.TestAddCookieOnManyHosts is no longer passing. If we only use them for one-off validations, we can write metrics to prove "improvements" and not care about what happens later, which seems to be a bad idea. 

Oh, I agree absolutely. However, I was trying to highlight that their historic use may mean they're not suitable or appropriate for continuous monitoring, because they might be reporting the wrong thing relative to the goal. For example, a perftest that is intended to be run 5x/10x and the average taken (because it uses highly variable metrics) likely doesn't make a good candidate for continuous monitoring, because of the noise and variability.
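The 5x/10x point can be made concrete: whether a metric is suitable for continuous monitoring depends on its run-to-run variance. A rough, illustrative check (the 5% threshold is an arbitrary assumption for the example, not a Chromium policy):

```python
import statistics

def too_noisy_for_monitoring(samples, max_cv=0.05):
    # A high coefficient of variation (stddev / mean) means a single bot
    # run can't distinguish a real regression from noise, so the metric
    # suits manual multi-run comparison better than automated alerting.
    cv = statistics.stdev(samples) / statistics.mean(samples)
    return cv > max_cv

stable = [10.1, 10.0, 9.9, 10.2, 10.0]  # ~1% spread: monitorable
noisy = [10.0, 14.0, 7.0, 12.0, 9.0]    # ~26% spread: one-off use only
print(too_noisy_for_monitoring(stable), too_noisy_for_monitoring(noisy))
# → False True
```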
 
> As you said, these tests might not be relevant since most of them are more than 4 years old and are un-owned. I am adding more tests so I would like to reuse this test target. I will keep an eye on these tests in the meanwhile. If a test isn't doing anything meaningful, we can consider excluding it. WDYT?

My gut instinct, which is probably on the more extreme side of things and thus may be wrong, is that if these tests aren't being run, the right answer is to delete them and introduce tests that you think are meaningful and that you can and will maintain. Alternatively, it may require much more source-code archaeology and (unfortunately) lost knowledge to determine whether or not these tests are measuring the right thing in the right way. If you want to go down that path, I think it's probably more useful to enable each test only once you're confident it's measuring the right thing, rather than adding them all back at once.

I'm totally supportive of your desire to add better performance testing, and the only way to ensure it's meaningful is through continuous running, monitoring, and response, so I'm really glad you're taking this on. I'm just worried that unreliable/flaky tests may end up taking much more time to investigate and may desensitize people to alerts, versus tests with a strong sense of ownership and understanding attached to them.

Randy Smith

Mar 14, 2017, 5:43:23 PM
to Ryan Sleevi, Helen Li, net-dev
On a slightly orthogonal note, my memory of the cookie perftests is that there was absolutely no way for them to fail, i.e. they were only useful if run manually (and possibly not even then).  Based on that single data point, my assumption would be that the other perftests are similar; just code storage for people who wanted to have easy access to manual tests that could be run at will.  

If that assumption's correct, I don't see any point in re-enabling them to be run automatically.  So I'd suggest verifying/disproving that assumption or starting from scratch.  And in starting from scratch, I'd suggest the first problem to address is how to design a perf test so that it can return a "pass/fail" value without being flaky.  I suspect it's a hard problem.

-- Randy


--
You received this message because you are subscribed to the Google Groups "net-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to net-dev+unsubscribe@chromium.org.
To post to this group, send email to net...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/net-dev/CACvaWvY0wr%3DoUDYy%2BAyQVcH6DT2kk3%3DgBbUjVKa1rUqzRwatMw%40mail.gmail.com.

Helen Li

Mar 14, 2017, 5:48:46 PM
to Randy Smith, Ryan Sleevi, net-dev
> And in starting from scratch, I'd suggest the first problem to address is how to design a perf test so that it can return a "pass/fail" value without being flaky. I suspect it's a hard problem.

My impression is that perf tests just report values to the perf dashboard, and we will get notified if these values suddenly increase/decrease. These tests should always pass, right, since they are functionally correct?
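That mental model — report values, alert on sudden shifts — can be sketched as a simple outlier check over the reported series (purely illustrative; the dashboard's actual anomaly detection is presumably more sophisticated):

```python
import statistics

def regressed(baseline, recent, sigmas=3.0):
    # Alert only when every recent value sits well outside the noise
    # band of the historical baseline, to avoid flaky one-sample alerts.
    mean = statistics.mean(baseline)
    stddev = statistics.stdev(baseline) or 1e-9  # guard a flat baseline
    return all(abs(v - mean) > sigmas * stddev for v in recent)

history = [100.0, 101.0, 99.0, 100.5, 99.5, 100.2]
print(regressed(history, [100.4, 99.8]))   # within noise → False
print(regressed(history, [112.0, 113.0]))  # sudden jump → True
```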






Randy Smith

Mar 14, 2017, 5:52:09 PM
to Helen Li, Ryan Sleevi, net-dev
On Tue, Mar 14, 2017 at 5:48 PM, Helen Li <xunj...@chromium.org> wrote:
> > And in starting from scratch, I'd suggest the first problem to address is how to design a perf test so that it can return a "pass/fail" value without being flaky. I suspect it's a hard problem.

> My impression is that perf tests will just report values to perf dashboard, and we will get notified if these values suddenly increase/decrease. These tests should always pass right? since they are functionally correct.

Ah, right, if we hook things into the perf dashboard that would work fine--the perf dashboard presumably does good statistics to evaluate the pass/fail issue.  Ok, so the challenge is hooking them into the perf dashboard; my apologies for the red herring.  If that's already been done, please ignore my messages.

On the bright side, that's a much smaller challenge than defining "passing" :-}.

-- Randy
 






Ryan Sleevi

Mar 14, 2017, 5:55:21 PM
to Randy Smith, Helen Li, Ryan Sleevi, net-dev
On Tue, Mar 14, 2017 at 5:52 PM, Randy Smith <rds...@chromium.org> wrote:


> Ah, right, if we hook things into the perf dashboard that would work fine--the perf dashboard presumably does good statistics to evaluate the pass/fail issue.  Ok, so the challenge is hooking them into the perf dashboard; my apologies for the red herring.  If that's already been done, please ignore my messages.

My concern was whether they measure the right thing in the right way :) I think the number of Chirp alerts for existing metrics suggests we may not have systems in place that are good enough statistically - or that we're measuring the wrong things.

That's not to say we shouldn't measure, but we should make sure to scale gracefully so that everything is actionable. Even one test is an improvement over the status quo :)

Helen Li

Mar 14, 2017, 5:59:37 PM
to Randy Smith, Ryan Sleevi, net-dev
Hooking them into the perf dashboard is easy; all the pieces are there, it's just that they aren't enabled. I have a CL at https://codereview.chromium.org/2748073003/ with owner stamps. I've put that on hold.

I think both your and Ryan's concerns make sense. I will start a test target from scratch. My main focus right now is addressing memory regressions, which is easier than measuring speed.
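For what it's worth, a memory metric can be reported through the same kind of plumbing. A rough POSIX-only sketch (the metric name is hypothetical) of sampling peak RSS around a workload:

```python
import resource
import sys

def peak_rss_kb():
    # ru_maxrss is kilobytes on Linux but bytes on macOS.
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return rss // 1024 if sys.platform == "darwin" else rss

baseline = peak_rss_kb()
blob = bytearray(50 * 1024 * 1024)  # touch ~50 MB so it lands in RSS
print("RESULT net_memory: peak_rss= %d KB" % peak_rss_kb())
assert peak_rss_kb() >= baseline  # peak RSS only grows within a process
```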

Thanks for the feedback!



Chris Bentzel

Mar 15, 2017, 1:12:56 PM
to Helen Li, Randy Smith, Ryan Sleevi, net-dev
At a broader level, we're interested in having some continuously running benchmarks for CPU/memory/IO.

I've historically been pretty skeptical about benchmarks for networking and page load performance, but doing it for these cases (and the scalable loading case) seems like a place where perftests will help us.
