On 8/31/15 11:53 AM, Don Garrett wrote:
> +Richard Barnette <jrbar...@google.com>
>
TL;DR: The devservers in the test lab are used during
telemetry testing. There's traffic to and from those
servers that we don't measure, and that I don't fully
understand. I believe that _that_ traffic is a possible
source of the problem, and we need to figure out how to
rule it in or out.
Point #1:
The problem started with the test run for this canary build:
https://uberchromegw.corp.google.com/i/chromeos/builders/Canary%20master/builds/1220
The previous canary build was fine. After the first occurrence,
the problem repeated with every canary build that followed, until
we pinned Chrome back to the version prior to the problem.
Conclusion #1:
The problem was caused by changes in the Chrome source base.
This could include telemetry changes.
Point #2:
The only metric we've been able to observe that clearly shows the
problem is the one showing bandwidth in and out of the Destiny
test lab. The graphs show that we're saturating outgoing
bandwidth from Destiny during every canary run.
We have metrics that measure the total data from the DUTs through
various known channels. Most notably, we know the total size of
test results over time, and that value is largely unchanged before
and after the incident. The "total test results" number includes
Chrome crashes, Chrome logs, and test logs.
Conclusion #2:
Whatever is driving the extra traffic, it's apparently not something
tracked by our existing metrics. In particular, it's probably not
Chrome crashes, Chrome logs on the DUT, or test output.
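The reasoning behind Conclusion #2 is plain subtraction: total outgoing
bytes from the lab, minus everything the existing metrics account for.
A minimal sketch of that arithmetic follows. The function name, the
channel names, and every number are made up for illustration; none of
them come from the lab's real metrics.

```python
def unaccounted_bytes(total_egress_bytes, tracked_bytes_by_channel):
    """Return the egress bytes that no tracked channel explains."""
    tracked = sum(tracked_bytes_by_channel.values())
    return total_egress_bytes - tracked


# Hypothetical placeholder values, NOT measurements from the lab.
tracked = {"chrome_crashes": 10, "chrome_logs": 20, "test_logs": 30}

before = unaccounted_bytes(100, tracked)  # run before the regression
after = unaccounted_bytes(400, tracked)   # tracked totals unchanged

print("unaccounted before:", before)
print("unaccounted after:", after)
```

If the tracked totals stay flat while total egress grows, the whole
increase lands in the unaccounted remainder, which is the situation
described above.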
Point #3:
As noted, there's telemetry-related traffic between the devservers
and the DUTs. I don't fully understand that traffic, but I know
it's not tracked. This data clearly falls into the "not covered
by existing metrics" category, and because it's telemetry, it can
change every time telemetry changes.
Conclusion #3:
The telemetry code running on the devserver is necessarily a suspect,
and needs to be ruled in or out of the search.
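One possible way to start ruling the devserver traffic in or out is to
sample the interface byte counters on a devserver during a canary run
and compare them with a run on the pinned Chrome. The sketch below
assumes the devserver runs Linux and exposes /proc/net/dev; the function
names are hypothetical and this is not an existing lab tool. It measures
all traffic on each interface, not telemetry traffic specifically, so it
can only show whether devserver traffic moves with the problem.

```python
import time


def parse_net_dev(text):
    """Parse /proc/net/dev contents into {interface: (rx_bytes, tx_bytes)}."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        # Field 0 is received bytes; field 8 is transmitted bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def sample_rates(interval_s=10.0, path="/proc/net/dev"):
    """Return {interface: (rx_bytes_per_s, tx_bytes_per_s)} over one interval."""
    with open(path) as f:
        first = parse_net_dev(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        second = parse_net_dev(f.read())
    return {
        iface: ((second[iface][0] - rx0) / interval_s,
                (second[iface][1] - tx0) / interval_s)
        for iface, (rx0, tx0) in first.items()
        if iface in second
    }
```

Calling sample_rates() repeatedly during a test run and logging the
result would give a rough per-interface timeline to set against the
lab bandwidth graphs.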
> On Mon, Aug 31, 2015 at 11:29 AM Achuith Bhandarkar
> <ach...@chromium.org> wrote:
>
> 524814 <https://code.google.com/p/chromium/issues/detail?id=524814> tracks
> <http://salus.prodmon.global.ls.google.com:3350/nebgua.html#borgmon=0.network.borgmon.netops.ih.borg.google.com&graph_type=graph&title=us-mtv-2081-labsw1-2-1_mtv%3AGigabitEthernet0_29%20%20%20%20%20Interface%20traffic%2C%20Bit%2Fsec&grid=xtics%20ytics&key=top%20left&ar=5m&yformat=%25.1s%25c&yrange=%5B0%3A%5D&hist_staleness=15m&xformat=%25H%3A%25M&duration=7d&with_0=lines&format_0=in&with_1=lines&format_1=out&expr=%7Bvar%3D%22irate_bps_adjusted%22%2Cjob%3D%22interfaceStats%22%2Cinstance%3D%22us-mtv-2081-labsw1-2-1_mtv%22%2Cinterface%3D%22GigabitEthernet0_29%22%2Cshard%3D%22us-mtv-2081%22%7D%3B%7Bvar%3D%22orate_bps_adjusted%22%2Cjob%3D%22interfaceStats%22%2Cinstance%3D%22us-mtv-2081-labsw1-2-1_mtv%22%2Cinterface%3D%22GigabitEthernet0_29%22%2Cshard%3D%22us-mtv-2081%22%7D>
>
>
--
"For your convenience, an elevator is located in CHINA"
seen in Dillard's department store, Omaha, NE