telemetry_perf_unittests (Twitter) failing, stalling CQ. Caused by DNS DOS attack?


Kevin Marshall

Oct 21, 2016, 4:13:27 PM
to Chromium-dev
The CQ seems stuck on telemetry_perf_unittests. I looked at the failing test and it looks like it's failing to navigate to a page on twitter.com. It's probably related to the ongoing DNS DDoS. Can we temporarily remove this step from the buildbot recipes until the service is restored?


[161/306] benchmarks.system_health_smoke_test.SystemHealthBenchmarkSmokeTest.system_health.memory_mobile.browse:social:twitter failed unexpectedly 120.9157s:
  [ RUN      ] browse:social:twitter
  Traceback (most recent call last):
    File "/b/swarm_slave/w/irmq0gjH/third_party/catapult/telemetry/telemetry/internal/story_runner.py", line 86, in _RunStoryAndProcessErrorIfNeeded
      state.RunStory(results)
    File "/b/swarm_slave/w/irmq0gjH/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 52, in traced_function
      return func(*args, **kwargs)
    File "/b/swarm_slave/w/irmq0gjH/third_party/catapult/telemetry/telemetry/page/shared_page_state.py", line 311, in RunStory
      self._current_page.Run(self)
    File "/b/swarm_slave/w/irmq0gjH/third_party/catapult/telemetry/telemetry/page/__init__.py", line 105, in Run
      shared_state.page_test.RunNavigateSteps(self, current_tab)
    File "/b/swarm_slave/w/irmq0gjH/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 52, in traced_function
      return func(*args, **kwargs)
    File "/b/swarm_slave/w/irmq0gjH/third_party/catapult/telemetry/telemetry/page/legacy_page_test.py", line 195, in RunNavigateSteps
      page.RunNavigateSteps(action_runner)
    File "/b/swarm_slave/w/irmq0gjH/tools/perf/page_sets/system_health/system_health_story.py", line 108, in RunNavigateSteps
      super(SystemHealthStory, self).RunNavigateSteps(action_runner)
    File "/b/swarm_slave/w/irmq0gjH/third_party/catapult/telemetry/telemetry/page/__init__.py", line 114, in RunNavigateSteps
      url, script_to_evaluate_on_commit=self.script_to_evaluate_on_commit)
    File "/b/swarm_slave/w/irmq0gjH/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 52, in traced_function
      return func(*args, **kwargs)
    File "/b/swarm_slave/w/irmq0gjH/third_party/catapult/telemetry/telemetry/internal/actions/action_runner.py", line 160, in Navigate
      timeout_in_seconds=timeout_in_seconds))
    File "/b/swarm_slave/w/irmq0gjH/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 52, in traced_function
      return func(*args, **kwargs)
    File "/b/swarm_slave/w/irmq0gjH/third_party/catapult/telemetry/telemetry/internal/actions/action_runner.py", line 53, in _RunAction
      action.RunAction(self._tab)
    File "/b/swarm_slave/w/irmq0gjH/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 52, in traced_function
      return func(*args, **kwargs)
    File "/b/swarm_slave/w/irmq0gjH/third_party/catapult/telemetry/telemetry/internal/actions/navigate.py", line 23, in RunAction
      self._timeout_in_seconds)
    File "/b/swarm_slave/w/irmq0gjH/third_party/catapult/telemetry/telemetry/internal/browser/web_contents.py", line 252, in Navigate
      self._inspector_backend.Navigate(url, script_to_evaluate_on_commit, timeout)
    File "/b/swarm_slave/w/irmq0gjH/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 52, in traced_function
      return func(*args, **kwargs)
    File "/b/swarm_slave/w/irmq0gjH/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py", line 37, in inner
      inspector_backend._ConvertExceptionFromInspectorWebsocket(e)
    File "/b/swarm_slave/w/irmq0gjH/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 52, in traced_function
      return func(*args, **kwargs)
    File "/b/swarm_slave/w/irmq0gjH/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py", line 34, in inner
      return func(inspector_backend, *args, **kwargs)
    File "/b/swarm_slave/w/irmq0gjH/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py", line 172, in Navigate
      self._page.Navigate(url, script_to_evaluate_on_commit, timeout)
    File "/b/swarm_slave/w/irmq0gjH/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_page.py", line 125, in Navigate
      self.WaitForNavigate(timeout)
    File "/b/swarm_slave/w/irmq0gjH/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_page.py", line 94, in WaitForNavigate
      self._inspector_websocket.DispatchNotifications(remaining_time)
    File "/b/swarm_slave/w/irmq0gjH/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_websocket.py", line 134, in DispatchNotifications
      self._Receive(timeout)
    File "/b/swarm_slave/w/irmq0gjH/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_websocket.py", line 149, in _Receive
      data = self._socket.recv()
    File "/b/swarm_slave/w/irmq0gjH/third_party/catapult/telemetry/third_party/websocket-client/websocket.py", line 596, in recv
      opcode, data = self.recv_data()
    File "/b/swarm_slave/w/irmq0gjH/third_party/catapult/telemetry/third_party/websocket-client/websocket.py", line 606, in recv_data
      frame = self.recv_frame()
    File "/b/swarm_slave/w/irmq0gjH/third_party/catapult/telemetry/third_party/websocket-client/websocket.py", line 637, in recv_frame
      self._frame_header = self._recv_strict(2)
    File "/b/swarm_slave/w/irmq0gjH/third_party/catapult/telemetry/third_party/websocket-client/websocket.py", line 746, in _recv_strict
      bytes = self._recv(shortage)
    File "/b/swarm_slave/w/irmq0gjH/third_party/catapult/telemetry/third_party/websocket-client/websocket.py", line 732, in _recv
      raise WebSocketTimeoutException(e.message)
  TimeoutException: 
  ********************************************************************************
  (/b/swarm_slave/w/irmq0gjH/third_party/catapult/telemetry/telemetry/internal/backends/chrome_inspector/inspector_backend.py:389 _ConvertExceptionFromInspectorWebsocket) The app is probably crashed:
  
  Found Minidump: True

Alexei Svitkine

Oct 21, 2016, 4:17:53 PM
to mars...@google.com, Chromium-dev, Annie Sullivan
It seems broken that our perf tests are trying to ping real live websites.

I thought we had infrastructure to have a snapshot of the data and use a local copy of it?

Otherwise, not only is this kind of issue possible, but a change to the website could cause the perf result to move.

+Annie

--
--
Chromium Developers mailing list: chromi...@chromium.org
View archives, change email options, or unsubscribe:
http://groups.google.com/a/chromium.org/group/chromium-dev

Juan Antonio Navarro Pérez

Oct 24, 2016, 5:37:04 AM
to asvi...@chromium.org, mars...@google.com, nedn...@google.com, Chromium-dev, Annie Sullivan
+Ned 

These tests are not pinging real websites; they use Web Page Replay. So this should not be related to the DNS DDoS attack.

See, for example, this line in the linked logs:
  Starting Web-Page-Replay: ['/usr/bin/python', '/b/swarm_slave/w/irmq0gjH/third_party/catapult/telemetry/third_party/web-page-replay/replay.py', '--host=127.0.0.1', '--port=0', '--ssl_port=0', '--no-dns_forwarding', '--use_closest_match', '--log_level=info', '--should_generate_certs', '--https_root_ca_cert_path=/b/swarm_slave/w/itORncj5/tmptaECx_/testca.pem', u'/b/swarm_slave/w/irmq0gjH/tools/perf/page_sets/data/system_health_mobile_014.wpr']
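
For anyone who wants to check this in other logs, here is a minimal sketch that pulls the argument list out of such a line and reports the host, the DNS forwarding flag, and the archive file. The helper names `parse_replay_command` and `summarize` are made up for illustration and are not part of Telemetry or Web Page Replay; the log line is the one quoted above.

```python
import ast

# The log line quoted above, split across string literals for readability.
LOG_LINE = (
    "Starting Web-Page-Replay: ['/usr/bin/python', "
    "'/b/swarm_slave/w/irmq0gjH/third_party/catapult/telemetry/third_party/"
    "web-page-replay/replay.py', "
    "'--host=127.0.0.1', '--port=0', '--ssl_port=0', '--no-dns_forwarding', "
    "'--use_closest_match', '--log_level=info', '--should_generate_certs', "
    "'--https_root_ca_cert_path=/b/swarm_slave/w/itORncj5/tmptaECx_/testca.pem', "
    "u'/b/swarm_slave/w/irmq0gjH/tools/perf/page_sets/data/"
    "system_health_mobile_014.wpr']"
)


def parse_replay_command(log_line):
    """Return the argv list embedded in a 'Starting Web-Page-Replay' log line."""
    _, found, argv_repr = log_line.partition("Starting Web-Page-Replay: ")
    if not found:
        raise ValueError("not a Web-Page-Replay start line")
    # The log prints a Python list literal, so literal_eval can read it back.
    return ast.literal_eval(argv_repr.strip())


def summarize(argv):
    """Split argv into a {flag: value} dict and a list of .wpr archive paths."""
    flags = {}
    for arg in argv:
        if arg.startswith("--"):
            name, _, value = arg.partition("=")
            flags[name] = value  # flags without '=' get an empty value
    archives = [arg for arg in argv if arg.endswith(".wpr")]
    return flags, archives


flags, archives = summarize(parse_replay_command(LOG_LINE))
print("host:", flags["--host"])
print("DNS forwarding disabled:", "--no-dns_forwarding" in flags)
print("archive:", archives)
```

This only shows what the replay server was asked to do (listen on 127.0.0.1 and serve from a recorded archive); it says nothing about other components sitting between the device and the replay server.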



Primiano Tucci

Oct 24, 2016, 5:48:28 AM
to Juan Pérez, Alexei Svitkine, Kevin Marshall, Ned, h...@chromium.org, Annie Sullivan, Chromium-dev
+hjd 

IIRC those devices run with WiFi disabled to avoid accidentally depending on the live network.

Is there a bug tracking this issue already?

Primiano Tucci

Oct 24, 2016, 7:13:50 AM
to Juan Pérez, Alexei Svitkine, Kevin Marshall, Ned, h...@chromium.org, telemetry, Annie Sullivan
Moving this thread to telemetry@, chromium-dev to BCC.

hjd and I took a look at this. It is actually due to the DDoS. Thanks a lot for pointing that out; it helped the investigation a lot.

TL;DR: the devices actually run with WiFi disabled. However, TSProxy (Traffic Shaping Proxy) was recently introduced on the host, between the device and WebPageReplay.
It turns out TSProxy performs DNS resolution on the host, which adds a dependency between the test and the live network.
I am pretty sure this is an unintended, very subtle side effect of the TSProxy switch, and it should be fixed.

Thanks for pointing that out.
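
To illustrate the failure mode, here is a minimal sketch of the two ways a proxy in that position could pick its upstream address. This is not TSProxy's code: `resolve_upstream`, `replay_addr`, `resolver`, and the port 8080 are hypothetical names and values chosen for the example. The point is only that resolving the requested hostname on the host needs working DNS even when the traffic ends up at a local replay server, while mapping every destination to a fixed address does not.

```python
import socket


def resolve_upstream(host, port, replay_addr=None, resolver=socket.getaddrinfo):
    """Return the (ip, port) the proxy would connect to for a client request.

    With replay_addr set, every destination is mapped to that address and no
    DNS lookup happens. Without it, the hostname is resolved on the host,
    which depends on the live network.
    """
    if replay_addr is not None:
        return replay_addr
    infos = resolver(host, port, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); for AF_INET
    # the sockaddr is an (ip, port) tuple.
    return infos[0][4][:2]


def dns_is_down(*_args):
    """Stand-in resolver that simulates the DNS outage."""
    raise socket.gaierror("simulated DNS outage")


# Fixed mapping: works even though DNS is unavailable.
print(resolve_upstream("twitter.com", 443,
                       replay_addr=("127.0.0.1", 8080),
                       resolver=dns_is_down))

# Host-side resolution: fails, so the navigation never completes.
try:
    resolve_upstream("twitter.com", 443, resolver=dns_is_down)
except socket.gaierror as error:
    print("lookup failed:", error)
```

In the real setup the symptom was a navigation timeout rather than an immediate error, but the dependency is the same: the lookup happens on the host before any replayed response can be served.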