Re: [chromium-dev] Chrome (headless) process stuck

Eric Seckler

Mar 23, 2017, 5:37:25 AM
to anil...@gmail.com, headless-dev
+cc headless-dev
bcc chromium-dev

Can you provide a repro case? Does it always get stuck after the same (DevTools) interaction?

Also, what exactly does "stuck" mean? Is your DevTools socket closed, or do you simply not receive responses to commands you send? Can you still access http://localhost:9222/ ?

Cheers,
Eric

On Thu, Mar 23, 2017 at 1:58 AM Anil <anil...@gmail.com> wrote:
Hi guys, I'm trying to automate some UI testing using Chrome (version 58 beta, with --headless switch and others listed below, on Ubuntu Linux 16.04) and chrome-remote-interface.

/usr/bin/google-chrome --headless --disable-gpu --remote-debugging-port=9222 --user-data-dir=/chrome-user-data --window-size=1366x768 --disable-remote-fonts --disable-translate --disable-extensions --ignore-certificate-errors

It works by loading different webpages in separate tabs (managed with DevTools Target API), but after some time the main Chrome process gets stuck and stops working altogether. Here are my observations:
  • The main Chrome process gets stuck using 10-30% memory and under 5% CPU, and is unresponsive. Several child processes (for the open tabs, I suppose) appear orphaned. Normally the child processes are disposed of when a tab is closed via chrome-remote-interface.
  • Running strace on the main chrome process (PID 5679 in this case) shows:
[pid  5726] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
[pid  5724] futex(0x7f41b678d9ec, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid  5721] futex(0x7f41b7f909bc, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid  5720] futex(0x7f41b87919bc, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid  5717] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
[pid  5716] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
[pid  5715] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
[pid  5714] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
[pid  5705] restart_syscall(<... resuming interrupted restart_syscall ...> <unfinished ...>
[pid  5732] futex(0x34c658cd3bb4, FUTEX_WAIT_PRIVATE, 3044, NULL <unfinished ...>
[pid  5729] select(47, [46], NULL, NULL, NULL <unfinished ...>
[pid  5728] epoll_wait(34,  <unfinished ...>
[pid  5727] futex(0x34c658cd3bb4, FUTEX_WAIT_PRIVATE, 3044, NULL <unfinished ...>
[pid  5725] futex(0x34c658c9e5b4, FUTEX_WAIT_PRIVATE, 1019, NULL <unfinished ...>
[pid  5711] wait4(5708,  <unfinished ...>
[pid  5722] futex(0x7f41b778f9bc, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid  5719] epoll_wait(18,  <unfinished ...>
[pid  5718] futex(0x7f41b97939bc, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid  5712] epoll_wait(40,  <unfinished ...>
[pid  5679] write(2, "[0322/150353.844850:INFO:CONSOLE(47)] \"A Parser-blocking, cross site (i.e. different eTLD+1) script"..., 617 <unfinished ...>
[pid  5713] epoll_wait(29,  <unfinished ...>
[pid  5723] epoll_wait(23, [], 32, 231) = 0
[pid  5723] write(27, "\0", 1)          = 1
[pid  5723] gettid()                    = 5723
[pid  5723] gettid()                    = 5723
[pid  5723] gettid()                    = 5723
[pid  5723] gettid()                    = 5723
[pid  5723] gettid()                    = 5723
[pid  5723] gettid()                    = 5723
[pid  5723] gettid()                    = 5723
[pid  5723] epoll_wait(23, [{EPOLLIN, {u32=26, u64=26}}], 32, 0) = 1
[pid  5723] read(26, "\0", 1)           = 1
[pid  5723] epoll_wait(23, [], 32, 0)   = 0
[pid  5723] epoll_wait(23, [], 32, 250) = 0
[pid  5723] write(27, "\0", 1)          = 1
[pid  5723] gettid()                    = 5723
[pid  5723] gettid()                    = 5723
[pid  5723] gettid()                    = 5723
[pid  5723] gettid()                    = 5723
[pid  5723] gettid()                    = 5723
[pid  5723] gettid()                    = 5723
[pid  5723] gettid()                    = 5723
[pid  5723] epoll_wait(23, [{EPOLLIN, {u32=26, u64=26}}], 32, 0) = 1
 
The child processes (e.g. the tab with PID 5723) don't exist (checked with ps -p 5723), but a console message from the main process above reads "A Parser-blocking, cross site (i.e. different eTLD+1) script", which suggests that a webpage script (or maybe a Flash/HTML5 video) may have caused the issue. I don't know which webpage or script caused it, but nevertheless Chrome should still be able to time out or end the child process gracefully. Is there a command-line switch to restrict (sandbox?) or better handle such situations?
  • Chrome's main process stack (running: cat /proc/5679/stack) shows:
[<ffffffff81217410>] pipe_wait+0x70/0xc0
[<ffffffff81217597>] pipe_write+0xc7/0x420
[<ffffffff8120e4bb>] new_sync_write+0x9b/0xe0
[<ffffffff8120e526>] __vfs_write+0x26/0x40
[<ffffffff8120eea9>] vfs_write+0xa9/0x1a0
[<ffffffff8120fb65>] SyS_write+0x55/0xc0
[<ffffffff8183c5f2>] entry_SYSCALL_64_fastpath+0x16/0x71
[<ffffffffffffffff>] 0xffffffffffffffff
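
For readers skimming the strace dump above: most threads are parked in futex or epoll waits, which is what idle threads typically look like, while the main process (PID 5679) is sitting in an unfinished write to fd 2. A minimal Python sketch for summarising such a dump follows; the function name and the regular expression are illustrative assumptions, not part of strace or Chrome.

```python
import re
from collections import defaultdict

# Matches "strace -f" style lines such as:
#   [pid  5724] futex(0x7f41b678d9ec, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
_LINE = re.compile(r"^\[pid\s+(\d+)\]\s+(\w+)\((.*)$")


def blocked_syscalls(strace_text):
    """Map syscall name -> thread ids whose most recent call has not returned."""
    last = {}
    for raw in strace_text.splitlines():
        m = _LINE.match(raw.strip())
        if not m:
            continue
        tid, syscall, rest = int(m.group(1)), m.group(2), m.group(3)
        # Later lines for the same thread overwrite earlier ones.
        last[tid] = (syscall, rest.rstrip().endswith("<unfinished ...>"))
    grouped = defaultdict(list)
    for tid, (syscall, unfinished) in sorted(last.items()):
        if unfinished:
            grouped[syscall].append(tid)
    return dict(grouped)


# Shortened excerpt of the dump above (the write payload is abbreviated).
sample = '''
[pid  5724] futex(0x7f41b678d9ec, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid  5728] epoll_wait(34,  <unfinished ...>
[pid  5679] write(2, "[0322/150353.844850:INFO:CONSOLE(47)] ..."..., 617 <unfinished ...>
[pid  5723] epoll_wait(23, [], 32, 231) = 0
'''

if __name__ == "__main__":
    for name, tids in blocked_syscalls(sample).items():
        print(name, tids)
```

Thread 5723 is left out of the summary because its last recorded call returned.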

Are there any other logs I can refer to in order to find the root cause?

I'm not sure what to make of it or how to resolve this problem. I'd appreciate your suggestions.
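
One possible reading of the two traces in this message, offered as a hypothesis and not a confirmed diagnosis: the main process is inside write(2, ...) and its kernel stack ends in pipe_write -> pipe_wait, which is where a writer sleeps when a pipe's buffer is full and nothing is reading the other end. That could happen if whatever launched Chrome attached its stderr to a pipe and never drained it. The thread does not say how Chrome was launched, so that part is an assumption. The Python sketch below reproduces the effect with a stand-in child process (not Chrome); CHATTY_CHILD and child_finishes are hypothetical names.

```python
import subprocess
import sys

# Stand-in for a process that logs heavily to stderr (about 1 MB, well beyond
# a typical pipe buffer). This is NOT Chrome, only a small Python child.
CHATTY_CHILD = "import sys; sys.stderr.write('x' * 1_000_000)"


def child_finishes(stderr_target, timeout):
    """Run the chatty child and report whether it exits within `timeout` seconds."""
    proc = subprocess.Popen([sys.executable, "-c", CHATTY_CHILD], stderr=stderr_target)
    try:
        proc.wait(timeout=timeout)
        return True
    except subprocess.TimeoutExpired:
        # The child is asleep in write(2, ...) because nobody reads the pipe.
        proc.kill()
        proc.wait()
        return False
    finally:
        if proc.stderr is not None:
            proc.stderr.close()


if __name__ == "__main__":
    print("stderr discarded:", child_finishes(subprocess.DEVNULL, timeout=30))
    print("stderr piped, never read:", child_finishes(subprocess.PIPE, timeout=2))
```

If this is what happened here, sending Chrome's stderr to a file or to /dev/null in the launcher, or reading the pipe continuously, would be worth trying.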

--
--
Chromium Developers mailing list: chromi...@chromium.org
View archives, change email options, or unsubscribe:
http://groups.google.com/a/chromium.org/group/chromium-dev
---
You received this message because you are subscribed to the Google Groups "Chromium-dev" group.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/chromium-dev/66b84076-4efe-4f96-9d74-5a36dbe9671e%40chromium.org.

Alex Clarke

Mar 23, 2017, 6:48:32 AM
to Eric Seckler, anil...@gmail.com, headless-dev
Can you attach DevTools to it once it's stuck?  If so it should be possible to get a trace which may help with diagnosis.

--
You received this message because you are subscribed to the Google Groups "headless-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to headless-dev+unsubscribe@chromium.org.
To post to this group, send email to headle...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/headless-dev/CAHZJZiHma01uZC-fLBD8F7mFV01aAb7eKgfUYhxw1hiQyspXkA%40mail.gmail.com.

anil

Mar 23, 2017, 8:36:15 PM
to Chromium-dev, anil...@gmail.com, headle...@chromium.org
Hey Eric,

By stuck I mean it hangs, and there's no response from Chrome headless over the debugging protocol.

I ran 'netstat' and it shows that localhost:9222 is open.

To test the open port, I ran 'wget localhost:9222/json/version', which connects but times out because no response arrives:

Resolving localhost (localhost)... 127.0.0.1, 127.0.0.1, ::1, ...
Connecting to localhost (localhost)|127.0.0.1|:9222... connected.
HTTP request sent, awaiting response... Read error (Connection timed out) in headers.

Same with 'curl -v --max-time 10 localhost:9222/json/version':

*   Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 9222 (#0)
> GET /json/version HTTP/1.1
> Host: localhost:9222
> User-Agent: curl/7.47.0
> Accept: */*
>
* Operation timed out after 10001 milliseconds with 0 bytes received
* Closing connection 0
curl: (28) Operation timed out after 10001 milliseconds with 0 bytes received
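
The same check can be scripted so a test runner notices the hang early. A minimal Python sketch using only the standard library, assuming the debugging port from the command line above (9222); devtools_alive is a hypothetical helper name, and the 10-second default simply mirrors the curl invocation.

```python
import http.client
import json
import urllib.request

# Opener that ignores proxy environment variables so localhost is contacted directly.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def devtools_alive(host="localhost", port=9222, timeout=10.0):
    """Return True only if /json/version answers with valid JSON within `timeout` seconds."""
    url = f"http://{host}:{port}/json/version"
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            json.loads(resp.read().decode("utf-8"))
        return True
    except (OSError, http.client.HTTPException, ValueError):
        # Covers refused connections, timeouts like the one shown above,
        # malformed HTTP responses, and bodies that are not JSON.
        return False


if __name__ == "__main__":
    print("DevTools endpoint responsive:", devtools_alive())
```

A runner could call this between test batches and restart Chrome when it returns False; whether a restart is acceptable depends on the test setup.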

Since I don't know which webpage or script causes Chrome to hang, I'll re-run it headless with logging enabled and push some traffic. I'll let you know soon.

Cheers.

James Hartig

Mar 23, 2017, 10:49:07 PM
to anil, Chromium-dev, headle...@chromium.org

We're having a similar issue and haven't been able to track down why it's hanging yet. We're running with logging on to hopefully see something but not seeing anything interesting thus far.

--


James Hartig
Co-Founder Leven Labs
Phone: 352-608-8859

anil...@gmail.com

Mar 24, 2017, 9:19:48 AM
to headless-dev, chromi...@chromium.org, anil...@gmail.com
Update: I left a Chrome headless instance running (with logging enabled), along with our test runner. After a few hours, Chrome stopped responding on the debugging port again. The log file (chrome_debug.log) contains many of the following errors (besides all the other info and verbose messages):

ERROR:nss_ocsp.cc(591)] No URLRequestContext for NSS HTTP handler. host: ocsp.digicert.com
ERROR:web_contents_delegate.cc(199)] WebContentsDelegate::CheckMediaAccessPermission: Not supported.
ERROR:interface_registry.cc(210)] Failed to locate a binder for interface: blink::mojom::SensitiveInputVisibilityService requested by: content_renderer exposed by: content_browser via InterfaceProviderSpec "navigation:frame".

This log message also appears towards the end almost every time:
VERBOSE1:sandbox_linux.cc(70)] Activated seccomp-bpf sandbox for process type: renderer

Any idea what might be the problem? I'm thinking of trying the latest release build (M57) instead of the beta (M58) that I'm running.

Eric Seckler

Mar 24, 2017, 10:27:06 AM
to anil...@gmail.com, headless-dev, chromi...@chromium.org
Do you know which website you were trying to load when this happens? Is it possible to reproduce by reloading the same site?

Jeff Tierney

Mar 28, 2017, 4:21:05 PM
to Chromium-dev, anil...@gmail.com, headle...@chromium.org
Hi Eric,

I recently ran into what seems to be the same issue that Anil had, and can reliably reproduce it by navigating to https://www.bloomingdales.com

i ran into it originally via some scripts that i have been working on that control headless chromium using the devtools protocol, but can reproduce by navigating directly to the site.

when i run:

./chrome --headless --remote-debugging-port=9222 --disable-gpu --remote-debugging-address=0.0.0.0  https://bloomingdales.com


i see the following output:

[0328/201519.192936:WARNING:audio_manager.cc(321)] Multiple instances of AudioManager detected
[0328/201519.193411:WARNING:audio_manager.cc(278)] Multiple instances of AudioManager detected
[0328/201519.522486:ERROR:nss_ocsp.cc(591)] No URLRequestContext for NSS HTTP handler. host: ocsp.comodoca.com
[0328/201519.522533:ERROR:nss_ocsp.cc(591)] No URLRequestContext for NSS HTTP handler. host: ocsp.comodoca.com
[0328/201519.522551:ERROR:nss_ocsp.cc(591)] No URLRequestContext for NSS HTTP handler. host: crl.comodoca.com
[0328/201519.524497:ERROR:nss_ocsp.cc(591)] No URLRequestContext for NSS HTTP handler. host: ocsp.comodoca.com
[0328/201519.524546:ERROR:nss_ocsp.cc(591)] No URLRequestContext for NSS HTTP handler. host: ocsp.comodoca.com
[0328/201519.524566:ERROR:nss_ocsp.cc(591)] No URLRequestContext for NSS HTTP handler. host: crl.comodoca.com
[0328/201519.526473:ERROR:nss_ocsp.cc(591)] No URLRequestContext for NSS HTTP handler. host: ocsp.usertrust.com
[0328/201519.526524:ERROR:nss_ocsp.cc(591)] No URLRequestContext for NSS HTTP handler. host: ocsp.usertrust.com
[0328/201519.526544:ERROR:nss_ocsp.cc(591)] No URLRequestContext for NSS HTTP handler. host: crl.usertrust.com
[0328/201520.129229:ERROR:nss_ocsp.cc(591)] No URLRequestContext for NSS HTTP handler. host: ocsp.comodoca.com
[0328/201520.129277:ERROR:nss_ocsp.cc(591)] No URLRequestContext for NSS HTTP handler. host: ocsp.comodoca.com
[0328/201520.129296:ERROR:nss_ocsp.cc(591)] No URLRequestContext for NSS HTTP handler. host: crl.comodoca.com
[0328/201520.131156:ERROR:nss_ocsp.cc(591)] No URLRequestContext for NSS HTTP handler. host: ocsp.comodoca.com
[0328/201520.131320:ERROR:nss_ocsp.cc(591)] No URLRequestContext for NSS HTTP handler. host: ocsp.comodoca.com
[0328/201520.131354:ERROR:nss_ocsp.cc(591)] No URLRequestContext for NSS HTTP handler. host: crl.comodoca.com
[0328/201520.133244:ERROR:nss_ocsp.cc(591)] No URLRequestContext for NSS HTTP handler. host: ocsp.usertrust.com
[0328/201520.133292:ERROR:nss_ocsp.cc(591)] No URLRequestContext for NSS HTTP handler. host: ocsp.usertrust.com
[0328/201520.133310:ERROR:nss_ocsp.cc(591)] No URLRequestContext for NSS HTTP handler. host: crl.usertrust.com
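
To see at a glance which messages dominate output like the above, the lines can be tallied by level, source file, and host. A minimal Python sketch, assuming the bracketed prefix format shown in these lines; the tally function and its regular expression are illustrative and not part of Chrome. It only counts messages and says nothing about whether they are related to the hang.

```python
import re
from collections import Counter

# Prefix as it appears above: [MMDD/HHMMSS.micros:LEVEL:source_file(line)] message
_LOG = re.compile(r"^\[(\d{4}/\d{6}\.\d+):(\w+):([\w.]+)\((\d+)\)\]\s*(.*)$")


def tally(log_text):
    """Count log lines by (level, source file, host); host is '' when absent."""
    counts = Counter()
    for line in log_text.splitlines():
        m = _LOG.match(line.strip())
        if not m:
            continue
        _, level, source, _, message = m.groups()
        host = message.rsplit("host: ", 1)[1] if "host: " in message else ""
        counts[(level, source, host)] += 1
    return counts


# A few lines copied from the output above.
sample_log = '''
[0328/201519.192936:WARNING:audio_manager.cc(321)] Multiple instances of AudioManager detected
[0328/201519.522486:ERROR:nss_ocsp.cc(591)] No URLRequestContext for NSS HTTP handler. host: ocsp.comodoca.com
[0328/201519.522533:ERROR:nss_ocsp.cc(591)] No URLRequestContext for NSS HTTP handler. host: ocsp.comodoca.com
[0328/201519.522551:ERROR:nss_ocsp.cc(591)] No URLRequestContext for NSS HTTP handler. host: crl.comodoca.com
'''

if __name__ == "__main__":
    for key, n in tally(sample_log).most_common():
        print(n, key)
```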



it never completes loading the page, and if i take any screenshots of what does load, it's just a blank white screen.


Thanks!
Jeff

Jeff Tierney

Mar 28, 2017, 4:27:09 PM
to Chromium-dev, anil...@gmail.com, headle...@chromium.org
i meant to have the command load https://www.bloomingdales.com like so:

./chrome --headless --remote-debugging-port=9222 --disable-gpu --remote-debugging-address=0.0.0.0  https://www.bloomingdales.com


which has basically the same output, but fewer of these lines, because it skips the redirect from bloomingdales.com to www.bloomingdales.com:

[0328/202215.613042:ERROR:nss_ocsp.cc(591)] No URLRequestContext for NSS HTTP handler.


anil...@gmail.com

Mar 28, 2017, 8:46:11 PM
to headless-dev, chromi...@chromium.org
As Jeff has reported, I can confirm that loading https://www.bloomingdales.com also causes the headless process to hang after a few tries (it doesn't usually happen on the first few loads), and thereafter the debugging port stops responding sooner or later.

Some other webpages that cause the lockdown quite often for me are:


It seems that heavy webpages with lots of scripts cause a crash most often.

What happens internally is a mystery, but after much investigation all I've been able to find is that the Chrome renderer process for the new tab/target ends prematurely and abnormally during a Network.responseReceived event (sometimes before and sometimes after the Page.loadEventFired event). On some occasions the DevTools endpoint was still accessible, and looking at /json/list, the newly loaded tab/target still appeared to be active (either as about:blank or as the webpage URL itself), but it doesn't get closed because of the renderer crash.
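
For the cases where the endpoint still answers, leftover targets can be picked out of the /json/list response and closed through the DevTools HTTP endpoint /json/close/<id>. A minimal Python sketch, assuming the test runner keeps track of the target ids it still considers open; leftover_pages, close_url, and the sample payload are hypothetical, and nothing here helps once the port itself stops responding.

```python
import json


def leftover_pages(json_list_payload, tracked_ids):
    """Return page targets from a /json/list body that the runner no longer tracks."""
    targets = json.loads(json_list_payload)
    return [
        t for t in targets
        if t.get("type") == "page" and t.get("id") not in tracked_ids
    ]


def close_url(target_id, host="localhost", port=9222):
    """Build the HTTP URL that asks the browser to close one target."""
    return f"http://{host}:{port}/json/close/{target_id}"


# Made-up /json/list body with two page targets; the ids are placeholders.
sample_payload = json.dumps([
    {"id": "A1", "type": "page", "url": "about:blank"},
    {"id": "B2", "type": "page", "url": "https://www.bloomingdales.com/"},
])

if __name__ == "__main__":
    for target in leftover_pages(sample_payload, tracked_ids={"A1"}):
        print("would close:", close_url(target["id"]), target["url"])
```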

Due to this issue, and the nature of our usage, I'm considering not using Chrome headless. It has been a pain to make it work reliably.

Sami Kyostila

Mar 29, 2017, 7:28:37 AM
to anil...@gmail.com, headless-dev, chromi...@chromium.org
I've opened https://bugs.chromium.org/p/chromium/issues/detail?id=706355. Let's see if we can reproduce this on our end.

- Sami

Jeff Tierney

Mar 29, 2017, 7:50:17 AM
to Sami Kyostila, anil...@gmail.com, headless-dev, chromi...@chromium.org