Intermittent GPU process freezes

Alexander Semashko

Apr 27, 2018, 3:37:31 PM

to Graphics-dev

Hi!

We're seeing strange browser behavior for one of our users. Sometimes the browser starts freezing intermittently, which is caused by some commands taking a very long time to process in the GPU process. It is unclear what triggers this behavior: sometimes it occurs while playing videos, sometimes after switching tabs (with quite heavy pages loaded) and resizing the browser window. When we were able to reproduce this and record traces, one interesting observation was that the freeze ends immediately after switching to another application. Attached is a screenshot of what happens in the GPU process during such freezes.

The user's video adapter is an Intel HD 620, and the OS is Windows 10. However, we could not reproduce the bug on other devices of the same model (Lenovo ThinkPad X1 Carbon).

We also have several crash reports from this user in which the GpuWatchdogThread killed the GPU process. In all of them, the hanging command was gpu::gles2::GLES2DecoderImpl::HandleEndQueryEXT.

Does this look similar to an issue that you're aware of? If so, is there a bug that we can participate in? If not, do you have any suggestions on what to do here other than communicating with Intel?

We're currently thinking about an extension to the GpuWatchdogThread that would crash the GPU process if there is a steady stream of very time-consuming tasks. It has become clear that it would need some sort of whitelist for actions like shader compilation, context destruction, and maybe something else. Does this look reasonable?

—

Best regards,

Alexander Semashko

2018-04-27 22-16-50.png

Zhenyao Mo

Apr 27, 2018, 5:22:23 PM

to ah...@yandex-team.ru, graphics-dev, Sunny Sachanandani

I haven't seen such reports on crbug.com before. If it's just a single
device, I wouldn't recommend spending time on it. It could be local
hardware/software corruption.

As for GpuWatchdogThread, I believe we already extend its timeout to 15s (vs 10s on
other platforms) on Windows. I don't see any reason to extend it further.
That would just cause a frozen UI, which is a pretty negative user experience.

On Fri, Apr 27, 2018 at 12:37 PM Alexander Semashko <ah...@yandex-team.ru>
wrote:

Alexander Semashko

Apr 27, 2018, 5:47:49 PM

to Zhenyao Mo, graphics-dev, Sunny Sachanandani

> It could be local hardware/software corruption.

Probably. But this can also be caused by some specific usage scenarios which we can't dismiss for now.

Sorry for not being clear: I'm not talking about increasing the 15 sec. timeout. I meant an additional detector that looks for patterns like "there were N (e.g. > 5) tasks longer than t (e.g. 3 seconds) in a recent time slice (e.g. 30 seconds)". This is meant to catch the aforementioned freezes that keep recurring without going away.
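
To make the idea concrete, here is a minimal sketch of such a detector, written in Python for brevity rather than in Chromium's C++. Everything in it is hypothetical: the class name, the task-kind labels used for the whitelist, and the default thresholds, which simply mirror the example numbers above. It is not existing GpuWatchdogThread code.

```python
from collections import deque


class SlowTaskDetector:
    """Flags a sustained pattern of slow tasks inside a sliding time window.

    All names and defaults are illustrative assumptions, not Chromium APIs.
    """

    def __init__(self, max_slow_tasks=5, slow_threshold_s=3.0, window_s=30.0,
                 exempt_kinds=frozenset({"shader_compile", "context_destroy"})):
        self.max_slow_tasks = max_slow_tasks      # N: tolerated slow tasks
        self.slow_threshold_s = slow_threshold_s  # t: what counts as "slow"
        self.window_s = window_s                  # length of the recent slice
        self.exempt_kinds = exempt_kinds          # whitelist of expected slow work
        self._slow_end_times = deque()            # end times of recent slow tasks

    def record_task(self, kind, end_time_s, duration_s):
        """Record a finished task; return True if the pattern is detected.

        Assumes end_time_s values are passed in non-decreasing order.
        """
        # Forget slow tasks that have fallen out of the window.
        while (self._slow_end_times
               and end_time_s - self._slow_end_times[0] > self.window_s):
            self._slow_end_times.popleft()
        # Count this task only if it is slow and not whitelisted.
        if duration_s > self.slow_threshold_s and kind not in self.exempt_kinds:
            self._slow_end_times.append(end_time_s)
        # Trigger when there are more than N slow tasks in the window.
        return len(self._slow_end_times) > self.max_slow_tasks
```

With the defaults, six non-whitelisted tasks of 4 seconds each finishing within 30 seconds would trip the detector, while the same tasks spaced 10 seconds apart, or any number of whitelisted shader compilations, would not.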

On Apr 28, 2018, at 0:22, Zhenyao Mo <z...@chromium.org> wrote:

—

Best regards,

Alexander Semashko

ah...@yandex-team.ru

Zhenyao Mo

Apr 27, 2018, 5:58:18 PM

to ah...@yandex-team.ru, graphics-dev, Sunny Sachanandani

On Fri, Apr 27, 2018 at 2:47 PM Alexander Semashko <ah...@yandex-team.ru>
wrote:

> > It could be local hardware/software corruption.
> Probably. But this can also be caused by some specific usage scenarios
> which we can't dismiss for now.

If you have two identical devices, you can compare their about:gpu pages and see if
there are any differences. Also, you can force both to run with a clean profile by
passing --user-data-dir=NEW_LOCAL_TEMP_DIR and see if the issue goes away.
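
As a rough illustration of the clean-profile run, a small Python helper could create a fresh temporary directory and build the command line. The helper name and the browser path are placeholders (assumptions); only the --user-data-dir switch comes from the suggestion above.

```python
import tempfile


def clean_profile_command(browser_path):
    """Return (argv, profile_dir) for launching a browser with an empty profile.

    browser_path is whatever executable you want to test (a placeholder here).
    """
    # A brand-new, empty directory guarantees no stale profile state is reused.
    profile_dir = tempfile.mkdtemp(prefix="clean-profile-")
    argv = [browser_path, f"--user-data-dir={profile_dir}"]
    return argv, profile_dir


# Example usage (not executed here):
#   import subprocess
#   argv, profile_dir = clean_profile_command("/path/to/browser")
#   subprocess.run(argv, check=False)
```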

> Sorry for not being clear: I'm not talking about increasing the
> 15 sec. timeout. I meant an additional detector that looks for patterns
> like "there were N (e.g. > 5) tasks longer than t (e.g. 3 seconds) in a
> recent time slice (e.g. 30 seconds)". This is meant to catch the
> aforementioned freezes that keep recurring without going away.

In general, I personally prefer to keep something like this simple, because there
will always be a case that needs a new rule, and soon it will be a mess.
It's better to identify a low-end device and just use blacklisting to put it on
the software path.

Antoine Labour

Apr 27, 2018, 7:42:12 PM

to Alexander Semashko, graphics-dev

On Fri, Apr 27, 2018 at 12:37 PM Alexander Semashko <ah...@yandex-team.ru> wrote:

In the trace you shared, several freezes occur during CreateAndConsumeTextureINTERNAL, which should not even talk to the driver (just some map lookups/updates). The only reason for this to freeze is if the thread gets descheduled by the OS, most likely because of CPU over-subscription or memory swapping/thrashing. Suspiciously, the device traces also look like they're freezing at the exact same time (but, as expected, on a different call, namely glClear), which is quite insane, unless that is also due to memory pressure.

Antoine
