Chromium GPU flags on Windows 10 on ARM


Jon Kunkee (MSFT)

Jul 10, 2018, 2:15:58 PM
to Graphics-dev
(Cross post from Chromium-Discuss per suggestion there)

Hi!

I'm currently a Microsoft dev. I work on ARM64 Windows, including the x86-on-ARM64 usermode emulation layer, so it's in my interest to see Chromium run well on it.

It actually does--once we figured out how to get it to recognize the GPU. (Long story. Ask me sometime; I'm not proud of how I filed bug 785688, but I did find a way around it.) As you can imagine, though, running under emulation on a Snapdragon 835 isn't quite like running on an i7, so we're always looking for ways to improve performance.

One of my coworkers recently pulled up about:flags and wondered if any of the commonly disabled features would be beneficial for Chromium under emulation. When one of my more graphics-savvy teammates saw that GPU Rasterization is only enabled on specific GPUs, he explained to me that it is not just a compatibility issue but also a DMA vs. CPU cost trade-off: GPU rasterization reduces CPU load but requires an extra host-to-GPU memory copy. (I'd love to understand a bit more about this, but that's what I have so far.) In the case of x86 emulation on ARM64, the GPU offload makes conceptual sense and, in my simple local testing, doesn't cause crashes.

I'd like to propose adding a GPU DB entry that turns on GPU Rasterization for Adreno GPUs on Windows. If the JSON schema is sufficiently expressive, this could target specific GPUs, and it should apply only when Windows is ARM64 (Adreno is a good proxy for now) and Chromium is x86 (known at build time).
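To make the shape concrete, here's a rough sketch of the kind of exception clause I have in mind for the relevant GPU control list JSON entry (the 0x5143 Qualcomm vendor ID and the exact placement are guesses on my part, not a tested change):

  "exceptions": [
    {
      "os": { "type": "win" },
      "vendor_id": "0x5143"
    }
  ]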

To that end:
  • Is there any existing test collateral for GPU rasterization?
  • Are there any benchmarks that do a good job measuring its impact?
  • What background around this feature might I be missing?
  • Where does the GPU blocklist live these days? (I'm happy to make the change and prep a PR.)
  • If this makes sense, could I get some help crafting the DB entry?
I understand that the best solution here would be an ARM64 port of Chromium, but that's another thread for another day. The GPU DB entry can happen sooner, so that's why I'm asking. :)

Thanks,
Jon

Eric Karl

Jul 10, 2018, 7:56:53 PM
to jonatha...@gmail.com, Heather Miller, graphics-dev
Hi Jon,

This makes sense to me. In general we've found that GPU raster does benefit systems with lower CPU power, so I'd love to see this enabled on ARM64 Windows. I've tried to answer your questions inline.

On Tue, Jul 10, 2018 at 11:16 AM Jon Kunkee (MSFT) <jonatha...@gmail.com> wrote:
(Cross post from Chromium-Discuss per suggestion there)

Hi!

I'm currently a Microsoft dev. I work on ARM64 Windows, including the x86-on-ARM64 usermode emulation layer, so it's in my interest to see Chromium run well on it.

It actually does--once we figured out how to get it to recognize the GPU. (Long story. Ask me sometime; I'm not proud of how I filed bug 785688, but I did find a way around it.) As you can imagine, though, running under emulation on a Snapdragon 835 isn't quite like running on an i7, so we're always looking for ways to improve performance.

One of my coworkers recently pulled up about:flags and wondered if any of the commonly disabled features would be beneficial for Chromium under emulation. When one of my more graphics-savvy teammates saw that GPU Rasterization is only enabled on specific GPUs, he explained to me that it is not just a compatibility issue but also a DMA vs. CPU cost trade-off: GPU rasterization reduces CPU load but requires an extra host-to-GPU memory copy. (I'd love to understand a bit more about this, but that's what I have so far.) In the case of x86 emulation on ARM64, the GPU offload makes conceptual sense and, in my simple local testing, doesn't cause crashes.
I don't think there are any reasons this wouldn't work well for Adreno on Windows - it's just not a configuration we've tested. I'm not sure that GPU rasterization should lead to any extra host-to-GPU memory copies - we do need to upload/copy images to the GPU for raster, but with CPU raster we had to upload full tiles of rendered content. Either way, we've found GPU rasterization to be generally beneficial.

I'd like to propose adding a GPU DB entry that turns on GPU Rasterization for Adreno GPUs on Windows. If the JSON schema is sufficiently expressive, this could target specific GPUs, and it should apply only when Windows is ARM64 (Adreno is a good proxy for now) and Chromium is x86 (known at build time).

To that end:
  • Is there any existing test collateral for GPU rasterization?
GPU rasterization is handled by the Skia rendering library, and a good place to start would be with Skia's unit and pixel tests. You can build/run these locally by following instructions here: https://skia.org/dev/testing/testing
The first step would be to run the unit tests - I believe these *should* all pass - +Heather Miller who might know of any known issues.
You could also run the pixel tests with both SW and GPU rasterization and diff the generated images (the skdiff tool mentioned on that page should work). Note that we expect *some* differences between SW/GPU, so it might be a lot to triage - mostly very minor changes.
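Concretely, from a Skia checkout that's roughly the following (see the linked page for the authoritative steps; exact targets and flags may have drifted):
python tools/git-sync-deps
bin/gn gen out/Debug
ninja -C out/Debug dm
out/Debug/dm
By default dm runs both the unit tests and the pixel (GM) tests; you can pass --config to restrict it to a software config vs. a GPU config and generate the two image sets to feed to skdiff.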
  • Are there any benchmarks that do a good job measuring its impact?
In general, I'd recommend running our "rendering.desktop" benchmark suite. You can run from a Chrome checkout:
./tools/perf/run_benchmark rendering.desktop --extra-browser-args=--disable-gpu-rasterization --results-label=SW 
./tools/perf/run_benchmark rendering.desktop --extra-browser-args=--enable-gpu-rasterization --results-label=GPU 

Once you've run these two commands (they will take a while to complete), an HTML report will be generated that will let you compare results.
  • What background around this feature might I be missing?
I think everything you mention makes sense. 
  • Where does the GPU blocklist live these days? (I'm happy to make the change and prep a PR.)
You should be able to add an exception for the Adreno GPU's Vendor ID.
 
  • If this makes sense, could I get some help crafting the DB entry?
I understand that the best solution here would be an ARM64 port of Chromium, but that's another thread for another day. The GPU DB entry can happen sooner, so that's why I'm asking. :)

Thanks,
Jon

--
You received this message because you are subscribed to the Google Groups "Graphics-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to graphics-dev...@chromium.org.

Jon Kunkee (MSFT)

Jul 20, 2018, 6:08:48 PM
to Graphics-dev, jonatha...@gmail.com, h...@google.com
 
This makes sense to me. In general we've found that GPU raster does benefit systems with lower CPU power, so I'd love to see this enabled on ARM64 Windows. I've tried to answer your questions inline.

Great to hear!
  • Is there any existing test collateral for GPU rasterization?
GPU rasterization is handled by the Skia rendering library, and a good place to start would be with Skia's unit and pixel tests. You can build/run these locally by following instructions here: https://skia.org/dev/testing/testing
The first step would be to run the unit tests - I believe these *should* all pass - +Heather Miller who might know of any known issues.
You could also run the pixel tests with both SW and GPU rasterization and diff the generated images (the skdiff tool mentioned on that page should work). Note that we expect *some* differences between SW/GPU, so it might be a lot to triage - mostly very minor changes.

Thanks--this is what I was looking for.
  • Are there any benchmarks that do a good job measuring its impact?
In general, I'd recommend running our "rendering.desktop" benchmark suite. You can run from a Chrome checkout:
./tools/perf/run_benchmark rendering.desktop --extra-browser-args=--disable-gpu-rasterization --results-label=SW 
./tools/perf/run_benchmark rendering.desktop --extra-browser-args=--enable-gpu-rasterization --results-label=GPU 

Once you've run these two commands (they will take a while to complete), an HTML report will be generated that will let you compare results.
  • Where does the GPU blocklist live these days? (I'm happy to make the change and prep a PR.)
You should be able to add an exception for the Adreno GPU's Vendor ID. 

After digging around, it turns out I am blocked by bug 785688. It so happens that ACPI-bus device ID strings are not handled by the current Device ID and Hardware/Compatible ID string parsing, because the prefix is not three characters (ACPI instead of PCI or AGP) and the vendor ID is an arbitrary four-character string, not a number (QCOM, in this case). Sounds like I'll need to prep a bigger change than expected.

Jon

Ken Russell

Jul 20, 2018, 7:11:25 PM
to jonatha...@gmail.com, graphics-dev, Heather Miller
Any change you can propose to the GPU blacklist in order to support ACPI buses will be appreciated. I'm not sure what the best way to generalize the code is, since it's currently (over-)specialized to handle both machines that have PCI buses (device/vendor ID) and those that don't (blacklisting based on GL renderer/vendor strings). Please give it some thought and post here if you need help.

-Ken


 

Jon Kunkee (MSFT)

Jul 20, 2018, 7:41:05 PM
to Graphics-dev, jonatha...@gmail.com, h...@google.com
After digging around, it turns out I am blocked by bug 785688. It so happens that ACPI-bus device ID strings are not handled by the current Device ID and Hardware/Compatible ID string parsing, because the prefix is not three characters (ACPI instead of PCI or AGP) and the vendor ID is an arbitrary four-character string, not a number (QCOM, in this case). Sounds like I'll need to prep a bigger change than expected.

 
Any change you can propose to the GPU blacklist in order to support ACPI buses will be appreciated. I'm not sure what the best way to generalize the code is, since it's currently (over-)specialized to handle both machines that have PCI buses (device/vendor ID) and those that don't (blacklisting based on GL renderer/vendor strings). Please give it some thought and post here if you need help.

I propose two changes:

1. Move from fixed-length-string parsing to delimiter-based parsing (thus allowing for more buses in the future)
2. Move from numeric vendor IDs to string vendor IDs

Since ACPI.sys can be convinced to imitate PCI.sys device strings except for numeric vendor IDs, this is sufficient and somewhat more future-proof.

The long version: The device's bus driver gets to decide the format of both its Device IDs and its Hardware/Compatible IDs. This generally reflects the underlying standard; PCI has VEN_nnnn where nnnn are hexadecimal digits. GPUs are generally under Windows bus drivers, but not even those are consistent. ACPI.sys uses completely different formats between Device IDs and Hardware/Compatible IDs, partly because the ACPI standard has so many optional fields. I filed the bug because this mismatch meant Chrome didn't find a primary GPU, so it saved an empty string as the primary GPU's driver date; when the date parsing then broke, it assumed the date was prior to 2009 and turned off all acceleration. After reading some code, I found that ACPI.sys will consistently imitate the PCI.sys format for *both* Device IDs and Hardware/Compatible IDs if the optional hardware revision field is present in the ACPI table entry for the device, so with a firmware change we got Chrome to enable hardware acceleration.
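To make that concrete, the strings look roughly like this (illustrative values; the ACPI forms are the patterns as I understand them, not copied from a real device):
  PCI\VEN_8086&DEV_1916           (PCI.sys: vendor and device are four hex digits)
  ACPI\QCOM0620                   (ACPI.sys default: vendor is an arbitrary four-character string)
  ACPI\VEN_QCOM&DEV_0620          (ACPI.sys imitating the PCI form when the revision field is present)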

Even with the two kinds of strings matching and the three-character bus-name issue fixed, the vendor ID in the ACPI spec is still an arbitrary four-character string, and so it breaks the assumption that it's a number. Oddly enough, I don't see any place where the numerical interpretation is actually required--including in the JSON--so I think this fix is safe to make.
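As a sketch of proposal 1, something like the following delimiter-based parse (the function and struct names are mine, for illustration only, assuming IDs of the form BUS\VEN_xxxx&DEV_xxxx&...; the real change would live in the existing Windows GPU info collection code):

  #include <sstream>
  #include <string>

  struct ParsedDeviceId {
    std::string bus;        // "PCI", "AGP", "ACPI", ... (any length)
    std::string vendor_id;  // kept as a string: "8086" or "QCOM"
    std::string device_id;
  };

  bool ParseDeviceId(const std::string& id, ParsedDeviceId* out) {
    auto slash = id.find('\\');
    if (slash == std::string::npos)
      return false;
    out->bus = id.substr(0, slash);  // delimiter-based, not substr(0, 3)
    std::istringstream fields(id.substr(slash + 1));
    std::string field;
    while (std::getline(fields, field, '&')) {
      if (field.rfind("VEN_", 0) == 0)
        out->vendor_id = field.substr(4);
      else if (field.rfind("DEV_", 0) == 0)
        out->device_id = field.substr(4);
    }
    return !out->vendor_id.empty() && !out->device_id.empty();
  }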

Thoughts?

Thanks,
Jon

Ken Russell

Jul 20, 2018, 8:31:24 PM
to jonatha...@gmail.com, graphics-dev, Heather Miller
True, gpu_driver_bug_list.json and software_rendering_list.json represent the four-digit hex PCI IDs as strings rather than numbers. Internally, however, they are currently represented as ints: see src/gpu/config/gpu_info.h. These will have to be upgraded to strings, the platform-dependent code changed to report them as strings rather than ints, and many uses throughout the code changed.
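For reference, a sketch of what the change to the GPUDevice struct might look like (abbreviated; the real struct in gpu_info.h has many more members):

  struct GPUDevice {
    // Before: uint32_t vendor_id; uint32_t device_id;
    std::string vendor_id;  // "8086" (PCI, hex digits) or "QCOM" (ACPI)
    std::string device_id;
    // ...driver vendor/version/date fields, etc.
  };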

Here's an incomplete query enumerating the locations where vendor_id is referenced:

A better way to find all the uses might be to go to this file:

and click the "vendor_id" and "device_id" fields. Codesearch will find all of the references and report them. There are ~100 references to each so it's not too bad. There will be auxiliary code changes that will have to be traced through.

For example, here's one place where the computation of keys in the crash database will be updated:

Here's one which is more subtle and which will require changes to src/third_party/catapult (maintained in a separate repository; https://github.com/catapult-project/catapult )

The way this will have to be handled is to check the type of the value and do one thing if it's a str, another if it's an int. The data comes from here:

Chrome's about:gpu page rendering will need to be changed too.

After discussion with a couple of colleagues, we agree that upgrading these vendor_id and device_id fields to strings is the right thing to do, so we'll appreciate your help doing this.

-Ken


Thoughts?

Thanks,
Jon

Jon Kunkee (MSFT)

Jul 20, 2018, 9:36:20 PM
to Graphics-dev, jonatha...@gmail.com, h...@google.com
1. Move from fixed-length-string parsing to delimiter-based parsing (thus allowing for more buses in the future)
2. Move from numeric vendor IDs to string vendor IDs

Since ACPI.sys can be convinced to imitate PCI.sys device strings except for numeric vendor IDs, this is sufficient and somewhat more future-proof.

True, gpu_driver_bug_list.json and software_rendering_list.json represent the four-digit hex PCI IDs as strings rather than numbers. Internally, however, they are currently represented as ints: see src/gpu/config/gpu_info.h. These will have to be upgraded to strings, the platform-dependent code changed to report them as strings rather than ints, and many uses throughout the code changed.

I stopped here and groaned. Because I was debugging from disablement down, I had tunnel vision. Windows parses hex to int, then int to hex for comparison with the JSON, but that's just one path for vendor_id.
 
Here's an incomplete query enumerating the locations where vendor_id is referenced:

A better way to find all the uses might be to go to this file:

and click the "vendor_id" and "device_id" fields. Codesearch will find all of the references and report them. There are ~100 references to each so it's not too bad. There will be auxiliary code changes that will have to be traced through.

For example, here's one place where the computation of keys in the crash database will be updated:

Here's one which is more subtle and which will require changes to src/third_party/catapult (maintained in a separate repository; https://github.com/catapult-project/catapult )

The way this will have to be handled is to check the type of the value and do one thing if it's a str, another if it's an int. The data comes from here:

Chrome's about:gpu page rendering will need to be changed too.

Alas, true, that's not too bad as refactorings go, but it's much bigger than I'm funded to do at the moment.

(Were I to secure funding for the dev work, would I be able to somehow benefit from the CI and other infra Chromium devs rely on? I won't get funded to run full-matrix tests, not even on Windows.)

As an interim fix, CollectDriverInfoD3D or DeviceIDToVendorAndDevice could recognize non-numeric vendor IDs and 'translate' them into matching (PCI?) numeric vendor IDs. (The delimiter-based parsing change would be needed since DeviceIDToVendorAndDevice assumes all bus drivers will have three-letter names.)
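A minimal sketch of that interim translation, assuming a hypothetical helper (the QCOM-to-0x17cb mapping is my guess at a suitable Qualcomm PCI vendor ID, not something I've verified against the blocklist):

  #include <cstdint>
  #include <map>
  #include <string>

  // Hypothetical: map known ACPI vendor strings onto numeric PCI vendor
  // IDs so the rest of the int-based pipeline works unchanged.
  uint32_t AcpiVendorToPciVendorId(const std::string& acpi_vendor) {
    static const std::map<std::string, uint32_t> kKnownVendors = {
        {"QCOM", 0x17cb},  // Qualcomm (assumed PCI vendor ID)
    };
    auto it = kKnownVendors.find(acpi_vendor);
    return it != kKnownVendors.end() ? it->second : 0u;  // 0 = unknown
  }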
 
After discussion with a couple of colleagues, we agree that upgrading these vendor_id and device_id fields to strings is the right thing to do, so we'll appreciate your help doing this.

I'm going to do some benchmarking using the override flag to try and get funding to help, but I'm not very optimistic. :/

(Of course, if the benchmarking doesn't show marked improvement, this work no longer has a motivating force... :)

Cheers,
Jon

Ken Russell

Jul 23, 2018, 2:35:20 PM
to Jonathan Kunkee, graphics-dev, Heather Miller
On Fri, Jul 20, 2018 at 6:36 PM Jon Kunkee (MSFT) <jonatha...@gmail.com> wrote:
1. Move from fixed-length-string parsing to delimiter-based parsing (thus allowing for more buses in the future)
2. Move from numeric vendor IDs to string vendor IDs

Since ACPI.sys can be convinced to imitate PCI.sys device strings except for numeric vendor IDs, this is sufficient and somewhat more future-proof.

True, gpu_driver_bug_list.json and software_rendering_list.json represent the four-digit hex PCI IDs as strings rather than numbers. Internally, however, they are currently represented as ints: see src/gpu/config/gpu_info.h. These will have to be upgraded to strings, the platform-dependent code changed to report them as strings rather than ints, and many uses throughout the code changed.

I stopped here and groaned. Because I was debugging from disablement down, I had tunnel vision. Windows parses hex to int, then int to hex for comparison with the JSON, but that's just one path for vendor_id.
 
Here's an incomplete query enumerating the locations where vendor_id is referenced:

A better way to find all the uses might be to go to this file:

and click the "vendor_id" and "device_id" fields. Codesearch will find all of the references and report them. There are ~100 references to each so it's not too bad. There will be auxiliary code changes that will have to be traced through.

For example, here's one place where the computation of keys in the crash database will be updated:

Here's one which is more subtle and which will require changes to src/third_party/catapult (maintained in a separate repository; https://github.com/catapult-project/catapult )

The way this will have to be handled is to check the type of the value and do one thing if it's a str, another if it's an int. The data comes from here:

Chrome's about:gpu page rendering will need to be changed too.

Alas, true, that's not too bad as refactorings go, but it's much bigger than I'm funded to do at the moment.

Understood. It's a large refactoring.
 
(Were I to secure funding for the dev work, would I be able to somehow benefit from the CI and other infra Chromium devs rely on? I won't get funded to run full-matrix tests, not even on Windows.)

Yes, certainly. If you don't already have it, we can request try job access for your Gerrit account so that you can test your CLs.

As an interim fix, CollectDriverInfoD3D or DeviceIDToVendorAndDevice could recognize non-numeric vendor IDs and 'translate' them into matching (PCI?) numeric vendor IDs. (The delimiter-based parsing change would be needed since DeviceIDToVendorAndDevice assumes all bus drivers will have three-letter names.)
 
After discussion with a couple of colleagues, we agree that upgrading these vendor_id and device_id fields to strings is the right thing to do, so we'll appreciate your help doing this.

I'm going to do some benchmarking using the override flag to try and get funding to help, but I'm not very optimistic. :/

(Of course, if the benchmarking doesn't show marked improvement, this work no longer has a motivating force... :)

Another option would be to add new acpi_vendor_id and acpi_device_id string fields to GpuInfo and plumb them through only to the GPU blacklisting code (and not all the way up to Telemetry, for example). Other folks here can chime in if they think that idea's unacceptable, but I think it would be generally OK.
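As a sketch of where those fields might sit (I'm showing them on the GPUDevice member for illustration; the exact placement within GpuInfo is a detail):

  struct GPUDevice {
    uint32_t vendor_id = 0;      // existing PCI path, untouched
    uint32_t device_id = 0;
    // New: consumed only by the blocklist matcher; empty on PCI systems.
    std::string acpi_vendor_id;  // e.g. "QCOM"
    std::string acpi_device_id;
  };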

-Ken

 

Cheers,
Jon
