SIGILL on A10/A11 Cuttlefish


Bas van Tiel

May 19, 2021, 1:05:59 PM
to swiftshader

Hello,

I'm running A10 Cuttlefish on an AWS a1.metal cloud instance (arm64). Quite frequently a SIGILL occurs, with a complete or incomplete backtrace, when eglSwapBuffersWithDamageKHRImpl is called.


To make sure it's not related to the application, I reproduced the issue with the test suite in AOSP (frameworks/native/libs/gui/tests/libgui_test) on both android10-gsi and android11-gsi.


The SwiftShader unit tests run OK on both A10 and A11: system-unittests, gles-unittests, vk-unittests, math-unittests, ReactorUnitTests.


My first question: is this the correct group, or should this go to a Google group specifically for AOSP?


My second question: how can I get more debugging output about the code generation? Is it possible to switch the Reactor backend, e.g. to use Subzero or LLVM? Any other tips to narrow down the issue would be helpful.


Thanks a lot in advance.

Bas




backtrace



signal 4 (SIGILL), code 1 (ILL_ILLOPC), fault addr 0x7c5b22a000 (*pc=0x2a1f03e9)

    x0  0000007bf89cb000  x1  0000007c5b35a6d0  x2  0000007c5232e710  x3  0000000000000002

    x4  0000007c5a36a020  x5  0000007c5a36a040  x6  0000007c5cc1f118  x7  0000007c5cc1f268

    x8  0000007c5b22a000  x9  0000007c02cec8a0  x10 000000002b628f50  x11 0000007c40000000

    x12 000000002b628f30  x13 0000007c5cc1fc78  x14 0000000000000001  x15 0000007c5a34cd40

    x16 0000007cec4e78f0  x17 0000007cec4d9b00  x18 0000007c01a6e000  x19 0000007c02ce5c00

    x20 0000000000000000  x21 0000007c02ce5c68  x22 0000007c5a508020  x23 0000007c5232e738

    x24 0000007c5a508020  x25 0000000000003450  x26 00000dba55a41dff  x27 0000007c5d0da204

    x28 000000000000001d  x29 0000007c5a507a70

    sp  0000007c5a507910  lr  0000007c5165afb4  pc  0000007c5b22a000


backtrace:

      #00 pc 0000000000000000  [anon:libc_malloc]

      #01 pc 0000000000455fb0  /vendor/lib64/egl/libGLESv2_swiftshader.so (sw::FrameBuffer::copyLocked()+256) (BuildId: 6d5a9931ff35fdef801ccf9ba7f7b191)

      #02 pc 0000000000455e64  /vendor/lib64/egl/libGLESv2_swiftshader.so (sw::FrameBuffer::copy(sw::Surface*)+204) (BuildId: 6d5a9931ff35fdef801ccf9ba7f7b191)

      #03 pc 0000000000455220  /vendor/lib64/egl/libGLESv2_swiftshader.so (sw::FrameBufferAndroid::blit(sw::Surface*, sw::RectT<int> const*, sw::RectT<int> cons

      #04 pc 000000000009c96c  /vendor/lib64/egl/libEGL_swiftshader.so (egl::WindowSurface::swap()+36) (BuildId: 6778e06d95ea7d120b50720da4544697)

      #05 pc 000000000009e888  /vendor/lib64/egl/libEGL_swiftshader.so (egl::SwapBuffers(void*, void*)+88) (BuildId: 6778e06d95ea7d120b50720da4544697)

      #06 pc 0000000000020924  /system/lib64/libEGL.so (android::eglSwapBuffersWithDamageKHRImpl(void*, void*, int*, int)+324) (BuildId: f9d65399d3536eea3ba102856b229746)
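
The tombstone above reports the instruction word at the faulting pc (`*pc=0x2a1f03e9`), and pc equals x8, which suggests a call into a dynamically generated routine. One way to narrow down a SIGILL like this is to check whether the reported word is a valid instruction at all. The following is a minimal sketch, assuming the A64 encoding of ORR (shifted register) and handling only that one encoding; the function names are hypothetical and not part of SwiftShader.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Extract bits [hi:lo] of a 32-bit A64 instruction word (fields narrower than 32 bits).
static std::uint32_t bits(std::uint32_t word, int hi, int lo)
{
    assert(lo >= 0 && hi >= lo && hi - lo < 31);
    return (word >> lo) & ((1u << (hi - lo + 1)) - 1u);
}

// Name of a 32-bit register in this encoding class, where register 31 is WZR.
static std::string wreg(std::uint32_t r)
{
    return r == 31 ? std::string("wzr") : "w" + std::to_string(r);
}

// Hypothetical helper: returns a disassembly string if `word` is a 32-bit
// ORR (shifted register) with a zero shift amount, otherwise an empty string.
// This is not a general disassembler.
std::string decodeOrrShiftedReg32(std::uint32_t word)
{
    // sf=0, opc=01, 01010, shift=00, N=0  ->  bits [31:21] == 0b00101010000
    if(bits(word, 31, 21) != 0x150) return "";
    if(bits(word, 15, 10) != 0) return "";  // imm6: only a zero shift is handled

    const std::uint32_t rm = bits(word, 20, 16);
    const std::uint32_t rn = bits(word, 9, 5);
    const std::uint32_t rd = bits(word, 4, 0);

    // ORR Wd, WZR, Wm is conventionally written as MOV Wd, Wm.
    if(rn == 31) return "mov " + wreg(rd) + ", " + wreg(rm);
    return "orr " + wreg(rd) + ", " + wreg(rn) + ", " + wreg(rm);
}
```

Under that encoding, `decodeOrrShiftedReg32(0x2a1f03e9)` yields `mov w9, wzr`, i.e. a valid instruction. If the word in memory is valid and the CPU still raises ILL_ILLOPC, one possible explanation (an assumption, not confirmed anywhere in this thread) is that the core executed stale instruction-cache contents rather than the bytes shown in the dump.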

Nicolas Capens

May 20, 2021, 12:47:24 PM
to Bas van Tiel, swiftshader
Hi Bas,

This is a good group to initiate a discussion about SwiftShader issues, thanks for reaching out!

I see you're using SwiftShader's OpenGL ES implementation. We've actually switched things over to using ANGLE + SwiftShader Vulkan ("SwANGLE"): https://android-review.googlesource.com/c/device/google/cuttlefish/+/1652109. The legacy code is deprecated.

Are you able to use SwANGLE instead? Hopefully that doesn't suffer from this issue, but in case it does, we'll give that appropriate priority.

Cheers,
Nicolas


Bas van Tiel

May 21, 2021, 4:52:59 AM
to swiftshader
Hi Nicolas,

Thanks for responding so quickly. We are going to give this a try and will let you know.

kind regards
Bas

Bas van Tiel

Jul 11, 2021, 5:09:06 PM
to swiftshader
Hi Nicolas,

A quick update:

Part of our application suite has been ported to Android 11 (master), and with the SwANGLE implementation we no longer saw the SIGILL issues.

To still have a workable solution on A10, we replaced the dynamically generated code with statically compiled code, and the SIGILL no longer occurred.

kind regards
Bas

Martin

Jul 17, 2021, 12:12:52 AM
to swiftshader
Hi Bas,
Thank you for sharing. I have two questions.

First, why can the SIGILL problem be avoided by using the SwANGLE implementation?
Second, can you tell me what you did to replace the dynamically generated code with statically compiled code?

Best regards
Martin

Bas van Tiel

Jul 19, 2021, 3:30:36 AM
to swiftshader
Hi Martin,

1) In my case I always had a stack trace that looked like this:

      #01 pc 0000000000455fb0  /vendor/lib64/egl/libGLESv2_swiftshader.so (sw::FrameBuffer::copyLocked()+256) (BuildId: 6d5a9931ff35fdef801ccf9ba7f7b191)

      #02 pc 0000000000455e64  /vendor/lib64/egl/libGLESv2_swiftshader.so (sw::FrameBuffer::copy(sw::Surface*)+204) (BuildId: 6d5a9931ff35fdef801ccf9ba7f7b191)

      #03 pc 0000000000455220  /vendor/lib64/egl/libGLESv2_swiftshader.so (sw::FrameBufferAndroid::blit(sw::Surface*, sw::RectT<int> const*, sw::RectT<int> cons

      #04 pc 000000000009c96c  /vendor/lib64/egl/libEGL_swiftshader.so (egl::WindowSurface::swap()+36) (BuildId: 6778e06d95ea7d120b50720da4544697) 

The SIGILL probably depended on timing and on the ARM platform used, but it could be reproduced within 1 to 15 minutes.

After converting parts of the stack to A11 and running the SwANGLE implementation, it did not occur over a sustained period of time.

2) The first step was to get some debugging information out of the library within the complete AOSP stack (Android 10). By creating a local Unix pipe inside the SwiftShader library and finding the process ID, the read end of the pipe (fd[0]) can be read from /proc/<pid>/fd/<xx>. Instead of using the dynamically generated code of copyRoutine, I wrote a regular C function (which is statically compiled) to do the image conversion, basically bypassing the code generation completely. After running the complete stack for several weeks, no SIGILL showed up anymore.
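
To illustrate what replacing the generated copy routine with a statically compiled function can look like, here is a minimal sketch. It is not the code actually used in this thread: the function name, the signature, and the BGRA8888 to RGBA8888 conversion are all assumptions chosen for the example, and the real formats depend on the surface and the framebuffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a JIT-generated copy routine: converts a
// BGRA8888 source image to RGBA8888 by swapping the red and blue channels.
// Strides are given in bytes, so rows may carry padding.
void copyBGRA8ToRGBA8(std::uint8_t *dst, const std::uint8_t *src,
                      int width, int height,
                      std::size_t dstStrideBytes, std::size_t srcStrideBytes)
{
    assert(dst != nullptr && src != nullptr);
    assert(width >= 0 && height >= 0);

    for(int y = 0; y < height; y++)
    {
        const std::uint8_t *s = src + static_cast<std::size_t>(y) * srcStrideBytes;
        std::uint8_t *d = dst + static_cast<std::size_t>(y) * dstStrideBytes;

        for(int x = 0; x < width; x++)
        {
            d[4 * x + 0] = s[4 * x + 2];  // R comes from byte 2 of BGRA
            d[4 * x + 1] = s[4 * x + 1];  // G is unchanged
            d[4 * x + 2] = s[4 * x + 0];  // B comes from byte 0 of BGRA
            d[4 * x + 3] = s[4 * x + 3];  // A is unchanged
        }
    }
}
```

Because a function like this is compiled ahead of time with the rest of the library, no executable memory is written at run time, which is the property the workaround relies on. It will usually be slower than a routine specialized for the exact formats in use.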


kind regards
Bas

Nicolas Capens

Jul 21, 2021, 3:25:13 PM
to Bas van Tiel, Martin, swiftshader
Hi all,

Sorry for the late reply. It's great to hear that you have a workable solution, Bas. Quite clever to replace the dynamically generated code with C code. Just to be clear, this workaround is not required on Android 11 with SwANGLE? We enabled SwANGLE for Cuttlefish after Android 10, so it makes sense that it would run into some issues on that release. I just want to make sure there are no SIGILL crashes on Android 11.

Martin, please note that this only works on the Cuttlefish virtual devices. Physical devices are not meant to have SwiftShader co-exist with a GPU driver since they have incompatible gralloc needs. SwiftShader can be a user-level Vulkan implementation (aka. the "Pastel" project), but it can't be a system-level driver unless there's no GPU.

Kind regards,
Nicolas

Bas van Tiel

Jul 21, 2021, 5:37:15 PM
to Nicolas Capens, Martin, swiftshader
Hi Nicolas,

I did a quick, partial port of the software stack to A11 (with SwANGLE enabled) and didn't see a SIGILL on the same arm64 hardware architecture with the same test set. To be 100% sure we need to do a complete port, which might take some time. In the meantime I'll keep an eye on it and let you know if something comes up.

Without knowing the exact details, I assume the LLVM code-generation part of the SwiftShader project was updated to a newer version for Android 11?

kind regards
Bas