Bug#1020534: sgx: EPC section 0x50200000-0x55f7ffff (crash)

Diederik de Haas

Sep 22, 2022, 4:40:03 PM
Source: linux
Version: 5.18.2-1
Severity: normal

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Since kernel 5.18.2-1 I'm getting the following error/message in dmesg:

[ 0.465573] DMAR: Intel(R) Virtualization Technology for Directed I/O
[ 0.465579] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[ 0.465581] software IO TLB: mapped [mem 0x0000000049be6000-0x000000004dbe6000] (64MB)
[ 0.465676] sgx: EPC section 0x50200000-0x55f7ffff
[ 0.466859] ------------[ cut here ]------------
[ 0.466863] WARNING: CPU: 1 PID: 55 at arch/x86/kernel/cpu/sgx/main.c:446 ksgxd+0x1b3/0x1c0
[ 0.466877] Modules linked in:
[ 0.466881] CPU: 1 PID: 55 Comm: ksgxd Not tainted 5.18.0-1-amd64 #1 Debian 5.18.2-1
[ 0.466887] Hardware name: LENOVO 20HRCTO1WW/20HRCTO1WW, BIOS N1MET70W (1.55 ) 07/07/2022
[ 0.466889] RIP: 0010:ksgxd+0x1b3/0x1c0
[ 0.466896] Code: ff e9 f6 fe ff ff 48 89 df e8 99 dd 0c 00 84 c0 0f 84 c7 fe ff ff 31 ff e8 fa dd 0c 00 84 c0 0f 85 98 fe ff ff e9 b3 fe ff ff <0f> 0b e9 83 fe ff ff e8 a1 d8 90 00 90 0f 1f 44 00 00 41 57 48 c1
[ 0.466903] RSP: 0000:ffffb24640463ed8 EFLAGS: 00010283
[ 0.466913] RAX: ffffb246403a1970 RBX: ffff9104c2753400 RCX: 0000000000000000
[ 0.466922] RDX: 0000000080000000 RSI: ffffb246403a1930 RDI: 00000000ffffffff
[ 0.466925] RBP: ffff9104c1345300 R08: ffff9104c13456c0 R09: ffff9104c13456c0
[ 0.466928] R10: 0000000000000000 R11: 0000000000000001 R12: ffffb24640073ce0
[ 0.466930] R13: ffff9104c2789980 R14: ffffffffa0a5c1c0 R15: 0000000000000000
[ 0.466933] FS: 0000000000000000(0000) GS:ffff910851680000(0000) knlGS:0000000000000000
[ 0.466937] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.466940] CR2: 0000000000000000 CR3: 000000042d210001 CR4: 00000000003706e0
[ 0.466943] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 0.466945] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 0.466948] Call Trace:
[ 0.466952] <TASK>
[ 0.466956] ? _raw_spin_lock_irqsave+0x24/0x50
[ 0.466976] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 0.466982] ? __kthread_parkme+0x36/0x80
[ 0.466990] kthread+0xe8/0x110
[ 0.466994] ? kthread_complete_and_exit+0x20/0x20
[ 0.466998] ret_from_fork+0x22/0x30
[ 0.467008] </TASK>
[ 0.467010] ---[ end trace 0000000000000000 ]---
[ 0.468117] Initialise system trusted keyrings
[ 0.468137] Key type blacklist registered
[ 0.468217] workingset: timestamp_bits=36 max_order=22 bucket_order=0
[ 0.472438] zbud: loaded


It does NOT occur with version 5.17.3-1 and 5.18-1~exp1, but it does
occur with 5.18.2-1, 5.18.16-1, 5.19.6-1 and I first noticed it with a
self-compiled 6.0-rc6 (custom branch based on Debian kernel's master
branch).

Where it does NOT occur, there is no message containing 'sgx' in dmesg
at all, and where it DOES occur the trace appears to be the same, with only
a variation in the line number in ``arch/x86/kernel/cpu/sgx/main.c``.
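
A minimal sketch of the check described above, assuming a dmesg dump is piped in on stdin. The helper name `sgx_lines` is made up for illustration. It prints every line containing 'sgx' with the timestamp stripped, so that the output from two kernels can be compared with diff.

```shell
#!/bin/sh
# Hypothetical helper: print the lines of a dmesg log (read from stdin)
# that contain 'sgx', with the leading "[ seconds.micros]" timestamp removed.
sgx_lines() {
    sed -n -E '/sgx/{s/^\[ *[0-9]+\.[0-9]+\] *//;p;}'
}
```

For example, running `dmesg | sgx_lines > sgx-$(uname -r).txt` on each kernel (reading dmesg may require root) gives files that can be diffed directly. An empty file corresponds to the "no message containing 'sgx'" case.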

There are 5 commits with 'sgx' in their primary commit message:

diederik@prancing-pony:~/dev/kernel.org/linux$ git log --oneline v5.18..v5.18.2 | grep -ci sgx
5

And they are all sequential:

$ git log --oneline 557b6a9ccceeec1ae13a83b4490458b92e064c0e..5aada654649d9bcf6b89d7c0d1ff4b794f9295d3
5aada654649d media: i2c: imx412: Fix reset GPIO polarity
22e83371210d x86/sgx: Ensure no data in PCMD page after truncate
0e1f97633953 x86/sgx: Fix race between reclaimer and page fault handler
69432ff18091 x86/sgx: Obtain backing storage page with enclave mutex held
876053dd7503 x86/sgx: Mark PCMD page as dirty when modifying contents
5ded81f42258 x86/sgx: Disconnect backing page references from dirty status
6ad9dbb202a9 HID: multitouch: add quirks to enable Lenovo X12 trackpoint

But I haven't verified that the issue got introduced with one of them.
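
One way to verify this would be a bisect over that range. The following is only a sketch, assuming v5.18 is good and v5.18.2 is bad when both are built with the same .config. `start_bisect` is a hypothetical wrapper; the git bisect subcommands themselves are standard.

```shell
#!/bin/sh
# Hypothetical wrapper: start a bisect between a known-good and a
# known-bad revision. Both arguments are supplied by the caller,
# e.g. the v5.18 and v5.18.2 tags used in the git log command above.
start_bisect() {
    good=$1
    bad=$2
    git bisect start &&
    git bisect bad "$bad" &&
    git bisect good "$good"
}

# After building and booting the revision that git checks out,
# report the result and repeat until git names the first bad commit:
#   git bisect bad     # the sgx WARNING appears in dmesg
#   git bisect good    # no sgx WARNING in dmesg
#   git bisect reset   # when finished, return to the original branch
```

Inside the kernel tree this would be started with `start_bisect v5.18 v5.18.2`. The result is only meaningful if every step is built with the same configuration.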

No idea if it could be relevant, but the SGX related section in my BIOS
has the following settings (on a Lenovo ThinkPad X1 Carbon 5th gen):
Intel (R) SGX Control: Software Controlled
Current State: Enabled

(And an item to 'Change Owner EPOCH')

I haven't changed those settings between reboots with the various
kernels (or ever AFAIR).


- -- System Information:
Debian Release: bookworm/sid
APT prefers testing
APT policy: (990, 'testing'), (500, 'stable-security'), (500, 'unstable'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 5.18.0-1-amd64 (SMP w/4 CPU threads; PREEMPT)
Kernel taint flags: TAINT_WARN
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE=en_US:en
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

-----BEGIN PGP SIGNATURE-----

iHUEARYIAB0WIQT1sUPBYsyGmi4usy/XblvOeH7bbgUCYyzEdQAKCRDXblvOeH7b
blRMAQDxcH2fNVpKGoGU6AupSn7uQKSuNI0daaf+XKDCXbYiswD/T7R54kdPqh7L
xIpNsQJijfZ2r6+jVQ+232GODcbCkAs=
=B9if
-----END PGP SIGNATURE-----

Diederik de Haas

Sep 22, 2022, 4:50:03 PM
On Thursday, 22 September 2022 22:24:31 CEST Diederik de Haas wrote:
> Since kernel 5.18.2-1 I'm getting the following error/message in dmesg

FTR: the 'crash' seems to be in the SGX component, not the whole system.
I don't use SGX, so I don't know whether, or how, this affects its functioning.

Diederik de Haas

Sep 26, 2022, 12:10:03 PM
On Thursday, 22 September 2022 22:24:31 CEST Diederik de Haas wrote:
> It does NOT occur with version 5.17.3-1 and 5.18-1~exp1, but it does
> occur with 5.18.2-1, 5.18.16-1, 5.19.6-1

This is interesting...

> There are 5 commits with 'sgx' in their primary commit message:
> And they are all sequential:
> $ git log --oneline 557b6a9ccc..5aada65464

The goal was 'git bisect': https://wiki.debian.org/DebianKernel/GitBisect

My assumption was that the issue got introduced by one of those commits.
So I built a kernel based on the commit preceding those, namely:
6ad9dbb202a9 HID: multitouch: add quirks to enable Lenovo X12 trackpoint

Booting into that kernel showed the error, so it wasn't introduced with the
'suspect list'.

Then I built a new kernel based on 5.18.0 (the release by Linus) ... and that
also showed the error!!
Booting into 5.18.0-trunk-amd64 (=5.18-1~exp1) did NOT show the issue!

The '.config' was obtained by "cp /boot/config-5.18.0-4-amd64 .config"

IOW: the Debian provided 5.18.0 kernel did not show the issue, while the self
compiled 5.18.0 kernel did!

Did I do something wrong? What should I do next?

Diederik de Haas

Sep 26, 2022, 2:20:02 PM
Control: tag -1 help

On Monday, 26 September 2022 17:56:31 CEST Diederik de Haas wrote:
> On Thursday, 22 September 2022 22:24:31 CEST Diederik de Haas wrote:
> > It does NOT occur with version 5.17.3-1 and 5.18-1~exp1, but it does
> > occur with 5.18.2-1, 5.18.16-1, 5.19.6-1
>
> This is interesting...
>
> Then I built a new kernel based on 5.18.0 (the release by Linus) ... and
> that also showed the error!!
> Booting into 5.18.0-trunk-amd64 (=5.18-1~exp1) did NOT show the issue!
>
> The '.config' was obtained by "cp /boot/config-5.18.0-4-amd64 .config"
>
> IOW: the Debian provided 5.18.0 kernel did not show the issue, while the
> self compiled 5.18.0 kernel did!

I rebuilt the 5.18.0 kernel, but this time with config-5.18.0-trunk-amd64 as
the source for `.config`.

First thing I noticed (the -3 is the newest build):
```
$ ls -lh linux-image-5.18.0_5.18.0-*
-rw-r--r-- 1 diederik diederik 69M sep 26 17:22 linux-image-5.18.0_5.18.0-2_amd64.deb
-rw-r--r-- 1 diederik diederik 56M sep 26 19:21 linux-image-5.18.0_5.18.0-3_amd64.deb
```
But I did use ``scripts/config --disable DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT
--disable DEBUG_INFO_DWARF4 --disable DEBUG_INFO_DWARF5
--enable DEBUG_INFO_NONE --disable DEBUG_INFO`` this time.

Booting into it, I did NOT get the error!

So apparently between ``-trunk`` and ``-4`` something relevant changed in the config.

I have attached the full diff, but I'm pretty sure I found the relevant change:
$ grep SGX config-5.18-diff-trunk-ABI4.diff
-# CONFIG_X86_SGX is not set
+CONFIG_X86_SGX=y
+# CONFIG_X86_SGX_KVM is not set

The changelog from 5.18.1-1~exp1 has this item: ``[amd64] Enable X86_SGX``.
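
The same comparison can be made directly on two config files, without producing a full diff first. This is a rough sketch; the function name `config_symbol_diff` and its argument order are made up for illustration. It prints only the changed lines that mention a given symbol.

```shell
#!/bin/sh
# Hypothetical helper: show the lines for one config symbol that differ
# between two kernel config files.
#   $1 = symbol name without the CONFIG_ prefix (e.g. X86_SGX)
#   $2 = old config file, $3 = new config file
config_symbol_diff() {
    symbol=$1
    old=$2
    new=$3
    # -U0 prints changed lines only, without context. The grep keeps
    # added/removed lines for the symbol, both the "CONFIG_X=y" and the
    # "# CONFIG_X is not set" forms, and drops the diff header lines.
    diff -U0 "$old" "$new" | grep -E "^[+-](# )?CONFIG_${symbol}"
}
```

With the two configs mentioned in this thread, the call would look like `config_symbol_diff X86_SGX config-5.18.0-trunk-amd64 /boot/config-5.18.0-4-amd64` (the location of the trunk config is an assumption). For a full comparison, the kernel tree also ships `scripts/diffconfig`.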

I'd need help/guidance on how to proceed from here, hence the 'help' tag.
config-5.18-diff-trunk-ABI4.diff

Debian Bug Tracking System

Sep 26, 2022, 2:20:02 PM
Processing control commands:

> tag -1 help
Bug #1020534 [src:linux] sgx: EPC section 0x50200000-0x55f7ffff (crash)
Added tag(s) help.

--
1020534: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1020534
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems

Debian Bug Tracking System

Oct 27, 2022, 8:40:03 AM
Your message dated Thu, 27 Oct 2022 14:38:11 +0200
with message-id <2663521.mvXUDI8C0e@bagend>
and subject line Seems to be fixed
has caused the Debian Bug report #1020534,
regarding sgx: EPC section 0x50200000-0x55f7ffff (crash)
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact ow...@bugs.debian.org
immediately.)