Corebooted G505s Suspend/Resume Fails


awokd

Apr 1, 2019, 7:21:07 PM
to qubes-users
Trying to debug this; not something I use often but would be nice to
figure it out. Under Linux Mint 19.1 (no noises please, it was
convenient for troubleshooting), suspend "just works"- I close the lid,
power light starts blinking off & on, opening lid resumes normally.
Under Qubes 4.0.1, closing the lid acts like above, but resume hangs on
a black screen and the CPU fan slowly spinning up to full speed. Holding
down power button isn't enough to recover- although it will power off,
when I power back on it's still stuck the same way. I have to pull the
battery and power cable to get it to boot. I've tried:

- shutting down sys-net and sys-usb prior to suspend
- shutting down just sys-usb (since only those devices have no-strict-reset)
- adding mem_sleep_default=deep to kernel boot options
- adding mem_sleep_default=shallow to kernel boot options [resulted in
only screen going to sleep but not coming back]
- adding acpi.ec_no_wakeup=1 to kernel boot options

Dmesg says "ACPI: (supports S0 S1 S3 S5)" and /sys/power/mem_sleep says
"s2idle shallow [deep]". The last log lines before it enters suspend are:

dom0 systemd[1]: Starting Suspend...
dom0 systemd-sleep[3586]: Suspending system...
dom0 kernel: PM: suspend entry (deep)

Then nothing until I force a reset.
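
For reference, the bracketed entry in /sys/power/mem_sleep is the mode the kernel will use for suspend, so "s2idle shallow [deep]" means deep (ACPI S3) is selected. A minimal sketch in C of pulling that entry out of such a line; the function name active_sleep_mode is made up for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the bracketed (currently selected) mode out of a mem_sleep line such
 * as "s2idle shallow [deep]". Returns 0 on success, -1 if there is no
 * bracketed entry or it does not fit into out.
 */
static int active_sleep_mode(const char *line, char *out, size_t out_len)
{
    const char *open, *close;
    size_t len;

    assert(line != NULL && out != NULL);

    open = strchr(line, '[');
    if (open == NULL)
        return -1;
    close = strchr(open, ']');
    if (close == NULL)
        return -1;

    /* Number of characters between the brackets. */
    len = (size_t)(close - open - 1);
    if (len + 1 > out_len)
        return -1;

    memcpy(out, open + 1, len);
    out[len] = '\0';
    return 0;
}
```

Fed the line read from /sys/power/mem_sleep (for example with fgets), this should report "deep" on the system described above.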

Any suggestions for a more intelligent way to troubleshoot? Logs or
settings I can look at somewhere in Mint that would give me a hint how
it's managing to successfully resume?

dm1.l...@gmail.com

Apr 6, 2019, 3:14:34 AM
to qubes-users
On Monday, April 1, 2019 at 23:21:07 UTC, awokd wrote:
This issue is due to a Xen patch ("Fix resume, when using microcode upgrade") that was included when the release changed from xen-4.8.3-4 to xen-4.8.3-5. The patch checks on resume that previously available CPU features (e.g. the Spectre-related ones) are still present, and this results in a Xen panic on the G505s, IMHO because of the static way the most recent AMD microcode (0x600111f) has to be compiled into corebooted systems.
Reverting the whole patch is no use, because that would break the other Xen patches introduced since. But you can:

diff -ur a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c
--- a/xen/arch/x86/acpi/power.c 2019-03-31
+++ b/xen/arch/x86/acpi/power.c 2019-03-31
@@ -256,9 +256,9 @@

     microcode_resume_cpu(0);

-    if ( !recheck_cpu_features(0) )
+/*  if ( !recheck_cpu_features(0) )
         panic("Missing previously available feature(s).");
-
+*/
     /* Re-enabled default NMI/#MC use of MSR_SPEC_CTRL. */
     ci->spec_ctrl_flags |= (default_spec_ctrl_flags & SCF_ist_wrmsr);
     spec_ctrl_exit_idle(ci);

have this workaround, which solves the issue until someone provides a working solution for corebooted (CB'd) systems with AMD Fam15h (and also assesses the possible security impact).
Of course you'll need to recompile qubes-vmm-xen from git, but that is straightforward.
There may be some strange kernel messages in dom0 after resume, and devices in sys-net may have trouble waking up, but it mostly works fine with kernel 4.14.103; 4.19-xx kernels still have issues with the radeon module.

awokd

Apr 6, 2019, 8:34:31 AM
to dm1.l...@gmail.com, qubes-users, Mike Banon
Thank you, I will definitely try it out and report back here! Any idea
if someone has submitted upstream to Xen? Seems like it would be an
issue with any Corebooted AMD AGESA system since they all handle
microcode the same way.

qubes123

Apr 8, 2019, 1:51:31 AM
to qubes-users
I'm not aware of anyone having done that. I think for CB'd Fam15h we need a Xen patch that skips the CPU feature check after resume, since a microcode update never happens there...

qubes123

Apr 8, 2019, 2:08:19 PM
to qubes-users
...distribution kernels (fedora, debian) with xen 4.11.1 still have issues with suspend & g505s...

awokd

Apr 8, 2019, 2:44:05 PM
to qubes123, qubes-users, Mike Banon
qubes123 wrote on 4/8/19 6:08 PM:
> ...distribution kernels (fedora, debian) with xen 4.11.1 still have issues with suspend & g505s...
>
Understood, thank you! That will save a lot of testing time. I'm still
getting my build environment stood up, but will update once I can
confirm the workaround.

awokd

Apr 10, 2019, 4:50:49 PM
to qubes123, qubes-users, Mike Banon
Got my build environment going, but I think I am missing a step. I edit
/home/user/qubes-builder/chroot-dom0-fc25/home/user/rpmbuild/BUILD/xen-4.8.5/xen/arch/x86/acpi/power.c
with the above patch. Then I run "make vmm-xen". Then I see it has
overwritten my change and the code is no longer commented out. What am I
doing wrong?

awokd

Apr 10, 2019, 7:48:00 PM
to qubes123, qubes-users, Mike Banon
awokd wrote on 4/10/19 8:50 PM:

> Got my build environment going, but I think I am missing a step. I edit
> /home/user/qubes-builder/chroot-dom0-fc25/home/user/rpmbuild/BUILD/xen-4.8.5/xen/arch/x86/acpi/power.c
> with the above patch. Then I run "make vmm-xen". Then I see it has
> overwritten my change and the code is no longer commented out. What am I
> doing wrong?

Never mind, forgot I had noted this down a while back:

cd /home/user/qubes-builder
sudo chroot chroot-dom0-fc25
su user
cd ~
make -C rpmbuild/BUILD/xen-4.8.5/xen

Then copy xen.{gz,efi} out of the chroot's
home/user/rpmbuild/BUILD/xen-4.8.5/xen/ directory.

For some reason closing & opening the lid doesn't do anything any more.
Don't understand what that section of code would have to do with it.
However, if I choose Suspend from the menu, and then hit a key it
successfully resumes! Thank you, that is very interesting. It would be
good to figure out what's broken and upstream the fix...


Mike Banon

Apr 11, 2019, 2:26:09 AM
to awokd, qubes123, qubes-users
Excellent discovery, awokd and qubes123!
Please try to somehow upstream your solution to Xen.
Idea: find a way to detect the CPU type before calling
"recheck_cpu_features(0)", and if it is an AMD CPU, maybe just skip
this check:

> - if ( !recheck_cpu_features(0) )
> +/* if ( cpu_is_not_amd() && !recheck_cpu_features(0) )
> panic("Missing previously available feature(s).");
> -
> +*/
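
A minimal model of this idea in plain C, to make the intended control flow explicit. Everything here is a stand-in: the enum names and resume_feature_gate() are hypothetical, and features_ok merely represents the result of recheck_cpu_features(0). Note that the snippet above still sits inside /* ... */, so the comment markers would have to be dropped for the vendor test to have any effect.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the CPU vendor; not Xen's real definitions. */
enum cpu_vendor { VENDOR_INTEL, VENDOR_AMD, VENDOR_OTHER };

/* What the resume path does next in this model. */
enum resume_action { RESUME_CONTINUE, RESUME_PANIC };

/*
 * Decide what to do after the microcode reload on resume.
 * features_ok models the return value of recheck_cpu_features(0).
 * On AMD the result is ignored, which is the behaviour proposed above;
 * on every other vendor a missing feature still leads to a panic.
 */
static enum resume_action resume_feature_gate(enum cpu_vendor vendor,
                                              bool features_ok)
{
    if (vendor != VENDOR_AMD && !features_ok)
        return RESUME_PANIC;
    return RESUME_CONTINUE;
}
```

In the hypervisor itself the vendor test would presumably go through Xen's own CPU data (boot_cpu_data.x86_vendor compared against X86_VENDOR_AMD, as far as I know) rather than a helper like cpu_is_not_amd(), which is only a placeholder name. Skipping the check for every AMD CPU is also broader than the corebooted Fam15h case discussed here, so the security impact would still need assessing.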

Perhaps this problem affects all AMD systems and not just the corebooted ones,
but maybe so few people use an AMD laptop with Xen that nobody noticed it.

Best regards,
Mike Banon

awokd

Apr 11, 2019, 2:33:06 AM
to Mike Banon, qubes123, qubes-users
Mike Banon:
> Excellent discovery, awokd and qubes123!

qubes123 gets 100% of the credit, I merely confirmed it.

> Please try to somehow upstream your solution to Xen.
> Idea: find a way to detect the CPU type before calling
> "recheck_cpu_features(0)", and if it is an AMD CPU, maybe just skip
> this check:

Xen is difficult to debug without a classic onboard serial port for
console output. Has to be some bug in that function.

>> - if ( !recheck_cpu_features(0) )
>> +/* if ( cpu_is_not_amd() && !recheck_cpu_features(0) )
>> panic("Missing previously available feature(s).");
>> -
>> +*/
>
> Perhaps this problem affects all AMD systems and not just the corebooted ones,
> but maybe so few people use an AMD laptop with Xen that nobody noticed it.

That crossed my mind as well.

qubes123

Apr 11, 2019, 3:29:55 AM
to qubes-users
For the patching, I modified the xen.spec(.in) file, adding the patch to the already existing set of patches (e.g. as Patch628). This way the patch is still applied after make clean, when the Xen sources are gunzipped again.
I also compiled xen-4.12.0 with the patch, and there were no compilation errors. I cannot test this yet on the development Qubes 4.1 with an FC29 dom0, because I'm having trouble setting up a working FC29 dom0.

Interestingly, suspend-by-lid-closing works the first time after a clean boot, but after that it is better to use the menu. Maybe some ACPI functions get broken after the first suspend/resume (but this is a CB rather than a Qubes topic...).

About debugging: yes, it would be great to have serial debugging on the G505s (without an add-on card), and I think it is not entirely impossible through the EC debug port (JP3) or EHCI... But this is rather a topic for CB and @Mike...

Mike Banon

Apr 11, 2019, 5:32:21 AM
to awokd, qubes123, qubes-users
> Xen is difficult to debug without a classic onboard serial port for
> console output. Has to be some bug in that function.

Could Xen print messages to a screen? If yes, then it is possible to
find this function and insert a series of printf("1/2/3/etc") calls,
each followed by sleep(1). (The sleep is necessary to ensure that your
just-printed message is displayed on the screen before whatever action
freezes the system; without it, if the freeze comes too fast, there may
not be enough time to display the message.)
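
The print-then-wait idea can be sketched in standard C as below. This is a userspace illustration only: checkpoint() and wait_seconds() are hypothetical names, and inside the hypervisor printf()/sleep() are not available, so the equivalent calls would be something like printk() and mdelay() (an assumption about what is usable that early in the resume path).

```c
#include <stdio.h>
#include <time.h>

/* Number of the most recent checkpoint reached. */
static int last_checkpoint = 0;

/* Busy-wait for roughly the given number of seconds (time() has 1 s granularity). */
static void wait_seconds(double seconds)
{
    time_t start = time(NULL);

    while (difftime(time(NULL), start) < seconds)
        ; /* spin; a hypervisor delay loop works in a similar way */
}

/*
 * Print a numbered marker, flush it, then wait so the marker has time to
 * reach the display before a possible hang in the code that follows.
 */
static int checkpoint(int id, double delay_seconds)
{
    fprintf(stderr, "checkpoint %d\n", id);
    fflush(stderr);
    wait_seconds(delay_seconds);
    last_checkpoint = id;
    return id;
}
```

Calling checkpoint(1, 1.0), checkpoint(2, 1.0), and so on between the statements of the suspect function means the last number visible on screen brackets the statement that hangs.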

Although I have an FT232H USB debug dongle, which can be used to get
console output from a USB 2.0 port (e.g. the coreboot cbmem log), I
don't know whether it would be useful for Xen messages as well (or
whether any extra configuration is required to make Xen output to this
dongle), and with so many projects I don't have enough time to figure
this out. So, if you have some free time, you may try the printf /
sleep approach above.

Or, alternatively, please open a bug with Xen about this regression;
maybe they know an easy way to disable this check for AMD, or at
least could provide some debugging ideas... It is in our best
interests that some solution to this problem gets upstreamed.

Best regards,
Mike Banon