Debugging a sleep/suspend problem on Razer Blade Stealth 2016 - Qubes


Guerlan

Jan 8, 2020, 8:55:13 PM
to qubes-users
First of all, here's the HCL for my Razer Blade Stealth 2016 (4K touchscreen, 16 GB RAM, 512 GB SSD): https://groups.google.com/forum/#!searchin/qubes-users/razer$20blade%7Csort:date/qubes-users/PalZ-1inxnA/D3mQ4OI3CAAJ

When I close the lid and open it again, the keyboard won't light up, the screen won't turn on (it's an LED panel, so I can see a bright black when it turns on), and pressing keys or touching the touchpad does nothing. I have to reboot. I don't know, however, whether the keyboard not lighting up when I open the lid is because sys-usb, which contains the keyboard, has not been woken up. Every other aspect of the laptop seems to be working perfectly.

I followed Ubuntu's guide on kernel suspend bugs: https://wiki.ubuntu.com/DebuggingKernelSuspend

Then, following what they suggest, I ran

`sudo sh -c "sync && echo 1 > /sys/power/pm_trace && pm-suspend"`

and looked for the lines that say "hash matches" in dmesg right after reboot (what does that mean?)

Well, I found two:

```
[    3.583591] ima: Allocated hash algorithm: sha1
[    3.593050] input: AT Raw Set 2 keyboard as /devices/platform/i8042/serio0/input/input4
[    3.638808]   Magic number: 0:929:176
[    3.638867] acpi device:39: hash matches
[    3.638893] acpi device:0c: hash matches
[    3.639073] rtc_cmos 00:01: setting system clock to 2016-01-01 12:09:51 UTC (1451650191)
```

I couldn't find anything related to those ACPI devices. I first thought that there was a driver for them, so I could just rmmod those drivers before sleep and insmod them on wakeup, but I couldn't find anything. There's this issue https://ubuntuforums.org/archive/index.php/t-2393029.html which has those exact hash matches, but no answer.
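
For context on the "what does that mean?" question: according to the kernel's s2ram documentation, pm_trace saves a hash of the last suspend/resume trace point in the real-time clock, and on the next boot the kernel prints every device whose name matches that hash. That would also explain the bogus 2016 date in the rtc_cmos line above. A minimal sketch for pulling just those lines out of a log (the function name `pm_trace_matches` is made up for illustration):

```shell
#!/bin/sh
# Hypothetical helper: keep only the pm_trace result lines from a kernel
# log read on stdin (the "Magic number" line and any "hash matches" lines).
pm_trace_matches() {
    grep -E 'Magic number:|hash matches'
}

# Usage, in dom0 right after the reboot that follows a failed resume:
#   dmesg | pm_trace_matches
```

Unrelated lines that merely contain the word "hash", such as the ima one above, are dropped by this filter.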

Then I asked for help on a forum and they found this problematic line on my dmesg:

`[    2.543596] acpi PNP0A08:00: _OSC failed (AE_ERROR); disabling ASPM`

It seems ASPM is disabled on my Qubes install. I don't know why. Should this be considered a bug? Is there anything I can do to get it working? This looks promising.

It's worth noting that Ubuntu 18 and 19, Fedora 30, Linux Mint, etc. all handle sleep perfectly: I can close the lid, open it again, and everything works. So the problem seems to be **related to Qubes**. I even tried Qubes' most recent dom0 kernel, based on the 5.x Linux kernel, but the problem persists.

I also tried `pcie_aspm=force` on `/boot/efi/EFI/qubes/xen.cfg` (is this where I put kernel parameters?) like this:

`kernel=vmlinuz-4.14.74-1.pvops.qubes.x86_64 root=/dev/mapper/qubes_dom0-root rd.luks.uuid=luks-39fc83eb-9829-43b7-86e8-08068bd81087 rd.lvm.lv=qubes_dom0/root rd.lvm.lv=qubes_dom0/swap i915.alpha_support=1 pcie_aspm=force rhgb quiet plymouth.ignore-serial-consoles`

but it didn't help.
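
One way to confirm that the parameter actually reached the dom0 kernel after a reboot is to look for it in /proc/cmdline. A small sketch, assuming a POSIX shell (the helper name `has_kernel_param` is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: succeed if the exact parameter "$1" appears on the
# kernel command line read from stdin.
has_kernel_param() {
    # Split the command line into one parameter per line, then look for
    # an exact, whole-line match.
    tr ' ' '\n' | grep -Fxq -- "$1"
}

# Usage:
#   has_kernel_param pcie_aspm=force < /proc/cmdline && echo "parameter is active"
```

This only shows that the kernel saw the parameter, not that the firmware allowed ASPM to be enabled.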

I practically need to run Qubes on this notebook because any other Linux distribution, with any kernel, has a problem that corrupts my SSD many times a day. No one could solve it, and on Qubes it never happens. I tried Qubes just to see if it would solve the problem, and it does! I'm loving it, and I'm not going back, even on other notebooks. However, closing the lid/putting the system to sleep is essential for a notebook.

```
[lz@dom0 ~]$ cat /sys/power/mem_sleep
s2idle [deep]
```

As you can see, the default suspend mode is deep.
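
The bracketed entry in that file is the mode used by `echo mem > /sys/power/state`. A short sketch for reading it from a script (the function name `active_sleep_mode` is an invention for this example):

```shell
#!/bin/sh
# Hypothetical helper: print the bracketed (currently selected) entry from
# the contents of /sys/power/mem_sleep, read on stdin.
active_sleep_mode() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Usage:
#   active_sleep_mode < /sys/power/mem_sleep
# On a bare-metal kernel the selection can be changed by writing one of the
# listed names back to the same file, e.g. (as root):
#   echo s2idle > /sys/power/mem_sleep
# Whether that behaves the same way under Xen is an open question here.
```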

I tried s2idle by doing `echo freeze > /sys/power/state`: the screen turns off but the keyboard lights stay on. Pressing keys does nothing. Pressing the touchpad, nothing. Pressing power rapidly, nothing. I had to reboot by long-pressing power. I thought s2idle should always work since it's software based.

Here's my journalctl of the moment when I go to suspend by closing the lid (that is, suspending in deep mode):

```
Jan 07 20:56:24 dom0 systemd-logind[1925]: Lid closed.
Jan 07 20:56:24 dom0 systemd-logind[1925]: Suspending...
Jan 07 20:56:24 dom0 systemd[1]: Starting Qubes suspend hooks...
Jan 07 20:56:25 dom0 qmemman.daemon.algo[1921]: balance_when_enough_memory(xen_free_memory=8172072647, total_mem_pref=2493652659.2, total_available_memory=13171544083.8)
Jan 07 20:56:25 dom0 qmemman.systemstate[1921]: stat: dom '5' act=3198156800 pref=963591782.4 last_target=3198156800
Jan 07 20:56:25 dom0 qmemman.systemstate[1921]: stat: dom '0' act=4294967296 pref=1530060876.8 last_target=4294967296
Jan 07 20:56:25 dom0 qmemman.systemstate[1921]: stat: xenfree=8224501447 memset_reqs=[('5', 3198156800), ('0', 4294967296)]
Jan 07 20:56:25 dom0 qmemman.systemstate[1921]: mem-set domain 5 to 3198156800
Jan 07 20:56:25 dom0 qmemman.systemstate[1921]: mem-set domain 0 to 4294967296
Jan 07 20:56:25 dom0 qrexec[3884]: qubes.GetDate: social -> @default: allowed to dom0
Jan 07 20:56:25 dom0 qmemman.daemon.algo[1921]: balance_when_enough_memory(xen_free_memory=8172072647, total_mem_pref=2450575027.2, total_available_memory=13214621715.8)
Jan 07 20:56:25 dom0 qmemman.systemstate[1921]: stat: dom '5' act=3198156800 pref=920514150.4 last_target=3198156800
Jan 07 20:56:25 dom0 qmemman.systemstate[1921]: stat: dom '0' act=4294967296 pref=1530060876.8 last_target=4294967296
Jan 07 20:56:25 dom0 qmemman.systemstate[1921]: stat: xenfree=8224501447 memset_reqs=[('5', 3198156800), ('0', 4294967296)]
Jan 07 20:56:25 dom0 qmemman.systemstate[1921]: mem-set domain 5 to 3198156800
Jan 07 20:56:25 dom0 qmemman.systemstate[1921]: mem-set domain 0 to 4294967296
Jan 07 20:56:26 dom0 qmemman.daemon.algo[1921]: balance_when_enough_memory(xen_free_memory=8172072647, total_mem_pref=2398557056.0, total_available_memory=13266639687.0)
Jan 07 20:56:26 dom0 qmemman.systemstate[1921]: stat: dom '5' act=3198156800 pref=920514150.4 last_target=3198156800
Jan 07 20:56:26 dom0 qmemman.systemstate[1921]: stat: dom '0' act=4294967296 pref=1478042905.6000001 last_target=4294967296
Jan 07 20:56:26 dom0 qmemman.systemstate[1921]: stat: xenfree=8224501447 memset_reqs=[('5', 3198156800), ('0', 4294967296)]
Jan 07 20:56:26 dom0 qmemman.systemstate[1921]: mem-set domain 5 to 3198156800
Jan 07 20:56:26 dom0 qmemman.systemstate[1921]: mem-set domain 0 to 4294967296
Jan 07 20:56:26 dom0 qmemman.daemon.algo[1921]: balance_when_enough_memory(xen_free_memory=8172072647, total_mem_pref=2398557056.0, total_available_memory=13266639687.0)
Jan 07 20:56:26 dom0 qmemman.systemstate[1921]: stat: dom '5' act=3198156800 pref=920514150.4 last_target=3198156800
Jan 07 20:56:26 dom0 qmemman.systemstate[1921]: stat: dom '0' act=4294967296 pref=1478042905.6000001 last_target=4294967296
Jan 07 20:56:26 dom0 qmemman.systemstate[1921]: stat: xenfree=8224501447 memset_reqs=[('5', 3198156800), ('0', 4294967296)]
Jan 07 20:56:26 dom0 qmemman.systemstate[1921]: mem-set domain 5 to 3198156800
Jan 07 20:56:26 dom0 qmemman.systemstate[1921]: mem-set domain 0 to 4294967296
Jan 07 20:56:27 dom0 qmemman.daemon.algo[1921]: balance_when_enough_memory(xen_free_memory=8172072647, total_mem_pref=2398557056.0, total_available_memory=13266639687.0)
Jan 07 20:56:27 dom0 qmemman.systemstate[1921]: stat: dom '5' act=3198156800 pref=920514150.4 last_target=3198156800
Jan 07 20:56:27 dom0 qmemman.systemstate[1921]: stat: dom '0' act=4294967296 pref=1478042905.6000001 last_target=4294967296
Jan 07 20:56:27 dom0 qmemman.systemstate[1921]: stat: xenfree=8224501447 memset_reqs=[('5', 3198156800), ('0', 4294967296)]
Jan 07 20:56:27 dom0 qmemman.systemstate[1921]: mem-set domain 5 to 3198156800
Jan 07 20:56:27 dom0 qmemman.systemstate[1921]: mem-set domain 0 to 4294967296
Jan 07 20:56:27 dom0 52qubes-pause-vms[3877]: 0
Jan 07 20:56:27 dom0 systemd[1]: Started Qubes suspend hooks.
Jan 07 20:56:27 dom0 systemd[1]: Reached target Sleep.
Jan 07 20:56:27 dom0 systemd[1]: Starting Suspend...
Jan 07 20:56:27 dom0 audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=qubes-suspend comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jan 07 20:56:27 dom0 kernel: audit: type=1130 audit(1578441387.401:154): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=qubes-suspend comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jan 07 20:56:27 dom0 qmemman.daemon.algo[1921]: balance_when_enough_memory(xen_free_memory=8172072647, total_mem_pref=2355229158.4, total_available_memory=13309967584.6)
Jan 07 20:56:27 dom0 qmemman.systemstate[1921]: stat: dom '5' act=3198156800 pref=920514150.4 last_target=3198156800
Jan 07 20:56:27 dom0 qmemman.systemstate[1921]: stat: dom '0' act=4294967296 pref=1434715008.0 last_target=4294967296
Jan 07 20:56:27 dom0 qmemman.systemstate[1921]: stat: xenfree=8224501447 memset_reqs=[('5', 3198156800), ('0', 4294967296)]
Jan 07 20:56:27 dom0 qmemman.systemstate[1921]: mem-set domain 5 to 3198156800
Jan 07 20:56:27 dom0 qmemman.systemstate[1921]: mem-set domain 0 to 4294967296
Jan 07 20:56:27 dom0 systemd-sleep[3912]: /usr/lib/systemd/system-sleep/custom-xhci_hcd: Going to suspend...
Jan 07 20:56:27 dom0 systemd-sleep[3912]: Suspending system...
Jan 07 20:56:27 dom0 kernel: PM: suspend entry (deep)
```

Here's my full dmesg filtered for ACPI, in case anyone needs it:

```
dmesg | grep -i acpi

[    0.000000] Xen: [mem 0x000000006e503000-0x000000006e503fff] ACPI NVS
[    0.000000] Xen: [mem 0x000000007a3d9000-0x000000007a444fff] ACPI data
[    0.000000] Xen: [mem 0x000000007a445000-0x000000007abfefff] ACPI NVS
[    0.000000] efi:  ESRT=0x7b226f18  ACPI=0x7a3ee000  ACPI 2.0=0x7a3ee000  SMBIOS=0x7b210000  SMBIOS 3.0=0x7b20f000
[    1.069765] ACPI: Early table checksum verification disabled
[    1.069772] ACPI: RSDP 0x000000007A3EE000 000024 (v02 ALASKA)
[    1.069782] ACPI: XSDT 0x000000007A3EE0B0 0000E4 (v01 ALASKA A M I    01072009 AMI  00010013)
[    1.069821] ACPI: FACP 0x000000007A414898 000114 (v06 ALASKA A M I    01072009 AMI  00010013)
[    1.069889] ACPI: DSDT 0x000000007A3EE228 02666B (v02 ALASKA A M I    01072009 INTL 20160422)
[    1.069903] ACPI: FACS 0x000000007ABE6C40 000040
[    1.069916] ACPI: APIC 0x000000007A4149B0 000084 (v03 ALASKA A M I    01072009 AMI  00010013)
[    1.069930] ACPI: FPDT 0x000000007A414A38 000044 (v01 ALASKA A M I    01072009 AMI  00010013)
[    1.069943] ACPI: MCFG 0x000000007A414A80 00003C (v01 ALASKA A M I    01072009 MSFT 00000097)
[    1.069957] ACPI: FIDT 0x000000007A414AC0 00009C (v01 ALASKA A M I    01072009 AMI  00010013)
[    1.069970] ACPI: MSDM 0x000000007A414B60 000055 (v03 ALASKA A M I    01072009 AMI  00010013)
[    1.069984] ACPI: SSDT 0x000000007A414BB8 003154 (v02 SaSsdt SaSsdt   00003000 INTL 20160422)
[    1.070007] ACPI: HPET 0x000000007A417D10 000038 (v01 INTEL  KBL-ULT  00000001 MSFT 0000005F)
[    1.070021] ACPI: SSDT 0x000000007A417D48 000E3B (v02 INTEL  Ther_Rvp 00001000 INTL 20160422)
[    1.070034] ACPI: SSDT 0x000000007A418B88 0006BB (v02 INTEL  xh_OEMBD 00000000 INTL 20160422)
[    1.070048] ACPI: UEFI 0x000000007A419248 000042 (v01 INTEL  EDK2     00000002      01000013)
[    1.070062] ACPI: SSDT 0x000000007A419290 000EDE (v02 CpuRef CpuSsdt  00003000 INTL 20160422)
[    1.070075] ACPI: LPIT 0x000000007A41A170 000094 (v01 INTEL  KBL-ULT  00000000 MSFT 0000005F)
[    1.070089] ACPI: WSMT 0x000000007A41A208 000028 (v01 INTEL  KBL-ULT  00000000 MSFT 0000005F)
[    1.070102] ACPI: SSDT 0x000000007A41A230 00029F (v02 INTEL  sensrhub 00000000 INTL 20160422)
[    1.070116] ACPI: SSDT 0x000000007A41A4D0 003002 (v02 INTEL  PtidDevc 00001000 INTL 20160422)
[    1.070130] ACPI: DBGP 0x000000007A41D4D8 000034 (v01 INTEL           00000002 MSFT 0000005F)
[    1.070143] ACPI: DBG2 0x000000007A41D510 000054 (v00 INTEL           00000002 MSFT 0000005F)
[    1.070157] ACPI: BGRT 0x000000007A41D568 000038 (v01 ALASKA A M I    01072009 AMI  00010013)
[    1.070171] ACPI: RMAD 0x000000007A41D5A0 000114 (v01 INTEL  KBL      00000001 INTL 00000001)
[    1.070185] ACPI: SSDT 0x000000007A41D6B8 00054D (v01 TbtGfx TbtGfx   00001000 INTL 20160422)
[    1.070198] ACPI: TPM2 0x000000007A41DC08 000034 (v03        Tpm2Tabl 00000001 AMI  00000000)
[    1.070212] ACPI: ASF! 0x000000007A41DC40 0000A0 (v32 INTEL   HCG     00000001 TFSM 000F4240)
[    1.070249] ACPI: Local APIC address 0xfee00000
[    2.173665] ACPI: PM-Timer IO Port: 0x1808
[    2.173672] ACPI: Local APIC address 0xfee00000
[    2.173707] ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
[    2.173708] ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1])
[    2.173710] ACPI: LAPIC_NMI (acpi_id[0x03] high edge lint[0x1])
[    2.173711] ACPI: LAPIC_NMI (acpi_id[0x04] high edge lint[0x1])
[    2.173762] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[    2.173765] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[    2.173769] ACPI: IRQ0 used by override.
[    2.173771] ACPI: IRQ9 used by override.
[    2.173780] Using ACPI (MADT) for SMP configuration information
[    2.173784] ACPI: HPET id: 0x8086a201 base: 0xfed00000
[    2.352646] ACPI: Core revision 20180810
[    2.371112] ACPI BIOS Warning (bug): Incorrect checksum in table [BGRT] - 0xDC, should be 0x84 (20180810/tbprint-177)
[    2.379908] PM: Registering ACPI NVS region [mem 0x6e503000-0x6e503fff] (4096 bytes)
[    2.379908] PM: Registering ACPI NVS region [mem 0x7a445000-0x7abfefff] (8101888 bytes)
[    2.380709] ACPI FADT declares the system doesn't support PCIe ASPM, so disable it
[    2.380710] ACPI: bus type PCI registered
[    2.459664] ACPI: Added _OSI(Module Device)
[    2.459664] ACPI: Added _OSI(Processor Device)
[    2.459665] ACPI: Added _OSI(3.0 _SCP Extensions)
[    2.459666] ACPI: Added _OSI(Processor Aggregator Device)
[    2.459667] ACPI: Added _OSI(Linux-Dell-Video)
[    2.459667] ACPI: Added _OSI(Linux-Lenovo-NV-HDMI-Audio)
[    2.499998] ACPI: 8 ACPI AML tables successfully acquired and loaded
[    2.505360] ACPI: [Firmware Bug]: BIOS _OSI(Linux) query ignored
[    2.511690] ACPI: Dynamic OEM Table Load:
[    2.511696] ACPI: SSDT 0xFFFF88818180E800 0006F6 (v02 PmRef  Cpu0Ist  00003000 INTL 20160422)
[    2.512125] ACPI: \_PR_.CPU0: _OSC native thermal LVT Acked
[    2.513777] ACPI: Dynamic OEM Table Load:
[    2.513781] ACPI: SSDT 0xFFFF888180CE5C00 0003FF (v02 PmRef  Cpu0Cst  00003001 INTL 20160422)
[    2.514667] ACPI: Dynamic OEM Table Load:
[    2.514671] ACPI: SSDT 0xFFFF888180F43800 00065C (v02 PmRef  ApIst    00003000 INTL 20160422)
[    2.515347] ACPI: Dynamic OEM Table Load:
[    2.515351] ACPI: SSDT 0xFFFF888180D50600 00018A (v02 PmRef  ApCst    00003000 INTL 20160422)
[    2.516841] ACPI: EC: EC started
[    2.516842] ACPI: EC: interrupt blocked
[    2.516872] ACPI: \_SB_.PCI0.LPCB.EC0_: Used as first EC
[    2.516873] ACPI: \_SB_.PCI0.LPCB.EC0_: GPE=0x50, EC_CMD/EC_SC=0x66, EC_DATA=0x62
[    2.516874] ACPI: \_SB_.PCI0.LPCB.EC0_: Used as boot DSDT EC to handle transactions
[    2.516875] ACPI: Interpreter enabled
[    2.516908] ACPI: (supports S0 S3 S5)
[    2.516909] ACPI: Using IOAPIC for interrupt routing
[    2.516948] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
[    2.518072] ACPI: Enabled 7 GPEs in block 00 to 7F
[    2.521961] ACPI: Power Resource [WRST] (on)
[    2.522316] ACPI: Power Resource [WRST] (on)
[    2.522680] ACPI: Power Resource [WRST] (on)
[    2.523031] ACPI: Power Resource [WRST] (on)
[    2.523381] ACPI: Power Resource [WRST] (on)
[    2.523735] ACPI: Power Resource [WRST] (on)
[    2.524082] ACPI: Power Resource [WRST] (on)
[    2.524542] ACPI: Power Resource [WRST] (on)
[    2.524902] ACPI: Power Resource [WRST] (on)
[    2.525259] ACPI: Power Resource [WRST] (on)
[    2.525610] ACPI: Power Resource [WRST] (on)
[    2.525970] ACPI: Power Resource [WRST] (on)
[    2.526323] ACPI: Power Resource [WRST] (on)
[    2.526633] ACPI: Power Resource [WRST] (on)
[    2.526983] ACPI: Power Resource [WRST] (on)
[    2.527330] ACPI: Power Resource [WRST] (on)
[    2.527684] ACPI: Power Resource [WRST] (on)
[    2.529007] ACPI: Power Resource [WRST] (on)
[    2.529360] ACPI: Power Resource [WRST] (on)
[    2.529716] ACPI: Power Resource [WRST] (on)
[    2.542041] ACPI: Power Resource [FN00] (off)
[    2.542129] ACPI: Power Resource [FN01] (off)
[    2.542211] ACPI: Power Resource [FN02] (off)
[    2.542294] ACPI: Power Resource [FN03] (off)
[    2.542378] ACPI: Power Resource [FN04] (off)
[    2.543548] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-fe])
[    2.543553] acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI]
[    2.543596] acpi PNP0A08:00: _OSC failed (AE_ERROR); disabling ASPM
[    2.562396] ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 10 *11 12 14 15)
[    2.562462] ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 *10 11 12 14 15)
[    2.562525] ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 5 6 10 *11 12 14 15)
[    2.562588] ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 10 *11 12 14 15)
[    2.562655] ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 5 6 10 *11 12 14 15)
[    2.562718] ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 6 10 *11 12 14 15)
[    2.562779] ACPI: PCI Interrupt Link [LNKG] (IRQs 3 4 5 6 10 *11 12 14 15)
[    2.562841] ACPI: PCI Interrupt Link [LNKH] (IRQs 3 4 5 6 10 *11 12 14 15)
[    2.563595] ACPI: EC: interrupt unblocked
[    2.563621] ACPI: EC: event unblocked
[    2.563639] ACPI: \_SB_.PCI0.LPCB.EC0_: GPE=0x50, EC_CMD/EC_SC=0x66, EC_DATA=0x62
[    2.563640] ACPI: \_SB_.PCI0.LPCB.EC0_: Used as boot DSDT EC to handle transactions and events
[    2.563765] ACPI: bus type USB registered
[    2.580383] PCI: Using ACPI for IRQ routing
[    2.627863] pnp: PnP ACPI init
[    2.628141] system 00:00: Plug and Play ACPI device, IDs PNP0c02 (active)
[    2.628280] pnp 00:01: Plug and Play ACPI device, IDs PNP0b00 (active)
[    2.628317] system 00:02: Plug and Play ACPI device, IDs INT3f0d PNP0c02 (active)
[    2.629513] system 00:03: Plug and Play ACPI device, IDs PNP0c02 (active)
[    2.629566] system 00:04: Plug and Play ACPI device, IDs PNP0c02 (active)
[    2.629937] system 00:05: Plug and Play ACPI device, IDs PNP0c02 (active)
[    2.631187] system 00:06: Plug and Play ACPI device, IDs PNP0c02 (active)
[    2.632238] pnp: PnP ACPI: found 7 devices
[    3.392469] ACPI: AC Adapter [AC0] (on-line)
[    3.392539] ACPI: Lid Switch [LID0]
[    3.392581] ACPI: Sleep Button [SLPB]
[    3.392621] ACPI: Power Button [PWRB]
[    3.392663] ACPI: Power Button [PWRF]
[    3.455097] ACPI: Thermal Zone [TZ00] (28 C)
[    3.455324] ACPI: Thermal Zone [TZ01] (30 C)
[    3.461779] hpet_acpi_add: no address or irqs in _CRS
[    3.566006] battery: ACPI: Battery Slot [BAT0] (battery present)
[    3.644076] acpi LNXCPU:06: hash matches
[    5.082551] ACPI: Video Device [GFX0] (multi-head: yes  rom: no  post: no)
[   16.887708] xen_acpi_processor: Uploading Xen processor PM info
```
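
Besides the `_OSC failed` line, the output above contains an earlier ASPM message (`ACPI FADT declares the system doesn't support PCIe ASPM, so disable it`), which suggests the firmware itself may be asking for ASPM to stay off. A sketch of a filter that shows only the ASPM-related lines (the name `aspm_lines` is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: keep only the lines that mention ASPM from a kernel
# log read on stdin (case-insensitive).
aspm_lines() {
    grep -i 'aspm'
}

# Usage:
#   dmesg | aspm_lines
```

Comparing this output between Qubes and a plain Linux boot on the same machine would show whether the ASPM messages are specific to running under Xen.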

If anyone has other debug ideas, I'm very thankful!!!!!!!!!!!!!

Claudia

Jan 11, 2020, 2:37:17 PM
to Guerlan, qubes...@googlegroups.com
January 9, 2020 1:55 AM, "Guerlan" <worm...@gmail.com> wrote:

> First of all, here's the HCL for my Razer Blade Stealth 2016 4K touchscreen 16gb RAM 512gb SSD:
> https://groups.google.com/forum/#!searchin/qubes-users/razer$20blade|sort:date/qubes-users/PalZ-1inxnA/D3mQ4OI3CAAJ
>
> When I close the lid and open it again, the keyboard won't light up, the screen won't turn on (it's
> an LED panel, so I can see a bright black when it turns on), and pressing keys or touching the
> touchpad does nothing. I have to reboot. I don't know, however, whether the keyboard not lighting
> up when I open the lid is because sys-usb, which contains the keyboard, has not been woken up.
> Every other aspect of the laptop seems to be working perfectly.

When you're testing, make sure there are no VMs set to start on boot, especially not sys-net and
sys-usb, and make sure rd.qubes.hide_all_usb is not set. You can try to get that stuff working
later on.

Does pressing caps lock or num lock turn on/off their lights on the keyboard? Does ctrl-alt-delete,
or Alt-SysRq-B (you have to enable it first) cause it to reboot? If you suspend with sound playing,
can you hear it when you try to resume?
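
On the "you have to enable it first" part: the kernel exposes the SysRq setting in /proc/sys/kernel/sysrq, where 0 disables everything, 1 enables everything, and larger values are a bitmask in which 128 allows reboot/poweroff. A sketch that checks whether Alt-SysRq-B would be honoured (the function name `sysrq_reboot_allowed` is made up, and the optional path argument exists only so the check can be tried against a test file):

```shell
#!/bin/sh
# Hypothetical helper: succeed if the SysRq setting allows the reboot key.
# $1 is the file to read; it defaults to the real sysctl file.
sysrq_reboot_allowed() {
    value=$(cat "${1:-/proc/sys/kernel/sysrq}")
    # 1 means "all functions enabled"; otherwise test the reboot bit (128).
    [ "$value" -eq 1 ] || [ $(( value & 128 )) -ne 0 ]
}

# Usage:
#   sysrq_reboot_allowed || echo "Alt-SysRq-B is currently disabled"
# To enable all SysRq functions until the next reboot (as root):
#   echo 1 > /proc/sys/kernel/sysrq
```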

> I followed Ubuntu's guide on kernel suspend bugs: https://wiki.ubuntu.com/DebuggingKernelSuspend
>
> Then, following what they suggest, I ran
>
> `sudo sh -c "sync && echo 1 > /sys/power/pm_trace && pm-suspend"`
>
> and looked for the lines that say "hash matches" in dmesg right after reboot (what does that mean?)
>
> Well, I found two:
>
> ```
> [ 3.583591] ima: Allocated hash algorithm: sha1
> [ 3.593050] input: AT Raw Set 2 keyboard as /devices/platform/i8042/serio0/input/input4
> [ 3.638808] Magic number: 0:929:176
> [ 3.638867] acpi device:39: hash matches
> [ 3.638893] acpi device:0c: hash matches
> [ 3.639073] rtc_cmos 00:01: setting system clock to 2016-01-01 12:09:51 UTC (1451650191)
> ```
>
> I couldn't find anything related to those ACPI devices. I first thought that there was a driver for
> them, so I could just rmmod those drivers before sleep and insmod them on wakeup, but I couldn't find
> anything. There's this issue https://ubuntuforums.org/archive/index.php/t-2393029.html which has
> those exact hash matches, but no answer.

I don't know a lot about pm_trace, but it seems like there might be a problem decoding the hash.
Normally it should show you a PCI address, /sys device name, driver name, or something more
specific (see example in link below).

According to s2ram kernel documentation:

If no device matches the hash (or any matches appear to be false positives), the culprit may be a
device from a loadable kernel module that is not loaded until after the hash is checked. You can
check the hash against the current devices again after more modules are loaded using sysfs:

cat /sys/power/pm_trace_dev_match

https://www.kernel.org/doc/html/latest/power/s2ram.html#using-trace-resume

However, in Qubes we may also have the opposite problem. Qubes takes over your network cards and
sometimes USB controllers in early userspace, so their drivers are not available at any point. To
disable this behavior for USB controllers, remove rd.qubes.hide_all_usb from the kernel cmdline. For
network cards it's a little more complicated.

You can try modifying the qubes initramfs hook. First, make sure there are no VMs configured to
start automatically at boot. Move /usr/lib/dracut/modules.d/90qubes-pciback/ to your home
directory, or open the qubes-pciback.sh file and comment out the last 9 or so lines (from "for dev
in $HIDEPCI"). Rebuild the initramfs. Then, do the pm_trace again as you did before. Then, try
pm_trace_dev_match as described in the link above.

It might give you better information about the problem device, or it might just give you the same
info as before, but it's something to try.

If it doesn't work, don't forget to put that file back how it was, and rebuild initramfs again.
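
The test cycle described above can be sketched as two small functions. The names `pm_trace_suspend` and `pm_trace_recheck` are hypothetical, and the optional directory argument exists only so the functions can be dry-run against a scratch directory instead of the real /sys/power:

```shell
#!/bin/sh
# Hypothetical helper: arm pm_trace and suspend to RAM.
# $1 is the power-management sysfs directory (default: /sys/power).
pm_trace_suspend() {
    power_dir="${1:-/sys/power}"
    sync                               # flush disks; a hard reboot may follow
    echo 1 > "$power_dir/pm_trace"     # record trace points in the RTC
    echo mem > "$power_dir/state"      # suspend
}

# Hypothetical helper: after the reboot, once more modules are loaded,
# ask the kernel again which devices match the stored hash.
pm_trace_recheck() {
    power_dir="${1:-/sys/power}"
    cat "$power_dir/pm_trace_dev_match"
}

# Usage (as root in dom0):
#   pm_trace_suspend      # then try to resume; reboot if it hangs
#   pm_trace_recheck      # after the reboot
# Rebuilding the initramfs after editing the dracut module is assumed to be
# "dracut -f" on a Fedora-based dom0; check your release before relying on it.
```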

> Then I asked for help on a forum and they found this problematic line on my dmesg:
>
> `[ 2.543596] acpi PNP0A08:00: _OSC failed (AE_ERROR); disabling ASPM`
>
> It seems ASPM is disabled on my Qubes install. I don't know why. Should this be considered a bug? Is
> there anything I can do to get it working? This looks promising.
>
> It's worth noting that on Ubuntu 18, 19, Fedora 30, Linux Mint, etc, all these systems work like a
> charm with the sleep process. I can close the lid and open and it works. So the problem seems to be
> **related to Qubes**. I even tried Qubes' most recent dom0 kernel, based on the 5.x Linux kernel, but
> the problem persists.

There's a pretty big difference between Fedora and Qubes. R4.0 is based on Fedora 25, not 30. Also
have you tried suspend on any of those OSes with Xen installed and running? Or, have you tried
booting Qubes without Xen? (Here's how to boot Qubes 4.0 without Xen:
https://www.mail-archive.com/qubes...@googlegroups.com/msg31138.html - however it may be easier
for you to install Qubes 4.1 on a removable drive to test because it comes with Grub already, and
you don't have to risk breaking your main installation. Also 4.1 comes with a newer Xen version
which might help.)

> I also tried `pcie_aspm=force` on `/boot/efi/EFI/qubes/xen.cfg` (is this where I put kernel
> parameters?) like this:

Yes on R4.0 you use xen.cfg. On other releases, you use /etc/default/grub. Unfortunately I don't
know anything about ASPM so you probably know more than I do.

> `kernel=vmlinuz-4.14.74-1.pvops.qubes.x86_64 root=/dev/mapper/qubes_dom0-root
> rd.luks.uuid=luks-39fc83eb-9829-43b7-86e8-08068bd81087 rd.lvm.lv=qubes_dom0/root
> rd.lvm.lv=qubes_dom0/swap i915.alpha_support=1 pcie_aspm=force rhgb quiet
> plymouth.ignore-serial-consoles`
>
> but it didn't help.

Didn't help as in didn't make the message go away? Or just didn't fix the suspend issue?

> I practically need to run Qubes on this notebook because any Linux distribution with any kernel will
> have a problem that corrupts my SSD many times a day. No one could solve it, and on Qubes it never
> happens. I tried Qubes just to see if it'd solve and it does! I'm loving it, not going back even on
> other notebooks. However, closing the lid/putting the system to sleep is essential for a notebook.

That's really strange.

> ```
>
> [lz@dom0 ~]$ cat /sys/power/mem_sleep
> s2idle [deep]
>
> ```
>
> as you see, the suspend default is deep mode.
>
> I tried s2idle by doing `echo freeze > /sys/power/state` and the screen turns off but the keyboard
> lights stay on. Pressing keys does nothing. Pressing the touchpad, nothing. Pressing power
> rapidly, nothing. I had to reboot by long-pressing power. I thought s2idle should always work since
> it's software based.

I don't have much if any experience with s2idle, but I would also expect it to be the most
reliable. However, s2idle may power off the VGA controller or GPU or something like that, and if so
it could cause a graphics issue just like deep sleep. Does screen poweroff work for you? See:
https://www.mail-archive.com/qubes...@googlegroups.com/msg31504.html

> Here's my journalctl of the moment when I go to suspend by closing the lid (that is, suspending in
> deep mode):

> ...


>
> If anyone has other debug ideas, I'm very thankful!!!!!!!!!!!!!

Just some general tips: try kernel-latest, and Qubes R4.1, if you haven't yet. Also make sure your
firmware is up to date. If your machine has a dGPU, disable it in BIOS.

It doesn't sound like the CPUID Xen panic I had on my machine, but you could try the Xen patch
anyway, if nothing else works. In my case, only the fan came back on, but not the screen backlight
or anything else.

I also had to pin dom0 to CPU 0 to fix a different problem (my SATA controller was broken after
resume). Add the following to your Xen cmdline ("options=", not "kernel="!): "dom0_max_vcpus=1
dom0_vcpus_pin"
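
To make the "options=", not "kernel=" distinction concrete, here is a sketch that appends parameters only to the `options=` lines of a xen.cfg-style file. The function name `add_xen_options` is hypothetical, and it works on stdin/stdout so the result can be reviewed before anything in /boot is touched:

```shell
#!/bin/sh
# Hypothetical helper: append "$1" to every line starting with "options="
# in the text read from stdin; all other lines pass through unchanged.
# Note: "$1" must not contain "|" or "&", which are special to sed here.
add_xen_options() {
    sed "/^options=/ s|\$| $1|"
}

# Usage (review the output before replacing the real file):
#   add_xen_options 'dom0_max_vcpus=1 dom0_vcpus_pin' \
#       < /boot/efi/EFI/qubes/xen.cfg > /tmp/xen.cfg.new
```

Writing to a temporary file first keeps a working configuration in place if the edited one turns out to be wrong.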

That's all I can think of right now.

Abel Luck

Jan 13, 2020, 1:28:19 PM
to qubes-users
Hi there,

I'm debugging similar resume issues, though on different hardware. Hopefully you don't mind if we share tips in this thread.
Thanks for this tip. Using this method I was able to get a "hash matches" line in my dmesg whereas before I didn't get one.

I am also debugging a suspend/resume issue, but with an Asus z390 I Aorus Pro Wifi motherboard on a desktop (and an NVIDIA GPU, unfortunately).

Some interesting facts:

1) the pci device that matched was "INT34B9:00". I can't really find much info about what this device is, it doesn't correspond to anything under lspci. /sys/bus/acpi/devices/INT34B9:00/uid contains the value "SerialIoUart1"

2) suspend and resume works when I execute "echo mem > /sys/power/state". However when I execute the suspend from xfce or run systemctl suspend, the resume fails (with a black screen but the keyboard lights up).
 

>> I also tried `pcie_aspm=force` on `/boot/efi/EFI/qubes/xen.cfg` (is this where I put kernel
>> parameters?) like this:
>
> Yes on R4.0 you use xen.cfg. On other releases, you use /etc/default/grub. Unfortunately I don't
> know anything about ASPM so you probably know more than I do.


I also don't know much about ASPM, but I noticed my BIOS had a section for "Active State Power Management" which was disabled. I enabled it (and the sub-options that appeared) but still haven't had any luck.


>> If anyone has other debug ideas, I'm very thankful!!!!!!!!!!!!!
>
> Just some general tips: try kernel-latest, and Qubes R4.1, if you haven't yet.


I'm still on 4.0, how does one try 4.1 without a full re-install?
 

> Also make sure your
> firmware is up to date. If your machine has a dGPU, disable it in BIOS.
>
> It doesn't sound like the CPUID Xen panic I had on my machine, but you could try the Xen patch
> anyway, if nothing else works. In my case, only the fan came back on, but not the screen backlight
> or anything else.
>
> I also had to pin dom0 to CPU 0 to fix a different problem (my SATA controller was broken after
> resume). Add the following to your Xen cmdline ("options=", not "kernel="!): "dom0_max_vcpus=1
> dom0_vcpus_pin"


Will give these a try.

I have both iwlwifi and nouveau, which are definitely top suspects. However, they haven't given me any issues, and so far no evidence points to them being responsible.

~abel

Abel Luck

Jan 14, 2020, 5:58:11 AM
to qubes...@googlegroups.com
Abel Luck:
>> Just some general tips: try kernel-latest, and Qubes R4.1, if you haven't
>> yet.


Some interesting news, TL;DR is that I got suspend/resume working!
Here's how:


I updated dom0 to kernel-latest, booted again and with all vms off
tested suspend with this script:

```
#!/bin/sh

sync
echo 1 > /sys/power/pm_trace
echo mem > /sys/power/state
```

Resume worked. However as soon as I turned on sys-usb it failed to
resume again, with the monitor staying off but the keyboard lights
turning on.

At this point I went into my bios and disabled all the devices I could:
wlan adapter, ethernet adapter, graphics, etc.

Throughout this process I was constantly checking for the "hash matches"
devices in dmesg and looking at /sys/power/pm_trace_dev_match. Also I
had edited qubes-pciback.sh as described by Claudia. There was never a
clear smoking gun that revealed some particular device, and the values
seemed to change with every reboot or configuration. However at one
point I noticed 'drm' in pm_trace_dev_match, and this would prove
useful later.

My motherboard has integrated Intel graphics (igfx) but also a PCIe
NVIDIA card. Eventually I happened upon the BIOS configuration where I
enabled integrated graphics (I had no option to disable the NVIDIA card
aside from physically removing it).

Booting into Qubes using the igfx output, I noticed 'drm' in the
pm_trace_dev_match, which I know has something to do with the nouveau
driver. So I disabled nouveau as described at
https://www.qubes-os.org/doc/nvidia-troubleshooting/#disabling-nouveau.

Then resume worked!

I could have left it there and relied on igfx alone, but I hadn't had
any problems with nouveau, and for various reasons want to use it rather
than igfx. So on a hunch I tried the opposite process. I disabled igfx
in the bios and then added iommu=no-igfx to the GRUB_CMDLINE_LINUX (not
the XEN line) and resume works fine.

I'm a little confused as to why iommu=no-igfx is necessary since the
bios disabled the igfx card, but whatever, it's working.

To clean up, I reverted the change in qubes-pciback.sh, removed the
nouveau changes and added the no-igfx param in the grub config.

I should add that at some point I switched from 'echo mem >
/sys/power/state' to 'systemctl suspend' in my test script because the
former would actually resume successfully in more cases, while the
latter never would (until I landed on the gpu solution).

In summary, it appears suspend/resume in Qubes may have some problem when
multiple graphics adapters are present. This hardware suspends/resumes
fine in normal Debian. I observed that blocking one of the adapters and
forcing just a particular one seems to allow suspend/resume to operate
as expected.
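
For anyone wanting to check whether the same multi-adapter situation applies to their machine, a rough sketch that counts graphics adapters in `lspci` output. The function name `count_gpus` is hypothetical, and the class names matched are the ones lspci commonly prints, so treat the pattern as an assumption:

```shell
#!/bin/sh
# Hypothetical helper: count graphics adapters in lspci output read on stdin.
# The class names below are an assumption about how lspci labels GPUs.
count_gpus() {
    grep -Eci 'VGA compatible controller|3D controller|Display controller'
}

# Usage (in dom0):
#   lspci | count_gpus
```

A result above 1 means more than one adapter is visible to dom0, which is the configuration that caused trouble here.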

~abel

Guerlan

Jan 17, 2020, 8:59:46 PM
to qubes-users
Hi Claudia, I'm going to test everything you suggested, and I'm very thankful for all the help you give here. It's just going to take a few days because I sometimes can't pause my workflow to tweak my Qubes.

About the NVMe problem, I opened a new thread: https://groups.google.com/forum/#!topic/qubes-users/ZVx3tDQ002E
If you know anything that can help, please give me some advice. If I could finally understand why this bug happens, I could then help other people who suffer from this problem.

Guerlan

Jan 17, 2020, 9:00:43 PM
to qubes-users
Thank you for all your work, Abel. I'm going to read it carefully and try to apply it to my Qubes to track down the problems!