Qubes 4.0 on gen3 X1 loses disk on resume from suspend

Jonathan Proulx

Dec 6, 2018, 3:52:27 PM
to qubes...@googlegroups.com
Hi All,

New to Qubes, so hopefully I just missed something obvious. When
resuming from suspend on a Lenovo X1 Carbon (gen3) with a fresh Qubes 4.0
install, I'm losing access to the physical disk.

This looks very similar to
https://github.com/QubesOS/qubes-issues/issues/3049 and is first
noticeable as I/O errors in AppVMs and Dom0, but that bug was closed
a long time ago, so presumably a new install followed by a dom0 update
should have me caught up with that fix?

To rule out various other hardware resume issues, I replicated this with
all network and USB VMs shut down, leaving just Dom0 and a "vault" style
no-network AppVM running.

Repeating the test by closing the lid from a text console (rather than X),
I can see (but not effectively copy) the error messages:

ACPI Error: [\_PR_.CPU0._CST] Namespace lookup failure, AE_NOT_FOUND
(20170728/psparse-364)
ACPI Error: Method parse/execution failed \_PR.CPU1._CST, AE_NOT_FOUND
(20170728/psparse-550)

ata1.00: revalidation failed (errno=-5)
ata1.00: revalidation failed (errno=-5)
sd 0:0:0:0: rejecting I/O to offline device

Then come a bunch of device-mapper and EXT4 errors that likely result from
that initial failure. From the timestamps, the ACPI errors appear to occur
when going to sleep (though I'm not 100% sure); the disk errors are
definitely on resume. Screenshot here:
https://people.csail.mit.edu/jon/qubes_disk_error.jpg

I was previously running Debian Testing and a few older Ubuntu
releases and they had been OK with resume from sleep on this hardware.
Maybe there's something extra I need to poke in Fedora land? I found
some notes about blacklisting wireless modules in net domain but
nothing Dom0 related.

Any clues?

Thanks,
-Jon

Jonathan Proulx

Dec 7, 2018, 2:16:11 PM
to qubes...@googlegroups.com
On Thu, Dec 6, 2018 at 3:52 PM Jonathan Proulx <j...@jonproulx.com> wrote:
>
> Hi All,
>
> New to Qubes, so hopefully I just missed something obvious. When
> resuming from suspend on a Lenovo X1 Carbon (gen3) with a fresh Qubes 4.0
> install, I'm losing access to the physical disk.

With a few clever suggestions from ThomasWaldmann and busu on IRC I've
gotten a bit more detail but no resolution yet.

Copying binaries and libraries to /tmp before suspend keeps them
usable on resume.

dmesg shows additional messages that weren't making it to the console:
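
A minimal sketch of that staging step, assuming /tmp is on tmpfs and that `ldd` is available. The helper name `stage_binary` and the /tmp/rescue directory are made up for illustration; they are not part of Qubes or any package.

```shell
# Copy a binary and the shared libraries it links against into a directory
# that does not live on the disk (e.g. somewhere under /tmp on tmpfs).
# stage_binary is a hypothetical helper name used only in this sketch.
stage_binary() {
    local bin dest
    bin=$(command -v "$1") || { echo "not found: $1" >&2; return 1; }
    dest=$2
    mkdir -p "$dest/bin" "$dest/lib"
    cp -L "$bin" "$dest/bin/"
    # ldd prints lines such as "libc.so.6 => /lib64/libc.so.6 (0x...)";
    # keep only the fields that are absolute paths and copy each one.
    ldd "$bin" 2>/dev/null |
        awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^\//) print $i }' |
        sort -u |
        while read -r lib; do
            cp -L "$lib" "$dest/lib/"
        done
}

# Example (run before suspending):
#   stage_binary dmesg /tmp/rescue
#   stage_binary smartctl /tmp/rescue
```

After resume, the staged copy can be started through the staged dynamic loader so nothing is read from the lost disk, for example `/tmp/rescue/lib/ld-linux-x86-64.so.2 --library-path /tmp/rescue/lib /tmp/rescue/bin/dmesg`. The loader file name is an assumption for x86_64 glibc systems; shell scripts such as rescan-scsi-bus.sh have no `ldd` output, so the tools they call need staging one by one.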

ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300 )
ata1.00: qc timeout (cmd 0xec)
ata1.00: failed to IDENTIFY (I/O error, err mask=0x4)
ata1.00: revalidation failed (errno=-5)
ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300 )
ata1.00: qc timeout (cmd 0xec)
ata1.00: failed to IDENTIFY (I/O error, err mask=0x4)
ata1.00: revalidation failed (errno=-5)
ata1: limiting SATA link speed to 3.0 Gbps
ata1: SATA link down (SStatus 0 SControl 320 )
ata1.00: disabled
sd 0:0:0:0: rejecting I/O to offline device
<etc...>
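
To pull just these lines out of a noisy kernel log, a small filter along the following lines may help. It is a sketch only: the function names `ata_events` and `disk_dropped` are hypothetical, and the patterns are taken from the messages quoted in this thread rather than from any exhaustive list of libata messages.

```shell
# Print only libata port/device messages and SCSI disk messages from
# kernel log text supplied on stdin.
ata_events() {
    grep -E 'ata[0-9]+(\.[0-9]+)?: |sd [0-9]+:[0-9]+:[0-9]+:[0-9]+: '
}

# Succeed (exit 0) if the log on stdin shows the drive being given up on:
# either the ATA device was disabled or the SCSI layer is rejecting I/O.
# A failed IDENTIFY alone is not counted, since the kernel retries it.
disk_dropped() {
    grep -Eq 'ata[0-9]+\.[0-9]+: disabled|rejecting I/O to offline device'
}

# Example:
#   dmesg | ata_events
#   dmesg | disk_dropped && echo "drive was dropped since boot"
```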

Reverting from 4.14.74 to the default 4.14.18 kernel doesn't change anything.

echo "- - -" > /sys/class/scsi_host/host0/scan

does show a bus reset in dmesg but doesn't rediscover the lost drive.

rescan-scsi-bus.sh from sg3_utils (after much copying of dependencies
to tmpfs) executes and scans host0 but finds zero drives post-resume.
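
For anyone repeating the manual rescan on a machine with more than one host adapter, the same sysfs write can be looped over every host. This is only a generalized form of the command above, using shell builtins so it still runs with the disk gone. In this thread the rescan did not bring the drive back, so it should not be read as a fix. The function name `rescan_scsi_hosts` is hypothetical, and the optional argument exists only so the loop can be pointed at a test directory.

```shell
# Ask every SCSI host to rescan all channels, targets and LUNs by writing
# the "- - -" wildcard triple to its sysfs scan file. Needs root on a real
# system. The optional first argument overrides the sysfs base directory.
rescan_scsi_hosts() {
    local base scan
    base=${1:-/sys/class/scsi_host}
    for scan in "$base"/host*/scan; do
        # Skip hosts without a writable scan file (or an unmatched glob).
        [ -w "$scan" ] || continue
        echo "- - -" > "$scan"
        echo "rescanned ${scan%/scan}"
    done
}

# Example:
#   rescan_scsi_hosts
#   dmesg | tail
```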

smartctl shows the drive healthy: the long self-test passed, there are no
logged errors in the device's lifetime, and all counters are at the good
end of their scales.
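
The checks described above roughly correspond to the following smartctl invocations. This is a sketch, assuming the drive is /dev/sda and smartmontools is installed in Dom0; `smart_report` is a hypothetical wrapper name, and the exact commands run in the thread were not given.

```shell
# Print the SMART summary for one drive: overall health verdict, vendor
# attribute counters, the self-test log and the device error log.
# smartctl's exit status is a bitmask, so the calls are not chained with &&.
smart_report() {
    local dev
    dev=${1:-/dev/sda}
    smartctl -H "$dev"            # overall health assessment
    smartctl -A "$dev"            # attribute counters
    smartctl -l selftest "$dev"   # results of past self-tests
    smartctl -l error "$dev"      # errors logged by the drive itself
}

# Example (a long self-test is started separately and takes a while):
#   smartctl -t long /dev/sda
#   smart_report /dev/sda
```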

The X1 is a fairly popular laptop and the gen3 has been around a while, so
I tend to suspect this is a setting on my side rather than a Qubes bug,
especially as I was running a different Linux 4.18 kernel on this
hardware recently, but I'm not sure what else there is to poke.

The most obvious BIOS setting is Intel Rapid Start, which was "on",
though that should only take effect after 30 min and my failures are
immediate, so unsurprisingly switching it to "off" has no effect
either.

Thanks,
-Jon

Jonathan Proulx

Dec 7, 2018, 4:14:35 PM
to qubes...@googlegroups.com
On Fri, Dec 7, 2018 at 2:15 PM Jonathan Proulx <j...@jonproulx.com> wrote:
>
> On Thu, Dec 6, 2018 at 3:52 PM Jonathan Proulx <j...@jonproulx.com> wrote:
> >
> > Hi All,
> >
> > New to Qubes, so hopefully I just missed something obvious. When
> > resuming from suspend on a Lenovo X1 Carbon (gen3) with a fresh Qubes 4.0
> > install, I'm losing access to the physical disk.

Updating BIOS from circa 2015 to latest didn't help either.

Brief suspends of <1 min usually resume OK. Sometimes closing and
immediately opening the lid is enough to trigger disk loss; other times
a suspend of a few tens of seconds resumes with the disk intact. It's
pretty much a 100% failure rate at 5 min, so I'm now suspecting firmware
in the drive, perhaps? I can't think what else would change things
while in S3.
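
To tabulate trials like these, one option is to record the SCSI device state after each resume in a file on tmpfs. The sketch below uses only shell builtins so it keeps working when the disk is gone. It assumes the drive is sda and that the kernel exposes its state at /sys/block/sda/device/state (typically "running" or "offline"); the function names and the log path are hypothetical.

```shell
# Print the SCSI state of a block device, or "missing" if the kernel no
# longer exposes it. Arguments: device name, sysfs block directory.
disk_state() {
    local dev sysfs state
    dev=${1:-sda}
    sysfs=${2:-/sys/block}
    if [ -r "$sysfs/$dev/device/state" ]; then
        read -r state < "$sysfs/$dev/device/state"
        printf '%s\n' "$state"
    else
        printf 'missing\n'
    fi
}

# Append one trial to a log file (on tmpfs by default).
# Usage: log_trial SECONDS_ASLEEP [LOGFILE] [DEVICE] [SYSFS_DIR]
log_trial() {
    local secs log
    secs=$1
    log=${2:-/tmp/suspend-trials.log}
    printf 'slept=%ss state=%s\n' "$secs" \
        "$(disk_state "${3:-sda}" "${4:-/sys/block}")" >> "$log"
}

# Example, run by hand after each resume:
#   log_trial 300
```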

the hunt continues,
-Jon

Daniel Moerner

Dec 9, 2018, 10:05:04 AM
to qubes-users
Hi Jon,

This is very strange. This isn't very helpful for you, but I just want to confirm that I have run Qubes 4.0 on a gen 3 X1 without ever running into this bug. The ACPI errors are present on almost all these Lenovo machines and don't mean anything.

Daniel

Nick

Dec 9, 2018, 11:11:14 AM
to qubes...@googlegroups.com
Hi,
I'm having the same ACPI error on a non-Lenovo machine. I'm getting these
errors on various OSes, not just Qubes.

Niav
