blk_update_request: I/O error, private.img turns readonly

470 views
Skip to first unread message

IX4 Svs

unread,
Dec 19, 2015, 6:53:05 AM12/19/15
to qubes...@googlegroups.com
I noticed application errors in one of my AppVMs and quickly discovered this with dmesg:

[25779.941639] hrtimer: interrupt took 4397732 ns
[29981.965428] blkfront: xvdb: empty flush op failed
[29981.965438] blkfront: xvdb: barrier or flush: disabled; persistent grants: enabled; indirect descriptors: enabled;
[29981.965736] blk_update_request: I/O error, dev xvdb, sector 2157912
[29981.966111] Aborting journal on device xvdb-8.
[29981.969043] EXT4-fs error (device xvdb): ext4_journal_check_start:56: Detected aborted journal
[29981.969055] EXT4-fs (xvdb): Remounting filesystem read-only
[user@AppVM ~]$ touch lala
touch: cannot touch ‘lala’: Read-only file system
[user@AppVM ~]$

This is how /dev/xvdb is mounted on this AppVM:
/dev/xvdb on /rw type ext4 (rw,relatime,discard,data=ordered)
/dev/xvdb on /home type ext4 (rw,relatime,discard,data=ordered)

/dev/xvdb is the private.img for this AppVM.

Now, initially I suspected errors in the underlying filesystem or disk. Even though there were no errors in Dom0, I swapped the internal SSD, installed Qubes from scratch and restored my AppVMs. Unfortunately that didn't make any difference - I now see the exact same behaviour.

After such a fault that results in a readonly filesystem in the AppVM, shutting down and booting the AppVM shows the following with dmesg:
[    5.345561] EXT4-fs warning (device xvdb): ext4_clear_journal_err:4728: Filesystem error recorded from previous mount: IO failure
[    5.345577] EXT4-fs warning (device xvdb): ext4_clear_journal_err:4729: Marking fs in need of filesystem check.
[    5.346477] EXT4-fs (xvdb): warning: mounting fs with errors, running e2fsck is recommended
[    5.347419] EXT4-fs (xvdb): recovery complete
[    5.353636] EXT4-fs (xvdb): mounted filesystem with ordered data mode. Opts: discard

AFAIK journal issues should not be creating such serious I/O errors. In any case, how do I fix whatever is broken in private.img?

Thanks

Alex

Marek Marczykowski-Górecki

unread,
Dec 19, 2015, 9:58:16 AM12/19/15
to IX4 Svs, qubes...@googlegroups.com
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On Sat, Dec 19, 2015 at 11:53:05AM +0000, IX4 Svs wrote:
> I noticed application errors in one of my AppVMs and quickly discovered
> this with dmesg:
>
> [25779.941639] hrtimer: interrupt took 4397732 ns
> [29981.965428] blkfront: xvdb: empty flush op failed
> [29981.965438] blkfront: xvdb: barrier or flush: disabled; persistent
> grants: enabled; indirect descriptors: enabled;
> [29981.965736] blk_update_request: I/O error, dev xvdb, sector 2157912
> [29981.966111] Aborting journal on device xvdb-8.
> [29981.969043] EXT4-fs error (device xvdb): ext4_journal_check_start:56:
> Detected aborted journal
> [29981.969055] EXT4-fs (xvdb): Remounting filesystem read-only
> [user@AppVM ~]$ touch lala
> touch: cannot touch ‘lala’: Read-only file system
> [user@AppVM ~]$
>
> This is how /dev/xvdb is mounted on this AppVM:
> /dev/xvdb on /rw type ext4 (rw,relatime,discard,data=ordered)
> /dev/xvdb on /home type ext4 (rw,relatime,discard,data=ordered)
>
> /dev/xvdb is the private.img for this AppVM.
>
> Now, initially I suspected errors in the underlying filesystem or disk.

I think this is the most probable cause...

> Even though there were no errors in Dom0, I swapped the internal SSD,
> installed Qubes from scratch and restored my AppVMs. Unfortunately that
> didn't make any difference - I now see the exact same behaviour.
>
> After such a fault that results in a readonly filesystem in the AppVM,
> shutting down and booting the AppVM shows the following with dmesg:
> [ 5.345561] EXT4-fs warning (device xvdb): ext4_clear_journal_err:4728:
> Filesystem error recorded from previous mount: IO failure
> [ 5.345577] EXT4-fs warning (device xvdb): ext4_clear_journal_err:4729:
> Marking fs in need of filesystem check.

Because of some bug[1], you need to run fsck manually...

[1] https://github.com/QubesOS/qubes-issues/issues/979

- --
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQEcBAEBCAAGBQJWdXCAAAoJENuP0xzK19csyDEH+wZzspDkg1D8LoKkag0sXzac
MWSJ69E+KEdmks7Qcwo2Wft5pF8K4a4fGrCiEy5BZPzMDsdXuqRkeE7KM6n+qc3B
VvcmsWf3YzH4hs52TZ3Q2FsVXtUB/xH4+VIsRd6qT86ZFZB4/PKcO9m2VRRcwELB
heLAHN6SHJZFh8xX4/zc+hSSZEInWlhymYzDfQCkseO6hXTvBhcyZnQ/HsHMjD8B
HAv09uUCMBuE1EZOBwddi3wb1/U1PdXU3NdiSelloRldkcKUZPGm5UdM/OUFy7Ej
trCYsJ34BGkYvGLL3WNNW0RNPz/ZTPz55IAG2Q/B6wTUYw1o9KyVza7445Te4Nc=
=+EGw
-----END PGP SIGNATURE-----

IX4 Svs

unread,
Dec 29, 2015, 5:03:37 PM12/29/15
to Marek Marczykowski-Górecki, qubes...@googlegroups.com
On Sat, Dec 19, 2015 at 2:58 PM, Marek Marczykowski-Górecki <marm...@invisiblethingslab.com> wrote:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On Sat, Dec 19, 2015 at 11:53:05AM +0000, IX4 Svs wrote:
> I noticed application errors in one of my AppVMs and quickly discovered
> this with dmesg:
>
> [25779.941639] hrtimer: interrupt took 4397732 ns
> [29981.965428] blkfront: xvdb: empty flush op failed
> [29981.965438] blkfront: xvdb: barrier or flush: disabled; persistent
> grants: enabled; indirect descriptors: enabled;
> [29981.965736] blk_update_request: I/O error, dev xvdb, sector 2157912
> [29981.966111] Aborting journal on device xvdb-8.
> [29981.969043] EXT4-fs error (device xvdb): ext4_journal_check_start:56:
> Detected aborted journal
> [29981.969055] EXT4-fs (xvdb): Remounting filesystem read-only
> [user@AppVM ~]$ touch lala
> touch: cannot touch ‘lala’: Read-only file system
> [user@AppVM ~]$
>
> This is how /dev/xvdb is mounted on this AppVM:
> /dev/xvdb on /rw type ext4 (rw,relatime,discard,data=ordered)
> /dev/xvdb on /home type ext4 (rw,relatime,discard,data=ordered)
>
> /dev/xvdb is the private.img for this AppVM.
>
> Now, initially I suspected errors in the underlying filesystem or disk.

I think this is the most probable cause...


I now have the exact same behaviour on a different AppVM. The new SSD is in my Qubes laptop, but I get:

[ 7528.012625] blkfront: xvdb: empty flush op failed
[ 7528.012653] blkfront: xvdb: barrier or flush: disabled; persistent grants: enabled; indirect descriptors: enabled;
[ 7528.012948] blk_update_request: I/O error, dev xvdb, sector 2123248
[ 7528.013289] Aborting journal on device xvdb-8.
[ 7528.016528] EXT4-fs error (device xvdb): ext4_journal_check_start:56: Detected aborted journal
[ 7528.016554] EXT4-fs (xvdb): Remounting filesystem read-only
[user@untrusted ~]$

This is too much of a coincidence... two different AppVMs, on two different physical SSDs, showing the same behaviour? How could I troubleshoot this?
 
> Even though there were no errors in Dom0, I swapped the internal SSD,
> installed Qubes from scratch and restored my AppVMs. Unfortunately that
> didn't make any difference - I now see the exact same behaviour.
>
> After such a fault that results in a readonly filesystem in the AppVM,
> shutting down and booting the AppVM shows the following with dmesg:
> [    5.345561] EXT4-fs warning (device xvdb): ext4_clear_journal_err:4728:
> Filesystem error recorded from previous mount: IO failure
> [    5.345577] EXT4-fs warning (device xvdb): ext4_clear_journal_err:4729:
> Marking fs in need of filesystem check.

Because of some bug[1], you need to run fsck manually...

[1] https://github.com/QubesOS/qubes-issues/issues/979


Thanks for documenting the workaround - the manual process worked well.

IX4 Svs

unread,
Dec 29, 2015, 5:13:37 PM12/29/15
to Marek Marczykowski-Górecki, qubes...@googlegroups.com
With one exception - I always get a garbled terminal that doesn't echo what I type - so I cannot see the fsck prompts. For posterity, the way to fix this is to type (blindly)
reset
and hit enter. This reset the terminal and allows echo, line breaks etc to work as expected.

Marek Marczykowski-Górecki

unread,
Dec 29, 2015, 7:28:06 PM12/29/15
to IX4 Svs, qubes...@googlegroups.com
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

I guess you've already checked dom0 kernel messages, right?
If there is nothing related there, you can enable debugging in block
backend (as root in dom0):
echo module xen_blkback > /sys/kernel/debug/dynamic_debug

Then observe kernel messages for the next error...

- --
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQEcBAEBCAAGBQJWgyUIAAoJENuP0xzK19csPW4H/Aug/5fN+VfRBwzW6tIFdjaA
sOvZyq/3AClTng8S5uUR8XERCOY0ahU22pNW3iALGATzZnLskIRIbOe7jDl/V53q
QUpzWFtdBGUeADZz1MsNBkCrXMRKvIYyIX5DCSoizhTZMB7vRw4yFM+gxjvQxrbx
vfaC2elklp1U+jb0gUCaFNnXmisA8tWZggm1tycgj4rYlh59UnZQ/9bDDOPxredI
xNsV2H1vJPmgwHfgtHzYcP1GiNWV54a7bJdYc/61fiECJobimtT41so6fKdvGhU9
Xz5BpvHjebNzbt1NpNcbVvRoXiWE9Ky69KrkSxT8seLObwQoGxR0XFjuspGwHVk=
=zSqh
-----END PGP SIGNATURE-----

IX4 Svs

unread,
Dec 30, 2015, 6:21:58 AM12/30/15
to Marek Marczykowski-Górecki, qubes...@googlegroups.com
As there had been nothing there the first time this happened, I got lazy this time and didn't check. You're right, there is plenty of reason for the AppVMs to have issues. Here's what dom0's /var/log/messages says:

ec 29 21:14:44 dom0 kernel: [ 7669.133786] ata1.00: exception Emask 0x60 SAct 0x8000 SErr 0x800 action 0x6 frozen
Dec 29 21:14:44 dom0 kernel: [ 7669.133800] ata1.00: irq_stat 0x20000000, host bus error
Dec 29 21:14:44 dom0 kernel: [ 7669.133805] ata1: SError: { HostInt }
Dec 29 21:14:44 dom0 kernel: [ 7669.133810] ata1.00: failed command: WRITE FPDMA QUEUED
Dec 29 21:14:44 dom0 kernel: [ 7669.133818] ata1.00: cmd 61/c0:78:30:e5:26/00:00:0c:00:00/40 tag 15 ncq 98304 out
Dec 29 21:14:44 dom0 kernel: [ 7669.133818]          res 40/00:78:30:e5:26/00:00:0c:00:00/40 Emask 0x60 (host bus error)
Dec 29 21:14:44 dom0 kernel: [ 7669.133822] ata1.00: status: { DRDY }
Dec 29 21:14:44 dom0 kernel: [ 7669.133828] ata1: hard resetting link
Dec 29 21:14:44 dom0 kernel: ata1.00: exception Emask 0x60 SAct 0x8000 SErr 0x800 action 0x6 frozen
Dec 29 21:14:44 dom0 kernel: ata1.00: irq_stat 0x20000000, host bus error
Dec 29 21:14:44 dom0 kernel: ata1: SError: { HostInt }
Dec 29 21:14:44 dom0 kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Dec 29 21:14:44 dom0 kernel: ata1.00: cmd 61/c0:78:30:e5:26/00:00:0c:00:00/40 tag 15 ncq 98304 out
         res 40/00:78:30:e5:26/00:00:0c:00:00/40 Emask 0x60 (host bus error)
Dec 29 21:14:44 dom0 kernel: ata1.00: status: { DRDY }
Dec 29 21:14:44 dom0 kernel: ata1: hard resetting link
Dec 29 21:14:45 dom0 kernel: [ 7669.438914] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Dec 29 21:14:45 dom0 kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Dec 29 21:14:45 dom0 kernel: [ 7669.441496] ata1.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
Dec 29 21:14:45 dom0 kernel: [ 7669.441500] ata1.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
Dec 29 21:14:45 dom0 kernel: [ 7669.441649] ata1.00: ACPI cmd 00/00:00:00:00:00:a0 (NOP) rejected by device (Stat=0x51 Err=0x04)
Dec 29 21:14:45 dom0 kernel: ata1.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
Dec 29 21:14:45 dom0 kernel: ata1.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
Dec 29 21:14:45 dom0 kernel: ata1.00: ACPI cmd 00/00:00:00:00:00:a0 (NOP) rejected by device (Stat=0x51 Err=0x04)
Dec 29 21:14:45 dom0 kernel: [ 7669.443451] ata1.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
Dec 29 21:14:45 dom0 kernel: [ 7669.443455] ata1.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
Dec 29 21:14:45 dom0 kernel: [ 7669.443514] ata1.00: ACPI cmd 00/00:00:00:00:00:a0 (NOP) rejected by device (Stat=0x51 Err=0x04)
Dec 29 21:14:45 dom0 kernel: ata1.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
Dec 29 21:14:45 dom0 kernel: ata1.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
Dec 29 21:14:45 dom0 kernel: ata1.00: ACPI cmd 00/00:00:00:00:00:a0 (NOP) rejected by device (Stat=0x51 Err=0x04)
Dec 29 21:14:45 dom0 kernel: [ 7669.444336] ata1.00: configured for UDMA/133
Dec 29 21:14:45 dom0 kernel: ata1.00: configured for UDMA/133
Dec 29 21:14:45 dom0 kernel: [ 7669.445818] ata1: EH complete
Dec 29 21:14:45 dom0 kernel: ata1: EH complete
Dec 29 21:14:45 dom0 kernel: [ 7669.455840] ata1.00: exception Emask 0x60 SAct 0x20000 SErr 0x800 action 0x6 frozen
Dec 29 21:14:45 dom0 kernel: [ 7669.455846] ata1.00: irq_stat 0x20000000, host bus error
Dec 29 21:14:45 dom0 kernel: [ 7669.455850] ata1: SError: { HostInt }
Dec 29 21:14:45 dom0 kernel: [ 7669.455855] ata1.00: failed command: WRITE FPDMA QUEUED
Dec 29 21:14:45 dom0 kernel: [ 7669.455863] ata1.00: cmd 61/c0:88:30:e5:26/00:00:0c:00:00/40 tag 17 ncq 98304 out
Dec 29 21:14:45 dom0 kernel: [ 7669.455863]          res 40/00:88:30:e5:26/00:00:0c:00:00/40 Emask 0x60 (host bus error)
Dec 29 21:14:45 dom0 kernel: [ 7669.455867] ata1.00: status: { DRDY }
Dec 29 21:14:45 dom0 kernel: [ 7669.455873] ata1: hard resetting link
Dec 29 21:14:45 dom0 kernel: ata1.00: exception Emask 0x60 SAct 0x20000 SErr 0x800 action 0x6 frozen
Dec 29 21:14:45 dom0 kernel: ata1.00: irq_stat 0x20000000, host bus error
Dec 29 21:14:45 dom0 kernel: ata1: SError: { HostInt }
Dec 29 21:14:45 dom0 kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Dec 29 21:14:45 dom0 kernel: ata1.00: cmd 61/c0:88:30:e5:26/00:00:0c:00:00/40 tag 17 ncq 98304 out
         res 40/00:88:30:e5:26/00:00:0c:00:00/40 Emask 0x60 (host bus error)
Dec 29 21:14:45 dom0 kernel: ata1.00: status: { DRDY }
Dec 29 21:14:45 dom0 kernel: ata1: hard resetting link
Dec 29 21:14:45 dom0 kernel: [ 7669.760892] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Dec 29 21:14:45 dom0 kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Dec 29 21:14:45 dom0 kernel: [ 7669.763326] ata1.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
Dec 29 21:14:45 dom0 kernel: [ 7669.763331] ata1.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
Dec 29 21:14:45 dom0 kernel: [ 7669.763408] ata1.00: ACPI cmd 00/00:00:00:00:00:a0 (NOP) rejected by device (Stat=0x51 Err=0x04)
Dec 29 21:14:45 dom0 kernel: ata1.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
Dec 29 21:14:45 dom0 kernel: ata1.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
Dec 29 21:14:45 dom0 kernel: ata1.00: ACPI cmd 00/00:00:00:00:00:a0 (NOP) rejected by device (Stat=0x51 Err=0x04)
Dec 29 21:14:45 dom0 kernel: [ 7669.765264] ata1.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
Dec 29 21:14:45 dom0 kernel: [ 7669.765269] ata1.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
Dec 29 21:14:45 dom0 kernel: [ 7669.765329] ata1.00: ACPI cmd 00/00:00:00:00:00:a0 (NOP) rejected by device (Stat=0x51 Err=0x04)
Dec 29 21:14:45 dom0 kernel: ata1.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
Dec 29 21:14:45 dom0 kernel: ata1.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
Dec 29 21:14:45 dom0 kernel: ata1.00: ACPI cmd 00/00:00:00:00:00:a0 (NOP) rejected by device (Stat=0x51 Err=0x04)
Dec 29 21:14:45 dom0 kernel: ata1.00: configured for UDMA/133
Dec 29 21:14:45 dom0 kernel: ata1: EH complete
Dec 29 21:14:45 dom0 kernel: [ 7669.766057] ata1.00: configured for UDMA/133
Dec 29 21:14:45 dom0 kernel: [ 7669.766103] ata1: EH complete
Dec 29 21:14:45 dom0 kernel: [ 7669.775847] ata1.00: exception Emask 0x60 SAct 0x40000000 SErr 0x800 action 0x6 frozen
Dec 29 21:14:45 dom0 kernel: [ 7669.775854] ata1.00: irq_stat 0x20000000, host bus error
Dec 29 21:14:45 dom0 kernel: [ 7669.775858] ata1: SError: { HostInt }
Dec 29 21:14:45 dom0 kernel: [ 7669.775863] ata1.00: failed command: WRITE FPDMA QUEUED
Dec 29 21:14:45 dom0 kernel: [ 7669.775871] ata1.00: cmd 61/c0:f0:30:e5:26/00:00:0c:00:00/40 tag 30 ncq 98304 out
Dec 29 21:14:45 dom0 kernel: [ 7669.775871]          res 40/00:f0:30:e5:26/00:00:0c:00:00/40 Emask 0x60 (host bus error)
Dec 29 21:14:45 dom0 kernel: [ 7669.775874] ata1.00: status: { DRDY }
Dec 29 21:14:45 dom0 kernel: [ 7669.775879] ata1: hard resetting link
Dec 29 21:14:45 dom0 kernel: ata1.00: exception Emask 0x60 SAct 0x40000000 SErr 0x800 action 0x6 frozen
Dec 29 21:14:45 dom0 kernel: ata1.00: irq_stat 0x20000000, host bus error
Dec 29 21:14:45 dom0 kernel: ata1: SError: { HostInt }
Dec 29 21:14:45 dom0 kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Dec 29 21:14:45 dom0 kernel: ata1.00: cmd 61/c0:f0:30:e5:26/00:00:0c:00:00/40 tag 30 ncq 98304 out
         res 40/00:f0:30:e5:26/00:00:0c:00:00/40 Emask 0x60 (host bus error)
Dec 29 21:14:45 dom0 kernel: ata1.00: status: { DRDY }
Dec 29 21:14:45 dom0 kernel: ata1: hard resetting link
Dec 29 21:14:45 dom0 kernel: [ 7670.080896] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Dec 29 21:14:45 dom0 kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Dec 29 21:14:45 dom0 kernel: [ 7670.083476] ata1.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
Dec 29 21:14:45 dom0 kernel: [ 7670.083481] ata1.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
Dec 29 21:14:45 dom0 kernel: [ 7670.083569] ata1.00: ACPI cmd 00/00:00:00:00:00:a0 (NOP) rejected by device (Stat=0x51 Err=0x04)
Dec 29 21:14:45 dom0 kernel: ata1.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
Dec 29 21:14:45 dom0 kernel: ata1.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
Dec 29 21:14:45 dom0 kernel: ata1.00: ACPI cmd 00/00:00:00:00:00:a0 (NOP) rejected by device (Stat=0x51 Err=0x04)
Dec 29 21:14:45 dom0 kernel: [ 7670.085394] ata1.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
Dec 29 21:14:45 dom0 kernel: [ 7670.085398] ata1.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
Dec 29 21:14:45 dom0 kernel: [ 7670.085458] ata1.00: ACPI cmd 00/00:00:00:00:00:a0 (NOP) rejected by device (Stat=0x51 Err=0x04)
Dec 29 21:14:45 dom0 kernel: ata1.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
Dec 29 21:14:45 dom0 kernel: ata1.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
Dec 29 21:14:45 dom0 kernel: ata1.00: ACPI cmd 00/00:00:00:00:00:a0 (NOP) rejected by device (Stat=0x51 Err=0x04)
Dec 29 21:14:45 dom0 kernel: ata1.00: configured for UDMA/133
Dec 29 21:14:45 dom0 kernel: [ 7670.086191] ata1.00: configured for UDMA/133
Dec 29 21:14:45 dom0 kernel: [ 7670.087806] ata1: EH complete
Dec 29 21:14:45 dom0 kernel: ata1: EH complete
Dec 29 21:14:45 dom0 kernel: [ 7670.097854] ata1.00: exception Emask 0x60 SAct 0x1000 SErr 0x800 action 0x6 frozen
Dec 29 21:14:45 dom0 kernel: [ 7670.097861] ata1.00: irq_stat 0x20000000, host bus error
Dec 29 21:14:45 dom0 kernel: [ 7670.097865] ata1: SError: { HostInt }
Dec 29 21:14:45 dom0 kernel: [ 7670.097870] ata1.00: failed command: WRITE FPDMA QUEUED
Dec 29 21:14:45 dom0 kernel: [ 7670.097878] ata1.00: cmd 61/c0:60:30:e5:26/00:00:0c:00:00/40 tag 12 ncq 98304 out
Dec 29 21:14:45 dom0 kernel: [ 7670.097878]          res 40/00:60:30:e5:26/00:00:0c:00:00/40 Emask 0x60 (host bus error)
Dec 29 21:14:45 dom0 kernel: [ 7670.097882] ata1.00: status: { DRDY }
Dec 29 21:14:45 dom0 kernel: [ 7670.097888] ata1: hard resetting link
Dec 29 21:14:45 dom0 kernel: ata1.00: exception Emask 0x60 SAct 0x1000 SErr 0x800 action 0x6 frozen
Dec 29 21:14:45 dom0 kernel: ata1.00: irq_stat 0x20000000, host bus error
Dec 29 21:14:45 dom0 kernel: ata1: SError: { HostInt }
Dec 29 21:14:45 dom0 kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Dec 29 21:14:45 dom0 kernel: ata1.00: cmd 61/c0:60:30:e5:26/00:00:0c:00:00/40 tag 12 ncq 98304 out
         res 40/00:60:30:e5:26/00:00:0c:00:00/40 Emask 0x60 (host bus error)
Dec 29 21:14:45 dom0 kernel: ata1.00: status: { DRDY }
Dec 29 21:14:45 dom0 kernel: ata1: hard resetting link
Dec 29 21:14:46 dom0 kernel: [ 7670.402898] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Dec 29 21:14:46 dom0 kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Dec 29 21:14:46 dom0 kernel: [ 7670.405476] ata1.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
Dec 29 21:14:46 dom0 kernel: [ 7670.405480] ata1.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
Dec 29 21:14:46 dom0 kernel: [ 7670.405561] ata1.00: ACPI cmd 00/00:00:00:00:00:a0 (NOP) rejected by device (Stat=0x51 Err=0x04)
Dec 29 21:14:46 dom0 kernel: ata1.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
Dec 29 21:14:46 dom0 kernel: ata1.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
Dec 29 21:14:46 dom0 kernel: ata1.00: ACPI cmd 00/00:00:00:00:00:a0 (NOP) rejected by device (Stat=0x51 Err=0x04)
Dec 29 21:14:46 dom0 kernel: [ 7670.407538] ata1.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
Dec 29 21:14:46 dom0 kernel: [ 7670.407543] ata1.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
Dec 29 21:14:46 dom0 kernel: [ 7670.407622] ata1.00: ACPI cmd 00/00:00:00:00:00:a0 (NOP) rejected by device (Stat=0x51 Err=0x04)
Dec 29 21:14:46 dom0 kernel: ata1.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
Dec 29 21:14:46 dom0 kernel: ata1.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
Dec 29 21:14:46 dom0 kernel: ata1.00: ACPI cmd 00/00:00:00:00:00:a0 (NOP) rejected by device (Stat=0x51 Err=0x04)
Dec 29 21:14:46 dom0 kernel: [ 7670.408388] ata1.00: configured for UDMA/133
Dec 29 21:14:46 dom0 kernel: ata1.00: configured for UDMA/133
Dec 29 21:14:46 dom0 kernel: [ 7670.409802] ata1: EH complete
Dec 29 21:14:46 dom0 kernel: ata1: EH complete
Dec 29 21:14:46 dom0 kernel: [ 7670.419845] ata1.00: exception Emask 0x60 SAct 0x2000000 SErr 0x800 action 0x6 frozen
Dec 29 21:14:46 dom0 kernel: [ 7670.419851] ata1.00: irq_stat 0x20000000, host bus error
Dec 29 21:14:46 dom0 kernel: [ 7670.419855] ata1: SError: { HostInt }
Dec 29 21:14:46 dom0 kernel: [ 7670.419860] ata1.00: failed command: WRITE FPDMA QUEUED
Dec 29 21:14:46 dom0 kernel: [ 7670.419868] ata1.00: cmd 61/c0:c8:30:e5:26/00:00:0c:00:00/40 tag 25 ncq 98304 out
Dec 29 21:14:46 dom0 kernel: [ 7670.419868]          res 40/00:c8:30:e5:26/00:00:0c:00:00/40 Emask 0x60 (host bus error)
Dec 29 21:14:46 dom0 kernel: [ 7670.419871] ata1.00: status: { DRDY }
Dec 29 21:14:46 dom0 kernel: [ 7670.419877] ata1: hard resetting link
Dec 29 21:14:46 dom0 kernel: ata1.00: exception Emask 0x60 SAct 0x2000000 SErr 0x800 action 0x6 frozen
Dec 29 21:14:46 dom0 kernel: ata1.00: irq_stat 0x20000000, host bus error
Dec 29 21:14:46 dom0 kernel: ata1: SError: { HostInt }
Dec 29 21:14:46 dom0 kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Dec 29 21:14:46 dom0 kernel: ata1.00: cmd 61/c0:c8:30:e5:26/00:00:0c:00:00/40 tag 25 ncq 98304 out
         res 40/00:c8:30:e5:26/00:00:0c:00:00/40 Emask 0x60 (host bus error)
Dec 29 21:14:46 dom0 kernel: ata1.00: status: { DRDY }
Dec 29 21:14:46 dom0 kernel: ata1: hard resetting link
Dec 29 21:14:46 dom0 kernel: [ 7670.724896] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Dec 29 21:14:46 dom0 kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Dec 29 21:14:46 dom0 kernel: [ 7670.727492] ata1.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
Dec 29 21:14:46 dom0 kernel: [ 7670.727497] ata1.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
Dec 29 21:14:46 dom0 kernel: [ 7670.727625] ata1.00: ACPI cmd 00/00:00:00:00:00:a0 (NOP) rejected by device (Stat=0x51 Err=0x04)
Dec 29 21:14:46 dom0 kernel: ata1.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
Dec 29 21:14:46 dom0 kernel: ata1.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
Dec 29 21:14:46 dom0 kernel: ata1.00: ACPI cmd 00/00:00:00:00:00:a0 (NOP) rejected by device (Stat=0x51 Err=0x04)
Dec 29 21:14:46 dom0 kernel: [ 7670.729413] ata1.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
Dec 29 21:14:46 dom0 kernel: [ 7670.729418] ata1.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
Dec 29 21:14:46 dom0 kernel: [ 7670.729476] ata1.00: ACPI cmd 00/00:00:00:00:00:a0 (NOP) rejected by device (Stat=0x51 Err=0x04)
Dec 29 21:14:46 dom0 kernel: ata1.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
Dec 29 21:14:46 dom0 kernel: ata1.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
Dec 29 21:14:46 dom0 kernel: ata1.00: ACPI cmd 00/00:00:00:00:00:a0 (NOP) rejected by device (Stat=0x51 Err=0x04)
Dec 29 21:14:46 dom0 kernel: ata1.00: configured for UDMA/133
Dec 29 21:14:46 dom0 kernel: [ 7670.730209] ata1.00: configured for UDMA/133
Dec 29 21:14:46 dom0 kernel: [ 7670.731795] ata1: EH complete
Dec 29 21:14:46 dom0 kernel: ata1: EH complete
Dec 29 21:14:46 dom0 kernel: [ 7670.741857] ata1.00: exception Emask 0x60 SAct 0x80 SErr 0x800 action 0x6 frozen
Dec 29 21:14:46 dom0 kernel: [ 7670.741862] ata1.00: irq_stat 0x20000000, host bus error
Dec 29 21:14:46 dom0 kernel: [ 7670.741866] ata1: SError: { HostInt }
Dec 29 21:14:46 dom0 kernel: [ 7670.741870] ata1.00: failed command: WRITE FPDMA QUEUED
Dec 29 21:14:46 dom0 kernel: [ 7670.741878] ata1.00: cmd 61/c0:38:30:e5:26/00:00:0c:00:00/40 tag 7 ncq 98304 out
Dec 29 21:14:46 dom0 kernel: [ 7670.741878]          res 40/00:38:30:e5:26/00:00:0c:00:00/40 Emask 0x60 (host bus error)
Dec 29 21:14:46 dom0 kernel: [ 7670.741882] ata1.00: status: { DRDY }
Dec 29 21:14:46 dom0 kernel: [ 7670.741887] ata1: hard resetting link
Dec 29 21:14:46 dom0 kernel: ata1.00: exception Emask 0x60 SAct 0x80 SErr 0x800 action 0x6 frozen
Dec 29 21:14:46 dom0 kernel: ata1.00: irq_stat 0x20000000, host bus error
Dec 29 21:14:46 dom0 kernel: ata1: SError: { HostInt }
Dec 29 21:14:46 dom0 kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Dec 29 21:14:46 dom0 kernel: ata1.00: cmd 61/c0:38:30:e5:26/00:00:0c:00:00/40 tag 7 ncq 98304 out
         res 40/00:38:30:e5:26/00:00:0c:00:00/40 Emask 0x60 (host bus error)
Dec 29 21:14:46 dom0 kernel: ata1.00: status: { DRDY }
Dec 29 21:14:46 dom0 kernel: ata1: hard resetting link
Dec 29 21:14:46 dom0 kernel: [ 7671.046885] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Dec 29 21:14:46 dom0 kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Dec 29 21:14:46 dom0 kernel: [ 7671.049344] ata1.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
Dec 29 21:14:46 dom0 kernel: [ 7671.049349] ata1.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
Dec 29 21:14:46 dom0 kernel: [ 7671.049452] ata1.00: ACPI cmd 00/00:00:00:00:00:a0 (NOP) rejected by device (Stat=0x51 Err=0x04)
Dec 29 21:14:46 dom0 kernel: ata1.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
Dec 29 21:14:46 dom0 kernel: ata1.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
Dec 29 21:14:46 dom0 kernel: ata1.00: ACPI cmd 00/00:00:00:00:00:a0 (NOP) rejected by device (Stat=0x51 Err=0x04)
Dec 29 21:14:46 dom0 kernel: [ 7671.051308] ata1.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
Dec 29 21:14:46 dom0 kernel: [ 7671.051313] ata1.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
Dec 29 21:14:46 dom0 kernel: [ 7671.051360] ata1.00: ACPI cmd 00/00:00:00:00:00:a0 (NOP) rejected by device (Stat=0x51 Err=0x04)
Dec 29 21:14:46 dom0 kernel: ata1.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
Dec 29 21:14:46 dom0 kernel: ata1.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
Dec 29 21:14:46 dom0 kernel: ata1.00: ACPI cmd 00/00:00:00:00:00:a0 (NOP) rejected by device (Stat=0x51 Err=0x04)
Dec 29 21:14:46 dom0 kernel: ata1.00: configured for UDMA/133
Dec 29 21:14:46 dom0 kernel: [ 7671.052056] ata1.00: configured for UDMA/133
Dec 29 21:14:46 dom0 kernel: [ 7671.052106] sd 0:0:0:0: [sda] tag#7 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Dec 29 21:14:46 dom0 kernel: [ 7671.052116] sd 0:0:0:0: [sda] tag#7 Sense Key : Illegal Request [current] [descriptor]
Dec 29 21:14:46 dom0 kernel: [ 7671.052121] sd 0:0:0:0: [sda] tag#7 Add. Sense: Unaligned write command
Dec 29 21:14:46 dom0 kernel: [ 7671.052129] sd 0:0:0:0: [sda] tag#7 CDB: Write(10) 2a 00 0c 26 e5 30 00 00 c0 00
Dec 29 21:14:46 dom0 kernel: [ 7671.052133] blk_update_request: I/O error, dev sda, sector 203875632
Dec 29 21:14:46 dom0 kernel: [ 7671.052171] EXT4-fs warning (device dm-2): ext4_end_bio:332: I/O error -5 writing to inode 3932190 (offset 0 size 0 starting block 23303334)
Dec 29 21:14:46 dom0 kernel: [ 7671.052178] Buffer I/O error on device dm-2, logical block 23303334
Dec 29 21:14:46 dom0 kernel: [ 7671.052201] Buffer I/O error on device dm-2, logical block 23303335
Dec 29 21:14:46 dom0 kernel: [ 7671.052207] Buffer I/O error on device dm-2, logical block 23303336
Dec 29 21:14:46 dom0 kernel: [ 7671.052212] Buffer I/O error on device dm-2, logical block 23303337
Dec 29 21:14:46 dom0 kernel: [ 7671.052216] Buffer I/O error on device dm-2, logical block 23303338
Dec 29 21:14:46 dom0 kernel: [ 7671.052230] Buffer I/O error on device dm-2, logical block 23303339
Dec 29 21:14:46 dom0 kernel: [ 7671.052236] Buffer I/O error on device dm-2, logical block 23303340
Dec 29 21:14:46 dom0 kernel: [ 7671.052248] Buffer I/O error on device dm-2, logical block 23303341
Dec 29 21:14:46 dom0 kernel: [ 7671.052254] Buffer I/O error on device dm-2, logical block 23303342
Dec 29 21:14:46 dom0 kernel: [ 7671.052265] Buffer I/O error on device dm-2, logical block 23303343
Dec 29 21:14:46 dom0 kernel: [ 7671.052309] blk_update_request: I/O error, dev loop11, sector 0
Dec 29 21:14:46 dom0 kernel: [ 7671.052321] ata1: EH complete
Dec 29 21:14:46 dom0 kernel: sd 0:0:0:0: [sda] tag#7 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Dec 29 21:14:46 dom0 kernel: sd 0:0:0:0: [sda] tag#7 Sense Key : Illegal Request [current] [descriptor]
Dec 29 21:14:46 dom0 kernel: sd 0:0:0:0: [sda] tag#7 Add. Sense: Unaligned write command
Dec 29 21:14:46 dom0 kernel: sd 0:0:0:0: [sda] tag#7 CDB: Write(10) 2a 00 0c 26 e5 30 00 00 c0 00
Dec 29 21:14:46 dom0 kernel: blk_update_request: I/O error, dev sda, sector 203875632
Dec 29 21:14:46 dom0 kernel: EXT4-fs warning (device dm-2): ext4_end_bio:332: I/O error -5 writing to inode 3932190 (offset 0 size 0 starting block 23303334)
Dec 29 21:14:46 dom0 kernel: Buffer I/O error on device dm-2, logical block 23303334
Dec 29 21:14:46 dom0 kernel: Buffer I/O error on device dm-2, logical block 23303335
Dec 29 21:14:46 dom0 kernel: Buffer I/O error on device dm-2, logical block 23303336
Dec 29 21:14:46 dom0 kernel: Buffer I/O error on device dm-2, logical block 23303337
Dec 29 21:14:46 dom0 kernel: Buffer I/O error on device dm-2, logical block 23303338
Dec 29 21:14:46 dom0 kernel: Buffer I/O error on device dm-2, logical block 23303339
Dec 29 21:14:46 dom0 kernel: Buffer I/O error on device dm-2, logical block 23303340
Dec 29 21:14:46 dom0 kernel: Buffer I/O error on device dm-2, logical block 23303341
Dec 29 21:14:46 dom0 kernel: Buffer I/O error on device dm-2, logical block 23303342
Dec 29 21:14:46 dom0 kernel: Buffer I/O error on device dm-2, logical block 23303343
Dec 29 21:14:46 dom0 kernel: blk_update_request: I/O error, dev loop11, sector 0
Dec 29 21:14:46 dom0 kernel: ata1: EH complete

So, a lot of unpleasantness.

I ran the BIOS-level Dell diagnostic tool (Fn + power button on any Dell laptop) and after ~45' of deliberation it declared the disk "healthy", with no warnings.

I ran dom0$ sudo smartctl -t long /dev/sda and reviewed the results - "completed without error".

Trying to run badblocks from dom0 I get "/dev/sda is apparently in use by the system; it's not safe to run badblocks!" - is there a reasonably safe way around this?

Marek Marczykowski-Górecki

unread,
Dec 30, 2015, 7:24:47 AM12/30/15
to IX4 Svs, qubes...@googlegroups.com
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

(...)

> So, a lot of unpleasantness.
>
> I ran the BIOS-level Dell diagnostic tool (Fn + power button on any Dell
> laptop) and after ~45' of deliberation it declared the disk "healthy", with
> no warnings.
>
> I ran dom0$ sudo smartctl -t long /dev/sda and reviewed the results -
> "completed without error".
>
> Trying to run badblocks from dom0 I get "/dev/sda is apparently in use by
> the system; it's not safe to run badblocks!" - is there a reasonably safe
> way around this?

Run it from some live image.

I think if not the disk, it may be your SATA interface defective.

- --
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQEcBAEBCAAGBQJWg8z1AAoJENuP0xzK19csG8sH/3AaI5aM7SNVk66Rhm6lYDFE
Webdohs7+0vxiPtAkX6Q9pY7w3ZiN9I2tDuWHKAoRCX5I38Pyg0GwujS2fE42RvH
J2G/jF6q7xdMHJ++XzmaKtfrL/fEaV7ZMJJDV2YFYkG/LxaNvcWj20lM+mzKRGQw
6sbkgLfoOqv5aL9JqtQB+8vvFxGWGUmmLktjVkb6vP+aCH9MkV3rJYcFpcniP1zr
o+hILFSdZ+a2+UZfCFJ/fOVRjgKzSJDC0CxuURdTrduuUPjcSulPD26PPaXMvtUM
vyk4Cb2t212CBVfwnkOAsYw1fWfMdn8GybDeIUZhw0njVJiHMmeoyL11RcEOrjw=
=rLTV
-----END PGP SIGNATURE-----

Manuel Amador (Rudd-O)

unread,
Dec 30, 2015, 5:34:07 PM12/30/15
to qubes...@googlegroups.com
On 12/29/2015 10:03 PM, IX4 Svs wrote:
> On Sat, Dec 19, 2015 at 2:58 PM, Marek Marczykowski-Górecki <marm...@invisiblethingslab.com <mailto:marm...@invisiblethingslab.com>> wrote: >
>
> > I now have the exact same behaviour on a different AppVM. The new
> SSD is in my Qubes laptop, but I get:
>
> > [ 7528.012625] blkfront: xvdb: empty flush op failed
> > [ 7528.012653] blkfront: xvdb: barrier or flush: disabled;
> persistent grants: enabled; indirect descriptors: enabled;
> > [ 7528.012948] blk_update_request: I/O error, dev xvdb, sector 2123248
> > [ 7528.013289] Aborting journal on device xvdb-8.
> > [ 7528.016528] EXT4-fs error (device xvdb):
> ext4_journal_check_start:56: Detected aborted journal
> > [ 7528.016554] EXT4-fs (xvdb): Remounting filesystem read-only
> > [user@untrusted ~]$
>
> > This is too much of a coincidence... two different AppVMs, on two
> different physical SSDs, showing the same behaviour? How could I
> troubleshoot this?

Seems like bad RAM, bad controller, or bad PSU.
--
Rudd-O
http://rudd-o.com/

IX4 Svs

unread,
Dec 31, 2015, 7:23:59 AM12/31/15
to Marek Marczykowski-Górecki, qubes...@googlegroups.com
Good idea. I booted the latest TAILS off a USB stick and ran badblocks -n -s -v -p 3 /dev/sda

It crunched away for 10 hours (3 passes, rw test) with 0 errors.

I also checked RAM with memtest86 v4.3.7 as per Manuel's suggestion - 2 full passes (4 hours), no errors.
 
I think if not the disk, it may be your SATA interface defective.

I think that's the only (mainstream) possibility left.

Thank you for your ideas, I'll get on the phone with Dell to see what they can do about the SATA controller.

IX4 Svs

unread,
Dec 31, 2015, 9:55:12 AM12/31/15
to Marek Marczykowski-Górecki, qubes...@googlegroups.com
Talked to Dell ProSupport (a luxury I definitely recommend) and had to talk them out of sending me someone to replace the motherboard and my memory modules... They noticed that my BIOS was two revisions old, so I opted to update the BIOS and monitor the system for a couple of days. If the issues continue (which I expect will happen), we'll go with the nuclear option and switch the guts of the laptop for fresh guts. This will get me a new storage controller as well, which will hopefully resolve any manufacturing flukes.

Having said that, I've noticed https://lkml.org/lkml/2015/11/30/415 which talks about a Linux bug that results in similar errors being logged when a TRIM command is issued to a controller that doesn't support it. But, it applies to a different PCI device (my laptop's SATA controller is 8086:282a), and my understanding of the bug is that it would not lead to real filesystem corruption.

I'll keep monitoring with the new BIOS, if anything untowards happens I'll switch motherboards and report back.

IX4 Svs

unread,
Jan 17, 2016, 4:55:26 PM1/17/16
to Marek Marczykowski-Górecki, qubes...@googlegroups.com
So, given I have ProSupport, I decided to be extra cautious and just change the motherboard anyway. They also gave me new memory modules. So I now have a laptop with new SSD, RAM and motherboard since this bug occurred... I haven't been using this computer much but I haven't had any problems with the internal SSD so far. But I do have a consistent, reproducible failure which is preventing me from making local backups. At about 15% of a backup process started via Qubes VM Manager I now always get:

[ 6719.638588] xhci_hcd 0000:00:14.0: xHCI host not responding to stop endpoint command.
[ 6719.638601] xhci_hcd 0000:00:14.0: Assuming host is dying, halting host.
[ 6719.638676] xhci_hcd 0000:00:14.0: HC died; cleaning up
[ 6719.638908] usb 4-2: USB disconnect, device number 2
[ 6719.640097] sd 6:0:0:0: [sdb] Synchronizing SCSI cache
[ 6719.641585] sd 6:0:0:0: [sdb] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 6719.641595] sd 6:0:0:0: [sdb] CDB: Write(10) 2a 00 57 bc dd 20 00 00 f0 00
[ 6719.641601] blk_update_request: I/O error, dev sdb, sector 1471995168
[ 6719.641635] sd 6:0:0:0: [sdb] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 6719.641644] EXT4-fs warning (device dm-4): ext4_end_bio:332: I/O error -5 writing to inode 24 (offset 11022630912 size 8388608 starting block 183998884)
[ 6719.641646] sd 6:0:0:0: [sdb] CDB: Write(10) 2a 00 57 bc de 10 00 00 f0 00
[ 6719.641650] blk_update_request: I/O error, dev sdb, sector 1471995408
[ 6719.641657] Buffer I/O error on device dm-4, logical block 183998884
[ 6719.641665] Buffer I/O error on device dm-4, logical block 183998885
[ 6719.641672] EXT4-fs warning (device dm-4): ext4_end_bio:332: I/O error -5 writing to inode 24 (offset 11022630912 size 8388608 starting block 183998914)
[ 6719.641675] Buffer I/O error on device dm-4, logical block 183998914
<duplicates removed>
[ 6719.641697] Buffer I/O error on device dm-4, logical block 183998917
[ 6719.641775] EXT4-fs warning (device dm-4): ext4_end_bio:332: I/O error -5 writing to inode 24 (offset 11022630912 size 8388608 starting block 183998944)
<duplicates removed>
[ 6719.642235] EXT4-fs warning (device dm-4): ext4_end_bio:332: I/O error -5 writing to inode 24 (offset 11022630912 size 8388608 starting block 183999154)
[ 6719.647213] sd 6:0:0:0: [sdb] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 6719.647341] scsi 6:0:0:0: rejecting I/O to dead device
[<duplicates removed>
[ 6719.647982] scsi 6:0:0:0: rejecting I/O to dead device
[ 6719.706864] blk_update_request: I/O error, dev sdb, sector 1472034656
[ 6719.713532] Aborting journal on device dm-4-8.
[ 6719.714366] EXT4-fs error (device dm-4) in ext4_reserve_inode_write:4980: Journal has aborted
[ 6719.752890] Buffer I/O error on dev dm-4, logical block 121667584, lost sync page write
[ 6719.752930] JBD2: Error -5 detected when updating journal superblock for dm-4-8.
[ 6719.757852] Buffer I/O error on dev dm-4, logical block 0, lost sync page write
[ 6719.757883] EXT4-fs error (device dm-4): mpage_map_and_submit_extent:2248: comm kworker/u8:3: Failed to mark inode 24 dirty
[ 6719.757891] EXT4-fs (dm-4): previous I/O error to superblock detected
[ 6719.757957] Buffer I/O error on dev dm-4, logical block 0, lost sync page write
[ 6719.757990] EXT4-fs error (device dm-4) in ext4_writepages:2539: Journal has aborted
[ 6719.757994] EXT4-fs (dm-4): previous I/O error to superblock detected
[ 6719.758087] Buffer I/O error on dev dm-4, logical block 0, lost sync page write
[ 6719.786438] EXT4-fs (dm-4): previous I/O error to superblock detected
[ 6719.786630] Buffer I/O error on dev dm-4, logical block 0, lost sync page write
[ 6719.786695] EXT4-fs error (device dm-4): ext4_journal_check_start:56: Detected aborted journal
[ 6719.786703] EXT4-fs (dm-4): Remounting filesystem read-only
[ 6719.786709] EXT4-fs (dm-4): previous I/O error to superblock detected
[ 6719.786828] Buffer I/O error on dev dm-4, logical block 0, lost sync page write
[ 6720.720755] Buffer I/O error on dev dm-4, logical block 244190112, async page read
[ 6720.720799] Buffer I/O error on dev dm-4, logical block 244190131, async page read
[ 6720.720822] Buffer I/O error on dev dm-4, logical block 0, async page read
[ 6720.720859] Buffer I/O error on dev dm-4, logical block 244190133, async page read
[ 6721.773923] device-mapper: ioctl: unable to remove open device luks-56e12bc0-b300-424e-bd57-af53c743a9e9
<duplicates removed>
[ 6723.576265] device-mapper: ioctl: unable to remove open device luks-56e12bc0-b300-424e-bd57-af53c743a9e9
[ 6724.757738] buffer_io_error: 100 callbacks suppressed
[ 6724.757748] Buffer I/O error on dev dm-4, logical block 183500806, lost async page write
<duplicates removed>
[ 6724.757778] Buffer I/O error on dev dm-4, logical block 183500809, lost async page write
[ 6726.883288] dev_remove: 16 callbacks suppressed
[ 6726.883295] device-mapper: ioctl: unable to remove open device luks-56e12bc0-b300-424e-bd57-af53c743a9e9


I'm attempting to backup to my usual external USB disk:
[ 5897.635211] usb 4-2: new SuperSpeed USB device number 2 using xhci_hcd
[ 5897.648391] usb 4-2: New USB device found, idVendor=0bc2, idProduct=2312
[ 5897.648398] usb 4-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 5897.648402] usb 4-2: Product: Expansion
[ 5897.648405] usb 4-2: Manufacturer: Seagate
[ 5897.648408] usb 4-2: SerialNumber: <snip>
[ 5897.703744] usb-storage 4-2:1.0: USB Mass Storage device detected
[ 5897.703957] usb-storage 4-2:1.0: Quirks match for vid 0bc2 pid 2312: 2000000
[ 5897.704049] scsi host6: usb-storage 4-2:1.0
[ 5897.707761] usbcore: registered new interface driver usb-storage
[ 5897.709734] usbcore: registered new interface driver uas
[ 5898.706159] scsi 6:0:0:0: Direct-Access     Seagate  Expansion        0219 PQ: 0 ANSI: 6
[ 5898.707022] sd 6:0:0:0: Attached scsi generic sg2 type 0
[ 5901.920725] sd 6:0:0:0: [sdb] 1953525167 512-byte logical blocks: (1.00 TB/931 GiB)
[ 5901.921867] sd 6:0:0:0: [sdb] Write Protect is off
[ 5901.921884] sd 6:0:0:0: [sdb] Mode Sense: 2b 00 10 08
[ 5901.922667] sd 6:0:0:0: [sdb] Write cache: enabled, read cache: enabled, supports DPO and FUA
[ 5901.944223] sd 6:0:0:0: [sdb] Attached SCSI disk

On this very same system, after all the new hardware, after the repeated problems which do not allow the local VM backup to complete, I run two rounds of badblocks -n -s -v /dev/sdb from dom0 and it comes back with zero errors, and after all this I/O, dmesg is clean and all read/writes have been done wonderfully without a hiccup.

This unreliable I/O to USB media business may be unrelated to the original issue (unreliable I/O on internal SSD), but it's too much coincidence, on new hardware, on a machine that never had issues before, and I'm starting to get paranoid. Any ideas what may be going on here?

Thanks

Alex

Marek Marczykowski-Górecki

unread,
Jan 17, 2016, 5:07:12 PM1/17/16
to IX4 Svs, qubes...@googlegroups.com
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

(...)
I've seen similar issue when there was not enough power from a single
USB port for that hard drive. Connecting an external power (actually from a
second USB port) fixed that issue.

- --
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQEcBAEBCAAGBQJWnBCJAAoJENuP0xzK19csdPcH/0+gkkEeCuw2gncxxQF6bUDN
ZEJNGtjHKnjuezBXAJWdVH7kOZOH9SAstpDkPPk4WARz7ehWGLmCDjAKrQSnhQA1
BcNoOeml0P+hbsCj34jtp5q8By4DXVvxMXkEFTlBQissO989rOMS+Q1ryKWICBW4
DPUDxY9DVjee9ScURemRrWQQdwtdef4DPEJBkUtO+rizrRgT12P/XIWOilIxhDJ0
r2u69V8kDMWElrkdO0IGfOtBDejI7FPsFAcB4GqH5WS3jWxLhnojHYoXLgdxPRtM
eb6x3PJSQIlaDxAQrJD3leY5DnXklO+eEEFYnkLZq16F4NNQZyNkzd/AplGrgMU=
=ORAq
-----END PGP SIGNATURE-----

IX4 Svs

unread,
Jan 17, 2016, 5:51:40 PM1/17/16
to Marek Marczykowski-Górecki, qubes...@googlegroups.com
I've now tested a second USB drive (this one USB2), on two other ports of this laptop. I'm not feeding additional power to the drive, as I've been using it for years with no issues and I never had to. Same with the original drive, which I've been using for my local Qubes backup for >18 months without issues. The behaviour is identical. Even though it's USB2 (less power required), even though the laptop is connected to a monster power supply (180W).

So I've now tried
- multiple hard drives
- both USB2 and USB3
- multiple USB ports (both on the laptop chassis and on the dock)
- laptop's original 90W power supply, as well as 180W power supply
- laptop docked and undocked

...and even though something as I/O intensive as badblocks -n (non-destructive read-write test test) on a 1TB drive has completed multiple rounds without a hiccup, Qubes VM backup always fails (I've tried 6 times now) with xhci_hcd errors. Is there a way of running the backup from the console with more verbose logging so that I perhaps see what it's trying to do before it fails? I'm finding it difficult to believe that is is purely hardware based, but not triggered by so many hours of running badblocks -n.

Cheers

Alex

Marek Marczykowski-Górecki

unread,
Jan 17, 2016, 6:06:55 PM1/17/16
to IX4 Svs, qubes...@googlegroups.com
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Yes, you can run qvm-backup --debug

> I'm finding it difficult to believe that is
> is purely hardware based, but not triggered by so many hours of running
> badblocks -n.

Indeed looks like to much for a coincidence.
You can also try with some random-access workload (badblocks is doing
sequential accesses only). Maybe simply cp -r something with many small
files? Or something more robust like `bonnie++`.

- --
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQEcBAEBCAAGBQJWnB6IAAoJENuP0xzK19cs5v8H/jP6pawCTi/oCSU05eeVJmGg
dz3oyDsTSwOoBHYn3x1/EbMGWtmA7oRjXrJoZ5cYJAP2o+PcBCbX/3760QqUdOhm
LwB9T2H7mpmTV/EWwPryyHPWTfXAZ8kOOr34MXe78Wv2bIMV6loG2c8gEmrwGeus
LIL9/QA/70EuxfaqJt9/b0W6Alc3MIzHpjQkNlVuS0rIU59mKxVrnZC1cdaytWkM
C+j8630eZKzuN4gAADFibj9kGBldhVA5oJ/6iVSn21hpnRnSD/FPCzED//vJmthn
oO0XCPxTu6Qq1aJijFnZXBAjrmQVvSD1ZrL5FRg4eD7HeFPEMZhtJxgyx8vJm4o=
=jiXa
-----END PGP SIGNATURE-----
Reply all
Reply to author
Forward
0 new messages