Qubes-DB fails when running an AppVM with pvgrub2

65 views
Skip to first unread message

Olivier Médoc

unread,
Sep 6, 2016, 2:01:29 AM9/6/16
to qubes...@googlegroups.com

Hello,

I'm trying to improve the pvgrub2 support for archlinux, but after updating my system and the kernel, I cannot successfully boot with pvgrub2 anymore.

The system boot successfully, but all qubes related components are failing starting with the error message:

[    1.177436] qubesdb-daemon[306]: Cannot get own domid
[    1.177608] qubesdb-daemon[306]: FATAL: vchan initialization failed


Any idea why it boots with a standard qubes fedora kernel, but fails with pvgrub2 ?


The first relevant messages in the kernel console are :

[.[0;32m  OK  .[0m] Reached target Local File Systems (Pre).
         Mounting /tmp...
[.[0;32m  OK  .[0m] Mounted /tmp.
[.[0;32m  OK  .[0m] Started udev Coldplug all Devices.
[.[0;32m  OK  .[0m] Started udev Kernel Device Manager.
[.[0;32m  OK  .[0m] Started Journal Service.
[    0.963731] systemd-modules-load[169]: Inserted module 'xen_evtchn'
         Starting Flush Journal to Persistent Storage...
[    0.967448] systemd-modules-load[169]: Inserted module 'xen_blkback'
[    0.967622] systemd-modules-load[169]: Inserted module 'u2mfn'
[    0.967731] systemd-modules-load[169]: Inserted module 'xen_gntalloc'
[    0.967842] systemd-modules-load[169]: Inserted module 'xen_gntdev'
[    0.967951] systemd-modules-load[169]: Inserted module 'dummy_hcd'
[.[0;32m  OK  .[0m] Found device /dev/hvc0.
[.[0;32m  OK  .[0m] Found device /dev/xvdb.
         Starting File System Check on /dev/xvdb...
[.[0;32m  OK  .[0m] Started Flush Journal to Persistent Storage.
[    1.101323] systemd-fsck[266]: /dev/xvdb: recovering journal
[    1.122903] systemd-fsck[266]: /dev/xvdb: clean, 53723/966656 files, 678174/3852288 blocks
[.[0;32m  OK  .[0m] Started File System Check on /dev/xvdb.
rw.mount: Directory /rw to mount over is not empty, mounting anyway.
         Mounting /rw...
[.[0;32m  OK  .[0m] Mounted /rw.


Then I have an error with tinyproxy (which I think could be safely ignored):

[.[0;32m  OK  .[0m] Reached target Local File Systems.
         Starting Create Volatile Files and Directories...
[    1.154212] systemd-tmpfiles[298]: [/usr/lib/tmpfiles.d/qubes-core-agent-linux.conf:1] Unknown user 'tinyproxy'.
         Starting Qubes DB agent...
systemd-tmpfiles-setup.service: Main process exited, code=exited, status=1/FAILURE
[.[0;1;31mFAILED.[0m] Failed to start Create Volatile Files and Directories.
See 'systemctl status systemd-tmpfiles-setup.service' for details.
systemd-tmpfiles-setup.service: Unit entered failed state.
systemd-tmpfiles-setup.service: Failed with result 'exit-code'.

Finally, the real error is qubes-db failing, which causes everything depending on vchan failing:

         Starting Update UTMP about System Boot/Shutdown...
         Starting Network Time Synchronization...
qubes-db.service: Main process exited, code=exited, status=1/FAILURE
[.[0;1;31mFAILED.[0m] Failed to start Qubes DB agent.
See 'systemctl status qubes-db.service' for details.
qubes-db.service: Unit entered failed state.
qubes-db.service: Failed with result 'exit-code'.
         Starting Init Qubes Services settings...
[    1.177436] qubesdb-daemon[306]: Cannot get own domid
[    1.177608] qubesdb-daemon[306]: FATAL: vchan initialization failed
         Starting Qubes Random Seed...
[.[0;32m  OK  .[0m] Started Update UTMP about System Boot/Shutdown.
[    1.189617] qubes-random-seed.sh[316]: Failed connect to local daemon
qubes-random-seed.service: Main process exited, code=exited, status=1/FAILURE
[.[0;1;31mFAILED.[0m] Failed to start Qubes Random Seed.
See 'systemctl status qubes-random-seed.service' for details.
qubes-random-seed.service: Unit entered failed state.
qubes-random-seed.service: Failed with result 'exit-code'.
Reexecuting.
systemd 231 running in system mode. (+PAM -AUDIT -SELINUX -IMA -APPARMOR +SMACK -SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN)
Detected virtualization xen.
Detected architecture x86-64.
[    1.297687] qubes-sysinit.sh[315]: Failed connect to local daemon
[    1.308254] qubes-sysinit.sh[315]: /usr/lib/qubes/init/qubes-sysinit.sh: line 105: [: : integer expression expected
[.[0;32m  OK  .[0m] Started Init Qubes Services settings.
[.[0;32m  OK  .[0m] Started Network Time Synchronization.
[.[0;32m  OK  .[0m] Reached target System Time Synchronized.
[.[0;32m  OK  .[0m] Found device /dev/xvdc1.
         Activating swap /dev/xvdc1...
[.[0;32m  OK  .[0m] Activated swap /dev/xvdc1.
[.[0;32m  OK  .[0m] Reached target Swap.
[.[0;32m  OK  .[0m] Reached target System Initialization.
[.[0;32m  OK  .[0m] Listening on D-Bus System Message Bus Socket.
[.[0;32m  OK  .[0m] Reached target Sockets.
[.[0;32m  OK  .[0m] Started Daily rotation of log files.
[.[0;32m  OK  .[0m] Started Daily locate database update.
[.[0;32m  OK  .[0m] Reached target Paths.
[.[0;32m  OK  .[0m] Reached target Basic System.
         Starting Qubes memory information reporter...
[.[0;32m  OK  .[0m] Reached target Containers.
[.[0;32m  OK  .[0m] Started D-Bus System Message Bus.
         Starting Network Service...
         Starting Initialize and mount /rw and /home...
         Starting Qubes base firewall settings...
         Starting Login Service...
[.[0;32m  OK  .[0m] Started Daily verification of password and group files.
         Starting Qubes remote exec agent...
[.[0;32m  OK  .[0m] Started Daily man-db cache update.
[.[0;32m  OK  .[0m] Started Daily Cleanup of Temporary Directories.
[.[0;32m  OK  .[0m] Reached target Timers.
[.[0;32m  OK  .[0m] Started Entropy Harvesting Daemon.
[.[0;32m  OK  .[0m] Started Qubes memory information reporter.
[.[0;32m  OK  .[0m] Started Qubes remote exec agent.
[    4.181114] systemd-networkd[377]: Enumeration completed
[.[0;32m  OK  .[0m] Started Network Service.
[    4.186496] qrexec-agent[392]: Error while vchan server_init, exiting
qubes-qrexec-agent.service: Main process exited, code=exited, status=1/FAILURE
qubes-qrexec-agent.service: Unit entered failed state.
qubes-qrexec-agent.service: Failed with result 'exit-code'.
[.[0;32m  OK  .[0m] Started Login Service.
[    4.195638] systemd-logind[380]: New seat seat0.
[    4.205274] mount-dirs.sh[378]: tune2fs 1.43.1 (08-Jun-2016)
[    4.205759] mount-dirs.sh[378]: Setting reserved blocks percentage to 0% (0 blocks)
[    4.209024] mount-dirs.sh[378]: mount: /dev/xvdb is already mounted or /rw busy
[    4.209367] mount-dirs.sh[378]:        /dev/xvdb is already mounted on /rw
[    4.246177] qubes-iptables[379]: iptables: Applying firewall rules: OK
[.[0;32m  OK  .[0m] Started Initialize and mount /rw and /home.
         Starting Qubes GUI Agent...
         Starting Qubes misc post-boot actions...
[.[0;32m  OK  .[0m] Started Qubes GUI Agent.
[    4.277814] qubes-gui[448]: vchan initialization failed
qubes-gui-agent.service: Main process exited, code=exited, status=1/FAILURE
[.[0;32m  OK  .[0m] Started Qubes base firewall settings.
[.[0;32m  OK  .[0m] Reached target Network.
         Starting Network Name Resolution...
[    4.289943] qubes-iptables[379]: ip6tables: Applying firewall rules: OK
         Starting Permit User Sessions...
qubes-gui-agent.service: Unit entered failed state.
qubes-gui-agent.service: Failed with result 'exit-code'.
[.[0;32m  OK  .[0m] Started Permit User Sessions.
[.[0;32m  OK  .[0m] Started Getty on tty1.
[.[0;32m  OK  .[0m] Started Serial Getty on hvc0.
[.[0;32m  OK  .[0m] Reached target Login Prompts.
[    4.338130] login[466]: pam_unix(login:session): session opened for user user by LOGIN(uid=0)
[    4.343453] systemd-resolved[457]: Positive Trust Anchors:
[    4.343957] systemd-resolved[457]: . IN DS 19036 8 2 49aac11d7b6f6446702e54a1607371607a1a41855200fd2ce1cdde32f24e8fb5
[    4.344197] systemd-resolved[457]: Negative trust anchors: 10.in-addr.arpa 16.172.in-addr.arpa 17.172.in-addr.arpa 18.172.in-addr.arpa 19.172.in-addr.arpa 20.172.in-addr.arpa 21.172.in-addr.arpa 22.172.in-addr.arpa 23.172.in-addr.arpa 24.172.in-addr.arpa 25.172.in-addr.arpa 26.172.in-addr.arpa 27.172.in-addr.arpa 28.172.in-addr.arpa 29.172.in-addr.arpa 30.172.in-addr.arpa 31.172.in-addr.arpa 168.192.in-addr.arpa corp home internal intranet lan local private test
[    4.344430] systemd-resolved[457]: Defaulting to hostname 'linux'.
[    4.344684] systemd-resolved[457]: Switching to system DNS server 10.137.2.1.
[.[0;32m  OK  .[0m] Created slice User Slice of user.
[.[0;32m  OK  .[0m] Started Network Name Resolution.
         Starting User Manager for UID 1000...
[.[0;32m  OK  .[0m] Started Session c1 of user user.
[    4.354997] systemd-logind[380]: New session c1 of user user.
[    4.357798] systemd[480]: pam_unix(systemd-user:session): session opened for user user by (uid=0)
[.[0;32m  OK  .[0m] Started Qubes misc post-boot actions.
[.[0;32m  OK  .[0m] Reached target Multi-User System.
[    4.386519] systemd[480]: Startup finished in 20ms.
[.[0;32m  OK  .[0m] Started User Manager for UID 1000.
Startup finished in 643ms (kernel) + 3.743s (userspace) = 4.387s.
[    4.400567] login[466]: LOGIN ON tty1 BY user
[    4.504396] haveged[385]: haveged: ver: 1.9.1; arch: x86; vend: GenuineIntel; build: (gcc 4.9.2 ITV); collect: 128K
[    4.505139] haveged[385]: haveged: cpu: (L4 VC); data: 32K (L4 V); inst: 32K (L4 V); idx: 21/40; sz: 31910/59039
[    4.505343] haveged[385]: haveged: tot tests(BA8): A:1/1 B:1/1 continuous tests(B):  last entropy estimate 8.00172
[    4.505520] haveged[385]: haveged: fills: 0, generated: 0
 
Arch Linux 4.4.19-1-lts (hvc0)
 
localhost login:

Olivier Médoc

unread,
Sep 6, 2016, 4:37:13 AM9/6/16
to qubes...@googlegroups.com
I'm using R3.1 by the way.


Marek Marczykowski-Górecki

unread,
Sep 6, 2016, 5:50:44 AM9/6/16
to Olivier Médoc, qubes...@googlegroups.com
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On Tue, Sep 06, 2016 at 08:01:26AM +0200, 'Olivier Médoc' via qubes-devel wrote:
> Hello,
>
> I'm trying to improve the pvgrub2 support for archlinux, but after
> updating my system and the kernel, I cannot successfully boot with
> pvgrub2 anymore.
>
> The system boot successfully, but all qubes related components are
> failing starting with the error message:
>
> [ 1.177436] qubesdb-daemon[306]: Cannot get own domid

Looks like it can't talk to xenstore. You can check this manually:

xenstore-read domid

> [ 1.177608] qubesdb-daemon[306]: FATAL: vchan initialization failed
>
>
> Any idea why it boots with a standard qubes fedora kernel, but fails
> with pvgrub2 ?

My first guess would be some kernel module missing. But see below.

> *The first relevant messages in the kernel console are :*
>
> [.[0;32m OK .[0m] Reached target Local File Systems (Pre).
> Mounting /tmp...
> [.[0;32m OK .[0m] Mounted /tmp.
> [.[0;32m OK .[0m] Started udev Coldplug all Devices.
> [.[0;32m OK .[0m] Started udev Kernel Device Manager.
> [.[0;32m OK .[0m] Started Journal Service.
> [ 0.963731] systemd-modules-load[169]: Inserted module 'xen_evtchn'
> Starting Flush Journal to Persistent Storage...
> [ 0.967448] systemd-modules-load[169]: Inserted module 'xen_blkback'
> [ 0.967622] systemd-modules-load[169]: Inserted module 'u2mfn'
> [ 0.967731] systemd-modules-load[169]: Inserted module 'xen_gntalloc'
> [ 0.967842] systemd-modules-load[169]: Inserted module 'xen_gntdev'
> [ 0.967951] systemd-modules-load[169]: Inserted module 'dummy_hcd'

This looks good, but maybe I forgot some of them.
Try starting qubesdb-daemon with strace and see on which device file it
fails.

- --
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQEcBAEBCAAGBQJXzpFsAAoJENuP0xzK19csnFoH/3DnsFuALajseaK1MbpJCtOk
BkInclaxIoI8VPiKfhDbYgkxpEkBkoro+gkRZLqHpY9HcyJnUs2HiDnIyhpajnrb
mIUEgsVIb6iaXOBDA7jQcCl6+kVIzE+8iFpK1ANDpkBSs8euUvl7Iwj2hD9oVlTg
rQxz9aY2cEyXU14CeUBl+oKtU10xUSKmpO4AMks500U1K/anfietusN34hPEvMjm
S81j7eSOZuK0neuQ3EXQlmBc2GWmqFu+jAMbonxMxJSay1KWkLka+GqQft4x8XMS
9yiOmwqkcg8yimRtx1qPAc1eTLJ9Jz+7br8zq55gCQ0+qpNywpEaVH/p4VryOqg=
=UeNP
-----END PGP SIGNATURE-----

Olivier Médoc

unread,
Sep 6, 2016, 9:07:31 AM9/6/16
to qubes...@googlegroups.com
On 09/06/2016 11:50 AM, Marek Marczykowski-Górecki wrote:
> On Tue, Sep 06, 2016 at 08:01:26AM +0200, 'Olivier Médoc' via
> qubes-devel wrote:
> > Hello,
>
> > I'm trying to improve the pvgrub2 support for archlinux, but after
> > updating my system and the kernel, I cannot successfully boot with
> > pvgrub2 anymore.
>
> > The system boot successfully, but all qubes related components are
> > failing starting with the error message:
>
> > [ 1.177436] qubesdb-daemon[306]: Cannot get own domid
>
> Looks like it can't talk to xenstore. You can check this manually:
>
> xenstore-read domid

Thanks xenstore apparently is broken, but I wonder why.

Strace shows that it somehow fails when writing the domid on xenbus (The
exact same bytes are written on working vms.):

rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
getrlimit(RLIMIT_STACK, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
stat("/var/run/xenstored/socket", 0x7ffe5f761770) = -1 ENOENT (No such
file or directory)
access("/dev/xen/xenbus", F_OK) = 0
stat("/dev/xen/xenbus", {st_mode=S_IFCHR|0660, st_rdev=makedev(10, 62),
...}) = 0
open("/dev/xen/xenbus", O_RDWR) = 3
brk(NULL) = 0x2396000
brk(0x23b7000) = 0x23b7000
rt_sigaction(SIGPIPE, {SIG_IGN, [], SA_RESTORER, 0x7f868f37e0b0},
{SIG_DFL, [], 0}, 8) = 0
write(3, "\2\0\0\0\0\0\0\0\0\0\0\0\6\0\0\0", 16) = 16
write(3, "domid\0", 6) = -1 ESRCH (No such process)
rt_sigaction(SIGPIPE, {SIG_DFL, [], SA_RESTORER, 0x7f868f37e0b0}, NULL,
8) = 0
close(3) = 0
write(2, "xenstore-read: ", 15xenstore-read: ) = 15
write(2, "couldn't read path domid", 24couldn't read path domid) = 24
write(2, "\n", 1
) = 1
close(-1) = -1 EBADF (Bad file descriptor)
exit_group(1) = ?
+++ exited with 1 +++


>
> > [ 1.177608] qubesdb-daemon[306]: FATAL: vchan initialization failed
>
>
> > Any idea why it boots with a standard qubes fedora kernel, but fails
> > with pvgrub2 ?
>
> My first guess would be some kernel module missing. But see below.
>
> > *The first relevant messages in the kernel console are :*
>
> > [.[0;32m OK .[0m] Reached target Local File Systems (Pre).
> > Mounting /tmp...
> > [.[0;32m OK .[0m] Mounted /tmp.
> > [.[0;32m OK .[0m] Started udev Coldplug all Devices.
> > [.[0;32m OK .[0m] Started udev Kernel Device Manager.
> > [.[0;32m OK .[0m] Started Journal Service.
> > [ 0.963731] systemd-modules-load[169]: Inserted module 'xen_evtchn'
> > Starting Flush Journal to Persistent Storage...
> > [ 0.967448] systemd-modules-load[169]: Inserted module 'xen_blkback'
> > [ 0.967622] systemd-modules-load[169]: Inserted module 'u2mfn'
> > [ 0.967731] systemd-modules-load[169]: Inserted module 'xen_gntalloc'
> > [ 0.967842] systemd-modules-load[169]: Inserted module 'xen_gntdev'
> > [ 0.967951] systemd-modules-load[169]: Inserted module 'dummy_hcd'
>
> This looks good, but maybe I forgot some of them.
> Try starting qubesdb-daemon with strace and see on which device file it
> fails.
>
I don't see specific device files used when running qubesdb-daemon. I
only see a read to file descriptor 3. Maybe a started xenstore-read
process ?

open("/usr/lib/libdl.so.2", O_RDONLY|O_CLOEXEC) = 3
read(3,
"\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\200\r\0\0\0\0\0\0"...,
832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=14608, ...}) = 0
mmap(NULL, 2109680, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3,
0) = 0x7f8adca7a000
mprotect(0x7f8adca7c000, 2097152, PROT_NONE) = 0
mmap(0x7f8adcc7c000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x7f8adcc7c000
close(3) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
= 0x7f8adee61000
mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
0) = 0x7f8adee5e000
arch_prctl(ARCH_SET_FS, 0x7f8adee5e780) = 0
mprotect(0x7f8adeaee000, 16384, PROT_READ) = 0
mprotect(0x7f8adcc7c000, 4096, PROT_READ) = 0
mprotect(0x7f8add4d2000, 4096, PROT_READ) = 0
mprotect(0x7f8add6ea000, 4096, PROT_READ) = 0
mprotect(0x7f8add9f2000, 8192, PROT_READ) = 0
mprotect(0x7f8addc0b000, 4096, PROT_READ) = 0
mprotect(0x7f8adde31000, 4096, PROT_READ) = 0
mprotect(0x7f8ade135000, 4096, PROT_READ) = 0
mprotect(0x7f8ade33d000, 4096, PROT_READ) = 0
mprotect(0x7f8ade756000, 4096, PROT_READ) = 0
mprotect(0x7f8adeeea000, 12288, PROT_READ) = 0
mprotect(0x605000, 4096, PROT_READ) = 0
mprotect(0x7f8adef1d000, 4096, PROT_READ) = 0
munmap(0x7f8adeef1000, 180125) = 0
set_tid_address(0x7f8adee5ea50) = 924
set_robust_list(0x7f8adee5ea60, 24) = 0
rt_sigaction(SIGRTMIN, {0x7f8add2c0b70, [], SA_RESTORER|SA_SIGINFO,
0x7f8add2cc080}, NULL, 8) = 0
rt_sigaction(SIGRT_1, {0x7f8add2c0c00, [],
SA_RESTORER|SA_RESTART|SA_SIGINFO, 0x7f8add2cc080}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
getrlimit(RLIMIT_STACK, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
brk(NULL) = 0x2126000
brk(0x2147000) = 0x2147000
pipe([3, 4]) = 0
clone(child_stack=NULL,
flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
child_tidptr=0x7f8adee5ea50) = 925
close(4) = 0
read(3, "", 6) = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=925, si_uid=0,
si_status=1, si_utime=0, si_stime=0} ---
write(2, "startup failed\n", 15startup failed
) = 15
exit_group(1) = ?
+++ exited with 1 +++



Marek Marczykowski-Górecki

unread,
Sep 6, 2016, 9:44:24 AM9/6/16
to Olivier Médoc, qubes...@googlegroups.com
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

This looks wrong. And not just missing entry, something else. Missing
entry access looks like this:

stat("/var/run/xenstored/socket", 0x7fff2170aaa0) = -1 ENOENT (No such
file or directory)
access("/dev/xen/xenbus", F_OK) = 0
stat("/dev/xen/xenbus", {st_mode=S_IFCHR|0660, st_rdev=makedev(10, 62),
...}) = 0
open("/dev/xen/xenbus", O_RDWR) = 3
rt_sigaction(SIGPIPE, {SIG_IGN, [], SA_RESTORER, 0x7fc8b0ec1ab0},
{SIG_DFL, [], 0}, 8) = 0
write(3, "\2\0\0\0\0\0\0\0\0\0\0\0\r\0\0\0", 16) = 16
write(3, "non-existent\0", 13) = 13
read(3, "\20\0\0\0\0\0\0\0\0\0\0\0\7\0\0\0", 16) = 16
read(3, "ENOENT\0", 7) = 7
rt_sigaction(SIGPIPE, {SIG_DFL, [], SA_RESTORER, 0x7fc8b0ec1ab0}, NULL,
8) = 0
write(2, "xenstore-read: ", 15xenstore-read: ) = 15
write(2, "couldn't read path non-existent", 31couldn't read path
non-existent) = 31

So, write should succeed, then it should return "ENOENT".

According to kernel source code the only situation when it returns ESRCH
is when it can't find xenstore transaction under which operation is
running
(drivers/xen/xenbus/xenbus_dev_frontend.c:xenbus_write_transaction).

But xenstore-read do not start any transaction for a single read at all.
Which AFAIR should be fine (to specify null transaction).

I think this is a bug in kernel driver. And indeed that -ESRCH case was
introduced not so long ago:

commit 0beef634b86a1350c31da5fcc2992f0d7c8a622b
Author: Jan Beulich <JBeu...@suse.com>
Date: Thu Jul 7 01:23:57 2016 -0600

xenbus: don't BUG() on user mode induced condition

Inability to locate a user mode specified transaction ID should not
lead to a kernel crash. For other than XS_TRANSACTION_START also
don't issue anything to xenbus if the specified ID doesn't match
that
of any active transaction.

Signed-off-by: Jan Beulich <jbeu...@suse.com>
Cc: <sta...@vger.kernel.org>
Signed-off-by: David Vrabel <david....@citrix.com>


And then fixed:


commit 9a035a40f7f3f6708b79224b86c5777a3334f7ea
Author: Jan Beulich <JBeu...@suse.com>
Date: Mon Aug 15 09:02:38 2016 -0600

xenbus: don't look up transaction IDs for ordinary writes

This should really only be done for XS_TRANSACTION_END messages, or
else at least some of the xenstore-* tools don't work anymore.

Fixes: 0beef634b8 ("xenbus: don't BUG() on user mode induced
condition")
Reported-by: Richard Schütz <rsch...@uni-koblenz.de>
Cc: <sta...@vger.kernel.org>
Signed-off-by: Jan Beulich <jbeu...@suse.com>
Tested-by: Richard Schütz <rsch...@uni-koblenz.de>
Signed-off-by: David Vrabel <david....@citrix.com>


So, it looks like your kernel have the former commit, but not the later.
Indeed the later hasn't been backported to 4.4 yet.

For now I suggest downgrading to 4.4.16 or older (which don't have the
broken commit).

> I don't see specific device files used when running qubesdb-daemon. I
> only see a read to file descriptor 3. Maybe a started xenstore-read
> process ?
>

(...)

> clone(child_stack=NULL,
> flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
> child_tidptr=0x7f8adee5ea50) = 925

The actual connection happen inside child process. You need "-f" flag to
strace. But I think it doesn't matter now.

- --
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQEcBAEBCAAGBQJXzsgyAAoJENuP0xzK19csdnAH/il2h5SpmeUsQtGQI4w4f+Ww
cRq9lr3dAU1Qa8qFut3dpNy3YKJFu9sWTVi3TOOFV8uhcw5qgYGl0gvxYz9MbWNw
vf8KqzC4cL9zUpSu0iEYOMWrReGAA1unOYLVThtUpw8YuqqCUCQW54tQukU+zMJg
Txn7OtMysy5X+sYNe0D/OXbhpL0ror/JYSAsoL4bO00J5Vq7bJnOFkpbuafeznDl
H80S4FlP7Xabb2t38Va8Iyr+NTiKdqm8yT3vPJ8UPMyxoORwHHHaXwpf49mtJ0Wu
va9rboQlRP++ife3HDjodzUUrMTrcqOaFcvDeqmydOEKUW9bECyUgYJToF9TeoM=
=8wTM
-----END PGP SIGNATURE-----

Olivier Médoc

unread,
Sep 6, 2016, 9:49:52 AM9/6/16
to qubes...@googlegroups.com
On 09/06/2016 03:07 PM, 'Olivier Médoc' via qubes-devel wrote:
> On 09/06/2016 11:50 AM, Marek Marczykowski-Górecki wrote:
>> On Tue, Sep 06, 2016 at 08:01:26AM +0200, 'Olivier Médoc' via
>> qubes-devel wrote:
>>> Hello,
>>> I'm trying to improve the pvgrub2 support for archlinux, but after
>>> updating my system and the kernel, I cannot successfully boot with
>>> pvgrub2 anymore.
>>> The system boot successfully, but all qubes related components are
>>> failing starting with the error message:
>>> [ 1.177436] qubesdb-daemon[306]: Cannot get own domid
>> Looks like it can't talk to xenstore. You can check this manually:
>>
>> xenstore-read domid
> Thanks xenstore apparently is broken, but I wonder why.

Could it be that I use the git HEAD of vmm-xen in Qubes 3.1 ?
However, xenstore-exists works successfully. And when running strace
xenstore-exists, I can even see that xenstore-exists reads to domid
successfully.

So it is apparently a bug in xenstore-read. xenstore-exists seems to use
transaction ID. Could it be linked to the bug ?

I also found this reference of a kernel bug on the internet:
https://lkml.org/lkml/2016/8/21/308

Marek Marczykowski-Górecki

unread,
Sep 6, 2016, 9:56:23 AM9/6/16
to Olivier Médoc, qubes...@googlegroups.com
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On Tue, Sep 06, 2016 at 03:49:49PM +0200, 'Olivier Médoc' via qubes-devel wrote:
> On 09/06/2016 03:07 PM, 'Olivier Médoc' via qubes-devel wrote:
> > On 09/06/2016 11:50 AM, Marek Marczykowski-Górecki wrote:
> >> On Tue, Sep 06, 2016 at 08:01:26AM +0200, 'Olivier Médoc' via
> >> qubes-devel wrote:
> >>> Hello,
> >>> I'm trying to improve the pvgrub2 support for archlinux, but after
> >>> updating my system and the kernel, I cannot successfully boot with
> >>> pvgrub2 anymore.
> >>> The system boot successfully, but all qubes related components are
> >>> failing starting with the error message:
> >>> [ 1.177436] qubesdb-daemon[306]: Cannot get own domid
> >> Looks like it can't talk to xenstore. You can check this manually:
> >>
> >> xenstore-read domid
> > Thanks xenstore apparently is broken, but I wonder why.
>
> Could it be that I use the git HEAD of vmm-xen in Qubes 3.1 ?

Shouldn't matter.
Yes, exactly - see my other mail.

- --
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQEcBAEBCAAGBQJXzssAAAoJENuP0xzK19csbt4IAIKc+ZNRVNOK19K5iJB0UH+z
c8aWiB1Xq4+JaIBDHWWmEuxfKiNKOzFPms9sRb9SAVTkcR59jOm2Ri66yx6h+lq6
Q/pEFldIQ554Jnd3qZrpcIqyoFn/GUo9YDJb9K1QlDs55Iv0oF8XtZxmS46G9vj+
7cJXgBH5B6eWJOmGKu6ZmQ3eevu2q5JYmtBLvbjQLIpHYi/H3BFO50TLEngBC7mg
WJzfIwOegZhGFY7Jm+1ubcHunxMwLVk6HyT0v0tG4cn0pYYw0PqlG+RqQb53R5pV
THX0iKoh5ivU9GXxMP/fKu7lme53+9U0YHKepm7B0Zok6ld3kcvRVGo6gbfZXgY=
=vHFw
-----END PGP SIGNATURE-----

Olivier Médoc

unread,
Sep 6, 2016, 9:56:37 AM9/6/16
to qubes...@googlegroups.com
Thanks for your help, I came to the same 'upgrade kernel because of a
bug' conclusion. But I forgot that downgrading is also a solution :).
And kernel 4.8 is not released yet (I only have 4.7.2 or 4.4.19 in stock
archlinux repositories).

> > I don't see specific device files used when running qubesdb-daemon. I
> > only see a read to file descriptor 3. Maybe a started xenstore-read
> > process ?
>
>
> (...)
>
> > clone(child_stack=NULL,
> > flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
> > child_tidptr=0x7f8adee5ea50) = 925
>
> The actual connection happen inside child process. You need "-f" flag to
> strace. But I think it doesn't matter now.
>
It matters for me. Thanks for the tip.


Olivier Médoc

unread,
Sep 6, 2016, 10:14:35 AM9/6/16
to qubes...@googlegroups.com
It work properly with kernel 4.4.16.

Thanks for your help.

Reply all
Reply to author
Forward
0 new messages