AppVms being killed on resume due to clock skew too large


mmo...@disroot.org

Dec 22, 2019, 2:13:04 PM
to qubes...@googlegroups.com
Hello,

I've been struggling with this for months now.
Whenever there's a resume from suspend, some random AppVMs are killed. It doesn't happen every time, but it happens often enough on resume to indicate a real issue.
It seems that the clock skew is too large, resulting in a kernel panic that kills the AppVM, as the error below shows.

Is anyone experiencing the same issue? Is there any solution to this?

Thank you in advance.

[38009.358471] clocksource: timekeeping watchdog on CPU0: Marking clocksource 'tsc' as unstable because the skew is too large:
[38009.358521] clocksource: 'xen' wd_now: 2697b2c28a19 wd_last: 2696858155d9 mask: ffffffffffffffff
[38009.358558] clocksource: 'tsc' cs_now: ffffff40e6dd060e cs_last: 4900d58115e2 mask: ffffffffffffffff

[38009.358594] tsc: Marking TSC unstable due to clocksource watchdog
[79151.795501] borgmatic[2418]: segfault at 6060cafd8549 ip 00006060ca2ae054 sp 00007ffe9d9bbad8 error 4 in python3.5[6060ca158000+3f0000]
[79151.795535] Code: 08 4c 8b 5e 08 4c 89 5b f0 4c 89 4e 08 4d 89 0b 48 89 73 e8 4c 89 43 f8 eb 92 66 90 66 2e 0f 1f 84 00 00 00 00 00 48 8b 47 08 <f6> 80 a9 00 00 00 40 75 03 31 c0 c3 53 48 8b 80 48 01 00 00 48 89
[79152.712396] general protection fault: 0000 [#1] SMP PTI
[79152.712416] CPU: 1 PID: 2425 Comm: systemctl Tainted: G O 4.19.84-1.pvops.qubes.x86_64 #1
[79152.712444] RIP: 0010:__pv_queued_spin_lock_slowpath+0x199/0x270
[79152.712462] Code: eb bd 83 e0 03 c1 ea 12 4c 8d 6d 44 48 c1 e0 04 41 be 01 00 00 00 4c 8d a0 80 10 02 00 8d 42 ff 48 98 4c 03 24 c5 20 77 17 82 <49> 89 2c 24 b8 00 80 00 00 eb 15 84 c0 75 0a 41 0f b6 54 24 44 84
[79152.712509] RSP: 0018:ffffc900006c7b60 EFLAGS: 00010002
[79152.712524] RAX: 0000000000003ffe RBX: ffff8880f47b0764 RCX: 0000000000000001
[79152.712544] RDX: 0000000000003fff RSI: 0000000000000000 RDI: ffff8880f47b0764
[79152.712565] RBP: ffff8880f5b21080 R08: 0000000000000000 R09: ffffc90000637e90
[79152.712585] R10: ffffc900006c7e20 R11: 0000000000000001 R12: 64616f6c00708019
[79152.712605] R13: ffff8880f5b210c4 R14: 0000000000000001 R15: 0000000000080000
[79152.712626] FS: 0000000000000000(0000) GS:ffff8880f5b00000(0000) knlGS:0000000000000000
[79152.712647] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[79152.712664] CR2: 00007f088e3efab4 CR3: 000000000220a004 CR4: 00000000003606e0
[79152.712687] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[79152.712707] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[79152.712727] Call Trace:
[79152.712739] _raw_spin_lock_irqsave+0x32/0x40
[79152.712755] try_to_wake_up+0x3d/0x490
[79152.712767] __wake_up_common+0x96/0x180
[79152.712780] ep_poll_callback+0x1ac/0x2f0
[79152.712791] __wake_up_common+0x96/0x180
[79152.712803] __wake_up_common_lock+0x7c/0xc0
[79152.712819] sock_def_wakeup+0x32/0x40
[79152.712831] unix_release_sock+0x288/0x2d0
[79152.712844] unix_release+0x19/0x30
[79152.712855] __sock_release+0x3d/0xc0
[79152.712866] sock_close+0x11/0x20
[79152.712878] __fput+0xe2/0x210
[79152.712891] task_work_run+0x8a/0xb0
[79152.712903] do_exit+0x2eb/0xc30
[79152.712915] ? __do_page_fault+0x278/0x4f0
[79152.712926] do_group_exit+0x3a/0xa0
[79152.712938] __x64_sys_exit_group+0x14/0x20
[79152.716375] do_syscall_64+0x5b/0x190
[79152.716391] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[79152.716407] RIP: 0033:0x7f088fac8618
[79152.716422] Code: Bad RIP value.
[79152.716433] RSP: 002b:00007fffd239eba8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
[79152.716455] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f088fac8618
[79152.716475] RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
[79152.716496] RBP: 00007f088fda58e0 R08: 00000000000000e7 R09: fffffffffffffee8
[79152.716516] R10: 00007f088df83158 R11: 0000000000000246 R12: 00007f088fda58e0
[79152.716537] R13: 00007f088fdaac20 R14: 0000000000000000 R15: 0000000000000000
[79152.716558] Modules linked in: ip6table_filter ip6_tables xt_conntrack ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c binfmt_misc intel_rapl crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel xen_netfront intel_rapl_perf snd_pcm snd_timer snd soundcore pcspkr xen_netback u2mfn(O) xen_gntdev xen_gntalloc xen_blkback xen_evtchn parport_pc xenfs ppdev xen_privcmd lp parport overlay xen_blkfront
[79152.716680] ---[ end trace 1f4d28c382e93215 ]---
[79152.716703] RIP: 0010:__pv_queued_spin_lock_slowpath+0x199/0x270
[79152.716721] Code: eb bd 83 e0 03 c1 ea 12 4c 8d 6d 44 48 c1 e0 04 41 be 01 00 00 00 4c 8d a0 80 10 02 00 8d 42 ff 48 98 4c 03 24 c5 20 77 17 82 <49> 89 2c 24 b8 00 80 00 00 eb 15 84 c0 75 0a 41 0f b6 54 24 44 84
[79152.716768] RSP: 0018:ffffc900006c7b60 EFLAGS: 00010002
[79152.716782] RAX: 0000000000003ffe RBX: ffff8880f47b0764 RCX: 0000000000000001
[79152.716803] RDX: 0000000000003fff RSI: 0000000000000000 RDI: ffff8880f47b0764
[79152.716823] RBP: ffff8880f5b21080 R08: 0000000000000000 R09: ffffc90000637e90
[79152.716844] R10: ffffc900006c7e20 R11: 0000000000000001 R12: 64616f6c00708019
[79152.716864] R13: ffff8880f5b210c4 R14: 0000000000000001 R15: 0000000000080000
[79152.716885] FS: 0000000000000000(0000) GS:ffff8880f5b00000(0000) knlGS:0000000000000000
[79152.716906] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[79152.716923] CR2: 00007f088fac85ee CR3: 000000000220a004 CR4: 00000000003606e0
[79152.716946] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[79152.716967] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[79152.716987] Kernel panic - not syncing: Fatal exception
[79152.717276] Kernel Offset: disabled
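
For anyone seeing the same message: the clocksource the guest kernel is actually using can be checked from inside the AppVM through the standard sysfs paths (nothing Qubes-specific here):

# inside the affected AppVM
cat /sys/devices/system/clocksource/clocksource0/current_clocksource    # usually 'xen' or 'tsc'
cat /sys/devices/system/clocksource/clocksource0/available_clocksource
dmesg | grep -i clocksource    # shows the watchdog message above when the skew check trips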

mmo...@disroot.org

Jan 31, 2020, 8:11:46 AM
to qubes...@googlegroups.com
Hi everyone!

Is anyone experiencing this issue on resume (VMs being killed randomly, apparently due to clock skew that generates a kernel panic)?
This is happening very often, and it makes suspend useless.

I'm using a Lenovo T480s.

Any ideas are really appreciated!

Thanks!

Mike Keehan

Jan 31, 2020, 9:35:21 AM
to qubes...@googlegroups.com
On 1/31/20 1:11 PM, mmo...@disroot.org wrote:
> Hi everyone!
>
> Is anyone experiencing this issue on resume (VMs being killed randomly,
> apparently due to clock skew that generates a kernel panic)?
> This is happening very often and it makes suspend useless.
>
> I'm using a Lenovo T480s.
>
> Any ideas are really appreciated!
>
> Thanks!
>
> December 22, 2019 8:13 PM, mmo...@disroot.org wrote:
>
> Hello,
>
> I've been struggling with this for months now.
> Whenever there's a resume from suspend, some random AppVMs are
> killed. It doesn't happen every time, but it happens often enough
> on resume to indicate a real issue.
> It seems that the clock skew is too large, resulting in a kernel
> panic that kills the AppVM, as the error below shows.
>
> Is anyone experiencing the same issue? Is there any solution to this?
>
> Thank you in advance.
>
> [38009.358471] clocksource: timekeeping watchdog on CPU0: Marking
> clocksource 'tsc' as unstable because the skew is too large:
> [38009.358521] clocksource: 'xen' wd_now: 2697b2c28a19 wd_last:
> 2696858155d9 mask: ffffffffffffffff
> [38009.358558] clocksource: 'tsc' cs_now: ffffff40e6dd060e cs_last:
> 4900d58115e2 mask: ffffffffffffffff
> [38009.358594] tsc: Marking TSC unstable due to clocksource watchdog
> [...]
> [79152.716987] Kernel panic - not syncing: Fatal exception
> [79152.717276] Kernel Offset: disabled
>

The tsc warnings happen some time before the segfault appears, so they may not be the direct cause.

Rather, the segfault mentions python3.5 and systemctl, and the call trace shows that the kernel is trying to close a socket.

Also, the line
"[79152.712416] CPU: 1 PID: 2425 Comm: systemctl Tainted: G O"
shows that an out-of-tree module has been loaded (the "O" taint flag). Do
you load any proprietary modules (graphics or network
drivers)? If you can try without those, it may help.
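
You can double-check what set that flag from inside the VM with the standard kernel interfaces, something like:

# inside the crashing AppVM -- plain kernel interfaces, nothing Qubes-specific
cat /proc/sys/kernel/tainted   # bit 12 (value 4096) is 'O', i.e. an out-of-tree module is loaded
lsmod                          # list loaded modules, to cross-check against the "Modules linked in" line of the oops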

Mike.

mmo...@disroot.org

Jan 31, 2020, 11:08:47 AM
to qubes...@googlegroups.com
Many thanks for the suggestion!
I'm not using any proprietary modules of any sort; below are the only modules that are loaded in the AppVM that was killed (as you can see, nothing really special):

Module Size Used by
fuse 126976 3
ip6table_filter 16384 1
ip6_tables 32768 1 ip6table_filter
xt_conntrack 16384 2
ipt_MASQUERADE 16384 1
iptable_nat 16384 1
nf_nat_ipv4 16384 2 ipt_MASQUERADE,iptable_nat
nf_nat 36864 1 nf_nat_ipv4
nf_conntrack 163840 4 xt_conntrack,nf_nat,ipt_MASQUERADE,nf_nat_ipv4
nf_defrag_ipv6 20480 1 nf_conntrack
nf_defrag_ipv4 16384 1 nf_conntrack
libcrc32c 16384 2 nf_conntrack,nf_nat
intel_rapl 24576 0
crct10dif_pclmul 16384 0
crc32_pclmul 16384 0
crc32c_intel 24576 1
ghash_clmulni_intel 16384 0
xen_netfront 32768 0
intel_rapl_perf 16384 0
pcspkr 16384 0
xenfs 16384 1
u2mfn 16384 0
xen_privcmd 24576 17 xenfs
xen_gntdev 24576 1
xen_gntalloc 16384 5
xen_blkback 49152 0
xen_evtchn 16384 6
overlay 122880 1
xen_blkfront 45056 6

The closure of the socket is probably related to borgmatic (which I'm using as my backup mechanism for the AppVMs). But I don't think it's the cause, since I have it enabled only on a few machines, and even the ones that are not using borgmatic are terminated on resume.

I'm running out of ideas on this. What I have noticed, though, is that if the resume is done immediately after the suspend, it works fine without any AppVM being killed. That seems to point to an issue with the clock (it's the only thing that comes to mind, especially given the warning above), but I'm not sure if it is the root cause.
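
One thing I can try is comparing dom0's clock with the clock inside a VM right after a resume; something like this from dom0 ('work' is just an example VM name):

# dom0, immediately after resume
date +%s.%N
qvm-run -p work 'date +%s.%N'
# a large difference between the two would point at the guest clock not being resynced after suspend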

Any more suggestions would be really appreciated!

mmo...@disroot.org

Feb 1, 2020, 5:27:41 AM
to qubes...@googlegroups.com
Same problem again, this time not related to any socket closure; apparently it is related to systemd:


[41911.199732] audit: type=1104 audit(1580516883.707:119): pid=4917 uid=0 auid=4294967295 ses=4294967295 msg='op=PAM:setcred grantors=pam_rootok acct="root" exe="/usr/lib/qubes/qrexec-agent" hostname=? addr=? terminal=? res=success'
[41920.252871] clocksource: timekeeping watchdog on CPU0: Marking clocksource 'tsc' as unstable because the skew is too large:
[41920.252927] clocksource: 'xen' wd_now: 2a1620baf67a wd_last: 2a140e3c5f9f mask: ffffffffffffffff
[41920.252972] clocksource: 'tsc' cs_now: ffffff88779d4270 cs_last: 5083a288ea9a mask: ffffffffffffffff
[41920.253013] tsc: Marking TSC unstable due to clocksource watchdog
[41921.161370] audit: type=1100 audit(1580516893.670:120): pid=4955 uid=0 auid=4294967295 ses=4294967295 msg='op=PAM:authentication grantors=pam_rootok acct="root" exe="/usr/lib/qubes/qrexec-agent" hostname=? addr=? terminal=? res=success'
[41921.163039] audit: type=1103 audit(1580516893.672:121): pid=4955 uid=0 auid=4294967295 ses=4294967295 msg='op=PAM:setcred grantors=pam_rootok acct="root" exe="/usr/lib/qubes/qrexec-agent" hostname=? addr=? terminal=? res=success'
[41921.176874] audit: type=1105 audit(1580516893.686:122): pid=4955 uid=0 auid=4294967295 ses=4294967295 msg='op=PAM:session_open grantors=pam_keyinit,pam_limits,pam_systemd,pam_unix,pam_umask,pam_lastlog acct="root" exe="/usr/lib/qubes/qrexec-agent" hostname=? addr=? terminal=? res=success'
[41922.205481] audit: type=1106 audit(1580552389.038:123): pid=4955 uid=0 auid=4294967295 ses=4294967295 msg='op=PAM:session_close grantors=pam_keyinit,pam_limits,pam_systemd,pam_unix,pam_umask,pam_lastlog acct="root" exe="/usr/lib/qubes/qrexec-agent" hostname=? addr=? terminal=? res=success'
[41922.205554] audit: type=1104 audit(1580552389.038:124): pid=4955 uid=0 auid=4294967295 ses=4294967295 msg='op=PAM:setcred grantors=pam_rootok acct="root" exe="/usr/lib/qubes/qrexec-agent" hostname=? addr=? terminal=? res=success'
[41932.321374] systemd[4919]: segfault at 640550f11920 ip 0000640550345cbd sp 00007ffd40e80440 error 6 in systemd[6405502f6000+b7000]
[41932.321420] Code: 24 28 02 00 00 48 85 c9 74 0f 48 89 81 28 02 00 00 49 8b 84 24 28 02 00 00 48 85 c0 0f 84 a0 07 00 00 49 8b 94 24 20 02 00 00 <48> 89 90 20 02 00 00 49 c7 84 24 28 02 00 00 00 00 00 00 49 c7 84

[41932.321515] audit: type=1701 audit(1580552399.156:125): auid=0 uid=0 gid=0 ses=4 pid=4919 comm="systemd" exe="/usr/lib/systemd/systemd" sig=11 res=1
[41932.336794] audit: type=1130 audit(1580552399.171:126): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-coredump@0-4990-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[41932.627105] audit: type=1131 audit(1580552399.456:127): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=user@0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[41932.636551] audit: type=1131 audit(1580552399.471:128): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=user-runtime-dir@0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[41932.661359] audit: type=1131 audit(1580552399.495:129): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-coredump@0-4990-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[41934.482123] BUG: unable to handle kernel NULL pointer dereference at 0000000000000080
[41934.482143] PGD 0 P4D 0
[41934.482150] Oops: 0000 [#1] SMP PTI
[41934.482159] CPU: 0 PID: 5002 Comm: Compositor Tainted: G O 4.19.94-1.pvops.qubes.x86_64 #1
[41934.482178] RIP: 0010:mem_cgroup_page_lruvec+0x28/0x50
[41934.482189] Code: 00 00 0f 1f 44 00 00 0f 1f 44 00 00 48 8b 47 38 48 8b 17 48 85 c0 48 0f 44 05 dc d1 0c 01 48 c1 ea 36 48 8b 84 d0 48 0a 00 00 <48> 3b b0 80 00 00 00 75 12 f3 c3 48 8d 86 a0 a1 02 00 48 3b b0 80
[41934.482222] RSP: 0018:ffffc900011d3aa8 EFLAGS: 00010046
[41934.482232] RAX: 0000000000000000 RBX: ffffffff82369cc0 RCX: ffffc900011d3ae8
[41934.482246] RDX: 0000000000000000 RSI: ffff8880f9fd5000 RDI: ffffea0002adec00
[41934.482265] RBP: ffff88802f7e6fb8 R08: ffffc900011d3ae8 R09: 000000000001eb39
[41934.482279] R10: 00000000000fa000 R11: ffffffffffffffff R12: ffff8880f9fd5000
[41934.482294] R13: ffffea0002adec00 R14: 0000000000000014 R15: ffff88802f7e7000
[41934.482308] FS: 0000000000000000(0000) GS:ffff8880f5a00000(0000) knlGS:0000000000000000
[41934.482323] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[41934.482335] CR2: 0000000000000080 CR3: 000000003c9da001 CR4: 00000000003606f0
[41934.482351] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[41934.482365] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[41934.482380] Call Trace:
[41934.482388] release_pages+0x12c/0x4b0
[41934.482397] tlb_flush_mmu_free+0x36/0x50
[41934.482406] unmap_page_range+0x8f0/0xd00
[41934.482415] unmap_vmas+0x4c/0xa0
[41934.482423] exit_mmap+0xb5/0x1a0
[41934.482432] mmput+0x5f/0x140
[41934.482443] flush_old_exec+0x597/0x6c0
[41934.482451] ? load_elf_phdrs+0x97/0xb0
[41934.482460] load_elf_binary+0x3d9/0x1224
[41934.482468] ? get_acl+0x1a/0x100
[41934.482477] search_binary_handler+0xa6/0x1c0
[41934.482487] __do_execve_file.isra.34+0x587/0x7e0
[41934.482498] __x64_sys_execve+0x34/0x40
[41934.482506] do_syscall_64+0x5b/0x190
[41934.482515] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[41934.482526] RIP: 0033:0x7c1fb7d15acb
[41934.482535] Code: Bad RIP value.
[41934.482543] RSP: 002b:00007c1fa7361b18 EFLAGS: 00000246 ORIG_RAX: 000000000000003b
[41934.482557] RAX: ffffffffffffffda RBX: 00007c1fa7361b40 RCX: 00007c1fb7d15acb
[41934.482572] RDX: 00007c1fa9b5f800 RSI: 00007c1fa7361b20 RDI: 00007c1fb7a22cd0
[41934.482586] RBP: 00007c1fa7361ba0 R08: 00007c1fa7361b38 R09: 00007c1fa7361b60
[41934.482600] R10: 00007c1fa7361b20 R11: 0000000000000246 R12: 00007c1fa7361bd8
[41934.482615] R13: 0000000000000000 R14: 000000005e355001 R15: 00007c1fa7361bf0
[41934.482630] Modules linked in: ip6table_filter ip6_tables xt_conntrack ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c intel_rapl crct10dif_pclmul crc32_pclmul crc32c_intel xen_netfront ghash_clmulni_intel intel_rapl_perf pcspkr u2mfn(O) xenfs xen_privcmd xen_gntdev xen_gntalloc xen_blkback xen_evtchn overlay xen_blkfront
[41934.482694] CR2: 0000000000000080
[41934.482703] ---[ end trace f587889938477959 ]---
[41934.482714] RIP: 0010:mem_cgroup_page_lruvec+0x28/0x50
[41934.482724] Code: 00 00 0f 1f 44 00 00 0f 1f 44 00 00 48 8b 47 38 48 8b 17 48 85 c0 48 0f 44 05 dc d1 0c 01 48 c1 ea 36 48 8b 84 d0 48 0a 00 00 <48> 3b b0 80 00 00 00 75 12 f3 c3 48 8d 86 a0 a1 02 00 48 3b b0 80
[41934.482756] RSP: 0018:ffffc900011d3aa8 EFLAGS: 00010046
[41934.482766] RAX: 0000000000000000 RBX: ffffffff82369cc0 RCX: ffffc900011d3ae8
[41934.482780] RDX: 0000000000000000 RSI: ffff8880f9fd5000 RDI: ffffea0002adec00
[41934.482794] RBP: ffff88802f7e6fb8 R08: ffffc900011d3ae8 R09: 000000000001eb39
[41934.482808] R10: 00000000000fa000 R11: ffffffffffffffff R12: ffff8880f9fd5000
[41934.482822] R13: ffffea0002adec00 R14: 0000000000000014 R15: ffff88802f7e7000
[41934.482837] FS: 0000000000000000(0000) GS:ffff8880f5a00000(0000) knlGS:0000000000000000
[41934.482851] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[41934.482863] CR2: 00007c1fb7d15aa1 CR3: 000000003c9da001 CR4: 00000000003606f0
[41934.482877] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[41934.482891] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[41934.482905] Kernel panic - not syncing: Fatal exception
[41936.108632] Shutting down cpus with NMI
[41936.108774] Kernel Offset: disabled


Any idea what might be causing this issue?

Thanks.

Mike Keehan

Feb 1, 2020, 9:58:15 AM
to qubes...@googlegroups.com
Should have replied to the list!


-------- Forwarded Message --------
Subject: Re: [qubes-users] Re: AppVms being killed on resume due to
clock skew too large
Date: Sat, 1 Feb 2020 11:49:29 +0000
From: Mike Keehan <mi...@keehan.net>
To: mmo...@disroot.org
> [41932.321374] systemd[4919]: segfault at 640550f11920 ip
> 0000640550345cbd sp 00007ffd40e80440 error 6 in systemd[6405502f6000+b7000]
> [41932.321420] Code: 24 28 02 00 00 48 85 c9 74 0f 48 89 81 28 02 00 00
> 49 8b 84 24 28 02 00 00 48 85 c0 0f 84 a0 07 00 00 49 8b 94 24 20 02 00
> 00 <48> 89 90 20 02 00 00 49 c7 84 24 28 02 00 00 00 00 00 00 49 c7 84
As this is a different crash, maybe it is memory corruption.
Some information about which VM templates are crashing may help,
and whether there are any VMs that never crash?

What type of machine are you using?

mmo...@disroot.org

Feb 1, 2020, 1:14:47 PM
to qubes...@googlegroups.com
It was a typo, sorry about that. I'm now replying to the list.

I'm using Fedora 30 and Debian 9; both the templates and the AppVMs use PVH virtualization.
I have nothing fancy in my setup; it's what comes out of the box.
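
(The mode can be confirmed from dom0 with qvm-prefs, assuming R4.0 property names; 'work' is just an example qube name:)

qvm-prefs work virt_mode   # prints pvh / hvm / pv
qvm-prefs work kernel      # the dom0-provided kernel the qube boots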

Do you have any idea, Mike, what might be causing this?

Thanks!


February 1, 2020 1:37 PM, "Mike Keehan" <mi...@keehan.net> wrote:

> On 2/1/20 10:27 AM, mmo...@disroot.org wrote:
>
>> [41932.321374] systemd[4919]: segfault at 640550f11920 ip 0000640550345cbd sp 00007ffd40e80440
>> error 6 in systemd[6405502f6000+b7000]
>> [41932.321420] Code: 24 28 02 00 00 48 85 c9 74 0f 48 89 81 28 02 00 00 49 8b 84 24 28 02 00 00
>> 48 85 c0 0f 84 a0 07 00 00 49 8b 94 24 20 02 00 00 <48> 89 90 20 02 00 00 49 c7 84 24 28 02 00 00
>> 00 00 00 00 49 c7 84

brenda...@gmail.com

Feb 1, 2020, 4:06:46 PM
to qubes-users
Perhaps due to memory balancing being temporarily unable to service all AppVMs upon dom0 wake?

mmo...@disroot.org

Feb 1, 2020, 4:15:54 PM
to qubes...@googlegroups.com
Are you suggesting disabling memory balancing? I can do that and see if it solves anything.
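
If I understood correctly, something like this in dom0 should do it (assuming the R4.0 names; as far as I know, maxmem=0 is how balancing gets disabled per qube, so please correct me if that's off):

# dom0 -- 'work' is an example qube
qvm-prefs work memory            # current base allocation
qvm-prefs work maxmem            # balancing is active when maxmem > memory
qvm-prefs work maxmem 0          # disable dynamic memory balancing for this qube
systemctl status qubes-qmemman   # the dom0 balancing daemon itself, for reference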

mmo...@disroot.org

Feb 13, 2020, 1:45:53 PM
to qubes...@googlegroups.com
Problem still occurs even after memory balancing has been disabled.
On random resumes:

[37968.267047] audit: type=1104 audit(1581552250.086:231): pid=13012 uid=0 auid=4294967295 ses=4294967295 msg='op=PAM:setcred grantors=pam_rootok acct="root" exe="/usr/lib/qubes/qrexec-agent" hostname=? addr=? terminal=? res=success'
[37973.300468] page:ffffea00039be440 count:0 mapcount:-128 mapping:0000000000000000 index:0x1
[37973.300492] flags: 0xfffe000000000()
[37973.300502] raw: 000fffe000000000 ffffea00031be488 ffffea00021ebe88 0000000000000000
[37973.300518] raw: 0000000000000001 0000000000000000 00000000ffffff7f 0000000000000000
[37973.300534] page dumped because: VM_BUG_ON_PAGE(page_ref_count(page) == 0)
[37973.300552] ------------[ cut here ]------------
[37973.300563] kernel BUG at /home/user/rpmbuild/BUILD/kernel-4.19.94/linux-4.19.94/include/linux/mm.h:519!
[37973.300585] invalid opcode: 0000 [#1] SMP PTI
[37973.300597] CPU: 1 PID: 13048 Comm: DOM Worker Tainted: G O 4.19.94-1.pvops.qubes.x86_64 #1
[37973.300619] RIP: 0010:release_pages+0x3c3/0x4b0
[37973.300630] Code: ff ff 48 8b 34 24 49 8d bc 24 80 a1 02 00 45 31 e4 e8 c1 4a 71 00 e9 48 ff ff ff 48 c7 c6 48 70 08 82 4c 89 ef e8 9d 58 02 00 <0f> 0b 4d 85 e4 74 11 48 8b 34 24 49 8d bc 24 80 a1 02 00 e8 95 4a
[37973.300666] RSP: 0018:ffffc90000a6fab0 EFLAGS: 00010046
[37973.300678] RAX: 000000000000003e RBX: ffffffff82369cc0 RCX: 0000000000000006
[37973.300693] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8880f5b168b0
[37973.300709] RBP: ffff8880873a5a28 R08: ffffc90000000000 R09: 0000000000011dc1
[37973.300725] R10: 00000000000fa000 R11: ffffffff829f1e4d R12: ffff8880f9fd5000
[37973.300740] R13: ffffea00039be440 R14: 0000000000000002 R15: ffff8880873a6000
[37973.300757] FS: 0000000000000000(0000) GS:ffff8880f5b00000(0000) knlGS:0000000000000000
[37973.300772] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[37973.300787] CR2: 00007e31a34ff9d0 CR3: 000000004c3a8003 CR4: 00000000003606e0
[37973.300810] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[37973.300827] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[37973.300842] Call Trace:
[37973.300852] tlb_flush_mmu_free+0x36/0x50
[37973.300862] unmap_page_range+0x8f0/0xd00
[37973.300872] unmap_vmas+0x4c/0xa0
[37973.300881] exit_mmap+0xb5/0x1a0
[37973.300984] mmput+0x5f/0x140
[37973.300995] flush_old_exec+0x597/0x6c0
[37973.301004] ? load_elf_phdrs+0x97/0xb0
[37973.301014] load_elf_binary+0x3d9/0x1224
[37973.301023] ? get_acl+0x1a/0x100
[37973.301033] search_binary_handler+0xa6/0x1c0
[37973.301045] __do_execve_file.isra.34+0x587/0x7e0
[37973.301056] __x64_sys_execve+0x34/0x40
[37973.301066] do_syscall_64+0x5b/0x190
[37973.301076] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[37973.301088] RIP: 0033:0x7e31bc9ffacb
[37973.301099] Code: Bad RIP value.
[37973.301107] RSP: 002b:00007e31a34fd4d8 EFLAGS: 00000246 ORIG_RAX: 000000000000003b
[37973.301123] RAX: ffffffffffffffda RBX: 00007e31a34fd500 RCX: 00007e31bc9ffacb
[37973.301139] RDX: 00007e31bc786c00 RSI: 00007e31a34fd4e0 RDI: 00007e31bc722cd0
[37973.301155] RBP: 00007e31a34fd560 R08: 00007e31a34fd4f8 R09: 00007e31a34fd520
[37973.301171] R10: 00007e31a34fd4e0 R11: 0000000000000246 R12: 00007e31a34fd598
[37973.301187] R13: 0000000000000000 R14: 000000005e449201 R15: 00007e31a34fd5b0
[37973.301203] Modules linked in: binfmt_misc ip6table_filter ip6_tables xt_conntrack ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c intel_rapl crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel xen_netfront intel_rapl_perf pcspkr u2mfn(O) xen_gntdev xenfs xen_gntalloc xen_blkback xen_privcmd xen_evtchn overlay xen_blkfront
[37973.301280] ---[ end trace f2c87841b7f07a99 ]---
[37973.301292] RIP: 0010:release_pages+0x3c3/0x4b0
[37973.301303] Code: ff ff 48 8b 34 24 49 8d bc 24 80 a1 02 00 45 31 e4 e8 c1 4a 71 00 e9 48 ff ff ff 48 c7 c6 48 70 08 82 4c 89 ef e8 9d 58 02 00 <0f> 0b 4d 85 e4 74 11 48 8b 34 24 49 8d bc 24 80 a1 02 00 e8 95 4a
[37973.301340] RSP: 0018:ffffc90000a6fab0 EFLAGS: 00010046
[37973.301351] RAX: 000000000000003e RBX: ffffffff82369cc0 RCX: 0000000000000006
[37973.301367] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8880f5b168b0
[37973.301383] RBP: ffff8880873a5a28 R08: ffffc90000000000 R09: 0000000000011dc1
[37973.301399] R10: 00000000000fa000 R11: ffffffff829f1e4d R12: ffff8880f9fd5000
[37973.301415] R13: ffffea00039be440 R14: 0000000000000002 R15: ffff8880873a6000
[37973.301431] FS: 0000000000000000(0000) GS:ffff8880f5b00000(0000) knlGS:0000000000000000
[37973.301447] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[37973.301461] CR2: 00007e31bc9ffaa1 CR3: 000000004c3a8003 CR4: 00000000003606e0
[37973.301476] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[37973.301492] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[37973.301508] Kernel panic - not syncing: Fatal exception
[37975.195379] Shutting down cpus with NMI
[37975.195581] Kernel Offset: disabled

Is anyone else experiencing this issue, or does anyone have a solution?

Thank you!