I noticed that when the computer goes to sleep, Qubes calls some scripts to prepare each VM to be suspended (or paused) and then resumed (or woken up) afterwards.
First thing: after doing some research in the DVM creation scripts, I noticed that saving state also involves the suspend function being called for that specific VM, so calling
virsh -c xen:/// save <domain> <filename>
will also call
virsh -c xen:/// dompmsuspend <domain> mem
This comes into play later, after reading all of this.
Second thing, and this part sounds simple:
PV VMs can handle this if they are not attached to any PCI device.
PV VMs with attached PCI devices usually freeze after resuming from sleep. A workaround for this is here:
https://groups.google.com/forum/#!msg/qubes-users/XtdtD20BiSA/1AJ-zZuX5IkJ
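As far as I understand, that workaround boils down to unloading the driver of the attached PCI device before dom0 goes to sleep and reloading it on resume. A minimal sketch, assuming the Qubes suspend hook in the VM still reads /rw/config/suspend-module-blacklist (one kernel module name per line; "e1000e" is only an example driver name):
# inside the affected PV VM, as root
echo e1000e >> /rw/config/suspend-module-blacklist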
HVM VMs freeze after coming back from sleep, as reported here: https://groups.google.com/forum/#!topic/qubes-users/UI7cKNCjj4w
It doesn't matter whether you are running Windows or some other OS in the HVM; after I open the lid, all their windows are frozen.
I also did some tests with a Windows 7 installation and an Ubuntu installation on HVM to make sure, and the same thing happens.
I also tried changing kernels at boot, and I tried 3 different laptops; all lead to the same problem.
Now, after digging around with my Ubuntu HVM installation, I did the following:
[user@dom0 ~]$ qvm-start ubuntu ; xl list
--> Loading the VM (type = HVM)...
--> Starting Qubes DB...
--> Setting Qubes DB info for the VM...
--> Updating firewall rules...
--> Starting the VM...
--> Starting Qubes GUId (full screen)...
Connecting to VM's GUI agent: .connected
--> Starting the qrexec daemon...
Name            ID    Mem  VCPUs  State    Time(s)
dom0             0   9150      8  r-----     427.2
sys-net          1    291      8  -b----      46.5
sys-firewall     2   1339      8  -b----      27.3
ubuntu           9   1023      1  ------       0.0
ubuntu-dm       10     44      1  -b----       0.0
As we can see here, ubuntu actually runs as two domains: the VM itself and its HVM stub domain (ubuntu-dm).
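A quick way to confirm which domain is the stubdom for a given VM, assuming libxl still records it in xenstore under image/device-model-domid (9 below is the domid of "ubuntu" from the listing above; this should print 10):
[user@dom0 ~]$ xl domid ubuntu
[user@dom0 ~]$ sudo xenstore-read /local/domain/9/image/device-model-domid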
After that we clear the logs, and try to do a suspend with virsh:
[user@dom0 ~]$ sudo bash -c 'echo > /var/log/libvirt/libxl/ubuntu.log'
[user@dom0 ~]$ sudo bash -c 'echo > /var/log/xen/console/guest-ubuntu-dm.log'
[user@dom0 ~]$ virsh -c xen:/// dompmsuspend ubuntu mem
Domain ubuntu successfully suspended
[user@dom0 ~]$ xl list
Name            ID    Mem  VCPUs  State    Time(s)
dom0             0   9150      8  r-----     471.0
sys-net          1    291      8  -b----      49.5
sys-firewall     2   1422      8  -b----      28.8
ubuntu           9   1023      8  ---ss-      11.5
ubuntu-dm       10     44      1  ---sc-      11.9
Now I notice that suspending the ubuntu VM puts both ubuntu and ubuntu-dm into a suspended state, but just before the suspend completes on the stubdom, ubuntu-dm crashes and shows the 'sc' flags, which mean SUSPENDED+CRASHED.
This also happens when doing "virsh -c xen:/// save <domain> <filename>", which leads to "dompmsuspend".
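For anyone trying to reproduce this, something like the following (untested sketch, plain dom0 shell) should be enough to catch the exact moment the stubdom flips from 'b' to 'sc' while the suspend is in flight:
[user@dom0 ~]$ ( for i in $(seq 1 40); do xl list | grep ubuntu; sleep 0.25; done ) > /tmp/ubuntu-states.log &
[user@dom0 ~]$ virsh -c xen:/// dompmsuspend ubuntu mem
[user@dom0 ~]$ wait
[user@dom0 ~]$ grep -e '-sc-' /tmp/ubuntu-states.log   # lines here show when the stubdom crashed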
Now, checking the logs, I found the following:
[user@dom0 ~]$ sudo cat /var/log/libvirt/libxl/ubuntu.log
libxl: debug: libxl.c:776:libxl_domain_suspend: ao 0x7efe9000cd00: create: how=(nil) callback=(nil) poller=0x7efe900049b0
libxl: debug: libxl_dom.c:1074:libxl__domain_suspend_common_callback: issuing PVHVM suspend request via XenBus control node
libxl: debug: libxl_dom.c:1078:libxl__domain_suspend_common_callback: wait for the guest to acknowledge suspend request
libxl: debug: libxl_dom.c:1125:libxl__domain_suspend_common_callback: guest acknowledged suspend request
libxl: debug: libxl_dom.c:1129:libxl__domain_suspend_common_callback: wait for the guest to suspend
libxl: debug: libxl_dom.c:1143:libxl__domain_suspend_common_callback: guest has suspended
libxl: debug: libxl_dom.c:987:libxl__domain_suspend_device_model: Saving device model state to /var/lib/xen/qemu-save.9
libxl: error: libxl_exec.c:227:libxl__xenstore_child_wait_deprecated: Device Model not ready <-----------------could this be a problem?
libxl: debug: libxl_event.c:1600:libxl__ao_complete: ao 0x7efe9000cd00: complete, rc=0
libxl: debug: libxl.c:798:libxl_domain_suspend: ao 0x7efe9000cd00: inprogress: poller=0x7efe900049b0, flags=ic
libxl: debug: libxl_event.c:1572:libxl__ao__destroy: ao 0x7efe9000cd00: destroy
libxl: debug: libxl_event.c:518:watchfd_callback: watch w=0x7efe90008ed0 wpath=@releaseDomain token=3/2d: event epath=@releaseDomain
libxl: debug: libxl.c:1012:domain_death_xswatch_callback: [evg=0x7efe7c00ddd0:9] nentries=1 rc=1 9..9
libxl: debug: libxl.c:1023:domain_death_xswatch_callback: [evg=0x7efe7c00ddd0:9] got=domaininfos[0] got->domain=9
libxl: debug: libxl.c:1050:domain_death_xswatch_callback: exists shutdown_reported=0 dominf.flags=20006
libxl: debug: libxl.c:1062:domain_death_xswatch_callback: shutdown reporting
libxl: debug: libxl.c:1016:domain_death_xswatch_callback: [evg=0] all reported
libxl: debug: libxl.c:1079:domain_death_xswatch_callback: domain death search done
libxl: debug: libxl_event.c:518:watchfd_callback: watch w=0x7efe90008ed0 wpath=@releaseDomain token=3/2d: event epath=@releaseDomain
libxl: debug: libxl.c:1012:domain_death_xswatch_callback: [evg=0x7efe7c00ddd0:9] nentries=1 rc=1 9..9
libxl: debug: libxl.c:1023:domain_death_xswatch_callback: [evg=0x7efe7c00ddd0:9] got=domaininfos[0] got->domain=9
libxl: debug: libxl.c:1050:domain_death_xswatch_callback: exists shutdown_reported=1 dominf.flags=20006
libxl: debug: libxl.c:1016:domain_death_xswatch_callback: [evg=0] all reported
libxl: debug: libxl.c:1079:domain_death_xswatch_callback: domain death search done
libxl: debug: libxl_event.c:1155:egc_run_callbacks: event 0x7efeb4c603b0 callback type=domain_shutdown
[user@dom0 ~]$ sudo cat /var/log/xen/console/guest-ubuntu-dm.log
xs_read_watch() -> /local/domain/0/device-model/9/command dm-command
dm-command: pause and save state
device model saving state
xs_read_watch() -> /local/domain/0/device-model/9/command dm-command
xs_read(/local/domain/0/device-model/9/command): ENOENT
******************* CONSFRONT for device/console/1 **********
Failed to read device/console/1/backend-id.
Page fault at linear address 0x0, rip 0x1048cd, regs 0x5ff568, sp 0x5ff618, our_sp 0x5ff530, code 0
Thread: main
RIP: e030:[<00000000001048cd>]
RSP: e02b:00000000005ff618 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000002002c76e40 RCX: 0000000000000001
RDX: 0000002002c03730 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 00000000005ff618 R08: 0000002002c04910 R09: 000000000057a000
R10: 000000000000276d R11: 000000000000276d R12: 0000000000000000
R13: 0000000000000000 R14: 0000002002c03150 R15: 000000000000001c
base is 0x5ff618 caller is 0xead7d
base is 0x5ff678 caller is 0xeb32d
base is 0x5ff8d8 caller is 0xe4c49
base is 0x5ff938 caller is 0xe4d69
base is 0x5ff958 caller is 0x100d94
base is 0x5ff978 caller is 0xfd5ab
base is 0x5ff9b8 caller is 0x7d75e
base is 0x5ff9e8 caller is 0x660e
base is 0x5ffa18 caller is 0x21f9e
base is 0x5ffa68 caller is 0x952d
base is 0x5ffdf8 caller is 0xdfbb7
base is 0x5fffe8 caller is 0x343b
5ff600: 18 f6 5f 00 00 00 00 00 2b e0 00 00 00 00 00 00
5ff610: a0 37 c0 02 20 00 00 00 78 f6 5f 00 00 00 00 00
5ff620: 7d ad 0e 00 00 00 00 00 48 f6 5f 00 00 00 00 00
5ff630: 20 49 c0 02 20 00 00 00 98 f6 5f 00 00 00 00 00
5ff600: 18 f6 5f 00 00 00 00 00 2b e0 00 00 00 00 00 00
5ff610: a0 37 c0 02 20 00 00 00 78 f6 5f 00 00 00 00 00
5ff620: 7d ad 0e 00 00 00 00 00 48 f6 5f 00 00 00 00 00
5ff630: 20 49 c0 02 20 00 00 00 98 f6 5f 00 00 00 00 00
1048b0: 5d 41 5e 41 5f 5d c3 66 0f 1f 84 00 00 00 00 00
1048c0: 55 40 f6 c7 07 48 89 f8 48 89 e5 75 57 48 8b 07
1048d0: 49 b8 ff fe fe fe fe fe fe fe 48 be 80 80 80 80
1048e0: 80 80 80 80 4a 8d 14 00 48 f7 d0 48 21 c2 48 89
Pagetable walk from virt 0, base 57b000:
L4 = 00000000aa995067 (0x57c000) [offset = 0]
L3 = 00000000aa994067 (0x57d000) [offset = 0]
L2 = 00000000aa993067 (0x57e000) [offset = 0]
L1 = 0000000000000000 [offset = 0]
As I see here, the stubdom crashes right before going into suspend, not after coming back.
So this explains the "freezing after resuming the laptop from sleep": it actually happens before the sleep even occurs.
It is also clear now that saving state leads to the same issue.
Now here is the problem.
I don't know how to trace the stack of this error back to any of the code running inside the stubdom. Or maybe there is something that could be changed in a Qubes OS script to solve this problem.
I also suspect this could be a memory problem in the DM, or a stack overflow, but I don't know for sure.
Can someone help me, or advise what to do next to debug the problem?
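One thing I was thinking of trying, to get symbols for the addresses above, is resolving them against the stubdom kernel that dom0 boots the device model from. A rough sketch, assuming the image is the usual gzipped ELF and its symbol table has not been stripped (the exact path depends on how Xen is packaged):
[user@dom0 ~]$ zcat /usr/lib/xen/boot/ioemu-stubdom.gz > /tmp/ioemu-stubdom.elf   # path may differ, e.g. /usr/libexec/xen/boot/
[user@dom0 ~]$ addr2line -f -e /tmp/ioemu-stubdom.elf 0x1048cd 0xead7d 0xeb32d 0xe4c49
[user@dom0 ~]$ nm -n /tmp/ioemu-stubdom.elf | less   # or look up the nearest symbol below each address by hand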
> > [user@dom0 ~]$ sudo cat /var/log/xen/console/guest-ubuntu-dm.log
> >
> > xs_read_watch() -> /local/domain/0/device-model/9/command dm-command
> > dm-command: pause and save state
> > device model saving state
>
> This log is from "save" or "dompmsuspend"? I guess the latter
> (unfortunately). The sole "dompmsuspend" shouldn't require dumping
> stubdomain state to a file. This is needed only in "save". But
> "dompmsuspend" (without actual saving to file, or migrating) isn't fully
> supported by Xen.
The log output above is from "dompmsuspend"
> > virsh -c xen:/// save <domain> <filename>
> This one is (semi-intentionally) broken for Qubes HVMs, because it
> requires qemu in dom0 (in addition to the one in stubdomain).
> > xs_read_watch() -> /local/domain/0/device-model/9/command dm-command
> > xs_read(/local/domain/0/device-model/9/command): ENOENT
> > ******************* CONSFRONT for device/console/1 **********
> >
> >
> > Failed to read device/console/1/backend-id.
>
> And this is the thing requiring qemu in dom0, which we don't have. Shame it
> crashes that badly, instead of some nice error message...
I managed to do a successful save, but before that I got an error that
a file called /var/lib/xen/qemu-save.DOMID was missing.
I did a "touch /var/lib/xen/qemu-save.DOMID" and then "virsh -c xen:/// save <domain> <filename>" worked.
I didn't know that there was something else missing at that time; I only found out about this now.
Anyway, after doing the save procedure and running "xl list", I noticed again that the stubdom was crashed and suspended, so I assumed that "virsh save" also calls "dompmsuspend".
After that I also managed to do a successful restore; after running "xl list" I saw the domain was resumed successfully and running, but the stubdom was still marked crashed, because it had crashed before the save.
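For completeness, the whole sequence was roughly this (the restore syntax written from memory):
[user@dom0 ~]$ xl domid <domain>                           # get DOMID
[user@dom0 ~]$ sudo touch /var/lib/xen/qemu-save.<DOMID>   # the file virsh complained about
[user@dom0 ~]$ virsh -c xen:/// save <domain> <filename>
[user@dom0 ~]$ virsh -c xen:/// restore <filename>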
So now, back to the save/restore.
If I can somehow fix "dompmsuspend", will save/restore work after that, or is the missing qemu part something essential?
What exactly are the missing parts of qemu?
So if I understand correctly, there is currently no other way to do this, because xenconsoled doesn't support more than one channel. And qemu in dom0 would indeed, I think, add extra risk, so I decided to give up on this.
Anyway, just to have something as a replacement, I did a test to save/restore a PV VM and it works very nicely.