> I can suspend, and the machine wakes up from sleep, but is more or less
> unusable afterwards. The symptoms are:
> 1. the network-connection symbol says the network is broken. I have to
> reboot the sys-net (kill and restart), then the network comes online
> again and I have network connection from within the sys-net.
From what I have found, this is caused by device attachment.
This issue has been around for a very long time. It is not something that can easily be fixed; it can only be worked around by the user doing things in particular ways (depending on what you want to do).
> 2. Most of the other VMs are offline and stopped. Restarting these VMs
> does not work or takes ages. You cannot start any other VMs (well
> sometimes after several minutes something works, i.e. a shell is
> started, but no net.)
I don't use this functionality; I never suspend my machine.
IF you have this issue, you need to check the system itself, not just the Qubes Manager. OR, you can restart the Qubes Manager so that it can completely refresh itself. This has been an issue with the Manager in Qubes OS for a while now: sometimes it loses track of what's happening in the system. So restarting the front-end NORMALLY fixes the situation.
Other than that, you can check the system by running "xl list" in Dom0.
That will show you all the running VMs along with their memory, VCPU count, state and CPU time.
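If the full listing is too noisy, here is a minimal sketch that keeps only the name and state of each domain. It assumes the usual "xl list" column order (Name, ID, Mem, VCPUs, State, Time(s)); the function name list_domains is made up for illustration and is not part of Xen or Qubes.

```shell
#!/bin/sh
# list_domains: hypothetical helper, not a Xen or Qubes tool.
# Reads "xl list"-style output on stdin and prints "<name> <state>"
# for every domain, skipping the header row.
# Assumed columns: Name ID Mem VCPUs State Time(s)
list_domains() {
    awk 'NR > 1 && NF >= 6 { print $1, $5 }'
}

# In Dom0, as root, you would feed it the real listing:
#   xl list | list_domains
```

In the state column, "r" means running and "b" means blocked (idle), so a VM that is really up should show one of those even if the Manager says otherwise.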
> 3. in Dom0 you can see that some processes (like awk, cut, sed and such)
> are taking 100% of one CPU core for hours.
I have noticed this too. I have never been able to find out why they do this.
> 4. the VM-Manager marks some/most VMs with "VM didn't give back memory".
> I can restart (kill) them, but they are unresponsive/unusable. Sometimes
> I can get a shell, sometimes not (or I haven't waited long enough.)
I've had this happen before. Not in 3.2R1+ though.
Restart the manager, as I said in my previous answer. It can sometimes let you see what you need to see so that you can fix the problem.
> 5. I cannot shut down properly. (_maybe_ I could wait some hours, and
> it would work.) Have to do a cold reset.
This happens when guests have not shut down properly, or have not returned resources and can't be cleared out. There are many things that it could be.
If you weren't using graphical mode on boot/shutdown then you could see what was going on.
I've had the identical issue before, it sits on the "a stop job is running for dom0" and it actually sat there all weekend trying to shut down.
> So I have to cold reset to have a working system again.
> Could someone maybe shed some light on this?
I have a torch, battery is a bit low, but it's something. Maybe you can provide some more information with what I have provided?
So with another operating system it took 3 or so seconds to boot, and now with Qubes it takes 20?
Did you know that Qubes starts the sys-net and sys-firewall (whatever VMs you set to start on boot) before you actually get to the GUI?
Not to mention, IF there are issues, it will check the HDD.
If you don't run Red Hat Graphical Boot (RHGB) then you can actually see what is happening and you can see what it does that takes a long time. Do this, and you will see. :}
So on booting, press ESC to get rid of RHGB temporarily.
If there is something that is taking a long time, and it shouldn't, post it back here and let people know.
If need be, IF it is one of the Linux-based VMs, you can always try running "xl console {VMNAME}" as root in Dom0. This should attach you to that VM's console, so you can check what it is doing.
Or you can just hit the Esc key when prompted for the encryption password.
Are you talking about the BIOS boot screen? That's pretty strange. I'm not convinced it was the Qubes team. Did you use a fresh USB stick out of the package? Where did you download the ISO? Are you using UEFI or legacy boot mode? Is the HDD mode the same as in your previous system setups?
I will try it out and take a look once I've finished this post.
> >
> > > I can suspend, and the machine wakes up from sleep, but is more or
> > > less unusable afterwards. The symptoms are:
> > > 1. the network-connection symbol says the network is broken. I have
> > > to reboot the sys-net (kill and restart), then the network comes
> > > online again and I have network connection from within the
> > > sys-net.
> >
> > From what I have found, this is caused by device attachment. This
> > issue has been around for a very long time. It is not something that
> > can easily be fixed; it can only be worked around by the user doing
> > things in particular ways (depending on what you want to do).
>
> Hmh, but what device? I start the machine, go to suspend, wake it up
> again and it "hangs". There is no USB device attached, no WLAN,
> Bluetooth and such.
Interesting. I have several ideas about why this could happen. I'll check it here on a few PCs and laptops and see what happens.
> >
> > > 2. Most of the other VMs are offline and stopped. Restarting these
> > > VMs does not work or takes ages. You cannot start any other VMs
> > > (well sometimes after several minutes something works, i.e. a
> > > shell is started, but no net.)
> >
> > I don't use this functionality; I never suspend my machine.
> > IF you have this issue, you need to check the system itself, not just
> > the Qubes Manager. OR, you can restart the Qubes Manager so that it
> > can completely refresh itself. This has been an issue with the
> > Manager in Qubes OS for a while now: sometimes it loses track of
> > what's happening in the system. So restarting the front-end NORMALLY
> > fixes the situation. Other than that, you can check the system by
> > running "xl list" in Dom0. That will show you all the running VMs
> > along with their memory, VCPU count, state and CPU time.
> >
>
> Thanks for the hints, but I (think I) tried everything (BIOS,
> Manager, different VMs etc.) but no change. ...
> xl list and xentop show that dom0 is taking 100%, all the VMs don't do
> anything.
Dom0 has access to all CPU threads. Personally, I reduce my Dom0 to 4 threads; that way the back end can still do things. I'll have to post how to do that, because it may avoid this lockup situation.
I would advise you to run 'htop' to see the actual CPU usage and activity on each thread while it is doing the 100% bit, just to know what is actually going on.
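If htop is not installed in Dom0, or the desktop is too sluggish for an interactive tool, a plain "ps" snapshot gives a rough equivalent. This is only a sketch: the function name top_cpu is made up, and it assumes the procps version of ps that Fedora ships. Note that ps reports CPU usage averaged over the life of each process, which is fine for spotting something that has been pegged for hours.

```shell
#!/bin/sh
# top_cpu: hypothetical helper. Prints the ps header plus the N busiest
# processes (default 10), sorted by CPU usage, highest first.
# Columns: PID, PSR (CPU the process last ran on), %CPU, %MEM,
# elapsed time, and the full command line.
# Requires procps "ps" (for --sort); busybox ps will not work.
top_cpu() {
    ps -eo pid,psr,pcpu,pmem,etime,args --sort=-pcpu | head -n "$(( ${1:-10} + 1 ))"
}

# Usage, e.g. the five busiest processes:
#   top_cpu 5
```

The PSR column tells you which CPU thread a process is sitting on, which is the closest ps gets to the per-thread bars in htop.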
Do you have ECC RAM?
> >
> > > 3. in Dom0 you can see that some processes (like awk, cut, sed and
> > > such) are taking 100% of one CPU core for hours.
> >
> > I have noticed this too. I have never been able to find out why they
> > do this.
> >
> >
> > > 4. the VM-Manager marks some/most VMs with "VM didn't give back
> > > memory". I can restart (kill) them, but they are
> > > unresponsive/unusable. Sometimes I can get a shell, sometimes not
> > > (or I haven't waited long enough.)
> >
> > I've had this happen before. Not in 3.2R1+ though.
> > Restart the manager, as I said in my previous answer. It can sometimes
> > let you see what you need to see so that you can fix the problem.
> >
>
> Hmh, no effect. I disabled "memory balancing", with the effect that the
> "didn't return requested memory" error disappears. But still hanging.
> >
> > > 5. I cannot shut down properly. (_maybe_ I could wait some hours,
> > > and it would work.) Have to do a cold reset.
> >
> > This happens when guests have not shut down properly, or have not
> > returned resources and can't be cleared out. There are many things
> > that it could be.
> >
> > If you weren't using graphical mode on boot/shutdown then you could
> > see what was going on.
>
> It seems to hang at some stop jobs, or "watchdog did not stop" etc.
>
> In any case. Even if I switch off all VMs, Dom0 is taking 100% at the
> moment ("logger -p daemon.debug -- /etc/xen/scripts/block-snapshot:
> removing /dev/loop" )
That's more info for the devs as I'm not aware of what that is (in their terms), but it sounds like on restore, something has gone wrong.
Did you TAIL that log file to see what was going on?
It could provide valuable details.
> >
> > I've had the identical issue before, it sits on the "a stop job is
> > running for dom0" and it actually sat there all weekend trying to
> > shut down.
>
> :-) Yeah, that sounds familiar.
It's the one that's familiar to me too. It's stuck in my brain after sitting in front of my PC for 30 minutes waiting for it to shut down on the first boot after install.
> >
> >
> > > So I have to cold reset to have a working system again.
> > > Could someone maybe shed some light on this?
> >
> > I have a torch, battery is a bit low, but it's something. Maybe you
> > can provide some more information with what I have provided?
> >
>
> Hmh, probably not much. I fiddled around with the BIOS, but no change.
> Another symptom:
> Restarting sys-net VMs I get "error starting VM: invalid argument:
> network device with mac 00:16.... already exists"
>
> And just now:
>
> "sudo su - " in Dom0 hangs. (no VM running) and "id" is taking 100%?
Don't use "sudo su -"; just use "su -" or "su".
If you think about it, sudo means "superuser do", I believe, so that is already elevating you, and then you are trying to switch user to the superuser on top of that... Have you run "sudo passwd" to give root a password you know, so that you can access the account? If not, I suggest you do that. Otherwise you'll never know what the root password is for Dom0.
I disable sudo in general, if it is needed, then I limit what sudo can do.
I only allow it the bare minimum of actions.
> Interesting.
> So actually with no VMs involved I already can reproduce the problem.
> Will investigate during the week.
There are things I have issue with as well. Personally, I have a script running almost all the time just so that I can know what happens when my PC locks up...
In the system crontab (the format with a user field, e.g. /etc/crontab)...
"*/5 * * * * root sh /path/to/monitorstats.sh"
This way it runs every 5 minutes.
So no matter what, I can always see what's going on in Dom0.
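For anyone wanting to try the same thing, here is a minimal sketch of what such a script could look like. It is NOT the monitorstats.sh from this post; the log location, the file name stats.log and the choice of commands are all assumptions. It appends one timestamped snapshot per run to a per-day directory, and only calls the Xen tools where they exist.

```shell
#!/bin/sh
# monitorstats.sh: hypothetical sketch, not the script used in the post.
# Appends a timestamped snapshot to one log file per day.
# LOG_DIR is an assumed location; override it from the environment.
LOG_DIR="${LOG_DIR:-/var/log/monitorstats}"

snapshot() {
    day_dir="$LOG_DIR/$(date +%Y%m%d)"
    mkdir -p "$day_dir" || return 1
    {
        echo "=== $(date '+%Y-%m-%d %H:%M:%S') ==="
        uptime
        free -m
        # Ten busiest processes by CPU, plus the header line
        ps -eo pid,pcpu,pmem,args --sort=-pcpu | head -n 11
        # Xen view, only where the tools exist (i.e. in Dom0)
        if command -v xl >/dev/null 2>&1; then
            xl list
        fi
        if command -v xentop >/dev/null 2>&1; then
            # -b: batch (non-interactive) output, -i 1: a single iteration
            xentop -b -i 1
        fi
    } >> "$day_dir/stats.log" 2>&1
}

# Take a snapshot only when run under its own name (as cron does),
# so the file can also be sourced without side effects.
case "${0##*/}" in
    monitorstats.sh) snapshot ;;
esac
```

Errors from any single command go into the log as well, so a missing tool shows up there instead of stopping the rest of the snapshot.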
I was thinking about adding xentop to it too, I might do that now...
Added it...
But yes, it has told me many things ever since I started the logging.
Every day the logs are archived.
It's only a few MB of logs a day. Archived, it's about 100K. I use 7za for archiving them because of its better compression.
Might help provide you some answers too.
9.5M 20160928
160K 20160928.7za
The compression ratio is large.
So I can store them all and look back on the effect of things over time to see where things are good and bad.