I know you're asking Chris, so feel free to discount my comments here. :)
My strategy before Qubes installation is:
1. Update the BIOS/UEFI PC firmware. Must be done on target workstation/server.
2. Update the SSD firmware. Can be done on a different workstation.
3. Reset the factory DEK on the SSD if it supports OPAL or OPALite. Can be done on a different workstation.
4. NEW: turn off hyperthreading in the BIOS/UEFI config.
> I wonder how many people gave up on
> a Qubes install without ever trying a firmware update. Should I keep the
> firmware up to date thereafter? Firmware updates used to be rare, and it
> was only recommended to install them if something was actually broken. I
> guess now that the firmware is practically an OS in itself, we should be
> updating it like one?
I do, usually after some new Intel CPU vulnerability is published. :)
> Luckily, Dell provides a way to update firmware from Linux, and a way to
> do it with no OS at all. However, I imagine some (most) vendors require
> Windows in order to install firmware updates. Just out of curiosity, how
> do Linux users do firmware updates on machines without fwupd or a
> self-updating firmware?
I install a test Windows instance, download the firmware from the vendor and install it using the vendor tools. That's probably not the most secure method out there.
> Also, what about microcode? Can microcode updates affect Qubes
> compatibility? I know microcode is typically loaded by Linux each boot,
> but if the system can't boot, I guess you have to install a permanent
> update through the BIOS?
One thing that enterprise-serving PC manufacturer *firmware* updates generally cover is the microcode updates. Dell, Lenovo, etc. generally update the firmware with the microcode. Sometimes the BIOS firmware might get the update first, sometimes Linux distros might.
> 2) Do you think 4GB RAM will be enough to do the install? The system
> requirements list 4GB as minimum, so I'm assuming it'll work. I'd rather
> not buy the RAM until I know I'm keeping the machine, although I will if
> I have to. But if I am going to need RAM for the install, I should order
> it ahead of time.
I semi-recently installed 4.0 (can't remember which ISO release) on a system w/ only 4GB of RAM and it installed fine. However, it freaked out trying to start up VMs, retrying and failing continuously. Part of the issue there, I think, is a defect that has already been fixed in testing (and recently pushed to current) where dom0 reserved more RAM than necessary, severely limiting the RAM available to VMs.
So for 4.0 I recommend at least 8GB and preferably 12GB. If you'll be running HVMs and/or Windows VMs that cannot participate in memory balancing like the PVH VMs can, 16GB minimum.
Brendan
https://www.asrock.com/mb/AMD/X370%20Taichi/
It's the motherboard that has aced GPU passthrough to a Windows guest VM on a Linux host, so all in all it should be very reliable for Qubes' many-VM requirements. It also reliably runs various Linux distros, judging by the many pages and articles I've read about doing GPU passthrough on Linux.
All in all, it's really worth a try, and even if it doesn't work on Qubes, maybe it will work on alternatives worth considering, like Subgraph OS.
https://subgraph.com/
There are X470 and X570 boards out already, but I'm not sure how they fare with Qubes OS, given that there's a lot of new stuff going on in them right now that may not be compatible or work well with Linux.
> Are there any other Xen-based distros out there I could test?
You can add Xen to your stock Fedora install. That takes it roughly to
where Qubes begins, but you might want to use the same version of Fedora
dom0 uses.
If it doesn't work, then the problem is probably entirely in dom0 and
Fedora 25. Assuming you already have the testing 4.19 kernel, have you
thought of upgrading it to the even newer 5.x one as 'latest'? The
latest kernel is installed by specifying the special package named
'kernel-latest'.
On 7/25/19 11:04 AM, brend...@gmail.com wrote:
> I was able to install that particular test build on a Thinkpad X230 for
> testing: https://openqa.qubes-os.org/tests/3021
>
> (note: click on assets tab for link to download the ISO)
Interesting link!
Just to humor myself, I was going to try testing if I could hear sound
from Qubes after resume, but it seems audio isn't working at all. Which
is a whole 'nother problem. Aplay says "... unable to open slave; audio
open error: no such file or directory." `echo -e '\a'` doesn't work even
on a TTY (lsmod shows pcspkr), and `beep` isn't installed.
> I have a Thinkpad T495 with an AMD Ryzen Pro 3700U and Vega 10 graphics. Everything seems to be
> working besides suspend/resume which is crucial for me since I'm on the go a lot. I had to build my
> own Qubes R4.0 ISO to get the installer to work due to it needing a 5.0+ kernel for the graphics
> driver. I installed `kernel-latest` from qubes-dom0-current-testing but it still didn't work. After
> trying every kernel option on the face of this Earth I decided to use an experimental Qubes R4.1
> build as some things were pointing to dom0 Fedora 25 being the issue. On dom0 Fedora 31 it's still
> an issue with a 5.4 kernel. Has been driving me nuts as I've spent almost the whole day trying to
> figure the issue out.
>
> When I suspend, it clearly suspends but when I open it back up the screen is off but the power LED
> is on. I can hear the fan spin up for a bit but nothing happens. CTRL + ALT + Backspace does
> nothing. I also tried switching to text mode before suspending with CTRL + ALT + F2. Nothing... I
> also disabled the compositor in XFCE to give it a try in both R4.0 and R4.1, no difference. It
> totally seems like an X server or amdgpu issue but I really don't know what to do.
>
> I don't have any VMs running when I test the suspend and I don't have a sys-usb VM to take that out
> of the equation. Any ideas? I'm scratching my head over here and I'm at a loss on what to try next.
>
Did you try the Xen power.c patch?
It sounds like a Xen panic. Some or all AMD Fam15h processors change their CPUID feature bits after resume, which triggers a Xen panic (LEDs and fans on, screen off, keyboard and power button unresponsive). There is a patch and instructions towards the end of this thread: https://www.mail-archive.com/qubes...@googlegroups.com/msg31517.html - It takes some work but it sounds very likely it will fix your problem. Sys-usb causes other problems on a lot of Ryzen machines, so continue to keep it disabled for now.
It doesn't sound like a graphics problem. Usually X or amdgpu issues result in the screen's backlight coming on but displaying a blank screen, and often the keyboard is responsive just not the screen. At least in my experience.
PS: when replying to mailing lists please write your response *below* the quoted text you're replying to.
Also I forgot to mention, if the patch works but you still run into other post-resume problems, you may have to pin dom0 to CPU0. See https://www.mail-archive.com/qubes...@googlegroups.com/msg31737.html
Not sure what you mean about building an RPM outside qubes-builder. This is all done within qubes-builder. So if you have any experience at all with that, then you're already off to a way better start than I was. It wasn't nearly as bad as I thought it would be either. The GUI script does most of the work for you. I tried to leave a sufficiently detailed diary of what I did because I knew it would come in handy later (whether for myself or others). And, you can always ask the mailing list for help.
I just built the RPMs. Building the whole ISO apparently takes many hours and many GB of disk space. And, as with building anything, keep in mind if something goes wrong mid-build, there's a good chance you'll have to start over from the beginning. It took me several attempts. Either method should work the same though, so it's up to you.
>> Should we get the Qubes team to include this patch as a fix for AMD? I'm not sure what the security
>> implications are but I would assume it could introduce an issue where the Spectre/Meltdown
>> microcode patches would not be applied when resuming? I'm also assuming the code is functioning as
>> intended, as it panics but what would the real solution be? I wonder if there's any official fix by
>> Xen in the works rather than commenting out that panic line. Even in Qubes R4.1 with Xen 4.13 the
>> issue persists.
I've been thinking about that. I asked the original author if he reported it to upstream or intended to, but I never heard back from him. I think the Qubes devs would probably just say it's Xen's responsibility, and I can't say I disagree. I've been meaning to mention it on xen-devel but haven't gotten around to it. You're welcome to do so too if you want (if you do, please CC me). My thought was adding a Xen command line argument to override this check, e.g. recheck_cpuid_bits=false (default true, of course), but I have no idea if it would be accepted.
>> Sorry about the email above yours, Google groups wants to put it above your quote by default for
>> some reason. I was also exhausted from trying 1000 kernel boot options lol.
No worries, trust me you're not the first one. Terrible decision on Google's part.
> Also... The patch shouldn't really have any security implications assuming your BIOS has the latest
> microcode patches right? I'm guessing this is only for microcode packages installed on the OS.
I have no idea really. I haven't been able to figure out what those feature bits actually mean, if anything. I get the feeling the original authors of that code didn't know either. It kind of looks to me like someone just noticed some bits changing and decided to add a panic just to be on the safe side. Mainly because that code doesn't actually look for any specific bits, it just compares the entire set of flags before and after resume. But I don't know. Use it at your own risk.
BTW, if the patch works for you, please post the output of `xl dmesg` showing which bits have changed after resume. I'm curious if it's the same on all machines.
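For anyone collecting these reports, here is a minimal sketch of pulling the mismatch values out of saved `xl dmesg` output. It assumes the lines have the same shape as the one posted later in this thread (`CPU0: cap[ 1] is ... (expected ...)`); the function name `parse_cap_mismatches` is made up for illustration and is not part of any Xen or Qubes tool.

```python
import re

# Assumed line shape, taken from the example posted in this thread:
#   (XEN) CPU0: cap[ 1] is 7ed8320b (expected f6d8320b)
CAP_LINE = re.compile(
    r"CPU(?P<cpu>\d+): cap\[\s*(?P<word>\d+)\] is (?P<actual>[0-9a-fA-F]{8}) "
    r"\(expected (?P<expected>[0-9a-fA-F]{8})\)"
)


def parse_cap_mismatches(log_text):
    """Return a list of (cpu, word, actual, expected) tuples, one per matching line."""
    results = []
    for match in CAP_LINE.finditer(log_text):
        results.append((
            int(match.group("cpu")),
            int(match.group("word")),
            int(match.group("actual"), 16),
            int(match.group("expected"), 16),
        ))
    return results


if __name__ == "__main__":
    sample = "(XEN) CPU0: cap[ 1] is 7ed8320b (expected f6d8320b)\n"
    for cpu, word, actual, expected in parse_cap_mismatches(sample):
        print(f"CPU{cpu} word {word}: is {actual:08x}, expected {expected:08x}")
```

Saving the output to a file first (for example by redirecting `xl dmesg` in dom0) and feeding the text to this function keeps the comparison across machines mechanical.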
I preemptively submitted this PR to see what the Qubes team thinks. https://github.com/QubesOS/qubes-vmm-xen/pull/70
I agree it probably should be fixed upstream, although I've seen the Qubes team make exceptions and apply their own changes. Getting it merged and tested upstream would probably take a huge amount of time. I'm not a developer though, so I'm sure you could explain the issue better than I could. If you do mention it, CC me as well! I like the CLI argument idea; that's probably a much cleaner way of doing it, with the default set to true. That way users could disable it if needed due to hardware screw-ups.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256
If that's just about microcode updates, that's probably BIOS bug - if it
applies microcode update on system startup, it should do the same on
system resume too. Anyway it's worth trying updating linux-firmware
package, which carries microcode updates for AMD. This should make Xen
apply microcode updates too - before checking those flags.
I've just uploaded updated version of the package to the current-testing
repository (both R4.0 and R4.1).
If that's about something else, then fixing it would require finding
what exactly is changing (and preferably also why). And only then find
how to mitigate this issue. If specific flags would turn out to be not
related to security features or otherwise having unwanted effects, then
ignoring those changes would be an option. But ignoring _only those
flags verified to be safe to ignore_, not all of them.
- --
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
-----BEGIN PGP SIGNATURE-----
iQEzBAEBCAAdFiEEhrpukzGPukRmQqkK24/THMrX1ywFAl4/abcACgkQ24/THMrX
1yxEGgf/SG+V7TKM8f7QZ5JFVSr++QasDbMefkuc30OeUkXKtFXsTNMH2fp1S8zq
lTgxfrrGH+N7sfP1KkjAZ7ri+DJgmoCyqULUNZAez5DdGlaLJRtsz5rRBtTr4t9F
nmJNC859/RPEpbozwxlM6K8JRhlxVg35Sl46E9lYHbNsTBqAywxhTUgENsZlrblh
gXn2MgnzDHvwShCltlNL2l29HaAXBzIICpPcgiRWLEY/Y1OTNHvYPiTgZdRtkkEM
5tM97EwxZF31k5i7wGpRed84xCid2bXvufq2Xjo2jWxXuQ01r+bv6v/lVwDvd5tz
iOWJsjj4tXLo3bcpuaCM5XvHI9x0yg==
=h62J
-----END PGP SIGNATURE-----
(XEN) CPU0: cap[ 1] is 7ed8320b (expected f6d8320b)
Thanks for sharing that. Just as I expected, bits 31 and 27, xor 0x88000000. That makes three of us now.
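The arithmetic behind that observation, as a quick check. This only XORs the two values from the log line above; the helper name `changed_bits` is made up for illustration.

```python
def changed_bits(actual, expected):
    """Return the bit positions (0 = least significant) where the two values differ."""
    diff = actual ^ expected
    return [bit for bit in range(32) if diff & (1 << bit)]


actual = 0x7ED8320B    # the "is" value from the log line
expected = 0xF6D8320B  # the "expected" value from the log line

print(hex(actual ^ expected))          # 0x88000000
print(changed_bits(actual, expected))  # [27, 31]
```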
I finally did some digging. I'm wondering if it has to do with the RDRAND issue which has been well known since at least May 7, 2019 to affect Fam15h. This stands to reason, as I immediately updated to the May 19 BIOS update when I bought this machine. awokd had suggested this update, specifically an AGESA update contained within, might have been the cause of an unrelated problem.
https://www.phoronix.com/scan.php?page=news_item&px=AMD-CPUs-RdRand-Suspend
https://linuxreviews.org/AMD_finally_submits_kernel_patch_for_broken_RDRAND_on_older_AMD_APUs
https://www.mail-archive.com/qubes...@googlegroups.com/msg31568.html - User awokd's note about AGESA update
https://www.mail-archive.com/qubes...@googlegroups.com/msg31689.html - User Qubes123's investigation into CPUID bits
From linuxreviews.org:
"There have been reports of RDRAND issues after resuming from suspend on some AMD family 15h and family 16h systems. [...] RDRAND support is indicated by CPUID Fn00000001_ECX[30]. This bit can be reset by clearing MSR C001_1004[62]. Any software that checks for RDRAND support using CPUID, including the kernel, will believe that RDRAND is not supported. "
According to the page below, RDRAND is bit 30 in ECX, not 31. And that still doesn't explain the 27th bit turning on after resume.
27: OSXSAVE (turns ON)
30: RDRAND (unchanged)
31: Not used, always 0 (turns ON)
https://www.felixcloutier.com/x86/cpuid#fig-3-7
So it doesn't sound like the same problem at all, but all my search queries seem to lead back to the RDRAND issue. I'm hoping someone with more expertise in this area can make some better sense of this.
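To make the comparison concrete, here is a small sketch that checks the three bits listed above against the `xl dmesg` values posted earlier in the thread. It assumes, as this message does, that `cap[ 1]` corresponds to CPUID leaf 1 ECX; the bit names come from the list above and the helper `report` is made up for illustration.

```python
# Bit positions for CPUID leaf 1 ECX, limited to the three bits discussed above.
# Assumption: Xen's cap[ 1] word is this register.
ECX_BITS = {27: "OSXSAVE", 30: "RDRAND", 31: "bit 31"}


def report(actual, expected, names=ECX_BITS):
    """Map each named bit to True if it differs between the two values."""
    diff = actual ^ expected
    return {name: bool(diff & (1 << bit)) for bit, name in names.items()}


print(report(0x7ED8320B, 0xF6D8320B))
# {'OSXSAVE': True, 'RDRAND': False, 'bit 31': True}
```

RDRAND (bit 30) is set in both values, which matches the "unchanged" entry in the list.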
OSXSAVE: A value of 1 indicates that the OS has set CR4.OSXSAVE[bit 18] to enable XSETBV/XGETBV instructions to access XCR0 and to support processor extended state management using XSAVE/XRSTOR.
I want to make one thing clear: I am **not** suggesting this check be removed altogether. I am suggesting adding an **optional**, even undocumented, override parameter which defaults to the **current behavior** which is to panic.
I've found the patch to be quite stable so far. Unpatched is guaranteed to cause a crash (xen
panic) at resume; patched so far has not caused any noticeable stability issues for the four of us
using it, afaik. Just saying.
Also, not everyone has the option of coreboot. And we're not even completely certain this is a
post-resume microcode update issue, either.
> lunarthegray:
> @marmarek the "fix" is a hack for sure but it's currently the only way to get some AMD Ryzen
> laptops to work with Qubes. I built Qubes R4.1 the other day and with kernel 5.4 and Xen 4.13 the
> issue remains. Laptop users often suspend and are on the go as I am. There was some discussion on
> the qubes-users mailing list about other solutions. I'm no firmware/Xen expert though. Would
> pinning dom0 to 1 vCPU prevent the issue of missing or changed CPU bits? I'm not exactly sure what
> the fix would be with standard BIOS, as I'm not brave enough to flash coreboot on my very new
> ThinkPad. Should I start trying to get in contact with Lenovo? I'm assuming AMD needs to release a
> microcode patch as it's not really an issue with Xen itself.
At least in my case, CPU pinning did not fix this issue. The bits still change and (would) cause a
Xen panic as before. Pinning dom0 to CPU0 merely fixed a separate post-resume issue with my SATA
controller. In that thread, I link to the original Xen archives thread about pinning which had
nothing to do with Ryzen.
February 9, 2020 2:09 AM, "Marek Marczykowski-Górecki" <marm...@invisiblethingslab.com> wrote:
> (continuing discussion from the above PR)
>
> The patch as it is, is not acceptable, as it may introduce security
> and/or stability issues on some machines. Xen (and Linux too) assumes
> what CPU features it can use based on CPUID flags. If those change
> during system runtime (including suspend/resume) some instructions or
> control registers may no longer be valid (->crash) or safe to use
> (->security issue).
Like I said, it's been very stable for me so far. I've only had one bad resume in the months I've been using it, suspending at least once a day. Security issues on the other hand are indeed unknown at this point.
Also worth noting that this is Xen-specific. Afaik, the Linux kernel doesn't check for these changes. So everyone using plain old Ubuntu or whatever would be subject to the same stability and security implications caused by this patch.
> If that's just about microcode updates, that's probably BIOS bug - if it
> applies microcode update on system startup, it should do the same on
Weird that it's happening equally on various vendor BIOSes as well as coreboot, the only thing they have in common is Ryzen 2xxx-3xxx chips. It doesn't sound to me like a **BIOS** bug, per se, unless all these vendors and the Coreboot developers wrote the same bug independently. More likely an AMD bug, imo.
> system resume too. Anyway it's worth trying updating linux-firmware
> package, which carries microcode updates for AMD. This should make Xen
> apply microcode updates too - before checking those flags.
> I've just uploaded updated version of the package to the current-testing
> repository (both R4.0 and R4.1).
Thanks for the tip. I'll try it when I have a chance. `--enablerepo=qubes-dom0-current-testing kernel-latest linux-firmware` I'm guessing?
> If that's about something else, then fixing it would require finding
> what exactly is changing (and preferably also why). And only then find
> how to mitigate this issue. If specific flags would turn out to be not
> related to security features or otherwise having unwanted effects, then
> ignoring those changes would be an option. But ignoring _only those
> flags verified to be safe to ignore_, not all of them.
See my other reply about that.
But I would like to mention, there are already all kinds of options and parameters throughout the Xen, Qubes, and Linux codebases that come with stability/security implications. This isn't Apple iOS. You can easily shoot yourself in the foot. That's the nature of the beast. It is not Qubes' purpose to hide these from the user or take away control.
By that logic, we should also patch Xen so that "smt=off" is hardcoded, because as it is now someone might open xen.cfg and see that parameter and decide to turn it on for performance, which we all know is dangerous. Same with Qubes' "no-strict-reset", or dm-crypt's weak upstream default crypto parameters, I could go on and on.
So, again, I'm not suggesting we skip this check for everybody. I'm suggesting we make it into an undocumented Xen cmdline parameter known only to those who, as they say, have been warned. As it is right now, all of us who are affected by this are patching our own machines anyway, so what's the difference to anyone else?
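To pin down what is being proposed, here is a rough sketch of the decision logic in Python rather than Xen's C. It combines the two ideas in this thread: Marek's suggestion to ignore only flags verified to be safe, and an opt-in override that defaults to the current behavior (panic). Everything here is hypothetical: `resume_check`, `safe_to_ignore`, and `override` are illustrative names, not existing Xen code or options, and nobody has verified any bit as safe to ignore.

```python
def resume_check(before, after, safe_to_ignore=0, override=False):
    """Decide what to do when feature bits differ after resume.

    before / after   -- 32-bit feature words recorded at boot and read on resume
    safe_to_ignore   -- hypothetical mask of bits verified harmless (default: none)
    override         -- hypothetical opt-in switch; default keeps the panic
    """
    changed = (before ^ after) & 0xFFFFFFFF
    unexpected = changed & ~safe_to_ignore & 0xFFFFFFFF
    if unexpected == 0:
        return "continue"
    return "continue-with-warning" if override else "panic"


# Default behavior is unchanged: any difference still panics.
print(resume_check(0xF6D8320B, 0x7ED8320B))                 # panic
# Opt-in override, for users who have been warned.
print(resume_check(0xF6D8320B, 0x7ED8320B, override=True))  # continue-with-warning
```

The point of the sketch is only that the default path stays exactly as strict as it is today; the relaxed paths exist only when someone explicitly asks for them.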
> - --
> Best Regards,
> Marek Marczykowski-Górecki
Thank you for your consideration and for taking the time to follow up on the ML. I look forward to hearing your thoughts.
> marmarek:
> This is a very bad idea to "fix" it. Those missing/changed CPUID bits later on will cause issues.
> And given most of the microcode updates recently are about speculative execution, missing those
> features will make the host vulnerable to those issues again. There are multiple ways it can
> manifest - from crashes when Xen uses (now not present) CPU feature, to silent failures when Xen
> tries to use some feature and assume it protects the system, while it does not in practice.
>
> For this particular case (microcode included in BIOS newer than in OS), I see two options: make
> BIOS (coreboot, right?) apply microcode update on resume too, or include newer microcode in OS.
= fam_0f_rev_[cdefg] | fam_10_rev_[bc] | fam_11_rev_b
Applicability: AMD
If none of the other cpuid_mask_* options are given, Xen has a set of pre-configured masks to make the current processor appear to be family/revision specified.
See below for general information on masking.
Warning: This option is not fully effective on Family 15h processors or later.
= <integer>
Applicability: x86. Default: ~0 (all bits set)
The availability of these options are model specific. Some processors don't support any of them, and no processor supports all of them. Xen will ignore options on processors which are lacking support.
These options can be used to alter the features visible via the CPUID
instruction. Settings applied here take effect globally, including for Xen and all guests.
Note: Since Xen 4.7, it is no longer necessary to mask a host to create migration safety in heterogeneous scenarios. All necessary CPUID settings should be provided in the VM configuration file. Furthermore, it is recommended not to use this option, as doing so causes an unnecessary reduction of features at Xen's disposal to manage guests.
Has anyone tried utilizing the Xen command line options to mask bits in the CPUID (in particular section 1.2.35, cpuid_mask_ecx)?
The man page says that "Settings applied here take effect globally, including for Xen and all guests." This *might* mean it is applied *before* the resume from sleep CPU bit checks (but I'm not promising anything, as I have not traced through the source). And also "Warning: This option is not fully effective on Family 15h processors or later."
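For anyone who wants to experiment, here is a small sketch of how a mask value for that option could be computed, starting from the documented default of `~0` (all bits set) and clearing chosen bits. Which bit to clear, and whether masking affects the resume check at all, are open questions in this thread; bit 27 below is only a placeholder and `mask_clearing` is a made-up helper name.

```python
ALL_BITS = 0xFFFFFFFF  # the documented default, ~0 (all bits set), as a 32-bit value


def mask_clearing(*bits):
    """Return a 32-bit mask with the given bit positions cleared."""
    mask = ALL_BITS
    for bit in bits:
        mask &= ~(1 << bit) & ALL_BITS
    return mask


# Placeholder choice of bit 27; not a recommendation.
print(f"cpuid_mask_ecx={mask_clearing(27):#010x}")  # cpuid_mask_ecx=0xf7ffffff
```

Keep in mind the man page's own note that these options reduce the features available to Xen and all guests, so this is something to try on a test install first.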
On Sun, Feb 09, 2020 at 09:28:13AM -0800, brend...@gmail.com wrote:
> On Sunday, February 9, 2020 at 5:25:56 PM UTC, brend...@gmail.com wrote:
> >
> >
> > Has anyone tried utilizing the xen command line options to mask bits in
> > the cpuid, in particular section 1.2.35 cpuid_mask_ecx)?
> >
> > The man page below says that "Settings applied here take effect globally,
> > including for Xen and all guests." This *might* mean it is applied *before*
> > the resume from sleep CPU bit checks (but I'm not promising anything, as I
> > have not traced through the source). And also "*Warning: This option is
> > not fully effective on Family 15h processors or later.*"
> >
>
> Just noticed that the warning applies only to 1.2.34, which is AMD-only,
> apparently. Unclear to me if the other items 1.2.35 and higher, which is
> for "x86" apply only to intel or to all x86 architecture.
I may be missing it in this thread, but has anybody tried Qubes 4.1
builds (with Xen 4.13) on such a system? Does it have the same issue?
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
>
> On Sun, Feb 09, 2020 at 09:28:13AM -0800, brenda...@gmail.com wrote:
>> On Sunday, February 9, 2020 at 5:25:56 PM UTC, brend...@gmail.com wrote:
>>>
>>>
>>> Has anyone tried utilizing the xen command line options to mask bits in
>>> the cpuid, in particular section 1.2.35 cpuid_mask_ecx)?
>>>
>>> The man page below says that "Settings applied here take effect globally,
>>> including for Xen and all guests." This *might* mean it is applied *before*
>>> the resume from sleep CPU bit checks (but I'm not promising anything, as I
>>> have not traced through the source). And also "*Warning: This option is
>>> not fully effective on Family 15h processors or later.*"
>>>
>>
>> Just noticed that the warning applies only to 1.2.34, which is AMD-only,
>> apparently. Unclear to me if the other items 1.2.35 and higher, which is
>> for "x86" apply only to intel or to all x86 architecture.
>
> I may be missing it in this thread, but has anybody tried Qubes 4.1
> builds (with Xen 4.13) on such a system? Does it have the same issue?
I also had the same problem with unpatched Xen 4.13; it was on the fc31-based R4.1 build right before Christmas. The check was introduced in 4.8.3.3 and probably hasn't changed. For what it's worth, R4.1 and R4.0 both resume fine when booted without Xen. See https://www.mail-archive.com/qubes...@googlegroups.com/msg31518.html