Disabled PAT in Xen (was: Re: [qubes-devel] More issues w/ Qubes)

101 views
Skip to first unread message

Radoslaw Szkodzinski

unread,
Jun 7, 2012, 11:36:01 PM6/7/12
to konra...@oracle.com, qubes...@googlegroups.com, Joanna Rutkowska
On Sat, Jun 2, 2012 at 12:08 PM, Joanna Rutkowska
<joa...@invisiblethingslab.com> wrote:
> On 05/22/12 13:13, Radoslaw Szkodzinski wrote:
>> Hello,
>> I'm back with some new (and old?) issues in Qubes:
>>
>> 1) Kernel 3.3.5 has a horrible performance regression, probably in the
>> Intel driver (i935, HD 3000).
>> It cuts performance noticeably with large windows and graphics. 3.2.7 is fine.
>
> So, have you found out which commit (between 3.2.7 and 3.3.5) causes the
> GPU slow down (by bisectioning)? It would be very helpful...
>

Yes, I found it... and I'm scared by what this commit does.

Who reviewed this, if anyone at all - this is #if 0 for crying out
loud, not to mention with a previous proper, but not as fast as
everyone would like, fix.
I wonder why nobody using InfiniBand noticed it - I bet they use old
kernels in the datacenters...
(and yes, Xen on desktop is not exactly widespread)

Should've been made into an actual toggle for starters (wait, there's
nopat already) - and properly fixed.
It actually has been fixed - the "wall" (3 nops in the hot path) on A8
is trivial compared to missing caching on all high mappings and what's
more, broken flag reads.

This makes MTRR allocations fail here with horrible results for
performance all around.

Other horrible things will also happen if a high page is already
mapped WB and you return it as WC...

commit 8eaffa67b43e99ae581622c5133e20b0f48bcef1
Author: Konrad Rzeszutek Wilk <konra...@oracle.com>
Date: Fri Feb 10 09:16:27 2012 -0500

xen/pat: Disable PAT support for now.

[Pls also look at https://lkml.org/lkml/2012/2/10/228]

Using of PAT to change pages from WB to WC works quite nicely.
Changing it back to WB - not so much. The crux of the matter is
that the code that does this (__page_change_att_set_clr) has only
limited information so when it tries to the change it gets
the "raw" unfiltered information instead of the properly filtered one -
and the "raw" one tell it that PSE bit is on (while infact it
is not). As a result when the PTE is set to be WB from WC, we get
tons of:

--

Except it does work here. No BUG()s, no WARNING()s. I suspect the
problems when changing attributes arose from nasty overlapping
mappings.

Might be that this also messes up PCI passthrough and might be the
reason why the passed Radeon 7950 here only works once (in a system
boot) in the new kernels.
I know that the Linux driver attempts lots of PFN remappings - if this
goes via dom0 in any way while it's passed through, they will be
messed up if they're high.
(the guest is Windows)

--
Radosław Szkodziński

Konrad Rzeszutek Wilk

unread,
Jun 8, 2012, 11:12:39 AM6/8/12
to Radoslaw Szkodzinski, qubes...@googlegroups.com, Joanna Rutkowska
On Fri, Jun 08, 2012 at 05:36:01AM +0200, Radoslaw Szkodzinski wrote:
> On Sat, Jun 2, 2012 at 12:08 PM, Joanna Rutkowska
> <joa...@invisiblethingslab.com> wrote:
> > On 05/22/12 13:13, Radoslaw Szkodzinski wrote:
> >> Hello,
> >> I'm back with some new (and old?) issues in Qubes:
> >>
> >> 1) Kernel 3.3.5 has a horrible performance regression, probably in the
> >> Intel driver (i935, HD 3000).
> >> It cuts performance noticeably with large windows and graphics. 3.2.7 is fine.
> >
> > So, have you found out which commit (between 3.2.7 and 3.3.5) causes the
> > GPU slow down (by bisectioning)? It would be very helpful...
> >
>
> Yes, I found it... and I'm scared by what this commit does.

It is a workaround corruption issues.
>
> Who reviewed this, if anyone at all - this is #if 0 for crying out
> loud, not to mention with a previous proper, but not as fast as
> everyone would like, fix.

If you would like a fix, take a look at stable/for-x86-3.3

and these two patches:

commit 4f93aa02acd0e34806d4ac9c3a700bb5d040eab6
Author: Konrad Rzeszutek Wilk <konra...@oracle.com>
Date: Fri Nov 4 11:59:34 2011 -0400

x86/cpa: Use pte_attrs instead of pte_flags on CPA/set_p.._wb/wc operations.

and
commit f474007a0761d0ecb6b84ceaf4f97f4f1de92038
Author: Konrad Rzeszutek Wilk <konra...@oracle.com>
Date: Fri Nov 4 13:18:15 2011 -0400

x86/paravirt: Use pte_val instead of pte_flags on CPA pageattr_test

and revert the one that does the #ifdef. And see if that makes
a difference.

> I wonder why nobody using InfiniBand noticed it - I bet they use old
> kernels in the datacenters...
> (and yes, Xen on desktop is not exactly widespread)
>
> Should've been made into an actual toggle for starters (wait, there's
> nopat already) - and properly fixed.

I agree. But if you Google for it you will find that the converstation
about this has stalled. I would be more than happy to ressurect it.

> It actually has been fixed - the "wall" (3 nops in the hot path) on A8
> is trivial compared to missing caching on all high mappings and what's
> more, broken flag reads.

Huh? I am not sure what you are saying here.

>
> This makes MTRR allocations fail here with horrible results for
> performance all around.

But there are no MTRR support for dom0 - so how would it even
do MTRR?

>
> Other horrible things will also happen if a high page is already
> mapped WB and you return it as WC...

Why would you do WC? It wouldn't get to be WC in the first place.

>
> commit 8eaffa67b43e99ae581622c5133e20b0f48bcef1
> Author: Konrad Rzeszutek Wilk <konra...@oracle.com>
> Date: Fri Feb 10 09:16:27 2012 -0500
>
> xen/pat: Disable PAT support for now.
>
> [Pls also look at https://lkml.org/lkml/2012/2/10/228]
>
> Using of PAT to change pages from WB to WC works quite nicely.
> Changing it back to WB - not so much. The crux of the matter is
> that the code that does this (__page_change_att_set_clr) has only
> limited information so when it tries to the change it gets
> the "raw" unfiltered information instead of the properly filtered one -
> and the "raw" one tell it that PSE bit is on (while infact it
> is not). As a result when the PTE is set to be WB from WC, we get
> tons of:
>
> --
>
> Except it does work here. No BUG()s, no WARNING()s. I suspect the
> problems when changing attributes arose from nasty overlapping
> mappings.

How? If the pat is disabled how would you get WC in the first place?

Radoslaw Szkodzinski

unread,
Jun 8, 2012, 4:14:45 PM6/8/12
to Konrad Rzeszutek Wilk, qubes...@googlegroups.com, Joanna Rutkowska
On Fri, Jun 8, 2012 at 5:12 PM, Konrad Rzeszutek Wilk
<konra...@oracle.com> wrote:
>>
>> Who reviewed this, if anyone at all - this is #if 0 for crying out
>> loud, not to mention with a previous proper, but not as fast as
>> everyone would like, fix.
>
> If you would like a fix, take a look at stable/for-x86-3.3
>
> and these two patches:
>
> commit 4f93aa02acd0e34806d4ac9c3a700bb5d040eab6
> Author: Konrad Rzeszutek Wilk <konra...@oracle.com>
> Date:   Fri Nov 4 11:59:34 2011 -0400
>
>    x86/cpa: Use pte_attrs instead of pte_flags on CPA/set_p.._wb/wc operations.
>
> and
> commit f474007a0761d0ecb6b84ceaf4f97f4f1de92038
> Author: Konrad Rzeszutek Wilk <konra...@oracle.com>
> Date:   Fri Nov 4 13:18:15 2011 -0400
>
>    x86/paravirt: Use pte_val instead of pte_flags on CPA pageattr_test
>
> and revert the one that does the #ifdef. And see if that makes
> a difference.

Just reverting the ifdef makes a huge difference. Adding those two
patches is a nice fix,
but I wouldn't be able to spot the performance or crash difference
from the revert on this Xeon E3 system
which has sane (non-overlapping) mappings.

For perf test, would need A8.

>> It actually has been fixed - the "wall" (3 nops in the hot path) on A8
>> is trivial compared to missing caching on all high mappings and what's
>> more, broken flag reads.
>
> Huh? I am not sure what you are saying here.

Relating to the old discussion in the mentioned thread.

Note that the A8 issue you've uncovered is a small cost (if the total
runtime in your benchmark is to be believed) - could be ifdefable if
anyone really wants it.
(I don't have the A8 and I'd have to disassemble the code to check
whether it's still relevant. It's been some time since 3.3.)

>> This makes MTRR allocations fail here with horrible results for
>> performance all around.
>
> But there are no MTRR support for dom0 - so how would it even
> do MTRR?

I've checked the code, it doesn't indeed. My mistake.

>>
>> Other horrible things will also happen if a high page is already
>> mapped WB and you return it as WC...
>
> Why would you do WC? It wouldn't get to be WC in the first place.

You can have writeback and write through mappings via MTRRs or PAT
preset by EFI/BIOS.
Since you can't use MTRR from dom0 and there's no PAT, you don't know
which mode the area is actually in.
If it's write-combining or writeback and you're assuming write-through
or uncached... ouch.

Unless Xen really cleans up mappings itself - but then there should be
no crash in the first place.
I'm pretty sure it attempts that, but not sure how good that is and
why there would be desync between dom0 and the hypervisor in the first
place, such as more than one version of PAT.

--
Radosław Szkodziński

Konrad Rzeszutek Wilk

unread,
Jun 11, 2012, 4:35:13 PM6/11/12
to Radoslaw Szkodzinski, qubes...@googlegroups.com, Joanna Rutkowska
I think they are reset by Xen during bootup.

> Since you can't use MTRR from dom0 and there's no PAT, you don't know
> which mode the area is actually in.

I think Xen sets them to UC (for MMIO) and the rest for WB.

> If it's write-combining or writeback and you're assuming write-through
> or uncached... ouch.
>
> Unless Xen really cleans up mappings itself - but then there should be
> no crash in the first place.
> I'm pretty sure it attempts that, but not sure how good that is and
> why there would be desync between dom0 and the hypervisor in the first
> place, such as more than one version of PAT.

Or a bug in the hypervisor.
Reply all
Reply to author
Forward
0 new messages