Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

3.2.1 Unable to reset IRR messages on boot

698 views
Skip to first unread message

Josh Boyer

unread,
Jan 24, 2012, 7:10:02 PM1/24/12
to
We've had a report [1] from users booting the 3.2.1 kernel and getting a
large number of KERN_ERR messages that look like:

[ 0.020902] Unable to reset IRR for apic: 2, pin :0
[ 0.020970] Unable to reset IRR for apic: 2, pin :1
[ 0.021012] Unable to reset IRR for apic: 2, pin :2
[ 0.021077] Unable to reset IRR for apic: 2, pin :3
[ 0.021138] Unable to reset IRR for apic: 2, pin :4
[ 0.021199] Unable to reset IRR for apic: 2, pin :5
[ 0.021261] Unable to reset IRR for apic: 2, pin :6
[ 0.021323] Unable to reset IRR for apic: 2, pin :7

Digging through git, it seems that error message was added to 3.2 with
commit 1e75b31d63. The commit log mentions kdump, but I don't believe the
user is doing kexec/kdump of any kind. It seems a normal yum update/reboot
and they hit this.

Are there any details the user can gather to help debug this, or has
anyone seen this before?

josh

[1] https://bugzilla.redhat.com/show_bug.cgi?id=784445
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Suresh Siddha

unread,
Jan 24, 2012, 8:20:02 PM1/24/12
to
On Tue, 2012-01-24 at 19:04 -0500, Josh Boyer wrote:
> We've had a report [1] from users booting the 3.2.1 kernel and getting a
> large number of KERN_ERR messages that look like:
>
> [ 0.020902] Unable to reset IRR for apic: 2, pin :0
> [ 0.020970] Unable to reset IRR for apic: 2, pin :1
> [ 0.021012] Unable to reset IRR for apic: 2, pin :2
> [ 0.021077] Unable to reset IRR for apic: 2, pin :3
> [ 0.021138] Unable to reset IRR for apic: 2, pin :4
> [ 0.021199] Unable to reset IRR for apic: 2, pin :5
> [ 0.021261] Unable to reset IRR for apic: 2, pin :6
> [ 0.021323] Unable to reset IRR for apic: 2, pin :7
>
> Digging through git, it seems that error message was added to 3.2 with
> commit 1e75b31d63. The commit log mentions kdump, but I don't believe the
> user is doing kexec/kdump of any kind. It seems a normal yum update/reboot
> and they hit this.
>
> Are there any details the user can gather to help debug this, or has
> anyone seen this before?
>

complete dmesg (which will have the platform, io-apic version info etc)
will be useful.

If we are seeing this during a regular boot and for the all the RTE
entries for a specific io-apic, most likely something is wrong with that
io-apic (probably a bogus one listed by the bios?). We should be able to
make the kernel code bit more smart to workaround this.

thanks,
suresh

Josh Boyer

unread,
Jan 25, 2012, 9:00:03 AM1/25/12
to
On Tue, Jan 24, 2012 at 05:24:11PM -0800, Suresh Siddha wrote:
> On Tue, 2012-01-24 at 19:04 -0500, Josh Boyer wrote:
> > We've had a report [1] from users booting the 3.2.1 kernel and getting a
> > large number of KERN_ERR messages that look like:
> >
> > [ 0.020902] Unable to reset IRR for apic: 2, pin :0
> > [ 0.020970] Unable to reset IRR for apic: 2, pin :1
> > [ 0.021012] Unable to reset IRR for apic: 2, pin :2
> > [ 0.021077] Unable to reset IRR for apic: 2, pin :3
> > [ 0.021138] Unable to reset IRR for apic: 2, pin :4
> > [ 0.021199] Unable to reset IRR for apic: 2, pin :5
> > [ 0.021261] Unable to reset IRR for apic: 2, pin :6
> > [ 0.021323] Unable to reset IRR for apic: 2, pin :7
> >
> > Digging through git, it seems that error message was added to 3.2 with
> > commit 1e75b31d63. The commit log mentions kdump, but I don't believe the
> > user is doing kexec/kdump of any kind. It seems a normal yum update/reboot
> > and they hit this.
> >
> > Are there any details the user can gather to help debug this, or has
> > anyone seen this before?
> >
>
> complete dmesg (which will have the platform, io-apic version info etc)
> will be useful.

Attached. If you prefer it inline, let me know.

> If we are seeing this during a regular boot and for the all the RTE
> entries for a specific io-apic, most likely something is wrong with that
> io-apic (probably a bogus one listed by the bios?). We should be able to
> make the kernel code bit more smart to workaround this.

The reporter (now CC'd) confirmed it was a normal boot.

josh
dmesg-ioapic

Suresh Siddha

unread,
Jan 25, 2012, 5:10:01 PM1/25/12
to
On Wed, 2012-01-25 at 08:49 -0500, Josh Boyer wrote:
> [ 0.000000] IOAPIC[1]: apic_id 2, version 255, address 0xfec28000, GSI 24-279

This looks indeed like a bogus entry probably returning all 1's for
RTE's etc. Can you please send me a dmesg with "apic=verbose" boot
parameter?

Josh Boyer

unread,
Jan 25, 2012, 6:20:02 PM1/25/12
to
On Wed, Jan 25, 2012 at 02:04:08PM -0800, Suresh Siddha wrote:
> On Wed, 2012-01-25 at 08:49 -0500, Josh Boyer wrote:
> > [ 0.000000] IOAPIC[1]: apic_id 2, version 255, address 0xfec28000, GSI 24-279
>
> This looks indeed like a bogus entry probably returning all 1's for
> RTE's etc. Can you please send me a dmesg with "apic=verbose" boot
> parameter?

Here you go:

https://bugzilla.redhat.com/attachment.cgi?id=557552

josh

Josh Boyer

unread,
Jan 31, 2012, 9:30:02 AM1/31/12
to
On Wed, Jan 25, 2012 at 06:15:35PM -0500, Josh Boyer wrote:
> On Wed, Jan 25, 2012 at 02:04:08PM -0800, Suresh Siddha wrote:
> > On Wed, 2012-01-25 at 08:49 -0500, Josh Boyer wrote:
> > > [ 0.000000] IOAPIC[1]: apic_id 2, version 255, address 0xfec28000, GSI 24-279
> >
> > This looks indeed like a bogus entry probably returning all 1's for
> > RTE's etc. Can you please send me a dmesg with "apic=verbose" boot
> > parameter?
>
> Here you go:
>
> https://bugzilla.redhat.com/attachment.cgi?id=557552

Was this helpful at all? I've been watching lkml for a related patch
in case I was missed on CC but haven't seen anything as of yet.

Suresh Siddha

unread,
Feb 1, 2012, 3:10:02 AM2/1/12
to
On Tue, 2012-01-31 at 09:26 -0500, Josh Boyer wrote:
> On Wed, Jan 25, 2012 at 06:15:35PM -0500, Josh Boyer wrote:
> > On Wed, Jan 25, 2012 at 02:04:08PM -0800, Suresh Siddha wrote:
> > > On Wed, 2012-01-25 at 08:49 -0500, Josh Boyer wrote:
> > > > [ 0.000000] IOAPIC[1]: apic_id 2, version 255, address 0xfec28000, GSI 24-279
> > >
> > > This looks indeed like a bogus entry probably returning all 1's for
> > > RTE's etc. Can you please send me a dmesg with "apic=verbose" boot
> > > parameter?
> >
> > Here you go:
> >
> > https://bugzilla.redhat.com/attachment.cgi?id=557552
>
> Was this helpful at all? I've been watching lkml for a related patch
> in case I was missed on CC but haven't seen anything as of yet.

Yes, it was helpful. Something like the appended patch should ignore the
bogus io-apic entry all together. As I can't test this, can you or the
reporter give the appended patch a try and ack please?

thanks,
suresh
---

From: Suresh Siddha <suresh....@intel.com>
Subject: x86, ioapic: add register checks for bogus io-apic entries

With the recent changes to clear_IO_APIC_pin() which tries to clear
remoteIRR bit explicitly, some of the users started to see
"Unable to reset IRR for apic .." messages.

Close look shows that these are related to bogus IO-APIC entries which
return's all 1's for their io-apic registers. And the above mentioned error
messages are benign. But kernel should have ignored such io-apic's in the
first place.

Check if register 0, 1, 2 of the listed io-apic are all 1's and ignore
such io-apic.

Reported-by: Álvaro Castillo <mid...@gmail.com>
Signed-off-by: Suresh Siddha <suresh....@intel.com>
---
arch/x86/kernel/apic/io_apic.c | 26 ++++++++++++++++++++++++++
1 files changed, 26 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index fb07275..953e54d 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -3979,6 +3979,26 @@ static __init int bad_ioapic(unsigned long address)
return 0;
}

+static __init int bad_ioapic_regs(int idx)
+{
+ union IO_APIC_reg_00 reg_00;
+ union IO_APIC_reg_01 reg_01;
+ union IO_APIC_reg_02 reg_02;
+
+ reg_00.raw = io_apic_read(idx, 0);
+ reg_01.raw = io_apic_read(idx, 1);
+ reg_02.raw = io_apic_read(idx, 2);
+
+ if (reg_00.raw == -1 && reg_01.raw == -1 && reg_02.raw == -1) {
+ printk(KERN_WARNING
+ "I/O APIC 0x%x regs return all ones, skipping!\n",
+ mpc_ioapic_addr(idx));
+ return 1;
+ }
+
+ return 0;
+}
+
void __init mp_register_ioapic(int id, u32 address, u32 gsi_base)
{
int idx = 0;
@@ -3995,6 +4015,12 @@ void __init mp_register_ioapic(int id, u32 address, u32 gsi_base)
ioapics[idx].mp_config.apicaddr = address;

set_fixmap_nocache(FIX_IO_APIC_BASE_0 + idx, address);
+
+ if (bad_ioapic_regs(idx)) {
+ clear_fixmap(FIX_IO_APIC_BASE_0 + idx);
+ return;
+ }
+
ioapics[idx].mp_config.apicid = io_apic_unique_id(id);
ioapics[idx].mp_config.apicver = io_apic_get_version(idx);

Josh Boyer

unread,
Mar 12, 2012, 9:30:03 AM3/12/12
to
On Wed, Feb 01, 2012 at 12:00:30AM -0800, Suresh Siddha wrote:
> On Tue, 2012-01-31 at 09:26 -0500, Josh Boyer wrote:
> > On Wed, Jan 25, 2012 at 06:15:35PM -0500, Josh Boyer wrote:
> > > On Wed, Jan 25, 2012 at 02:04:08PM -0800, Suresh Siddha wrote:
> > > > On Wed, 2012-01-25 at 08:49 -0500, Josh Boyer wrote:
> > > > > [ 0.000000] IOAPIC[1]: apic_id 2, version 255, address 0xfec28000, GSI 24-279
> > > >
> > > > This looks indeed like a bogus entry probably returning all 1's for
> > > > RTE's etc. Can you please send me a dmesg with "apic=verbose" boot
> > > > parameter?
> > >
> > > Here you go:
> > >
> > > https://bugzilla.redhat.com/attachment.cgi?id=557552
> >
> > Was this helpful at all? I've been watching lkml for a related patch
> > in case I was missed on CC but haven't seen anything as of yet.
>
> Yes, it was helpful. Something like the appended patch should ignore the
> bogus io-apic entry all together. As I can't test this, can you or the
> reporter give the appended patch a try and ack please?

Hi Suresh,

Apologies for the delay. The original reporter had to return the
machine he was using. We've since had another report where this
happened and your patch below does indeed fix the issue.

I'd suggest pushing this soon.

https://bugzilla.redhat.com/show_bug.cgi?id=801501

josh

Suresh Siddha

unread,
Mar 12, 2012, 2:40:02 PM3/12/12
to
On Mon, 2012-03-12 at 09:24 -0400, Josh Boyer wrote:
> On Wed, Feb 01, 2012 at 12:00:30AM -0800, Suresh Siddha wrote:
> > Yes, it was helpful. Something like the appended patch should ignore the
> > bogus io-apic entry all together. As I can't test this, can you or the
> > reporter give the appended patch a try and ack please?
>
> Hi Suresh,
>
> Apologies for the delay. The original reporter had to return the
> machine he was using. We've since had another report where this
> happened and your patch below does indeed fix the issue.
>
> I'd suggest pushing this soon.
>
> https://bugzilla.redhat.com/show_bug.cgi?id=801501
>

Thanks Josh. Peter/Ingo, please queue the appended patch for -tip.
---
From: Suresh Siddha <suresh....@intel.com>
Subject: x86, ioapic: add register checks for bogus io-apic entries

With the recent changes to clear_IO_APIC_pin() which tries to clear
remoteIRR bit explicitly, some of the users started to see
"Unable to reset IRR for apic .." messages.

Close look shows that these are related to bogus IO-APIC entries which
return's all 1's for their io-apic registers. And the above mentioned error
messages are benign. But kernel should have ignored such io-apic's in the
first place.

Check if register 0, 1, 2 of the listed io-apic are all 1's and ignore
such io-apic.

Reported-by: Álvaro Castillo <mid...@gmail.com>
Tested-by: Jon Dufresne <j...@jondufresne.org>

tip-bot for Suresh Siddha

unread,
Mar 13, 2012, 5:50:03 AM3/13/12
to
Commit-ID: 73d63d038ee9f769f5e5b46792d227fe20e442c5
Gitweb: http://git.kernel.org/tip/73d63d038ee9f769f5e5b46792d227fe20e442c5
Author: Suresh Siddha <suresh....@intel.com>
AuthorDate: Mon, 12 Mar 2012 11:36:33 -0700
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Tue, 13 Mar 2012 05:52:02 +0100

x86/ioapic: Add register level checks to detect bogus io-apic entries

With the recent changes to clear_IO_APIC_pin() which tries to
clear remoteIRR bit explicitly, some of the users started to see
"Unable to reset IRR for apic .." messages.

Close look shows that these are related to bogus IO-APIC entries
which return's all 1's for their io-apic registers. And the
above mentioned error messages are benign. But kernel should
have ignored such io-apic's in the first place.

Check if register 0, 1, 2 of the listed io-apic are all 1's and
ignore such io-apic.

Reported-by: Álvaro Castillo <mid...@gmail.com>
Tested-by: Jon Dufresne <j...@jondufresne.org>
Signed-off-by: Suresh Siddha <suresh....@intel.com>
Cc: yin...@kernel.org
Cc: kerne...@fedoraproject.org
Cc: Josh Boyer <jwb...@redhat.com>
Cc: <sta...@kernel.org>
Link: http://lkml.kernel.org/r/1331577393.3...@sbsiddha-desk.sc.intel.com
[ Performed minor cleanup of affected code. ]
Signed-off-by: Ingo Molnar <mi...@elte.hu>
---
arch/x86/kernel/apic/io_apic.c | 40 ++++++++++++++++++++++++++++++++--------
1 files changed, 32 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index fb07275..6d10a66 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -3967,18 +3967,36 @@ int mp_find_ioapic_pin(int ioapic, u32 gsi)
static __init int bad_ioapic(unsigned long address)
{
if (nr_ioapics >= MAX_IO_APICS) {
- printk(KERN_WARNING "WARNING: Max # of I/O APICs (%d) exceeded "
- "(found %d), skipping\n", MAX_IO_APICS, nr_ioapics);
+ pr_warn("WARNING: Max # of I/O APICs (%d) exceeded (found %d), skipping\n",
+ MAX_IO_APICS, nr_ioapics);
return 1;
}
if (!address) {
- printk(KERN_WARNING "WARNING: Bogus (zero) I/O APIC address"
- " found in table, skipping!\n");
+ pr_warn("WARNING: Bogus (zero) I/O APIC address found in table, skipping!\n");
return 1;
}
return 0;
}

+static __init int bad_ioapic_register(int idx)
+{
+ union IO_APIC_reg_00 reg_00;
+ union IO_APIC_reg_01 reg_01;
+ union IO_APIC_reg_02 reg_02;
+
+ reg_00.raw = io_apic_read(idx, 0);
+ reg_01.raw = io_apic_read(idx, 1);
+ reg_02.raw = io_apic_read(idx, 2);
+
+ if (reg_00.raw == -1 && reg_01.raw == -1 && reg_02.raw == -1) {
+ pr_warn("I/O APIC 0x%x registers return all ones, skipping!\n",
+ mpc_ioapic_addr(idx));
+ return 1;
+ }
+
+ return 0;
+}
+
void __init mp_register_ioapic(int id, u32 address, u32 gsi_base)
{
int idx = 0;
@@ -3995,6 +4013,12 @@ void __init mp_register_ioapic(int id, u32 address, u32 gsi_base)
ioapics[idx].mp_config.apicaddr = address;

set_fixmap_nocache(FIX_IO_APIC_BASE_0 + idx, address);
+
+ if (bad_ioapic_register(idx)) {
+ clear_fixmap(FIX_IO_APIC_BASE_0 + idx);
+ return;
+ }
+
ioapics[idx].mp_config.apicid = io_apic_unique_id(id);
ioapics[idx].mp_config.apicver = io_apic_get_version(idx);

@@ -4015,10 +4039,10 @@ void __init mp_register_ioapic(int id, u32 address, u32 gsi_base)
if (gsi_cfg->gsi_end >= gsi_top)
gsi_top = gsi_cfg->gsi_end + 1;

- printk(KERN_INFO "IOAPIC[%d]: apic_id %d, version %d, address 0x%x, "
- "GSI %d-%d\n", idx, mpc_ioapic_id(idx),
- mpc_ioapic_ver(idx), mpc_ioapic_addr(idx),
- gsi_cfg->gsi_base, gsi_cfg->gsi_end);
+ pr_info("IOAPIC[%d]: apic_id %d, version %d, address 0x%x, GSI %d-%d\n",
+ idx, mpc_ioapic_id(idx),
+ mpc_ioapic_ver(idx), mpc_ioapic_addr(idx),
+ gsi_cfg->gsi_base, gsi_cfg->gsi_end);

nr_ioapics++;

Josh Boyer

unread,
Mar 19, 2012, 9:40:01 AM3/19/12
to
On Mon, Mar 12, 2012 at 11:36:33AM -0700, Suresh Siddha wrote:
> On Mon, 2012-03-12 at 09:24 -0400, Josh Boyer wrote:
> > On Wed, Feb 01, 2012 at 12:00:30AM -0800, Suresh Siddha wrote:
> > > Yes, it was helpful. Something like the appended patch should ignore the
> > > bogus io-apic entry all together. As I can't test this, can you or the
> > > reporter give the appended patch a try and ack please?
> >
> > Hi Suresh,
> >
> > Apologies for the delay. The original reporter had to return the
> > machine he was using. We've since had another report where this
> > happened and your patch below does indeed fix the issue.
> >
> > I'd suggest pushing this soon.
> >
> > https://bugzilla.redhat.com/show_bug.cgi?id=801501
> >
>
> Thanks Josh. Peter/Ingo, please queue the appended patch for -tip.

Hi Suresh,

Seems this patch and Xen don't get along very well. See the bug link
below. I've CC'd Konrad and hopefully he'll have some insight as to why
that might be.

josh

https://bugzilla.redhat.com/show_bug.cgi?id=804347

Konrad Rzeszutek Wilk

unread,
Mar 19, 2012, 3:50:02 PM3/19/12
to
On Mon, Mar 19, 2012 at 09:30:46AM -0400, Josh Boyer wrote:
> On Mon, Mar 12, 2012 at 11:36:33AM -0700, Suresh Siddha wrote:
> > On Mon, 2012-03-12 at 09:24 -0400, Josh Boyer wrote:
> > > On Wed, Feb 01, 2012 at 12:00:30AM -0800, Suresh Siddha wrote:
> > > > Yes, it was helpful. Something like the appended patch should ignore the
> > > > bogus io-apic entry all together. As I can't test this, can you or the
> > > > reporter give the appended patch a try and ack please?
> > >
> > > Hi Suresh,
> > >
> > > Apologies for the delay. The original reporter had to return the
> > > machine he was using. We've since had another report where this
> > > happened and your patch below does indeed fix the issue.
> > >
> > > I'd suggest pushing this soon.
> > >
> > > https://bugzilla.redhat.com/show_bug.cgi?id=801501
> > >
> >
> > Thanks Josh. Peter/Ingo, please queue the appended patch for -tip.
>
> Hi Suresh,
>
> Seems this patch and Xen don't get along very well. See the bug link
> below. I've CC'd Konrad and hopefully he'll have some insight as to why
> that might be.

Quick glance at the code tells me that the 'mp_register_ioapic' with the
patch won't increment the gsi_top. That value (gsi_top) is used in
'get_nr_irqs_gsi()'. And that function is used:

#ifdef CONFIG_X86_IO_APIC
/*
* For an HVM guest or domain 0 which see "real" (emulated or
* actual respectively) GSIs we allocate dynamic IRQs
* e.g. those corresponding to event channels or MSIs
* etc. from the range above those "real" GSIs to avoid
* collisions.
*/
if (xen_initial_domain() || xen_hvm_domain())
first = get_nr_irqs_gsi();
#endif

So we get 'first' to be 16 instead of the proper GSI number.. Or perhaps
it is some other bizzare number. Would need to instrument this.


Now, the reason that read to the IOAPIC:
[ 0.000000] I/O APIC 0xfec00000 regs return all ones, skipping!

is b/c we aren't suppose to read the APIC entries at all. There was
some discussion between Ingo (or Peter?) and Jeremy years ago about a pvops call
to do a hypercall to read said entries but it was established that the
initial domain should have no such business. As such it does this:

2066 memset(dummy_mapping, 0xff, PAGE_SIZE);

and :
1899 * We just don't map the IO APIC - all access is via
1900 * hypercalls. Keep the address in the pte for reference.
1901 */
1902 pte = pfn_pte(PFN_DOWN(__pa(dummy_mapping)), PAGE_KERNEL);
1903 break;

[ignore that comment, there are no hypercalls for it]. This is in arch/x86/xen/mmu.c
So the IO_APIC is all 0xfff..

Konrad Rzeszutek Wilk

unread,
Mar 20, 2012, 5:50:03 AM3/20/12
to
> > > > > Yes, it was helpful. Something like the appended patch should ignore the
> > > > > bogus io-apic entry all together. As I can't test this, can you or the
> > > > > reporter give the appended patch a try and ack please?
> > > >
> > > > Hi Suresh,
> > > >
> > > > Apologies for the delay. The original reporter had to return the
> > > > machine he was using. We've since had another report where this
> > > > happened and your patch below does indeed fix the issue.
> > > >
> > > > I'd suggest pushing this soon.
> > > >
> > > > https://bugzilla.redhat.com/show_bug.cgi?id=801501
> > > >
> > >
> > > Thanks Josh. Peter/Ingo, please queue the appended patch for -tip.
> >
> > Hi Suresh,
> >
> > Seems this patch and Xen don't get along very well. See the bug link
> > below. I've CC'd Konrad and hopefully he'll have some insight as to why
> > that might be.
>
> Quick glance at the code tells me that the 'mp_register_ioapic' with the
> patch won't increment the gsi_top. That value (gsi_top) is used in
.. snip..
> So the IO_APIC is all 0xfff..

My "quick glance" was wrong. The reason we are dying is b/c the call
acpi_get_override_irq() is used, which returns the polarity and trigger for the
IRQs. But that function calls mp_find_ioapics to get the 'struct ioapic' structure
- which along with the mp_irq[x] is used to figure out the default values
and the polarity/trigger overrides. Since the mp_find_ioapics now returns -1
[b/c the IOAPIC is filled with 0xffffffff], the acpi_get_override_irq() stops
trying to lookup in the mp_irq[x] the proper INT_SRV_OVR and we can't install the
SCI interrupt.

Furthermore, we end up using that function in a loop to setup the sixteen legacy
interrupts:

/* Pre-allocate legacy irqs */
480 for (irq = 0; irq < NR_IRQS_LEGACY; irq++) {
481 int trigger, polarity;
482
483 if (acpi_get_override_irq(irq, &trigger, &polarity) == -1)
484 continue;
485
486 xen_register_pirq(irq, -1 /* no GSI override */,
487 trigger ? ACPI_LEVEL_SENSITIVE : ACPI_EDGE_SENSITIVE,
488 true /* Map GSI to PIRQ */);


and since we get -1, we never end up setting any interrupts.

Also the nr_ioapics ends up being zero, so we think we are running in legacy mode with XT-PIC
interrupts (ugh). By the time ACPI kicks in and wants to install the SCI interrupt
we blow up since we cannot get the proper irq_desc.

The interesting thing is that the same issue could be reproduced on baremetal
if the first IOAPIC had 0xfff all over it - and the call to acpi_get_override_irq()
done by the ACPI layer to setup the SCI would have triggered a similar failure.
But the baremetal failing case has the first IOAPIC with the INT_SRV_OVR with
valid values - it is just the second IOAPIC is busted.

Or if the second IOAPIC was busted and there was an INT_SRV_OVR for the second
APIC to handle the SCI.

I think there are three ways of fixing this:

1). Revert Suresh's patch and look at just removing the "Unable to reset IRR" warning
(perhaps by being conditional on running in kexec-env?).

2). Make the Xen layer fake out an IOAPIC - so instead of 0xffffff, make sure to
clear the three bits that Suresh' patch is testing for (Ewwwww, I don't actually
like that - that stinks of a hack).

3). Rework Suresh's patch - to only remove the IOAPIC entry if there is no
INT_SRV_OVR that depend on it. I made a stab at it and here is draft patch, that
looks to work on my boxes that have more than one IOAPIC and are booting under Xen:
But I am not 100% confident about it so would appreciate somebody looking at it.

diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
index 690d1cc..d92be91 100644
--- a/arch/x86/include/asm/io_apic.h
+++ b/arch/x86/include/asm/io_apic.h
@@ -170,6 +170,7 @@ extern u32 gsi_top;
int mp_find_ioapic(u32 gsi);
int mp_find_ioapic_pin(int ioapic, u32 gsi);
void __init mp_register_ioapic(int id, u32 address, u32 gsi_base);
+void __init mp_erase_defective_ioapics(void);
extern void __init pre_init_apic_IRQ0(void);

extern void mp_save_irq(struct mpc_intsrc *m);
diff --git a/arch/x86/include/asm/mpspec.h b/arch/x86/include/asm/mpspec.h
index 9c7d95f..e886853 100644
--- a/arch/x86/include/asm/mpspec.h
+++ b/arch/x86/include/asm/mpspec.h
@@ -103,6 +103,7 @@ extern void mp_config_acpi_legacy_irqs(void);
struct device;
extern int mp_register_gsi(struct device *dev, u32 gsi, int edge_level,
int active_high_low);
+extern void mp_erase_defective_ioapics(void);
#endif /* CONFIG_ACPI */

#define PHYSID_ARRAY_SIZE BITS_TO_LONGS(MAX_LOCAL_APIC)
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index c3a5b95..b5115e7 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -1193,6 +1193,10 @@ static int __init acpi_parse_madt_ioapic_entries(void)
}

/*
+ * Cleanup up invalid IOAPICs.
+ */
+ mp_erase_defective_ioapics();
+ /*
* If BIOS did not supply an INT_SRC_OVR for the SCI
* pretend we got one so we can set the SCI flags.
*/
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 6d10a66..77f1e84 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -3989,14 +3989,43 @@ static __init int bad_ioapic_register(int idx)
reg_02.raw = io_apic_read(idx, 2);

if (reg_00.raw == -1 && reg_01.raw == -1 && reg_02.raw == -1) {
- pr_warn("I/O APIC 0x%x registers return all ones, skipping!\n",
- mpc_ioapic_addr(idx));
return 1;
}

return 0;
}
+bool __init no_gsi_override_for_ioapic(int idx)
+{
+ unsigned int gsi, i;
+ struct mp_ioapic_gsi *gsi_cfg;
+
+ gsi_cfg = mp_ioapic_gsi_routing(idx);
+ for (gsi = gsi_cfg->gsi_base; gsi < gsi_cfg->gsi_end; gsi++) {
+ unsigned int pin = mp_find_ioapic_pin(idx, gsi);
+ for (i = 0; i < mp_irq_entries; i++) {
+ if (mp_irqs[i].dstirq == pin &&
+ mp_irqs[i].dstapic == mpc_ioapic_id(idx))
+ return false;
+ }
+ }
+ return true;
+}
+void __init mp_erase_defective_ioapics(void)
+{
+ unsigned int idx = 0;

+ while (idx < nr_ioapics) {
+ if (bad_ioapic_register(idx) && no_gsi_override_for_ioapic(idx)) {
+ pr_warn("I/O APIC 0x%x registers return all ones, skipping!\n",
+ mpc_ioapic_addr(idx));
+ clear_fixmap(FIX_IO_APIC_BASE_0 + idx);
+ memmove(&ioapics[idx], &(ioapics[idx+1]),
+ sizeof(struct ioapic) * (nr_ioapics - 1 - idx));
+ --nr_ioapics;
+ } else
+ idx++;
+ }
+}
void __init mp_register_ioapic(int id, u32 address, u32 gsi_base)
{
int idx = 0;
@@ -4014,11 +4043,6 @@ void __init mp_register_ioapic(int id, u32 address, u32 gsi_base)

set_fixmap_nocache(FIX_IO_APIC_BASE_0 + idx, address);

- if (bad_ioapic_register(idx)) {
- clear_fixmap(FIX_IO_APIC_BASE_0 + idx);
- return;
- }
-
ioapics[idx].mp_config.apicid = io_apic_unique_id(id);
ioapics[idx].mp_config.apicver = io_apic_get_version(idx);

Ingo Molnar

unread,
Mar 20, 2012, 6:00:03 AM3/20/12
to

* Konrad Rzeszutek Wilk <konra...@oracle.com> wrote:

> 2). Make the Xen layer fake out an IOAPIC - so instead of 0xffffff, make sure to
> clear the three bits that Suresh' patch is testing for (Ewwwww, I don't actually
> like that - that stinks of a hack).

In what universe would a hardware virtualization layer emulating
actual hardware behavior be a 'hack'?

Thanks,

Ingo

Konrad Rzeszutek Wilk

unread,
Mar 20, 2012, 2:00:03 PM3/20/12
to
On Tue, Mar 20, 2012 at 10:59:24AM +0100, Ingo Molnar wrote:
>
> * Konrad Rzeszutek Wilk <konra...@oracle.com> wrote:
>
> > 2). Make the Xen layer fake out an IOAPIC - so instead of 0xffffff, make sure to
> > clear the three bits that Suresh' patch is testing for (Ewwwww, I don't actually
> > like that - that stinks of a hack).
>
> In what universe would a hardware virtualization layer emulating
> actual hardware behavior be a 'hack'?

I think I didn't explain myself enough. The 2) "fix" would be basically the minimal work-around
around Suresh's patch so that the test his patch employs passes. That feels to me like a hack -
a band-aid solution. If his code employs more stringent tests in the next release, then I've got
to play catch-up and add more faking off the IOAPIC. That in long term might mean introducing
a pvops for the ioapic_read so that we can selectivity provide the "proper" IOAPIC entries.

The patch I posted thought tries to solve the existing baremetal problem Sureh's patch was for
and also not introduce a regression by only erasing the IOAPIC if there are no dependencies on it.

Suresh Siddha

unread,
Mar 20, 2012, 2:20:01 PM3/20/12
to
On Tue, 2012-03-20 at 05:40 -0400, Konrad Rzeszutek Wilk wrote:
> I think there are three ways of fixing this:
>
> 1). Revert Suresh's patch and look at just removing the "Unable to reset IRR" warning
> (perhaps by being conditional on running in kexec-env?).
>
> 2). Make the Xen layer fake out an IOAPIC - so instead of 0xffffff, make sure to
> clear the three bits that Suresh' patch is testing for (Ewwwww, I don't actually
> like that - that stinks of a hack).
>
> 3). Rework Suresh's patch - to only remove the IOAPIC entry if there is no
> INT_SRV_OVR that depend on it. I made a stab at it and here is draft patch, that
> looks to work on my boxes that have more than one IOAPIC and are booting under Xen:
> But I am not 100% confident about it so would appreciate somebody looking at it.
>

Thanks for looking at this Konrad. This issue is not just specific to
INT_SRC_OVR per-say.

Issue is that Xen though it doesn't use IO-APIC, it does depend on
proper IO-APIC parsing for various things like getting proper gsi_top,
INT_SRV_OVR entries etc.

I think Xen should be setting up a valid dummy IO-APIC mapping instead
of working around.

thanks,
suresh

Konrad Rzeszutek Wilk

unread,
Mar 20, 2012, 3:10:02 PM3/20/12
to
On Tue, Mar 20, 2012 at 11:12:58AM -0700, Suresh Siddha wrote:
> On Tue, 2012-03-20 at 05:40 -0400, Konrad Rzeszutek Wilk wrote:
> > I think there are three ways of fixing this:
> >
> > 1). Revert Suresh's patch and look at just removing the "Unable to reset IRR" warning
> > (perhaps by being conditional on running in kexec-env?).
> >
> > 2). Make the Xen layer fake out an IOAPIC - so instead of 0xffffff, make sure to
> > clear the three bits that Suresh' patch is testing for (Ewwwww, I don't actually
> > like that - that stinks of a hack).
> >
> > 3). Rework Suresh's patch - to only remove the IOAPIC entry if there is no
> > INT_SRV_OVR that depend on it. I made a stab at it and here is draft patch, that
> > looks to work on my boxes that have more than one IOAPIC and are booting under Xen:
> > But I am not 100% confident about it so would appreciate somebody looking at it.
> >
>
> Thanks for looking at this Konrad. This issue is not just specific to
> INT_SRC_OVR per-say.
>
> Issue is that Xen though it doesn't use IO-APIC, it does depend on
> proper IO-APIC parsing for various things like getting proper gsi_top,
> INT_SRV_OVR entries etc.
>
> I think Xen should be setting up a valid dummy IO-APIC mapping instead
> of working around.

Then this fixes the issue - thought if there are more checks in the future
it will have to be redone..:

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 988828b..b8e2794 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1859,6 +1859,7 @@ pgd_t * __init xen_setup_kernel_pagetable(pgd_t *pgd,
#endif /* CONFIG_X86_64 */

static unsigned char dummy_mapping[PAGE_SIZE] __page_aligned_bss;
+static unsigned char fake_ioapic_mapping[PAGE_SIZE] __page_aligned_bss;

static void xen_set_fixmap(unsigned idx, phys_addr_t phys, pgprot_t prot)
{
@@ -1899,7 +1900,7 @@ static void xen_set_fixmap(unsigned idx, phys_addr_t phys, pgprot_t prot)
* We just don't map the IO APIC - all access is via
* hypercalls. Keep the address in the pte for reference.
*/
- pte = pfn_pte(PFN_DOWN(__pa(dummy_mapping)), PAGE_KERNEL);
+ pte = pfn_pte(PFN_DOWN(__pa(fake_ioapic_mapping)), PAGE_KERNEL);
break;
#endif

@@ -2064,6 +2065,7 @@ void __init xen_init_mmu_ops(void)
pv_mmu_ops = xen_mmu_ops;

memset(dummy_mapping, 0xff, PAGE_SIZE);
+ memset(fake_ioapic_mapping, 0xfd, PAGE_SIZE);
}

/* Protected by xen_reservation_lock. */

Suresh Siddha

unread,
Mar 20, 2012, 4:10:02 PM3/20/12
to
On Tue, 2012-03-20 at 14:58 -0400, Konrad Rzeszutek Wilk wrote:
> Then this fixes the issue - thought if there are more checks in the future
> it will have to be redone..:
>
> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> index 988828b..b8e2794 100644
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -1859,6 +1859,7 @@ pgd_t * __init xen_setup_kernel_pagetable(pgd_t *pgd,
> #endif /* CONFIG_X86_64 */
>
> static unsigned char dummy_mapping[PAGE_SIZE] __page_aligned_bss;
> +static unsigned char fake_ioapic_mapping[PAGE_SIZE] __page_aligned_bss;
>
> static void xen_set_fixmap(unsigned idx, phys_addr_t phys, pgprot_t prot)
> {
> @@ -1899,7 +1900,7 @@ static void xen_set_fixmap(unsigned idx, phys_addr_t phys, pgprot_t prot)
> * We just don't map the IO APIC - all access is via
> * hypercalls. Keep the address in the pte for reference.
> */
> - pte = pfn_pte(PFN_DOWN(__pa(dummy_mapping)), PAGE_KERNEL);
> + pte = pfn_pte(PFN_DOWN(__pa(fake_ioapic_mapping)), PAGE_KERNEL);
> break;
> #endif
>
> @@ -2064,6 +2065,7 @@ void __init xen_init_mmu_ops(void)
> pv_mmu_ops = xen_mmu_ops;
>
> memset(dummy_mapping, 0xff, PAGE_SIZE);
> + memset(fake_ioapic_mapping, 0xfd, PAGE_SIZE);

heh ;)

I was referring to setting up something little more valid. Like
IO_APIC_reg_01 showing value of 0x00170020 etc. As the gsi computation
actually refers to the number of redirection table entries supported by
the IO-APIC.

Also does the dom0 see all the GSI's/io-apic's that the host sees or is
it going to be just one io-apic? I was just wondering if the reg 01 need
to be same as what the host sees?

Anyways, instead of 0xfd, having sane fake register values will be a
better start.

Konrad Rzeszutek Wilk

unread,
Mar 20, 2012, 4:40:01 PM3/20/12
to
It does not use the APIC mechanism at all. It only needs the GSI value
from the IOAPIC and LAPIC so that the ACPI can work. Then the ACPI _PRT values
are used to setup the GSI's (via hypercalls).

>
> Anyways, instead of 0xfd, having sane fake register values will be a
> better start.

It could also at startup copy the original values and stick them in these copies.
That does mean using the memory allocator to allocate X copies for X IOAPICS. I
don't think the memory allocator is active at that stage since we using fixmaps.
Thoughts?

Suresh Siddha

unread,
Mar 20, 2012, 4:50:01 PM3/20/12
to
On Tue, 2012-03-20 at 16:25 -0400, Konrad Rzeszutek Wilk wrote:
> On Tue, Mar 20, 2012 at 01:05:19PM -0700, Suresh Siddha wrote:
> > Also does the dom0 see all the GSI's/io-apic's that the host sees or is
> > it going to be just one io-apic? I was just wondering if the reg 01 need
> > to be same as what the host sees?
>
> It does not use the APIC mechanism at all. It only needs the GSI value
> from the IOAPIC and LAPIC so that the ACPI can work.

So the number of redirection entries in the io-apic need to be same as
the original values then?

> It could also at startup copy the original values and stick them in these copies.
> That does mean using the memory allocator to allocate X copies for X IOAPICS. I
> don't think the memory allocator is active at that stage since we using fixmaps.
> Thoughts?

You should be able to use alloc_bootmem friends.

thanks,
suresh

Konrad Rzeszutek Wilk

unread,
Mar 20, 2012, 5:00:02 PM3/20/12
to
On Tue, Mar 20, 2012 at 01:41:32PM -0700, Suresh Siddha wrote:
> On Tue, 2012-03-20 at 16:25 -0400, Konrad Rzeszutek Wilk wrote:
> > On Tue, Mar 20, 2012 at 01:05:19PM -0700, Suresh Siddha wrote:
> > > Also does the dom0 see all the GSI's/io-apic's that the host sees or is
> > > it going to be just one io-apic? I was just wondering if the reg 01 need
> > > to be same as what the host sees?
> >
> > It does not use the APIC mechanism at all. It only needs the GSI value
> > from the IOAPIC and LAPIC so that the ACPI can work.
>
> So the number of redirection entries in the io-apic need to be same as
> the original values then?

It can be up to sixteen and or up to whatever the INT_SRV_OVR has. With
the dummy mapping it was 0xff, which meant it had up to 255 which was more than
enough.

Konrad Rzeszutek Wilk

unread,
Mar 20, 2012, 8:20:01 PM3/20/12
to
Or rather just implement one different function as opposed
to the native one : the read function.

We synthesize the values.

Suggested-by: Suresh Siddha <suresh....@intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konra...@oracle.com>
---
arch/x86/xen/Makefile | 2 +-
arch/x86/xen/apic.c | 17 +++++++++++++++++
arch/x86/xen/enlighten.c | 2 ++
arch/x86/xen/xen-ops.h | 4 ++++
4 files changed, 24 insertions(+), 1 deletions(-)
create mode 100644 arch/x86/xen/apic.c

diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile
index add2c2d..96ab2c0 100644
--- a/arch/x86/xen/Makefile
+++ b/arch/x86/xen/Makefile
@@ -20,5 +20,5 @@ obj-$(CONFIG_EVENT_TRACING) += trace.o
obj-$(CONFIG_SMP) += smp.o
obj-$(CONFIG_PARAVIRT_SPINLOCKS)+= spinlock.o
obj-$(CONFIG_XEN_DEBUG_FS) += debugfs.o
-obj-$(CONFIG_XEN_DOM0) += vga.o
+obj-$(CONFIG_XEN_DOM0) += apic.o vga.o
obj-$(CONFIG_SWIOTLB_XEN) += pci-swiotlb-xen.o
diff --git a/arch/x86/xen/apic.c b/arch/x86/xen/apic.c
new file mode 100644
index 0000000..c53b3ef
--- /dev/null
+++ b/arch/x86/xen/apic.c
@@ -0,0 +1,17 @@
+#include <linux/init.h>
+#include <asm/x86_init.h>
+
+unsigned int xen_io_apic_read(unsigned apic, unsigned reg)
+{
+ if (reg == 0x1)
+ return 0x00170020;
+ else if (reg == 0x0)
+ return 0x00000000;
+
+ return 0xfd;
+}
+
+void __init xen_init_apic(void)
+{
+ x86_apic.read = xen_io_apic_read;
+}
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 0732326..591ee69 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1381,6 +1381,8 @@ asmlinkage void __init xen_start_kernel(void)
pci_request_acs();

xen_acpi_sleep_register();
+
+ xen_init_apic();
}


diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index b095739..45c0c06 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -92,11 +92,15 @@ struct dom0_vga_console_info;

#ifdef CONFIG_XEN_DOM0
void __init xen_init_vga(const struct dom0_vga_console_info *, size_t size);
+void __init xen_init_apic(void);
#else
static inline void __init xen_init_vga(const struct dom0_vga_console_info *info,
size_t size)
{
}
+static inline void __init xen_init_apic(void)
+{
+}
#endif

/* Declare an asm function, along with symbols needed to make it
--
1.7.7.5

Konrad Rzeszutek Wilk

unread,
Mar 20, 2012, 8:20:01 PM3/20/12
to
On Tue, Mar 20, 2012 at 01:05:19PM -0700, Suresh Siddha wrote:
I was talking with Suresh on IRC and I realized that some of these patches
that did this were posted some time ago, so I dusted them off and redid them
just a tiny bit.

Ingo, please take a look - and if you are OK with them please put in them
the v3.4 queue - if you are not comfortable, then the above patch will have to
suffice as a work-around for the regression.

arch/x86/include/asm/io_apic.h | 31 +++++++++++++++++++++++++++++--
arch/x86/include/asm/x86_init.h | 8 ++++++++
arch/x86/kernel/apic/io_apic.c | 10 ++++++----
arch/x86/kernel/setup.c | 2 +-
arch/x86/kernel/x86_init.c | 8 ++++++++
arch/x86/xen/Makefile | 2 +-
arch/x86/xen/apic.c | 17 +++++++++++++++++
arch/x86/xen/enlighten.c | 2 ++
arch/x86/xen/xen-ops.h | 4 ++++
9 files changed, 76 insertions(+), 8 deletions(-)
Jeremy Fitzhardinge (1):
x86: add io_apic_ops to allow interception

Konrad Rzeszutek Wilk (2):
x86/apic_ops: Replace apic_ops with x86_apic_ops.
xen/x86: Implement x86_apic_ops

Konrad Rzeszutek Wilk

unread,
Mar 20, 2012, 8:20:01 PM3/20/12
to
. which makes the code fit within the rest of the x86_ops functions.

Signed-off-by: Konrad Rzeszutek Wilk <konra...@oracle.com>
---
arch/x86/include/asm/io_apic.h | 40 +++++++++++++++++++++--------
arch/x86/include/asm/x86_init.h | 8 ++++++
arch/x86/kernel/apic/io_apic.c | 54 ++++----------------------------------
arch/x86/kernel/setup.c | 2 +-
arch/x86/kernel/x86_init.c | 8 ++++++
5 files changed, 52 insertions(+), 60 deletions(-)

diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
index 190d8c2..bb876ce 100644
--- a/arch/x86/include/asm/io_apic.h
+++ b/arch/x86/include/asm/io_apic.h
@@ -5,6 +5,7 @@
#include <asm/mpspec.h>
#include <asm/apicdef.h>
#include <asm/irq_vectors.h>
+#include <asm/x86_init.h>

/*
* Intel IO-APIC support for SMP and UP systems.
@@ -21,15 +22,6 @@
#define IO_APIC_REDIR_LEVEL_TRIGGER (1 << 15)
#define IO_APIC_REDIR_MASKED (1 << 16)

-struct io_apic_ops {
- void (*init)(void);
- unsigned int (*read)(unsigned int apic, unsigned int reg);
- void (*write)(unsigned int apic, unsigned int reg, unsigned int value);
- void (*modify)(unsigned int apic, unsigned int reg, unsigned int value);
-};
-
-void __init set_io_apic_ops(const struct io_apic_ops *);
-
/*
* The structure of the IO-APIC:
*/
@@ -156,7 +148,6 @@ struct io_apic_irq_attr;
extern int io_apic_set_pci_routing(struct device *dev, int irq,
struct io_apic_irq_attr *irq_attr);
void setup_IO_APIC_irq_extra(u32 gsi);
-extern void ioapic_and_gsi_init(void);
extern void ioapic_insert_resources(void);

int io_apic_setup_irq_pin_once(unsigned int irq, int node, struct io_apic_irq_attr *attr);
@@ -185,12 +176,35 @@ extern void mp_save_irq(struct mpc_intsrc *m);

extern void disable_ioapic_support(void);

+
+void __init native_ioapic_init_mappings(void);
+unsigned int native_ioapic_read(unsigned int apic, unsigned int reg);
+void native_ioapic_write(unsigned int apic, unsigned int reg,
+ unsigned int val);
+void native_ioapic_modify(unsigned int apic, unsigned int reg,
+ unsigned int val);
+
+static inline unsigned int io_apic_read(unsigned int apic, unsigned int reg)
+{
+ return x86_apic.read(apic, reg);
+}
+
+static inline void io_apic_write(unsigned int apic, unsigned int reg,
+ unsigned int value)
+{
+ x86_apic.write(apic, reg, value);
+}
+
+static inline void io_apic_modify(unsigned int apic, unsigned int reg,
+ unsigned int value)
+{
+ x86_apic.modify(apic, reg, value);
+}
#else /* !CONFIG_X86_IO_APIC */

#define io_apic_assign_pci_irqs 0
#define setup_ioapic_ids_from_mpc x86_init_noop
static const int timer_through_8259 = 0;
-static inline void ioapic_and_gsi_init(void) { }
static inline void ioapic_insert_resources(void) { }
#define gsi_top (NR_IRQS_LEGACY)
static inline int mp_find_ioapic(u32 gsi) { return 0; }
@@ -212,6 +226,10 @@ static inline int restore_ioapic_entries(void)

static inline void mp_save_irq(struct mpc_intsrc *m) { };
static inline void disable_ioapic_support(void) { }
+#define native_ioapic_init_mappings NULL
+#define native_ioapic_read NULL
+#define native_ioapic_write NULL
+#define native_ioapic_modify NULL
#endif

#endif /* _ASM_X86_IO_APIC_H */
diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index 517d476..0fdf65b 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -182,10 +182,18 @@ struct x86_msi_ops {
void (*restore_msi_irqs)(struct pci_dev *dev, int irq);
};

+struct x86_apic_ops {
+ void (*init)(void);
+ unsigned int (*read)(unsigned int apic, unsigned int reg);
+ void (*write)(unsigned int apic, unsigned int reg, unsigned int value);
+ void (*modify)(unsigned int apic, unsigned int reg, unsigned int value);
+};
+
extern struct x86_init_ops x86_init;
extern struct x86_cpuinit_ops x86_cpuinit;
extern struct x86_platform_ops x86_platform;
extern struct x86_msi_ops x86_msi;
+extern struct x86_apic_ops x86_apic;

extern void x86_init_noop(void);
extern void x86_init_uint_noop(unsigned int unused);
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index bf120234..9a15d4b 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -67,25 +67,6 @@
#define for_each_irq_pin(entry, head) \
for (entry = head; entry; entry = entry->next)

-static void __init __ioapic_init_mappings(void);
-static unsigned int __io_apic_read(unsigned int apic, unsigned int reg);
-static void __io_apic_write(unsigned int apic, unsigned int reg,
- unsigned int val);
-static void __io_apic_modify(unsigned int apic, unsigned int reg,
- unsigned int val);
-
-static struct io_apic_ops io_apic_ops = {
- .init = __ioapic_init_mappings,
- .read = __io_apic_read,
- .write = __io_apic_write,
- .modify = __io_apic_modify,
-};
-
-void __init set_io_apic_ops(const struct io_apic_ops *ops)
-{
- io_apic_ops = *ops;
-}
-
/*
* Is the SiS APIC rmw bug present ?
* -1 = don't know, 0 = no, 1 = yes
@@ -313,24 +294,6 @@ static void free_irq_at(unsigned int at, struct irq_cfg *cfg)
irq_free_desc(at);
}

-static inline unsigned int io_apic_read(unsigned int apic, unsigned int reg)
-{
- return io_apic_ops.read(apic, reg);
-}
-
-static inline void io_apic_write(unsigned int apic, unsigned int reg,
- unsigned int value)
-{
- io_apic_ops.write(apic, reg, value);
-}
-
-static inline void io_apic_modify(unsigned int apic, unsigned int reg,
- unsigned int value)
-{
- io_apic_ops.modify(apic, reg, value);
-}
-
-
struct io_apic {
unsigned int index;
unsigned int unused[3];
@@ -351,15 +314,15 @@ static inline void io_apic_eoi(unsigned int apic, unsigned int vector)
writel(vector, &io_apic->eoi);
}

-static unsigned int __io_apic_read(unsigned int apic, unsigned int reg)
+unsigned int native_ioapic_read(unsigned int apic, unsigned int reg)
{
struct io_apic __iomem *io_apic = io_apic_base(apic);
writel(reg, &io_apic->index);
return readl(&io_apic->data);
}

-static void __io_apic_write(unsigned int apic, unsigned int reg,
- unsigned int value)
+void native_ioapic_write(unsigned int apic, unsigned int reg,
+ unsigned int value)
{
struct io_apic __iomem *io_apic = io_apic_base(apic);
writel(reg, &io_apic->index);
@@ -372,8 +335,8 @@ static void __io_apic_write(unsigned int apic, unsigned int reg,
*
* Older SiS APIC requires we rewrite the index register
*/
-static void __io_apic_modify(unsigned int apic, unsigned int reg,
- unsigned int value)
+void native_ioapic_modify(unsigned int apic, unsigned int reg,
+ unsigned int value)
{
struct io_apic __iomem *io_apic = io_apic_base(apic);

@@ -3910,12 +3873,7 @@ static struct resource * __init ioapic_setup_resources(int nr_ioapics)
return res;
}

-void __init ioapic_and_gsi_init(void)
-{
- io_apic_ops.init();
-}
-
-static void __init __ioapic_init_mappings(void)
+void __init native_ioapic_init_mappings(void)
{
unsigned long ioapic_phys, idx = FIX_IO_APIC_BASE_0;
struct resource *ioapic_res;
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index d7d5099..87a0fcd 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1016,7 +1016,7 @@ void __init setup_arch(char **cmdline_p)
init_cpu_to_node();

init_apic_mappings();
- ioapic_and_gsi_init();
+ x86_apic.init();

kvm_guest_init();

diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
index 947a06c..c5d0697 100644
--- a/arch/x86/kernel/x86_init.c
+++ b/arch/x86/kernel/x86_init.c
@@ -18,6 +18,7 @@
#include <asm/e820.h>
#include <asm/time.h>
#include <asm/irq.h>
+#include <asm/io_apic.h>
#include <asm/pat.h>
#include <asm/tsc.h>
#include <asm/iommu.h>
@@ -117,3 +118,10 @@ struct x86_msi_ops x86_msi = {
.teardown_msi_irqs = default_teardown_msi_irqs,
.restore_msi_irqs = default_restore_msi_irqs,
};
+
+struct x86_apic_ops x86_apic = {
+ .init = native_ioapic_init_mappings,
+ .read = native_ioapic_read,
+ .write = native_ioapic_write,
+ .modify = native_ioapic_modify,
+};
--
1.7.7.5

Konrad Rzeszutek Wilk

unread,
Mar 20, 2012, 8:20:02 PM3/20/12
to
From: Jeremy Fitzhardinge <jeremy.fi...@citrix.com>

Xen dom0 needs to paravirtualize IO operations to the IO APIC, so add
a io_apic_ops for it to intercept. Do this as ops structure because
there's at least some chance that another paravirtualized environment
may want to intercept these.

[ Impact: indirect IO APIC access via io_apic_ops ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fi...@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konra...@oracle.com>
---
arch/x86/include/asm/io_apic.h | 9 +++++++
arch/x86/kernel/apic/io_apic.c | 50 +++++++++++++++++++++++++++++++++++++--
2 files changed, 56 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
index 690d1cc..190d8c2 100644
--- a/arch/x86/include/asm/io_apic.h
+++ b/arch/x86/include/asm/io_apic.h
@@ -21,6 +21,15 @@
#define IO_APIC_REDIR_LEVEL_TRIGGER (1 << 15)
#define IO_APIC_REDIR_MASKED (1 << 16)

+struct io_apic_ops {
+ void (*init)(void);
+ unsigned int (*read)(unsigned int apic, unsigned int reg);
+ void (*write)(unsigned int apic, unsigned int reg, unsigned int value);
+ void (*modify)(unsigned int apic, unsigned int reg, unsigned int value);
+};
+
+void __init set_io_apic_ops(const struct io_apic_ops *);
+
/*
* The structure of the IO-APIC:
*/
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index fb07275..bf120234 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -67,6 +67,25 @@
#define for_each_irq_pin(entry, head) \
for (entry = head; entry; entry = entry->next)

+static void __init __ioapic_init_mappings(void);
+static unsigned int __io_apic_read(unsigned int apic, unsigned int reg);
+static void __io_apic_write(unsigned int apic, unsigned int reg,
+ unsigned int val);
+static void __io_apic_modify(unsigned int apic, unsigned int reg,
+ unsigned int val);
+
+static struct io_apic_ops io_apic_ops = {
+ .init = __ioapic_init_mappings,
+ .read = __io_apic_read,
+ .write = __io_apic_write,
+ .modify = __io_apic_modify,
+};
+
+void __init set_io_apic_ops(const struct io_apic_ops *ops)
+{
+ io_apic_ops = *ops;
+}
+
/*
* Is the SiS APIC rmw bug present ?
* -1 = don't know, 0 = no, 1 = yes
@@ -294,6 +313,24 @@ static void free_irq_at(unsigned int at, struct irq_cfg *cfg)
irq_free_desc(at);
}

+static inline unsigned int io_apic_read(unsigned int apic, unsigned int reg)
+{
+ return io_apic_ops.read(apic, reg);
+}
+
+static inline void io_apic_write(unsigned int apic, unsigned int reg,
+ unsigned int value)
+{
+ io_apic_ops.write(apic, reg, value);
+}
+
+static inline void io_apic_modify(unsigned int apic, unsigned int reg,
+ unsigned int value)
+{
+ io_apic_ops.modify(apic, reg, value);
+}
+
+
struct io_apic {
unsigned int index;
unsigned int unused[3];
@@ -314,14 +351,15 @@ static inline void io_apic_eoi(unsigned int apic, unsigned int vector)
writel(vector, &io_apic->eoi);
}

-static inline unsigned int io_apic_read(unsigned int apic, unsigned int reg)
+static unsigned int __io_apic_read(unsigned int apic, unsigned int reg)
{
struct io_apic __iomem *io_apic = io_apic_base(apic);
writel(reg, &io_apic->index);
return readl(&io_apic->data);
}

-static inline void io_apic_write(unsigned int apic, unsigned int reg, unsigned int value)
+static void __io_apic_write(unsigned int apic, unsigned int reg,
+ unsigned int value)
{
struct io_apic __iomem *io_apic = io_apic_base(apic);
writel(reg, &io_apic->index);
@@ -334,7 +372,8 @@ static inline void io_apic_write(unsigned int apic, unsigned int reg, unsigned i
*
* Older SiS APIC requires we rewrite the index register
*/
-static inline void io_apic_modify(unsigned int apic, unsigned int reg, unsigned int value)
+static void __io_apic_modify(unsigned int apic, unsigned int reg,
+ unsigned int value)
{
struct io_apic __iomem *io_apic = io_apic_base(apic);

@@ -3873,6 +3912,11 @@ static struct resource * __init ioapic_setup_resources(int nr_ioapics)

void __init ioapic_and_gsi_init(void)
{
+ io_apic_ops.init();
+}
+
+static void __init __ioapic_init_mappings(void)
+{
unsigned long ioapic_phys, idx = FIX_IO_APIC_BASE_0;
struct resource *ioapic_res;
int i;
--
1.7.7.5

Suresh Siddha

unread,
Mar 20, 2012, 9:40:01 PM3/20/12
to
For IO-APIC ID register (0), you can probably use the argument 'apic' to
set the ID bits in the returned register.

> +
> + return 0xfd;

0xff might be better.

If something breaks in future, then we can add that register
implementation here.

thanks,
suresh

Suresh Siddha

unread,
Mar 20, 2012, 9:50:02 PM3/20/12
to
On Tue, 2012-03-20 at 20:12 -0400, Konrad Rzeszutek Wilk wrote:
> I was talking with Suresh on IRC and I realized that some of these patches
> that did this were posted some time ago, so I dusted them off and redid them
> just a tiny bit.
>
> Ingo, please take a look - and if you are OK with them please put in them
> the v3.4 queue - if you are not comfortable, then the above patch will have to
> suffice as a work-around for the regression.
>
> arch/x86/include/asm/io_apic.h | 31 +++++++++++++++++++++++++++++--
> arch/x86/include/asm/x86_init.h | 8 ++++++++
> arch/x86/kernel/apic/io_apic.c | 10 ++++++----
> arch/x86/kernel/setup.c | 2 +-
> arch/x86/kernel/x86_init.c | 8 ++++++++
> arch/x86/xen/Makefile | 2 +-
> arch/x86/xen/apic.c | 17 +++++++++++++++++
> arch/x86/xen/enlighten.c | 2 ++
> arch/x86/xen/xen-ops.h | 4 ++++
> 9 files changed, 76 insertions(+), 8 deletions(-)
> Jeremy Fitzhardinge (1):
> x86: add io_apic_ops to allow interception
>
> Konrad Rzeszutek Wilk (2):
> x86/apic_ops: Replace apic_ops with x86_apic_ops.
> xen/x86: Implement x86_apic_ops
>

Ingo, Konrad was mentioning you didn't like this pieces before.

But for what Xen is trying to do, either we have to pursue this or
untangle the gsi and ioapic dependency.

I am ok with this path for now.

Acked-by: Suresh Siddha <suresh....@intel.com>

Yinghai Lu

unread,
Mar 20, 2012, 10:40:02 PM3/20/12
to
On Tue, Mar 20, 2012 at 5:12 PM, Konrad Rzeszutek Wilk
<konra...@oracle.com> wrote:
> . which makes the code fit within the rest of the x86_ops functions.
> +
> +struct x86_apic_ops x86_apic = {
> +       .init = native_ioapic_init_mappings,
> +       .read = native_ioapic_read,
> +       .write = native_ioapic_write,
> +       .modify = native_ioapic_modify,
> +};

x86_ioapic_ops and x86_ioapic would be better?

Yinghai

Konrad Rzeszutek Wilk

unread,
Mar 21, 2012, 12:30:01 PM3/21/12
to
On Tue, Mar 20, 2012 at 07:31:51PM -0700, Yinghai Lu wrote:
> On Tue, Mar 20, 2012 at 5:12 PM, Konrad Rzeszutek Wilk
> <konra...@oracle.com> wrote:
> > . which makes the code fit within the rest of the x86_ops functions.
> > +
> > +struct x86_apic_ops x86_apic = {
> > +       .init = native_ioapic_init_mappings,
> > +       .read = native_ioapic_read,
> > +       .write = native_ioapic_write,
> > +       .modify = native_ioapic_modify,
> > +};
>
> x86_ioapic_ops and x86_ioapic would be better?

It certainly would. Thank you for the suggestion!

Konrad Rzeszutek Wilk

unread,
Mar 21, 2012, 12:50:02 PM3/21/12
to
ok, that would be just
return 0x0000000 | apic

right? Since the last 8 bits are the apic value?
>
> > +
> > + return 0xfd;
>
> 0xff might be better.

<nods> Done!

Suresh Siddha

unread,
Mar 21, 2012, 3:10:01 PM3/21/12
to
On Wed, 2012-03-21 at 12:41 -0400, Konrad Rzeszutek Wilk wrote:
> > > +unsigned int xen_io_apic_read(unsigned apic, unsigned reg)
> > > +{
> > > + if (reg == 0x1)
> > > + return 0x00170020;
> > > + else if (reg == 0x0)
> > > + return 0x00000000;
> >
> > For IO-APIC ID register (0), you can probably use the argument 'apic' to
> > set the ID bits in the returned register.
>
> ok, that would be just
> return 0x0000000 | apic
>
> right? Since the last 8 bits are the apic value?

It is the most significant 8 bits.

So you need to return (apic << 24);

thanks,
suresh

Konrad Rzeszutek Wilk

unread,
Mar 21, 2012, 11:10:01 PM3/21/12
to
Or rather just implement one different function as opposed
to the native one : the read function.

We synthesize the values.

Acked-by: Suresh Siddha <suresh....@intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konra...@oracle.com>
---
arch/x86/xen/Makefile | 2 +-
arch/x86/xen/apic.c | 17 +++++++++++++++++
arch/x86/xen/enlighten.c | 2 ++
arch/x86/xen/xen-ops.h | 4 ++++
4 files changed, 24 insertions(+), 1 deletions(-)
create mode 100644 arch/x86/xen/apic.c

diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile
index add2c2d..96ab2c0 100644
--- a/arch/x86/xen/Makefile
+++ b/arch/x86/xen/Makefile
@@ -20,5 +20,5 @@ obj-$(CONFIG_EVENT_TRACING) += trace.o
obj-$(CONFIG_SMP) += smp.o
obj-$(CONFIG_PARAVIRT_SPINLOCKS)+= spinlock.o
obj-$(CONFIG_XEN_DEBUG_FS) += debugfs.o
-obj-$(CONFIG_XEN_DOM0) += vga.o
+obj-$(CONFIG_XEN_DOM0) += apic.o vga.o
obj-$(CONFIG_SWIOTLB_XEN) += pci-swiotlb-xen.o
diff --git a/arch/x86/xen/apic.c b/arch/x86/xen/apic.c
new file mode 100644
index 0000000..71ed91c
--- /dev/null
+++ b/arch/x86/xen/apic.c
@@ -0,0 +1,17 @@
+#include <linux/init.h>
+#include <asm/x86_init.h>
+
+unsigned int xen_io_apic_read(unsigned apic, unsigned reg)
+{
+ if (reg == 0x1)
+ return 0x00170020;
+ else if (reg == 0x0)
+ return apic << 24;
+
+ return 0xff;
+}
+
+void __init xen_init_apic(void)
+{
+ x86_ioapic.read = xen_io_apic_read;
+}
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 0732326..93a03195 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1377,6 +1377,8 @@ asmlinkage void __init xen_start_kernel(void)
xen_start_info->console.domU.mfn = 0;
xen_start_info->console.domU.evtchn = 0;

+ xen_init_apic();
+
/* Make sure ACS will be enabled */
pci_request_acs();

diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index b095739..45c0c06 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -92,11 +92,15 @@ struct dom0_vga_console_info;

#ifdef CONFIG_XEN_DOM0
void __init xen_init_vga(const struct dom0_vga_console_info *, size_t size);
+void __init xen_init_apic(void);
#else
static inline void __init xen_init_vga(const struct dom0_vga_console_info *info,
size_t size)
{
}
+static inline void __init xen_init_apic(void)
+{
+}
#endif

/* Declare an asm function, along with symbols needed to make it
--
1.7.7.5

Konrad Rzeszutek Wilk

unread,
Mar 21, 2012, 11:10:01 PM3/21/12
to
From: Jeremy Fitzhardinge <jeremy.fi...@citrix.com>

Xen dom0 needs to paravirtualize IO operations to the IO APIC, so add
a io_apic_ops for it to intercept. Do this as ops structure because
there's at least some chance that another paravirtualized environment
may want to intercept these.

[ Impact: indirect IO APIC access via io_apic_ops ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fi...@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konra...@oracle.com>
Acked-by: Suresh Siddha <suresh....@intel.com>

Konrad Rzeszutek Wilk

unread,
Mar 21, 2012, 11:10:02 PM3/21/12
to
> > Ingo, please take a look - and if you are OK with them please put in them
> > the v3.4 queue - if you are not comfortable, then the above patch will have to
> > suffice as a work-around for the regression.
> >
> > arch/x86/include/asm/io_apic.h | 31 +++++++++++++++++++++++++++++--
> > arch/x86/include/asm/x86_init.h | 8 ++++++++
> > arch/x86/kernel/apic/io_apic.c | 10 ++++++----
> > arch/x86/kernel/setup.c | 2 +-
> > arch/x86/kernel/x86_init.c | 8 ++++++++
> > arch/x86/xen/Makefile | 2 +-
> > arch/x86/xen/apic.c | 17 +++++++++++++++++
> > arch/x86/xen/enlighten.c | 2 ++
> > arch/x86/xen/xen-ops.h | 4 ++++
> > 9 files changed, 76 insertions(+), 8 deletions(-)
> > Jeremy Fitzhardinge (1):
> > x86: add io_apic_ops to allow interception
> >
> > Konrad Rzeszutek Wilk (2):
> > x86/apic_ops: Replace apic_ops with x86_apic_ops.
> > xen/x86: Implement x86_apic_ops
> >
>
> Ingo, Konrad was mentioning you didn't like this pieces before.
>
> But for what Xen is trying to do, either we have to pursue this or
> untangle the gsi and ioapic dependency.
>
> I am ok with this path for now.
>
> Acked-by: Suresh Siddha <suresh....@intel.com>

Redid them per feedback. Please consider them for 3.4. Thanks!

Konrad Rzeszutek Wilk

unread,
Mar 21, 2012, 11:10:01 PM3/21/12
to
. which makes the code fit within the rest of the x86_ops functions.

Signed-off-by: Konrad Rzeszutek Wilk <konra...@oracle.com>
[v1: Changed x86_apic -> x86_ioapic per Yinghai Lu <yin...@kernel.org> suggestion]
Acked-by: Suresh Siddha <suresh....@intel.com>
---
arch/x86/include/asm/io_apic.h | 40 +++++++++++++++++++++--------
arch/x86/include/asm/x86_init.h | 8 ++++++
arch/x86/kernel/apic/io_apic.c | 54 ++++----------------------------------
arch/x86/kernel/setup.c | 2 +-
arch/x86/kernel/x86_init.c | 8 ++++++
5 files changed, 52 insertions(+), 60 deletions(-)

diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
index 190d8c2..ba1b11a 100644
--- a/arch/x86/include/asm/io_apic.h
+++ b/arch/x86/include/asm/io_apic.h
@@ -5,6 +5,7 @@
#include <asm/mpspec.h>
#include <asm/apicdef.h>
#include <asm/irq_vectors.h>
+#include <asm/x86_init.h>

/*
* Intel IO-APIC support for SMP and UP systems.
@@ -21,15 +22,6 @@
#define IO_APIC_REDIR_LEVEL_TRIGGER (1 << 15)
#define IO_APIC_REDIR_MASKED (1 << 16)

-struct io_apic_ops {
- void (*init)(void);
- unsigned int (*read)(unsigned int apic, unsigned int reg);
- void (*write)(unsigned int apic, unsigned int reg, unsigned int value);
- void (*modify)(unsigned int apic, unsigned int reg, unsigned int value);
-};
-
-void __init set_io_apic_ops(const struct io_apic_ops *);
-
/*
* The structure of the IO-APIC:
*/
@@ -156,7 +148,6 @@ struct io_apic_irq_attr;
extern int io_apic_set_pci_routing(struct device *dev, int irq,
struct io_apic_irq_attr *irq_attr);
void setup_IO_APIC_irq_extra(u32 gsi);
-extern void ioapic_and_gsi_init(void);
extern void ioapic_insert_resources(void);

int io_apic_setup_irq_pin_once(unsigned int irq, int node, struct io_apic_irq_attr *attr);
@@ -185,12 +176,35 @@ extern void mp_save_irq(struct mpc_intsrc *m);

extern void disable_ioapic_support(void);

+
+void __init native_ioapic_init_mappings(void);
+unsigned int native_ioapic_read(unsigned int apic, unsigned int reg);
+void native_ioapic_write(unsigned int apic, unsigned int reg,
+ unsigned int val);
+void native_ioapic_modify(unsigned int apic, unsigned int reg,
+ unsigned int val);
+
+static inline unsigned int io_apic_read(unsigned int apic, unsigned int reg)
+{
+ return x86_ioapic.read(apic, reg);
+}
+
+static inline void io_apic_write(unsigned int apic, unsigned int reg,
+ unsigned int value)
+{
+ x86_ioapic.write(apic, reg, value);
+}
+
+static inline void io_apic_modify(unsigned int apic, unsigned int reg,
+ unsigned int value)
+{
+ x86_ioapic.modify(apic, reg, value);
+}
#else /* !CONFIG_X86_IO_APIC */

#define io_apic_assign_pci_irqs 0
#define setup_ioapic_ids_from_mpc x86_init_noop
static const int timer_through_8259 = 0;
-static inline void ioapic_and_gsi_init(void) { }
static inline void ioapic_insert_resources(void) { }
#define gsi_top (NR_IRQS_LEGACY)
static inline int mp_find_ioapic(u32 gsi) { return 0; }
@@ -212,6 +226,10 @@ static inline int restore_ioapic_entries(void)

static inline void mp_save_irq(struct mpc_intsrc *m) { };
static inline void disable_ioapic_support(void) { }
+#define native_ioapic_init_mappings NULL
+#define native_ioapic_read NULL
+#define native_ioapic_write NULL
+#define native_ioapic_modify NULL
#endif

#endif /* _ASM_X86_IO_APIC_H */
diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index 517d476..a3730cc 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -182,10 +182,18 @@ struct x86_msi_ops {
void (*restore_msi_irqs)(struct pci_dev *dev, int irq);
};

+struct x86_ioapic_ops {
+ void (*init)(void);
+ unsigned int (*read)(unsigned int apic, unsigned int reg);
+ void (*write)(unsigned int apic, unsigned int reg, unsigned int value);
+ void (*modify)(unsigned int apic, unsigned int reg, unsigned int value);
+};
+
extern struct x86_init_ops x86_init;
extern struct x86_cpuinit_ops x86_cpuinit;
extern struct x86_platform_ops x86_platform;
extern struct x86_msi_ops x86_msi;
+extern struct x86_ioapic_ops x86_ioapic;

extern void x86_init_noop(void);
extern void x86_init_uint_noop(unsigned int unused);
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index bf120234..9a15d4b 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -67,25 +67,6 @@
#define for_each_irq_pin(entry, head) \
for (entry = head; entry; entry = entry->next)

-static void __init __ioapic_init_mappings(void);
-static unsigned int __io_apic_read(unsigned int apic, unsigned int reg);
-static void __io_apic_write(unsigned int apic, unsigned int reg,
- unsigned int val);
-static void __io_apic_modify(unsigned int apic, unsigned int reg,
- unsigned int val);
-
-static struct io_apic_ops io_apic_ops = {
- .init = __ioapic_init_mappings,
- .read = __io_apic_read,
- .write = __io_apic_write,
- .modify = __io_apic_modify,
-};
-
-void __init set_io_apic_ops(const struct io_apic_ops *ops)
-{
- io_apic_ops = *ops;
-}
-
/*
* Is the SiS APIC rmw bug present ?
* -1 = don't know, 0 = no, 1 = yes
@@ -313,24 +294,6 @@ static void free_irq_at(unsigned int at, struct irq_cfg *cfg)
irq_free_desc(at);
}

-static inline unsigned int io_apic_read(unsigned int apic, unsigned int reg)
-{
- return io_apic_ops.read(apic, reg);
-}
-
-static inline void io_apic_write(unsigned int apic, unsigned int reg,
- unsigned int value)
-{
- io_apic_ops.write(apic, reg, value);
-}
-
-static inline void io_apic_modify(unsigned int apic, unsigned int reg,
- unsigned int value)
-{
- io_apic_ops.modify(apic, reg, value);
-}
-
-
struct io_apic {
unsigned int index;
unsigned int unused[3];
@@ -351,15 +314,15 @@ static inline void io_apic_eoi(unsigned int apic, unsigned int vector)
writel(vector, &io_apic->eoi);
}

-static unsigned int __io_apic_read(unsigned int apic, unsigned int reg)
+unsigned int native_ioapic_read(unsigned int apic, unsigned int reg)
{
struct io_apic __iomem *io_apic = io_apic_base(apic);
writel(reg, &io_apic->index);
return readl(&io_apic->data);
}

-static void __io_apic_write(unsigned int apic, unsigned int reg,
- unsigned int value)
+void native_ioapic_write(unsigned int apic, unsigned int reg,
+ unsigned int value)
{
struct io_apic __iomem *io_apic = io_apic_base(apic);
writel(reg, &io_apic->index);
@@ -372,8 +335,8 @@ static void __io_apic_write(unsigned int apic, unsigned int reg,
*
* Older SiS APIC requires we rewrite the index register
*/
-static void __io_apic_modify(unsigned int apic, unsigned int reg,
- unsigned int value)
+void native_ioapic_modify(unsigned int apic, unsigned int reg,
+ unsigned int value)
{
struct io_apic __iomem *io_apic = io_apic_base(apic);

@@ -3910,12 +3873,7 @@ static struct resource * __init ioapic_setup_resources(int nr_ioapics)
return res;
}

-void __init ioapic_and_gsi_init(void)
-{
- io_apic_ops.init();
-}
-
-static void __init __ioapic_init_mappings(void)
+void __init native_ioapic_init_mappings(void)
{
unsigned long ioapic_phys, idx = FIX_IO_APIC_BASE_0;
struct resource *ioapic_res;
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index d7d5099..7eaef1a 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1016,7 +1016,7 @@ void __init setup_arch(char **cmdline_p)
init_cpu_to_node();

init_apic_mappings();
- ioapic_and_gsi_init();
+ x86_ioapic.init();

kvm_guest_init();

diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
index 947a06c..df870d3 100644
--- a/arch/x86/kernel/x86_init.c
+++ b/arch/x86/kernel/x86_init.c
@@ -18,6 +18,7 @@
#include <asm/e820.h>
#include <asm/time.h>
#include <asm/irq.h>
+#include <asm/io_apic.h>
#include <asm/pat.h>
#include <asm/tsc.h>
#include <asm/iommu.h>
@@ -117,3 +118,10 @@ struct x86_msi_ops x86_msi = {
.teardown_msi_irqs = default_teardown_msi_irqs,
.restore_msi_irqs = default_restore_msi_irqs,
};
+
+struct x86_ioapic_ops x86_ioapic = {
+ .init = native_ioapic_init_mappings,
+ .read = native_ioapic_read,
+ .write = native_ioapic_write,
+ .modify = native_ioapic_modify,
+};
--
1.7.7.5

Ingo Molnar

unread,
Mar 23, 2012, 8:30:03 AM3/23/12
to

* Konrad Rzeszutek Wilk <konra...@oracle.com> wrote:

> . which makes the code fit within the rest of the x86_ops functions.
>
> Signed-off-by: Konrad Rzeszutek Wilk <konra...@oracle.com>
> [v1: Changed x86_apic -> x86_ioapic per Yinghai Lu <yin...@kernel.org> suggestion]
> Acked-by: Suresh Siddha <suresh....@intel.com>
> ---
> arch/x86/include/asm/io_apic.h | 40 +++++++++++++++++++++--------
> arch/x86/include/asm/x86_init.h | 8 ++++++
> arch/x86/kernel/apic/io_apic.c | 54 ++++----------------------------------
> arch/x86/kernel/setup.c | 2 +-
> arch/x86/kernel/x86_init.c | 8 ++++++
> 5 files changed, 52 insertions(+), 60 deletions(-)

Ok, I guess we can do this.

Thanks,

Ingo

Konrad Rzeszutek Wilk

unread,
Mar 27, 2012, 10:50:02 AM3/27/12
to
On Fri, Mar 23, 2012 at 01:24:44PM +0100, Ingo Molnar wrote:
>
> * Konrad Rzeszutek Wilk <konra...@oracle.com> wrote:
>
> > . which makes the code fit within the rest of the x86_ops functions.
> >
> > Signed-off-by: Konrad Rzeszutek Wilk <konra...@oracle.com>
> > [v1: Changed x86_apic -> x86_ioapic per Yinghai Lu <yin...@kernel.org> suggestion]
> > Acked-by: Suresh Siddha <suresh....@intel.com>
> > ---
> > arch/x86/include/asm/io_apic.h | 40 +++++++++++++++++++++--------
> > arch/x86/include/asm/x86_init.h | 8 ++++++
> > arch/x86/kernel/apic/io_apic.c | 54 ++++----------------------------------
> > arch/x86/kernel/setup.c | 2 +-
> > arch/x86/kernel/x86_init.c | 8 ++++++
> > 5 files changed, 52 insertions(+), 60 deletions(-)
>
> Ok, I guess we can do this.

Ingo,

I am not sure how your prefer these patches, but I think you like git pull, so please
git pull the following branch:

git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git stable/for-ingo-3.4

which is based of your "x86-urgent-for-linus" branch and has the following patches:

Jeremy Fitzhardinge (1):
x86: add io_apic_ops to allow interception

Konrad Rzeszutek Wilk (2):
x86/apic_ops: Replace apic_ops with x86_apic_ops.
xen/x86: Implement x86_apic_ops


arch/x86/include/asm/io_apic.h | 31 +++++++++++++++++++++++++++++--
arch/x86/include/asm/x86_init.h | 8 ++++++++
arch/x86/kernel/apic/io_apic.c | 10 ++++++----
arch/x86/kernel/setup.c | 2 +-
arch/x86/kernel/x86_init.c | 8 ++++++++
arch/x86/xen/Makefile | 2 +-
arch/x86/xen/apic.c | 17 +++++++++++++++++
arch/x86/xen/enlighten.c | 2 ++
arch/x86/xen/xen-ops.h | 4 ++++
9 files changed, 76 insertions(+), 8 deletions(-)

Please pull!

Ingo Molnar

unread,
Mar 28, 2012, 5:30:02 AM3/28/12
to

* Konrad Rzeszutek Wilk <konra...@oracle.com> wrote:

> From: Jeremy Fitzhardinge <jeremy.fi...@citrix.com>
>
> Xen dom0 needs to paravirtualize IO operations to the IO APIC, so add
> a io_apic_ops for it to intercept. Do this as ops structure because
> there's at least some chance that another paravirtualized environment
> may want to intercept these.
>
> [ Impact: indirect IO APIC access via io_apic_ops ]
>
> Signed-off-by: Jeremy Fitzhardinge <jeremy.fi...@citrix.com>
> Signed-off-by: Konrad Rzeszutek Wilk <konra...@oracle.com>
> Acked-by: Suresh Siddha <suresh....@intel.com>
> ---
> arch/x86/include/asm/io_apic.h | 9 +++++++
> arch/x86/kernel/apic/io_apic.c | 50 +++++++++++++++++++++++++++++++++++++--
> 2 files changed, 56 insertions(+), 3 deletions(-)

Ok, I have applied a cleaned up version of this patch.

The followup patch does not apply anymore - please merge it and
resubmit.

Thanks,

Ingo

tip-bot for Jeremy Fitzhardinge

unread,
Mar 28, 2012, 5:40:03 AM3/28/12
to
Commit-ID: 136d249ef7dbf0fefa292082cc40be1ea864cbd6
Gitweb: http://git.kernel.org/tip/136d249ef7dbf0fefa292082cc40be1ea864cbd6
Author: Jeremy Fitzhardinge <jeremy.fi...@citrix.com>
AuthorDate: Wed, 21 Mar 2012 22:58:08 -0400
Committer: Ingo Molnar <mi...@kernel.org>
CommitDate: Wed, 28 Mar 2012 09:49:29 +0200

x86/ioapic: Add io_apic_ops driver layer to allow interception

Xen dom0 needs to paravirtualize IO operations to the IO APIC,
so add a io_apic_ops for it to intercept. Do this as ops
structure because there's at least some chance that another
paravirtualized environment may want to intercept these.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fi...@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konra...@oracle.com>
Acked-by: Suresh Siddha <suresh....@intel.com>
Cc: jwb...@redhat.com
Cc: yin...@kernel.org
Link: http://lkml.kernel.org/r/1332385090-18056-2-git...@oracle.com
[ Made all the affected code easier on the eyes ]
Signed-off-by: Ingo Molnar <mi...@kernel.org>
---
arch/x86/include/asm/io_apic.h | 9 ++++++
arch/x86/kernel/apic/io_apic.c | 58 +++++++++++++++++++++++++++++++++++-----
2 files changed, 60 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
index 690d1cc..2c4943d 100644
--- a/arch/x86/include/asm/io_apic.h
+++ b/arch/x86/include/asm/io_apic.h
@@ -21,6 +21,15 @@
#define IO_APIC_REDIR_LEVEL_TRIGGER (1 << 15)
#define IO_APIC_REDIR_MASKED (1 << 16)

+struct io_apic_ops {
+ void (*init) (void);
+ unsigned int (*read) (unsigned int apic, unsigned int reg);
+ void (*write) (unsigned int apic, unsigned int reg, unsigned int value);
+ void (*modify)(unsigned int apic, unsigned int reg, unsigned int value);
+};
+
+void __init set_io_apic_ops(const struct io_apic_ops *);
+
/*
* The structure of the IO-APIC:
*/
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 2c428c5..e88300d 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -64,9 +64,28 @@
#include <asm/apic.h>

#define __apicdebuginit(type) static type __init
+
#define for_each_irq_pin(entry, head) \
for (entry = head; entry; entry = entry->next)

+static void __init __ioapic_init_mappings(void);
+
+static unsigned int __io_apic_read (unsigned int apic, unsigned int reg);
+static void __io_apic_write (unsigned int apic, unsigned int reg, unsigned int val);
+static void __io_apic_modify(unsigned int apic, unsigned int reg, unsigned int val);
+
+static struct io_apic_ops io_apic_ops = {
+ .init = __ioapic_init_mappings,
+ .read = __io_apic_read,
+ .write = __io_apic_write,
+ .modify = __io_apic_modify,
+};
+
+void __init set_io_apic_ops(const struct io_apic_ops *ops)
+{
+ io_apic_ops = *ops;
+}
+
/*
* Is the SiS APIC rmw bug present ?
* -1 = don't know, 0 = no, 1 = yes
@@ -294,6 +313,22 @@ static void free_irq_at(unsigned int at, struct irq_cfg *cfg)
irq_free_desc(at);
}

+static inline unsigned int io_apic_read(unsigned int apic, unsigned int reg)
+{
+ return io_apic_ops.read(apic, reg);
+}
+
+static inline void io_apic_write(unsigned int apic, unsigned int reg, unsigned int value)
+{
+ io_apic_ops.write(apic, reg, value);
+}
+
+static inline void io_apic_modify(unsigned int apic, unsigned int reg, unsigned int value)
+{
+ io_apic_ops.modify(apic, reg, value);
+}
+
+
struct io_apic {
unsigned int index;
unsigned int unused[3];
@@ -314,16 +349,17 @@ static inline void io_apic_eoi(unsigned int apic, unsigned int vector)
writel(vector, &io_apic->eoi);
}

-static inline unsigned int io_apic_read(unsigned int apic, unsigned int reg)
+static unsigned int __io_apic_read(unsigned int apic, unsigned int reg)
{
struct io_apic __iomem *io_apic = io_apic_base(apic);
writel(reg, &io_apic->index);
return readl(&io_apic->data);
}

-static inline void io_apic_write(unsigned int apic, unsigned int reg, unsigned int value)
+static void __io_apic_write(unsigned int apic, unsigned int reg, unsigned int value)
{
struct io_apic __iomem *io_apic = io_apic_base(apic);
+
writel(reg, &io_apic->index);
writel(value, &io_apic->data);
}
@@ -334,7 +370,7 @@ static inline void io_apic_write(unsigned int apic, unsigned int reg, unsigned i
*
* Older SiS APIC requires we rewrite the index register
*/
-static inline void io_apic_modify(unsigned int apic, unsigned int reg, unsigned int value)
+static void __io_apic_modify(unsigned int apic, unsigned int reg, unsigned int value)
{
struct io_apic __iomem *io_apic = io_apic_base(apic);

@@ -377,6 +413,7 @@ static struct IO_APIC_route_entry __ioapic_read_entry(int apic, int pin)

eu.w1 = io_apic_read(apic, 0x10 + 2 * pin);
eu.w2 = io_apic_read(apic, 0x11 + 2 * pin);
+
return eu.entry;
}

@@ -384,9 +421,11 @@ static struct IO_APIC_route_entry ioapic_read_entry(int apic, int pin)
{
union entry_union eu;
unsigned long flags;
+
raw_spin_lock_irqsave(&ioapic_lock, flags);
eu.entry = __ioapic_read_entry(apic, pin);
raw_spin_unlock_irqrestore(&ioapic_lock, flags);
+
return eu.entry;
}

@@ -396,8 +435,7 @@ static struct IO_APIC_route_entry ioapic_read_entry(int apic, int pin)
* the interrupt, and we need to make sure the entry is fully populated
* before that happens.
*/
-static void
-__ioapic_write_entry(int apic, int pin, struct IO_APIC_route_entry e)
+static void __ioapic_write_entry(int apic, int pin, struct IO_APIC_route_entry e)
{
union entry_union eu = {{0, 0}};

@@ -409,6 +447,7 @@ __ioapic_write_entry(int apic, int pin, struct IO_APIC_route_entry e)
static void ioapic_write_entry(int apic, int pin, struct IO_APIC_route_entry e)
{
unsigned long flags;
+
raw_spin_lock_irqsave(&ioapic_lock, flags);
__ioapic_write_entry(apic, pin, e);
raw_spin_unlock_irqrestore(&ioapic_lock, flags);
@@ -435,8 +474,7 @@ static void ioapic_mask_entry(int apic, int pin)
* shared ISA-space IRQs, so we have to support them. We are super
* fast in the common case, and fast for shared ISA-space IRQs.
*/
-static int
-__add_pin_to_irq_node(struct irq_cfg *cfg, int node, int apic, int pin)
+static int __add_pin_to_irq_node(struct irq_cfg *cfg, int node, int apic, int pin)
{
struct irq_pin_list **last, *entry;

@@ -521,6 +559,7 @@ static void io_apic_sync(struct irq_pin_list *entry)
* a dummy read from the IO-APIC
*/
struct io_apic __iomem *io_apic;
+
io_apic = io_apic_base(entry->apic);
readl(&io_apic->data);
}
@@ -3894,6 +3933,11 @@ static struct resource * __init ioapic_setup_resources(int nr_ioapics)

void __init ioapic_and_gsi_init(void)
{
+ io_apic_ops.init();
+}
+
+static void __init __ioapic_init_mappings(void)
+{
unsigned long ioapic_phys, idx = FIX_IO_APIC_BASE_0;
struct resource *ioapic_res;
int i;
--

Konrad Rzeszutek Wilk

unread,
Mar 28, 2012, 1:50:01 PM3/28/12
to
Done. I've put them on: git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git stable/for-ingo-3.4.v2

and I am also attaching them here for your convenience.

They differ a bit from the previous one - the 'x86_io_apic' is now
called 'x86_io_apic_ops' - otherwise the warning of functions referencing
the __init function would show up - but otherwise they are exactly the same.

Konrad Rzeszutek Wilk

unread,
Mar 28, 2012, 1:50:01 PM3/28/12
to
Or rather just implement one different function as opposed
to the native one : the read function.

We synthesize the values.

Acked-by: Suresh Siddha <suresh....@intel.com>
[v1: Rebased on top of tip/x86/urgent]
Signed-off-by: Konrad Rzeszutek Wilk <konra...@oracle.com>
---
arch/x86/xen/Makefile | 2 +-
arch/x86/xen/apic.c | 17 +++++++++++++++++
arch/x86/xen/enlighten.c | 2 ++
arch/x86/xen/xen-ops.h | 4 ++++
4 files changed, 24 insertions(+), 1 deletions(-)
create mode 100644 arch/x86/xen/apic.c

diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile
index add2c2d..96ab2c0 100644
--- a/arch/x86/xen/Makefile
+++ b/arch/x86/xen/Makefile
@@ -20,5 +20,5 @@ obj-$(CONFIG_EVENT_TRACING) += trace.o
obj-$(CONFIG_SMP) += smp.o
obj-$(CONFIG_PARAVIRT_SPINLOCKS)+= spinlock.o
obj-$(CONFIG_XEN_DEBUG_FS) += debugfs.o
-obj-$(CONFIG_XEN_DOM0) += vga.o
+obj-$(CONFIG_XEN_DOM0) += apic.o vga.o
obj-$(CONFIG_SWIOTLB_XEN) += pci-swiotlb-xen.o
diff --git a/arch/x86/xen/apic.c b/arch/x86/xen/apic.c
new file mode 100644
index 0000000..aee16ab
--- /dev/null
+++ b/arch/x86/xen/apic.c
@@ -0,0 +1,17 @@
+#include <linux/init.h>
+#include <asm/x86_init.h>
+
+unsigned int xen_io_apic_read(unsigned apic, unsigned reg)
+{
+ if (reg == 0x1)
+ return 0x00170020;
+ else if (reg == 0x0)
+ return apic << 24;
+
+ return 0xff;
+}
+
+void __init xen_init_apic(void)
+{
+ x86_io_apic_ops.read = xen_io_apic_read;
+}
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 4172af8..a7df080 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1271,6 +1271,8 @@ asmlinkage void __init xen_start_kernel(void)
xen_start_info->console.domU.mfn = 0;
xen_start_info->console.domU.evtchn = 0;

+ xen_init_apic();
+
/* Make sure ACS will be enabled */
pci_request_acs();
}
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index b095739..45c0c06 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -92,11 +92,15 @@ struct dom0_vga_console_info;

#ifdef CONFIG_XEN_DOM0
void __init xen_init_vga(const struct dom0_vga_console_info *, size_t size);
+void __init xen_init_apic(void);
#else
static inline void __init xen_init_vga(const struct dom0_vga_console_info *info,
size_t size)
{
}
+static inline void __init xen_init_apic(void)
+{
+}
#endif

/* Declare an asm function, along with symbols needed to make it
--
1.7.7.5

Konrad Rzeszutek Wilk

unread,
Mar 28, 2012, 1:50:02 PM3/28/12
to
Which makes the code fit within the rest of the x86_ops functions.

Acked-by: Suresh Siddha <suresh....@intel.com>
[v1: Changed x86_apic -> x86_ioapic per Yinghai Lu <yin...@kernel.org> suggestion]
[v2: Rebased on tip/x86/urgent and redid to match Ingo's syntax style]
Signed-off-by: Konrad Rzeszutek Wilk <konra...@oracle.com>
---
arch/x86/include/asm/io_apic.h | 35 +++++++++++++++++++----------
arch/x86/include/asm/x86_init.h | 9 ++++++-
arch/x86/kernel/apic/io_apic.c | 46 +++-----------------------------------
arch/x86/kernel/setup.c | 2 +-
arch/x86/kernel/x86_init.c | 8 ++++++
5 files changed, 44 insertions(+), 56 deletions(-)

diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
index 2c4943d..73d8c53 100644
--- a/arch/x86/include/asm/io_apic.h
+++ b/arch/x86/include/asm/io_apic.h
@@ -5,7 +5,7 @@
#include <asm/mpspec.h>
#include <asm/apicdef.h>
#include <asm/irq_vectors.h>
-
+#include <asm/x86_init.h>
/*
* Intel IO-APIC support for SMP and UP systems.
*
@@ -21,15 +21,6 @@
#define IO_APIC_REDIR_LEVEL_TRIGGER (1 << 15)
#define IO_APIC_REDIR_MASKED (1 << 16)

-struct io_apic_ops {
- void (*init) (void);
- unsigned int (*read) (unsigned int apic, unsigned int reg);
- void (*write) (unsigned int apic, unsigned int reg, unsigned int value);
- void (*modify)(unsigned int apic, unsigned int reg, unsigned int value);
-};
-
-void __init set_io_apic_ops(const struct io_apic_ops *);
-
/*
* The structure of the IO-APIC:
*/
@@ -156,7 +147,6 @@ struct io_apic_irq_attr;
extern int io_apic_set_pci_routing(struct device *dev, int irq,
struct io_apic_irq_attr *irq_attr);
void setup_IO_APIC_irq_extra(u32 gsi);
-extern void ioapic_and_gsi_init(void);
extern void ioapic_insert_resources(void);

int io_apic_setup_irq_pin_once(unsigned int irq, int node, struct io_apic_irq_attr *attr);
@@ -185,12 +175,29 @@ extern void mp_save_irq(struct mpc_intsrc *m);

extern void disable_ioapic_support(void);

+extern void __init native_io_apic_init_mappings(void);
+extern unsigned int native_io_apic_read(unsigned int apic, unsigned int reg);
+extern void native_io_apic_write(unsigned int apic, unsigned int reg, unsigned int val);
+extern void native_io_apic_modify(unsigned int apic, unsigned int reg, unsigned int val);
+
+static inline unsigned int io_apic_read(unsigned int apic, unsigned int reg)
+{
+ return x86_io_apic_ops.read(apic, reg);
+}
+
+static inline void io_apic_write(unsigned int apic, unsigned int reg, unsigned int value)
+{
+ x86_io_apic_ops.write(apic, reg, value);
+}
+static inline void io_apic_modify(unsigned int apic, unsigned int reg, unsigned int value)
+{
+ x86_io_apic_ops.modify(apic, reg, value);
+}
#else /* !CONFIG_X86_IO_APIC */

#define io_apic_assign_pci_irqs 0
#define setup_ioapic_ids_from_mpc x86_init_noop
static const int timer_through_8259 = 0;
-static inline void ioapic_and_gsi_init(void) { }
static inline void ioapic_insert_resources(void) { }
#define gsi_top (NR_IRQS_LEGACY)
static inline int mp_find_ioapic(u32 gsi) { return 0; }
@@ -212,6 +219,10 @@ static inline int restore_ioapic_entries(void)

static inline void mp_save_irq(struct mpc_intsrc *m) { };
static inline void disable_ioapic_support(void) { }
+#define native_io_apic_init_mappings NULL
+#define native_io_apic_read NULL
+#define native_io_apic_write NULL
+#define native_io_apic_modify NULL
#endif

#endif /* _ASM_X86_IO_APIC_H */
diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index 517d476..3a0f04a 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -182,11 +182,18 @@ struct x86_msi_ops {
void (*restore_msi_irqs)(struct pci_dev *dev, int irq);
};

+struct x86_io_apic_ops {
+ void (*init) (void);
+ unsigned int (*read) (unsigned int apic, unsigned int reg);
+ void (*write) (unsigned int apic, unsigned int reg, unsigned int value);
+ void (*modify)(unsigned int apic, unsigned int reg, unsigned int value);
+};
+
extern struct x86_init_ops x86_init;
extern struct x86_cpuinit_ops x86_cpuinit;
extern struct x86_platform_ops x86_platform;
extern struct x86_msi_ops x86_msi;
-
+extern struct x86_io_apic_ops x86_io_apic_ops;
extern void x86_init_noop(void);
extern void x86_init_uint_noop(unsigned int unused);
extern void x86_default_fixup_cpu_id(struct cpuinfo_x86 *c, int node);
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index e88300d..973539c 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -68,24 +68,6 @@
#define for_each_irq_pin(entry, head) \
for (entry = head; entry; entry = entry->next)

-static void __init __ioapic_init_mappings(void);
-
-static unsigned int __io_apic_read (unsigned int apic, unsigned int reg);
-static void __io_apic_write (unsigned int apic, unsigned int reg, unsigned int val);
-static void __io_apic_modify(unsigned int apic, unsigned int reg, unsigned int val);
-
-static struct io_apic_ops io_apic_ops = {
- .init = __ioapic_init_mappings,
- .read = __io_apic_read,
- .write = __io_apic_write,
- .modify = __io_apic_modify,
-};
-
-void __init set_io_apic_ops(const struct io_apic_ops *ops)
-{
- io_apic_ops = *ops;
-}
-
/*
* Is the SiS APIC rmw bug present ?
* -1 = don't know, 0 = no, 1 = yes
@@ -313,21 +295,6 @@ static void free_irq_at(unsigned int at, struct irq_cfg *cfg)
irq_free_desc(at);
}

-static inline unsigned int io_apic_read(unsigned int apic, unsigned int reg)
-{
- return io_apic_ops.read(apic, reg);
-}
-
-static inline void io_apic_write(unsigned int apic, unsigned int reg, unsigned int value)
-{
- io_apic_ops.write(apic, reg, value);
-}
-
-static inline void io_apic_modify(unsigned int apic, unsigned int reg, unsigned int value)
-{
- io_apic_ops.modify(apic, reg, value);
-}
-

struct io_apic {
unsigned int index;
@@ -349,14 +316,14 @@ static inline void io_apic_eoi(unsigned int apic, unsigned int vector)
writel(vector, &io_apic->eoi);
}

-static unsigned int __io_apic_read(unsigned int apic, unsigned int reg)
+unsigned int native_io_apic_read(unsigned int apic, unsigned int reg)
{
struct io_apic __iomem *io_apic = io_apic_base(apic);
writel(reg, &io_apic->index);
return readl(&io_apic->data);
}

-static void __io_apic_write(unsigned int apic, unsigned int reg, unsigned int value)
+void native_io_apic_write(unsigned int apic, unsigned int reg, unsigned int value)
{
struct io_apic __iomem *io_apic = io_apic_base(apic);

@@ -370,7 +337,7 @@ static void __io_apic_write(unsigned int apic, unsigned int reg, unsigned int va
*
* Older SiS APIC requires we rewrite the index register
*/
-static void __io_apic_modify(unsigned int apic, unsigned int reg, unsigned int value)
+void native_io_apic_modify(unsigned int apic, unsigned int reg, unsigned int value)
{
struct io_apic __iomem *io_apic = io_apic_base(apic);

@@ -3931,12 +3898,7 @@ static struct resource * __init ioapic_setup_resources(int nr_ioapics)
return res;
}

-void __init ioapic_and_gsi_init(void)
-{
- io_apic_ops.init();
-}
-
-static void __init __ioapic_init_mappings(void)
+void __init native_io_apic_init_mappings(void)
{
unsigned long ioapic_phys, idx = FIX_IO_APIC_BASE_0;
struct resource *ioapic_res;
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 8863888..944288e 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1022,7 +1022,7 @@ void __init setup_arch(char **cmdline_p)
init_cpu_to_node();

init_apic_mappings();
- ioapic_and_gsi_init();
+ x86_io_apic_ops.init();

kvm_guest_init();

diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
index 947a06c..53771f1 100644
--- a/arch/x86/kernel/x86_init.c
+++ b/arch/x86/kernel/x86_init.c
@@ -18,6 +18,7 @@
#include <asm/e820.h>
#include <asm/time.h>
#include <asm/irq.h>
+#include <asm/io_apic.h>
#include <asm/pat.h>
#include <asm/tsc.h>
#include <asm/iommu.h>
@@ -117,3 +118,10 @@ struct x86_msi_ops x86_msi = {
.teardown_msi_irqs = default_teardown_msi_irqs,
.restore_msi_irqs = default_restore_msi_irqs,
};
+
+struct x86_io_apic_ops x86_io_apic_ops = {
+ .init = native_io_apic_init_mappings,
+ .read = native_io_apic_read,
+ .write = native_io_apic_write,
+ .modify = native_io_apic_modify,
+};
--
1.7.7.5

Lin Ming

unread,
Apr 11, 2012, 12:50:01 AM4/11/12
to
Why is fixed value returned?

Can't we implement it with hypercall(PHYSDEVOP_apic_read)?

Lin Ming

Konrad Rzeszutek Wilk

unread,
Apr 16, 2012, 12:00:02 PM4/16/12
to
We sure can. But this was meant to fix the regression - so I
figured we would do that later on. But the more important question
is - do we actually need to do the hypercall? We aren't using the
IOAPIC for anything - we don't program the pins from dom0.

Zhang, Xiantao

unread,
Apr 17, 2012, 4:00:01 AM4/17/12
to
> > > +       if (reg == 0x1)
> > > +               return 0x00170020;
> > > +       else if (reg == 0x0)
> > > +               return apic << 24;
> >
> > Why is fixed value returned?
> >
> > Can't we implement it with hypercall(PHYSDEVOP_apic_read)?
>
> We sure can. But this was meant to fix the regression - so I figured we would
> do that later on. But the more important question is - do we actually need to
> do the hypercall? We aren't using the IOAPIC for anything - we don't program
> the pins from dom0.

Actually, dom0 doesn't programe ioapic pins directly as you said, but the work is done by dom0 in the indirect way. For dom0, I added a hypercall PHYSDEVOP_setup_gsi before, and with this hypercall, we can get rid of programing IOAPIC pins from dom0 and hypervisor helps to do the ioapic programming, but we still need to get correct GSI information from hypervisor, otherwise, the hypercall may meet issues.
Xiantao


> >
> > Lin Ming
> >
> > > +
> > > +       return 0xff;
> > > +}
> > > +
> > > +void __init xen_init_apic(void)
> > > +{
> > > +       x86_io_apic_ops.read = xen_io_apic_read; }
> > > diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> > > index 4172af8..a7df080 100644
> > > --- a/arch/x86/xen/enlighten.c
> > > +++ b/arch/x86/xen/enlighten.c
> > > @@ -1271,6 +1271,8 @@ asmlinkage void __init xen_start_kernel(void)
> > >                xen_start_info->console.domU.mfn = 0;
> > >                xen_start_info->console.domU.evtchn = 0;
> > >
> > > +               xen_init_apic();
> > > +
> > >                /* Make sure ACS will be enabled */
> > >                pci_request_acs();
> > >        }
> > > diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h index
> > > b095739..45c0c06 100644
> > > --- a/arch/x86/xen/xen-ops.h
> > > +++ b/arch/x86/xen/xen-ops.h
> > > @@ -92,11 +92,15 @@ struct dom0_vga_console_info;
> > >
> > >  #ifdef CONFIG_XEN_DOM0
> > >  void __init xen_init_vga(const struct dom0_vga_console_info *,
> > > size_t size);
> > > +void __init xen_init_apic(void);
> > >  #else
> > >  static inline void __init xen_init_vga(const struct
> > > dom0_vga_console_info *info,
> > >                                       size_t size)
> > >  {
> > >  }
> > > +static inline void __init xen_init_apic(void) { }

Konrad Rzeszutek Wilk

unread,
Apr 18, 2012, 5:10:01 PM4/18/12
to
> > > > +unsigned int xen_io_apic_read(unsigned apic, unsigned reg) {
> > > > +       if (reg == 0x1)
> > > > +               return 0x00170020;
> > > > +       else if (reg == 0x0)
> > > > +               return apic << 24;
> > >
> > > Why is fixed value returned?
> > >
> > > Can't we implement it with hypercall(PHYSDEVOP_apic_read)?
> >
> > We sure can. But this was meant to fix the regression - so I figured we would
> > do that later on. But the more important question is - do we actually need to
> > do the hypercall? We aren't using the IOAPIC for anything - we don't program
> > the pins from dom0.
>
> Actually, dom0 doesn't programe ioapic pins directly as you said, but the work is done by dom0 in the indirect way. For dom0, I added a hypercall PHYSDEVOP_setup_gsi before, and with this hypercall, we can get rid of programing IOAPIC pins from dom0 and hypervisor helps to do the ioapic programming, but we still need to get correct GSI information from hypervisor, otherwise, the hypercall may meet issues.

The GSI information (I think of the polarity and trigger, and as well
which PCI devices pins correspond to what interrupt) is retrieved via the ACPI _PRT.

The IO-APIC isn't used at all - instead the code in arch/x86/pci/xen.c is utilized
which does the appropiate lookup and invokes the proper hypercall to bind an
PIRQ (GSI) to an event channel (and the event channel is then binded to the Linux
IRQ subsystem).

The hypercall - PHYSDEVOP_apic_read can be implemented here - and I am
more than open for folks to submit a patch for it - but it does not affect
how GSI/IO-APIC programming is done. This will just report proper IO-APIC
information instead of the fabricated one.

Lin Ming

unread,
Apr 20, 2012, 5:40:01 AM4/20/12
to
On Thu, Apr 19, 2012 at 5:03 AM, Konrad Rzeszutek Wilk
<konra...@oracle.com> wrote:
>> > > > +unsigned int xen_io_apic_read(unsigned apic, unsigned reg) {
>> > > > +       if (reg == 0x1)
>> > > > +               return 0x00170020;
>> > > > +       else if (reg == 0x0)
>> > > > +               return apic << 24;
>> > >
>> > > Why is fixed value returned?
>> > >
>> > > Can't we implement it with hypercall(PHYSDEVOP_apic_read)?
>> >
>> > We sure can. But this was meant to fix the regression - so I figured we would
>> > do that later on. But the more important question is - do we actually need to
>> > do the hypercall? We aren't using the IOAPIC for anything - we don't program
>> > the pins from dom0.
>>
>>  Actually, dom0 doesn't programe ioapic pins directly as you said,  but the work is done by dom0 in the indirect way.  For dom0,  I added a hypercall PHYSDEVOP_setup_gsi before, and with this hypercall, we can get rid of programing IOAPIC pins from dom0 and hypervisor helps to do the ioapic programming, but we still need to get correct GSI information from hypervisor, otherwise, the hypercall may meet issues.
>
> The GSI information (I think of the polarity and trigger, and as well
> which PCI devices pins correspond to what interrupt) is retrieved via the ACPI _PRT.
>
> The IO-APIC isn't used at all - instead the code in arch/x86/pci/xen.c is utilized
> which does the appropiate lookup and invokes the proper hypercall to bind an
> PIRQ (GSI) to an event channel (and the event channel is then binded to the Linux
> IRQ subsystem).
>
> The hypercall - PHYSDEVOP_apic_read can be implemented here - and I am
> more than open for folks to submit a patch for it - but it does not affect
> how GSI/IO-APIC programming is done. This will just report proper IO-APIC
> information instead of the fabricated one.

Here is the patch.
I tested it against your stable/for-ingo-v3.5.v3 branch and it works well.

[PATCH] xen/apic: implement io apic read with hypercall
https://lkml.org/lkml/2012/4/20/92

Regards,
Lin Ming
0 new messages