
For those with P4 SMP problems..


John Baldwin

Sep 18, 2002, 5:02:47 PM

I've hacked some code into acpidump to dump the contents of the ACPI
MADT table which is kind of like mptable for ACPI. It's not real
pretty output though I must admit. :) You can snarf the code from
http://www.FreeBSD.org/~jhb/patches/acpi_dump.tgz. Untar it, then
cd acpi/acpidump ; make ; ./acpidump (last as root) to get the
contents of the MADT table. If people with P4 SMP problems could
run this on one of the machines and send me both the mptable output
and the acpidump output I would appreciate it. Note that the acpidump
stuff will work fine on both stable and current.

--

John Baldwin <j...@FreeBSD.org> <>< http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve!" - http://www.FreeBSD.org/

To Unsubscribe: send mail to majo...@FreeBSD.org
with "unsubscribe freebsd-smp" in the body of the message

John Baldwin

Sep 19, 2002, 11:55:10 AM

On 19-Sep-2002 Hiten Pandya wrote:

> --- John Baldwin <j...@FreeBSD.org> wrote:
>> I've hacked some code into acpidump to dump the contents of the ACPI
>> MADT table which is kind of like mptable for ACPI. It's not real
>> pretty output though I must admit. :) You can snarf the code from
>> http://www.FreeBSD.org/~jhb/patches/acpi_dump.tgz. Untar it, then
>> cd acpi/acpidump ; make ; ./acpidump (last as root) to get the
>> contents of the MADT table. If people with P4 SMP problems could
>> run this on one of the machines and send me both the mptable output
>> and the acpidump output I would appreciate it. Note that the acpidump
>> stuff will work fine on both stable and current.
>
> Hi.
>
> Would it be good if this utility was added to the repository, under tools/
> or something? Just an idea. :-)

acpidump is already in current. :) I just haven't committed these bits yet.

John Baldwin

Sep 19, 2002, 12:00:02 PM

On 18-Sep-2002 Vaclav Musil wrote:
> Hi,
> there are information as you requested.
> Hardware was Intel SE7500CW2, system 4.5-RELEASE-p8 with non-SMP kernel.
>
> Best regards.
>
> Vasek Musil

Well, I've gotten several of these now, and everyone with this problem
seems to have the same exact machine. Someone else said that a slightly
different model machine didn't have this problem. The only difference
I have noted in the various acpidump outputs is that at least one person had
Hyperthreading disabled in the BIOS, since the two HT CPUs (APIC IDs
of 1 and 7) were disabled (see the FLAGS item for the Local APIC entries).
As far as I can tell, the mptable and acpidump output is ok, so I don't
know what is up with this specific machine.
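
For readers following along, here is a minimal sketch of how the FLAGS item of a MADT Local APIC entry can be decoded. The entry layout (type 0, length 8, ACPI processor ID, APIC ID, 32-bit flags with bit 0 meaning "enabled") follows the ACPI specification; the struct and function names are made up for illustration and are not taken from the acpidump patch.

```c
#include <stdint.h>
#include <stdio.h>

#define MADT_TYPE_LOCAL_APIC 0      /* entry type for a Processor Local APIC */
#define MADT_LAPIC_ENABLED   0x1u   /* flags bit 0: processor is usable */

/* Hypothetical in-memory form of one 8-byte Local APIC entry. */
struct madt_local_apic {
    uint8_t  acpi_cpu_id;   /* ACPI processor ID */
    uint8_t  apic_id;       /* local APIC ID */
    uint32_t flags;         /* bit 0 = enabled */
};

/* Decode one raw entry; returns 0 on success, -1 if it is not a LAPIC entry. */
int madt_parse_local_apic(const uint8_t *raw, struct madt_local_apic *out)
{
    if (raw[0] != MADT_TYPE_LOCAL_APIC || raw[1] != 8)
        return -1;
    out->acpi_cpu_id = raw[2];
    out->apic_id = raw[3];
    /* The flags field is stored little-endian in the table. */
    out->flags = (uint32_t)raw[4] | ((uint32_t)raw[5] << 8) |
                 ((uint32_t)raw[6] << 16) | ((uint32_t)raw[7] << 24);
    return 0;
}

/* Non-zero if the BIOS marked this processor as enabled. */
int madt_lapic_is_enabled(const struct madt_local_apic *e)
{
    return (e->flags & MADT_LAPIC_ENABLED) != 0;
}

/* Print one entry in a form loosely similar to a table dump. */
void madt_print_local_apic(const struct madt_local_apic *e)
{
    printf("APIC ID %u (ACPI CPU %u): %s\n", (unsigned)e->apic_id,
           (unsigned)e->acpi_cpu_id,
           madt_lapic_is_enabled(e) ? "ENABLED" : "DISABLED");
}
```

With Hyperthreading switched off in the BIOS, the entries for the sibling logical CPUs would be expected to decode as DISABLED.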

Does anyone have this issue on a machine which is not an Intel
SE7500CW2?

beemern

Sep 20, 2002, 1:01:10 PM

i edited /usr/src/sys/i386/i386/mp_machdep.c, changing the apic loop to
start at x=2 (instead of x=1) since it's apparently at x=1 where the
problem arises, and it boots up fine, saying that it's an SMP motherboard in
dmesg (no errors)

the reason i say that x=1 is the bad apic is because of the panic msg in
the "regular" smp kernel which says:

"AP #1 (PHY #6) failed!"

i'm sure its not this simple, BUT, i'm wondering how to tell for sure if
you really ARE running in SMP or not

even tho the system is booted and running fine, i dont know how to verify
that it is INDEED running in SMP mode
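
One way to check from userland is to ask how many processors the kernel has online. The sketch below uses `sysconf(_SC_NPROCESSORS_ONLN)`, a widely supported extension rather than strict POSIX; whether the libc on the 4.x systems discussed here provides it is an assumption, and `sysctl hw.ncpu` from the shell is the traditional FreeBSD route. The function names are illustrative.

```c
#include <stdio.h>
#include <unistd.h>

/* Number of processors currently online, or -1 if it cannot be determined. */
long online_cpu_count(void)
{
    return sysconf(_SC_NPROCESSORS_ONLN);
}

/* Print a one-line verdict on whether more than one CPU is in use. */
void report_smp(void)
{
    long n = online_cpu_count();

    if (n < 0)
        printf("could not determine the number of online CPUs\n");
    else if (n > 1)
        printf("%ld CPUs online: the kernel is using more than one\n", n);
    else
        printf("1 CPU online: no second processor was started\n");
}
```

Calling `report_smp()` from a small `main()` on a kernel that merely skipped the AP should report a single CPU, even though dmesg says "Multiprocessor motherboard".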

thanks,
nathan

> Well, hmm. You could perhaps try using a hack to force it to start a
> different APIC ID instead, using trial and error to figure out which IDs
> work and which ones don't.

John Baldwin

Sep 20, 2002, 1:31:15 PM

On 20-Sep-2002 beemern wrote:
>
> i edited /usr/src/sys/i386/i386/mp_machdep.c, changing the apic loop to
> start at x=2 (instead of x=1) since it's apparently at x=1 where the
> problem arises, and it boots up fine, saying that it's an SMP motherboard in
> dmesg (no errors)
>
> the reason i say that x=1 is the bad apic is because of the panic msg in
> the "regular" smp kernel which says:
>
> "AP #1 (PHY #6) failed!"
>
> i'm sure its not this simple, BUT, i'm wondering how to tell for sure if
> you really ARE running in SMP or not
>
> even tho the system is booted and running fine, i dont know how to verify
> that it is INDEED running in SMP mode

Well, in the above message, 6 is the APIC ID, not 1. Does it actually
launch the other CPU? Probably not. Your change probably just made it
skip launching the CPU. What you want to do instead is something more
like this:

Index: mp_machdep.c
===================================================================
RCS file: /usr/cvs/src/sys/i386/i386/mp_machdep.c,v
retrieving revision 1.189
diff -u -r1.189 mp_machdep.c
--- mp_machdep.c 28 Aug 2002 23:24:05 -0000 1.189
+++ mp_machdep.c 20 Sep 2002 17:12:11 -0000
@@ -2134,6 +2134,8 @@

/* get the PHYSICAL APIC ID# */
physical_cpu = CPU_TO_ID(logical_cpu);
+ /* XXX: override hack */
+ physical_cpu = 1;

/* calculate the vector */
vector = (boot_addr >> 12) & 0xff;

See if that works. Then try setting it to '2', '3', '4', '5', '6',
'7', '8', '9', '10', '11' and see which ones work and which ones
don't. It may just be that the mptable (and the APIC table) have
the wrong ID for the second CPU. Turn off the Hyperthreading in
the BIOS so it is easier to figure out what the ID of the second
physical processor is.


beemern

Sep 20, 2002, 2:22:07 PM

thanks for the diff

i made the (first) change; setting to 1, and dmesg gives:

--snip--
real memory = 536346624 (523776K bytes)
avail memory = 517992448 (505852K bytes)
Programming 24 pins in IOAPIC #0
IOAPIC #0 intpin 2 -> irq 0
Programming 24 pins in IOAPIC #1
Programming 24 pins in IOAPIC #2
FreeBSD/SMP: Multiprocessor motherboard
cpu0 (BSP): apic id: 0, version: 0x00050014, at 0xfee00000
cpu1 (AP): apic id: 6, version: 0x00000000, at 0xfee00000
io0 (APIC): apic id: 2, version: 0x00178020, at 0xfec00000
io1 (APIC): apic id: 3, version: 0x00178020, at 0xfec80000
io2 (APIC): apic id: 4, version: 0x00178020, at 0xfec80400
Preloaded elf kernel "kernel" at 0xc03b2000.
--snip--

so theres no error, but my inclination is that its still not enabling the
2nd cpu. again, i'm not sure how to verify for SURE whether it is or
not.. i've just been running top and looking for 2 cpu's running

is there something in dmesg i should be looking for in particular?

i'm going to try now by setting physical_cpu to 2..11 as you indicated and
see what happens

thanks
nathan


> Well, in the above message, 6 is the APIC ID, not 1. Does it actually
> launch the other CPU? Probably not. Your change probably just made it
> skip launching the CPU. What you want to do instead is something more
> like this:

John Baldwin

Sep 20, 2002, 2:31:20 PM

On 20-Sep-2002 beemern wrote:
>
> thanks for the diff
>
> i made the (first) change; setting to 1, and dmesg gives:
>
> --snip--
> real memory = 536346624 (523776K bytes)
> avail memory = 517992448 (505852K bytes)
> Programming 24 pins in IOAPIC #0
> IOAPIC #0 intpin 2 -> irq 0
> Programming 24 pins in IOAPIC #1
> Programming 24 pins in IOAPIC #2
> FreeBSD/SMP: Multiprocessor motherboard
> cpu0 (BSP): apic id: 0, version: 0x00050014, at 0xfee00000
> cpu1 (AP): apic id: 6, version: 0x00000000, at 0xfee00000
> io0 (APIC): apic id: 2, version: 0x00178020, at 0xfec00000
> io1 (APIC): apic id: 3, version: 0x00178020, at 0xfec80000
> io2 (APIC): apic id: 4, version: 0x00178020, at 0xfec80400
> Preloaded elf kernel "kernel" at 0xc03b2000.
> --snip--
>
> so theres no error, but my inclination is that its still not enabling the
> 2nd cpu. again, i'm not sure how to verify for SURE whether it is or
> not.. i've just been running top and looking for 2 cpu's running
>
> is there something in dmesg i should be looking for in particular?
>
> i'm going to try now by setting physical_cpu to 2..11 as you indicated and
> see what happens

APIC ID 1 is the second logical CPU (hyperthreading) on the first physical
CPU. Do you have hyperthreading turned on in your BIOS? See if you
can turn it off for these tests. Thanks.


beemern

Sep 20, 2002, 2:46:43 PM

yes, hyperthreading is definitely off in the bios

> APIC ID 1 is the second logical CPU (hyperthreading) on the first physical
> CPU. Do you have hyperthreading turned on in your BIOS? See if you can
> turn it off for these tests.

beemern

Sep 20, 2002, 3:33:53 PM

ok, i edit'd as you indicated:

--snip--


/* get the PHYSICAL APIC ID# */
physical_cpu = CPU_TO_ID(logical_cpu);

printf("CPU_TO_ID of logical_cpu: %d gives physical_cpu %d\n",
logical_cpu,physical_cpu);

/* XXX: override hack divemojo */
physical_cpu = 11;
printf("OVERRIDE physical_cpu to: %d\n",physical_cpu);

/* calculate the vector */
vector = (boot_addr >> 12) & 0xff;

--snip--

and i set physical_cpu to 1,2,3...,11 as you indicated

rebooting after make'ing and install'ing each kernel gave:

--snip--


avail memory = 517992448 (505852K bytes)
Programming 24 pins in IOAPIC #0
IOAPIC #0 intpin 2 -> irq 0
Programming 24 pins in IOAPIC #1
Programming 24 pins in IOAPIC #2

CPU_TO_ID of logical_cpu: 1 gives physical_cpu 6
OVERRIDE physical_cpu to: 3
WOOT WOOT WOOT


AP #1 (PHY# 6) failed!

panic y/n? [y] FreeBSD/SMP: Multiprocessor motherboard


cpu0 (BSP): apic id: 0, version: 0x00050014, at 0xfee00000
cpu1 (AP): apic id: 6, version: 0x00000000, at 0xfee00000
io0 (APIC): apic id: 2, version: 0x00178020, at 0xfec00000
io1 (APIC): apic id: 3, version: 0x00178020, at 0xfec80000
io2 (APIC): apic id: 4, version: 0x00178020, at 0xfec80400
Preloaded elf kernel "kernel" at 0xc03b2000.
--snip--

where the ONLY difference, is the printf of the OVERRIDE physical_cpu,
which was obviously different on each instance since i was changing it in
the code (and just wanted to verify it in dmesg)

basically, EACH kernel edit gave the "panic" msg as indicated in the
snippet
so... any other ideas to try at this point?

thanks
nathan

John Baldwin

Sep 20, 2002, 3:55:42 PM

Grrr. Hmm, might need to do the override in CPU_TO_ID() instead
(overriding for logical cpu 1) so it is consistent everywhere. I'm
curious, if you boot up linux what apic id does it give for the
second CPU?


Jonathan R Feldkamp

Sep 20, 2002, 7:35:54 PM

Ok, my start_ap(...) as it is right now is at the bottom of this file.

I have added to beemern's work. I set both the physical_cpu and the
logical_cpu from 1 to 8, if you think it is necessary i'll go clear up to
15. I added a little debugging around the RESETs and STARTUPs just to
see it, and increased the timeout from 5 seconds to 10 seconds. All of
this had basically no effect. start_ap(...) is always returning 0 because
mp_ncpus is never being incremented.

So, the CPU is never being started, right? I take it an interrupt of some
kind is supposed to increment mp_ncpus? I grepped through all of the code in
/sys/ but couldn't find where it was supposed to happen.

What do you think is going on?

Thanks for your help,
jon

-- snip from dmesg --

Copyright (c) 1992-2002 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 4.6.2-RELEASE-p2 #14: Fri Sep 20 18:20:11 CDT 2002
ro...@osage.telecom.ksu.edu:/usr/src/sys/compile/OSAGE
Timecounter "i8254" frequency 1193182 Hz
CPU: Pentium 4 (2192.89-MHz 686-class CPU)
Origin = "GenuineIntel" Id = 0xf24 Stepping = 4

Features=0x3febfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,<b28>,ACC>


real memory = 536346624 (523776K bytes)

avail memory = 518197248 (506052K bytes)


Programming 24 pins in IOAPIC #0
IOAPIC #0 intpin 2 -> irq 0
Programming 24 pins in IOAPIC #1
Programming 24 pins in IOAPIC #2
CPU_TO_ID of logical_cpu: 1 gives physical_cpu 6

overiding, physical_cpu = 8, logical_cpu = 8
waiting for pending status end (INIT IPI: assert RESET)
done
waiting for pending status end (INIT IPI: deassert RESET)
done
waiting for pending status end (STARTUP IPI)
done
waiting for pending status end (STARTUP IPI {second time})
done
error while waiting for cpu...


AP #1 (PHY# 6) failed!
panic y/n? [y] FreeBSD/SMP: Multiprocessor motherboard
cpu0 (BSP): apic id: 0, version: 0x00050014, at 0xfee00000
cpu1 (AP): apic id: 6, version: 0x00000000, at 0xfee00000
io0 (APIC): apic id: 2, version: 0x00178020, at 0xfec00000
io1 (APIC): apic id: 3, version: 0x00178020, at 0xfec80000
io2 (APIC): apic id: 4, version: 0x00178020, at 0xfec80400

Preloaded elf kernel "kernel" at 0xc0380000.
md0: Malloc disk
Using $PIR table, 17 entries at 0xc00fdeb0

-- end of dmesg snip --


-- snip of mp_machdep.c --
static int
start_ap(int logical_cpu, u_int boot_addr)
{
int physical_cpu;
int vector;
int cpus;
u_long icr_lo, icr_hi;

POSTCODE(START_AP_POST);

/* get the PHYSICAL APIC ID# */
physical_cpu = CPU_TO_ID(logical_cpu);

printf("CPU_TO_ID of logical_cpu: %d gives physical_cpu %d\n",
logical_cpu,physical_cpu);

/* XXX: override hack divemojo */

physical_cpu = 8;
logical_cpu = 8;
printf("overiding, physical_cpu = %d, logical_cpu = %d\n",
physical_cpu, logical_cpu);

/* calculate the vector */
vector = (boot_addr >> 12) & 0xff;

/* used as a watchpoint to signal AP startup */
cpus = mp_ncpus;

/*
* first we do an INIT/RESET IPI this INIT IPI might be run, reseting
* and running the target CPU. OR this INIT IPI might be latched (P5
* bug), CPU waiting for STARTUP IPI. OR this INIT IPI might be
* ignored.
*/

/* setup the address for the target AP */
icr_hi = lapic.icr_hi & ~APIC_ID_MASK;
icr_hi |= (physical_cpu << 24);
lapic.icr_hi = icr_hi;

/* do an INIT IPI: assert RESET */
icr_lo = lapic.icr_lo & 0xfff00000;
lapic.icr_lo = icr_lo | 0x0000c500;

/* wait for pending status end */
printf("waiting for pending status end (INIT IPI: assert RESET)\n");
while (lapic.icr_lo & APIC_DELSTAT_MASK); /* spin */
printf("done\n");

/* do an INIT IPI: deassert RESET */
lapic.icr_lo = icr_lo | 0x00008500;

/* wait for pending status end */
u_sleep(10000); /* wait ~10mS */
printf("waiting for pending status end (INIT IPI: deassert RESET)\n");
while (lapic.icr_lo & APIC_DELSTAT_MASK); /* spin */
printf("done\n");

/*
* next we do a STARTUP IPI: the previous INIT IPI might still be
* latched, (P5 bug) this 1st STARTUP would then terminate
* immediately, and the previously started INIT IPI would continue. OR
* the previous INIT IPI has already run. and this STARTUP IPI will
* run. OR the previous INIT IPI was ignored. and this STARTUP IPI
* will run.
*/

/* do a STARTUP IPI */
lapic.icr_lo = icr_lo | 0x00000600 | vector;
printf("waiting for pending status end (STARTUP IPI)\n");
while (lapic.icr_lo & APIC_DELSTAT_MASK); /* spin */
printf("done\n");
u_sleep(200); /* wait ~200uS */

/*
* finally we do a 2nd STARTUP IPI: this 2nd STARTUP IPI should run IF
* the previous STARTUP IPI was cancelled by a latched INIT IPI. OR
* this STARTUP IPI will be ignored, as only ONE STARTUP IPI is
* recognized after hardware RESET or INIT IPI.
*/

lapic.icr_lo = icr_lo | 0x00000600 | vector;
printf("waiting for pending status end (STARTUP IPI {second time})\n");
while (lapic.icr_lo & APIC_DELSTAT_MASK);
printf("done\n");
u_sleep(200);

/* wait for it to start */
set_apic_timer(10000000);/* == 10 seconds */
while (read_apic_timer())
if (mp_ncpus > cpus)
return 1; /* return SUCCESS */

printf("error while waiting for cpu...\n");
return 0;
}

-- end of mp_machdep.c snip --
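
As a reading aid for the magic numbers in start_ap() above, the sketch below rebuilds the same ICR values from named bit fields. The field positions (delivery mode in bits 8-10, level in bit 14, trigger mode in bit 15, destination APIC ID in bits 24-31 of the high word) follow Intel's local APIC documentation; the macro and function names are invented here and are not the ones used in mp_machdep.c.

```c
#include <stdint.h>

/* Illustrative names for the ICR low-word fields used by start_ap(). */
#define ICR_DELMODE_INIT    0x00000500u  /* delivery mode 101b: INIT */
#define ICR_DELMODE_STARTUP 0x00000600u  /* delivery mode 110b: STARTUP */
#define ICR_LEVEL_ASSERT    0x00004000u  /* bit 14: level = assert */
#define ICR_TRIGGER_LEVEL   0x00008000u  /* bit 15: level-triggered */

/* INIT IPI with RESET asserted: the 0x0000c500 in the code above. */
uint32_t icr_init_assert(void)
{
    return ICR_TRIGGER_LEVEL | ICR_LEVEL_ASSERT | ICR_DELMODE_INIT;
}

/* INIT IPI with RESET deasserted: the 0x00008500 in the code above. */
uint32_t icr_init_deassert(void)
{
    return ICR_TRIGGER_LEVEL | ICR_DELMODE_INIT;
}

/* The STARTUP vector is the 4K page number of the AP boot code. */
uint32_t startup_vector(uint32_t boot_addr)
{
    return (boot_addr >> 12) & 0xff;
}

/* STARTUP IPI: 0x00000600 | vector in the code above. */
uint32_t icr_startup(uint32_t boot_addr)
{
    return ICR_DELMODE_STARTUP | startup_vector(boot_addr);
}

/* The target APIC ID goes in bits 24-31 of the ICR high word. */
uint32_t icr_hi_dest(uint32_t apic_id)
{
    return apic_id << 24;
}
```

Since the destination field holds the physical APIC ID, an IPI addressed to an ID that no processor actually answers to would simply never be received, which would match mp_ncpus never changing.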

Terry Lambert

Sep 20, 2002, 9:07:38 PM

Jonathan R Feldkamp wrote:
> Ok, my start_ap(...) as it is right now is at the bottom of this file.
>
> I have added to beemern's work. I set both the physical_cpu and the
> logical_cpu from 1 to 8, if you think it is necessary i'll go clear up to
> 15. I added a little debugging around the RESETs and STARTUPs just to
> see it, and increased the timeout from 5 seconds to 10 seconds. All of
> this had basically no effect. start_ap(...) is always returning 0 because
> mp_ncpus is never being incremented.
>
> So, the CPU is never being started, right? I take it an interrupt of some
> kind is supposed to increment mp_ncpus? I grepped through all of the code in
> /sys/ but couldn't find where it was supposed to happen.
>
> What do you think is going on?

He already said: the real CPU IDs don't match the APIC IDs from
the BIOS.

My guess as to why the other systems work is that they start the
CPUs in parallel, which basically means that there are matching
CPUs started at the time of the check, but that the ones which
match are not the ones which were started.

If the APIC ID isn't used again, you could make this "work" by
changing the code that calls start_ap() to separate it into a
"start_ap()" and a "check_ap()", and then:

for (i = 0; i < NCPU; i++) {
    try |= 1 << i;
    start_ap(i, ...);
}

/* wait for it to start */

set_apic_timer(10000000);   /* == *up to* 10 seconds */
while (read_apic_timer()) { /* wait for them to start */
    if ((did & try) == try) {
        /* blah blah blah SUCCESS */
        goto go_on_with_life;
    }
}

/* blah blah blah FAILED */

go_on_with_life:

For extra points, you could have each CPU record its real APIC ID
into the array, so more things will work later on.
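
The start-everything-then-check idea can be tried out in userland with a small simulation. This is only a sketch: `struct ap_sim`, `kick_ap()` and `sim_tick()` are hypothetical stand-ins for sending the IPIs and for the APs checking in on their own, and the simulated CPUs deliberately check in highest-numbered first to show that the order does not matter.

```c
#include <stdint.h>

/* Simulated machine state (hypothetical, for illustration only). */
struct ap_sim {
    uint32_t pending;   /* CPUs that were kicked but have not checked in */
    uint32_t started;   /* bit i is set once "CPU i" has checked in */
};

/* Fire and forget: request the start of one AP without waiting for it. */
void kick_ap(struct ap_sim *s, int cpu)
{
    s->pending |= 1u << cpu;
}

/* One unit of simulated time: the highest-numbered pending CPU checks in. */
void sim_tick(struct ap_sim *s)
{
    for (int cpu = 31; cpu >= 0; cpu--) {
        uint32_t bit = 1u << cpu;

        if (s->pending & bit) {
            s->pending &= ~bit;
            s->started |= bit;
            return;
        }
    }
}

/*
 * Kick APs 1..naps (naps must be at most 31), then poll until every one
 * we tried has checked in or max_polls runs out.
 * Returns 1 on success and 0 on timeout.
 */
int start_all_then_check(struct ap_sim *s, int naps, unsigned max_polls)
{
    uint32_t tried = 0;

    for (int cpu = 1; cpu <= naps; cpu++) {
        tried |= 1u << cpu;
        kick_ap(s, cpu);
    }
    for (unsigned i = 0; i < max_polls; i++) {
        if ((s->started & tried) == tried)
            return 1;
        sim_tick(s);
    }
    return (s->started & tried) == tried;
}
```

The success test only asks whether the set of started CPUs covers the set that was tried, so it passes no matter which CPU answers first.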


-- Terry

beemern

Sep 21, 2002, 11:53:05 AM

i'm preparing to start in on Terry Lambert's suggestion, however,
perhaps you (anyone) could clear up a few minor questions..

-he says other systems are "matching CPUs started at the time of the
check"
..matching them with what?

-also, shouldn't our whole exercise of exhaustively hardcoding the apic
for cpu1 from 1 to 11 have found out which one was the REAL one?

-finally, it appears Mr. Lambert is suggesting 2 mutually exclusive
solutions (correct?) ..where the second one ("For extra points...") looks
like the more complete and "right" solution, however, as noted in the
previous question, shouldn't we have hit upon the correct id already by
playing with the physical_cpu and CPU_TO_ID() as i and Mr. Feldkamp have
been?

thanks for any further input/direction you can give.. i'm gonna poke
around in the src and find where the cpu->apic assignments are made
originally and just see what i can see

thanks
nathan

--my previous (rejected) smp list post--


John Baldwin wrote:
>
> Grrr. Hmm, might need to do the override in CPU_TO_ID() instead
> (overriding for logical cpu 1) so it is consistent everywhere. I'm
> curious, if you boot up linux what apic id does it give for the
> second CPU?

fyi, i messed with the CPU_TO_ID() array directly (so it would be
consistent as you suggested) and tried all values from 1 to 11 inclusive
again, with the exact same results as before

is it possible that the 2nd cpu's APIC really IS at 6, and that
the issue is with the 80-some lines of the start_ap() function instead?

a buddy (with the exact same system) and i have dived into this, but
its slow going at this point just to try to figure out what the heck
we're lookin at...

if you think of anything, or need us to run more tests, let us know

thanks
nathan
--end--

Terry Lambert

Sep 21, 2002, 2:48:10 PM

beemern wrote:
> i'm preparing to start in on Terry Lambert's suggestion, however,
> perhaps you (anyone) could clear up a few minor questions..
>
> -he says other systems are "matching CPUs started at the time of the
> check"
> ..matching them with what?

The theory is that the BIOS has the correct information, but in the
wrong order, and FreeBSD cares about the order, but Linux and Windows
do not, because they've performed an additional optimization that lets
them start the APs simultaneously, and a side effect of this is that
they don't care about order of start, they just care *that* they start.


/*
* start each AP in our list
*/
static int
start_all_aps(u_int boot_addr)
{
...
/* start each AP */
for (x = 1; x <= mp_naps; ++x) {
...
bootSTK = &SMP_prvspace[x].idlestack[UPAGES*PAGE_SIZE];
bootAP = x;

/* attempt to start the Application Processor */
...
if (!start_ap(x, boot_addr)) {
...
}
...
/* record its version info */
cpu_apic_versions[x] = cpu_apic_versions[0];

all_cpus |= (1 << x); /* record AP in CPU map */
}
...
/*
* this function starts the AP (application processor) identified
* by the APIC ID 'physicalCpu'. It does quite a "song and dance"
* to accomplish this. This is necessary because of the nuances
* of the different hardware we might encounter. It ain't pretty,
* but it seems to work.
*/


static int
start_ap(int logical_cpu, u_int boot_addr)
{

...


/* get the PHYSICAL APIC ID# */
physical_cpu = CPU_TO_ID(logical_cpu);

...
}


...basically, what is happening here is that there is an iteration
through all of the logical CPUs, which is then used to start the
physical CPUs.

If they are all started at the same time, and the results are
collected before they are compared, as in Linux or Windows NT,
then the results are that they all start.

If they are attempted to be started serially, and the results
are also collected serially, then the result is that they do
not start.

The implication here is clear: serial start fails because the
APIC ID the BIOS claims is assigned to each CPU is not the APIC
ID which was actually assigned to the CPU. But the concurrent
startup works, because the set of IDs known to the BIOS matches
the set of IDs assigned to the CPUs.

So you get the right answer to the "has this started?" question,
but you don't get the answer from the physical CPU you expected.

The serial start depends on getting the correct answer from the
CPU you expected, rather than just getting the correct answer
and not caring about the man behind the curtain.
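
The difference between the two checks can be shown with a toy comparison. This is a sketch of the theory as stated above, which the thread does not confirm; the ID tables are made-up values, and IDs are assumed to be unique and below 32.

```c
#include <stddef.h>
#include <stdint.h>

/* Collapse a table of APIC IDs (each < 32, assumed unique) into a bitmask. */
uint32_t id_set(const uint8_t *ids, size_t n)
{
    uint32_t mask = 0;

    for (size_t i = 0; i < n; i++)
        mask |= 1u << ids[i];
    return mask;
}

/*
 * Serial-style check: slot i of the BIOS table must name exactly the
 * CPU that answers for slot i.
 */
int ids_match_in_order(const uint8_t *bios, const uint8_t *actual, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (bios[i] != actual[i])
            return 0;
    return 1;
}

/*
 * Concurrent-style check: only the set of IDs has to agree, not the
 * position of each ID in the table.
 */
int ids_match_as_set(const uint8_t *bios, const uint8_t *actual, size_t n)
{
    return id_set(bios, n) == id_set(actual, n);
}
```

For a hypothetical BIOS table of {0, 1, 6} and actual IDs of {0, 6, 1}, the in-order check fails while the set check succeeds, which is the situation the theory describes.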

Probably, the canonically correct thing to do would be to start
each CPU with code that reassigns its real APIC ID into the
logical APIC ID, so that there is no longer a mismatch.


> -also, shouldn't our whole exercise of exhaustively hardcoding the apic
> for cpu1 from 1 to 11 have found out which one was the REAL one?

Not really. If you look at the code, there's a bunch of coupled
information. By serially attempting to start it, you assume not
only that the APIC ID that the BIOS erroneously believes to be
correct is used, but that the associated stack and other information
is also known to the processor.

Basically, you aren't going to be able to safely do about 4 things
in the same order, and expect them to work. The start_all_aps()
code needs to be refactored, and the start_ap() code needs to be
broken into between 3 and 5 parts (depending on how you handle
making the APs correspond to the logical APs), and unrolled
so that it can be done concurrently, instead of depending on serial
success.


> -finally, it appears Mr. Lambert is suggesting 2 mutually exclusive
> solutions (correct?) ..where the second one ("For extra points...") looks
> like the more complete and "right" solution, however, as noted in the
> previous question, shouldn't we have hit upon the correct id already by
> playing with the physical_cpu and CPU_TO_ID() as i and Mr. Feldkamp have
> been?
>
> thanks for any further input/direction you can give.. i'm gonna poke
> around in the src and find where the cpu->apic assignments are made
> originally and just see what i can see

You should be able to start everything up, not caring about the
logical vs. physical APIC ID mapping, as long as you start all the
CPUs. That will break, however, if the BIOS doesn't simply
contain the right physical APIC IDs, out of order, or if you need
to send a targeted IPI instead of a broadcast IPI.

So the two "solutions" boil down to correcting the physical/logical
mapping, or reloading the physical APIC ID register. Either one
works, but reloading the register lets you get rid of the logical
and physical indirection (assuming you shove the I/O APIC off to
ID 31, the last ID). How correct, and when correct, these have to
be really depends on how often the logical to physical translation
happens, in order to explicitly signal a CPU. I'd have to read all
the -current code in considerable detail to answer that question, or
just punt, and come up with a fix where the answer to the question
ends up not mattering. That's the way I prefer... 8-).

Rewriting the APIC ID in each auxiliary CPU is a pain in the neck;
the BIOS does it by holding 5 bits worth of pins on each CPU to a
specific value. You can do it, in theory: the code should not need
the BIOS to do the assignment to function, if you don't care about
not starting some CPUs, or starting particular ones... that gets
around all the normal BIOS bugs related to CPU detection, but it's
a much harder problem to solve, since you have to have a free APIC
ID to let you shuffle things around (hence the extra points ;^)).

-- Terry
