I was just reading some stuff over on Intel's site about the new
hyperthreading technology in their new P4 processors. I was wondering if
you had any idea if/when we'd see this stuff in our kernels or not?
Regards,
Jack
would not the SMP-aware kernel automagically do it???
eric...
> On Mon, 18 Nov 2002 01:52:56 UTC, Jack Troughton
> would not the SMP-aware kernel automagically do it???
That has occurred to me... but I figured I'd go to the mountain and find
out for sure:)
No, these aren't physically available processors, they are 'only' logical.
Currently only WinXP and Linux with Kernel > 2.4.18 can use Hyperthreading.
Btw. the abbreviation is SMT (simultaneous multithreading) and not SMP.
Klaus Staedtler
Well, I read some of the docs at developer.intel.com, and it looks to me
like all it does is make the system LOOK like an SMP system. There are
certain optimisations that need to be made to get the most out of
hyperthreading, but even without them it's possible to get benefits.
OTOH, I also know that I know nothing (little Socrates, there;)... I'm
sure that there are some other people here that could shed a lot more
light on the subject.
Regards,
Jack
There's only one FPU (which includes MMX and the like).
A normal SMP OS trying to save and restore processor contexts would
probably make a big mess of things.
The OS has to be smart enough to not schedule two threads using floating
point instructions at the same time.
In general, it probably also has to be dumb enough to make sure two
threads from the same process are running at the same time, to avoid
thrashing the cache to pieces.
It'll be interesting to see real-world benchmarks of how it actually
performs. I predict a very limited scope of better performance.
--
- Mike
Remove 'spambegone.net' and reverse to send e-mail.
:>Hi Scott!
:>
Jack, you must have missed Kim's announcement that the SMP support in eCS
works just dandy with HyperThreading. This according to someone at Intel
that tested it.
Chris Stumpf
C.S.E. Computer Services
Computer Consultant (OS/2, Lan, Wan, CTI)
Serenity Systems Channel Partner
IBM Certified Systems Expert - OS/2 Warp 4
email: cst...@monmouth.com
phone: (732)496-4699
> Jack, you must have missed Kim's announcement that the SMP support in eCS
> works just dandy with HyperThreading. This according to someone at Intel
> that tested it.
As I expected. I had heard of issues with Windows 2000, but the blame lies
solely with Microsoft, as their processor tax, er, licensing, would consider
a single hyperthreaded P4 to be two processors. Therefore, in a dual CPU
workstation with two P4 processors, Windows 2000 Pro would think it had four
installed and not enjoy any benefits.
regards,
- dink
On Mon, 18 Nov 2002 17:16:54 GMT, Stephen Eickhoff wrote:
:
:"Chris Stumpf" <cst...@monmouth.com> wrote in message
:
:
well in the mainframe world, when you run VM, it provides virtual machines
to any other OS (the OS has NO way of knowing if it is running on a
physical or logical processor.)
if intel did it right the same would be TRUE!
eric...
>Hyperthreading:
>1. If it really currently works on OS/2 (including eCS), I'd be
>deeply, deeply shocked.
I've read one claim that it does... but OTOH, it would be in Intel's
interest to make it as transparent as possible to the system.
What I've heard is that it works with the SMP kernel.
As always, testing it is the only way to know for sure... and since
I'm not planning on springing for one of those things anytime in the
near future, my personal test is going to have to wait:)
>2. It's not a kernel change that's required, it's an OS2APIC
>change.
That would make sense, considering what I've read about it on the
Intel developer's site.
>3. It's on my list of things to do in my spare time, but don't hold
>your breath.
:-8 <- cheeks distended, face turning blue:)
>4. Personally, I find it hard to believe that using an SMP kernel
>(including Windoze) and recognizing two logical CPUs is any faster
>than an UNI kernel with one physical one. If there's testing to the
>contrary on the Intel site, I've missed it.
IME, the threading in Windows is not as efficient as the threading
in OS/2. It might be a hardware hack to get around those
limitations...
>5. The discussion about FPUs is bogus. Ignore it. This stuff is
>pretty well understood and handled already in SMP systems. Whether
>or not the other FPU is virtual is immaterial.
Well, that's good... and might help explain why it works with SMP
OS/2. That's assuming that it does actually improve performance, and
not just function without cracking up, of course...
Interesting times:)
--
-------------------------------------------------------------------
* Jack Troughton jake at consultron.ca *
* http://consultron.ca irc.ecomstation.ca *
* Laval Québec Canada news://news.consultron.ca *
-------------------------------------------------------------------
Scott,
Here is part of the original message I received from a friend at Intel (he
was testing the SMP peer-fix for us on his own time - and without his
permission, I would rather not disclose the entire email - just the portion
that's relevant):
============================
I set up a test platform for eCS GA it has
4 - P4 running @ 2.6GHz.
1 - system drive (HPFS) hung off a Adaptec 39160
1 - data drive 160M (JFS) on another Adaptec 39160
this drive is then shared out using peer 2GB RAM
1- Intel Pro 1000 (Gbit over fiber)
Install of eCS smp GA and the smp8603.zip fix.
<snip portion regarding details of the test>
This machine is also capable of HT -- HyperThreading
Are you aware of the HyperThreading Technology ?
This machine is able to present 8 processors to OS/2
when I enable HT. Of course this is a test vehicle only, not GA.
http://www.intel.com/technology/hyperthread/
Has any analysis been done on OS/2 with HT enabled ?
============================
If it's relevant, I can get more details from our friend.
>his machine is able to present 8 processors to OS/2 when I enable HT.
Capable, certainly. But the real question is, does OS/2 *use* the capability. I don't think so,
though I don't say this for sure. If your friend said "I turned on mpcpumon and it showed 8 CPUs",
then I would believe that.
As for the rest (performance comments), it will be interesting to see the benchmarks when (if) I do the
work on the APIC driver.
> As for the rest (performance comments), it will be interesting to see the benchmarks when (if) I do the
> work on the APIC driver.
>
What is APIC???
APIC is the SMP hardware (guessing); it does I/O interrupts and other
things. The PSD driver implements a set of APIs the kernel uses to
interface with the SMP hardware. It provides CPU spin locks and things
like that ...
The PSD=OS2APIC.PSD statement in the config.sys file loads it.
--
Lorne Sunley
Advanced Programmable Interrupt Controller
The logical CPU in an HT P4 has its own, so the system will find 2 APICs
and hence should find 2 CPUs.
As for kernel/PSD stuff, it's mostly about use of the PAUSE instruction
on spinlock waits, which is a NOP with a prefix and hence ignored on all
other x86 CPUs.
Other things, like thread stacks thrashing the cache on Windows because
they are 1MB aligned, are application specific and easily fixable by an
alloca(64*n) in thread n.
Btw what is the stack alignment on OS/2 for different threads in a process?
-=-
Alan
USBGuy wrote:
> . . . the nice thing is that XP Home edition does support HT
Wrong. VM provides a way of querying if an OS runs in a virtual machine or
not.
That's why you can run VM in a VM, but not VM in a VM in a VM.
>
>if intel did it right the same would be TRUE!
>
>eric...
Nick
> On Mon, 18 Nov 2002 20:07:49 GMT, eric w wrote:
>
>> >well in the mainframe world, when you run VM, it provides virtual machines
> >to any other OS (the OS has NO way of knowing if it is running on a
> >physical or logical processor.)
>
> Wrong. VM provides a way of querying if an OS runs in a virtual machine or
> not.
> That's why you can run VM in a VM, but not VM in a VM in a VM.
>
> >
i don't believe i am wrong.
yes there is a way to find out if you are running under VM, BUT this is to
take advantage of various microcode assists, etc.
in fact under VSE one had the option to run VMaware or natively. granted
you pay a price in performance but it should be possible.
eric..
>> >to any other OS (the OS has NO way of knowing if it is running on a
-----------------------------=====================================
>> >physical or logical processor.)
------=====================
>>
>> Wrong. VM provides a way of querying if an OS runs in a virtual machine or
>> not.
>> That's why you can run VM in a VM, but not VM in a VM in a VM.
>>
>> >
>
>i don't believe i am wrong.
>
>yes there is a way to find out if you are running under VM, BUT this is to
-===========================================
>take advantage of various microcode assists, etc.
>
>in fact under VSE one had the option to run VMaware or natively. granted
>you pay a price in performance but it should be possible.
>
>eric..
No need to argue,
Nick
I think the theory is that great lumps of a processor sit idle while
other pieces are working. For example, the FPU will sit idle while the
ALU is working, and vice versa. If you can get more pieces working at
the same time, overall you should see an improvement in throughput, but
it sounds as though you'd need a fairly mixed workload to get the most
out of it.
Graham.
--
*-* Please remove spam free prefix before replying *-*
:>lol os/2 has the same processor tax.. unless you hack the smp kernel/os2apic.*
:>out of a publicly available fixpak and get updates from testcase
:>(which is completely doable :)
:>
:>
:>regards,
:>- dink
:>
Huh? Please explain. As far as I know, the SMP support in eCS or WSeB will
work with up to 64 processors, no shenanigans involved.
:>Hyperthreading:
:>1. If it really currently works on OS/2 (including eCS), I'd be deeply, deeply shocked.
Well, Kim posted that someone at Intel tested eCS with the SMP kernel and it
worked just fine according to them.
:>2. It's not a kernel change that's required, it's an OS2APIC change.
:>3. It's on my list of things to do in my spare time, but don't hold your breath.
:>4. Personally, I find it hard to believe that using an SMP kernel (including Windoze) and recognizing two
:> logical CPUs is any faster than an UNI kernel with one physical one. If there's testing to the contrary
:> on the Intel site, I've missed it.
:>5. The discussion about FPUs is bogus. Ignore it. This stuff is pretty well understood and handled already
:> in SMP systems. Whether or not the other FPU is virtual is immaterial.
:>
:>
> On Mon, 18 Nov 2002 12:39:41 -0500 (EST), dinkmeister wrote:
>
> :>lol os/2 has the same processor tax.. unless you hack the smp kernel/os2apic.*
> :>out of a publicly available fixpak and get updates from testcase
> :>(which is completely doable :)
> :>
> :>
> :>regards,
> :>- dink
> :>
>
> Huh? Please explain. As far as I know, the SMP support in eCS or WSeB will
> work with up to 64 processors, no shenanigans involved.
Well, it was designed to handle up to that many processors, but
AFAIK it's only been tested up to 16.
It's not like machines like that are common, though.
It would be cool to see what it would act like on a machine like
that, though. Imagine, 32 instances of dnetc or seti. You could
pound a lot of data on a machine like that.
A well threaded webserver could probably push a lot of data out too.
Hehehe.:)
Regards,
Jack
Please read Kim's posting more carefully. Someone at Intel has installed
eCS SMP on a HT capable machine, but he didn't say that he turned HT on.
Klaus Staedtler
:>
:>Well, it was designed to handle up to that many processors, but
:>AFAIK it's only been tested up to 16.
:>
:>It's not like machines like that are common, though.
:>
:>It would be cool to see what it would act like on a machine like
:>that, though. Imagine, 32 instances of dnetc or seti. You could
:>pound a lot of data on a machine like that.
:>
:>A well threaded webserver could probably push a lot of data out too.
:>
:>Hehehe.:)
Well, someone reported to Kim that they tested eCS SMP on a 32 way Xeon box
and it worked just fine. BTW, I was sitting next to Kim when he got the
email. We both thought that was very cool. As for a webserver, well the
limitation is not cpu, but I/O. The PCI bus will get flooded way before the
cpu gets saturated on a webserver.
:>
Actually he did. Read it again.
>:>Hyperthreading:
>:>1. If it really currently works on OS/2 (including eCS), I'd be deeply, deeply shocked.
>Well, Kim posted that someone at Intel tested eCS with the SMP kernel and it
>worked just fine according to [him].
Actually, if you read the tester's comments very carefully, you will realize that he did not, in fact,
verify that OS/2 used the logical processors, only that OS/2 runs on a machine that is HT-capable.
This is a VERY different statement.
Further, the comments by "USBGuy" about what the PSD and SMP support are all about are
completely wrong. I don't have time to go into great detail, but there are a number of hardware
abstraction services provided by the PSD, including counting, starting, stopping CPUs, sending
IPIs, and hardware-specific timer services, among other things. Further, there are numerous
SMP-specific changes that have nothing to do with spinlock. Lastly spinlocks are themselves
utterly different from his comments.
This is the last reply I will make in this thread, which now goes into my kill file.
Bottom line:
1. OS/2 does NOT currently make any distinction between a system where the P4 is HT-capable
or not. I will *probably* eventually change os2apic as necessary to use this.
2. I am NOT convinced, Intel marketing aside, that OS/2 will run faster with an SMP kernel using
2 (or 4) logical CPUs as opposed to a UNI kernel with 1 physical CPU. I do not say it won't,
only that it is not obvious.
3. This stuff about apps needing to change to take advantage of HT is crap. Apps need to be
written to take advantage of true threading on an OS with an efficient threads mechanism.
Then, whether the scheduling is done on two physical CPUs or whatever is another story.
I tend to think that HT is a marketing tool, although I find plausible the speculation by the poster who
theorized that HT is a response to poor task-switch performance on Windoze.
>On Mon, 18 Nov 2002 23:41:17 -0500, Jack Troughton wrote:
>
>:>
>:>Well, it was designed to handle up to that many processors, but
>:>AFAIK it's only been tested up to 16.
>:>
>:>It's not like machines like that are common, though.
>:>
>:>It would be cool to see what it would act like on a machine like
>:>that, though. Imagine, 32 instances of dnetc or seti. You could
>:>pound a lot of data on a machine like that.
>:>
>:>A well threaded webserver could probably push a lot of data out too.
>:>
>:>Hehehe.:)
>
>Well, someone reported to Kim that they tested eCS SMP on a 32 way Xeon box
>and it worked just fine. BTW, I was sitting next to Kim when he got the
>email. We both thought that was very cool. As for a webserver, well the
>limitation is not cpu, but I/O. The PCI bus will get flooded way before the
>cpu gets saturated on a webserver.
Well, that's cool. 32 way... for the compensatory CIO;)
The PCI bus would go long before the CPUs... but there are other
buses. With a machine like that, you're not going to be talking
about cheap components anywhere.
At any rate, for me, this is all idle speculation. I'm not going to
be using anything more than a two way for the foreseeable future
anyway:)
I tried to get a confirmation about this. And this is what he said:
==========================================
HT works with all operating systems... I've seen so far. The problem is HT
does not benefit all applications and operating systems. There are some
applications broken by HT.
For applications and OS to gain benefits from HT they generally must be
recompiled and optimized for HT.
Apps that are multi-threaded tend to be the best candidates for benefits of
HT.
==========================================
End of quoted message.
Intel does not test OS/2. He did it on his own time. He did say some
applications got broken - like SDD.
Scott E. Garfinkle wrote:
>>his machine is able to present 8 processors to OS/2 when I enable HT.
>
> Capable, certainly. But the real question is, does OS/2 *use* the capability. I don't think so,
> though I don't say this for sure. If your friend said "I turned on mpcpumon and it showed 8 CPUs",
> then I would believe that.
>
> As for the rest (performance comments), it will be interesting to see the benchmarks when (if) I do the
> work on the APIC driver.
As far as I understand this issue, taking advantage of HT requires
evaluation of the _MAT (multiple APIC table entry) ACPI object resulting
in a filled MADT structure, which in turn contains the required info to
find the additional LAPICs (very basic description, details left out).
An OS/2 ACPI/APIC PSD is in its embryonic stage (trying to figure out
how minimum ACPI support can be implemented efficiently (in terms of
programmer resources) without requiring changes to other parts of OS/2
or drivers).
Ciao,
Dani
>:>Please read Kim's posting more carefully. Someone at Intel has installed
>:>eCS SMP on a HT capable machine, but he didn't say that he turned HT on.
>:>
>:>Klaus Staedtler
>:>
>Actually he did. Read it again.
Okay, okay, guys. I am trying to get another confirmation about this.
A question came up that he might have run it and then "assumed" that OS/2 is
using multiple CPUs because he was running the SMP kernel. I am trying to
get him to look more carefully and reply more specifically whether OS/2 is
actually seeing and using multiple CPUs.
Please stay tuned.
>:>Hyperthreading:
>:>1. If it really currently works on OS/2 (including eCS), I'd be deeply, deeply shocked.
>
>Well, Kim posted that someone at Intel tested eCS with the SMP kernel and it
>worked just fine according to them.
I am working to get a more precise clarification as to what my friend meant
by "it worked".
Okay, so far, this is where we stand.
From our friend at Intel:
==========================
If you have HT enabled processors and HT enabled BIOS.
If you look at the CPU monitor you will see all physical and logical
processors.
OS/2 cannot tell the difference between a physical or a logical processor.
Example:
2- physical cpu
HT enabled --- you will see 4 cpu in the CPU monitor
3- physical cpu
HT enabled -- you will see 6 cpu in the CPU monitor
==========================
So, I believe our friend definitely *did* run eCS/Pro in HT mode and CPU
monitor is showing that OS/2 *is* seeing multiple CPUs. No indication on
performance or anything - and some applications might not run properly (he
only mentioned SDD).
There is still a question why some saw only one CPU when they tried it, and
it appears to be related to the BIOS. If the BIOS is not HT enabled, OS/2
would see only one CPU - that's how I understand it.
> On Tue, 19 Nov 2002 15:37:59 GMT, Jack Troughton wrote:
> :>At any rate, for me, this is all idle speculation. I'm not going to
> :>be using anything more than a two way for the foreseeable future
> :>anyway:)
> :>
> Well, when the AMD Hammer hits the market, there will be 4 way SMP boards
> for it. And from what I hear the manufacturers plan on doing 4 way boards
> for the Athlon too.
A four way Athlon would be very cool. I'd better start saving those
pennies:)
Even with my limited exposure to SMP, I gotta say it's good.
I bet that if Warp had become the dominant OS instead of Windows,
SMP would be a lot more popular today. The platform gets such a HUGE
benefit out of running on an SMP system.
Regards,
Jack
> I tend to think that HT is a marketing tool, although I find
> plausible the speculation by the poster who theorized that HT is
> a response to poor task-switch performance on Windoze.
Why, thank you:)
Regards,
Jack
> From our friend at Intel:
> ==========================
> If you have HT enabled processors and HT enabled BIOS.
> If you look at the CPU monitor you will see all physical and logical
> processors.
> OS/2 cannot tell the difference between a physical or a logical processor.
>
> Example:
>
> 2- physical cpu
> HT enabled --- you will see 4 cpu in the CPU monitor
>
> 3- physical cpu
> HT enabled -- you will see 6 cpu in the CPU monitor
> ==========================
>
> So, I believe our friend definitely *did* run eCS/Pro in HT mode and CPU
> monitor is showing that OS/2 *is* seeing multiple CPUs. No indication on
> performance or anything - and some applications might not run properly (he
> only mentioned SDD).
>
> There is still a question why some saw only one CPU when they tried it, and
> it appears to be related to the BIOS. If the BIOS is not HT enabled, OS/2
> would see only one CPU - that's how I understand it.
It *is* a BIOS issue!
OS2APIC.PSD looks only at the MPS 1.1/1.4 SMP table (considered obsolete by
MS) in the BIOS, not at the ACPI MADT table, to find the number
of CPUs and Local APICs. Some time ago Intel recommended that
manufacturers include only the physical CPUs in the MPT, no matter whether
HT is enabled or not, to avoid confusing "legacy" OSes which supposedly
cannot handle the HT feature. In such cases only the MADT contains the
full info. This seems to be changing now: some vendors seem to include
the logical CPUs instead of just the physical ones in the MPT SMP tables
when HT is enabled. This would make them visible to OS/2.
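A minimal sketch of what "counting CPUs from the MP table" amounts to, written in Python for readability. It assumes the MP configuration table layout from the Intel MultiProcessor Specification 1.4 (44-byte header starting with "PCMP", entry count at offset 34, 20-byte processor entries of type 0, 8-byte entries for the other base types, bit 0 of the CPU flags byte meaning "enabled"). The function name is made up for illustration, and this is not a description of how OS2APIC.PSD is actually coded.

```python
import struct

MP_HEADER_SIZE = 44        # fixed header of the MP configuration table
ENTRY_COUNT_OFFSET = 34    # 16-bit little-endian count of base-table entries
PROCESSOR_ENTRY = 0        # entry type 0 describes one processor / local APIC
PROCESSOR_ENTRY_SIZE = 20
OTHER_ENTRY_SIZE = 8       # bus, I/O APIC and interrupt entries (types 1-4)
CPU_FLAG_ENABLED = 0x01    # bit 0 of the CPU flags byte


def enabled_apic_ids(table):
    """Return the local APIC IDs of all enabled processors listed in the table."""
    if table[:4] != b"PCMP":
        raise ValueError("not an MP configuration table")
    (entry_count,) = struct.unpack_from("<H", table, ENTRY_COUNT_OFFSET)
    ids = []
    offset = MP_HEADER_SIZE
    for _ in range(entry_count):
        if table[offset] == PROCESSOR_ENTRY:
            apic_id = table[offset + 1]
            cpu_flags = table[offset + 3]
            if cpu_flags & CPU_FLAG_ENABLED:
                ids.append(apic_id)
            offset += PROCESSOR_ENTRY_SIZE
        else:
            # Every other base-table entry type is 8 bytes long; skip it.
            offset += OTHER_ENTRY_SIZE
    return ids
```

A PSD that reads only this table can report no more CPUs than the BIOS chose to list in it, which is the whole point above: if the logical CPUs appear only in the MADT, they stay invisible.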
Ciao,
Dani
> 1. OS/2 does NOT currently make any distinction between a system where the P4 is
> HT-capable or not.
As someone else suggested, I think how the BIOS represents the "CPU(s)"
plays a factor in this. I've got access to one of these, and the UNI
kernel works great. I'll have to try the SMP kernel on it and see what
it says...
> 2. I am NOT convinced, Intel marketing aside, that OS/2 will run faster with an
> SMP kernel using 2 (or 4) logical CPUs as opposed to a UNI kernel with 1 physical
I'll try the SysBench tests while I'm at it...
> 3. This stuff about apps needing to change to take advantage of HT is crap. Apps
> need to be written to take advantage of true threading on an OS with an efficient
> threads mechanism.
> Then, whether the scheduling is done on two physical CPUs or whatever is another
> story. I tend to think that HT is a marketing tool, although I find plausible the
> speculation by the poster who theorized that HT is a response to poor task-switch
> performance on Windoze.
I've received some evidence(?) of this:
----------------------
> Does it really just look like two CPU's to applications etc
Yes, in fact even the OS thinks there's two CPU's there unless it's been
told different. e.g. W2K thinks I have two CPU's in the box. (And in
fact since the W2K scheduler is so shitty, it kills the performance
since it's thumping the <snip> thread between "CPU's" unnecessarily.
On a 2ghz P4, I get 50PPS (pixels per second) on our benchmark, on a
3ghz HT P4 I only get 60PPS. Yet when I installed Windows XP Pro on that
same 3ghz P4, I got 80PPS on the same render ... as <snip> explained to
me, XP Pro fixes some problems with the scheduler.
> Or do you have to code stuff specially for it?
Well, there's two things here. Technically, no, as long as your code
already makes use of two or more threads, it in theory can benefit. In
practice, however, if the threads are doing more or less the same thing
and accessing the same memory (e.g. <snip>), then you tend to lose some
of the benefit, since the advantage of hyperthreading is
that if one thread is stalled waiting for RAM, then another thread may
be able to do some work, if the resources it needs are in the cache. If
that's not the case then it doesn't really help.
No, ACPI is for configuration of the HW, which does include the
power states. See Dani's comment about using the old MPS versus the
new ACPI info areas.
Well the APIC setting will let the board use the 24 IRQ lines of
the APIC instead of the 16 of the normal PIC. So you get 8 more IRQs
with that on an OS that supports those.
According to a detailed article about SMT/HT in the German c't mag, the
scheduler was updated to prefer physical over logical CPUs, which you only
see if you have a dual P4 with SMP => 4 vCPUs.
And the main problem is that NT has had a SpinCounter option since NT4SP3
on the critical section, so that a blocked thread can check the critsec
it is blocking on more often instead of waiting to get scheduled with the
next timeslice, which can be 120ms on an NT Server. The code used for that
was basically eating up vCPU cycles/resources on an SMT system.
And the PAUSE instruction was added by Intel so that the wait between the
checks takes longer and the CPU units can be used by the other vCPU in
the CPU.
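The spin-count idea can be sketched roughly like this. It is an illustrative Python model only, not the NT critical section code: `acquire_with_spin` and `spin_count` are made-up names, and Python has no equivalent of PAUSE, so a comment just marks where a native spin loop would issue it.

```python
import threading


def acquire_with_spin(lock, spin_count=4000):
    """Poll the lock up to spin_count times before giving up the CPU.

    Returns the number of failed polls before the lock was obtained.
    """
    for attempt in range(spin_count):
        if lock.acquire(blocking=False):
            return attempt
        # A native implementation would execute PAUSE here, so that the
        # polling thread leaves execution resources to the sibling
        # logical CPU instead of hammering the lock word.
    # Spinning did not help: block and let the scheduler wake us later.
    lock.acquire()
    return spin_count
```

On a real SMP box the polling costs nothing the other CPU could have used; on an SMT chip the two logical CPUs share execution units, which is why the unthrottled polling loop hurts there.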
>
>>Or do you have to code stuff specially for it?
>
>
> Well, there's two things here. Technically, no, as long as your code
> already makes use of two or more threads, it in theory can benefit. In
> practice, however, if the threads are doing more or less the same thing
> and accessing the same memory (e.g. <snip>), then you tend to lose some
> of the benefit, since the advantage of hyperthreading is
> that if one thread is stalled waiting for RAM, then another thread may
> be able to do some work, if the resources it needs are in the cache. If
> that's not the case then it doesn't really help.
Well, you also have to be aware of cache thrashing on an SMT system,
which can hurt performance and doesn't show up on an SMP system, as there
the 2 CPUs don't have a unified cache.
With 1MB aligned stacks in Windows and a 1MB/64kB cache window
(P4/Xeons) you get that easily. In the mag they wrote a benchmark, and
VTune suggested adding alloca n*64 to 1 thread so they are n cachelines
apart. That did result in a speedup from 63.5% to 71.5% (100% is the
speed of the same thread/code on a single CPU system), which is a speedup
from 27% to 43% of the combined workload of both threads.
So if you code with SMT in mind you can benefit further from HT.
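The aliasing argument and the percentages above can be checked with a small Python sketch. It uses a deliberately simplified model (64-byte cache lines and a 64 kB aliasing window, as quoted above); the function names and the example stack addresses are assumptions for illustration, not measurements.

```python
LINE = 64                   # cache line size in bytes
WINDOW = 64 * 1024          # addresses this far apart compete for the same slot
STACK_ALIGN = 1024 * 1024   # thread stacks placed 1 MB apart


def cache_slot(addr, window=WINDOW, line=LINE):
    """Index of the cache line slot an address maps to within one window."""
    return (addr % window) // line


def stack_top(thread_no, stagger=0, base=0x10000000):
    """Hypothetical stack top of thread n, lowered by n * stagger bytes.

    The lowering models the effect of alloca(stagger * n) at thread start.
    """
    return base + thread_no * STACK_ALIGN - thread_no * stagger


def combined_gain(per_thread_speed, threads=2):
    """Throughput gain over one CPU when every thread runs at the given
    fraction of its single-CPU speed."""
    return threads * per_thread_speed - 1.0
```

Without the stagger every stack top lands in the same slot; with a 64-byte stagger per thread the slots are all different. Two threads at 63.5% each give 127% of one CPU (a 27% gain), and at 71.5% each they give 143% (a 43% gain), which matches the figures from the article.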
but else it works fine. only problem i have is a trap in doscall1 when
running the uni kernel. but disabling proc cache until the smp kernel loads
gets around that fine. scitech is great as long as you disable write
combining....but this was a while back, and i haven't checked newer scitech
releases on smp machines.
Scott,
Thank you for sharing with us your knowledge.
Regards,
-=terry (Denver)=-
chu...@attglobal.net
ICQ: 6387625
AIM: terryXela