
Re: Hyperthreading DECREASES performance? There ought to be a solution


Ignoramus8868

Oct 28, 2009, 10:04:22 AM
On 2009-10-28, Ignoramus27237 <ignoram...@NOSPAM.27237.invalid> wrote:
> On 2009-10-28, Joe <j...@spam.hits-spam-buffalo.com> wrote:
>> And what DO you want? You are using a kernel that predates your CPU.
>> It is NOT optimized to run on your CPU, and will NOT run optimally.
>> If you'd like to test the performance, test it on a kernel that has
>> been built and optimized for the CPU you are running it on.
>
> OK, you convinced me. I will download and compile a new kernel,
> 2.6.31.5. It just might help. In fact I am already compiling it. It
> has hyperthreading support as a default option, which is encouraging.

Well, a new kernel did not help at all. Same problem.

I compiled it last night and tried at work today.

Linux version 2.6.31.5 (root@***) (gcc version 4.2.4 (Ubuntu 4.2.4-1ubuntu4)) #1 SMP Tue Oct 27 23:48:54 CDT 2009

No difference, same poor (and random) allocation of CPUs, with no HT
awareness.

I am not surprised about it because I think that these issues were
worked on years ago. These features either do not exist, or, more
likely, I do not know how to take advantage of them.

i


GuiGui

Oct 28, 2009, 10:25:07 AM
Ignoramus8868 wrote:

>
> What I do NOT know is how to make the scheduler allocate tasks to CPUs
> for maximum performance.

apt-get install schedutils

then

man taskset
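
A minimal sketch of the same idea done from a script instead of the taskset command line: pin a process to one logical CPU per physical core so that two busy threads never share a hyperthreaded core. It assumes a Linux kernel that exposes thread_siblings_list under sysfs and Python 3.3 or later for os.sched_setaffinity. The function names (parse_cpu_list, one_cpu_per_core, read_sibling_lists, pin_to_physical_cores) are made up for this illustration and are not part of schedutils.

```python
import glob
import os


def parse_cpu_list(text):
    """Parse a sysfs CPU list such as "0,4" or "0-3,8" into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def one_cpu_per_core(sibling_lists):
    """Keep the lowest-numbered logical CPU from each group of HT siblings."""
    return sorted({min(siblings) for siblings in sibling_lists if siblings})


def read_sibling_lists(
    pattern="/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list",
):
    """Read the sibling list of every logical CPU from sysfs (Linux only)."""
    lists = []
    for path in glob.glob(pattern):
        with open(path) as f:
            lists.append(parse_cpu_list(f.read()))
    return lists


def pin_to_physical_cores(pid=0):
    """Restrict pid (0 means the calling process) to one logical CPU per core."""
    cpus = one_cpu_per_core(read_sibling_lists())
    if cpus:
        os.sched_setaffinity(pid, cpus)
    return cpus
```

Calling pin_to_physical_cores() before the worker threads start should have roughly the effect of running the program under taskset with a hand-picked CPU list. Whether that actually helps depends on the workload, so it is worth timing both ways.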


Douglas Mayne

Oct 28, 2009, 1:18:06 PM
On Wed, 28 Oct 2009 11:56:54 -0500, Ignoramus8868 wrote:

> On 2009-10-28, Douglas Mayne <do...@sl12.localnet> wrote:
>> On Tue, 27 Oct 2009 17:31:58 -0500, Ignoramus27237 wrote:
>>
>>>
>>> At work, we have some users who run a multithreaded app, and they need
>>> every single bit of performance we can squeeze from computers.
>>>
>>>
>><snip>
>>>
>>> This is Ubuntu Hardy, 2.6.24 kernel.
>>>
>>> Thanks
>>>
>> IME, the 2.6.2x kernel series performance was all over the map*. IMO,
>> you'd be better off investigating the "latest and greatest," beginning
>> with a new kernel in the 2.6.3x series. This is especially true if you
>> are truly seeking the best performance. The kernel's changelog shows major
>> modifications with respect to its scheduler, block io, and filesystem
>> subsystems.
>
> I tried 2.6.31.5, made no difference, same shit exactly. I spent all
> morning googling. Still looking.
>
> i
>
Caveat: I am not running Ubuntu.

If you did not compile the kernel, then you should at least
verify what the critical kernel configuration parameters are for the
kernel that you are using.

CPU architecture
SLAB/SLUB
block io elevator

Note: it is possible to test various block elevators via a boot parameter.
I have had the best result with the anticipatory scheduler, and I avoid
the CFQ. YMMV.
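
To check which elevator is actually in effect after booting with a different parameter, the kernel lists the available schedulers in /sys/block/<device>/queue/scheduler and marks the active one with square brackets. The following is a small sketch that reads that file; the helper names (parse_scheduler_line, read_scheduler) and the device name in the usage note are assumptions for illustration only.

```python
def parse_scheduler_line(text):
    """Split a line such as "noop anticipatory deadline [cfq]" into
    (active, available). active is None if no entry is bracketed."""
    names = text.split()
    available = [name.strip("[]") for name in names]
    active = next(
        (name[1:-1] for name in names if name.startswith("[") and name.endswith("]")),
        None,
    )
    return active, available


def read_scheduler(device):
    """Return (active, available) for a block device such as "sda" (Linux only)."""
    with open(f"/sys/block/{device}/queue/scheduler") as f:
        return parse_scheduler_line(f.read())
```

For example, read_scheduler("sda") on a machine with a disk named sda would report the elevator currently used for that disk, which makes it easy to confirm that the boot parameter took effect before running a benchmark.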

Also, hyperthreading is only one method for increasing performance. AFAIK,
it is less effective than separate "physical" cores. It works by making
sure that processor pipelines are well utilized. For a while it went out
of fashion (i.e. upon transition from Pentium IV to Core 2 architecture).
Hyperthreading is being reintroduced with the Atom and i7 architectures.
What specific hardware is deployed to the power users?

--
Douglas Mayne


General Schvantzkoph

Oct 28, 2009, 2:13:10 PM

I'm surprised that Intel brought hyperthreading back in the Core i7; it
adds complexity without providing much gain. The theory behind
hyperthreading is that you can get higher efficiency by using one pipe to
process two or more unrelated instruction streams. Unrelated streams have
no dependencies between them, so you can schedule instructions based on
available resources without being constrained by the dependencies within
a single stream. This is helpful if you can't use your pipe efficiently
with a single stream.

The hard problem in instruction scheduling is conditional branches, which
are very common in most code. If you have a long pipe and you guess wrong
about the direction of a branch then you end up throwing away a lot of
work. The P4 had a very long pipe, 28 stages in the last incarnations of
that architecture. A wrongly predicted branch would end up costing you a
lot of cycles, probably not all 28 cycles but close to that. If you were
to run two separate streams down that pipe then it would look like two
14 stage pipelines rather than a single 28 stage pipeline. As a result a
wrongly predicted branch is half as expensive, so in theory much less
work would be wasted. The Core i7 has a much shorter pipeline to begin
with, so the cost of a mispredicted branch is less. Also, branch
prediction algorithms have been improved, so the Core i7 guesses wrong
less often. As a result the potential gain from hyperthreading is much
less.

Hyperthreading has a negative side also. Some resources are in short
supply: in the Core i7 this is principally cache, and in the P4 there
weren't enough registers either. If you run more instruction streams
simultaneously you end up contending for the scarce resource. The Core i7
is undercached to begin with. The Core2 had 6M for two cores while the
Core i7 has 8M for four cores. If you also turn on hyperthreading you are
down to 1M per thread. It's actually not as simple as that. It's a common
cache, so every thread is contending for the same cache blocks. If the
cache was much larger this wouldn't be a problem, but it's a pretty small
cache even for four threads, let alone eight, so it is an issue. Imagine
8 people trying to share a two bedroom apartment; they would be throwing
each other's things in the street all the time.

My experiments on my workload have shown that the different effects
mostly cancel each other out, so I keep hyperthreading enabled, but I
don't think it makes more than a 1 or 2% difference either way.
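
The per-thread cache figures in this post are plain division, spelled out in the short sketch below. It uses only the cache sizes and core counts quoted above; the function name cache_per_thread_mb is made up for the example. As the post itself notes, an even split is a simplification, because the cache is shared and the threads compete for the same blocks.

```python
def cache_per_thread_mb(cache_mb, cores, threads_per_core=1):
    """Naive even split of a shared cache across all hardware threads."""
    return cache_mb / (cores * threads_per_core)


core2 = cache_per_thread_mb(6, 2)          # Core2: 6M shared by two cores
i7_no_ht = cache_per_thread_mb(8, 4)       # Core i7: 8M shared by four cores
i7_with_ht = cache_per_thread_mb(8, 4, 2)  # Core i7 with two threads per core

print(core2, i7_no_ht, i7_with_ht)  # prints: 3.0 2.0 1.0
```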


Ignoramus8868

Oct 28, 2009, 2:27:13 PM
On 2009-10-28, Douglas Mayne <do...@sl12.localnet> wrote:
> On Wed, 28 Oct 2009 12:37:33 -0500, Ignoramus8868 wrote:
>
>> On 2009-10-28, Douglas Mayne <do...@sl12.localnet> wrote:
>>> On Wed, 28 Oct 2009 11:56:54 -0500, Ignoramus8868 wrote:
>>>
>>>> On 2009-10-28, Douglas Mayne <do...@sl12.localnet> wrote:
>>>>> On Tue, 27 Oct 2009 17:31:58 -0500, Ignoramus27237 wrote:
>>>>>
>>>>>>
>>>>>> At work, we have some users who run a multithreaded app, and they need
>>>>>> every single bit of performance we can squeeze from computers.
>>>>>>
>>>>>>
>>>>><snip>
>>>>>>
>>>>>> This is Ubuntu Hardy, 2.6.24 kernel.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>> IME, the 2.6.2x kernel series performance was all over the map*. IMO,
>>>>> you'd be better off investigating the "latest and greatest," beginning
>>>>> with a new kernel in the 2.6.3x series. This is especially true if you
>>>>> are truly seeking the best performance. The kernel's changelog shows major
>>>>> modifications with respect to its scheduler, block io, and filesystem
>>>>> subsystems.
>>>>
>>>> I tried 2.6.31.5, made no difference, same shit exactly. I spent all
>>>> morning googling. Still looking.
>>>>
>>>> i
>>>>
>>> Caveat: I am not running Ubuntu.
>>>
>>> If you did not compile the kernel, then you should at least verify
>>> what the critical kernel configuration parameters are for the kernel
>>> that you are using.
>>
>> I did compile it by myself.
>>
>> I noted, later on, that my architecture was set to Pentium 4. I
>> changed it to i586/i686 and will try it again. I do not expect any
>> improvement, but it deserves a try.
>>
> I see something like that under "processor family":
> 586/K5/5x86/6x86/6x86MX.
>
> I don't think I would choose that for my hardware. YMMV. I would choose
> Pentium III- coppermine as the minimum, and when the hardware is
> available, I would compile for Core 2. I am running 2.6.30.8 and I don't
> see anything specific to i7, as of yet.

Yep.

>>
>>> CPU architecture
>>> SLAB/SLUB
>>> block io elevator
>>>
>>> Note: it is possible to test various block elevators via a boot
>>> parameter. I have had the best result with the anticipatory scheduler,
>>> and I avoid the CFQ. YMMV.
>>

>> How would I explore and change those values?
>>
> Append the "elevator=" parameter to your kernel line (if using the grub
> loader). See the file .../Documentation/kernel-parameters.txt in the
> kernel source directory.

Hold on, this is for I/O, my test script does not do any I/O except
for printing something when finishing.

i


General Schvantzkoph

Oct 28, 2009, 5:01:26 PM
On Wed, 28 Oct 2009 16:28:57 -0400, johnny bobby bee wrote:

> General Schvantzkoph wrote:
>> The Core i7 is undercached to begin with. The Core2 had 6M for two cores
>> while the Core i7 has 8M for four cores. If you also turn on
>> hyperthreading you are down to 1M per thread. It's actually not as
>> simple as that. It's a common cache, so every thread is contending for
>> the same cache blocks. If the cache was much larger this wouldn't be a
>> problem, but it's a pretty small cache even for four threads, let alone
>> eight, so it is an issue. Imagine 8 people trying to share a two bedroom
>> apartment; they would be throwing each other's things in the street all
>> the time. My experiments on my workload have shown that the different
>> effects mostly cancel each other out, so I keep hyperthreading enabled,
>> but I don't think it makes more than a 1 or 2% difference either way.
>

> So, for the price difference, would you say the i5 are comparable to the
> i7, since they don't do hyperthreading?

I prefer the Core2. I have a couple of Core2 boxes and a Core i7 box; the
best performer is the newer Core2 (with a 6M cache). I've done extensive
benchmarking on my workload, Verilog simulation and FPGA place and
routes. What I found was that the Core2 outperforms the Core i7 on a
clock for clock basis by about 10% for Verilog simulations (which for me
is the vast majority of my workload). The Core i7 outperforms the Core2
on FPGA builds, clock for clock.

However, I have been able to run my Core2 system at 4GHz, while the
fastest I can get my Core i7 system to run stably is 3.3GHz. My
definition of running is that they are able to run 100% on all cores
simultaneously for days and days without problems; the overclocker sites
define running as being able to boot, which is not the same thing at all.
In both cases I'm using a Thermalright Ultra Extreme 120 CPU cooler,
which is the current king of the cooling hill.

My Core2 is an 8400, my Core i7 is a 920. The Core2 only has two cores,
the Core i7 has four. However, when I run Verilog regressions on all four
cores of the i7 simultaneously the performance drops so much, relative to
two streams, that the throughput on the four cores is no greater than the
two cores in the Core2. The problem with the Core i7, as I've mentioned
before, is the cache, which is both undersized for four cores and has
higher latency than the cache on the Core2. Verilog simulations are
extremely cache sensitive, which is why I'm seeing such terrible results
from the 920. The FPGA tools aren't nearly as sensitive, so they aren't
hurt by the cache architecture of the Core i7.

Core2 motherboards cost half as much as Core i7 motherboards and the
processors are also considerably cheaper. If I were building another
system today I'd base it on the 8500 (which is a little faster than the
8400). I'd stick with the Thermalright heatsinks and I'd pick DDR2 1066
RAM, which is the speed the memory system will run at when you overclock
the CPU.
