
How are applications run on hyper-threading enabled multi-core machines?


jerry

Feb 3, 2011, 1:08:51 PM

Hi,

I'm trying to gain a better understanding of how hyper-threading-enabled
multi-core processors work. Let's say I have an app that can be compiled
with MPI, OpenMP, or MPI+OpenMP. I wonder how it will be scheduled on a
CentOS 5.3 box with four Xeon X7560 @ 2.27GHz processors, each core with
Hyper-Threading enabled.

The processors are numbered from 0 to 63 in /proc/cpuinfo. As I
understand it, there are FOUR 8-core physical processors, so there are 32
PHYSICAL CORES in total; since each core has Hyper-Threading enabled,
there are 64 LOGICAL processors in total.
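One way to confirm the 4 × 8 × 2 arithmetic is to count distinct (physical id, core id) pairs in /proc/cpuinfo. A minimal Python sketch, assuming the usual Linux x86 field names "processor", "physical id" and "core id"; the helper name parse_cpuinfo is hypothetical:

```python
from collections import namedtuple

Topology = namedtuple("Topology", "sockets physical_cores logical_cpus")

def parse_cpuinfo(text):
    """Count sockets, physical cores and logical CPUs in /proc/cpuinfo text.

    A physical core is identified by the pair (physical id, core id);
    hyper-threaded siblings share that pair but have distinct
    "processor" numbers.
    """
    logical = 0
    sockets = set()
    cores = set()
    record = {}
    # Records are separated by blank lines; the extra "" flushes the last one.
    for line in text.splitlines() + [""]:
        if not line.strip():
            if "processor" in record:
                logical += 1
                sock = record.get("physical id", "0")
                sockets.add(sock)
                # Without a "core id" field, treat each logical CPU as its own core.
                cores.add((sock, record.get("core id", record["processor"])))
            record = {}
            continue
        key, _, value = line.partition(":")
        record[key.strip()] = value.strip()
    return Topology(len(sockets), len(cores), logical)
```

Run against open("/proc/cpuinfo").read() on the machine described, it should report 4 sockets, 32 physical cores and 64 logical CPUs if the reasoning above holds. Core ids are not necessarily contiguous, which is why the sketch counts distinct pairs instead of taking the largest id.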


1. Compiled with MPICH2
How many physical cores will be used if I run with mpirun -np 16?
Is the work divided among 16 PHYSICAL cores or among 16 LOGICAL
processors (8 PHYSICAL cores using hyper-threading)?

2. Compiled with OpenMP
How many physical cores will be used if I set OMP_NUM_THREADS=16? Will
it use 16 LOGICAL processors?

3. Compiled with MPICH2+OpenMP
How many physical cores will be used if I set OMP_NUM_THREADS=16 and
run with mpirun -np 16?

4. Compiled with OpenMPI

OpenMPI has two runtime options:

-cpu-set, which specifies the logical CPUs allocated to the job, and
-cpus-per-proc, which specifies the number of CPUs to use for each process.

If I run with mpirun -np 16 -cpu-set 0-15, will it use only 8 PHYSICAL
cores?
If I run with mpirun -np 16 -cpu-set 0-31 -cpus-per-proc 2, how will it
be scheduled?
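Whether a CPU list such as 0-15 covers 8 or 16 physical cores depends on how the kernel numbers the hyper-thread siblings, which can be read from Linux sysfs. A minimal sketch, assuming the standard /sys/devices/system/cpu/cpuN/topology files (physical_package_id, core_id); the helper names are hypothetical:

```python
import glob
import os

def parse_cpu_list(spec):
    """Expand a Linux-style CPU list such as "0-15" or "0-3,8,10-11"."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus

def read_core_map(sysfs="/sys/devices/system/cpu"):
    """Map each online logical CPU to its (package id, core id) via sysfs."""
    core_map = {}
    for path in glob.glob(os.path.join(sysfs, "cpu[0-9]*", "topology")):
        cpu = int(os.path.basename(os.path.dirname(path))[3:])  # "cpu12" -> 12
        with open(os.path.join(path, "physical_package_id")) as f:
            pkg = int(f.read())
        with open(os.path.join(path, "core_id")) as f:
            core = int(f.read())
        core_map[cpu] = (pkg, core)
    return core_map

def physical_cores_in(spec, core_map):
    """Number of distinct physical cores covered by a CPU list."""
    return len({core_map[cpu] for cpu in parse_cpu_list(spec)})
```

For example, if the kernel numbers all 32 cores first and their siblings as 32-63, then 0-15 covers 16 physical cores; if siblings are numbered adjacently (0/1, 2/3, ...), the same list covers only 8. physical_cores_in("0-15", read_core_map()) tells you which case applies on a given box.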

Thanks

Jerry

Heiko Bauke

Feb 4, 2011, 11:57:18 PM

Hi,

On 03 Feb 2011 18:08:51 GMT
"jerry" <jerr...@gmail.com> wrote:

> 1. Compiled with MPICH2
> How many physical cores will be used if I run with mpirun -np 16?
> Is the work divided among 16 PHYSICAL cores or among 16 LOGICAL
> processors (8 PHYSICAL cores using hyper-threading)?
>
> 2. Compiled with OpenMP
> How many physical cores will be used if I set OMP_NUM_THREADS=16? Will
> it use 16 LOGICAL processors?
>
> 3. Compiled with MPICH2+OpenMP
> How many physical cores will be used if I set OMP_NUM_THREADS=16 and
> run with mpirun -np 16?
>
> 4. Compiled with OpenMPI
>
> OpenMPI has two runtime options:
>
> -cpu-set, which specifies the logical CPUs allocated to the job, and
> -cpus-per-proc, which specifies the number of CPUs to use for each process.
>
> If I run with mpirun -np 16 -cpu-set 0-15, will it use only 8 PHYSICAL
> cores?
> If I run with mpirun -np 16 -cpu-set 0-31 -cpus-per-proc 2, how will it
> be scheduled?

In all cases it depends on the OS's scheduler, and it may vary from run
to run. You can set the processor affinity to make the mapping of
processes to cores deterministic. (I assume -cpu-set and -cpus-per-proc
do just this.)
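On Linux, setting the affinity can be sketched with Python's os.sched_setaffinity, which wraps sched_setaffinity(2); from a shell, taskset -c 0-15 ./app does the same for a whole program (./app is a placeholder). The helper pin_to_cpus is hypothetical, and for MPI ranks the launcher's own binding options are usually the more convenient tool:

```python
import os

def pin_to_cpus(cpus, pid=0):
    """Restrict a process (0 = the calling process) to the given logical CPUs.

    Linux-only. Returns the mask the kernel actually applied, read back
    with os.sched_getaffinity.
    """
    os.sched_setaffinity(pid, set(cpus))
    return os.sched_getaffinity(pid)
```

Threads and child processes created after the call inherit the mask, so pinning a process before it starts its worker threads confines those threads as well.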

In my experience, hyper-threading on multicore systems has, at best, no
effect on the performance of HPC applications. For many programs there is
serious performance degradation due to hyper-threading and scheduling
issues. However, some applications may benefit from hyper-threading.
Always check by benchmarking.


Heiko

--
-- There are two things you should never count on: that the evil you do
-- will stay hidden, and that the good you do will be noticed.
-- (Ludwig Fulda, German playwright, 1862-1939)
-- Number Crunch Blog @ http://numbercrunch.de


Joe Btfsplk

Feb 14, 2011, 2:31:49 PM

1) 16 logical processors, probably spread out as one process per core. I
would guess that, in the absence of any other busy processes, most O/Ses
will allocate the processes round-robin: thread 1 on CPU1/core1, thread 2
on CPU2/core1, thread 3 on CPU1/core2, thread 4 on CPU2/core2, etc. The
hyperthreads will be allocated likewise, but will come into play only
after you have launched 32 processes (once every core is already running
a thread).

2) 16 logical processors again. I suspect the thread distribution will
also be the same.

3) Each of the 16 heavyweight MPI processes will enter the OpenMP section
of the code and spawn 16 threads, for 16 × 16 = 256 threads in total,
four times the 64 logical processors available. If OMP_NUM_THREADS=2, you
would get 32 threads (16 MPI × 2 OMP each).

4) Frankly, you may have to experiment to know what's happening. CPU sets
are not always supported, or implemented the same way, on all O/Ses,
especially in the presence of hyperthreaded CPUs. The way to find out:
launch jobs with 1, 2, and then 3 threads, giving each thread the same
amount of work. If the wall-clock time is the same for all, then the O/S
is placing each thread on a different core (almost certainly the case).
Then run jobs of 32 vs. 33 threads (32 threads keep all 8 cores of all
four CPUs busy). The 33-thread job should be measurably slower, since the
33rd thread must share a core with another thread as a hyperthread, and
the job finishes only when its slowest threads do.
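The thread arithmetic in item 3) can be checked with a few lines of Python, assuming the 32-core, 64-logical-CPU machine from the question (hybrid_load is a hypothetical helper):

```python
def hybrid_load(ranks, omp_threads, physical_cores=32, logical_cpus=64):
    """Total threads in an MPI+OpenMP job and the resulting load per core/CPU.

    Values above 1.0 mean oversubscription: more runnable threads than
    hardware available to run them at the same time.
    """
    total = ranks * omp_threads
    return {
        "threads": total,
        "per_physical_core": total / physical_cores,
        "per_logical_cpu": total / logical_cpus,
    }
```

hybrid_load(16, 16) gives 256 threads, 8 per physical core and 4 per logical CPU; hybrid_load(16, 2) gives 32 threads, exactly one per physical core.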

This sort of thing is complicated by the background load of O/S processes
that is always present on any computer. To minimize it, make sure you're
the only user of the computer, make multiple runs of each job
configuration, and use the fastest time. You might also find that your
hyperthreading threshold is not 32 threads but 31 or 30, because 1 or 2
O/S processes may already be running.
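A minimal Python sketch of that experiment, under stated assumptions: it uses processes rather than threads (CPython threads do not execute Python code in parallel because of the GIL), the "fork" start method (Linux/Unix only), and a toy integer loop as the fixed per-process workload. busy_work and wall_time are hypothetical names:

```python
import multiprocessing as mp
import time

def busy_work(iterations):
    """A fixed amount of CPU-bound work: a plain integer loop."""
    total = 0
    for i in range(iterations):
        total += i * i
    return total

def wall_time(n_procs, iterations, repeats=3):
    """Best-of-`repeats` wall-clock time for n_procs processes doing equal work."""
    ctx = mp.get_context("fork")  # fork avoids re-importing the main module
    best = float("inf")
    for _ in range(repeats):
        procs = [ctx.Process(target=busy_work, args=(iterations,))
                 for _ in range(n_procs)]
        start = time.perf_counter()
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        # Keep the fastest run to filter out background O/S noise.
        best = min(best, time.perf_counter() - start)
    return best
```

Calling wall_time(n, 2_000_000) for n = 1, 2, 3, 32 and 33 gives the comparison described above; the absolute numbers depend entirely on the machine and its background load.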

I think you will want OMP_NUM_THREADS=2 in almost all cases here: with
mpirun -np 16 that gives 32 threads, one per physical core, and no core
should ever run more than 2 threads, even with hyperthreading.
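That rule can be sketched as a helper that picks OMP_NUM_THREADS so that ranks × threads does not exceed the physical core count, assuming 32 physical cores on a single node. omp_threads_per_rank and omp_env are hypothetical helpers; how the variable reaches the ranks depends on the MPI launcher (Open MPI's mpirun, for example, has -x OMP_NUM_THREADS to export it):

```python
import os

def omp_threads_per_rank(ranks, physical_cores=32):
    """Largest OMP_NUM_THREADS keeping ranks * threads <= physical cores (at least 1)."""
    return max(1, physical_cores // ranks)

def omp_env(ranks, physical_cores=32):
    """Copy of the current environment with OMP_NUM_THREADS set for `ranks` MPI ranks."""
    env = dict(os.environ)
    env["OMP_NUM_THREADS"] = str(omp_threads_per_rank(ranks, physical_cores))
    return env
```

The result can be passed as the env argument of subprocess.run(["mpirun", "-np", "16", "./app"], ...), where ./app is a placeholder; for 16 ranks it sets OMP_NUM_THREADS=2.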

And in general, I would avoid using CPU sets. Let the O/S decide where
to put the processes. If you guess wrong and force a process onto a core
that the O/S is already using, your runtime will suffer significantly.

BTW, hyperthreading is not really multiprocessing. It's just a
latency-hiding technique Intel uses to improve the throughput of each
core by up to 30%. So if you launch two threads on a dual-core CPU, each
core will usually receive one thread; any other scheduling strategy would
be inefficient.
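The round-robin placement guessed at in item 1), together with the one-thread-per-core preference described here, can be modelled in a few lines. This is an idealized model for illustration only: spread_placement is hypothetical, indices start at 0, the defaults assume the 4-socket, 8-core, 2-way hyper-threaded machine from the question, and the Linux scheduler makes no such guarantee:

```python
def spread_placement(n_threads, sockets=4, cores_per_socket=8, threads_per_core=2):
    """Idealized "spread" placement: (socket, core, hw_thread) per software thread.

    Cycles over sockets fastest, then cores, and only uses a core's second
    hardware thread once every core already has one thread.
    """
    capacity = sockets * cores_per_socket * threads_per_core
    if n_threads > capacity:
        raise ValueError(f"{n_threads} threads exceed {capacity} hardware threads")
    placement = []
    for i in range(n_threads):
        socket = i % sockets
        core = (i // sockets) % cores_per_socket
        hw_thread = i // (sockets * cores_per_socket)
        placement.append((socket, core, hw_thread))
    return placement
```

In this model spread_placement(16) puts every thread on its own physical core, and 33 is the first thread count where a second hardware thread (on socket 0, core 0) is used. With sockets=1 and cores_per_socket=2, two threads land on cores 0 and 1.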

http://en.wikipedia.org/wiki/Hyper-threading

Randy
