Does Loop Fission Work in Single Cores?

kunjaan

May 6, 2009, 3:08:21 AM
When does it make sense to use Loop fission/distribution if I am
compiling for a single core processor?

George Neuner

May 6, 2009, 3:32:49 PM
On Wed, 6 May 2009 00:08:21 -0700 (PDT), kunjaan <kun...@gmail.com>
wrote:

>When does it make sense to use Loop fission/distribution if I am
>compiling for a single core processor?

Well, distribution is meaningless on a single core.

When it makes sense to fission a loop is hardware-specific. Generally
you want to do it when the loop somehow isn't a good fit for the
characteristics of the primary cache - whether for code size, for data
access pattern, or for data-dependent branching within the loop body.

For modern 32-bit and 64-bit CPUs, the primary cache is large enough
for all but really monstrous loops ... it's far more likely that
you'll want to be unrolling or fusing loops than fissioning them.

Fissioning for core distribution is a different question, but the
consideration is not code size but rather how the data access patterns
of (possibly) concurrently executed separate loops will interact.

George

Harold Aptroot

May 7, 2009, 4:23:52 AM
"kunjaan" <kun...@gmail.com> wrote in message

> When does it make sense to use Loop fission/distribution if I am
> compiling for a single core processor?

It's hard to answer questions like this.
I could tell you that it makes sense whenever it speeds up execution, but
such an answer is completely useless.
It can speed up execution though, such as when 2 big arrays are accessed in
the loop body, the work performed on them is not interdependent, and the
arrays are so big that they do not fit in the cache. If that's the case,
then you're probably memory IO bound, since the CPU might not understand
your memory access pattern: usually its prefetcher only knows about forward
linear strided access, and you'd be quickly switching back and forth between
two unrelated places. Confusing the CPU is almost never a good thing. Using
prefetch hints would also help, but the best prefetch offset (and the size
of the prefetched block) depends on the CPU model/brand. Splitting the loop
also works, as does interleaving the arrays (but it's often hard to prove
that it's possible to interleave them without breaking something).
Distributing the loop over 2 or more threads helps if, for example, you have
slow IO in the loop body, or when the CPU has HyperThreading.
Note that these are only examples; there will be more cases in which
fission improves performance.
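
To make the two-array case concrete, here is a minimal sketch in C of the same work written as one fused loop and as two fissioned loops. The function names and the particular operations are made up for illustration, and the `restrict` qualifiers state the assumption that the arrays do not overlap, which is what makes the split legal. Whether the fissioned form is actually faster depends on the CPU and the array sizes, so measure before committing to it.

```c
#include <stddef.h>

/* Fused form (hypothetical example): every iteration touches both
   arrays, so the memory system sees two interleaved access streams. */
void update_fused(double *restrict a, double *restrict b, size_t n,
                  double ka, double kb)
{
    for (size_t i = 0; i < n; i++) {
        a[i] *= ka;   /* work on a[] does not depend on b[] */
        b[i] += kb;   /* work on b[] does not depend on a[] */
    }
}

/* Fissioned form: each loop walks one array linearly from start to end.
   This is only equivalent because the two statements are independent
   and a[] and b[] are assumed not to overlap. */
void update_fissioned(double *restrict a, double *restrict b, size_t n,
                      double ka, double kb)
{
    for (size_t i = 0; i < n; i++)
        a[i] *= ka;

    for (size_t i = 0; i < n; i++)
        b[i] += kb;
}
```

Both functions leave the arrays in the same final state; the only difference is the order in which memory is visited, which is the thing to benchmark on the target CPU.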

Florian Stock

May 7, 2009, 4:27:05 AM
Hello,

kunjaan <kun...@gmail.com> writes:

> When does it make sense to use Loop fission/distribution if I am
> compiling for a single core processor?

Depending on the code, it could improve data locality (= better
utilization of the cache).

Florian
