Loop jamming!?

光頭

unread,

Jun 24, 1997, 3:00:00 AM6/24/97

to

Hi.... :)
I want to ask a question about compiler optimization.
Is there anyone that understands the keyword "loop jamming".
Can you give me the definition (and a example if you can)?
Thanks for your help :)
--
Send compilers articles to comp...@iecc.com,
meta-mail to compiler...@iecc.com.

Steve Simmons

unread,

Jun 30, 1997, 3:00:00 AM6/30/97

to

Loop jamming is very similar to loop unrolling; however, loop
unrolling unrolls the innermost loop in the nest where loop jamming
unrolls an outer loop in the nest and jams the next inner loop
together.

Example:

FOR I= 1, 2
FOR J= 1, N
A(I, J) = B(I, J)
ENDDO
ENDDO

This is a perfect candidate for loop jamming because it can
take advantage of cache locality (assuming the elements in
the first dimension are contiguous).

The result of a loop unroll and jam is...

FOR J = 1, N
A(1, J) = B(1, J)
A(2, J) = B(2, J)
ENDDO

Note, this same optimization can be achieved by first
performing a loop interchange, and then a complete unroll.
However, this is one complete optimization.

Joel Williamson

unread,

Jun 30, 1997, 3:00:00 AM6/30/97

to

> Is there anyone that understands the keyword "loop jamming".

Loop-jamming is often used as a synonym for loop-fusion:

do i = 1, n
a(i) = b(i) + c(i)
enddo
do i = 1, n
e(i) = f(i) + g(i)
enddo
becomes:

do i = 1, n
a(i) = b(i) + c(i)
e(i) = f(i) + g(i)
enddo

Often used to jam many congruent Fortran 90 array assignment statements
into one loop to reduce loop overhead, where doing so does not introduce
any loop-carried-dependences.

Best regards,

Joel Williamson

Sven

unread,

Jul 13, 1997, 3:00:00 AM7/13/97

to

> I want to ask a question about compiler optimization.

> Is there anyone that understands the keyword "loop jamming".

> Can you give me the definition (and a example if you can)?
> Thanks for your help :)

A search for "loop jamming" (also called fusion) in AltaVista,
yielded:

http://www.ontek.com/mikey/current.html

"The idea is to combine adjacent loops which loop over the same range of
the same variable."

http://happy.dt.uh.edu/~hu/4328/note.html

"If there are two loops with no data dependencies adjacent to one
another, push them together."

http://www.bme.jhu.edu/programs/courses/580.483/opt.techniques.html

"Programmers often encounter two or more independent, vectorizable loops
with the same iteration count. By combining two or more of these loops
into one vector loop, loop control overhead can be reduced."

N8TM

unread,

Jul 16, 1997, 3:00:00 AM7/16/97

to

Loop fusion or jamming is useful not only to reduce loop control overhead,
but to reduce the total number of memory references (by sharing) or
improve use of Instruction Level Parallelism. It is a useful part of
outer unrolling and loop interchanging analysis performed by modern
compilers. Certain architectures, like HP PA (in single precision, where
more registers are available), are more dependent on it than others. In
the extreme case, the slower loop is not slowed down any more by merging
another loop with it.

Tim

Christopher Glaeser

unread,

Jul 18, 1997, 3:00:00 AM7/18/97

to

> In the extreme case, the slower loop is not slowed down any more
> by merging another loop with it.

Not true. In the extreme case, performance can actually decrease. This
can occur, for example, when the fusion of the loops cause cache
conflicts that were not present when the loops are run separately.

See http://www.nullstone.com/htmls/category/fusion.htm for a brief
description of loop fusion.

Best regards,
Christopher Glaeser c...@nullstone.com
Nullstone Corporation http://www.nullstone.com

Stanley Chow

unread,

Jul 21, 1997, 3:00:00 AM7/21/97

to

Christopher Glaeser <c...@nullstone.com> wrote:
>> In the extreme case, the slower loop is not slowed down any more
>> by merging another loop with it.
>
>Not true. In the extreme case, performance can actually decrease. This
>can occur, for example, when the fusion of the loops cause cache
>conflicts that were not present when the loops are run separately.

Of course, both cases can be true (but for different extreme cases).

--
Stanley Chow; sc...@nortel.ca, (613) 763-2831
Nortel/BNR, PO Box 3511 Station C, Ottawa, Ontario

Christopher Glaeser

unread,

Jul 22, 1997, 3:00:00 AM7/22/97

to

> >> In the extreme case, the slower loop is not slowed down any more
> >> by merging another loop with it.
> >
> >Not true. In the extreme case, performance can actually decrease. This
> >can occur, for example, when the fusion of the loops cause cache
> >conflicts that were not present when the loops are run separately.
>
> Of course, both cases can be true (but for different extreme cases).

I understood "In the extreme case, the slower loop is not slowed down"
to mean there are *no* programs that will slow as a result of loop
jamming. I contend such programs *do* exist. These two assertions can
not both be true. Either such programs exist, or they do not.

If I have misinterpreted the meaning of the original sentence, then
please explain what is "extreme" about the "extreme case". An example
would be helpful.

Best regards,
Christopher Glaeser c...@nullstone.com
Nullstone Corporation http://www.nullstone.com

N8TM

unread,

Jul 27, 1997, 3:00:00 AM7/27/97

to

Evidently, there are cases on certain systems, e.g. SGI, where loops
must be split to obtain reasonable performance, due primarily to
apparent shortage of available registers. I meant the other extreme,
where a loop is held back by sequential dependencies, and another loop
can be jammed in with it, even possibly a loop with a slightly
different trip count, so that the time taken by one of the two loops
may be eliminated.

Tim

Gene Wagenbreth

unread,

Jul 28, 1997, 3:00:00 AM7/28/97

to

If you are after high performance, f90 array syntax can get in your way. The
compiler must translate the f90 to the equivalent of an f77 DO loop while
generating code. It must perform loop jamming, and sometimes array demotion.
Performance depends on how the compiler does this. The user has no control,
and usually can not find out what the compiler has done. Rewriting the code
by hand in f77 Do loops is often the best thing to do.

Gene W
--
-------------------------------------------------------------------
Gene Wagenbreth Programming for Dollars

Steve Simmons

unread,

Jul 29, 1997, 3:00:00 AM7/29/97

to

> I understood "In the extreme case, the slower loop is not slowed down"
> to mean there are *no* programs that will slow as a result of loop
> jamming. I contend such programs *do* exist. These two assertions can
> not both be true. Either such programs exist, or they do not.

Every compiler optimization performed has a risk of producing slower
code. A good compiler can catch many of these instances by analysis
before the transformation; however, no compiler can catch all of
these instances.

Usually, a rule of thumb is to perform the transformation if
it results in a performance enhancement a majority of the time,
and the downside risk is minimal.

Loop jamming/fusion increases the number of generated instructions,
and can thus introduce page faulting in the body of the loop where no page
fault existed before. Obviously, this would slow down your code.

Stanley Chow

unread,

Jul 31, 1997, 3:00:00 AM7/31/97

to

Christopher Glaeser <c...@nullstone.com> wrote:
>> >> In the extreme case, the slower loop is not slowed down any more
>> >> by merging another loop with it.
>> >
>> >Not true. In the extreme case, performance can actually decrease. This
>> >can occur, for example, when the fusion of the loops cause cache
>> >conflicts that were not present when the loops are run separately.
>>
>> Of course, both cases can be true (but for different extreme cases).
>

>I understood "In the extreme case, the slower loop is not slowed down"
>to mean there are *no* programs that will slow as a result of loop
>jamming. I contend such programs *do* exist. These two assertions can
>not both be true. Either such programs exist, or they do not.

I took "extreme" to apply to the time penalty to the slow loop; as
opposed to applying to the set of programs. To paraphrase the original
statement:

Loop jamming will result in a loop that is, in general, slower than
each of the original loops. The time penalty will depend on many
factors (including cache size, serial dependency, multiple ALU) and
in the extreme case, the time penalty can be zero wrt the slower loop.

--
Stanley Chow; sc...@nortel.ca, (613) 763-2831
Nortel/BNR, PO Box 3511 Station C, Ottawa, Ontario

Jan Vorbrueggen

unread,

Jul 31, 1997, 3:00:00 AM7/31/97

to

ge...@netcom.com (Gene Wagenbreth) writes:

> If you are after high performance, f90 array syntax can get in your way. The
> compiler must translate the f90 to the equivalent of an f77 DO loop while
> generating code. It must perform loop jamming, and sometimes array demotion.
> Performance depends on how the compiler does this. The user has no control,
> and usually can not find out what the compiler has done. Rewriting the code
> by hand in f77 Do loops is often the best thing to do.

Well, the the whole point of using f90's array operations is lost,
isn't it? But I'm sure the APL/J crowd will point out to you that
they did this type of optimization oh, three decades ago. If your f90
compiler doesn't do its work properly, return it to the manufacturer
while complaining loudly.

Jan

Stanley Chow

unread,

Aug 9, 1997, 3:00:00 AM8/9/97

to

Stanley Chow wrote:
>> Loop jamming will result in a loop that is, in general, slower than
>> each of the original loops. The time penalty will depend on many
>> factors (including cache size, serial dependency, multiple ALU) and
>> in the extreme case, the time penalty can be zero wrt the slower loop.

<cliff...@Eng.Sun.COM> wrote:
>Actually, it's pretty easy for the jammed loop to run faster than the
>slowest loop, if the cache locality is right.

Then the compiler was not doing a good job for the slow loop - it should
have managed the cache better. In the extreme :-), it could invent
the "extra" loop and use loop jamming to speed up the slow loop.

--
Stanley Chow; sc...@nortel.ca, (613) 763-2831
Nortel/BNR, PO Box 3511 Station C, Ottawa, Ontario