optimizing diffusion code (was arithmetic speed)

Eric Lavigne

unread,

Apr 26, 2005, 8:54:18 PM4/26/05

to

>
> Good luck with the project. The c.l.l. crowd loves tuning challenges,
so
> keep them up to date. Is the latest source up in one of the articles?

I will post this email as the start of a new thread. The old thread has
become rather cluttered with multiple parallel conversations and the
old topic name is no longer appropriate.

I put a link to the source in my original post. I update the file every
hour or two.
plaza.ufl.edu/lavigne/
Neutron Diffusion Code 3

The latest version (of the gauss-seidel function) is 6 times faster
than the version I had when I started the thread. It's not yet clear to
me whether this is fast enough, but it is a huge improvement.

> have ACL enterprise with profiling, maybe I could spot something. If
you
> have a standalone source including the test case you are trying to
tune
> (perhaps abbreviated to facilitate repeated experiments) I will fire
it
> up and see what I can see (not that I am a tuning guru).

My latest improvement came from eliminating some unnecessary
destruction/creation of nodes. Keeping the old nodes and just changing
a few values was much faster. It doesn't take a tuning guru to notice
things like this, but it sure took this newbie a long time.

My current set of commands for testing are listed below. It may be time
to change them, though, now that gauss-seidel is a bit faster. It is
likely that the bottleneck has moved, and I need to find it. Probably
the next bottleneck will be expand-flux. It is not called nearly as
often, but it is extremely slow. The t at the end of some lines is
meant to supress output for my convenience, a trick that worked in
clisp but not in LispWorks. Note that gauss-seidel will run 10 times
faster if you run it a second time without first resetting the flux
(with the dotimes stuff), because it will benefit from that fact that
its job has already been done.

(setf flux (make-array '(9 9 9))) t
(setf my-material (make-material :d .9 :v-sigma-f .07 :sigma-a .066))
(dotimes (i (array-dimension flux 0)) (dotimes (j (array-dimension flux
1)) (dotimes (k (array-dimension flux 2)) (setf (aref flux i j k)
(make-node :material my-material :flux 1.0)))))
(setf source (set-source flux 1.5)) t
(compile 'gauss-seidel)
(compile 'phi-calc)
(time (gauss-seidel source flux (/ 40 (1- (array-dimension flux 0)))))

> btw, do you
> have a feel for how fast it would run in C or Fortran or something to
> set the height of the performance bar we are trying to hit?
>

I don't have any numbers for that because my project is a bit unique...
Not only am I the only person doing the assignment in Lisp (Fortran is
the most common, some are using C++), but I am also the only one doing
it in 3D. This is a cylinder problem, and others are modeling that
cylinder in 2D (r,z) cylindrical coordinates (and are still not finding
it easy). I am writing a 3D arbitrary geometry code with the intention
of modeling that cylinder in it later. This makes my code more
complicated (arbitrary geometry) and slower (50-100 times more nodes
due to extra dimension). Also, this problem scales much worse than
linearly with number of nodes because of convergence issues.

The performance bar I am trying to hit is 30 minutes for the whole
program. This would allow a halfway reasonable rate for my
edit-compile-run development cycle. I don't think I'm anywhere close to
this number, since expand-flux alone can run for 30 minutes and is
called more than 20 times in the full program.

> cheers, kenny
>
>

Wade Humeniuk

unread,

Apr 26, 2005, 9:19:21 PM4/26/05

to

Eric Lavigne wrote:

>
> (setf flux (make-array '(9 9 9))) t

Is your flux array just an array of single-float(s)? (or even for that matter
double-float). If it is then you should probably create flux arrays like

(make-array '(9 9 9) :element-type 'single-float)

and then when you use it in a function add this declaration,

(declare (type (simple-array single-float (* * *)) <var>)

e.g. in gauss-seidal add to the current declaration

(type (simple-array single-float (* * *)) flux)

I am not sure if anything has to be done with copy-array.

Wade

Eric Lavigne

unread,

Apr 26, 2005, 9:40:14 PM4/26/05

to

Flux is an array of nodes. A node looks like this:
node
-flux (single-float)
-material(struct)
D(single-float)
v-sigma-f(single-float)
sigma-a(single-float)
So I think specifying that it will contain nodes in advance is probably
not a big deal since a pointer to the node will be needed anyway. I do
save memory and reduce copying by making only a few material structs,
with many nodes having pointers to the same material.

Someone recommended switching from single-floats to double-floats.
Apparently they are just as fast and are more precise. My only problem
with taking this advice is that I don't understand the notation. If I
pass 1.0 to a function and use (declare (double-float... on that
argument, is that an error because I passed a single-float and called
it double? Do I need to say 1d0 instead?

Slightly off-topic, I'm trying to use the Lisp Resource Kit, based on
Knoppix. I downloaded an ISO, burned it to CD, but it doesn't boot. How
do I make CDs bootable, and is there a free windows program that I can
use for the job?

Edi Weitz

unread,

Apr 27, 2005, 4:23:45 AM4/27/05

to

On 26 Apr 2005 18:40:14 -0700, "Eric Lavigne" <lavign...@gmail.com> wrote:

> Slightly off-topic, I'm trying to use the Lisp Resource Kit, based
> on Knoppix. I downloaded an ISO, burned it to CD, but it doesn't
> boot. How do I make CDs bootable,

If the ISO describes a bootable CD than the CD-R you burned should
boot I think.

> and is there a free windows program that I can use for the job?

The Debian FAQ says that this is a freeware program that can burn CDs
from ISOs:

<http://www.cdburnerxp.se/>

Cheers,
Edi.

--

Lisp is not dead, it just smells funny.

Real email: (replace (subseq "spam...@agharta.de" 5) "edi")

Eric Lavigne

unread,

Apr 27, 2005, 12:11:33 PM4/27/05

to

Email from Nicolas Neuss:

What do you use Gauss-Seidel for? Solving a linear system? This is an
extremely bad choice.

For performance, you should
0. Switch to another iteration, e.g. multigrid
1. use CMUCL or SBCL
2. not use multi-dimensional arrays
3. declare the types of your arrays
4. (declare (optimize speed))
5. eliminate all efficiency notes
6. Only in extreme need do runtime-compilation as in
N. Neuss: On using Common Lisp for Scientific Computing
Proceedings of the CISC Conference 2002, LNCSE, Springer-Verlag
(2003).
[http://sit.iwr.uni-heidelberg.de/~neuss/publications.html]
which might prove interesting to you nevertheless.

Nicolas.

Eric Lavigne

unread,

Apr 27, 2005, 12:17:45 PM4/27/05

to

> What do you use Gauss-Seidel for? Solving a linear system? This is
an
> extremely bad choice.
> For performance, you should
> 0. Switch to another iteration, e.g. multigrid

I am surprised to hear that. My department's proudest accomplishment,
PENTRAN, uses a similar algorithm, block-jacobi, for solving linear
systems.
http://ufttg.nre.ufl.edu/~haghigha/pentran.html
If there are much better iteration methods, I might end up turning
this into a dissertation project. I don't know if I have time to learn
these methods and apply them before my project is due, but either way
I am excited to hear about these.

>
> 1. use CMUCL or SBCL

I'd like to. I started trying to get linux working yesterday
(developing on windows now). It runs, but can't access the internet.
Can't develop software on a system where I can't import my old files
or export new ones.

> 2. not use multi-dimensional arrays
> 3. declare the types of your arrays
> 4. (declare (optimize speed))
> 5. eliminate all efficiency notes
> 6. Only in extreme need do runtime-compilation as in
> N. Neuss: On using Common Lisp for Scientific Computing
> Proceedings of the CISC Conference 2002, LNCSE, Springer-Verlag
(2003).
> [http://sit.iwr.uni-heidelberg.de/~neuss/publications.html]
> which might prove interesting to you nevertheless.

I will read that article now. Thanks for the tips, Nicolas.

Eric

Eric Lavigne

unread,

Apr 27, 2005, 12:20:05 PM4/27/05

to

>I wanted to post to cll, but pressed the wrong button in my
newsreader. If
>you want you can post followups there again (best with the above
context).
>It might interest others, too.
>
>Nicolas.

Will do. Also, I have good news. I'm writing this message on my
temporary linux computer. I am using a knoppix distribution called the
Linux Resource Kit. On to CMUCL. My next goal is to do some
performance tests to see if CMUCL gives me much better speed. After
that I'm going to look into FEMLISP. It might not be practical for me
because I will have to reinstall it every time I reboot my computer,
but it might still work out if installation is fast enough.

Thanks,
Eric

Eric Lavigne

unread,

Apr 27, 2005, 12:23:37 PM4/27/05

to

> OK. I myself did not notice any improvement. I agree it is time to
look
> elsewhere for speed boosts.
>
> kt

Of all the functions that work right now, this one is the highest
level. So it is the one we are *really* trying to optimize:

(time (criticality-refine 40.0 #'simple-material-selector))

Unfortunately, it's so slow that I haven't seen it complete yet. It
depends one these two functions which can be tested in a more
reasonable time frame:

(time (expand-flux-3d
flux
(/ 4e1 (1- (array-dimension flux 0)))
#'simple-material-selector 1.5))

(time (criticality-fixed flux (/ 4e1 (1- (array-dimension flux 0)))))

The second of them, criticality-fixed, takes longer and is called more
often, so is far more likely to be the bottleneck. (it also depends on
gauss-seidel, which we've been optimizing for many hours)

Eric Lavigne

unread,

Apr 27, 2005, 12:52:55 PM4/27/05

to

The bad news is that CMUCL on LRK is running about 20% slower than
CLISP on winxp. The good news is that it gives me many pages of
explanations about why it can't optimize my code, including suggestions
for improving the situation. Rather than taking its advice, I am ready
to give FEMLISP a try. Hopefully that installation will go smoothly.

GP lisper

unread,

Apr 27, 2005, 3:56:43 PM4/27/05

to

On 27 Apr 2005 09:52:55 -0700, <lavign...@gmail.com> wrote:
>
> The bad news is that CMUCL on LRK is running about 20% slower than
> CLISP on winxp. The good news is that it gives me many pages of

Can you change your "signature" to mention on what webpage your code
(and running hints) can be found? When I read the above, I wanted to
time the code myself.

--
Everyman has three hearts;
one to show the world, one to show friends, and one only he knows.

Paul Dietz

unread,

Apr 27, 2005, 4:25:47 PM4/27/05

to

Eric Lavigne wrote:
>>What do you use Gauss-Seidel for? Solving a linear system? This is
>
> an
>
>>extremely bad choice.
>>For performance, you should
>>0. Switch to another iteration, e.g. multigrid
>
>
> I am surprised to hear that. My department's proudest accomplishment,
> PENTRAN, uses a similar algorithm, block-jacobi, for solving linear
> systems.
> http://ufttg.nre.ufl.edu/~haghigha/pentran.html
> If there are much better iteration methods, I might end up turning
> this into a dissertation project. I don't know if I have time to learn
> these methods and apply them before my project is due, but either way
> I am excited to hear about these.

Multigrid solvers for the Laplace (diffusion) equation run in time
linear in the number of finest grid points, IIRC.

Paul

Eric Lavigne

unread,

Apr 27, 2005, 6:25:19 PM4/27/05

to

>Can you change your "signature" to mention
> on what webpage your code (and running hints)
> can be found? When I read the above, I wanted
> to time the code myself.

It's on my webpage at
plaza.ufl.edu/lavigne/
Look for Neutron Diffusion Code 3.
Running hints are below it.

Note that I don't blame CMUCL for the poor performance. A compiler
should be measured on its performance with good code and its general
helpfulness with bad code. CMUCL gets better marks on both points. In
benchmarks it does better (with good coders providing the benchmark
code, not newbies like me). When I feed it bad code it runs a bit
slower than CLISP but gives me the comments I need to get better. CLISP
never complained about my mistakes, even when I gave it an integer and
called it a float (fixed this morning when I tried to run it on ACL).
CMUCL is the first compiler I've seen, for any language, that gives
warnings about... your code would be 10 ticks per iteration faster
if... On the other hand, I don't understand why it gives me so many
messages about garbage collection. I would think that printing those
messages would take more time than actually collecting the garbage.

I will add a signature right after this post. Thanks for the suggestion.

Edi Weitz

unread,

Apr 27, 2005, 6:39:43 PM4/27/05

to

On 27 Apr 2005 15:25:19 -0700, "Eric Lavigne" <lavign...@gmail.com> wrote:

> On the other hand, I don't understand why it gives me so many
> messages about garbage collection. I would think that printing those
> messages would take more time than actually collecting the garbage.

You can turn them off with (setq ext:*gc-verbose* nil).

Add that form to your initialization file (~/.cmucl-init) and you'll
never see those messages again... :)

Eric Lavigne

unread,

Apr 27, 2005, 7:48:26 PM4/27/05

to

>Can you change your "signature" to mention
> on what webpage your code (and running hints)
> can be found? When I read the above, I wanted
> to time the code myself.

It's on my webpage at

plaza.ufl.edu/lavigne/
Look for Neutron Diffusion Code 3.
Running hints are below it.

Note that I don't blame CMUCL for the poor performance. A compiler
should be measured on its performance with good code and its general
helpfulness with bad code. CMUCL gets better marks on both points. In
benchmarks it does better (with good coders providing the benchmark
code, not newbies like me). When I feed it bad code it runs a bit
slower than CLISP but gives me the comments I need to get better. CLISP
never complained about my mistakes, even when I gave it an integer and
called it a float (fixed this morning when I tried to run it on ACL).
CMUCL is the first compiler I've seen, for any language, that gives
warnings about... your code would be 10 ticks per iteration faster

if... On the other hand, I don't understand why it gives me so many

messages about garbage collection. I would think that printing those
messages would take more time than actually collecting the garbage.

I will add a signature right after this post. Thanks for the suggestion.

Eric Lavigne

unread,

Apr 27, 2005, 8:59:44 PM4/27/05

to

>The bad news is that CMUCL on LRK is running about 20% slower than
> CLISP on winxp. The good news is that it gives me many pages of
> explanations about why it can't optimize my code

I've managed to take care of all of the warnings in phi-calc, except
for one:
Note: Doing float to pointer coercion (cost 13) to "<return value>"
How do I fix this?

My code is available on my website.
plaza.ufl.edu/lavigne/
Diffusion Code 3

Eric Lavigne

unread,

Apr 27, 2005, 9:33:25 PM4/27/05

to

If I want to declare something as a 3 dimensional array of
single-float, how do I do that?

Ordinary variable declarations look like (declare (single-float
source)), but I can't think of how that would be changed to work with
arrays.

Wade Humeniuk

unread,

Apr 27, 2005, 11:07:58 PM4/27/05

to

Eric Lavigne wrote:
> If I want to declare something as a 3 dimensional array of
> single-float, how do I do that?
>

(declare (type (simple-array single-float (* * *)) something)

Sampo Smolander

unread,

Apr 28, 2005, 2:07:35 AM4/28/05

to

Eric Lavigne <lavign...@gmail.com> wrote:
> > What do you use Gauss-Seidel for? Solving a linear system?
> > This is an extremely bad choice. For performance, you should
> > 0. Switch to another iteration, e.g. multigrid

> I am surprised to hear that. My department's proudest accomplishment,
> PENTRAN, uses a similar algorithm, block-jacobi, for solving linear
> systems.
> http://ufttg.nre.ufl.edu/~haghigha/pentran.html

"PENTRAN performs multigroup, anisotropic, discrete ordinates neutral
particle transport calculations"

May I just point out, that you are solving the diffusion equation, and
(according to the above web page) PENTRAN is solving the radiative
transport equation. These are two entirely different beasts.

It seems that multigrid techniques are common to speed up numerical
solutions to diffusion equation, and other partial differential equations.
But transport equation is a partial differential-integral equation, and
I don't know whether multigrip techniques would work on it, too.

(You who know, please enlighten us.)

Eric Lavigne

unread,

Apr 28, 2005, 2:37:02 AM4/28/05

to

>It seems that multigrid techniques are common to speed up numerical
>solutions to diffusion equation, and other partial differential
equations.
>But transport equation is a partial differential-integral equation,
and
>I don't know whether multigrip techniques would work on it, too.

What is a differential-integral equation?
Diffusion and transport are different beasts, in the sense that
transport has a lot more simultaneous equations. I don't really
understand how multigrid works, though, so maybe there is something
special about diffusion that makes multigrid effective...

Sampo Smolander

unread,

Apr 28, 2005, 3:02:57 AM4/28/05

to

Eric Lavigne <lavign...@gmail.com> wrote:
> What is a differential-integral equation?

An equation that has both differential and integral terms.

For example, the transport equation is written out as
eq. 2.5, on page 13, in the PENTRAN manual,
http://ufttg.nre.ufl.edu/~haghigha/PENMANV900n.pdf

> I don't really understand how multigrid works, though, so maybe there
> is something special about diffusion that makes multigrid effective...

Then again, googling for 'multigrid radiative transfer' or 'multigrid
radiative transport' seems to give enough reading for more that just one
evening :-)

Nicolas Neuss

unread,

Apr 28, 2005, 7:39:14 AM4/28/05

to

"Eric Lavigne" <lavign...@gmail.com> writes:

> Will do. Also, I have good news. I'm writing this message on my
> temporary linux computer. I am using a knoppix distribution called the
> Linux Resource Kit. On to CMUCL. My next goal is to do some
> performance tests to see if CMUCL gives me much better speed. After
> that I'm going to look into FEMLISP. It might not be practical for me
> because I will have to reinstall it every time I reboot my computer,
> but it might still work out if installation is fast enough.

It should be possible to get Femlisp working on GCL, too. This is
something which is on my TODO list. GCL's arithmetic speed might be
acceptable, at least it proved to be so for some simple tests.
Unfortunately, you will encounter other difficulties along this road (which
is why noone has recommended it). Several useful stuff is not yet
available for GCL, for example SLIME. [Another option for Windows is
Allegro, of course. It should be easy to port Femlisp there.]

If you should try out Femlisp, you will note that it is not yet optimized
for speed. This is mainly a problem of Femlisp's datastructures and only a
little bit a problem of Lisp. Also one should keep in mind that Femlisp
works on general unstructured meshes, and calculations on structured meshes
are always an order of magnitude faster in any language. [Using structured
meshes can lead to astonishing performance: it is very well possible to
solve linear systems with 1e6 unknowns arising from discretization of the
2d Laplace equation in less than 1 second (using CL).]

Nicolas.

Eric Lavigne

unread,

Apr 28, 2005, 8:05:35 PM4/28/05

to

I discussed this issue with one of PENTRAN's creators. It is applicable
to neutron transport but has three problems:

1) Requires about twice as much memory. One of the reasons for parallel
computing, in addition to the speedup, is that larger problems can be
solved with more computers contributing memory.
2)Complicated (difficult to program).
3)Convergence issues near material boundaries if the materials have
extremely different properties.

On the other hand, he agreed that multigrid was much faster, sometimes
by an order of magnitude. I am under the impression that all of the
above issues are solveable, and it sounds worthwhile to do so. It
sounds like Nicolas Neuss has already taken care of 2 (and possibly 3,
I haven't studied his articles enough yet). I am inclined to ignore 1.
Double the memory for an order of magnitude in speed sounds like a good
deal.

Kenny Tilton

unread,

Apr 28, 2005, 8:58:57 PM4/28/05

to

Eric Lavigne wrote:

Cool, so you are undertaking a multigrid project? What happened with the
project due today?

kenny

--
Cells? Cello? Cells-Gtk?: http://www.common-lisp.net/project/cells/
Why Lisp? http://lisp.tech.coop/RtL%20Highlight%20Film

"Doctor, I wrestled with reality for forty years, and I am happy to
state that I finally won out over it." -- Elwood P. Dowd

Eric Lavigne

unread,

Apr 28, 2005, 11:56:31 PM4/28/05

to

>Cool, so you are undertaking a multigrid project?

There's a good chance I'll do that later. Not now.

>What happened with the project due today?

I increased the tolerances, trading accuracy for speed so I could get
it done. I still need to update my webpage about the diffusion code's
latest status.

I am starting a new project called CRYSTAL. My dad designed a software
system years ago to support forensics analysts. The idea was to take a
bunch of existing forensics tools, develop a few new ones, and have one
program (CRYSTAL) that would tie all those tools together. CRYSTAL
would also automatically document the progress of an investigation,
improving the quality of reports and the ease of generating them.

He came up with the idea while doing forensics work for the Air Force,
and he arranged for civilian contractors to program it. After a few
years of slow development, the project was cancelled. I never thought
it should take years, but I have ignored it in the past because of the
need for gui development. Peter Seibel's book showed me that gui
development doesn't need to be complicated. I can just turn the whole
program into a webpage, which brings some nice advantages to the table
anyway.

This week, starting Friday or Saturday, I will create my first
prototype of CRYSTAL. This is a big task. Ordinary programmers tried
for years and failed. But I am no ordinary programmer. I am a Lisp
newbie. Wish me luck :-)

Larry Clapp

unread,

Apr 29, 2005, 10:39:51 AM4/29/05

to

[ posted and mailed ]

In article <1114746991....@g14g2000cwa.googlegroups.com>, Eric Lavigne wrote:

>>Kenny Tilton Wrote:
>>What happened with the project due today?
> I increased the tolerances, trading accuracy for speed so I could get
> it done. I still need to update my webpage about the diffusion code's
> latest status.

I came to this discussion fairly late. In the event that it's still
relevant, I downloaded your Diffusion 3 code, compiled it under cmucl,
and posted the output at
http://www.theclapp.org/diff-3d.lisp.compile.out.txt. The actual code
I compiled is at http://www.theclapp.org/diff-3d.lisp retrieved from
http://plaza.ufl.edu/lavigne/diff-3d.lisp at 10:12am edt 4/29/05.

Perhaps people more knowledgeable in cmucl than I am could point you
to problems. It prints a lot of warnings along the lines of

; (INCF *ITERATION-COUNT-INNER*)
; --> LET*
; ==>
; (+ *ITERATION-COUNT-INNER* 1)
; Note: Unable to optimize due to type uncertainty:
; The first argument is a NUMBER, not a FLOAT.
[...]
; (ARRAY-DIMENSION FLUX 0)
; Note: Unable to optimize because:
; Array dimensions unknown, must call array-dimension at runtime.

On the other hand, you use clisp, so it might be completely
irrelevant, but I figured it couldn't hurt.

It's also within the realm of possibility that telnetting to
prompt.franz.com might help, at least as far as optimizing your code,
though perhaps not in actually running it. (prompt.franz.com provides
a text-mode version of Allegro CL Trial Edition v6.2.)

-- Larry

Eric Lavigne

unread,

Apr 29, 2005, 11:17:13 AM4/29/05

to

>came to this discussion fairly late. In the event that it's still
>relevant, I downloaded your Diffusion 3 code, compiled it under cmucl,
>and posted the output at

Nicolas Neuss convinced me that I was coming at this problem from the
wrong angle. Since I had a project due the next day, I ignored his
advice, stuck with what I had, and focused on finishing. If and when I
return to this project, I will probably throw away my old code and
start over, so I no longer have reason to continue optimizing the old
code.