I will post this email as the start of a new thread. The old thread has
become rather cluttered with multiple parallel conversations and the
old topic name is no longer appropriate.
I put a link to the source in my original post. I update the file every
hour or two.
plaza.ufl.edu/lavigne/
Neutron Diffusion Code 3
The latest version (of the gauss-seidel function) is 6 times faster
than the version I had when I started the thread. It's not yet clear to
me whether this is fast enough, but it is a huge improvement.
> have ACL enterprise with profiling, maybe I could spot something. If
you
> have a standalone source including the test case you are trying to
tune
> (perhaps abbreviated to facilitate repeated experiments) I will fire
it
> up and see what I can see (not that I am a tuning guru).
My latest improvement came from eliminating some unnecessary
destruction/creation of nodes. Keeping the old nodes and just changing
a few values was much faster. It doesn't take a tuning guru to notice
things like this, but it sure took this newbie a long time.
My current set of commands for testing are listed below. It may be time
to change them, though, now that gauss-seidel is a bit faster. It is
likely that the bottleneck has moved, and I need to find it. Probably
the next bottleneck will be expand-flux. It is not called nearly as
often, but it is extremely slow. The t at the end of some lines is
meant to supress output for my convenience, a trick that worked in
clisp but not in LispWorks. Note that gauss-seidel will run 10 times
faster if you run it a second time without first resetting the flux
(with the dotimes stuff), because it will benefit from that fact that
its job has already been done.
(setf flux (make-array '(9 9 9))) t
(setf my-material (make-material :d .9 :v-sigma-f .07 :sigma-a .066))
(dotimes (i (array-dimension flux 0)) (dotimes (j (array-dimension flux
1)) (dotimes (k (array-dimension flux 2)) (setf (aref flux i j k)
(make-node :material my-material :flux 1.0)))))
(setf source (set-source flux 1.5)) t
(compile 'gauss-seidel)
(compile 'phi-calc)
(time (gauss-seidel source flux (/ 40 (1- (array-dimension flux 0)))))
> btw, do you
> have a feel for how fast it would run in C or Fortran or something to
> set the height of the performance bar we are trying to hit?
>
I don't have any numbers for that because my project is a bit unique...
Not only am I the only person doing the assignment in Lisp (Fortran is
the most common, some are using C++), but I am also the only one doing
it in 3D. This is a cylinder problem, and others are modeling that
cylinder in 2D (r,z) cylindrical coordinates (and are still not finding
it easy). I am writing a 3D arbitrary geometry code with the intention
of modeling that cylinder in it later. This makes my code more
complicated (arbitrary geometry) and slower (50-100 times more nodes
due to extra dimension). Also, this problem scales much worse than
linearly with number of nodes because of convergence issues.
The performance bar I am trying to hit is 30 minutes for the whole
program. This would allow a halfway reasonable rate for my
edit-compile-run development cycle. I don't think I'm anywhere close to
this number, since expand-flux alone can run for 30 minutes and is
called more than 20 times in the full program.
> cheers, kenny
>
>
>
> (setf flux (make-array '(9 9 9))) t
Is your flux array just an array of single-float(s)? (or even for that matter
double-float). If it is then you should probably create flux arrays like
(make-array '(9 9 9) :element-type 'single-float)
and then when you use it in a function add this declaration,
(declare (type (simple-array single-float (* * *)) <var>)
e.g. in gauss-seidal add to the current declaration
(type (simple-array single-float (* * *)) flux)
I am not sure if anything has to be done with copy-array.
Wade
Someone recommended switching from single-floats to double-floats.
Apparently they are just as fast and are more precise. My only problem
with taking this advice is that I don't understand the notation. If I
pass 1.0 to a function and use (declare (double-float... on that
argument, is that an error because I passed a single-float and called
it double? Do I need to say 1d0 instead?
Slightly off-topic, I'm trying to use the Lisp Resource Kit, based on
Knoppix. I downloaded an ISO, burned it to CD, but it doesn't boot. How
do I make CDs bootable, and is there a free windows program that I can
use for the job?
> Slightly off-topic, I'm trying to use the Lisp Resource Kit, based
> on Knoppix. I downloaded an ISO, burned it to CD, but it doesn't
> boot. How do I make CDs bootable,
If the ISO describes a bootable CD than the CD-R you burned should
boot I think.
> and is there a free windows program that I can use for the job?
The Debian FAQ says that this is a freeware program that can burn CDs
from ISOs:
Cheers,
Edi.
--
Lisp is not dead, it just smells funny.
Real email: (replace (subseq "spam...@agharta.de" 5) "edi")
What do you use Gauss-Seidel for? Solving a linear system? This is an
extremely bad choice.
For performance, you should
0. Switch to another iteration, e.g. multigrid
1. use CMUCL or SBCL
2. not use multi-dimensional arrays
3. declare the types of your arrays
4. (declare (optimize speed))
5. eliminate all efficiency notes
6. Only in extreme need do runtime-compilation as in
N. Neuss: On using Common Lisp for Scientific Computing
Proceedings of the CISC Conference 2002, LNCSE, Springer-Verlag
(2003).
[http://sit.iwr.uni-heidelberg.de/~neuss/publications.html]
which might prove interesting to you nevertheless.
Nicolas.
I am surprised to hear that. My department's proudest accomplishment,
PENTRAN, uses a similar algorithm, block-jacobi, for solving linear
systems.
http://ufttg.nre.ufl.edu/~haghigha/pentran.html
If there are much better iteration methods, I might end up turning
this into a dissertation project. I don't know if I have time to learn
these methods and apply them before my project is due, but either way
I am excited to hear about these.
>
> 1. use CMUCL or SBCL
I'd like to. I started trying to get linux working yesterday
(developing on windows now). It runs, but can't access the internet.
Can't develop software on a system where I can't import my old files
or export new ones.
> 2. not use multi-dimensional arrays
> 3. declare the types of your arrays
> 4. (declare (optimize speed))
> 5. eliminate all efficiency notes
> 6. Only in extreme need do runtime-compilation as in
> N. Neuss: On using Common Lisp for Scientific Computing
> Proceedings of the CISC Conference 2002, LNCSE, Springer-Verlag
(2003).
> [http://sit.iwr.uni-heidelberg.de/~neuss/publications.html]
> which might prove interesting to you nevertheless.
I will read that article now. Thanks for the tips, Nicolas.
Eric
Will do. Also, I have good news. I'm writing this message on my
temporary linux computer. I am using a knoppix distribution called the
Linux Resource Kit. On to CMUCL. My next goal is to do some
performance tests to see if CMUCL gives me much better speed. After
that I'm going to look into FEMLISP. It might not be practical for me
because I will have to reinstall it every time I reboot my computer,
but it might still work out if installation is fast enough.
Thanks,
Eric
Of all the functions that work right now, this one is the highest
level. So it is the one we are *really* trying to optimize:
(time (criticality-refine 40.0 #'simple-material-selector))
Unfortunately, it's so slow that I haven't seen it complete yet. It
depends one these two functions which can be tested in a more
reasonable time frame:
(time (expand-flux-3d
flux
(/ 4e1 (1- (array-dimension flux 0)))
#'simple-material-selector 1.5))
(time (criticality-fixed flux (/ 4e1 (1- (array-dimension flux 0)))))
The second of them, criticality-fixed, takes longer and is called more
often, so is far more likely to be the bottleneck. (it also depends on
gauss-seidel, which we've been optimizing for many hours)
Can you change your "signature" to mention on what webpage your code
(and running hints) can be found? When I read the above, I wanted to
time the code myself.
--
Everyman has three hearts;
one to show the world, one to show friends, and one only he knows.
Multigrid solvers for the Laplace (diffusion) equation run in time
linear in the number of finest grid points, IIRC.
Paul
It's on my webpage at
plaza.ufl.edu/lavigne/
Look for Neutron Diffusion Code 3.
Running hints are below it.
Note that I don't blame CMUCL for the poor performance. A compiler
should be measured on its performance with good code and its general
helpfulness with bad code. CMUCL gets better marks on both points. In
benchmarks it does better (with good coders providing the benchmark
code, not newbies like me). When I feed it bad code it runs a bit
slower than CLISP but gives me the comments I need to get better. CLISP
never complained about my mistakes, even when I gave it an integer and
called it a float (fixed this morning when I tried to run it on ACL).
CMUCL is the first compiler I've seen, for any language, that gives
warnings about... your code would be 10 ticks per iteration faster
if... On the other hand, I don't understand why it gives me so many
messages about garbage collection. I would think that printing those
messages would take more time than actually collecting the garbage.
I will add a signature right after this post. Thanks for the suggestion.
> On the other hand, I don't understand why it gives me so many
> messages about garbage collection. I would think that printing those
> messages would take more time than actually collecting the garbage.
You can turn them off with (setq ext:*gc-verbose* nil).
Add that form to your initialization file (~/.cmucl-init) and you'll
never see those messages again... :)
It's on my webpage at
plaza.ufl.edu/lavigne/
Look for Neutron Diffusion Code 3.
Running hints are below it.
Note that I don't blame CMUCL for the poor performance. A compiler
should be measured on its performance with good code and its general
helpfulness with bad code. CMUCL gets better marks on both points. In
benchmarks it does better (with good coders providing the benchmark
code, not newbies like me). When I feed it bad code it runs a bit
slower than CLISP but gives me the comments I need to get better. CLISP
never complained about my mistakes, even when I gave it an integer and
called it a float (fixed this morning when I tried to run it on ACL).
CMUCL is the first compiler I've seen, for any language, that gives
warnings about... your code would be 10 ticks per iteration faster
if... On the other hand, I don't understand why it gives me so many
messages about garbage collection. I would think that printing those
messages would take more time than actually collecting the garbage.
I will add a signature right after this post. Thanks for the suggestion.
My code is available on my website.
plaza.ufl.edu/lavigne/
Diffusion Code 3
Ordinary variable declarations look like (declare (single-float
source)), but I can't think of how that would be changed to work with
arrays.
(declare (type (simple-array single-float (* * *)) something)
> I am surprised to hear that. My department's proudest accomplishment,
> PENTRAN, uses a similar algorithm, block-jacobi, for solving linear
> systems.
> http://ufttg.nre.ufl.edu/~haghigha/pentran.html
"PENTRAN performs multigroup, anisotropic, discrete ordinates neutral
particle transport calculations"
May I just point out, that you are solving the diffusion equation, and
(according to the above web page) PENTRAN is solving the radiative
transport equation. These are two entirely different beasts.
It seems that multigrid techniques are common to speed up numerical
solutions to diffusion equation, and other partial differential equations.
But transport equation is a partial differential-integral equation, and
I don't know whether multigrip techniques would work on it, too.
(You who know, please enlighten us.)
What is a differential-integral equation?
Diffusion and transport are different beasts, in the sense that
transport has a lot more simultaneous equations. I don't really
understand how multigrid works, though, so maybe there is something
special about diffusion that makes multigrid effective...
An equation that has both differential and integral terms.
For example, the transport equation is written out as
eq. 2.5, on page 13, in the PENTRAN manual,
http://ufttg.nre.ufl.edu/~haghigha/PENMANV900n.pdf
> I don't really understand how multigrid works, though, so maybe there
> is something special about diffusion that makes multigrid effective...
Then again, googling for 'multigrid radiative transfer' or 'multigrid
radiative transport' seems to give enough reading for more that just one
evening :-)
> Will do. Also, I have good news. I'm writing this message on my
> temporary linux computer. I am using a knoppix distribution called the
> Linux Resource Kit. On to CMUCL. My next goal is to do some
> performance tests to see if CMUCL gives me much better speed. After
> that I'm going to look into FEMLISP. It might not be practical for me
> because I will have to reinstall it every time I reboot my computer,
> but it might still work out if installation is fast enough.
It should be possible to get Femlisp working on GCL, too. This is
something which is on my TODO list. GCL's arithmetic speed might be
acceptable, at least it proved to be so for some simple tests.
Unfortunately, you will encounter other difficulties along this road (which
is why noone has recommended it). Several useful stuff is not yet
available for GCL, for example SLIME. [Another option for Windows is
Allegro, of course. It should be easy to port Femlisp there.]
If you should try out Femlisp, you will note that it is not yet optimized
for speed. This is mainly a problem of Femlisp's datastructures and only a
little bit a problem of Lisp. Also one should keep in mind that Femlisp
works on general unstructured meshes, and calculations on structured meshes
are always an order of magnitude faster in any language. [Using structured
meshes can lead to astonishing performance: it is very well possible to
solve linear systems with 1e6 unknowns arising from discretization of the
2d Laplace equation in less than 1 second (using CL).]
Nicolas.
1) Requires about twice as much memory. One of the reasons for parallel
computing, in addition to the speedup, is that larger problems can be
solved with more computers contributing memory.
2)Complicated (difficult to program).
3)Convergence issues near material boundaries if the materials have
extremely different properties.
On the other hand, he agreed that multigrid was much faster, sometimes
by an order of magnitude. I am under the impression that all of the
above issues are solveable, and it sounds worthwhile to do so. It
sounds like Nicolas Neuss has already taken care of 2 (and possibly 3,
I haven't studied his articles enough yet). I am inclined to ignore 1.
Double the memory for an order of magnitude in speed sounds like a good
deal.
Eric Lavigne wrote:
Cool, so you are undertaking a multigrid project? What happened with the
project due today?
kenny
--
Cells? Cello? Cells-Gtk?: http://www.common-lisp.net/project/cells/
Why Lisp? http://lisp.tech.coop/RtL%20Highlight%20Film
"Doctor, I wrestled with reality for forty years, and I am happy to
state that I finally won out over it." -- Elwood P. Dowd
>What happened with the project due today?
I increased the tolerances, trading accuracy for speed so I could get
it done. I still need to update my webpage about the diffusion code's
latest status.
I am starting a new project called CRYSTAL. My dad designed a software
system years ago to support forensics analysts. The idea was to take a
bunch of existing forensics tools, develop a few new ones, and have one
program (CRYSTAL) that would tie all those tools together. CRYSTAL
would also automatically document the progress of an investigation,
improving the quality of reports and the ease of generating them.
He came up with the idea while doing forensics work for the Air Force,
and he arranged for civilian contractors to program it. After a few
years of slow development, the project was cancelled. I never thought
it should take years, but I have ignored it in the past because of the
need for gui development. Peter Seibel's book showed me that gui
development doesn't need to be complicated. I can just turn the whole
program into a webpage, which brings some nice advantages to the table
anyway.
This week, starting Friday or Saturday, I will create my first
prototype of CRYSTAL. This is a big task. Ordinary programmers tried
for years and failed. But I am no ordinary programmer. I am a Lisp
newbie. Wish me luck :-)
In article <1114746991....@g14g2000cwa.googlegroups.com>, Eric Lavigne wrote:
>>Kenny Tilton Wrote:
>>What happened with the project due today?
> I increased the tolerances, trading accuracy for speed so I could get
> it done. I still need to update my webpage about the diffusion code's
> latest status.
I came to this discussion fairly late. In the event that it's still
relevant, I downloaded your Diffusion 3 code, compiled it under cmucl,
and posted the output at
http://www.theclapp.org/diff-3d.lisp.compile.out.txt. The actual code
I compiled is at http://www.theclapp.org/diff-3d.lisp retrieved from
http://plaza.ufl.edu/lavigne/diff-3d.lisp at 10:12am edt 4/29/05.
Perhaps people more knowledgeable in cmucl than I am could point you
to problems. It prints a lot of warnings along the lines of
; (INCF *ITERATION-COUNT-INNER*)
; --> LET*
; ==>
; (+ *ITERATION-COUNT-INNER* 1)
; Note: Unable to optimize due to type uncertainty:
; The first argument is a NUMBER, not a FLOAT.
[...]
; (ARRAY-DIMENSION FLUX 0)
; Note: Unable to optimize because:
; Array dimensions unknown, must call array-dimension at runtime.
On the other hand, you use clisp, so it might be completely
irrelevant, but I figured it couldn't hurt.
It's also within the realm of possibility that telnetting to
prompt.franz.com might help, at least as far as optimizing your code,
though perhaps not in actually running it. (prompt.franz.com provides
a text-mode version of Allegro CL Trial Edition v6.2.)
-- Larry
Nicolas Neuss convinced me that I was coming at this problem from the
wrong angle. Since I had a project due the next day, I ignored his
advice, stuck with what I had, and focused on finishing. If and when I
return to this project, I will probably throw away my old code and
start over, so I no longer have reason to continue optimizing the old
code.