Lenia implementation

215 views
Skip to first unread message

Dan Wills

unread,
Jan 15, 2020, 6:22:00 AM1/15/20
to reaction-...@googlegroups.com
Hey rd'ers,
Recently heard about this 'Lenia' rule that sounds interesting :) Looks a bit like SmoothLife maybe?
Maybe we already have it and I'm just unaware? I looked but found no mention.
Dan

Torolf Sauermann

unread,
Jan 21, 2020, 4:11:51 AM1/21/20
to reaction-diffusion

Tim Hutton

unread,
Jan 21, 2020, 12:53:10 PM1/21/20
to reaction-...@googlegroups.com
We should reimplement it, because Bert is curious about the speed: https://mobile.twitter.com/BertChakovsky/status/1219663341531975680

--
You received this message because you are subscribed to the Google Groups "reaction-diffusion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to reaction-diffus...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/reaction-diffusion/cda8a738-c6ff-4e6c-a3cf-e84d0f0399ee%40googlegroups.com.

Dan Wills

unread,
Jan 22, 2020, 7:48:44 AM1/22/20
to reaction-...@googlegroups.com
With a fair amount of mucking around I got the python version (LeniaND.py) running on my Gentoo Linux, so I might be able to make some performance comparisons if/when we've ported the formula over ;)

In the code I did see some use of FFT (do have any nice kernel-y solution for that?) though maybe that is just for all the analytics stuff? (the app has a lot in that regard).. in which case (ie FFT is not in the formula), then it should be similar to SmoothLife complexity to implement in Ready, fingers crossed! Seems like an interesting formula to explore!

It didn't really seem very obvious what cell resolution anything was at in the LeniaND.py app, but you can turn it up and down with arguments to the command.

For instance I reckon this launch command makes it run at 1024x1024 and 1-to-1 pixels-to-cells:
python LeniaND.py -s 10 -p 0)
So we'll have to work out how to exactly compare apples to apples wrt res - if Bert is interested in the speed/performance aspect.

I found the Lenia code very interesting in the way it (and the 'cluda' thing that 'reikna', a lib that it imports, imports) uses a layer to somewhat abstract-away the Cuda/OpenCL distinction, and then it drives the whole GPU thing the same way on either platform, all from python! Pretty cool UI as well! Wish I could paint though! ;D

Have a good one all ye ready folk,
Dan


Tim Hutton

unread,
Jan 23, 2020, 8:52:05 AM1/23/20
to reaction-...@googlegroups.com
With large-neighborhood rules like SmoothLife and Lenia, FFT will probably be the fastest way to run them. It's on my list to look into for Ready but at low priority.



--

Dan Wills

unread,
Jan 30, 2020, 4:49:03 AM1/30/20
to reaction-...@googlegroups.com
That is very interesting to hear Tim, do you think that's true regardless of whether the rule is CPU or OpenCL? I've done some somewhat large-neighborhood stuff in glsl and it seems to manage somewhat ok.

I absolutely love FFT (I wrote a plugin that brought FFTW-based FFT-for-images to the compositing app we use at work 'Shake' (now killed by apple)) but am not yet across what a GPU-based solution to FFT looks like, guess I'd better get googling!

I'd be incredibly interested in helping with that though! if it's something we can tinker with initially, I'd be really interested. Do you have any particular libs or approach in mind?

Have a wicked one,
Dan

Robert Munafo

unread,
Jan 31, 2020, 8:45:50 AM1/31/20
to reaction-diffusion
On 1/30/20, Dan Wills <gda...@gmail.com> wrote:
> That is very interesting to hear Tim, do you think that's true regardless
> of whether the rule is CPU or OpenCL? I've done some somewhat
> large-neighborhood stuff in glsl and it seems to manage somewhat ok.

Yes, though the details of memory access patterns vary, and FFT scans
through all the data once for each possible power of 2 stride from 1
element up to the full N^2 elements. (Also, N must be a power of 2
unless you want to write your own FFT codes from scratch). On GPU you
need to ensure coalesced reads/writes for the smaller strides and
avoid bank conflicts for the really large strides. On CPU, the
challenge is cache hit rate, perhaps using streaming prefetch when
possible, etc. Ignoring all that (and assuming the CPU and GPU
programmers address memory access problems optimally) we can just look
at the computational complexity.

If the kernel is KxK in size, and the pattern being simulated is NxN,
then the normal calculation takes O(K^2 * N^2) per frame. But
reaction-diffusion can be done in the frequency domain. It takes O(N^2
* log(N^2)) for the FFT and the same for the inverse FFT, plus O(N^2)
to perform the diffusion (convolution) in the frequency domain. The
difference is O(N^2 * K^2) versus O(N^2 * (1+4 log(N))). If the kernel
is large enough, then 1+4 log(N) is less than K^2.

> I absolutely love FFT (I wrote a plugin that brought FFTW-based
> FFT-for-images to the compositing app we use at work 'Shake' (now killed by
> apple)) but am not yet across what a GPU-based solution to FFT looks like,
> guess I'd better get googling!

Nvidia provides cuFFT library for use in CUDA programs. For OpenCL and
other non-CUDA I'm sure there is source code somewhere.


> On Fri, Jan 24, 2020 at 12:22 AM Tim Hutton <tim.h...@gmail.com> wrote:
>> With large-neighborhood rules like SmoothLife and Lenia, FFT will
>> probably be the fastest way to run them. It's on my list to look into for Ready
>> but at low priority.

--
Robert Munafo

Dan Wills

unread,
Jan 31, 2020, 11:13:26 PM1/31/20
to reaction-...@googlegroups.com
Cheers for the analysis Robert! really helps to build some intuition about where complexities lie and where the benefits can be :)

I found this that says it's FFT for OpenCL:

Last commit was a fair while ago, but seems worth checking out.

Dan

--
You received this message because you are subscribed to the Google Groups "reaction-diffusion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to reaction-diffus...@googlegroups.com.

Bert Chan

unread,
Feb 3, 2020, 8:55:52 AM2/3/20
to reaction-diffusion
Just found this thread from google search.

As Robert explained, FFT is the best method to calculate convolution with large kernels (radius > 10). The formula of Lenia is quite similar to continuous-time SmoothLife, i.e. convolution => mapping => incremental update. I actually got how to use FFT from this Javascript implementation of SmoothLife:

1D FFT is also used in analysis e.g. detect symmetry, rotation speed, periodogram.

Reikna library did save me from tedious CUDA/OpenCL coding. But GPU is significantly faster only for large arrays (e.g. large world, 3D), for small worlds (e.g. 128x128) old school CPU is faster.

Bert

Reply all
Reply to author
Forward
0 new messages