Okay,
For a long time, I have had this dream of pursuing real-time brain simulations.
As the brain is a giant network of interconnected neurons, it effectively forms a graph of graphs.
To the best of my knowledge, the major obstacle in today's neuroscience is the speed of coding needed
to get experiments done. There are research DSLs based on Python, which suffer from ugly debugging
caused by dynamic typing. Other tools are written in Java but suffer from verbose (slow) coding and, even
worse, they hardly scale at all. Scaling in general is a big issue, as many researchers aren't familiar with cluster programming.
Scala fits in nicely, as it's safe, fast to code in, and already has solid concurrency built in.
I'm just finishing my MSc on compile-time verification of correct concurrency in Scala.
With Scala's compiler macros, code checking and transformation work, but they are not exactly trivial.
When it comes to scaling out, Scala prefers actors, which is fine for
commercial deployment where fault tolerance is more important than efficiency, but for
processing billions of requests it's just hopeless.
I've experimented with a special form of consumer and producer interconnected by
a circular buffer (aka Disruptor). It's actually the fastest possible local concurrency in Java,
and it can also scale out in a cluster using any messaging protocol. However, the problem is that
programming this beast is a pain in the ass, to say the least.
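To make the producer/consumer-over-a-circular-buffer idea concrete, here is a minimal sketch in plain Java, assuming exactly one producer thread and one consumer thread. The class name SpscRingBuffer and its methods are made up for illustration; this is not the Disruptor library's API, only the underlying ring-buffer pattern.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical single-producer/single-consumer ring buffer (illustration only,
// not the LMAX Disruptor API).
final class SpscRingBuffer {
    private final long[] slots;
    private final int mask;                            // capacity - 1, for cheap wrap-around
    private final AtomicLong head = new AtomicLong();  // next sequence the consumer reads
    private final AtomicLong tail = new AtomicLong();  // next sequence the producer writes

    SpscRingBuffer(int capacity) {
        if (capacity <= 0 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        slots = new long[capacity];
        mask = capacity - 1;
    }

    // Producer thread only. Returns false instead of blocking when the buffer is full.
    boolean offer(long value) {
        long t = tail.get();
        if (t - head.get() == slots.length) {
            return false;
        }
        slots[(int) (t & mask)] = value;
        tail.set(t + 1); // volatile write publishes the slot to the consumer
        return true;
    }

    // Consumer thread only. Returns ifEmpty when nothing is available.
    long poll(long ifEmpty) {
        long h = head.get();
        if (h == tail.get()) {
            return ifEmpty;
        }
        long value = slots[(int) (h & mask)];
        head.set(h + 1); // volatile write frees the slot for the producer
        return value;
    }
}
```

Even this toy version shows where the pain comes from: the sequence arithmetic, the publication order and the one-thread-per-side rule all have to be right by hand, and nothing in the type system enforces them.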
While thinking more about how to tackle this issue,
I've come up with the idea of a Neuro-DSL embedded in Scala that stands on three pillars:
1) Cache optimized data-structures.
The reason why many data structures in Java and Scala have such terrible performance is simply the vast
number of cache misses they produce at the CPU level. Just switching to a simple struct of arrays gives
a ~100% speedup in execution time. Going further, exploiting spatial as well
as temporal data locality results in near-zero cache misses and close to the maximum performance a CPU can deliver.
Combined with some semi-manual memory management (packing and unpacking data in byte arrays), this means
Java/Scala code can run up to 40x faster than with standard collections and thus comes very close to the speed of C/C++ code.
The problem is, coding these structures over and over again is kind of boring, but using custom
data structures in a DSL means cache-optimized data structures can be used by default.
Also, the "pimp my library" pattern of implicit conversions could be used to lift some of
the worst performers (aka List) into a cache-friendly version for SIMD calculations.
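As a rough illustration of the two layout ideas in this pillar, here is a minimal sketch in plain Java (the same layout works from Scala). NeuronsSoA keeps one primitive array per field instead of an array of neuron objects, and PackedSynapses packs fixed-size records into a ByteBuffer. Both class names, the field choices and the 8-byte record layout are assumptions made up for this example, and the sketch makes no claim about the speedups mentioned above.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical struct-of-arrays layout: one contiguous primitive array per field.
final class NeuronsSoA {
    final float[] potential;
    final float[] threshold;

    NeuronsSoA(int size, float initialThreshold) {
        potential = new float[size];
        threshold = new float[size];
        Arrays.fill(threshold, initialThreshold);
    }

    // Adds the input to every neuron, resets neurons that reach their threshold
    // to 0, and returns how many fired. The loop walks all arrays linearly.
    int step(float[] input) {
        int fired = 0;
        for (int i = 0; i < potential.length; i++) {
            float v = potential[i] + input[i];
            if (v >= threshold[i]) {
                fired++;
                v = 0f;
            }
            potential[i] = v;
        }
        return fired;
    }
}

// Hypothetical packed records: [int target | float weight], 8 bytes per synapse.
final class PackedSynapses {
    private static final int RECORD_BYTES = Integer.BYTES + Float.BYTES;
    private final ByteBuffer data;

    PackedSynapses(int count) {
        data = ByteBuffer.allocate(count * RECORD_BYTES);
    }

    void set(int i, int target, float weight) {
        data.putInt(i * RECORD_BYTES, target);
        data.putFloat(i * RECORD_BYTES + Integer.BYTES, weight);
    }

    int target(int i) {
        return data.getInt(i * RECORD_BYTES);
    }

    float weight(int i) {
        return data.getFloat(i * RECORD_BYTES + Integer.BYTES);
    }
}
```

Writing the offset arithmetic by hand for every record type is exactly the repetitive part a DSL could generate by default.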
2) Built-in graph operators that scale vertically with the number of cores and horizontally with the number of nodes in a cluster.
As neurons connect with several other neurons, efficient graph processing is the key. I admit, I have only
very little understanding of modern graph algorithms. Maybe Adelbert would like to comment at this point.
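To give the discussion something concrete, here is one possible shape of such a graph operator, as a minimal sketch in plain Java. It stores incoming connections in compressed sparse row (CSR) form and lets every neuron sum the weights of its fired sources; since each neuron writes only its own output slot, the loop can run on a parallel stream across cores. The class IncomingCsr, the gather method and the boolean "fired" model are assumptions for illustration only, and the sketch does not address distribution across cluster nodes.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Hypothetical CSR storage of incoming edges, one row per target neuron.
final class IncomingCsr {
    final int[] offsets;   // offsets[i] .. offsets[i + 1] index the incoming edges of neuron i
    final int[] sources;   // source neuron of each edge
    final float[] weights; // weight of each edge

    IncomingCsr(int neurons, int[] from, int[] to, float[] w) {
        offsets = new int[neurons + 1];
        for (int t : to) {
            offsets[t + 1]++;                      // count incoming edges per target
        }
        for (int i = 0; i < neurons; i++) {
            offsets[i + 1] += offsets[i];          // prefix sum turns counts into row starts
        }
        sources = new int[from.length];
        weights = new float[from.length];
        int[] next = Arrays.copyOf(offsets, neurons); // write cursor per target row
        for (int e = 0; e < from.length; e++) {
            int slot = next[to[e]]++;
            sources[slot] = from[e];
            weights[slot] = w[e];
        }
    }

    // For every neuron, sums the weights of incoming edges whose source fired.
    // Each index i writes only input[i], so the parallel loop has no data races.
    float[] gather(boolean[] fired) {
        float[] input = new float[offsets.length - 1];
        IntStream.range(0, input.length).parallel().forEach(i -> {
            float sum = 0f;
            for (int e = offsets[i]; e < offsets[i + 1]; e++) {
                if (fired[sources[e]]) {
                    sum += weights[e];
                }
            }
            input[i] = sum;
        });
        return input;
    }
}
```

The pull-style formulation is a deliberate choice in this sketch: a push-style version (each fired neuron adding to its targets) would need atomics or partitioning to stay correct in parallel.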
3) Super fast coding. It needs to be minutes, not weeks.
That is where Bloom comes in. Coding in Scala is already an order of magnitude faster than in Java,
but Scala is pretty weak at distributing tasks efficiently, quickly and correctly. Bloom,
on the other hand, shines at minimizing the code related to distributing tasks
and is very strong in reasoning about correctness.
How can the two be combined, ideally while exploiting concurrency efficiently?
There is an ongoing effort at Stanford's Pervasive Parallelism Lab on domain-specific languages embedded in Scala,
with the goal of developing parallel software without becoming an expert in parallel programming. One result is
an "infrastructure" for building embedded DSLs, so I would suggest trying their stuff first, as they have already used it to
build two prototype DSLs in Scala.
Code is on GitHub:
One of Stanford's DSLs is for efficient graph analysis:
However, to the best of my knowledge, there is no DSL dedicated to neuroscience
that is optimized for heterogeneous concurrency, safe, scalable *and* super fast to code in.
Of course this is hard, and there are many, many more questions to ask.
I do not have all the answers, but I would say there is a path forward and a challenge worth going for.