'functional' performance VS 'imperative' complexity

Jim - FooBar();

unread,

Aug 25, 2012, 4:01:21 PM8/25/12

to clo...@googlegroups.com

Hello everyone,

in this post I'm not asking for something specific, but rather I'd like to spark a discussion regarding the issue of performance within the functional paradigm...most of the things i will mention will probably not be news for most of you...Hopefully, however the issues I plan to raise will eventualy help someone else faced with the same dilemmas as me...

First of all, let me clarify upfront that I strongly believe that 'functional' and 'immutable' should be the default (as Rich often says). This thread is certainly not about praising the 'imperative' style. It is about having all the facts before you start coding (probably before even designing)...Most of you presumably already do...

Ok so, it is evident from my other posts that I'm building a library for writing board games. In a nutshell, when it's finished, I'd like someone else to be able to write up his own board game, show it up on screen and genetically train a neural-net for an opponent, in literally less than 5-6 hours (~ 100 LOC).
Now, for those of you that have done any board games you can immediately identify the hot-spots of such a program. These are 2: actually exploring the game-tree and training the neural-net. Notice how both these tasks can be run in parallel...(exploring the game tree is not immediately apparent how to do in parallel but we have reducers)...

Generally there are 3 major ways going about writing a program - functionally all the way, imperatively all the way, a mixture. I sort of implemented all 3 categories for my chess game and I've got some interesting results:

Firstly and more importantly (I mean that), the purely functional and immutable road is simply such a pleasure to travel...There are no words to describe the beauty, clarity and elegance of the functional version. Mutation is non-existent or only through reference types and operations like 'update-position' and 'move' return brand new piece and board respectively. Also, it is the only one that is extremely stable and always brings back the correct answer. It is *gorgeous*... On the flip-side, it performs horrible! It is very very slow for realistic depths like 4 or 6 even regardless of utilising reducers to the maximum and countless optimisations. The best time I can report is 9 min for level 4.
after watching Daniel Solano Gomez's presentation on infoq ("11 tips to boost performance"), I realised that If I wanted raw speed (as he puts it), I 'd have to resort to arrays. Well, I made my heart a stone and went to implement an array-based version that favours mutation. Having such modular code in the first place, that did not take too long...I just wrote up different version of 'move' and 'collides?' (amove, acollides?) that know how to deal with arrays and created a ChessPiece2 record which holds a java.awt.Point object which is mutated by update-position (instead of returning a brand new piece). Basically, (I thought) i was done in 30 min... However it turned out I was being sooo ignorant!!! Making such a u-turn in programming paradigms while working on the same project is never that simple. The functional style protected me from so many bad things...of course, I already knew that but I was surprised to see how many these are! For instance, making a move in the functional version caused absolutely no damage...there is an 'execute!' fn that does damage if we want it to (via atom only) but this is to be used only when we decide what move we want. Now, trying out a move messes up everything!!! Now, I need means of undoing and not only that...My entire searching algorithm can no longer function properly...Off the top of my head, I need some sort of serial loop/recur that tries moves when recursion rolls in and takes them back (by undoing) when recursion rolls out . In other words I need to keep track of the changes carefully! On the filp-side, even though this version has bugs and does not return the correct answer, it seems it can reach level 4 in roughly 2 min. This is 4x faster! Again, I'm being cautious cos there are bugs so I can't be sure of the time but there seems to be a dramatic performance increase...The code however is a lot buggier and uglier...
Finally I tried doing a functional version that uses mutable arrays but doesn't mutate them...Instead critical fns like 'move' build new arrays (btw 'aclone' does not deep copy) to return and updating a piece's position does return a new piece... This version, is not very stable but does return the correct answer in just over 7 min for level 4...again the importance of immutability shines! the timing shows that i got 22% better performance. I have not profiled this but I expect most of the time being spent copying arrays.

In essence what am I telling you here? 2 things basically... Firstly, think about performance before you actually design your solution...it may be that some tasks are not well suited for immutable data-structures and it can be a show stopper. Don't think for a second that you can convert your gorgeous, purely immutable approach to a purely mutable (better performing one) without any cost. These are 2 fundamentally different choices that lead to completely different algorithmic designs...

so where does this leave me? well, I am going to stick with the elegant, all-clojure solution and perhaps find a 8/12-core machine to do my training on... however, in a different setting I might choose otherwise...at the moment I don't have time to spend hundreds of hours to fix all the bugs of the mutable version so i can get it to preform better... I hate the process as well...if someone was paying me though, my ego would be far less!

Of course, there are going to be people that will say: "You're using the wrong algorithm!", "You can prune the tree!" etc etc...that is not the point though...even of I do pruning does anybody think I can get to level 4 in less than 5 min? That would require pruning half the tree! anyway, what I'm trying to say is that in algorithms like these performance matters a lot... most machine learning algorithms involve matrix multiplication...why do you think all the machine-learners use matlab? you think they enjoy writing matlab code? No - it just goes super fast ! If performance matters to you , then perhaps your time is best spent thinking a mutable approach from day 1... I cannot believe I said that but there you go - I said it!

No hard feelings eh? I still love Clojure... :-)

Jim

Dennis Haupt

unread,

Aug 25, 2012, 5:10:48 PM8/25/12

to clo...@googlegroups.com

+1

i stay functional if possible and fall back to mutable on isolated,
performance critical spots if i can't get it done fast enough in a
purely functional way.

i solved the move-mess-up-everything problem by forcing a move to
implement both apply and unapply on a game board. (it was a java
project). this way, i could still safely try a move and undo it in
secret - and i could still use multithreading by making one copy of a
board per thread.

however, depending on the game, the single thread version was up to 10x
faster because of being able to cut off large portions of the tree while
the multithread version had less information per branch and could not
simply share it while travering the tree of possibilities.

> 1. Firstly and more importantly (I mean that), the purely functional

> and immutable road is simply such a pleasure to travel...There are
> no words to describe the beauty, clarity and elegance of the
> functional version. Mutation is non-existent or only through
> reference types and operations like 'update-position' and 'move'
> return brand new piece and board respectively. Also, it is the only
> one that is extremely stable and always brings back the correct
> answer. It is *gorgeous*... On the flip-side, it performs horrible!
> It is very very slow for realistic depths like 4 or 6 even
> regardless of utilising reducers to the maximum and countless
> optimisations. The best time I can report is 9 min for level 4.

> 2. after watching Daniel Solano Gomez's presentation on infoq ("11 tips

> 3. Finally I tried doing a functional version that uses mutable arrays

> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clo...@googlegroups.com
> Note that posts from new members are moderated - please be patient with
> your first post.
> To unsubscribe from this group, send email to
> clojure+u...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en

--

Joshua Ballanco

unread,

Aug 26, 2012, 6:03:40 AM8/26/12

to clo...@googlegroups.com

> 1. Firstly and more importantly (I mean that), the purely functional

> and immutable road is simply such a pleasure to travel...There are
> no words to describe the beauty, clarity and elegance of the
> functional version. Mutation is non-existent or only through
> reference types and operations like 'update-position' and 'move'
> return brand new piece and board respectively. Also, it is the only
> one that is extremely stable and always brings back the correct
> answer. It is *gorgeous*... On the flip-side, it performs horrible!
> It is very very slow for realistic depths like 4 or 6 even
> regardless of utilising reducers to the maximum and countless
> optimisations. The best time I can report is 9 min for level 4.

> 2. after watching Daniel Solano Gomez's presentation on infoq ("11 tips

> 3. Finally I tried doing a functional version that uses mutable arrays

I would love to have some time to look into the details of your specific
problem more, but in the absence of time, might I suggest two quick
points:

1. Gary Bernhardt has been playing with a "new" approach he calls
"Functional Core, Imperative Shell". Essentially, it's another take on
the question of how to limit the scope of mutation in order to get the
most out of the correctness of mutation-free algorithms and the
performance of mutating data instead of replicating it.

2. Along the same lines, have you made the most out of transients in
your code? From your description, it seems like you have less work
happening within methods than between methods, but perhaps if you
"manually" inline some of the work, transients could provide improved
performance

At any rate, it's been very informative watching you hammer on this
problem.

Cheers,

Josh

--
Joshua Ballanco

ELC Technologies™
1771 NW Pettygrove Street, Suite 140
Portland, OR, 97209
jbal...@elctech.com

P +1 866.863.7365
F +1 877.658.6313
M +1 646.463.2673
T +90 533.085.5773

http://www.elctech.com

Jim - FooBar();

unread,

Aug 26, 2012, 6:16:29 AM8/26/12

to clo...@googlegroups.com

On 26/08/12 11:03, Joshua Ballanco wrote:

I would love to have some time to look into the details of your specific
problem more, but in the absence of time, might I suggest two quick
points:

Well, feel free to have a look at the project on github when you find some time ( https://github.com/jimpil/Clondie24)...I should clarify that the big problem is with chess and not any other games...I'm expecting games like checkers or tic-tac-toe to perform just fine with the functional solution.

1. Gary Bernhardt has been playing with a "new" approach he calls
"Functional Core, Imperative Shell". Essentially, it's another take on
the question of how to limit the scope of mutation in order to get the
most out of the correctness of mutation-free algorithms and the
performance of mutating data instead of replicating it.

hmmm...I will definitely check this out...It is certainly an important issue!

2. Along the same lines, have you made the most out of transients in
your code? From your description, it seems like you have less work
happening within methods than between methods, but perhaps if you
"manually" inline some of the work, transients could provide improved
performance

well, transients do play an important role in my core 'move' fn which used to create 2 extra vectors as a result of 2 'assoc's...I got a 15% performance increase by using them despite of only mutating 2 indices! I cannot really use them anywhere else though...as far as inlining goes, most of my core fns that are involved in moving (which happens a lot) are 'definline'd and the fn that translates the position from 1d to2d and vice-versa (also called very frequently) is memoized....personally I cannot think of any other optimisations...the best suggestion so far was by Nicolas who basically proposed to precalculate all the pieces moves (apart from pawn) in a big table (a nested map) instead of calculating them on the fly...this made a bid difference! before that, the profiler was showing 86% core.logic.Subsitutions!!!!

Jim

Joshua Ballanco

unread,

Aug 26, 2012, 6:35:21 AM8/26/12

to clo...@googlegroups.com

On Sun, Aug 26, 2012 at 11:16:29AM +0100, Jim - FooBar(); wrote:
> On 26/08/12 11:03, Joshua Ballanco wrote:
> >I would love to have some time to look into the details of your specific
> >problem more, but in the absence of time, might I suggest two quick
> >points:
>
> Well, feel free to have a look at the project on github when you
> find some time ( https://github.com/jimpil/Clondie24)...I should
> clarify that the big problem is with chess and not any other
> games...I'm expecting games like checkers or tic-tac-toe to perform
> just fine with the functional solution.

Stared! We've been having a weekly Clojure study group at work. I don't
think we're at the point, as a group, where we would be able to tackle
this, but I'll keep it in mind for the future.

> >1. Gary Bernhardt has been playing with a "new" approach he calls
> >"Functional Core, Imperative Shell". Essentially, it's another take on
> >the question of how to limit the scope of mutation in order to get the
> >most out of the correctness of mutation-free algorithms and the
> >performance of mutating data instead of replicating it.
>
> hmmm...I will definitely check this out...It is certainly an
> important issue!

Ah, just realized I forgot the link:

https://www.destroyallsoftware.com/screencasts/catalog/functional-core-imperative-shell

It's a for-pay screencast series that he does, but well worth the money
in my opinion!

Patryk Bukowinski

unread,

Aug 26, 2012, 4:51:15 AM8/26/12

to clo...@googlegroups.com

Hi Jim,
Reading your story I've got an impression that you make 'functional' and 'immutable' a synonym, not default.
Implementation should be more transparent.

In APL func&vect programming languages fammily there are tools which amends values in place.
It feels so natural, part of a language used in ordinary functional way even at higher abstraction level.
People use those languages for ML because solutions are much faster than Matlab, being very neat functional solutions.

Killing performance for religious paradigm of immutability may kill the language.

cheers
patryk

Jim - FooBar();

unread,

Aug 26, 2012, 8:50:10 AM8/26/12

to clo...@googlegroups.com

On 26/08/12 09:51, Patryk Bukowinski wrote:

Hi Jim,
Reading your story I've got an impression that you make 'functional' and 'immutable' a synonym, not default.
Implementation should be more transparent.

In APL func&vect programming languages fammily there are tools which amends values in place.
It feels so natural, part of a language used in ordinary functional way even at higher abstraction level.
People use those languages for ML because solutions are much faster than Matlab, being very neat functional solutions.

Killing performance for religious paradigm of immutability may kill the language.

cheers
patryk

So, you're implying that 'functional' and 'immutable' do not share the same roots? It is my understanding that unless you go fully immutable you're sort of getting half the functional story...As for scientific computing and ML have you checked julia? ( http://julialang.org/)

Implementation should be more transparent? how can you do that since the 2 approaches are fundamentally different? That was sort of my point...You need to think about these things early...It is not easy to switch!

cheers
Jim

nicola...@gmail.com

unread,

Aug 26, 2012, 10:56:13 AM8/26/12

to clo...@googlegroups.com

>
> 1. Gary Bernhardt has been playing with a "new" approach he calls
> "Functional Core, Imperative Shell". Essentially, it's another take on
> the question of how to limit the scope of mutation in order to get the
> most out of the correctness of mutation-free algorithms and the
> performance of mutating data instead of replicating it.
>

It seems to be this treats of another question: how to have IO in a
functional world.
And it gives the classic answer: write the functional core and wrap s
small IO shell around.

Jim is asking another question: what if you can't get good enough
performance in a given algorithm
with persistent functional data structures.

My views is that state is difficult. Every state makes your program
harder to maintain and less modular.
However, a localised state to a function or a group of functions is
perfectly fine.

In the case of your checkers problem, you can encapsulate in a
protocol the notion of being an undoable move:

(defprotocol UndoableMove
(move [this b] "return the board or nil if it can't be applied")
(undo [this b] "returns the board. Fails if it cannot be undone"))

Then create a protocol for boards:

(defprotocol BoardLike
(apply-move [b m] "Apply move m. Keep it in undo stack. Returns a board.")
(undo-last-move [b] "Undo last move if any, else fails. Return a board"))

(Note: I like better threading the board, even if it is updated in place.
It offers more flexibility: you can use persistent/transient or
mutable data-structures.
You can also decide to change representation of the board, if you want.)

What is essential though, is that if I use that to explore the game
tree, I would NOT use it
to represent the state of the game in rest of the program. Too many
way to stumble later.
I would convert the board into the mutable representation, and the
explore the game tree.
And copy back into an immutable board once done.

Transients are a good way to do this.

But the key is mutable state works for me for mono-thread, small localised code.

Jim - FooBar();

unread,

Aug 26, 2012, 11:17:29 AM8/26/12

to clo...@googlegroups.com

Thanks for the snippet Nicolas but that is not the problem! I do know
how to implement the 'undo' functionality...In OOP this is called the
"Command" design pattern...The command interface has execute(from, to)
and undo(from,to) (which calls execute with reversed arguments)...That
part is not hard at all...the problem arises when trying to manage all
this state (as you indicated)....also since I 'm using reducers quite
heavily, it is not easy to isolate mutation on 1 thread (it also seems
impossible to do any pruning)... I would have to revert to single
threaded execution in order to implement all this...

Jim

btw, my functional approach of undoing was simply to log every new board
change in a vector via watches and atom. This way not only I can undo
without calling 'move' but I can also serialize the entire history and
have it available when you load the game back in! pretty cool and simple
I think...

Reply all

Reply to author

Forward