A port of Norvig's Sudoku solver to Julia


andy hayden

unread,
Jul 1, 2014, 1:37:00 PM7/1/14
to julia...@googlegroups.com
I recently ported Norvig's Solve Every Sudoku Puzzle to Julia: https://github.com/hayd/Sudoku.jl

Some simple benchmarks on random puzzles suggest my Julia implementation solves around 20% slower* than the Python version, and 3 times faster than the JuMP implementation (vendored from the latest release). I tried to include the solver from attractivechaos/plb for comparison, but couldn't get it working...

I'm new to Julia so would love to hear people's thoughts / any performance tips!
I've not delved too deeply into profiling, but @time suggests 10% of the time is GC.

*I'm sure I've lost some performance in translation which could be easily sped up...

Best,
Andy

Steven G. Johnson

unread,
Jul 1, 2014, 5:05:49 PM7/1/14
to julia...@googlegroups.com
On Tuesday, July 1, 2014 1:37:00 PM UTC-4, andy hayden wrote:
I recently ported Norvig's Solve Every Sudoku Puzzle to Julia: https://github.com/hayd/Sudoku.jl

Some simple benchmarks suggest my Julia implementation solves around 20% slower* than the Python version

A few things that pop out at first glance:

You probably want to use a different data structure than an array of strings.   Your array-of-strings approach requires you to call the replace() function in your innermost loop body, which allocates a new string.   It would be better to use some mutable data structure (e.g. an array of 10 bools, one per digit?  I would actually tend to use bit tricks here, with one bit per digit in a single Uint16)
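A minimal sketch of that bitmask idea (my own code, not the repo's; note the thread's 2014 spelling Uint16 is UInt16 in current Julia): each square's candidate set is a single UInt16 with bits 1-9 marking the digits still possible.

```julia
# Sketch only: the candidate digits for one square packed into a UInt16.
const ALL_DIGITS = UInt16(0b1111111110)   # bits 1-9 set: all digits possible

hasdigit(cell::UInt16, d::Int)  = (cell >> d) & 0x1 == 0x1
eliminate(cell::UInt16, d::Int) = cell & ~(UInt16(1) << d)
ncandidates(cell::UInt16)       = count_ones(cell)   # hardware popcount

cell = eliminate(ALL_DIGITS, 5)   # rule out digit 5 for this square
```

Eliminating a digit is then one bitwise op with no allocation at all, versus building a new string per call to replace().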

The allocation of a new dictionary (poss) in every call to search(...) seems unnecessary.   A dictionary is total overkill here because the only things you do with the dictionary are to call length and minimum on it, both of which can be computed much more cheaply just by a single loop over vals.
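The single-pass replacement might look like this sketch, assuming (my layout, not necessarily the repo's) that `vals` is a `Vector{UInt16}` of candidate bitmasks, bits 1-9, one entry per square:

```julia
# Find the unsolved square with the fewest candidates in one loop over
# `vals` -- no Dict, no per-call allocation.
function fewest_candidates(vals::Vector{UInt16})
    best, best_n = 0, 10            # no square has more than 9 candidates
    for (i, cell) in enumerate(vals)
        n = count_ones(cell)
        if 1 < n < best_n           # n == 1 means the square is solved
            best, best_n = i, n
        end
    end
    return best                     # 0 if every square is already solved
end
```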

Probably your globals should be made const ... non-const globals are a performance sink.
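A tiny illustration (not from the repo) of why this matters: a non-const global has no fixed type, so every read of it inside a function is dynamically dispatched, and everything downstream of it is boxed.

```julia
n_plain = 1_000_000        # non-const global: type-unstable when read in a function
const n_const = 1_000_000  # const global: known type, fully specialized

function sum_plain()
    s = 0
    for i in 1:n_plain     # n_plain is inferred as Any here
        s += i             # so `s` is boxed on every iteration
    end
    return s
end

function sum_const()
    s = 0
    for i in 1:n_const     # the compiler knows this is an Int
        s += i
    end
    return s
end
```

Both return the same answer, but @time shows the non-const version allocating heavily, and @code_warntype sum_plain() flags the global read as Any.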

Your search(...) function calls copy(vals) in its innermost loop, which seems horrendously wasteful.  Rather than copying the whole data structure, I would do the assign! in-place, then do the search, and then undo the assignment if the search did not succeed.

Many of your functions are not type-stable because they return "false" to signal failure and some other value on success.    You should be able to do things in a type-stable way.   For example, I would suggest that you change search(vals) to search!(vals), changing vals to the solution in-place (see above), and then return true or false depending upon whether the search succeeds.
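Putting the last two suggestions together, a hedged sketch of what a type-stable search! could look like: it mutates vals in place, records what a trial assignment changed so it can be rolled back, and always returns a Bool. The helpers here (the UInt16 bitmask layout, fewest_candidates, assign!) are my stand-ins, not the repo's code; in particular this assign! only fixes one square, where a real one would also propagate eliminations to the square's peers (pushing their old masks onto undo too) and return false on contradiction.

```julia
hasdigit(cell::UInt16, d::Int) = (cell >> d) & 0x1 == 0x1

# Stand-in assign!: fix square i to digit d, remembering the old mask so a
# failed trial can be undone without copying the whole board.
function assign!(vals::Vector{UInt16}, i::Int, d::Int,
                 undo::Vector{Tuple{Int,UInt16}})
    push!(undo, (i, vals[i]))
    vals[i] = UInt16(1) << d
    return true
end

# Unsolved square with the fewest candidates (0 if all solved), one pass.
function fewest_candidates(vals::Vector{UInt16})
    best, best_n = 0, 10
    for (i, cell) in enumerate(vals)
        n = count_ones(cell)
        if 1 < n < best_n
            best, best_n = i, n
        end
    end
    return best
end

# Type-stable: always returns a Bool; on success `vals` holds the solution.
function search!(vals::Vector{UInt16})
    i = fewest_candidates(vals)
    i == 0 && return true
    for d in 1:9
        hasdigit(vals[i], d) || continue
        undo = Tuple{Int,UInt16}[]
        assign!(vals, i, d, undo) && search!(vals) && return true
        for (j, old) in undo        # roll the failed trial back in place
            vals[j] = old
        end
    end
    return false
end
```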

Iain Dunning

unread,
Jul 1, 2014, 6:06:30 PM7/1/14
to julia...@googlegroups.com
Hah, I'm actually surprised there isn't more of a performance gap for the MIP formulation; that's kinda cool.


Stefan Karpinski

unread,
Jul 1, 2014, 6:08:43 PM7/1/14
to Julia Users
If the Sudokus were *really* huge I bet the MIP would win ;-)

Iain Dunning

unread,
Jul 1, 2014, 6:11:27 PM7/1/14
to julia...@googlegroups.com
I'm working on improving this, but I'm not sure how you measured that 20% figure - can you be more specific?


andy hayden

unread,
Jul 1, 2014, 6:27:15 PM7/1/14
to julia...@googlegroups.com
Bench.jl has a bench_compare method which returns a DataFrame of times (I then divide the medians of the Python and Julia columns). I'll add this output to the Bench script as it's useful to see (it would be nice to add more stats, as it's just a DataFrame of all the solved puzzles in seconds). By default it runs a hundred random sudokus on Julia, Python, and JuMP (the same puzzles on each)...

Thanks Steven: making those const makes a huge difference; Julia wins (going from 20% slower to 10% faster for me with just that change).
I'll have a play and see how your other suggestions pan out.

I was also very impressed with JuMP here... and it may be that the latest version is even faster (I'm using the version from the last release rather than master, and it has changed since then).

Iain Dunning

unread,
Jul 1, 2014, 6:47:04 PM7/1/14
to julia...@googlegroups.com
I was unable to run Bench.jl (ERROR: varzm! not defined), but on my computer, just using runtests.jl, a fixed seed, and total time for 100 random puzzles:

* Initial
elapsed time: 1.641434988 seconds (282491732 bytes allocated, 5.99% gc time)

* Change globals to const
elapsed time: 1.563094028 seconds (261818132 bytes allocated, 6.61% gc time)

* Change from using a Dict{Int64, *} for the Grid types to just a Vector{*}, as well as those other globals
elapsed time: 1.373703078 seconds (191864592 bytes allocated, 4.91% gc time)

Iain Dunning

unread,
Jul 1, 2014, 6:47:39 PM7/1/14
to julia...@googlegroups.com
JuMP won't be getting any faster; it's entirely limited by the speed of the MIP solver. Which one did you use?

Andy Hayden

unread,
Jul 1, 2014, 6:59:02 PM7/1/14
to julia...@googlegroups.com
I was using Cbc.

SolveModel is a copy-and-paste job from JuMP (from the last release rather than master), so it may not work with JuMP master - I couldn't get the master version working since it was incompatible with the JuMP release I had! It'd be great to just include the file, but I couldn't get that working, so I pasted it in (I should probably clean up Bench as I made quite a mess, apologies for that)... so you may need to update SolveModel from JuMP master / your version of JuMP to get Bench working.

It's amazing how far some small tweaks like this go; there are a few other bits that are obvious even to me (but I just couldn't get them working).

Iain Dunning

unread,
Jul 1, 2014, 7:45:04 PM7/1/14
to julia...@googlegroups.com, andyh...@gmail.com
Yeah, we changed the example, so it's best to take it from the release version...

I removed the dictionary from search() but it's now no longer solving all the problems(!) - does the algorithm somehow rely on the way the dictionary is constructed?

andy hayden

unread,
Jul 1, 2014, 7:58:44 PM7/1/14
to julia...@googlegroups.com, andyh...@gmail.com
Ha, I had exactly the same issue (I pushed a perf update with a commented-out implementation); I assumed it was something (very) wrong in my understanding of control flow!

I don't see how it could rely on anything; ordering, maybe(?)... perplexing.

Iain Dunning

unread,
Jul 1, 2014, 8:13:37 PM7/1/14
to julia...@googlegroups.com, andyh...@gmail.com
Stripping out the dictionary stuff in search and doing it in a single array pass has knocked me down to 
elapsed time: 1.305257143 seconds (183884144 bytes allocated, 3.85% gc time)

Changing to bitarrays would be the real fix, and I got halfway there, but converting the code was so painful I might just write my own solver based on the same brute-force idea.

Andy Hayden

unread,
Jul 1, 2014, 8:28:06 PM7/1/14
to Iain Dunning, julia...@googlegroups.com
It could be what quinnj's port does https://github.com/attractivechaos/plb/blob/master/sudoku/sudoku_v1.jl. I couldn't get this working to compare...

andy hayden

unread,
Jul 2, 2014, 12:44:47 PM7/2/14
to julia...@googlegroups.com
Using a single BitArray increases performance by 10x! I've pushed that to github. It also feels like a much neater solution.
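A guess at the kind of single-BitArray layout involved (my sketch, not necessarily the repo's): one bit per (square, digit) pair, so eliminating a candidate is a single bit write and counting candidates is a fast bit-count over a 9-bit slice.

```julia
# possible[s, d] == true means digit d is still allowed in square s.
possible = trues(81, 9)

possible[1, 5] = false                    # eliminate digit 5 from square 1
remaining = count(@view possible[1, :])   # candidates left in square 1
```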

I still can't work out how to strip out the dictionary in solve...

Iain Dunning

unread,
Jul 2, 2014, 12:48:04 PM7/2/14
to julia...@googlegroups.com