I sent you an email off-list. Let me know if you didn't receive it.
Playing around with the JVM settings, I was able to get a speedup of
my Clojure code without doing any further modifications. I'm getting
around 14 minutes on a 2Ghz machine; given that my machine is slower
than yours, that's roughly in the ballpark of 10x slower than your C++
version, which is about what I expect from Clojure.
I don't have time to work on this problem more, but here's a summary
of how I approached it. Maybe someone will be motivated to run with
this in Qi. I'll use the 5-digit problem as an example.
Build a set of all the 5-digit primes where reversing the digits gives
a different prime number. These are the relevant primes for building
the prime square.
Also, consider the 5-digit primes which are less than their reverse.
I'll call these "halfPrimes" because this set is half the size of all
the primes under consideration (we've kept one prime from each pair).
halfPrimes will turn out to be important for ensuring that the grids
are unique with respect to isomorphism.
Build some sort of table/hashmap where you can easily look up
questions like:
Give me a set of all the prime numbers where the third digit is a 5.
Give me a set of all the prime numbers where the first digit is a 1.
etc.
A solving state is a collection of lines that need to be solved.
For each line, I track:
1. The cells of the prime square that it spans.
2. A "pattern" that tells you what digits are known so far for this
line, for example, 1**89.
3. A set of all primes that fit the pattern so far.
#1 doesn't change, but #2 and #3 change as the solving progresses
(they begin with a blank pattern and all the primes are
possibilities).
We only need to consider one direction for each line, so for a 5x5
grid, there are 12 lines we need to solve.
Also, you need a table where you can easily lookup for a given cell,
which lines go through that cell and at what position they intersect
that cell.
So for example, if the grid cells are labeled:
0 1 2 3 4
5 6 7 8 9
10 11 12 13 14
15 16 17 18 19
20 21 22 23 24
then cell 11 is part of the line [10,11,12,13,14] at its second
position and part of the line [1,6,11,16,21] at its third position.
Armed with all this pre-compiled data, the solving strategy is
relatively straightforward.
To achieve isomorphically unique grids, we begin by picking two primes
from the halfPrimes set and assign them to the two diagonal lines.
We'll now do a depth-first to find all the grids that match those two
starter lines. If we do this for every combination of two primes from
the halfPrimes set, we're guaranteed to get all the possible prime
squares, exactly once. For odd sizes (such as 5x5), we only need
worry about the combinations of two primes whose middle digit matches.
When you "assign a prime" to a line, you need to remove that prime and
its reverse from the set of possibilities under consideration in all
the other lines. Then, for each digit in the prime, you need to add
it to the "pattern" of all the intersecting lines, and restrict the
set of primes accordingly.
So, for example, if you assign the prime 90019 to the line
[1,6,11,16,21], then you need to add 9 to the pattern for line
[0,1,2,3,4] (so the pattern will now look like *9*** if this is the
first digit to be assigned to that line), and then we intersect its
set of possible primes with the set of all primes that have a 9 as the
second digit.
Using this strategy, just do a depth-first search, always branching by
taking cases on the line with the fewest possible primes.
By using purely functional, non-destructive representations for all
these things, the depth-first, backtracking search is a piece of cake.
I've intentionally avoided getting too specific about the exact
representations I chose, mainly because I want to avoid influencing
others to follow too precisely in my path. It seems clear that
choosing different representations for the lines, the grid, the tables
of information, etc. will have a significant impact on speed. I think
I made reasonable choices, but someone starting from scratch may well
find a faster representation than I. I also speculate that the best
representional choice will be language-dependent. I briefly tried
porting my Clojure solution to Haskell, thinking that as a statically-
typed language, Haskell would be faster. But it actually turned out
to be waaaaaay slower, possibly because the representation I used just
isn't as efficient in Haskell. Clojure's sets and persistent vectors
are quite fast and I made heavy use of them.
I think if I were writing this in an imperative language, I would use
essentially the same strategy, and would similarly use immutable sets
to track the set of possible primes for each line. The "assign a
prime" step is the trickiest step to do efficiently in a functional
language, so I'd probably make use of arrays in my representation of
patterns and lines to hopefully get a bit of a speed boost. Then, it
would be necessary to copy the arrays at each step of the search. Of
course, most functional languages allow some sort of "back door" to do
some mutable trickery behind the scenes for efficiency (for example,
using mutable arrays as an intermediate step to efficiently build some
sort of externally immutable representation). So it's possible to
have your cake and eat it too, but I refrained from using such
features of Clojure to adhere to the spirit of the challenge.
Is there a fundamentally different, better strategy that is only
viable in an imperative language?