Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Super Single SQL Set Statement Sudoku Solution

42 views
Skip to first unread message

Alan Samet

unread,
Apr 22, 2008, 10:23:32 AM4/22/08
to
...complete with alliteration.

I've just written what I believe to be an entirely new approach to
solving Sudoku problems using T-SQL. This approach takes the finite
set of possible combinations of 1-9 (9-factorial [9!], or 362,880 and
stores that in a table of statesets. The following query then, through
a combination of limiting expressions, derives the puzzle solution in
a single operation. This is a set-oriented solution, meaning that it
involves a lot of cross-row operations within the table and full set
operations. It also brings out some shortcomings of the query
optimizer and crashes GDI in SQL Management Studio when you try to use
the widget to look at the execution plan. I use lots of common table
expressions here to simplify the query; it's possible to do without
the CTEs, but at a significant expense of readability.

I'm posting a write-up, complete with documented SQL, a WPF test
application, an optimized step-by-step execution and sample data on my
website here:
http://www.socialdetour.com/Profile/ViewEssay.aas?user=Alan&g=5c0af944-4bf6-4bde-bf25-0d69061be087

However, if you just want the SQL, I'm posting that here, along with
the table creation scripts. You use a GUID as a puzzle identifier and
insert the GUID/row/cell/value combinations into the Puzzles table for
known values.

Enjoy!

PS. If anyone needs a database developer, I'm on the market:

-- ## BEGIN DATABASE SCHEMA ## --

USE tempdb

GO

SET NOCOUNT ON

GO

IF DB_ID('Sudoku') IS NOT NULL BEGIN
ALTER DATABASE Sudoku SET SINGLE_USER WITH ROLLBACK IMMEDIATE
DROP DATABASE Sudoku
END

GO

CREATE DATABASE Sudoku

GO

USE Sudoku

GO

ALTER DATABASE Sudoku SET RECOVERY SIMPLE

GO

CREATE TABLE StateSets
(
StateSet INT
NOT NULL
, Position TINYINT
NOT NULL
CHECK (Position BETWEEN 1 AND 9)
, Val TINYINT
NOT NULL
CHECK (Val BETWEEN 1 AND 9)
, PRIMARY KEY (StateSet, Position)
, UNIQUE(StateSet, Val)
)

GO

CREATE INDEX IX_Position_Val ON StateSets (Position, Val, StateSet)

GO

CREATE TABLE Puzzles
(
PuzzleGUID UNIQUEIDENTIFIER
, Row TINYINT
NOT NULL
CHECK (Row BETWEEN 1 AND 9)
, Col TINYINT
NOT NULL
CHECK (Col BETWEEN 1 AND 9)
, Cell AS (Row - 1) / 3 * 3 + (Col - 1) / 3 + 1 PERSISTED
, CellPosition AS ((Row - 1) % 3) * 3 + ((Col - 1) % 3 + 1) PERSISTED
, Val TINYINT
NOT NULL
CHECK (Val BETWEEN 1 AND 9)
, PRIMARY KEY (PuzzleGUID, Row, Col, Cell)
, UNIQUE (PuzzleGUID, Row, Val)
, UNIQUE (PuzzleGUID, Col, Val)
, UNIQUE (PuzzleGUID, Cell, Val)
, UNIQUE (PuzzleGUID, Row, Col)
)

GO
-- nums = range(1,10)
DECLARE @States TABLE
(
StateSet INT
IDENTITY -- IDENTITY is fine here, table variable
PRIMARY KEY
, P1 INT
, P2 INT
, P3 INT
, P4 INT
, P5 INT
, P6 INT
, P7 INT
, P8 INT
, P9 INT
)

DECLARE @V TABLE
(
V INT
)

DECLARE @I INT
SET @I = 1
WHILE @I <= 9 BEGIN
INSERT @V
SELECT @I
SET @I = @I + 1
END

CREATE TABLE StateValues
(
StateValue TINYINT PRIMARY KEY
)

INSERT StateValues SELECT V FROM @V

INSERT @States(P1,P2,P3,P4,P5,P6,P7,P8,P9)
SELECT t1.V
, t2.V
, t3.V
, t4.V
, t5.V
, t6.V
, t7.V
, t8.V
, t9.V
FROM @V t1,@V t2,@V t3,@V t4,@V t5,@V t6,@V t7,@V t8,@V t9
--print '%s=%d' % (reduce (lambda a, b : '%s*%s'%(a, b), ['t%s.V'%n
for n in nums]), reduce(lambda a, b:a*b, nums))
WHERE t1.V*t2.V*t3.V*t4.V*t5.V*t6.V*t7.V*t8.V*t9.V=362880
AND (t1.V+1)*(t2.V+1)*(t3.V+1)*(t4.V+1)*(t5.V+1)*(t6.V+1)*(t7.V
+1)*(t8.V+1)*(t9.V+1)=3628800
ORDER BY t1.V, t2.V, t3.V, t4.V, t5.V, t6.V, t7.V, t8.V, t9.V

SELECT *
INTO #States
FROM @States

GO

INSERT StateSets
(
StateSet
, Position
, Val
)
SELECT StateSet
, REPLACE(Position, 'P', '')
, Val
FROM #states
UNPIVOT (Val FOR Position IN (P1, P2, P3, P4, P5, P6, P7, P8, P9))n

GO

-- ### BEGIN SOLUTION ### --

WITH Puzzle AS
(
SELECT Cell
, CellPosition
, Row
, Col
, Val
FROM Puzzles
WHERE PuzzleGUID LIKE '4877%'
)
, RowSolutions AS
(
SELECT RowStateSet
, Row
, Position
, Val
FROM (SELECT StateSet RowStateSet
, Row
FROM Puzzle
inner join StateSets
ON Position = Col
AND Puzzle.Val = StateSets.Val
inner join StateValues
ON Row = StateValue
GROUP BY StateSet, Row, Statevalue
HAVING COUNT(*) = (SELECT COUNT(*) FROM Puzzle WHERE Row =
StateValue)
) a
INNER JOIN StateSets
ON RowStateSet = StateSet
) ,
ColumnSolutions AS
(
SELECT ColStateSet
, Col
, Position
, Val
FROM (SELECT StateSet ColStateSet
, Col
FROM Puzzle
inner join StateSets
ON Position = Row
AND Puzzle.Val = StateSets.Val
inner join StateValues
ON Col = StateValue
GROUP BY StateSet, Col, Statevalue
HAVING COUNT(*) = (SELECT COUNT(*) FROM Puzzle WHERE Col =
StateValue)
) a
INNER JOIN StateSets
ON ColStateSet = StateSet
) ,
CellSolutions AS
(
SELECT CellStateSet
, Cell
, Position
, Val
FROM (SELECT StateSet CellStateSet
, Cell
FROM Puzzle
inner join StateSets
ON Position = CellPosition
AND Puzzle.Val = StateSets.Val
inner join StateValues
ON Cell = StateValue
GROUP BY StateSet, Cell, StateValue
HAVING COUNT(*) = (SELECT COUNT(*) FROM Puzzle WHERE Cell =
StateValue)
) a
INNER JOIN StateSets
ON CellStateSet = StateSet
) ,
CompleteRowMatches AS
(
SELECT RowStateSet
, Row
FROM RowSolutions
INNER JOIN ColumnSolutions
ON Row = ColumnSolutions.Position
AND Col = RowSolutions.Position
AND RowSolutions.Val = ColumnSolutions.Val
GROUP BY RowStateSet, Row
HAVING COUNT(DISTINCT Col) = 9
) ,
CompleteColumnMatches AS
(
SELECT ColStateSet, Col
FROM RowSolutions
INNER JOIN ColumnSolutions
ON Row = ColumnSolutions.Position
AND Col = RowSolutions.Position
AND RowSolutions.Val = ColumnSolutions.Val
INNER JOIN CompleteRowMatches
ON RowSolutions.Row = CompleteRowMatches.Row
AND RowSolutions.RowStateSet = CompleteRowMatches.RowStateSet
GROUP BY ColStateSet, Col
HAVING COUNT(DISTINCT RowSolutions.Row) = 9
) ,
MatchStateValues AS
(
SELECT ColStateSet StateSet
, Col
, Position Row
, (Position - 1) / 3 * 3 + (Col - 1) / 3 + 1 Cell
, ((Position - 1) % 3) * 3 + ((Col - 1) % 3 + 1) CellPos
, Val
FROM CompleteColumnMatches
INNER JOIN StateSets
ON ColStateSet = StateSet
) ,
CellMatchStateSets AS
(
SELECT CellSolutions.CellStateSet
, CellSolutions.Cell
FROM MatchStateValues
INNER JOIN CellSolutions
ON MatchStateValues.Cell = CellSolutions.Cell
AND MatchStateValues.CellPos = CellSolutions.Position
AND MatchStateValues.Val = CellSolutions.Val
GROUP BY CellSolutions.CellStateSet, CellSolutions.Cell
HAVING COUNT(DISTINCT Position) = 9
) ,
CellMatchStateSetValues AS
(
SELECT StateSets.*
, (Position + 2) % 3 + ((cell + 2) % 3) * 3 + 1 Col
, (Position - 1) / 3 + ((cell - 1) / 3) * 3 + 1 Row
, Cell
FROM StateSets
INNER JOIN CellMatchStateSets
ON StateSet = CellStateSet
) ,
RowCellSets AS
(
SELECT a.CellStateSet ASet
, a.Cell ACell
, b.CellStateSet BSet
, b.Cell BCell
, c.CellStateSet CSet
, c.Cell CCell
FROM CellMatchStateSets a
, CellMatchStateSets b
, CellMatchStateSets c
WHERE a.Cell % 3 = 1
AND b.Cell % 3 = 2
AND c.Cell % 3 = 0
AND b.Cell BETWEEN a.Cell AND c.Cell
AND c.Cell - a.Cell = 2
AND EXISTS
(
SELECT *
FROM
(
SELECT Row
FROM CellMatchStateSetValues
WHERE (
StateSet = a.CellStateSet
AND Cell = a.Cell
)
OR
(
StateSet = b.CellStateSet
AND Cell = b.Cell
)
OR
(
StateSet = c.CellStateSet
AND Cell = c.Cell
)
GROUP BY Row
HAVING SUM(DISTINCT Val) = 45
) y
HAVING COUNT(DISTINCT Row) = 3
)
) ,
ColCellSets AS
(
SELECT a.CellStateSet ASet
, a.Cell ACell
, b.CellStateSet BSet
, b.Cell BCell
, c.CellStateSet CSet
, c.Cell CCell
FROM CellMatchStateSets a
, CellMatchStateSets b
, CellMatchStateSets c
WHERE a.Cell <= 3
AND c.Cell - b.Cell = 3
AND c.Cell - a.Cell = 6
AND EXISTS
(
SELECT *
FROM
(
SELECT Col
FROM CellMatchStateSetValues
WHERE (
StateSet = a.CellStateSet
AND Cell = a.Cell
)
OR
(
StateSet = b.CellStateSet
AND Cell = b.Cell
)
OR
(
StateSet = c.CellStateSet
AND Cell = c.Cell
)
GROUP BY Col
HAVING SUM(DISTINCT Val) = 45
) y
HAVING COUNT(DISTINCT Col) = 3
)
)
SELECT DISTINCT Row
, Col
, Val
FROM CellMatchStateSetValues
WHERE EXISTS (
SELECT DISTINCT *
FROM
(
SELECT TOP 1
cA.ASet Cell1
, cA.BSet Cell4
, cA.CSet Cell7
, rA.BSet Cell2
, rA.CSet Cell3
, cB.BSet Cell5
, cB.CSet Cell8
, rB.CSet Cell6
, rC.CSet Cell9
FROM ColCellSets cA
INNER JOIN RowCellSets rA
ON cA.ASet = rA.ASet
AND cA.ACell = 1
AND rA.ACell = 1
INNER JOIN ColCellSets cB
ON rA.BSet = cB.ASet
AND cB.ACell = 2
INNER JOIN RowCellSets rB
ON cA.BSet = rB.ASet
AND cB.BSet = rB.BSet
AND rB.ACell = 4
INNER JOIN ColCellSets cC
ON rA.CSet = cC.ASet
AND rB.CSet = cC.BSet
AND cC.ACell = 3
INNER JOIN RowCellSets rC
ON rC.ASet = cA.CSet
AND rC.BSet = cB.CSet
AND rC.CSet = cC.CSet
AND rC.ACell = 7
) s UNPIVOT (StateSet FOR Cell IN (Cell1, Cell2, Cell3, Cell4,
Cell5, Cell6, Cell7, Cell8, Cell9)) x
WHERE CellMatchStateSetValues.StateSet = x.StateSet
AND RTRIM(x.Cell) = RTRIM('Cell' + CAST(CellMatchStateSetValues.Cell
AS VARCHAR(10)))
)

GO

rpresser

unread,
Apr 22, 2008, 5:39:53 PM4/22/08
to
On Apr 22, 10:23 am, Alan Samet <alansa...@gmail.com> wrote:
> ...complete with alliteration.
>
> I've just written what I believe to be an entirely new approach to
> solving Sudoku problems using T-SQL. This approach takes the finite
> set of possible combinations of 1-9 (9-factorial [9!], or 362,880 and
> stores that in a table of statesets. The following query then, through
> a combination of limiting expressions, derives the puzzle solution in
> a single operation. This is a set-oriented solution, meaning that it
> involves a lot of cross-row operations within the table and full set
> operations. It also brings out some shortcomings of the query
> optimizer and crashes GDI in SQL Management Studio when you try to use
> the widget to look at the execution plan. I use lots of common table
> expressions here to simplify the query; it's possible to do without
> the CTEs, but at a significant expense of readability.
>
> I'm posting a write-up, complete with documented SQL, a WPF test
> application, an optimized step-by-step execution and sample data on my
> website here:http://www.socialdetour.com/Profile/ViewEssay.aas?user=Alan&g=5c0af94...

>
> However, if you just want the SQL, I'm posting that here, along with
> the table creation scripts. You use a GUID as a puzzle identifier and
> insert the GUID/row/cell/value combinations into the Puzzles table for
> known values.

I've got nothing to add, but I think this is very impressive!

--CELKO--

unread,
Apr 22, 2008, 6:19:04 PM4/22/08
to
>> an entirely new approach to solving Sudoku problems using T-SQL. This approach takes the finite set of possible combinations of 1-9 (9-factorial [9!], or 362,880 and stores that in a table of statesets. <<

While I am usually a fan of data-driven solutions, this is not a good
idea in this case. First of all, as you said, this is a T-SQL
solution -- it is full of proprietary tricks, like PIVOT, TOP, etc.
and will never port.

But it is so convoluted

Think about how an EXISTS() works when you write things like this

EXISTS (SELECT DISTINCT *
FROM (SELECT TOP 1

Why do I want a DISTINCT? Why do I want a TOP? Why not just EXISTS
(SELECT *..) instead all that extra work and nested code? All this
could do is screw up the optimizer by forcing extra sorting under the
covers.

Why use a GUID to number the results? It long, proprietary and
unreadable; an IDENTITY would give you a count and sequence of the
answers. And I am not a big fan of IDENTITY.

>> The following query then, through a combination of limiting expressions, derives the puzzle solution in a single operation. <<

Look at Richard Romley's solution at www.celko.com. He does it in one
query also, but without an proprietary code. It is just a brute force
application of the rules with IN() predicates. He passes in 81
parameters to represent the grid and then produces **all** correct
answers. Yes, a lot of published puzzles have more than one answer,
even tho they are not supposed to -- he got one with over 30 answers
and tested it for cases in the thousands.

>> It also brings out some shortcomings of the query optimizer and crashes GDI in SQL Management Studio when you try to use the widget to look at the execution plan. <<

Richard's answer has no such problems. It also demonstrates some
important things:

1) Long parameter list work just fine in T-SQL. That is why they allow
up to 2K of them in a stored procedure.

2) Simple IN() predicates work better than trying fancy stuff.
Originally, we were trying to use home-grown binary math to check the
uniqueness of digits in rows, columns and regions. It is also a lot
easier to maintain :)

3) The problem is symmetric, so the trick is to turn off
optimization. The T-SQL optimizer is really pretty good as long as
you write straight, simple SQL. There is an 81 table self-join in the
middle of the query; if you leave optimization on, it will spend more
time on compiling the query than on just running it directly as
written.

Alan Samet

unread,
Apr 22, 2008, 7:55:54 PM4/22/08
to
> EXISTS (SELECT DISTINCT *
>  FROM (SELECT TOP 1
>
> Why do I want a DISTINCT?  Why do I want a TOP? Why not just EXISTS
> (SELECT *..) instead all that extra work and nested code?  All this
> could do is screw up the optimizer by forcing extra sorting under the
> covers.

This was to select each result set. The subquery returns all result
sets for the given puzzle. I used UNPIVOT just because it was
convenient. You can easily accomplish the same thing using standard
techniques with a little extra work. The DISTINCT was my mistake -- an
artifact from something else I was trying in development. The query
that starts with SELECT TOP 1 ... is what returns ALL valid solutions
to the given puzzle. I could have stopped there, or joined to it to
have returned all possible solution sets for the puzzle. I was lazy;
after getting to where I could get one solution, I decided that was
good enough for me.

> Why use a GUID to number the results?  

I used a GUID to select the puzzle input values from sets of puzzles.
I didn't publish this, but the puzzles that I used required some
transformation before loading them into the database. Only the puzzles
in an unsolved state used a GUID.

> Look at Richard Romley's solution atwww.celko.com.  

Not to sound too defensive, but if you follow my initial link here to
my posted essay: I've posted my compliments on Richard's solution,
even specifically mentioning that I assumed his would likely
outperform mine, but that wasn't my goal.

If I were simply interested in writing something to solve Sudoku
puzzles, I would not have used SQL at all. The handful of techniques I
used leading up to the final query was why I followed this approach. I
saw it as a multidimensional membership problem with known solutions.
My approach was to develop the techniques to solve this type of
problem using sets.

Tony Rogerson

unread,
Apr 23, 2008, 3:18:25 AM4/23/08
to
> Richard's answer has no such problems. It also demonstrates some
> important things:
>
> 1) Long parameter list work just fine in T-SQL. That is why they allow
> up to 2K of them in a stored procedure.
>
> 2) Simple IN() predicates work better than trying fancy stuff.
> Originally, we were trying to use home-grown binary math to check the
> uniqueness of digits in rows, columns and regions. It is also a lot
> easier to maintain :)
>
> 3) The problem is symmetric, so the trick is to turn off
> optimization. The T-SQL optimizer is really pretty good as long as
> you write straight, simple SQL. There is an 81 table self-join in the
> middle of the query; if you leave optimization on, it will spend more
> time on compiling the query than on just running it directly as
> written.

As Richard has pointed out in a number of emails to you and myself his
solution concept was never mean't to be an example of production code; there
are many things missing.

Take for instance - if you pass in numbers > 9 the thing locks up - I
cancelled it after my tempdb grew to over 20GB and after a number of hours.

There needs to be a large number of IF @p1 not between 0 and 9 then error
but that is all manual coding, manual checking and a lot of test cases; that
coding style is non-intuitative. Simply using a macro to do it does not get
round you having to write many test cases; it does not get round the fact it
takes over 2 minutes to edit the thing in Management Studio, it does not get
round the excessive support costs a very large number of IF ELSE would incur
for the poor guy who has to maintain it.

There are many other and more efficient ways to do sudoko and there are many
examples out there - more maintainable and succinct examples.

For those who require portability (is there anybody these days? [no]) then
the other solutions take advantage of the data engine features of SQL Server
rather than pure standard SQL.

Unfortunetly you've been told all this on many occaisions and we are still
waiting for responses on many threads for instance how you would validate
numbers between 0 - 9 for all those 2000 parameters without a lot of repeat
code which is subject to errors and significant maintanence burden.

--
Tony Rogerson, SQL Server MVP
http://sqlblogcasts.com/blogs/tonyrogerson
[Ramblings from the field from a SQL consultant]
http://sqlserverfaq.com
[UK SQL User Community]

0 new messages