Expert Challenge - Complex Join, Group By, Having

r...@ti.com

unread,

Jul 13, 2007, 2:07:51 AM7/13/07

to

I'm having a hard time getting my brain around this one and would
appreciate some help. Here are setup tables and sample data:

-- load main data table
declare @data table([id] INT IDENTITY, [answer] VARCHAR(20),
[wordCount] int, [word] varchar(20))
insert @data([answer], [wordCount], [word] ) select 'first', 2,
'george'
insert @data([answer], [wordCount], [word] ) select 'first', 2,
'burns'
insert @data([answer], [wordCount], [word] ) select 'second', 2,
'burns'
insert @data([answer], [wordCount], [word] ) select 'second', 2,
'burns'
insert @data([answer], [wordCount], [word] ) select 'third', 3, 'go'
insert @data([answer], [wordCount], [word] ) select 'third', 3,
'george'
insert @data([answer], [wordCount], [word] ) select 'third', 3,
'burns'

-- test #1, load search words for matching to main data table
declare @match table([ID] INT IDENTITY, [word] varchar(20))
insert @match([word] ) select 'burns'
insert @match([word] ) select 'george'

/*
-- test #2, load search words for matching to main data table
declare @match table([ID] INT IDENTITY, [word] varchar(20))
insert @match([word] ) select 'burns'
insert @match([word] ) select 'burns'

-- test #3, load search words for matching to main data table
declare @match table([ID] INT IDENTITY, [word] varchar(20))
insert @match([word] ) select 'burns'
insert @match([word] ) select 'george'
insert @match([word] ) select 'go'
*/

I'm looking for a sql join that will return the following results:

'first' for test #1
'second' for test #2
'third' for test #3

The point is to return the answer from the @data table where the words
exactly matching the input (i.e. @match) words (regardless of word
order). @data is a large (>10 million rows), denormalized table.
"wordCount" is simply the total number of words for each answer and
for improved perfomance. @matches will typically be small; 1-10
words. The following is as far as I've gotten but fails for test case
#2.

-- review setup data
select * from @data
select * from @match
-- current code attempt
select [answer]
from (
select
t1.[answer], t1.[dataId]
from (
select d.[answer], max(d.[id]) [dataId]
from @data d
inner join @match m on m.[word] = d.[word]
where d.[wordCount] = (select count(*) from @match)
group by d.[answer], m.[id]
) t1
group by t1.[answer], t0.[id]
) t2
group by t2.[answer]
having count(*) = (select count(*) from @match)

Thanks for your assistance!

Petar Atanasov

unread,

Jul 13, 2007, 2:45:25 AM7/13/07

to

One suggestion:

1) Fix the GROUP BY clause
'group by t1.[answer], t0.[id]' -> group by t1.[answer], t1.[dataId]

2) Comment the HAVING clause :

having count(*) = (select count(*) from @match)

I really can't figure out it's exact purpose from the sample, but it
messes the test case #2.

HTH,
Petar Atanasov
http://a-wake.net

Sha Anand

unread,

Jul 13, 2007, 3:34:00 AM7/13/07

to

Try the below Query

SELECT d2.answer FROM
(
SELECT word,count(*) as WCount from #Match
group by word
) d1
LEFT JOIN
(
SELECT answer,word,count(*) as WCount FROM #data
WHERE wordcount = (select count(*) FROM #match)
GROUP BY answer,word
) d2 ON
d1.word = d2.word and d1.wcount = d2.wcount
GROUP BY d2.answer
HAVING COUNT(d1.Word) = COUNT(DISTINCT d2.word)

- Sha Anand

Bob

unread,

Jul 13, 2007, 5:42:00 AM7/13/07

to

Got it!

SELECT d.answer
FROM @data d
CROSS JOIN @match m
GROUP BY d.answer
HAVING MAX( d.WordCount ) = COUNT( DISTINCT m.id )
AND COUNT( DISTINCT d.word ) = COUNT( DISTINCT m.word )

Let me know how you get on.

wBob

Sha Anand

unread,

Jul 13, 2007, 7:32:01 AM7/13/07

to

Hi Bob,

Your Query will not work for the following data. My above Query will.

Try inserting these data and run your Query, Its not supposed to return a
match. But it returning a match.

insert #data([answer], [wordCount], [word] ) select 'Tenth', 4, 'a'
insert #data([answer], [wordCount], [word] ) select 'Tenth', 4, 'a'
insert #data([answer], [wordCount], [word] ) select 'Tenth', 4, 'b'
insert #data([answer], [wordCount], [word] ) select 'Tenth', 4, 'b'

insert #match([word] ) select 'a'
insert #match([word] ) select 'a'
insert #match([word] ) select 'a'
insert #match([word] ) select 'b'

- Sha Anand

Petar Atanasov

unread,

Jul 13, 2007, 9:22:10 AM7/13/07

to

Petar,

read carefully the query conditions:

>> 'first' for test #1
>> 'second' for test #2
>> 'third' for test #3

definitly need <b>HAVING</b> some sleep tonight...

Regards,
Petar Atanasov
http:/a-wake.net

r...@ti.com

unread,

Jul 13, 2007, 11:15:06 AM7/13/07

to

You guys are amazing! But not quite.... I though Sha had it but the
following test case breaks his solution:

insert @data([answer], [wordCount], [word] ) select 'eleventh',
3,'burns'
insert @data([answer], [wordCount], [word] ) select 'eleventh',
3,'gracie'

With this additional test case a @match table containing 'george' and
'burns' also matches to 'gracie', 'burns' in @data (not desired, only
want an exact match). It seems you have to join @match and @data
tables on [word] but that you'll need to group on respective identity
fields so each word occurance appears a single time on either side of
the join (for a specific answer)?

More ideas?

Roy Harvey

unread,

Jul 13, 2007, 11:40:13 AM7/13/07

to

I added the extra test data and this seems to be working.

SELECT A.answer
FROM (SELECT answer, wordCount, word,
COUNT(*) as Repeats
FROM @data
WHERE wordCount =
(select count(*) from @match)
GROUP BY answer, wordCount, word) as A
JOIN (SELECT word, count(*) as Repeats
FROM @match
GROUP BY word) as B
ON A.word = B.word
AND A.Repeats = B.Repeats
GROUP BY A.answer, A.wordCount
HAVING SUM(B.Repeats) = A.WordCount

Roy Harvey
Beacon Falls, CT

r...@ti.com

unread,

Jul 13, 2007, 5:10:42 PM7/13/07

to

Most excellent Roy! Very insightful -- not sure I would have found
that answer in a thousand years. What's the title of your book?

And thanks again everyone.

Roy Harvey

unread,

Jul 13, 2007, 6:04:34 PM7/13/07

to

On Fri, 13 Jul 2007 14:10:42 -0700, r...@ti.com wrote:

>What's the title of your book?

"Solving Obscure Problems Created by Strange Database Designs Through
Clever SQL Coding"

8-)

Steve Dassin

unread,

Jul 18, 2007, 1:02:04 AM7/18/07

to

Here you go:

Dataphor - Simplifying Relational Division
http://beyondsql.blogspot.com/2007/07/dataphor-simplifying-relational.html

I've used this problem as an example.

www.beyondsql.blogspot.com