-- load main data table
declare @data table([id] INT IDENTITY, [answer] VARCHAR(20),
[wordCount] int, [word] varchar(20))
insert @data([answer], [wordCount], [word] ) select 'first', 2,
'george'
insert @data([answer], [wordCount], [word] ) select 'first', 2,
'burns'
insert @data([answer], [wordCount], [word] ) select 'second', 2,
'burns'
insert @data([answer], [wordCount], [word] ) select 'second', 2,
'burns'
insert @data([answer], [wordCount], [word] ) select 'third', 3, 'go'
insert @data([answer], [wordCount], [word] ) select 'third', 3,
'george'
insert @data([answer], [wordCount], [word] ) select 'third', 3,
'burns'
-- test #1, load search words for matching to main data table
declare @match table([ID] INT IDENTITY, [word] varchar(20))
insert @match([word] ) select 'burns'
insert @match([word] ) select 'george'
/*
-- test #2, load search words for matching to main data table
declare @match table([ID] INT IDENTITY, [word] varchar(20))
insert @match([word] ) select 'burns'
insert @match([word] ) select 'burns'
-- test #3, load search words for matching to main data table
declare @match table([ID] INT IDENTITY, [word] varchar(20))
insert @match([word] ) select 'burns'
insert @match([word] ) select 'george'
insert @match([word] ) select 'go'
*/
I'm looking for a sql join that will return the following results:
'first' for test #1
'second' for test #2
'third' for test #3
The point is to return the answer from the @data table where the words
exactly matching the input (i.e. @match) words (regardless of word
order). @data is a large (>10 million rows), denormalized table.
"wordCount" is simply the total number of words for each answer and
for improved perfomance. @matches will typically be small; 1-10
words. The following is as far as I've gotten but fails for test case
#2.
-- review setup data
select * from @data
select * from @match
-- current code attempt
select [answer]
from (
select
t1.[answer], t1.[dataId]
from (
select d.[answer], max(d.[id]) [dataId]
from @data d
inner join @match m on m.[word] = d.[word]
where d.[wordCount] = (select count(*) from @match)
group by d.[answer], m.[id]
) t1
group by t1.[answer], t0.[id]
) t2
group by t2.[answer]
having count(*) = (select count(*) from @match)
Thanks for your assistance!
One suggestion:
1) Fix the GROUP BY clause
'group by t1.[answer], t0.[id]' -> group by t1.[answer], t1.[dataId]
2) Comment the HAVING clause :
having count(*) = (select count(*) from @match)
I really can't figure out it's exact purpose from the sample, but it
messes the test case #2.
HTH,
Petar Atanasov
http://a-wake.net
SELECT d2.answer FROM
(
SELECT word,count(*) as WCount from #Match
group by word
) d1
LEFT JOIN
(
SELECT answer,word,count(*) as WCount FROM #data
WHERE wordcount = (select count(*) FROM #match)
GROUP BY answer,word
) d2 ON
d1.word = d2.word and d1.wcount = d2.wcount
GROUP BY d2.answer
HAVING COUNT(d1.Word) = COUNT(DISTINCT d2.word)
- Sha Anand
SELECT d.answer
FROM @data d
CROSS JOIN @match m
GROUP BY d.answer
HAVING MAX( d.WordCount ) = COUNT( DISTINCT m.id )
AND COUNT( DISTINCT d.word ) = COUNT( DISTINCT m.word )
Let me know how you get on.
wBob
Your Query will not work for the following data. My above Query will.
Try inserting these data and run your Query, Its not supposed to return a
match. But it returning a match.
insert #data([answer], [wordCount], [word] ) select 'Tenth', 4, 'a'
insert #data([answer], [wordCount], [word] ) select 'Tenth', 4, 'a'
insert #data([answer], [wordCount], [word] ) select 'Tenth', 4, 'b'
insert #data([answer], [wordCount], [word] ) select 'Tenth', 4, 'b'
insert #match([word] ) select 'a'
insert #match([word] ) select 'a'
insert #match([word] ) select 'a'
insert #match([word] ) select 'b'
- Sha Anand
Petar,
read carefully the query conditions:
>> 'first' for test #1
>> 'second' for test #2
>> 'third' for test #3
definitly need <b>HAVING</b> some sleep tonight...
Regards,
Petar Atanasov
http:/a-wake.net
You guys are amazing! But not quite.... I though Sha had it but the
following test case breaks his solution:
insert @data([answer], [wordCount], [word] ) select 'eleventh',
3,'burns'
insert @data([answer], [wordCount], [word] ) select 'eleventh',
3,'gracie'
With this additional test case a @match table containing 'george' and
'burns' also matches to 'gracie', 'burns' in @data (not desired, only
want an exact match). It seems you have to join @match and @data
tables on [word] but that you'll need to group on respective identity
fields so each word occurance appears a single time on either side of
the join (for a specific answer)?
More ideas?
SELECT A.answer
FROM (SELECT answer, wordCount, word,
COUNT(*) as Repeats
FROM @data
WHERE wordCount =
(select count(*) from @match)
GROUP BY answer, wordCount, word) as A
JOIN (SELECT word, count(*) as Repeats
FROM @match
GROUP BY word) as B
ON A.word = B.word
AND A.Repeats = B.Repeats
GROUP BY A.answer, A.wordCount
HAVING SUM(B.Repeats) = A.WordCount
Roy Harvey
Beacon Falls, CT
Most excellent Roy! Very insightful -- not sure I would have found
that answer in a thousand years. What's the title of your book?
And thanks again everyone.
>What's the title of your book?
"Solving Obscure Problems Created by Strange Database Designs Through
Clever SQL Coding"
8-)
Dataphor - Simplifying Relational Division
http://beyondsql.blogspot.com/2007/07/dataphor-simplifying-relational.html
I've used this problem as an example.