-- test #1, load search words for matching to main data table declare @match table([ID] INT IDENTITY, [word] varchar(20)) insert @match([word] ) select 'burns' insert @match([word] ) select 'george'
/* -- test #2, load search words for matching to main data table declare @match table([ID] INT IDENTITY, [word] varchar(20)) insert @match([word] ) select 'burns' insert @match([word] ) select 'burns'
-- test #3, load search words for matching to main data table declare @match table([ID] INT IDENTITY, [word] varchar(20)) insert @match([word] ) select 'burns' insert @match([word] ) select 'george' insert @match([word] ) select 'go' */
I'm looking for a sql join that will return the following results:
'first' for test #1 'second' for test #2 'third' for test #3
The point is to return the answer from the @data table where the words exactly matching the input (i.e. @match) words (regardless of word order). @data is a large (>10 million rows), denormalized table. "wordCount" is simply the total number of words for each answer and for improved perfomance. @matches will typically be small; 1-10 words. The following is as far as I've gotten but fails for test case #2.
-- review setup data select * from @data select * from @match -- current code attempt select [answer] from ( select t1.[answer], t1.[dataId] from ( select d.[answer], max(d.[id]) [dataId] from @data d inner join @match m on m.[word] = d.[word] where d.[wordCount] = (select count(*) from @match) group by d.[answer], m.[id] ) t1 group by t1.[answer], t0.[id] ) t2 group by t2.[answer] having count(*) = (select count(*) from @match)
> -- test #1, load search words for matching to main data table > declare @match table([ID] INT IDENTITY, [word] varchar(20)) > insert @match([word] ) select 'burns' > insert @match([word] ) select 'george'
> /* > -- test #2, load search words for matching to main data table > declare @match table([ID] INT IDENTITY, [word] varchar(20)) > insert @match([word] ) select 'burns' > insert @match([word] ) select 'burns'
> -- test #3, load search words for matching to main data table > declare @match table([ID] INT IDENTITY, [word] varchar(20)) > insert @match([word] ) select 'burns' > insert @match([word] ) select 'george' > insert @match([word] ) select 'go' > */
> I'm looking for a sql join that will return the following results:
> 'first' for test #1 > 'second' for test #2 > 'third' for test #3
> The point is to return the answer from the @data table where the words > exactly matching the input (i.e. @match) words (regardless of word > order). @data is a large (>10 million rows), denormalized table. > "wordCount" is simply the total number of words for each answer and > for improved perfomance. @matches will typically be small; 1-10 > words. The following is as far as I've gotten but fails for test case > #2.
> -- review setup data > select * from @data > select * from @match > -- current code attempt > select [answer] > from ( > select > t1.[answer], t1.[dataId] > from ( > select d.[answer], max(d.[id]) [dataId] > from @data d > inner join @match m on m.[word] = d.[word] > where d.[wordCount] = (select count(*) from @match) > group by d.[answer], m.[id] > ) t1 > group by t1.[answer], t0.[id] > ) t2 > group by t2.[answer] > having count(*) = (select count(*) from @match)
> Thanks for your assistance!
One suggestion:
1) Fix the GROUP BY clause 'group by t1.[answer], t0.[id]' -> group by t1.[answer], t1.[dataId]
2) Comment the HAVING clause : having count(*) = (select count(*) from @match)
I really can't figure out it's exact purpose from the sample, but it messes the test case #2.
SELECT d2.answer FROM ( SELECT word,count(*) as WCount from #Match group by word ) d1 LEFT JOIN ( SELECT answer,word,count(*) as WCount FROM #data WHERE wordcount = (select count(*) FROM #match) GROUP BY answer,word ) d2 ON d1.word = d2.word and d1.wcount = d2.wcount GROUP BY d2.answer HAVING COUNT(d1.Word) = COUNT(DISTINCT d2.word)
"r...@ti.com" wrote: > I'm having a hard time getting my brain around this one and would > appreciate some help. Here are setup tables and sample data:
> -- test #1, load search words for matching to main data table > declare @match table([ID] INT IDENTITY, [word] varchar(20)) > insert @match([word] ) select 'burns' > insert @match([word] ) select 'george'
> /* > -- test #2, load search words for matching to main data table > declare @match table([ID] INT IDENTITY, [word] varchar(20)) > insert @match([word] ) select 'burns' > insert @match([word] ) select 'burns'
> -- test #3, load search words for matching to main data table > declare @match table([ID] INT IDENTITY, [word] varchar(20)) > insert @match([word] ) select 'burns' > insert @match([word] ) select 'george' > insert @match([word] ) select 'go' > */
> I'm looking for a sql join that will return the following results:
> 'first' for test #1 > 'second' for test #2 > 'third' for test #3
> The point is to return the answer from the @data table where the words > exactly matching the input (i.e. @match) words (regardless of word > order). @data is a large (>10 million rows), denormalized table. > "wordCount" is simply the total number of words for each answer and > for improved perfomance. @matches will typically be small; 1-10 > words. The following is as far as I've gotten but fails for test case > #2.
> -- review setup data > select * from @data > select * from @match > -- current code attempt > select [answer] > from ( > select > t1.[answer], t1.[dataId] > from ( > select d.[answer], max(d.[id]) [dataId] > from @data d > inner join @match m on m.[word] = d.[word] > where d.[wordCount] = (select count(*) from @match) > group by d.[answer], m.[id] > ) t1 > group by t1.[answer], t0.[id] > ) t2 > group by t2.[answer] > having count(*) = (select count(*) from @match)
SELECT d.answer FROM @data d CROSS JOIN @match m GROUP BY d.answer HAVING MAX( d.WordCount ) = COUNT( DISTINCT m.id ) AND COUNT( DISTINCT d.word ) = COUNT( DISTINCT m.word )
"r...@ti.com" wrote: > I'm having a hard time getting my brain around this one and would > appreciate some help. Here are setup tables and sample data:
> -- test #1, load search words for matching to main data table > declare @match table([ID] INT IDENTITY, [word] varchar(20)) > insert @match([word] ) select 'burns' > insert @match([word] ) select 'george'
> /* > -- test #2, load search words for matching to main data table > declare @match table([ID] INT IDENTITY, [word] varchar(20)) > insert @match([word] ) select 'burns' > insert @match([word] ) select 'burns'
> -- test #3, load search words for matching to main data table > declare @match table([ID] INT IDENTITY, [word] varchar(20)) > insert @match([word] ) select 'burns' > insert @match([word] ) select 'george' > insert @match([word] ) select 'go' > */
> I'm looking for a sql join that will return the following results:
> 'first' for test #1 > 'second' for test #2 > 'third' for test #3
> The point is to return the answer from the @data table where the words > exactly matching the input (i.e. @match) words (regardless of word > order). @data is a large (>10 million rows), denormalized table. > "wordCount" is simply the total number of words for each answer and > for improved perfomance. @matches will typically be small; 1-10 > words. The following is as far as I've gotten but fails for test case > #2.
> -- review setup data > select * from @data > select * from @match > -- current code attempt > select [answer] > from ( > select > t1.[answer], t1.[dataId] > from ( > select d.[answer], max(d.[id]) [dataId] > from @data d > inner join @match m on m.[word] = d.[word] > where d.[wordCount] = (select count(*) from @match) > group by d.[answer], m.[id] > ) t1 > group by t1.[answer], t0.[id] > ) t2 > group by t2.[answer] > having count(*) = (select count(*) from @match)
> > -- test #1, load search words for matching to main data table > > declare @match table([ID] INT IDENTITY, [word] varchar(20)) > > insert @match([word] ) select 'burns' > > insert @match([word] ) select 'george'
> > /* > > -- test #2, load search words for matching to main data table > > declare @match table([ID] INT IDENTITY, [word] varchar(20)) > > insert @match([word] ) select 'burns' > > insert @match([word] ) select 'burns'
> > -- test #3, load search words for matching to main data table > > declare @match table([ID] INT IDENTITY, [word] varchar(20)) > > insert @match([word] ) select 'burns' > > insert @match([word] ) select 'george' > > insert @match([word] ) select 'go' > > */
> > I'm looking for a sql join that will return the following results:
> > 'first' for test #1 > > 'second' for test #2 > > 'third' for test #3
> > The point is to return the answer from the @data table where the words > > exactly matching the input (i.e. @match) words (regardless of word > > order). @data is a large (>10 million rows), denormalized table. > > "wordCount" is simply the total number of words for each answer and > > for improved perfomance. @matches will typically be small; 1-10 > > words. The following is as far as I've gotten but fails for test case > > #2.
> > -- review setup data > > select * from @data > > select * from @match > > -- current code attempt > > select [answer] > > from ( > > select > > t1.[answer], t1.[dataId] > > from ( > > select d.[answer], max(d.[id]) [dataId] > > from @data d > > inner join @match m on m.[word] = d.[word] > > where d.[wordCount] = (select count(*) from @match) > > group by d.[answer], m.[id] > > ) t1 > > group by t1.[answer], t0.[id] > > ) t2 > > group by t2.[answer] > > having count(*) = (select count(*) from @match)
Petar Atanasov wrote: > r...@ti.com wrote: >> I'm having a hard time getting my brain around this one and would >> appreciate some help. Here are setup tables and sample data:
>> -- test #1, load search words for matching to main data table >> declare @match table([ID] INT IDENTITY, [word] varchar(20)) >> insert @match([word] ) select 'burns' >> insert @match([word] ) select 'george'
>> /* >> -- test #2, load search words for matching to main data table >> declare @match table([ID] INT IDENTITY, [word] varchar(20)) >> insert @match([word] ) select 'burns' >> insert @match([word] ) select 'burns'
>> -- test #3, load search words for matching to main data table >> declare @match table([ID] INT IDENTITY, [word] varchar(20)) >> insert @match([word] ) select 'burns' >> insert @match([word] ) select 'george' >> insert @match([word] ) select 'go' >> */
>> I'm looking for a sql join that will return the following results:
>> 'first' for test #1 >> 'second' for test #2 >> 'third' for test #3
>> The point is to return the answer from the @data table where the words >> exactly matching the input (i.e. @match) words (regardless of word >> order). @data is a large (>10 million rows), denormalized table. >> "wordCount" is simply the total number of words for each answer and >> for improved perfomance. @matches will typically be small; 1-10 >> words. The following is as far as I've gotten but fails for test case >> #2.
>> -- review setup data >> select * from @data >> select * from @match >> -- current code attempt >> select [answer] >> from ( >> select >> t1.[answer], t1.[dataId] >> from ( >> select d.[answer], max(d.[id]) [dataId] >> from @data d >> inner join @match m on m.[word] = d.[word] >> where d.[wordCount] = (select count(*) from @match) >> group by d.[answer], m.[id] >> ) t1 >> group by t1.[answer], t0.[id] >> ) t2 >> group by t2.[answer] >> having count(*) = (select count(*) from @match)
>> Thanks for your assistance!
> One suggestion:
> 1) Fix the GROUP BY clause > 'group by t1.[answer], t0.[id]' -> group by t1.[answer], t1.[dataId]
> 2) Comment the HAVING clause : > having count(*) = (select count(*) from @match)
> I really can't figure out it's exact purpose from the sample, but it > messes the test case #2.
> Petar Atanasov wrote: > > r...@ti.com wrote: > >> I'm having a hard time getting my brain around this one and would > >> appreciate some help. Here are setup tables and sample data:
> >> -- test #1, load search words for matching to main data table > >> declare @match table([ID] INT IDENTITY, [word] varchar(20)) > >> insert @match([word] ) select 'burns' > >> insert @match([word] ) select 'george'
> >> /* > >> -- test #2, load search words for matching to main data table > >> declare @match table([ID] INT IDENTITY, [word] varchar(20)) > >> insert @match([word] ) select 'burns' > >> insert @match([word] ) select 'burns'
> >> -- test #3, load search words for matching to main data table > >> declare @match table([ID] INT IDENTITY, [word] varchar(20)) > >> insert @match([word] ) select 'burns' > >> insert @match([word] ) select 'george' > >> insert @match([word] ) select 'go' > >> */
> >> I'm looking for a sql join that will return the following results:
> >> 'first' for test #1 > >> 'second' for test #2 > >> 'third' for test #3
> >> The point is to return the answer from the @data table where the words > >> exactly matching the input (i.e. @match) words (regardless of word > >> order). @data is a large (>10 million rows), denormalized table. > >> "wordCount" is simply the total number of words for each answer and > >> for improved perfomance. @matches will typically be small; 1-10 > >> words. The following is as far as I've gotten but fails for test case > >> #2.
> >> -- review setup data > >> select * from @data > >> select * from @match > >> -- current code attempt > >> select [answer] > >> from ( > >> select > >> t1.[answer], t1.[dataId] > >> from ( > >> select d.[answer], max(d.[id]) [dataId] > >> from @data d > >> inner join @match m on m.[word] = d.[word] > >> where d.[wordCount] = (select count(*) from @match) > >> group by d.[answer], m.[id] > >> ) t1 > >> group by t1.[answer], t0.[id] > >> ) t2 > >> group by t2.[answer] > >> having count(*) = (select count(*) from @match)
> >> Thanks for your assistance!
> > One suggestion:
> > 1) Fix the GROUP BY clause > > 'group by t1.[answer], t0.[id]' -> group by t1.[answer], t1.[dataId]
> > 2) Comment the HAVING clause : > > having count(*) = (select count(*) from @match)
> > I really can't figure out it's exact purpose from the sample, but it > > messes the test case #2.
With this additional test case a @match table containing 'george' and 'burns' also matches to 'gracie', 'burns' in @data (not desired, only want an exact match). It seems you have to join @match and @data tables on [word] but that you'll need to group on respective identity fields so each word occurance appears a single time on either side of the join (for a specific answer)?
I added the extra test data and this seems to be working.
SELECT A.answer FROM (SELECT answer, wordCount, word, COUNT(*) as Repeats FROM @data WHERE wordCount = (select count(*) from @match) GROUP BY answer, wordCount, word) as A JOIN (SELECT word, count(*) as Repeats FROM @match GROUP BY word) as B ON A.word = B.word AND A.Repeats = B.Repeats GROUP BY A.answer, A.wordCount HAVING SUM(B.Repeats) = A.WordCount
On Fri, 13 Jul 2007 08:15:06 -0700, r...@ti.com wrote: >You guys are amazing! But not quite.... I though Sha had it but the >following test case breaks his solution:
>With this additional test case a @match table containing 'george' and >'burns' also matches to 'gracie', 'burns' in @data (not desired, only >want an exact match). It seems you have to join @match and @data >tables on [word] but that you'll need to group on respective identity >fields so each word occurance appears a single time on either side of >the join (for a specific answer)?
> I added the extra test data and this seems to be working.
> SELECT A.answer > FROM (SELECT answer, wordCount, word, > COUNT(*) as Repeats > FROM @data > WHERE wordCount = > (select count(*) from @match) > GROUP BY answer, wordCount, word) as A > JOIN (SELECT word, count(*) as Repeats > FROM @match > GROUP BY word) as B > ON A.word = B.word > AND A.Repeats = B.Repeats > GROUP BY A.answer, A.wordCount > HAVING SUM(B.Repeats) = A.WordCount
> Roy Harvey > Beacon Falls, CT
> On Fri, 13 Jul 2007 08:15:06 -0700, r...@ti.com wrote: > >You guys are amazing! But not quite.... I though Sha had it but the > >following test case breaks his solution:
> >With this additional test case a @match table containing 'george' and > >'burns' also matches to 'gracie', 'burns' in @data (not desired, only > >want an exact match). It seems you have to join @match and @data > >tables on [word] but that you'll need to group on respective identity > >fields so each word occurance appears a single time on either side of > >the join (for a specific answer)?
> >More ideas?
Most excellent Roy! Very insightful -- not sure I would have found that answer in a thousand years. What's the title of your book?