Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Filtering in an N to N relationship

2 views
Skip to first unread message

David Scemama

unread,
Feb 23, 2004, 12:33:42 PM2/23/04
to
Hi,

I have two tables (PRODUCT and CRITERIA) , linked by an N to N relationship.
I need to select all the products that match a set of criterias.

For example, I need to find all the products that are at least 'red' AND 'in
bottle'

I've setup a table (FILTER) containing the criterias. If I create an inner
joint between CRITERIA and FILTER, I can have all the products that are 'red
' OR 'in bottle'.

I can't figure out how to have the AND clause.

Thanks for your help
David


Invotion

unread,
Feb 23, 2004, 1:03:53 PM2/23/04
to
Please post the ddl and some sample data rows.

Sincerely,
Invotion Engineering Team
Advanced Microsoft Hosting Solutions
http://www.Invotion.com

>.
>

Joe Celko

unread,
Feb 23, 2004, 1:30:47 PM2/23/04
to
Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, datatypes, etc. in your
schema are. Here is a wild ass guess:

CREATE TABLE Products -- note use of plural names
(upc CHAR(10) NOT NULL -- note use of industry standards
PRIMARY KEY,
..);

The use of criteria for looks like characteristics is weird. Criteria
asks a question and sounds like it should be what you are calling
"Filters"; mind if I change that? Work on a better name, like
"packaging", that makes sense in the data model and avoid these vague
general terms.

CREATE TABLE Packaging
(package_code CHAR(10) NOT NULL
PRIMARY KEY,
..);

And the (n:m) table is:

CREATE TABLE Characteristics
(upc CHAR(10) NOT NULL
REFERENCES Products(upc)
ON DELETE CASCADE
ON UPDATE CASCADE,
package_code CHAR(10) NOT NULL
REFERENCES Packaging(package_code)
ON DELETE CASCADE
ON UPDATE CASCADE,
PRIMARY KEY (upc, package_code)):

>> I need to select all the products that match a set of criterias. <<

CREATE TABLE Criteria
(package_code CHAR(10) NOT NULL PRIMARY KEY
REFERENCES Packaging(package_code)
ON DELETE CASCADE
ON UPDATE CASCADE);

Relational division is one of the eight basic operations in Codd's
relational algebra. The idea is that a divisor table is used to
partition a dividend table and produce a quotient or results table. The
quotient table is made up of those values of one column for which a
second column had all of the values in the divisor.

This is easier to explain with an example. We have a table of pilots
and the planes they can fly (dividend); we have a table of planes in the
hangar (divisor); we want the names of the pilots who can fly every
plane (quotient) in the hangar. To get this result, we divide the
PilotSkills table by the planes in the hangar.

CREATE TABLE PilotSkills
(pilot CHAR(15) NOT NULL,
plane CHAR(15) NOT NULL,
PRIMARY KEY (pilot, plane));

PilotSkills
pilot plane
=========================
'Celko' 'Piper Cub'
'Higgins' 'B-52 Bomber'
'Higgins' 'F-14 Fighter'
'Higgins' 'Piper Cub'
'Jones' 'B-52 Bomber'
'Jones' 'F-14 Fighter'
'Smith' 'B-1 Bomber'
'Smith' 'B-52 Bomber'
'Smith' 'F-14 Fighter'
'Wilson' 'B-1 Bomber'
'Wilson' 'B-52 Bomber'
'Wilson' 'F-14 Fighter'
'Wilson' 'F-17 Fighter'

CREATE TABLE Hangar
(plane CHAR(15) NOT NULL PRIMARY KEY);

Hangar
plane
=============
'B-1 Bomber'
'B-52 Bomber'
'F-14 Fighter'

PilotSkills DIVIDED BY Hangar
pilot
=============================
'Smith'
'Wilson'

In this example, Smith and Wilson are the two pilots who can fly
everything in the hangar. Notice that Higgins and Celko know how to fly
a Piper Cub, but we don't have one right now. In Codd's original
definition of relational division, having more rows than are called for
is not a problem.

The important characteristic of a relational division is that the CROSS
JOIN (Cartesian product) of the divisor and the quotient produces a
valid subset of rows from the dividend. This is where the name comes
from, since the CROSS JOIN acts like a multiplication operator.

Division with a Remainder

There are two kinds of relational division. Division with a remainder
allows the dividend table to have more values than the divisor, which
was Codd's original definition. For example, if a pilot can fly more
planes than just those we have in the hangar, this is fine with us. The
query can be written in SQL-89 as

SELECT DISTINCT pilot
FROM PilotSkills AS PS1
WHERE NOT EXISTS
(SELECT *
FROM Hangar
WHERE NOT EXISTS
(SELECT *
FROM PilotSkills AS PS2
WHERE (PS1.pilot = PS2.pilot)
AND (PS2.plane = Hangar.plane)));

The quickest way to explain what is happening in this query is to
imagine an old World War II movie where a cocky pilot has just walked
into the hangar, looked over the fleet, and announced, "There ain't no
plane in this hangar that I can't fly!" We are finding the pilots for
whom there does not exist a plane in the hangar for which they have no
skills. The use of the NOT EXISTS() predicates is for speed. Most SQL
systems will look up a value in an index rather than scan the whole
table. The SELECT * clause lets the query optimizer choose the column
to use when looking for the index.

This query for relational division was made popular by Chris Date in his
textbooks, but it is not the only method nor always the fastest.
Another version of the division can be written so as to avoid three
levels of nesting. While it is not original with me, I have made it
popular in my books.

SELECT PS1.pilot
FROM PilotSkills AS PS1, Hangar AS H1
WHERE PS1.plane = H1.plane
GROUP BY PS1.pilot
HAVING COUNT(PS1.plane) = (SELECT COUNT(plane) FROM Hangar);

There is a serious difference in the two methods. Burn down the hangar,
so that the divisor is empty. Because of the NOT EXISTS() predicates in
Date's query, all pilots are returned from a division by an empty set.
Because of the COUNT() functions in my query, no pilots are returned
from a division by an empty set.

In the sixth edition of his book, INTRODUCTION TO DATABASE SYSTEMS
(Addison-Wesley; 1995 ;ISBN 0-201-82458-2), Chris Date defined another
operator (DIVIDEBY ... PER) which produces the same results as my query,
but with more complexity.

Exact Division

The second kind of relational division is exact relational
division. The dividend table must match exactly to the values of
the divisor without any extra values.

SELECT PS1.pilot
FROM PilotSkills AS PS1
LEFT OUTER JOIN
Hangar AS H1
ON PS1.plane = H1.plane
GROUP BY PS1.pilot
HAVING COUNT(PS1.plane) = (SELECT COUNT(plane) FROM Hangar)
AND COUNT(H1.plane) = (SELECT COUNT(plane) FROM Hangar);

This says that a pilot must have the same number of certificates as
there planes in the hangar and these certificates all match to a plane
in the hangar, not something else. The "something else" is shown by a
created NULL from the LEFT OUTER JOIN.

Please do not make the mistake of trying to reduce the HAVING clause
with a little algebra to:

HAVING COUNT(PS1.plane) = COUNT(H1.plane)

because it does not work; it will tell you that the hangar has (n)
planes in it and the pilot is certified for (n) planes, but not that
those two sets of planes are equal to each other.

Note on Performance

The nested EXISTS() predicates version of relational division was made
popular by Chris Date's textbooks, while the author is associated with
popularizing the COUNT(*) version of relational division. The Winter
1996 edition of DB2 ON-LINE MAGAZINE http://www.db2mag.com/96011ar:htm)
had an article entitled "Powerful SQL:Beyond the Basics" by Sheryl
Larsen which gave the results of testing both methods. Her conclusion
for DB2 was that the nested EXISTS() version is better when the quotient
has less than 25% of the dividend table's rows and the COUNT(*) version
is better when the quotient is more than 25% of the dividend table.

--CELKO--


*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!

Chris2

unread,
Feb 23, 2004, 5:30:55 PM2/23/04
to
David S.,

You mentioned filters . . . may I ask if you are using MS Access?


Sincerely,

Chris O.

"David Scemama" <david....@wanadoo.fr> wrote in message
news:%23S%23JyMj%23DHA...@TK2MSFTNGP10.phx.gbl...

David Scemama

unread,
Feb 24, 2004, 5:21:26 AM2/24/04
to
Here is the DDL:

CREATE TABLE Model (
id int IDENTITY (1, 1) NOT NULL PRIMARY KEY,
name varchar (25) NOT NULL
)

CREATE TABLE Critere (
id int IDENTITY (1, 1) NOT NULL PRIMARY KEY,
name varchar (30) NOT NULL
)

CREATE TABLE Mod_Crit (
[id] [int] IDENTITY (1, 1) NOT NULL PRIMARY KEY,
idMod int NOT NULL REFERENCES Model(id),
idCrit int NOT NULL REFERENCES Critere(id)
)

CREATE TABLE FilterCrit (
id int NOT NULL PRIMARY KEY
)

Here are sample data
Model
id name
1 VENICE
2 PARIS
3 LONDON
4 NY

Critere
id name
1 SPRING 2003
2 WINTER 2003
3 SPRING 2004
4 WINTER 2004

Mod_Crit
id idMod idCrit
1 1 1
2 1 2
3 1 3
4 1 4
5 2 1
6 2 3
7 3 1
8 3 2
9 4 3
10 4 4

I need to find the models that can be used for both SPRING 2003 and SPRING
2004, so the FilterCrit table is:
FilterCrit
id
1
3

The result of the request should be models: VENICE and PARIS

David


"Invotion" <anon...@discussions.microsoft.com> wrote in message
news:1521201c3fa37$65e9c3f0$a401...@phx.gbl...

David Scemama

unread,
Feb 24, 2004, 5:23:01 AM2/24/04
to
Thanks a lot for you clear and complete answer.

David

"Joe Celko" <joe....@northface.edu> wrote in message
news:urcvosj%23DHA...@TK2MSFTNGP09.phx.gbl...

Joe Celko

unread,
Feb 24, 2004, 10:42:23 AM2/24/04
to
Thanks for the DDL, even if it is too late :)

Please read ISO 11179 and any book on data modeling; your schema is
dangerous wrong. You literally have no verifiable, relational keys on
any of the tables, the data element names are meaningless. You have
designed a bad file system in SQL.

--CELKO--
===========================


Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, datatypes, etc. in your
schema are.

*** Sent via Developersdex http://www.developersdex.com ***

David Scemama

unread,
Feb 25, 2004, 5:05:54 AM2/25/04
to
Hi

the DLL you see is not really the one in my DB. I removed everything
unnecessary to simplify it.

But thanks for the advices, I'm not an expert in design and I'll check.

David

"Joe Celko" <joe....@northface.edu> wrote in message

news:eGH6Mzu%23DHA...@TK2MSFTNGP09.phx.gbl...

0 new messages