relational algrabra division?

ruben safir

unread,

Dec 11, 2014, 1:04:22 PM12/11/14

to

Does anyone understand this. The way this is written it is not
comprehendable. The text doesn't flow logically and the example, which
in theory is supposed to explain what the text does not, does not
explain it either, which leaves this as "magic logic" -

"And Blood of a Nwet and Poof, you get the desired chicken!"

<<6.3.4 The DIVISION Operation
The DIVISION operation, denoted by ÷, is useful for a special kind of
query that
sometimes occurs in database applications.>>

Really? What kind?

<< An example is Retrieve the names of employees who work on all the
projects that ‘John Smith’ works on. >>

What class of query is this called and how is it generlized mathmatically.

<<To express this query using the DIVISION operation, proceed as follows.>>

OK - we will learn by exmple since we have no definitions

<<First, retrieve the list of project numbers that ‘John Smith’ works on
in the intermediate relation
SMITH_PNOS:>>

OK

<<SMITH ← σFname=‘John’ AND Lname=‘Smith’(EMPLOYEE)
SMITH_PNOS ← πPno(WORKS_ONEssn=SsnSMITH)>>

OK - that makes sense

<< Next, create a relation that includes a tuple <Pno, Essn> whenever
the employee
whose Ssn is Essn works on the project whose number is Pno in the
intermediate
relation SSN_PNOS:>>

That is a lot of words that make NO SENSE. This sentence doesn't parse
English Gammar in a way that is meaningful, which is interesting since
this is a chapter on Relational Algrabra for which Grammar is a specific
subset and application of.

In other words, I don't think the auther knows what he wrote here and
his editor didn't know enough to fix it or clarify it. But we have an
example:

<<SSN_PNOS ← πEssn, Pno(WORKS_ON)>>

Fine this MEANS project all the tuples in WORKS_ON by restricting them
to the attributes of Essn and Pno and assign it to a relationship we
call SSN_PNOS

<<Finally, apply the DIVISION operation to the two relations, which
gives the desired employees’ Social Security numbers:>>

PRESTO!!! Use this magic symbol and get your desired result!

<<SSNS(Ssn) ← SSN_PNOS ÷ SMITH_PNOS>>

I have no idea what this does and it hasn't been explained.

<<RESULT ← πFname, Lname(SSNS * EMPLOYEE)>>

And this says take a natural join of SSNS and EMPLOYEE and project just
the names.

Now dissect the example more closely. Maybe "Presto" will show itself
and a logical contruction.

SSN_PNOS

Essn Pno

123456789 1

123456789 2

666884444 3

453453453 1

453453453 2

333445555 2

333445555 3

333445555 10

333445555 20

999887777 30

999887777 10

987987987 10

987987987 30

987654321 30

987654321 20

888665555 20

SMITH_PNOS
Pno
1
2

SSNS
Ssn
123456789
453453453

Magic Results, no explanation. This is the kind of soft material that
litters this class everywhere. I don't understand it

James K. Lowden

unread,

Dec 11, 2014, 9:13:19 PM12/11/14

to

On Thu, 11 Dec 2014 13:04:27 -0500
ruben safir <ru...@mrbrklyn.com> wrote:

> What class of query is this called and how is it generlized
> mathmatically.

A join is called a "cross product" and may be thought of as
multiplication. Relational division is its inverse:

C = A × B
B = C ÷ A

It is not well represented in SQL, unfortunately.

>> sometimes occurs in database applications.
>
> Really? What kind?

> (e) Retrieve the names of employees who work on every project

That kind!

--jkl

ruben safir

unread,

Dec 12, 2014, 2:33:18 AM12/12/14

to

On 12/11/2014 09:13 PM, James K. Lowden wrote:
>> > (e) Retrieve the names of employees who work on every project
> That kind!

in deed. With enough scotch I can understand this

There is this line in the text about division :(

The DIVISION operation can be expressed as a sequence of π, ×, and –
operations as
follows:
T1 ← πY(R)
T2 ← πY((S × T1) – R)
T ← T1 – T2

tried this and go some insite but it is still just magic

-CELKO-

unread,

Dec 12, 2014, 8:20:50 AM12/12/14

to

Relational division is one of the eight basic operations in Codd's relational algebra. The idea is that a divisor table is used to partition a dividend table and produce a quotient or results table. The quotient table is made up of those values of one column for which a second column had all of the values in the divisor. There is a really good presentation on four ways to do this at: http://www.cs.arizona.edu/people/mccann/research/divpresentation.pdf

This is easier to explain with an example. We have a table of pilots and the planes they can fly (dividend); we have a table of planes in the hangar (divisor); we want the names of the pilots who can fly every plane (quotient) in the hangar. To get this result, we divide the PilotSkills table by the planes in the hangar.

CREATE TABLE PilotSkills
(pilot CHAR(15) NOT NULL,
plane CHAR(15) NOT NULL,
PRIMARY KEY (pilot, plane));

PilotSkills
pilot plane
=========================
'Celko' 'Piper Cub'
'Higgins' 'B-52 Bomber'
'Higgins' 'F-14 Fighter'
'Higgins' 'Piper Cub'
'Jones' 'B-52 Bomber'
'Jones' 'F-14 Fighter'
'Smith' 'B-1 Bomber'
'Smith' 'B-52 Bomber'
'Smith' 'F-14 Fighter'
'Wilson' 'B-1 Bomber'
'Wilson' 'B-52 Bomber'
'Wilson' 'F-14 Fighter'
'Wilson' 'F-17 Fighter'

CREATE TABLE Hangar
(plane CHAR(15) NOT NULL PRIMARY KEY);

Hangar
plane
=============
'B-1 Bomber'
'B-52 Bomber'
'F-14 Fighter'

PilotSkills DIVIDED BY Hangar
pilot
=============================
'Smith'
'Wilson'

In this example, Smith and Wilson are the two pilots who can fly everything in the hangar. Notice that Higgins and Celko know how to fly a Piper Cub, but we don't have one right now. In Codd's original definition of relational division, having more rows than are called for is not a problem.

The important characteristic of a relational division is that the CROSS JOIN (Cartesian product) of the divisor and the quotient produces a valid subset of rows from the dividend. This is where the name comes from, since the CROSS JOIN acts like a multiplication operator.

Division with a Remainder

There are two kinds of relational division. Division with a remainder allows the dividend table to have more values than the divisor, which was Codd's original definition. For example, if a pilot can fly more planes than just those we have in the hangar, this is fine with us. The query can be written in SQL-89 as

SELECT DISTINCT pilot
FROM PilotSkills AS PS1
WHERE NOT EXISTS
(SELECT *
FROM Hangar
WHERE NOT EXISTS
(SELECT *
FROM PilotSkills AS PS2
WHERE (PS1.pilot = PS2.pilot)
AND (PS2.plane = Hangar.plane)));

The quickest way to explain what is happening in this query is to imagine an old World War II movie where a cocky pilot has just walked into the hangar, looked over the fleet, and announced, "There ain't no plane in this hangar that I can't fly!" We are finding the pilots for whom there does not exist a plane in the hangar for which they have no skills. The use of the NOT EXISTS() predicates is for speed. Most SQL systems will look up a value in an index rather than scan the whole table. The SELECT * clause lets the query optimizer choose the column to use when looking for the index.

This query for relational division was made popular by Chris Date in his textbooks, but it is not the only method nor always the fastest. Another version of the division can be written so as to avoid three levels of nesting. While it is not original with me, I have made it popular in my books.

SELECT PS1.pilot
FROM PilotSkills AS PS1, Hangar AS H1
WHERE PS1.plane = H1.plane
GROUP BY PS1.pilot
HAVING COUNT(PS1.plane) = (SELECT COUNT(plane) FROM Hangar);

There is a serious difference in the two methods. Burn down the hangar, so that the divisor is empty. Because of the NOT EXISTS() predicates in Date's query, all pilots are returned from a division by an empty set. Because of the COUNT() functions in my query, no pilots are returned from a division by an empty set.

In the sixth edition of his book, INTRODUCTION TO DATABASE SYSTEMS (Addison-Wesley; 1995 ;ISBN 0-201-82458-2), Chris Date defined another operator (DIVIDEBY ... PER) which produces the same results as my query, but with more complexity.

Exact Division

The second kind of relational division is exact relational division. The dividend table must match exactly to the values of the divisor without any extra values.

SELECT PS1.pilot
FROM PilotSkills AS PS1
LEFT OUTER JOIN
Hangar AS H1
ON PS1.plane = H1.plane
GROUP BY PS1.pilot
HAVING COUNT(PS1.plane) = (SELECT COUNT(plane) FROM Hangar)
AND COUNT(H1.plane) = (SELECT COUNT(plane) FROM Hangar);

This says that a pilot must have the same number of certificates as there planes in the hangar and these certificates all match to a plane in the hangar, not something else. The "something else" is shown by a created NULL from the LEFT OUTER JOIN.

Please do not make the mistake of trying to reduce the HAVING clause with a little algebra to:

HAVING COUNT(PS1.plane) = COUNT(H1.plane)

because it does not work; it will tell you that the hangar has (n) planes in it and the pilot is certified for (n) planes, but not that those two sets of planes are equal to each other.

Note on Performance

The nested EXISTS() predicates version of relational division was made popular by Chris Date's textbooks, while the author is associated with popularizing the COUNT(*) version of relational division. The Winter 1996 edition of DB2 ON-LINE MAGAZINE (http://www.db2mag.com/96011ar:htm) had an article entitled "Powerful SQL:Beyond the Basics" by Sheryl Larsen which gave the results of testing both methods. Her conclusion for DB2 was that the nested EXISTS() version is better when the quotient has less than 25% of the dividend table's rows and the COUNT(*) version is better when the quotient is more than 25% of the dividend table.

James K. Lowden

unread,

Dec 13, 2014, 2:10:30 PM12/13/14

to

On Fri, 12 Dec 2014 02:33:22 -0500
ruben safir <ru...@mrbrklyn.com> wrote:

> The DIVISION operation can be expressed as a sequence of ?, ×, and ?
> operations as
> follows:
> T1 ? ?Y(R)
> T2 ? ?Y((S × T1) ? R)
> T ? T1 ? T2

>
>
> tried this and go some insite but it is still just magic

You may find this helpful.

http://fie-conference.org/fie2003/papers/1057.pdf

--jkl

Ruben Safir

unread,

Dec 13, 2014, 11:32:50 PM12/13/14

to

Thank you. I never know what moviates people to answer questions with
this much depth but thank you. This covers more than I did spending
three days in the library and on line looking for an explanation of this
subject,

I hope I can do this some day for someone.

Ruben

James K. Lowden

unread,

Dec 14, 2014, 5:12:09 PM12/14/14

to

On Fri, 12 Dec 2014 05:20:48 -0800 (PST)
-CELKO- <jcel...@earthlink.net> wrote:

> FROM Hangar
> WHERE NOT EXISTS
> (SELECT *
> FROM PilotSkills AS PS2
> WHERE (PS1.pilot = PS2.pilot)
> AND (PS2.plane = Hangar.plane)));

...

> The use of the NOT EXISTS() predicates is for speed. Most SQL systems
> will look up a value in an index rather than scan the whole table.
> The SELECT * clause lets the query optimizer choose the column to use
> when looking for the index.

I don't want to criticize an otherwise lucid answer, but SELECT * does
no such thing!

Any columns mentioned in the SELECT clause (projection) part of the
subquery are pure anachronism, the result of SQL's
select-from-where syntactical structure. Because they are not *used*
the query optimizer can ignore them.

In the above example, the columns "pilot" and "plane" are the ones
in play, being tested against the PS1 and Hanger tables. Likely the
index supporting the primary key, {pilot, plane}, will be used.
Whatever choice is made, the "*" is of no significance. The optimizer
would have been within its rights to treat "SELECT 1" or "SELECT count
(*)" equivalently, because the mere presence of a single row is all that
matters.

For that reason, I always recommend "SELECT 1" to emphasize to the
reader its role in an existence test.

--jkl

Nicola

unread,

Dec 15, 2014, 2:52:12 PM12/15/14

to

In article <20141214171206.b...@speakeasy.net>,

This is OT, but since it has happened to me recently, there is at least
one case where "select *" cannot be used in the subquery, and that's
when the subquery contains a "group by" clause ("group by... having..."
to be precise). This is one more reason to prefer "select 1" over
"select *". But wait, why not use "select null" since it's a don't care?
After all, "null" is loaded with different meanings already :)

Nicola

--- news://freenews.netfront.net/ - complaints: ne...@netfront.net ---

-CELKO-

unread,

Dec 16, 2014, 8:20:19 AM12/16/14

to

A note on EXISTS(). The use of EXISTS(SELECT *..) versus EXISTS(SELECT <constant>..) should not make a difference in SQL today, but it used to be important! The first versions of Oracle, among others, would actually attempt to return a result set from the subquery.

When we wrote the SQL-89 Standard the EXISTS() was part of the ALL(<subquery>) and [ANY|SOME](<subquery>) predicate family. So the <subquery> had to return a single column table. Yes, I know it looks silly today. The fiction in the standard for SELECT * was that the compiler picked one column and used it. But if you wrote EXISTS(SELECT <constant>..), then the Oracle compiler did not have to find a column and read it, so it was faster.

The myth persisted long after the fact. Today, EXISTS(SELECT *..) is the preferred style. It shows that the predicate applies to a whole row at the table level, and not at the column level.

com...@hotmail.com

unread,

Dec 21, 2014, 3:25:15 AM12/21/14

to

The text states that the operation is useful for a kind of query. It then gives an example query in natural language and the corresponding formal version using the operation and works through its evaluation. The very next thing is a formal definition of the operation. Then another definition of it. The some properies of the operation. Then a third definition. (The one you quote.) Then it relates the operation to natural language.

This is reasonable and clear.

It is not the job of the textbook to anticipate and answer the rhetorical questions of its readers. It is the job of the reader to accept, consider and absorb what a presentation gives.

The sentence that you think makes "no sense" is perfectly grammatical and sensible. Like any sentence it requires a certain amount of ability with the language and a certain amount of effort since it has a certain amount of structure. It might have been a good idea for the authors to use simpler sentences. You could have spent more time trying to parse it. Clearly your English comprehension is not what it could be, and your opinion of your English comprehension is not what it could be.

My suggestion is that you think more about where the authors are trying to take you. "Trust them" and your instructors. This involves holding multiple things in mind.

I agree that not every mentor can be trusted. There's certainly a lot of writing, and textbook writing, and CS textbook writing, and DB textbook writing that could be clearer or much clearer both in wording and in concepts. But your recent messages are simply about writing that is about topics of a certain complexity.

By the way, the whole idea of relational division is a bit dubious and it is much better to approach queries of the kind that it is used for plus queries of a more general kind via relation inequalities and "image relations". So there is an example of a relative conceptual weakness of this book's syllabus.

James K. Lowden

unread,

Dec 21, 2014, 4:53:15 PM12/21/14

to

On Sun, 21 Dec 2014 00:25:14 -0800 (PST)
com...@hotmail.com wrote:

> the whole idea of relational division is a bit dubious and it is much
> better to approach queries of the kind that it is used for plus
> queries of a more general kind via relation inequalities and "image
> relations".

How "dubious", exactly? The only references to "image relation" I've
been able to find come from CJ Date. The search term " (Abstract:image
and Abstract:relation and Abstract:algebra)" in the ACM digital library
returns 41 entries, only one of which -- Database Explorations: Essays
on The Third Manifesto and related topics -- is germane.

So far as I understand them, image relations are *defined* as
relational dividends. (Date doesn't use conventional relational
algebra notation to define them, so it's hard to be sure.) Then, having
more or less assumed their existence, he explains how nice they are are
to use as syntactic elements. That's lovely, but hardly a refutation
of division as a relational operator. If anything, it's more a case
for set-equality as a language feature.

--jkl

com...@hotmail.com

unread,

Dec 22, 2014, 2:02:17 AM12/22/14

to

On Sunday, December 21, 2014 1:53:15 PM UTC-8, James K. Lowden wrote:
> On Sun, 21 Dec 2014 00:25:14 -0800 (PST)
> compdb wrote:
>
> > the whole idea of relational division is a bit dubious and it is much
> > better to approach queries of the kind that it is used for plus
> > queries of a more general kind via relation inequalities and "image
> > relations".
>
> How "dubious", exactly?

It turns out that there are many special cases of "division", ie "forall/forsome/forno" queries that can reasonably be called "division", with two, three and four operands. However they are not all special cases of one operator. It turns out that it is simpler to construct such queries from (in)equalities, semiijoin and projection, and particulary from image relations.

(Eg: Elmasri & Navathe's DIVIDE is actually Date's (unintentionally) abridged (special case) version of Codd's DIVIDE.)

See the Chapter 12 A Brief History of the Relational Divide Operator &
Chapter 14 Image Relations (both with appendices) of
http://www.dcs.warwick.ac.uk/~hugh/TTM/Database-Explorations-revision-2.pdf :

> The only references to "image relation" I've
> been able to find come from CJ Date.

I put it in quotes because it is Date's term for certain relations with id.

> So far as I understand them, image relations are *defined* as
> relational dividends.

No. An image relation is what is reasonably described as the image of a tuple in a relation. For division queries the tuple and relation are in a certain context.

Date happens to have special syntax "!!s" inside the wff of r-WHERE-wff for the image of a tuple of r in s. The fact that he has special notation is beside the point. It is short for (s JOIN RELATION{<r1 r1,...>}) PROJECT_OUT {r1,...}. He used to have a special syntax "MATCHING s" for s SEMIJOIN RELATION{<r1 r1,...>} but it turns out not to be as useful/clear. The idea is that s maps <r1 r1,...> to (a relation containing) some subtuples in s.

> If anything, it's more a case
> for set-equality as a language feature.

Having relation (in)equality does not of itself suggest that

> > the whole idea of relational division is a bit dubious and it is much
> > better to approach queries of the kind that it is used for plus
> > queries of a more general kind via relation inequalities and "image
> > relations".

philip

com...@hotmail.com

unread,

Dec 22, 2014, 2:25:08 AM12/22/14

to

On Sunday, December 21, 2014 11:02:17 PM UTC-8, com wrote:
> > On Sun, 21 Dec 2014 00:25:14 -0800 (PST) compdb wrote:

> > > "image relations".

> I put it in quotes because it is Date's term for certain relations with id.

with division-related idioms. (I don't have a more common term, author or reference.)

philip