double linked list

Juergen

unread,

Jan 28, 2003, 10:44:29 AM1/28/03

to

Hi folks!,

I've got a table called 'link_t' containing a collection of seller -
buyer relations between two parties.

sql> select * from link_t

S B
- -
C X
A B
B C
C D
D E

5 rows selected.

I am looking for a select statement that returns the concatenation of
seller - buyer relations between the first seller 'A' and the last
buyer 'B'.

the result should be

S B
- -
A B
B C
C D
D E

Currently I fumbling around with self joins but haven't figured out
yet.
Any suggestions

Many tanks in advance

Cheers

Juergen

DA Morgan

unread,

Jan 28, 2003, 11:45:21 AM1/28/03

to

Juergen wrote:

Look at the CONNECT BY built-in function.

Daniel Morgan

--CELKO--

unread,

Jan 28, 2003, 6:54:15 PM1/28/03

to

>> I've got a table called 'link_t' containing a collection of seller
-
buyer relations between two parties. <<

That is not a real linked list, but let's ignore bad terminology. One
way to do this is with cursors, but they will take time and trend to
be proprietary.

Anohter way is to build a tree, with the first seller as the root and
the final buyer as a leaf node.

The usual example of a tree structure in SQL books is called an
adjacency list model and it looks like this:

CREATE TABLE OrgChart
(emp CHAR(10) NOT NULL PRIMARY KEY,
boss CHAR(10) DEFAULT NULL REFERENCES OrgChart(emp),
salary DECIMAL(6,2) NOT NULL DEFAULT 100.00);

OrgChart
emp boss salary
===========================
'Albert' 'NULL' 1000.00
'Bert' 'Albert' 900.00
'Chuck' 'Albert' 900.00
'Donna' 'Chuck' 800.00
'Eddie' 'Chuck' 700.00
'Fred' 'Chuck' 600.00

Another way of representing trees is to show them as nested sets.
Since SQL is a set oriented language, this is a better model than the
usual adjacency list approach you see in most text books. Let us
define a simple OrgChart table like this, ignoring the left (lft) and
right (rgt) columns for now. This problem is always given with a
column for the employee and one for his boss in the textbooks. This
table without the lft and rgt columns is called the adjacency list
model, after the graph theory technique of the same name; the pairs of
emps are adjacent to each other.

CREATE TABLE OrgChart
(emp CHAR(10) NOT NULL PRIMARY KEY,
lft INTEGER NOT NULL UNIQUE CHECK (lft > 0),
rgt INTEGER NOT NULL UNIQUE CHECK (rgt > 1),
CONSTRAINT order_okay CHECK (lft < rgt) );

OrgChart
emp lft rgt
======================
'Albert' 1 12
'Bert' 2 3
'Chuck' 4 11
'Donna' 5 6
'Eddie' 7 8
'Fred' 9 10

The organizational chart would look like this as a directed graph:

Albert (1,12)
/ \
/ \
Bert (2,3) Chuck (4,11)
/ | \
/ | \
/ | \
/ | \
Donna (5,6) Eddie (7,8) Fred (9,10)

The first table is denormalized in several ways. We are modeling both
the OrgChart and the organizational chart in one table. But for the
sake of saving space, pretend that the names are job titles and that
we have another table which describes the OrgChart that hold those
positions.

Another problem with the adjacency list model is that the boss and
employee columns are the same kind of thing (i.e. names of OrgChart),
and therefore should be shown in only one column in a normalized
table. To prove that this is not normalized, assume that "Chuck"
changes his name to "Charles"; you have to change his name in both
columns and several places. The defining characteristic of a
normalized table is that you have one fact, one place, one time.

The final problem is that the adjacency list model does not model
subordination. Authority flows downhill in a hierarchy, but If I fire
Chuck, I disconnect all of his subordinates from Albert. There are
situations (i.e. water pipes) where this is true, but that is not the
expected situation in this case.

To show a tree as nested sets, replace the emps with ovals, then nest
subordinate ovals inside each other. The root will be the largest oval
and will contain every other emp. The leaf emps will be the innermost
ovals with nothing else inside them and the nesting will show the
hierarchical relationship. The rgt and lft columns (I cannot use the
reserved words LEFT and RIGHT in SQL) are what shows the nesting.

If that mental model does not work, then imagine a little worm
crawling anti-clockwise along the tree. Every time he gets to the left
or right side of a emp, he numbers it. The worm stops when he gets all
the way around the tree and back to the top.

This is a natural way to model a parts explosion, since a final
assembly is made of physically nested assemblies that final break down
into separate parts.

At this point, the boss column is both redundant and denormalized, so
it can be dropped. Also, note that the tree structure can be kept in
one table and all the information about a emp can be put in a second
table and they can be joined on employee number for queries.

To convert the graph into a nested sets model think of a little worm
crawling along the tree. The worm starts at the top, the root, makes a
complete trip around the tree. When he comes to a emp, he puts a
number in the cell on the side that he is visiting and increments his
counter. Each emp will get two numbers, one of the right side and one
for the left. Computer Science majors will recognize this as a
modified preorder tree traversal algorithm. Finally, drop the unneeded
OrgChart.boss column which used to represent the edges of a graph.

This has some predictable results that we can use for building
queries. The root is always (left = 1, right = 2 * (SELECT COUNT(*)
FROM TreeTable)); leaf emps always have (left + 1 = right); subtrees
are defined by the BETWEEN predicate; etc. Here are two common queries
which can be used to build others:

1. An employee and all their Supervisors, no matter how deep the tree.

SELECT O2.*
FROM OrgChart AS O1, OrgChart AS O2
WHERE O1.lft BETWEEN O2.lft AND O2.rgt
AND O1.emp = :myemployee;

2. The employee and all subordinates. There is a nice symmetry here.

SELECT O1.*
FROM OrgChart AS O1, OrgChart AS O2
WHERE O1.lft BETWEEN O2.lft AND O2.rgt
AND O2.emp = :myemployee;

3. Add a GROUP BY and aggregate functions to these basic queries and
you have hierarchical reports. For example, the total salaries which
each
employee controls:

SELECT O2.emp, SUM(S1.salary)
FROM OrgChart AS O1, OrgChart AS O2,
Salaries AS S1
WHERE O1.lft BETWEEN O2.lft AND O2.rgt
AND O1.emp = S1.emp
GROUP BY O2.emp;

4. To find the level of each emp, so you can print the tree as an
indented listing. Technically, you should declare a cursor to go with
the ORDER BY clause.

SELECT COUNT(O2.emp) AS indentation, O1.emp
FROM OrgChart AS O1, OrgChart AS O2
WHERE O1.lft BETWEEN O2.lft AND O2.rgt
GROUP BY O1.lft. O1.emp
ORDER BY O1.lft;

5. The nested set model has an implied ordering of siblings which
theadjacency list model does not. To insert a new node, G1, under part
G. We can insert one node at a time like this:

BEGIN ATOMIC
DECLARE rightmost_spread INTEGER;

SET rightmost_spread
= (SELECT rgt
FROM Frammis
WHERE part = 'G');
UPDATE Frammis
SET lft = CASE WHEN lft > rightmost_spread
THEN lft + 2
ELSE lft END,
rgt = CASE WHEN rgt >= rightmost_spread
THEN rgt + 2
ELSE rgt END
WHERE rgt >= rightmost_spread;

INSERT INTO Frammis (part, lft, rgt)
VALUES ('G1', rightmost_spread, (rightmost_spread + 1));
COMMIT WORK;
END;

The idea is to spread the lft and rgt numbers after the youngest child
of the parent, G in this case, over by two to make room for the new
addition, G1. This procedure will add the new node to the rightmost
child position, which helps to preserve the idea of an age order among
the siblings.

6. To convert a nested sets model into an adjacency list model:

SELECT B.emp AS boss, P.emp
FROM OrgChart AS P
LEFT OUTER JOIN
OrgChart AS B
ON B.lft
= (SELECT MAX(lft)
FROM OrgChart AS S
WHERE P.lft > S.lft
AND P.lft < S.rgt);

7. To convert an adjacency list to a nested set model, use a push down
stack. Here is version with a stack in SQL/PSM.

-- Tree holds the adjacency model
CREATE TABLE Tree
(node CHAR(10) NOT NULL,
parent CHAR(10));

-- Stack starts empty, will holds the nested set model
CREATE TABLE Stack
(stack_top INTEGER NOT NULL,
node CHAR(10) NOT NULL,
lft INTEGER,
rgt INTEGER);

BEGIN ATOMIC
DECLARE counter INTEGER;
DECLARE max_counter INTEGER;
DECLARE current_top INTEGER;

SET counter = 2;
SET max_counter = 2 * (SELECT COUNT(*) FROM Tree);
SET current_top = 1;

--clear the stack
DELETE FROM Stack;

-- push the root
INSERT INTO Stack
SELECT 1, node, 1, max_counter
FROM Tree
WHERE parent IS NULL;

-- delete rows from tree as they are used
DELETE FROM Tree WHERE parent IS NULL;

WHILE counter <= max_counter- 1
IF EXISTS (SELECT *
FROM Stack AS S1, Tree AS T1
WHERE S1.node = T1.parent
AND S1.stack_top = current_top)

BEGIN -- push when top has subordinates and set lft value
INSERT INTO Stack
SELECT (current_top + 1), MIN(T1.node), counter, NULL
FROM Stack AS S1, Tree AS T1
WHERE S1.node = T1.parent
AND S1.stack_top = current_top;

-- delete rows from tree as they are used
DELETE FROM Tree
WHERE node = (SELECT node
FROM Stack
WHERE stack_top = current_top + 1);
-- housekeeping of stack pointers and counter
SET counter = counter + 1;
SET current_top = current_top + 1;
END
ELSE
BEGIN -- pop the stack and set rgt value
UPDATE Stack
SET rgt = counter,
stack_top = -stack_top -- pops the stack
WHERE stack_top = current_top
SET counter = counter + 1;
SET current_top = current_top - 1;
END;
END IF;
-- the top column is not needed in the final answer
SELECT node, lft, rgt FROM Stack;
END;

I have a book on TREES & HIERARCHIES IN SQL coming out in 2003, but in
the meantime, you can look at:

http://searchdatabase.techtarget.com/tip/1,289483,sid13_gci537290,00.html

http://searchdatabase.techtarget.com/tip/1,289483,sid13_gci801943,00.html

Juergen

unread,

Jan 29, 2003, 7:12:09 AM1/29/03

to

heavy stuff Celko. I would lie if I would pretend I fully understand
Your answer. I'll let sink it in.

However, I dont store a consistent tree structure. The table at hand
is more a kind of a collection of graphs where I want to find all
possible paths between a given starting point and a given end point

cheers

Juergen

71062...@compuserve.com (--CELKO--) wrote in message news:<c0d87ec0.03012...@posting.google.com>...

Jan Hidders

unread,

Jan 29, 2003, 8:48:56 AM1/29/03

to

Juergen wrote:
>
>However, I dont store a consistent tree structure. The table at hand
>is more a kind of a collection of graphs where I want to find all
>possible paths between a given starting point and a given end point

A collection of graphs? As you presented the problem it was simply a single
graph.

That's not possible in a single SQL statement without using some form of
recursion such as the CONNECT BY in Oracle that was already mentioned or the
recursive queries as are possible in RDB. Another "poor man's solution" could for
example be to add a table Reachable(node, from_a, to_b) with 'from_a' and
'from_b' boolean field that indicate that the node is reachable from a and
that b is reachable from this node. You could compute this relation by
repeating a certain SQL update statement that:
1. sets from_a of node n to true if there is a node n' that is reachable
from a and there is an edge from n' to n, and
2. set to_b of node n to tur if there is a node n' that leads to b and there
is an edge from n to n'.
You repeat that until no more flags are changed. Then you select only those
edges for which the begin and node have both flags set to true.

Mikito Harakiri

unread,

Jan 29, 2003, 1:45:40 PM1/29/03

to

"Jan Hidders" <hid...@REMOVE.THIS.uia.ua.ac.be> wrote in message
news:3e37dbc8$1...@news.uia.ac.be...

> Juergen wrote:
> >
> >However, I dont store a consistent tree structure. The table at hand
> >is more a kind of a collection of graphs where I want to find all
> >possible paths between a given starting point and a given end point
>
> A collection of graphs? As you presented the problem it was simply a
single
> graph.

There is no such thing as a collection of graphs. Any graph in the
collection can be viewed just as a (disconnected) component of the aggregate
graph.

> That's not possible in a single SQL statement without using some form of
> recursion such as the CONNECT BY in Oracle that was already mentioned or
the
> recursive queries as are possible in RDB. Another "poor man's solution"
could for
> example be to add a table Reachable(node, from_a, to_b) with 'from_a' and
> 'from_b' boolean field that indicate that the node is reachable from a and
> that b is reachable from this node. You could compute this relation by
> repeating a certain SQL update statement that:
> 1. sets from_a of node n to true if there is a node n' that is reachable
> from a and there is an edge from n' to n, and
> 2. set to_b of node n to tur if there is a node n' that leads to b and
there
> is an edge from n to n'.
> You repeat that until no more flags are changed. Then you select only
those
> edges for which the begin and node have both flags set to true.

There is not much about graph labeling in the literature! In the meantime,
here is one more labeling schema for the trees:

http://www.dbazine.com/tropashko4.html

--CELKO--

unread,

Jan 29, 2003, 4:15:51 PM1/29/03

to

>> The table at hand is more a kind of a collection of graphs where I
want to find all possible paths between a given starting point and a

given end point. <<

For the reachabiity index of a general graph, you need Warshal's
algorithm.

Let V = number of nodes in the graph
Let A[i,j] be the adjacency matrix for the undirected graph

FOR j:= 1 TO V
DO FOR i:= 1 TO V
DO IF A[i,j] = 1
THEN FOR k := 1 TO V
DO IF A[j,k]] = 1
THEN A[i,k]] := 1;

You can also do a summation to get the length of the path from i to j.
You can concatenate names of the nodes into a string that gives the
path, etc.

Her is a first attempt at some SQL; I am sure it can be done better

CREATE TABLE Graph
(i CHAR(2) NOT NULL,
j CHAR(2) NOT NULL,
flag CHAR(1) NOT NULL DEFAULT 'n'
CHECK (flag IN ('n', 'y')),
PRIMARY KEY (i,j));

INSERT INTO Graph (i, j, flag)
SELECT DISTINCT G1.i, G2.j, 'y'
FROM Graph AS G1, Graph AS G1
WHERE G1.i <> G2.j
AND EXISTS
(SELECT *
FROM Graph AS G3
WHERE G3.i = G1.j
AND G3.j = G2.i)
AND NOT EXISTS
(SELECT *
FROM Graph AS G3
WHERE (G3.i = G1.i AND G3.j = G1.j))
OR (G3.i = G2.i AND G3.j = G2.j));

You wll have to run this statement until the size of Graph does not
change -- no new rows are being added.

Ryan

unread,

Jan 29, 2003, 4:30:11 PM1/29/03

to

are you joe celko, guy who wrote those sql books?

"--CELKO--" <71062...@compuserve.com> wrote in message
news:c0d87ec0.03012...@posting.google.com...

--CELKO--

unread,

Jan 30, 2003, 2:13:39 PM1/30/03

to

>> are you joe celko, guy who wrote those sql books? <<

Yes.

Juergen

unread,

Jan 31, 2003, 7:49:24 AM1/31/03

to

first of all thanks for Your answers. They certainly gave me direction
on what to learn concerning SQL.

Being a novice in SQL and after doing just a couple of days of
research on that matter I got the impression that SQL hasn't yet
evolved to that point to provide simple to use keywords for these kind
of problems. Oracle 9i seems to
have been progressed over 8i in this regard. One can use subqueries
with the *with as* clause, one can use the *connect by* statement on
views and joined statements and there may be a couple of other
features...
However I am restricted to 8i and java and the requirement not to use
pl/sql for the sake of a common code base.

Currently I figure to use

select cs, cb, level
from link_t
start with cb in (
select cb
from link_t
where ´cs = 'A')
connect by prior cb = cs

that will return a tree containing all paths with the root as the
common starting point. I would load this structure into a middle teer
and apply some sort of commonly used tree traversial algorithm to it
to search for nodes and endpoints of interest.

Cheers and thanks again

Juergen

71062...@compuserve.com (--CELKO--) wrote in message news:<c0d87ec0.03012...@posting.google.com>...

--CELKO--

unread,

Feb 1, 2003, 4:23:11 PM2/1/03

to

>> Being a novice in SQL and after doing just a couple of days of

research on that matter ... <<

Another boss who will not pay for training :0!!

>> I got the impression that SQL hasn't yet evolved to that point to
provide simple to use keywords for these kind of problems. <<

SQL is a set oriented language, not a procedural language. You have
to learn to model data in terms of sets and not sequences to use it.

>> Oracle 9i seems to have been progressed over 8i in this regard. <<

Actually, Oracle is a horrible product and their extensions are flaws
that lock the code into a particular underlying sequential physical
implementation with sorting and cursors, no parallelism, etc.

>> Currently I figure to use

SELECT cs, cb, level
FROM link_t
START WITH cb
IN (SELECT cb
FROM link_t
WHERE cs = 'A')
CONNECT BY PRIOR cb = cs;

that will return a tree containing all paths with the root as the
common starting point. <<

It returns a sequential file that represents a table and that
represernation depends on the order of the records (they are no
longer rows in a table).

>> I would load this structure into a middle tier and apply some sort

of commonly used tree traversial algorithm to it to search for nodes
and endpoints of interest. <<

You can do all of those searches in the Nested Sets model with a
single query, no middle ware and they will run 10 to 100 times faster
for large trees.

I have a book on trees in SQL that will be published later this year
that you might want to buy.

DA Morgan

unread,

Feb 1, 2003, 4:36:16 PM2/1/03

to

--CELKO-- wrote:

I'll not take issue with you with respect to your field of expertise. But
to say Oracle is a horrible product presumes two things: first that you
are being objective with respect to Oracle in comparison with other
products, and second that you have sufficient expertise with the product
from which to render that opinion.

If I am incorrect I will gladly stand corrected. But based on my
knowledge of what you have been doing for the last few years ... I have
my doubts about both. I have worked with databases from Teradata to
Oracle to DB2 and most of the rest of the field. And I see nothing in
Oracle that isn't at least the equal to the other major RDBMS
implementations including the one in which you do your work.

Daniel Morgan

Mikito Harakiri

unread,

Feb 2, 2003, 12:51:40 AM2/2/03

to

"--CELKO--" <71062...@compuserve.com> wrote in message

news:c0d87ec0.03020...@posting.google.com...

> >> Currently I figure to use
>
> SELECT cs, cb, level
> FROM link_t
> START WITH cb
> IN (SELECT cb
> FROM link_t
> WHERE cs = 'A')
> CONNECT BY PRIOR cb = cs;
>
> that will return a tree containing all paths with the root as the
> common starting point. <<
>
> It returns a sequential file that represents a table and that
> represernation depends on the order of the records (they are no
> longer rows in a table).

Looks like we won't see a "connect-by" chapter in your book;-)

In addition to

http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&selm=bdf69bdf.02
08011026.9caa07a%40posting.google.com

I would mention that there is a striking syntactic and semantic similarity
between "connect-by" and "group-by"...

DA Morgan

unread,

Feb 2, 2003, 5:09:21 PM2/2/03

to

I think Joe's problem is spelled 'SQL SERVER'. CONNECT BY isn't in the Microsoft product.

Daniel Morgan

--CELKO--

unread,

Feb 2, 2003, 7:54:59 PM2/2/03

to

>> Looks like we won't see a "connect-by" chapter in your book;-) <<

I do mention it -- as the work of Satan :)

>> I would mention that there is a striking syntactic and semantic
similarity between "connect-by" and "group-by"... <<

Syntax, maybe if you count using the word "BY" in the construct. But
they are nothing alike.

Here is how a SELECT works in SQL ... at least in theory. Real
products will optimize things when they can.

a) Start in the FROM clause and build a working table from all of the
joins, unions, intersections, and whatever other table constructors
are there. The table expression> AS <correlation name> option allows
you give a name to this working table which you then have to use for
the rest of the containing query.

b) Go to the WHERE clause and remove rows that do not pass criteria;
that is, that do not test to TRUE (reject UNKNOWN and FALSE). The
WHERE clause is applied to the working in the FROM clause.

c) Go to the optional GROUP BY clause, make groups and reduce each
group to a single row, replacing the original working table with the
new grouped table. The rows of a grouped table must be group
characteristics: (1) a grouping column (2) a statistic about the group
(i.e. aggregate functions) (3) a function or (4) an expression made up
of the those three items.

d) Go to the optional HAVING clause and apply it against the grouped
working table; if there was no GROUP BY clause, treat the entire table
as one group.

e) Go to the SELECT clause and construct the expressions in the list.
This means that the scalar subqueries, function calls and expressions
in the SELECT are done after all the other clauses are done. The AS
operator can give a name to expressions in the SELECT list, too.
These new names come into existence all at once, but after the WHERE
clause, GROUP BY clause and HAVING clause has been executed; you
cannot use them in the SELECT list or the WHERE clause for that
reason.

If there is a SELECT DISTINCT, then redundant duplicate rows are
removed. For purposes of defining a duplicate row, NULLs are treated
as matching (just like in the GROUP BY).

f) Nested query expressions follow the usual scoping rules you would
expect from a block structured language like C, Pascal, Algol, etc.
Namely, the innermost queries can reference columns and tables in the
queries in which they are contained.

The CONNECT BY is a rule for a cursor to traverse a chain.

Jim Kennedy

unread,

Feb 2, 2003, 8:50:21 PM2/2/03

to

God, forbid that someone use something that makes their life easier. Those
of use that are in the trenches use what tools we have to accomplish our
tasks and ignore academician's purity arguments.

"--CELKO--" <71062...@compuserve.com> wrote in message
news:c0d87ec0.03020...@posting.google.com...

> >> Looks like we won't see a "connect-by" chapter in your book;-) <<
>
> I do mention it -- as the work of Satan :)
>

<snip for brevity>
Jim

DA Morgan

unread,

Feb 2, 2003, 8:59:58 PM2/2/03

to

--CELKO-- wrote:

So because you don't happen to like something it makes the product
'horrible'? Not ... there are certain features or facilities in the
product I don't use and why ... just a blanket thumb down? I'm not
impressed. I can come up with a list of poor features and poorly
implemented features in any product from assembly language on up. Does
that mean we should we go back to 3x5 cards?

When you are ready to demonstrate a superior implementation in another SQL
RDBMS please let us know. And I don't mean superior in terms of some
theoretical concept like building a database in fifth normal form. I mean
superior in terms of performance and scalability in the real world.

Daniel Morgan

Galen Boyer

unread,

Feb 2, 2003, 10:30:19 PM2/2/03

to

On 1 Feb 2003, 71062...@compuserve.com wrote:

> Actually, Oracle is a horrible product and their extensions are flaws
> that lock the code into a particular underlying sequential physical
> implementation with sorting and cursors, no parallelism, etc.

Pretty strong statement. Care to expound and provide examples of the
better ones?
--
Galen deForest Boyer
Sweet dreams and flying machines in pieces on the ground.

Juergen

unread,

Feb 3, 2003, 3:54:31 AM2/3/03

to

71062...@compuserve.com (--CELKO--) wrote in message news:<c0d87ec0.03020...@posting.google.com>...

> >> Being a novice in SQL and after doing just a couple of days of
> research on that matter ... <<
>
> Another boss who will not pay for training :0!!

Is not that my boss wouldn't pay for training. It is just that I can't
take an endless steep learning curve. It is not just SQL I have to
learn to get my job done, there other aereas as well.

> >> I got the impression that SQL hasn't yet evolved to that point to
> provide simple to use keywords for these kind of problems. <<
>
> SQL is a set oriented language, not a procedural language. You have
> to learn to model data in terms of sets and not sequences to use it.

What I realized, it is a totally different animal and it is amazing
what one can achieve with statements just a few lines long. I will
teach myself up on that stuff. But it'll take time to sink in.

Cheers

Juergen

P.S. CELKO, I learnd about Your good reputation in the SQL community.
Keep care to not destroy it by cheaply bashing trademarks without
giving a clean argumentation.

DA Morgan

unread,

Feb 3, 2003, 11:31:49 AM2/3/03

to

Juergen wrote:

> <snipped>

>
>
> Cheers
>
> Juergen
>
> P.S. CELKO, I learnd about Your good reputation in the SQL community.
> Keep care to not destroy it by cheaply bashing trademarks without
> giving a clean argumentation.

Well said. That's what I was thinking when I made the decision to challenge his statement.

Daniel Morgan

--CELKO--

unread,

Feb 3, 2003, 5:09:03 PM2/3/03

to

>> When you are ready to demonstrate a superior implementation in
another SQL
RDBMS please let us know. And I don't mean superior in terms of some
theoretical concept like building a database in fifth normal form. I
mean superior in terms of performance and scalability in the real
world. <<

Funny you should ask :)!! I have a book coming out later this year on
TREES & HIERARCHIES IN SQL (Morgan-Kaufmann), which gives several
different models in Standard SQL-92 for tree structures which are
superior in terms of performance and scalability compared to the
CONNECT BY.

A few years back, we ran a test on a huge tree with the following
method versus cursors and CONNECT BY; it was 20 to 100 times faster
for 75,000 nodes and 12 levels deep. It could also do things like
compare the structures of sub-trees in one SELECT statement.

I have a book on TREES & HIERARCHIES IN SQL coming out in 2003 with
more code and details.

--CELKO--

unread,

Feb 3, 2003, 5:17:35 PM2/3/03

to

>> Is not that my boss wouldn't pay for training. It is just that I
can't take an endless steep learning curve. It is not just SQL I have

to learn to get my job done, there other areas as well. <<

I know I am preaching to church chorus, but there is too much of this
happening right now. Nobody can be an expert in a dozen different
languages, environments, etc. and do a good job. There are enough
unemployed geeks on the market that companies ought to be able to rent
the expertise they need if they are not willing to take the time to do
it right. You are already know this and I am jut ranting ...

>> What I realized, it is a totally different animal and it is amazing
what one can achieve with statements just a few lines long. I will
teach myself up on that stuff. But it'll take time to sink in. <<

I estimate it takes about one year of full time SQL programming to
"un-learn" procedural thinking and then some more time after that to
get to be a good SQL programmer. The people who have the least
problems are LISP programmers -- they are used to nesting functions on
a data structure to get a result.

>> Keep care to not destroy it by cheaply bashing trademarks without
giving a clean argumentation. <<

I do the details in magazine columns <g>.

DA Morgan

unread,

Feb 3, 2003, 5:19:08 PM2/3/03

to

--CELKO-- wrote:

Thanks for your valiant attempt to sell your book. As I have you on my
recommended reading list of authors for my students at the University of
Washington no doubt I'll buy a copy if you don't get too terribly
offensive.

But unless you are going to state that the above can NOT be done in Oracle
... the fact that CONNECT BY exists is not license to call the product, as
you did, 'horrible'.

So are you complaining because there is a less superior method available
for those not as technically proficient as you, are you complaining
because that less superior implementation doesn't exist in Transact SQL,
or are you complaining just because you don't personally work in Oracle?
Seems to me the posting you did, that started this thread, was 99%
hyperbole and that perhaps a restatement would be appropriate without
editorializing would be appropriate.

Daniel Morgan

--CELKO--

unread,

Feb 4, 2003, 9:49:04 PM2/4/03

to

>> But unless you are going to state that the above can NOT be done in
Oracle ... the fact that CONNECT BY exists is not license to call the
product, as you did, 'horrible'. <<

How about if I say the code to do tree manipulation with CONNECT BY is
orders of magnitude slower, not Standard, not portable and quickly
becomes a nightmare of nested subqueries to maintain? Once more, let
me demonstrate my points with actual code. (If that dos not work,
what will convince you that I have a valid position?)

The query "Show all subcomponents of part A1, including the
substructure" can be handled by the following Oracle SQL statement:

SELECT LEVEL AS pathlength, assemblyno, subassemblyno
FROM Blueprint
START WITH assemblyno = 'A1'
CONNECT BY PRIOR subassemblyno = assemblyno;

The CONNECT BY ... PRIOR clause provides traversal but not support for
recursive aggregate functions. For example, it is not possible to sum
the weights of all subcomponents of part A1 to find the weight of A1.
The only recursive function supported by the CONNECT BY ... PRIOR
clause is the LEVEL function.

Another limitation of the CONNECT BY ... PRIOR clause is that it does
not permit the use of joins. The reason for disallowing joins is that
the order in which the rows are returned in the result is important.
The parent nodes appear before their children, so you know that if the
pathlength increases, these are children; if it does not, they are new
nodes at a higher level.

This also means that an ORDER BY can destroy any meaning in the
results. This means, moreover, that the CONNECT BY ... PRIOR result
is not a true table, since a table by definition does not have an
internal ordering. In addition, this means that it is not always
possible to use the result of a CONNECT BY query in another query.

A trick for working around this limitation, which makes indirect use
of the CONNECT BY ... PRIOR clause, is to hide it in a subquery that
is used to make a JOIN at the higher level. For example, to attach a
product category description, form another table to the parts
explosion.

SELECT part_nbr, category_name
FROM Parts, ProductCategories
WHERE Parts.category_id = ProductCategories.category_id
AND part_nbr IN (SELECT subassemblyno
FROM Blueprint
START WITH assemblyno = 'A1'
CONNECT BY PRIOR subassemblyno = assemblyno);

The subquery has only one table in the FROM clause and complies with
the restriction that there must be no joins in any query block that
contains a CONNECT BY ... PRIOR. On the other hand, the main query
involves a JOIN of two tables, which would not be possible with direct
use of the CONNECT BY ... PRIOR clause. Another query that cannot be
processed by direct use of the CONNECT BY ... PRIOR clause is one that
displays all parent-child relationships at all levels. A technique to
process this query is illustrated by the following SQL:

SELECT DISTINCT PX.part_nbr, PX.pname, PY.part_nbr, PY.pname
FROM Parts AS PX, Parts AS PY
WHERE PY.part_nbr IN (SELECT Blueprint.subassemblyno
FROM Blueprint
START WITH assemblyno = PX.part_nbr
CONNECT BY PRIOR subassemblyno = assemblyno)
ORDER BY PX.part_nbr, PY.part_nbr;

Again, the outer query includes a JOIN, which is not allowed with the
CONNECT BY ... PRIOR clause. Note that the correlated subquery
references PX.part_nbr.
As you try to do more, the subquery nesting gets worse.

The basic problem is that CONNECT BY is a cursor (procedural code) and
SQL is set-oriented. They don't work together very well

>> So are you complaining because there is a less superior method
available for those not as technically proficient as you, are you
complaining because that less superior implementation doesn't exist in
Transact SQL, or are you complaining just because you don't personally
work in Oracle? <<

Why do you think I like T-SQL? The T-SQL newsgroups have more SQL
programming problems than other product newsgroups, which tend to
focus on particulars of their product. While SQL Server is getting
better (so is Oracle)and I hope Yukon cleasns up some of the old
"Sybase Code Museum" problems, but it still has some major problems in
the basic underlying model.

I am trying to figure out "those not as technically proficient as
me"?? I think that anyone can understand the nested sets or nested
intervals model if they understand HTML or XML or work in a block
structured language -- it is the same concept. It is all just
"parentheses in disguise", not rocket science. I have not lost an
audience when I teach it and several times programmers who were
introduced to the technique for the first time came up with some stuff
I had not thought of before -- they started drawing circles inside
circles (i.e. set diagrams) instead of "boxes and arrows" (i.e.
sequential travesals) and looked at their problems differently.

Richard Foote

unread,

Feb 5, 2003, 7:08:44 AM2/5/03

to

"--CELKO--" <71062...@compuserve.com> wrote in message
news:c0d87ec0.03020...@posting.google.com...

> >> But unless you are going to state that the above can NOT be done in
> Oracle ... the fact that CONNECT BY exists is not license to call the
> product, as you did, 'horrible'. <<
>

Hi Celko

You made the comment that "Oracle is a horrible product" and argued the
point (rather verbosely) that such an assessment was due to your dislike of
the connect by clause.

The fact there (may) be better alternatives than the connect by clause with
Oracle hardly constitutes Oracle being a horrible product.

I mean, there is a tad more to Oracle than connect by ........

Hope the cover to your new book is fresh and exciting lest people think it a
horrible book ;)

Richard

DA Morgan

unread,

Feb 5, 2003, 12:16:50 PM2/5/03

to

--CELKO-- wrote:

What Richard said! And triple it!

Your hyperbole was not directed at a single feature that, as you have
demonstrated, can be ignored by those technically proficient enough to do
so. But rather at the entire product. This type of generalization is not
only meaningless ... it is counterproductive.

Daniel Morgan

DA Morgan

unread,

Feb 5, 2003, 12:24:36 PM2/5/03

to

--CELKO-- wrote:

> <snipped>

> I am trying to figure out "those not as technically proficient as
> me"?? I think that anyone can understand the nested sets or nested
> intervals model if they understand HTML or XML or work in a block
> structured language -- it is the same concept.

I teach at the University of Washington and I can assure you that those
things that are perfectly obvious and logical to you can create a great
deal of challenge and struggle for others. Remember they are buying your
book ... not the other way around.

Perhaps everyone "can" understand nested sets or nested intervals. But
that does not mean that they do. Anytime you want to be a guest lecturer
at the U... let me know.

Daniel Morgan

Mikito Harakiri

unread,

Feb 5, 2003, 1:40:38 PM2/5/03

to

"--CELKO--" <71062...@compuserve.com> wrote in message
news:c0d87ec0.03020...@posting.google.com...

> How about if I say the code to do tree manipulation with CONNECT BY is
> orders of magnitude slower, not Standard, not portable and quickly
> becomes a nightmare of nested subqueries to maintain? Once more, let
> me demonstrate my points with actual code. (If that dos not work,
> what will convince you that I have a valid position?)

Connect-By uses indexed nested loops. If both id and parent id columns are
indexed, it is reasonably fast.

> The query "Show all subcomponents of part A1, including the
> substructure" can be handled by the following Oracle SQL statement:
>
> SELECT LEVEL AS pathlength, assemblyno, subassemblyno
> FROM Blueprint
> START WITH assemblyno = 'A1'
> CONNECT BY PRIOR subassemblyno = assemblyno;
>
> The CONNECT BY ... PRIOR clause provides traversal but not support for
> recursive aggregate functions. For example, it is not possible to sum
> the weights of all subcomponents of part A1 to find the weight of A1.
> The only recursive function supported by the CONNECT BY ... PRIOR
> clause is the LEVEL function.

Recursive Aggregate Functions aka Hierarchical Totals are prefectly possible
with connect-by. Just use subqueries in the "select" list (Date's idea of
expressing group-by with subqueries in the "select" list).
select e1.ename,

(select sum(e2.sal) from emp e2

connect by e2.mgr = prior e2.empno

start with e2.empno = e1.empno)

from emp e1

I'm skeptical, however, about optimization of hierarchical total. For one
thing, I'm not aware of any product that would be able to unnest subquery in
the select list into "group by" query. Corrections are welcome.

> Another limitation of the CONNECT BY ... PRIOR clause is that it does
> not permit the use of joins. The reason for disallowing joins is that
> the order in which the rows are returned in the result is important.
> The parent nodes appear before their children, so you know that if the
> pathlength increases, these are children; if it does not, they are new
> nodes at a higher level.

Not true anymore for Oracle 9.

> This also means that an ORDER BY can destroy any meaning in the
> results. This means, moreover, that the CONNECT BY ... PRIOR result
> is not a true table, since a table by definition does not have an
> internal ordering. In addition, this means that it is not always
> possible to use the result of a CONNECT BY query in another query.

Connect-by ordering is no more a problem than the Standard SQL Order-By.

> Another query that cannot be
> processed by direct use of the CONNECT BY ... PRIOR clause is one that
> displays all parent-child relationships at all levels. A technique to
> process this query is illustrated by the following SQL:
>
> SELECT DISTINCT PX.part_nbr, PX.pname, PY.part_nbr, PY.pname
> FROM Parts AS PX, Parts AS PY
> WHERE PY.part_nbr IN (SELECT Blueprint.subassemblyno
> FROM Blueprint
> START WITH assemblyno = PX.part_nbr
> CONNECT BY PRIOR subassemblyno = assemblyno)
> ORDER BY PX.part_nbr, PY.part_nbr;
> Again, the outer query includes a JOIN, which is not allowed with the
> CONNECT BY ... PRIOR clause. Note that the correlated subquery
> references PX.part_nbr.

What do you mean by "direct use"? This is SQL, a user is allowed to inline
and nest subqueries.

Transitive closure query could be answered easily if we just drop "start
with" clause.

SELECT Blueprint.subassemblyno,
substr(sys_connect_by_path(assemblyno,'/'),2,

instr(sys_connect_by_path(assemblyno,'/')||'/','/',2)-2))
FROM Blueprint

CONNECT BY PRIOR subassemblyno = assemblyno

> The basic problem is that CONNECT BY is a cursor (procedural code) and

> SQL is set-oriented. They don't work together very well

Where in the above examples did you see a cursor?

> I am trying to figure out "those not as technically proficient as
> me"?? I think that anyone can understand the nested sets or nested
> intervals model if they understand HTML or XML or work in a block
> structured language -- it is the same concept. It is all just
> "parentheses in disguise", not rocket science. I have not lost an
> audience when I teach it and several times programmers who were
> introduced to the technique for the first time came up with some stuff
> I had not thought of before -- they started drawing circles inside
> circles (i.e. set diagrams) instead of "boxes and arrows" (i.e.
> sequential travesals) and looked at their problems differently.

The appealing part of the transitive closure syntax -- being it Recursive
With, or Connect By -- is that the hierarchical query doesn't use a
particular labeling scheme. Certainly, you might insist that only one
labeling scheme exists -- nested sets -- but we know otherwise. Here is
incomplete list:
0. Materialized Path
1. Nested Sets.
2. <depth_first_order, depth> integer pair
3. Binary Rationals
4. Primary Number Decompositions
The hierarchical query that is locked into any of these scheme is
implementation dependent!

--CELKO--

unread,

Feb 5, 2003, 7:38:53 PM2/5/03

to

>> Connect-By uses indexed nested loops. If both id and parent id
columns are indexed, it is reasonably fast. <<

And it is reasonable to assume that they will have two column index to
ensure uniqueness of the pairs.

>> Recursive Aggregate Functions aka Hierarchical Totals are prefectly
possible with connect-by. Just use subqueries in the "select" list
(Date's idea of expressing group-by with subqueries in the "select"

list) ... I'm skeptical, however, about optimization of hierarchical

total. For one thing, I'm not aware of any product that would be able
to unnest subquery in the select list into "group by" query.
Corrections are welcome. <<

I don't know of any such optimization either. The levels of
correlation between the scalar subquery expresion and a containing
query with mutliple table references in its FROM clause would be be
pretty complicated.

>> Not true anymore for Oracle 9. <<

I am been corrected on that one, and changed it in the text.

>> What do you mean by "direct use"? This is SQL, a user is allowed to
inline and nest subqueries. <<

But you want to avoid them in favor of simple, "flat" joins whenever
possible. They hurt performance and maintainability.

>> Where in the above examples did you see a cursor? <<

It's there, but it is hidden in the syntax. The fact that I have to
proceed from one row to another in a particular order means I am doing
sequential processing. Do you know of a parallel algorithm for ue
with the adjacency list model of a tree?

>> The appealing part of the transitive closure syntax -- being it
Recursive With, or Connect By -- is that the hierarchical query
doesn't use a particular labeling scheme. <<

Granted, but you also lose the ability to order the children, to
separate the tree structure from the nodes (organizational chart
versus personnel)

>> Certainly, you might insist that only one labeling scheme exists --

nested sets -- but we know otherwise. [snip list] <<

I know there are quite a few and I cover them. I just brought out
nested sets
in this post because I have a "cut & paste" file on them. Nested sets
is the easiest method to explain and it has the simplest math of any
method that uses numeric pairs to locate a node.

>> The hierarchical query that is locked into any of these scheme is
implementation dependent! <<

Unh? All of the methods on your list use Standard SQL, and have no
implementation dependent features (well, other than precision and
scale of numbers) or vendor extensions. There are trade-offs with
each; the amount of math needed for a query in the ones that depend on
numeric pairs, the storage for a path enumeration string and the
seartch time, etc.

The big advantage of the adjacency list model is that it is fast and
easy to insert new rows. But summary data is much harder to obtain,
and constraints to preserve the tree structure are more complex (most
programmers do not bother to write the needed constraints!! Arrgh!).
Simple deletion of a row splits the tree apart, so you need code to
reconstruct it. Protection from cycles requires either procedural
code or table level check clauses (not yet found in many products).
Following a path requires cursors - hidden or explicit.

Done right, the adjacency list is harder than people think.

DA Morgan

unread,

Feb 5, 2003, 8:16:32 PM2/5/03

to

--CELKO-- wrote:

> <snipped>columns are indexed, it is reasonably fast. <<

>
> >> What do you mean by "direct use"? This is SQL, a user is allowed to
> inline and nest subqueries. <<
>
> But you want to avoid them in favor of simple, "flat" joins whenever
> possible. They hurt performance and maintainability.
>

Not in Oracle.

The performance impact of inline views, while there must be some, is
unmeasurable. And, in fact, I can easily demonstrate cases where
subqueries are substantially faster than flat joins and do so in the
EXPLAIN PLAN lesson I teach first quarter students.

Maintainability depends on what developers are used to working with. If
you haven't seen curly braces before they are a nightmare. In the case of
serious Oracle shops ... inline views and nested subqueries are so common
as to have no impact on maintainance.

PL/SQL and the optimizer have changed a lot recently. Knowledge of how
things worked three to five years ago, or more, is nearly useless.

Daniel Morgan

Mikito Harakiri

unread,

Feb 5, 2003, 8:36:46 PM2/5/03

to

"--CELKO--" <71062...@compuserve.com> wrote in message
news:c0d87ec0.03020...@posting.google.com...

> >> What do you mean by "direct use"? This is SQL, a user is allowed to
> inline and nest subqueries. <<
>
> But you want to avoid them in favor of simple, "flat" joins whenever
> possible. They hurt performance and maintainability.

OK.

> >> Where in the above examples did you see a cursor? <<
>
> It's there, but it is hidden in the syntax. The fact that I have to
> proceed from one row to another in a particular order means I am doing
> sequential processing. Do you know of a parallel algorithm for ue
> with the adjacency list model of a tree?

Suppose we have relation

Links
start end
---- ---
1 2
2 3

and it's transitive closure

TC
start end
---- ---
1 2
2 3
1 3

It's not important whether TC is built on the fly (for example, with
"recursive with" construct), or incrementally maintained. Then the
hierarchical queries against TC relation

select count(1) from TC
where start = 2

-- the total number of descendants of the node #2, and

select count(1) from TC
where end = 2

-- the node level

and others are as easy as the queries against the Nested Sets model.

To be fair, if you have a "cursor" objection against "connect by", it should
apply to "recursive with" as well, right?

> >> The appealing part of the transitive closure syntax -- being it
> Recursive With, or Connect By -- is that the hierarchical query
> doesn't use a particular labeling scheme. <<
>
> Granted, but you also lose the ability to order the children, to
> separate the tree structure from the nodes (organizational chart
> versus personnel)

I don't follow here.

Since adjacency list model applies to DAGs, which are more general case then
trees, we must be careful about separating graph nodes and edges. Otherwise,
there would be nulls like this

Links
parent_id id
-------- ---
null 1
1 2
2 3

and a lot of confusion. (For example: Was that query against the nodes, or
against the links?)

> >> The hierarchical query that is locked into any of these scheme is
> implementation dependent! <<
>
> Unh? All of the methods on your list use Standard SQL, and have no
> implementation dependent features (well, other than precision and
> scale of numbers) or vendor extensions. There are trade-offs with
> each; the amount of math needed for a query in the ones that depend on
> numeric pairs, the storage for a path enumeration string and the
> seartch time, etc.

Once again, it is implementation dependent in the sence that the user is
locked into a particular labeling schema. He would have to completely
rewrite his queries if he decides to change labeling. On contrary, there
only one adjacency list model.

> The big advantage of the adjacency list model is that it is fast and
> easy to insert new rows.

It is more general too.

> But summary data is much harder to obtain,

Not if transitive closure is known.

> and constraints to preserve the tree structure are more complex (most
> programmers do not bother to write the needed constraints!! Arrgh!).
> Simple deletion of a row splits the tree apart, so you need code to
> reconstruct it. Protection from cycles requires either procedural
> code or table level check clauses (not yet found in many products).
> Following a path requires cursors - hidden or explicit.
>
> Done right, the adjacency list is harder than people think.

That might be correct. However, until Herarchy Theory in the Relational
Model is mostly undeveloped I prefer not to dismiss alternative solutions
easily.

Mikito Harakiri

unread,

Feb 5, 2003, 9:08:08 PM2/5/03

to

"DA Morgan" <damo...@exesolutions.com> wrote in message
news:3E41B770...@exesolutions.com...

> --CELKO-- wrote:
>
> > <snipped>columns are indexed, it is reasonably fast. <<
> >
> > >> What do you mean by "direct use"? This is SQL, a user is allowed to
> > inline and nest subqueries. <<
> >
> > But you want to avoid them in favor of simple, "flat" joins whenever
> > possible. They hurt performance and maintainability.
> >
>
> Not in Oracle.
>
> The performance impact of inline views, while there must be some, is
> unmeasurable. And, in fact, I can easily demonstrate cases where
> subqueries are substantially faster than flat joins and do so in the
> EXPLAIN PLAN lesson I teach first quarter students.

I would be interested to see those. Evaluating subquery for each row of the
outer query (Tuple Iteration Semantics) is essentially Nested Loops.
Unnesting subquery is a transformation that makes the join explicit.
Combining all joins together into a flat select-project-join query is
beneficial because it opens larger space of join orders (For example, new
join orders might be explored, unavailable in the original query, plus other
join methods -- Merge Join and Hash Join are available as well).

Mikito Harakiri

unread,

Feb 5, 2003, 9:11:41 PM2/5/03

to

"Mikito Harakiri" <mikha...@ywho.com> wrote in message
news:44j0a.17$3j3...@news.oracle.com...

> Links
> parent_id id
> -------- ---
> null 1
> 1 2
> 2 3

That shoud be "Nodes" relation, not "Links", of course.

Ed Prochak

unread,

Feb 6, 2003, 12:10:04 AM2/6/03

to

[]

And this update doesn't blow performance out of the water?
Doesn't it potentionally update large percentages of the entire table?

Since you only go by two, wouldn't the next case of large updates happen after
another couple inserts at the "leaf" nodes. Maybe I do not follow this
correctly. In your orgchart example, what happens when Donna hires an
assistant? It seems to me all the current employee records get touched except
Bert. Is that right? (Strikes me like a network model DB in that you have to
rebuild links as part of inserts.) Am I wrong?

[]

>
> I have a book on TREES & HIERARCHIES IN SQL coming out in 2003 with
> more code and details.

Are we rediscovering the past?

(Network model databases have their place and can out perform a RDBMS in those
cases. It's just that those cases are few and far between. In my experience,
it's when the data is static, ie, load data once and read many times very
fast. Dynamic data just bogs done on link updates.)

Disclamer: I'm no DB theoretician. I'm reading this from
comp.databases.oracle.misc

(Which makes me curious who put oracle in the mix of groups when no other DB
product is included?)

--
Ed Prochak
running: http://www.faqs.org/faqs/running-faq/
family: http://web.magicinterface.com/~collins
--
"Two roads diverged in a wood and I
I took the one less travelled by
and that has made all the difference."
robert frost

DA Morgan

unread,

Feb 6, 2003, 12:33:25 AM2/6/03

to

Mikito Harakiri wrote:

> <snipped>

> I would be interested to see those. Evaluating subquery for each row of the
> outer query (Tuple Iteration Semantics) is essentially Nested Loops.
> Unnesting subquery is a transformation that makes the join explicit.
> Combining all joins together into a flat select-project-join query is
> beneficial because it opens larger space of join orders (For example, new
> join orders might be explored, unavailable in the original query, plus other
> join methods -- Merge Join and Hash Join are available as well).

Table 1:
CREATE TABLE servers (
srvr_id NUMBER(10),
network_id NUMBER(10),
status CHAR(1),
latitude FLOAT(20),
longitude FLOAT(20),
netaddress VARCHAR2(15))
PCTFREE 5
PCTUSED 90
STORAGE (INITIAL 64K NEXT 64K PCTINCREASE 0)
TABLESPACE data_sml;
ALTER TABLE servers
ADD CONSTRAINT pk_servers PRIMARY KEY (srvr_id)
USING INDEX
PCTFREE 5
STORAGE (INITIAL 64K NEXT 64K PCTINCREASE 0)
TABLESPACE indx_sml;

100 rows of data modeled like this:
INSERT INTO SERVERS VALUES (1,1028,'Y',32.9806,-117.2567,'172.020.130.002');

Table 2:
CREATE TABLE serv_inst (
siid NUMBER(10),
si_status VARCHAR2(15),
type VARCHAR2(5),
installstatus CHAR(1),
location_code NUMBER(10),
custacct_id VARCHAR2(10),
srvr_id NUMBER(10),
ws_id NUMBER(10))
PCTFREE 5
PCTUSED 90
STORAGE (INITIAL 256K NEXT 256K PCTINCREASE 0)
TABLESPACE data_sml;

ALTER TABLE serv_inst
ADD CONSTRAINT pk_serv_inst PRIMARY KEY (siid, srvr_id, ws_id)
USING INDEX
PCTFREE 5
STORAGE (INITIAL 256K NEXT 256K PCTINCREASE 0)
TABLESPACE indx_sml;

1000 rows of data modeled like this:
INSERT INTO SERV_INST VALUES
(3490095,'Pending','WIN','Y',78161,'03490092',12,1);

Try these two SQL statements in both 8.1.7 and 9.2. They product the exact same
result set from the data.

SQL Statement 1:
SELECT DISTINCT s.srvr_id
FROM servers s, serv_inst i
WHERE s.srvr_id = i.srvr_id;

SQL Statement 2:
SELECT DISTINCT srvr_id
FROM servers
WHERE srvr_id NOT IN (
SELECT srvr_id
FROM servers
MINUS
SELECT srvr_id
FROM serv_inst);

I'll take the ridiculously convoluted sub-query every time. In fact the flat
inner-join is slower than just about anything else you can write to solve the
problem other than an INTERSECT.

Daniel Morgan

--CELKO--

unread,

Feb 9, 2003, 9:08:47 PM2/9/03

to

>> [book on TREES & HIERARCHIES IN SQL] Are we rediscovering the past?

Network model databases have their place and can out perform a RDBMS
in those cases. It's just that those cases are few and far between. In
my experience, it's when the data is static, ie, load data once and
read many times very fast. Dynamic data just bogs done on link
updates. <<

Actually, I spend most of the time showing how to code hierarchies
without using links, but by taking a set-oriented approach, like the
one I just did a "cut & paste" on.

Actually, the old network and IMS databases are really fast to load
and access. They just have no flexibility. It's like a Polar Bear --
great in one environment and no place else.

If you put parts subordinate to their suppliers, and want to get all
the sources for a "Metric #5 Machine Screw", you will be chasing
pointer chains from the supplier, thru his catalog, back to the next
supplier, thru his catalog, etc.

--CELKO--

unread,

Feb 9, 2003, 9:35:53 PM2/9/03

to

>> It's not important whether TC is built on the fly (for example,
with "recursive with" construct), or incrementally maintained. <<

I grant that once you have the transitive closure, a lot of problems
are easy. But building it and storing each possible path as a
separate row, is expensive.

>> I don't follow here. <<

In a nested set model, the (lft, rgt) pairs are ordered, so I have a
left-most (oldest) child, a right-most (youngest) child and an
ordering of the sibling in between.

>> Once again, it is implementation dependent in the sence that the
user is
locked into a particular labeling schema. <<

Okay. The phrase "implementation dependent" is used in ANSI/ISO
Standards to mean that the vendor gets to determine something about a
feature.

>> there only one adjacency list model. <<

Two.

Albert
/ \
/ \
Bert Chuck
/ | \
/ | \
/ | \
/ | \
Donna Eddie Fred

CREATE TABLE Parent_Down_Tree
(parent_id CHAR (8), -- null means child is the root
child_id CHAR(8) NOT NULL);

parent child
======================
NULL 'Albert'
'Albert' 'Bert'
'Bert' 'Chuck'
'Chuck' 'Donna'
'Chuck 'Eddie'
'Chuck' 'Fred'

CREATE TABLE Child_Up_Tree
(parent_id CHAR (8) NOT NULL,
child_id CHAR(8)); -- null means child is a leaf

parent child
======================
'Albert' 'Chuck'
'Albert' 'Bert'
'Bert' NULL'
'Donna' NULL
'Eddie' NULL
'Fred' NULL

>> It is more general too. <<

True.

>> That might be correct. However, until Herarchy Theory in the
Relational Model is mostly undeveloped I prefer not to dismiss
alternative solutions easily. <<

And I am a "Standards Faniac" <g> ...

Mikito Harakiri

unread,

Feb 10, 2003, 12:05:50 PM2/10/03

to

"--CELKO--" <71062...@compuserve.com> wrote in message
news:c0d87ec0.03020...@posting.google.com...

>> >> Granted, but you also lose the ability to order the children, to
>> >> separate the tree structure from the nodes (organizational chart
>> >> versus personnel)

> >> I don't follow here. <<
>
> In a nested set model, the (lft, rgt) pairs are ordered, so I have a
> left-most (oldest) child, a right-most (youngest) child and an
> ordering of the sibling in between.

I understand the ordering. I don't understand "separate the tree structure
from the nodes".

> >> there only one adjacency list model. <<
>
> Two.
>
> Albert
> / \
> / \
> Bert Chuck
> / | \
> / | \
> / | \
> / | \
> Donna Eddie Fred
>
>
> CREATE TABLE Parent_Down_Tree
> (parent_id CHAR (8), -- null means child is the root
> child_id CHAR(8) NOT NULL);
>
> parent child
> ======================
> NULL 'Albert'
> 'Albert' 'Bert'
> 'Bert' 'Chuck'

'Albert' 'Chuck' ?

> 'Chuck' 'Donna'
> 'Chuck 'Eddie'
> 'Chuck' 'Fred'
>
> CREATE TABLE Child_Up_Tree
> (parent_id CHAR (8) NOT NULL,
> child_id CHAR(8)); -- null means child is a leaf
>
> parent child
> ======================
> 'Albert' 'Chuck'
> 'Albert' 'Bert'
> 'Bert' NULL'
> 'Donna' NULL
> 'Eddie' NULL
> 'Fred' NULL

I don't agree.

First, there are missing links in your data. Whose child is Donna, for
example?

Second, if you reorganize the model into 2 tables: nodes and linkes, then
there would be no difference between both your representations (and plus,
there would be no NULLs). Essentially, we are talking about "poles and
spaces" counting problem. We can relate each pole to the space before, or
after it.

Third, for arbitrary graphs you can't merge nodes and links into the same
table anymore.

There is only one adjacency model simply because there is only one adjacency
matrix definition in graph theory. The fact that we can consider transposed
adjacency matrix too is not important.

--CELKO--

unread,

Feb 12, 2003, 12:01:28 AM2/12/03

to

>> First, there are missing links in your data ... <<

ARRGGH! Let me try again:

CREATE TABLE Parent_Down_Tree
(parent_id CHAR (8), -- null means child is the root
child_id CHAR(8) NOT NULL);

parent child
======================
NULL 'Albert'
'Albert' 'Bert'
'Bert' 'Chuck'
'Albert' 'Chuck'

'Chuck' 'Donna'
'Chuck 'Eddie'
'Chuck' 'Fred'

CREATE TABLE Child_Up_Tree
(parent_id CHAR (8) NOT NULL,
child_id CHAR(8)); -- null means child is a leaf

parent child
======================
'Albert' 'Chuck'
'Albert' 'Bert'
'Bert' NULL'
'Donna' NULL
'Eddie' NULL
'Fred' NULL
'Chuck' 'Donna'
'Chuck 'Eddie'
'Chuck' 'Fred'

>> Second, if you reorganize the model into 2 tables: nodes and links,

then there would be no difference between both your representations
(and plus, there would be no NULLs). <<

That is not what I mean. The organizational chart is not the same
thing as the personnel who hold those positions. Albert might be the
president of the company, but when he gets fired, the office of the
president is still there, only it is vacant. If I have personnel
separated from the organizational structure, I can have one person
hold two jobs, show vacant positions, etc.

The mild advantage of a child-up adjacency list model is when you have
a tree that grows down from leaf nodes -- I can immediately find and
update a leaf node with an IS NULL predicate.

>> for arbitrary graphs ... <<

I am only concerned with trees in these models. The adjacency list
(or perhaps an adjacency array) would be the way to do an arbitrary
graph. I would say that it is too easy to accidently do an arbitrary
graph in the adjacency list model and that people forget to put all of
the constraints they need to assure that they have a tree.

>> There is only one adjacency model simply because there is only one
adjacency matrix definition in graph theory. The fact that we can
consider transposed adjacency matrix too is not important. <<

Only if you are a programmer <g>

Paul Vernon

unread,

Feb 12, 2003, 5:36:02 AM2/12/03

to

"--CELKO--" <71062...@compuserve.com> wrote in message

news:c0d87ec0.03021...@posting.google.com...

> >> First, there are missing links in your data ... <<
>
> ARRGGH! Let me try again:

Third time lucky?

Given:

Albert
/ \
/ \
Bert Chuck
/ | \
/ | \
/ | \
/ | \
Donna Eddie Fred

> CREATE TABLE Parent_Down_Tree

> (parent_id CHAR (8), -- null means child is the root
> child_id CHAR(8) NOT NULL);
>
> parent child
> ======================
> NULL 'Albert'
> 'Albert' 'Bert'

Scrub this row
> 'Bert' 'Chuck'

> 'Albert' 'Chuck'
> 'Chuck' 'Donna'
> 'Chuck 'Eddie'
> 'Chuck' 'Fred'
>
> CREATE TABLE Child_Up_Tree
> (parent_id CHAR (8) NOT NULL,
> child_id CHAR(8)); -- null means child is a leaf
>
> parent child
> ======================
> 'Albert' 'Chuck'
> 'Albert' 'Bert'
> 'Bert' NULL'
> 'Donna' NULL
> 'Eddie' NULL
> 'Fred' NULL
> 'Chuck' 'Donna'
> 'Chuck' 'Eddie'
> 'Chuck' 'Fred'

So the only difference is whether we redundantly mark root or leaf nodes with
NULL links? How does this make them different models? If it does, don't we
have 4 models:

1) Mark none

'Albert' 'Chuck'
'Albert' 'Bert'

'Chuck' 'Donna'
'Chuck' 'Eddie'
'Chuck' 'Fred'

2) Mark all

NULL 'Albert'
'Albert' 'Chuck'
'Albert' 'Bert'

'Chuck' 'Donna'
'Chuck' 'Eddie'
'Chuck' 'Fred'

'Bert' NULL
'Donna' NULL
'Eddie' NULL
'Fred' NULL

plus the two above.

> >> Second, if you reorganize the model into 2 tables: nodes and links,
> then there would be no difference between both your representations
> (and plus, there would be no NULLs). <<
>
> That is not what I mean. The organizational chart is not the same
> thing as the personnel who hold those positions. Albert might be the
> president of the company, but when he gets fired, the office of the
> president is still there, only it is vacant. If I have personnel
> separated from the organizational structure, I can have one person
> hold two jobs, show vacant positions, etc.

I can't see why an org chart is not best represented by seperate node and link
tables.

> The mild advantage of a child-up adjacency list model is when you have
> a tree that grows down from leaf nodes -- I can immediately find and
> update a leaf node with an IS NULL predicate.

'Parent-down' has the advantage of having a nice key

CREATE TABLE Parent_Down_Tree
(parent_id CHAR (8), -- null means child is the root

child_id CHAR(8) NOT NULL,
PRIMARY KEY(child_id));

Regards
Paul Vernon
Business Intelligence, IBM Global Services

Galen Boyer

unread,

Feb 12, 2003, 12:19:25 AM2/12/03

to

On 3 Feb 2003, 71062...@compuserve.com wrote:

>>> When you are ready to demonstrate a superior implementation in
> another SQL
> RDBMS please let us know. And I don't mean superior in terms of some
> theoretical concept like building a database in fifth normal form. I
> mean superior in terms of performance and scalability in the real
> world. <<
>
> Funny you should ask :)!! I have a book coming out later this year on
> TREES & HIERARCHIES IN SQL (Morgan-Kaufmann), which gives several
> different models in Standard SQL-92 for tree structures which are
> superior in terms of performance and scalability compared to the
> CONNECT BY.
>
> A few years back, we ran a test on a huge tree with the following
> method versus cursors and CONNECT BY; it was 20 to 100 times faster
> for 75,000 nodes and 12 levels deep. It could also do things like
> compare the structures of sub-trees in one SELECT statement.

You have done nothing, whatsoever to prove that Oracle is a horrible
product. Nothing, zilch, nada!

What you have done is present a design that many have, I know I have,
implemented to speed up access to hierarchies. But, you want to know
what, we implemented these designs in Oracle. I deliberately try not to
use connect by very often, but it comes in handy, especially for small
sets and quick hits. What the flip is wrong with offering that up?
When someone wants to really implement hierarchies and fast access to
them, they design a home-grown one, like yours.

So, please tell us how Oracle is a "horrible product". This time, stop
selling your book and show where another database vendor's solution to
hierarchical queries is superior to Oracle's "connect by". Even then,
you will still have done, nothing, zilch, nada to prove that Oracle is a
"horrible product", but you can't even start to defend the statement
because you haven't even shown one vendor's superior solution to
Oracle's "connect by", not one.

Galen Boyer

unread,

Feb 12, 2003, 12:26:18 AM2/12/03

to

On 3 Feb 2003, 71062...@compuserve.com wrote:

>>> Keep care to not destroy it by cheaply bashing trademarks without
> giving a clean argumentation. <<
>
> I do the details in magazine columns <g>.

Stop the marketing bullshit and back up your claim.

Ed Prochak

unread,

Feb 12, 2003, 2:08:22 PM2/12/03

to

Yes. I expected you knew this but it didn't hurt to ask. (I've seen others not
know the past and proclaim their network model to the the "next big thing".)

Now what about the UPDATE issue I asked about. I was really more concerned
with that. It reminds me somewhat of the old line numbering issue in BASIC.
After so many inserts, you have top renumber a large number of records. How do
you avoid that?

Mikito Harakiri

unread,

Feb 12, 2003, 3:49:11 PM2/12/03

to

I don't follow. Here is what i did:

CREATE TABLE servers (
srvr_id NUMBER(10),
network_id NUMBER(10),
status CHAR(1),
latitude FLOAT(20),
longitude FLOAT(20),
netaddress VARCHAR2(15))
PCTFREE 5
PCTUSED 90

STORAGE (INITIAL 64K NEXT 64K PCTINCREASE 0);

ALTER TABLE servers
ADD CONSTRAINT pk_servers PRIMARY KEY (srvr_id)
USING INDEX
PCTFREE 5

STORAGE (INITIAL 64K NEXT 64K PCTINCREASE 0);

begin
FOR i In 1..100 LOOP
INSERT INTO SERVERS VALUES
(i,abs(floor(DBMS_RANDOM.RANDOM/1000000000)),
CASE WHEN DBMS_RANDOM.RANDOM > 0 THEN 'Y'
ELSE 'N' END ,DBMS_RANDOM.RANDOM/10000000,
DBMS_RANDOM.RANDOM/20000000,
abs(floor(DBMS_RANDOM.RANDOM/10000000)) || '.' ||
abs(floor(DBMS_RANDOM.RANDOM/10000000)) || '.' ||
abs(floor(DBMS_RANDOM.RANDOM/10000000)) || '.' ||
abs(floor(DBMS_RANDOM.RANDOM/10000000))
);
END LOOP;
end;

commit

CREATE TABLE serv_inst (
siid NUMBER(10),
si_status VARCHAR2(15),
type VARCHAR2(5),
installstatus CHAR(1),
location_code NUMBER(10),
custacct_id VARCHAR2(10),
srvr_id NUMBER(10),
ws_id NUMBER(10))
PCTFREE 5
PCTUSED 90
STORAGE (INITIAL 256K NEXT 256K PCTINCREASE 0)

ALTER TABLE serv_inst

ADD CONSTRAINT pk_serv_inst PRIMARY KEY (siid, srvr_id, ws_id)
USING INDEX
PCTFREE 5
STORAGE (INITIAL 256K NEXT 256K PCTINCREASE 0)

begin
FOR i In 1..1000 LOOP
INSERT INTO serv_inst VALUES
(3490000+i,'Pending','WIN',CASE WHEN DBMS_RANDOM.RANDOM > 0 THEN 'Y'
ELSE 'N' END ,78161,'03490092',floor(i/10),1);
END LOOP;
end;

commit

SELECT --+all_rows

DISTINCT s.srvr_id
FROM servers s, serv_inst i
WHERE s.srvr_id = i.srvr_id;

10 buffer gets

SELECT --+all_rows

DISTINCT srvr_id
FROM servers
WHERE srvr_id NOT IN (
SELECT srvr_id
FROM servers
MINUS

SELECT --+index(serv_inst pk_serv_inst)
srvr_id
FROM serv_inst);

500 buffer gets
(without index hint it's 1000 buffer gets, while rule plan is about 1200
buffer gets)

If you look to execution plans, then, the first one has a Nested Loop, while
the second one has FILTER. Nested Loop execution is virtully identical to
FILTER rowsource Tuple Iteration Semantics. If you look into
v$sqlplan_statistics_all then there would be 100 LAST_STARTS for the MINUS
rowsource)

Certainly, the subquery is not corellated, so that there is no need to
execute subquery 100 times, but I failed to get unnested plan in 9.2.0.2.
And this would only prove my point: query unnesting is almost always
beneficial.

Could you please publish your plans & execution statistics?

"DA Morgan" <damo...@exesolutions.com> wrote in message

news:3E41F3A5...@exesolutions.com...