> I've got a table called 'link_t' containing a collection of seller - > buyer relations between two parties.
> sql> select * from link_t
> S B > - - > C X > A B > B C > C D > D E
> 5 rows selected.
> I am looking for a select statement that returns the concatenation of > seller - buyer relations between the first seller 'A' and the last > buyer 'B'.
> the result should be
> S B > - - > A B > B C > C D > D E
> Currently I fumbling around with self joins but haven't figured out > yet. > Any suggestions
>> I've got a table called 'link_t' containing a collection of seller
- buyer relations between two parties. <<
That is not a real linked list, but let's ignore bad terminology. One way to do this is with cursors, but they will take time and trend to be proprietary.
Anohter way is to build a tree, with the first seller as the root and the final buyer as a leaf node.
The usual example of a tree structure in SQL books is called an adjacency list model and it looks like this:
CREATE TABLE OrgChart (emp CHAR(10) NOT NULL PRIMARY KEY, boss CHAR(10) DEFAULT NULL REFERENCES OrgChart(emp), salary DECIMAL(6,2) NOT NULL DEFAULT 100.00);
Another way of representing trees is to show them as nested sets. Since SQL is a set oriented language, this is a better model than the usual adjacency list approach you see in most text books. Let us define a simple OrgChart table like this, ignoring the left (lft) and right (rgt) columns for now. This problem is always given with a column for the employee and one for his boss in the textbooks. This table without the lft and rgt columns is called the adjacency list model, after the graph theory technique of the same name; the pairs of emps are adjacent to each other.
The organizational chart would look like this as a directed graph:
Albert (1,12) / \ / \ Bert (2,3) Chuck (4,11) / | \ / | \ / | \ / | \ Donna (5,6) Eddie (7,8) Fred (9,10)
The first table is denormalized in several ways. We are modeling both the OrgChart and the organizational chart in one table. But for the sake of saving space, pretend that the names are job titles and that we have another table which describes the OrgChart that hold those positions.
Another problem with the adjacency list model is that the boss and employee columns are the same kind of thing (i.e. names of OrgChart), and therefore should be shown in only one column in a normalized table. To prove that this is not normalized, assume that "Chuck" changes his name to "Charles"; you have to change his name in both columns and several places. The defining characteristic of a normalized table is that you have one fact, one place, one time.
The final problem is that the adjacency list model does not model subordination. Authority flows downhill in a hierarchy, but If I fire Chuck, I disconnect all of his subordinates from Albert. There are situations (i.e. water pipes) where this is true, but that is not the expected situation in this case.
To show a tree as nested sets, replace the emps with ovals, then nest subordinate ovals inside each other. The root will be the largest oval and will contain every other emp. The leaf emps will be the innermost ovals with nothing else inside them and the nesting will show the hierarchical relationship. The rgt and lft columns (I cannot use the reserved words LEFT and RIGHT in SQL) are what shows the nesting.
If that mental model does not work, then imagine a little worm crawling anti-clockwise along the tree. Every time he gets to the left or right side of a emp, he numbers it. The worm stops when he gets all the way around the tree and back to the top.
This is a natural way to model a parts explosion, since a final assembly is made of physically nested assemblies that final break down into separate parts.
At this point, the boss column is both redundant and denormalized, so it can be dropped. Also, note that the tree structure can be kept in one table and all the information about a emp can be put in a second table and they can be joined on employee number for queries.
To convert the graph into a nested sets model think of a little worm crawling along the tree. The worm starts at the top, the root, makes a complete trip around the tree. When he comes to a emp, he puts a number in the cell on the side that he is visiting and increments his counter. Each emp will get two numbers, one of the right side and one for the left. Computer Science majors will recognize this as a modified preorder tree traversal algorithm. Finally, drop the unneeded OrgChart.boss column which used to represent the edges of a graph.
This has some predictable results that we can use for building queries. The root is always (left = 1, right = 2 * (SELECT COUNT(*) FROM TreeTable)); leaf emps always have (left + 1 = right); subtrees are defined by the BETWEEN predicate; etc. Here are two common queries which can be used to build others:
1. An employee and all their Supervisors, no matter how deep the tree.
SELECT O2.* FROM OrgChart AS O1, OrgChart AS O2 WHERE O1.lft BETWEEN O2.lft AND O2.rgt AND O1.emp = :myemployee;
2. The employee and all subordinates. There is a nice symmetry here.
SELECT O1.* FROM OrgChart AS O1, OrgChart AS O2 WHERE O1.lft BETWEEN O2.lft AND O2.rgt AND O2.emp = :myemployee;
3. Add a GROUP BY and aggregate functions to these basic queries and you have hierarchical reports. For example, the total salaries which each employee controls:
SELECT O2.emp, SUM(S1.salary) FROM OrgChart AS O1, OrgChart AS O2, Salaries AS S1 WHERE O1.lft BETWEEN O2.lft AND O2.rgt AND O1.emp = S1.emp GROUP BY O2.emp;
4. To find the level of each emp, so you can print the tree as an indented listing. Technically, you should declare a cursor to go with the ORDER BY clause.
SELECT COUNT(O2.emp) AS indentation, O1.emp FROM OrgChart AS O1, OrgChart AS O2 WHERE O1.lft BETWEEN O2.lft AND O2.rgt GROUP BY O1.lft. O1.emp ORDER BY O1.lft;
5. The nested set model has an implied ordering of siblings which theadjacency list model does not. To insert a new node, G1, under part G. We can insert one node at a time like this:
BEGIN ATOMIC DECLARE rightmost_spread INTEGER;
SET rightmost_spread = (SELECT rgt FROM Frammis WHERE part = 'G'); UPDATE Frammis SET lft = CASE WHEN lft > rightmost_spread THEN lft + 2 ELSE lft END, rgt = CASE WHEN rgt >= rightmost_spread THEN rgt + 2 ELSE rgt END WHERE rgt >= rightmost_spread;
The idea is to spread the lft and rgt numbers after the youngest child of the parent, G in this case, over by two to make room for the new addition, G1. This procedure will add the new node to the rightmost child position, which helps to preserve the idea of an age order among the siblings.
6. To convert a nested sets model into an adjacency list model:
SELECT B.emp AS boss, P.emp FROM OrgChart AS P LEFT OUTER JOIN OrgChart AS B ON B.lft = (SELECT MAX(lft) FROM OrgChart AS S WHERE P.lft > S.lft AND P.lft < S.rgt);
7. To convert an adjacency list to a nested set model, use a push down stack. Here is version with a stack in SQL/PSM.
-- Tree holds the adjacency model CREATE TABLE Tree (node CHAR(10) NOT NULL, parent CHAR(10));
-- Stack starts empty, will holds the nested set model CREATE TABLE Stack (stack_top INTEGER NOT NULL, node CHAR(10) NOT NULL, lft INTEGER, rgt INTEGER);
SET counter = 2; SET max_counter = 2 * (SELECT COUNT(*) FROM Tree); SET current_top = 1;
--clear the stack DELETE FROM Stack;
-- push the root INSERT INTO Stack SELECT 1, node, 1, max_counter FROM Tree WHERE parent IS NULL;
-- delete rows from tree as they are used DELETE FROM Tree WHERE parent IS NULL;
WHILE counter <= max_counter- 1 IF EXISTS (SELECT * FROM Stack AS S1, Tree AS T1 WHERE S1.node = T1.parent AND S1.stack_top = current_top)
BEGIN -- push when top has subordinates and set lft value INSERT INTO Stack SELECT (current_top + 1), MIN(T1.node), counter, NULL FROM Stack AS S1, Tree AS T1 WHERE S1.node = T1.parent AND S1.stack_top = current_top;
-- delete rows from tree as they are used DELETE FROM Tree WHERE node = (SELECT node FROM Stack WHERE stack_top = current_top + 1); -- housekeeping of stack pointers and counter SET counter = counter + 1; SET current_top = current_top + 1; END ELSE BEGIN -- pop the stack and set rgt value UPDATE Stack SET rgt = counter, stack_top = -stack_top -- pops the stack WHERE stack_top = current_top SET counter = counter + 1; SET current_top = current_top - 1; END; END IF; -- the top column is not needed in the final answer SELECT node, lft, rgt FROM Stack; END;
I have a book on TREES & HIERARCHIES IN SQL coming out in 2003, but in the meantime, you can look at:
heavy stuff Celko. I would lie if I would pretend I fully understand Your answer. I'll let sink it in.
However, I dont store a consistent tree structure. The table at hand is more a kind of a collection of graphs where I want to find all possible paths between a given starting point and a given end point
71062.1...@compuserve.com (--CELKO--) wrote in message <news:c0d87ec0.0301281554.581797ee@posting.google.com>... > >> I've got a table called 'link_t' containing a collection of seller > - > buyer relations between two parties. <<
> That is not a real linked list, but let's ignore bad terminology. One > way to do this is with cursors, but they will take time and trend to > be proprietary.
> Anohter way is to build a tree, with the first seller as the root and > the final buyer as a leaf node.
> The usual example of a tree structure in SQL books is called an > adjacency list model and it looks like this:
> CREATE TABLE OrgChart > (emp CHAR(10) NOT NULL PRIMARY KEY, > boss CHAR(10) DEFAULT NULL REFERENCES OrgChart(emp), > salary DECIMAL(6,2) NOT NULL DEFAULT 100.00);
> Another way of representing trees is to show them as nested sets. > Since SQL is a set oriented language, this is a better model than the > usual adjacency list approach you see in most text books. Let us > define a simple OrgChart table like this, ignoring the left (lft) and > right (rgt) columns for now. This problem is always given with a > column for the employee and one for his boss in the textbooks. This > table without the lft and rgt columns is called the adjacency list > model, after the graph theory technique of the same name; the pairs of > emps are adjacent to each other.
> The organizational chart would look like this as a directed graph:
> Albert (1,12) > / \ > / \ > Bert (2,3) Chuck (4,11) > / | \ > / | \ > / | \ > / | \ > Donna (5,6) Eddie (7,8) Fred (9,10)
> The first table is denormalized in several ways. We are modeling both > the OrgChart and the organizational chart in one table. But for the > sake of saving space, pretend that the names are job titles and that > we have another table which describes the OrgChart that hold those > positions.
> Another problem with the adjacency list model is that the boss and > employee columns are the same kind of thing (i.e. names of OrgChart), > and therefore should be shown in only one column in a normalized > table. To prove that this is not normalized, assume that "Chuck" > changes his name to "Charles"; you have to change his name in both > columns and several places. The defining characteristic of a > normalized table is that you have one fact, one place, one time.
> The final problem is that the adjacency list model does not model > subordination. Authority flows downhill in a hierarchy, but If I fire > Chuck, I disconnect all of his subordinates from Albert. There are > situations (i.e. water pipes) where this is true, but that is not the > expected situation in this case.
> To show a tree as nested sets, replace the emps with ovals, then nest > subordinate ovals inside each other. The root will be the largest oval > and will contain every other emp. The leaf emps will be the innermost > ovals with nothing else inside them and the nesting will show the > hierarchical relationship. The rgt and lft columns (I cannot use the > reserved words LEFT and RIGHT in SQL) are what shows the nesting.
> If that mental model does not work, then imagine a little worm > crawling anti-clockwise along the tree. Every time he gets to the left > or right side of a emp, he numbers it. The worm stops when he gets all > the way around the tree and back to the top.
> This is a natural way to model a parts explosion, since a final > assembly is made of physically nested assemblies that final break down > into separate parts.
> At this point, the boss column is both redundant and denormalized, so > it can be dropped. Also, note that the tree structure can be kept in > one table and all the information about a emp can be put in a second > table and they can be joined on employee number for queries.
> To convert the graph into a nested sets model think of a little worm > crawling along the tree. The worm starts at the top, the root, makes a > complete trip around the tree. When he comes to a emp, he puts a > number in the cell on the side that he is visiting and increments his > counter. Each emp will get two numbers, one of the right side and one > for the left. Computer Science majors will recognize this as a > modified preorder tree traversal algorithm. Finally, drop the unneeded > OrgChart.boss column which used to represent the edges of a graph.
> This has some predictable results that we can use for building > queries. The root is always (left = 1, right = 2 * (SELECT COUNT(*) > FROM TreeTable)); leaf emps always have (left + 1 = right); subtrees > are defined by the BETWEEN predicate; etc. Here are two common queries > which can be used to build others:
> 1. An employee and all their Supervisors, no matter how deep the tree.
> SELECT O2.* > FROM OrgChart AS O1, OrgChart AS O2 > WHERE O1.lft BETWEEN O2.lft AND O2.rgt > AND O1.emp = :myemployee;
> 2. The employee and all subordinates. There is a nice symmetry here.
> SELECT O1.* > FROM OrgChart AS O1, OrgChart AS O2 > WHERE O1.lft BETWEEN O2.lft AND O2.rgt > AND O2.emp = :myemployee;
> 3. Add a GROUP BY and aggregate functions to these basic queries and > you have hierarchical reports. For example, the total salaries which > each > employee controls:
> SELECT O2.emp, SUM(S1.salary) > FROM OrgChart AS O1, OrgChart AS O2, > Salaries AS S1 > WHERE O1.lft BETWEEN O2.lft AND O2.rgt > AND O1.emp = S1.emp > GROUP BY O2.emp;
> 4. To find the level of each emp, so you can print the tree as an > indented listing. Technically, you should declare a cursor to go with > the ORDER BY clause.
> SELECT COUNT(O2.emp) AS indentation, O1.emp > FROM OrgChart AS O1, OrgChart AS O2 > WHERE O1.lft BETWEEN O2.lft AND O2.rgt > GROUP BY O1.lft. O1.emp > ORDER BY O1.lft;
> 5. The nested set model has an implied ordering of siblings which > theadjacency list model does not. To insert a new node, G1, under part > G. We can insert one node at a time like this:
> BEGIN ATOMIC > DECLARE rightmost_spread INTEGER;
> SET rightmost_spread > = (SELECT rgt > FROM Frammis > WHERE part = 'G'); > UPDATE Frammis > SET lft = CASE WHEN lft > rightmost_spread > THEN lft + 2 > ELSE lft END, > rgt = CASE WHEN rgt >= rightmost_spread > THEN rgt + 2 > ELSE rgt END > WHERE rgt >= rightmost_spread;
> The idea is to spread the lft and rgt numbers after the youngest child > of the parent, G in this case, over by two to make room for the new > addition, G1. This procedure will add the new node to the rightmost > child position, which helps to preserve the idea of an age order among > the siblings.
> 6. To convert a nested sets model into an adjacency list model:
> SELECT B.emp AS boss, P.emp > FROM OrgChart AS P > LEFT OUTER JOIN > OrgChart AS B > ON B.lft > = (SELECT MAX(lft) > FROM OrgChart AS S > WHERE P.lft > S.lft > AND P.lft < S.rgt);
> 7. To convert an adjacency list to a nested set model, use a push down > stack. Here is version with a stack in SQL/PSM.
> -- Tree holds the adjacency model > CREATE TABLE Tree > (node CHAR(10) NOT NULL, > parent CHAR(10));
> -- Stack starts empty, will holds the nested set model > CREATE TABLE Stack > (stack_top INTEGER NOT NULL, > node CHAR(10) NOT NULL, > lft INTEGER, > rgt INTEGER);
> SET counter = 2; > SET max_counter = 2 * (SELECT COUNT(*) FROM Tree); > SET current_top = 1;
> --clear the stack > DELETE FROM Stack;
> -- push the root > INSERT INTO Stack > SELECT 1, node, 1, max_counter > FROM Tree > WHERE parent IS NULL;
> -- delete rows from tree as they are used > DELETE FROM Tree WHERE parent IS NULL;
> WHILE counter <= max_counter- 1 > IF EXISTS (SELECT * > FROM Stack AS S1, Tree AS T1 > WHERE S1.node = T1.parent > AND S1.stack_top = current_top)
> BEGIN -- push when top has subordinates and set lft value > INSERT INTO Stack > SELECT (current_top + 1), MIN(T1.node), counter, NULL > FROM Stack AS S1, Tree AS T1 > WHERE S1.node = T1.parent > AND S1.stack_top = current_top;
> -- delete rows from tree as they are used > DELETE FROM Tree > WHERE node = (SELECT node > FROM Stack > WHERE stack_top = current_top + 1); > -- housekeeping of stack
>However, I dont store a consistent tree structure. The table at hand >is more a kind of a collection of graphs where I want to find all >possible paths between a given starting point and a given end point
A collection of graphs? As you presented the problem it was simply a single graph.
That's not possible in a single SQL statement without using some form of recursion such as the CONNECT BY in Oracle that was already mentioned or the recursive queries as are possible in RDB. Another "poor man's solution" could for example be to add a table Reachable(node, from_a, to_b) with 'from_a' and 'from_b' boolean field that indicate that the node is reachable from a and that b is reachable from this node. You could compute this relation by repeating a certain SQL update statement that: 1. sets from_a of node n to true if there is a node n' that is reachable from a and there is an edge from n' to n, and 2. set to_b of node n to tur if there is a node n' that leads to b and there is an edge from n to n'. You repeat that until no more flags are changed. Then you select only those edges for which the begin and node have both flags set to true.
> >However, I dont store a consistent tree structure. The table at hand > >is more a kind of a collection of graphs where I want to find all > >possible paths between a given starting point and a given end point
> A collection of graphs? As you presented the problem it was simply a single > graph.
There is no such thing as a collection of graphs. Any graph in the collection can be viewed just as a (disconnected) component of the aggregate graph.
> That's not possible in a single SQL statement without using some form of > recursion such as the CONNECT BY in Oracle that was already mentioned or the > recursive queries as are possible in RDB. Another "poor man's solution" could for > example be to add a table Reachable(node, from_a, to_b) with 'from_a' and > 'from_b' boolean field that indicate that the node is reachable from a and > that b is reachable from this node. You could compute this relation by > repeating a certain SQL update statement that: > 1. sets from_a of node n to true if there is a node n' that is reachable > from a and there is an edge from n' to n, and > 2. set to_b of node n to tur if there is a node n' that leads to b and there > is an edge from n to n'. > You repeat that until no more flags are changed. Then you select only those > edges for which the begin and node have both flags set to true.
There is not much about graph labeling in the literature! In the meantime, here is one more labeling schema for the trees:
>> The table at hand is more a kind of a collection of graphs where I
want to find all possible paths between a given starting point and a given end point. <<
For the reachabiity index of a general graph, you need Warshal's algorithm.
Let V = number of nodes in the graph Let A[i,j] be the adjacency matrix for the undirected graph
FOR j:= 1 TO V DO FOR i:= 1 TO V DO IF A[i,j] = 1 THEN FOR k := 1 TO V DO IF A[j,k]] = 1 THEN A[i,k]] := 1;
You can also do a summation to get the length of the path from i to j. You can concatenate names of the nodes into a string that gives the path, etc.
Her is a first attempt at some SQL; I am sure it can be done better
CREATE TABLE Graph (i CHAR(2) NOT NULL, j CHAR(2) NOT NULL, flag CHAR(1) NOT NULL DEFAULT 'n' CHECK (flag IN ('n', 'y')), PRIMARY KEY (i,j));
INSERT INTO Graph (i, j, flag) SELECT DISTINCT G1.i, G2.j, 'y' FROM Graph AS G1, Graph AS G1 WHERE G1.i <> G2.j AND EXISTS (SELECT * FROM Graph AS G3 WHERE G3.i = G1.j AND G3.j = G2.i) AND NOT EXISTS (SELECT * FROM Graph AS G3 WHERE (G3.i = G1.i AND G3.j = G1.j)) OR (G3.i = G2.i AND G3.j = G2.j));
You wll have to run this statement until the size of Graph does not change -- no new rows are being added.
> >> The table at hand is more a kind of a collection of graphs where I > want to find all possible paths between a given starting point and a > given end point. <<
> For the reachabiity index of a general graph, you need Warshal's > algorithm.
> Let V = number of nodes in the graph > Let A[i,j] be the adjacency matrix for the undirected graph
> FOR j:= 1 TO V > DO FOR i:= 1 TO V > DO IF A[i,j] = 1 > THEN FOR k := 1 TO V > DO IF A[j,k]] = 1 > THEN A[i,k]] := 1;
> You can also do a summation to get the length of the path from i to j. > You can concatenate names of the nodes into a string that gives the > path, etc.
> Her is a first attempt at some SQL; I am sure it can be done better
> CREATE TABLE Graph > (i CHAR(2) NOT NULL, > j CHAR(2) NOT NULL, > flag CHAR(1) NOT NULL DEFAULT 'n' > CHECK (flag IN ('n', 'y')), > PRIMARY KEY (i,j));
> INSERT INTO Graph (i, j, flag) > SELECT DISTINCT G1.i, G2.j, 'y' > FROM Graph AS G1, Graph AS G1 > WHERE G1.i <> G2.j > AND EXISTS > (SELECT * > FROM Graph AS G3 > WHERE G3.i = G1.j > AND G3.j = G2.i) > AND NOT EXISTS > (SELECT * > FROM Graph AS G3 > WHERE (G3.i = G1.i AND G3.j = G1.j)) > OR (G3.i = G2.i AND G3.j = G2.j));
> You wll have to run this statement until the size of Graph does not > change -- no new rows are being added.
first of all thanks for Your answers. They certainly gave me direction on what to learn concerning SQL.
Being a novice in SQL and after doing just a couple of days of research on that matter I got the impression that SQL hasn't yet evolved to that point to provide simple to use keywords for these kind of problems. Oracle 9i seems to have been progressed over 8i in this regard. One can use subqueries with the *with as* clause, one can use the *connect by* statement on views and joined statements and there may be a couple of other features... However I am restricted to 8i and java and the requirement not to use pl/sql for the sake of a common code base.
Currently I figure to use
select cs, cb, level from link_t start with cb in ( select cb from link_t where ´cs = 'A') connect by prior cb = cs
that will return a tree containing all paths with the root as the common starting point. I would load this structure into a middle teer and apply some sort of commonly used tree traversial algorithm to it to search for nodes and endpoints of interest.
71062.1...@compuserve.com (--CELKO--) wrote in message <news:c0d87ec0.0301291315.7ae946eb@posting.google.com>... > >> The table at hand is more a kind of a collection of graphs where I > want to find all possible paths between a given starting point and a > given end point. <<
> For the reachabiity index of a general graph, you need Warshal's > algorithm.
> Let V = number of nodes in the graph > Let A[i,j] be the adjacency matrix for the undirected graph
> FOR j:= 1 TO V > DO FOR i:= 1 TO V > DO IF A[i,j] = 1 > THEN FOR k := 1 TO V > DO IF A[j,k]] = 1 > THEN A[i,k]] := 1;
> You can also do a summation to get the length of the path from i to j. > You can concatenate names of the nodes into a string that gives the > path, etc.
> Her is a first attempt at some SQL; I am sure it can be done better
> CREATE TABLE Graph > (i CHAR(2) NOT NULL, > j CHAR(2) NOT NULL, > flag CHAR(1) NOT NULL DEFAULT 'n' > CHECK (flag IN ('n', 'y')), > PRIMARY KEY (i,j));
> INSERT INTO Graph (i, j, flag) > SELECT DISTINCT G1.i, G2.j, 'y' > FROM Graph AS G1, Graph AS G1 > WHERE G1.i <> G2.j > AND EXISTS > (SELECT * > FROM Graph AS G3 > WHERE G3.i = G1.j > AND G3.j = G2.i) > AND NOT EXISTS > (SELECT * > FROM Graph AS G3 > WHERE (G3.i = G1.i AND G3.j = G1.j)) > OR (G3.i = G2.i AND G3.j = G2.j));
> You wll have to run this statement until the size of Graph does not > change -- no new rows are being added.
>> Being a novice in SQL and after doing just a couple of days of
research on that matter ... <<
Another boss who will not pay for training :0!!
>> I got the impression that SQL hasn't yet evolved to that point to
provide simple to use keywords for these kind of problems. <<
SQL is a set oriented language, not a procedural language. You have to learn to model data in terms of sets and not sequences to use it.
>> Oracle 9i seems to have been progressed over 8i in this regard. <<
Actually, Oracle is a horrible product and their extensions are flaws that lock the code into a particular underlying sequential physical implementation with sorting and cursors, no parallelism, etc.
>> Currently I figure to use
SELECT cs, cb, level FROM link_t START WITH cb IN (SELECT cb FROM link_t WHERE cs = 'A') CONNECT BY PRIOR cb = cs;
that will return a tree containing all paths with the root as the common starting point. <<
It returns a sequential file that represents a table and that represernation depends on the order of the records (they are no longer rows in a table).
>> I would load this structure into a middle tier and apply some sort
of commonly used tree traversial algorithm to it to search for nodes and endpoints of interest. <<
You can do all of those searches in the Nested Sets model with a single query, no middle ware and they will run 10 to 100 times faster for large trees.
I have a book on trees in SQL that will be published later this year that you might want to buy.
--CELKO-- wrote: > >> Being a novice in SQL and after doing just a couple of days of > research on that matter ... <<
> Another boss who will not pay for training :0!!
> >> I got the impression that SQL hasn't yet evolved to that point to > provide simple to use keywords for these kind of problems. <<
> SQL is a set oriented language, not a procedural language. You have > to learn to model data in terms of sets and not sequences to use it.
> >> Oracle 9i seems to have been progressed over 8i in this regard. <<
> Actually, Oracle is a horrible product and their extensions are flaws > that lock the code into a particular underlying sequential physical > implementation with sorting and cursors, no parallelism, etc.
> >> Currently I figure to use
> SELECT cs, cb, level > FROM link_t > START WITH cb > IN (SELECT cb > FROM link_t > WHERE cs = 'A') > CONNECT BY PRIOR cb = cs;
> that will return a tree containing all paths with the root as the > common starting point. <<
> It returns a sequential file that represents a table and that > represernation depends on the order of the records (they are no > longer rows in a table).
> >> I would load this structure into a middle tier and apply some sort > of commonly used tree traversial algorithm to it to search for nodes > and endpoints of interest. <<
> You can do all of those searches in the Nested Sets model with a > single query, no middle ware and they will run 10 to 100 times faster > for large trees.
> I have a book on trees in SQL that will be published later this year > that you might want to buy.
I'll not take issue with you with respect to your field of expertise. But to say Oracle is a horrible product presumes two things: first that you are being objective with respect to Oracle in comparison with other products, and second that you have sufficient expertise with the product from which to render that opinion.
If I am incorrect I will gladly stand corrected. But based on my knowledge of what you have been doing for the last few years ... I have my doubts about both. I have worked with databases from Teradata to Oracle to DB2 and most of the rest of the field. And I see nothing in Oracle that isn't at least the equal to the other major RDBMS implementations including the one in which you do your work.
> SELECT cs, cb, level > FROM link_t > START WITH cb > IN (SELECT cb > FROM link_t > WHERE cs = 'A') > CONNECT BY PRIOR cb = cs;
> that will return a tree containing all paths with the root as the > common starting point. <<
> It returns a sequential file that represents a table and that > represernation depends on the order of the records (they are no > longer rows in a table).
Looks like we won't see a "connect-by" chapter in your book;-)
> > SELECT cs, cb, level > > FROM link_t > > START WITH cb > > IN (SELECT cb > > FROM link_t > > WHERE cs = 'A') > > CONNECT BY PRIOR cb = cs;
> > that will return a tree containing all paths with the root as the > > common starting point. <<
> > It returns a sequential file that represents a table and that > > represernation depends on the order of the records (they are no > > longer rows in a table).
> Looks like we won't see a "connect-by" chapter in your book;-)
>> Looks like we won't see a "connect-by" chapter in your book;-) <<
I do mention it -- as the work of Satan :)
>> I would mention that there is a striking syntactic and semantic
similarity between "connect-by" and "group-by"... <<
Syntax, maybe if you count using the word "BY" in the construct. But they are nothing alike.
Here is how a SELECT works in SQL ... at least in theory. Real products will optimize things when they can.
a) Start in the FROM clause and build a working table from all of the joins, unions, intersections, and whatever other table constructors are there. The table expression> AS <correlation name> option allows you give a name to this working table which you then have to use for the rest of the containing query.
b) Go to the WHERE clause and remove rows that do not pass criteria; that is, that do not test to TRUE (reject UNKNOWN and FALSE). The WHERE clause is applied to the working in the FROM clause.
c) Go to the optional GROUP BY clause, make groups and reduce each group to a single row, replacing the original working table with the new grouped table. The rows of a grouped table must be group characteristics: (1) a grouping column (2) a statistic about the group (i.e. aggregate functions) (3) a function or (4) an expression made up of the those three items.
d) Go to the optional HAVING clause and apply it against the grouped working table; if there was no GROUP BY clause, treat the entire table as one group.
e) Go to the SELECT clause and construct the expressions in the list. This means that the scalar subqueries, function calls and expressions in the SELECT are done after all the other clauses are done. The AS operator can give a name to expressions in the SELECT list, too. These new names come into existence all at once, but after the WHERE clause, GROUP BY clause and HAVING clause has been executed; you cannot use them in the SELECT list or the WHERE clause for that reason.
If there is a SELECT DISTINCT, then redundant duplicate rows are removed. For purposes of defining a duplicate row, NULLs are treated as matching (just like in the GROUP BY).
f) Nested query expressions follow the usual scoping rules you would expect from a block structured language like C, Pascal, Algol, etc. Namely, the innermost queries can reference columns and tables in the queries in which they are contained.
The CONNECT BY is a rule for a cursor to traverse a chain.
God, forbid that someone use something that makes their life easier. Those of use that are in the trenches use what tools we have to accomplish our tasks and ignore academician's purity arguments.
"--CELKO--" <71062.1...@compuserve.com> wrote in message
--CELKO-- wrote: > >> Looks like we won't see a "connect-by" chapter in your book;-) <<
> I do mention it -- as the work of Satan :)
> >> I would mention that there is a striking syntactic and semantic > similarity between "connect-by" and "group-by"... <<
> Syntax, maybe if you count using the word "BY" in the construct. But > they are nothing alike.
> Here is how a SELECT works in SQL ... at least in theory. Real > products will optimize things when they can.
> a) Start in the FROM clause and build a working table from all of the > joins, unions, intersections, and whatever other table constructors > are there. The table expression> AS <correlation name> option allows > you give a name to this working table which you then have to use for > the rest of the containing query.
> b) Go to the WHERE clause and remove rows that do not pass criteria; > that is, that do not test to TRUE (reject UNKNOWN and FALSE). The > WHERE clause is applied to the working in the FROM clause.
> c) Go to the optional GROUP BY clause, make groups and reduce each > group to a single row, replacing the original working table with the > new grouped table. The rows of a grouped table must be group > characteristics: (1) a grouping column (2) a statistic about the group > (i.e. aggregate functions) (3) a function or (4) an expression made up > of the those three items.
> d) Go to the optional HAVING clause and apply it against the grouped > working table; if there was no GROUP BY clause, treat the entire table > as one group.
> e) Go to the SELECT clause and construct the expressions in the list. > This means that the scalar subqueries, function calls and expressions > in the SELECT are done after all the other clauses are done. The AS > operator can give a name to expressions in the SELECT list, too. > These new names come into existence all at once, but after the WHERE > clause, GROUP BY clause and HAVING clause has been executed; you > cannot use them in the SELECT list or the WHERE clause for that > reason.
> If there is a SELECT DISTINCT, then redundant duplicate rows are > removed. For purposes of defining a duplicate row, NULLs are treated > as matching (just like in the GROUP BY).
> f) Nested query expressions follow the usual scoping rules you would > expect from a block structured language like C, Pascal, Algol, etc. > Namely, the innermost queries can reference columns and tables in the > queries in which they are contained.
> The CONNECT BY is a rule for a cursor to traverse a chain.
So because you don't happen to like something it makes the product 'horrible'? Not ... there are certain features or facilities in the product I don't use and why ... just a blanket thumb down? I'm not impressed. I can come up with a list of poor features and poorly implemented features in any product from assembly language on up. Does that mean we should we go back to 3x5 cards?
When you are ready to demonstrate a superior implementation in another SQL RDBMS please let us know. And I don't mean superior in terms of some theoretical concept like building a database in fifth normal form. I mean superior in terms of performance and scalability in the real world.
> Actually, Oracle is a horrible product and their extensions are flaws > that lock the code into a particular underlying sequential physical > implementation with sorting and cursors, no parallelism, etc.
Pretty strong statement. Care to expound and provide examples of the better ones? -- Galen deForest Boyer Sweet dreams and flying machines in pieces on the ground.
Is not that my boss wouldn't pay for training. It is just that I can't take an endless steep learning curve. It is not just SQL I have to learn to get my job done, there other aereas as well.
> >> I got the impression that SQL hasn't yet evolved to that point to > provide simple to use keywords for these kind of problems. <<
> SQL is a set oriented language, not a procedural language. You have > to learn to model data in terms of sets and not sequences to use it.
What I realized, it is a totally different animal and it is amazing what one can achieve with statements just a few lines long. I will teach myself up on that stuff. But it'll take time to sink in.
Cheers
Juergen
P.S. CELKO, I learnd about Your good reputation in the SQL community. Keep care to not destroy it by cheaply bashing trademarks without giving a clean argumentation.
> P.S. CELKO, I learnd about Your good reputation in the SQL community. > Keep care to not destroy it by cheaply bashing trademarks without > giving a clean argumentation.
Well said. That's what I was thinking when I made the decision to challenge his statement.
>> When you are ready to demonstrate a superior implementation in
another SQL RDBMS please let us know. And I don't mean superior in terms of some theoretical concept like building a database in fifth normal form. I mean superior in terms of performance and scalability in the real world. <<
Funny you should ask :)!! I have a book coming out later this year on TREES & HIERARCHIES IN SQL (Morgan-Kaufmann), which gives several different models in Standard SQL-92 for tree structures which are superior in terms of performance and scalability compared to the CONNECT BY.
A few years back, we ran a test on a huge tree with the following method versus cursors and CONNECT BY; it was 20 to 100 times faster for 75,000 nodes and 12 levels deep. It could also do things like compare the structures of sub-trees in one SELECT statement.
The usual example of a tree structure in SQL books is called an adjacency list model and it looks like this:
CREATE TABLE OrgChart (emp CHAR(10) NOT NULL PRIMARY KEY, boss CHAR(10) DEFAULT NULL REFERENCES OrgChart(emp), salary DECIMAL(6,2) NOT NULL DEFAULT 100.00);
Another way of representing trees is to show them as nested sets. Since SQL is a set oriented language, this is a better model than the usual adjacency list approach you see in most text books. Let us define a simple OrgChart table like this, ignoring the left (lft) and right (rgt) columns for now. This problem is always given with a column for the employee and one for his boss in the textbooks. This table without the lft and rgt columns is called the adjacency list model, after the graph theory technique of the same name; the pairs of emps are adjacent to each other.
The organizational chart would look like this as a directed graph:
Albert (1,12) / \ / \ Bert (2,3) Chuck (4,11) / | \ / | \ / | \ / | \ Donna (5,6) Eddie (7,8) Fred (9,10)
The first table is denormalized in several ways. We are modeling both the OrgChart and the organizational chart in one table. But for the sake of saving space, pretend that the names are job titles and that we have another table which describes the OrgChart that hold those positions.
Another problem with the adjacency list model is that the boss and employee columns are the same kind of thing (i.e. names of OrgChart), and therefore should be shown in only one column in a normalized table. To prove that this is not normalized, assume that "Chuck" changes his name to "Charles"; you have to change his name in both columns and several places. The defining characteristic of a normalized table is that you have one fact, one place, one time.
The final problem is that the adjacency list model does not model subordination. Authority flows downhill in a hierarchy, but If I fire Chuck, I disconnect all of his subordinates from Albert. There are situations (i.e. water pipes) where this is true, but that is not the expected situation in this case.
To show a tree as nested sets, replace the emps with ovals, then nest subordinate ovals inside each other. The root will be the largest oval and will contain every other emp. The leaf emps will be the innermost ovals with nothing else inside them and the nesting will show the hierarchical relationship. The rgt and lft columns (I cannot use the reserved words LEFT and RIGHT in SQL) are what shows the nesting.
If that mental model does not work, then imagine a little worm crawling anti-clockwise along the tree. Every time he gets to the left or right side of a emp, he numbers it. The worm stops when he gets all the way around the tree and back to the top.
This is a natural way to model a parts explosion, since a final assembly is made of physically nested assemblies that final break down into separate parts.
At this point, the boss column is both redundant and denormalized, so it can be dropped. Also, note that the tree structure can be kept in one table and all the information about a emp can be put in a second table and they can be joined on employee number for queries.
To convert the graph into a nested sets model think of a little worm crawling along the tree. The worm starts at the top, the root, makes a complete trip around the tree. When he comes to a emp, he puts a number in the cell on the side that he is visiting and increments his counter. Each emp will get two numbers, one of the right side and one for the left. Computer Science majors will recognize this as a modified preorder tree traversal algorithm. Finally, drop the unneeded OrgChart.boss column which used to represent the edges of a graph.
This has some predictable results that we can use for building queries. The root is always (left = 1, right = 2 * (SELECT COUNT(*) FROM TreeTable)); leaf emps always have (left + 1 = right); subtrees are defined by the BETWEEN predicate; etc. Here are two common queries which can be used to build others:
1. An employee and all their Supervisors, no matter how deep the tree.
SELECT O2.* FROM OrgChart AS O1, OrgChart AS O2 WHERE O1.lft BETWEEN O2.lft AND O2.rgt AND O1.emp = :myemployee;
2. The employee and all subordinates. There is a nice symmetry here.
SELECT O1.* FROM OrgChart AS O1, OrgChart AS O2 WHERE O1.lft BETWEEN O2.lft AND O2.rgt AND O2.emp = :myemployee;
3. Add a GROUP BY and aggregate functions to these basic queries and you have hierarchical reports. For example, the total salaries which each employee controls:
SELECT O2.emp, SUM(S1.salary) FROM OrgChart AS O1, OrgChart AS O2, Salaries AS S1 WHERE O1.lft BETWEEN O2.lft AND O2.rgt AND O1.emp = S1.emp GROUP BY O2.emp;
4. To find the level of each emp, so you can print the tree as an indented listing. Technically, you should declare a cursor to go with the ORDER BY clause.
SELECT COUNT(O2.emp) AS indentation, O1.emp FROM OrgChart AS O1, OrgChart AS O2 WHERE O1.lft BETWEEN O2.lft AND O2.rgt GROUP BY O1.lft. O1.emp ORDER BY O1.lft;
5. The nested set model has an implied ordering of siblings which theadjacency list model does not. To insert a new node, G1, under part G. We can insert one node at a time like this:
BEGIN ATOMIC DECLARE rightmost_spread INTEGER;
SET rightmost_spread = (SELECT rgt FROM Frammis WHERE part = 'G'); UPDATE Frammis SET lft = CASE WHEN lft > rightmost_spread THEN lft + 2 ELSE lft END, rgt = CASE WHEN rgt >= rightmost_spread THEN rgt + 2 ELSE rgt END WHERE rgt >= rightmost_spread;
The idea is to spread the lft and rgt numbers after the youngest child of the parent, G in this case, over by two to make room for the new addition, G1. This procedure will add the new node to the rightmost child position, which helps to preserve the idea of an age order among the siblings.
6. To convert a nested sets model into an adjacency list model:
SELECT B.emp AS boss, P.emp FROM OrgChart AS P LEFT OUTER JOIN OrgChart AS B ON B.lft = (SELECT MAX(lft) FROM OrgChart AS S WHERE P.lft > S.lft AND P.lft < S.rgt);
7. To convert an adjacency list to a nested set model, use a push down stack. Here is version with a stack in SQL/PSM.
-- Tree holds the adjacency model CREATE TABLE Tree (node CHAR(10) NOT NULL, parent CHAR(10));
-- Stack starts empty, will holds the nested set model CREATE TABLE Stack (stack_top INTEGER NOT NULL, node CHAR(10) NOT NULL, lft INTEGER, rgt INTEGER);
SET counter = 2; SET max_counter = 2 * (SELECT COUNT(*) FROM Tree); SET current_top = 1;
--clear the stack DELETE FROM Stack;
-- push the root INSERT INTO Stack SELECT 1, node, 1, max_counter FROM Tree WHERE parent IS NULL;
-- delete rows from tree as they are used DELETE FROM Tree WHERE parent IS NULL;
WHILE counter <= max_counter- 1 IF EXISTS (SELECT * FROM Stack AS S1, Tree AS T1 WHERE S1.node = T1.parent AND S1.stack_top = current_top)
BEGIN -- push when top has subordinates and set lft value INSERT INTO Stack SELECT (current_top + 1), MIN(T1.node), counter, NULL FROM Stack AS S1, Tree AS T1 WHERE S1.node = T1.parent AND S1.stack_top = current_top;
-- delete rows from tree as they are used DELETE FROM Tree WHERE node = (SELECT node FROM Stack WHERE stack_top = current_top + 1); -- housekeeping of stack pointers and counter SET counter = counter + 1; SET current_top = current_top + 1; END ELSE BEGIN -- pop the stack and set rgt value UPDATE Stack SET rgt = counter, stack_top = -stack_top -- pops the stack WHERE stack_top = current_top SET counter = counter + 1; SET current_top = current_top - 1; END; END IF; -- the top column is not needed in the final answer SELECT node, lft, rgt FROM Stack; END;
...
>> Is not that my boss wouldn't pay for training. It is just that I
can't take an endless steep learning curve. It is not just SQL I have to learn to get my job done, there other areas as well. <<
I know I am preaching to church chorus, but there is too much of this happening right now. Nobody can be an expert in a dozen different languages, environments, etc. and do a good job. There are enough unemployed geeks on the market that companies ought to be able to rent the expertise they need if they are not willing to take the time to do it right. You are already know this and I am jut ranting ...
>> What I realized, it is a totally different animal and it is amazing
what one can achieve with statements just a few lines long. I will teach myself up on that stuff. But it'll take time to sink in. <<
I estimate it takes about one year of full time SQL programming to "un-learn" procedural thinking and then some more time after that to get to be a good SQL programmer. The people who have the least problems are LISP programmers -- they are used to nesting functions on a data structure to get a result.
>> Keep care to not destroy it by cheaply bashing trademarks without
--CELKO-- wrote: > >> When you are ready to demonstrate a superior implementation in > another SQL > RDBMS please let us know. And I don't mean superior in terms of some > theoretical concept like building a database in fifth normal form. I > mean superior in terms of performance and scalability in the real > world. <<
> Funny you should ask :)!! I have a book coming out later this year on > TREES & HIERARCHIES IN SQL (Morgan-Kaufmann), which gives several > different models in Standard SQL-92 for tree structures which are > superior in terms of performance and scalability compared to the > CONNECT BY.
> A few years back, we ran a test on a huge tree with the following > method versus cursors and CONNECT BY; it was 20 to 100 times faster > for 75,000 nodes and 12 levels deep. It could also do things like > compare the structures of sub-trees in one SELECT statement.
> The usual example of a tree structure in SQL books is called an > adjacency list model and it looks like this:
> CREATE TABLE OrgChart > (emp CHAR(10) NOT NULL PRIMARY KEY, > boss CHAR(10) DEFAULT NULL REFERENCES OrgChart(emp), > salary DECIMAL(6,2) NOT NULL DEFAULT 100.00);
> Another way of representing trees is to show them as nested sets. > Since SQL is a set oriented language, this is a better model than the > usual adjacency list approach you see in most text books. Let us > define a simple OrgChart table like this, ignoring the left (lft) and > right (rgt) columns for now. This problem is always given with a > column for the employee and one for his boss in the textbooks. This > table without the lft and rgt columns is called the adjacency list > model, after the graph theory technique of the same name; the pairs of > emps are adjacent to each other.
> The organizational chart would look like this as a directed graph:
> Albert (1,12) > / \ > / \ > Bert (2,3) Chuck (4,11) > / | \ > / | \ > / | \ > / | \ > Donna (5,6) Eddie (7,8) Fred (9,10)
> The first table is denormalized in several ways. We are modeling both > the OrgChart and the organizational chart in one table. But for the > sake of saving space, pretend that the names are job titles and that > we have another table which describes the OrgChart that hold those > positions.
> Another problem with the adjacency list model is that the boss and > employee columns are the same kind of thing (i.e. names of OrgChart), > and therefore should be shown in only one column in a normalized > table. To prove that this is not normalized, assume that "Chuck" > changes his name to "Charles"; you have to change his name in both > columns and several places. The defining characteristic of a > normalized table is that you have one fact, one place, one time.
> The final problem is that the adjacency list model does not model > subordination. Authority flows downhill in a hierarchy, but If I fire > Chuck, I disconnect all of his subordinates from Albert. There are > situations (i.e. water pipes) where this is true, but that is not the > expected situation in this case.
> To show a tree as nested sets, replace the emps with ovals, then nest > subordinate ovals inside each other. The root will be the largest oval > and will contain every other emp. The leaf emps will be the innermost > ovals with nothing else inside them and the nesting will show the > hierarchical relationship. The rgt and lft columns (I cannot use the > reserved words LEFT and RIGHT in SQL) are what shows the nesting.
> If that mental model does not work, then imagine a little worm > crawling anti-clockwise along the tree. Every time he gets to the left > or right side of a emp, he numbers it. The worm stops when he gets all > the way around the tree and back to the top.
> This is a natural way to model a parts explosion, since a final > assembly is made of physically nested assemblies that final break down > into separate parts.
> At this point, the boss column is both redundant and denormalized, so > it can be dropped. Also, note that the tree structure can be kept in > one table and all the information about a emp can be put in a second > table and they can be joined on employee number for queries.
> To convert the graph into a nested sets model think of a little worm > crawling along the tree. The worm starts at the top, the root, makes a > complete trip around the tree. When he comes to a emp, he puts a > number in the cell on the side that he is visiting and increments his > counter. Each emp will get two numbers, one of the right side and one > for the left. Computer Science majors will recognize this as a > modified preorder tree traversal algorithm. Finally, drop the unneeded > OrgChart.boss column which used to represent the edges of a graph.
> This has some predictable results that we can use for building > queries. The root is always (left = 1, right = 2 * (SELECT COUNT(*) > FROM TreeTable)); leaf emps always have (left + 1 = right); subtrees > are defined by the BETWEEN predicate; etc. Here are two common queries > which can be used to build others:
> 1. An employee and all their Supervisors, no matter how deep the tree.
> SELECT O2.* > FROM OrgChart AS O1, OrgChart AS O2 > WHERE O1.lft BETWEEN O2.lft AND O2.rgt > AND O1.emp = :myemployee;
> 2. The employee and all subordinates. There is a nice symmetry here.
> SELECT O1.* > FROM OrgChart AS O1, OrgChart AS O2 > WHERE O1.lft BETWEEN O2.lft AND O2.rgt > AND O2.emp = :myemployee;
> 3. Add a GROUP BY and aggregate functions to these basic queries and > you have hierarchical reports. For example, the total salaries which > each > employee controls:
> SELECT O2.emp, SUM(S1.salary) > FROM OrgChart AS O1, OrgChart AS O2, > Salaries AS S1 > WHERE O1.lft BETWEEN O2.lft AND O2.rgt > AND O1.emp = S1.emp > GROUP BY O2.emp;
> 4. To find the level of each emp, so you can print the tree as an > indented listing. Technically, you should declare a cursor to go with > the ORDER BY clause.
> SELECT COUNT(O2.emp) AS indentation, O1.emp > FROM OrgChart AS O1, OrgChart AS O2 > WHERE O1.lft BETWEEN O2.lft AND O2.rgt > GROUP BY O1.lft. O1.emp > ORDER BY O1.lft;
> 5. The nested set model has an implied ordering of siblings which > theadjacency list model does not. To insert a new node, G1, under part > G. We can insert one node at a time like this:
> BEGIN ATOMIC > DECLARE rightmost_spread INTEGER;
> SET rightmost_spread > = (SELECT rgt > FROM Frammis > WHERE part = 'G'); > UPDATE Frammis > SET lft = CASE WHEN lft > rightmost_spread > THEN lft + 2 > ELSE lft END, > rgt = CASE WHEN rgt >= rightmost_spread > THEN rgt + 2 > ELSE rgt END > WHERE rgt >= rightmost_spread;
> The idea is to spread the lft and rgt numbers after the youngest child > of the parent, G in this case, over by two to make room for the new > addition, G1. This procedure will add the new node to the rightmost > child position, which helps to preserve the idea of an age order among > the siblings.
> 6. To convert a nested sets model into an adjacency list model:
> SELECT B.emp AS boss, P.emp > FROM OrgChart AS P > LEFT OUTER JOIN > OrgChart AS B > ON B.lft > = (SELECT MAX(lft) > FROM OrgChart AS S > WHERE P.lft > S.lft > AND P.lft < S.rgt);
> 7. To convert an adjacency list to a nested set model, use a push down > stack. Here is version with a stack in SQL/PSM.
> -- Tree holds the adjacency model > CREATE TABLE Tree > (node CHAR(10) NOT NULL, > parent CHAR(10));
> -- Stack starts empty, will holds the nested set model > CREATE TABLE Stack > (stack_top INTEGER NOT NULL, > node CHAR(10) NOT NULL, > lft INTEGER, > rgt INTEGER);
> SET counter = 2; > SET max_counter = 2 * (SELECT COUNT(*) FROM Tree); > SET current_top = 1;
> --clear the stack > DELETE FROM Stack;
> -- push the root > INSERT INTO Stack > SELECT 1, node, 1, max_counter > FROM Tree > WHERE parent IS NULL;
> -- delete rows from tree as they are used > DELETE FROM Tree WHERE parent IS NULL;
> WHILE counter <= max_counter- 1 > IF EXISTS (SELECT * > FROM Stack AS S1, Tree AS T1 > WHERE S1.node = T1.parent > AND S1.stack_top = current_top)
> BEGIN -- push when top has subordinates and set lft value > INSERT INTO Stack > SELECT (current_top + 1), MIN(T1.node), counter, NULL > FROM Stack AS S1, Tree AS T1 > WHERE S1.node = T1.parent > AND S1.stack_top = current_top;
> -- delete rows from tree as they are used > DELETE FROM Tree > WHERE node = (SELECT node > FROM Stack > WHERE stack_top = current_top + 1); > -- housekeeping of stack pointers and
>> But unless you are going to state that the above can NOT be done in
Oracle ... the fact that CONNECT BY exists is not license to call the product, as you did, 'horrible'. <<
How about if I say the code to do tree manipulation with CONNECT BY is orders of magnitude slower, not Standard, not portable and quickly becomes a nightmare of nested subqueries to maintain? Once more, let me demonstrate my points with actual code. (If that dos not work, what will convince you that I have a valid position?)
The query "Show all subcomponents of part A1, including the substructure" can be handled by the following Oracle SQL statement:
SELECT LEVEL AS pathlength, assemblyno, subassemblyno FROM Blueprint START WITH assemblyno = 'A1' CONNECT BY PRIOR subassemblyno = assemblyno;
The CONNECT BY ... PRIOR clause provides traversal but not support for recursive aggregate functions. For example, it is not possible to sum the weights of all subcomponents of part A1 to find the weight of A1. The only recursive function supported by the CONNECT BY ... PRIOR clause is the LEVEL function.
Another limitation of the CONNECT BY ... PRIOR clause is that it does not permit the use of joins. The reason for disallowing joins is that the order in which the rows are returned in the result is important. The parent nodes appear before their children, so you know that if the pathlength increases, these are children; if it does not, they are new nodes at a higher level.
This also means that an ORDER BY can destroy any meaning in the results. This means, moreover, that the CONNECT BY ... PRIOR result is not a true table, since a table by definition does not have an internal ordering. In addition, this means that it is not always possible to use the result of a CONNECT BY query in another query.
A trick for working around this limitation, which makes indirect use of the CONNECT BY ... PRIOR clause, is to hide it in a subquery that is used to make a JOIN at the higher level. For example, to attach a product category description, form another table to the parts explosion.
SELECT part_nbr, category_name FROM Parts, ProductCategories WHERE Parts.category_id = ProductCategories.category_id AND part_nbr IN (SELECT subassemblyno FROM Blueprint START WITH assemblyno = 'A1' CONNECT BY PRIOR subassemblyno = assemblyno);
The subquery has only one table in the FROM clause and complies with the restriction that there must be no joins in any query block that contains a CONNECT BY ... PRIOR. On the other hand, the main query involves a JOIN of two tables, which would not be possible with direct use of the CONNECT BY ... PRIOR clause. Another query that cannot be processed by direct use of the CONNECT BY ... PRIOR clause is one that displays all parent-child relationships at all levels. A technique to process this query is illustrated by the following SQL:
SELECT DISTINCT PX.part_nbr, PX.pname, PY.part_nbr, PY.pname FROM Parts AS PX, Parts AS PY WHERE PY.part_nbr IN (SELECT Blueprint.subassemblyno FROM Blueprint START WITH assemblyno = PX.part_nbr CONNECT BY PRIOR subassemblyno = assemblyno) ORDER BY PX.part_nbr, PY.part_nbr;
Again, the outer query includes a JOIN, which is not allowed with the CONNECT BY ... PRIOR clause. Note that the correlated subquery references PX.part_nbr. As you try to do more, the subquery nesting gets worse.
The basic problem is that CONNECT BY is a cursor (procedural code) and SQL is set-oriented. They don't work together very well
>> So are you complaining because there is a less superior method
available for those not as technically proficient as you, are you complaining because that less superior implementation doesn't exist in Transact SQL, or are you complaining just because you don't personally work in Oracle? <<
Why do you think I like T-SQL? The T-SQL newsgroups have more SQL programming problems than other product newsgroups, which tend to focus on particulars of their product. While SQL Server is getting better (so is Oracle)and I hope Yukon cleasns up some of the old "Sybase Code Museum" problems, but it still has some major problems in the basic underlying model.
I am trying to figure out "those not as technically proficient as me"?? I think that anyone can understand the nested sets or nested intervals model if they understand HTML or XML or work in a block structured language -- it is the same concept. It is all just "parentheses in disguise", not rocket science. I have not lost an audience when I teach it and several times programmers who were introduced to the technique for the first time came up with some stuff I had not thought of before -- they started drawing circles inside circles (i.e. set diagrams) instead of "boxes and arrows" (i.e. sequential travesals) and looked at their problems differently.
> >> But unless you are going to state that the above can NOT be done in > Oracle ... the fact that CONNECT BY exists is not license to call the > product, as you did, 'horrible'. <<
<blah blah snipped>
Hi Celko
You made the comment that "Oracle is a horrible product" and argued the point (rather verbosely) that such an assessment was due to your dislike of the connect by clause.
The fact there (may) be better alternatives than the connect by clause with Oracle hardly constitutes Oracle being a horrible product.
I mean, there is a tad more to Oracle than connect by ........
Hope the cover to your new book is fresh and exciting lest people think it a horrible book ;)