
Recursive Delete in Stored Proc


ToeKnee

Feb 7, 2004, 6:43:45 PM
Sorry if this has been done before; I'm new to SQL and can't seem to
get this to work. The basic idea is I have Table1 that has a
hierarchical structure, and a second table that holds information
about the nodes in the hierarchy. I want to be able to delete from
anywhere in the hierarchy (through a recursive stored proc?) in Table1
and have all children removed, along with the values in Table2 that
reference that node or its children. Here is an example of what the
tables look like (simplified version):

Table1:

NodeId ParentID
1 0
2 1
3 1
4 1
5 2
6 2

Table2:

NodeId Value
1 blah
2 blah
2 blah2
3 blah
4 blah

A couple of notes:
I'm not concerned about nesting more than 32 times in recursion.
I'm not concerned with the best performing solution, only the
simplest.

Could someone give me an example? I've seen suggestions, but I've
been unable to find any examples...

Thanks in advance,
Tony

Andrew John

Feb 7, 2004, 8:39:27 PM
ToeKnee,

As you seem to be aware of the limitations of your design, I won't belabour
the point. As always, supplying DDL would have made things clearer.

So for your hierarchical structure:

create table Hierachy
(
NodeID int identity not NULL primary key,
ParentID int NULL -- Prefer that to using 0
)

Create table Data
(
NodeID int foreign key references Hierachy on delete cascade on update cascade,
Value varchar(200)
)

insert Hierachy ( ParentID ) values ( 0 )
insert Hierachy ( ParentID ) values ( 1 )
insert Hierachy ( ParentID ) values ( 1 )
insert Hierachy ( ParentID ) values ( 1 )
insert Hierachy ( ParentID ) values ( 2 )
insert Hierachy ( ParentID ) values ( 2 )

insert Data values ( 1, 'blah' )
insert Data values ( 2, 'blah' )
insert Data values ( 2, 'blah2' )
insert Data values ( 3, 'blah' )
insert Data values ( 4, 'blah' )
insert Data values ( 5, 'blah5' )
insert Data values ( 6, 'blah6' )

go
create procedure HierachyDelete ( @NodeID int )
as

declare @Nodes table
(
NodeID int
)

insert @Nodes
values ( @NodeID )

declare @Rows int
set @Rows = 1

While @Rows <> 0
Begin

insert @Nodes
select NodeID
from Hierachy
where ParentID in ( select NodeID from @Nodes )
and not NodeID in ( select NodeID from @Nodes )
-- Could do this with an inner join and an outer join, but this is more concise.

set @Rows = @@rowcount
End

Delete Hierachy
where NodeID in ( select NodeID from @Nodes )

return (0)
go

select *
from Hierachy h
inner join Data d
on d.NodeID = h.NodeID

exec HierachyDelete 2

select *
from Hierachy h
inner join Data d
on d.NodeID = h.NodeID


Note: This relies on delete cascading. If you don't want to do that,
then simply add another delete of Data before the delete of Hierachy.

Regards
AJ

p.s. Recursion is NOT an elegant solution. Consider parent/child loops for example.
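The set-based loop in HierachyDelete above can be sketched outside T-SQL
as well. This is a hypothetical Python rendering of the same fixpoint
idea, with the example rows hard-coded (collect_subtree and edges are
illustrative names, not part of the original proc):

```python
# Sketch of the set-based loop: repeatedly add children of
# already-collected nodes until no new rows appear (@Rows = 0),
# then the whole collected set would be deleted in one statement.

def collect_subtree(edges, root):
    """edges: list of (node_id, parent_id); returns all ids in root's subtree."""
    nodes = {root}
    while True:
        # children of collected nodes that are not yet collected
        new = {n for (n, p) in edges if p in nodes and n not in nodes}
        if not new:            # fixpoint reached, like @Rows = 0
            break
        nodes |= new
    return nodes

# the example Hierachy rows
edges = [(1, 0), (2, 1), (3, 1), (4, 1), (5, 2), (6, 2)]
print(sorted(collect_subtree(edges, 2)))  # [2, 5, 6]
```

Deleting node 2 thus removes 2 and its children 5 and 6, matching what
the proc does against the sample tables.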


"ToeKnee" <thad...@hotmail.com> wrote in message news:99eab91a.04020...@posting.google.com...

Steve Kass

Feb 8, 2004, 1:36:11 AM
Andrew,

Allow me to add a voice from the "recursion is elegant" corner. I'd
consider looking up the children of the root over and over again and
excluding them over and over again instead of adding them to @Nodes a
better example of "NOT elegant".

While SQL Server currently only supports 31 levels of recursion, it can
still be very useful, and if the Yukon Beta 1 features of recursive
queries and more levels of recursion make it into the final release,
recursion will become even more useful. Until that happens, if you're
worried about more than 31 levels, either don't use recursion, or guard
against it (before you get here, but just in case) with

if @@nestlevel = 31 ...

before the recursive call. In my experience, recursive programs are
easier to verify and maintain, too. Here's one recursive solution:

create procedure HierachyDelete ( @NodeID int )
as

return
-- create, then alter
-- takes care of sysdepends warning/issue
go

alter procedure HierachyDelete (
@NodeID int
) as
-- plan: Call HierachyDelete on the children,
-- then delete the root:

declare C cursor local dynamic for
select NodeID -- the children
from Hierachy
where ParentID = @NodeID

declare @Child int

open C
fetch next from C into @Child
while @@fetch_status = 0 begin
exec HierachyDelete @Child
fetch next from C into @Child
end

delete from Hierachy
where NodeID = @NodeID

close C
deallocate C
return (0)
go

SK
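The shape of the recursive proc is easy to see in a hypothetical Python
sketch (hierarchy_delete and parent_of are illustrative stand-ins for
the proc and the Hierachy table):

```python
# Sketch of the recursive delete: handle a node's children first
# (depth-first, like the cursor loop), then delete the node itself.

def hierarchy_delete(parent_of, node):
    """parent_of: dict {node_id: parent_id}; deletes node and its subtree."""
    children = [n for n, p in parent_of.items() if p == node]
    for child in children:          # the cursor loop / recursive call
        hierarchy_delete(parent_of, child)
    del parent_of[node]             # delete the root last

table = {1: 0, 2: 1, 3: 1, 4: 1, 5: 2, 6: 2}
hierarchy_delete(table, 2)
print(sorted(table))  # [1, 3, 4]
```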

Andrew John

Feb 8, 2004, 3:06:25 AM
Steve,

"Steve Kass" <sk...@drew.edu> wrote in message news:uovPq3g7...@TK2MSFTNGP10.phx.gbl...


> Andrew,
>
> Allow me to add a voice from the "recursion is elegant" corner. I'd
> consider looking up the children of the root over and over again and
> excluding them over and over again instead of adding them to @Nodes a
> better example of "NOT elegant".
>

Well if you keep adding duplicates then the terminating condition is kaput.
Besides, it is a repetitive read from tempdb, so will probably be cached.

I'd still back this version for robustness, performance, and readability
against recursion and a cursor. Even yours.

Regards
AJ


Steve Kass

Feb 8, 2004, 3:48:49 AM
Andrew,

Here's a repro for an ~8000-row table where the proc deletes around
2000 rows. I threw in a nonclustered index on ParentID and I corrected
my CURSOR to fast_forward (oops - gotta memorize those properties). On
my machine, your proc takes about 40 seconds, mine takes about 1.5
seconds. If you can get yours anywhere close, I can see if I can
optimize mine. Also let me know if something is wrong with the repro.

SK

Run the code in comments piece by piece once to populate the table each
proc will use. Then run the uncommented code once with my proc call
uncommented, and once with your proc call uncommented:

set nocount on
/*
-- run commented scripts once

create table HierarchyOrig
(
NodeID int identity not NULL primary key,
ParentID int NULL
)

insert into HierarchyOrig(ParentID)
select 0 from Northwind..[Order Details], Northwind..Region

update HierarchyOrig set
ParentID = (NodeID - 1) *
square(abs(binary_checksum(newid()))*1e0/power(2.,31))
*/
go
/*
-- run once
create procedure HierarchyAndrew ( @NodeID int )
as

declare @Nodes table
(
NodeID int
)

insert @Nodes
values ( @NodeID )

declare @Rows int
set @Rows = 1

While @Rows <> 0
Begin

insert @Nodes
select NodeID
from Hierarchy
where ParentID in ( select NodeID from @Nodes )
and not NodeID in ( select NodeID from @Nodes )
-- Could do this with an inner join and an outer join, but this is more concise.

set @Rows = @@rowcount
End

Delete Hierarchy
where NodeID in ( select NodeID from @Nodes )

return (0)
*/
go

/*
create procedure HierarchySteve ( @NodeID int )
as
return
-- create, then alter
-- takes care of sysdepends warning/issue
*/
go
/*
alter procedure HierarchySteve (
@NodeID int
) as
-- plan: Call HierarchySteve on the children,
-- then delete the root:

declare C cursor local fast_forward for
select NodeID -- the children
from Hierarchy
where ParentID = @NodeID

declare @Child int

open C
fetch next from C into @Child
while @@fetch_status = 0 begin
exec HierarchySteve @Child
fetch next from C into @Child
end

delete from Hierarchy
where NodeID = @NodeID

close C
deallocate C
return (0)

*/

go

select NodeID+0 NodeID, ParentID
into Hierarchy
from HierarchyOrig
create unique clustered index Hierarchy_ci on Hierarchy(NodeID)
create index Hierarchy_uci on Hierarchy(ParentID)

go

select getdate()
go
select count(*)
from Hierarchy h
go
select getdate()
go
--exec HierarchyAndrew 2
exec HierarchySteve 2
go
select getdate()
go

select count(*)
from Hierarchy h

go
drop table Hierarchy
-- drop proc HierarchyAndrew
-- drop proc HierarchySteve
-- drop table HierarchyOrig

Joe Celko

Feb 8, 2004, 4:10:18 PM
>> I have Table1 that has a hierarchical structure, and a second table
that holds information about the nodes in the hierarchy. I want to be
able to delete from anywhere in the hierarchy (through a recursive
stored proc?) in Table1 and have all children removed, along with the
values in Table2 that reference that node or its children. <<

There are many ways to represent a tree or hierarchy in SQL. This is
called an adjacency list model and it looks like this:

CREATE TABLE OrgChart
(emp CHAR(10) NOT NULL PRIMARY KEY,
boss CHAR(10) DEFAULT NULL REFERENCES OrgChart(emp),
salary DECIMAL(6,2) NOT NULL DEFAULT 100.00);

OrgChart
emp boss salary
===========================
'Albert' 'NULL' 1000.00
'Bert' 'Albert' 900.00
'Chuck' 'Albert' 900.00
'Donna' 'Chuck' 800.00
'Eddie' 'Chuck' 700.00
'Fred' 'Chuck' 600.00

Another way of representing trees is to show them as nested sets.
Since SQL is a set oriented language, this is a better model than the
usual adjacency list approach you see in most text books. Let us define
a simple OrgChart table like this.

CREATE TABLE OrgChart
(emp CHAR(10) NOT NULL PRIMARY KEY,
lft INTEGER NOT NULL UNIQUE CHECK (lft > 0),
rgt INTEGER NOT NULL UNIQUE CHECK (rgt > 1),
CONSTRAINT order_okay CHECK (lft < rgt) );

OrgChart
emp lft rgt
======================
'Albert' 1 12
'Bert' 2 3
'Chuck' 4 11
'Donna' 5 6
'Eddie' 7 8
'Fred' 9 10

The organizational chart would look like this as a directed graph:

                     Albert (1, 12)
                    /              \
            Bert (2, 3)       Chuck (4, 11)
                             /      |       \
                 Donna (5, 6) Eddie (7, 8) Fred (9, 10)

The adjacency list table is denormalized in several ways. We are
modeling both the Personnel and the organizational chart in one table.
But for the sake of saving space, pretend that the names are job titles
and that we have another table which describes the Personnel that hold
those positions.

Another problem with the adjacency list model is that the boss and
employee columns are the same kind of thing (i.e. names of personnel),
and therefore should be shown in only one column in a normalized table.
To prove that this is not normalized, assume that "Chuck" changes his
name to "Charles"; you have to change his name in both columns and
several places. The defining characteristic of a normalized table is
that you have one fact, one place, one time.

The final problem is that the adjacency list model does not model
subordination. Authority flows downhill in a hierarchy, but if I fire
Chuck, I disconnect all of his subordinates from Albert. There are
situations (i.e. water pipes) where this is true, but that is not the
expected situation in this case.

To show a tree as nested sets, replace the nodes with ovals, and then
nest subordinate ovals inside each other. The root will be the largest
oval and will contain every other node. The leaf nodes will be the
innermost ovals with nothing else inside them and the nesting will show
the hierarchical relationship. The (lft, rgt) columns (I cannot use the
reserved words LEFT and RIGHT in SQL) are what show the nesting. This is
like XML, HTML or parentheses.

At this point, the boss column is both redundant and denormalized, so it
can be dropped. Also, note that the tree structure can be kept in one
table and all the information about a node can be put in a second table
and they can be joined on employee number for queries.

To convert the graph into a nested sets model, think of a little worm
crawling along the tree. The worm starts at the top, the root, and makes
a complete trip around the tree. When he comes to a node, he puts a
number in the cell on the side that he is visiting and increments his
counter. Each node will get two numbers, one for the right side and one
for the left. Computer Science majors will recognize this as a modified
preorder tree traversal algorithm. Finally, drop the unneeded
OrgChart.boss column which used to represent the edges of a graph.
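The worm's trip can be sketched as a depth-first traversal in Python
(number_tree and children_of are hypothetical names; the data mirrors
the OrgChart example above):

```python
# Sketch of the "worm" traversal: assign each node a left number on the
# way down and a right number on the way back up.

def number_tree(children_of, root):
    counter = 0
    numbers = {}
    def visit(node):
        nonlocal counter
        counter += 1
        lft = counter                      # worm passes the left side
        for child in children_of.get(node, []):
            visit(child)
        counter += 1                       # worm passes the right side
        numbers[node] = (lft, counter)
    visit(root)
    return numbers

children_of = {'Albert': ['Bert', 'Chuck'],
               'Chuck': ['Donna', 'Eddie', 'Fred']}
nums = number_tree(children_of, 'Albert')
print(nums)  # reproduces the (lft, rgt) table: Albert (1, 12), Bert (2, 3), ...
```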

This has some predictable results that we can use for building queries.
The root is always (left = 1, right = 2 * (SELECT COUNT(*) FROM
TreeTable)); leaf nodes always have (left + 1 = right); subtrees are
defined by the BETWEEN predicate; etc. Here are two common queries which
can be used to build others:

1. An employee and all their Supervisors, no matter how deep the tree.

SELECT O2.*
FROM OrgChart AS O1, OrgChart AS O2
WHERE O1.lft BETWEEN O2.lft AND O2.rgt
AND O1.emp = :myemployee;

2. The employee and all subordinates. There is a nice symmetry here.

SELECT O1.*
FROM OrgChart AS O1, OrgChart AS O2
WHERE O1.lft BETWEEN O2.lft AND O2.rgt
AND O2.emp = :myemployee;
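Both queries reduce to interval containment, which a hypothetical
Python sketch makes concrete (org, supervisors and subordinates are
illustrative names, using the example (lft, rgt) values):

```python
# Query 1: ancestors are the rows whose (lft, rgt) interval contains
# the employee's lft. Query 2: the subtree is the rows whose lft falls
# inside the employee's own interval.

org = {'Albert': (1, 12), 'Bert': (2, 3), 'Chuck': (4, 11),
       'Donna': (5, 6), 'Eddie': (7, 8), 'Fred': (9, 10)}

def supervisors(emp):
    lft, _ = org[emp]
    return sorted(e for e, (l, r) in org.items() if l <= lft <= r)

def subordinates(emp):
    lft, rgt = org[emp]
    return sorted(e for e, (l, r) in org.items() if lft <= l <= rgt)

print(supervisors('Eddie'))    # ['Albert', 'Chuck', 'Eddie']
print(subordinates('Chuck'))   # ['Chuck', 'Donna', 'Eddie', 'Fred']
```

Like the SQL, both results include the employee himself; add an
inequality to exclude him.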

3. Add a GROUP BY and aggregate functions to these basic queries and you
have hierarchical reports. For example, the total salaries which each
employee controls:

SELECT O2.emp, SUM(S1.salary)
FROM OrgChart AS O1, OrgChart AS O2,
Salaries AS S1
WHERE O1.lft BETWEEN O2.lft AND O2.rgt
AND O1.emp = S1.emp
GROUP BY O2.emp;

4. To find the level of each emp, so you can print the tree as an
indented listing. Technically, you should declare a cursor to go with
the ORDER BY clause.

SELECT COUNT(O2.emp) AS indentation, O1.emp
FROM OrgChart AS O1, OrgChart AS O2
WHERE O1.lft BETWEEN O2.lft AND O2.rgt
GROUP BY O1.lft, O1.emp
ORDER BY O1.lft;
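The level computation is again interval containment; here is a
hypothetical Python sketch (level and org are illustrative names using
the example values):

```python
# Sketch of query 4: a node's indentation level is the count of
# (lft, rgt) intervals that contain its lft -- itself included.

org = {'Albert': (1, 12), 'Bert': (2, 3), 'Chuck': (4, 11),
       'Donna': (5, 6), 'Eddie': (7, 8), 'Fred': (9, 10)}

def level(emp):
    lft = org[emp][0]
    return sum(1 for l, r in org.values() if l <= lft <= r)

# print the tree as an indented listing, ordered by lft
for emp, (l, r) in sorted(org.items(), key=lambda kv: kv[1]):
    print(' ' * (level(emp) - 1) + emp)
```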

5. The nested set model has an implied ordering of siblings which the
adjacency list model does not. To insert a new node, G1, under part G.
We can insert one node at a time like this:

BEGIN ATOMIC
DECLARE rightmost_spread INTEGER;

SET rightmost_spread
= (SELECT rgt
FROM Frammis
WHERE part = 'G');
UPDATE Frammis
SET lft = CASE WHEN lft > rightmost_spread
THEN lft + 2
ELSE lft END,
rgt = CASE WHEN rgt >= rightmost_spread
THEN rgt + 2
ELSE rgt END
WHERE rgt >= rightmost_spread;

INSERT INTO Frammis (part, lft, rgt)
VALUES ('G1', rightmost_spread, (rightmost_spread + 1));
COMMIT WORK;
END;

The idea is to spread the (lft, rgt) numbers after the youngest child of
the parent, G in this case, over by two to make room for the new
addition, G1. This procedure will add the new node to the rightmost
child position, which helps to preserve the idea of an age order among
the siblings.
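The spreading step can be sketched in Python (insert_rightmost_child is
a hypothetical name; the dict stands in for the Frammis table):

```python
# Sketch of the insert: shift every lft/rgt at or beyond the parent's
# rgt up by 2, then drop the new node into the opened gap.

def insert_rightmost_child(tree, parent, new_node):
    spread = tree[parent][1]                 # the parent's rgt
    for node, (l, r) in tree.items():
        tree[node] = (l + 2 if l > spread else l,
                      r + 2 if r >= spread else r)
    tree[new_node] = (spread, spread + 1)

tree = {'G': (1, 4), 'G0': (2, 3)}   # G with one existing child G0
insert_rightmost_child(tree, 'G', 'G1')
print(tree)  # {'G': (1, 6), 'G0': (2, 3), 'G1': (4, 5)}
```

G1 lands to the right of G0, preserving the age order among siblings.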

6. To convert a nested sets model into an adjacency list model:

SELECT B.emp AS boss, E.emp
FROM OrgChart AS E
LEFT OUTER JOIN
OrgChart AS B
ON B.lft
= (SELECT MAX(lft)
FROM OrgChart AS S
WHERE E.lft > S.lft
AND E.lft < S.rgt);
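The MAX(lft) subquery picks the tightest enclosing interval; a
hypothetical Python sketch of the same rule (boss_of is an illustrative
name, using the example OrgChart values):

```python
# Sketch of query 6: a node's boss in the adjacency view is the
# enclosing node with the largest lft, i.e. the innermost oval
# strictly containing it. The root has no boss.

org = {'Albert': (1, 12), 'Bert': (2, 3), 'Chuck': (4, 11),
       'Donna': (5, 6), 'Eddie': (7, 8), 'Fred': (9, 10)}

def boss_of(emp):
    lft = org[emp][0]
    enclosing = [(l, e) for e, (l, r) in org.items() if l < lft < r]
    return max(enclosing)[1] if enclosing else None  # MAX(lft), like the subquery

print([(boss_of(e), e) for e in sorted(org)])
```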

7. To convert an adjacency list to a nested set model, use a push-down
stack. Here is a version with a stack in SQL/PSM.

-- Tree holds the adjacency model
CREATE TABLE Tree
(node CHAR(10) NOT NULL,
parent CHAR(10));

-- Stack starts empty, will hold the nested set model
CREATE TABLE Stack
(stack_top INTEGER NOT NULL,
node CHAR(10) NOT NULL,
lft INTEGER,
rgt INTEGER);

CREATE PROCEDURE TreeTraversal ()
LANGUAGE SQL
DETERMINISTIC
BEGIN ATOMIC
DECLARE counter INTEGER;
DECLARE max_counter INTEGER;
DECLARE current_top INTEGER;

SET counter = 2;
SET max_counter = 2 * (SELECT COUNT(*) FROM Tree);
SET current_top = 1;

--clear the stack
DELETE FROM Stack;

-- push the root
INSERT INTO Stack
SELECT 1, node, 1, max_counter
FROM Tree
WHERE parent IS NULL;

-- delete rows from tree as they are used
DELETE FROM Tree WHERE parent IS NULL;

WHILE counter <= max_counter - 1
DO IF EXISTS (SELECT *
FROM Stack AS S1, Tree AS T1
WHERE S1.node = T1.parent
AND S1.stack_top = current_top)
THEN BEGIN -- push when top has subordinates and set lft value
INSERT INTO Stack
SELECT (current_top + 1), MIN(T1.node), counter, NULL
FROM Stack AS S1, Tree AS T1
WHERE S1.node = T1.parent
AND S1.stack_top = current_top;

-- delete rows from tree as they are used
DELETE FROM Tree
WHERE node = (SELECT node
FROM Stack
WHERE stack_top = current_top + 1);
-- housekeeping of stack pointers and counter
SET counter = counter + 1;
SET current_top = current_top + 1;
END;
ELSE
BEGIN -- pop the stack and set rgt value
UPDATE Stack
SET rgt = counter,
stack_top = -stack_top -- pops the stack
WHERE stack_top = current_top;
SET counter = counter + 1;
SET current_top = current_top - 1;
END;
END IF;
END WHILE;
-- SELECT node, lft, rgt FROM Stack;
-- the top column is not needed in the final answer
-- move stack contents to new tree table
END;

I have a book on TREES & HIERARCHIES IN SQL coming out in April 2004.

--CELKO--
===========================
Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, datatypes, etc. in your
schema are.


Anith Sen

Feb 8, 2004, 4:46:25 PM
Joe,

For small- and medium-sized datasets, the nested set method, specifically
your SQL code, seems to work reasonably well. However, parts of the
explanation you wrap around this solution, especially regarding the
normalization issues with the adjacency list table, are somewhat off the mark.

>> The adjacency list table is denormalized in several ways.<<

Generally people throw around the terms normalization & "denormalization" in
SQL newsgroups without going through sufficient analysis. Can you explain
which normal form your adjacency list OrgChart violates? If you believe
the table is "denormalized" in several ways, can you post at least one set
of equivalent non-loss projections of the table which you believe to be a
normalized schema?

The fact is that it is normalized, considering the obvious dependencies that
could possibly exist among the three attributes emp, boss & salary. The
predicate that more or less states "there exists an employee (emp), who has
a superior employee (boss) and draws an amount (salary)" holds true for all
propositions (except for the row representing Albert -- silly as it is, to
the logical model it is trivial if one considers the string 'NULL' to
represent the boss for Albert).

Here is why it is normalized, w.r.t the known dependencies & withstanding
the limitations of SQL tables & datatypes:

Given the fact that the string value 'NULL' is not equivalent to the NULL
marker in SQL, your adjacency list table has no duplicates, no row/column
ordering, and has values from a conceptual set of scalars, which suggests
the table represents a relation; hence it can be considered to be in 1NF.

There is only one declared key, which is irreducible (in this case made up
of a single column, emp, which makes other possible alternate keys
reducible), thereby eliminating any potential partial key dependencies;
hence it is in 2NF.

There are no dependencies between the columns boss & salary and both these
attributes depend directly on the key column emp, i.e. no transitive
dependency which suggests it is in 3NF.

Also, the column emp is the superkey for other non-trivial dependencies,
i.e. no composite column value overlaps exist (which can be validated
with a composite key, say the (emp, boss) columns), and hence this is in
BCNF.

Clearly no multi-determination exists, i.e. no sets of values in one
column are associated with sets of values in another column, which means
there is no multi-valued dependency; hence the table is in 4NF.

And, with a single column key, it is obvious that there are sets of unique
columns in any pair-wise projection, which means no intra-key join
dependency, and thus it is in 5NF.

As such, there is no specification of capturing temporal information
(including bi-temporal or semi-temporal) that needs to be represented as
attribute values and thus 6NF can be conveniently ignored in this case.

And thus the table, for the entity type it represents, is normalized with
respect to the known dependencies among the existing attributes.

>> Another problem with the adjacency list model is that the boss and
employee columns are the same kind of thing (i.e. names of personnel), and
therefore should be shown in only one column in a normalized table. <<

I challenge you to point to any sound theoretical treatment that exists to
support, even remotely, this claim.

In a precisely modeled relational system, you could have a domain which
represents the "name of personnel" values (and of course, with associated
operations applicable to those values) with a possible representation of
CHAR(10). Thus you'd have two columns with values drawn from the same domain
and this representation violates no known normalization principles.

>> To prove that this is not normalized, assume that "Chuck" changes his
name to "Charles"; you have to change his name in both columns and several
places. <<

By this assessment, any table that has a foreign key reference or any schema
which requires cascading updates/deletes would be "not normalized", right :-)
This is not correct; in fact it has nothing to do with normalization. Only
the redundancies that occur due to certain types of "intra"-column
dependencies within the logical model are addressed by normalization. There
are other design principles, like the New Design Principle (predicate
uniqueness), the Orthogonality Principle, etc., which address certain other
kinds of redundancies.

>> The defining characteristic of a normalized table is that you have one
fact, one place, one time. <<

One fact, one place, one time -- isn't that what you have in your table? To
the DBMS, 'Albert' and 'Chuck', by themselves, are *values*. In this case,
each is simply a property value of a fact that asserts the well defined
proposition I mentioned earlier. This so-called redundancy which you claim
is nothing but the dependency between different tables -- references --
that exists by virtue of necessity in the conceptual model.

The defining characteristic of normalization is not "one fact, one place,
one time", but that all non key columns in a table depend directly on the
whole key and nothing else.

Since many among us use this post of yours as a reference, especially for
the nested set model, it would be helpful if these matters were addressed,
at least in subsequent postings.

--
- Anith
( Please reply to newsgroups only )


Steve Kass

Feb 8, 2004, 5:48:43 PM
Anith,

I think it's pointless to discuss normal forms for a relation like
this. The attributes lft and rgt do not represent anything in the
real-world model, and what attributes mean is at the heart of
normalization theory. For this to be a model we can even start talking
about, we need to have a way of looking at Fred and determining the
values of lft and rgt associated with Fred. What questions can you ask
Fred, and what can you measure about Fred that will allow you to
determine lft and rgt? There are only two potential questions that have
anything to do with what's being modeled: "Who is your boss?" and "Who
are you the boss of?". We assume Fred can describe these employees
uniquely, yet we still have no clue whatsoever what Fred's lft and rgt
values are.

Another way of looking at it is this: If you walk into a room with an
OrgChart table and so do I, how do we decide whether we are looking at
the same organizational chart? In the nested set model the same
organizational chart can be represented by more than one subset of [Emp
domain] X [lft domain] X [rgt domain].

There may not be a name in the literature for this sort of anomaly,
but that is simply because the literature is based on the assumption
that every value in every relation represents an attribute in the real
world. [lft] and [rgt] just don't.

This is not to say that representing hierarchies this way is
ridiculous. In fact, it's mathematically sound - just not with respect
to any relational model I'm aware of.

I will admit that this could be made somewhat better if anyone came up
with a way to close all gaps so that at any time the lft and rgt values
had no missing numbers. We would also have to either assume that Donna,
Eddie, and Fred were ordered according to some real-world attribute or
else give them the same [lft,rgt] values, whichever is correct for the
business model. Then we could start talking about normalization. And
we'd probably have plenty of time, since we'd be looking at very
inefficient algorithms where a single change could require every row of
the table to be updated.

SK

Joe Celko

Feb 8, 2004, 6:05:25 PM
>> Can you explain which normal form your Adjacency list OrgChart
violates? <<

Domain-Key Normal Form ("A relation is said to be in DKNF if all
constraints and dependencies that should hold on the relation can be
enforced simply by enforcing the domain constraints and the key
constraints specified on the relation." -- Elmasri and Navathe).

The test I use for this is that no redundancies exist, so that no insert,
update or deletion anomalies exist for a simple fact. If one of these
actions destroys a fact or creates a falsehood, then my schema is not
normalized ("one fact, one time, one place", to quote myself).

The problem with the adjacency list model is that it mixes a
relationship (the org chart; tree structure) with the entities (the
personnel; nodes) in the same table. Furthermore, the relationship is
not the right one we need for a hierarchy (inheritance of subordination).

Using my little tree:

1) If you update "Chuck" to "Charles", this fact is redundantly stored in
two columns (boss and emp) and four rows (Donna, Eddie, Fred as his
subordinates and Chuck as employee). That demonstrates an Update
Anomaly, which has to be handled with several actions. Since Chuck is a
unique entity and he appears only once in the nested sets model, I have
used a key.

2) Let's delete Chuck. But this leaves (Donna, Eddie, Fred) either in a
separate tree with a non-existent root, or we have to delete all of the
subtree rooted at Chuck. That destroys the fact that they all still
report to Albert. However, I preserve subordination with the (lft, rgt)
key in the nested sets model. That demonstrates a Deletion Anomaly,
which has to be handled with relationship rules that are not even in the
Adjacency List model.

3) There are no Insertion Anomalies. In fact, the speed and ease of
insertion are the strong points of the Adjacency List model.

Here is where you might get me. I have to update the entire structure in
the nested sets model for insertion, updating or deletion. My
counter-argument is that the org-chart table is a relationship made up
of many facts and that I am replacing that old relationship with a new
one. This is a total replacement, not modification. This is also why I
want to put the hierarchies on the same set of nodes in separate tables.

Or to put it another way, if I can believe that CHAR(n) is a different
datatype than CHAR(m), then I can believe an (n)-node tree is a
different relationship than an (m)-node tree.

The final test is given two rows (and only those two rows), can I tell
if the relationship modeled in the table between them holds? In the
nested set model, I know immediately if A is the superior, subordinate
or sibling of B -- the hierarchical relationship. Subordination is not
modeled at all in the adjacency list model. You have to compute it from
the entire table.

I also agree that having NULL as a Boss stinks. The attribute of "has a
boss" does not exist for the root node and the Adjacency List Model is
"faking it" -- I would be happier if we had Codd's "missing attribute"
null from RM-V2 instead. Not much happier, tho.

I believe that you would have to provide the transitive closure for a
loss-less Adjacency List model and a lot of people do that -- the path
enumeration model that Ben-Gan and Moreau use in their books.

Steve Kass

Feb 8, 2004, 6:22:26 PM
Oops - I guess you were talking about a different table. So feel free
to offer what you think of the normalization situation (or not!) for the
tree...

SK

Steve Kass

Feb 8, 2004, 6:37:24 PM

Joe Celko wrote:

>>> Can you explain which normal form your Adjacency list OrgChart
>>> violates? <<
>
>Domain-Key Normal Form ("A relation is said to be in DKNF if all
>constraints and dependencies that should hold on the relation can be
>enforced simply by enforcing the domain constraints and the key
>constraints specified on the relation." -- Elmasri and Navathe).
>
>The test I use for this is that no redundancies exist so that no insert,
>update or deletion Anomalies exists for a simple fact. If one of these
>actions destroys a fact or creates a falsehood, then my schema is not
>normalized ("one fact, one time, one place", to quote myself).
>
>The problem with the adjacency list model is that it mixes a
>relationship (the org chart; tree structure) with the entities (the
>personnel; nodes) in the same table. Furthermore, the relationship is
>not the right one we need for a hierarchy (inheritance of subordination).
>
>Using my little tree:
>
>1) If you update "Chuck" to "Charles", this fact is redundantly stored in
>two columns (boss and emp) and four rows (Donna, Eddie, Fred as his
>subordinates and Chuck as employee). That demonstrates an Update
>Anomaly, which has to be handled with several actions. Since Chuck is a
>unique entity and he appears only once in the nested sets model, I have
>used a key.
>
>

That's not redundant. Keys are exactly the things that are supposed to
be stored in more than one location, and I don't think self-referential
tables are prohibited by the mathematical model.

>2) Let's delete Chuck. But this leaves (Donna, Eddie, Fred) either in a
>separate tree with a non-existent root, or we have to delete all of the
>subtree rooted at Chuck. That destroys the fact that they all still
>report to Albert. However, I preserve subordination with the (lft, rgt)
>key in the nested sets model. That demonstrates a Deletion Anomaly,
>which has to be handled with relationship rules that are not even in the
>Adjacency List model.
>
>
>

Whether or not it's an anomaly depends only on what you are modeling.
If you are only modeling the employee_boss relation, it's not an
anomaly. If you have a parent-child relation modeled for a school
district, do you call it a deletion anomaly if when you delete an adult,
the table doesn't automatically show the relationship between children
and their grandparents? If you are modeling the entire subordination
structure, then you could store the transitive closure, I suppose, but
nowhere in your statement of the physical model did I see where, if a
person leaves, all his or her subordinates are automatically reassigned
to his or her boss. Don't fault a model for not modeling what it
doesn't model.

>3) There are no Insertion Anomalies. In fact, the speed and ease of
>insertion are the strong points of the Adjacency List model.
>
>Here is where you might get me. I have to update the entire structure in
>the nested sets model for insertion, updating or deletion. My
>counter-argument is that the org-chart table is a relationship made up
>of many facts and that I am replacing that old relationship with a new
>one. This is a total replacement, not modification. This is also why I
>want to put the hierarchies on the same set of nodes in separate tables.
>
>Or to put it another way, if I can believe that CHAR(n) is a different
>datatype than CHAR(m), then I can believe an (n)-node tree is a
>different relationship than an (m)-node tree.
>
>

Huh? Of course it is. Are you trying to tell me that whether to
consider two sets with different cardinalities different or not is a
question of belief?

>The final test is given two rows (and only those two rows), can I tell
>if the relationship modeled in the table between them holds? In the
>nested set model, I know immediately if A is the superior, subordinate
>or sibling of B -- the hierarchical relationship. Subordination is not
>modeled at all in the adjacency list model. You have to compute it from
>the entire table.
>
>

Now you said it yourself - subordination isn't modeled in the adjacency
list model. So don't claim that model has an anomaly because of what it
doesn't model!

>I also agree that having NULL as a Boss stinks. The attribute of "has a
>boss" does not exist for the root node and the Adjacency List Model is
>"faking it" -- I would be happier if we had Codd's "missing attribute"
>null from RM-V2 instead. Not much happier, tho.
>
>

I think Anith may have been referring to the fact that your model
contains both 'NULL' and NULL, and has for years. Is that intended?

>I believe that you would have to provide the transitive closure for a
>loss-less Adjacency List model and a lot of people do that -- the path
>enumeration model that Ben-Gan and Moreau use in their books.
>
>

Adjacency models that store the path usually don't store the transitive
closure, but they do model subordination, since you can check if one
person's path is a prefix of another's.
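(That prefix test is a one-liner in SQL. A sketch, assuming a hypothetical path-enumeration table PathTree ( NodeID, Path ) with paths stored like '1/3/7/':)

```sql
-- Hypothetical table: PathTree ( NodeID int, Path varchar(200) ),
-- with Path values such as '1/', '1/3/', '1/3/7/'.
-- b is in a's subtree when a's Path is a prefix of b's Path.
-- (Assumes Path contains no LIKE wildcard characters.)
select b.NodeID
from PathTree a
join PathTree b
  on b.Path like a.Path + '%'
where a.NodeID = 1
  and b.NodeID <> a.NodeID
```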

Anith Sen

unread,
Feb 8, 2004, 11:25:08 PM2/8/04
to
>> Domain-Key Normal Form (.... -- Elmasri and Navathe). <<

Um...Elmasri and Navathe :-) Do people still read them?

Actually I do not have the Elmasri & Navathe book, but the explanations by
its inventor, Ron Fagin, seem clear enough on the topic. Based on his paper, if
every constraint is a logical consequence of domain constraints and key
constraints that apply to a table, then the table is said to be in DKNF.
http://www.almaden.ibm.com/cs/people/fagin/tods81.pdf

This means, for a 1NF table to be in DKNF, it is sufficient to enforce the
domain and key constraints properly and all other constraints (this includes
all functional, multivalued & join dependencies) will then be automatically
enforced. Given the poor domain support in SQL, I am not sure if this is
even achievable for a table. But the redundancy you suggest is inherent to
the conceptual model and I fail to see how DKNF would have removed it.
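To make that concrete: the closest T-SQL comes to "domain and key constraints" is keys plus CHECK constraints, which only approximate a true domain. A minimal sketch (hypothetical table, not from this thread):

```sql
-- Key constraint: the primary key on emp.
-- Domain approximation: a CHECK constraint standing in for a true domain.
create table Personnel
(
    emp    varchar(20) not null primary key,
    salary decimal(10,2) not null
           check ( salary between 0.00 and 1000000.00 )
)
```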

>> The problem with the adjacency list model is that it mixes a
relationship (the org chart; tree structure) with the entities (the
personnel; nodes) in the same table. <<

These are arbitrary distinctions; all that matters to the DBMS is the
proposition represented by the tuple. There are no precise definitions for
relationship or entity. One's entity can be someone else's relationship. For
example, employee-employer relationship can be viewed as an employment
entity or an emp-boss relationship can be viewed as a subordination entity.

>> My counter-argument is that the org-chart table is a relationship made up
of many facts and that I am replacing that old relationship with a new one.
<<

That is exactly the point. So all that matters is whether or not the DBMS can
change a tuple without affecting the represented relationship.

>> I also agree that having NULL as a Boss stinks. The attribute of "has a
boss" does not exist for the root node and the Adjacency List Model is
"faking it" <<

As long as there exists no predicate which constrains ( emp <> boss ), you
can have the row ( 'Albert', 'Albert', 1000.00 ). But I understand it
requires changes to the stack approach (#7) for the adjacency-list to
nested-set conversion.
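(For illustration, the self-referencing root would look like this in the emp-boss style table from upthread; column names assumed:)

```sql
-- Hypothetical table: Personnel ( emp, boss, salary ),
-- with no check ( emp <> boss ) constraint declared.
-- The root references itself instead of carrying a NULL boss:
insert Personnel ( emp, boss, salary )
values ( 'Albert', 'Albert', 1000.00 )
-- Any tree-walking code must then treat emp = boss as the stopping
-- condition, or it will loop forever on the root.
```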

>> I believe that you would have to provide the transitive closure for a
loss-less Adjacency List model and a lot of people do that <<

Good point. That is exactly what is lacking: a SQL equivalent of the
relational TCLOSE operator. Another approach, though not exactly for the same
problem, that is often recommended is an EXPLODE operator, which can be used
for the part-explosion (say, BOM explosion) problem like:

SELECT *
FROM EXPLODE (<tbl_exp>)
WHERE ...

Now, with the new WITH <cte> ( common table expression ) option coming up
in Yukon, it could be a different scenario with T-SQL. You will likely end
up with more proprietary code bashing, though!
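(For what it's worth, the announced Yukon syntax would turn the subtree problem from earlier in this thread into something like the following -- a sketch against the Hierarchy table; the syntax is per the Yukon previews and may change before release:)

```sql
-- Sketch of a Yukon-style recursive CTE for the subtree-delete problem.
with Subtree ( NodeID ) as
(
    select NodeID from Hierarchy where NodeID = 2      -- the subtree root
    union all
    select h.NodeID
    from Hierarchy h
    join Subtree s on h.ParentID = s.NodeID
)
delete Hierarchy
where NodeID in ( select NodeID from Subtree )
```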

Anith Sen

unread,
Feb 8, 2004, 11:27:55 PM2/8/04
to
Yeah, I was referring to Joe's comments about the table that represents the
adjacency list.

Louis Davidson

unread,
Feb 8, 2004, 11:56:44 PM2/8/04
to
> all functional, multivalued & join dependencies) will then be
automatically
> enforced. Given the poor domain support in SQL, I am not sure if this is
> even achievable for a table. But the redundancy you suggest is inherent to
> the conceptual model and I fail to see how DKNF would have removed it.

What types of domains cannot be achieved in SQL Server (I am sure you meant
SQL Server, right?)

--
----------------------------------------------------------------------------
-----------
Louis Davidson (dr...@hotmail.com)
Compass Technology Management

Pro SQL Server 2000 Database Design
http://www.apress.com/book/bookDisplay.html?bID=266

Note: Please reply to the newsgroups only unless you are
interested in consulting services. All other replies will be ignored :)


Steve Kass

unread,
Feb 9, 2004, 12:17:12 AM2/9/04
to

Louis Davidson wrote:

> > all functional, multivalued & join dependencies) will then be
>automatically
>
>
>>enforced. Given the poor domain support in SQL, I am not sure if this is
>>even achievable for a table. But the redundancy you suggest is inherent to
>>the conceptual model and I fail to see how DKNF would have removed it.
>>
>>
>
>What types of domains cannot be achieved in SQL Server (I am sure you meant
>SQL Server, right?)
>
>
>

I'll guess Anith meant what he typed (I'm hoping to get it right once
tonight). Like many languages, SQL leaves much up to the
implementation, such as the range of values for INTEGER. Languages with
better domain support (LISP, I think, is an example) don't allow those
kinds of limitations tied to machine word size. A LISP integer can be
specified in BNF syntax, the only machine limitation being that if the
machine runs out of memory or storage, boom, but it's not called integer
overflow. A SQL integer cannot be specified in BNF syntax, or at least
not in any compact form using decimal representations. Languages with
worse domain support are the weakly-typed or non-typed languages, like
BASIC (which started with types string and not-string for all practical
purposes), and somewhere in between SQL and LISP are languages like C,
that allow implementations to decide some things, but require them to
expose their limitations. Someone will correct me if I'm wrong, but I
don't think SQL requires any information like MAXINT or MAXDATE to
exist. The absence of MAXDATE is especially annoying, since just trying to increase
the precision from 1/300 of a second to 1/1000 would add a new value,
9999-12-31T23:59:59.999 to the domain, which doesn't exist now.


SK

Andrew John

unread,
Feb 9, 2004, 4:36:10 AM2/9/04
to
Steve,

Whilst crying foul at your choice of test data
( no levels > 32, no loops; yes, I know that was the scenario, but it was still a
lovely population script to make sure your code didn't break ;)

Using the lowest of 4 runs for each procedure I have found the following:

Steve 670 msec
Andrew1 2470 msec

This is straight from your repro script, after I corrected for my spelling mistakes,
dropping the Hierarchy table between each run.

Doing a bit of optimising brought this down to:

Andrew2 1660 msec ( Added a primary key to temp table )
Andrew3 106 msec ( Different joins, still with PK )

And being ridiculous (unless very few levels)
Andrew4 1513 msec ( Lots of left outer joins, no temp table )

So over to you, dear sir.

p.s. Anyone else - please feel free to join in on my side - This is Steve Kass after all !


Regards
AJ


Test Script, after using your population one.

select NodeID+0 NodeID, ParentID -- NodeID+0 drops the identity property on the copy
into Hierarchy
from HierarchyOrig
create unique clustered index Hierarchy_ci on Hierarchy(NodeID)
create index Hierarchy_uci on Hierarchy(ParentID)

Declare @Start datetime
set @Start = getdate()

select count(*)
from Hierarchy h

--exec HierarchySteve 2
exec HierarchyAndrew4 2

select datediff( ms, @Start, GetDate()) as Duration

select count(*)
from Hierarchy h

go
drop table Hierarchy


Procs I was using:

create procedure HierarchyAndrew1 ( @NodeID int )
as

declare @Nodes table
(
NodeID int
)

insert @Nodes
values ( @NodeID )

declare @Rows int
set @Rows = 1

While @Rows <> 0
Begin

insert @Nodes
select NodeID
from Hierarchy
where ParentID in ( select NodeID from @Nodes )
and not NodeID in ( select NodeID from @Nodes )
-- Could do this with an inner join and an outer join, but this is more concise.

set @Rows = @@rowcount
End

Delete Hierarchy
where NodeID in ( select NodeID from @Nodes )

return (0)

go

create procedure HierarchyAndrew2 ( @NodeID int )
as

declare @Nodes table
(
NodeID int primary key
)

insert @Nodes
values ( @NodeID )

declare @Rows int
set @Rows = 1

While @Rows <> 0
Begin

insert @Nodes
select NodeID
from Hierarchy
where ParentID in ( select NodeID from @Nodes )
and not NodeID in ( select NodeID from @Nodes )

set @Rows = @@rowcount
End

Delete Hierarchy
where NodeID in ( select NodeID from @Nodes )

return (0)

go

create procedure HierarchyAndrew3 ( @NodeID int )
as

declare @Nodes table
(
NodeID int primary key
)

insert @Nodes
values ( @NodeID )

declare @Rows int
set @Rows = 1

While @Rows <> 0
Begin

insert @Nodes
select h.NodeID
from Hierarchy h
inner join @Nodes n
on h.ParentID = n.NodeID
left outer join @Nodes n2
on n2.NodeID = h.NodeID
where n2.NodeID is NULL

set @Rows = @@rowcount
End

Delete Hierarchy
where NodeID in ( select NodeID from @Nodes )

return (0)

GO


create procedure HierarchyAndrew4 ( @NodeID int )
as

-- ( Note: unlike the earlier versions, this approach needs no @Nodes table variable. )

Delete h01
from Hierarchy h01
left outer join Hierarchy h02 on h02.NodeID = h01.ParentID
left outer join Hierarchy h03 on h03.NodeID = h02.ParentID
left outer join Hierarchy h04 on h04.NodeID = h03.ParentID
left outer join Hierarchy h05 on h05.NodeID = h04.ParentID
left outer join Hierarchy h06 on h06.NodeID = h05.ParentID
left outer join Hierarchy h07 on h07.NodeID = h06.ParentID
left outer join Hierarchy h08 on h08.NodeID = h07.ParentID
left outer join Hierarchy h09 on h09.NodeID = h08.ParentID
left outer join Hierarchy h10 on h10.NodeID = h09.ParentID
left outer join Hierarchy h11 on h11.NodeID = h10.ParentID
left outer join Hierarchy h12 on h12.NodeID = h11.ParentID
left outer join Hierarchy h13 on h13.NodeID = h12.ParentID
left outer join Hierarchy h14 on h14.NodeID = h13.ParentID
left outer join Hierarchy h15 on h15.NodeID = h14.ParentID
left outer join Hierarchy h16 on h16.NodeID = h15.ParentID
left outer join Hierarchy h17 on h17.NodeID = h16.ParentID
left outer join Hierarchy h18 on h18.NodeID = h17.ParentID
left outer join Hierarchy h19 on h19.NodeID = h18.ParentID
left outer join Hierarchy h20 on h20.NodeID = h19.ParentID
left outer join Hierarchy h21 on h21.NodeID = h20.ParentID
left outer join Hierarchy h22 on h22.NodeID = h21.ParentID
left outer join Hierarchy h23 on h23.NodeID = h22.ParentID
left outer join Hierarchy h24 on h24.NodeID = h23.ParentID
left outer join Hierarchy h25 on h25.NodeID = h24.ParentID
left outer join Hierarchy h26 on h26.NodeID = h25.ParentID
left outer join Hierarchy h27 on h27.NodeID = h26.ParentID
left outer join Hierarchy h28 on h28.NodeID = h27.ParentID
left outer join Hierarchy h29 on h29.NodeID = h28.ParentID
left outer join Hierarchy h30 on h30.NodeID = h29.ParentID
left outer join Hierarchy h31 on h31.NodeID = h30.ParentID
left outer join Hierarchy h32 on h32.NodeID = h31.ParentID
where h01.NodeID = @NodeID
or h01.ParentID = @NodeID
or h02.ParentID = @NodeID
or h03.ParentID = @NodeID
or h04.ParentID = @NodeID
or h05.ParentID = @NodeID
or h06.ParentID = @NodeID
or h07.ParentID = @NodeID
or h08.ParentID = @NodeID
or h09.ParentID = @NodeID
or h10.ParentID = @NodeID
or h11.ParentID = @NodeID
or h12.ParentID = @NodeID
or h13.ParentID = @NodeID
or h14.ParentID = @NodeID
or h15.ParentID = @NodeID
or h16.ParentID = @NodeID
or h17.ParentID = @NodeID
or h18.ParentID = @NodeID
or h19.ParentID = @NodeID
or h20.ParentID = @NodeID
or h21.ParentID = @NodeID
or h22.ParentID = @NodeID
or h23.ParentID = @NodeID
or h24.ParentID = @NodeID
or h25.ParentID = @NodeID
or h26.ParentID = @NodeID
or h27.ParentID = @NodeID
or h28.ParentID = @NodeID
or h29.ParentID = @NodeID
or h30.ParentID = @NodeID
or h31.ParentID = @NodeID
or h32.ParentID = @NodeID

return (0)

Louis Davidson

unread,
Feb 9, 2004, 10:30:29 AM2/9/04
to
If that is what he meant, then I completely agree. The way I was taking it
was that there would be domains that could not be enforced using SQL Server.
The thing I still don't understand is why DKNF would be limited because of
the type support. There are very few values that cannot be represented, and
other than the blob and text type datatypes, any value or combination of
values can be enforced.

--
----------------------------------------------------------------------------
-----------
Louis Davidson (dr...@hotmail.com)
Compass Technology Management

Pro SQL Server 2000 Database Design
http://www.apress.com/book/bookDisplay.html?bID=266

Note: Please reply to the newsgroups only unless you are
interested in consulting services. All other replies will be ignored :)

"Steve Kass" <sk...@drew.edu> wrote in message
news:O4ZnRws7...@TK2MSFTNGP12.phx.gbl...

Anith Sen

unread,
Feb 9, 2004, 1:21:19 PM2/9/04
to
I think I have to clarify a couple of things. First the mention of type
support was rather considering the language SQL.

>> The thing I still don't understand is why DKNF would be limited because
of the type support. <<

C.J.Date has clarified DKNF well enough to state that, for a table to be in
DKNF, all that is required is to enforce *all* applicable domain & key
constraints. Fagin's paper shows that any table in DKNF is also in 5NF,
however in practice, it is possible to have 5NF tables which are not in
DKNF. Date has also provided examples of certain constraints for such tables
that cannot be achieved as a logical consequence of any of the applicable
constraints on the attribute values or key columns, for instance:
( the number of rows in a table <= 20 )

It is also important to note that DKNF is simply a normalization approach
that does not fall in the progression from 1NF->5NF. And thus the
traditional decompositions based on functional/multi-valued/join
dependencies are not directly applicable for achieving DKNF. Perhaps someone
else can point to additional research on the topic (including work by Fagin)
that proves whether DKNF is always achievable, or when it is possible.

My point regarding the Adjacency list was that, Joe, in his explanation,
seemed to include the redundancy inherent to the problem (emp table,
references, etc.) with the addressable issues in the logical model and
called the table "denormalized". Since all functional/multi-valued/join
dependencies are addressed and no existing literature states that DKNF is
always achievable, I pointed out that it is a well-normalized table. Hope this
clears the issue.

>> From your prev. post -- What types of domains cannot be achieved in SQL
[ not about SQL Server ] <<

A better question would be what domains (types) can be achieved in SQL? The
answer would be: only the built-in types. I don't have a handy doc, but
Joe or Steve can provide the exact details on how SQL-92 defines them.

Due to its legacy, even the SQL datatypes (and the so-called SQL-92 domains)
lack specificity, clear distinction of internal encoding vs. possible
representation, detailed type constraint specs, user-defined operator
declarations, complex type representation etc. The BNF syntax that Steve
mentions is mostly a formal notation, not sure if it comes under the
consideration of specificity.

There are other type systems that are well defined, like the Milner system
(but not well-known in the arena of data management) and Tutorial D
(currently receiving a lot of attention from data professionals &
dilettantes). If interested, you can get The Third Manifesto, or for D & D's
recommendations for a relational system, considered by some to be the current
state of the art, refer to:
http://www.hughdarwen.freeola.com/TheThirdManifesto.web/CHAP03.pdf

--
Anith


Steve Kass

unread,
Feb 9, 2004, 11:43:27 PM2/9/04
to
Andrew,

Here's a recursive solution that's about 30-40% faster than your #3.
I lose "elegant" points, but at least some of the clumsiness will go
away in Yukon if varbinary(max) is there, and it should be faster as
well. Below it is another one that's about 50% faster than yours, but
it uses what some people would call a cheat (undocumented behavior that
is not guaranteed).

create procedure HierarchySteve (
@NodeIDs varbinary(8000)
) as

declare @N int
set @N = datalength(@NodeIDs) / 4
if @N = 0 return

declare @b varbinary(8000)
set @b = 0x
declare @ct int
set @ct = 0

declare C cursor local fast_forward for
select NodeID -- the children

from Hierarchy, Integers -- Integers(I): auxiliary table of sequential integers 0, 1, 2, ...
where I < @N
and ParentID = substring(@NodeIDs,4*I+1,4)

declare @Child binary(4)

open C
fetch next from C into @Child
while @@fetch_status = 0 begin

set @b = @b + @Child
set @ct = @ct + 1


fetch next from C into @Child

if @ct = 2000 begin
exec HierarchySteve @b
set @b = 0x
set @ct = 0
end
end
exec HierarchySteve @b

delete from Hierarchy
from Integers
where I < datalength(@NodeIDs)/4
and NodeID = substring(@NodeIDs,4*I+1,4)

close C deallocate C
return (0)

This one's the cheat, but it's very fast:


create procedure HierarchySteveCheat (
@NodeIDs varbinary(8000)
) as

declare @N int
set @N = datalength(@NodeIDs) / 4
if @N = 0 return

set rowcount 2000

declare @b varbinary(8000)
select @b = 0x

while 1 = 1 begin
update Hierarchy set
@b = @b + cast(NodeID as binary(4))
from Hierarchy, Integers
where I < @N
and ParentID = substring(@NodeIDs,4*I+1,4)

if @@rowcount = 0 break

exec HierarchySteve @b
set @b = 0x
end

delete from Hierarchy
from Integers
where I < @N
and NodeID = substring(@NodeIDs,4*I+1,4)

return (0)


SK

Steve Kass

unread,
Feb 10, 2004, 2:49:10 AM2/10/04
to
Here's another fast recursive solution for you, with some slippery use
of temp table scoping:


create procedure HierarchySteve3 (
@NodeID int = null,
@x int = 0
) as

if @x = 0 begin
create table #(
uid int,
NodeID int,
primary key(uid, NodeID)
)
insert into #
select @x uid, @NodeID NodeID

end

declare @u int
set @u = @x + 1

insert into #(uid, NodeID)
select @u, Hierarchy.NodeID
from Hierarchy, #
where Hierarchy.ParentID = #.NodeID
and uid = @x

if @@rowcount = 0 return

exec HierarchySteve3 @x = @u

delete from Hierarchy
from #
where uid = @x
and #.NodeID = Hierarchy.NodeID

SK

Andrew John

unread,
Feb 10, 2004, 4:25:49 AM2/10/04
to
Hats off to the master

Regards
AJ

"Steve Kass" <sk...@drew.edu> wrote in message news:eGeH1p67...@TK2MSFTNGP10.phx.gbl...
