Help with ROW_NUMBER and recursive query

Farmer

unread,

Oct 5, 2009, 2:35:23 PM10/5/09

to

Hi.

Thank you for your help.

I am expecting N column to be numbered as 1, 2, 3, 4 in the result set.

It is not that way.

I suspect it due to Nested Loop operator but I would expect it to work on the whole set, not each row.

Can anyone see how to make ROW_NUMBER apply to the whole level set, considering such table structure?

DECLARE

@t TABLE (itmID int, itmIDComp int)

INSERT

@t VALUES (1,10), (2,10)

DECLARE

@t2 TABLE (itmID int, itmIDComp int)

INSERT

@t2 VALUES (3,10), (4,10)

;

WITH r AS

(

SELECT

t

.itmID as itmIDComp

, NULL as itmID

,cast(0 as bigint) as N

FROM (VALUES (1), (2), (3), (4)) as t (itmID)

UNION ALL

SELECT

i

.itmIDComp

,i.itmID

,ROW_NUMBER() OVER(PARTITION BY i.itmIDComp ORDER BY getdate()) N

FROM r

CROSS APPLY

(

SELECT *

FROM

(

SELECT

t

.itmIDComp

,t.itmID

FROM @t t

WHERE t.itmID = r.itmIDComp

UNION ALL

SELECT

t

.itmIDComp

,t.itmID

FROM @t2 t

WHERE t.itmID = r.itmIDComp

)v

) i

)

SELECT

*

FROM

r

--

;

WITH r AS

(

SELECT

t

.itmID as itmIDComp

, NULL as itmID

,cast(0 as bigint) as N

FROM (VALUES (1), (2), (3), (4)) as t (itmID)

UNION ALL

SELECT

i

.itmIDComp

,i.itmID

,i.N

FROM r

CROSS APPLY

(

SELECT *

,ROW_NUMBER() OVER(PARTITION BY v.itmIDComp ORDER BY getdate()) N

FROM

(

SELECT

t

.itmIDComp

,t.itmID

FROM @t t

WHERE t.itmID = r.itmIDComp

UNION ALL

SELECT

t

.itmIDComp

,t.itmID

FROM @t2 t

WHERE t.itmID = r.itmIDComp

)v

) i

)

SELECT

*

FROM

r

itmIDComp itmID N
1     NULL     0
2     NULL     0
3     NULL     0
4     NULL     0
10     4     1
10     3     1
10     2     1
10     1     1

--CELKO--

unread,

Oct 5, 2009, 2:53:23 PM10/5/09

to

"A problem well stated is a problem half solved." -- Charles F.
Kettering

Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. in
your schema are. If you know how, follow ISO-11179 data element naming
conventions and formatting rules. Temporal data should use ISO-8601
formats. Code should be in Standard SQL as much as possible and not
local dialect.

Sample data is also a good idea, along with clear specifications. It
is very hard to debug code when you do not let us see it. If you want
to learn how to ask a question on a Newsgroup, look at:
http://www.catb.org/~esr/faqs/smart-questions.html

It looks like you are trying to get the level numbers of a hierarchy
of some unknown kind of things. If so, use the Nested Sets model and
write this in one simple query. And get a copy of TREES & HIERARCHIES
IN SQL.

TheSQLGuru

unread,

Oct 5, 2009, 3:18:55 PM10/5/09

to

I don't have sql 2008 so cannot run your example code. But perhaps you just need an order by on your final select? Unless you have an explicit order by on output there is NO ordering in SQL Server and any you see is just because, not BECAUSE. :-)

--
Kevin G. Boles
Indicium Resources, Inc.
SQL Server MVP
kgboles a earthlink dt net

"Farmer" <som...@somewhere.com> wrote in message news:DB892043-8C8A-48FD...@microsoft.com...

Farmer

unread,

Oct 5, 2009, 3:27:30 PM10/5/09

to

I need this record set numbering at each iteration level. I may have 10 levels for example.

For the sake of this issue I don't care about the order in which ROW_NUMBER sequences rows, therefore getdate() is used (SELECT 1) can also be used.

I expect ROW_NUMBER() to sequence rows in any order.

Result set I expect is like this

itmIDComp itmID N
1     NULL     0
2     NULL     0
3     NULL     0
4     NULL     0

10     4     4
10     3     3
10     2     2
10     1     1

this will do as well

itmIDComp itmID N
1     NULL     0
2     NULL     0
3     NULL     0
4     NULL     0
10     4     1

10     3     2
10     2     3
10     1     4

"TheSQLGuru" <kgb...@earthlink.net> wrote in message news:99ednTpkGaVn2FfX...@earthlink.com...

--CELKO--

unread,

Oct 5, 2009, 3:41:55 PM10/5/09

to

>> I need this record set numbering at each iteration level. I may have 10 levels for example. For the sake of this issue I don't care about the order in which ROW_NUMBER sequences rows, therefore CURRENT_DATE [sic: getdate() is proprietary and needlessly so] is used (SELECT 1) can also be used. I expect ROW_NUMBER() to sequence rows in any order. <<

AGAIN, please post DDL and specs that we can code from. This posting
gives us nothing. Hell, we don't even know that name of the tables!

This code is totally confused -- nesting, ordering by a function call,
recursion, etc.

Farmer

unread,

Oct 5, 2009, 3:54:03 PM10/5/09

to

Joe

All DLL needed for someone to understand what I am trying to do is posted.
this code runs on SQL 2008. With very minor change it can run on SQL 2005.
As you well know, SQL is not about any names of tables but about sets. What
my real table names are is irrelevant. I don't think you need any fuel for
ranting! I donn't want you to go off ranting about Employee versus Personnel
names; you know what I mean? :)

I have 3 sets, @t and @t2 and ancor set. I posted the result set I expect.

"--CELKO--" <jcel...@earthlink.net> wrote in message
news:fc19798e-7381-4700...@k41g2000vbt.googlegroups.com...

Farmer

unread,

Oct 5, 2009, 4:55:17 PM10/5/09

to

Here is slight adjustment that should work on SQL 2005.

Also, should row number not act on vw data set as a whole?

it looks more and more like a bug to me.

DECLARE

@t TABLE (itmID int, itmIDComp int)

INSERT

@t VALUES (1,10), (2,10)

DECLARE

@t2 TABLE (itmID int, itmIDComp int)

INSERT

@t2 VALUES (3,10), (4,10)

;

WITH vw AS

(

SELECT

t

.itmIDComp

,t.itmID

FROM @t t

UNION ALL

SELECT

t

.itmIDComp

,t.itmID

FROM @t2 t

)

,

r AS

(

SELECT

t

.itmID as itmIDComp

, NULL as itmID

,cast(0 as bigint) as N

,1 as Lvl

FROM (SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4) as t (itmID)

UNION ALL

SELECT

t

.itmIDComp

,t.itmID

,ROW_NUMBER() OVER(PARTITION BY t.itmIDComp ORDER BY (SELECT 1)) N

,Lvl + 1

FROM r

JOIN vw t ON t.itmID = r.itmIDComp

)

SELECT

*

FROM

r

"Farmer" <som...@somewhere.com> wrote in message news:DB892043-8C8A-48FD...@microsoft.com...

Plamen Ratchev

unread,

Oct 5, 2009, 6:13:16 PM10/5/09

to

This does look like a bug. Logically the ranking functions are evaluated at SELECT for the intermediate set.

See if this produces the results you need:

;WITH r AS (
SELECT t.itmID AS itmIDComp
, NULL AS itmID
, CAST(0 AS BIGINT) AS n
FROM (VALUES (1), (2), (3), (4)) AS t (itmID)
UNION ALL
SELECT i.itmIDComp
,i.itmID
,n + 1
FROM r
CROSS APPLY (SELECT itmIDComp, itmID
FROM (SELECT t.itmIDComp
,t.itmID
FROM @t AS t

WHERE t.itmID = r.itmIDComp
UNION ALL
SELECT t.itmIDComp
,t.itmID

FROM @t2 AS t
WHERE t.itmID = r.itmIDComp) AS v
) AS i
)
SELECT itmIDComp, itmID,
CASE WHEN n > 0 THEN
ROW_NUMBER() OVER(PARTITION BY n ORDER BY (SELECT 1))
ELSE n
END AS rk
FROM r;

--
Plamen Ratchev
http://www.SQLStudio.com

TheSQLGuru

unread,

Oct 5, 2009, 6:15:54 PM10/5/09

to

Actually, here is the inserts you need at the top to work with 2005;

DECLARE

@t TABLE (itmID int, itmIDComp int)

INSERT

@t VALUES (1,10)

INSERT

@t VALUES (2,10)

DECLARE

@t2 TABLE (itmID int, itmIDComp int)

INSERT

@t2 VALUES (3,10)

INSERT

@t2 VALUES (4,10)

Also, I am unclear on the exact output you expect. Is the table you post in your OP the expected output?

Oh, I also point out that you PARTITION BY itmIDComp, which for EVERY row in vw is 10, thus you get only 1 value out of the ROW_NUMBER clause.

--
Kevin G. Boles
Indicium Resources, Inc.
SQL Server MVP
kgboles a earthlink dt net

"Farmer" <som...@somewhere.com> wrote in message news:29C5C189-8D59-4A2A...@microsoft.com...

Plamen Ratchev

unread,

Oct 5, 2009, 6:18:20 PM10/5/09

to

TheSQLGuru wrote:

> Oh, I also point out that you PARTITION BY itmIDComp, which for EVERY
> row in vw is 10, thus you get only 1 value out of the ROW_NUMBER clause.
>

Not exactly... :)

SELECT x, ROW_NUMBER() OVER(PARTITION BY x ORDER BY (SELECT 1)) AS rk
FROM (SELECT 10 UNION ALL
SELECT 10 UNION ALL
SELECT 10 UNION ALL
SELECT 10 UNION ALL
SELECT 10) AS T(x);

/*

x rk
----------- --------------------
10 1
10 2
10 3
10 4
10 5

*/

TheSQLGuru

unread,

Oct 5, 2009, 6:26:10 PM10/5/09

to

sheesh - not having a good day here! :-))

--
Kevin G. Boles
Indicium Resources, Inc.
SQL Server MVP
kgboles a earthlink dt net

"Plamen Ratchev" <Pla...@SQLStudio.com> wrote in message
news:uvWdnQsTq-Ux7VfX...@speakeasy.net...

Farmer

unread,

Oct 5, 2009, 7:05:06 PM10/5/09

to

Plamen,

You see the problem, don't you! :)

I understand what you are suggesting and I clearly see this as intermediate
solution, just for two levels, but in my case I need results of ROW_NUMBER
from the current iteration to be fed into the next recursion.
In other words values I get at the current level will be parent values for
the following levels.

The problem I am working on is to turn BOM structure into directed tree.
In the nutshell, schema like this

dbo.Items dbo.ItemComponents
(itmID) itmID, itmIDComp , itmcQty

turns into this, which is directed tree. I have this working but in
non-elegant way. I tried to re-write with SQL recursion and ran into this
SQL bug, as it looks to me.

"Plamen Ratchev" <Pla...@SQLStudio.com> wrote in message

news:uvWdnQgTq-Xm8lfX...@speakeasy.net...

--CELKO--

unread,

Oct 5, 2009, 7:19:21 PM10/5/09

to

There are many ways to represent a tree or hierarchy in SQL. This is
called an adjacency list model and it looks like this:

CREATE TABLE OrgChart
(emp_name CHAR(10) NOT NULL PRIMARY KEY,
boss_emp_name CHAR(10) REFERENCES OrgChart(emp_name),
<< horrible cycle constraints >>);

OrgChart
emp_name boss_emp_name
==============================
'Albert' NULL
'Bert' 'Albert'
'Chuck' 'Albert'
'Donna' 'Chuck'
'Eddie' 'Chuck'
'Fred' 'Chuck' 600.00

This approach will wind up with really ugly code -- CTEs hiding
recursive procedures, horrible cycle prevention code, etc. This
matches the way we did it in old file systems with pointer chains.
Non-RDBMS programmers are comfortable with it because it looks
familiar -- it looks like records and not rows.

Another way of representing trees is to show them as nested sets.

Since SQL is a set oriented language, this is a better model than the
adjacency list approach. Let us define a simple OrgChart table like
this.

CREATE TABLE OrgChart
(emp_name CHAR(10) NOT NULL PRIMARY KEY,
lft INTEGER NOT NULL UNIQUE CHECK (lft > 0),
rgt INTEGER NOT NULL UNIQUE CHECK (rgt > 1),
CONSTRAINT order_okay CHECK (lft < rgt),
<<cycle constraint>> );

OrgChart
emp_name lft rgt
======================
'Albert' 1 12
'Bert' 2 3
'Chuck' 4 11
'Donna' 5 6
'Eddie' 7 8
'Fred' 9 10

The (lft, rgt) pairs are like tags in a mark-up language, or parens in
algebra, BEGIN-END blocks in Algol-family programming languages, etc.
-- they bracket a sub-set. This is a set-oriented approach to trees
in a set-oriented language.

The organizational chart would look like this as a directed graph:

Albert (1, 12)
/ \
/ \
Bert (2, 3) Chuck (4, 11)
/ | \
/ | \
/ | \
/ | \
Donna (5, 6) Eddie (7, 8) Fred (9, 10)

The adjacency list table is denormalized in several ways. The
boss_emp_name and employee columns are the same kind of thing (i.e.
identifiers of personnel), and therefore should be shown in only one
column in a normalized base table. To prove that this is not
normalized, assume that "Chuck" changes his name to "Charles"; you
have to change his name in both columns and several places. The
defining characteristic of a normalized table is that you have one
fact, one place, one time. In short this is a relationship which
needs a base table of Personnel.

The final problem is that the adjacency list model does not model
subordination. It is a graph and not a hierarchy. Inheritance flows
downhill in a hierarchy, but If I fire Chuck, I disconnect all of his
subordinates from Albert. There are situations (i.e. water pipes)
where this is true, but that is not the expected situation in this
case.

The tree structure can be kept in one table and all the information
about a node can be put in a second table and they can be joined on
employee number for queries.

The nested sets model has some predictable results that we can use for
building queries. The root is always (left = 1, right = 2 * (SELECT
COUNT(*) FROM TreeTable)); leaf nodes always have (left + 1 = right);
subtrees are defined by the BETWEEN predicate; etc. Here are two
common queries which can be used to build others:

1. An employee and all their Supervisors, no matter how deep the tree.

SELECT O2.*
FROM OrgChart AS O1, OrgChart AS O2
WHERE O1.lft BETWEEN O2.lft AND O2.rgt
AND O1.emp_name = :myemployee;

2. The employee and all their subordinates. There is a nice symmetry
here.

SELECT O1.*
FROM OrgChart AS O1, OrgChart AS O2
WHERE O1.lft BETWEEN O2.lft AND O2.rgt
AND O2.emp_name = :myemployee;

3. Add a GROUP BY and aggregate functions to these basic queries and
you have hierarchical reports. For example, the total salaries which
each employee controls:

SELECT O2.emp_name, SUM(S1.salary_amt)
FROM OrgChart AS O1, OrgChart AS O2,
Salaries AS S1
WHERE O1.lft BETWEEN O2.lft AND O2.rgt
AND O1.emp_name = S1.emp_name
GROUP BY O2.emp_name;

4.The nested set model has an implied ordering of siblings which the
adjacency list model does not. This is often important in overachieves
where the eldest child is promoted in the parent's place, the company
has a “last hired, first fired” rule, etc.

Since you will not state your specs or post DDL, I will guess that
what you are after is a level number in a hierarchical entity in your
schema. Here is a query in nested sets that
a. Will port to any SQL. No proprietary “features” in it.
b. Will run 2 to 3 orders of magnitude faster than your attempt.
Read that again so you understand it. Magnitude, not times.
c. Fits in less than ten lines of code, so people can maintain it.
Unlike your monster.

5. To find the level of each emp_name, so you can print the tree as an
indented listing.

SELECT T1.node,
SUM(CASE WHEN T2.lft <= T1.lft THEN 1 ELSE 0 END
+ CASE WHEN T2.rgt < T1.lft THEN -1 ELSE 0 END) AS lvl
FROM Tree AS T1, Tree AS T2
WHERE T2.lft <= T1.lft
GROUP BY T1.node;

An untested version of this using full OLAP functions might be better
able to use the ordering. This will not run in SQL Server because it
still lacks the RANGE subclause

SELECT T1.node,
SUM(CASE WHEN T2.lft <= T1.lft THEN 1 ELSE 0 END
+ CASE WHEN T2.rgt < T1.lft THEN -1 ELSE 0 END)
OVER (ORDER BY T1.lft
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS
lvl
FROM Tree AS T1, Tree AS T2
WHERE T2.lft <= T1.lft;

Since I have been teaching SQL for a few decades, I need to know WHY
people write bad code. Most errors are not random; they are the
result of a bad mental model. Very often the bad model can be made to
work, but at great expense.

In 150 AD, Ptolemy of Alexandria published his theory of epicycles--
the idea that the moon, the sun and the planets moved in circles which
were moving in circles which were moving in circles around the Earth.
This theory explained the motion of celestial objects to an
astonishing degree of precision. It was, however, what computer
programmers call a kludge: a dirty, inelegant solution. Some 1,500
years later, Johannes Kepler, a German astronomer, replaced the whole
complex edifice with three simple laws.

If my guess is right,m WHY did you come up with such a screaming
nightmare? Let me go into teacher/doctor mode:

Since you used CROSS APPLY, my guess is that you think in terms of
spreadsheets instead of tables. Their rows and columns are
interchangeable. Sequential numbering is important because of the co-
ordinate system, hence the assumption that you must use ROW_NUMBER().
The name of the spreadsheet is not part of the language like a table
name is in SQL, so you think is not important. At most it is a level
number in a 3D mental model. Spreadsheets are declarative, but
computational. That leads to the same “formula” being replicated in
your query.

Is my diagnosis right? Yes? No? (with an explanation of your mindset)
or "Duh - I never considered my mindset" ?

Plamen Ratchev

unread,

Oct 5, 2009, 10:27:13 PM10/5/09

to

Yes, I do see the problem. :)

Initially I thought it was because of the APPLY operator and tried using a join:

;WITH r AS (
SELECT t.itmID AS itmIDComp
,NULL AS itmID
,CAST(0 AS BIGINT) AS n

,0 AS lvl

FROM (VALUES (1), (2), (3), (4)) AS t (itmID)
UNION ALL

SELECT v.itmIDComp
,v.itmID
,ROW_NUMBER() OVER(PARTITION BY v.itmIDComp ORDER BY (SELECT 1))
,lvl + 1
FROM r
JOIN (SELECT itmIDComp, itmID
FROM @t
UNION ALL
SELECT itmIDComp, itmID
FROM @t2) AS v
ON r.itmIDComp = v.itmID
)
SELECT itmIDComp, itmID, n, lvl
FROM r;

But same results.

Then a very simple recursive query works just fine:

;WITH Foo AS (
SELECT node, parent FROM (VALUES(1, NULL), (2, NULL), (3, 1), (4, 1), (5, 1), (6, 5), (7, 5)) AS T(node, parent)),
CTE AS (
SELECT node, parent, CAST(0 AS BIGINT) AS n,
0 AS lvl
FROM Foo
WHERE parent IS NULL
UNION ALL
SELECT F.node, F.parent,
ROW_NUMBER() OVER(PARTITION BY F.parent ORDER BY (SELECT F.node)),
lvl + 1
FROM Foo AS F
JOIN CTE AS C
ON F.parent = C.node)
SELECT node, parent, n, lvl
FROM CTE
ORDER BY lvl, n;

/*

node parent n lvl
----------- ----------- -------------------- -----------
1 NULL 0 0
2 NULL 0 0
3 1 1 1
4 1 2 1
5 1 3 1
6 5 1 2
7 5 2 2

*/

To me this proves that the behavior is a bug. You should submit a connect feedback
(https://connect.microsoft.com/SQLServer/Feedback) and post back here the link so we can vote.

Tony Rogerson

unread,

Oct 6, 2009, 12:33:11 AM10/6/09

to

Stop trying to force nested sets - it doesn't scale anywhere near the other
solutions we have now built into the product.

You know that - you've been told and shown previously.

--ROGGIE--

"--CELKO--" <jcel...@earthlink.net> wrote in message

news:b9b3ecfc-5889-46ba...@p36g2000vbn.googlegroups.com...

> has a ï¿½last hired, first firedï¿½ rule, etc.

>
> Since you will not state your specs or post DDL, I will guess that
> what you are after is a level number in a hierarchical entity in your
> schema. Here is a query in nested sets that

> a. Will port to any SQL. No proprietary ï¿½featuresï¿½ in it.

> computational. That leads to the same ï¿½formulaï¿½ being replicated in

Farmer

unread,

Oct 6, 2009, 9:29:42 AM10/6/09

to

Plamen,

I went to connect website. When I search for anything, like row_number I
don't have this link under filter button"Submit it yourself"and I am logged
in for sure, as welcome message is my full name.

I wonder what is the problem. I did submit couple of bugs in the past.

Do you wan to have honors and submit yourself if it works for you?

thanks
vladimir

"Plamen Ratchev" <Pla...@SQLStudio.com> wrote in message

news:uvWdnQUTq-WdNlfX...@speakeasy.net...

Linchi Shea

unread,

Oct 6, 2009, 1:27:01 PM10/6/09

to

Does this explain it?

https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=127042&wa=wsignin1.0#details

Linchi

Plamen Ratchev

unread,

Oct 6, 2009, 1:48:35 PM10/6/09

to

I was just going to post the same, Linchi already posted the link on the private newsgroups...

Doesn't look good, "Closed as External".

Farmer

unread,

Oct 6, 2009, 5:13:40 PM10/6/09

to

Thank you Linchi.
Yes, this is the exact same issue.
It does explain the current behaviour but it does not make it right. On my
account, it is simply a bug. As Plamen already pointed, ROW_NUMBER logically
is at SELECT level and therefore, the whole iteration level result should be
considered as a set.

What does it mean "closed as external"? Can MS change their mind, ever? :)

"Linchi Shea" <Linch...@discussions.microsoft.com> wrote in message
news:8E01271C-1E61-47E0...@microsoft.com...

Linchi Shea

unread,

Oct 6, 2009, 5:39:01 PM10/6/09

to

> It does explain the current behaviour but it does not make it right.

Then, you should push it. If enough people push it, maybe there'll be some
traction.

Linchi

Farmer

unread,

Oct 6, 2009, 6:24:59 PM10/6/09

to

And I did. Vote please.

https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=496271

"Linchi Shea" <Linch...@discussions.microsoft.com> wrote in message

news:8CED2804-F896-4C9D...@microsoft.com...

Plamen Ratchev

unread,

Oct 6, 2009, 7:38:57 PM10/6/09

to

Voted.

--CELKO--

unread,

Oct 6, 2009, 8:25:39 PM10/6/09

to

I thought it might be a good idea to look at the actual ANSI SQL-2003
Standard and what it says.

First big difference is the Standards requires a WITH RECURSIVE
preface to warn the compiler that this is not like a factored common
table expression/ VIEW / derived table. This is more than syntax
sugar.

I tried to read the 2003 Standard, but it will takes days that I don't
have to spend without paid to get thru it. Hey, guys, I have over a
decade of learning to read “ANSI-speak” inside the committee and I
think this is complex. This is one reason of many a lot of us did
not want recursion in SQL.

There is the concept of being “expandable” if all of the following are
true:
1) WITH clause table constructor is declared recursive.
2) WITH clause table constructor is linearly recursive.

Not all recursive functions are linear. For fun, try Ackermann's
Function (http://en.wikipedia.org/wiki/Ackermann_function); you can
break your machine with a few lines of very simple looking recursive
code.

A brief reading leads me to think that the RECURSIVE CTE must first
complete the fix point result set before it looks at the next right
hand of the UNION ALL. Therefore, the all functions including the
ordinal functions are done for that query, like a derived table. Then
all functions are done for each right hand side are computed as
completed sets which are not re-evaluated as the cycle of UNIONs adds
to them. The skeleton is :

((( .. <fix point query> UNION ALL <query #1>) UNION ALL <query
#2>) .. UNION ALL <query #n>)

Where no <query #i> is empty. This is part of being linear; it is
actually defined in terms of a graph. Then there are rules about
about UNION, EXCEPT, and INTERSECT to prevent endless recursion.
DISTINCT has some rules, etc.

Again, I am not sure, but I think that MS is giving the proper
behavior. But with the non-standard, non-relational APPLY and PIVOT
crap all bets are off and they can do anything.

Plamen Ratchev

unread,

Oct 6, 2009, 9:19:19 PM10/6/09

to

The behavior that SQL Server shows is inconsistent. It treats the recursive intermediate result set as single set when
no join is involved (and ranking is correct), and as individual rows (not set) when a join is involved (resulting in
ranking each row of the intermediate set as rank 1). If you read the connect item appears it applies ROW_NUMBER for each
row at the join phase, rather than over the SELECT set which is result of the join.

m

unread,

Oct 6, 2009, 10:36:07 PM10/6/09

to

Well, that's a nice picture, but how do you update the values of lft & rgt
when there are n millions of rows in the table? The model also breaks down
when multiple concurrent updates are being processed as the worst case
update will need to modify n -1 rows (essentially the whole table) and will
most efficiently execute when holding the table lock. This model does work
reasonably well for queries however.

Also remember that the difference between Ptolemy and Kepler is only in
their point of reference (datum) and neither is more true than the other -
and both generally acknowledged as false by modern physics.

"--CELKO--" <jcel...@earthlink.net> wrote in message

news:b9b3ecfc-5889-46ba...@p36g2000vbn.googlegroups.com...

> has a ï¿½last hired, first firedï¿½ rule, etc.

>
> Since you will not state your specs or post DDL, I will guess that
> what you are after is a level number in a hierarchical entity in your
> schema. Here is a query in nested sets that

> a. Will port to any SQL. No proprietary ï¿½featuresï¿½ in it.

> computational. That leads to the same ï¿½formulaï¿½ being replicated in

Linchi Shea

unread,

Oct 6, 2009, 11:36:02 PM10/6/09

to

> but I think that MS is giving the proper behavior.

It's hard to swallow that the MS implementation is a proper behavior.
Instead of applying row_number() to each of the recursive resultsets, which I
think is what people would expect, it applies the function to each row and
restart for every row. Even if it's 'proper', it's just odd. What kind of
problem can it help solve?

Linchi

Tony Rogerson

unread,

Oct 7, 2009, 7:52:24 AM10/7/09

to

> I tried to read the 2003 Standard, but it will takes days that I don't
> have to spend without paid to get thru it. Hey, guys, I have over a

Ahh - finally, an admission of your attitude; you only research if you get
paid.

I would have expected somebody who pushes the ANSI SQL standard so much to
be on top of the standard like a number of the SQL MVP's are and not just
the ones they where personally involved with - is it no wonder you don't
want to move forward and remain in the 20 year past.

No wonder you get so many things wrong in this MICROSOFT SQL SERVER forum.

Most folk on here research the problem from a willingness to expand their
knowledge and ability.

--ROGGIE--

--CELKO--

unread,

Oct 7, 2009, 1:58:42 PM10/7/09

to

>> It's hard to swallow that the MS implementation is a proper behavior. <<

Again, that is a first read on my part. The part of the standard on
this stuff is well over 1000 pages of Standard-speak with many new
features that have to be considered. If you think legal opinions are
complicated, try recursive function theory, RDBMS and ANSI/ISO
terminology. Advising SQL compiler & optimizer programmers is part of
how I earned a living for the past few decades, and I am not sure with
just one day of reading.

Consider the skeleton:

((( .. <fix point query> UNION ALL <query #1>) UNION ALL <query
#2>) .. UNION ALL <query #n>)

The fixed point query has to be completed, I am pretty sure of that
part. But when you go to evaluate <query #1> , how is the union
done? A normal union
(<query #1> UNION [ALL] <query #2>) would complete the queries then do
the UNION on completed sets. That means that an ordinal function in
<query #1> would be local, and an ordinal function in <query #2>
would be local. So we would get duplicates values in the final
result.

HOWEVER, this is not a normal union. There prohibitions about EXCEPT
and INTERSECT, DISTINCT and many other things that might appear in
<query #n>.

I think it is reasonable to read this as do the recursion on each row
of <query #n> in parallel to each row in the fixed point. That means
an ordinal function in <query #n> is being evaluated on a subset of
rows, which can be as small as one row (this particular case).

But, again, with the non-standard, non-relational APPLY and PIVOT crap

all bets are off and they can do anything.

I will get to this when I finish commenting on Tom Johnston's book
manuscript on temporal SQL, get my slides ready for PASS, for the
Orlando SQL Saturday, for DevConnection in Las Vegas, check out my
newest book, write chapters for SQL FOR SMARTIES 4-th edition, do
consulting work for an aircraft industry machine shop, keep myself up
to date on the Voter Registration DB project for January (when the
2010 Census changes ethnicity code and Texas re-districts the state),
find the next paying gig and walk the dogs sometime -- they look very
desperate :) -- I will try to get back to it.

Farmer

unread,

Oct 7, 2009, 4:27:34 PM10/7/09

to

Joe,

Here is code for you that has no CROSS APPLY. I don't think anything about
this particular code is non ANSI.
I used proper ORDER BY too, no function calls.
Question: What result set is proper outcome of this code? simply state 1 or
2.

1.

itmIDComp itmID N Lvl
1 NULL 0 1
2 NULL 0 1
3 NULL 0 1
4 NULL 0 1
10 4 1 2
10 3 1 2
10 2 1 2
10 1 1 2

2.
itmIDComp itmID N Lvl
1 NULL 0 1
2 NULL 0 1
3 NULL 0 1
4 NULL 0 1
10 4 4 2
10 3 3 2
10 2 2 2
10 1 1 2

Code

DECLARE @t TABLE (itmID int, itmIDComp int)
INSERT @t VALUES (1,10), (2,10)
DECLARE @t2 TABLE (itmID int, itmIDComp int)
INSERT @t2 VALUES (3,10), (4,10)
;WITH vw AS
(
SELECT
t.itmIDComp
,t.itmID
FROM @t t

UNION ALL
SELECT
t.itmIDComp
,t.itmID

FROM @t2 t
)
,r AS
(
SELECT
t.itmID as itmIDComp

, NULL as itmID
,cast(0 as bigint) as N
,1 as Lvl
FROM (SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4) as

t (itmID)

UNION ALL

SELECT
t.itmIDComp
,t.itmID
,ROW_NUMBER() OVER(PARTITION BY t.itmIDComp ORDER BY t.itmIDComp, t.itmID) N
,Lvl + 1
FROM r

JOIN vw t ON t.itmID = r.itmIDComp

)
SELECT *
FROM r

"--CELKO--" <jcel...@earthlink.net> wrote in message
news:3cfeb2e1-f79d-44ea...@f10g2000vbf.googlegroups.com...

--CELKO--

unread,

Oct 7, 2009, 7:30:15 PM10/7/09

to

Actually, you were not that close to ANSI/ISO, nor ISO-11179 names.
The role goes in front of the base data element name, table
declarations are dialect, The correct syntax is INSERT INTO, etc.
Here is a non-dialect clean up of your DDL and DML, which should run
in current T-SQL. But it has problems

That semi-colon in front of the WITH is just the beginning; T-SQL will
have to require semi-colons at the end of ALL statements if they want
to get up to standards. People used to make fun of me in SQL Server
newsgroups for always using it. Now all they have is my bald head
and black suits :)

CREATE TABLE FixPoint
(item_id INTEGER NOT NULL PRIMARY KEY, - - wild guess
comp_item_id INTEGER NOT NULL,
PRIMARY KEY (item_id, comp_item_id));

INSERT INTO VALUES (1,10), (2,10);

CREATE TABLE RecurseStep
(item_id INTEGER NOT NULL PRIMARY KEY, - - wild guess
comp_item_id INTEGER NOT NULL,
PRIMARY KEY (item_id, comp_item_id));

INSERT INTO RecurseStep VALUES (3,10), (4,10);

Let's ignore that this crap is attribute splitting and the DB designer
should burn in hell.

Now the query. I hate “VW” as VIEW because it violates the principle
that VIEWs are also tables and need valid names – not a description of
how they are created. So let's use the most common meaning of those
two letters for the test:

WITH RECURSION Volkswagen (comp_item_id, item_id)
AS
(SELECT comp_item_id, item_id
FROM FixPoint
UNION ALL
SELECT comp_item_id, item_id
FROM RecurseStep),

/* VolksWagen is never used after it is declared. Unh? Let's
continue */

R (comp_item_id, item_id, n, lvl)
AS
(SELECT item_id, CAST (NULL AS INTEGER), CAST(0 AS INTEGER), 1
FROM (VALUES (1), (2), (3), (4)) AS T (item_id)
UNION ALL
SELECT T.comp_item_id, T.item_id,
ROW_NUMBER()
OVER(PARTITION BY T.comp_item_id
ORDER BY T.comp_item_id, T.item_id) AS N,
lvl + 1
FROM R, T
WHERE T.item_id = R.comp_item_id)

SELECT * FROM R; – - select * in production code!!

Now using my SQL:2003 parser, this will NOT work the “WITH RECURSION
Volkswagen ..” and does work with a simple “WITH Volkswagen ..”. Now
we have problems because R and T are on the same level so there is an
issue of LATERAL tables. SQL Server does not have the LATERAL
construct. And this is one of the parts that is a screaming bitch to
read and understand in SQL:2003 for me.

In short, you have written something that could run (AT GREAT COST!!)
in DB2 and which is confusing to the MS-SQL compiler. You are not the
Lone Ranger. Many of us on ANSI X3H2 did not want recursion in SQL.
The members with a business background saw no need for it (COBOL, et
al) and those of us who had written compilers or had a math degree
were afraid of it.

Phil Shaw of IBM presented a proposal that had syntax for depth first
or breath first searching and EVERYTHING that you had in your Data
Structures class on Trees in one multi-clause statement.

This was a major reason I came up with Nested Sets while on ANSI
X3H2.

OT: you put commas at the start of a line and KEYWORDs on single
lines. That is punch card programmer style -- think about it. This
adds 10-12% more time to maintaining code and cost $$$.

lee

unread,

Oct 8, 2009, 10:35:17 AM10/8/09

to

On Oct 7, 7:30 pm, --CELKO-- <jcelko...@earthlink.net> wrote:

> OT: you put commas at the start of a line and KEYWORDs on single
> lines. That is punch card programmer style -- think about it. This
> adds 10-12% more time to maintaining code and cost $$$.

In his defense leading commas & single-line keywords can save time.

The reason is when refactoring a SQL statement the various elements
can be repositioned & commented out more readily when led by comma &
logical operators (AND, OR...).

Select
fld1
, fld2
-- , fld3
from
table1 t1 join table2 t2
on t1.crit1 = t2.crit2
-- and t2.crit3 = <expr>
and t3.crit4 = <expr>

I can readily comment out the middle join filter & the last select
column w/out breaking code (digging for the trailing comma when
commenting out the last field before the FROM clause).

Also the commented-out columns can remain there for later maintenance
& inspection of the output.

I see it as a matter of work style.

And if everyone did it my way we'd all be more efficient. :-)

/l

--CELKO--

unread,

Oct 8, 2009, 6:00:38 PM10/8/09

to

>> In his defense leading commas & single-line keywords can save time. The reason is when refactoring a SQL statement the various elements can be repositioned & commented out more readily when led by comma & logical operators (AND, OR...). <<

Yes, I remember that from the punch card days. We also kept decks of
cards with leading keywords to save time at the keypunch machines; the
deck and the printout looked nice and thick, so the boss thought you
were really working hard :)

But when we were researching formatting at AIRMICS, we found this and
many other "punch card mindset" practices (uppercase program text, no
indentations, etc.) to be measurably harmful for maintaining code.

The usual test was to hand an experienced programmer a program with a
known bug and time how long it took him to find it. That gave us a
baseline. We then used a "pretty printer" on existing code for DoD
and compared the maintenance history it had before it was re-
formatted. Programs without re-formatting were the control group.
That is where we saw the 8-12% drop in effort.

I had the advantages of being a typesetter in the 1970's and knew some
of the basics from all the decades of research that had gone into
books and newspapers.

Today, you can use a text editor with a few macros to clean up your
code. One of my favorites is a simple routine to flip a search
condition around (A = B becomes B = A, etc.) so I can put all of the
search conditions for one table on the left side to make it easier to
read. And a line sort is handy.

Today, you assume that code will be properly indented. We did not do
that with punch cards. With indentation you can see a mis-placed ELSE
clause immediately; try to find one when the text is flush left.

lee

unread,

Oct 8, 2009, 6:57:07 PM10/8/09

to

On Oct 8, 6:00 pm, --CELKO-- <jcelko...@earthlink.net> wrote:
> >> In his defense leading commas & single-line keywords can save time. The reason is when refactoring a SQL statement the various elements can be repositioned & commented out more readily when led by comma & logical operators (AND, OR...). <<
>
> Yes, I remember that from the punch card days. We also kept decks of
> cards with leading keywords to save time at the keypunch machines; the
> deck and the printout looked nice and thick, so the boss thought you
> were really working hard :)
>
> But when we were researching formatting at AIRMICS, we found this and
> many other "punch card mindset" practices (uppercase program text, no
> indentations, etc.) to be measurably harmful for maintaining code.
>
> The usual test was to hand an experienced programmer a program with a
> known bug and time how long it took him to find it. That gave us a
> baseline. We then used a "pretty printer" on existing code for DoD
> and compared the maintenance history it had before it was re-
> formatted. Programs without re-formatting were the control group.
> That is where we saw the 8-12% drop in effort.
>
> I had the advantages of being a typesetter in the 1970's and knew some
> of the basics from all the decades of research that had gone into
> books and newspapers.
>
> Today, you can use a text editor with a few macros to clean up your
> code. One of my favorites is a simple routine to flip a search
> condition around (A = B becomes B = A, etc.) so I can put all of the
> search conditions for one table on the left side to make it easier to
> read. And a line sort is handy.

Which assumes I have my nice editor handy ...

> Today, you assume that code will be properly indented. We did not do
> that with punch cards. With indentation you can see a mis-placed ELSE
> clause immediately; try to find one when the text is flush left.

I still see SQL script maint. as a separate matter from structured
code.

A SQL "wall of text" is absolutely no good for maintenance either. For
instance, if I have "look ahead" correlated subqueries or CASEs in the
SELECT clause.

Comma- & logical operator-led lines has saved me time, otherwise I
have to mouse &/or keybd about. I can't rely on having my fav text
editor. If the SQL bloats from too much vertical orientation then yes,
I see there's a problem in maintainability, but then it's probably
verging on huge already....

As for structured code, as with LINQ, Hib & Rails there's another
matter, using WITH statements in the structured code it looks a bit
more like FoxPro / xBase (altho still can't get completely away from
the var = (backcast)object.method().method garbage ).

/lee

--CELKO--

unread,

Oct 8, 2009, 8:15:59 PM10/8/09

to

>> I still see SQL script maintenance as a separate matter from structured code. <<

Having seen the research for a few years, I disagree. Visual
Presentation can be a HUGE help. Roman Numeral versus Hindu-Arabic?
Decimal fractions versus Egyptian?

>> A SQL "wall of text" is absolutely no good for maintenance either. For instance, if I have "look ahead" correlated subqueries or CASEs in the SELECT clause. <<

Unh? a correlated subquery can only reference things that contain it
within their scope. Now, LATERAL queries (which MS-SQL does not have)
can be a real screaming bitch to read ..

Indentation and "rivers" in the text let me group the subquery
structure very quickly. Have you looked at my SQL PROGRAMMING STYLE?
I go into some of the principles in my usual pedantic boring way :)