Need OLAP-type Ordering

esmith2112

unread,

Nov 23, 2009, 2:33:53 PM11/23/09

to

Given sorted data that looks like this:

COL1 COL2 COL3
----------- ----------- ----------
1 2 11/16/2009
1 2 11/17/2009
2 300 11/18/2009
1 300 11/19/2009
1 2 11/20/2009

I would like to find an OLAP-ordering or grouping function that would
group things into new grouping each time the value of COL1 changed.
The resultant output would look like this:

COL1 COL2 COL3 GROUP_ID
----------- ----------- ---------- ----------------
1 2 11/16/2009 1
1 2 11/17/2009 1
23 300 11/18/2009 2
1 300 11/19/2009 3
1 2 11/20/2009 3

Is this possible?

Running DB2 9.5 AIX 5.3

Thanks,
Evan

jefftyzzer

unread,

Nov 23, 2009, 3:16:48 PM11/23/09

to

Evan,

What are the grouping columns? Also, are you looking for a trigger-
based solution, or just a query that would merely reflect the current
data?

--Jeff

esmith2112

unread,

Nov 23, 2009, 4:16:16 PM11/23/09

to

> Evan,
>
> What are the grouping columns? Also, are you looking for a trigger-
> based solution, or just a query that would merely reflect the current
> data?
>
> --Jeff

The data from the first listing above is the output from another
complicated business logic query it's been sorted by all three
columns. The second listing (the 23 is actually a typo--should have
been 2) is what is what my target results would ideally be. The only
thing I have to go on is the value from COL1.

Humanly I can see that rows one and two share the same value. They
belong in the first group. The value row three is different from the
previous two rows. Therefore it would be next logical grouping. Rows
three and four share the same value and represent a change from row
three. However, they do share the same value as rows one and two, but
should NOT be grouped with those rows.

I know I could write procedural code to iterate through the result
set, summarizing when I see the row change. I was just hoping that
there might be a construct in SQL using the OLAP functions that might
be able to take the pain and need for external programming away from
the equation.

Evan

jefftyzzer

unread,

Nov 23, 2009, 5:10:30 PM11/23/09

to

OK--so basically a break on COL1 when the current value differs from
its predecessor.... I'll put my thinking cap on, but, meantime, based
on a recent posting of inspired SQL legerdemain by Tonkuma, I'm
thinking the solution to your problem will involve the LAG or LEAD
function, so you might want to have a look at those (if you've not
already).

--Jeff

Tonkuma

unread,

Nov 23, 2009, 8:04:23 PM11/23/09

to

Here is an example using OLAP functions.
Another ways would be use of loop by Procedure statements or by
Recursive CTE.

I tried a way without nested table expression.
But, it looks the spec. of current DB2 SQL that OLAP functions can't
nest each other,
then I used a nested table expression.
------------------------------ Commands Entered
------------------------------
WITH
/**************************************************
***** Start of sample data *****
**************************************************/
sample_data(COL1, COL2, COL3) AS (
VALUES
( 1, 2, '11/16/2009')
,( 1, 2, '11/17/2009')
,( 2, 300, '11/18/2009')
,( 1, 300, '11/19/2009')
,( 1, 2, '11/20/2009')
)
/**************************************************
***** End of sample data *****
**************************************************/
SELECT col1, col2, col3
, COUNT(col1_changed)
OVER(ORDER BY col3, col1, col2
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
) + 1 AS group_id
FROM (SELECT s.*
, CASE
WHEN col1 <>
LAG(col1) OVER(ORDER BY col3, col1, col2)
THEN 'Y'
/* ELSE NULL */
END col1_changed
FROM sample_data s
) q
ORDER BY
col3, col1, col2
;
------------------------------------------------------------------------------

COL1 COL2 COL3 GROUP_ID
----------- ----------- ---------- -----------

1 2 11/16/2009 1
1 2 11/17/2009 1

2 300 11/18/2009 2

1 300 11/19/2009 3
1 2 11/20/2009 3

5 record(s) selected.

Tonkuma

unread,

Nov 23, 2009, 8:15:03 PM11/23/09

to

On Nov 24, 10:04 am, Tonkuma <tonk...@fiberbit.net> wrote:
>
> I tried a way without nested table expression.
> But, it looks the spec. of current DB2 SQL that OLAP functions can't
> nest each other,
> then I used a nested table expression.

I'm sorry!

This worked(tested on DB2 9.7 for Windows.):

------------------------------ Commands Entered
------------------------------
WITH
/**************************************************
***** Start of sample data *****
**************************************************/
sample_data(COL1, COL2, COL3) AS (
VALUES
( 1, 2, '11/16/2009')
,( 1, 2, '11/17/2009')
,( 2, 300, '11/18/2009')
,( 1, 300, '11/19/2009')
,( 1, 2, '11/20/2009')
)
/**************************************************
***** End of sample data *****
**************************************************/
SELECT col1, col2, col3

, COUNT( CASE

WHEN col1 <>
LAG(col1) OVER(ORDER BY col3, col1, col2)
THEN 'Y'
/* ELSE NULL */

END)

OVER(ORDER BY col3, col1, col2
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
) + 1 AS group_id

FROM sample_data s

Tonkuma

unread,

Nov 23, 2009, 8:56:17 PM11/23/09

to

This is shorter than previous.

------------------------------ Commands Entered
------------------------------
WITH
/**************************************************
***** Start of sample data *****
**************************************************/
sample_data(COL1, COL2, COL3) AS (
VALUES
( 1, 2, '11/16/2009')
,( 1, 2, '11/17/2009')
,( 2, 300, '11/18/2009')
,( 1, 300, '11/19/2009')
,( 1, 2, '11/20/2009')
)
/**************************************************
***** End of sample data *****
**************************************************/
SELECT col1, col2, col3

, COUNT( NULLIF( col1
, LAG(col1)

OVER(ORDER BY col3, col1, col2)

)

)
OVER(ORDER BY col3, col1, col2
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW

) AS group_id
FROM sample_data

Tonkuma

unread,

Nov 23, 2009, 9:15:15 PM11/23/09

to

This is a default, then you can omit the clause.
/*

ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW

*/

Like this:

------------------------------ Commands Entered
------------------------------
WITH
/**************************************************
***** Start of sample data *****
**************************************************/
sample_data(COL1, COL2, COL3) AS (
VALUES
( 1, 2, '11/16/2009')
,( 1, 2, '11/17/2009')
,( 2, 300, '11/18/2009')
,( 1, 300, '11/19/2009')
,( 1, 2, '11/20/2009')
)
/**************************************************
***** End of sample data *****
**************************************************/
SELECT col1, col2, col3
, COUNT( NULLIF( col1
, LAG(col1)
OVER(ORDER BY col3, col1, col2)
)
)

OVER(ORDER BY col3, col1, col2)

Ian

unread,

Nov 24, 2009, 12:54:02 AM11/24/09

to

It is *possible* to do this in SQL, but it may not be very efficient.
Here's a hint to get you started:

select
col1
,col2
,col3
,min(col1)
over (order by col3,col1,col2
rows between 1 preceding and 1 preceding) as prev_col1
from
your_table

The last expression will give you the value of col1 on the previous
row. You can use the to compare col1 with the previous row's value
of col1 to see if it's changing.

However, it would be far more efficient to do it in the application
that will be consuming the data. Alternatively, you could write an
external UDF that could do this more efficiently than SQL.

esmith2112

unread,

Nov 24, 2009, 8:44:58 AM11/24/09

to

Thanks, Tonkuma (anata-wa ichi-ban SQL-no sensei desu!) and Ian. I've
learned multiple new concepts. Very useful!

We're in one of those binds, where it's less invasive to change the
code in a stored procedure and push that, rather than to have to file
the paperwork to open the application code and get permission to take
an outage to promote it. That's why doing it in the database was
preferable for the short term.

Regards,
Evan

Tonkuma

unread,

Nov 24, 2009, 10:20:11 AM11/24/09

to

> Thanks, Tonkuma (anata-wa ichi-ban SQL-no sensei desu!) and Ian. I've

Wow! Near perfect Japanese!

anata-wa ichi-ban-no SQL-no sensei desu!
or
anata-wa SQL-no ichi-ban-no sensei desu!

Ian

unread,

Nov 24, 2009, 7:52:50 PM11/24/09

to

esmith2112 wrote:

> We're in one of those binds, where it's less invasive to change the
> code in a stored procedure and push that, rather than to have to file
> the paperwork to open the application code and get permission to take
> an outage to promote it. That's why doing it in the database was
> preferable for the short term.

I know.

If I had a dollar for every time I heard this from app developers
and then had to fix a subsequent performance problem I'd be writing
this message from the beach. Maybe.

ChrisC

unread,

Dec 2, 2009, 12:57:43 PM12/2/09

to

On Nov 23, 5:04 pm, Tonkuma <tonk...@fiberbit.net> wrote:
> Another ways would be use of loop by Procedure statements or by
> Recursive CTE.
>

Just for grins, here is an example of doing this via Recursive CTE:

WITH
/**************************************************
***** Start of sample data *****
**************************************************/
sample_data(COL1, COL2, COL3) AS (
VALUES

( 1, 2, date('11/16/2009'))
,( 1, 2, date('11/17/2009'))
,( 2, 300, date('11/18/2009'))
,( 1, 300, date('11/19/2009'))
,( 1, 2, date('11/20/2009'))
) ,

/**************************************************
***** End of sample data *****
**************************************************/

the_goods (col1, col2, col3, group_id) AS (
select * From (select col1, col2, col3, 1 from sample_data fetch first
1 row only) x
UNION ALL
select sd.col1, sd.col2, sd.col3, case when sd.col1 = tg.col1 then
tg.group_id else tg.group_id + 1 end
from the_goods tg, sample_data sd
where tg.col3 + 1 day = sd.col3
)
select * from the_goods
;
------------------------------------------------------------------------------

COL1 COL2 COL3 GROUP_ID
----------- ----------- ---------- -----------
1 2 11/16/2009 1
1 2 11/17/2009 1
2 300 11/18/2009 2
1 300 11/19/2009 3
1 2 11/20/2009 3

This assumes that COL3 is always ascending date - if this isn't the
case, then something a little more interesting is needed:

WITH
/**************************************************
***** Start of sample data *****
**************************************************/
sample_data(COL1, COL2, COL3) AS (
VALUES

( 1, 2, date('11/16/2009'))
,( 1, 2, date('11/17/2009'))
,( 2, 300, date('11/18/2009'))
,( 1, 300, date('11/19/2009'))
,( 1, 2, date('11/20/2009'))
) ,

/**************************************************
***** End of sample data *****

***** Now append row numbers *****
**************************************************/
ordered_data(COL1, COL2, COL3, ORDER) AS (
select col1, col2, col3, row_number() over() from sample_data
) ,
/**************************************************
***** Now get actual data *****
**************************************************/
the_goods (col1, col2, col3, group_id, order) as (
select * From (select col1, col2, col3, order, order from ordered_data
fetch first 1 row only) x
UNION ALL
select sd.col1, sd.col2, sd.col3, case when sd.col1 = tg.col1 then
tg.group_id else tg.group_id + 1 end, sd.order
from the_goods tg, ordered_data sd
where tg.order + 1 = sd.order
)
select col1, col2, col3, group_id from the_goods
;

Thanks,
Chris

Tonkuma

unread,

Dec 2, 2009, 5:38:46 PM12/2/09

to

> ordered_data(COL1, COL2, COL3, ORDER) AS (
> select col1, col2, col3, row_number() over() from sample_data
> )

It will be better to specify ordering explicitly, like this:

ordered_data(COL1, COL2, COL3, ORDER) AS (

select col1, col2, col3, row_number() over(ORDER BY col3) from
sample_data
)