COL1 COL2 COL3
----------- ----------- ----------
1 2 11/16/2009
1 2 11/17/2009
2 300 11/18/2009
1 300 11/19/2009
1 2 11/20/2009
I would like to find an OLAP-ordering or grouping function that would
group things into new grouping each time the value of COL1 changed.
The resultant output would look like this:
COL1 COL2 COL3 GROUP_ID
----------- ----------- ---------- ----------------
1 2 11/16/2009 1
1 2 11/17/2009 1
23 300 11/18/2009 2
1 300 11/19/2009 3
1 2 11/20/2009 3
Is this possible?
Running DB2 9.5 AIX 5.3
Thanks,
Evan
Evan,
What are the grouping columns? Also, are you looking for a trigger-
based solution, or just a query that would merely reflect the current
data?
--Jeff
The data from the first listing above is the output from another
complicated business logic query it's been sorted by all three
columns. The second listing (the 23 is actually a typo--should have
been 2) is what is what my target results would ideally be. The only
thing I have to go on is the value from COL1.
Humanly I can see that rows one and two share the same value. They
belong in the first group. The value row three is different from the
previous two rows. Therefore it would be next logical grouping. Rows
three and four share the same value and represent a change from row
three. However, they do share the same value as rows one and two, but
should NOT be grouped with those rows.
I know I could write procedural code to iterate through the result
set, summarizing when I see the row change. I was just hoping that
there might be a construct in SQL using the OLAP functions that might
be able to take the pain and need for external programming away from
the equation.
Evan
OK--so basically a break on COL1 when the current value differs from
its predecessor.... I'll put my thinking cap on, but, meantime, based
on a recent posting of inspired SQL legerdemain by Tonkuma, I'm
thinking the solution to your problem will involve the LAG or LEAD
function, so you might want to have a look at those (if you've not
already).
--Jeff
I tried a way without nested table expression.
But, it looks the spec. of current DB2 SQL that OLAP functions can't
nest each other,
then I used a nested table expression.
------------------------------ Commands Entered
------------------------------
WITH
/**************************************************
***** Start of sample data *****
**************************************************/
sample_data(COL1, COL2, COL3) AS (
VALUES
( 1, 2, '11/16/2009')
,( 1, 2, '11/17/2009')
,( 2, 300, '11/18/2009')
,( 1, 300, '11/19/2009')
,( 1, 2, '11/20/2009')
)
/**************************************************
***** End of sample data *****
**************************************************/
SELECT col1, col2, col3
, COUNT(col1_changed)
OVER(ORDER BY col3, col1, col2
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
) + 1 AS group_id
FROM (SELECT s.*
, CASE
WHEN col1 <>
LAG(col1) OVER(ORDER BY col3, col1, col2)
THEN 'Y'
/* ELSE NULL */
END col1_changed
FROM sample_data s
) q
ORDER BY
col3, col1, col2
;
------------------------------------------------------------------------------
COL1 COL2 COL3 GROUP_ID
----------- ----------- ---------- -----------
1 2 11/16/2009 1
1 2 11/17/2009 1
2 300 11/18/2009 2
1 300 11/19/2009 3
1 2 11/20/2009 3
5 record(s) selected.
This worked(tested on DB2 9.7 for Windows.):
------------------------------ Commands Entered
------------------------------
WITH
/**************************************************
***** Start of sample data *****
**************************************************/
sample_data(COL1, COL2, COL3) AS (
VALUES
( 1, 2, '11/16/2009')
,( 1, 2, '11/17/2009')
,( 2, 300, '11/18/2009')
,( 1, 300, '11/19/2009')
,( 1, 2, '11/20/2009')
)
/**************************************************
***** End of sample data *****
**************************************************/
SELECT col1, col2, col3
, COUNT( CASE
WHEN col1 <>
LAG(col1) OVER(ORDER BY col3, col1, col2)
THEN 'Y'
/* ELSE NULL */
END)
OVER(ORDER BY col3, col1, col2
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
) + 1 AS group_id
FROM sample_data s
Like this:
------------------------------ Commands Entered
------------------------------
WITH
/**************************************************
***** Start of sample data *****
**************************************************/
sample_data(COL1, COL2, COL3) AS (
VALUES
( 1, 2, '11/16/2009')
,( 1, 2, '11/17/2009')
,( 2, 300, '11/18/2009')
,( 1, 300, '11/19/2009')
,( 1, 2, '11/20/2009')
)
/**************************************************
***** End of sample data *****
**************************************************/
SELECT col1, col2, col3
, COUNT( NULLIF( col1
, LAG(col1)
OVER(ORDER BY col3, col1, col2)
)
)
OVER(ORDER BY col3, col1, col2)
It is *possible* to do this in SQL, but it may not be very efficient.
Here's a hint to get you started:
select
col1
,col2
,col3
,min(col1)
over (order by col3,col1,col2
rows between 1 preceding and 1 preceding) as prev_col1
from
your_table
The last expression will give you the value of col1 on the previous
row. You can use the to compare col1 with the previous row's value
of col1 to see if it's changing.
However, it would be far more efficient to do it in the application
that will be consuming the data. Alternatively, you could write an
external UDF that could do this more efficiently than SQL.
We're in one of those binds, where it's less invasive to change the
code in a stored procedure and push that, rather than to have to file
the paperwork to open the application code and get permission to take
an outage to promote it. That's why doing it in the database was
preferable for the short term.
Regards,
Evan
anata-wa ichi-ban-no SQL-no sensei desu!
or
anata-wa SQL-no ichi-ban-no sensei desu!
> We're in one of those binds, where it's less invasive to change the
> code in a stored procedure and push that, rather than to have to file
> the paperwork to open the application code and get permission to take
> an outage to promote it. That's why doing it in the database was
> preferable for the short term.
I know.
If I had a dollar for every time I heard this from app developers
and then had to fix a subsequent performance problem I'd be writing
this message from the beach. Maybe.
Just for grins, here is an example of doing this via Recursive CTE:
WITH
/**************************************************
***** Start of sample data *****
**************************************************/
sample_data(COL1, COL2, COL3) AS (
VALUES
( 1, 2, date('11/16/2009'))
,( 1, 2, date('11/17/2009'))
,( 2, 300, date('11/18/2009'))
,( 1, 300, date('11/19/2009'))
,( 1, 2, date('11/20/2009'))
) ,
/**************************************************
***** End of sample data *****
**************************************************/
the_goods (col1, col2, col3, group_id) AS (
select * From (select col1, col2, col3, 1 from sample_data fetch first
1 row only) x
UNION ALL
select sd.col1, sd.col2, sd.col3, case when sd.col1 = tg.col1 then
tg.group_id else tg.group_id + 1 end
from the_goods tg, sample_data sd
where tg.col3 + 1 day = sd.col3
)
select * from the_goods
;
------------------------------------------------------------------------------
COL1 COL2 COL3 GROUP_ID
----------- ----------- ---------- -----------
1 2 11/16/2009 1
1 2 11/17/2009 1
2 300 11/18/2009 2
1 300 11/19/2009 3
1 2 11/20/2009 3
This assumes that COL3 is always ascending date - if this isn't the
case, then something a little more interesting is needed:
WITH
/**************************************************
***** Start of sample data *****
**************************************************/
sample_data(COL1, COL2, COL3) AS (
VALUES
( 1, 2, date('11/16/2009'))
,( 1, 2, date('11/17/2009'))
,( 2, 300, date('11/18/2009'))
,( 1, 300, date('11/19/2009'))
,( 1, 2, date('11/20/2009'))
) ,
/**************************************************
***** End of sample data *****
***** Now append row numbers *****
**************************************************/
ordered_data(COL1, COL2, COL3, ORDER) AS (
select col1, col2, col3, row_number() over() from sample_data
) ,
/**************************************************
***** Now get actual data *****
**************************************************/
the_goods (col1, col2, col3, group_id, order) as (
select * From (select col1, col2, col3, order, order from ordered_data
fetch first 1 row only) x
UNION ALL
select sd.col1, sd.col2, sd.col3, case when sd.col1 = tg.col1 then
tg.group_id else tg.group_id + 1 end, sd.order
from the_goods tg, ordered_data sd
where tg.order + 1 = sd.order
)
select col1, col2, col3, group_id from the_goods
;
Thanks,
Chris