create table t1 (key varchar2(10), start_dtm date, end_dtm date);
and I have data some of which over laps:
insert into t1 values ('1234', to_date('01/01/2001', 'DD/MM/YYYY'),
to_date('30/03/2001', 'DD/MM/YYYY'));
insert into t1 values ('1234', to_date('01/02/2001', 'DD/MM/YYYY'),
to_date('30/03/2001', 'DD/MM/YYYY'));
insert into t1 values ('1234', to_date('15/02/2001', 'DD/MM/YYYY'),
to_date('15/04/2001', 'DD/MM/YYYY'));
insert into t1 values ('1234', to_date('01/07/2001', 'DD/MM/YYYY'),
to_date('30/12/2001', 'DD/MM/YYYY'));
Ie
1st Jan - 30th March
2nd Feb - 30th March
15th Feb - 15th April
1st July - 30th Dec
I want to return only two rows from the 4 that look like
1st jan - 15th April
1st July - 30th Dec
ie - if the dates are continuous, squash all the continuous rows into
1 with the min start and max end date
but if there is a gap it has to become a second row. There can be any
number of rows and over laps etc.
I reckon this can be done with some analytics trickery, but I just
cannot seem to figure out how.
Any ideas?
Thanks,
Stephen.
Maybe, something like this
SQL> select key, min(start_dtm) start_dtm, max(end_dtm) end_dtm
2 from (select key,
3 start_dtm,
4 end_dtm,
5 sum(interval_change) over(partition by key order by
end_dtm, start_dtm) interval_no
6 from (select key,
7 start_dtm,
8 end_dtm,
9 case
10 when (lag(end_dtm)
11 over(partition by key order by end_dtm,
12 start_dtm) - start_dtm) > 0 then
13 0
14 else
15 1
16 end interval_change
17 from t1))
18 group by key, interval_no
19 order by key, interval_no
20 ;
KEY START_DTM END_DTM
---------- -------------------- --------------------
1234 01-Jan-2001 15-Apr-2001
1234 01-Jul-2001 30-Dec-2001
Best regards
Maxim
Thanks for posting the DDL and DML to create the test setup. I see
that Maxim provided a solution for you in the 90 minutes that I
experimented with the problem (I initially read the requirement
wrong). Maxim's solution appears to be more elegant than what follows
(and there may still be a couple issues), but it might be interesting
to look at how a solution might be worked out.
First, we experiment with LAG so that we are able to examine the
previous row when sorted by START_DTM and END_DTM:
SELECT
T1.KEY,
T1.START_DTM,
T1.END_DTM,
LEAD(T1.START_DTM) OVER (PARTITION BY T1.KEY ORDER BY
T1.START_DTM,T1.END_DTM) M_START_DTM,
LEAD(T1.END_DTM) OVER (PARTITION BY T1.KEY ORDER BY
T1.START_DTM,T1.END_DTM) M_END_DTM
FROM
T1
ORDER BY
T1.START_DTM,
T1.END_DTM;
KEY START_DTM END_DTM M_START_D M_END_DTM
---------- --------- --------- --------- ---------
1234 01-JAN-01 30-MAR-01 01-FEB-01 30-MAR-01
1234 01-FEB-01 30-MAR-01 15-FEB-01 15-APR-01
1234 15-FEB-01 15-APR-01 01-JUL-01 30-DEC-01
1234 01-JUL-01 30-DEC-01
If we now slide the above into an inline view and see if the START_DTM
is greater than M_END_DTM (the previous row's END_DTM), we may have
found a new series of dates (note, I just noticed that this may have
problems if there are two rows with discrete date ranges that fall
entirely within a third, larger date range), we will output 1 if true,
or 0 if false:
SELECT
KEY,
START_DTM,
END_DTM,
M_START_DTM,
M_END_DTM,
DECODE(SIGN(START_DTM-NVL(M_END_DTM,START_DTM)),1,1,0) C
FROM
(SELECT
T1.KEY,
T1.START_DTM,
T1.END_DTM,
LAG(T1.START_DTM) OVER (PARTITION BY T1.KEY ORDER BY
T1.START_DTM,T1.END_DTM) M_START_DTM,
LAG(T1.END_DTM) OVER (PARTITION BY T1.KEY ORDER BY
T1.START_DTM,T1.END_DTM) M_END_DTM
FROM
T1
ORDER BY
T1.START_DTM,
T1.END_DTM);
KEY START_DTM END_DTM M_START_D M_END_DTM C
---------- --------- --------- --------- --------- ----------
1234 01-JAN-01 30-MAR-01 0
1234 01-FEB-01 30-MAR-01 01-JAN-01 30-MAR-01 0
1234 15-FEB-01 15-APR-01 01-FEB-01 30-MAR-01 0
1234 01-JUL-01 30-DEC-01 15-FEB-01 15-APR-01 1
In the above, a 1 is output whenever there is a jump in the date range
- if we use the COUNT analytical function to create a running count of
the 1s and slide the above into an inline view, we actually create a
column that may be used for grouping:
SELECT
KEY,
START_DTM START_DTM,
END_DTM END_DTM,
SUM(C) OVER (PARTITION BY KEY ORDER BY START_DTM,END_DTM) G
FROM
(SELECT
KEY,
START_DTM,
END_DTM,
M_START_DTM,
M_END_DTM,
DECODE(SIGN(START_DTM-NVL(M_END_DTM,START_DTM)),1,1,0) C
FROM
(SELECT
T1.KEY,
T1.START_DTM,
T1.END_DTM,
LAG(T1.START_DTM) OVER (PARTITION BY T1.KEY ORDER BY
T1.START_DTM,T1.END_DTM) M_START_DTM,
LAG(T1.END_DTM) OVER (PARTITION BY T1.KEY ORDER BY
T1.START_DTM,T1.END_DTM) M_END_DTM
FROM
T1
ORDER BY
T1.START_DTM,
T1.END_DTM));
KEY START_DTM END_DTM G
---------- --------- --------- ----------
1234 01-JAN-01 30-MAR-01 0
1234 01-FEB-01 30-MAR-01 0
1234 15-FEB-01 15-APR-01 0
1234 01-JUL-01 30-DEC-01 1
We are now able to group on the column G by again sliding the above
into an inline view:
SELECT
KEY,
MIN(START_DTM) START_DTM,
MAX(END_DTM) END_DTM
FROM
(SELECT
KEY,
START_DTM START_DTM,
END_DTM END_DTM,
SUM(C) OVER (PARTITION BY KEY ORDER BY START_DTM,END_DTM) G
FROM
(SELECT
KEY,
START_DTM,
END_DTM,
M_START_DTM,
M_END_DTM,
DECODE(SIGN(START_DTM-NVL(M_END_DTM,START_DTM)),1,1,0) C
FROM
(SELECT
T1.KEY,
T1.START_DTM,
T1.END_DTM,
LAG(T1.START_DTM) OVER (PARTITION BY T1.KEY ORDER BY
T1.START_DTM,T1.END_DTM) M_START_DTM,
LAG(T1.END_DTM) OVER (PARTITION BY T1.KEY ORDER BY
T1.START_DTM,T1.END_DTM) M_END_DTM
FROM
T1
ORDER BY
T1.START_DTM,
T1.END_DTM)))
GROUP BY
KEY,
G;
KEY START_DTM END_DTM
---------- --------- ---------
1234 01-JAN-01 15-APR-01
1234 01-JUL-01 30-DEC-01
Let's add a little more data to see what happens:
insert into t1 values ('1234', to_date('10/10/2001', 'DD/MM/YYYY'),
to_date('29/12/2001', 'DD/MM/YYYY'));
insert into t1 values ('1234', to_date('31/12/2001', 'DD/MM/YYYY'),
to_date('15/01/2002', 'DD/MM/YYYY'));
KEY START_DTM END_DTM
---------- --------- ---------
1234 01-JAN-01 15-APR-01
1234 01-JUL-01 30-DEC-01
1234 31-DEC-01 15-JAN-02
Note the addition of the last row in the output - you may want to
determine if that should be part of the previous group of date. You
should be able to fix this by adding or subtracting a date in the
SIGN( ) function.
I originally thought that you were trying to eliminate this row from
the output:
KEY START_DTM END_DTM
---------- --------- ---------
1234 01-FEB-01 30-MAR-01
For the above, there are several methods:
SELECT
T1.KEY,
T1.START_DTM,
T1.END_DTM
FROM
T1
WHERE
(T1.KEY,
T1.START_DTM,
T1.END_DTM) NOT IN (
SELECT
T1.KEY,
T1.START_DTM,
T1.END_DTM
FROM
T1,
T1 T2
WHERE
T1.KEY=T2.KEY
AND T1.START_DTM BETWEEN T2.START_DTM AND T2.END_DTM
AND T1.END_DTM BETWEEN T2.START_DTM AND T2.END_DTM
AND (
T1.START_DTM<>T2.START_DTM
OR T1.END_DTM<>T2.END_DTM));
KEY START_DTM END_DTM
---------- --------- ---------
1234 01-JAN-01 30-MAR-01
1234 15-FEB-01 15-APR-01
1234 01-JUL-01 30-DEC-01
SELECT
T1.KEY,
T1.START_DTM,
T1.END_DTM
FROM
T1,
(SELECT
T1.KEY,
T1.START_DTM,
T1.END_DTM
FROM
T1,
T1 T2
WHERE
T1.KEY=T2.KEY
AND T1.START_DTM BETWEEN T2.START_DTM AND T2.END_DTM
AND T1.END_DTM BETWEEN T2.START_DTM AND T2.END_DTM
AND (
T1.START_DTM<>T2.START_DTM
OR T1.END_DTM<>T2.END_DTM)) T3
WHERE
T1.KEY=T3.KEY(+)
AND T1.START_DTM=T3.START_DTM(+)
AND T1.END_DTM=T3.END_DTM(+)
AND T3.KEY IS NULL;
KEY START_DTM END_DTM
---------- --------- ---------
1234 01-JAN-01 30-MAR-01
1234 15-FEB-01 15-APR-01
1234 01-JUL-01 30-DEC-01
SELECT
T1.KEY,
T1.START_DTM,
T1.END_DTM
FROM
T1
MINUS
SELECT
T1.KEY,
T1.START_DTM,
T1.END_DTM
FROM
T1,
T1 T2
WHERE
T1.KEY=T2.KEY
AND T1.START_DTM BETWEEN T2.START_DTM AND T2.END_DTM
AND T1.END_DTM BETWEEN T2.START_DTM AND T2.END_DTM
AND (
T1.START_DTM<>T2.START_DTM
OR T1.END_DTM<>T2.END_DTM);
KEY START_DTM END_DTM
---------- --------- ---------
1234 01-JAN-01 30-MAR-01
1234 15-FEB-01 15-APR-01
1234 01-JUL-01 30-DEC-01
SELECT
KEY,
START_DTM,
END_DTM
FROM
(SELECT
T1.KEY,
T1.START_DTM,
T1.END_DTM,
MIN(T1.START_DTM) OVER (PARTITION BY T1.KEY, T1.END_DTM)
M_START_DTM,
MAX(T1.END_DTM) OVER (PARTITION BY T1.KEY, T1.START_DTM)
M_END_DTM
FROM
T1)
WHERE
START_DTM=M_START_DTM
AND END_DTM=M_END_DTM;
KEY START_DTM END_DTM
---------- --------- ---------
1234 01-JAN-01 30-MAR-01
1234 15-FEB-01 15-APR-01
1234 01-JUL-01 30-DEC-01
Charles Hooper
IT Manager/Oracle DBA
K&M Machine-Fabricating, Inc.
Guys - thanks for the replies on this. I eventually worked it out too
with a third solution after dredging AskTom and the replies here ...
Here is my solution to complete the picture:
Using the same table as above and this data:
create table t1 (key varchar2(10), start_dtm date, end_dtm date);
insert into t1 values(
9999, to_date('01/01/2008', 'DD/MM/YYYY'),
to_date('01/06/2008', 'DD/MM/YYYY'));
insert into t1 values(
9999, to_date('01/02/2008', 'DD/MM/YYYY'),
to_date('01/06/2008', 'DD/MM/YYYY'));
insert into t1 values(
9999, to_date('01/02/2008', 'DD/MM/YYYY'),
to_date('01/05/2008', 'DD/MM/YYYY'));
insert into t1 values(
9999, to_date('01/02/2008', 'DD/MM/YYYY'),
to_date('01/07/2008', 'DD/MM/YYYY'));
This gives a set of rows like:
|------------------------------------------|
|---------------------------------------|
|---------------------|
|------------------------------------------|
And we expect this to turn into a single row like
01/01/08 |----------------------------------------------| 01/07/2008
insert into t1 values(
9999, to_date('01/08/2008', 'DD/MM/YYYY'),
null);
insert into t1 values(
9999, to_date('01/09/2008', 'DD/MM/YYYY'),
to_date('01/10/2008', 'DD/MM/YYYY'));
|--------------------------------------->
|------------|
And we want an row that looks like
01/08/2008 |----------------------------> (null end date)
I think this query does it:
select key, min(start_dtm), max(end_dtm)
from
(
select key, start_dtm, end_dtm, max (grp) over (partition by key
order by start_dtm asc) grp2
from
(
SELECT key,
start_dtm,
NVL (end_dtm, '31-dec-4712') end_dtm,
case
when
(start_dtm not between lag(start_dtm) over (partition
by key order by start_dtm asc)
and nvl( lag(end_dtm) over (partition by key order
by start_dtm asc), '31-dec-4712') )
or lag(start_dtm) over (partition by key order by
start_dtm asc) is null
then
row_number() over (partition by key order by
start_dtm asc)
end grp
FROM t1
order by key, start_dtm asc
) a
) b
group by key, grp2
GRP MIN(START MAX(END_D
--------------------------------- --------- ---------
09999 01-JAN-08 01-JUL-08
09999 01-AUG-08 31-DEC-12
2 rows selected.
Basically I said, if you order the rows by increasing start date, then
if rows overlap, the current start_dtm must be between the previous
start_dtm and end_dtm - if there is no previous start_dtm or they
don't overlap, then its a new group.
The innermost select outputs something like:
KEY START_DTM END_DTM GRP
--------------------------------- --------- --------- ----------
09999 01-JAN-08 01-JUN-08 1
09999 01-FEB-08 01-JUL-08
09999 01-FEB-08 01-JUN-08
09999 01-FEB-08 01-MAY-08
09999 01-AUG-08 31-DEC-12 5
09999 01-SEP-08 01-OCT-08
Then we use analytics again to fill in the blanks in the enclosing
query (b):
KEY START_DTM END_DTM GRP2
--------------------------------- --------- --------- ----------
09999 01-JAN-08 01-JUN-08 1
09999 01-FEB-08 01-JUL-08 1
09999 01-FEB-08 01-JUN-08 1
09999 01-FEB-08 01-MAY-08 1
09999 01-AUG-08 31-DEC-12 5
09999 01-SEP-08 01-OCT-08 5
and then simply group by KEY, GRP in the outer query to get the
result.
I am glad I went through the pain of figuring this out (took me a good
90 minutes) as I really feel like I get analytics now. It also
allowed me to replace a temporary table and several 100 lines of PLSQL
that was performing rather poorly and written by someone who doesn't
get analytics!
Thanks again,
Stephen.
> I am glad I went through the pain of figuring this out (took me a good
> 90 minutes) as I really feel like I get analytics now. It also
> allowed me to replace a temporary table and several 100 lines of PLSQL
> that was performing rather poorly and written by someone who doesn't
> get analytics!
>
> Thanks again,
>
> Stephen.
I keep feeling like I get it, don't use it enough, then don't get it
any more. But this helped: http://www.orafaq.com/usenet/comp.databases.oracle.server/2007/08/03/0224.htm
jg
--
@home.com is bogus.
http://catless.ncl.ac.uk/Risks/25.21.html#subj8