10.2.x on Linux ...
Is there a "neater"/more efficient way of selecting the maximum value for
a column based on the maximum value of another column. Gods that reads
poorly, I'll try an example ...
SomeTable
ColX ColY ColZ .... MoreCols
A 1 8 ....
A 1 9 ....
A 3 4 ....
A 3 6 ....
....
B 7 2 ....
B 7 7 ....
B 8 3 ....
B 8 5 ....
....
I want to get all columns for each distinct value in ColX, but only the
row for the max value of ColY, and the max value of ColZ for that
combination, ie.:
A 3 6 ....
B 8 5 ....
I already have code that returns the correct result, I'm just wondering if
there is a technique to do this which is more effective/efficient.
Current dummy code:
SELECT ColX, ColY, ColZ, ... MoreCols
FROM SomeTable
WHERE (ColX, ColY, ColZ) IN
(SELECT A.ColX, A.ColY, max(B.ColZ)
FROM
(SELECT ColX, max(ColY) ColY
FROM SomeTable
GROUP BY ColX) A,
SomeTable B
WHERE A.ColX = B.ColX
AND A.ColY = B.ColY
GROUP BY A.ColX, A.ColY)
AND <other unrelated where conditions>;
FWIW, the real code is actually querying an Oracle Change Data Capture
subscriber view (ColX is the PK of the source table, ColY is CSCN$ and
ColZ is RSID$), and I want to pick up only the last available change
values in a change window.
Any advice appreciated.
Geoff M
What about:
SELECT ColX, ColY, ColZ, ... MoreCols
FROM ( select ColX, ColY, ColZ, ... MoreCols, rank () over (partition
by ColX order by colY desc, colZ desc) my_rank from SomeTable ) a
where a.my_rank=1
Not tested.
(No Oracle at hand)
HTH.
Cheers.
Carlos.
My advice would be to re-examine your data model instead of looking for
the Mad Max function.
--
Mladen Gogala
http://mgogala.freehostia.com
Let's try a couple tests. First the DDL and DML for the tests:
CREATE TABLE T1(
COLX VARCHAR2(5),
COLY NUMBER(5),
COLZ NUMBER(5));
INSERT INTO T1 VALUES ('A',1,8);
INSERT INTO T1 VALUES ('A',1,9);
INSERT INTO T1 VALUES ('A',3,4);
INSERT INTO T1 VALUES ('A',3,6);
INSERT INTO T1 VALUES ('B',7,2);
INSERT INTO T1 VALUES ('B',7,7);
INSERT INTO T1 VALUES ('B',8,3);
INSERT INTO T1 VALUES ('B',8,5);
Now the first test to see what happens with the MAX analytical
function:
SELECT
COLX,
MAX(COLY) OVER (PARTITION BY COLX) COLY,
COLZ
FROM
T1;
COLX COLY COLZ
----- ---------- ----------
A 3 8
A 3 9
A 3 4
A 3 6
B 8 2
B 8 7
B 8 3
B 8 5
The above looks interesting, let's see if it will work by sliding the
above into an inline view and then picking up the maximum value for
COLZ:
SELECT DISTINCT
COLX,
COLY,
MAX(COLZ) OVER (PARTITION BY COLX, COLY) COLZ
FROM
(SELECT
COLX,
MAX(COLY) OVER (PARTITION BY COLX) COLY,
COLZ
FROM
T1);
COLX COLY COLZ
----- ---------- ----------
B 8 7
A 3 9
Slight problem with COLZ in the above. Let's try again, this time
partitioning by two columns from the start:
SELECT
COLX,
COLY OLD_COLY,
MAX(COLY) OVER (PARTITION BY COLX) COLY,
MAX(COLZ) OVER (PARTITION BY COLX, COLY) COLZ
FROM
T1;
COLX OLD_COLY COLY COLZ
----- ---------- ---------- ----------
A 1 3 9
A 1 3 9
A 3 3 6
A 3 3 6
B 7 8 7
B 7 8 7
B 8 8 5
B 8 8 5
Now to slide the above into an inline view and pick out only those
rows where COLZ originally existed within a matching COLY value:
SELECT DISTINCT
COLX,
COLY,
COLZ
FROM
(SELECT
COLX,
COLY OLD_COLY,
MAX(COLY) OVER (PARTITION BY COLX) COLY,
MAX(COLZ) OVER (PARTITION BY COLX, COLY) COLZ
FROM
T1)
WHERE
OLD_COLY=COLY;
COLX COLY COLZ
----- ---------- ----------
B 8 5
A 3 6
Let's test the output of the SQL statement provided by Carlos with my
test table:
SELECT ColX, ColY, ColZ
FROM ( select ColX, ColY, ColZ, rank () over (partition
by ColX order by colY desc, colZ desc) my_rank from T1 ) a
where a.my_rank=1;
COLX COLY COLZ
----- ---------- ----------
A 3 6
B 8 5
It looks like the SQL that Carlos provided works with the sample data
also.
Charles Hooper
IT Manager/Oracle DBA
K&M Machine-Fabricating, Inc.
> SELECT DISTINCT
> COLX,
> COLY,
> COLZ
> FROM
> (SELECT
> COLX,
> COLY OLD_COLY,
> MAX(COLY) OVER (PARTITION BY COLX) COLY, MAX(COLZ) OVER (PARTITION
> BY COLX, COLY) COLZ
> FROM
> T1)
> WHERE
> OLD_COLY=COLY;
And, if T1 has few thousand rows, it's going to be full table scan,
window sort and sort distinct, in that order. In other words, the
resource consumption will just explode. As this is obviously a report,
materialized view looks like the only reasonable solution. This is a
sort of "mad max" query.
From the very first look, i would probably write this query as
select colx,
max(coly) keep(dense_rank last order by coly,colz) ,
max(colz) keep(dense_rank last order by coly,colz)
from t1
group by colx
but, if i remember correctly, Mladen has shown some weeks ago (i still
had no time to investigate it deeper, though it is pretty interesting
thema) an example where analytics were substantially slower/less
efficient compared to aggregates, due to optimization by hash group by,
so i would probably thoroughly test all alternatives in regard of
performance, especially with significant data amounts....
Best regards
Maxim
In the interest of curiosity, a test set up with 260,000 rows on a
table with 26 columns:
CREATE TABLE T1(
COLX VARCHAR2(5),
COLY NUMBER(5),
COLZ NUMBER(5),
COLA NUMBER(5),
COLB NUMBER(5),
COLC NUMBER(5),
COLD NUMBER(5),
COLE NUMBER(5),
COLF NUMBER(5),
COLG NUMBER(5),
COLH NUMBER(5),
COLI NUMBER(5),
COLJ NUMBER(5),
COLK NUMBER(5),
COLL NUMBER(5),
COLM NUMBER(5),
COLN NUMBER(5),
COLO NUMBER(5),
COLP NUMBER(5),
COLQ NUMBER(5),
COLR NUMBER(5),
COLS NUMBER(5),
COLT NUMBER(5),
COLU NUMBER(5),
COLV NUMBER(5),
COLW NUMBER(5));
INSERT INTO
T1
SELECT
COL.COLX,
MOD(ROWNUM,1000)+1 COLY,
ABS(ROUND(SIN(CNT.RN/180)*100,0)) COLZ,
ROUND(DBMS_RANDOM.VALUE(1,100)),
ROUND(DBMS_RANDOM.VALUE(1,100)),
ROUND(DBMS_RANDOM.VALUE(1,100)),
ROUND(DBMS_RANDOM.VALUE(1,100)),
ROUND(DBMS_RANDOM.VALUE(1,100)),
ROUND(DBMS_RANDOM.VALUE(1,100)),
ROUND(DBMS_RANDOM.VALUE(1,100)),
ROUND(DBMS_RANDOM.VALUE(1,100)),
ROUND(DBMS_RANDOM.VALUE(1,100)),
ROUND(DBMS_RANDOM.VALUE(1,100)),
ROUND(DBMS_RANDOM.VALUE(1,100)),
ROUND(DBMS_RANDOM.VALUE(1,100)),
ROUND(DBMS_RANDOM.VALUE(1,100)),
ROUND(DBMS_RANDOM.VALUE(1,100)),
ROUND(DBMS_RANDOM.VALUE(1,100)),
ROUND(DBMS_RANDOM.VALUE(1,100)),
ROUND(DBMS_RANDOM.VALUE(1,100)),
ROUND(DBMS_RANDOM.VALUE(1,100)),
ROUND(DBMS_RANDOM.VALUE(1,100)),
ROUND(DBMS_RANDOM.VALUE(1,100)),
ROUND(DBMS_RANDOM.VALUE(1,100)),
ROUND(DBMS_RANDOM.VALUE(1,100)),
ROUND(DBMS_RANDOM.VALUE(1,100))
FROM
(SELECT
ROWNUM RN
FROM
DUAL
CONNECT BY
LEVEL<=10000) CNT,
(SELECT
CHR(ROWNUM+64) COLX,
ROWNUM C
FROM
DUAL
CONNECT BY
LEVEL<=26) COL;
260000 rows created.
COMMIT;
SELECT
COLX,
COLY,
COLZ,
COLA
FROM
T1
WHERE
ROWNUM<=200;
COLX COLY COLZ COLA
----- ---------- ---------- ----------
K 90 2 14
L 91 2 85
M 92 2 79
N 93 2 82
O 94 2 3
P 95 2 83
...
Z 287 6 64
A 288 7 79
B 289 7 61
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1');
Now that the table has been created, and statistics gathered, we start
the experiments. SQL statement, DBMS_XPLAN with all stats last, and
the row source execution plan from a 10046 trace (Maxim's SQL
statement is fast!):
SELECT DISTINCT
COLX,
COLY,
COLZ
FROM
(SELECT
COLX,
COLY OLD_COLY,
MAX(COLY) OVER (PARTITION BY COLX) COLY,
MAX(COLZ) OVER (PARTITION BY COLX, COLY) COLZ
FROM
T1)
WHERE
OLD_COLY=COLY;
COLX COLY COLZ
----- ---------- ----------
I 1000 100
J 999 100
T 999 99
U 1000 99
B 999 100
F 999 100
P 999 100
R 999 100
H 999 100
N 999 100
Q 1000 100
E 1000 100
G 1000 100
M 1000 100
O 1000 100
X 999 100
K 1000 100
W 1000 100
L 999 100
Y 1000 100
D 999 100
V 999 100
Z 999 100
A 1000 100
C 1000 100
S 1000 100
---------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-
Time | Buffers | Reads | OMem | 1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------
| 1 | HASH UNIQUE | | 1 | 26 | 26 |
00:00:01.51 | 3022 | 2899 | | | |
|* 2 | VIEW | | 1 | 264K| 520 |
00:00:01.50 | 3022 | 2899 | | | |
| 3 | WINDOW SORT | | 1 | 264K| 260K|
00:00:01.33 | 3022 | 2899 | 8628K| 1150K| 7669K (0)|
| 4 | TABLE ACCESS FULL| T1 | 1 | 264K| 260K|
00:00:00.26 | 3022 | 2899 | | | |
---------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("OLD_COLY"="COLY")
1.421 Seconds of CPU time
FETCH
#8:c=1421875,e=1505456,p=2899,cr=3022,cu=0,mis=0,r=26,dep=0,og=1,tim=4406046677
STAT #8 id=1 cnt=26 pid=0 pos=1 obj=0 op='HASH UNIQUE (cr=3022 pr=2899
pw=0 time=1505477 us)'
STAT #8 id=2 cnt=520 pid=1 pos=1 obj=0 op='VIEW (cr=3022 pr=2899 pw=0
time=1502946 us)'
STAT #8 id=3 cnt=260000 pid=2 pos=1 obj=0 op='WINDOW SORT (cr=3022
pr=2899 pw=0 time=1334835 us)'
STAT #8 id=4 cnt=260000 pid=3 pos=1 obj=54782 op='TABLE ACCESS FULL T1
(cr=3022 pr=2899 pw=0 time=260081 us)'
==========================================
SELECT
COLX,
COLY,
COLZ
FROM
(SELECT
COLX,
COLY OLD_COLY,
MAX(COLY) OVER (PARTITION BY COLX) COLY,
MAX(COLZ) OVER (PARTITION BY COLX, COLY) COLZ
FROM
T1)
WHERE
OLD_COLY=COLY
GROUP BY
COLX,
COLY,
COLZ;
---------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-
Time | Buffers | Reads | OMem | 1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------
| 1 | HASH GROUP BY | | 1 | 26 | 26 |
00:00:01.56 | 3022 | 2899 | | | |
|* 2 | VIEW | | 1 | 264K| 520 |
00:00:01.56 | 3022 | 2899 | | | |
| 3 | WINDOW SORT | | 1 | 264K| 260K|
00:00:01.35 | 3022 | 2899 | 8628K| 1150K| 7669K (0)|
| 4 | TABLE ACCESS FULL| T1 | 1 | 264K| 260K|
00:00:00.26 | 3022 | 2899 | | | |
---------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("OLD_COLY"="COLY")
1.515 Seconds of CPU Time
FETCH
#13:c=1515625,e=1558810,p=2899,cr=3022,cu=0,mis=0,r=26,dep=0,og=1,tim=4603512997
STAT #13 id=1 cnt=26 pid=0 pos=1 obj=0 op='HASH GROUP BY (cr=3022
pr=2899 pw=0 time=1558837 us)'
STAT #13 id=2 cnt=520 pid=1 pos=1 obj=0 op='VIEW (cr=3022 pr=2899
pw=0 time=1556393 us)'
STAT #13 id=3 cnt=260000 pid=2 pos=1 obj=0 op='WINDOW SORT (cr=3022
pr=2899 pw=0 time=1351194 us)'
STAT #13 id=4 cnt=260000 pid=3 pos=1 obj=54782 op='TABLE ACCESS FULL
T1 (cr=3022 pr=2899 pw=0 time=260089 us)'
==========================================
SELECT ColX, ColY, ColZ
FROM ( select ColX, ColY, ColZ, rank () over (partition
by ColX order by colY desc, colZ desc) my_rank from T1 ) a
where a.my_rank=1;
-------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows |
A-Time | Buffers | Reads | OMem | 1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------
|* 1 | VIEW | | 1 | 264K| 38 |
00:00:02.09 | 3022 | 2899 | | | |
|* 2 | WINDOW SORT PUSHED RANK| | 1 | 264K| 260K|
00:00:01.84 | 3022 | 2899 | 12M| 1367K| 11M (0)|
| 3 | TABLE ACCESS FULL | T1 | 1 | 264K| 260K|
00:00:00.26 | 3022 | 2899 | | | |
-------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("A"."MY_RANK"=1)
2 - filter(RANK() OVER ( PARTITION BY "COLX" ORDER BY
INTERNAL_FUNCTION("COLY") DESC ,INTERNAL_FUNCTION("COLZ")
DESC )<=1)
2.078 Seconds of CPU Time
FETCH
#13:c=2078125,e=2088059,p=2899,cr=3022,cu=0,mis=0,r=38,dep=0,og=1,tim=4783262353
WAIT #13: nam='SQL*Net message from client' ela= 30556 driver
id=1413697536 #bytes=1 p3=0 obj#=54782 tim=4783295432
STAT #13 id=1 cnt=38 pid=0 pos=1 obj=0 op='VIEW (cr=3022 pr=2899 pw=0
time=2087814 us)'
STAT #13 id=2 cnt=260000 pid=1 pos=1 obj=0 op='WINDOW SORT PUSHED RANK
(cr=3022 pr=2899 pw=0 time=1837749 us)'
STAT #13 id=3 cnt=260000 pid=2 pos=1 obj=54782 op='TABLE ACCESS FULL
T1 (cr=3022 pr=2899 pw=0 time=260091 us)'
==========================================
select colx,
max(coly) keep(dense_rank last order by coly,colz) ,
max(colz) keep(dense_rank last order by coly,colz)
from t1
group by colx;
-------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-
Time | Buffers | Reads | OMem | 1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------
| 1 | SORT GROUP BY | | 1 | 26 | 26 |
00:00:00.65 | 3022 | 2899 | 9216 | 9216 | 8192 (0)|
| 2 | TABLE ACCESS FULL| T1 | 1 | 264K| 260K|
00:00:00.26 | 3022 | 2899 | | | |
-------------------------------------------------------------------------------------------------------------------------
0.609 Seconds of CPU Time
FETCH
#13:c=609375,e=651450,p=2899,cr=3022,cu=0,mis=0,r=26,dep=0,og=1,tim=4954930446
WAIT #13: nam='SQL*Net message from client' ela= 5631 driver
id=1413697536 #bytes=1 p3=0 obj#=54782 tim=4954936214
STAT #13 id=1 cnt=26 pid=0 pos=1 obj=0 op='SORT GROUP BY (cr=3022
pr=2899 pw=0 time=651409 us)'
STAT #13 id=2 cnt=260000 pid=1 pos=1 obj=54782 op='TABLE ACCESS FULL
T1 (cr=3022 pr=2899 pw=0 time=260118 us)'
==========================================
SELECT ColX, ColY, ColZ
FROM T1
WHERE (ColX, ColY, ColZ) IN
(SELECT A.ColX, A.ColY, max(B.ColZ)
FROM
(SELECT ColX, max(ColY) ColY
FROM T1
GROUP BY ColX) A,
T1 B
WHERE A.ColX = B.ColX
AND A.ColY = B.ColY
GROUP BY A.ColX, A.ColY);
----------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows
| A-Time | Buffers | Reads | OMem | 1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------
|* 1 | HASH JOIN RIGHT SEMI | | 1 | 1 | 38 |
00:00:01.75 | 9066 | 8697 | 1206K| 1206K| 1212K (0)|
| 2 | VIEW | VW_NSO_1 | 1 | 26 | 26 |
00:00:01.13 | 6044 | 5798 | | | |
| 3 | HASH GROUP BY | | 1 | 26 | 26 |
00:00:01.13 | 6044 | 5798 | | | |
|* 4 | HASH JOIN | | 1 | 264 | 520 |
00:00:01.13 | 6044 | 5798 | 1348K| 1348K| 1177K (0)|
| 5 | VIEW | | 1 | 26 | 26 |
00:00:00.52 | 3022 | 2899 | | | |
| 6 | HASH GROUP BY | | 1 | 26 | 26 |
00:00:00.52 | 3022 | 2899 | | | |
| 7 | TABLE ACCESS FULL| T1 | 1 | 264K| 260K|
00:00:00.26 | 3022 | 2899 | | | |
| 8 | TABLE ACCESS FULL | T1 | 1 | 264K| 260K|
00:00:00.26 | 3022 | 2899 | | | |
| 9 | TABLE ACCESS FULL | T1 | 1 | 264K| 260K|
00:00:00.26 | 3022 | 2899 | | | |
----------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("COLX"="$nso_col_1" AND "COLY"="$nso_col_2" AND
"COLZ"="$nso_col_3")
4 - access("A"."COLX"="B"."COLX" AND "A"."COLY"="B"."COLY")
1.609 Seconds of CPU Time
FETCH
#13:c=1609375,e=1751476,p=8697,cr=9066,cu=0,mis=0,r=38,dep=0,og=1,tim=40897698
STAT #13 id=1 cnt=38 pid=0 pos=1 obj=0 op='HASH JOIN RIGHT SEMI
(cr=9066 pr=8697 pw=0 time=1751397 us)'
STAT #13 id=2 cnt=26 pid=1 pos=1 obj=0 op='VIEW VW_NSO_1 (cr=6044
pr=5798 pw=0 time=1132471 us)'
STAT #13 id=3 cnt=26 pid=2 pos=1 obj=0 op='HASH GROUP BY (cr=6044
pr=5798 pw=0 time=1132438 us)'
STAT #13 id=4 cnt=520 pid=3 pos=1 obj=0 op='HASH JOIN (cr=6044
pr=5798 pw=0 time=1128901 us)'
STAT #13 id=5 cnt=26 pid=4 pos=1 obj=0 op='VIEW (cr=3022 pr=2899 pw=0
time=519125 us)'
STAT #13 id=6 cnt=26 pid=5 pos=1 obj=0 op='HASH GROUP BY (cr=3022
pr=2899 pw=0 time=519095 us)'
STAT #13 id=7 cnt=260000 pid=6 pos=1 obj=54782 op='TABLE ACCESS FULL
T1 (cr=3022 pr=2899 pw=0 time=260092 us)'
STAT #13 id=8 cnt=260000 pid=4 pos=2 obj=54782 op='TABLE ACCESS FULL
T1 (cr=3022 pr=2899 pw=0 time=260095 us)'
STAT #13 id=9 cnt=260000 pid=1 pos=2 obj=54782 op='TABLE ACCESS FULL
T1 (cr=3022 pr=2899 pw=0 time=260092 us)'
It might still be a "mad max" query, but it was a fun exercise.
> And, if T1 has few thousand rows, it's going to be full table scan,
> window sort and sort distinct, in that order. In other words, the
> resource consumption will just explode. As this is obviously a report,
> materialized view looks like the only reasonable solution. This is a
> sort of "mad max" query.
Actually it will need to populate a cursor in a PL/SQL package.
GM
> My advice would be to re-examine your data model instead of looking for
> the Mad Max function.
Not much I can do about it. They are Oracle Change Data Capture
subscriber views created automatically when using the in-built
DBMS_CDC_SUBSCRIBE package.
Geoff M
That looks groovy.
Many thanks Charles (particularly for doing that performance analysis),
Mladen and Maxim.
Rank and dense_rank are the type of techniques I only vaguely understand,
but are likely to be the things that I was looking for.
Will sit down and do the analysis on our own data next week with the
assistance of my DBA (as I am a mere humble developer) when/if I can get
any time in his crowded calendar.
Again, many thanks.
Geoff M
> Actually it will need to populate a cursor in a PL/SQL package.
>
> GM
Geoff, I work for a telecommunication company and we have a process called
"rating" which assigns charging rate and a price to each call. There are
approximately 3,000,000 CDR's (Call Detail Records) per day. There are
many reports for the management in which management asks for max of this,
min of that and so on. It was a long struggle to replace analytic
functions with the triggers which were accounting for minimums and
maximumx as the rating process was running. I have to proudly say that the
performance benefits were great. Of course, each update is a bit more
expensive then it used to be, but our server is coping just fine.