Add support for MEDIAN aggregate function

608 views
Skip to first unread message

Lukas Eder

unread,
Oct 15, 2011, 11:48:14 AM10/15/11
to h2-da...@googlegroups.com
This is not in the SQL standard, but it woudl still be nice to have. To my knowledge, any of these RDBMS support a MEDIAN aggregate function:

- HSQLDB
- Oracle
- Sybase SQL Anywhere

Here are some hints how the MEDIAN value can be calculated using window functions or common table expressions, e.g. in DB2 or SQL Server:

http://stackoverflow.com/questions/1342898/function-to-calculate-median-in-sql-server
http://stackoverflow.com/questions/1247476/db2-sql-median-with-group-by

Thomas Mueller

unread,
Oct 17, 2011, 3:05:10 PM10/17/11
to h2-da...@googlegroups.com
Hi,

H2 supports custom aggregate functions.

Do you have a (real-world) use case for MEDIAN?

Regards,
Thomas

Lukas Eder

unread,
Oct 18, 2011, 2:34:11 AM10/18/11
to h2-da...@googlegroups.com
Hi Thomas,

I don't need it for myself, hence no "real-world" use case. Just thought that this could be some low-prio Oracle compatibility feature request for H2

Cheers
Lukas

RICCARD

unread,
Oct 18, 2011, 3:34:20 AM10/18/11
to h2-da...@googlegroups.com
Hi,
 
MEDIAN is used a lot when calculating salary statistics since AVG is so sensitive
to a few very high or low salaries in a group. I would use it now and then.
Otherwise I don`t know.

Riccard
--
You received this message because you are subscribed to the Google Groups "H2 Database" group.
To view this discussion on the web visit https://groups.google.com/d/msg/h2-database/-/OBFo1DHM6xEJ.
To post to this group, send email to h2-da...@googlegroups.com.
To unsubscribe from this group, send email to h2-database...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/h2-database?hl=en.

Thomas Mueller

unread,
Nov 2, 2011, 2:13:18 PM11/2/11
to h2-da...@googlegroups.com
Hi,

I have added the request for MEDIAN to the roadmap. By the way, a
sample implementation (using a user defined aggregate function) is
available here:

http://code.google.com/p/h2database/source/browse/trunk/h2/src/test/org/h2/test/db/TestFunctions.java#486

Regards,
Thomas

Lukas Eder

unread,
Nov 2, 2011, 5:51:37 PM11/2/11
to h2-da...@googlegroups.com
Very nice, thank you!

RICCARD

unread,
Nov 2, 2011, 7:51:25 PM11/2/11
to h2-da...@googlegroups.com

Th2omas!

 

I will have a good use of that!

Thank You!


Riccard

Lukas Eder

unread,
Jan 7, 2015, 1:31:09 PM1/7/15
to h2-da...@googlegroups.com, riccar...@msn.com
Hello,

Much more interesting than the MEDIAN() aggregate function itself are inverse distribution functions, or ordered-set aggregate functions in general. MEDIAN(xyz) can be emulated via PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY xyz), as documented in this blog post:

Have ordered-set aggregate functions been considered in H2, before? I know, window functions are also on the roadmap. It would be awesome if these features were a part of H2 ;-)

Cheers
Lukas

Rami Ojares

unread,
Jan 7, 2015, 5:48:46 PM1/7/15
to h2-da...@googlegroups.com
These window functions seem to me like just a shorthand notation.

Here is an example from postgresql documentation.
http://www.postgresql.org/docs/9.1/static/tutorial-window.html

SELECT depname, empno, salary, avg(salary) OVER (PARTITION BY depname) FROM empsalary;

This is just another way to say

SELECT
  e1.*, e2.avg
FROM
  empsalary e1
JOIN (
  SELECT depname, AVG(salary) as avg
  FROM empsalary
  GROUP BY depname
) e2 ON e1.depname = e2.depname

SELECT MEDIAN(salary) FROM empsalary;

is just a shorthand for

SELECT salary
FROM empsalary
ORDER BY salary
LIMIT 1 OFFSET SELECT (COUNT(salary)+1)/2 FROM empsalary;

- Rami
--
You received this message because you are subscribed to the Google Groups "H2 Database" group.
To unsubscribe from this group and stop receiving emails from it, send an email to h2-database...@googlegroups.com.

To post to this group, send email to h2-da...@googlegroups.com.

Lukas Eder

unread,
Jan 8, 2015, 12:49:15 AM1/8/15
to h2-da...@googlegroups.com
2015-01-07 23:48 GMT+01:00 Rami Ojares <rami....@gmail.com>:
These window functions seem to me like just a shorthand notation.

Here is an example from postgresql documentation.
http://www.postgresql.org/docs/9.1/static/tutorial-window.html

SELECT depname, empno, salary, avg(salary) OVER (PARTITION BY depname) FROM empsalary;

This is just another way to say

SELECT
  e1.*, e2.avg
FROM
  empsalary e1
JOIN (
  SELECT depname, AVG(salary) as avg
  FROM empsalary
  GROUP BY depname
) e2 ON e1.depname = e2.depname
Interesting approach. Obviously, this would have to be critically reviewed from a performance perspective as most databases implement window functions rather efficiently. They can, because the function is applied on the previously materialised result set that is about to be projected by the SELECT clause. Adding JOINs can have quite different implications for the execution plan. I'd probably emulate this using a nested SELECT, to prevent accidental interference with WHERE, GROUP BY, and HAVING clauses that might be added as well.

How would you emulate ORDER BY and ROWS BETWEEN ... clauses?
SELECT MEDIAN(salary) FROM empsalary;

is just a shorthand for

SELECT salary
FROM empsalary
ORDER BY salary
LIMIT 1 OFFSET SELECT (COUNT(salary)+1)/2 FROM empsalary;

It's a bit more complex if you have an even number of values in your set, then, the mean of the two values in the middle is expected. E.g. the median of 1, 2, 3, 10 is 2.5

Noel Grandin

unread,
Jan 8, 2015, 1:14:02 AM1/8/15
to h2-da...@googlegroups.com
I had a bash at implementing this about 6 months ago, but it's decidedly non-trivial.

It doesn't fit into our existing framework, so We would have to implement it by adding another layer in our processing
chain.

Reply all
Reply to author
Forward
0 new messages