Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Median Function in mySQL

4 views
Skip to first unread message

Ross Contino

unread,
Dec 7, 2004, 7:25:55 AM12/7/04
to
Hello to all:

I have been searching the web for examples on how to determine a median
value in a mySQL table. I have reviewed the article at
http://mysql.progen.com.tr/doc/en/Group_by_functions.html. I am an
experienced VB programmer that has recently moved to PHP/mySQL. My employer
has a text file outputted from a vendor specific software with data.
However it cannot be manipulated because it is text. I created a web that
reads the text file on client, converts it to usable data, inputs to mySQL
table and then reflects HTML pages with desired report. One report
subtracts two times and then my employer wishes a median grouped by another
field. I used the AVG function, but of course this is mean, not Median. My
query is:

$mSql = "SELECT AdmitDoc,
Count(AdmitDoc) as Number_Admits,
sec_to_time(AVG(unix_timestamp(Outdt) -
unix_timestamp(Decdt))) as AVG_Doc_To_Admit,
sec_to_time(AVG(unix_timestamp(Outdt) -
unix_timestamp(Indt))) as AVG_Total_Stay
from tadmits GROUP BY AdmitDoc";

This query subtracts several wait times in an emergency department and
groups by admitting physician. I would like to have median rather than
mean. AdmitDoc is text field, Outdt, Decdt, Indt are all date/time fields.

Any thoughts?

Thanks,
Ross


Steve

unread,
Dec 7, 2004, 9:26:38 AM12/7/04
to

There is no MEDIAN() operator in MySQL, so you will have to perform
this operation in several steps...

(1) Count the number of rows in each group
(2) For each group, if the count is odd, the median is the middle row
(when the set is ordered); when the count is even, the median is the
mean of the middle two rows.

In skeletal form...

// get counts for each set of interest...

SELECT AdmitDoc, COUNT(*) AS "Number_Admits"
FROM tadmits
GROUP BY AdmitDoc

// for each AdmitDoc, work out LIMIT parameters (nb first row =
0)...

$doc = $datRow[ 'AdmitDoc' ];
$count = $datRow[ Number_Admits];

if $count is odd,
$limit1 = intval( $count / 2 );
$limit2 = 1;
else
$limit1 = intval( $count / 2 ) - 1;
$limit2 = 2;

// select middle row(s) from ORDERed set...

SELECT AdmitDoc,
SEC_TO_TIME(UNIX_TIMESTAMP(Outdt)-UNIX_TIMESTAMP(Decdt)) AS
"Doc_To_Admit"
FROM tadmits
WHERE AdmitDoc = $doc
ORDER BY UNIX_TIMESTAMP(Outdt) - UNIX_TIMESTAMP(Decdt)
LIMIT $limit1, $limit2


if $count is odd,
// only one row...
$datRow = mysql_fetch_array( $datRows, MYSQL_ASSOC );
$median_doc_to_admit = $datRow[ 'Doc_To_Admit' ];
else
// two rows
$datRow = mysql_fetch_array( $datRows, MYSQL_ASSOC );
$val1 = $datRow[ 'Doc_To_Admit' ];
$datRow = mysql_fetch_array( $datRows, MYSQL_ASSOC );
$val2 = $datRow[ 'Doc_To_Admit' ];
$median_doc_to_admit = ( $val1 + $val2 ) / 2;

You'll have to do this separately for your two required measures
because they each need a different sort order.

---
Steve

Steve

unread,
Dec 7, 2004, 10:01:51 AM12/7/04
to

There is no MEDIAN() operator in MySQL, so you will have to perform

Ross Contino

unread,
Dec 13, 2004, 2:26:40 PM12/13/04
to
Hi Steve:

I ended up writing a php function to select the medians from both values,
but it entailed looping the data frequently and creating a new table to
store it (see below). I could not quite get your method working. Would
your method have only worked via SQL and preclude the constant looping and
need for a table to store the medians?

Median functions:

/***********function block*************************/
function GetMedian_Dec($myDoc, $No_Admits)
{
$MyMedian = "";
$LimitSql = "SELECT
SEC_TO_TIME(UNIX_TIMESTAMP(Outdt)-UNIX_TIMESTAMP(Decdt))AS Doc_To_Admit
FROM tadmits WHERE AdmitDoc = '" . $myDoc . "'" . "
ORDER BY Doc_To_Admit";
$result2 = mysql_query($LimitSql);
$i=1;
$FirstTime = "";
$SecondTime = "";
If (($No_Admits % 2)==1){ //odd number of admissions
$MyCounter = intval(($No_Admits / 2) + 1);
while($row3=mysql_fetch_assoc($result2)){
if ($i ==$MyCounter){
$UpSql = "UPDATE tersummary SET Median_MD_To_Admit = '"
. $row3['Doc_To_Admit'] . "' WHERE AdmitDoc = '" . $myDoc . "'";
mysql_query($UpSql);
}//end if
$i++;
}//end while
}
else { //even number of admissions
$MyCounter = intval($No_Admits / 2);
while($row3=mysql_fetch_assoc($result2)){
if ($i==$MyCounter){
$FirstTime = $row3['Doc_To_Admit'];
}//end if
if ($i==$MyCounter + 1){
$SecondTime = $row3['Doc_To_Admit'];
}//end if
$i++;
}//end while
$AvgVal = (strtotime($FirstTime) + strtotime($SecondTime)) /
2;
$UpSql = "UPDATE tersummary SET Median_MD_To_Admit = '" .
date("H:i:s",$AvgVal) . "' WHERE AdmitDoc = '" . $myDoc . "'";
mysql_query($UpSql);
}
}
function GetMedian_Stay($myDoc, $No_Admits)
{ $MyMedian = "";
$LimitSql = "SELECT
SEC_TO_TIME(UNIX_TIMESTAMP(Outdt)-UNIX_TIMESTAMP(Indt))AS Total_Stay
FROM tadmits WHERE AdmitDoc = '" . $myDoc . "'" . "
ORDER BY Total_Stay";
$result2 = mysql_query($LimitSql);
$i=1;
$FirstTime = "";
$SecondTime = "";
If (($No_Admits % 2)==1){ //odd number of admissions
$MyCounter = intval(($No_Admits / 2) + 1);
while($row3=mysql_fetch_assoc($result2)){
if ($i ==$MyCounter){
$UpSql = "UPDATE tersummary SET Median_Total_Stay = '"
. $row3['Total_Stay'] . "' WHERE AdmitDoc = '" . $myDoc . "'";
mysql_query($UpSql);
}//end if
$i++;
}//end while
}
else { //even number of admissions
$MyCounter = intval($No_Admits / 2);
while($row3=mysql_fetch_assoc($result2)){
if ($i==$MyCounter){
$FirstTime = $row3['Total_Stay'];
}//end if
if ($i==$MyCounter + 1){
$SecondTime = $row3['Total_Stay'];
}//end if
$i++;
}//end while
$AvgVal = (strtotime($FirstTime) + strtotime($SecondTime)) /
2;
$UpSql = "UPDATE tersummary SET Median_Total_Stay = '" .
date("H:i:s",$AvgVal) . "' WHERE AdmitDoc = '" . $myDoc . "'";
mysql_query($UpSql);
}//end if/else
}


/*************end block****************************/

Steve

unread,
Dec 15, 2004, 7:14:07 AM12/15/04
to

Ross:

The outline I suggested uses PHP to control the loop through each
AdmitDoc, and to create the text of the query to find the median
value(s), so it is not a SQL-only solution. The main advantage it has
over your code is that using a LIMIT clause fetches only 1 or 2 rows
regardless of how many are in the table, which could be a sigificant
optimisation if you have thousands of rows.

Which bit wasn't working for you?

Here's a self-contained worked example...


<?php

$t_strServer = 'localhost';
$t_strUser = 'root';
$t_strPassword = '';


$t_objSQLConnection = mysql_connect( $t_strServer, $t_strUser,
$t_strPassword );
mysql_select_db( 'TEST', $t_objSQLConnection );
mysql_query( 'DROP TABLE IF EXISTS tadmits', $t_objSQLConnection );
mysql_query( 'CREATE TABLE tadmits ( id INT, AdmitDoc VARCHAR(2), outdt
DATETIME, decdt DATETIME )', $t_objSQLConnection );
mysql_query( 'INSERT INTO tadmits VALUES ( 1, "aa", "20041203000000",
"20041201000000" )', $t_objSQLConnection );
mysql_query( 'INSERT INTO tadmits VALUES ( 2, "aa", "20041204000000",
"20041202000000" )', $t_objSQLConnection );
mysql_query( 'INSERT INTO tadmits VALUES ( 3, "aa", "20041206000000",
"20041202000000" )', $t_objSQLConnection );
mysql_query( 'INSERT INTO tadmits VALUES ( 4, "bb", "20041208000000",
"20041203000000" )', $t_objSQLConnection );
mysql_query( 'INSERT INTO tadmits VALUES ( 5, "bb", "20041211000000",
"20041204000000" )', $t_objSQLConnection );
mysql_query( 'INSERT INTO tadmits VALUES ( 6, "cc", "20041210000000",
"20041206000000" )', $t_objSQLConnection );
mysql_query( 'INSERT INTO tadmits VALUES ( 7, "cc", "20041213000000",
"20041203000000" )', $t_objSQLConnection );
mysql_query( 'INSERT INTO tadmits VALUES ( 8, "cc", "20041216000000",
"20041205000000" )', $t_objSQLConnection );
mysql_query( 'INSERT INTO tadmits VALUES ( 9, "cc", "20041223000000",
"20041201000000" )', $t_objSQLConnection );
mysql_query( 'INSERT INTO tadmits VALUES (10, "cc", "20041222000000",
"20041207000000" )', $t_objSQLConnection );
mysql_query( 'INSERT INTO tadmits VALUES (11, "dd", "20041226000000",
"20041202000000" )', $t_objSQLConnection );
mysql_query( 'INSERT INTO tadmits VALUES (12, "ee", "20041214000000",
"20041203000000" )', $t_objSQLConnection );

print "Doctor:\tMedian days\n\n";

$t_datRows = mysql_query( 'SELECT AdmitDoc, COUNT(*) as "Number_Admits"
FROM tadmits GROUP BY AdmitDoc', $t_objSQLConnection );
while( $t_datRow = mysql_fetch_array( $t_datRows, MYSQL_ASSOC ) )
{
$t_strAdmitDoc = $t_datRow[ 'AdmitDoc' ];
$t_lngNumber_Admits = $t_datRow[ 'Number_Admits' ];
if( $t_lngNumber_Admits % 2 == 0 )
{
// even...
$t_lngLimit1 = intval( $t_lngNumber_Admits / 2 ) - 1;
$t_lngLimit2 = 2;
}
else
{
// odd...
$t_lngLimit1 = intval( $t_lngNumber_Admits / 2 );
$t_lngLimit2 = 1;
}

$t_datRows2 = mysql_query( 'SELECT AdmitDoc, UNIX_TIMESTAMP(outdt) -
UNIX_TIMESTAMP(decdt) AS "Doc_To_Admit" FROM tadmits WHERE AdmitDoc =
"' . $t_strAdmitDoc . '" ORDER BY UNIX_TIMESTAMP(outdt) -
UNIX_TIMESTAMP(decdt) LIMIT ' . $t_lngLimit1 . ',' . $t_lngLimit2,
$t_objSQLConnection );

if( $t_lngNumber_Admits % 2 == 0 )
{
// even...
$t_datRow2 = mysql_fetch_array( $t_datRows2, MYSQL_ASSOC );
$t_dblValue1 = $t_datRow2[ 'Doc_To_Admit' ];
$t_datRow2 = mysql_fetch_array( $t_datRows2, MYSQL_ASSOC );
$t_dblValue2 = $t_datRow2[ 'Doc_To_Admit' ];
$t_lngMedianTime = ( $t_dblValue1 + $t_dblValue2 ) / 2;
}
else
{
// odd...
$t_datRow2 = mysql_fetch_array( $t_datRows2, MYSQL_ASSOC );
$t_lngMedianTime = $t_datRow2[ 'Doc_To_Admit' ];
}

mysql_free_result( $t_datRows2 );

// seconds to decimal days...
$t_lngMedianTime = $t_lngMedianTime / ( 3600 * 24 );

print "$t_strAdmitDoc:\t$t_lngMedianTime days\n\n";

}

mysql_free_result( $t_datRows );

?>


I get this:

Doctor: Median days

aa: 2 days

bb: 6 days

cc: 11 days

dd: 24 days

ee: 11 days

---
Steve

0 new messages