MEDIAN function in Big Query

Skip to first unread message

Claritics General

Apr 11, 2012, 11:06:04 AM4/11/12
BQ team,

Do you have a suggested way to calculate the MEDIAN (50th percentile) and percentile distribution based functions in general?

I see a QUANTILES(expr[, buckets]) function listed in the query reference docs but am not clear if this can be used for the above?


Michael Sheldon

Apr 11, 2012, 11:29:40 AM4/11/12
Yes, you can compute median.

Note that quantile provides an approximate answer. The fractional error per quantile is epsilon=1/N, so as N increases, the error becomes smaller and the accuracy becomes higher. To achieve certain accuracy with epsilon=Ε, one can compute N=1/Ε quantiles and only use a subset N' of those at regular intervals. For example, to get deciles with only 0.1% error, one should compute 1000-tiles (quantile(1001)) and then use 0th, 100th, 200th, ..., 1000th ranks with the middle value being the median.
  • quantile(<expr>, 2) computes min and max with 50% error.
  • quantile(<expr>, 3) computes min, median, and max with 33% error.
  • quantile(<expr>, 5) computes quartiles with 25% error.
  • quantile(<expr>, 11) computes deciles with 10% error.
  • quantile(<expr>, 21) computes vingtiles with 5% error.
  • quantile(<expr>, 101) computes percentiles with 1% error.

--Michael Sheldon

Sanjath Shringeri

Apr 16, 2012, 2:54:12 PM4/16/12
Is there a way to  make it work on partition of data (using group by)? 

We are trying to find the medians on rows partitioned by some criteria.   E.g: meidan sales number amongst the cities within each state.   We can apply a where clause and get the numbers using quantiles, but we cannot seem to use the quantiles along with group by clause. 

Are we missing something?

- Sanjath
Sanjath Shringeri | VP, Engineering | claritics | Social. Intelligence. Now. 
Come visit us at GDC 2012 in San Francisco, booth & session
408.796.1287 |

Reply all
Reply to author
0 new messages