Calculating Percentiles and Descriptive Stats

431 views
Skip to first unread message

edsar

unread,
Dec 13, 2011, 1:43:00 PM12/13/11
to BigQuery discuss
Is there a way to take a BigQuery result set and calculate the
following?

- Percentiles (e.g. 90th, 95th)
- Standard Deviation
- Variance

Ed

Jordan Tigani

unread,
Dec 13, 2011, 2:49:25 PM12/13/11
to bigquery...@googlegroups.com
you can compute the variance of field foo by using:
select 
SELECT
(sum(foo*foo)-sum(foo)*sum(foo)/count(foo))/count(foo) AS variance 
FROM ...

standard deviation is just the square root of the variance (you can use the sqrt() function).

Regarding percentiles, there is no way to do this currently. The underlying query engine that BigQuery uses has a quantiles function, so it should be relatively straightforward to add. Note that this would be a statistical approximation, since exact percentiles are very expensive to compute over large data. I'll ping you back after investigating further to make sure there are no gotchas with exposing the additional function.

Moreover, given that we've had several requests, we might add shortcuts for mean/median/variance queries (if these would make your life easier, please let us know).

-jordan
Reply all
Reply to author
Forward
0 new messages