Compute percentile

Rizwana Rizia

May 28, 2013, 6:29:36 PM
to dat...@googlegroups.com
Hi,

I am very new to Pig Latin.

Can anyone tell me whether DataFu has a feature that can help calculate percentiles?

For example, given a set of data, I want to be able to tell the percentile of each value.

Example input: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
The output would give the 0th percentile, 10th percentile, 20th percentile, ..., 100th percentile.

Thanks in advance.
Rizwana Rizia

Matt Hayes

Jul 3, 2013, 1:42:16 AM
to dat...@googlegroups.com
Yes, check out either Quantile or StreamingQuantile.

You can define it like this:

define Quantile datafu.pig.stats.Quantile('0.0','0.10','0.20','0.30','0.40','0.50','0.60','0.70','0.80','0.90','1.0');

Or, the shorthand:

define Quantile datafu.pig.stats.Quantile('11');

Note that in the shorthand version it is 11, not 10, because there are 11 values being produced.
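
For readers new to Pig, here is a minimal sketch of how the defined UDF might then be applied to a data set. The jar path, the input file `input.txt`, and the field name `val` are assumptions for illustration and do not come from this thread. Quantile expects its input bag to be sorted, which is why the bag is ordered inside the nested FOREACH.

```
-- Assumed jar location; adjust for your environment.
REGISTER datafu.jar;

DEFINE Quantile datafu.pig.stats.Quantile('11');

-- Hypothetical input: one numeric value per line.
data = LOAD 'input.txt' AS (val:double);

-- Collect every record into a single bag.
grouped = GROUP data ALL;

-- Sort the bag, then compute the 11 quantiles over it.
quantiles = FOREACH grouped {
  sorted = ORDER data BY val;
  GENERATE Quantile(sorted);
};

DUMP quantiles;
```

The result is a single tuple holding one value per requested quantile. StreamingQuantile can be defined the same way; as far as I know it returns approximate values and does not need the ORDER step, which may matter for large inputs.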

Oscar Wilde

Nov 13, 2013, 5:41:40 PM
to dat...@googlegroups.com
Hello,
I would like to use the output from Quantile to filter my input. For example, how do I get all temperatures (as per the example) that fall between the 10th and 90th percentiles?
Thanks

Matthew Hayes

Nov 13, 2013, 11:46:49 PM
to dat...@googlegroups.com
I gave a talk where I walked through this type of scenario.  Check out the slides here:


Slide 12 is where I apply the filter.  

-Matt
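
One possible way to apply such a filter is sketched below. This is not necessarily the approach shown on the slides; the file name `temperatures.txt`, the field name `temp`, and the aliases `p10` and `p90` are assumptions for illustration. The sketch computes the two bounds once, then refers to the single-row relation as a scalar inside the FILTER.

```
-- Assumed jar location; adjust for your environment.
REGISTER datafu.jar;

-- Only the 10th and 90th percentiles are needed as bounds.
DEFINE Quantile datafu.pig.stats.Quantile('0.10','0.90');

-- Hypothetical input: one temperature reading per line.
temps = LOAD 'temperatures.txt' AS (temp:double);

grouped = GROUP temps ALL;

-- Quantile expects sorted input; flatten the result tuple into two named fields.
bounds = FOREACH grouped {
  sorted = ORDER temps BY temp;
  GENERATE FLATTEN(Quantile(sorted)) AS (p10:double, p90:double);
};

-- bounds holds exactly one row, so its fields can be used as scalars.
filtered = FILTER temps BY temp >= bounds.p10 AND temp <= bounds.p90;

DUMP filtered;
```

A CROSS of `temps` with `bounds` followed by a FILTER would be an alternative to the scalar reference.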

