HELP on Understanding the result of GLM binomial prediction, it is confusing


Mei Liang

Jun 15, 2015, 8:44:41 PM
to h2os...@googlegroups.com
Hi,

I am having trouble understanding the result of a GLM prediction with the binomial family. I ran the example code https://github.com/h2oai/sparkling-water/blob/master/examples/scripts/StrataAirlines.scala. It returns the prediction result, which I viewed using the h2o-flow web interface. The result produced by line 86 is all 1s, and it should be impossible for the algorithm to predict all 1s. What does predict mean here?

When I modified line 86 to be

val predGLMH2OFrame = glmModel.score(bigTable)

it returns two more columns, p0 and p1. I guess these are the probabilities of 0 and 1, but I am not sure. Can someone help me please?

Thanks,
Mei


Mei Liang

Jun 15, 2015, 8:47:15 PM
to h2os...@googlegroups.com

P.S. Here is a screenshot of my result.

Tom Kraljevic

Jun 15, 2015, 10:27:15 PM
to Mei Liang, h2os...@googlegroups.com

Hi Mei,


On Jun 15, 2015, at 5:44 PM, Mei Liang <me...@g.clemson.edu> wrote:

> Hi,
>
> I am having trouble understanding the result of a GLM prediction with the binomial family. I ran the example code https://github.com/h2oai/sparkling-water/blob/master/examples/scripts/StrataAirlines.scala. It returns the prediction result, which I viewed using the h2o-flow web interface. The result produced by line 86 is all 1s, and it should be impossible for the algorithm to predict all 1s. What does predict mean here?

‘predict’ is the predicted label with an assumed threshold of 0.5 (which is frequently not what you want).


> When I modified line 86 to be
>
> val predGLMH2OFrame = glmModel.score(bigTable)
>
> it returns two more columns, p0 and p1. I guess these are the probabilities of 0 and 1, but I am not sure.


Yes, this is correct. Use these to do your own thresholding.


Thanks,
Tom

Mei Liang

Jun 16, 2015, 10:42:57 AM
to h2os...@googlegroups.com, me...@g.clemson.edu
Hi Tom,

I still do not get it. If the threshold is 0.5 and p(1) >= 0.5, then the predicted label is 1; if p(1) < 0.5, the predicted label is 0.

But why, referring to my result picture above, did it predict 1 for both row 1 and row 5, even though row 1 has p(1) = 0.82 and row 5 has p(1) = 0.26? Shouldn't row 5 be predicted as 0 instead of 1?

Thanks,
Mei

ccl...@gmail.com

Jun 25, 2015, 2:20:58 PM
to h2os...@googlegroups.com
Row 1, p(1)=0.82, Row 5, p(1)=0.26

If you choose a threshold of 0.10 then...
row 1: 0.82 > 0.10, so predicts a 1
row 5: 0.26 > 0.10, so predicts a 1

If you choose a threshold of 0.50 then...
row 1: 0.82 > 0.50, so predicts a 1
row 5: 0.26 < 0.50, so predicts a 0

If you choose a threshold of 0.90 then...
row 1: 0.82 < 0.90, so predicts a 0
row 5: 0.26 < 0.90, so predicts a 0


As you can see, you can change the predictions by picking a different threshold; however, row 1 will always lean toward a 1 more than row 5. Picking a threshold is often done by looking at the ROC curve, and usually depends on how you weigh false positives against false negatives.

Cliff
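
The threshold walk-through above can be reproduced with a few lines of Scala. This is only a minimal sketch: `ThresholdSweep` and `label` are hypothetical names, not part of H2O, and the strict `>` comparison simply mirrors the inequalities written in the message.

```scala
object ThresholdSweep {
  // Label a row 1 when its p1 exceeds the chosen threshold, otherwise 0.
  // The strict ">" follows the comparisons written out in the thread.
  def label(p1: Double, threshold: Double): Int =
    if (p1 > threshold) 1 else 0

  def main(args: Array[String]): Unit = {
    // p1 values for row 1 and row 5, as quoted in the thread.
    val rows = Seq("row 1" -> 0.82, "row 5" -> 0.26)
    for (threshold <- Seq(0.10, 0.50, 0.90); (name, p1) <- rows)
      println(s"threshold $threshold, $name: p1 = $p1, predicts ${label(p1, threshold)}")
  }
}
```

The probabilities never change; only the cut-off applied to them does, which is why the same two rows can come out as 1/1, 1/0, or 0/0.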

Mei Liang

Jun 25, 2015, 2:54:51 PM
to h2os...@googlegroups.com, ccl...@gmail.com
Hi Cliff,

I understand your point. However, my understanding of machine learning is that after I train the model on a training set, the model should already have the best threshold and the best theta built in, ready for me to make new predictions without setting the threshold myself. Is this not how the h2o GLM model works?

In addition, after I build the GLM model and try to use it to make a prediction, there is no place for me to set the threshold. I looked at the POJO, and according to it, h2o has the threshold set at 0, so the prediction is always going to be 1. This needs to be fixed.

I took Michal's suggestion and created a class that overrides h2o's map function. This is the only way I can use h2o to make predictions now. But can you fix this, please?

Thanks,
Mei
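
The all-1 predictions reported above are what a threshold of 0 would produce. A minimal sketch, assuming the labelling step compares p1 against the threshold with `>=` (the actual comparison in the generated POJO is not shown in this thread, and `ZeroThreshold.label` is a hypothetical stand-in for it):

```scala
object ZeroThreshold {
  // Hypothetical stand-in for the labelling step; the ">=" is an assumption.
  def label(p1: Double, threshold: Double): Int =
    if (p1 >= threshold) 1 else 0

  def main(args: Array[String]): Unit = {
    // 0.82 and 0.26 come from the thread; 0.01 is a made-up extra row.
    val p1s = Seq(0.82, 0.26, 0.01)
    // A probability is never below 0, so a threshold of 0 labels every row 1.
    println(p1s.map(label(_, 0.0)))
    // A threshold of 0.5 separates the same rows.
    println(p1s.map(label(_, 0.5)))
  }
}
```

With a strict `>` instead, the only row that would escape the 1 label at threshold 0 is one whose p1 is exactly 0.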

Parag Sanghavi

Jun 25, 2015, 6:05:21 PM
to Mei Liang, h2ostream, Cliff Click
Mei,

I can see that the threshold is hardcoded to 0 in the Java POJO. I have filed a JIRA for this issue:


Thank you for your patience.

Parag

--
Parag Sanghavi
Head of Customer Success
H2O.ai
(650) 303-4069

ccl...@gmail.com

Jun 25, 2015, 7:24:58 PM
to h2os...@googlegroups.com, ccl...@gmail.com, me...@g.clemson.edu
Thanks Parag.

Mei - as a work-around, you should be able to get the full ROC out of the model from R or Flow, and then pick the best threshold and hand-edit the POJO.

Cliff
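
One way to "pick the best threshold" once scored rows with known labels are available is to sweep a few candidates and keep the one with the lowest weighted error cost. The sketch below is an illustration under stated assumptions, not H2O's own selection logic: `PickThreshold`, `Scored`, the cost weights, and the validation rows are all made up, and it is a plain cost sweep rather than a reading of the ROC curve H2O reports.

```scala
object PickThreshold {
  // One scored row: the model's p1 and the true 0/1 label.
  final case class Scored(p1: Double, actual: Int)

  // Count (falsePositives, falseNegatives) when rows with p1 >= threshold are labelled 1.
  def errors(rows: Seq[Scored], threshold: Double): (Int, Int) = {
    val fp = rows.count(r => r.p1 >= threshold && r.actual == 0)
    val fn = rows.count(r => r.p1 < threshold && r.actual == 1)
    (fp, fn)
  }

  // Return the candidate threshold with the lowest weighted error cost.
  def best(rows: Seq[Scored], candidates: Seq[Double], fpCost: Double, fnCost: Double): Double =
    candidates.minBy { t =>
      val (fp, fn) = errors(rows, t)
      fp * fpCost + fn * fnCost
    }

  def main(args: Array[String]): Unit = {
    // Made-up validation data, for illustration only.
    val rows = Seq(Scored(0.82, 1), Scored(0.65, 1), Scored(0.40, 0), Scored(0.26, 1), Scored(0.10, 0))
    val candidates = Seq(0.1, 0.3, 0.5, 0.7, 0.9)
    println(best(rows, candidates, fpCost = 1.0, fnCost = 1.0)) // equal costs
    println(best(rows, candidates, fpCost = 1.0, fnCost = 5.0)) // missed 1s cost five times more
  }
}
```

Raising `fnCost` pushes the chosen threshold down, because labelling more rows 1 becomes cheaper than missing a true 1. The chosen value is what would then be hand-edited into the POJO.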

Mei Liang

Jun 26, 2015, 8:52:43 AM
to h2os...@googlegroups.com, ccl...@gmail.com
Thanks Cliff and Parag