Risk vs Excess Risk


Ahmad Ahmadian

Jan 28, 2019, 11:02:21 AM
to machine-lea...@googlegroups.com
Hi,

I was reviewing the lecture notes and have a few questions about the risk decomposition of an arbitrary function shown below (page 11):

[screenshot: risk decomposition of R(f), page 11 of the lecture notes]

Why exactly can we decompose the risk of an arbitrary classifier into these two terms, and what does each of them represent? I understand the derivation steps that follow, but I can't see why that expression represents the risk of an arbitrary classifier.

In theory, given that the loss function is the zero-one loss for classification, it should be equal to the formulation of the risk (page 10):

[screenshot: definition of the risk R(f), page 10 of the lecture notes]
But I just don't see how these two expressions say the same thing.

I'd appreciate any kind of clarification. 

Maksym Andriushchenko

Jan 28, 2019, 12:05:39 PM
to Ahmad Ahmadian, Machine Learning WS18/19
Hi,

First, observe that R(f) can be written as:

[inline image: formula for R(f)]

Next, rewrite the boolean conditions in the indicator functions: instead of f(X) = -1 and f(X) = 1, use f(X) = f*(X) (i.e. all points X such that our classifier f makes the same prediction as the Bayes classifier f*) and f(X) != f*(X) (i.e. all points X where our classifier makes the suboptimal decision, the opposite of the Bayes classifier's). Then split the expectation into two integrals. Finally, observe that the Bayes classifier always picks the most probable class, while the suboptimal decision our classifier occasionally makes picks the least probable class; this is what leads to the max{} term in the second integral.
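A sketch of these steps in formulas, assuming the lecture's notation η(x) = P(Y = 1 | X = x) (this is a reconstruction, since the original images are not preserved in the archive):

```latex
\begin{align*}
R(f) &= \mathbb{E}_X\!\left[\eta(X)\,\mathbb{1}_{f(X)=-1} + (1-\eta(X))\,\mathbb{1}_{f(X)=1}\right] \\
     &= \mathbb{E}_X\!\left[\mathbb{1}_{f(X)=f^*(X)}\,\min\{\eta(X),\,1-\eta(X)\}
      + \mathbb{1}_{f(X)\neq f^*(X)}\,\max\{\eta(X),\,1-\eta(X)\}\right] \\
     &= \int_{f(x)=f^*(x)} \min\{\eta(x),\,1-\eta(x)\}\,p(x)\,dx
      + \int_{f(x)\neq f^*(x)} \max\{\eta(x),\,1-\eta(x)\}\,p(x)\,dx
\end{align*}
```

The second line uses exactly the observation above: where f agrees with f*, the conditional error probability is the smaller of η(X) and 1 - η(X) (since f* picks the more probable class), and where f disagrees it is the larger of the two.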

Hope that helps,
Maksym
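As a quick sanity check, the agree/disagree split described above can also be verified numerically on a toy discrete distribution (a sketch for illustration only; the distribution and classifiers below are made up, not from the lecture):

```python
import numpy as np

# Toy discrete setup: X takes 3 values with marginal probabilities p(x),
# and eta(x) = P(Y = 1 | X = x). All numbers here are illustrative.
p = np.array([0.5, 0.3, 0.2])
eta = np.array([0.9, 0.4, 0.5])

f_star = np.where(eta >= 0.5, 1, -1)   # Bayes classifier: pick the most probable class
f = np.array([1, 1, -1])               # some arbitrary (suboptimal) classifier

# Risk under 0-1 loss: R(f) = E_X[ eta(X)*1{f(X)=-1} + (1-eta(X))*1{f(X)=1} ]
risk = np.sum(p * np.where(f == 1, 1 - eta, eta))

# Decomposition: min{eta, 1-eta} where f agrees with f*, max{...} where it disagrees
agree = f == f_star
decomp = (np.sum(p[agree] * np.minimum(eta, 1 - eta)[agree])
          + np.sum(p[~agree] * np.maximum(eta, 1 - eta)[~agree]))

print(risk, decomp)  # the two quantities coincide
```

Both sums evaluate to the same number, confirming that splitting the expectation by agreement with the Bayes classifier leaves the risk unchanged.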


To view this discussion on the web visit https://groups.google.com/d/msgid/machine-learning-ws1819/CANCk4M%2BdNRxp2KyS-ABxwtQeVQAvE-bcxyOLr3KcVhZhRshN2w%40mail.gmail.com.

Ahmad Ahmadian

Jan 28, 2019, 12:12:50 PM
to Maksym Andriushchenko, Machine Learning WS18/19
Ah, OK, that's clearer. But one point remains: why is there a p(x) factor multiplied into the decomposition on page 11, while the formula you wrote contains no such term? Is that because of the E_x in front of it?

Maksym Andriushchenko

Jan 28, 2019, 12:19:31 PM
to Ahmad Ahmadian, Machine Learning WS18/19
Yes, exactly. The formula I referred to contains E_x. If we expand the expectation as an integral, the p(x) term appears directly from the definition of the expectation operator.
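In symbols (a standard expansion, not part of the original message):

```latex
\mathbb{E}_X\!\left[g(X)\right] = \int g(x)\, p(x)\, dx
```

So when the expectation on page 11 is expanded into the two integrals, each integrand picks up the factor p(x).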

Best,
Maksym