ROC Area Under Curve


Eduardo Zamudio

Jan 5, 2016, 2:47:59 PM
to pystruct
Hi all!

I have a binary classification problem for which I've already tested GraphCRF and NSlackSSVM with good results. But I want to compute ROC AUC to optimize my model.

The thing is that sklearn.metrics.roc_auc_score takes scores from the methods of sklearn classifiers, such as decision_function() or predict_proba(), but I can't find anything like this in pystruct.

Should I use something like probability calibration to get scores for computing ROC AUC? Or is there another way to get scores from the pystruct implementation?

Thanks in advance

Andreas Mueller

Jan 5, 2016, 2:57:55 PM
to pyst...@googlegroups.com
Hi Eduardo.
What do you mean by a binary classification problem? If you are using GraphCRF, you have a graph with nodes, right?
So each node is binary?
If the graph has n binary nodes, then there are 2 ** n possible labelings, which usually makes it infeasible to calculate something like a decision function (which would be of shape 2 ** n).
I'm not sure what definition of a ROC you would apply here anyhow.
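
To make the 2 ** n growth concrete, here is a minimal sketch in plain Python. It does not use pystruct; the helper name count_labelings is made up for illustration. It enumerates the joint labelings of a 3-node graph and then only counts them for larger graphs, since enumeration stops being practical very quickly.

```python
from itertools import product


def count_labelings(n_nodes, n_states=2):
    """Number of joint labelings of a graph with n_nodes nodes (hypothetical helper)."""
    return n_states ** n_nodes


# Enumerating every labeling is only practical for tiny graphs.
tiny = list(product([0, 1], repeat=3))
print(len(tiny), "labelings for 3 binary nodes")

# For larger graphs we only count; a score per labeling would need this many entries.
for n in (10, 50, 100):
    print(n, "nodes:", count_labelings(n), "labelings")
```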

Cheers,
Andy

Eduardo Zamudio

Jan 6, 2016, 10:23:34 AM
to pystruct
Hi Andreas

Sorry, that's right: by binary I mean that each node is binary, so I have a graph with n binary nodes.

My question is how to compute the area under the curve (AUC) of the receiver operating characteristic (ROC) curve in order to evaluate the performance of a structured model (i.e., GraphCRF).

I believe this is possible since it is essentially a binary classification task.

Any idea?

Thanks again.

Andreas Mueller

Jan 6, 2016, 11:30:16 AM
to pyst...@googlegroups.com
As I said in my previous mail, it is not a binary classification task; it is a classification task with 2 ** n classes.
You could look at the marginals in each node and compute a ROC curve for that node. It's not entirely obvious to me what that means, though, as the marginals in the different nodes are not independent.
And, as I said before, to get marginals you need to train a probabilistic model, preferably on a tree-structured graph.
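
A rough sketch of the marginals idea, under loud assumptions: this is plain Python, not pystruct's API; the potentials, the 4-node chain, and the labels are made up; and the marginals are computed by brute-force enumeration, which only works for tiny graphs. For brevity it pools the nodes of a single graph into one AUC; a per-node curve would instead collect that node's marginal across many graphs. Either way, the caveat about dependent marginals still applies.

```python
from itertools import product
from math import exp


def node_marginals(unary, pairwise, edges):
    """Exact P(y_i = 1) by enumerating all 2 ** n labelings (tiny graphs only).

    unary[i][s]    : score of node i taking state s
    pairwise[s][t] : score of an edge whose endpoints take states s and t
    """
    n = len(unary)
    z = 0.0                  # partition function
    marg = [0.0] * n         # unnormalized mass of labelings with y_i = 1
    for y in product([0, 1], repeat=n):
        score = sum(unary[i][y[i]] for i in range(n))
        score += sum(pairwise[y[a]][y[b]] for a, b in edges)
        w = exp(score)
        z += w
        for i in range(n):
            if y[i] == 1:
                marg[i] += w
    return [m / z for m in marg]


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Made-up potentials for a 4-node chain; a trained probabilistic model would supply these.
unary = [[0.0, 1.0], [0.5, 0.0], [0.0, 0.2], [1.0, 0.0]]
pairwise = [[0.5, 0.0], [0.0, 0.5]]   # mild preference for neighbors to agree
edges = [(0, 1), (1, 2), (2, 3)]
y_true = [1, 1, 0, 0]                 # made-up ground-truth node labels

marginals = node_marginals(unary, pairwise, edges)
print("marginals:", marginals)
print("pooled node-level AUC:", roc_auc(y_true, marginals))
```

With real data, the list of marginals could equally be passed to sklearn.metrics.roc_auc_score as the score argument; the hand-written roc_auc is here only to keep the sketch dependency-free.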

Eduardo Zamudio

Jan 6, 2016, 12:28:21 PM
to pystruct

I'll try it.

Thank you Andy

Laura Mocanu

Jul 25, 2016, 10:23:08 AM
to pystruct
Hi Andreas,

There is still something that I do not get here.

You predict a set of instances (in the form of nodes) at the same time, and they are not independent. But those predictions (with probabilities or scores), although made jointly, still belong to instances that are labeled. So you can draw a PR curve. I do not understand why you cannot. Can you explain more?

I mean, otherwise you could never draw a PR curve for any structured output, which is not true. The goal is to compare with non-structured outputs, and PR and ROC curves are much more meaningful.

Laura

Colin Drayton

Aug 4, 2017, 5:47:38 PM
to pystruct
Hi all,

I'm using pystruct for the final project of my master's degree in information science, and my current performance isn't as good as I would like.
It would be great to be able to get a ROC or precision-recall curve. While I'm not as technical as I would like (my undergrad was in anthropology), I think I understand Andreas' reason why you can't get a decision function. Would you run into the same problem getting the decision boundaries and calculating the probabilities from those? Just wondering.
Thanks

Andreas Mueller

Oct 24, 2017, 1:20:37 PM
to pyst...@googlegroups.com

Sorry for the slow reply.


On 07/25/2016 10:23 AM, Laura Mocanu wrote:
> I mean, otherwise, you can never draw a PR curve for any structured output, which is not true

Can you give an example?