ROC.jl


Diego Javier Zea

Apr 26, 2014, 6:10:50 AM
to julia...@googlegroups.com
Hi!!
This is a package for creating ROC curves in Julia: https://github.com/diegozea/ROC.jl
I would like to have some feedback before pushing it into METADATA ;)
Best,

cnbiz850

Apr 26, 2014, 6:17:09 AM
to julia...@googlegroups.com
What does ROC mean? Can you give it a more meaningful name please?


Diego Javier Zea

Apr 26, 2014, 6:25:16 AM
to julia...@googlegroups.com
R has three packages for ROC curves: ROC, ROCR, pROC. Python has CROC...
I think that ROC.jl is a meaningful name.

cnbiz850

Apr 26, 2014, 6:55:43 AM
to julia...@googlegroups.com
I am not sure I can compliment those names, but perhaps they are
meaningful enough to those who need them. To me, ROC means something more like
return on capital.


Harlan Harris

Apr 26, 2014, 8:38:21 AM
to julia...@googlegroups.com


http://en.wikipedia.org/wiki/Receiver_operating_characteristic from signal processing and statistics...

Kevin Squire

Apr 26, 2014, 10:43:58 AM
to julia...@googlegroups.com

Hi Diego, all,

Given that they provide similar information, it might be nice to provide Precision-Recall curves in the same package, and call it something like ClassifierPerformanceCurves.jl.  (Descriptive, but slightly unwieldy...  can someone suggest a better name?)

Cheers,
   Kevin



Diego Javier Zea

Apr 26, 2014, 2:35:24 PM
to julia...@googlegroups.com
I really think ROC.jl is a simple, useful and meaningful name.
I like ROC.jl a lot as the name for a package that performs ROC analysis.
Maybe it's because of the language (Spanish/English) and the field (Bio/Stats)... I'm used to hearing people talk about ROC curves, or just ROC (no one is going to Google "Receiver Operating Characteristic" or "Classifier Performance Curves").
ROC even appears in paper titles: example 1, example 2, example 3, example 4...
I'm doing a lot of ROC analyses, and R's ROCR has become slow. So I wrote here the part of ROCR that I need for my work. I don't have enough time to extend it right now :/ but I made it easy to extend.
roc() returns a ROCData object with a lot of data that is useful for plotting other curves, such as Precision-Recall curves (ROCR does the same):

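struct ROCData{T<:Real}   # written as `immutable` in 2014-era Julia; `struct` in current Julia
    scores::Vector{T}
    labels::BitVector
    P::Int
    n::Int
    N::Int
    ni::UnitRange{Int}
    TP::Vector{Int}       # true positives per threshold
    TN::Vector{Int}       # true negatives per threshold
    FP::Vector{Int}       # false positives per threshold
    FN::Vector{Int}       # false negatives per threshold
    FPR::Vector{Float64}  # false positive rate per threshold
    TPR::Vector{Float64}  # true positive rate (recall) per threshold
end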

The TPR field holds the recall, and you can call PPV() on a ROCData object to get the precision.
I'm going to document it better.
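All counts in a structure like `ROCData` can be derived from a single sort of the scores. Precision (PPV) then follows as TP / (TP + FP), which is why Precision-Recall curves come almost for free once the ROC data exists. The sketch below is plain Julia (1.x, no packages) and only illustrates the idea. `roc_counts` and `ppv_from_counts` are hypothetical names, not the ROC.jl API. The sketch assumes every distinct score serves as a threshold, and a trial counts as predicted positive when its score is at or above the threshold.

```julia
# Hypothetical sketch, not the ROC.jl implementation.
# Confusion counts for every distinct score used as a threshold
# (a trial is predicted positive when score >= threshold).
function roc_counts(scores::AbstractVector{<:Real}, labels::AbstractVector{Bool})
    length(scores) == length(labels) || throw(ArgumentError("length mismatch"))
    P = count(labels)                    # number of positives
    N = length(labels) - P               # number of negatives
    order = sortperm(scores; rev=true)   # highest score first
    TP = [0]; FP = [0]                   # threshold above all scores: nothing predicted positive
    tp = 0; fp = 0
    for (k, i) in enumerate(order)
        labels[i] ? (tp += 1) : (fp += 1)
        # emit a point only at the end of a run of tied scores
        if k == length(order) || scores[order[k+1]] != scores[i]
            push!(TP, tp); push!(FP, fp)
        end
    end
    TN = N .- FP
    FN = P .- TP
    TPR = TP ./ P                        # recall
    FPR = FP ./ N                        # 1 - specificity
    return (P = P, N = N, TP = TP, TN = TN, FP = FP, FN = FN, TPR = TPR, FPR = FPR)
end

# Precision (PPV) per threshold; NaN where nothing is predicted positive.
ppv_from_counts(TP, FP) = [tp + fp == 0 ? NaN : tp / (tp + fp) for (tp, fp) in zip(TP, FP)]

r = roc_counts([0.9, 0.8, 0.7, 0.6, 0.5, 0.4], [true, true, false, true, false, false])
prec = ppv_from_counts(r.TP, r.FP)   # plot against r.TPR (recall) for a PR curve
```

The single descending sort keeps the cost at O(n log n). Tied scores collapse into one ROC point, so ties do not create artificial steps in the curve.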

I don't have much time now, so help with adding plot types, statistical tests... is really welcome. It would be great to have all the options provided by ROCR's performance().

Do you think it is ready to be included in METADATA?

Best

On Sat, Apr 26, 2014 at 5:38 AM, Harlan Harris <har...@harris.name> wrote:
http://en.wikipedia.org/wiki/Receiver_operating_characteristic from signal processing and statistics...

Sam Lendle

Apr 26, 2014, 3:38:10 PM
to julia...@googlegroups.com
This package looks great! Very handy to have a quick way of plotting ROCs and calculating AUC.

It looks like some of the functionality of ROC.jl is already available in MLBase.jl, but I'm not too familiar with how it works. Would it make sense to build on MLBase.jl's functionality in ROC.jl, or to add new functionality directly to MLBase?

Stefan Karpinski

Apr 26, 2014, 5:10:28 PM
to julia-stats
This is definitely a useful tool – and I like the graphic of the mythical Roc :-)

It would be nice if when reading the using line – `using ROC` currently – one had some better hint as to what ROC is (in case it's not immediately obvious). `using ROCPlots` would be one option for a more obvious name. Or maybe including a bunch of related tools and calling it `using SignalDetection`?

Iain Dunning

Apr 26, 2014, 8:58:21 PM
to julia...@googlegroups.com
Cool package (and perfectly reasonable name!)

I kinda wish it was in a more general package (and it looks like MLBase is that package) but no harm putting this out there and maybe deprecating it if it does get eventually absorbed.

Cheers,
Iain

Stefan Karpinski

Apr 26, 2014, 10:01:01 PM
to julia...@googlegroups.com
Indeed, the most important thing is to get it out there and then perhaps find a set of other functionality it fits nicely with.

Dahua Lin

May 23, 2014, 7:30:06 PM
to julia...@googlegroups.com
Yes, MLBase already provides functionality for computing ROC curves in different ways.

I am keen to see what is missing there, and am open to adding new things to MLBase.

Dahua

Diego Javier Zea

Jun 27, 2014, 11:49:05 PM
to julia...@googlegroups.com
I'm going to try the ROC curves in MLBase :) Is AUC (Area Under the Curve) missing, or is it in another package?
Diego

David van Leeuwen

Jan 13, 2015, 12:23:07 PM
to julia...@googlegroups.com
Hello, 

As an alternative, I also have a package named ROC, with presumably similar capabilities.   It is geared towards large data sets and the evaluation of a probabilistic interpretation of the scores of a binary classifier. 

Cheers, 

---david 

Erin LeDell

Jan 13, 2015, 12:43:59 PM
to julia...@googlegroups.com
Hi,
I wrote a trapezoidal-rule-based AUC function a while ago for MLBase since, as far as I can tell, there is no simple utility function in MLBase to calculate the AUC value. I have it on my to-do list to double-check it for correctness (to check that the discretization matches the AUC implementations in ROCR and sklearn), which is why I haven't added it to MLBase yet. Feel free to use/check it; once it's been verified as correct, I can make a PR.

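using MLBase

# Trapezoidal-rule AUC built on MLBase's roc().
# gt: ground-truth labels, pred: scores, n: number of thresholds passed to roc().
function auc(gt, pred, n=200)
    # one ROCNums (fields p, n, tp, tn, fp, fn) per threshold;
    # reverse() is meant to order the points by increasing FPR
    curve = reverse(roc(gt, pred, n))
    area = 0.0
    for i in 2:length(curve)
        fpr_prev = curve[i-1].fp / curve[i-1].n   # FPR at previous point
        fpr_cur  = curve[i].fp / curve[i].n       # FPR at current point
        tpr_prev = curve[i-1].tp / curve[i-1].p   # TPR at previous point
        tpr_cur  = curve[i].tp / curve[i].p       # TPR at current point
        # 0.5 * width * (height_(i) + height_(i-1))
        area += 0.5 * (fpr_cur - fpr_prev) * (tpr_cur + tpr_prev)
    end
    area
end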

-Erin
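The correctness check mentioned above can be run against a threshold-free reference. The trapezoidal area over all distinct score thresholds equals the normalized Mann-Whitney U statistic: the probability that a randomly chosen positive outscores a randomly chosen negative, with ties counted as one half. The sketch below is plain Julia and is not part of MLBase. `auc_exact` is a hypothetical name, and the sketch assumes the labels come as a `Bool` vector (true = positive).

```julia
# Hypothetical sketch: exact AUC as the normalized Mann-Whitney U statistic.
# Tied scores receive their average rank, which counts positive/negative ties as 1/2.
function auc_exact(labels::AbstractVector{Bool}, scores::AbstractVector{<:Real})
    length(labels) == length(scores) || throw(ArgumentError("length mismatch"))
    P = count(labels)
    N = length(labels) - P
    (P == 0 || N == 0) && throw(ArgumentError("need both positives and negatives"))
    order = sortperm(scores)             # ascending scores
    ranks = zeros(Float64, length(scores))
    i = 1
    while i <= length(order)
        j = i
        # extend j over a run of tied scores
        while j < length(order) && scores[order[j+1]] == scores[order[i]]
            j += 1
        end
        avg = (i + j) / 2                # average 1-based rank of the tie group
        for k in i:j
            ranks[order[k]] = avg
        end
        i = j + 1
    end
    # U for the positives: rank sum minus its smallest possible value
    U = sum(ranks[labels]) - P * (P + 1) / 2
    return U / (P * N)
end
```

Computing both `auc` and `auc_exact` on the same data shows how much the fixed grid of `n` thresholds affects the result. Because ranking costs O(n log n), the reference also scales to large data sets.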

Erin LeDell

Jan 13, 2015, 1:16:11 PM
to julia-stats
David,
Cool package. I've been using MLBase because it seemed to be the main project for ROC-based calculations in JuliaStats. Your documentation implies that ROC.jl is the Julia version of your ROC R package, which explains why ROC.jl was implemented as a separate effort from MLBase. Can you summarize the differences between MLBase's ROC functionality and the ROC.jl project, or do they essentially provide the same functionality?
 
Thanks!
Erin

David van Leeuwen

Jan 13, 2015, 6:21:09 PM
to julia...@googlegroups.com
Hi Erin, 


I would have to study MLBase's functionality in detail. From what I read in the docs, there is not much overlap with my package. There is an awful lot that can be derived from the ROC, and naturally different developers will choose to implement different things first.

In my implementation the ROC convex hull is central, because of its relation to the optimal score-to-log-likelihood-ratio transformation. For similar reasons, there is an implementation of the Pool Adjacent Violators algorithm (an algorithm for isotonic regression). Considerable effort has gone into making the ROC computations efficient in memory and speed. Although it covers far less functionality than its R counterpart, it is a lot faster for large data sets (on the order of millions of trials).
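The link between PAV, the ROC convex hull and the score-to-log-likelihood-ratio transformation can be sketched as follows. Sorting trials by score and running PAV on the 0/1 labels yields a non-decreasing estimate of P(positive | score), and the steps of that fit correspond to the segments of the ROC convex hull. Subtracting the prior log-odds log(P/N) from the posterior log-odds then gives a log-likelihood ratio. This is a minimal plain-Julia sketch under those assumptions, not the code in David's ROC package. `pav` and `pav_llr` are hypothetical names, and tied scores are not pooled into a single block.

```julia
# Hypothetical sketch, not the code in David's ROC package.
# Pool Adjacent Violators: least-squares non-decreasing fit to y.
function pav(y::AbstractVector{<:Real})
    vals = Float64[]   # mean of each pooled block
    wts = Int[]        # number of points in each block
    for v in y
        push!(vals, v); push!(wts, 1)
        # pool backwards while the last two blocks violate monotonicity
        while length(vals) > 1 && vals[end-1] > vals[end]
            w = wts[end-1] + wts[end]
            m = (vals[end-1] * wts[end-1] + vals[end] * wts[end]) / w
            pop!(vals); pop!(wts)
            vals[end] = m; wts[end] = w
        end
    end
    # expand each block mean back to one value per input point
    out = Float64[]
    for (m, w) in zip(vals, wts)
        append!(out, fill(m, w))
    end
    return out
end

# Scores -> calibrated log-likelihood ratios via PAV.
# Assumes no tied scores (ties would need pooling into one block first).
function pav_llr(scores::AbstractVector{<:Real}, labels::AbstractVector{Bool})
    order = sortperm(scores)                  # ascending score
    post = pav(Float64.(labels[order]))       # P(positive | score), sorted order
    P = count(labels); N = length(labels) - P
    # log posterior odds minus log prior odds; ±Inf where post is 0 or 1
    llr = log.(post ./ (1 .- post)) .- log(P / N)
    out = similar(llr)
    out[order] = llr                          # back to the original trial order
    return out
end
```

PAV returns exact 0s and 1s at the extremes, so the resulting LLRs there are infinite. A practical implementation would probably regularize those end blocks.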

Cheers, 

---david 