F-measure


Tesnime Touil

Aug 11, 2019, 12:05:39 PM
to python-weka-wrapper
Can I calculate the F-measure with python-weka-wrapper?

Peter Reutemann

Aug 11, 2019, 6:01:49 PM
to python-weka-wrapper
> Can I calculate the F-measure with python-weka-wrapper?

The Evaluation class does that for you when evaluating a model. See
documentation and example:
http://fracpete.github.io/python-weka-wrapper3/weka.html#weka.classifiers.Evaluation.f_measure
https://github.com/fracpete/python-weka-wrapper3-examples/blob/master/src/wekaexamples/classifiers/classifiers.py

Cheers, Peter
--
Peter Reutemann
Dept. of Computer Science
University of Waikato, NZ
+64 (7) 858-5174
http://www.cms.waikato.ac.nz/~fracpete/
http://www.data-mining.co.nz/

Tesnime Touil

Aug 11, 2019, 7:02:55 PM
to python-weka-wrapper
What does "the 0-based index of the class label" mean?

Peter Reutemann

Aug 11, 2019, 7:06:36 PM
to python-weka-wrapper
> What does "the 0-based index of the class label" mean?

Let's assume that your dataset has a class attribute with three labels: A, B, C.
Then their 0-based indices are: 0, 1, 2

Tesnime Touil

Aug 11, 2019, 7:15:23 PM
to python-weka-wrapper
My class variable is the last variable in my dataset file, with index 3. When I pass 3, I get this error:

javabridge.jutil.JavaException: Index 3 out of bounds for length 2.

For the M5P classifier, the F-measure did not work, and I get this error:

 print("fMeasure: " + str(Eval.f_measure(4)))
  File "C:\pww3.test\lib\site-packages\weka\classifiers.py", line 1680, in f_measure
    return javabridge.call(self.jobject, "fMeasure", "(I)D", class_index)
  File "C:\pww3.test\lib\site-packages\javabridge\jutil.py", line 887, in call
    result = fn(*nice_args)
  File "C:\pww3.test\lib\site-packages\javabridge\jutil.py", line 854, in fn
    raise JavaException(x)
javabridge.jutil.JavaException: <Java object at 0x-e99b298>




Peter Reutemann

Aug 11, 2019, 7:17:55 PM
to python-weka-wrapper
> My class variable is the last variable in my dataset file, with index 3. When I pass 3, I get this error:
>
> javabridge.jutil.JavaException: Index 3 out of bounds for length 2.
>
> For the M5P classifier, the F-measure did not work, and I get this error:
>
> print("fMeasure: " + str(Eval.f_measure(4)))
> File "C:\pww3.test\lib\site-packages\weka\classifiers.py", line 1680, in f_measure
> return javabridge.call(self.jobject, "fMeasure", "(I)D", class_index)
> File "C:\pww3.test\lib\site-packages\javabridge\jutil.py", line 887, in call
> result = fn(*nice_args)
> File "C:\pww3.test\lib\site-packages\javabridge\jutil.py", line 854, in fn
> raise JavaException(x)
> javabridge.jutil.JavaException: <Java object at 0x-e99b298>

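How many "class labels" does your "class attribute" have? The index that
you supply has to be the 0-based index of one of those labels. The error
"Index 3 out of bounds for length 2" suggests that your class attribute
has two labels, so the valid indices would be 0 and 1.
The Evaluation class already knows what your class attribute is, but
for the F-measure it needs to know which label to compute it for. Hence
the 0-based index of the "class label".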
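
To illustrate why the label index matters, here is a minimal pure-Python sketch of a per-label F-measure computed from a confusion matrix. It does not call python-weka-wrapper; the function name `f_measure_for_label` and the matrix values are made up for illustration, and returning NaN for an undefined precision or recall is an assumption, not a statement about Weka's behaviour.

```python
def f_measure_for_label(confusion, label_index):
    """Per-label F-measure from a square confusion matrix.

    confusion[i][j] = number of instances with actual label i predicted as label j.
    label_index is the 0-based index of the class label (not of the attribute).
    """
    num_labels = len(confusion)
    if not 0 <= label_index < num_labels:
        raise IndexError(
            "Index %d out of bounds for length %d" % (label_index, num_labels))
    tp = confusion[label_index][label_index]
    predicted = sum(row[label_index] for row in confusion)  # TP + FP
    actual = sum(confusion[label_index])                    # TP + FN
    if predicted == 0 or actual == 0:
        # Assumption: an undefined precision or recall is reported as NaN.
        return float("nan")
    precision = tp / predicted
    recall = tp / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical confusion matrix for a class attribute with two labels.
matrix = [[8, 2],
          [1, 9]]
f0 = f_measure_for_label(matrix, 0)  # F-measure for the first label
f1 = f_measure_for_label(matrix, 1)  # F-measure for the second label
```

With two labels, only the indices 0 and 1 are accepted; asking for index 3 raises an IndexError in this sketch, analogous to the "out of bounds" error quoted above.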

Tesnime Touil

Aug 11, 2019, 7:21:48 PM
to python-weka-wrapper
Thank you, but why doesn't the F-measure work for the M5P classifier?



Peter Reutemann

Aug 11, 2019, 7:28:07 PM
to python-weka-wrapper
> Thank you, but why doesn't the F-measure work for the M5P classifier?

The F-measure is defined for nominal classes only:
https://en.wikipedia.org/wiki/F1_score

M5P is a classifier for numeric (i.e., continuous) classes, so there are
no class labels to compute an F-measure for.

Tesnime Touil

Aug 11, 2019, 7:32:55 PM
to python-weka-wrapper
OK, thank you.
Just one question: when my dataset contains only the class attribute, is its index 0 or 1?



Peter Reutemann

Aug 11, 2019, 7:38:41 PM
to python-weka-wrapper
> Just one question: when my dataset contains only the class attribute, is its index 0 or 1?

A dataset that only consists of the output variable (= class
attribute) doesn't seem to be of much use, there are no input
variables to learn from.

In what context is the index used?

Rule of thumb: when using the Weka API directly (using integer
numbers), then Weka usually has 0-based indices. When supplying
options to filters, classifiers, etc (via the options property or when
instantiating them) then indices are usually 1-based (human readable
string).
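
The rule of thumb can be sketched as a small conversion helper. This is plain Python, not part of python-weka-wrapper; the name `to_option_index` is hypothetical, and the `-R` flag in the usage line only stands in for an option that takes an attribute index.

```python
def to_option_index(zero_based_index):
    """Convert a 0-based API index into the 1-based string used in option lists."""
    if zero_based_index < 0:
        raise ValueError("index must be non-negative")
    return str(zero_based_index + 1)


# The attribute at API index 2 (the third attribute) becomes "3" in an option list.
options = ["-R", to_option_index(2)]
```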

Tesnime Touil

Aug 11, 2019, 7:41:11 PM
to python-weka-wrapper
The point is to evaluate the model against the majority class. When I pass 1 as the index, I get nan for the F-measure; when I pass 0, I get a value.



Peter Reutemann

Aug 11, 2019, 7:50:03 PM
to python-weka-wrapper
> The point is to evaluate the model against the majority class. When I pass 1 as the index, I get nan for the F-measure; when I pass 0, I get a value.

There seems to be some confusion between "attribute index" and "label
index". The label index is the index of a label within a nominal
attribute.
"class attribute" is the attribute that you defined to be the output
variable. "class label index" is the label index within the nominal
class attribute.
f-measure knows what the class attribute is, there is no need to
specify the attribute index. What you need to specify is the class
label index that you want the measure for (and, according to the
documentation, this is a 0-based index).

I recommend using ZeroR as a baseline classifier for determining
statistics for the majority class:
http://weka.sourceforge.net/doc.dev/weka/classifiers/rules/ZeroR.html
You can just use your actual dataset with ZeroR, no need to create a fake one.
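
As a rough illustration of what a majority-class baseline yields, the following pure-Python sketch mimics the idea behind ZeroR (always predict the most frequent label) and computes the F-measure per label. It does not call Weka; the label list is invented, and returning NaN when a label is never predicted is an assumption made here to mirror the nan reported above.

```python
import math
from collections import Counter


def majority_baseline_f_measures(actual_labels, label_names):
    """Predict the most frequent label for every instance and return
    the F-measure for each label, keyed by its 0-based label index."""
    majority = Counter(actual_labels).most_common(1)[0][0]
    predictions = [majority] * len(actual_labels)
    results = {}
    for index, label in enumerate(label_names):
        pairs = list(zip(actual_labels, predictions))
        tp = sum(1 for a, p in pairs if a == label and p == label)
        predicted = sum(1 for p in predictions if p == label)
        actual = sum(1 for a in actual_labels if a == label)
        if predicted == 0 or actual == 0:
            # Assumption: precision (or recall) is undefined, so report NaN.
            results[index] = math.nan
            continue
        precision = tp / predicted
        recall = tp / actual
        if precision + recall == 0:
            results[index] = 0.0
        else:
            results[index] = 2 * precision * recall / (precision + recall)
    return results


# Hypothetical two-label class: "no" (index 0) is the majority label.
labels = ["no", "yes"]
actual = ["no", "no", "no", "yes"]
scores = majority_baseline_f_measures(actual, labels)
```

In this sketch the majority label gets a numeric F-measure, while the label that is never predicted gets NaN, which matches the pattern described in the question.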

Tesnime Touil

Aug 11, 2019, 8:06:02 PM
to python-weka-wrapper
If a nominal attribute has 2 values (boolean), does that mean its index is 2?

Peter Reutemann

Aug 11, 2019, 8:17:34 PM
to python-weka-wrapper
> If a nominal attribute has 2 values (boolean), does that mean its index is 2?

Here is an example dataset in ARFF format:
@relation example
@attribute num1 numeric
@attribute num2 numeric
@attribute nomclass {A,B,C}
@data
1,2,A
3.7,9.1,C
0,0.1,B

The dataset has three attributes: two numeric attributes and one
nominal attribute.
The last attribute is used as the class attribute, i.e., we have a nominal
class attribute. Therefore the 0-based indices of the class labels
are as follows:
label A: 0
label B: 1
label C: 2

For accessing the f-measure of the "B" label, you use 1 as the index.
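
The mapping from labels to indices can be sketched by reading the nominal declaration from the ARFF header. This is a simplified pure-Python illustration, not the Weka or python-weka-wrapper ARFF parser; the helper name `label_indices` is hypothetical, and it only handles unquoted labels like those in the example above.

```python
import re


def label_indices(arff_text, attribute_name):
    """Return {label: 0-based index} for a nominal attribute declared in an ARFF header."""
    pattern = re.compile(
        r"@attribute\s+" + re.escape(attribute_name) + r"\s*\{([^}]*)\}",
        re.IGNORECASE)
    match = pattern.search(arff_text)
    if match is None:
        raise ValueError("no nominal attribute named %r" % attribute_name)
    labels = [label.strip() for label in match.group(1).split(",")]
    return {label: index for index, label in enumerate(labels)}


arff = """@relation example
@attribute num1 numeric
@attribute num2 numeric
@attribute nomclass {A,B,C}
@data
1,2,A
3.7,9.1,C
0,0.1,B
"""
indices = label_indices(arff, "nomclass")
# indices["B"] is the value to pass when asking for the F-measure of label B.
```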