MultiFilter and FilteredClassifier


Alexander Osherenko

May 15, 2017, 2:16:51 AM
to python-weka-wrapper
I am implementing a classifier that uses MultiFilter (to remove the first two attributes and to transform the last attribute, a string class attribute, into binary format) -- http://weka.8497.n7.nabble.com/Regression-supervised-learning-tt40588.html. A FilteredClassifier would use this MultiFilter for classification: first filtering the data internally and then classifying it. However, it doesn't work as expected:

addID = Filter(classname="weka.filters.unsupervised.attribute.AddID", options=["-C", "first", "-N", "ID"])
    
# transform the supervised outcome of an experiment from a nominal/string into a numeric value
transformN_S_NV = Filter(classname="weka.filters.unsupervised.attribute.StringToNominal", options=["-R", "last"])

transformN_B = Filter(classname="weka.filters.unsupervised.attribute.NominalToBinary", options=["-R", "last"])

remove = Filter(classname="weka.filters.unsupervised.attribute.Remove", options=["-R", "1,2"])

multiFilter = Filter(classname="weka.filters.MultiFilter")
multiFilter.filters=[addID, remove, transformN_S_NV, transformN_B]
multiFilter.inputformat(testClassifierInstances)
(!!!) filtered = multiFilter.filter(testClassifierInstances)

metaClassifier = Classifier(classname="weka.classifiers.meta.FilteredClassifier")
metaClassifier.classifier = Classifier(classname="weka.classifiers.functions.SMOreg")
metaClassifier.filter = multiFilter
### building classifier

The problem is that the last three lines seem to have no effect at all: when building, FilteredClassifier passes the data to SMOreg without applying the filter, and SMOreg cannot handle string attributes. What am I missing?

Cheers, Alexander

Peter Reutemann

May 15, 2017, 6:16:01 AM
to python-weka-wrapper

When using the FilteredClassifier approach, training the filter and the base classifier yourself is pointless, as the FilteredClassifier does that itself.

Also, the FilteredClassifier, starting with version 3.9.1, checks whether the class attribute got modified and throws an exception if it did.
Without the actual data, it's hard to see what's going wrong, so I've attached a modified iris dataset (the class attribute is now string instead of nominal) and some example code for applying the filters. Instead of using the FilteredClassifier with all the filters (and to avoid the "class attribute got changed" exception), I push the data through the MultiFilter, leaving the ID attribute intact. Then I use the FilteredClassifier with SMOreg and the Remove filter to avoid having the ID attribute as part of the model.

NB: Not all filters will process the class attribute; most of them leave it alone. Hence I use the ClassAssigner filter to unset and then reassign the class attribute.

​Cheers, Peter​
--
Peter Reutemann
Dept. of Computer Science
University of Waikato, NZ
+64 (7) 858-5174
http://www.cms.waikato.ac.nz/~fracpete/
http://www.data-mining.co.nz/
iris.arff
multi.py

Alexander Osherenko

May 15, 2017, 7:22:58 AM
to python-we...@googlegroups.com
2017-05-15 11:15 GMT+01:00 Peter Reutemann <frac...@waikato.ac.nz>:



> When using the FilteredClassifier approach, training filter and base classifier is pointless, as the FilteredClassifier does that itself.

Sorry, by "classifier" I meant the meta-classifier (FilteredClassifier).

> Also, the FilteredClassifier, starting with version 3.9.1, checks whether the class attribute got modified and throws an exception.
> Without the actual data, it's hard to see what's going wrong, therefore I've attached a modified iris dataset (class attribute is now string instead of nominal) and some example code for applying the code. Instead of using the FilteredClassifier with all the filters (and to avoid the "class attribute got changed" exception), I push the data through the MultiFilter, leaving the ID attribute intact. Then I use the FilteredClassifier with SMOreg and the Remove filter to avoid having the ID attribute as part of the model.
>
> NB: Not all filters will process the class attribute, most of them leave it alone. Hence I use the ClassAssigner filter to unset and then reassign the class attribute.

In your code, the FilteredClassifier is built using the whole dataset. In my code, I am doing it using cross-validation.

Hence, I am building the FilteredClassifier for every fold myself. Consequently, line 64, cls.build_classifier(train), throws an exception ("SMOreg doesn't process string attributes").
It wouldn't have happened if the filters had already been applied.

I performed a small experiment: I unrolled the MultiFilter manually and applied the individual filters one after another:

filtered = Instances.copy_instances(testClassifierInstances)
for f in multiFilter.filters:
    f.inputformat(filtered)
    filtered = f.filter(filtered)

After that, I ran stratification and everything was fine. The question is: how can I enforce applying all individual filters manually?

Cheers, Alexander

-- 
You received this message because you are subscribed to a topic in the Google Groups "python-weka-wrapper" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/python-weka-wrapper/nToUq-M-aHU/unsubscribe.
To unsubscribe from this group and all its topics, send an email to python-weka-wrapper+unsub...@googlegroups.com.
To post to this group, send email to python-weka-wrapper@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/python-weka-wrapper/CAHoQ12Kok2gyaOchriTQA7sOuA7DOoO78P7rAS_bPUv8gweYXQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Alexander Osherenko

May 15, 2017, 8:43:58 AM
to python-we...@googlegroups.com
After applying the MultiFilter (StringToNominal, NominalToBinary), I applied the AddClassification filter to store the classification results in the dataset, which added a classification attribute. Now it is not clear to me how to interpret the added double value, for example 0.0008550976763022633, and how to map it back onto a nominal label.


Alexander Osherenko

May 15, 2017, 10:57:58 AM
to python-we...@googlegroups.com
I attached an ARFF file I am using in regression.
wekalist-question.arff

Peter Reutemann

May 15, 2017, 5:55:35 PM
to python-weka-wrapper
> After applying MultiFilter (StringToNominal-NominalToBinary) I applied
> AddClassification filter to store classification results in a dataset what
> added a classification attribute to the dataset. Now it is not clear for me
> how to interpret the added double result, for example, 0.0008550976763022633
> and map it back onto a nominal.

Internally, Weka just uses doubles for the indices of the nominal
labels. Rounding the output should give you the index of the
associated label.

But make sure not to use Python's Banker's rounding, but the decimal
module's ROUND_HALF_UP:
https://docs.python.org/3/library/decimal.html#decimal.ROUND_HALF_UP
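
A minimal sketch of the rounding advice above, using only the standard library. The helper name label_index and the labels list are made up for illustration and are not part of python-weka-wrapper. It rounds a predicted double to the nearest label index with ties going up, and contrasts that with the built-in round(), which rounds ties to the nearest even integer.

```python
from decimal import Decimal, ROUND_HALF_UP


def label_index(prediction: float) -> int:
    """Round a predicted double to the nearest integer, ties going up."""
    # str() keeps the shortest decimal representation of the float,
    # so 0.5 becomes Decimal("0.5") rather than a long binary expansion.
    value = Decimal(str(prediction))
    return int(value.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


# Hypothetical label list; in practice it would come from the class attribute.
labels = ["no", "yes"]

prediction = 0.0008550976763022633
print(labels[label_index(prediction)])  # no

# Built-in round() uses banker's rounding, so the two disagree on ties.
print(round(0.5), label_index(0.5))  # 0 1
print(round(2.5), label_index(2.5))  # 2 3
```

This only covers non-negative predictions, which is all that label indices need.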

Alexander Osherenko

May 16, 2017, 1:46:51 AM
to python-we...@googlegroups.com
It is still not quite clear how to interpret the classification result since it is a double not greater than 1. Actually the classification result should be a particular binary in the group of many binary attributes. I attach the snapshot.

[Inline image 1]


Peter Reutemann

May 16, 2017, 4:46:38 PM
to python-weka-wrapper
Sorry, I'm currently really short on time.

Have you thought about using the meta-classifier ClassificationViaRegression? That could solve your interpretation dilemma.

http://weka.sourceforge.net/doc.dev/weka/classifiers/meta/ClassificationViaRegression.html

Cheers, Peter


osherenko

May 16, 2017, 5:05:34 PM
to python-we...@googlegroups.com
Do you mean ClassificationViaRegression and SMOreg as the base classifier? Does ClassificationViaRegression work with nominals?

Best, Alexander


Peter Reutemann

May 16, 2017, 5:10:05 PM
to python-weka-wrapper
> Do you mean ClassificationViaRegression and SMOreg as the base classifier? Does ClassificationViaRegression work with nominals?

Yes and Yes.

Use the SingleClassifierEnhancer wrapper.

Alexander Osherenko

May 18, 2017, 3:12:56 AM
to python-weka-wrapper, frac...@waikato.ac.nz
While experimenting with MultiFilter and FilteredClassifier I found a problem I can't explain -- admittedly, StringToNominal changes the class attribute but it is actually desired. FilteredClassifier should work with it.

java -cp weka.jar weka.classifiers.meta.FilteredClassifier -W weka.classifiers.functions.SMO -F ".MultiFilter -F \".AddID -C first -N ID\" -F \".Remove -R 1,2\" -F \".StringToNominal -R last\"" -t E:/SVNcheckout/book2/software/data/IARP/ARFF/subprojects_standard_IARP_NS_weka_rainbow_f2012_clusterWords_freq_words69_freqTrans_slice500_sparse.arff -p 0
java.lang.IllegalArgumentException: Cannot proceed: weka.filters.MultiFilter -F "weka.filters.unsupervised.attribute.AddID -C first -N ID" -F "weka.filters.unsupervised.attribute.Remove -R 1,2" -F "weka.filters.unsupervised.attribute.StringToNominal -R last" has modified the class attribute!
        at weka.classifiers.meta.FilteredClassifier.setUp(FilteredClassifier.java:559)
        at weka.classifiers.meta.FilteredClassifier.buildClassifier(FilteredClassifier.java:578)
        at weka.classifiers.evaluation.Evaluation.evaluateModel(Evaluation.java:1529)
        at weka.classifiers.Evaluation.evaluateModel(Evaluation.java:650)
        at weka.classifiers.AbstractClassifier.runClassifier(AbstractClassifier.java:141)
        at weka.classifiers.meta.FilteredClassifier.main(FilteredClassifier.java:761)

Peter Reutemann

May 18, 2017, 5:11:44 PM
to python-weka-wrapper
> While experimenting with MultiFilter and FilteredClassifier I found a
> problem I can't explain -- admittedly, StringToNominal changes the class
> attribute but it is actually desired. FilteredClassifier should work with
> it.

Moving the position, but keeping type (and labels and order, in case
of nominal class), does not raise an exception (that's what
StringToWordVector does). Changing the type from STRING to NOMINAL, on
the other hand, does raise an exception.

If you turn your STRING class into a NOMINAL one before using
FilteredClassifier, then it should work.
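
The rule described above can be pictured with a small toy model. This is not Weka's implementation: the ClassAttr type and the class_was_modified function are hypothetical names, and the sketch only encodes what the message states, namely that moving the class attribute is tolerated while changing its type (or its labels and their order) is not.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ClassAttr:
    """Hypothetical stand-in for a class attribute description."""
    index: int                      # position in the dataset
    kind: str                       # "string", "nominal" or "numeric"
    labels: Tuple[str, ...] = ()    # ordered labels, only for nominal


def class_was_modified(before: ClassAttr, after: ClassAttr) -> bool:
    """True if type or labels (including their order) changed.

    The position is deliberately ignored: a filter may move the class.
    """
    return (before.kind, before.labels) != (after.kind, after.labels)


original = ClassAttr(index=4, kind="string")
moved = ClassAttr(index=0, kind="string")
converted = ClassAttr(index=4, kind="nominal", labels=("a", "b"))

print(class_was_modified(original, moved))      # False
print(class_was_modified(original, converted))  # True
```

Under this model, converting the class to nominal before handing the data to the FilteredClassifier means the check sees the same type on both sides.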

Alexander Osherenko

May 19, 2017, 2:10:10 AM
to python-we...@googlegroups.com

> If you turn your STRING class into a NOMINAL one before using
> FilteredClassifier, then it should work.

It would be very nice if I could do it using only one command. Given that I use FilteredClassifier and MultiFilter, there are several options:

1. Two additional filters in MultiFilter to set another, intermediate class attribute and to set the class attribute back.
I could add two additional filters: one that sets the position of the string attribute, for example, to the beginning of the dataset so that it is no longer the class attribute; then StringToNominal modifies the attribute; and finally a second filter sets the resulting nominal attribute as the class attribute.

2. Two additional filters in MultiFilter to move the position of the string attribute.
As above, but by moving the attribute. I could add two additional filters: one that moves the string attribute, for example, to the beginning of the dataset so that it is no longer the class attribute; then StringToNominal modifies the attribute; and finally a second filter moves the resulting nominal attribute back to the position of the class attribute.

3. Nested MultiFilter
It might be possible to filter the dataset using a MultiFilter that contains another MultiFilter, so that the class modification exception is not thrown.

4. Nested FilteredClassifier
It might be possible to classify the dataset using a FilteredClassifier with another FilteredClassifier as its base classifier, so that the class modification exception is not thrown.

Best, Alexander

Peter Reutemann

May 19, 2017, 7:10:37 AM
to python-we...@googlegroups.com
I added the "check_for_modified_class_attribute" method to the FilteredClassifier class. This allows you to turn off the class check. See whether that works.

NB: You have to clone the repo and then install from source.

osherenko

May 19, 2017, 8:49:13 AM
to python-we...@googlegroups.com
Thanks, checked out already. Will update and build.

Can I access this new method on the console as parameter?

Best, Alexander




Peter Reutemann

May 19, 2017, 3:10:52 PM
to python-we...@googlegroups.com
Unfortunately, no.

osherenko

May 19, 2017, 3:41:44 PM
to python-we...@googlegroups.com
Did you mean repo on https://github.com/fracpete/python-weka-wrapper where you checked in the changes?

Best, Alexander


Peter Reutemann

May 19, 2017, 3:49:45 PM
to python-we...@googlegroups.com
Both. Python 2.7 and 3.

Alexander Osherenko

May 23, 2017, 8:37:21 AM
to python-we...@googlegroups.com
To reinstall wrapper3, do I need a wheel, or is it OK if I simply overwrite the existing files with the new source .py files in the py directory?

Best, Alexander


Alexander Osherenko

May 23, 2017, 9:43:16 AM
to python-we...@googlegroups.com
Sorry, please forget the last question. It is easier than I thought. From console: 

D:\Downloads\python-weka-wrapper3-master>d:\WinPython-64bit-3.4.3.7\python-3.4.3.amd64\python.exe setup.py install

​This reinstalls weka and adds new sources.

Next question: I wanted to test my script and how the new changes work. If I run

metaClassifier = Classifier(classname="weka.classifiers.meta.FilteredClassifier")

I get a metaClassifier of type weka.classifiers.Classifier, which I find hard to understand since I thought I would get a weka.classifiers.meta.FilteredClassifier. If I print it, I get the message "FilteredClassifier is not built yet", but the type is nevertheless Classifier and check_for_modified_class_attribute is not visible. If I instead instantiate FilteredClassifier using the constructor FilteredClassifier(), I also get the check_for_modified_class_attribute member function.

​Best, Alexander​


Peter Reutemann

May 23, 2017, 3:05:30 PM
to python-we...@googlegroups.com

> Next question: I wanted to test my script and how the new changes work. If I run
>
> metaClassifier = Classifier(classname="weka.classifiers.meta.FilteredClassifier")
>
> I get a metaClassifier of type weka.classifiers.Classifier what is not quite comprehensible since I thought I would get a weka.classifiers.meta.FilteredClassifier. If I print FilteredClassifier I get message: FilteredClassifier is not built yet, but the type is nevertheless Classifier and check_for_modified_class_attribute is not visible. If I instead instantiate FilteredClassifier using the constructor FilteredClassifier(), I get also the check_for_modified_class_attribute member function.

I'm not quite sure why this comes as a surprise to you... The Python wrapper classes just enclose a Java object. It is up to you to choose the most convenient wrapper class.

Also, the jwrapper property exposes the underlying Java object if you should need low level access.
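
The wrapper idea can be illustrated with plain Python stand-ins. Everything in this sketch is hypothetical (JavaObjectStub, the do_not_check field, and the simplified constructors); it does not use python-weka-wrapper or a JVM. It only shows why a generic wrapper lacks a property that a more specific wrapper around the very same underlying object provides.

```python
class JavaObjectStub:
    """Stands in for the enclosed Java object (no JVM involved)."""

    def __init__(self, classname):
        self.classname = classname
        self.do_not_check = False  # hypothetical flag on the "Java" side


class Classifier:
    """Generic wrapper: knows nothing about FilteredClassifier specifics."""

    def __init__(self, classname=None, jobject=None):
        self.jobject = jobject if jobject is not None else JavaObjectStub(classname)


class FilteredClassifier(Classifier):
    """Specific wrapper: adds a convenience property for the same object."""

    def __init__(self, jobject=None):
        super().__init__(
            classname="weka.classifiers.meta.FilteredClassifier",
            jobject=jobject,
        )

    @property
    def check_for_modified_class_attribute(self):
        return not self.jobject.do_not_check

    @check_for_modified_class_attribute.setter
    def check_for_modified_class_attribute(self, check):
        self.jobject.do_not_check = not check


generic = Classifier(classname="weka.classifiers.meta.FilteredClassifier")
print(hasattr(generic, "check_for_modified_class_attribute"))  # False

# "Casting" here just means re-wrapping the same underlying object.
specific = FilteredClassifier(jobject=generic.jobject)
specific.check_for_modified_class_attribute = False
print(generic.jobject.do_not_check)  # True
```

Both wrappers share one underlying object, so a change made through the specific wrapper is visible through the generic one as well.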

Alexander Osherenko

May 23, 2017, 3:56:14 PM
to python-we...@googlegroups.com
> I'm not quite sure why this comes as a surprise to you... The Python wrapper classes just enclose a Java object. It is up to you to choose the most convenient wrapper class.

I assume it is because I didn't know how to cast a Classifier to a FilteredClassifier in Python in order to call check_for_modified_class_attribute().

> Also, the jwrapper property exposes the underlying Java object if you should need low level access.

It is also not clear why check_for_modified_class_attribute is not found:

>>> metaClassifier.jwrapper
Instance of weka.classifiers.meta.FilteredClassifier: FilteredClassifier: No model built yet.
>>> metaClassifier.jwrapper.check_for_modified_class_attribute()
Traceback (most recent call last):
  File "<pyshell#87>", line 1, in <module>
    metaClassifier.jwrapper.check_for_modified_class_attribute()
  File "D:\WinPython-64bit-3.4.3.7\python-3.4.3.amd64\lib\site-packages\javabridge\wrappers.py", line 86, in __getattr__
    raise AttributeError()
AttributeError

Best, Alexander
 


Peter Reutemann

May 23, 2017, 4:57:47 PM
to python-weka-wrapper
>> Also, the jwrapper property exposes the underlying Java object if you
>> should need low level access.
>
> It is also not clear why check_for_modified_class_attribute
> is not found
> :
>
>>>> metaClassifier.jwrapper
> Instance of weka.classifiers.meta.FilteredClassifier: FilteredClassifier: No
> model built yet.
>>>> metaClassifier.jwrapper.check_for_modified_class_attribute()
> Traceback (most recent call last):
> File "<pyshell#87>", line 1, in <module>
> metaClassifier.jwrapper.check_for_modified_class_attribute()
> File
> "D:\WinPython-64bit-3.4.3.7\python-3.4.3.amd64\lib\site-packages\javabridge\wrappers.py",
> line 86, in __getattr__
> raise AttributeError()
> AttributeError

The Python property uses the following Java method:
setDoNotCheckForModifiedClassAttribute(boolean flag)

http://weka.sourceforge.net/doc.dev/weka/classifiers/meta/FilteredClassifier.html

For accessing the Java API, it is best to consult Weka's Javadoc:
http://weka.sourceforge.net/doc.dev/

Cheers, Peter

Alexander Osherenko

May 24, 2017, 11:23:03 AM
to python-we...@googlegroups.com
1. I adopted the code from https://github.com/fracpete/python-weka-wrapper-examples/blob/fb4ae27c9221008f5e4181ae3dbd24178ac716b1/src/wekaexamples/classifiers/crossvalidation_addprediction.py to run cross-validation. I thought that somewhere the class attribute in the test data would be reset to uninitialized (?), for instance by the rand_data.test_cv function, and that the classifier would afterwards set it to the calculated class value. However, the attribute is always initialized wherever I output the test data.
2. I wonder how I can reliably read the number of an instance in instances. Is it possible only by reading the ID attribute value of an instance, or can I store the IDs of the instances in memory and access them in folds?

Best, Alexander 


Peter Reutemann

May 24, 2017, 5:01:37 PM
to python-weka-wrapper
> 1. I adopted the code from
> https://github.com/fracpete/python-weka-wrapper-examples/blob/fb4ae27c9221008f5e4181ae3dbd24178ac716b1/src/wekaexamples/classifiers/crossvalidation_addprediction.py
> to run cross-validation. I thought, somewhere the class attribute in test
> data would be reset to uninitialized (?), for instance, by the
> rand_data.test_cv function and afterwards the classifier would set it to
> calculated class value. However, the attribute attribute is always
> initialized where I output test data.

The dataset never gets updated. When using the Evaluation class, it
usually unsets the class value in an instance before obtaining a
prediction from the classifier, to avoid potential cheating. You have
to explicitly call the classify_instance/distribution_for_instance
methods to obtain a prediction from a classifier (and then update the
data yourself - or use the AddClassification filter).

> 2. I wonder, how I can reliably read the number of an instance in instances.
> Is it possible only by reading the ID attribute value in an instance or I
> can store the ids of instances in memory and access them in folds?

Weka has no notion of row IDs, hence the use of ID attributes. As long
as you can ensure that the IDs are unique in a dataset, you should be
able to get the correct IDs within folds when performing/simulating
cross-validation.
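
To illustrate the point about ID attributes, here is a minimal pure-Python sketch that does not use Weka at all. The make_folds helper, the row layout and the fold-splitting scheme are assumptions made for the example. Each row carries its own ID value, so the IDs stay attached to the rows no matter how the data is shuffled or split.

```python
import random


def make_folds(rows, num_folds, seed=1):
    """Yield (train, test) pairs from a shuffled copy of the rows."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    for fold in range(num_folds):
        # Every num_folds-th row, starting at offset `fold`, is test data.
        test = shuffled[fold::num_folds]
        train = [r for i, r in enumerate(shuffled) if i % num_folds != fold]
        yield train, test


# Hypothetical dataset: the ID travels with each row like any other value.
rows = [{"ID": i + 1, "value": v}
        for i, v in enumerate([5.1, 4.9, 6.3, 5.8, 7.1, 6.0])]

seen_ids = []
for train, test in make_folds(rows, num_folds=3):
    seen_ids.extend(r["ID"] for r in test)

# Each ID shows up in exactly one test fold, whatever the shuffle order was.
print(sorted(seen_ids) == [r["ID"] for r in rows])  # True
```

Because the IDs are unique, a prediction made for a row in any fold can be traced back to the original row through its ID.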

Alexander Osherenko

May 25, 2017, 1:30:17 PM
to python-we...@googlegroups.com
> 2. I wonder, how I can reliably read the number of an instance in instances.
> Is it possible only by reading the ID attribute value in an instance or I
> can store the ids of instances in memory and access them in folds?

> Weka has no notion of row IDs, hence the use of ID attributes. As long
> as you can ensure that the IDs are unique in a dataset, you should be
> able to get the correct IDs within folds when performing/simulating
> cross-validation.

As far as I understood after inspecting the Java source code, instances are reordered in the Instances.stratify method, which somehow swaps two instances i and j. Could you describe how this works?
 

Peter Reutemann

May 25, 2017, 5:19:21 PM
to python-weka-wrapper
>> > 2. I wonder, how I can reliably read the number of an instance in
>> > instances.
>> > Is it possible only by reading the ID attribute value in an instance or
>> > I
>> > can store the ids of instances in memory and access them in folds?
>>
>> Weka has no notion of row IDs, hence the use of ID attributes. As long
>> as you can ensure that the IDs are unique in a dataset, you should be
>> able to get the correct IDs within folds when performing/simulating
>> cross-validation.
>>
> As far as I understood after inspecting the java source code, Instances are
> reordered in the Instances.stratify method that somehow swaps two instances
> i and j. Could you describe how this somehow works?

Not really, as I didn't write that code. Looks a bit like bubble sort, though.
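
For readers curious about the swapping, the following pure-Python sketch shows one way such a stratification step can work. It is written from a general recollection of the approach and should be treated as an assumption, not as a verified port of Weka's Instances.stratify; the function names group_by_class and interleave are made up. The first step uses swaps to make equal class values adjacent, and the second step picks every num_folds-th element so that consecutive blocks get a similar class mix.

```python
def group_by_class(class_values):
    """Reorder values with swaps so that equal class values are adjacent."""
    data = list(class_values)
    index = 1
    while index < len(data):
        current = data[index - 1]
        # Pull every later element with the same class right behind `current`.
        for j in range(index, len(data)):
            if data[j] == current:
                data[index], data[j] = data[j], data[index]
                index += 1
        index += 1
    return data


def interleave(data, num_folds):
    """Take every num_folds-th element, starting at offsets 0, 1, 2, ..."""
    result = []
    start = 0
    while len(result) < len(data):
        result.extend(data[start::num_folds])
        start += 1
    return result


classes = ["a", "b", "a", "b", "a"]
grouped = group_by_class(classes)
print(grouped)                 # ['a', 'a', 'a', 'b', 'b']
print(interleave(grouped, 2))  # ['a', 'a', 'b', 'a', 'b']
```

Cutting the interleaved list into two consecutive blocks (three and two elements) gives folds that both contain the two classes, which is the purpose of stratifying.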

Alexander Osherenko

May 26, 2017, 2:53:51 AM
to python-we...@googlegroups.com
>> > 2. I wonder, how I can reliably read the number of an instance in
> As far as I understood after inspecting the java source code, Instances are
> reordered in the Instances.stratify method that somehow swaps two instances
> i and j. Could you describe how this somehow works?

> Not really, as I didn't write that code. Looks a bit like bubble sort, though.

Of course, that makes a lot of sense.

Best, Alexander
 