Discrepancies between JRip through Python and using WEKA

Juan José Expósito González

Jan 22, 2024, 3:11:02 PM

to python-weka-wrapper

Hi everyone,

I am trying to automate a process in which I extract rules from a data set using different seeds (starting with 1 and increasing sequentially). The steps I perform in Python are:

Load the data and remove features in certain columns (1, 3-7). The code snippet for this is:

remove = Filter(
    classname="weka.filters.unsupervised.attribute.Remove",
    options=["-R", ",".join(features_positions)],
)
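A minimal sketch of how the option list for the Remove filter could be assembled. The helper names build_remove_options and make_remove_filter are hypothetical, and the sketch assumes the positions are 1-based integers. They must be converted to strings first, because str.join raises a TypeError on integers.

```python
from typing import Iterable, List


def build_remove_options(positions: Iterable[int]) -> List[str]:
    """Build the option list for weka.filters.unsupervised.attribute.Remove.

    Hypothetical helper: assumes 1-based attribute positions given as ints.
    """
    # str.join only accepts strings, so convert each position first.
    index_list = ",".join(str(p) for p in sorted(set(positions)))
    return ["-R", index_list]


def make_remove_filter(positions: Iterable[int]):
    """Create the Remove filter (needs python-weka-wrapper3 and a running JVM)."""
    # Imported lazily so build_remove_options can be used without Weka installed.
    from weka.filters import Filter

    return Filter(
        classname="weka.filters.unsupervised.attribute.Remove",
        options=build_remove_options(positions),
    )


print(build_remove_options([1, 3, 4, 5, 6, 7]))
```

With the JVM running, the filter would then typically be applied with remove.inputformat(data) followed by remove.filter(data). Weka's -R option also accepts range strings such as "1,3-7".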

Then I remove all the attributes whose correlation with the class (LABEL in my dataset) exceeds a threshold.
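The post does not show how this correlation step is implemented, so the following is only an illustrative sketch under stated assumptions: numeric attributes, a numerically encoded class, Pearson correlation, and comparison on the absolute value. The names pearson and attributes_above_threshold are hypothetical.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def attributes_above_threshold(
    columns: Dict[str, Sequence[float]],
    label: Sequence[float],
    threshold: float,
) -> List[str]:
    """Names of attributes whose |correlation| with the label exceeds the threshold."""
    return [
        name
        for name, values in columns.items()
        if abs(pearson(values, label)) > threshold
    ]


# Toy data: "a" is perfectly correlated with the label, "b" only weakly.
columns = {"a": [1, 2, 3, 4], "b": [1, -1, 1, -1]}
label = [2, 4, 6, 8]
print(attributes_above_threshold(columns, label, threshold=0.9))
```

The returned names would be the attributes to remove before training.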

So far, so good: I get the same results in Weka, with the same number of attributes in the same order.

Then I use JRip to extract the rules. Here is the whole code:

train, _ = data.train_test_split(percentage=train_pct, rnd=None)  # type: ignore

for seed in tqdm(  # type: ignore
    range(num_iterations), desc="Extracting rules...", unit="iteration"
):
    options = f"-F 3 -N 2.0 -O {optimizations} -S {seeds[seed]}".split()  # type: ignore
    jrip = Classifier(classname="weka.classifiers.rules.JRip", options=options)
    jrip.build_classifier(train)

I use 2 for the optimization parameter, and the seed starts at 1 and increases by 1 on each iteration (I run several passes). For example, if I request 50 iterations, the seeds run from 1 to 50. Before that, I split the set into train and test, but I only use the train part here (I will evaluate at a later stage).
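The seed sequence and per-iteration options described here can be sketched as follows. The helper build_jrip_options is hypothetical; the defaults mirror the values quoted in this post (-F 3, -N 2.0, -O 2), and building the list directly avoids splitting a formatted string.

```python
from typing import List


def build_jrip_options(
    seed: int,
    optimizations: int = 2,
    folds: int = 3,
    min_weight: float = 2.0,
) -> List[str]:
    """Option list for weka.classifiers.rules.JRip (hypothetical helper)."""
    return [
        "-F", str(folds),          # number of folds used for pruning
        "-N", str(min_weight),     # minimal total weight of instances in a rule
        "-O", str(optimizations),  # number of optimization runs
        "-S", str(seed),           # random seed
    ]


num_iterations = 50
# Seeds 1..num_iterations, one per pass.
seeds = list(range(1, num_iterations + 1))
all_options = [build_jrip_options(s) for s in seeds]

print(all_options[0])
```

Each entry of all_options could then be passed as options= to Classifier(classname="weka.classifiers.rules.JRip", ...).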

The rules I get with Python are very different from the rules I get with JRip in Weka using the same seed. The options I use for Weka are:

[Screenshot: Weka options]

I have highlighted the options I change or fine-tune in Weka. There I get two rules (which work as expected when tested in another application), but with Python, using the same seed (2 in this case), I get zero rules...

I have been assuming the results should be the same provided the same dataset and steps, but I am missing something.

I have been searching for a similar topic in the online doc, but haven't found anything I can use.

Thanks in advance for your support.

JJ

Peter Reutemann

Jan 22, 2024, 3:23:36 PM

to python-we...@googlegroups.com

Just a quick note... In your code, you're splitting the dataset using a percentage split preserving the order (rnd=None -> preserve order), but Weka uses a random number generator seeded with 1 since "Preserve order" is not checked (see your screenshot of the "More options" dialog).

Maybe the differences stem from your different train/test splits?
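To illustrate the point, here is a small pure-Python sketch of an order-preserving split versus a seeded random split. It is only a conceptual illustration: the function split is hypothetical, and Python's random module is not the Java random number generator Weka uses, so the actual row order will differ from Weka's.

```python
import random
from typing import List, Optional, Sequence, Tuple


def split(
    rows: Sequence[int],
    train_pct: float,
    seed: Optional[int] = None,
) -> Tuple[List[int], List[int]]:
    """Percentage split; shuffles first only when a seed is given."""
    rows = list(rows)
    if seed is not None:
        # Seeded shuffle: reproducible, but no longer in the original order.
        random.Random(seed).shuffle(rows)
    cut = round(len(rows) * train_pct / 100.0)
    return rows[:cut], rows[cut:]


rows = list(range(10))
ordered_train, ordered_test = split(rows, 70.0)             # like rnd=None
shuffled_train, shuffled_test = split(rows, 70.0, seed=1)   # like a seeded Random

print("ordered :", ordered_train)
print("shuffled:", shuffled_train)
```

Both training sets have the same size, but they generally contain different rows, so a rule learner trained on them can produce different rules.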

Cheers, Peter

--

Peter Reutemann
Dept. of Computer Science

University of Waikato, Hamilton, NZ

Mobile +64 22 190 2375

https://www.cs.waikato.ac.nz/~fracpete/
http://www.data-mining.co.nz/

Juanjo

Jan 22, 2024, 5:36:13 PM

to python-we...@googlegroups.com

Thanks! You are right! By using train, _ = data.train_test_split(train_pct * 100, Random(1)), the rules are now exactly the same!

One question: I have bought Data Mining with Weka (waiting for it to arrive). Does the book explain the different Weka classifiers and filters, their options, and what they do?

Kindest Regards!

JJ

Peter Reutemann

Jan 22, 2024, 5:54:40 PM

to python-we...@googlegroups.com

> Thanks! You are right! By using train, _ = data.train_test_split(train_pct * 100, Random(1)) now the rules are exactly the same!

Cool! It's those little details that can throw one...
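For reference, the fix quoted above could be wrapped as in the sketch below. The helper names to_percentage and split_like_weka_explorer are hypothetical, and the sketch assumes train_pct was stored as a fraction (for example 0.66), which the multiplication by 100 in the quoted call suggests. The Weka part needs python-weka-wrapper3 and a running JVM.

```python
def to_percentage(train_fraction: float) -> float:
    """Convert a fraction in (0, 1) to a percentage in (0, 100)."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError(f"expected a fraction between 0 and 1, got {train_fraction}")
    return train_fraction * 100.0


def split_like_weka_explorer(data, train_fraction: float, seed: int = 1):
    """Randomized percentage split with a seeded generator.

    `data` is assumed to be a weka.core.dataset.Instances object.
    """
    # Imported lazily so to_percentage can be used without Weka installed.
    from weka.core.classes import Random

    # Passing a Random object randomizes the data before splitting;
    # rnd=None would preserve the original order instead.
    return data.train_test_split(to_percentage(train_fraction), Random(seed))


print(to_percentage(0.5))
```

A call such as train, test = split_like_weka_explorer(data, 0.66) would then use seed 1, matching the Explorer setting described earlier in the thread.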

> One question...I have bought Data Mining with Weka (waiting for it to arrive). Are there explanations of the different weka classifiers/filters... and options and what they do in the book?

From what I remember (it's been a long time), some of the algorithms
are explained in there.

You can also look at the Javadoc of the relevant classes:
https://weka.sourceforge.io/doc.dev/

Classes that manage options should normally have their options,
alongside their help, printed there.

The same help can be output when executing a classifier/filter/etc in
the SimpleCLI with the "-h" parameter.

Or bring up the help in the Explorer via the "More" button in the
GenericObjectEditor dialog of any object (this help screen can differ
a bit from the command-line options).

Under pww3, you can use the "to_help(...)" method of an
OptionHandler-derived class to output a help screen:
https://fracpete.github.io/python-weka-wrapper3/weka.core.html?highlight=to_help#weka.core.classes.OptionHandler.to_help

Cheers, Peter
