I've installed weka and the python-weka-wrapper.
I got as far as
from weka.classifiers import Classifier
clf=Classifier(classname="weka.classifiers.rules.JRip")
from random import randint
X = [[randint(1,10) for _ in range(5)] for _ in range(100)]
y = [randint(0,1) for _ in range(100)]
but now I don't know how to load my data which is available as a Python data structure.
How can I load my data matrices, output the rules (in some parsable format) and test the classifier on new data?
> I suppose for now I will try calling JRip directly from the command line.
Hopefully we can sort out that problem.
> There are many sklearn users who might be interested in Weka algorithms
> that cannot be found elsewhere.
> Maybe a small, full example (as you write it) would be good for the
> documentation. Sklearn users are used to the pattern
> X=[[...],[...],...]
> y=[...]
> clf=Clf(...)
> clf.fit(X, y)
> y_pred=clf.predict(X)
The next release enables you to create a dataset from x and y (as long
as it is all numeric data), as I mentioned in my other post.
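Until that release is available, one workaround is to write the Python lists out as an ARFF file and load that instead. Below is a minimal sketch, assuming all features are numeric and y holds a small set of class labels. The helper name to_arff and the attribute names x0, x1, ... are made up for illustration and are not part of python-weka-wrapper.

```python
def to_arff(X, y, relation="data"):
    """Render numeric features X and class labels y as ARFF text.

    Hypothetical helper: the class attribute is declared nominal,
    since JRip expects a nominal class.
    """
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of rows")
    n_features = len(X[0]) if X else 0
    labels = sorted(set(y))

    lines = ["@relation %s" % relation, ""]
    # One numeric attribute per feature column (names are placeholders).
    for i in range(n_features):
        lines.append("@attribute x%d numeric" % i)
    # Nominal class attribute listing every label seen in y.
    lines.append("@attribute class {%s}" % ",".join(str(l) for l in labels))
    lines.append("")
    lines.append("@data")
    # One comma-separated row per instance, class value last.
    for row, label in zip(X, y):
        lines.append(",".join(str(v) for v in list(row) + [label]))
    return "\n".join(lines) + "\n"
```

The returned text can be written to a file (say train.arff) and then passed to Weka as the training set.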
> Here is my configuration:
> python-weka-wrapper 0.3.1
> javabridge 1.0.11
> python 2.7.8
> (Redhat) Linux 2.6.32-504.16.2.el6.x86_64 #1 SMP x86_64 GNU/Linux
> java-1.7.0-openjdk-1.7.0.85-2.6.1.3.el6_7.x86_64
What Fedora or RHEL version is that?
>> > I suppose for now I will try calling JRip directly from the command line.
>> Hopefully we can sort out that problem.
>
>
> But I cannot test my trained model on new data by calling it from command line only, can I? :(
Yes, you can. See below.
> Can I store the trained model from command line for now or do I have to parse the string output and recreate the rules?
Yes, you can. Use the -d option to save the trained model and the -l option to load it again. To test a saved model on new data, combine -l with the -T option (the test set).
Use -h for help, and check the Weka manual as well.
>> > There are many sklearn users who might be interested in Weka algorithms
>> > that cannot be found elsewhere.
>> > Maybe a small, full example (as you write it) would be good for the
>> > documentation. Sklearn users are used to the pattern
>> > X=[[...],[...],...]
>> > y=[...]
>> > clf=Clf(...)
>> > clf.fit(X, y)
>> > y_pred=clf.predict(X)
>> The next release enables you to create a dataset from x and y (as long
>> as it is all numeric data), as I mentioned in my other post.
>
>
> It's a useful option, indeed. I was actually aiming for a documentation section where a full working example for sklearn users could be shown.
> Multiple times I tried finding alternatives to Weka after getting frustrated about not finding my case in the documentation. (even mentioning how to set the CLASSPATH could help)
CLASSPATH is not specific to Weka; it's a general Java topic. My advice: don't use the CLASSPATH environment variable. It usually just gives you a headache with differing versions. I recommend passing an explicit -cp option when starting the JVM.
Cheers, Peter
> Thanks a lot for testing.
> I suppose it works on most Linuxes. Maybe this Redhat version is funny or has a weird configuration. I hope it magically resolves with some update of the Linux system or so. Meanwhile I'm using the jar-call and my colleagues are using RWeka, both of which work.
OK. Bit weird though. Can you test, e.g. using VirtualBox, whether you can set up a system with the steps that I described in my previous email? Maybe it is just this particular setup that is a bit strange.
> A last question here for JRip:
> The jar says there is an option -m for a cost matrix. Does JRip really support that and what is the format of the cost file?
This is general functionality provided by the Evaluation class, not JRip.
See the following wiki article:
http://weka.wikispaces.com/CostMatrix
BTW, for general Weka questions, it is best to use the Weka mailing list. See the Weka homepage for details.
Cheers, Peter