ADABOOST + RANDOM FOREST. Help appreciated!


Yogesh Joshi

Sep 6, 2015, 10:28:46 PM
to python-weka-wrapper
Hello everyone,

This is my first post. Would appreciate some help.
I would like to classify data using AdaBoost and RandomForest.
Can anyone suggest how to go about it?
I have tried the things below. The first one is a simple J48 classifier, which works absolutely fine (but do feel free to make suggestions if you feel like it!!)

The second part is the one where I need help.
I am a newbie... so pardon the simple errors, if any.

THE FOLLOWING IS THE CODE THAT I HAVE WRITTEN SO FAR.
BELOW IT IS THE ERROR MESSAGE.

##########################################################################################################################################################


import weka.core.jvm as jvm
from weka.core.converters import Loader
from weka.filters import Filter

def RunWeka (data_loc, infname):
    # Load Data.
    loader = Loader (classname = "weka.core.converters.CSVLoader")
    data = loader.load_file(data_loc + infname)
    removeCols = Filter (classname = "weka.filters.unsupervised.attribute.Remove", options=["-R", " 1, 3, 9, 32"])
    removeCols.inputformat (data)
    data_ready = removeCols.filter(data)
    data_ready.class_is_last()
   
    # J48 Classifier.
    print "1.  J48 Classifier."
    J48_class = Classifier(classname = "weka.classifiers.trees.J48", options = ["-C", "0.25", "-M", "2"])
    J48_class.build_classifier(data_ready)
    evaluationj48 = Evaluation(data_ready)
    evaluationj48.crossvalidate_model(J48_class, data_ready, 10, Random(100))
    plcls.plot_roc(evaluationj48, class_index = [0,1], title = report_fname, key_loc = "best", outfile = report_fname+'_J48.png', wait = False)
    j48 = str(evaluationj48.percent_correct)
   
    # Adaboost + Random Forest.
    print "2.  AdaBoost + RandomForest."
    AdaClass = MultipleClassifiersCombiner(classname="weka.classifiers.meta.AdaBoostM1", options = ["-P", "100", "-S", "1", "-I", "10", "-W"])
    AdaClass.build_classifier(data_ready)
    Combinat = Classifier (classname = "weka.classifiers.trees.RandomForest", options = ["-I", "100", "-K", "0", "-S", "1"])
    Combinat.build_classifier(data_ready)
    AdaClass.classifiers[Combinat]
   
    return
   
RunWeka (root_dir, filename_in)
jvm.stop()

##########################################################################################################################################################
ERROR MESSAGE:


1.  J48 Classifier.
2.  AdaBoost + RandomForest.
Traceback (most recent call last):
  File "Bagging.py", line 63, in <module>
    RunWeka (root_dir, filename_in)
  File "Bagging.py", line 49, in RunWeka
    AdaClass = MultipleClassifiersCombiner(classname="weka.classifiers.meta.AdaBoostM1",options=["-P", "100", "-S", "1", "-I", "10", "-W"])
  File "/usr/lib/python2.7/site-packages/weka/classifiers.py", line 458, in __init__
    self.enforce_type(jobject, "weka.classifiers.MultipleClassifiersCombiner")
  File "/usr/lib/python2.7/site-packages/weka/core/classes.py", line 531, in enforce_type
    raise TypeError("Object does not implement or subclass " + intf_or_class + ": " + get_classname(jobject))
TypeError: Object does not implement or subclass weka.classifiers.MultipleClassifiersCombiner: weka.classifiers.meta.AdaBoostM1

Peter Reutemann

Sep 6, 2015, 10:48:18 PM
to python-weka-wrapper
The Python class "MultipleClassifiersCombiner" is a wrapper for the
Weka Java class "weka.classifiers.MultipleClassifiersCombiner".
However, like the error message states, Weka's AdaBoost is not derived
from that class. Instead, it is derived, more or less, from
"SingleClassifierEnhancer".

So, if you want to boost RandomForest using AdaBoost, rather than
combining them, then your code for evaluating/building could look like
this:

import weka.core.jvm as jvm
from weka.core.classes import Random
from weka.core.converters import Loader
from weka.classifiers import Classifier, SingleClassifierEnhancer, Evaluation

# data to use
data_ready = ... # from somewhere

# Adaboost + Random Forest.
print("AdaBoost + RandomForest.")
print("--> Cross-validation")
forest = Classifier(classname="weka.classifiers.trees.RandomForest",
                    options=["-I", "100", "-K", "0", "-S", "1"])
adaboost = SingleClassifierEnhancer(classname="weka.classifiers.meta.AdaBoostM1",
                                    options=["-P", "100", "-S", "1", "-I", "10"])
adaboost.classifier = forest # set the classifier to boost
# if you want 10-fold cross-validation
evl = Evaluation(data_ready)
evl.crossvalidate_model(adaboost, data_ready, 10, Random(1))
print(evl.summary())
# if you want to just build AdaBoost/RandomForest
print("--> Adaboost model")
adaboost.build_classifier(data_ready)
print(adaboost)

HTH

Cheers, Peter
--
Peter Reutemann
Dept. of Computer Science
University of Waikato, NZ
+64 (7) 858-5174
http://www.cms.waikato.ac.nz/~fracpete/
http://www.data-mining.co.nz/

Yogesh Joshi

Sep 7, 2015, 12:36:05 AM
to python-weka-wrapper
Thanks a million!!
This works like a charm!!
Best Regards,
Yogesh.

Yogesh Joshi

Sep 7, 2015, 3:26:42 PM
to python-weka-wrapper
Hello again!
Following your guidelines with AdaBoost and RandomForest, I applied the same logic to Bagging and J48, using the same dataset.
The code and the error message are below.
I obtain the classification results and the code works fine.
The classification accuracy achieved is 55.67% with the python-weka-wrapper and 56.83% when I use the Weka 3.7.12 GUI Explorer.
Is this normal, because of inherent differences between command-line execution and GUI execution, or am I missing something?
Also, I do not understand why I keep getting a NullPointerException.
Am I doing things the right way?
Help appreciated.

Also... what is the best way to understand these error messages thrown by Java (or rather, the best way to understand how to call and use the various Weka classes)?

Best Regards,
Yogesh.

j48 = Classifier(classname = "weka.classifiers.trees.J48", \
        options = ["-C", "0.25", "-M", "2"])
bagging = SingleClassifierEnhancer (classname = "weka.classifiers.meta.Bagging", \
        options = ["-P", "100", "-S", "1"])
bagging.class_j48 = j48
Beval_j48 = Evaluation (data_ready)
Beval_j48.crossvalidate_model (bagging.class_j48, data_ready, 10, Random(10))
print Beval_j48.summary()
print Beval_j48.percent_correct

##########################################################################################################################################################
ERROR MESSAGE:

java.lang.NullPointerException
        at weka.core.ClassCache.initFromManifest(ClassCache.java:248)
        at weka.core.ClassCache.initFromJar(ClassCache.java:293)
        at weka.core.ClassCache.initFromClasspathPart(ClassCache.java:351)
        at weka.core.ClassCache.initialize(ClassCache.java:372)
        at weka.core.ClassCache.<init>(ClassCache.java:111)
        at weka.core.ClassDiscovery.initCache(ClassDiscovery.java:447)
        at weka.core.ClassDiscovery.clearCache(ClassDiscovery.java:481)
        at weka.Run.findSchemeMatch(Run.java:80)
        at weka.core.Utils.forName(Utils.java:1085)
        at weka.classifiers.AbstractClassifier.forName(AbstractClassifier.java:154)
        at weka.classifiers.SingleClassifierEnhancer.setOptions(SingleClassifierEnhancer.java:115)
        at weka.classifiers.IteratedSingleClassifierEnhancer.setOptions(IteratedSingleClassifierEnhancer.java:108)
        at weka.classifiers.ParallelIteratedSingleClassifierEnhancer.setOptions(ParallelIteratedSingleClassifierEnhancer.java:94)
        at weka.classifiers.RandomizableParallelIteratedSingleClassifierEnhancer.setOptions(RandomizableParallelIteratedSingleClassifierEnhancer.java:95)
        at weka.classifiers.meta.Bagging.setOptions(Bagging.java:334)

Correctly Classified Instances        5567               55.67   %
Incorrectly Classified Instances      4433               44.33   %
Kappa statistic                          0.1134
Mean absolute error                      0.4728
Root mean squared error                  0.5452
Relative absolute error                 94.5528 %
Root relative squared error            109.0431 %
Coverage of cases (0.95 level)          92.49   %
Mean rel. region size (0.95 level)      91.28   %
Total Number of Instances            10000

On Sunday, September 6, 2015 at 10:28:46 PM UTC-4, Yogesh Joshi wrote:

Peter Reutemann

Sep 7, 2015, 5:07:36 PM
to python-weka-wrapper
> Following your guidelines with AdaBoost and RandomForest, I applied the same
> logic to Bagging and J48, using the same dataset.
> The code and the error message are below.
> I obtain the classification results and the code works fine.
> The classification accuracy achieved is 55.67% with the python-weka-wrapper
> and 56.83% when I use the Weka 3.7.12 GUI Explorer.
> Is this normal, because of inherent differences between command-line
> execution and GUI execution, or am I missing something?

In your Python code, you're using "10" as the seed for randomizing the
dataset during CV. In the GUI, however, the default is "1". That's
probably why you're getting the slight difference.
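
To illustrate why the seed matters, here is a minimal pure-Python sketch. It is not Weka's implementation (Weka shuffles with a Java random number generator), and the helper name fold_assignment is made up for this example. It shuffles the instance indices with a seeded generator and deals them into folds; a different seed produces a different split, and therefore slightly different cross-validation statistics.

```python
import random


def fold_assignment(num_instances, num_folds, seed):
    """Shuffle instance indices with a seeded RNG and deal them into folds."""
    indices = list(range(num_instances))
    random.Random(seed).shuffle(indices)
    # fold i receives every num_folds-th index of the shuffled list
    return [indices[i::num_folds] for i in range(num_folds)]


folds_seed_1 = fold_assignment(20, 10, seed=1)
folds_seed_10 = fold_assignment(20, 10, seed=10)

# same seed: the split is reproducible
print(folds_seed_1 == fold_assignment(20, 10, seed=1))
# different seed: the instances land in different folds
print(folds_seed_1 == folds_seed_10)
```

In the wrapper, passing Random(1) instead of Random(10) to crossvalidate_model should line the run up with the Explorer's default seed.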

> Also, I do not understand why I keep getting a NullPointer Exception.

Looks like there is a jar file on your classpath that does not contain
a MANIFEST.MF file. Don't worry about it.
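
If you do want to track down the offending jar, the following is a rough sketch using only the Python standard library. It assumes the jars are listed in the CLASSPATH environment variable, which may not be true for your setup (the wrapper can add jars from other locations); the function name jars_without_manifest is invented for this example.

```python
import os
import zipfile


def jars_without_manifest(classpath):
    """Return the jar files on the classpath that have no META-INF/MANIFEST.MF."""
    missing = []
    for entry in classpath.split(os.pathsep):
        if not entry.lower().endswith(".jar") or not os.path.isfile(entry):
            continue  # skip directories and non-jar entries
        with zipfile.ZipFile(entry) as jar:
            if "META-INF/MANIFEST.MF" not in jar.namelist():
                missing.append(entry)
    return missing


if __name__ == "__main__":
    # assumption: CLASSPATH is where the jars come from on this machine
    for jar_path in jars_without_manifest(os.environ.get("CLASSPATH", "")):
        print(jar_path)
```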

> Am I doing things the right way?

Yep.

> Also... what is the best way to understand these error messages thrown by
> Java (or rather, the best way to understand how to call and use the various
> Weka classes)?

Mainly by knowing Weka, unfortunately.

If you get an error message like "does not implement" or "is not
derived from" then you're probably using the wrong wrapper or the
wrong Java class. You can always refer to the Weka Javadoc to figure
out the class hierarchy:
http://weka.sourceforge.net/doc.dev/
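
As a small memory aid, the hierarchy lookup can be written down as a table. This is only a sketch: the dictionary and the helper wrapper_for are made up for illustration, the AdaBoostM1 and Bagging entries follow from this thread, and the Vote and Stacking entries are based on the Javadoc as far as it is known here, so check them against your Weka version.

```python
# Weka class name -> name of the Python wrapper in weka.classifiers to use.
# Only a handful of meta classifiers are listed; consult the Javadoc for others.
WRAPPER_BY_WEKA_CLASS = {
    "weka.classifiers.meta.AdaBoostM1": "SingleClassifierEnhancer",
    "weka.classifiers.meta.Bagging": "SingleClassifierEnhancer",
    "weka.classifiers.meta.Vote": "MultipleClassifiersCombiner",
    "weka.classifiers.meta.Stacking": "MultipleClassifiersCombiner",
}


def wrapper_for(classname):
    """Name the wrapper class to use; fall back to the generic Classifier."""
    return WRAPPER_BY_WEKA_CLASS.get(classname, "Classifier")


print(wrapper_for("weka.classifiers.meta.AdaBoostM1"))
print(wrapper_for("weka.classifiers.trees.J48"))
```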

I tried to stay close to the Weka naming convention of classes and
methods. However, whenever possible, I used Python properties rather
than typical Java get/set methods.
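
One consequence of using properties is that the attribute name matters. The toy class below is a hypothetical stand-in, not the wrapper's code; it only shows standard Python behaviour: assigning to an unknown name such as class_j48 silently creates a new attribute and leaves the base classifier untouched, whereas assigning to the classifier property goes through the setter.

```python
class Enhancer(object):
    """Toy stand-in for a wrapper that exposes its base classifier as a property."""

    def __init__(self):
        self._base = "default base classifier"

    @property
    def classifier(self):
        return self._base

    @classifier.setter
    def classifier(self, value):
        self._base = value


bagging = Enhancer()
bagging.class_j48 = "J48"      # creates a new, unrelated Python attribute
print(bagging.classifier)      # the base classifier is unchanged

bagging.classifier = "J48"     # goes through the property setter
print(bagging.classifier)
```

If the wrapper objects behave like ordinary Python objects in this respect, the Bagging snippet earlier in the thread appears to cross-validate the bare J48 classifier rather than Bagging, which could also contribute to the accuracy gap. It may be worth re-running with bagging.classifier = j48 and passing bagging itself to crossvalidate_model.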