Hey there,
I ran some experiments with the latest version of MOA (compiled from the GitHub repository) to assess the performance of StreamingGradientBoostedTrees on the airlines.arff dataset.
However, even though I used a different random seed for each run, I got exactly the same Kappa and Accuracy values every time. The seeds I used were 5, 9, 13, 17, 19, 23, 29, 31, 37, and 121, which are the same seeds used in the paper "Gradient boosted trees for evolving data streams".
Could you please review the command line below and confirm whether it is correct?
# One run per seed; 121 is shown here, the other seeds were run the same way
seed=121
java -javaagent:sizeofag-1.1.0.jar -cp moa.jar moa.DoTask EvaluateInterleavedTestThenTrain \
-l "(meta.StreamingGradientBoostedTrees -l (trees.FIMTDD \
-s VarianceReductionSplitCriterion \
-g 25 \
-c 0.05 \
-e \
-p ))" \
-s "(ArffFileStream -f airlines.arff)" \
-i 1000000 \
-f 1000000 \
-r $seed \
-d sgbt_airlines_$seed.csv
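For reference, the ten runs could be scripted instead of typed by hand. The sketch below is a hypothetical Python driver (the names `build_sgbt_command` and `run_all` are my own, not part of MOA) that builds the same argument list as the shell command above for each seed. It assumes `moa.jar`, `sizeofag-1.1.0.jar` and `airlines.arff` are in the working directory, and by default it only prints the commands.

```python
import shlex
import subprocess

SEEDS = [5, 9, 13, 17, 19, 23, 29, 31, 37, 121]

# Learner string copied from the shell command above.
SGBT_LEARNER = (
    "(meta.StreamingGradientBoostedTrees -l (trees.FIMTDD "
    "-s VarianceReductionSplitCriterion -g 25 -c 0.05 -e -p))"
)


def build_sgbt_command(seed):
    """Return the argument list for one EvaluateInterleavedTestThenTrain run."""
    return [
        "java", "-javaagent:sizeofag-1.1.0.jar", "-cp", "moa.jar",
        "moa.DoTask", "EvaluateInterleavedTestThenTrain",
        "-l", SGBT_LEARNER,
        "-s", "(ArffFileStream -f airlines.arff)",
        "-i", "1000000",
        "-f", "1000000",
        "-r", str(seed),
        "-d", f"sgbt_airlines_{seed}.csv",
    ]


def run_all(seeds=SEEDS, dry_run=True):
    """Print the command for each seed (dry_run=True) or actually execute it."""
    for seed in seeds:
        cmd = build_sgbt_command(seed)
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all()
```

Passing the arguments as a list means the nested parentheses in the `-l` option reach MOA as a single argument without a second round of shell quoting.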
I also ran the same experiment with StreamingRandomPatches using the command line below. Those results were in line with the ones reported in the paper.
# One run per seed; 121 is shown here, the other seeds were run the same way
seed=121
java -javaagent:sizeofag-1.1.0.jar -cp moa.jar moa.DoTask EvaluateInterleavedTestThenTrain \
-l "(meta.StreamingRandomPatches \
-l (trees.HoeffdingTree \
-g 50 \
-c 0.01) \
-s 100 \
-o (Percentage (M * (m / 100))) \
-m 60 \
-a 6)" \
-s "(ArffFileStream -f airlines.arff)" \
-i 1000000 \
-f 1000000 \
-r $seed \
-d srp_airlines_$seed.csv
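Both commands write their evaluation output to the file given by `-d`. A hypothetical helper like the one below could read the last reported value of a metric from each dump file. It assumes the dump is a comma-separated file whose header contains the column "classifications correct (percent)"; the Kappa column name used here is an assumption, so check it against your CSV header.

```python
import csv

ACCURACY_COLUMN = "classifications correct (percent)"
# Assumed column name for Kappa; verify against the header of your dump file.
KAPPA_COLUMN = "Kappa Statistic (percent)"


def final_metric(path, column=ACCURACY_COLUMN):
    """Return the value of `column` in the last data row of a MOA dump CSV."""
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    if not rows:
        raise ValueError(f"no data rows in {path}")
    return float(rows[-1][column])


def collect(prefix, seeds):
    """Map each seed to the last accuracy value in f"{prefix}_{seed}.csv"."""
    return {seed: final_metric(f"{prefix}_{seed}.csv") for seed in seeds}
```

For example, `collect("srp_airlines", [5, 9, 13])` would read `srp_airlines_5.csv`, `srp_airlines_9.csv` and `srp_airlines_13.csv` from the current directory.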
Thank you for your attention.
PS: Accuracy ("classifications correct (percent)") for each run, listed in seed order 5, 9, 13, 17, 19, 23, 29, 31, 37 and 121:
#SRP results
[68.51402435746027,
68.555553289592,
68.47657415973435,
68.58021109304521,
68.52700214875145,
68.59634063364993,
68.59800920681593,
68.45377032646562,
68.54368788041151,
68.53738438178438]
#SGBT results
[67.95338377368215,
67.95338377368215,
67.95338377368215,
67.95338377368215,
67.95338377368215,
67.95338377368215,
67.95338377368215,
67.95338377368215,
67.95338377368215,
67.95338377368215]
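A quick way to quantify the difference is to compute the spread of each list. This short Python sketch uses only the numbers above: the SRP accuracies are ten distinct values, while the SGBT accuracies have a standard deviation of exactly zero, i.e. changing the seed had no visible effect on the result.

```python
from statistics import mean, stdev

# Accuracy (%) per seed, copied from the lists above
# (seeds 5, 9, 13, 17, 19, 23, 29, 31, 37, 121).
srp = [68.51402435746027, 68.555553289592, 68.47657415973435,
       68.58021109304521, 68.52700214875145, 68.59634063364993,
       68.59800920681593, 68.45377032646562, 68.54368788041151,
       68.53738438178438]
sgbt = [67.95338377368215] * 10

for name, values in (("SRP", srp), ("SGBT", sgbt)):
    # distinct = how many different accuracy values the ten seeds produced
    print(f"{name}: mean={mean(values):.4f} stdev={stdev(values):.4f} "
          f"distinct={len(set(values))}")
```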