StreamingGradientBoostedTrees returns identical results regardless of seed value

Moura

Apr 12, 2024, 7:48:08 PM
to MOA users
Hey there,

I ran some experiments using the latest version of MOA (compiled from the GitHub repository) to assess the performance of StreamingGradientBoostedTrees on the airlines.arff dataset.

Interestingly, despite using a different random seed for each run, I consistently obtained identical results for Kappa and Accuracy. The seeds I used were 5, 9, 13, 17, 19, 23, 29, 31, 37, and 121, the same ones used in the paper "Gradient boosted trees for evolving data streams".

Could you please review the command line below and confirm whether it is correct?

seed=121
java -javaagent:sizeofag-1.1.0.jar -cp moa.jar moa.DoTask EvaluateInterleavedTestThenTrain \
 -l "(meta.StreamingGradientBoostedTrees -l (trees.FIMTDD  \
        -s VarianceReductionSplitCriterion \
        -g 25 \
        -c 0.05 \
        -e \
        -p ))" \
 -s "(ArffFileStream -f airlines.arff)" \
 -i 1000000 \
 -f 1000000 \
 -r $seed \
 -d sgbt_airlines_$seed.csv



Additionally, I ran the same experiment with StreamingRandomPatches, using the command line below. Those results were in line with the ones reported in the paper.

seed=121
java -javaagent:sizeofag-1.1.0.jar -cp moa.jar moa.DoTask EvaluateInterleavedTestThenTrain \
 -l "(meta.StreamingRandomPatches \
    -l (trees.HoeffdingTree \
        -g 50 \
        -c 0.01) \
    -s 100 \
    -o (Percentage (M * (m / 100))) \
    -m 60 \
    -a 6)" \
 -s "(ArffFileStream -f airlines.arff)" \
 -i 1000000 \
 -f 1000000 \
 -r $seed \
 -d srp_airlines_$seed.csv


Thank you for your attention. 

PS:
# classifications correct (percent)
# Results for seeds: 5, 9, 13, 17, 19, 23, 29, 31, 37, and 121
# SRP results
[68.51402435746027,
 68.555553289592,
 68.47657415973435,
 68.58021109304521,
 68.52700214875145,
 68.59634063364993,
 68.59800920681593,
 68.45377032646562,
 68.54368788041151,
 68.53738438178438]


# SGBT results
[67.95338377368215,
 67.95338377368215,
 67.95338377368215,
 67.95338377368215,
 67.95338377368215,
 67.95338377368215,
 67.95338377368215,
 67.95338377368215,
 67.95338377368215,
 67.95338377368215]
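
A quick way to confirm the symptom is to count how many distinct accuracy values each learner produced across the ten seeds. The snippet below is only a sketch: the values are copied from the lists above, and the helper name `distinct` is made up for illustration.

```shell
#!/bin/sh
# Accuracy values copied from the lists above, one per seed.
srp="68.51402435746027 68.555553289592 68.47657415973435 68.58021109304521
68.52700214875145 68.59634063364993 68.59800920681593 68.45377032646562
68.54368788041151 68.53738438178438"
sgbt="67.95338377368215 67.95338377368215 67.95338377368215 67.95338377368215
67.95338377368215 67.95338377368215 67.95338377368215 67.95338377368215
67.95338377368215 67.95338377368215"

# Hypothetical helper: print the number of distinct values in a
# whitespace-separated list. $1 is left unquoted on purpose so that the
# shell splits it into one value per line.
distinct() {
  printf '%s\n' $1 | sort -u | wc -l | tr -d ' '
}

echo "SRP distinct values:  $(distinct "$srp")"
echo "SGBT distinct values: $(distinct "$sgbt")"
```

A count of 1 for SGBT against 10 for SRP suggests the seed is not reaching the SGBT learner at all, rather than the learner merely being insensitive to it.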

Nuwan Gunasekara

Apr 14, 2024, 8:15:52 PM
to MOA users

Hi Moura,

This does look like a bug in the StreamingGradientBoostedTrees implementation: it does not pick up the random seed set by the evaluator.

To work around this, please set the random seed option (-r) on StreamingGradientBoostedTrees itself:

EvaluateInterleavedTestThenTrain -l (meta.StreamingGradientBoostedTrees -r 121) -s (ArffFileStream -f airlines.arff)  -i 1000000 -f 1000000
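
For anyone repeating the multi-seed experiment with this workaround, the loop below is a minimal sketch rather than a tested recipe. It assumes moa.jar and sizeofag-1.1.0.jar sit in the current directory as in the original post, reuses the original `-d` file naming, and passes the whole task to moa.DoTask as one quoted argument. `RUN_MOA` and `build_task` are names invented here for illustration; without `RUN_MOA=1` the script only prints the commands.

```shell
#!/bin/sh
# Seeds from the original post.
seeds="5 9 13 17 19 23 29 31 37 121"

# Hypothetical helper: build the MOA task string with -r set on the learner
# itself rather than on the evaluator. $1 is the seed.
build_task() {
  printf '%s' "EvaluateInterleavedTestThenTrain -l (meta.StreamingGradientBoostedTrees -r $1) -s (ArffFileStream -f airlines.arff) -i 1000000 -f 1000000 -d sgbt_airlines_$1.csv"
}

for seed in $seeds; do
  task=$(build_task "$seed")
  if [ "${RUN_MOA:-0}" = "1" ]; then
    # Actually launch MOA (needs both jars in the current directory).
    java -javaagent:sizeofag-1.1.0.jar -cp moa.jar moa.DoTask "$task"
  else
    # Dry run: only print the command that would be executed.
    echo "java -javaagent:sizeofag-1.1.0.jar -cp moa.jar moa.DoTask \"$task\""
  fi
done
```

The base learner options from the original command (the trees.FIMTDD settings) can be placed next to `-r` inside the learner's parentheses if the default base learner is not wanted.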

Please feel free to file a bug report for the original issue so that I can look into fixing it.

Thanks and regards,
Nuwan

Nuwan Gunasekara

Apr 15, 2024, 7:10:10 PM
to MOA users
The changes are in the main branch now.

Moura

Apr 15, 2024, 7:31:32 PM
to MOA users
Thank you very much, Nuwan!