I'm using DecisionTree to build a random forest model. The data set is small: 200 items with 664 predictors each, and the input file is under 1 MB.
I can build a random forest with 1000 trees in about 8 seconds, which is great:

```julia
@time model = build_forest(yvalues[:, 1], features, 2, 1000, 0.5)
```
Then I tried to save that model for subsequent scoring by writing it to a JLD file. Writing to an NFS-mounted disk took multiple minutes and produced a 194 MB (!!) file. Writing to /dev/shm still takes 51 seconds (and the file is still 194 MB):
```julia
@time save("/dev/shm/foo.jld", "model", model)
# 51.406531 seconds (12.01 M allocations: 465.667 MB, 0.38% gc time)
```
When I do something comparable in R with the same dataset (build the model, then use save() to store the model and the features), the whole process takes about 14 seconds and produces a 2.8 MB file on disk. The save() step itself is very fast.
whos() reports:

```julia
model 6884 KB DecisionTree.Ensemble
```

so if that is a good estimate of the in-memory size, I don't think the problem is with the DecisionTree object itself.
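As a cross-check on the whos() figure, here is a sketch of how one could compare the in-memory size of an object with the size Julia's native serializer would write for it. (Assumptions: the `rand(200, 664)` matrix is only a hypothetical stand-in for the real `DecisionTree.Ensemble` returned by `build_forest`; on newer Julia versions `serialize` lives in the `Serialization` standard library, while on older versions it was in Base.)

```julia
using Serialization  # Julia >= 0.7; on older versions serialize is in Base

# Hypothetical stand-in for the trained model; substitute the
# Ensemble returned by build_forest to measure the real thing.
model = rand(200, 664)

# Serialize into an in-memory buffer and measure the byte count.
io = IOBuffer()
serialize(io, model)
serialized_mb = length(take!(io)) / 1024^2

# Compare against Julia's own estimate of the in-memory footprint.
inmemory_mb = Base.summarysize(model) / 1024^2

println("serialized size: ", serialized_mb, " MB")
println("in-memory size:  ", inmemory_mb, " MB")
```

If the native serializer's output is close to the in-memory size, that would suggest the blow-up is specific to how JLD encodes the tree structures rather than to the model itself.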
Am I doing something wrong, or is JLD doing something horrible?
I saw https://github.com/JuliaLang/julia/issues/7893, so perhaps the problems reported there still persist?