Hoeffding trees versus J48

Abhijeet Godase

May 5, 2012, 10:50:15 AM
to moa-...@googlegroups.com
Hi all,

I'm working on classification of imbalanced data streams.
While testing various algorithms, I have found that when Hoeffding trees are used as the base learner in the ensemble, results are obtained in less time than with J48 as the base learner.
But the difference in accuracy is noticeable, i.e. J48 clearly beats Hoeffding trees in terms of accuracy (other measures commonly used for imbalanced data, such as AUROC, are also better with J48).

Although Hoeffding trees are meant for stream classification, is it justified to use J48 as the base learner in stream classification for cases such as imbalanced streams?
Also, where can I find a clear comparison of Hoeffding trees with J48 or general decision trees?


----- Abhijeet





Albert Bifet

May 5, 2012, 5:27:53 PM
to moa-...@googlegroups.com
> Although Hoeffding trees are meant for stream classification, is it
> justified to use J48 as the base learner in stream classification for
> cases such as imbalanced streams?
> Also, where can I find a clear comparison of Hoeffding trees with J48 or
> general decision trees?

Hoeffding Trees are designed for large amounts of data. With small
quantities of data, J48 will be much more accurate.
The original Hoeffding Tree paper explains how it differs from a
non-streaming decision tree:

http://www.cs.washington.edu/dm/vfml/papers/vfdt-kdd00.pdf
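
To illustrate why the amount of data matters: the paper linked above only lets a leaf split once the observed gap between the two best attributes exceeds the Hoeffding bound, epsilon = sqrt(R^2 ln(1/delta) / (2n)). The sketch below is a minimal illustration of that rule, not MOA's implementation. The function names are hypothetical, the tie-breaking threshold from the paper is left out, and the values of R and delta are assumptions chosen for the example.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Return epsilon such that, with probability 1 - delta, the true mean
    lies within epsilon of the mean observed over n examples."""
    return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))


def can_split(best_gain: float, second_gain: float,
              value_range: float, delta: float, n: int) -> bool:
    """Split only if the best attribute beats the runner-up by more than epsilon."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


if __name__ == "__main__":
    # Assumed values: R = 1 (information gain with two classes), delta = 1e-7.
    for n in (200, 2_000, 100_000):
        eps = hoeffding_bound(1.0, 1e-7, n)
        print(n, round(eps, 4), can_split(0.30, 0.25, 1.0, 1e-7, n))
```

Under these assumptions the bound is about 0.20 after 200 examples and about 0.009 after 100,000, so a gain difference of 0.05 only justifies a split in the second case. With few examples the tree stays shallow, whereas a batch learner such as J48 can grow a full tree from the same data.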

Cheers,

Albert

Abhijeet Godase

May 6, 2012, 1:49:35 AM
to moa-...@googlegroups.com
>>Hoeffding Trees are designed for large amounts of data. With small
>>quantities of data, J48 will be much more accurate.


But many papers in the literature report work on imbalanced data streams using J48 as the base learner, while some use CART trees.
To name a few:




So is it justified, at least in the case of imbalanced data streams, to use J48 as the base learner?
Also, the paper (MUsera, link 2 above) mentions VFDT trees and other contributions in its related work on data streams, but later says that there are not many contributions on skewed data streams.





------- Abhijeet 





Albert Bifet

May 6, 2012, 4:22:22 PM
to moa-...@googlegroups.com
> So is it justified, at least in the case of imbalanced data streams, to use
> J48 as the base learner?

You can use J48 in data streams inside an ensemble.
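
One common way to do this is a chunk-based ensemble: buffer the stream into fixed-size chunks, train a fresh batch learner on each chunk, keep only the most recent members, and predict by majority vote. The sketch below is only an illustration of that idea under stated assumptions. All class names are hypothetical, `MajorityClassLearner` is a trivial stand-in for a batch learner such as J48, and none of this is MOA's or Weka's API.

```python
from collections import Counter, deque


class MajorityClassLearner:
    """Placeholder batch learner (stand-in for J48): predicts the most
    frequent label seen in its training chunk."""

    def fit(self, X, y):
        self.label = Counter(y).most_common(1)[0][0]
        return self

    def predict_one(self, x):
        return self.label


class ChunkEnsemble:
    """Trains one batch learner per chunk and votes over the newest members."""

    def __init__(self, make_learner, chunk_size=100, max_members=5):
        self.make_learner = make_learner
        self.chunk_size = chunk_size
        self.members = deque(maxlen=max_members)  # oldest member drops out
        self.buffer_X, self.buffer_y = [], []

    def learn_one(self, x, y):
        # Buffer examples until a full chunk is available, then train on it.
        self.buffer_X.append(x)
        self.buffer_y.append(y)
        if len(self.buffer_X) >= self.chunk_size:
            learner = self.make_learner().fit(self.buffer_X, self.buffer_y)
            self.members.append(learner)
            self.buffer_X, self.buffer_y = [], []

    def predict_one(self, x):
        # No prediction is possible before the first chunk has been seen.
        if not self.members:
            return None
        votes = Counter(m.predict_one(x) for m in self.members)
        return votes.most_common(1)[0][0]
```

The trade-off is that each member only sees one chunk, so the chunk size has to be large enough for the batch learner to be useful, and on imbalanced streams a single chunk may contain very few minority-class examples.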

Cheers,

Albert