On Sun, Oct 6, 2013 at 4:58 PM, Andrei Perhinschi <aper...@mix.wvu.edu> wrote:
> Hello all,
>
> I am having trouble understanding the discretizer described in the week of sep 17 review questions. Is it supposed to be an equal width discretizer with N=3? I figured so due to (max-min)/3. Furthermore what constitutes a "dull bin" and how does entropy factor in? My confusion stems from not understanding what to do with the output of the EWD algorithm (menzies.us/cs573/?nums2syms) in relation to the entropy calculation.
>
hey andrei,
thanks for your question
one clarification (and apology): those examples contain some leftover
summary output, so please ignore lines 2 and 3 of each example. that
is, please ignore
73.57, no,
6.572, 0.6429,
and
69.07, yes
3.605, 0.5
> Is it supposed to be an equal width discretizer with N=3? I figured so due to (max-min)/3.
think of it as a 2-pass algorithm:
pass1: divide the numeric column into 3 equal-width bins (each of
width (max-min)/3)
pass2: look at the entropy in each bin (entropy measured from the
class symbols found in each bin).
--- if N consecutive bins have similar entropy, then those N are dull
(since the decision variable does not change across their width) and
can be replaced by 1 bin.
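
a minimal sketch of that two-pass idea in python, under assumptions that are not in the review questions: the function names (entropy, equal_width_bins, merge_dull), the tolerance eps that decides what counts as "similar", and the 9-row toy data are all hypothetical. it reads the rule literally, comparing only the entropy values of neighbouring bins.

```python
from collections import Counter
from math import log2

def entropy(symbols):
    """Shannon entropy (in bits) of a list of class symbols."""
    n = len(symbols)
    return sum(-c / n * log2(c / n) for c in Counter(symbols).values())

def equal_width_bins(rows, n_bins=3):
    """Pass 1: rows are (number, class) pairs; return n_bins lists of classes."""
    nums = [x for x, _ in rows]
    lo, hi = min(nums), max(nums)
    width = (hi - lo) / n_bins or 1      # guard against a zero-width range
    bins = [[] for _ in range(n_bins)]
    for x, klass in rows:
        i = min(int((x - lo) / width), n_bins - 1)  # max value goes in the last bin
        bins[i].append(klass)
    return bins

def merge_dull(bins, eps=0.05):
    """Pass 2: fuse runs of adjacent bins whose entropies differ by less than eps.

    Assumption: "similar" is judged from entropy alone, so two pure bins of
    different classes would also be fused under this literal reading.
    """
    merged, prev_e = [], None
    for b in bins:
        e = entropy(b)
        if prev_e is not None and abs(e - prev_e) < eps:
            merged[-1].extend(b)         # dull: same entropy as the previous bin
        else:
            merged.append(list(b))
        prev_e = e
    return merged

# hypothetical toy data: f1 = 1..9, class is "a" for the first seven rows
rows = list(zip(range(1, 10), "aaaaaaabb"))
bins = equal_width_bins(rows, 3)         # three bins of three rows each
merged = merge_dull(bins)                # the first two (all "a") bins are fused
```

here the first two bins both have entropy 0, so they collapse into one bin, while the third bin (mixed a and b) stays separate.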
as to how to apply the entropy calc: suppose we had 8 rows of data
whose class column takes two values (a and b).
f1 class
== ==
1 a
2 a
3 a
4 a
5 b
6 b
7 b
8 b
a four-way split (equal width) on f1 would generate these four bins:
1 a
2 a

3 a
4 a

5 b
6 b

7 b
8 b
if we split between 4 and 5, we would have n1=4 and n2=4 things below
and above the split, and each side would have entropy e1=e2=0 (each
side holds only one class). in which case, the score for this split
would be:
score1 = n1/(n1+n2)*e1 + n2/(n1+n2)*e2 = 4/8*0 + 4/8*0 = 0
but if we split between 6 and 7, we would have n1=6 and n2=2, with
e1 = entropy(4/6,2/6) and e2 = 0. in which case, the score for that
split would be
score2 = 6/8*entropy(4/6,2/6) + 2/8*0 = 6/8*0.918 + 0 = 0.689 (approx)
note that score2 > score1, and lower is better here (less class mixing
inside the bins). i.e. whatever we did to generate score1 was BETTER
than whatever we did to generate score2.
so we would prefer the first split.
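
to check that arithmetic, here is a small python sketch of the split score on the 8-row example. the names entropy and split_score are hypothetical (they are not from the course code), and the sketch assumes the rows are already sorted by f1 so that a cut is just an index into the class column.

```python
from collections import Counter
from math import log2

def entropy(symbols):
    """Shannon entropy (in bits) of a list of class symbols."""
    n = len(symbols)
    return sum(-c / n * log2(c / n) for c in Counter(symbols).values())

def split_score(classes, cut):
    """Size-weighted entropy of the two sides made by cutting before index `cut`."""
    left, right = classes[:cut], classes[cut:]
    n = len(classes)
    return len(left) / n * entropy(left) + len(right) / n * entropy(right)

classes = list("aaaabbbb")           # class column of the 8 rows, sorted by f1
score1 = split_score(classes, 4)     # cut between f1=4 and f1=5
score2 = split_score(classes, 6)     # cut between f1=6 and f1=7

# try every possible cut and keep the one with the lowest score
best = min(range(1, len(classes)), key=lambda cut: split_score(classes, cut))
```

score1 comes out as 0 and score2 as roughly 0.689, and the lowest-scoring cut over all positions is the one between 4 and 5.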
t
> Thank you,
> Andrei
>
--
:: there are some who call me... tim.m...@gmail.com
:: prof @ cs.ai.se.csee.wvu.usa.sol.virgo.all.nil
:: +1-304-376-2859
:: http://menzies.us (skype = menzies.tim)
<hubris>
vita= http://goo.gl/8eNhY
pubs= http://goo.gl/8KPKA
stats= http://goo.gl/vggy1
wow = http://goo.gl/2Wg9A
</hubris>