I'm having an issue where an expression with sympy.tensor.Indexed variables does not seem to simplify correctly.
It is likely that I'm just doing something incorrect, so I was wondering if anyone could help me figure this out.
I'm using sympy to generate some simple equations based on Bayes' rule.
N is an event, and I'm given a set of observations X = \{\ldots, d_i, \ldots\}.
I'm using an indexed base to represent the set of observations as an array.
I'm then using sympy.Product to multiply the probability of these observations together
(I'm assuming independence), so I create an Idx variable ``i`` and several sets of variables
that are indexed by ``i``. However, at the end of this script, it looks like ``P(di)[i]`` should be canceled out by a
simplification step, but it is not.
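Stripped down, the pattern looks roughly like the sketch below. The names `a`, `b`, and `n` are placeholders for illustration and do not appear in my script, and I use a plain integer symbol instead of an Idx to keep it short. As far as I can tell, the quotient stays as two separate Product objects when the upper limit is symbolic, while a concrete upper limit lets ``doit()`` expand everything so the shared factors cancel.

```python
from sympy import IndexedBase, Product, symbols

# Placeholder names for illustration only (not from my script)
i = symbols('i', integer=True)
n = symbols('n', integer=True, positive=True)
a, b = IndexedBase('a'), IndexedBase('b')

# Symbolic upper limit: the quotient holds two separate Product objects
num = Product(a[i] * b[i], (i, 1, n))
den = Product(b[i], (i, 1, n))
ratio_symbolic = num / den
print(ratio_symbolic)

# Concrete upper limit: doit() expands each product into explicit factors,
# so the b[1], b[2], b[3] factors cancel during ordinary multiplication
ratio_concrete = (Product(a[i] * b[i], (i, 1, 3)).doit()
                  / Product(b[i], (i, 1, 3)).doit())
print(ratio_concrete)
```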
Here is the script:
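import sympy
from sympy.tensor import IndexedBase, Idx  # NOQA
from sympy import *  # NOQA

cardX = sympy.symbols('|X|', integer=True, positive=True, finite=True)
i = Idx(sympy.symbols('i', integer=True, finite=True))

def psym(expr):
    # Scalar, non-negative probability symbol
    s = symbols(expr, real=True, finite=True, negative=False)
    return s

def IdxBase(expr):
    # Probability indexed by the observation index i
    #s = IndexedBase(expr, shape=(cardX,))[i]
    s = IndexedBase(expr, shape=(1,))[i]
    return s

def Prod(s, start=1, stop=cardX):
    # Symbolic product over i; the explicit alternative is left commented out
    return sympy.Product(s, (i, start, stop))
    #return sympy.prod([s.subs(i, i_) for i_ in range(1, 4)])
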
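P_N = psym('P(N)')
P_X = psym('P(X)')
P_X_given_N = psym('P(X|N)')
P_N_given_X = psym('P(N|X)')
P_di = IdxBase('P(di)')
P_N_given_di = IdxBase('P(N|di)')
P_di_given_N = IdxBase('P(di|N)')
pprint = sympy.pretty_print

print('-----------------------')
print('OUTPUT OF SVM: P(N | di)')
print('-----------------------')
P_N_given_di_ = (P_di_given_N * P_N) / P_di
pprint(Eq(P_N_given_di, P_N_given_di_))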
print('-----------------------')
print('REARANGE USING BAYES P(di | N)')
print('-----------------------')
P_di_given_N_ = (P_N_given_di * P_di) / P_N
pprint(Eq(P_di_given_N, P_di_given_N_))
print('-----------------------')
print('AGGREGATE USING INDEPENDENCE')
print('-----------------------')
prod_P_di_given_N = Prod(P_di_given_N)
prod_P_di_given_N_ = Prod(P_di_given_N_)
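P_X_given_N__ = prod_P_di_given_N
P_X_given_N_ = prod_P_di_given_N_
pprint(Eq(P_X_given_N, P_X_given_N__))
pprint(Eq(P_X_given_N, P_X_given_N_))
print('=== ALSO ===')
# P(X) as a product over the individual observations (independence)
P_X_ = Prod(P_di)
pprint(Eq(P_X, P_X_))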
print('-----------------------')
print('REARANGE TO LIKELIHOOD USING BAYES AGAIN')
print('-----------------------')
P_N_given_X__ = (P_X_given_N * P_N) / (P_X)
P_N_given_X_ = (P_X_given_N_ * P_N) / (P_X_)
pprint(Eq(P_N_given_X, P_N_given_X__))
print('---')
pprint(Eq(P_N_given_X, P_N_given_X_))
print('--- simplify --- ')
P_N_given_X_done = P_N_given_X_.doit(deep=True)
pprint(Eq(P_N_given_X, P_N_given_X_done))
# Does not seem to cancel out the P(di)[i] variable
#pprint(Eq(P_N_given_X, sympy.simplify(P_N_given_X_done)))
The output of this script is:
-----------------------
OUTPUT OF SVM: P(N | di)
-----------------------
             P(N)⋅P(di|N)[i]
P(N|di)[i] = ───────────────
                 P(di)[i]
-----------------------
REARANGE USING BAYES P(di | N)
-----------------------
             P(N|di)[i]⋅P(di)[i]
P(di|N)[i] = ───────────────────
                     P(N)
-----------------------
AGGREGATE USING INDEPENDENCE
-----------------------
          |X|
         ┬───┬
P(X|N) = │   │ P(di|N)[i]
         │   │
         i = 1
           |X|
         ┬──────┬
         │      │ P(N|di)[i]⋅P(di)[i]
P(X|N) = │      │ ───────────────────
         │      │         P(N)
         │      │
          i = 1
=== ALSO ===
        |X|
       ┬───┬
P(X) = │   │ P(di)[i]
       │   │
       i = 1
-----------------------
REARANGE TO LIKELIHOOD USING BAYES AGAIN
-----------------------
         P(N)⋅P(X|N)
P(N|X) = ───────────
             P(X)
---
                |X|
              ┬──────┬
              │      │ P(N|di)[i]⋅P(di)[i]
         P(N)⋅│      │ ───────────────────
              │      │         P(N)
              │      │
               i = 1
P(N|X) = ─────────────────────────────────
                    |X|
                   ┬───┬
                   │   │ P(di)[i]
                   │   │
                   i = 1
--- simplify ---
                        |X|
                  -|X| ┬───┬
         P(N)⋅P(N)    ⋅│   │ P(N|di)[i]⋅P(di)[i]
                       │   │
                       i = 1
P(N|X) = ───────────────────────────────────────
                       |X|
                      ┬───┬
                      │   │ P(di)[i]
                      │   │
                      i = 1
There are no additions in this formula, so the denominator should completely cancel.
Any ideas why the bottom term is not canceled by the top term?
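For reference, the result I expect is P(N)^(1 - |X|) times the product of P(N|di)[i] alone, since a quotient of two products over the same limits equals the product of the quotients. Below is a minimal sketch of applying that identity by hand. The helper `merge_quotient` and the names `a`, `b`, and `n` are hypothetical (this is not a SymPy function), and it assumes both products share exactly the same limits and that the denominator terms are nonzero.

```python
from sympy import IndexedBase, Product, symbols

# Placeholder names for illustration only (not from my script)
i = symbols('i', integer=True)
n = symbols('n', integer=True, positive=True)
a, b = IndexedBase('a'), IndexedBase('b')


def merge_quotient(num, den):
    """Hypothetical helper: rewrite Product(f) / Product(g) as Product(f / g).

    Only valid when both products run over identical limits.
    """
    if not (isinstance(num, Product) and isinstance(den, Product)):
        raise TypeError('both arguments must be Product instances')
    if num.limits != den.limits:
        raise ValueError('products must have identical limits')
    # f / g is simplified by ordinary multiplication, so shared factors
    # such as b[i] cancel before the new Product is built
    return Product(num.function / den.function, *num.limits)


num = Product(a[i] * b[i], (i, 1, n))
den = Product(b[i], (i, 1, n))
merged = merge_quotient(num, den)
print(merged)
```

Merging the two products first means the cancellation happens inside a single term, which avoids relying on SymPy to recognize the shared factor across two unevaluated products.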