Re: [sympy] Tedious probability expansion problem and set notation

51 views
Skip to first unread message

Matthew Rocklin

unread,
Nov 12, 2012, 12:46:32 PM11/12/12
to sy...@googlegroups.com
Short answer here is, I think, no. 

The short answer should be that sympy.stats should be able to handle all of this for you at a level higher than sets. It currently errs on this sort of question unfortunately. It wouldn't be hard to add though. The infrastructure is there. 

Sets handles this sort of problem while it computes measures. Sadly it assumes a measure with density that is always equal to 1. It would be nice if this piece were generalized so that it would integrate a general function over the domain. In principle this wouldn't be challenging to add. The tedium that you're looking to automate is already solved in the various `def _measure` functions in `sympy/core/sets.py`. Unfortunately it's tangled up with some too-simple assumptions. You would just need to add an argument to all of the `_measure` methods. Presumably at the Interval base-layer you would replace `return self.right - self.left` with `return integrate(f, self)`. 

If you implement this on your own outside of SymPy you might find Union._measure helpful. It solves the annoying AuBuC == A + B + C - AB - BC + ABC problem generally. If this code were generalized so that `blah.measure` were replaced with `foo(blah)` I suspect you would have your problem 80% solved. 

Your sort of problem is exactly what I would like sympy.stats to be able to solve with sympy.core.sets. If I ever get more time I'll work on this. Probably not for a while though. 

I'm happy to help out if you're willing to fix the problem within SymPy.sets. It'd be a nice contribution.



On Mon, Nov 12, 2012 at 10:59 AM, Simon Clift <ssc...@gmail.com> wrote:
Hi folks,

I have a straightforward, but tedious probability problem that I need to expand symbolically.  Sympy's set and interval material is close, but I can't see how it would work in a multidimensional application.  I've used Sympy for some fairly intricate PDE problems, but never for this sort of thing, and I would appreciate any suggestions, please.

I have a number of events that appear as, for example  2 < X < 5 and 3 < Y < 8 which have associated probabilities and joint distributions (i.e. are not mutually exclusive).  From basic probability and set theory

   p( 2<X<5 \cup 3<Y<8) = p( 2<X<5 ) + p( 3<Y<8 ) - p( 2<X<5 \cap 3<Y<8 )

and so on.  My problems start with about 6 unions that are intersections fo 2 conditions each, all in 3 variables, so requires both the expansion above and reduction for intersecting intervals.  It isn't difficult, just tedious (and error-prone).

I was about to hand-roll the symbolic algebra as Python classes, but I was wondering if there was a way to approach this with Sympy's intervals module.  It's not clear to me, from the docs or from experimentation, that it handles multi-dimensional problems.

Best regards
-- Simon

--
You received this message because you are subscribed to the Google Groups "sympy" group.
To view this discussion on the web visit https://groups.google.com/d/msg/sympy/-/qd4-NeUkPzIJ.
To post to this group, send email to sy...@googlegroups.com.
To unsubscribe from this group, send email to sympy+un...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/sympy?hl=en.

Matthew Rocklin

unread,
Nov 12, 2012, 12:50:06 PM11/12/12
to sy...@googlegroups.com
Actually, we could give you a set of non-intersecting sets fairly easily. Your problem can be written as follows

In [2]: s = Union(Interval(2, 5)*Interval(-oo, oo), Interval(-oo, oo)*Interval(3, 8))
In [4]: 2.complement.complement
Out[4]: 
(((-∞, 2) ∪ (5, ∞)) × [3, 8]) ∪ ([2, 5] × ((-∞, 3) ∪ (8, ∞))) ∪ ([2, 5] × [3, 8])

Not the prettiest but it should suffice.

Matthew Rocklin

unread,
Nov 12, 2012, 12:50:29 PM11/12/12
to sy...@googlegroups.com
* should be

In [4]: s.complement.complement

Matthew Rocklin

unread,
Nov 12, 2012, 12:53:41 PM11/12/12
to sy...@googlegroups.com
I'm not strictly convinced that that will work for all complex cases. You should double-check your first results.

Simon Clift

unread,
Nov 12, 2012, 3:09:01 PM11/12/12
to sy...@googlegroups.com
Hi Matthew,

Thank you very much for the examples, I'll play around with them and see if I can cojole things into shape.

I've just been looking through the core/set.py and stats/*.py code (and brushing up with the Measure Theory chapters from Friedman's Foundations of Modern Analysis; it's been a while).  I didn't twig to the * operator creating multi-dimensional intervals.  Naming the axes would be nice but that's just a bookkeeping convenience.

Unit measure isn't a biggie for me.  Joint probabilities will almost certainly have to be estimated as if the two variables were independent.  For my measurements, which are generated from a statistical fit, the model should (ideally, if not actually in theory) give me at least uncorrelated axes and I have an empirical distribution for the random variables that should at least let me scale everything to marginal values.  As a first approximation it's probably good enough.


On Monday, 12 November 2012 12:53:43 UTC-5, Matthew wrote:
I'm not strictly convinced that that will work for all complex cases. You should double-check your first results.

Will do.  I'll post my final notes and code.

Thanks again
-- Simon

Matthew Rocklin

unread,
Nov 12, 2012, 3:15:50 PM11/12/12
to sy...@googlegroups.com
I've just been looking through the core/set.py and stats/*.py code (and brushing up with the Measure Theory chapters from Friedman's Foundations of Modern Analysis; it's been a while).  I didn't twig to the * operator creating multi-dimensional intervals.  Naming the axes would be nice but that's just a bookkeeping convenience.

I agree that this is inconvenient. SymPy stats should really be doing this bookkeeping for you.
 
Unit measure isn't a biggie for me.  Joint probabilities will almost certainly have to be estimated as if the two variables were independent.  For my measurements, which are generated from a statistical fit, the model should (ideally, if not actually in theory) give me at least uncorrelated axes and I have an empirical distribution for the random variables that should at least let me scale everything to marginal values.  As a first approximation it's probably good enough.


On Monday, 12 November 2012 12:53:43 UTC-5, Matthew wrote:
I'm not strictly convinced that that will work for all complex cases. You should double-check your first results.

Will do.  I'll post my final notes and code.

If you do go digging around in the code I'd probably suggest working with sympy.core.sets rather than sympy.stats. sympy.core.sets is better organized and has a much lower entry barrier.

Simon Clift

unread,
Nov 14, 2012, 2:58:38 PM11/14/12
to sy...@googlegroups.com
Digging.  I'm looking to hack something in to support symbolic intervals and need some advice, please.

I'm getting my problem laid out as an intersection, but I'm hitting a problem performing the calculation symbolically.  If I have

  A,B,C,D = symbols( ( "A", "B", "C", "D" ), real=True )

  i_ab = Interval ( A, B )
  i_cd = Interval ( C, D )

  i_chk = Intersection( i_ab, i_cd )

then in line 449 of core/sets.py I don't have the is_comparable property.  I was looking through the core/relational.py code for an operator that would allow me to impose a strict ordering, say A < C < B < D (in Maple I would use something like assume(A<B) ).  Is there an accepted way of attaching an is_comparable type operation to the individual Symbol objects, say:

  A.is_lessthan( C )

which would (say) set

  A.is_comparable = True
  A.comparison_dict = { C : '<' }

or something suitably Sympy-compatible?  Should I fear combinatoric complexity problems evaluating that?

Thanks
-- Simon

Matthew Rocklin

unread,
Nov 14, 2012, 3:01:22 PM11/14/12
to sy...@googlegroups.com
My understanding is that SymPy is not able to handle information like `assume(A<B)`. This has been on our issues list for a while I think. 


--
You received this message because you are subscribed to the Google Groups "sympy" group.
To view this discussion on the web visit https://groups.google.com/d/msg/sympy/-/X8YHtVh_nJcJ.

Matthew Rocklin

unread,
Nov 14, 2012, 3:01:43 PM11/14/12
to sy...@googlegroups.com
I'm not very knowledgeable on this part of SymPy though. Hopefully I am wrong. 

Aaron Meurer

unread,
Nov 14, 2012, 3:13:28 PM11/14/12
to sy...@googlegroups.com
assume(x < y) is an open issue (https://code.google.com/p/sympy/issues/detail?id=2721).  For now, you might be able to work with assume(Q.positive(x - y)). But it doesn't look like automatic reasoning about that is implemented yet:

In [17]: ask(Q.positive(x - z), Q.positive(x - y) & Q.positive(y - z))
<None>

The logic is simple enough in this case, though, that you can probably just implement it from scratch.  Definitely if your endpoints are just Symbols (not Adds) it will be simple.  Beyond that, maybe the inequality solver (in solve()) can do some reasoning for you?

Aaron Meurer

Simon Clift

unread,
Nov 14, 2012, 3:37:31 PM11/14/12
to sy...@googlegroups.com
Hi Matthew,

This does appear to be the case.  I've just looked over facts.py and assumptions.py as well.

Interval._intersection evaluates the comparisons and there is an explicit comment about not doing comparisons on Symbol classes on line 539

        # We can't intersect [0,3] with [x,6] -- we don't know if x>0 or x<0
        if not self._is_comparable(other):
            return None

That would leave open the question of how best to put such machinery into the Symbol class.

It's not clear to me, BTW, what distinguishes something that belongs in a FactKb from something that is denoted with a member function such as Symbol.is_comparable.  I gather that "facts" are static (e.g. "this is a real") rather than contextual (e.g. "this A \in \Real < B \in \Real") but doesn't that leave such facts out of the scope of the rules engine?  Seems like there is an opportunity there for a lot more power in the system.  Just reviewing, for example, my analysis text there are a lot of "facts" that might be useful for proofs or reasoning about expressions.  I imagine that's getting out of scope for Sympy, perhaps?

It seems to me that something could be shoe-horned in, but I'd be afraid of doing something inadequately generalized.

Thanks
-- Simon

Matthew Rocklin

unread,
Nov 14, 2012, 3:56:13 PM11/14/12
to sy...@googlegroups.com
There are two different systems in sympy to represent facts; these are our two assumptions systems. 

In the old version expressions have facts attached to them
x = Symbol(x, positive=True)

In the new version they are separate
x = Symbol(x)
facts = Q.real(x)

There are two problems:
1. Neither system is able to handle x>y facts except for simple cases (e.g. for y == 0). The new system should be able to handle a simple version easily. It might be more challenging to do more serious inference.
2. Even if we fixed this much of the code still uses the old system. 

This has been one of the top issues in SymPy for a long while. If we had some mechanism to associate bounties to issues this would probably have the highest.




--
You received this message because you are subscribed to the Google Groups "sympy" group.
To view this discussion on the web visit https://groups.google.com/d/msg/sympy/-/Dalk6tMLV3gJ.

Simon Clift

unread,
Nov 14, 2012, 4:15:24 PM11/14/12
to sy...@googlegroups.com
Hi Matthew, Aaron,


On Wednesday, 14 November 2012 15:56:16 UTC-5, Matthew wrote:
There are two different systems in sympy to represent facts; these are our two assumptions systems. 
...
This has been one of the top issues in SymPy for a long while. If we had some mechanism to associate bounties to issues this would probably have the highest.

Thanks again for your help, this puts it into context.

I won a beer once for reliably triggering an exponential storage complexity growth bug in a commercial compiler.  Turned out to be a very nasty issue, just most people didn't hit it quite the way I did. :)  Sorry, it's nasty knack I have for zooming in on these things.

I've got a work-around for now:  I just assigning integer numbers to my symbols by their strict order, but I'll keep this one in mind when I've got the time for it.  At least I'm getting the expansions right even if the code conversion back into Python is going to be peculiar. :)  I do find Sympy very useful and would like to contribute.

Best regards
-- Simon

Simon Clift

unread,
Nov 15, 2012, 10:59:04 AM11/15/12
to sy...@googlegroups.com
Aha.  Here's a tidy expression:

   http://en.wikipedia.org/wiki/Inclusion%E2%80%93exclusion_principle

That's what I was looking for.  Combinatorics is not my field... clearly... :(

Aaron Meurer

unread,
Nov 15, 2012, 2:05:58 PM11/15/12
to sy...@googlegroups.com
Most people aren't good combinatorists. SymPy should be able to do this for you. 

Aaron Meurer
--
You received this message because you are subscribed to the Google Groups "sympy" group.
To view this discussion on the web visit https://groups.google.com/d/msg/sympy/-/Bk4yrPFD4pEJ.
Reply all
Reply to author
Forward
0 new messages