Issue 3129 in sympy: Drastic change to sympy.stats: Adding concept of Probability Distributions on surface level


sy...@googlecode.com

Mar 5, 2012, 2:19:03 AM
to sympy-...@googlegroups.com
Status: New
Owner: ----
Labels: Type-Defect Priority-Medium

New issue 3129 by nathan.f...@gmail.com: Drastic change to sympy.stats:
Adding concept of Probability Distributions on surface level
http://code.google.com/p/sympy/issues/detail?id=3129

Currently, you create a random variable from a distribution like this:
>>> X = Binomial(n, p)

This emulates the standard mathematical notation ``X ~ Binomial(n, p)``.

That is, X *samples* from the Binomial distribution with count n and
probability p. But the current notation can also be interpreted as X
*equals* this Binomial distribution, and it's unclear that the function
Binomial (or any of the distribution functions) returns a random variable
and not the distribution itself. In fact, sympy.stats does not have any
class or concept of Distribution.

My suggestion is to add ProbabilityDistribution to sympy.stats and change
the current syntax for creating new random variables. I'm not exactly sure
on how this would interact with current ProbabilitySpaces (maybe we can
just rename BinomialPSpace to just Binomial and leave it at that). It
should be visible to the user, unlike, say, PSpace, so the user can play
with it as well as with random variables.

We would create a random variable like so:
>>> X = RandomSymbol('X', dist=Binomial(n, p))
or another notation I was thinking of,
>>> X = Binomial(n, p).new_symbol('X')

'Binomial' would in this case be a type of ProbabilityDistribution. This is
more verbose than the current way, but it makes it explicit that X is a
random symbol and not a distribution. This also gets rid of the issue of
generating default random symbol names. Previously you'd have to write
>>> X = Binomial(n, p, symbol='X')
to bind the symbol name 'X' to the variable. Otherwise it would use a
default, incrementing symbol. The first notation appeals to me because it
is similar to the notation for creating non-random symbols. The second
might be more pleasant if we replace 'new_symbol' with something shorter...

Adding distributions would add a bunch of interesting issues. Two
distributions with the same parameter should be equal to each other, but
two variables sampled from the same distribution aren't always equal.

>>> BinomA = Binomial(1, S.Half)
>>> BinomB = Binomial(1, S.Half)
>>> BinomA == BinomB
True
>>> X = RandomSymbol('X', BinomA)
>>> Y = RandomSymbol('Y', BinomA)
>>> P(Eq(X, Y))
0.5
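
The 0.5 above can be checked by hand, assuming X and Y are independent draws from the same Binomial(1, 1/2) distribution. The sketch below uses only the Python standard library, not sympy.stats; `binomial_pmf` and `prob_equal` are names made up for this illustration.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(n, p):
    """Return {k: P(K = k)} for a binomial distribution with count n and probability p."""
    return {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}


def prob_equal(pmf_x, pmf_y):
    """P(X == Y) for independent X and Y: sum over k of P(X = k) * P(Y = k)."""
    return sum(px * pmf_y.get(k, 0) for k, px in pmf_x.items())


half = Fraction(1, 2)
binom_a = binomial_pmf(1, half)
binom_b = binomial_pmf(1, half)

# The two distributions compare equal because their parameters match ...
print(binom_a == binom_b)
# ... but two independent samples only agree part of the time.
print(prob_equal(binom_a, binom_b))
```

The distributions are identical objects in the value sense, while the event X == Y is a statement about two separate samples, which is the distinction the proposal relies on.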

Also, you shouldn't be able to call E (expected value) on a distribution,
though the distribution should store its mean as a static property.

>>> E(X) == BinomA.mean
True
>>> Var(X) == BinomA.variance
True
>>> Density(X) == BinomA.pdf
True

But can you multiply distributions or transform them? They are, after all,
generalized functions...

To summarize:
- Add the concept of ProbabilityDistribution to sympy.stats
- Functions like Binomial, Bernoulli, Gamma are now instances or subclasses
of ProbabilityDistribution.
- Change the syntax of creating a random variable to be unambiguous.
- Distributions are static objects: they carry information like mean,
variance, pdf, and two distributions are equal if they have the same
parameters

Benefits:
- Get rid of redundancy of creating a class for type of PSpace and then a
function to get the random variable of that PSpace.
- Explicitly creating symbol names, no more default symbols with increasing
numbers
- Unambiguous creation of new random variables
- Simple ProbabilityDistribution concept visible to users.

Drawbacks:
- More verbose to create a new RV
- May be seen as complicating the already complicated sympy.stats class
hierarchy
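
A minimal sketch of what the proposal could look like, written in plain Python rather than against the real sympy.stats internals. `BinomialDistribution`, `RandomSymbol` and `new_symbol` here are hypothetical stand-ins that only illustrate "distributions are equal by parameters, random symbols are not"; they are not the sympy classes of the same name.

```python
from dataclasses import dataclass
from fractions import Fraction
from math import comb


@dataclass(frozen=True)
class BinomialDistribution:
    """Hypothetical distribution object: static, compared by its parameters."""
    n: int
    p: Fraction

    @property
    def mean(self):
        return self.n * self.p

    @property
    def variance(self):
        return self.n * self.p * (1 - self.p)

    def pmf(self, k):
        return comb(self.n, k) * self.p**k * (1 - self.p)**(self.n - k)

    def new_symbol(self, name):
        """Create a named random symbol that samples from this distribution."""
        return RandomSymbol(name, self)


@dataclass(frozen=True)
class RandomSymbol:
    """Hypothetical random symbol: a name paired with a distribution."""
    name: str
    dist: BinomialDistribution


binom_a = BinomialDistribution(1, Fraction(1, 2))
binom_b = BinomialDistribution(1, Fraction(1, 2))
X = binom_a.new_symbol('X')      # second notation from the proposal
Y = RandomSymbol('Y', binom_a)   # first notation from the proposal
```

Here `binom_a == binom_b` holds because the dataclass compares fields, while `X` and `Y` differ by name even though they share one distribution. A real design would also have to handle the compound and conditional cases raised later in the thread.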

sy...@googlecode.com

Mar 5, 2012, 6:07:13 PM
to sympy-...@googlegroups.com
Updates:
Status: Accepted
Cc: MRock...@gmail.com
Labels: Statistics Milestone-Release0.7.2

Comment #1 on issue 3129 by asme...@gmail.com: Drastic change to
sympy.stats: Adding concept of Probability Distributions on surface level
http://code.google.com/p/sympy/issues/detail?id=3129

You could also allow creating symbols with a method, like

X = BinomA.random_sample('X')

You could also add various kinds of syntactic sugar to make things easier
(e.g., some kind of analog to symbols()).

All in all, this sounds like a good plan to me. It will totally break the
way things work in stats, so we either need to do this before we release,
or somehow mark that module as "experimental".

sy...@googlecode.com

Mar 5, 2012, 7:02:46 PM
to sympy-...@googlegroups.com

Comment #2 on issue 3129 by MRock...@gmail.com: Drastic change to
sympy.stats: Adding concept of Probability Distributions on surface level
http://code.google.com/p/sympy/issues/detail?id=3129

The current sugarless way of creating a random symbol is something like this

>>> BinomA = BinomialPSpace(1, S.Half, symbol=Symbol('X'))
>>> X = RandomSymbol(BinomA, Symbol('X'))

We add the function "Binomial" as syntactic sugar

X = Binomial(1, S.Half, symbol=Symbol('X'))
or
X = Binomial(1, S.Half)

If a symbol is not specified then a default one is auto-generated.
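
As a rough illustration of the "default, incrementing symbol" behaviour being discussed, here is a sketch in plain Python. The function `make_random_variable`, the counter, and the `x1`, `x2` naming scheme are assumptions made for this example; the actual default names generated by sympy.stats may differ.

```python
from itertools import count

_default_names = count(1)  # shared counter for auto-generated names


def make_random_variable(dist_name, *params, symbol=None):
    """Hypothetical constructor: return (symbol_name, dist_name, params)."""
    if symbol is None:
        # No name given: fall back to a default, incrementing name.
        symbol = f"x{next(_default_names)}"
    return (symbol, dist_name, params)


named = make_random_variable("Binomial", 1, 0.5, symbol="X")
first = make_random_variable("Binomial", 1, 0.5)
second = make_random_variable("Binomial", 1, 0.5)
```

Two calls with identical arguments return different results (`first` and `second` carry different names), which is the hidden state that the "unambiguous creation" point in this thread objects to.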

It looks like one of the things you're proposing is to replace PSpace with
ProbabilityDistribution and clean it up so that it is user-visible. I'm
generally happy with this. There is a lot we can do to clean up the
internals of sympy.stats and this might be one of them. It would be great
to have a second developer go over the code in depth and provide a second
perspective.

I'm much more hesitant to affect the interface however. I think that
single-line random variable creation is important. It allows introductory
users to jump into sympy.stats much more quickly. I like auto-generation of
symbols if not provided (this makes the code look more like math) but I'm
not going to fight very hard for it.

Some comments on your bullet points

To summarize:
- Add the concept of ProbabilityDistribution to sympy.stats

* I think this should only be done if it replaces ProbabilitySpace


- Distributions are static objects: they carry information like mean,
variance, pdf, and two distributions are equal if they have the same
parameters

* You'll have to be careful about depending on this information. You'll end
up creating lots of compound distributions when doing statistical
manipulations and you won't have the mean, variance, pdf, etc... for these
compound distributions a priori.

Benefits:
- Get rid of redundancy of creating a class for type of PSpace and then a
function to get the random variable of that PSpace.

* The functions are just there for syntactic sugar. I would suggest this
sugar in either case.


- Explicitly creating symbol names, no more default symbols with increasing
numbers

* This is a different issue I think. This can be addressed in either
system. I.e. we could ask the user to type in "X = Normal(0, 1, 'X')" in
the current system.


- Unambiguous creation of new random variables

* Can you expand upon this?

Drawbacks:
- More verbose to create a new RV

* We could add sugar to solve this.


- May be seen as complicating the already complicated sympy.stats class
hierarchy

* I'm pretty confident that the second go around you would end up reducing
the complexity, not increasing it.


What I would do if this were entirely up to me:

-- Keep the current interface with functions Normal, Binomial, etc....
Require an explicit letter on creation. I.e.
X = Normal(0, 1, 'X')
I believe that either this or the current way, "X = Normal(0, 1)", is the
right way to do random variable creation. It matches mathematical tradition.
-- Release 0.7.2
-- Work on internals
-- Decide later if we want to allow the syntax "X = Normal(0, 1)". It's much
easier to decide later to allow this syntax than to disallow it.

This allows us to think about this problem over an extended period without
blocking the release. The API for the internals can be released with 0.7.3.
I.e. we expose only the sugar for the moment to buy us time. I think it
will be easier for us to come up with a plan for the interface than for the
internals.

In any event I encourage you to play with this idea. I think that a lot of
good can come out of it.

sy...@googlecode.com

Mar 7, 2012, 4:14:50 AM
to sympy-...@googlegroups.com

Comment #3 on issue 3129 by nathan.f...@gmail.com: Drastic change to
sympy.stats: Adding concept of Probability Distributions on surface level
http://code.google.com/p/sympy/issues/detail?id=3129

> It looks like one of the things you're proposing to replace PSpace with
> Probability Distribution and clean it up so that it is user-visible. I'm
> generally happy with this. There is a lot we can do to clean up the
> internals of sympy.stats and this might be one of them. It would be great
> to have a second developer go over the code in depth and provide a second
> perspective.


But formally, probability distributions and probability spaces are different
things. It might cause confusion if we say something is a p-distribution when
it has the capabilities of a p-space. I don't have any experience in formal
probability theory, and I don't know if you were even thinking about these as
you were implementing stats:
http://en.wikipedia.org/wiki/Probability_distribution
http://en.wikipedia.org/wiki/Probability_space


> I'm much more hesitant to affect the interface however. I think that
> single-line random variable creation is important. It allows introductory
> users to jump into sympy.stats much more quickly. I like auto-generation of
> symbols if not provided (this makes the code look more like math) but I'm
> not going to fight very hard for it.

More like math, but less consistent with sympy.
i.e., to make a regular symbol you do
>>> x = Symbol('x')

The only other example of a *specific* symbol I can find is MatrixSymbol,
which also has the same notation (symbol name first, arguments [in this
case dimensions] second):
>>> A = MatrixSymbol('A', 3, 4) # 3x4 matrix

I just reason that making random symbols should be the same:
>>> X = RandomSymbol('X', <stuff>)

Of course, we could go the other way and add auto-generation for regular
symbols :)
>>> x = Symbol()
>>> x
x1


> - Unambiguous creation of new random variables
> * Can you expand upon this?

Basically, you can call:
>>> X = Binomial(1, 2)
>>> Y = Binomial(1, 2)

And you would get different "results", different objects, while the
notation makes it look like you're storing a static result "Binomial(1, 2)"
to both X and Y. I think


Waiting to expose the internals later sounds like a good idea. Right now
the important thing is to figure out the right notation. I wonder what
Aaron thinks about this?

sy...@googlecode.com

Mar 7, 2012, 3:00:45 PM
to sympy-...@googlegroups.com

Comment #4 on issue 3129 by Ronan.L...@gmail.com: Drastic change to
sympy.stats: Adding concept of Probability Distributions on surface level
http://code.google.com/p/sympy/issues/detail?id=3129

I certainly agree with the general idea.

I think that the main trouble with sympy.stats is that its objects don't
map cleanly to established mathematical concepts. RandomSymbol does
represent random variables, but PSpace and RandomDomain are odd beasts.

So, having a representation of probability distributions should simplify
the design. The mathematical abstraction underlying them is the concept of
measure (https://en.wikipedia.org/wiki/Measure_theory), and a Measure
object mu would have one important method: mu.integrate(f, D) implementing
\int_D f dµ. Implementing only the counting
measure (https://en.wikipedia.org/wiki/Counting_measure) and the Lebesgue
measure (https://en.wikipedia.org/wiki/Lebesgue_measure) would unify the
discrete and continuous cases.

Note that for us, a measure and a measure space (or a probability
distribution and a probability space) are the same thing, since a measure
needs to "know" its domain of definition and thus has to contain all the
information that makes up a measure space.
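
A rough sketch of the Measure idea, assuming a single method `integrate(f, D)` as described above. The class names and the `expectation` helper are hypothetical, and the Lebesgue case is approximated numerically with a midpoint rule here, whereas a SymPy implementation would presumably integrate symbolically.

```python
class CountingMeasure:
    """Counting measure: integrating f over a finite set D sums f over its points."""

    def integrate(self, f, domain):
        return sum(f(x) for x in domain)


class LebesgueMeasure:
    """Lebesgue measure on the real line, approximated numerically here."""

    def __init__(self, steps=10_000):
        self.steps = steps

    def integrate(self, f, domain):
        a, b = domain  # D is an interval (a, b)
        h = (b - a) / self.steps
        # Midpoint rule: a numeric stand-in for the exact integral.
        return h * sum(f(a + (i + 0.5) * h) for i in range(self.steps))


def expectation(measure, density, domain):
    """E[X] as the integral over D of x * density(x) dmu, same code for both cases."""
    return measure.integrate(lambda x: x * density(x), domain)


# Discrete: a fair six-sided die under the counting measure.
die_mean = expectation(CountingMeasure(), lambda k: 1 / 6, range(1, 7))

# Continuous: a uniform density on (0, 1) under the Lebesgue measure.
uniform_mean = expectation(LebesgueMeasure(), lambda x: 1.0, (0.0, 1.0))
```

The point of the sketch is that `expectation` does not need to know whether it is handling a discrete or a continuous variable; only the measure object changes.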

sy...@googlecode.com

Mar 7, 2012, 6:23:54 PM
to sympy-...@googlegroups.com

Comment #5 on issue 3129 by asme...@gmail.com: Drastic change to
sympy.stats: Adding concept of Probability Distributions on surface level
http://code.google.com/p/sympy/issues/detail?id=3129

> And you would get different "results", different objects, while the
> notation makes it look like you're storing a static result "Binomial(1,
> 2)" to both X and Y. I think

I agree with this sentiment. That looks like we should have X == Y == some
kind of object representing a binomial distribution with count 1 and
probability 2 (taking the arguments as written). Requiring the symbol name
would go a long way to removing this ambiguity, so I think that this
particular point is only superficially unrelated to the rest of this
discussion.

sy...@googlecode.com

Mar 12, 2012, 4:32:09 PM
to sympy-...@googlegroups.com

Comment #6 on issue 3129 by nathan.f...@gmail.com: Drastic change to
sympy.stats: Adding concept of Probability Distributions on surface level
http://code.google.com/p/sympy/issues/detail?id=3129

Aaron, which notation would you prefer?

>>> X = Binomial(2, 0.5, 'X')  # what's currently offered
>>> X = RandomSymbol('X', Binomial(2, 0.5))
>>> X = Binomial(2, 0.5).create_symbol('X')

I think a combination of the second and third ones would work. The second
would be the general case, where you could pass in any function that works
as a density function. The third would be the shorthand--in this case, we
change the name of BinomialPSpace to Binomial, as with the other functions.

One thing I was wondering, Matthew, is there any use for the Symbol
variable stored in PSpaces besides representation in the Density (which
we've changed to use Lambda)? If not, we could probably get rid of it and
simplify the hierarchy even more: Instead of
string >> sympy Symbol >> Random symbol, we'd have
string >> Random symbol

If I have time this week, I'll look over the code and try out these changes.

sy...@googlecode.com

Mar 12, 2012, 5:04:32 PM
to sympy-...@googlegroups.com

Comment #7 on issue 3129 by MRock...@gmail.com: Drastic change to
sympy.stats: Adding concept of Probability Distributions on surface level
http://code.google.com/p/sympy/issues/detail?id=3129

The symbol helps in the current system by linking back to the concept
you're dealing with. Consider the following example

In [1]: from sympy.stats import *
In [2]: T = Normal(30, 3, symbol=Symbol('T')) # temperature is 30C with std dev 3C.
In [3]: T_posterior = Given(T, T>29) # We know that T is greater than 29
In [4]: P(T<T_posterior)
ValueError

Whatever system you build needs to understand that T and T_posterior are
linked. There is more structure here between the random symbols than just
how they are distributed. This is an example of the sort of problem that
having internal symbols solves. It is sometimes useful to keep track of the
underlying concept that you're talking about.

You could also consider a bivariate probability space/distribution. You
need to ask for one of the variables within the space. Internal symbols
allow you to clearly specify what you want.

Of course, you could probably figure out a way around all of this that was
cleaner. There are some tricky situations that can come up. Internal
symbols were my solution for them.

sy...@googlecode.com

Mar 12, 2012, 6:39:01 PM
to sympy-...@googlegroups.com

Comment #8 on issue 3129 by nathan.f...@gmail.com: Drastic change to
sympy.stats: Adding concept of Probability Distributions on surface level
http://code.google.com/p/sympy/issues/detail?id=3129

Ah, I see. Is there any way/reason for the *user* to see the interior
variable? I've run into confusion before, since the random symbol and the
interior variable that represents it have the same representation.

sy...@googlecode.com

Mar 12, 2012, 7:13:25 PM
to sympy-...@googlegroups.com

Comment #9 on issue 3129 by MRock...@gmail.com: Drastic change to
sympy.stats: Adding concept of Probability Distributions on surface level
http://code.google.com/p/sympy/issues/detail?id=3129

Both T and T_posterior have a link to their internal symbol.

In [6]: T.symbol is Symbol('T')
Out[6]: True

In [7]: T.symbol is T_posterior.symbol
Out[7]: True

Really a RandomSymbol is just a PSpace/symbol pair.
I.e. "RandomSymbol 'T_posterior' points to the symbol 'T' within the
conditional probability space with density exp((T-30)...."

The user sees the interior variable whenever they print out T_posterior.
They're thinking about the concept "T" or temperature, that's what
T_posterior represents, and that's what gets printed out.

You could think of a RandomSymbol as being a pair of a symbolic variable
and a known value. Let's consider this in a simpler deterministic
(non-random) setting. We link the concept temperature to the known value
30C ("temp", 30C). Even though we know the value we still want to play with
this as a symbolic entity, not just the number 30. As a result we'll use
the symbol "temp" most of the time. When we're done with symbolic
manipulations, we decide we want to compute the value of some expression
like temp**2. Ok, now we're ready to plug in the value 30. This switch
from "temp" to 30 happens in each of the functions E, P, Density, Sample,
etc....
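
The deterministic analogy can be sketched in a few lines of plain Python. `KnownSymbol` and `evaluate` are hypothetical names for this illustration, and a Python function stands in for a symbolic expression; this is not how sympy.stats stores things internally.

```python
class KnownSymbol:
    """Hypothetical pairing of a symbol name with a known value."""

    def __init__(self, name, value):
        self.name = name
        self.value = value

    def __repr__(self):
        return self.name  # prints as the concept, not the number


def evaluate(expr, *symbols):
    """Plug the stored values into expr, a function of the symbols' values."""
    return expr(*(s.value for s in symbols))


def square(t):
    """Stand-in for the symbolic expression temp**2."""
    return t**2


temp = KnownSymbol("temp", 30)   # the pair ("temp", 30)
result = evaluate(square, temp)  # the switch from "temp" to 30 happens here
```

In the analogy, `evaluate` plays the role that E, P, Density and Sample play for random symbols: the symbol is carried around by name until one of those functions needs the value behind it.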

RandomSymbols do exactly this except that the value isn't just a number,
it's more complex. At first it's more complex because we want to link a
symbol to some probabilistic set of outcomes, a distribution. In full
generality these outcomes can depend on lots of things though, not just
this particular concept of the temperature. After intertwining a number of
random variables and conditions things can get a bit hairy. The internal
symbols gave structure to these decisions and, after you've thought about
it a long while, seem to make sense.

This system is probably needlessly complex in many ways. I like the
distribution idea and I really like the idea of someone else taking a fresh
look at all this. There are a few things I suspect you'll run into though.
The above example is one of them.

I'll have a fair amount of time this week and next to be helpful where I
can.


sy...@googlecode.com

Mar 13, 2012, 9:24:26 PM
to sympy-...@googlegroups.com

Comment #10 on issue 3129 by nathan.f...@gmail.com: Drastic change to
sympy.stats: Adding concept of Probability Distributions on surface level
http://code.google.com/p/sympy/issues/detail?id=3129

Okay then... why do PSpaces need a symbol bound to them? Printing out
pspace(X) gives us a domain and a density function (or map for discrete
rvs). This can be accomplished by using sets to represent the domain and
Lambdas for the density. I think there are some cool things we can do with
making PSpaces (soon to be distributions?) static, like capture the fact
that
>>> X = Normal(2, 3).new('X')
>>> Z = (X-2)/3
>>> pspace(Z)
Normal(0, 1)

Basically, capturing relationships of transformations of variables. You're
right that this can cause complications with lots of compound variables,
but it's too useful of a feature to disregard...
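
For the normal case specifically, the relationship is the standard affine rule: if X is normal with mean mu and standard deviation sigma, then a*X + b is normal with mean a*mu + b and standard deviation |a|*sigma. A small sketch, with `NormalDistribution` and `affine` as hypothetical names and exact fractions to avoid rounding:

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class NormalDistribution:
    """Hypothetical static distribution with mean mu and standard deviation sigma."""
    mu: Fraction
    sigma: Fraction

    def affine(self, scale, shift):
        """Distribution of scale*X + shift when X has this distribution."""
        if scale == 0:
            raise ValueError("scale must be nonzero")
        return NormalDistribution(scale * self.mu + shift, abs(scale) * self.sigma)


x_dist = NormalDistribution(Fraction(2), Fraction(3))
# Z = (X - 2)/3 is X scaled by 1/3 and then shifted by -2/3.
z_dist = x_dist.affine(Fraction(1, 3), Fraction(-2, 3))
```

This only works because the normal family is closed under affine maps; it says nothing about how a general transformation of a general distribution would be recognised.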

sy...@googlecode.com

Mar 13, 2012, 10:07:01 PM
to sympy-...@googlegroups.com

Comment #11 on issue 3129 by MRock...@gmail.com: Drastic change to
sympy.stats: Adding concept of Probability Distributions on surface level
http://code.google.com/p/sympy/issues/detail?id=3129

In general PSpaces can have many symbols bound to them. They act as
indices into the distribution. The PSpace is making a value statement about
these symbols/concepts.

Consider not using internal symbols at all. How do we represent probability
densities? The current plan is to use a Lambda. Lambdas use internal
symbols i.e.

dist = Lambda( (x,y) , exp((x**2+y**2)/(2*pi)) / ... )

The x and y here are internal symbols. Well, I suppose we could use dummies
instead but then our lambdas look bad with names like x1, x2, .... This is
a minor detail really though, why the big deal? Well, suppose we want a
random variable that corresponds to the 'y' in the probability density. How
do we specify that we want the 1th variable and not the 0th one? Well, we
could use an index. Something like

Y = RandomSymbol('Y', dist, index=1)

The idea of using an index here seems separated from what we want. In this
sense an internal symbol acts like a more conceptual index.
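
To make the index-versus-symbol point concrete, here is a sketch with a small discrete joint distribution in plain Python. The table, the names 'x' and 'y', and both helper functions are made up for this example.

```python
from fractions import Fraction

# Hypothetical joint distribution of two binary variables named 'x' and 'y'.
names = ("x", "y")
joint = {
    (0, 0): Fraction(1, 2),
    (0, 1): Fraction(1, 4),
    (1, 0): Fraction(1, 8),
    (1, 1): Fraction(1, 8),
}


def marginal_by_index(joint, index):
    """Marginal pmf of the variable at a given position in each outcome tuple."""
    result = {}
    for outcome, prob in joint.items():
        result[outcome[index]] = result.get(outcome[index], 0) + prob
    return result


def marginal_by_name(joint, names, name):
    """Same marginal, but selected by the internal symbol's name."""
    return marginal_by_index(joint, names.index(name))


y_by_index = marginal_by_index(joint, 1)
y_by_name = marginal_by_name(joint, names, "y")
```

Both calls give the same marginal; the named version simply reads closer to the concept being asked for, which is the role the internal symbol plays.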

Regarding your example I have two comments.

(1) I suggest NormalDistribution as a name rather than Normal. I would
leave Normal for syntactic sugar later on. This is a relatively minor
disagreement though.
(2) The result you're getting is easy in the case of the normal
distribution but I think it's very challenging in even trivially more
complex situations. How does this work in the case of a beta distribution?
The current design specifically avoids any sort of special-rule for
well-known distributions. Everything is represented as a SymPy Expr. We
fail to get some nice results but, in this sense at least, the system is
much simpler.

>>> X = BetaDistribution(2, 3).new('X')
>>> Z = (X-2)/3
>>> pspace(Z)
???

I suspect that a solution that attempts to make decisions like this will
necessarily become very complex.

I think that we're trying to push too much into the concept of a
distribution. I suspect that there are two separable tasks here. Managing
random symbol interaction and computing on distributions. I now think that
the concept of a probability space is probably necessary. I think that much
of the complexity of the PSpace object should be factored out into a
Distribution object and that PSpace should become very simple. Hopefully
much of the complexity can be simplified in this factoring process.

Some thoughts:
- There should be a single PSpace class (no subclasses).
- It should contain a Distribution and a set of symbols.
- There should be a Distribution interface that handles things like
compute_density, integrate, P, etc.
- Distribution should be subclassed to Continuous and Finite and should be
something like what is proposed above.

This separates two concepts that should have been separated before. I think
this solution is clean.

sy...@googlecode.com

Mar 13, 2012, 10:20:11 PM
to sympy-...@googlegroups.com

Comment #12 on issue 3129 by asme...@gmail.com: Drastic change to
sympy.stats: Adding concept of Probability Distributions on surface level
http://code.google.com/p/sympy/issues/detail?id=3129

> In [1]: from sympy.stats import *
> In [2]: T = Normal(30, 3, symbol=Symbol('T')) # temperature is 30C with std dev 3C.
> In [3]: T_posterior = Given(T, T>29) # We know that T is greater than 29
> In [4]: P(T<T_posterior)
> ValueError

My probability is a little rusty. Why does this raise ValueError?

> In [6]: T.symbol is Symbol('T')
> Out[6]: True

Comparing Symbols (or indeed anything other than Singletons) with 'is' only
works because of the cache and should not be relied upon.
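
The difference between identity and equality can be shown without SymPy at all. The `Sym` class below is a hypothetical stand-in with no cache, so it shows what happens when nothing guarantees that equal objects are also the same object.

```python
class Sym:
    """Hypothetical stand-in for a symbol: equal when the names are equal."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return isinstance(other, Sym) and self.name == other.name

    def __hash__(self):
        return hash(self.name)


a = Sym("T")
b = Sym("T")

same_value = (a == b)    # equality compares the names
same_object = (a is b)   # identity compares the objects themselves
```

Here `same_value` is True and `same_object` is False. With a cache in front of the constructor the second result could flip, which is why '==' is the safer comparison.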

sy...@googlecode.com

Mar 13, 2012, 10:28:17 PM
to sympy-...@googlegroups.com

Comment #13 on issue 3129 by MRock...@gmail.com: Drastic change to
sympy.stats: Adding concept of Probability Distributions on surface level
http://code.google.com/p/sympy/issues/detail?id=3129

T and T_posterior represent the same variable in different contexts. It
doesn't make sense to compare them. T_posterior is under an assumption that
T isn't.

Regarding 'is' yes, I probably should have used '==' in this example.

sy...@googlecode.com

Mar 18, 2012, 12:58:41 AM
to sympy-...@googlegroups.com

Comment #14 on issue 3129 by nathan.f...@gmail.com: Drastic change to
sympy.stats: Adding concept of Probability Distributions on surface level
http://code.google.com/p/sympy/issues/detail?id=3129

I'm afraid I won't have any time in the near future to work on such grand
changes as this. Can someone else take it up? When will 0.7.2 be released?

sy...@googlecode.com

Mar 18, 2012, 1:30:47 AM
to sympy-...@googlegroups.com

Comment #15 on issue 3129 by MRock...@gmail.com: Drastic change to
sympy.stats: Adding concept of Probability Distributions on surface level
http://code.google.com/p/sympy/issues/detail?id=3129

I can't commit to work on any large chunk in the near future. In comment
two I give a "What I would do if this were entirely up to me" plan. I think
we can release an interface without solidifying the internals. I think we
can make minor modifications now that allow us to release 0.7.2 soon and
change sympy.stats later.

sy...@googlecode.com

Mar 18, 2012, 1:37:49 AM
to sympy-...@googlegroups.com

Comment #16 on issue 3129 by nathan.f...@gmail.com: Drastic change to
sympy.stats: Adding concept of Probability Distributions on surface level
http://code.google.com/p/sympy/issues/detail?id=3129

I see. So what do we think should be changed in the interface? I think
getting rid of automatically generated symbols is what Aaron and I agreed
on. How about Binomial, Gamma and friends become aliases of BinomialPSpace,
etc for now? Do you think, instead of passing in a sympy symbol for
internal use, we can pass in a string that gets converted into a symbol
inside?

Instead of
Binomial(2, 0.5, symbol=Symbol('X')), having
Binomial(2, 0.5, symbol='X')

And what do we think is the best notation for creating a new rv?

sy...@googlecode.com

Mar 18, 2012, 11:33:29 AM
to sympy-...@googlegroups.com

Comment #17 on issue 3129 by MRock...@gmail.com: Drastic change to
sympy.stats: Adding concept of Probability Distributions on surface level
http://code.google.com/p/sympy/issues/detail?id=3129

I'm saying that PSpaces should be entirely in the background and not really
accessible by the user. We'll change it later.

Notation for creating random variables will still be Normal, Binomial, and
friends. I don't think that Binomial should alias BinomialPSpace because
users should not really be seeing BinomialPSpace anyway. They're going to
be creating random symbols far more often so that should be the clean part.
We'll add in NormalDistribution or whatever later on.

In comment 2 I suggest that the symbol is just a required input of the
Normal, Binomial, etc... functions.

X = Normal(0, 1, 'X') # this is a random variable.

sy...@googlecode.com

Mar 18, 2012, 3:10:15 PM
to sympy-...@googlegroups.com

Comment #18 on issue 3129 by nathan.f...@gmail.com: Drastic change to
sympy.stats: Adding concept of Probability Distributions on surface level
http://code.google.com/p/sympy/issues/detail?id=3129

I still like the notation
>>> X = Normal(2, 3).new_var('X')

Which combines the adherence to math convention with unambiguous
generation. Right now, Normal and friends can be containers that hold the
parameters. new_var just creates a new PSpace of the correct type which
binds the symbol 'X' to it. We change them to be distributions or whatever
later, but I prefer this notation to Normal(0, 1, 'X').

sy...@googlecode.com

Mar 18, 2012, 3:14:16 PM
to sympy-...@googlegroups.com

Comment #19 on issue 3129 by MRock...@gmail.com: Drastic change to
sympy.stats: Adding concept of Probability Distributions on surface level
http://code.google.com/p/sympy/issues/detail?id=3129

I think we're going to have to agree to disagree on this one :)

sy...@googlecode.com

Mar 18, 2012, 7:02:21 PM
to sympy-...@googlegroups.com

Comment #20 on issue 3129 by nathan.f...@gmail.com: Drastic change to
sympy.stats: Adding concept of Probability Distributions on surface level
http://code.google.com/p/sympy/issues/detail?id=3129

Haha... defer to a third party? Aaron's shown some support for my way in
comment 1, but I'm fine with doing it your way (the way it's in now). We
can always add things later.

sy...@googlecode.com

Mar 26, 2012, 9:54:30 PM
to sympy-...@googlegroups.com

Comment #22 on issue 3129 by nathan.f...@gmail.com: Drastic change to
sympy.stats: Adding concept of Probability Distributions on surface level
http://code.google.com/p/sympy/issues/detail?id=3129

Issue: the definition of Binomial is:

def Binomial(n, p, succ=1, fail=0, symbol=None):

For variables like Binomial, Bernoulli, Die and Coin, we can't have the
simple call


X = Binomial(2, 0.5, 'X')

with our current notation. If we want the symbol to be mandatory, it would
be ideal that the user doesn't have to type in the symbol keyword every time.

Some solutions:
- Move the symbol keyword up to the front, so it's
def Binomial(symbol, n, p, succ=1, fail=0):

- Move the symbol keyword after required arguments, but before the optional
ones
def Binomial(n, p, symbol, succ=1, fail=0):

- Eliminate optional arguments completely
def Binomial(n, p, symbol):

If we want to stick with this, then the first option seems best, IMO.
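
A sketch of how the first option behaves at the call site. The `Binomial` below is a plain-Python stand-in that only returns a dict; it is NOT sympy.stats.Binomial, and the string check is an assumption added for the example.

```python
def Binomial(symbol, n, p, succ=1, fail=0):
    """Stand-in with the proposed argument order; returns a plain dict."""
    if not isinstance(symbol, str):
        raise TypeError("symbol must be given as a string name")
    return {"symbol": symbol, "n": n, "p": p, "succ": succ, "fail": fail}


# The symbol is mandatory and positional, so no keyword is needed.
X = Binomial('X', 2, 0.5)

# The optional arguments keep working after the required ones.
Y = Binomial('Y', 2, 0.5, succ='H', fail='T')
```

With the symbol first, the required arguments stay positional and the optional ones keep their defaults, so none of the existing optional parameters has to be dropped.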

There's also the issue that continuous RVs can't accept string arguments as
symbols:
X = Normal(0, 1, symbol='X') returns an error. I'll change this once the
two recent pull requests are merged.

Also, what does an issue status of 'valid' mean?

sy...@googlecode.com

Mar 26, 2012, 10:43:14 PM
to sympy-...@googlegroups.com

Comment #23 on issue 3129 by MRock...@gmail.com: Drastic change to
sympy.stats: Adding concept of Probability Distributions on surface level
http://code.google.com/p/sympy/issues/detail?id=3129

I agree that the first seems best.

> - Move the symbol keyword up to the front, so it's
> def Binomial(symbol, n, p, succ=1, fail=0)

We should just sympify the symbol input. If we wanted to be clever we could
also make it real, positive, etc... as appropriate.

Valid just means that this issue is a valid problem that should be dealt
with. Most unresolved issues have the status Valid. This used to be called
Accepted.

sy...@googlecode.com

Mar 31, 2012, 6:15:24 PM
to sympy-...@googlegroups.com

Comment #24 on issue 3129 by nathan.f...@gmail.com: Drastic change to
sympy.stats: Adding concept of Probability Distributions on surface level
http://code.google.com/p/sympy/issues/detail?id=3129

I'll start on these issues. What do you think about renaming Binomial and
friends to BinomialRV, or something of that sort? Make it more clear that
we are creating a random variable, and reduce confusion between other math
stuff with similar names, e.g. Gamma/gamma, Binomial/binomial. Two letters
doesn't seem like much of a burden to type.
>>> # "We are creating a random variable 'X' from a binomial distribution
>>> # with probability 0.5 and 10 iterations"
>>> X = BinomialRV('X', 10, S.Half)

sy...@googlecode.com

Apr 1, 2012, 6:40:26 PM
to sympy-...@googlegroups.com

Comment #25 on issue 3129 by MRock...@gmail.com: Drastic change to
sympy.stats: Adding concept of Probability Distributions on surface level
http://code.google.com/p/sympy/issues/detail?id=3129

--- I'll start on these issues.
Awesome.

Regarding Names.
My preference is to keep the sugar so that "X = Normal('X', 0, 1)" creates
a normal random variable. My preference however certainly isn't law. I'm
not sure how we resolve disagreements without consensus in the community.

The work to move symbol up to the front and make it a mandatory argument is
independent of this decision. Adding or not adding the RV can be a final
commit and decided and changed up to the last minute.

sy...@googlecode.com

Apr 1, 2012, 10:59:57 PM
to sympy-...@googlegroups.com

Comment #26 on issue 3129 by nathan.f...@gmail.com: Drastic change to
sympy.stats: Adding concept of Probability Distributions on surface level
http://code.google.com/p/sympy/issues/detail?id=3129

https://github.com/sympy/sympy/pull/1193

sy...@googlecode.com

Aug 8, 2012, 10:34:51 AM
to sympy-...@googlegroups.com
Updates:
Status: NeedsDecision

Comment #27 on issue 3129 by MRock...@gmail.com: Drastic change to
sympy.stats: Adding concept of Probability Distributions on surface level
http://code.google.com/p/sympy/issues/detail?id=3129

There are a couple choices in this thread that are still not decided

>>> from sympy.stats import Normal
>>> Normal("X", 0, 1) # This is standard
>>> # vs
>>> NormalRV("X", 0, 1) # This has been proposed

My preference is to stick with the first one.

Do we want some sort of Distribution object that is separate from the PSpace
and RandomSymbol objects? It's not clear to me how feasible this idea is if
we want to continue to support complex conditional expressions.

The first question Normal vs NormalRV I think is appropriate for
Milestone-Release0.7.2. The second question I think can/should be postponed
until a suitable solution is proposed.

Marking this NeedsDecision for the first point.

sy...@googlecode.com

Aug 23, 2012, 5:34:23 PM
to sympy-...@googlegroups.com
Updates:
Labels: -Milestone-Release0.7.2 Milestone-Release0.7.3

Comment #28 on issue 3129 by asme...@gmail.com: Drastic change to
sympy.stats: Adding concept of Probability Distributions on surface level
http://code.google.com/p/sympy/issues/detail?id=3129

Well, we are going to have to release with what we've got, unless someone
wants to submit a patch now. We can either just change it in the future,
or mark the current way as deprecated if we decide to do it a different way.

sy...@googlecode.com

Jan 20, 2013, 1:16:31 PM
to sympy-...@googlegroups.com

Comment #29 on issue 3129 by MRock...@gmail.com: Drastic change to
sympy.stats: Adding concept of Probability Distributions on surface level
http://code.google.com/p/sympy/issues/detail?id=3129

The following PR may be relevant
https://github.com/sympy/sympy/pull/1720

sy...@googlecode.com

Jun 28, 2013, 12:09:25 PM
to sympy-...@googlegroups.com
Updates:
Status: Fixed
Labels: -Milestone-Release0.7.3

Comment #30 on issue 3129 by MRock...@gmail.com: Drastic change to
sympy.stats: Adding concept of Probability Distributions on surface level
http://code.google.com/p/sympy/issues/detail?id=3129

This is mostly fixed.

--
You received this message because this project is configured to send all
issue notifications to this address.
You may adjust your notification preferences at:
https://code.google.com/hosting/settings