Seeking Assistance with SymPy Simplification for Statistical Equations


Matthew Robinson

Apr 23, 2024, 3:25:11 PM
to sympy

Dear SymPy Developers Group,

I hope this email finds you well. I am currently exploring the use of SymPy, a powerful symbolic mathematics library, to simplify equations related to mathematical statistics. Specifically, I am interested in developing a function that can handle statistical equations by recognizing and substituting large sample size (asymptotic) approximations.

Here are the key aspects I’d like to address:

  1. Approximations:
    • Stirling’s Approximation: I aim to incorporate Stirling’s approximation for factorials, which becomes increasingly accurate as the argument grows.
    • Binomial and Poisson Distributions: I want to approximate these discrete distributions with the Normal Distribution when dealing with large sample sizes.
  2. Algebra of Random Variables:
    • Additionally, I would like to explore algebraic operations involving random variables, particularly focusing on the ratio distribution.
    • Ratio distribution - Wikipedia

In summary, my goal is to input formulas involving distributions (such as the binomial and Poisson distributions) into SymPy and have the library simplify them using well-known large-sample approximations, including the normal approximation and Stirling’s approximation. For example, given the ratio of a Poisson distribution to a binomial distribution, it should output a simplified expression: the large-sample approximation of the Poisson distribution divided by the large-sample approximation of the binomial distribution.
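To make the normal approximation concrete, here is a minimal sketch that compares the Poisson probability mass function with the density of a normal distribution that has the same mean and variance. The symbol names (lam, k) and the test point are illustrative choices, and this is a manual comparison rather than an automatic simplification feature of SymPy.

```python
from sympy import Symbol, sqrt
from sympy.stats import Poisson, Normal, density

lam = Symbol("lam", positive=True)            # Poisson rate (illustrative name)
k = Symbol("k", positive=True, integer=True)  # point at which to evaluate

X = Poisson("X", lam)
Z = Normal("Z", lam, sqrt(lam))  # same mean (lam) and variance (lam) as X

pmf = density(X)(k)         # exact Poisson pmf, contains factorial(k)
normal_pdf = density(Z)(k)  # normal density used as the large-rate approximation

# Compare the two numerically at a large rate, evaluated at the mean.
values = {lam: 100, k: 100}
exact = pmf.subs(values).evalf()
approx = normal_pdf.subs(values).evalf()
rel_error = abs(approx / exact - 1)
print(exact, approx, rel_error)
```

The agreement improves as the rate grows; for small rates the two expressions differ noticeably.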

If there are existing methods or tools for achieving this, I would greatly appreciate any guidance or pointers. Alternatively, if no such methods currently exist, I am enthusiastic about contributing to the development of this functionality.

Thank you for your time, and I look forward to any insights or suggestions you may have.

Best regards,

Matthew Robinson

Aaron Meurer

Apr 23, 2024, 4:12:55 PM
to sy...@googlegroups.com
Hi.

I'm not sure if the things you mentioned are implemented or not, but
if they are, they would be in the sympy.stats module. If they aren't
there yet, it sounds like they would be appropriate for that
submodule. sympy.stats implements the algebra of random variables you
are talking about. Taking ratios of random variables is supported,
although there may be different things that aren't yet implemented.
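As a small illustration of a ratio of random variables in sympy.stats, the sketch below uses two finite binomial variables so that every quantity can be computed exactly. The variable names and parameters are made up for the example, and the denominator is shifted by 1 only to avoid division by zero; ratios of Poisson or other infinite-support variables may not evaluate in closed form.

```python
from sympy import Rational
from sympy.stats import Binomial, E, density

# Two independent binomial variables with n = 2 and p = 1/2 (illustrative values).
B1 = Binomial("B1", 2, Rational(1, 2))
B2 = Binomial("B2", 2, Rational(1, 2))

# A ratio of random variables is itself a random expression.
ratio = B1 / (B2 + 1)

mean_ratio = E(ratio)           # exact expectation of the ratio
ratio_density = density(ratio)  # mapping from each outcome to its probability

print(mean_ratio)
print(ratio_density)
```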

Also note that some of these things are more general mathematical
concepts applied to statistics (like asymptotic expansions), which may
already be implemented in other parts of SymPy. For example, there is
support for asymptotic expansions (aseries()), although I don't know
if Stirling's approximation is implemented.
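Whether or not a built-in exists, Stirling's approximation can be applied by hand. The sketch below assumes a user-defined helper, stirling (a hypothetical name, not a SymPy function), and uses Basic.replace to swap every factorial in the Poisson probability mass function for sqrt(2*pi*n)*(n/e)**n.

```python
from sympy import Symbol, sqrt, pi, E, factorial
from sympy.stats import Poisson, density

k = Symbol("k", positive=True, integer=True)
lam = Symbol("lam", positive=True)

def stirling(n):
    """Hypothetical helper: leading term of Stirling's approximation to n!."""
    return sqrt(2 * pi * n) * (n / E) ** n

X = Poisson("X", lam)
pmf = density(X)(k)  # lam**k * exp(-lam) / factorial(k)

# Replace every factorial(...) node in the expression with stirling(...).
pmf_approx = pmf.replace(factorial, stirling)

# Numerical check at a moderately large argument.
values = {lam: 50, k: 50}
exact = pmf.subs(values).evalf()
approx = pmf_approx.subs(values).evalf()
rel_error = abs(approx / exact - 1)
print(pmf_approx)
print(exact, approx, rel_error)
```

The same replace call works on any expression containing factorials, which is one possible starting point for a more general "large sample" rewriting function.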

Aaron Meurer

Matthew Robinson

Jul 29, 2024, 11:30:41 AM
to sympy

Hi everyone,

I’m encountering an issue with the last line of my code and could use some help troubleshooting it. I suspect that the SymPy Stats module might be adding unknown attributes to a variable, which is causing the problem.

Any insights or suggestions would be greatly appreciated!

Thank you, 

Matthew Robinson


##### Start of Code #####

#This code derives the distribution of a Bray-Curtis Dissimilarity
from sympy import * #Import the SymPy module
from sympy.stats import *

rate1 = Symbol("rate1", positive=True) #Define parameter lambda_1 as positive
rate2 = Symbol("rate2", positive=True) #Define parameter lambda_2 as positive
Y1 = Poisson("y1", rate1) #Poisson random variable with rate lambda_1
Y2 = Poisson("y2", rate2) #Poisson random variable with rate lambda_2
min_density_Y1 = 2*density(Y1)(rate1) - 2*density(Y1)(rate1)*cdf(Y1)(rate1)
min_density_Y2 = 2*density(Y2)(rate2) - 2*density(Y2)(rate2)*cdf(Y2)(rate2)
Bray_Curtis_Density = 1 - 2 * (min_density_Y1 + min_density_Y2) / (Y1 + Y2)
Bray_Curtis_Density #Print Results

#Export to SciPy for numerical evaluation
export_fcn = lambdify([rate1, rate2], Bray_Curtis_Density)
result = export_fcn(5, 10) #The distribution for the given rate parameters
print(result)

## Why is the line below failing?? Why isn’t the variable ‘y1’ in the locals() ?? Please help
result(y1 = 6, y2 = 9)


##### End of Code #####

Aaron Meurer

Jul 29, 2024, 2:30:06 PM
to sy...@googlegroups.com
Your expression has four symbols: rate1, rate2, y1, and y2. When you
lambdify it, you need to account for all of them, so either also pass
y1 and y2 as parameters or substitute numbers for them before
lambdifying.
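A minimal sketch of the first option follows. It uses a simplified stand-in expression rather than the full Bray-Curtis formula, and it assumes that replacing each random symbol with its underlying plain Symbol (via the .symbol attribute) before calling lambdify is acceptable for the intended use; modules="math" is chosen only to keep the example free of other dependencies.

```python
from sympy import Symbol, lambdify
from sympy.stats import Poisson
from sympy.stats.rv import random_symbols

rate1 = Symbol("rate1", positive=True)
rate2 = Symbol("rate2", positive=True)
Y1 = Poisson("y1", rate1)
Y2 = Poisson("y2", rate2)

# Simplified stand-in for the expression in the question (not the real formula).
expr = 1 - 2 * (rate1 + rate2) / (Y1 + Y2)

# Swap each random symbol for its plain Symbol so lambdify sees ordinary symbols.
plain = expr.xreplace({rv: rv.symbol for rv in random_symbols(expr)})
y1, y2 = Y1.symbol, Y2.symbol

# Pass all four symbols as parameters of the generated function.
f = lambdify([rate1, rate2, y1, y2], plain, modules="math")

value = f(5, 10, 6, 9)  # rate1=5, rate2=10, y1=6, y2=9
print(value)
```

Note that in the original code, result is the number returned by export_fcn, not a function, so the values for y1 and y2 have to be supplied in the call to the lambdified function itself.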

Aaron Meurer