Data analysis tools


Sachin Joglekar

Oct 5, 2012, 1:27:29 PM
to sy...@googlegroups.com
It was recently pointed out to me that although SymPy has many of the theoretical classes used in data analysis, no module of SymPy offers straightforward analysis tools like those in the statistical language R. SymPy already has classes for all the important distributions, so it would just be a question of providing an 'interface' that makes analysis easier, e.g. for dealing with .csv files. Would this be a useful addition to the code base?
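For context, here is a minimal sketch of what the existing distribution classes in sympy.stats can do. The variable names are only illustrative, and it assumes a reasonably recent SymPy where `Normal`, `density`, `E`, `variance` and `P` are importable from `sympy.stats`.

```python
from sympy import Symbol, simplify
from sympy.stats import Normal, density, E, variance, P

# Symbolic parameters of the distribution (mu real, sigma positive).
mu = Symbol("mu", real=True)
sigma = Symbol("sigma", positive=True)

# A normally distributed random variable with symbolic parameters.
X = Normal("X", mu, sigma)

x = Symbol("x", real=True)
pdf = density(X)(x)          # symbolic probability density evaluated at x
mean = simplify(E(X))        # expected value as an expression in mu
var = simplify(variance(X))  # variance as an expression in sigma

# A probability query on a concrete (standard normal) distribution.
Z = Normal("Z", 0, 1)
p_positive = P(Z > 0)

print(pdf)
print(mean, var, p_positive)
```

Everything here is symbolic; nothing in the sketch reads or summarizes observed data, which is the gap being asked about.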

Matthew Rocklin

Oct 5, 2012, 1:40:40 PM
to sy...@googlegroups.com
I think that data analysis should be done in some of the other popular Python packages (numpy, pandas, statsmodels). I think that SymPy should see how it can serve and connect with these other packages.

You mention distributions, so I'll speak about sympy.stats for a moment. A while ago I talked with the statsmodels folks. They could use SymPy in a couple of ways. They built their own rudimentary symbolic stats system, which we could replace with sympy.stats. They also badly need symbolic derivatives, which we could supply with sympy.core. Skipper, one of the developers of statsmodels, started working on this connection a while ago but stopped. I probably could have made this work if I had put in more work on the SymPy side.
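To make the symbolic-derivatives point concrete, here is a rough sketch of deriving the score (the gradient of a log-likelihood) symbolically and handing it to numpy. This is not how statsmodels does it; the hand-written normal log-density, the made-up data and the names `score_fn` and `total_score` are assumptions for illustration only.

```python
import numpy as np
from sympy import symbols, log, pi, diff, lambdify

# Symbols: one observation x, mean mu, standard deviation sigma.
x, mu = symbols("x mu", real=True)
sigma = symbols("sigma", positive=True)

# Log-density of a single normal observation, written out by hand.
logpdf = -log(sigma) - log(2 * pi) / 2 - (x - mu) ** 2 / (2 * sigma ** 2)

# Exact symbolic derivatives with respect to the parameters (the score).
score_mu = diff(logpdf, mu)
score_sigma = diff(logpdf, sigma)

# Compile both derivatives into one NumPy-backed function that accepts arrays.
score_fn = lambdify((x, mu, sigma), [score_mu, score_sigma], modules="numpy")

data = np.array([1.2, 0.7, 1.9, 1.1])  # made-up observations
d_mu, d_sigma = score_fn(data, 1.0, 0.5)

# Sum the per-observation contributions to get the score of the whole sample.
total_score = np.array([d_mu.sum(), d_sigma.sum()])
print(total_score)
```

The derivatives are computed once, symbolically, and the numerical package only ever sees an ordinary array function.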

I don't think we should build analysis tools. I do think that we should do more outreach. 


--
You received this message because you are subscribed to the Google Groups "sympy" group.
To view this discussion on the web visit https://groups.google.com/d/msg/sympy/-/ThBVoQL-0W4J.
To post to this group, send email to sy...@googlegroups.com.
To unsubscribe from this group, send email to sympy+un...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/sympy?hl=en.

Ondřej Čertík

Oct 5, 2012, 5:24:56 PM
to sy...@googlegroups.com
I think that having an interface to other tools would always be welcomed.

Ondrej

G B

Oct 5, 2012, 5:53:29 PM
to sy...@googlegroups.com
Matthew Rocklin wrote:
> I don't think we should build analysis tools. I do think that we should do more outreach.

Speaking as a noob with a lot of this, I wonder if one way to start would be to move the documentation closer together, perhaps even interlink it a bit. I generally agree with the division of labor among these different packages, but I understand where the instinct to make SymPy expand its role comes from: it's easy to feel like each package is its own island. When I'm trying to find a way to do something, I need to make a conscious effort to jump to a different documentation tree on a different site.

Having a shared hub for what I'd call scientific computing packages might help me, at least, take in more of what's available and give a better perspective on what functionality belongs in which package.
 


Matthew Rocklin

Oct 5, 2012, 6:01:13 PM
to sy...@googlegroups.com
I like this. I recommend that you send this idea to NumFOCUS (http://numfocus.org/). This seems like the sort of thing that might interest them.

It would be cool to see examples and tutorials that mix and match as well.

There is the idea of a set of "atomic and composable tools". Each tool is atomic, so it does one thing and can't be split apart. Each tool can interact nicely with the others. The desire for atomicity is what prompted my initial response. I think we've done a good job at being composable in theory, but we haven't publicized it.

The Unix toolset is the common example of atomic and composable. It would be sad if there were only separate tutorials on grep, find, etc. and no tutorials that showed how they could be used together.

Sachin Joglekar

Oct 6, 2012, 1:41:07 PM
to sy...@googlegroups.com
Another enhancement that I think could be made: could hypothesis testing be added to the stats module?

Kjetil brinchmann Halvorsen

Oct 6, 2012, 5:46:19 PM
to sy...@googlegroups.com
Why should this be in sympy, which is for symbolic mathematics?
Within the Python universe, hypothesis testing and friends seem to
have a natural home in scipy. If one needs symbolics and statistics at
the same time, I guess both scipy and sympy can be imported as modules
within the same Python program?

see: http://www.astro.cornell.edu/staff/loredo/statpy/

kjetil

--
"If you want a picture of the future - imagine a boot stamping on the
human face - forever."

George Orwell (1984)

Matthew Rocklin

Oct 6, 2012, 6:11:04 PM
to sy...@googlegroups.com
My understanding is that hypothesis testing requires both a model (symbolics) and data (numerics).

sympy.stats can certainly be used to provide the model, likelihood functions, their derivatives, etc. However, sympy.stats and SymPy in general aren't very good at handling data. As Kjetil points out, though, there are some other excellent packages in the Python ecosystem that handle this nicely. I recommend looking at statsmodels.

It would be really nice to see sympy.stats and these other modules interact more naturally. Getting this to work better is a high priority. Even without strong links between these projects, it should be possible to write up a simple example using SymPy and numpy to perform hypothesis tests. Unfortunately this isn't really my specialty (I actually do very little statistics). If anyone is interested in working on this, I'd be happy to support it from the sympy.stats side.
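As one possible starting point, here is a minimal sketch of a two-sided one-sample z-test that splits the work this way: numpy reduces the data to a statistic, and sympy.stats supplies the null distribution. The sample, the null mean `mu0`, the assumption that `sigma` is known, and the 0.05 threshold are all made up for illustration; a real analysis would more likely go through statsmodels or scipy.

```python
import numpy as np
from sympy import Symbol
from sympy.stats import Normal, cdf

# Made-up sample; sigma is assumed known to keep the example small.
data = np.array([5.1, 4.9, 5.6, 5.8, 5.3, 5.4, 5.0, 5.7])
mu0 = 5.0      # mean under the null hypothesis
sigma = 0.4    # assumed known standard deviation

# Numerics (numpy): reduce the data to a test statistic.
n = data.size
z_obs = float((data.mean() - mu0) / (sigma / np.sqrt(n)))

# Symbolics (sympy.stats): null distribution of the statistic and a
# symbolic two-sided p-value as a function of |z|.
Z = Normal("Z", 0, 1)
z = Symbol("z", positive=True)
p_two_sided = 2 * (1 - cdf(Z)(z))

# Plug the observed statistic into the symbolic expression.
p_value = float(p_two_sided.subs(z, abs(z_obs)))
reject = p_value < 0.05
print(z_obs, p_value, reject)
```

The symbolic p-value expression can be inspected or differentiated like any other SymPy expression before any data is plugged in.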