Introduction to Statistics with Python


Thomas Haslwanter

Mar 6, 2013, 1:37:12 PM
to pystat...@googlegroups.com
Hi,

as you may have guessed from some of my previous entries, I am not a statistics expert. But I am a huge Python fan, and I currently have to give the introductory lecture on statistics at our institution. Since the wikibook on "Statistics" is not yet in a usable shape, and does not include any sample code, I wrote my own script, attached to this message. The plan was to give my students an introduction to statistics, as well as a Python toolkit which they can use later on when they have to analyze data in the field.

The reason I am posting this note is that I would be willing to make this script open source (as I believe in open access to information), and am wondering what other people more experienced in stats think about it. Does this make sense? And if so, what would be the best format?

And one step beyond: to be frank, I found getting started with statsmodels pretty difficult. If anyone more experienced than me is keen, maybe it would be possible to extend the script to also facilitate a start with statsmodels.

Anyway, I would be grateful for feedback, critical as well as supportive.

thomas

PS: the script is currently written in LaTeX.
Statistics_13.pdf

Dartdog

Mar 6, 2013, 2:20:11 PM
to pystat...@googlegroups.com
I would be very interested in the source. I've been looking at statsmodels for a while and have found it slow going (I don't get to spend much unbroken time on it, though), so I'd love anything that helps. As an aside, I'm on the hook to give a talk at the local university next week on open-source stats (I can probably wing it, since it will be an overview), and I intend to emphasize IPython, pandas, and statsmodels. Clearly, intro material is very helpful to people like me who are somewhat up the curve on Python and only slightly so on stats. I get lost easily on each, though, so I learn best from code examples coupled with real-life use cases. The Greek in most stats material throws me for a loop; it is seldom accompanied by a simple real-life case, expressed in English, of what useful information or conclusion one might derive from applying the method!

Dartdog

Mar 6, 2013, 2:25:40 PM
to pystat...@googlegroups.com
I'd also add that good treatments of using Bayesian methods on conventional problems are very hard to find..

Dartdog

Mar 6, 2013, 2:28:33 PM
to pystat...@googlegroups.com
As I look through the book I like it a lot!!

Matthew Brett

Mar 6, 2013, 2:30:58 PM
to pystat...@googlegroups.com
Hi,
It looks nicely done.

Is it feasible to write the whole document as a notebook? Or was
there too much formatting in there to do that?

Cheers,

Matthew

Thomas Haslwanter

Mar 8, 2013, 10:26:39 AM
to pystat...@googlegroups.com
I would be happy to provide the LaTeX, Python, and images to you for your course, under an Attribution-NonCommercial-ShareAlike 3.0 licence. I still don't know what would be the best way to share it. ScribTeX or git come to mind, but I don't have much experience with either.
In the meantime, we could do it directly. I have left a message about that on your Google+ "circles" page.
thomas

Wes McKinney

Mar 8, 2013, 3:13:47 PM
to pystat...@googlegroups.com
Have you thought about doing it in Sphinx so it can be published as PDF or HTML?

Thomas Haslwanter

Mar 8, 2013, 5:12:41 PM
to pystat...@googlegroups.com
Hi Matthew,
converting it all to IPython notebook format would be difficult. But you can access the individual Python files as notebooks:

http://nbviewer.ipython.org/url/work.thaslwanter.at/CSS/Code/anova.ipynb
http://nbviewer.ipython.org/url/work.thaslwanter.at/CSS/Code/anovat.ipynb
http://nbviewer.ipython.org/url/work.thaslwanter.at/CSS/Code/bootstrap.ipynb
http://nbviewer.ipython.org/url/work.thaslwanter.at/CSS/Code/compGroups.ipynb
http://nbviewer.ipython.org/url/work.thaslwanter.at/CSS/Code/distribution_normal.ipynb
http://nbviewer.ipython.org/url/work.thaslwanter.at/CSS/Code/dist_continuous.ipynb
http://nbviewer.ipython.org/url/work.thaslwanter.at/CSS/Code/dist_discrete.ipynb
http://nbviewer.ipython.org/url/work.thaslwanter.at/CSS/Code/gettingStarted.ipynb
http://nbviewer.ipython.org/url/work.thaslwanter.at/CSS/Code/getting_started.ipynb
http://nbviewer.ipython.org/url/work.thaslwanter.at/CSS/Code/modeling.ipynb
http://nbviewer.ipython.org/url/work.thaslwanter.at/CSS/Code/multivariate.ipynb
http://nbviewer.ipython.org/url/work.thaslwanter.at/CSS/Code/mult_regress.ipynb
http://nbviewer.ipython.org/url/work.thaslwanter.at/CSS/Code/notebook_normalDistribution.ipynb
http://nbviewer.ipython.org/url/work.thaslwanter.at/CSS/Code/pandas_intro.ipynb
http://nbviewer.ipython.org/url/work.thaslwanter.at/CSS/Code/residuals.ipynb
http://nbviewer.ipython.org/url/work.thaslwanter.at/CSS/Code/showStats.ipynb
http://nbviewer.ipython.org/url/work.thaslwanter.at/CSS/Code/survival.ipynb
http://nbviewer.ipython.org/url/work.thaslwanter.at/CSS/Code/ttest_1samp_notebook.ipynb
http://nbviewer.ipython.org/url/work.thaslwanter.at/CSS/Code/univariate.ipynb

Thomas Haslwanter

Mar 8, 2013, 5:16:44 PM
to pystat...@googlegroups.com
Hi Wes,
I don't have that much experience with Sphinx, and am wondering if converting the LaTeX code might be a LOT of work?

BTW, you have done a GREAT job with Pandas. (In fact, this is pretty much what clinched my decision to do the course in Python!). Also like your book!

Wes McKinney

Mar 9, 2013, 1:41:46 PM
to pystat...@googlegroups.com
Since you've already been doing LaTeX, might be a lot of work. I don't know the status of Pweave (the Python analogue to Sweave/knitr)-- might be interesting if you wanted code output/plots to be automatically updated in the output document. Having just written a book with a lot of code examples, having the code results always "in sync" was the only way I kept sane (I built a custom set of code processing scripts on top of Docbook XML). 

- Wes

Thomas Haslwanter

Mar 10, 2013, 10:57:57 AM
to pystat...@googlegroups.com
I have now generated a Sphinx version of the LaTeX script, and where Python programs are used, included links to the corresponding IPython notebooks. You can read it at

http://work.thaslwanter.at/Stats/html/StatsFH.html

To Wes:
pandoc does a pretty good job converting LaTeX to rst. It was a bit of work, though, to get the Sphinx-generated HTML to be sectioned properly. Did you write your book in Sphinx?
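
For anyone wanting to repeat the conversion, a minimal sketch of driving pandoc from Python follows. It assumes pandoc is installed and on the PATH. The file name `Statistics.tex` and the helper names are hypothetical, not taken from the repository, and the Sphinx sectioning clean-up mentioned above would still have to be done by hand.

```python
import shutil
import subprocess
from pathlib import Path


def pandoc_command(tex_file, rst_file):
    """Build the pandoc call that converts a LaTeX file to reStructuredText."""
    return ["pandoc", "-f", "latex", "-t", "rst",
            "-o", str(rst_file), str(tex_file)]


def convert(tex_file):
    """Convert one .tex file to an .rst file next to it; return the new path."""
    tex_file = Path(tex_file)
    rst_file = tex_file.with_suffix(".rst")
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc was not found on the PATH")
    # check=True raises CalledProcessError if pandoc reports a failure
    subprocess.run(pandoc_command(tex_file, rst_file), check=True)
    return rst_file


# Hypothetical usage; replace the name with your own LaTeX source file:
# convert("Statistics.tex")
```

Keeping the command in its own function makes it easy to inspect or adjust the options before anything is actually run.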

Thomas Haslwanter

Mar 10, 2013, 1:11:16 PM
to pystat...@googlegroups.com
If anyone is keen to contribute: I have now set up a git repository, containing all the required TEX, RST, PY, and TXT files for my statistics introduction. You can find it under

g...@github.com:thomas-haslwanter/statsintro.git

I would be THRILLED if someone else would be keen on extending this introduction. Especially an intro with respect to statsmodels would be highly desirable.

Dartdog

Mar 10, 2013, 10:17:48 PM
to pystat...@googlegroups.com
It looks awesome in Sphinx, nice job! It will take me some time to read through, though...

Thomas Haslwanter

Mar 11, 2013, 3:01:00 PM
to pystat...@googlegroups.com
Thank you for the compliment!
Feel free to use it for your course. And if you like it, perhaps you can get around to adding a section on Bayesian methods ... ;)

Best regards
thomas

Dartdog

Mar 12, 2013, 4:35:20 PM
to pystat...@googlegroups.com
I just tried to run the 1st notebook locally and it did not have the getdata module? (and I guess the data either?) Ideas?

Dartdog

Mar 12, 2013, 4:49:12 PM
to pystat...@googlegroups.com
Ah found the data and getdata in the github repo, suggestions on install?

Thomas Haslwanter

Mar 15, 2013, 3:39:54 PM
to pystat...@googlegroups.com
Sorry, I have not yet made an installer (since I was not sure if anyone was interested).
So for the time being, please just download the repo and execute the notebook in its home directory.
I'll look into making an installer.
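
If changing into the directory is inconvenient, one possible alternative is to put the downloaded folder on Python's module search path before importing `getdata`. This is only a sketch: the folder location below is hypothetical, and `add_to_path` is an illustrative helper rather than part of the repo. It only helps with the import. If the module looks for its data files relative to the current directory, you would still need to run from there.

```python
import sys
from pathlib import Path


def add_to_path(repo_dir):
    """Put repo_dir at the front of sys.path so its modules can be imported."""
    repo_dir = str(Path(repo_dir).expanduser().resolve())
    if repo_dir not in sys.path:
        sys.path.insert(0, repo_dir)
    return repo_dir


# Hypothetical location; point this at wherever you unpacked the repo:
# add_to_path("~/statsintro")
# import getdata
```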

Thomas Haslwanter

Mar 16, 2013, 4:02:20 AM
to pystat...@googlegroups.com
I thought about it again, and have decided against an installer (although it would be quite easy to do). The reason: an installer puts everything in a location that is hard to find for someone starting with Python. And these should be explicitly introductory examples. So I think it is simpler to leave it as it is, and require the user to "cd" to this directory.
thomas



keshav pokhrel

Nov 27, 2014, 5:02:36 AM
to pystat...@googlegroups.com
Hello Thomas, 

I really liked your notes on statistics using Python. I am very interested in the source code of the LaTeX document. I can handle the stats part, but I need a gentle introduction to Python for introductory stats. Thank you!

~Keshav

Thomas Haslwanter

Nov 28, 2014, 5:06:23 PM
to pystat...@googlegroups.com
Hi Keshav,
you can find the whole lot at https://github.com/thomas-haslwanter/statsintro
Contributions, or simple suggestions on what needs improvement, would be much appreciated!