pandas - how to use the 'mean' function


s

Dec 3, 2011, 12:48:47 PM
to pystat...@googlegroups.com
This is very likely a 'noob' question, but I have been trying to compute
the mean of a data frame.

I import stock price data from Yahoo for various stock tickers, truncate
the data to the desired date range, and then I want to compute
various statistics. One of them is the mean.

I have read the documentation on

pandas.Series.mean
pandas.DataFrame.mean

http://pandas.sourceforge.net/generated/pandas.DataFrame.mean.html#pandas.DataFrame.mean

The only problem is I don't know how to call these functions. I can't
seem to import them via

from pandas.series import mean
or
from pandas.dataframe import mean

And when I look in the directory structure there is no such thing as
'series' as a module within pandas.

Apologies if this is a trivial question, as I am very new to Python and
pandas.


Wes McKinney

Dec 3, 2011, 1:07:40 PM
to pystat...@googlegroups.com

Anytime you see something like Series.mean or DataFrame.mean in the
API it means that those are "instance methods", i.e. they are defined
within a class and they have access to the state (data) in the object:

http://docs.python.org/tutorial/classes.html#method-objects

So if you have a DataFrame df, you would do:

df.mean()

to compute the means of the columns, or

df.mean(1)

to compute the means of the rows.
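As a minimal sketch of the two calls above (the small frame here is made-up illustration data, not the stock prices from the question, and it assumes pandas is importable as pd):

```python
import pandas as pd

# Hypothetical toy data standing in for a real price table.
df = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [10.0, 20.0, 30.0]})

# Called on the object itself, no import of "mean" is needed.
col_means = df.mean()        # one value per column
row_means = df.mean(axis=1)  # one value per row, same as df.mean(1)

print(col_means)
print(row_means)
```

The result in both cases is a Series: indexed by column name for the first call and by the row index for the second.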

In R when you do:

f(df)

typically what happens under the hood is that the R interpreter actually calls

f.data.frame(df)

This is similar to instance methods in Python, since

df.mean()

is just syntactic sugar for

DataFrame.mean(df)

but you would never need to know that in 99% of practical use.
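If you want to convince yourself of the equivalence, a small check along these lines should do it (toy data again, invented for illustration):

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [10.0, 20.0, 30.0]})

via_instance = df.mean()           # the usual spelling
via_class = pd.DataFrame.mean(df)  # same function, object passed explicitly

# Both calls run the same method on the same data.
print(via_instance.equals(via_class))
```

This also shows why "from pandas.dataframe import mean" cannot work: mean lives on the DataFrame class, not in a module of its own.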

I very strongly recommend that you use IPython (ipython.org), as it
makes exploring object methods and their documentation very easy,
e.g.:

In [3]: df.<TAB>
df.A df.iteritems
df.add df.iterkv
df.add_prefix df.ix
df.add_suffix df.join
df.align df.last_valid_index
df.append df.load
df.apply df.mad
df.applymap df.max
df.asfreq df.mean
df.as_matrix df.median
df.astype df.min
df.axes df.mul
df.B df.ndim
df.boxplot df.pivot
df.C df.pivot_table
df.clip df.plot
df.clip_lower df.pop
df.clip_upper df.prod
df.columns df.product
df.combine df.put_value
df.combineAdd df.quantile
df.combine_first df.radd
df.combineMult df.rdiv
df.consolidate df.reindex
df.copy df.reindex_like
df.corr df.rename
df.corrwith df.rename_axis
df.count df.rmul
df.cov df.rsub
df.cumprod df.save
df.cumsum df.select
df.D df.set_index
df.delevel df.shape
df.describe df.shift
df.diff df.skew
df.div df.sort
df.dot df.sort_index
df.drop df.sortlevel
df.drop_duplicates df.stack
df.dropna df.std
df.dtypes df.sub
df.duplicated df.sum
df.fillna df.swaplevel
df.filter df.T
df.first_valid_index df.tail
df.from_csv df.take
df.from_dict df.to_csv
df.from_records df.to_dict
df.get_dtype_counts df.to_html
df.get_value df.to_records
df.groupby df.to_sparse
df.head df.to_string
df.hist df.transpose
df.idxmax df.truncate
df.idxmin df.unstack
df.index df.values
df.info df.var
df.insert df.xs

In [3]: df.mean?
Type: instancemethod
Base Class: <type 'instancemethod'>
String Form:
<bound method DataFrame.mean of A B C D
2000-01-03 -0.770 <...>
2000-02-10 0.9349 -1.102 -0.229 0.824
2000-02-11 0.5129 -0.1323 0.06714 0.231 >
Namespace: Interactive
File: /home/wesm/code/pandas/pandas/core/frame.py
Definition: df.mean(self, axis=0, skipna=True, level=None)
Docstring:
Return mean over requested axis.
NA/null values are excluded

Parameters
----------
axis : {0, 1}
0 for row-wise, 1 for column-wise
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result
will be NA
level : int, default None
If the axis is a MultiIndex (hierarchical), count along a
particular level, collapsing into a DataFrame

Returns
-------
mean : Series (or DataFrame if level specified)
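
To illustrate the skipna parameter described in the docstring, here is a rough sketch (the frame with missing values is invented for the example):

```python
import numpy as np
import pandas as pd

# Hypothetical data: column A has one missing value, column B is entirely missing.
df = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [np.nan, np.nan, np.nan]})

skipped = df.mean()             # skipna=True by default: NaNs are ignored
strict = df.mean(skipna=False)  # any NaN in a column makes its mean NaN

print(skipped)
print(strict)
```

With the default, column A averages its two valid values, while the all-NaN column B still comes out as NaN, as the docstring says.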


- Wes

s

Dec 3, 2011, 1:56:43 PM
to pystat...@googlegroups.com
Perfect. Thank you very much Wes! Great help.