Compute EWMA over sparse/irregular TimeSeries in Pandas

Nick Tomlinson

Aug 6, 2015, 10:52:58 AM
to PyData

Given the following high-frequency but sparse time series:

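import pandas as pd
from datetime import datetime

# Sparse time series: two bursts of 10 points at 1 ms spacing, 10 seconds apart
dti1 = pd.date_range(start=datetime(2015,8,1,9,0,0),periods=10,freq='ms')
dti2 = pd.date_range(start=datetime(2015,8,1,9,0,10),periods=10,freq='ms')
dti = dti1.union(dti2)  # explicit set union of the two indexes, rather than relying on `+`

ts = pd.Series(index=dti, data=range(20))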

I can compute an exponentially weighted moving average with a halflife of 5ms using a pandas function as follows:

ema = pd.ewma(ts, halflife=5, freq='ms')

However, under the hood, the function is resampling my timeseries with an interval of 1 ms (which is the 'freq' that I supplied). This causes thousands of additional datapoints to be included in the output.

In [118]: len(ts)
Out[118]: 20
In [119]: len(ema)
Out[119]: 10010
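
For what it's worth, the 10010 is just the number of points on a regular 1 ms grid between my first and last observation. A quick check using only the standard library (the variable names here are my own, purely for illustration):

```python
from datetime import datetime, timedelta

first = datetime(2015, 8, 1, 9, 0, 0)                              # first observation
last = datetime(2015, 8, 1, 9, 0, 10) + timedelta(milliseconds=9)  # last observation
step = timedelta(milliseconds=1)                                   # the 'ms' grid spacing

# Points on a regular grid covering first..last, both endpoints included.
grid_points = (last - first) // step + 1
print(grid_points)
```

So the output size grows with the time span covered, not with the number of observations.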

I know that it's possible to do:

In [120]: pd.ewma(ts, halflife=5, freq='ms').reindex(ts.index)

This reindexes the EWMA to match ts, but it is not scalable: my real time series contains hundreds of thousands of high-frequency observations that can be minutes or hours apart, so the resampled intermediate series would be enormous.

Is there a Pandas/numpy way of computing an EMA for a sparse timeseries without resampling? Something similar to this: http://oroboro.com/irregular-ema/
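
To make the idea concrete, here is a minimal sketch of the kind of computation I mean. It assumes the simplest time-decay scheme, where the previous EMA is weighted by 0.5 ** (elapsed time / halflife). That is not necessarily the exact method in the linked article, and because it is the recursive (unadjusted) form it is not expected to match pd.ewma's default output. The function name irregular_ema and its arguments are made up for this example.

```python
from datetime import datetime, timedelta


def irregular_ema(times, values, halflife):
    """EMA over irregularly spaced samples, with no resampling.

    times    : sorted sequence of datetime-like objects
    values   : sequence of numbers, same length as times
    halflife : timedelta after which an old observation's weight halves
    """
    out = []
    prev_t = None
    ema = None
    for t, x in zip(times, values):
        if ema is None:
            ema = float(x)  # seed with the first observation
        else:
            # The decay depends on the actual elapsed time, not on row count.
            elapsed_halflives = (t - prev_t) / halflife
            w = 0.5 ** elapsed_halflives
            ema = w * ema + (1.0 - w) * x
        out.append(ema)
        prev_t = t
    return out


# Same shape as the series above: two 10 ms bursts, 10 seconds apart.
start1 = datetime(2015, 8, 1, 9, 0, 0)
start2 = datetime(2015, 8, 1, 9, 0, 10)
times = [start1 + timedelta(milliseconds=i) for i in range(10)]
times += [start2 + timedelta(milliseconds=i) for i in range(10)]
values = list(range(20))

ema = irregular_ema(times, values, halflife=timedelta(milliseconds=5))
print(len(ema))  # one output per observation
```

The output has one value per input observation, which is what I want. But a plain Python loop like this is unlikely to be fast on hundreds of thousands of points, hence the question about a built-in or vectorised approach.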


Or do I have to write my own? Thanks!


(FYI I also posted this at http://stackoverflow.com/questions/31769047/compute-ewma-over-sparse-irregular-timeseries-in-pandas)
