Re: Resample tick data to OHLC with Pandas - NotImplementedError

3,195 views
Skip to first unread message

Jeff Reback

unread,
May 9, 2013, 10:24:10 AM5/9/13
to pystat...@googlegroups.com
Several things going on here:

1) don't use epoch timestamps, here's how to convert (time is your TIMESTAMP column)

In [50]: df['time'] = pd.to_datetime(df['time']*int(1e9))

In [51]: df
Out[51]: 
                 time  price         vol
0 2011-01-06 10:59:05  24990  1500000000
1 2011-01-06 12:43:33  25499  5000000000
2 2011-01-06 12:54:09   2499  1000000000

On Thursday, May 9, 2013 9:47:40 AM UTC-4, workin...@gmail.com wrote:
Hello,

I've ticks data in a DataFrame like this

    TIMESTAMP    PRICE         VOL
0  1294311545    24990  1500000000
1  1294317813    25499  5000000000
2  1294318449    25499   100000000
 
In [52]: df.dtypes
Out[52]: 
time     datetime64[ns]
price             int64
vol               int64
dtype: object

2) set the index, then convert to float64

df = df.set_index('time').astype('float64')

3) ohlc can only operate on a single column at once (as it returns a frame)
(in theory this could be enhanced to have it return a panel (3dim object), but not implemented right now

In [64]: df['price'].resample('15Min',how='ohlc')
Out[64]: 
                      open   high    low  close
time                                           
2011-01-06 10:45:00  24990  24990  24990  24990
2011-01-06 11:00:00    NaN    NaN    NaN    NaN
2011-01-06 11:15:00    NaN    NaN    NaN    NaN
2011-01-06 11:30:00    NaN    NaN    NaN    NaN
2011-01-06 11:45:00    NaN    NaN    NaN    NaN
2011-01-06 12:00:00    NaN    NaN    NaN    NaN
2011-01-06 12:15:00    NaN    NaN    NaN    NaN
2011-01-06 12:30:00  25499  25499  25499  25499
2011-01-06 12:45:00   2499   2499   2499   2499

In [65]: df['vol'].resample('15Min',how='ohlc')
Out[65]: 
                           open        high         low       close
time                                                               
2011-01-06 10:45:00  1500000000  1500000000  1500000000  1500000000
2011-01-06 11:00:00         NaN         NaN         NaN         NaN
2011-01-06 11:15:00         NaN         NaN         NaN         NaN
2011-01-06 11:30:00         NaN         NaN         NaN         NaN
2011-01-06 11:45:00         NaN         NaN         NaN         NaN
2011-01-06 12:00:00         NaN         NaN         NaN         NaN
2011-01-06 12:15:00         NaN         NaN         NaN         NaN
2011-01-06 12:30:00  5000000000  5000000000  5000000000  5000000000
2011-01-06 12:45:00  1000000000  1000000000  1000000000  1000000000

 
I would like to resample to OHLC candlesticks data

I did this

        self.df['TIMESTAMP'] = self.df['TIMESTAMP']\
            .map(lambda s: datetime.datetime.fromtimestamp(int(s), dateutil.tz.tzutc()))

to get datetime instead of int

and set TIMESTAMP as index

        self.df = self.df.set_index('TIMESTAMP')


and I tried to resample using this command

self.dfOut = self.df.resample('15Min', how='ohlc')

But it raises an Exception
NotImplementedError

I think this is probably due to the way I'm generating index...

But I'm looking for this since some hours... and I'm still in trouble with this...

Any help is welcome...

W4C

workin...@gmail.com

unread,
May 9, 2013, 11:01:44 AM5/9/13
to pystat...@googlegroups.com
Thanks.
It helps me a lot !!!

Jeff Reback

unread,
May 9, 2013, 11:24:30 AM5/9/13
to pystat...@googlegroups.com
I created an enhancement issue about this


also shows how to create a panel from data like this

workin...@gmail.com

unread,
May 9, 2013, 11:24:59 AM5/9/13
to pystat...@googlegroups.com
Oh I'm sorry... I just have an other question...

the problem with this method is that I also get OHLC candlestick for volume...
I should get sum of volume for a given time interval

Any idea to achieve this ?

Jeff Reback

unread,
May 9, 2013, 11:47:17 AM5/9/13
to pystat...@googlegroups.com
just a regular resample

In [2]: df = DataFrame(randn(20,2),index=date_range('20130101',periods=20,freq='min'),columns=list('AB'))

In [3]: df['B']
Out[3]: 
2013-01-01 00:00:00    0.221851
2013-01-01 00:01:00   -1.133104
2013-01-01 00:02:00   -1.160935
2013-01-01 00:03:00   -0.703199
2013-01-01 00:04:00   -1.236053
2013-01-01 00:05:00   -0.470520
2013-01-01 00:06:00   -1.438936
2013-01-01 00:07:00    0.685874
2013-01-01 00:08:00   -0.551225
2013-01-01 00:09:00   -0.770867
2013-01-01 00:10:00    2.335740
2013-01-01 00:11:00    0.754038
2013-01-01 00:12:00   -0.060637
2013-01-01 00:13:00   -0.486963
2013-01-01 00:14:00    0.148781
2013-01-01 00:15:00    0.146861
2013-01-01 00:16:00   -1.579274
2013-01-01 00:17:00   -0.751042
2013-01-01 00:18:00    1.524033
2013-01-01 00:19:00   -0.571428
Freq: T, Name: B, dtype: float64

In [4]: df['B'].resample('15Min',how='sum')
Out[4]: 
2013-01-01 00:00:00   -3.866155
2013-01-01 00:15:00   -1.230849
Freq: 15T, dtype: float64

workin...@gmail.com

unread,
May 9, 2013, 11:51:47 AM5/9/13
to pystat...@googlegroups.com
Thanks a lot again !!!!

workin...@gmail.com

unread,
May 9, 2013, 12:18:57 PM5/9/13
to pystat...@googlegroups.com
Some data are missing...

I need a trick to fill NaN
In fact I want OPEN=HIGH=LOW=CLOSE of data with NaN value = CLOSE of previous candle

I tried
        self.dataframe_out['MISSING'] = self.dataframe_out['open'].isnull()
        self.dataframe_out['MISSING2'] = self.dataframe_out['MISSING'].shift(1)

so I need to calculate MISSING and not MISSING2 to detect when a "new" NaN appears

but you maybe know a better way to achieve my task

workin...@gmail.com

unread,
May 9, 2013, 12:22:46 PM5/9/13
to pystat...@googlegroups.com
for volume I just did
self.dataframe_out['VOL'] = self.dataframe_out['VOL'].fillna(0)

but that's the easier part of the "job"

Jeff Reback

unread,
May 9, 2013, 12:31:35 PM5/9/13
to pystat...@googlegroups.com
you can ffill

workin...@gmail.com

unread,
May 9, 2013, 12:38:44 PM5/9/13
to pystat...@googlegroups.com
I can't, I must use CLOSE value of previous candle for every columns (OPEN HIGH LOW CLOSE)

Everyone can reply here

Reply all
Reply to author
Forward
0 new messages