Shift dates by various months

165 views
Skip to first unread message

james....@gmail.com

unread,
Aug 21, 2017, 5:19:40 PM8/21/17
to PyData
Hi All,

Suppose there is a data frame like below,

df = pd.DataFrame({'Date':['2008-01-31','2008-02-29','2008-03-31','2008-06-30'],'offset':[1, 2, 5, 3]})
df['Date'] = pd.to_datetime(df['Date'])

Date             offset
2008-01-31   1
2008-02-29   2
2008-03-31   5
2008-06-30   3
...

I want to shift each date by number of months in the offset column, saying the resulted dataframe looks like this

Date             offset    offset_date
2008-01-31   1          2008-02-29
2008-02-29   2          2008-04-30
2008-03-31   5          2008-08-31
2008-06-30   3          2008-09-30
...

For this, I can loop through the dates by adding pd.DateOffset(months=offset) + pd.offsets.MonthEnd(0) either using for loop or apply. But I wonder if there is any vectorised way to do this in case of large data frame? I found some solutions using something like numpy.timedelta64['M'], but they cannot produce exact months like the approach with DateOffset. But DateOffset seems not support vectorisation?

Thanks in advance!

Best regards,
James

Chris Bartak

unread,
Aug 21, 2017, 5:43:50 PM8/21/17
to PyData
It's a bit of a hack, but one we use internally - you can cast back and forth with periods to do this pretty quickly.  For performance you have to be careful that the period data is never boxed into a `Series`, otherwise it takes on an object dtype.  (the `.values` bit below is necessary for that)

i = pd.DatetimeIndex(df['Date'])
df['offset_date'] = (i.to_period('M') + df['offset'].values).to_timestamp(how='end')

In [25]: df
Out[25]: 
        Date  offset offset_date
0 2008-01-31       1  2008-02-29
1 2008-02-29       2  2008-04-30
2 2008-03-31       5  2008-08-31
3 2008-06-30       3  2008-09-30

That's for a month-end, if you want to handle DateOffset(months=months) it's a little more involved, some code that used to be in pandas to do that here you can model after.


The actual cython code used now is easily vectorizable, so it wouldn't be hard to support this directly in pandas, mostly a question of what the API would be.








--
You received this message because you are subscribed to the Google Groups "PyData" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pydata+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

james....@gmail.com

unread,
Aug 21, 2017, 6:55:04 PM8/21/17
to PyData
Many thanks!
To unsubscribe from this group and stop receiving emails from it, send an email to pydata+un...@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages