I am hoping to find a clean way to count the zeros between two non-zero rows in a specific column. I can do this in a loop, but I would prefer to take the opportunity to do it with groupby.
For the sake of the discussion, I have prepared an example:
import numpy as np
import pandas as pd
# Preparing the data
evnt_id = [1, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, -1, 0, 0, 1]
expctd = [0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 7, 0, 0, 0, 0, 4, 0, 0, 2]
dt_rng = pd.date_range(start='1-1-2018', periods=len(evnt_id), freq='1h', name="time")
dt_idx = pd.DatetimeIndex(dt_rng)
arr = np.column_stack((evnt_id, expctd))
df = pd.DataFrame(arr, columns=['event_id', 'exp_occ'], index=dt_idx)
rng_idx = pd.RangeIndex(start=0, stop=len(evnt_id), step=1, name='seq')
data = df.copy()
# Start a new group at the first zero of every run of zeros
grp = (
    (data['event_id'] == 0) & (data['event_id'].shift() != 0)
).cumsum()
occurance_count = data.groupby(grp).agg({
    # "WHAT_TO_USE_FOR_INDEX" : "idx",
    "event_id": "count"
})
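This produces close enough results (each count is one higher than expected, because "count" also counts the non-zero row in each group), but I see no way to join them back onto the original frame without some dirty hacks. How would others approach the problem?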
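For reference, here is a minimal sketch of one direction that might avoid the join altogether: groupby(...).transform returns a result aligned to the original index, so it can be assigned straight back as a column. The names is_zero, block and zero_count are illustrative and not part of my code above, and the sketch assumes that every non-zero row should receive the number of zeros since the previous non-zero row, as in exp_occ.

```python
import pandas as pd

# Same sample data as in the example above
evnt_id = [1, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, -1, 0, 0, 1]
expctd = [0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 7, 0, 0, 0, 0, 4, 0, 0, 2]
idx = pd.date_range(start="2018-01-01", periods=len(evnt_id), freq="1h", name="time")
data = pd.DataFrame({"event_id": evnt_id, "exp_occ": expctd}, index=idx)

# True on the rows that hold a zero (illustrative name)
is_zero = data["event_id"].eq(0)

# Each non-zero row closes a block. Shifting the running count of
# non-zero rows down by one row keeps every event in the same block
# as the zeros that precede it.
block = data["event_id"].ne(0).cumsum().shift(fill_value=0)

# transform keeps the original index, so no join is needed afterwards
zeros_in_block = is_zero.astype(int).groupby(block).transform("sum")

# Keep the count only on the non-zero rows; zero rows get 0
data["zero_count"] = zeros_in_block.where(~is_zero, 0).astype(int)

print(data)
```

On this sample the new zero_count column should match exp_occ row for row. Whether it behaves as wanted on edge cases (for example trailing zeros with no closing event) depends on the assumption stated above.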
Thank you