I have a DataFrame like this:
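import pandas as pd

df = pd.DataFrame([
    ['2012-02-03 13:00', 'honda', 100, 'k1', False],
    ['2012-02-03 13:21', 'nissan', 100, 'k1', False],
    ['2012-02-03 11:03', 'toyota', 400, 'd1', False],
    ['2012-02-03 10:03', 'bmw', 300, 's1', False],
    ['2012-02-03 11:02', 'toyota', 400, 'd1', False],
],
    columns=['ts', 'manufacture', 'size', 'form', 'sentinel'])
# Parse the timestamps so that sorting is chronological
df['ts'] = pd.to_datetime(df['ts'])
# DataFrame.sort was removed from pandas; sort_values is its replacement
df = df.sort_values(['ts', 'manufacture'])
df['verified'] = False
df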
The dataframe is sorted by timestamp and manufacture.
I want to find every manufacture that appears more than once (toyota here) and then check whether the size values of those rows (400 and 400) match.
If they do, I want to set sentinel to True on the first occurrence and verified to True on the second.
So far I am able to find the duplicated manufacture, toyota here:
vc=df.manufacture.value_counts()
vci=vc[vc>1]
But I am not sure how to compare the matching rows inside the original DataFrame, df.
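For reference, a naive loop-based sketch of the rule described above might look like the following. It assumes each duplicated manufacture appears exactly twice (the sample only has the toyota pair) and a pandas version that provides sort_values. It is meant to pin down the expected result, not to be the idiomatic answer:

```python
import pandas as pd

df = pd.DataFrame(
    [
        ['2012-02-03 13:00', 'honda', 100, 'k1', False],
        ['2012-02-03 13:21', 'nissan', 100, 'k1', False],
        ['2012-02-03 11:03', 'toyota', 400, 'd1', False],
        ['2012-02-03 10:03', 'bmw', 300, 's1', False],
        ['2012-02-03 11:02', 'toyota', 400, 'd1', False],
    ],
    columns=['ts', 'manufacture', 'size', 'form', 'sentinel'],
)
df['ts'] = pd.to_datetime(df['ts'])
df = df.sort_values(['ts', 'manufacture'])
df['verified'] = False

# Manufactures that occur more than once (only toyota in the sample)
vc = df['manufacture'].value_counts()
vci = vc[vc > 1]

for name in vci.index:
    # Rows for this manufacture, still in timestamp order after the sort
    rows = df[df['manufacture'] == name]
    # Assumption: exactly two rows per duplicated manufacture
    if rows['size'].nunique() == 1:
        df.loc[rows.index[0], 'sentinel'] = True   # first occurrence
        df.loc[rows.index[1], 'verified'] = True   # second occurrence
```

With the sample data this leaves sentinel True on the 11:02 toyota row and verified True on the 11:03 toyota row, with every other flag still False.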
Any ideas?