if/elif/else style method

John E

Jul 26, 2015, 10:25:06 AM
to PyData
Perhaps this would be of some general interest.  I monkey-patched a wrapper around numpy.select onto DataFrame that allows the following syntax:

df['new1'] = df.elsif( [ ( df.x > 0,    9 ),
                         ( df.y > 0,   99 ),
                         ( df.z > 0, df.x ) ], default=np.nan )

whereas with numpy.select you would write it like this:

df['new3'] = np.select( [ df.x > 0, df.y > 0, df.z > 0 ],
                        [ 9,        99,       df.x     ], default=np.nan )


I tried to do it with dictionary syntax instead of a list of tuples, but because dictionaries are unordered I couldn't get that to work (and I don't even know if it's possible).
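
A note on the dictionary idea: besides ordering, there seems to be a second obstacle, namely that a pandas Series is not hashable, so a boolean condition cannot serve as a dict key at all. A minimal sketch of that behavior (the small frame and the names `cond`, `hashable` and `pairs` are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({"x": [1.0, -1.0], "y": [-1.0, 1.0]})

cond = df.x > 0  # a boolean Series, i.e. what a condition evaluates to

try:
    mapping = {cond: 9}  # dict keys must be hashable; a Series is not
    hashable = True
except TypeError:
    hashable = False

# A list of (condition, choice) tuples has no hashing requirement
# and keeps the conditions in the order they were written.
pairs = [(df.x > 0, 9), (df.y > 0, 99)]
```

So the list-of-tuples form is probably the practical choice here regardless of dict ordering.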

Advantages:
1) Same speed as numpy.select, which I think is the fastest way to do this kind of thing
2) Can be chained (unlike numpy.select)
3) Conditions and choices are entered in a more readable way -- it's essentially a transpose of inputs for numpy.select, and in the same order as you would write an if/elif/else block.
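
To illustrate point 2, here is a rough sketch of what chaining could look like. It assumes a Python 3 version of the wrapper (repeated here so the snippet runs on its own) and uses `DataFrame.assign` with a lambda so the conditions refer to the intermediate frame; the tiny example frame and the `result` name are made up for illustration.

```python
import numpy as np
import pandas as pd

def elsif(self, list_of_tuples, default=0):
    # Transpose [(cond, choice), ...] into the two lists np.select expects
    conditions, choices = zip(*list_of_tuples)
    return pd.Series(
        np.select(list(conditions), list(choices), default=default),
        index=self.index,
    )

pd.DataFrame.elsif = elsif  # monkey-patch, as in the post

df = pd.DataFrame({"x": [1.0, -1.0, -2.0], "y": [0.5, 2.0, -3.0]})

result = (
    df[df.y > -5]                      # any earlier step in the chain
    .assign(new=lambda d: d.elsif([(d.x > 0, 9),
                                   (d.y > 0, 99)], default=np.nan))
    .dropna()                          # any later step in the chain
)
```

With plain numpy.select the same thing would need an intermediate variable or a wrapping lambda, since it is a function rather than a method.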

Here's my code and the 3 main alternative ways to code this (that I'm aware of).  Timings are not shown, but in this example, it's about 3.5x faster than the ix way (and the same speed as numpy.select, of course).  numpy.where is slower than numpy.select but faster than ix.

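import numpy as np
import pandas as pd

def elsif( self, list_of_tuples, default=0 ):
    # Transpose [(cond, choice), ...] into parallel sequences for np.select.
    # (The original indexed the result of map(), which only works on Python 2.)
    conditions, choices = zip(*list_of_tuples)
    return pd.Series( np.select( list(conditions), list(choices), default=default ), index=self.index )

# Monkey-patch the function onto DataFrame so it can be called as df.elsif(...)
pd.DataFrame.elsif = elsif

df = pd.DataFrame( np.random.randn(10000,3), columns=list('xyz') )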

df['new1'] = df.elsif( [ ( df.x > 0,    9 ),
                         ( df.y > 0,   99 ),
                         ( df.z > 0, df.x ) ], default=np.nan )
 
# The "ix way": written with .loc here because .ix was removed in pandas 1.0
df['new2'] = np.nan
df.loc[ (df.x>0) & df.new2.isnull(), 'new2' ] =    9
df.loc[ (df.y>0) & df.new2.isnull(), 'new2' ] =   99
df.loc[ (df.z>0) & df.new2.isnull(), 'new2' ] = df.x

df['new3'] = np.select( [ df.x > 0, df.y > 0, df.z > 0 ],
                        [ 9,        99,       df.x     ], default=np.nan )

df['new4'] = np.nan
df['new4'] = np.where( (df.x>0) & df.new4.isnull(),    9, df.new4 )
df['new4'] = np.where( (df.y>0) & df.new4.isnull(),   99, df.new4 )
df['new4'] = np.where( (df.z>0) & df.new4.isnull(), df.x, df.new4 )


           x         y         z       new1       new2       new3       new4
0   0.908012 -0.272407  0.457136   9.000000   9.000000   9.000000   9.000000
1   0.416751 -0.681781 -2.573377   9.000000   9.000000   9.000000   9.000000
2  -0.117099 -0.263772 -0.543274        NaN        NaN        NaN        NaN
3  -0.469839 -1.987738 -0.398389        NaN        NaN        NaN        NaN
4  -1.167369  0.763096 -0.972087  99.000000  99.000000  99.000000  99.000000
5   1.171037 -0.820329 -1.484193   9.000000   9.000000   9.000000   9.000000
6  -1.195875  0.708652  1.815630  99.000000  99.000000  99.000000  99.000000
7  -0.719962 -0.912801 -0.105269        NaN        NaN        NaN        NaN
8  -0.347748 -1.533121  0.543040  -0.347748  -0.347748  -0.347748  -0.347748
9  -0.177696 -1.079987 -2.023804        NaN        NaN        NaN        NaN
10 -1.154015 -0.860411  0.410985  -1.154015  -1.154015  -1.154015  -1.154015
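
Since the timings are not shown, here is one rough way they could be reproduced with `timeit`. This is only a sketch: the helper names (`with_select`, `with_where`, `with_loc`) are made up, `.loc` stands in for `.ix`, and the numbers will vary with machine and library versions, so no expected output is given.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10000, 3), columns=list("xyz"))

def with_select(df):
    # Single call: the first matching condition wins
    return np.select([df.x > 0, df.y > 0, df.z > 0],
                     [9, 99, df.x], default=np.nan)

def with_where(df):
    # One np.where pass per condition, only filling still-empty slots
    out = np.full(len(df), np.nan)
    for cond, choice in [(df.x > 0, 9), (df.y > 0, 99), (df.z > 0, df.x)]:
        out = np.where(cond & np.isnan(out), choice, out)
    return out

def with_loc(df):
    # Masked assignment, the .loc equivalent of the "ix way"
    out = pd.Series(np.nan, index=df.index)
    out.loc[(df.x > 0) & out.isnull()] = 9
    out.loc[(df.y > 0) & out.isnull()] = 99
    out.loc[(df.z > 0) & out.isnull()] = df.x
    return out.to_numpy()

# Best of 3 repeats, 10 calls each, in seconds
timings = {
    f.__name__: min(timeit.repeat(lambda f=f: f(df), number=10, repeat=3))
    for f in (with_select, with_where, with_loc)
}
for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s per 10 calls")
```

The three helpers should produce identical arrays (NaN where no condition matches), which is worth checking before comparing their speed.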




