conditional assignment


jorge creixell

Mar 8, 2012, 6:17:18 AM
to pystatsmodels
Hello,

Maybe I am asking something very basic, but I haven't found a way of
doing something like this in pandas:

df[df['column']==1] = 5

i.e. multiple assignment based on a condition.

Is there any way to do that without iterating over the whole DataFrame?

Thanks!

Wouter Overmeire

Mar 8, 2012, 9:09:52 AM
to pystat...@googlegroups.com



In [93]: df

Out[93]:
   a         b         c   d
0  6  0.003649  1.259207 NaN
1  2 -0.211644 -1.896789 NaN
2  9 -0.384483  0.243700 NaN
3  9 -2.750329  1.474342 NaN
4  2 -0.193401  0.679474 NaN

In [94]: df.ix[df['a'] == 9, ['b', 'd']] = 50.0

In [95]: df
Out[95]:
   a          b         c   d
0  6   0.003649  1.259207 NaN
1  2  -0.211644 -1.896789 NaN
2  9  50.000000  0.243700  50
3  9  50.000000  1.474342  50
4  2  -0.193401  0.679474 NaN

In [96]: df.ix[df['a'] == 2, 'c'] = 100.0

In [97]: df
Out[97]:
   a          b           c   d
0  6   0.003649    1.259207 NaN
1  2  -0.211644  100.000000 NaN
2  9  50.000000    0.243700  50
3  9  50.000000    1.474342  50
4  2  -0.193401  100.000000 NaN
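
For readers on a newer pandas: `.ix` was later deprecated and removed, and `.loc` accepts the same boolean-mask-plus-columns form. A minimal sketch of the two assignments above, assuming a current pandas install; the frame values are made up for illustration and are not the data from the session:

```python
import numpy as np
import pandas as pd

# Made-up frame shaped like the one in the session above (assumption).
df = pd.DataFrame({
    "a": [6, 2, 9, 9, 2],
    "b": [0.1, -0.2, -0.4, -2.8, -0.2],
    "c": [1.3, -1.9, 0.2, 1.5, 0.7],
    "d": [np.nan] * 5,
})

# Rows where a == 9, restricted to columns b and d.
df.loc[df["a"] == 9, ["b", "d"]] = 50.0

# Rows where a == 2, single column c.
df.loc[df["a"] == 2, "c"] = 100.0

print(df)
```

The first argument selects rows by the boolean mask, the second restricts the assignment to the named columns; rows and columns outside that selection are left untouched.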


Wouter Overmeire

Mar 8, 2012, 9:14:06 AM
to pystat...@googlegroups.com

One more, because i think this is the one you need.
 
In [117]: df
Out[117]:
   a         b         c   d
0  2  0.103795 -0.066523 NaN
1  3 -0.939895 -0.208760 NaN
2  9  1.968456  0.419374 NaN
3  6 -0.105170  0.162064 NaN
4  7  0.381707  2.126133 NaN

In [118]: df.ix[df['a'] == 3] = 100

In [119]: df
Out[119]:
     a           b           c    d
0    2    0.103795   -0.066523  NaN
1  100  100.000000  100.000000  100
2    9    1.968456    0.419374  NaN
3    6   -0.105170    0.162064  NaN
4    7    0.381707    2.126133  NaN

I'm surprised that df[df['a'] == 2] = 100 raises an exception; I would expect this one to work too.
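
A minimal sketch of the whole-row form using `.loc`, which replaced `.ix` in later pandas versions. The frame values here are invented for illustration:

```python
import pandas as pd

# Invented data; only the shape mirrors the session above (assumption).
df = pd.DataFrame({
    "a": [2, 3, 9, 6, 7],
    "b": [0.1, -0.9, 2.0, -0.1, 0.4],
    "c": [-0.1, -0.2, 0.4, 0.2, 2.1],
})

# Boolean mask over the rows; True only where a == 3.
mask = df["a"] == 3

# With no column selector, every column of the matching rows is overwritten.
df.loc[mask] = 100

print(df)
```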

jorge creixell

Mar 8, 2012, 10:27:49 AM
to pystat...@googlegroups.com
Great, thanks a lot!

I was surprised by the exception too; it would be great to have this feature in the future :)

Wouter Overmeire

Mar 8, 2012, 11:10:07 AM
to pystat...@googlegroups.com
On Thu, Mar 8, 2012 at 4:27 PM, jorge creixell <jorge.c...@wimdu.com> wrote:
> Great, thanks a lot!
>
> I was surprised by the exception too; it would be great to have this feature in the future :)

It's probably a good idea to post the exception as an issue on GitHub.

Adam Klein

Mar 8, 2012, 12:03:00 PM
to pystat...@googlegroups.com
That's odd; with random data, both slicing and setting look OK. Let me try with your data next.
 
In [20]: df
Out[20]: 
          0         1         2         3         4
0  0.852543  0.671119  0.186331  0.941615  0.436338
1  0.564321  0.013514  0.020539  0.606974  0.736674
2  0.811880  0.217941  0.083967  0.105470  0.552018
3  0.278145  0.854468  0.884766  0.281775  0.934472
4  0.728728  0.919129  0.301560  0.623416  0.554836

In [21]: df[0] == df.ix[0,0]
Out[21]: 
0     True
1    False
2    False
3    False
4    False

In [19]: df[df[0] == df.ix[0,0]]
Out[19]: 
          0         1         2         3         4
0  0.852543  0.671119  0.186331  0.941615  0.436338

In [22]: df[0]
Out[22]: 
0    0.852543
1    0.564321
2    0.811880
3    0.278145
4    0.728728

In [24]: df[df[0] == df.ix[0,0]] = 5

In [25]: df
Out[25]: 
   0         1         2         3         4
0  5  0.671119  0.186331  0.941615  0.436338
1  5  0.013514  0.020539  0.606974  0.736674
2  5  0.217941  0.083967  0.105470  0.552018
3  5  0.854468  0.884766  0.281775  0.934472
4  5  0.919129  0.301560  0.623416  0.554836


Adam Klein

Mar 8, 2012, 12:09:06 PM
to pystat...@googlegroups.com
Yep, seems like a bug. It looks like it has to do with checking the labels on the wrong axis.

Adam Klein

Mar 8, 2012, 12:37:46 PM
to pystat...@googlegroups.com
Oops, that totally isn't what's supposed to happen in the first case either (all of row 0 should be set to 5, not column 0). It's an inconsistency: in the setitem case the boolean index is used to select a vector of columns instead of rows, which does not match the getitem behavior.
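
The consistent behavior described here (a boolean Series selects rows for both reading and assignment) can be sketched as follows. This assumes a current pandas release, where plain-bracket assignment with a boolean Series sets the matching rows; the numbers are illustrative and `.loc` stands in for the `.ix` lookup used above:

```python
import pandas as pd

# Illustrative values only (assumption); integer column labels as in the session.
df = pd.DataFrame({
    0: [0.85, 0.56, 0.81],
    1: [0.67, 0.01, 0.22],
    2: [0.19, 0.02, 0.08],
})

# True only for the row whose column-0 value equals the first entry.
mask = df[0] == df.loc[0, 0]

# getitem: the mask picks rows.
selected = df[mask].copy()

# setitem: the same mask picks the same rows, so all of row 0 becomes 5.
df[mask] = 5

print(selected)
print(df)
```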

Wes McKinney

Mar 14, 2012, 9:28:07 PM
to pystat...@googlegroups.com

All seems to be good now after Adam's patch in #881. I had indeed
never implemented __setitem__ via boolean array logic (only
indirectly in .ix).

- Wes
