Re: Issue updating data frame rows.

Jeff

May 21, 2013, 8:57:43 AM

to pyd...@googlegroups.com

you are setting with a numpy array (you are using .values) on the rhs of the expression

if the number of True values in the lhs mask equals the length of the rhs, then this is fine (and this should work)

but it looks like it is taking on actual true/false values (I think it's setting to the 0/1 element on the rhs)

if you can reproduce this I can look at it

in any event, you should really assign the Series itself rather than .values (then the data alignment is assured)

data_nz.ix[data_nz['units'] == 'KB/sec', 'value'] = data_nz.ix[data_nz['units'] == 'KB/sec', 'value'] / 1000.0
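
For readers on a current pandas, where `.ix` has been removed, the two forms discussed here can be sketched with `.loc`. This is only an illustration under assumptions: the small frame below is made up (it is not the poster's data), and `mask`, `by_series`, `by_array` and `rhs` are names introduced here.

```python
import pandas as pd

# Made-up stand-in for data_nz; the index labels are deliberately non-contiguous.
data_nz = pd.DataFrame(
    {
        "units": ["KB/sec", "MB/sec", "KB/sec", "KB/sec"],
        "value": [990.0, 1.2, 988.8, 981.3],
    },
    index=[5, 7, 96, 161],
)
mask = data_nz["units"] == "KB/sec"

# Form 1: Series on the rhs. pandas matches rhs rows to lhs rows by index label.
by_series = data_nz.copy()
by_series.loc[mask, "value"] = by_series.loc[mask, "value"] / 1000.0

# Form 2: numpy array on the rhs. There are no labels, so values are placed
# positionally and the array length must equal the number of True entries in mask.
by_array = data_nz.copy()
rhs = by_array.loc[mask, "value"].to_numpy() / 1000.0
assert len(rhs) == mask.sum()
by_array.loc[mask, "value"] = rhs

print(by_series)
print(by_array)
```

Both forms should leave the MB/sec row untouched and give each KB/sec row its own scaled value; the Series form is the one that does not depend on the rhs being in the same order as the selected rows.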

On Tuesday, May 21, 2013 8:37:24 AM UTC-4, setrofim wrote:

I'm trying to adjust values of some rows in a data frame (selected through boolean indexing):

In [228]:

data_nz.ix[data_nz['units'] == 'KB/sec', 'value']

Out[228]:

5        990.0
96       988.8
161      981.3
265      935.5
314     1000.1
426      991.5
2359     858.6
2466     833.1
2523     858.0
2572     858.4
2621     860.7
2684     840.1
Name: value, dtype: float64

In [229]:

data_nz.ix[data_nz['units'] == 'KB/sec', 'value'] = data_nz.ix[data_nz['units'] == 'KB/sec', 'value'].values / 1000.0

but for some reason it looks like all rows get updated to the first value:

In [230]:

data_nz.ix[data_nz['units'] == 'KB/sec', 'value']

Out[230]:

5       0.99
96      0.99
161     0.99
265     0.99
314     0.99
426     0.99
2359    0.99
2466    0.99
2523    0.99
2572    0.99
2621    0.99
2684    0.99
Name: value, dtype: float64

Am I doing something wrong? I ran a simple test doing something similar on a fake data frame and I get the results I would expect:

In [22]: df = pandas.DataFrame({'x':range(10), 'y':range(10,20)})

In [23]: df
Out[23]:
   x   y
0  0  10
1  1  11
2  2  12
3  3  13
4  4  14
5  5  15
6  6  16
7  7  17
8  8  18
9  9  19

In [24]: df.ix[df.x % 2 == 0, 'y'] = df.ix[df.x % 2 == 0, 'y'].values * 100

In [25]: df
Out[25]:
   x     y
0  0  1000
1  1    11
2  2  1200
3  3    13
4  4  1400
5  5    15
6  6  1600
7  7    17
8  8  1800
9  9    19

Rows get adjusted to their corresponding values. I don't see a difference between what I'm doing in the test above and what I'm doing with my real data (the only difference I can think of is that the real data frame is constructed with the read_csv method). Any guidance would be greatly appreciated.

setrofim

May 21, 2013, 9:47:32 AM

to pyd...@googlegroups.com

Jeff,

Thank you for your response.

in any event, you should really do (then the data alignment is assured)

data_nz.ix[data_nz['units'] == 'KB/sec', 'value'] = data_nz.ix[data_nz['units'] == 'KB/sec', 'value'] / 1000.0

I have tried doing this, but am getting the same result:

In [233]:

data_nz.ix[data_nz['units'] == 'KB/sec', 'value'] = data_nz.ix[data_nz['units'] == 'KB/sec', 'value'] / 1000.0

data_nz.ix[data_nz['units'] == 'KB/sec', 'value']

Out[233]:

5       0.99
96      0.99
161     0.99
265     0.99
314     0.99
426     0.99
2359    0.99
2466    0.99
2523    0.99
2572    0.99
2621    0.99
2684    0.99
Name: value, dtype: float64

I have also verified that lhs and rhs are of equal length:

In [240]:

len(data_nz.ix[data_nz['units'] == 'KB/sec', 'value']) == len(data_nz.ix[data_nz['units'] == 'KB/sec', 'value'] / 1000.0)

Out[240]:

True

Jeff

May 21, 2013, 10:05:29 AM

to pyd...@googlegroups.com

can you post the frame (or a link if it's too big)

and your code to get to this point?

something odd is going on

setrofim

May 21, 2013, 10:25:14 AM

to pyd...@googlegroups.com

Jeff,

Please find the data attached. The following is the minimal code to reproduce the issue:

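import pandas
# Load the attached data and keep only the rows with a positive value
data = pandas.read_csv('output.csv')
data_nz = data[data['value'] > 0]
# Scale the KB/sec values by 1/1000 (this is the assignment that misbehaves)
data_nz.ix[data_nz['units'] == 'KB/sec', 'value'] = data_nz[data_nz['units'] == 'KB/sec']['value'] / 1000.0
# Read the scaled values back; every row shows the same number
wseq = data_nz.ix[data_nz['units'] == 'KB/sec', 'value']
print(wseq.values)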

I'm using Python 2.7.3 and pandas 0.11.0.

Thanks for looking into this.

output.csv
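
The attachment is not reproduced in this archive, so the following is only a sketch of the same steps on made-up CSV text; the real file's layout beyond the `units` and `value` columns is unknown, and the rows here are invented. It also assumes a current pandas, so it uses `.loc` in place of `.ix` and takes an explicit `.copy()` of the filtered frame so that the assignment targets an independent object.

```python
import io

import pandas as pd

# Invented CSV content standing in for output.csv (assumption, not the real data).
csv_text = """units,value
KB/sec,990.0
MB/sec,0.0
KB/sec,988.8
ops/sec,12.0
KB/sec,981.3
"""

data = pd.read_csv(io.StringIO(csv_text))

# Keep rows with a positive value; .copy() makes data_nz independent of data.
data_nz = data[data["value"] > 0].copy()

# Scale the KB/sec rows, passing a Series on the rhs so rows are matched by index.
mask = data_nz["units"] == "KB/sec"
data_nz.loc[mask, "value"] = data_nz.loc[mask, "value"] / 1000.0

wseq = data_nz.loc[mask, "value"]
print(wseq.to_numpy())
```

Building `mask` once and reusing it on both sides keeps the row selection identical for the read and the write.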

Jeff

May 21, 2013, 10:54:03 AM

to pyd...@googlegroups.com

This is a very specific case of a mixed-dtype frame not aligning with a series input

here's a repro example

will fix and get it into 0.11.1

thanks

Jeff

In [35]: df = pandas.DataFrame({'x':range(10), 'y':range(10,20),'z' : 'bar'})

In [36]: df.ix[df.x % 2 == 0, 'y'] = df.ix[df.x % 2 == 0, 'y'].values * 100

In [37]: df
Out[37]:
   x     y    z
0  0  1000  bar
1  1    11  bar
2  2  1000  bar
3  3    13  bar
4  4  1000  bar
5  5    15  bar
6  6  1000  bar
7  7    17  bar
8  8  1000  bar
9  9    19  bar
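
As a way to check an installed version against this report, the example above can be re-run as a small script. This is a sketch that assumes a current pandas, so `.loc` and `.to_numpy()` stand in for `.ix` and `.values`; `mask` and `expected` are names introduced here, with `expected` holding the per-row result that the frame without the string column produced earlier in the thread.

```python
import pandas as pd

# Mixed-dtype frame: two integer columns plus a string column, as in the example above.
df = pd.DataFrame({"x": list(range(10)), "y": list(range(10, 20)), "z": "bar"})
mask = df["x"] % 2 == 0

# Assign a plain numpy array to the selected rows of y.
df.loc[mask, "y"] = df.loc[mask, "y"].to_numpy() * 100

# Each even row should carry its own value, not a copy of the first one.
expected = [1000, 11, 1200, 13, 1400, 15, 1600, 17, 1800, 19]
print(df["y"].tolist() == expected)
```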

setrofim

May 21, 2013, 11:07:29 AM

to pyd...@googlegroups.com

I see. Many thanks, Jeff. What would be a good workaround in the meantime?

Jeff

May 21, 2013, 11:14:05 AM

to pyd...@googlegroups.com

set it as a single column dataframe

In [7]: df = DataFrame({'x':range(10), 'y':range(10,20),'z' : 'bar'})

In [8]: x = DataFrame(dict(y = df.ix[df.x % 2 == 0, 'y'] * 100))

In [9]: df.ix[df.x % 2 == 0, 'y'] = x

In [10]: df
Out[10]:

   x     y    z
0  0  1000  bar
1  1    11  bar
2  2  1200  bar
3  3    13  bar
4  4  1400  bar
5  5    15  bar
6  6  1600  bar
7  7    17  bar
8  8  1800  bar
9  9    19  bar
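
On a current pandas the same idea, a DataFrame on the rhs so that both index and column labels take part in the alignment, might look like the sketch below. It assumes `.loc` in place of `.ix` and selects the column as a list (`["y"]`) so that both sides are single-column frames; that list form and the name `mask` are choices made here, not something from the thread.

```python
import pandas as pd

df = pd.DataFrame({"x": list(range(10)), "y": list(range(10, 20)), "z": "bar"})
mask = df["x"] % 2 == 0

# Single-column DataFrame holding the new values for the selected rows.
x = df.loc[mask, ["y"]] * 100

# Frame-to-frame assignment; rows and the y column are matched by label.
df.loc[mask, ["y"]] = x

print(df)
```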

setrofim

May 21, 2013, 11:34:53 AM

to pyd...@googlegroups.com

Ah cool, that makes sense (knew there'd be a better way than looping over all the indexes with set_value() calls).

Thanks again for all your help and rapid response, Jeff! I really appreciate it.

Cheers,
Sergei

Jeff

May 21, 2013, 12:15:47 PM

to pyd...@googlegroups.com

here's the PR to fix it; thanks, this was an untested case (believe it or not!)

https://github.com/pydata/pandas/pull/3670

will be merged shortly

Jeff

May 21, 2013, 2:26:12 PM

to pyd...@googlegroups.com

all merged into master... please give it a try

Skipper Seabold

May 21, 2013, 6:37:24 PM

to pyd...@googlegroups.com

Funnily enough, I just ran into this issue. Seems to be fixed in master. Thanks.

setrofim

May 22, 2013, 5:57:15 AM

to pyd...@googlegroups.com

Sorry for the delay in replying. Just tried master, works perfectly now. Thank you very much for the quick turnaround.
