Re: Issue updating data frame rows.


Jeff

May 21, 2013, 8:57:43 AM
to pyd...@googlegroups.com
You are setting with a NumPy array on the rhs of the expression (you are using .values).
If the number of True values on the lhs equals the length of the rhs, then this is fine (and it should work),
but it looks like it is taking on the actual True/False values (I think it's setting to the 0/1 element of the rhs).

If you can reproduce this, I can look at it.

In any event, you should really do the following, since then the data alignment is assured:
 
data_nz.ix[data_nz['units'] == 'KB/sec', 'value'] = data_nz.ix[data_nz['units'] == 'KB/sec', 'value'] / 1000.0
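
For readers on current pandas, where the `.ix` indexer no longer exists, a minimal sketch of the same Series-on-the-rhs assignment using `.loc` might look like the following. The small frame and its index labels are made up for illustration (the real `data_nz` came from the poster's CSV). The rhs is reversed only to show that a Series is matched by index label, whereas `.values` would be assigned by position.

```python
import pandas as pd

# Hypothetical stand-in for data_nz; only the column names come from the thread.
data_nz = pd.DataFrame(
    {
        "units": ["KB/sec", "MB/sec", "KB/sec", "MB/sec"],
        "value": [990.0, 1.5, 988.8, 2.0],
    },
    index=[5, 7, 96, 100],
)

mask = data_nz["units"] == "KB/sec"

# Series on the rhs: pandas aligns on index labels before assigning.
# Reversing the rows shows that the order of the rhs does not matter.
rhs = (data_nz.loc[mask, "value"] / 1000.0).iloc[::-1]
data_nz.loc[mask, "value"] = rhs

print(data_nz)
```

Rows 5 and 96 end up with their own scaled values even though the rhs was supplied in reverse order, and the MB/sec rows are left untouched.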
 

On Tuesday, May 21, 2013 8:37:24 AM UTC-4, setrofim wrote:
I'm trying to adjust values of some rows in a data frame (selected through boolean indexing):

In [228]:

data_nz.ix[data_nz['units'] == 'KB/sec', 'value']

Out[228]:

5        990.0
96       988.8
161      981.3
265      935.5
314     1000.1
426      991.5
2359     858.6
2466     833.1
2523     858.0
2572     858.4
2621     860.7
2684     840.1
Name: value, dtype: float64
 
In [229]:
 
data_nz.ix[data_nz['units'] == 'KB/sec', 'value'] = data_nz.ix[data_nz['units'] == 'KB/sec', 'value'].values / 1000.0

but for some reason it looks like all rows get updated to the first value:

In [230]:

data_nz.ix[data_nz['units'] == 'KB/sec', 'value']

Out[230]:

5       0.99
96      0.99
161     0.99
265     0.99
314     0.99
426     0.99
2359    0.99
2466    0.99
2523    0.99
2572    0.99
2621    0.99
2684    0.99
Name: value, dtype: float64

Am I doing something wrong? I ran a simple test doing something similar on a fake data frame and I get the results I would expect:

In [22]: df = pandas.DataFrame({'x':range(10), 'y':range(10,20)})

In [23]: df
Out[23]:
   x   y
0  0  10
1  1  11
2  2  12
3  3  13
4  4  14
5  5  15
6  6  16
7  7  17
8  8  18
9  9  19

In [24]: df.ix[df.x % 2 == 0, 'y'] = df.ix[df.x % 2 == 0, 'y'].values * 100

In [25]: df
Out[25]:
   x     y
0  0  1000
1  1    11
2  2  1200
3  3    13
4  4  1400
5  5    15
6  6  1600
7  7    17
8  8  1800
9  9    19

Rows get adjusted to their corresponding values. I don't see a difference between what I'm doing in the test above and what I'm doing with my real data (the only difference I can think of is that the real data frame is constructed with the read_csv method). Any guidance would be greatly appreciated.

setrofim

May 21, 2013, 9:47:32 AM
to pyd...@googlegroups.com
Jeff,

Thank you for your response.


in any event, you should really do (then the data alignment is assured)
 
data_nz.ix[data_nz['units'] == 'KB/sec', 'value'] = data_nz.ix[data_nz['units'] == 'KB/sec', 'value'] / 1000.0

I have tried doing this, but am getting the same result:

In [233]:
data_nz.ix[data_nz['units'] == 'KB/sec', 'value'] = data_nz.ix[data_nz['units'] == 'KB/sec', 'value'] / 1000.0
data_nz.ix[data_nz['units'] == 'KB/sec', 'value']
Out[233]:
5       0.99
96      0.99
161     0.99
265     0.99
314     0.99
426     0.99
2359    0.99
2466    0.99
2523    0.99
2572    0.99
2621    0.99
2684    0.99
Name: value, dtype: float64

I have also verified that lhs and rhs are of equal length:

In [240]:
len(data_nz.ix[data_nz['units'] == 'KB/sec', 'value']) == len(data_nz.ix[data_nz['units'] == 'KB/sec', 'value'] / 1000.0)
 
Out[240]:
True
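
The length comparison above is bound to pass, since both sides are built from the same selection. A slightly more informative check, sketched here on made-up data with current pandas (`.loc` in place of `.ix`), compares the number of True values in the mask with the length of the rhs and also confirms that the index labels match.

```python
import pandas as pd

# Made-up data; only the column names follow the thread.
data_nz = pd.DataFrame(
    {"units": ["KB/sec", "MB/sec", "KB/sec"], "value": [990.0, 1.5, 988.8]},
    index=[5, 7, 96],
)

mask = data_nz["units"] == "KB/sec"
rhs = data_nz.loc[mask, "value"] / 1000.0

# Number of rows selected on the lhs versus length of the rhs.
same_length = int(mask.sum()) == len(rhs)
# Labels selected on the lhs versus labels carried by the rhs.
same_labels = data_nz.index[mask.to_numpy()].equals(rhs.index)

print(same_length, same_labels)
```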
 

Jeff

May 21, 2013, 10:05:29 AM
to pyd...@googlegroups.com
Can you post the frame (or a link if it's too big)
and your code to get to this point?

Something odd is going on.

setrofim

May 21, 2013, 10:25:14 AM
to pyd...@googlegroups.com
Jeff,

Please find the data attached. The following is the minimal code to reproduce the issue:

import pandas
data = pandas.read_csv('output.csv')
data_nz = data[data['value'] > 0]
data_nz.ix[data_nz['units'] == 'KB/sec', 'value'] = data_nz[data_nz['units'] == 'KB/sec']['value'] / 1000.0
wseq = data_nz.ix[data_nz['units'] == 'KB/sec', 'value']
print wseq.values

I'm using Python 2.7.3 and pandas 0.11.0.

Thanks for looking into this.
output.csv
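
The reproduction script above targets Python 2.7.3 and pandas 0.11.0. A rough equivalent for Python 3 and current pandas is sketched below, under several assumptions: the CSV text is invented and read from memory instead of the attached output.csv, `.loc` replaces the removed `.ix`, and a `.copy()` is added so the filtered frame is independent of `data`.

```python
import io

import pandas as pd

# Made-up CSV text standing in for the attached output.csv.
csv_text = """units,value
KB/sec,990.0
MB/sec,0.0
KB/sec,988.8
MB/sec,2.5
"""

data = pd.read_csv(io.StringIO(csv_text))
# .copy() gives data_nz its own data, so the assignment below
# cannot be mistaken for a write into a slice of `data`.
data_nz = data[data["value"] > 0].copy()

mask = data_nz["units"] == "KB/sec"
data_nz.loc[mask, "value"] = data_nz.loc[mask, "value"] / 1000.0

wseq = data_nz.loc[mask, "value"]
print(wseq.values)
```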

Jeff

May 21, 2013, 10:54:03 AM
to pyd...@googlegroups.com
This is a very specific case of a mixed-type frame not aligning with a Series input.

Here's a reproducible example.

Will fix and get it into 0.11.1.

Thanks,
 
Jeff
In [35]: df = pandas.DataFrame({'x':range(10), 'y':range(10,20),'z' : 'bar'})

In [36]: df.ix[df.x % 2 == 0, 'y'] = df.ix[df.x % 2 == 0, 'y'].values * 100

In [37]: df
Out[37]:
   x     y    z
0  0  1000  bar
1  1    11  bar
2  2  1000  bar
3  3    13  bar
4  4  1000  bar
5  5    15  bar
6  6  1000  bar
7  7    17  bar
8  8  1000  bar
9  9    19  bar
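
To check whether an installed pandas still shows this behaviour, the example above can be turned into a small script. This is a sketch for current pandas, so it uses `.loc` where the thread uses `.ix`; the `expected` list is simply each even-row `y` multiplied by 100.

```python
import pandas as pd

# Same mixed-type frame as in the example: two integer columns and one string column.
df = pd.DataFrame({"x": range(10), "y": range(10, 20), "z": "bar"})

mask = df["x"] % 2 == 0
df.loc[mask, "y"] = df.loc[mask, "y"].values * 100

# With the bug, every even row would hold 1000 instead of its own value.
expected = [1000, 11, 1200, 13, 1400, 15, 1600, 17, 1800, 19]
print(df["y"].tolist() == expected)
```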

setrofim

May 21, 2013, 11:07:29 AM
to pyd...@googlegroups.com
I see. Many thanks, Jeff. What would be a good workaround in the meantime?

Jeff

May 21, 2013, 11:14:05 AM
to pyd...@googlegroups.com
Set it as a single-column DataFrame:
 
In [7]: df = DataFrame({'x':range(10), 'y':range(10,20),'z' : 'bar'})

In [8]: x = DataFrame(dict(y = df.ix[df.x % 2 == 0, 'y'] * 100))

In [9]: df.ix[df.x % 2 == 0, 'y'] = x

In [10]: df
Out[10]:
   x     y    z
0  0  1000  bar
1  1    11  bar
2  2  1200  bar
3  3    13  bar
4  4  1400  bar
5  5    15  bar
6  6  1600  bar
7  7    17  bar
8  8  1800  bar
9  9    19  bar
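
This is not the workaround Jeff describes, but an alternative sketch that sidesteps indexed assignment into a mixed-type frame altogether: compute the whole column with `Series.where` and assign it back. It is offered as an assumption about a version-independent approach and has not been checked against pandas 0.11.0.

```python
import pandas as pd

df = pd.DataFrame({"x": range(10), "y": range(10, 20), "z": "bar"})

even = df["x"] % 2 == 0
# where() keeps values where the condition is True and takes the
# replacement elsewhere, so odd rows keep y and even rows get y * 100.
df["y"] = df["y"].where(~even, df["y"] * 100)

print(df)
```

Because both operands are full-length Series sharing the frame's index, no label alignment across a boolean selection is involved.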

setrofim

May 21, 2013, 11:34:53 AM
to pyd...@googlegroups.com
Ah cool, that makes sense (knew there'd be a better way than looping over all the indexes with set_value() calls).

Thanks again for all your help and rapid response, Jeff! I really appreciate it.

Cheers,
Sergei

Jeff

May 21, 2013, 12:15:47 PM
to pyd...@googlegroups.com
Here's the PR to fix it. Thanks, this was an untested case (believe it or not!).

Will be merged shortly.

Jeff

May 21, 2013, 2:26:12 PM
to pyd...@googlegroups.com
All merged into master. Please give it a try.

Skipper Seabold

May 21, 2013, 6:37:24 PM
to pyd...@googlegroups.com
Funnily enough, I just ran into this issue. Seems to be fixed in master. Thanks.



setrofim

May 22, 2013, 5:57:15 AM
to pyd...@googlegroups.com
Sorry for the delay in replying. Just tried master, works perfectly now. Thank you very much for the quick turnaround.