pandas dataframe mutability


Tim

Jul 22, 2012, 8:25:11 AM
to pyd...@googlegroups.com
Dear all:
I have the following code:
import pandas as pd
a = pd.DataFrame([[1,2,3],[3,2,1]])
b = a
b[0] *= 0.1 # now a changes too
c = a
c = c*0.1 # a does not change with c

I find this a bit confusing. Is it intentional? What are the mutability rules for a pandas data frame?

Thanks a lot!
Tim

Nathaniel Smith

Jul 24, 2012, 2:58:39 PM
to pyd...@googlegroups.com
This isn't actually anything specific to pandas -- what you're
observing is a side-effect of how Python works in general.

What you have to remember is that in Python, there are two different
things: variables (like "a"), and the actual underlying objects that
they refer to (like the chunk-of-memory that calling pd.DataFrame
allocated).
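
A minimal sketch of the same distinction using a plain Python list instead of a data frame, to show that nothing here depends on pandas. The list and its values are made up for illustration and are not from the original code:

```python
a = [1, 2, 3]
b = a                # b is a second name for the same list object
b[0] *= 10           # item assignment mutates the shared list
print(a)             # [10, 2, 3]

c = a                # c is a third name for the same list
c = c + [4]          # + builds a new list, then c is pointed at it
print(a)             # [10, 2, 3]
print(c is a)        # False
```

The `is` operator compares object identity, so it is a quick way to check whether two names currently refer to the same object.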

a = pd.DataFrame([[1,2,3],[3,2,1]])

This creates a new data frame object, and creates a new variable named
"a" that points to that object.

b = a

This creates a new variable named "b" that points to the same object
that "a" points to. (Note that the symmetry here is illusory -- the
right-hand side of an = refers to a *value*, a dataframe object in this
case, which is retrieved by evaluating the variable "a". The left-hand
side of an = refers to a *location*, the variable named "b" in this
case.)

b[0] *= 0.1 # now a changes too

This modifies the object that "b" points to. Since "a" points to the
same object, the change is also visible there.
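
One way to check this for yourself, as a rough sketch built on the code from the question (the name same_object is just a helper added here for illustration):

```python
import pandas as pd

a = pd.DataFrame([[1, 2, 3], [3, 2, 1]])
b = a                    # no copy is made; b and a name one object
same_object = b is a     # identity check, True here

b[0] *= 0.1              # writes a new column 0 into that one object
print(same_object)
print(a[0].tolist())     # column 0 as seen through "a" is scaled too
```

Since there is only one data frame involved, it does not matter which of the two names you use to look at it afterwards.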

c = a

This creates a new variable named "c" that points to the same object
that "a" points to.

c = c*0.1 # a does not change with c

This code first evaluates "c * 0.1", which looks up the data frame in
"c", then calls its __mul__ method, which returns an entirely new data
frame object. Then it *re-binds* the variable named "c", so that it
points to this new object instead of the one that "a" points to.
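
A small sketch of that last step, again using the data frame from the question (the names rebound and a_unchanged are helpers added here for illustration):

```python
import pandas as pd

a = pd.DataFrame([[1, 2, 3], [3, 2, 1]])
c = a                    # c and a name the same object at this point
c = c * 0.1              # the product is a new object; c now names it

rebound = c is not a     # True: the two names refer to different objects
# Compare "a" against a freshly built frame with the original values.
a_unchanged = a.equals(pd.DataFrame([[1, 2, 3], [3, 2, 1]]))
print(rebound, a_unchanged)
```

The assignment only changes what "c" refers to; the object that "a" refers to is never touched.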

Hopefully that makes it obvious why you're seeing what you're seeing...

-n