DataFrames: replace function?


Alexander Flyax

Jan 5, 2015, 6:47:23 PM
to julia...@googlegroups.com
In Julia's DataFrames, is there an equivalent of Pandas' replace()? If not, what's the most efficient way to accomplish its equivalent? Thanks...

E.g. (from Python/Pandas):

print(dating_df.head(3))
   miles     games  ice_cream  opinion
0  40920  8.326976   0.953952        3
1  14488  7.153469   1.673904        2
2  26052  1.441871   0.805124        1

dating_df.opinion.replace({1:'disliked', 2:'OK', 3:'liked'}, inplace=True)
print(dating_df.head(3))

   miles     games  ice_cream   opinion
0  40920  8.326976   0.953952     liked
1  14488  7.153469   1.673904        OK
2  26052  1.441871   0.805124  disliked

Andrew Ellis

Jan 6, 2015, 6:14:31 PM
to julia...@googlegroups.com
In the absence of a built-in replace function, this might work:

using Compat, DataFrames, DataFramesMeta

mapping = @Compat.Dict(1 => "a", 2 => "b", 3 => "c")

df[:opinion] = @with df begin
    :opinion = [mapping[i] for i in :opinion]
end

The @with isn't really necessary; you could just write:

df[:opinion] = [mapping[i] for i in df[:opinion]]

Tom Short

Jan 6, 2015, 8:24:01 PM
to julia...@googlegroups.com
Another option is to convert the column to a PooledDataArray and then create a new PooledDataArray using the refs from the first PooledDataArray and a new pool that does the mapping you want.

Also, you can write Andrew's @with construct more concisely:

df[:opinion] = @with df [mapping[i] for i in :opinion]
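
The pooled-array idea above can be sketched without any packages. This is only an illustration in plain Julia 1.x syntax, not the PooledDataArray API itself: the names opinion, pool, refs, and new_pool are hypothetical stand-ins for what a pooled column stores internally (a list of distinct values plus one index per row).

```julia
# Hypothetical stand-in for a pooled column: `pool` holds the distinct
# values and `refs` holds, for each row, an index into `pool`.
opinion = [3, 2, 1, 3, 2]

pool = sort(unique(opinion))                      # distinct values: [1, 2, 3]
refs = [findfirst(==(v), pool) for v in opinion]  # row -> index into pool

mapping = Dict(1 => "disliked", 2 => "OK", 3 => "liked")

# Relabel the pool once, then reuse the untouched refs to rebuild the column.
new_pool = [mapping[v] for v in pool]
relabeled = new_pool[refs]

println(relabeled)
```

The point of this design is that the mapping is applied once per distinct value rather than once per row, which is where the potential efficiency gain comes from on long columns with few levels.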



Alexander Flyax

Jan 7, 2015, 10:26:36 AM
to julia...@googlegroups.com
Thanks, Andrew and tshort. I thought of a similar solution using:
dating_df[:opinion] = [ {1 => "bad", 2 => "OK", 3 => "good"}[i] for i in dating_df[:opinion] ]
The problem is when I feed it only, e.g., {1 => "bad", 2 => "OK", 3 => "good"}. Then, as expected, I get: key not found: 3. In Pandas you can simply call df.replace({0: np.nan}).

I can write a for loop that will check for each value in dating_df[:opinion] or a function that will use that for loop, but I am just wondering if there is a more elegant solution. Something like:
mapping = {1 => "bad", 2 => "OK", 3 => "good"}
dating_df[:opinion] = [ mapping[i] for i in dating_df[:opinion] if i in mapping]
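
One way to handle keys that are missing from the mapping, sketched here in plain Julia 1.x syntax (newer than the dict-literal syntax used in this thread), is to look values up with get and fall back to the original value. The vector opinion below is a hypothetical stand-in for the data frame column. The second form assumes a Julia version where Base provides replace for collections (1.0 or later), which leaves unmatched values alone.

```julia
# Hypothetical stand-in for dating_df[:opinion].
opinion = [3, 2, 1]

# Partial mapping: the key 3 is deliberately left out.
mapping = Dict(1 => "bad", 2 => "OK")

# get(dict, key, default) returns the default instead of throwing
# when the key is absent, so unmapped values pass through unchanged.
replaced = [get(mapping, i, i) for i in opinion]

# Assumption: Julia 1.0+, where Base.replace accepts old => new pairs
# for collections and keeps values that match no pair.
replaced2 = replace(opinion, 1 => "bad", 2 => "OK")

println(replaced)
println(replaced2)
```

Note that the result mixes integers and strings, so it is a new vector with a wider element type rather than an in-place update of the integer column.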

Alexander Flyax

Jan 7, 2015, 10:27:24 AM
to julia...@googlegroups.com
Sorry, I meant if I feed it only {1 => "bad", 2 => "OK"} ...

Alexander Flyax

Jan 7, 2015, 11:17:18 AM
to julia...@googlegroups.com
Actually, it turns out I can't write a for loop either, because:

`convert` has no method matching convert(::Type{Int64}, ::ASCIIString)

(I am trying to say df[:col][i] = "text".)
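
The error above comes from the column's element type: a vector of Int64 cannot store a string, so element-wise assignment fails. A minimal sketch of the usual workaround, in plain Julia 1.x syntax with hypothetical names col and wide, is to copy the values into a vector whose element type admits both, fill that, and then assign the whole vector back to the column.

```julia
# Hypothetical stand-in for df[:col], an integer column.
col = [3, 2, 1]

# Writing a string into an Int vector fails because Julia tries to
# convert the string to Int64 and no such conversion method exists.
failed = try
    col[1] = "text"
    false
catch err
    err isa MethodError
end

# Copy into a vector whose element type allows both integers and strings.
wide = Vector{Union{Int,String}}(col)
wide[1] = "text"

println(failed)
println(wide)
```

Replacing the whole column with the widened vector (in this thread's syntax, something like df[:col] = wide) avoids the conversion, since the column is swapped out rather than written into element by element.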