Correctly accessing DataFrame columns in loop


Nils Gudat

May 8, 2015, 2:27:18 PM
to julia...@googlegroups.com
I've been trying to rename a bunch of columns in a DataFrame and as a former pandas user am a bit thrown off by the way columns are accessed using symbols.
Let's say I have a list of column names (strings) that I want to rename to a common name with ascending integers at the end. After a few unsuccessful attempts, I came up with the following, but I feel like there has to be a less ugly way of doing this:

df = DataFrame(A = rand(5), B = rand(5), C = rand(5), D = rand(5))
varlist = ["B", "C"]


for i = 1:length(varlist)
  rename!(df, convert(Symbol, varlist[i]), convert(Symbol, "column_"*string(i)))
end

Tomas Lycken

May 8, 2015, 2:34:30 PM
to julia...@googlegroups.com

Since you’re in control of varlist, you could simplify this a little by using symbols from the beginning. Next, you can use the symbol function together with the multi-argument form of string, instead of convert, to make the concat-convert operation a little less ugly:

julia> using DataFrames

julia> df = DataFrame(A = rand(5), B = rand(5), C = rand(5), D = rand(5))
5x4 DataFrames.DataFrame
| Row | A        | B        | C        | D        |
|-----|----------|----------|----------|----------|
| 1   | 0.550768 | 0.464531 | 0.141101 | 0.754492 |
| 2   | 0.629269 | 0.100223 | 0.981175 | 0.035041 |
| 3   | 0.26019  | 0.962588 | 0.948283 | 0.51513  |
| 4   | 0.755892 | 0.202503 | 0.727609 | 0.255172 |
| 5   | 0.28018  | 0.328776 | 0.684717 | 0.502154 |

julia> varlist = [:B,:C]
2-element Array{Symbol,1}:
 :B
 :C

julia> for i in eachindex(varlist)
           rename!(df, varlist[i], symbol(string(varlist[i], i)))
       end

julia> df
5x4 DataFrames.DataFrame
| Row | A        | B1       | C2       | D        |
|-----|----------|----------|----------|----------|
| 1   | 0.550768 | 0.464531 | 0.141101 | 0.754492 |
| 2   | 0.629269 | 0.100223 | 0.981175 | 0.035041 |
| 3   | 0.26019  | 0.962588 | 0.948283 | 0.51513  |
| 4   | 0.755892 | 0.202503 | 0.727609 | 0.255172 |
| 5   | 0.28018  | 0.328776 | 0.684717 | 0.502154 |

I usually think of symbols as "strings with some constraints", and using one instead of the other then becomes quite trivial, conceptually.

// T
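
To make the "strings with some constraints" view concrete, here is a minimal sketch of moving between strings and symbols. It assumes a current Julia (1.x), where the constructor is spelled Symbol with a capital S; the lowercase symbol function used above was the spelling at the time of this thread. The variable names are only for illustration.

```julia
# Assumption: Julia 1.x, where `Symbol` replaces the older lowercase `symbol`.
name = "B"

sym  = Symbol(name)   # string -> symbol
back = string(sym)    # symbol -> string

# With several arguments, Symbol concatenates their string forms first,
# so no separate string(...) call is needed.
newname = Symbol("column_", 1)
```

Building the new name in a single call is what keeps the rename loop short: the string concatenation and the conversion happen in one step.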

Nils Gudat

May 9, 2015, 9:51:42 AM
to julia...@googlegroups.com
Thanks, that does look a bit better. Just for clarification: eachindex is only available from 0.4 onwards, correct?

Tim Holy

May 9, 2015, 11:00:51 AM
to julia...@googlegroups.com
On Saturday, May 09, 2015 06:51:42 AM Nils Gudat wrote:
> Thanks, that does look a bit better. Just for clarification: eachindex is
> only available from 0.4 onwards, correct?

It's also available if you're "using Compat".

--Tim
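
For anyone reading this on a current setup, the same rename can be sketched without eachindex at all by iterating with enumerate. This is only a sketch and assumes Julia 1.x with DataFrames 1.x, where rename! accepts a vector of old => new pairs; the name renames is made up for the example. It keeps varlist as strings and produces the column_1, column_2 names from the original question.

```julia
using DataFrames

df = DataFrame(A = rand(5), B = rand(5), C = rand(5), D = rand(5))
varlist = ["B", "C"]

# enumerate yields (index, value) tuples, so no explicit indexing is needed.
# Assumption: DataFrames 1.x, where rename! takes a vector of old => new pairs.
renames = [Symbol(v) => Symbol("column_", i) for (i, v) in enumerate(varlist)]

rename!(df, renames)
```

Collecting the pairs first and calling rename! once means all the renames are applied in a single call rather than one per loop iteration.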
