question about DataFrames package


cnbiz850

May 1, 2014, 12:56:54 AM
to julia...@googlegroups.com
My question is what is a symbol as in the following warning message?

=========
julia> df = DataFrame(A = 1:10, B = 2:2:20)
10x2 DataFrame
|-------|----|----|
| Row # | A  | B  |
| 1     | 1  | 2  |
| 2     | 2  | 4  |
| 3     | 3  | 6  |
| 4     | 4  | 8  |
| 5     | 5  | 10 |
| 6     | 6  | 12 |
| 7     | 7  | 14 |
| 8     | 8  | 16 |
| 9     | 9  | 18 |
| 10    | 10 | 20 |

julia> df["A"]
WARNING: indexing DataFrames with strings is deprecated; use symbols instead
in depwarn at deprecated.jl:36
in getindex at ~/.julia/DataFrames/src/deprecated.jl:107
10-element DataArray{Int64,1}:
1
2
3
4
5
6
7
8
9
10



Ivar Nesje

May 1, 2014, 1:24:05 AM
to julia...@googlegroups.com
Symbols in Julia are a special form of strings that are faster for some operations (like comparisons) and much slower for others. Julia uses symbols internally to represent variable names.

You can create a symbol from a string with the `symbol("A")` function, or write one literally with the `:A` syntax.
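
To make this concrete, here is a minimal sketch in present-day Julia 1.x, where the constructor is spelled `Symbol("A")` (the lowercase `symbol` above is the 2014-era name). It uses only Base Julia: a plain `Dict` keyed by symbols stands in for column lookup, and the data is made up rather than produced by DataFrames.

```julia
# Two ways to make the same symbol: from a string at run time, or as a literal.
s1 = Symbol("A")
s2 = :A

# Symbols are interned: equal names give the very same object, so an
# identity check (===) is enough to compare them, which keeps comparison cheap.
println(s1 === s2)        # true

# Getting the text back requires an explicit conversion.
println(string(s1))       # A

# Julia's parser stores variable names as symbols: in this quoted
# expression, ex.args holds the symbols :+, :x and :y.
ex = :(x + y)

# A Dict keyed by symbols mimics looking up a column by name (made-up data).
columns = Dict(:A => collect(1:10), :B => collect(2:2:20))
println(columns[:A] == collect(1:10))   # true
```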

Tomas Lycken

May 2, 2014, 3:14:48 AM
to julia...@googlegroups.com
Sorry to hijack the thread, but since I stumbled over this problem myself (in the same context) and didn't know about the `symbol("A")` syntax, this seems like an appropriate place to ask:

In the dataframe I was working with, I had one column named "R", and another which I wanted to name "<R>". Using :R was no problem, but it's not possible to refer to :<R> at all. (Try it in the REPL - it parses it as an incomplete expression, and if I add something after I get an error "R not defined"...)

I think it's cool that it's possible to define symbols from arbitrary strings using e.g. `symbol("<R>")`, but it's kind of clunky that you can't refer to them with the colon syntax once they're defined. Is there a way around this, or do I have to simply "deal with it"? =)
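
A sketch of this situation using only Base Julia 1.x (no DataFrames): `Symbol("<R>")` is a perfectly good symbol, but `Base.isidentifier` reports that its text is not a valid identifier, which is why no `:name` literal can spell it. The last lines use the `var"..."` identifier syntax accepted by Julia 1.3 and later; that syntax postdates this thread, and the `Dict` column store is only a stand-in, not the DataFrames API.

```julia
# Symbol() accepts any string, so a non-identifier column name is still a symbol.
odd = Symbol("<R>")
plain = :R

println(string(odd))                 # <R>
println(Base.isidentifier(plain))    # true
println(Base.isidentifier(odd))      # false

# Stand-in column store keyed by symbols: both names work as keys,
# but only :R can be written with the colon literal syntax.
columns = Dict(plain => [1.0, 2.0, 3.0], odd => [2.0, 2.0, 2.0])
println(columns[Symbol("<R>")])      # [2.0, 2.0, 2.0]

# Julia 1.3+: var"..." lets source code use a non-identifier name directly.
var"<R>" = 42
println(var"<R>")                    # 42
```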

// T

Stefan Karpinski

May 2, 2014, 3:50:14 AM
to Julia Users
We talked at some point about making :"<R>" syntax for symbol("<R>") and requiring something like :("<R>") to express the fairly useless operation of quoting the string "<R>" (this is useless because the result is just the string "<R>"). Barring some problem with this that I'm not thinking of, I'd be in favor of such a change.
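
The claim that quoting a string literal is a no-op can be checked directly. This is a minimal sketch assuming Julia 1.x; it uses the parenthesized `:("<R>")` form so that nothing depends on how a bare `:"<R>"` is parsed, and it shows current behavior, not the proposed syntax.

```julia
# Quoting a literal string does nothing useful: the result is just the string.
quoted = :("<R>")
println(quoted isa String)        # true
println(quoted == "<R>")          # true

# Contrast with a genuine symbol built from the same characters.
sym = Symbol("<R>")
println(sym isa Symbol)           # true
println(sym == "<R>")             # false: a Symbol never equals a String
println(string(sym) == quoted)    # true once converted back to a string
```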

Tomas Lycken

May 2, 2014, 4:03:24 AM
to julia...@googlegroups.com
:"<R>" for symbol("<R>") makes sense to me, so if it's not in the way of anything else, I'm all for it.

And yeah, :"<R>" == "<R>" returns true, so I don't see how this could make anything impossible that is possible today. I guess if there's code out there that quotes literal strings like that, it'll break, but I doubt there's a lot of it... I have no idea how such a change would be implemented, though, so I'm afraid I won't be of much help making it happen.

// T

John Myles White

May 2, 2014, 10:08:37 AM
to julia...@googlegroups.com
I would discourage using <R> as the name of a column in a DataFrame.

Part of the reason we’re using symbols now is that it will encourage people to use column names that are valid Julia variable names. If you stick to valid variable names, you’ll always be able to use metaprogramming tools like those employed to generate formulas for GLMs.
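
To illustrate the point without depending on GLM or its formula machinery, here is a hypothetical helper, `formula_expr`, that assembles a formula-like expression `response ~ predictors` from column-name symbols. It is a sketch under the assumption that a formula is just an `Expr` whose operands are column names; it rejects names for which `Base.isidentifier` is false, since an expression containing them could never be typed as ordinary source code.

```julia
# Hypothetical helper: build `response ~ p1 + p2 + ...` from column-name symbols,
# refusing names that are not valid Julia identifiers.
function formula_expr(response::Symbol, predictors::Vector{Symbol})
    for name in (response, predictors...)
        Base.isidentifier(name) ||
            throw(ArgumentError("column name $(repr(name)) is not a valid Julia identifier"))
    end
    # One predictor stands alone; several are joined with +.
    rhs = length(predictors) == 1 ? predictors[1] : Expr(:call, :+, predictors...)
    return Expr(:call, :~, response, rhs)
end

ex = formula_expr(:B, [:A])
println(ex == :(B ~ A))       # true: the same expression you could type in source

# A non-identifier name has no source-level spelling, so it is rejected.
try
    formula_expr(Symbol("<R>"), [:R])
catch err
    println(err isa ArgumentError)   # true
end
```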

 — John

Harlan Harris

May 2, 2014, 11:21:49 AM
to julia...@googlegroups.com
What John said. That said, an old idea for the DF design would be to include additional metadata for each column, which could include things like an arbitrary Unicode pretty name that's not restricted to valid variable name strings.
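
The metadata idea could look roughly like this sketch. `LabeledColumn` and `header` are hypothetical names invented for illustration, not part of DataFrames: code addresses each column through an identifier-safe symbol, while a free-form label (here the `<R>` name from earlier in the thread) is kept only for display.

```julia
# Hypothetical per-column record: the data plus a free-form display label.
struct LabeledColumn{T}
    data::Vector{T}
    label::String        # arbitrary text, not restricted to valid identifiers
end

# Code refers to columns by identifier-safe symbols...
table = Dict{Symbol, LabeledColumn}(
    :R      => LabeledColumn([1.0, 2.0, 3.0], "R"),
    :R_mean => LabeledColumn([2.0, 2.0, 2.0], "<R>"),
)

# ...while a display routine can look up the pretty label instead.
header(t, name::Symbol) = t[name].label

println(header(table, :R_mean))          # <R>
println(Base.isidentifier(:R_mean))      # true
```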

Stefan Karpinski

May 2, 2014, 11:23:57 AM
to Julia Users
Also, at some point one may be able to write df.foo instead of df[:foo].
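
Dot access of this kind works through `getproperty`: since Julia 0.7, `x.foo` is lowered to `getproperty(x, :foo)`, and current DataFrames.jl does support `df.foo`. The sketch below shows the mechanism on a hypothetical `MiniTable` type rather than on DataFrames itself.

```julia
# Hypothetical minimal table type: columns stored in a Dict keyed by Symbol.
struct MiniTable
    columns::Dict{Symbol, Vector}
end

# t[:foo] style: ordinary indexing by symbol.
Base.getindex(t::MiniTable, name::Symbol) = getfield(t, :columns)[name]

# t.foo style: `t.foo` lowers to getproperty(t, :foo), so forwarding
# getproperty to the same lookup turns field syntax into column access.
Base.getproperty(t::MiniTable, name::Symbol) = getfield(t, :columns)[name]

t = MiniTable(Dict{Symbol, Vector}(:foo => [1, 2, 3], :bar => [4, 5, 6]))
println(t.foo == t[:foo])     # true
```

Inside the methods, the real field has to be read with `getfield`, because a plain `t.columns` would now be routed through the overloaded `getproperty` and treated as a lookup of a column named `columns`.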