question about DataFrames package

cnbiz850

May 1, 2014, 12:56:54 AM
to julia...@googlegroups.com
My question is: what is a symbol, as mentioned in the following warning message?

=========
julia> df = DataFrame(A = 1:10, B = 2:2:20)
10x2 DataFrame
|-------|----|----|
| Row # | A | B |
| 1 | 1 | 2 |
| 2 | 2 | 4 |
| 3 | 3 | 6 |
| 4 | 4 | 8 |
| 5 | 5 | 10 |
| 6 | 6 | 12 |
| 7 | 7 | 14 |
| 8 | 8 | 16 |
| 9 | 9 | 18 |
| 10 | 10 | 20 |

julia> df["A"]
WARNING: indexing DataFrames with strings is deprecated; use symbols instead
in depwarn at deprecated.jl:36
in getindex at ~/.julia/DataFrames/src/deprecated.jl:107
10-element DataArray{Int64,1}:
1
2
3
4
5
6
7
8
9
10



Ivar Nesje

May 1, 2014, 1:24:05 AM
to julia...@googlegroups.com
Symbols in Julia are a special, string-like type that is faster than strings for some operations (like comparisons) and much slower for others. Julia uses symbols internally to represent variable names.

You create a symbol from a string with the symbol("A") function, or the :A syntax.
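
A minimal sketch of the two spellings, assuming a current Julia release where the constructor is written `Symbol` (the lowercase `symbol` used in this thread is the older name). It only uses Base, so no DataFrames install is needed:

```julia
# Assumption: current Julia, where the constructor is `Symbol`;
# the lowercase `symbol` used in this thread is the older spelling.
a = :A               # literal (quote) syntax
b = Symbol("A")      # constructed from a string

@assert a === b                  # same name, same symbol object
@assert a isa Symbol
@assert string(a) == "A"         # convert back to a string
@assert !(a isa AbstractString)  # a Symbol is not itself a string type
```

With the DataFrames version shown in the warning above, either form should be usable as a column index, e.g. `df[:A]`.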

Tomas Lycken

May 2, 2014, 3:14:48 AM
to julia...@googlegroups.com
Sorry to hijack the thread, but since I stumbled over this problem myself (in the same context) and didn't know about the `symbol("A")` syntax, this seems like an appropriate place to ask:

In the dataframe I was working with, I had one column named "R", and another which I wanted to name "<R>". Using :R was no problem, but it's not possible to refer to :<R> at all. (Try it in the REPL - it parses it as an incomplete expression, and if I add something after I get an error "R not defined"...)

I think it's cool that it's possible to define symbols from arbitrary strings using e.g. `symbol("<R>")`, but it's kind of clunky that you can't refer to them with the colon syntax once they're defined. Is there a way around this, or do I have to simply "deal with it"? =)
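
One possible way to soften the clunkiness is to build the awkward symbol once and keep it in a variable. This is only a sketch: `columns` is a hypothetical stand-in for a DataFrame's column lookup, `r_mean` is an illustrative name, and the code assumes a current Julia release where the constructor is `Symbol`.

```julia
# Assumption: current Julia (`Symbol`, not the older `symbol`).
# `columns` is a hypothetical stand-in for a DataFrame's column lookup.
r_mean = Symbol("<R>")          # build the awkward name once

columns = Dict(:R => [1.0, 2.0], r_mean => [1.5, 1.5])

@assert columns[r_mean] == [1.5, 1.5]   # index via the variable
@assert columns[:R] == [1.0, 2.0]       # plain names still use :R
@assert r_mean !== :R                   # the two names are distinct symbols
@assert string(r_mean) == "<R>"
```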

// T

Stefan Karpinski

May 2, 2014, 3:50:14 AM
to Julia Users
We talked at some point about making :"<R>" the syntax for symbol("<R>") and requiring something like :("<R>") to express the fairly useless operation of quoting the string "<R>" (this is useless because the result is just the string "<R>"). Barring some problem with this that I'm not thinking of, I'd be in favor of such a change.
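
To make the "useless operation" concrete, here is a small sketch of what quoting a literal does. It assumes the behaviour described in this thread still holds in the Julia version you run, and it uses the current `Symbol` spelling of the constructor:

```julia
# Quoting a literal just returns the literal itself.
@assert :("<R>") == "<R>"        # parenthesised quote of a string
@assert :"<R>" == "<R>"          # bare-colon form, as discussed in the thread
@assert :("<R>") isa String
@assert :(42) === 42             # the same holds for other literals

# A symbol has to be constructed explicitly, and it is not equal to the string.
@assert Symbol("<R>") isa Symbol
@assert Symbol("<R>") != "<R>"
```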

Tomas Lycken

May 2, 2014, 4:03:24 AM
to julia...@googlegroups.com
:"<R>" for symbol("<R>") makes sense to me, so if it's not in the way of anything else, I'm all for it.

And yeah, :"<R>" == "<R>" returns true, so I don't see how this change could make anything impossible that is possible today. I guess if there's code out there that quotes literal strings like that, it'll break, but I doubt there's a lot of it... I have no idea how such a change would be implemented, though, so I'm afraid I won't be of much help making it happen.

// T

John Myles White

May 2, 2014, 10:08:37 AM
to julia...@googlegroups.com
I would discourage using <R> as the name of a column in a DataFrame.

Part of the reason we’re using symbols now is that it will encourage people to use column names that are valid Julia variable names. If you stick to valid variable names, you’ll always be able to use metaprogramming tools like those employed to generate formulas for GLMs.
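
As a rough illustration of the difference, the sketch below checks candidate column names with `Base.isidentifier`, an unexported Base helper in current Julia, and shows a valid name appearing inside a quoted formula-like expression. The expression `y ~ R` is illustrative only and is not tied to any particular GLM package API:

```julia
# Assumption: current Julia, where the unexported helper
# `Base.isidentifier` is available.
@assert Base.isidentifier("R")       # usable as a variable name
@assert !Base.isidentifier("<R>")    # not a valid variable name
@assert Base.isidentifier(:foo)      # also accepts symbols

# A valid name can be written directly inside a quoted expression,
# which is what formula-style metaprogramming relies on.
ex = :(y ~ R)
@assert ex isa Expr
@assert :R in ex.args                # the column name appears as a plain symbol
```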

 — John

Harlan Harris

May 2, 2014, 11:21:49 AM
to julia...@googlegroups.com
What John said. That said, an old idea for the DF design would be to include additional metadata for each column, which could include things like an arbitrary Unicode pretty name that's not restricted to valid variable name strings.

Stefan Karpinski

May 2, 2014, 11:23:57 AM
to Julia Users
Also, at some point one may be able to write df.foo instead of df[:foo].
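
A sketch of how dot access can be mapped onto symbol lookup, assuming a Julia version that allows overloading `Base.getproperty` (0.7 and later). `ToyFrame` is a hypothetical toy type made up for this example and is not the DataFrames implementation:

```julia
# Hypothetical toy container; not the real DataFrame type.
struct ToyFrame
    columns::Dict{Symbol,Vector{Int}}
end

# Index with a symbol, as in df[:foo].
Base.getindex(tf::ToyFrame, name::Symbol) = getfield(tf, :columns)[name]

# Forward tf.foo to the column stored under :foo.
# getfield is used internally because tf.columns would recurse into this method.
Base.getproperty(tf::ToyFrame, name::Symbol) = getfield(tf, :columns)[name]

tf = ToyFrame(Dict(:foo => [1, 2, 3]))
@assert tf[:foo] == [1, 2, 3]
@assert tf.foo == tf[:foo]
```

Dot access only works for names that are valid identifiers, which is another reason to prefer such column names.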