reshaping dataframe to make rows columns?

132 views
Skip to first unread message

Reuben

unread,
Sep 4, 2016, 3:49:35 PM9/4/16
to julia-stats
Seems like this should be easy, probably is, but I cannot figure out how to get it to work with the melt / stack functions. The reason I want to is, I'm finding that with the data I have, I search for a the row where the string in column 1 == myparameter, then I get the values for columns 2-6, clumsily extract them into an array (because a dataframe row is not a dataArray), then apply the mean function to this array. It would be a lot simpler to have column names that were based on my parameter values. Then i could say "mean(df[:parameter][2:6])" and be done.

I suspect I am missing out on how to use dataframes to make this easy; can someone point me in the right direction?

-Reuben

Andreas Noack

unread,
Sep 4, 2016, 4:28:07 PM9/4/16
to julia...@googlegroups.com
Would the by function help here? E.g. something like

julia> df = DataFrame(ct = vcat(fill("A", 3), fill("B", 5)), x = randn(8))
8×2 DataFrames.DataFrame
│ Row │ ct  │ x         │
├─────┼─────┼───────────┤
│ 1   │ "A" │ -1.17715  │
│ 2   │ "A" │ 0.781145  │
│ 3   │ "A" │ 0.74948   │
│ 4   │ "B" │ -1.88212  │
│ 5   │ "B" │ 1.30658   │
│ 6   │ "B" │ -0.578074 │
│ 7   │ "B" │ -0.710504 │
│ 8   │ "B" │ -1.98858  │

julia> by(df, :ct, d -> mean(d[:x]))
2×2 DataFrames.DataFrame
│ Row │ ct  │ x1        │
├─────┼─────┼───────────┤
│ 1   │ "A" │ 0.117825  │
│ 2   │ "B" │ -0.770539 │

--
You received this message because you are subscribed to the Google Groups "julia-stats" group.
To unsubscribe from this group and stop receiving emails from it, send an email to julia-stats+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reuben

unread,
Sep 4, 2016, 7:18:20 PM9/4/16
to julia-stats
That isn't exactly what I'm looking for...what i'm trying to do is transform something like the first one here to the second:

│ Row │ ct  │ x         │
├─────┼─────┼───────────┤
│ 1   │ "A" │ -1.17715  │
│ 2   │ "B" │ 0.781145  │
│ 3   │ "C" │ 0.74948   │
│ 4   │ "D" │ -1.88212  │
│ 5   │ "E" │ 1.30658   │

│ Row │ ct  │ "A"   │"B"             | "C"         | "D"        | "E"
├───┼─────┼───────|──-------┤------------|--------------
│ 1   │ -1.17715     │   0.781145 | 0.74948|-1.88212 | 1.30658 

Michael Borregaard

unread,
Sep 5, 2016, 5:39:13 AM9/5/16
to julia-stats
I wonder if the functions are broken. This:
using DataFrames, RDatasets
iris = dataset("datasets", "iris")
iris[:id] = 1:size(iris, 1)
longdf = melt(iris, :id)
widedf = unstack(longdf, :id, :variable, :value)

from http://dataframesjl.readthedocs.io/en/latest/reshaping_and_pivoting.html fails with
ERROR: MethodError: Cannot `convert` an object of type String to an object of type Float64
This may have arisen from a call to the constructor Float64(...),
since type constructors fall back to convert methods
.
 
in setindex!(::DataArrays.DataArray{Float64,1}, ::String, ::Int64) at /Users/michael/.julia/v0.5/DataArrays/src/indexing.jl:217
 
in unstack(::DataFrames.DataFrame, ::Int64, ::Int64, ::Int64) at /Users/michael/.julia/v0.5/DataFrames/src/abstractdataframe/reshape.jl:183
 
in unstack(::DataFrames.DataFrame, ::Symbol, ::Symbol, ::Symbol) at /Users/michael/.julia/v0.5/DataFrames/src/abstractdataframe/reshape.jl:188

I have opened an issue at the repo.



Michael Borregaard

unread,
Sep 13, 2016, 3:50:11 AM9/13/16
to julia-stats
The functions in DataFrames appear to be working now.

To do what you want to do with unstack() you need to define an id variable, in this case (where you only want one row in the result) as a vector of identical values. Example:

using DataFrames
df
= DataFrame(ct = ["a", "b", "c", "d"], x = randn(4), id = ones(Int, 4))
unstack
(df, :id, :ct, :x)

Reply all
Reply to author
Forward
0 new messages