reshaping dataframe to make rows columns?

Reuben

unread,

Sep 4, 2016, 3:49:35 PM9/4/16

to julia-stats

Seems like this should be easy, probably is, but I cannot figure out how to get it to work with the melt / stack functions. The reason I want to is, I'm finding that with the data I have, I search for a the row where the string in column 1 == myparameter, then I get the values for columns 2-6, clumsily extract them into an array (because a dataframe row is not a dataArray), then apply the mean function to this array. It would be a lot simpler to have column names that were based on my parameter values. Then i could say "mean(df[:parameter][2:6])" and be done.

I suspect I am missing out on how to use dataframes to make this easy; can someone point me in the right direction?

-Reuben

Andreas Noack

unread,

Sep 4, 2016, 4:28:07 PM9/4/16

to julia...@googlegroups.com

Would the by function help here? E.g. something like

julia> df = DataFrame(ct = vcat(fill("A", 3), fill("B", 5)), x = randn(8))

8×2 DataFrames.DataFrame

│ Row │ ct │ x │

├─────┼─────┼───────────┤

│ 1 │ "A" │ -1.17715 │

│ 2 │ "A" │ 0.781145 │

│ 3 │ "A" │ 0.74948 │

│ 4 │ "B" │ -1.88212 │

│ 5 │ "B" │ 1.30658 │

│ 6 │ "B" │ -0.578074 │

│ 7 │ "B" │ -0.710504 │

│ 8 │ "B" │ -1.98858 │

julia> by(df, :ct, d -> mean(d[:x]))

2×2 DataFrames.DataFrame

│ Row │ ct │ x1 │

├─────┼─────┼───────────┤

│ 1 │ "A" │ 0.117825 │

│ 2 │ "B" │ -0.770539 │

--
You received this message because you are subscribed to the Google Groups "julia-stats" group.
To unsubscribe from this group and stop receiving emails from it, send an email to julia-stats+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reuben

unread,

Sep 4, 2016, 7:18:20 PM9/4/16

to julia-stats

That isn't exactly what I'm looking for...what i'm trying to do is transform something like the first one here to the second:

│ Row │ ct │ x │

├─────┼─────┼───────────┤

│ 1 │ "A" │ -1.17715 │

│ 2 │ "B" │ 0.781145 │

│ 3 │ "C" │ 0.74948 │

│ 4 │ "D" │ -1.88212 │

│ 5 │ "E" │ 1.30658 │

│ Row │ ct │ "A" │"B" | "C" | "D" | "E"

├───┼─────┼───────|──-------┤------------|--------------

│ 1 │ -1.17715 │ 0.781145 | 0.74948|-1.88212 | 1.30658

Michael Borregaard

unread,

Sep 5, 2016, 5:39:13 AM9/5/16

to julia-stats

I wonder if the functions are broken. This:

using DataFrames, RDatasets
iris = dataset("datasets", "iris")
iris[:id] = 1:size(iris, 1)
longdf = melt(iris, :id)
widedf = unstack(longdf, :id, :variable, :value)

from http://dataframesjl.readthedocs.io/en/latest/reshaping_and_pivoting.html fails with

ERROR: MethodError: Cannot `convert` an object of type String to an object of type Float64
This may have arisen from a call to the constructor Float64(...),
since type constructors fall back to convert methods.
 in setindex!(::DataArrays.DataArray{Float64,1}, ::String, ::Int64) at /Users/michael/.julia/v0.5/DataArrays/src/indexing.jl:217
 in unstack(::DataFrames.DataFrame, ::Int64, ::Int64, ::Int64) at /Users/michael/.julia/v0.5/DataFrames/src/abstractdataframe/reshape.jl:183
 in unstack(::DataFrames.DataFrame, ::Symbol, ::Symbol, ::Symbol) at /Users/michael/.julia/v0.5/DataFrames/src/abstractdataframe/reshape.jl:188

I have opened an issue at the repo.

Michael Borregaard

unread,

Sep 13, 2016, 3:50:11 AM9/13/16

to julia-stats

The functions in DataFrames appear to be working now.

To do what you want to do with unstack() you need to define an id variable, in this case (where you only want one row in the result) as a vector of identical values. Example:

using DataFrames
df = DataFrame(ct = ["a", "b", "c", "d"], x = randn(4), id = ones(Int, 4))
unstack(df, :id, :ct, :x)

Reply all

Reply to author

Forward