julia> df = DataFrame(a=1:5, b=7:11, c=10:14)
5x3 DataFrames.DataFrame
| Row | a | b | c |
|-----|---|----|----|
| 1 | 1 | 7 | 10 |
| 2 | 2 | 8 | 11 |
| 3 | 3 | 9 | 12 |
| 4 | 4 | 10 | 13 |
| 5 | 5 | 11 | 14 |
julia> colwise(mean,df)
3-element Array{Any,1}:
[3.0]
[9.0]
[12.0]
julia> colwise(mean,df[1,1:2])
2-element Array{Any,1}:
[1.0]
[7.0]
julia> mean(convert(Array,df[1,1:3]))
6.0
You can try `eachrow`. It probably won't be fast, though. Here's an example:
https://github.com/JuliaStats/DataFrames.jl/blob/master/test/iteration.jl#L34
julia> for r in eachrow(df)
println(mean(convert(Array,r)))
end
6.0
7.0
8.0
9.0
10.0
julia> for r in eachrow(df)
println(mean(convert(Array,r[1:2])))
end
WARNING: [a] concatenation is deprecated; use collect(a) instead
in depwarn at deprecated.jl:73
in oldstyle_vcat_warning at ./abstractarray.jl:29
[inlined code] from none:2
in anonymous at no file:0
while loading no file, in expression starting on line 0
4.0
For the subset, do the indexing after the conversion to an array, or subset the DataFrame first (probably faster).
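Both options can be sketched roughly as follows. This assumes a current DataFrames.jl release (1.x) plus the `Statistics` standard library, not the version shown in the transcripts above, so `Vector(r)` and `Matrix(...)` stand in for the older `convert(Array, ...)` calls. The names `means_after`, `sub`, and `means_first` are made up for illustration.

```julia
using DataFrames, Statistics

df = DataFrame(a=1:5, b=7:11, c=10:14)

# Option 1: convert each row to a plain vector first, then index the vector.
means_after = [mean(Vector(r)[1:2]) for r in eachrow(df)]

# Option 2: subset the columns once, convert to a matrix,
# and take the mean along each row (dims=2).
sub = Matrix(df[:, 1:2])
means_first = vec(mean(sub, dims=2))

println(means_after)
println(means_first)
```

Option 2 does the conversion once instead of once per row, which is why it is likely the faster of the two.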
julia> df = DataFrame(a=1:5, b=7:11, c=10:14, d=20:24)
5x4 DataFrames.DataFrame
| Row | a | b | c | d |
|-----|---|----|----|----|
| 1 | 1 | 7 | 10 | 20 |
| 2 | 2 | 8 | 11 | 21 |
| 3 | 3 | 9 | 12 | 22 |
| 4 | 4 | 10 | 13 | 23 |
| 5 | 5 | 11 | 14 | 24 |
julia> df1 = df[1:2,]
5x2 DataFrames.DataFrame
| Row | a | b |
|-----|---|----|
| 1 | 1 | 7 |
| 2 | 2 | 8 |
| 3 | 3 | 9 |
| 4 | 4 | 10 |
| 5 | 5 | 11 |
julia> df2 = df[3:4,]
5x2 DataFrames.DataFrame
| Row | c | d |
|-----|----|----|
| 1 | 10 | 20 |
| 2 | 11 | 21 |
| 3 | 12 | 22 |
| 4 | 13 | 23 |
| 5 | 14 | 24 |
julia> for r1,r2 in eachrow(df1, df2)
println(mean(r1,r2))
end
ERROR: syntax: invalid iteration specification
julia> for r1,r2 in eachrow(df1, df2)
println(TTest(r1,r2))
end
ERROR: syntax: invalid iteration specification
I'd convert the whole DataFrame to a matrix and use a loop over rows.
Contributions/pull requests from folks that need that are welcome. I don't have that need. For row operations, I can generally get by with loops or `@byrow!` in DataFramesMeta.
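A minimal sketch of the matrix-and-loop suggestion, again assuming DataFrames.jl 1.x and the `Statistics` standard library. The per-row statistic here (difference of the two group means) is only a placeholder for whatever two-sample function you actually want to call, and `diffs` and `diffs_zip` are hypothetical names. The second loop shows that pairing rows from two DataFrames does parse if you destructure with parentheses and `zip` two `eachrow` iterators, rather than writing `for r1,r2 in eachrow(df1, df2)`.

```julia
using DataFrames, Statistics

df = DataFrame(a=1:5, b=7:11, c=10:14, d=20:24)

# Convert once, then loop over row indices of the matrix.
m = Matrix(df)
diffs = Float64[]
for i in 1:size(m, 1)
    x = m[i, 1:2]   # first group of columns (a, b)
    y = m[i, 3:4]   # second group of columns (c, d)
    push!(diffs, mean(y) - mean(x))   # placeholder statistic
end

# Same computation, walking two DataFrames in lockstep.
df1, df2 = df[:, 1:2], df[:, 3:4]
diffs_zip = Float64[]
for (r1, r2) in zip(eachrow(df1), eachrow(df2))
    push!(diffs_zip, mean(Vector(r2)) - mean(Vector(r1)))
end

println(diffs)
println(diffs_zip)
```

The matrix version avoids building a row object per iteration, so it is probably the better choice when the DataFrame is large and all columns share a numeric type.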