Fully agree with you here - surprised to see such huge performance degradation.
The complete definition is about 140 lines - too lengthy for the letter.
I will put all the 3 codes (fortran, Array based and SubArray based) somewhere and prepare a shorter testcase.
cat main_farr.jl.profile
elapsed time: 31.571042152 seconds
elapsed time: 32.855328249 seconds
30239 boot.jl; include; line: 238
30239 profile.jl; anonymous; line: 14
1 ...OGRAM0/main_farr.jl; main; line: 99
46 ...OGRAM0/main_farr.jl; main; line: 101
1 ...OGRAM0/main_farr.jl; main; line: 107
5 ...OGRAM0/main_farr.jl; main; line: 110
4 ...OGRAM0/main_farr.jl; main; line: 114
3 ...OGRAM0/main_farr.jl; main; line: 119
6497 ...OGRAM0/main_farr.jl; main; line: 121
26 float.jl; *; line: 136
24 subarray.jl; getindex; line: 176
32 subarray.jl; setindex!; line: 449
23678 ...OGRAM0/main_farr.jl; main; line: 126
63 float.jl; -; line: 134
21 float.jl; /; line: 138
141 subarray.jl; getindex; line: 176
29 subarray.jl; setindex!; line: 449
2 ...OGRAM0/main_farr.jl; main; line: 130
2 ...OGRAM0/main_farr.jl; main; line: 137
27 float.jl; *; line: 136
54 float.jl; -; line: 134
17 float.jl; /; line: 138
651 subarray.jl; getindex; line: 176
567 subarray.jl; setindex!; line: 449
Timing is different - another machine: 0.29s for the fast version Array-based version and 31s for slow SubArray-based.
I noticed that for the fast version there are no getindex, setindex calls in the profile log, looks like they were inlined and effectively eliminated as Julia has an efficient intrinsic Array support.
The difference between fast and slow codes (not full, just a few excerpts):
diff -u main.jl main_farr.jl |less
+
+function farray(T, r::Range1{Int}...)
+ dims = map((x) -> length(x), r)
+ array = Array(T, dims)
+ sub_indices = map((x) -> -minimum(x) + 2 : maximum(x), r)
+ sub(array, sub_indices)
+end
+
function main()
@@ -33,9 +41,13 @@
# & x(0:ncells),
# & flux(0:ncells)
- u = Array(Float64, ncells+4)
- x = Array(Float64, ncells+1)
- flux = Array(Float64, ncells+1)
+ u = farray(Float64, -2:ncells+1)
+ x = farray(Float64, 0:ncells)
+ flux = farray(Float64, 0:ncells)
[...]
# conservative difference
for ic=ifirst:ilast
- u[ic+3] -= (flux[ic+2]-flux[ic+1]) / (x[ic+2]-x[ic+1])
+ u[ic] -= (flux[ic+1]-flux[ic]) / (x[ic+1]-x[ic])
end
[...]
Thanks,