julia> timeit(1000,1000)
GFlop = 1.2292170133468385
GFlop (SIMD) = 1.5351220575547964

Julia Version 0.4.0
Commit 0ff703b* (2015-10-08 06:20 UTC)
Platform Info:
System: Darwin (x86_64-apple-darwin15.0.0)
CPU: Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
WORD_SIZE: 64
BLAS: libopenblas (DYNAMIC_ARCH NO_AFFINITY Haswell)
LAPACK: libopenblas
LIBM: libopenlibm
LLVM: libLLVM-3.3

You can check with `code_llvm(innersimd, Tuple{Vector{Float32},Vector{Float32}})`
julia> code_llvm(innersimd, Tuple{Vector{Float32},Vector{Float32}})
define float @julia_innersimd_21674(%jl_value_t*, %jl_value_t*) {
L:
%2 = bitcast %jl_value_t* %0 to %jl_array_t*
%3 = getelementptr inbounds %jl_array_t* %2, i32 0, i32 1
%4 = load i64* %3
%5 = icmp sle i64 1, %4
%6 = xor i1 %5, true
%7 = select i1 %6, i64 0, i64 %4
%8 = insertvalue %UnitRange.1 { i64 1, i64 undef }, i64 %7, 1
%9 = extractvalue %UnitRange.1 %8, 1
%10 = load %jl_value_t** @jl_overflow_exception
%11 = call { i64, i1 } @llvm.ssub.with.overflow.i64(i64 %9, i64 1)
%12 = extractvalue { i64, i1 } %11, 1
%13 = xor i1 %12, true
br i1 %13, label %pass, label %fail
fail: ; preds = %L
call void @jl_throw_with_superfluous_argument(%jl_value_t* %10, i32 67)
unreachable
pass: ; preds = %L
%14 = extractvalue { i64, i1 } %11, 0
%15 = call { i64, i1 } @llvm.sadd.with.overflow.i64(i64 %14, i64 1)
%16 = extractvalue { i64, i1 } %15, 1
%17 = xor i1 %16, true
br i1 %17, label %pass2, label %fail1
fail1: ; preds = %pass
call void @jl_throw_with_superfluous_argument(%jl_value_t* %10, i32 67)
unreachable
pass2: ; preds = %pass
%18 = extractvalue { i64, i1 } %15, 0
%19 = icmp slt i64 0, %18
%20 = xor i1 %19, true
br i1 %20, label %L11, label %L5.preheader
L5.preheader: ; preds = %pass2
%sunkaddr = ptrtoint %jl_value_t* %0 to i64
%sunkaddr19 = inttoptr i64 %sunkaddr to i8**
%21 = load i8** %sunkaddr19
%sunkaddr20 = ptrtoint %jl_value_t* %1 to i64
%sunkaddr21 = inttoptr i64 %sunkaddr20 to i8**
%22 = load i8** %sunkaddr21
br label %L5
L5: ; preds = %L5, %L5.preheader
%lsr.iv16 = phi i8* [ %22, %L5.preheader ], [ %scevgep17, %L5 ]
%lsr.iv = phi i8* [ %21, %L5.preheader ], [ %scevgep, %L5 ]
%"##i#7153.0" = phi i64 [ %27, %L5 ], [ 0, %L5.preheader ]
%s.1 = phi float [ %26, %L5 ], [ 0.000000e+00, %L5.preheader ]
%lsr.iv1618 = bitcast i8* %lsr.iv16 to float*
%lsr.iv15 = bitcast i8* %lsr.iv to float*
%23 = load float* %lsr.iv15
%24 = load float* %lsr.iv1618
%25 = fmul float %23, %24
%26 = fadd fast float %s.1, %25
%27 = add i64 %"##i#7153.0", 1
%scevgep = getelementptr i8* %lsr.iv, i64 4
%scevgep17 = getelementptr i8* %lsr.iv16, i64 4
%28 = icmp slt i64 %27, %18
br i1 %28, label %L5, label %L11
L11: ; preds = %L5, %pass2
%s.3 = phi float [ 0.000000e+00, %pass2 ], [ %26, %L5 ]
ret float %s.3
}
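The dump above contains only scalar `load float`, `fmul float` and `fadd fast float` instructions and no `vector.body` block, which is the marker the type-by-type check later in this thread looks for. A minimal sketch of that check as a reusable helper follows. It assumes a recent Julia (1.x) where `sprint`, `occursin` and `InteractiveUtils` are available, rather than the 0.4 calls used elsewhere in the thread. `is_vectorized` is a hypothetical name, and `innersimd` is assumed to match the manual's `@simd` example, since its definition is not shown here.

```julia
using InteractiveUtils  # provides code_llvm outside the REPL (Julia 1.x)

# Assumed to match the manual's @simd example; the thread does not show it.
function innersimd(x, y)
    s = zero(eltype(x))
    @simd for i = 1:length(x)
        @inbounds s += x[i] * y[i]
    end
    s
end

# Hypothetical helper: returns true if the LLVM IR generated for
# f(::Vector{T}, ::Vector{T}) contains a "vector.body" block,
# the label LLVM's loop vectorizer gives to a vectorized loop.
function is_vectorized(f, T)
    ir = sprint(code_llvm, f, Tuple{Vector{T},Vector{T}})
    return occursin("vector.body", ir)
end

for T in (Float32, Float64)
    println(T, " ", is_vectorized(innersimd, T))
end
```

A `false` here would correspond to a scalar-only dump like the one above. The result depends on the Julia build, the LLVM version and the CPU, so no particular output is implied.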
I am pretty sure it must be something specific to your installation.
$ brew rm gcc openblas-julia suite-sparse-julia arpack-julia
$ brew install gcc openblas-julia suite-sparse-julia arpack-julia
GFlop = 0.39122184254142406
GFlop (SIMD) = 1.7337076986157214
julia> code_llvm(innersimd, Tuple{Vector{Float32},Vector{Float32}})
define float @julia_innersimd_21416(%jl_value_t*, %jl_value_t*) {
L:
%2 = bitcast %jl_value_t* %0 to %jl_array_t*
%3 = getelementptr inbounds %jl_array_t* %2, i32 0, i32 1
%4 = load i64* %3
%5 = icmp sle i64 1, %4
%6 = xor i1 %5, true
%7 = select i1 %6, i64 0, i64 %4
%8 = insertvalue %UnitRange.1 { i64 1, i64 undef }, i64 %7, 1
%9 = extractvalue %UnitRange.1 %8, 1
%10 = load %jl_value_t** @jl_overflow_exception
%11 = call { i64, i1 } @llvm.ssub.with.overflow.i64(i64 %9, i64 1)
%12 = extractvalue { i64, i1 } %11, 1
%13 = xor i1 %12, true
br i1 %13, label %pass, label %fail
fail: ; preds = %L
call void @jl_throw_with_superfluous_argument(%jl_value_t* %10, i32 67)
unreachable
pass: ; preds = %L
%14 = extractvalue { i64, i1 } %11, 0
%15 = call { i64, i1 } @llvm.sadd.with.overflow.i64(i64 %14, i64 1)
%16 = extractvalue { i64, i1 } %15, 1
%17 = xor i1 %16, true
br i1 %17, label %pass2, label %fail1
fail1: ; preds = %pass
call void @jl_throw_with_superfluous_argument(%jl_value_t* %10, i32 67)
unreachable
pass2: ; preds = %pass
%18 = extractvalue { i64, i1 } %15, 0
%19 = icmp slt i64 0, %18
%20 = xor i1 %19, true
br i1 %20, label %L11, label %L5.preheader
L5.preheader: ; preds = %pass2
%sunkaddr = ptrtoint %jl_value_t* %0 to i64
%sunkaddr19 = inttoptr i64 %sunkaddr to i8**
%21 = load i8** %sunkaddr19
%sunkaddr20 = ptrtoint %jl_value_t* %1 to i64
%sunkaddr21 = inttoptr i64 %sunkaddr20 to i8**
%22 = load i8** %sunkaddr21
br label %L5
L5: ; preds = %L5, %L5.preheader
%lsr.iv16 = phi i8* [ %22, %L5.preheader ], [ %scevgep17, %L5 ]
%lsr.iv = phi i8* [ %21, %L5.preheader ], [ %scevgep, %L5 ]
%"##i#7098.0" = phi i64 [ %27, %L5 ], [ 0, %L5.preheader ]
%s.1 = phi float [ %26, %L5 ], [ 0.000000e+00, %L5.preheader ]
%lsr.iv1618 = bitcast i8* %lsr.iv16 to float*
%lsr.iv15 = bitcast i8* %lsr.iv to float*
%23 = load float* %lsr.iv15
%24 = load float* %lsr.iv1618
%25 = fmul float %23, %24
%26 = fadd fast float %s.1, %25
%27 = add i64 %"##i#7098.0", 1
%scevgep = getelementptr i8* %lsr.iv, i64 4
%scevgep17 = getelementptr i8* %lsr.iv16, i64 4
%28 = icmp slt i64 %27, %18
br i1 %28, label %L5, label %L11
L11: ; preds = %L5, %pass2
%s.3 = phi float [ 0.000000e+00, %pass2 ], [ %26, %L5 ]
ret float %s.3
}
julia> code_llvm(inner, Tuple{Vector{Float32},Vector{Float32}})
define float @julia_inner_21415(%jl_value_t*, %jl_value_t*) {
top:
%2 = bitcast %jl_value_t* %0 to %jl_array_t*
%3 = getelementptr inbounds %jl_array_t* %2, i32 0, i32 1
%4 = load i64* %3
%5 = icmp sle i64 1, %4
%6 = xor i1 %5, true
%7 = select i1 %6, i64 0, i64 %4
%8 = insertvalue %UnitRange.1 { i64 1, i64 undef }, i64 %7, 1
%9 = extractvalue %UnitRange.1 %8, 1
%10 = add i64 %9, 1
%11 = icmp eq i64 1, %10
br i1 %11, label %L3, label %L.preheader
L.preheader: ; preds = %top
%12 = bitcast %jl_value_t* %0 to %jl_array_t*
%13 = bitcast %jl_array_t* %12 to i8**
%14 = load i8** %13
%15 = bitcast %jl_value_t* %1 to %jl_array_t*
%16 = bitcast %jl_array_t* %15 to i8**
%17 = load i8** %16
%18 = add i64 %9, -1
br label %L
L: ; preds = %L, %L.preheader
%lsr.iv6 = phi i8* [ %14, %L.preheader ], [ %scevgep7, %L ]
%lsr.iv4 = phi i8* [ %17, %L.preheader ], [ %scevgep, %L ]
%lsr.iv = phi i64 [ %18, %L.preheader ], [ %lsr.iv.next, %L ]
%s.0 = phi float [ %22, %L ], [ 0.000000e+00, %L.preheader ]
%lsr.iv68 = bitcast i8* %lsr.iv6 to float*
%lsr.iv45 = bitcast i8* %lsr.iv4 to float*
%19 = load float* %lsr.iv68
%20 = load float* %lsr.iv45
%21 = fmul float %19, %20
%22 = fadd float %s.0, %21
%23 = icmp eq i64 %lsr.iv, 0
%24 = xor i1 %23, true
%lsr.iv.next = add i64 %lsr.iv, -1
%scevgep = getelementptr i8* %lsr.iv4, i64 4
%scevgep7 = getelementptr i8* %lsr.iv6, i64 4
br i1 %24, label %L, label %L3
L3: ; preds = %L, %top
%s.1 = phi float [ 0.000000e+00, %top ], [ %22, %L ]
ret float %s.1
}
<simd.jl>
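For readers following the GFlop figures quoted throughout this thread, here is a rough sketch of how they are computed. It assumes the benchmark follows the manual's `@simd` example, which the original post says was cut and pasted, and it is not a reproduction of the attachment. `gflops` is a hypothetical helper added for illustration. Each dot product of length `n` is counted as `2n` floating-point operations, one multiply and one add per element.

```julia
# Assumed to follow the manual's @simd example; not shown in this thread.
function inner(x, y)
    s = zero(eltype(x))
    for i = 1:length(x)
        @inbounds s += x[i] * y[i]
    end
    s
end

function innersimd(x, y)
    s = zero(eltype(x))
    @simd for i = 1:length(x)
        @inbounds s += x[i] * y[i]
    end
    s
end

# Hypothetical helper: GFlop/s for `reps` dot products of length `n`
# that took `seconds` in total (2 flops per element).
gflops(n, reps, seconds) = 2.0 * n * reps / seconds * 1e-9

function timeit(n, reps)
    x = rand(Float32, n)
    y = rand(Float32, n)
    s = zero(Float64)
    t = @elapsed for j in 1:reps
        s += inner(x, y)
    end
    println("GFlop        = ", gflops(n, reps, t))
    t = @elapsed for j in 1:reps
        s += innersimd(x, y)
    end
    println("GFlop (SIMD) = ", gflops(n, reps, t))
    return s
end

timeit(1000, 1000)
```

With `n = 1000` and `reps = 1000` each timed loop takes well under a second, so the figures can be noisy. That may be why the original poster ran the example repeatedly.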
On Nov 6, 2015, at 5:35 PM, Rob J. Goedman <goe...@icloud.com> wrote:
Thanks Seth,
That's the end of my first attempt to figure out what’s happening here. Back to the drawing board!
Regards,
Rob
On Nov 6, 2015, at 4:53 PM, Seth <catc...@bromberger.com> wrote:
Hi Rob,
I built it (and openblas) myself (via git clone) since I'm testing out Cxx.jl. Xcode is Version 7.1 (7B91b).
Seth.
On Friday, November 6, 2015 at 3:54:04 PM UTC-8, Rob J Goedman wrote:
Seth,
You must have built Julia 0.4.1-pre yourself. Did you use brew?
It looks like you are on Yosemite and picked up a newer libLLVM. Which Xcode are you using?
In the Julia.rb formula there is a test ENV.compiler. Could it be that clang is not being used?
Rob
On Nov 6, 2015, at 3:01 PM, Seth <catc...@bromberger.com> wrote:
For what it's worth, I'm getting
julia> timeit(1000,1000)
GFlop = 2.3913033081289967
GFlop (SIMD) = 2.2694726426420293
julia> versioninfo()
Julia Version 0.4.1-pre+22
Commit 669222e* (2015-11-01 00:06 UTC)
Platform Info:
System: Darwin (x86_64-apple-darwin14.5.0)
CPU: Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
LAPACK: libopenblas64_
LIBM: libopenlibm
LLVM: libLLVM-svn
so it doesn't look like I'm taking advantage of simd either. :(
On Friday, November 6, 2015 at 11:43:41 AM UTC-8, Rob J Goedman wrote:
Hi DNF,
In the versioninfo outputs below, only libopenblas appears to differ. You installed using brew. The first thing I would try is to execute the steps under Common Issues listed on https://github.com/staticfloat/homebrew-julia. A bit further down on that site there is also some additional openblas-related info.
Rob
On Nov 6, 2015, at 10:35 AM, DNF <oyv...@gmail.com> wrote:
Thanks for the feedback. It seems like this is not a problem for most.
If anyone has even the faintest clue where I could start looking for a solution to this, I would be grateful. Perhaps there is some software I could run that would detect hardware problems, or maybe I am missing software dependencies of some kind? What could I even google for? All my searches just seem to bring up general info about SIMD, nothing like what I'm describing.
On Friday, November 6, 2015 at 12:15:47 AM UTC+1, DNF wrote:
I install using homebrew from here: https://github.com/staticfloat/homebrew-julia
I have limited understanding of the process, but believe there is some compilation involved.
Julia Version 0.4.0
Commit 0ff703b* (2015-10-08 06:20 UTC)
Platform Info:
System: Darwin (x86_64-apple-darwin13.4.0)
CPU: Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
LAPACK: libopenblas64_
LIBM: libopenlibm
LLVM: libLLVM-3.3
Julia Version 0.4.0
Commit 0ff703b* (2015-10-08 06:20 UTC)
Platform Info:
System: Darwin (x86_64-apple-darwin15.0.0)
Thanks a lot. That indeed works.
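buf = IOBuffer()
n = 10000
reps = 1000
for T in (Int16,Int32,Int64,Float32,Float64)
    # Dump the LLVM IR for this element type into the buffer
    code_llvm(buf, innersimd, Tuple{Vector{T},Vector{T}})
    # A "vector.body" block in the IR means the loop was vectorized
    println(T, " ", contains(takebuf_string(buf), "vector.body"))
    timeit(T, n, reps)
end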
Int16 true
GFlop Int16 = 14.329049425190183
GFlop Int16 (SIMD) = 14.64120268695352
Int32 true
GFlop Int32 = 4.339303129613899
GFlop Int32 (SIMD) = 4.436321047681579
Int64 false
GFlop Int64 = 2.1942537759816103
GFlop Int64 (SIMD) = 2.195101499298226
Float32 true
GFlop Float32 = 2.1954870446504984
GFlop Float32 (SIMD) = 7.82465266366826
Float64 true
GFlop Float64 = 2.171535919755667
GFlop Float64 (SIMD) = 4.0068798126383

begin
    Expr(:meta, :simd)
    s = 0
    s += x[1]*y[1]
    s += x[2]*y[2]
    s += x[3]*y[3]
    s += x[4]*y[4]
    s += x[5]*y[5]
    s += x[6]*y[6]
    s += x[7]*y[7]
    s += x[8]*y[8]
end
x = rand(Float32,n)::Array{Float32,1}
y = rand(Float32,n)::Array{Float32,1}
s = zero(Float64)::Float64
I have been looking through the performance tips section of the manual. Specifically, I am curious about @simd (http://docs.julialang.org/en/release-0.4/manual/performance-tips/#performance-annotations).
When I cut and paste the code demonstrating the @simd macro, I don't get substantial speedups. Before updating from OSX Yosemite to El Capitan, I saw no speedup whatsoever. After the update, there is a small speedup (I ran the example repeatedly):
julia> timeit(1000,1000)
GFlop = 1.2292170133468385
GFlop (SIMD) = 1.5351220575547964
This contrasts sharply with the example in the documentation, which shows a speedup from 1.95 GFlop to 17.6 GFlop.
Does my computer not have SIMD? How can I tell?
This is my versioninfo: