f(x) = 2.0x + 3.0
g(x) = muladd(x, 2.0, 3.0)
h(x) = fma(x, 2.0, 3.0)

@code_llvm f(4.0)
@code_llvm g(4.0)
@code_llvm h(4.0)

@code_native f(4.0)
@code_native g(4.0)
@code_native h(4.0)

Julia Version 0.5.0-rc4+0
Commit 9c76c3e* (2016-09-09 01:43 UTC)
Platform Info:
  System: Linux (x86_64-redhat-linux)
  CPU: Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz
  WORD_SIZE: 64
  BLAS: libopenblas (DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblasp.so.0
  LIBM: libopenlibm
  LLVM: libLLVM-3.7.1 (ORCJIT, broadwell)

[crackauc@crackauc2 ~]$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                16
On-line CPU(s) list:   0-15
Thread(s) per core:    1
Core(s) per socket:    8
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz
Stepping:              1
CPU MHz:               1200.000
BogoMIPS:              6392.58
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              25600K
NUMA node0 CPU(s):     0-7
NUMA node1 CPU(s):     8-15
define double @julia_f_72025(double) #0 {
top:
  %1 = fmul double %0, 2.000000e+00
  %2 = fadd double %1, 3.000000e+00
  ret double %2
}
define double @julia_g_72027(double) #0 {
top:
  %1 = call double @llvm.fmuladd.f64(double %0, double 2.000000e+00, double 3.000000e+00)
  ret double %1
}
define double @julia_h_72029(double) #0 {
top:
  %1 = call double @llvm.fma.f64(double %0, double 2.000000e+00, double 3.000000e+00)
  ret double %1
}

        .text
Filename: fmatest.jl
        pushq   %rbp
        movq    %rsp, %rbp
Source line: 1
        addsd   %xmm0, %xmm0
        movabsq $139916162906520, %rax  # imm = 0x7F40C5303998
        addsd   (%rax), %xmm0
        popq    %rbp
        retq
        nopl    (%rax,%rax)

        .text
Filename: fmatest.jl
        pushq   %rbp
        movq    %rsp, %rbp
Source line: 2
        addsd   %xmm0, %xmm0
        movabsq $139916162906648, %rax  # imm = 0x7F40C5303A18
        addsd   (%rax), %xmm0
        popq    %rbp
        retq
        nopl    (%rax,%rax)

        .text
Filename: fmatest.jl
        pushq   %rbp
        movq    %rsp, %rbp
        movabsq $139916162906776, %rax  # imm = 0x7F40C5303A98
Source line: 3
        movsd   (%rax), %xmm1           # xmm1 = mem[0],zero
        movabsq $139916162906784, %rax  # imm = 0x7F40C5303AA0
        movsd   (%rax), %xmm2           # xmm2 = mem[0],zero
        movabsq $139925776008800, %rax  # imm = 0x7F43022C8660
        popq    %rbp
        jmpq    *%rax
        nopl    (%rax)

Julia Version 0.6.0-dev.557
Commit c7a4897 (2016-09-08 17:50 UTC)
Platform Info:
  System: NT (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.7.1 (ORCJIT, haswell)

; Function Attrs: uwtable
define double @julia_f_66153(double) #0 {
top:
  %1 = fmul double %0, 2.000000e+00
  %2 = fadd double %1, 3.000000e+00
  ret double %2
}
; Function Attrs: uwtable
define double @julia_g_66157(double) #0 {
top:
  %1 = call double @llvm.fmuladd.f64(double %0, double 2.000000e+00, double 3.000000e+00)
  ret double %1
}
; Function Attrs: uwtable
define double @julia_h_66158(double) #0 {
top:
  %1 = call double @llvm.fma.f64(double %0, double 2.000000e+00, double 3.000000e+00)
  ret double %1
}

        .text
Filename: console
        pushq   %rbp
        movq    %rsp, %rbp
Source line: 1
        addsd   %xmm0, %xmm0
        movabsq $534749456, %rax        # imm = 0x1FDFA110
        addsd   (%rax), %xmm0
        popq    %rbp
        retq
        nopl    (%rax,%rax)

        .text
Filename: console
        pushq   %rbp
        movq    %rsp, %rbp
Source line: 2
        addsd   %xmm0, %xmm0
        movabsq $534749584, %rax        # imm = 0x1FDFA190
        addsd   (%rax), %xmm0
        popq    %rbp
        retq
        nopl    (%rax,%rax)

        .text
Filename: console
        pushq   %rbp
        movq    %rsp, %rbp
        movabsq $534749712, %rax        # imm = 0x1FDFA210
Source line: 3
        movsd   (%rax), %xmm1           # xmm1 = mem[0],zero
        movabsq $534749720, %rax        # imm = 0x1FDFA218
        movsd   (%rax), %xmm2           # xmm2 = mem[0],zero
        movabsq $fma, %rax
        popq    %rbp
        jmpq    *%rax
        nop

Hi,

First of all, does LLVM automatically apply fma or muladd to expressions like `a1*x1 + a2*x2 + a3*x3 + a4*x4`? Or is it required that one explicitly use `muladd` and `fma` on these kinds of expressions (and is there a macro for making this easier)?
Secondly, I am wondering if my setup is not applying these operations correctly. Here's my test code:

f(x) = 2.0x + 3.0
g(x) = muladd(x, 2.0, 3.0)
h(x) = fma(x, 2.0, 3.0)

@code_llvm f(4.0)
@code_llvm g(4.0)
@code_llvm h(4.0)

@code_native f(4.0)
@code_native g(4.0)
@code_native h(4.0)
Computer 1:

Julia Version 0.5.0-rc4+0
Commit 9c76c3e* (2016-09-09 01:43 UTC)
Platform Info:
  System: Linux (x86_64-redhat-linux)
  CPU: Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz
  WORD_SIZE: 64
  BLAS: libopenblas (DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblasp.so.0
  LIBM: libopenlibm
  LLVM: libLLVM-3.7.1 (ORCJIT, broadwell)
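On the first question: as far as I know, LLVM at Julia's default floating-point semantics is not allowed to contract a plain `a1*x1 + a2*x2 + a3*x3 + a4*x4` on its own, and I'm not aware of a macro in Base that rewrites such expressions for you, so the `muladd` calls have to be nested by hand. A minimal sketch (the function names here are made up for illustration):

```julia
# Plain form: each * and + rounds separately, and at default semantics
# the compiler may not fuse them.
plain4(a1, x1, a2, x2, a3, x3, a4, x4) =
    a1*x1 + a2*x2 + a3*x3 + a4*x4

# Hand-nested form: muladd(a, b, c) computes a*b + c and permits the
# compiler to emit an fma instruction when the hardware has one.
dot4(a1, x1, a2, x2, a3, x3, a4, x4) =
    muladd(a1, x1, muladd(a2, x2, muladd(a3, x3, a4 * x4)))
```

With `@code_llvm dot4(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0)` one should see nested `llvm.fmuladd.f64` calls instead of separate `fmul`/`fadd`.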
On Wednesday, September 21, 2016 at 5:56:45 AM UTC, Chris Rackauckas wrote:
> Julia Version 0.5.0-rc4+0
> Hi,
>
> First of all, does LLVM automatically apply fma or muladd to expressions like `a1*x1 + a2*x2 + a3*x3 + a4*x4`? Or is it required that one explicitly use `muladd` and `fma` on these kinds of expressions (and is there a macro for making this easier)?
>
> Secondly, I am wondering if my setup is not applying these operations correctly. Here's my test code:
I'm not seeing `@fastmath` apply fma/muladd. I rebuilt the sysimg, and now g and h do apply muladd/fma in the native code, but a new function k, which is just f's expression wrapped in `@fastmath`, still does not apply muladd/fma:
julia> k(x) = @fastmath 2.4x + 3.0
WARNING: Method definition k(Any) in module Main at REPL[14]:1 overwritten at REPL[23]:1.
k (generic function with 1 method)
julia> @code_llvm k(4.0)
; Function Attrs: uwtable
define double @julia_k_66737(double) #0 {
top:
  %1 = fmul fast double %0, 2.400000e+00
  %2 = fadd fast double %1, 3.000000e+00
  ret double %2
}
julia> @code_native k(4.0)
        .text
Filename: REPL[23]
        pushq   %rbp
        movq    %rsp, %rbp
        movabsq $568231032, %rax        # imm = 0x21DE8478
Source line: 1
        vmulsd  (%rax), %xmm0, %xmm0
        movabsq $568231040, %rax        # imm = 0x21DE8480
        vaddsd  (%rax), %xmm0, %xmm0
        popq    %rbp
        retq
        nopw    %cs:(%rax,%rax)
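One thing worth keeping in mind when comparing g and h above: `muladd` only gives the compiler permission to fuse, while `fma` demands a single rounding even if that means calling out to a software routine, which is presumably why h's native code ends in a jump to an external fma function. A small sketch of the semantic difference, using values chosen so the two roundings are observable:

```julia
# a and b are exactly representable, and a*b == 1 - 2^-104 exactly.
a = 1.0 + 2.0^-52
b = 1.0 - 2.0^-52

two_roundings = a * b - 1.0      # a*b first rounds to 1.0, so this is 0.0
one_rounding  = fma(a, b, -1.0)  # single rounding gives the exact -2^-104
```

So even when `fma` is slower, it can be strictly more accurate; `muladd` is the right choice when you only care about letting the hardware fuse for speed.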