Float128

Joachim Dahl

Jan 13, 2014, 4:16:42 AM
to juli...@googlegroups.com
I would like to experiment with 128-bit precision LAPACK calls from the mlapack library.

There are various posts on these forums expressing interest in Float128 and Float80, but I have not found any manifestations in Julia. For my needs, I would wrap a few mlapack functions using ccall, but I need to figure out how to promote a given Float64 array to a __float128 array (and convert the result back to Float64 afterwards).

Can anyone suggest a pedestrian approach for doing this? I don't need full Float128 support - just the ability to convert between Float64 and a (possibly opaque) structure representing __float128. 
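(In case it helps, the pedestrian approach on the C side really is just a pair of cast loops: the compiler already knows how to widen double to __float128 and narrow back. This is a sketch with made-up helper names, assuming GCC/Clang on x86, where __float128 is a language extension; -lquadmath is only needed if you call libquadmath functions. From Julia one could ccall shims like these, passing the Float64 array and a 16*n-byte buffer for the quad side.)

```c
#include <stddef.h>

/* Hypothetical helpers: widen a double array to __float128 and back.
   Widening is exact, since every double is representable as a
   __float128; narrowing rounds to the nearest double. */
void widen_to_float128(const double *src, __float128 *dst, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (__float128)src[i];  /* exact widening conversion */
}

void narrow_to_double(const __float128 *src, double *dst, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (double)src[i];      /* rounds to nearest double */
}
```

So the round trip Float64 -> __float128 -> Float64 is lossless; only results computed in quad precision lose bits on the way back.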

Andreas Noack Jensen

Jan 14, 2014, 5:51:38 AM
to juli...@googlegroups.com
Does it have to be Float128? It appears that mlapack supports MPFR, which is what Julia's BigFloat uses.


--
Kind regards

Andreas Noack Jensen

Joachim Dahl

Jan 14, 2014, 8:10:10 AM
to juli...@googlegroups.com
I will try that. It's probably the easiest approach...

Stefan Karpinski

Jan 14, 2014, 9:54:16 AM
to Julia Dev
It would, however, be nice to have Float128 support. We should probably work on that.

Jason Riedy

Jan 14, 2014, 5:09:36 PM
to juli...@googlegroups.com
And Stefan Karpinski writes:
> It would, however, be nice to have Float128 support. We should
> probably work on that.

Another, related query: How might someone add an intrinsic?

I looked into adding fused multiply-add. I emulated the other
floating-point intrinsics (copysign is the closest analogue),
and... ran into a Julia segfault on start. Right now, that's as
far as I've had time to go (battery ran out on the flight).

I'm looking at an intrinsic that calls out to the LLVM FMA
because that's how copysign is implemented. Is there a better
route? Or is there guidance on how to add an intrinsic?

This is related because high-performance implementations of
multiplied precisions (double double, quad double) really benefit
from FMA on platforms that support it. Julia would be a fun
environment to look into different vectorization forms and
optimizations. The current "state of the art" implementations
exhibited at an SC13 workshop used none of the known optimizations
(e.g. not re-normalizing the pair until necessary,
vector-striping reductions), likely because expressing them in a
language with no serious type manipulation support is a royal
pain. I suspect Julia could be convinced to infer types
sufficiently well, and it's more fun and approachable for
floating-point work than the other options.

Obviously, the second piece is somehow marking when the intrinsic
is worth using for performance, in a way that Julia can optimize
on module loading. (There are algorithms where FMA is useful
regardless of its performance, so...)
--
Jason

Stefan Karpinski

Jan 14, 2014, 8:52:11 PM
to Julia Dev
The place to add it would be in src/intrinsics.cpp. I wonder if LLVM ever optimizes fp mul + add into an fma or if the semantics are different enough that it has to be conservative and not do that. I'd imagine if it's safe, LLVM would do that well.

Jason Riedy

Jan 15, 2014, 12:08:21 PM
to juli...@googlegroups.com
And Stefan Karpinski writes:
> The place to add it would be in src/intrinsics.cpp.

Thanks! I'll try again when I can.

> I wonder if LLVM ever optimizes fp mul + add into an fma or if
> the semantics are different enough that it has to be
> conservative and not do that. I'd imagine if it's safe, LLVM
> would do that well.

In x = a * b + c * d, what contraction is safe? What if that's
part of complex multiplication? The code generator can't really
know, hence all the contraction options in languages that
recognize FMAs. IIRC, by default LLVM relies on the upper level
to decide if FMA is appropriate.

For precisions like double double, one of the primitives for
writing multiplication is ridiculously faster if you have a fast
FMA (very rough sketch, there are complications about sign
flipping, macro-izing the conditional, etc.):

function two_prod{T<:FloatingPoint}(a::T, b::T)
    p = a * b
    if have_fast_fma
        # fused multiply-subtract: a*b - p with no intermediate rounding
        err = fused_multiply_subtract(a, b, p)
    else
        ad = split(a)  # split a's significand across two Ts in
        bd = split(b)  # two comparisons, five fp ops best case
        err = ((ad.head * bd.head - p) + ad.head * bd.tail +
               ad.tail * bd.head) + ad.tail * bd.tail
    end
    return DoubledNumber(p, err)
end

(There is a similar primitive for addition that would be faster
with a fused a+b-c operation, but let's just say proposing that
made chip folks giggle. FMA lets them double apparent flop/s in
GEMM, so they like that...)
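(For reference, the addition analogue is usually written as Knuth's two_sum, which needs no branches at all; a C sketch, assuming round-to-nearest double arithmetic:)

```c
/* Knuth's TwoSum: returns s = fl(a + b) and e such that s + e == a + b
   exactly. Six floating-point operations, no comparisons; a fused
   a+b-c instruction would cut this down considerably. */
void two_sum(double a, double b, double *s, double *e)
{
    *s = a + b;
    double bb = *s - a;               /* the part of b that made it into s */
    *e = (a - (*s - bb)) + (b - bb);  /* what was rounded away */
}
```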
--
Jason

Steven G. Johnson

Jan 15, 2014, 2:22:56 PM
to juli...@googlegroups.com
On Tuesday, January 14, 2014 8:52:11 PM UTC-5, Stefan Karpinski wrote:
> The place to add it would be in src/intrinsics.cpp. I wonder if LLVM ever optimizes fp mul + add into an fma or if the semantics are different enough that it has to be conservative and not do that. I'd imagine if it's safe, LLVM would do that well.

Yes, similar to gcc, my understanding is that LLVM will automatically turn a*b+c into an FMA instruction on supporting CPUs.  See e.g. here:

     http://llvm.org/docs/CodeGenerator.html#selectiondag-select-phase

So we really shouldn't need to add an intrinsic to Julia for this.  In principle, you should just need to hand LLVM add/mul instructions that are parenthesized the correct way and it will do the right thing.  (A possible problem is that LLVM may rewrite your expression in a way that destroys the FMA.  This was a problem with gcc in the past, see e.g. http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19988)

Jeff Bezanson

Jan 17, 2014, 3:17:55 PM
to juli...@googlegroups.com
One thing to watch out for: the LLVM language reference describes what
types various operations are *valid* for, but in practice many cases
are not actually implemented and cause segfaults when triggered. Our
library code is peppered with work-arounds for these sorts of cases.

Simon Byrne

Jan 24, 2014, 4:55:30 AM
to juli...@googlegroups.com
While I agree that it is generally best for FMAs to be handled by the compiler, some issues can arise from their indiscriminate use, particularly with regard to error guarantees. Higham gives some examples:

In particular, in Julia the expression a*b+(c*d) is parsed the same as (a*b)+c*d, so it is difficult to enforce precedence.

Perhaps we could have some sort of macro that could be placed next to an expression and would flag whether or not it was FMA'd by the compiler?

(I tried to post something similar a couple of days ago, but it appears to have disappeared, so apologies if this ends up being a duplicate)