And Stefan Karpinski writes:
> It would, however, be nice to have Float128 support. We should
> probably work on that.
Another, related query: How might someone add an intrinsic?
I looked into adding fused multiply-add. I emulated the other
floating-point intrinsics (copysign is a close analogue),
and... ran into a Julia segfault on startup. Right now, that's as
far as I've had time to go (battery ran out on the flight).
I'm looking at an intrinsic that calls out to the LLVM FMA
because that's how copysign is implemented. Is there a better
route? Or is there guidance on how to add an intrinsic?
This is related because high-performance implementations of
extended precision (double-double, quad-double) really benefit
from FMA on platforms that support it. Julia would be a fun
environment to look into different vectorization forms and
optimizations. The current "state of the art" implementations
presented at an SC13 workshop used none of the known
optimizations (e.g. deferring re-normalization of the pair until
necessary, vector-striping reductions), likely because expressing
them in a language without serious type-manipulation support is a
royal pain. I suspect Julia could be convinced to infer types
sufficiently well, and it's more fun and approachable for
floating-point work than the other options.
The second piece (the first being adding the intrinsic itself) is
somehow marking when the intrinsic is actually worth using for
performance, in a way Julia can act on at module load. (There are
algorithms where FMA is useful for accuracy regardless of its
performance, so...)
--
Jason