simultaneous sin and cos: sincos in julia from openlibm?


Ken B

Jul 27, 2014, 3:25:11 PM
to julia...@googlegroups.com
Hi,

I want to calculate the sine and cosine of the same angle together. I saw this function is implemented in openlibm, but is it available in Julia, and how?

https://github.com/JuliaLang/openlibm/blob/18f475de56ec7b478b9220a5f28eb9a23cb51d96/src/s_sincos.c

Thanks!
Ken

Isaiah Norton

Jul 27, 2014, 3:32:06 PM
to julia...@googlegroups.com
It doesn't appear to be wrapped, but you can call it yourself like this:

julia> function sincos(x)
           psin = Cdouble[0]    # one-element buffers to receive the
           pcos = Cdouble[0]    # two output arguments of C's sincos
           ccall(:sincos, Void, (Cdouble, Ptr{Cdouble}, Ptr{Cdouble}), x, psin, pcos)
           (psin[1], pcos[1])
       end
sincos (generic function with 1 method)

julia> sincos(pi)
(1.2246467991473532e-16,-1.0)

Feel free to open an issue or pull request if you think it should be exported - might have just been an oversight.

Viral Shah

Jul 27, 2014, 9:26:39 PM
to julia...@googlegroups.com
Is sincos a standard libm function?

Also, I wonder if creating the one-entry arrays is too expensive, and whether we should just call sin and cos separately. The vectorized version may be able to benefit from calling sincos directly.

-viral

Ken Bastiaensen

Jul 28, 2014, 12:21:30 AM
to julia...@googlegroups.com
Thank you for your answers. Yes, it seems sincos is slower than calling (sin, cos) at the moment.
I don't know about libm standards, but if it's available in openlibm, why not export it?

Ken

Stefan Karpinski

Jul 28, 2014, 12:27:00 AM
to julia...@googlegroups.com
Is there any reason to think that sin and cos can be computed jointly more efficiently than computing each one independently?

Elliot Saba

Jul 28, 2014, 12:43:27 AM
to julia...@googlegroups.com
Yeah, there is some work that can be shared: checking which multiple of pi/4 the argument falls in, etc. The actual polynomial evaluations must be done separately, I think. I believe I wrote our current sincos() implementation, which is supposedly more efficient when operating on floats and doubles, but just falls back to sin() and cos() for long double.
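
Roughly, the structure looks like this (a toy sketch, not the actual openlibm code; the naive reduction and the Taylor kernels are stand-ins for illustration only, and a real libm does the reduction with a multi-word pi so it stays accurate for large arguments):

kernel_sin(r) = r - r^3/6 + r^5/120 - r^7/5040    # illustration-grade Taylor kernels,
kernel_cos(r) = 1 - r^2/2 + r^4/24 - r^6/720      # good only on the reduced range

function sincos_shared(x::Float64)
    # shared work: reduce x so that x = n*(pi/2) + r with r in [-pi/4, pi/4]
    n = round(Int, x * (2/pi))
    r = x - n * (pi/2)
    s, c = kernel_sin(r), kernel_cos(r)   # the two polynomial evaluations stay separate
    q = mod(n, 4)                         # rotate the pair by the quadrant
    q == 0 ? (s, c) :
    q == 1 ? (c, -s) :
    q == 2 ? (-s, -c) : (-c, s)
end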
-E

Stuart Brorson

Jul 28, 2014, 7:33:41 AM
to julia...@googlegroups.com
On Sun, 27 Jul 2014, Viral Shah wrote:

> Is sincos a standard libm function?

Out of curiosity I looked into sincos, since I had never heard of it. A quick check shows there's no sincos in fdlibm (on netlib). However, a little Googling reveals that an old Sun math library, libsunmath, seems to implement it.

I did find a couple of libm variants which implemented sincos. However, they simply called sin & cos separately. As Stefan says upthread, no performance improvement.

As far as I know, sin & cos are usually computed by using mod to fold the input x down to the first quadrant, and then using a power series (needs only 6 or 8 terms IIRC) to compute the function. Perhaps libsunmath computed e.g. cos first, and then did sin = sqrt(1 - cos^2)? Taking the sqrt seems non-performant compared to evaluating a short power series, but maybe they had a reason? Another thought: sin and cos are reflections of each other (over the line x = pi/4) in the first quadrant. Perhaps there's some other clever way to get sin from cos? I couldn't think of any in the short time I spent considering it.
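
For what it's worth, the sqrt route is easy to try numerically. A quick sketch (note that, besides the cost of the sqrt itself, 1 - cos^2 loses accuracy to cancellation when cos(x) is close to 1):

x = 0.3
c = cos(x)
s = sqrt(1 - c^2)    # matches sin(x) only up to rounding; sin(x) >= 0 in the first quadrant
abs(s - sin(x))      # small here, but grows (relatively) as x -> 0 due to the cancellation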

Stuart

Kevin Squire

Jul 28, 2014, 8:34:53 AM
to julia...@googlegroups.com
This paper seems relevant, though possibly only for 32-bit:

http://hal.archives-ouvertes.fr/docs/00/67/23/27/PDF/Jeannerod-JourdanLu.pdf

Cheers,
   Kevin

Simon Byrne

Jul 28, 2014, 9:35:21 AM
to julia...@googlegroups.com
I've often wondered this myself. As I understand it, the purpose of the sincos function was to call the FSINCOS assembly instruction on the x87 FPU. On modern processors, however, it is generally acknowledged that calling a well-written math library compiled to use SSE instructions is typically faster (and can be more accurate) than using the x87 trig instructions. See this discussion:

http://stackoverflow.com/questions/12485190/calling-fsincos-instruction-in-llvm-slower-than-calling-libc-sin-cos-functions

Calling sincos using Isaiah's method seems to be about 9 times slower than calling sin and cos separately:

https://gist.github.com/734dcacde1f107397b3b.git

though a lot of this seems to be due to the overhead of creating and destroying the arrays for the return values.
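
A comparison along these lines reproduces the effect (a sketch; exact numbers vary by machine and Julia version):

function bench_sincos(n)
    t = 0.0
    for i = 1:n
        s, c = sincos(i * 1e-3)    # Isaiah's ccall wrapper from upthread
        t += s + c
    end
    t
end

function bench_separate(n)
    t = 0.0
    for i = 1:n
        x = i * 1e-3
        t += sin(x) + cos(x)
    end
    t
end

@time bench_sincos(10^6)       # allocates two one-element arrays per call
@time bench_separate(10^6)     # allocation-free inner loop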

Stuart Brorson

Jul 28, 2014, 10:13:41 AM
to julia...@googlegroups.com
A bell rang in the back of my head as I was on my way to work this morning. I was thinking about sincos again, and remembered something about CORDIC algorithms from the distant past. These are add-and-shift algorithms used to compute certain trig and other elementary functions. They were very popular for scientific calculators back when hand-held calculators were new, since they are easy to implement in hardware and don't require a floating-point multiplier. A couple of references:

http://en.wikipedia.org/wiki/CORDIC
http://ww1.microchip.com/downloads/en/AppNotes/01061A.pdf

It appears that a common CORDIC computation will produce sin & cos together. I'll bet dollars to doughnuts (without actually knowing) that the x87 FSINCOS instruction Simon mentioned upthread was doing a CORDIC computation, and it made sense to return both sin & cos since they were computed together.

The paper by Jeannerod & Jourdan-Lu refers to CORDIC methods, but is apparently an extension, as far as I can tell.
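
For the curious, the core iteration is only a few lines. A toy Julia sketch (converges for |theta| up to about 1.74 rad; real implementations work in fixed point, where the multiplications by 2^-i are just shifts):

function cordic_sincos(theta, n=40)
    angles = [atan(2.0^-i) for i = 0:n-1]
    K = prod([1/sqrt(1 + 4.0^-i) for i = 0:n-1])   # pre-apply the CORDIC gain
    x, y, z = K, 0.0, theta
    for i = 0:n-1
        d = z >= 0 ? 1.0 : -1.0                    # rotate toward the residual angle
        x, y = x - d*y*2.0^-i, y + d*x*2.0^-i
        z -= d * angles[i+1]
    end
    (y, x)                                         # (sin(theta), cos(theta)) together
end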

Stuart

Stefan Karpinski

Jul 28, 2014, 11:18:50 AM
to Julia Users
This CORDIC explanation seems quite plausible. However, it seems like we probably don't want to use this algorithm, which means that, at this point, we can't get any kind of speedup for sincos(x) relative to sin(x), cos(x).

Stuart Brorson

Jul 28, 2014, 11:29:05 AM
to Julia Users
Yup, you certainly don't want to use CORDIC. My e-mail is just a
historical note.

The standard (fdlibm) implementation of sin & cos involves
summing a polynomial with 6 or 8 coefficients (after folding down to
the first quadrant). Here's the kernel:

http://www.netlib.org/fdlibm/k_sin.c

This implementation seems pretty quick, IMO. I'd wager there's not much room for improvement over the existing, separate implementations.
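
For flavor, here's the shape of that kernel in Julia, with plain Taylor coefficients standing in for fdlibm's minimax-fitted constants (a sketch, slightly less accurate than the real thing; valid for |x| <= pi/4):

function k_sin_sketch(x::Float64)
    z = x * x
    # Horner evaluation of the odd series: sin(x) ~ x + x*z*p(z)
    p = -1/6 + z*(1/120 + z*(-1/5040 + z*(1/362880 + z*(-1/39916800 + z/6227020800))))
    x + x*z*p
end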

Stuart

Simon Byrne

Jul 28, 2014, 11:39:44 AM
to julia...@googlegroups.com
Yes, I would agree: as Elliot mentioned, you might get some gain by only doing the range-reduction once.

Looking at the source of openlibm/src/s_sincos.c, it seems that's what it does, as well as calculating z = x*x and w = z*z once...

Is there a more efficient way to receive values by reference than using arrays?

Simon Kornblith

Jul 28, 2014, 1:02:31 PM
to julia...@googlegroups.com
Presumably we could use the same global or let-scoped array rather than allocating a new array on each call.
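
Something like this, say (a sketch; the shared buffers are exactly what makes it non-reentrant):

let psin = Cdouble[0.0], pcos = Cdouble[0.0]
    global sincos_buffered
    function sincos_buffered(x::Float64)
        # reuse the same two one-element buffers on every call
        ccall(:sincos, Void, (Cdouble, Ptr{Cdouble}, Ptr{Cdouble}), x, psin, pcos)
        (psin[1], pcos[1])
    end
end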

Kevin Squire

Jul 28, 2014, 1:04:30 PM
to julia...@googlegroups.com
That wouldn't be thread safe, which isn't a worry right now, but might be in the near future.

Ivar Nesje

Jul 28, 2014, 2:13:31 PM
to julia...@googlegroups.com
Can someone remind me why we can't give the address of a Float64 on the stack to such a function?

Ivar

Stefan Karpinski

Jul 28, 2014, 2:24:06 PM
to Julia Users
Local variables often don't really have an address. You have to emulate something like a C volatile variable. It's possible but not trivial.
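
(A dedicated Ref type, added in later Julia versions, makes this emulation idiomatic: a single C-addressable cell that ccall passes as a pointer, and which the compiler can often avoid heap-allocating. A sketch in post-1.0 syntax, where Void has become Cvoid:)

function sincos_ref(x::Float64)
    s = Ref{Cdouble}()
    c = Ref{Cdouble}()
    ccall(:sincos, Cvoid, (Cdouble, Ref{Cdouble}, Ref{Cdouble}), x, s, c)
    (s[], c[])
end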

Jameson Nash

Jul 28, 2014, 3:20:50 PM
to julia...@googlegroups.com
More intelligent would be to use the inference code to automatically compute which variables have limited scope and can be allocated on the stack.