Trigonometric functions using fdlibm. (issue 303753002)

31 views
Skip to first unread message

rt...@chromium.org

unread,
Jun 2, 2014, 1:26:12 PM6/2/14
to yan...@chromium.org, v8-...@googlegroups.com

https://codereview.chromium.org/303753002/diff/40001/src/math.js
File src/math.js (right):

https://codereview.chromium.org/303753002/diff/40001/src/math.js#newcode262
src/math.js:262: }
As you mentioned via email, you've removed the 3rd iteration. This is
really needed if you want to be able to reduce multiples of pi/2
accurately.

https://codereview.chromium.org/303753002/

yan...@chromium.org

unread,
Jun 3, 2014, 3:01:47 AM6/3/14
to rt...@chromium.org, v8-...@googlegroups.com
Reviewers: Raymond Toy,
On 2014/06/02 17:26:11, Raymond Toy wrote:
> As you mentioned via email, you've removed the 3rd iteration. This is
really
> needed if you want to be able to reduce multiples of pi/2 accurately.

That's true. However, the reduction step is not exposed as a library
function. From what I have seen, the third step seems to only affect y1.
With a y0 really close to y1, it does not change the result of sine or
cosine. This is also why I was asking for a test case where removing
this third step would make a difference.

Description:
Trigonometric functions using fdlibm.

BUG=v8:3006
LOG=N

Please review this at https://codereview.chromium.org/303753002/

SVN Base: https://v8.googlecode.com/svn/branches/bleeding_edge

Affected files (+655, -281 lines):
M src/bootstrapper.cc
M src/math.js
A src/rempio2.h
A src/rempio2.cc
M src/runtime.h
M src/runtime.cc
D src/trig-table.h
A + test/mjsunit/runtime-gen/rempio2.js
M test/mjsunit/sin-cos.js
M tools/generate-runtime-tests.py
D tools/generate-trig-table.py
M tools/gyp/v8.gyp


rt...@chromium.org

unread,
Jun 3, 2014, 12:51:31 PM6/3/14
to yan...@chromium.org, v8-...@googlegroups.com
On 2014/06/03 07:01:45, Yang wrote:
> https://codereview.chromium.org/303753002/diff/40001/src/math.js
> File src/math.js (right):

> https://codereview.chromium.org/303753002/diff/40001/src/math.js#newcode262
> src/math.js:262: }
> On 2014/06/02 17:26:11, Raymond Toy wrote:
> > As you mentioned via email, you've removed the 3rd iteration. This is
> really
> > needed if you want to be able to reduce multiples of pi/2 accurately.

> That's true. However, the reduction step is not exposed as a library
> function.
> From what I have seen, the third step seems to only affect y1. With a y0
really
> close to y1, it does not change the result of sine or cosine. This is
> also why
I
> was asking for a test case where removing this third step would make a
> difference.

I don't understand what you mean by "y0 really close to y1". What are you
saying?


tan(Math.PI*45/2) requires the 3rd iteration. ieee754_rem_pio2 returns
[45, -9.790984586812941e-16, -6.820314736619894e-32]

If you ignore the y1 result, we have
kernel_tan(-9.790984586812941e-16, 0e0, -1) -> 1021347742030824.2

If you include the y1 result:
kernel_tan(-9.790984586812941e-16,-6.820314736619894e-32, -1) ->
1021347742030824.1



https://codereview.chromium.org/303753002/

yan...@chromium.org

unread,
Jun 4, 2014, 3:19:30 AM6/4/14
to rt...@chromium.org, v8-...@googlegroups.com
I somehow didn't type what I thought. I meant to say: if y0 is really close
to
0, there does not seem to be any point to invest in the third loop. (I am
aware
that omitting y1 changes the result in some cases. I'm not arguing this).

So in the example here, if I omit the third iteration, I get
[45, -9.790984586812941e-16, -6.820199415561299e-32]

y0 is the same, y1 differs slightly, but the end result is still
1021347742030824.1.


https://codereview.chromium.org/303753002/

rt...@chromium.org

unread,
Jun 4, 2014, 12:30:39 PM6/4/14
to yan...@chromium.org, v8-...@googlegroups.com
While I understand your desire to reduce the complexity, you are modifying
an
algorithm written by an expert. I think the burden is on you to prove that
by
removing the third iteration you do not change the value of y0.

Also, where is this coming from? In reality, how often will you compute
sin(x)
where x is very near a multiple of pi/2 (where the third iteration is
needed)?
I suspect it occurs more often than we might expect, but also that if you're
doing that, I think you're also computing zillions more values that are not
a
multiple of pi/2.

For example, in 3d-morph, we compute sin((n-1)*pi/15) for n = 0 to 119.
Thus
out of 120 values, we have a multiple of pi just 8 times out of 120. If the
cost
of reduction for multiples of pi/2 AND the computation of sin were reduced
to
exactly zero, you would save about just 6.6% in runtime.

I think there are more important things to look at. We need profile
results. We
need to understand what is really expensive in the reduction, not what we
think
is expensive.


https://codereview.chromium.org/303753002/

yan...@chromium.org

unread,
Jun 5, 2014, 11:24:00 AM6/5/14
to rt...@chromium.org, v8-...@googlegroups.com
I added back the third iteration, and tweaked some places, so that the
runtime
is now down to 16ms (vs the current 12ms).

https://codereview.chromium.org/303753002/

Yang Guo

unread,
Jun 5, 2014, 11:28:26 AM6/5/14
to rt...@chromium.org, v8-...@googlegroups.com
Here's a profile of the 64bit build. MathSinSlow takes most of the time, and the file includes a disassembly of the generated code, with each instruction annotated with profiling stats. Note that this runs an altered version of SunSpider's 3d-morph to run longer, giving more profiling samples.

Yang
prof.txt

Raymond Toy

unread,
Jun 5, 2014, 12:59:31 PM6/5/14
to Yang Guo, v8-...@googlegroups.com
Thanks for the profile results. I'll look over them as soon as I can.

However, can you also send the source for the modified 3d-morph?  I'd like to know what was changed.

Thanks!

Raymond Toy

unread,
Jun 5, 2014, 6:52:14 PM6/5/14
to Yang Guo, v8-...@googlegroups.com
Can you explain what some of the code is in the prof results you sent?

What is all the stuff from address 0xa to 0x47 doing?

What is 0xd5 to 0x147 doing? I'm guessing it's doing MathRound, but it seems that can be done with just one or two instructions.  And the original code was Math.floor(x + 0.5).  If MathRound is rounding to even, then that is not what we want.

There are some various bits of code comparing ebx to small positive constants Is that a bounds check on the kTrig array?

When I compare this disassembly with what gcc produces on the original fdlibm code, gcc seems to be much smaller and simpler.  The actual computation parts, however, appear roughly equal.  It's all the stuff around it that makes V8 probably run slower than I would have expected.



On Thu, Jun 5, 2014 at 8:28 AM, Yang Guo <yan...@chromium.org> wrote:

Yang Guo

unread,
Jun 6, 2014, 9:56:52 AM6/6/14
to Raymond Toy, v8-...@googlegroups.com
Hi Raymond,

the modified 3d-morph is attached.

The code from 0xa to 0x47 are a stack check (at the entry to function to detect stack overflow) and unboxing the argument into a double register (double numbers are usually boxed in V8 and stored on the heap, except for certain kinds of arrays and in optimized code).

The code from 0xd5 to 0x147 is indeed a MathRound. Replacing it with a floor (updated CL) actually gives a slight boost. The modified 3d-morph goes from 8250ms to 8050ms, and the unmodified one now alternates between 15ms and 16ms.

Yes, those comparisons are bounds checks. Unfortunately, out-of-bound reads on typed arrays in Javascript should return undefined. We already eliminate some of the redundant bounds checks, but not all can be eliminated. Of course the generated code for Javascript is a lot larger than that for C, no surprise there. Javascript is a dynamic language after all. And are right in that we probably should focus on the things that add overhead.

Moving the calculation to C wouldn't make things faster though, since the switch to C code is rather expensive, and C code cannot be inlined either.

Yang

3d-morph.js

Raymond Toy

unread,
Jun 6, 2014, 12:03:46 PM6/6/14
to Yang Guo, v8-...@googlegroups.com
Thanks for clarifying these results and for providing the modified 3d-morph.

When you get a chance could you provide new profile results with MathRound removed? And can you provide the pref results with the event counters enabled so we can see cache effects?

Thanks!

Yang Guo

unread,
Jun 6, 2014, 12:28:53 PM6/6/14
to Raymond Toy, v8-...@googlegroups.com

Argh. I even prepared it, but totally forgot to send it to you. Will do when I get home.

Yang

rt...@chromium.org

unread,
Jun 9, 2014, 5:28:40 PM6/9/14
to yan...@chromium.org, v8-...@googlegroups.com

rt...@chromium.org

unread,
Jun 9, 2014, 6:08:34 PM6/9/14
to v8-...@googlegroups.com, rt...@chromium.org
(Yang sent me the new profile result in private email)

This looks much better.  The only question I have now is what the code in 0xed to 0x105 is doing. Something related to converting to a float to an integer; perhaps boxing the result? 

Otherwise, it looks roughly like what gcc does, with a few extra moves and the bounds checks for kTrig.

yan...@chromium.org

unread,
Jun 10, 2014, 2:51:55 AM6/10/14
to rt...@chromium.org, v8-...@googlegroups.com
On 2014/06/09 21:28:38, Raymond Toy wrote:
> Why is this case removed?

This case wasn't removed. It was moved to C++. Motivation being that we
don't need duplicate code in sin/cos/tan when this is a rare case where
performance does not matter.

https://codereview.chromium.org/303753002/

Yang Guo

unread,
Jun 10, 2014, 2:57:15 AM6/10/14
to v8-...@googlegroups.com
Yep. That piece of code does a double to integer conversion.


--
--
v8-dev mailing list
v8-...@googlegroups.com
http://groups.google.com/group/v8-dev
---
You received this message because you are subscribed to the Google Groups "v8-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to v8-dev+un...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Raymond Toy

unread,
Jun 10, 2014, 6:41:27 PM6/10/14
to v8-...@googlegroups.com
But what is the function call for?

And for the record, I'm attaching the new profile results.




You received this message because you are subscribed to a topic in the Google Groups "v8-dev" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/v8-dev/Y6u4aNdP2Xs/unsubscribe.
To unsubscribe from this group and all its topics, send an email to v8-dev+un...@googlegroups.com.
prof2.txt

Yang Guo

unread,
Jun 12, 2014, 3:37:16 AM6/12/14
to v8-...@googlegroups.com
On Wed, Jun 11, 2014 at 12:41 AM, Raymond Toy <rt...@chromium.org> wrote:
But what is the function call for?

The function call is the slow case for when double to integer (truncated) conversion using cvttsd2si overflows.

Raymond Toy

unread,
Jun 12, 2014, 12:29:00 PM6/12/14
to v8-...@googlegroups.com
On Thu, Jun 12, 2014 at 12:37 AM, Yang Guo <yan...@chromium.org> wrote:


On Wed, Jun 11, 2014 at 12:41 AM, Raymond Toy <rt...@chromium.org> wrote:
But what is the function call for?

The function call is the slow case for when double to integer (truncated) conversion using cvttsd2si overflows.

Ah, ok.  If we changed the if statement at line 222 to say abs(x) < 1647099.9999, would the compiler be able to derive that n cannot overflow?  That be another opportunity to micro-optimize this routine. Actually some of the if's could be replaced with float comparisons instead of integer comparisons, if that makes a difference.

Yang Guo

unread,
Jun 18, 2014, 5:32:42 AM6/18/14
to v8-...@googlegroups.com
Status update:
I haven't had much time recently, and there are still somethings I want to try before cleaning up the code to get it in shape for landing.
I also discovered that it would tank Kraken's audio-oscillator, but the reason for that is not the sin/cos performance, but for some reason the compiler is generating sub-optimal code for the main hot function in that benchmark. It seems timing-dependent. I need to find out what's happening there before progressing here.

Yang
Reply all
Reply to author
Forward
0 new messages