Re: Replace a set of Hydrogen instructions with rotate instructions on ARM (issue 10984057)

27 views
Skip to first unread message

dco...@codeaurora.org

unread,
Sep 29, 2012, 6:17:45 PM9/29/12
to erik....@gmail.com, ul...@chromium.org, mstar...@chromium.org, da...@chromium.org, v8-...@googlegroups.com
Reviewers: Erik Corry, ulan, Michael Starzinger, danno,

Message:
On 2012/09/27 23:53:27, Michael Starzinger wrote:
> Some general high-level observations about this change:

> 1) This is a very specialized optimization targeting one specific
> instruction
> (i.e. rotation). I think a more general approach that performs general
> pattern
> matching might scale better and cover other peephole optimizations as well
(e.g.
> scaled array access, algebraic simplification or other strength
> reductions)

In this case, we focused on getting StringSHA1 performance to improve
quickly. I
agree that a more general approach would be best. For this kind of thing, a
tiling instruction selector would work very well, but it's hard to implement
that kind of thing with the coarse granularity of Lithium. Perhaps we can
work
together on something of this nature in the future. It would definitely
help ARM
platforms which are more sensitive to instructions emitted than x86.

> 2) Is this a common pattern in JavaScript applications or is this tracked
> by
any
> benchmark? I would be interested to know what particular use case this
> optimization targets.

Rotations are common in cryptography, and pretty much every major benchmark
has
a crypto component. When we wrote this, we were targeting StringSHA1 in
BrowserMark, which has the following function:

rotateLeft: function(n, s) {
var t4 = ( n << s ) | (n >>> (32 - s));
return t4;
},

> 3) Adding a token for something that has no equivalent in JavaScript seems
> wrong. We only do this in rare cases where we need to push information
> about
> variable initialization through the AST, but not for Crankshaft
> optimizations.

I think it's strange that the LShift instruction uses AST tokens to
determine
what operation to perform. It seems like a common pattern, so we didn't
want to
change it. Maybe it would be better to have separate enums for syntax and
operations, since, as in this case, we want to represent a wider variety of
operations than the syntax allows.


> 4) This change definitely needs better test coverage.

Agree.

I have refactored this change, incorporating as much as the above feedback
as I
could (although I'm not making any architectural changes for now). Once the
new
revision passes legal review (should be Monday), I'll upload here. Test
cases
will follow soon after that.

Description:
Replace a set of Hydrogen instructions with rotate instructions on ARM
BUG=none
TEST=none

Please review this at http://codereview.chromium.org/10984057/

SVN Base: http://v8.googlecode.com/svn/branches/bleeding_edge/

Affected files:
M src/arm/lithium-arm.cc
M src/arm/lithium-codegen-arm.cc
M src/arm/simulator-arm.cc
M src/hydrogen-instructions.h
M src/hydrogen-instructions.cc
M src/hydrogen.h
M src/hydrogen.cc
M src/ia32/lithium-ia32.cc
M src/token.h
M src/x64/lithium-x64.cc


ul...@chromium.org

unread,
Oct 1, 2012, 2:10:59 PM10/1/12
to dco...@codeaurora.org, erik....@gmail.com, mstar...@chromium.org, da...@chromium.org, v8-...@googlegroups.com
Hi Jay,

While waiting for the new revision of this patch set, I started prototyping
implementation of ROR for ia32 and x64 architectures, so that we can get
rid of
the ARM specific #ifdef in hydrogen.cc.

You can review the CL at https://chromiumcodereview.appspot.com/11033005/

Once you upload new patch set, we can combine both CLs to get something in
landable state.

I also added some tests and moved detection of the ROR pattern to the
visitor of
the bitwise OR. This seems cleaner to me than adding a new hydrogen phase
just
to detect one specific pattern, WDYT?

Please note that I checked the generated code but didn't check performance
on
StringSHA1 yet. How can I do this?

https://chromiumcodereview.appspot.com/10984057/

Jay Conrod

unread,
Oct 1, 2012, 4:01:12 PM10/1/12
to dco...@codeaurora.org, erik....@gmail.com, ul...@chromium.org, mstar...@chromium.org, da...@chromium.org, v8-...@googlegroups.com
LGTM. I refactored the original patch, but I think your new CL came out
cleaner.

I've attached stringSHA1.txt (rename to .js; our e-mail system rejects
.js files). We have a shell-based driver script, but it would take a
while to get through legal review, and it's probably faster if you write
your own. Call the "run" method in a loop which terminates in 2000
iterations or 2000 ms, whichever comes first. The score is iterations
per second.
stringSHA1.txt

Ulan Degenbaev

unread,
Oct 2, 2012, 12:36:40 PM10/2/12
to Jay Conrod, Erik Corry, mstar...@chromium.org, da...@chromium.org, v8-...@googlegroups.com
Hi Jay,

I attached a runnable version of the benchmark. Does it compute the score correctly? If yes, could you please tell me what scores do you get?

My scores are in range 11170-11299 for V8 TOT, the second version of this CL, and my new CL.

Cheers,
Ulan.
string-sha1-with-runner.txt

Jay Conrod

unread,
Oct 4, 2012, 1:08:50 PM10/4/12
to Ulan Degenbaev, Erik Corry, mstar...@chromium.org, da...@chromium.org, v8-...@googlegroups.com
Sorry for the delayed response. I also get scores in the same range with the script, as written, and I didn't see any significant difference due to the patch. However, when I increase the number of iterations to 10000 though, I see a 4.46% gain.

A problem with Browsermark in general is that each test doesn't really run long enough to take advantage of Crankshaft; you tend to see the penalty of the compiler kicking in, but you don't actually spend much time in optimized code. That may be what we're seeing here. When I profiled this, I saw about twice as many ticks in Crankshaft code as in the compiler in the 2000 iteration test.

There's also a difference between the shell and browser results here: several other library functions get compiled during the test. I see IsPrimitive, NonNumberToNumber, valueOf, etc. In the browser, I think these would be optimized ahead of time by earlier tests, so there would be less compiler overhead and more measurable difference.
Reply all
Reply to author
Forward
0 new messages