[LLVMdev] rotate

391 views
Skip to first unread message

reed kotler

unread,
Jul 28, 2012, 11:29:58 PM7/28/12
to ll >> "llvmdev@cs.uiuc.edu"
in C or C++, how can I get clang/llvm to try and do a "rotate".

(want to test this code in the mips16 port)

i.e. emit rotr node.

tia.

reed

_______________________________________________
LLVM Developers mailing list
LLV...@cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Michael Gottesman

unread,
Jul 28, 2012, 11:55:46 PM7/28/12
to reed kotler, ll >> "llvmdev@cs.uiuc.edu"
I can get clang/llvm to emit a rotate instruction on x86-64 when compiling C by just using -Os and the rotate from Hacker's Delight i.e.,

======
#include <stdlib.h>
#include <stdint.h>

uint32_t ror(uint32_t input, size_t rot_bits)
{
return (input >> rot_bits) | (input << ((sizeof(input) << 3) - rot_bits));
}
======

Then compile with (assuming you are on OS X):
======
ISYSROOT=$(xcodebuild -sdk macosx -version PlatformPath)/Developer/SDKs/MacOSX10.8.sdk
$(xcrun -find clang) -isysroot $ISYSROOT ror.c -c -S -Os -o -
======

yielding an assembly output of:
======
.section __TEXT,__text,regular,pure_instructions
.globl _rotr
_rotr: ## @rotr
.cfi_startproc
## BB#0:
pushq %rbp
Ltmp2:
.cfi_def_cfa_offset 16
Ltmp3:
.cfi_offset %rbp, -16
movq %rsp, %rbp
Ltmp4:
.cfi_def_cfa_register %rbp
movb %sil, %cl
rorl %cl, %edi <==== Rotate instruction
movl %edi, %eax
popq %rbp
ret
.cfi_endproc
.subsections_via_symbols
======

I hope this helps.

Michael

reed kotler

unread,
Jul 29, 2012, 12:04:33 AM7/29/12
to Michael Gottesman, ll >> "llvmdev@cs.uiuc.edu"
Nice!

Clever compiler..

Michael Gottesman

unread,
Jul 29, 2012, 12:11:13 AM7/29/12
to reed kotler, ll >> "llvmdev@cs.uiuc.edu"
*NOTE* IIRC compiling this with -O0 on x86-64 can yield the wrong result since clang will emit shifts and on intel shifts are mod the register size:

=====
.section __TEXT,__text,regular,pure_instructions
.globl _ror
.align 4, 0x90
_ror: ## @ror
.cfi_startproc
## BB#0:
pushq %rbp
Ltmp2:
.cfi_def_cfa_offset 16
Ltmp3:
.cfi_offset %rbp, -16
movq %rsp, %rbp
Ltmp4:
.cfi_def_cfa_register %rbp
movl %edi, -4(%rbp)
movq %rsi, -16(%rbp)
movl -4(%rbp), %edi
movq -16(%rbp), %rsi
movl %esi, %eax
movl %eax, %ecx
## kill: CL<def> ECX<kill>
shrl %cl, %edi
movl -4(%rbp), %eax
movabsq $32, %rsi
subq -16(%rbp), %rsi
movl %esi, %edx
movl %edx, %ecx
## kill: CL<def> ECX<kill>
shll %cl, %eax
orl %eax, %edi
movl %edi, %eax
popq %rbp
ret
.cfi_endproc


.subsections_via_symbols
=====

Michael

Andy Gibbs

unread,
Jul 29, 2012, 4:02:17 PM7/29/12
to Michael Gottesman, llv...@cs.uiuc.edu
> *NOTE* IIRC compiling this with -O0 on x86-64 can yield the wrong result
> since clang will emit shifts and on intel shifts are mod the register
> size [...snip...]

I remember finding the same thing (although I haven't tried it on a recent
clang version) and what I wondered was whether there was mileage in having
an explicit intrinsic for rotation (like there is for bit counting, as in
__builtin_clz and __builtin_ctz and so on). Microsoft's compiler has an
explicit intrinsic in the form of _rotl8 and _rotl16 (IIRC -- this is from
memory!). It would be nice to have a __builtin_rotl family in clang, in
my opinion, but it would need back-end support from llvm. I would expect
it to find use in code relating to hashing and cryptology. I know that
the compiler *should* optimise

uint32_t ror(uint32_t input, size_t rot_bits) {
return (input >> rot_bits) | (input << ((sizeof(input) << 3) - rot_bits));
}

and such macros / inline functions are common, but an intrinsic specifies
intent better and could provide for better code optimisation.

Would this be worthwhile pursuing?

Cheers
Andy

Michael Gottesman

unread,
Jul 29, 2012, 5:39:01 PM7/29/12
to Andy Gibbs, llv...@cs.uiuc.edu
After some reflection, I believe the problem was actually with shld/shrd and a similar
bit of C code.

Cameron McInally

unread,
Jul 29, 2012, 6:16:57 PM7/29/12
to Andy Gibbs, llv...@cs.uiuc.edu
Hey Andy,

I proposed a similar patch to LLVM (left circular shift) around 10/2011. Parts of my patch did make it into trunk about a year after, but others did not. 

At that time, my solution was to add a binary operator to the IRBuilder, since LCS fits in nicely with the other shift operators. But, it is quite cumbersome to merge :*(. I would be happy to resend the original patch if you'd like.

-Cameron

On Sun, Jul 29, 2012 at 4:02 PM, Andy Gibbs <andy...@hotmail.co.uk> wrote:
... 

Andy Gibbs

unread,
Jul 31, 2012, 6:04:34 AM7/31/12
to cameron....@nyu.edu, llv...@cs.uiuc.edu
On Monday, July 30, 2012 12:16 AM, Cameron McInally wrote:
> Hey Andy,
>
> I proposed a similar patch to LLVM (left circular shift) around 10/2011.
> Parts of my patch did make it into trunk about a year after, but others
> did not.
>
> At that time, my solution was to add a binary operator to the IRBuilder,
> since LCS fits in nicely with the other shift operators. But, it is quite
> cumbersome to merge :*(. I would be happy to resend the original patch
> if you'd like.

Yes, I would be interested. Thank you! I don't know if was rejected before
that I'll have any better luck this time, but I can try...

Cameron McInally

unread,
Jul 31, 2012, 8:05:51 AM7/31/12
to Andy Gibbs, llv...@cs.uiuc.edu
Oh, no. I should have been more clear. The patch was not rejected, just lost in the daily shuffle.

I already have my employer's approval to send this upstream, so I will prepare a patch against trunk this morning.

> I proposed a similar patch to LLVM (left circular shift) around 10/2011.
> Parts of my patch did make it into trunk about a year after, but others
> did not.

And now that I reread this... it should have been "month after", not "year after". 

Joerg Sonnenberger

unread,
Jul 31, 2012, 10:39:16 AM7/31/12
to llv...@cs.uiuc.edu
On Sat, Jul 28, 2012 at 09:11:13PM -0700, Michael Gottesman wrote:
> *NOTE* IIRC compiling this with -O0 on x86-64 can yield the wrong
> result since clang will emit shifts and on intel shifts are mod the
> register size:

The original code is depending on UB, if it doesn't mask.

Joerg

Cameron McInally

unread,
Jul 31, 2012, 11:42:42 AM7/31/12
to Andy Gibbs, llvm-c...@cs.uiuc.edu, llv...@cs.uiuc.edu
Andy,

Here is the left circular shift operator patch. I apologize to the reviewer in advance. The patch has a good bit of fine detail. Any comments/criticisms?

Some caveats...

1) This is just the bare minimum needed to make the left circular shift operator work (e.g. no instruction combining).

2) I tried my best to select operator names in the existing style; please feel free to change them as appropriate.

Ty,
Cameron
CShl.txt

Eli Friedman

unread,
Jul 31, 2012, 12:17:53 PM7/31/12
to cameron....@nyu.edu, llvm-c...@cs.uiuc.edu, Andy Gibbs, llv...@cs.uiuc.edu
On Tue, Jul 31, 2012 at 8:42 AM, Cameron McInally
<cameron....@nyu.edu> wrote:
> Andy,
>
> Here is the left circular shift operator patch. I apologize to the reviewer
> in advance. The patch has a good bit of fine detail. Any
> comments/criticisms?
>
> Some caveats...
>
> 1) This is just the bare minimum needed to make the left circular shift
> operator work (e.g. no instruction combining).
>
> 2) I tried my best to select operator names in the existing style; please
> feel free to change them as appropriate.

We intentionally haven't included a rotate instruction in LLVM in the
past; the justification is that it's generally straightforward for the
backend to form rotate operations, and making the optimizer
effectively handle the new rotation instruction adds a substantial
amount of complexity. You're going to need to make a strong argument
that the current approach is insufficient if you want to commit a
patch like this.

-Eli

Cameron McInally

unread,
Jul 31, 2012, 1:01:07 PM7/31/12
to Eli Friedman, llvm-c...@cs.uiuc.edu, Andy Gibbs, llv...@cs.uiuc.edu
Well, we like the operator because it maps cleanly to Fortran's ISHFTC intrinsic. There must also be other compilers out there, maybe catering to cryptos, that have their own intrinsic for circular shift in other languages. It seems wasteful for an optimizer to break apart an intrinsic into its elemental pieces in order for LLVM to put them back together. This was done in our compiler for some time and it cluttered up the interface code.

Just curious... what kind of optimizations are done on ISD::ROTL/ROTR? We're able to preform certain InstCombines and other peeps when we see a binary operator. I do not have any experience trying to optimize ISD::ROTL.

Andy Gibbs

unread,
Jul 31, 2012, 1:16:41 PM7/31/12
to cameron....@nyu.edu, Eli Friedman, llvm-c...@cs.uiuc.edu, llv...@cs.uiuc.edu
On Tuesday, July 31, 2012 7:01 PM, Cameron McInally wrote:
> [...snip...]
> There must also be other compilers out there, maybe catering to cryptos,
> that have their own intrinsic for circular shift [...snip...]

This is possibly inconsequent, but Microsoft's compiler for one has rotate
intrinsics... (just my 2 cents!)

Andy

Arnaud A. de Grandmaison

unread,
Jul 31, 2012, 1:22:29 PM7/31/12
to Eli Friedman, llv...@cs.uiuc.edu
On 07/31/2012 06:17 PM, Eli Friedman wrote:
> On Tue, Jul 31, 2012 at 8:42 AM, Cameron McInally
> <cameron....@nyu.edu> wrote:
>> Andy,
>>
>> Here is the left circular shift operator patch. I apologize to the reviewer
>> in advance. The patch has a good bit of fine detail. Any
>> comments/criticisms?
>>
>> Some caveats...
>>
>> 1) This is just the bare minimum needed to make the left circular shift
>> operator work (e.g. no instruction combining).
>>
>> 2) I tried my best to select operator names in the existing style; please
>> feel free to change them as appropriate.
> We intentionally haven't included a rotate instruction in LLVM in the
> past; the justification is that it's generally straightforward for the
> backend to form rotate operations, and making the optimizer
> effectively handle the new rotation instruction adds a substantial
> amount of complexity. You're going to need to make a strong argument
> that the current approach is insufficient if you want to commit a
> patch like this.
>
Well,

I believe something is currently broken with respect to forming rotate
instructions :

For example, using a recent clang/llvm on linux/x86_64 :

uint32_t ror32(uint32_t input, size_t rot_bits) {
return (input >> rot_bits) | (input << ((sizeof(input) << 3) - rot_bits));
}

uint32_t rol32(uint32_t input, size_t rot_bits) {
return (input << rot_bits) | (input >> ((sizeof(input) << 3) - rot_bits));
}

gives the expected ror and rol instructions, but their 16bits counter
parts :

uint16_t ror16(uint16_t input, size_t rot_bits) {
return (input >> rot_bits) | (input << ((sizeof(input) << 3) - rot_bits));
}

uint16_t rol16(uint16_t input, size_t rot_bits) {
return (input << rot_bits) | (input >> ((sizeof(input) << 3) - rot_bits));
}

fail miserably :

.globl ror16
.align 16, 0x90
.type ror16,@function
ror16: # @ror16
.cfi_startproc
# BB#0: # %entry
movb %sil, %cl
movl %edi, %eax
shrl %cl, %eax
movl $16, %ecx
subl %esi, %ecx
# kill: CL<def> CL<kill> ECX<kill>
shll %cl, %edi
orl %eax, %edi
movzwl %di, %eax
ret
.Ltmp2:
.size ror16, .Ltmp2-ror16
.cfi_endproc

.globl rol16
.align 16, 0x90
.type rol16,@function
rol16: # @rol16
.cfi_startproc
# BB#0: # %entry
movb %sil, %cl
movl %edi, %eax
shll %cl, %eax
movl $16, %ecx
subl %esi, %ecx
# kill: CL<def> CL<kill> ECX<kill>
shrl %cl, %edi
orl %eax, %edi
movzwl %di, %eax
ret
.Ltmp3:
.size rol16, .Ltmp3-rol16
.cfi_endproc

At a quick first glance, this seems to be related to the values being
promoted from i16 to i32 in the IR optimization passes, but this may not
be the only reason.

--
Arnaud de Grandmaison

Chris Lattner

unread,
Jul 31, 2012, 1:58:27 PM7/31/12
to Andy Gibbs, llv...@cs.uiuc.edu

On Jul 31, 2012, at 3:04 AM, Andy Gibbs <andy...@hotmail.co.uk> wrote:

> On Monday, July 30, 2012 12:16 AM, Cameron McInally wrote:
>> Hey Andy,
>>
>> I proposed a similar patch to LLVM (left circular shift) around 10/2011.
>> Parts of my patch did make it into trunk about a year after, but others
>> did not.
>>
>> At that time, my solution was to add a binary operator to the IRBuilder,
>> since LCS fits in nicely with the other shift operators. But, it is quite
>> cumbersome to merge :*(. I would be happy to resend the original patch
>> if you'd like.
>
> Yes, I would be interested. Thank you! I don't know if was rejected before
> that I'll have any better luck this time, but I can try…

Why do we need a new builtin for rotates? What problem does it solve?

-Chris

Cameron McInally

unread,
Jul 31, 2012, 2:39:43 PM7/31/12
to Chris Lattner, Andy Gibbs, llv...@cs.uiuc.edu
The initial inspiration for my local mod was to create a way for our optimizer to get at ISD::ROTL directly. This would allow us to always produce an x86 rol instruction for a user intrinsic call, even at -O0.

I suppose an LLVM intrinsic would do the same thing. But, it is nice to have an operator so LLVM can optimize common cases, like cshift by a constant. I personally feel that it fits in with LSHL, LSHR, and ASHR.

That being said, I'd be just as happy with any other way to convey the same functionality. It just seems less nice to have to unfold a circular shift intrinsic into the Hacker's Delight algorithm (5 or more operations) at our opt/LLVM interface, just to have LLVM try to put it back together.

On Tue, Jul 31, 2012 at 1:58 PM, Chris Lattner <clat...@apple.com> wrote:
...
Reply all
Reply to author
Forward
0 new messages