
Re: Byte swap asm optimization


Guillaume Dargaud

Apr 6, 2012, 11:02:51 AM
Ian Collins wrote:

> On 04/ 6/12 08:03 PM, Guillaume Dargaud wrote:
>> Hello all,
>> a little bit off topic, but I'd like some help on how to optimize one of
>> my C progs which requires byte swaps on 64-bits integers.
>>
>> It's been about 20 years since my last lines of assembly, so I'm notably
>> rusty. Also I've never used the asm keyword while in C.
>>
>> I've seen the bswap x86 asm instruction and tried to find example without
>> success.
>>
>> How would I go to using C inline assembly to swap a long long variable ?
>> Thanks
>
> For one example, see
>
> http://cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/libc/amd64/gen/byteorder.s#44
>

Thanks.

I know this is getting offtopic for c.l.c...

I got this to work fine:
unsigned long x=0x12345678;
asm("bswapl %0"
: "=r" (x)
: "0" (x));

But I can't compile the following:
unsigned long long x=0x123456789abcdef0ULL;
asm("bswapq %0"
: "=r" (x)
: "0" (x));
Error: suffix or operands invalid for `bswap'

I'm on a x86_64 processor, but with a 32-bit distro, maybe that's the
reason?
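For comparison (not from the thread), a 64-bit byte swap can also be written in plain C with no assembly at all, so it builds the same way on 32-bit and 64-bit distros. This is a minimal sketch; the function name is hypothetical, and whether a given compiler collapses this pattern into a single bswap instruction depends on its version and optimization level.

```c
#include <stdint.h>

/* Hypothetical helper: portable 64-bit byte swap using only shifts and
   masks. Step 1 swaps the 32-bit halves, step 2 swaps the 16-bit units
   inside each half, step 3 swaps the bytes inside each 16-bit unit. */
static uint64_t bswap64_portable(uint64_t x)
{
    x = ((x & 0x00000000FFFFFFFFULL) << 32) | ((x & 0xFFFFFFFF00000000ULL) >> 32);
    x = ((x & 0x0000FFFF0000FFFFULL) << 16) | ((x & 0xFFFF0000FFFF0000ULL) >> 16);
    x = ((x & 0x00FF00FF00FF00FFULL) << 8)  | ((x & 0xFF00FF00FF00FF00ULL) >> 8);
    return x;
}
```

With the value from the post, bswap64_portable(0x123456789abcdef0ULL) yields 0xf0debc9a78563412.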
--
Guillaume Dargaud
http://www.gdargaud.net/

BartC

Apr 6, 2012, 2:42:55 PM


"Guillaume Dargaud" <use_the_co...@nospicedham.www.gdargaud.net>
wrote in message news:jln0ir$lf8$1...@ccpntc8.in2p3.fr...
> Ian Collins wrote:
>
>> On 04/ 6/12 08:03 PM, Guillaume Dargaud wrote:

> I got this to work fine:
> unsigned long x=0x12345678;
> asm("bswapl %0"
> : "=r" (x)
> : "0" (x));
>
> But I can't compile the following:
> unsigned long long x=0x123456789abcdef0ULL;
> asm("bswapq %0"
> : "=r" (x)
> : "0" (x));
> Error: suffix or operands invalid for `bswap'

Good luck using that syntax. An assembler with proper syntax, whether
separate or inline, would allow you to write:

mov rax,[x]
bswap rax
mov [x],rax

(This is NASM syntax)

> I'm on a x86_64 processor, but with a 32-bit distro, maybe that's the
> reason?

To assemble my code, I had to tell the assembler to generate 64-bit
object files. Otherwise any use of 'rax' generates errors.

Likely the same is happening in your code; try to do anything with a 64-bit
register (whatever one of those looks like in your syntax) and see what
happens. BTW, having a 64-bit processor doesn't by itself mean anything; it
will run 32-bit or 64-bit programs. For 64-bit registers to be usable, your
compiler must be generating 64-bit output. The compiler itself might be a
32-bit or 64-bit executable; that doesn't matter, but it does confuse things.
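If the build really is 32-bit, bswapq is unavailable because there is no 64-bit register to hold the operand, but the same result can be had from two 32-bit swaps. The sketch below assumes a GCC-compatible compiler (GNU extended asm; GCC predefines __i386__ for 32-bit x86 and __x86_64__ for x86-64); the function names are hypothetical, and non-x86 targets fall back to plain C.

```c
#include <stdint.h>

/* Hypothetical helper: 32-bit byte swap. bswapl works on any 32-bit
   register in both 32-bit and 64-bit x86 builds; other targets use
   shifts and masks instead. */
static uint32_t bswap32_sketch(uint32_t x)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__("bswapl %0" : "+r"(x));
#else
    x = (x << 24) | ((x & 0x0000FF00u) << 8) |
        ((x >> 8) & 0x0000FF00u) | (x >> 24);
#endif
    return x;
}

/* Hypothetical helper: 64-bit swap that never needs a 64-bit register,
   so it also builds for a 32-bit target. Each 32-bit half is swapped,
   then the halves trade places. */
static uint64_t bswap64_split(uint64_t x)
{
    uint32_t lo = (uint32_t)x;
    uint32_t hi = (uint32_t)(x >> 32);
    return ((uint64_t)bswap32_sketch(lo) << 32) | bswap32_sketch(hi);
}
```

This mirrors what a 32-bit build has to do anyway, since a 64-bit value there lives in a pair of 32-bit registers.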

--
Bartc

Alan Curry

Apr 6, 2012, 5:20:07 PM
In article <jlndfn$rlh$1...@dont-email.me>,
BartC <b...@nospicedham.freeuk.com> wrote:
[...]
>> I got this to work fine:
>> unsigned long x=0x12345678;
>> asm("bswapl %0"
>> : "=r" (x)
>> : "0" (x));
>>
>> But I can't compile the following:
>> unsigned long long x=0x123456789abcdef0ULL;
>> asm("bswapq %0"
>> : "=r" (x)
>> : "0" (x));
>> Error: suffix or operands invalid for `bswap'
>
>Good luck using the syntax. An assembler with proper syntax, whether
>separate or inline, would allow you to write:
>
> mov rax,[x]
> bswap rax
> mov [x],rax

I can't help noticing that the "proper" one has 2 more mov instructions, both
of which seem to also involve memory transfers. And it's better because...

--
Alan Curry

BartC

Apr 6, 2012, 7:46:31 PM


"Alan Curry" <pac...@nospicedham.kosh.dhis.org> wrote in message
news:jlnmm7$3l2$1...@speranza.aioe.org...
> In article <jlndfn$rlh$1...@dont-email.me>,
> BartC <b...@nospicedham.freeuk.com> wrote:

>>> unsigned long long x=0x123456789abcdef0ULL;
>>> asm("bswapq %0"
>>> : "=r" (x)
>>> : "0" (x));
>>> Error: suffix or operands invalid for `bswap'
>>
>>Good luck using [that] syntax. An assembler with proper syntax, whether
>>separate or inline, would allow you to write:
>>
>> mov rax,[x]
>> bswap rax
>> mov [x],rax
>
> I can't help noticing that the "proper" one has 2 more mov instructions,
> both
> of which seem to also involve memory transfers. And it's better because...

Well, you can actually see there are two more mov instructions for a start!

But, unless bswap now works directly on memory, how else would you byte-swap
a 64-bit operand in memory? (x=byteswap(x);)

--
Bartc

Stephen Sprunk

Apr 6, 2012, 9:57:55 PM
Part of the point of inline assembly is that it allows you to specify
input and output registers; the compiler will take care of loading and
storing the appropriate values if necessary. In many cases no loads or
stores are needed, because the values will already be in registers due to the surrounding code and
the compiler just passes the appropriate register to your assembly code.
If not, the compiler handles the loads and stores for you.
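A sketch of that division of labour, assuming GCC-style extended asm on x86-64 (the function name is hypothetical). The asm body is just the bswap; the "+r" constraint, equivalent to the original post's "=r" output paired with a "0" matching input, asks the compiler to put v in a register of its choosing and take the result back from it, so reading and writing a[i] is ordinary compiler-generated code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: byte-swap every element of an array in place. */
static void bswap64_array(uint64_t *a, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint64_t v = a[i];            /* load: emitted by the compiler */
#if defined(__GNUC__) && defined(__x86_64__)
        /* Only the swap itself is hand-written; register choice is left
           to the compiler via the "+r" (read-write register) operand. */
        __asm__("bswapq %0" : "+r"(v));
#else
        /* Other targets: portable shift-and-mask fallback. */
        v = ((v & 0x00000000FFFFFFFFULL) << 32) | (v >> 32);
        v = ((v & 0x0000FFFF0000FFFFULL) << 16) | ((v >> 16) & 0x0000FFFF0000FFFFULL);
        v = ((v & 0x00FF00FF00FF00FFULL) << 8)  | ((v >> 8)  & 0x00FF00FF00FF00FFULL);
#endif
        a[i] = v;                     /* store: emitted by the compiler */
    }
}
```

If v is already in a register because of the surrounding loop code, no extra moves are needed around the asm.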

S

--
Stephen Sprunk "God does not play dice." --Albert Einstein
CCIE #3723 "God is an inveterate gambler, and He throws the
K5SSS dice at every possible opportunity." --Stephen Hawking

BartC

Apr 7, 2012, 5:55:55 AM
"Stephen Sprunk" <ste...@nospicedham.sprunk.org> wrote in message
news:jlo6v3$1h0$1...@dont-email.me...
> On 06-Apr-12 18:46, BartC wrote:
>> "Alan Curry" <pac...@nospicedham.kosh.dhis.org> wrote in message
>> news:jlnmm7$3l2$1...@speranza.aioe.org...
>>> In article <jlndfn$rlh$1...@dont-email.me>,
>>> BartC <b...@nospicedham.freeuk.com> wrote:

>>>>> asm("bswapq %0"
>>>>> : "=r" (x)
>>>>> : "0" (x));

>>>> mov rax,[x]
>>>> bswap rax
>>>> mov [x],rax
>>>
>>> I can't help noticing that the "proper" one has 2 more mov instructions,
>>> both of which seem to also involve memory transfers. And it's better
>>> because...
>>
>> Well, you can actually see there are two more mov instructions for a
>> start!
>>
>> But, unless bswap now works directly on memory, how else would you
>> byte-swap a 64-bit operand in memory? (x=byteswap(x);)
>
> Part of the point of inline assembly is that it allows you to specify
> input and output registers; the compiler will take care of loading and
> storing the appropriate values if necessary. In many cases no loads or
> stores are needed, because the values will already be in registers due to the surrounding code and
> the compiler just passes the appropriate register to your assembly code.
> If not, the compiler handles the loads and stores for you.

No, much of the point of inline assembly is that you are 100% in control at
that point. The code should also be reasonably transparent, so that it's
easy to see what's happening. That's not the case with the (presumably gcc) example
shown. The AT&T syntax doesn't help, having to quote things in strings makes
it worse, and having to use this weird interface to specify where values are
coming from, and where they're going to, makes it impossible.

I can see why gcc wants to have inline assembly which can work with
register-based variables, and which can be inserted into the middle of
highly optimised code without having to trash most of the registers, but
their approach looks terrible.

And for all we know, the gcc version also uses two memory mov instructions!

Anyway, the OP's real problem, if it is a 32-bit/64-bit issue, is that the
assembler isn't giving a helpful error message. My Nasm assembler says
"instruction not supported in 32-bit mode" when using "bswap rax".

--
Bartc

Bernhard Schornak

Apr 7, 2012, 9:04:58 AM
Try "movabs %0, %%rax"
"bswap %%rax"

or "$%0"? I'm not sure about GCC's inline syntax...
'movabs' is required for numbers exceeding 32 bits.

As pointed out in the other replies - this clobbers
RAX. Shouldn't do too much harm, because RAX is the
register for 'general purposes', anyway.
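If RAX is hard-coded, GCC's extended asm expects it to be named in the clobber list; otherwise the compiler may keep some unrelated live value in RAX across the asm and lose it. A minimal sketch, assuming GCC or Clang on x86-64 (the function name is hypothetical). Because the input here is a register rather than a 64-bit immediate, a plain movq is used; movabs matters for 64-bit immediate or absolute-address operands.

```c
#include <stdint.h>

/* Hypothetical helper: hard-codes RAX as in the suggestion above. */
static uint64_t bswap64_rax(uint64_t x)
{
#if defined(__GNUC__) && defined(__x86_64__)
    uint64_t out;
    __asm__("movq %1, %%rax\n\t"   /* copy the input into RAX        */
            "bswapq %%rax\n\t"     /* reverse the 8 bytes in RAX     */
            "movq %%rax, %0"       /* copy the result to the output  */
            : "=r"(out)
            : "r"(x)
            : "rax");              /* tell the compiler RAX is trashed */
    return out;
#else
    /* Other targets: GCC/Clang builtin instead of hand-written asm. */
    return __builtin_bswap64(x);
#endif
}
```

Letting the compiler pick the register with "+r" avoids both the clobber and the two extra moves.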


> I'm on a x86_64 processor, but with a 32-bit distro, maybe that's the
> reason?


If it is a 64-bit GCC, it should emit a 64-bit executable.
The extended register set is only available in long
mode, so you need a 64-bit OS to run it.


Greetings from Augsburg

Bernhard Schornak

Richard Damon

Apr 7, 2012, 12:27:44 PM
On 4/7/12 5:55 AM, BartC wrote:
> "Stephen Sprunk" <ste...@nospicedham.sprunk.org> wrote in message
>>
>> Part of the point of inline assembly is that it allows you to specify
>> input and output registers; the compiler will take care of loading and
>> storing the appropriate values if necessary. In many cases no loads or
>> stores are needed, because the values will already be in registers due to the surrounding code and
>> the compiler just passes the appropriate register to your assembly code.
>> If not, the compiler handles the loads and stores for you.
>
> No, much of the point of inline assembly is that you are 100% in control
> at that point. The code should also be reasonably transparent, so that
> it's easy to see what's happening. That's not the case with the (presumably gcc)
> example shown. The AT&T syntax doesn't help, having to quote things in
> strings makes it worse, and having to use this weird interface to
> specify where values are coming from, and where they're going to, makes
> it impossible.

The purpose of in-line assembly is to allow you to specify something
that you can't (as efficiently) in C code itself. By its nature, it is
"non-portable". Some implementations of in-line assembly just provide
for the literal insertion of assembly code into the code stream (this is
fairly simple for the compiler), with maybe some smarts in the optimizer
to figure out what has happened to its register store. This type of
in-line assembly can actually cause "pessimisms", as its presence can
greatly interfere with the compiler's ability to optimize the code.

Another method of implementing in-line assembly inserts more abstract
operations, which do not necessarily specify exact registers to be used,
but rather register classes and the flow of values from one instruction to
another (normally you CAN specify an exact register if needed). This form is
more common on RISC-like machines, which have a rich set of registers and
many operations that can work on any of them. In this case, the
optimizer works with the in-line assembly and does what optimizers are
good at, register scheduling. Yes, the programmer has given up some
control over the exact code sequence being executed (but often still has
ways to keep that control if they wanted to), but has gained in the
ability for the compiler to optimize the code better.

In this case, if the value happened to already be in a suitable
register, then the code could just use that register instead of
reloading the value from memory. If it was in a register not suitable
for the instruction, a quicker register-to-register move could be used
instead of a round trip through memory.
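Both styles described above can be expressed in GCC's extended asm, as a sketch assuming GCC or Clang on x86-64 (function names hypothetical): the "r" class lets the register allocator choose, while the "a" constraint pins the operand to RAX, the "exact register if needed" case. In the second form the compiler still generates whatever moves are needed, and since RAX is an operand it does not have to be listed as a clobber.

```c
#include <stdint.h>

#if defined(__GNUC__) && defined(__x86_64__)
/* Register class: any general-purpose register will do, so the
   compiler can reuse one that already holds x. */
static uint64_t bswap64_any_reg(uint64_t x)
{
    __asm__("bswapq %0" : "+r"(x));
    return x;
}

/* Exact register: "a" forces the operand into RAX; the compiler adds
   any moves needed to get x into RAX and the result back out. */
static uint64_t bswap64_in_rax(uint64_t x)
{
    __asm__("bswapq %0" : "+a"(x));
    return x;
}
#else
/* Other targets: same results via the GCC/Clang builtin. */
static uint64_t bswap64_any_reg(uint64_t x) { return __builtin_bswap64(x); }
static uint64_t bswap64_in_rax(uint64_t x)  { return __builtin_bswap64(x); }
#endif
```

Compiling both with -O2 -S and comparing the output is one way to see how much freedom the register-class form gives the optimizer.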

svfu...@nospicedham.gmail.com

Apr 10, 2012, 6:30:31 PM
Just use __builtin_bswap64(). It will do exactly what you want in both 64-bit and 32-bit mode. The problem is that you are compiling for 32-bit, and a 64-bit bswap instruction doesn't exist there.
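A minimal sketch of that suggestion, assuming GCC 4.3 or later (where __builtin_bswap64 was added) or Clang, which also provides it; the wrapper name swap64 is hypothetical. The compiler chooses the instruction sequence for the target, which on x86-64 is typically a single bswap and on 32-bit x86 an equivalent sequence working on two 32-bit halves.

```c
#include <stdint.h>

/* Hypothetical wrapper: no inline asm, so the same source builds with
   -m32 and -m64 and the compiler picks the instructions. */
static uint64_t swap64(uint64_t x)
{
    return __builtin_bswap64(x);
}
```

Applied to the value from the original post, swap64(0x123456789abcdef0ULL) gives 0xf0debc9a78563412.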

Steven

Nathan

Apr 11, 2012, 2:25:50 AM
On Apr 10, 6:30 pm, svfue...@nospicedham.gmail.com wrote:

>
> Just use __builtin_bswap64()

That'd be too much like coding for HLA. ;)

Nathan.
--
http://clax.inspiretomorrow.net