(defun i (x y)
(declare (type (unsigned-byte 32) x y))
(ldb (byte 32 0) (logxor x (lognot y))))
According to the manual, this is supposed to compile to native 32-bit
machine arithmetic on x86 CPUs. This sounds cool. The problem is
that it doesn't seem to work:
CL-USER> (disassemble 'i)
; 11B65DDA: 8BC6 MOV EAX, ESI ; no-arg-parsing entry point
; DDC: F7D0 NOT EAX
; DDE: 8BCB MOV ECX, EBX
; DE0: 31C1 XOR ECX, EAX
; DE2: F7C1000000E0 TEST ECX, 3758096384
; DE8: 7522 JNE L1
; DEA: 8D148D00000000 LEA EDX, [ECX*4]
; DF1: L0: 8D65F8 LEA ESP, [EBP-8]
; DF4: F8 CLC
; DF5: 8B6DFC MOV EBP, [EBP-4]
; DF8: C20400 RET 4
; DFB: 90 NOP
; DFC: 90 NOP
; DFD: 90 NOP
; DFE: 90 NOP
; DFF: 90 NOP
; E00: 0F0B0A BREAK 10 ; error trap
; E03: 02 BYTE #X02
; E04: 18 BYTE #X18 ; INVALID-ARG-COUNT-ERROR
; E05: 0D BYTE #X0D ; EAX
; E06: 0F0B0A BREAK 10 ; error trap
; E09: 02 BYTE #X02
; E0A: 18 BYTE #X18 ; INVALID-ARG-COUNT-ERROR
; E0B: 4D BYTE #X4D ; ECX
; E0C: L1: 7907 JNS L2
; E0E: BA0A020000 MOV EDX, 522
; E13: EB05 JMP L3
; E15: L2: BA0A010000 MOV EDX, 266
; E1A: L3: C6057C02000804 MOV BYTE PTR [#x800027C], 4 ; unboxed_region
; E21: B810000000 MOV EAX, 16
; E26: 030590783100 ADD EAX, [#x317890] ; boxed_region
; E2C: 3B0594783100 CMP EAX, [#x317894] ; boxed_region
; E32: 7607 JBE L4
; E34: E81BAF4AEE CALL #x10D54 ; alloc_overflow_eax
; E39: EB09 JMP L5
; E3B: L4: 890590783100 MOV [#x317890], EAX ; boxed_region
; E41: 83E810 SUB EAX, 16
; E44: L5: 8910 MOV [EAX], EDX
; E46: 8D5007 LEA EDX, [EAX+7]
; E49: 894AFD MOV [EDX-3], ECX
; E4C: C6057C02000800 MOV BYTE PTR [#x800027C], 0 ; unboxed_region
; E53: 803D9402000800 CMP BYTE PTR [#x8000294], 0 ; unboxed_region
; E5A: 7403 JEQ L6
; E5C: 0F0B09 BREAK 9 ; pending interrupt trap
; E5F: L6: EB90 JMP L0
Adding an (optimize (speed 3) (safety 0)) declaration doesn't change
anything.
Does anyone know what is wrong here? Do I have to do something
special to enable the modular arithmetic optimizer?
--
Tord Romstad
> I'm trying to get started using SBCL 0.9.11 on an Intel iMac (on my
> old Mac I used OpenMCL, which unfortunately doesn't run on Intel
> Macs). Section 5.2 of the SBCL manual shows a way to do fast 32-bit
> arithmetic with SBCL. The following function is given as an example:
>
> (defun i (x y)
> (declare (type (unsigned-byte 32) x y))
> (ldb (byte 32 0) (logxor x (lognot y))))
>
> According to the manual, this is supposed to compile to native 32-bit
> machine arithmetic on x86 CPUs. This sounds cool. The problem is
> that it doesn't seem to work:
It does, actually.
> CL-USER> (disassemble 'i)
> ; 11B65DDA: 8BC6 MOV EAX, ESI ; no-arg-parsing entry point
> ; DDC: F7D0 NOT EAX
> ; DDE: 8BCB MOV ECX, EBX
> ; DE0: 31C1 XOR ECX, EAX
This bit is the native 32-bit arithmetic: (lognot y) is the second
line, (logxor x <result>) is the fourth. By this stage, we have
computed the answer to your function. However...
> ; DE2: F7C1000000E0 TEST ECX, 3758096384
> ; DE8: 7522 JNE L1
> ; DEA: 8D148D00000000 LEA EDX, [ECX*4]
> ; DF1: L0: 8D65F8 LEA ESP, [EBP-8]
> ; DF4: F8 CLC
> ; DF5: 8B6DFC MOV EBP, [EBP-4]
> ; DF8: C20400 RET 4
This bit (and the consing sequence that I've snipped, down at L1) is
_returning_ the 32-bit value. The caller of I must get a tagged Lisp
object back, rather than a raw 32-bit value; since (LOGXOR X (LOGNOT
Y)) can have any 32-bit value, the return sequence must test the raw
32-bit value to see if it fits in a fixnum, and otherwise must
allocate a bignum for it.
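To make the return sequence concrete, here is a rough Lisp model of the
checks it performs. The function and constant names are made up for
illustration; the mask is the operand of the TEST instruction, the shift
by two is the LEA EDX, [ECX*4], and the one-word/two-word choice is my
reading of the JNS at L1, where 266 and 522 look like bignum headers
(#x10A and #x20A) for lengths 1 and 2. Treat it as a sketch of this
particular build's behaviour, not a description of SBCL internals in
general.

```lisp
;; Hypothetical names; constants are read off the disassembly above.
(defconstant +fixnum-overflow-mask+ #xE0000000) ; TEST ECX, 3758096384

(defun fits-in-fixnum-p (raw)
  "True if the unsigned 32-bit RAW has its top three bits clear."
  (declare (type (unsigned-byte 32) raw))
  (zerop (logand raw +fixnum-overflow-mask+)))

(defun tagged-word (raw)
  "Model of LEA EDX, [ECX*4]: RAW shifted left past two zero tag bits.
Only meaningful when (fits-in-fixnum-p raw) is true."
  (declare (type (unsigned-byte 32) raw))
  (ash raw 2))

(defun bignum-words-needed (raw)
  "1 if bit 31 of RAW is clear, else 2 (an extra zero word keeps the
bignum non-negative). Mirrors the JNS L2 branch."
  (declare (type (unsigned-byte 32) raw))
  (if (logbitp 31 raw) 2 1))
```

So any result below (expt 2 29) takes the short path through L0, and
anything larger goes through the allocation code at L1.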
So, why is this useful? Well, if you're just calling your I function
on random input, it isn't; however, if you have slightly longer
arithmetic sequences, as are found in cryptographic or hashing
functions, or if you are storing the result of your short arithmetic
sequence into an array of results, then the inner loop or bottleneck
of the routine is made to take much less time than if the arithmetic
were generic.
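As a minimal sketch of the second case (the function name and the
mixing step are invented for illustration, not taken from the manual):
the accumulator below is declared (unsigned-byte 32), every update is
wrapped in (ldb (byte 32 0) ...), and the results go into a specialized
array. I would expect the loop body to stay in raw 32-bit registers,
with no fixnum test or bignum allocation per element, but check the
disassembly on your own build rather than taking my word for it.

```lisp
(defun xor-add-checksums (input output)
  "Store a running 32-bit checksum of INPUT into OUTPUT; return OUTPUT."
  (declare (type (simple-array (unsigned-byte 32) (*)) input output)
           (optimize (speed 3)))
  (let ((acc 0))
    (declare (type (unsigned-byte 32) acc))
    ;; Process only as many elements as both arrays can hold.
    (dotimes (i (min (length input) (length output)) output)
      ;; XOR in the next word, add a shifted copy of the accumulator,
      ;; and truncate the sum back to 32 bits.
      (setf acc (ldb (byte 32 0)
                     (+ (logxor acc (aref input i))
                        (ash acc 5))))
      (setf (aref output i) acc))))
```

You would call it with two arrays made by
(make-array n :element-type '(unsigned-byte 32)); the only boxing left
is whatever the caller does with individual elements afterwards.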
Christophe
> It does, actually.
>
>> CL-USER> (disassemble 'i)
>> ; 11B65DDA: 8BC6 MOV EAX, ESI ; no-arg-parsing entry point
>> ; DDC: F7D0 NOT EAX
>> ; DDE: 8BCB MOV ECX, EBX
>> ; DE0: 31C1 XOR ECX, EAX
>
> This bit is the native 32-bit arithmetic: (lognot y) is the second
> line, (logxor x <result>) is the fourth. By this stage, we have
> computed the answer to your function. However...
>
>> ; DE2: F7C1000000E0 TEST ECX, 3758096384
>> ; DE8: 7522 JNE L1
>> ; DEA: 8D148D00000000 LEA EDX, [ECX*4]
>> ; DF1: L0: 8D65F8 LEA ESP, [EBP-8]
>> ; DF4: F8 CLC
>> ; DF5: 8B6DFC MOV EBP, [EBP-4]
>> ; DF8: C20400 RET 4
>
> This bit (and the consing sequence that I've snipped, down at L1) is
> _returning_ the 32-bit value. The caller of I must get a tagged Lisp object
> back, rather than a raw 32-bit value; since (LOGXOR X (LOGNOT Y)) can have
> any 32-bit value, the return sequence must test the raw 32-bit value to see
> if it fits in a fixnum, and otherwise must allocate a bignum for it.
This makes me wonder: what would you advise for becoming familiar with
reading and understanding disassembled compiled Lisp code (especially for
somebody not at all fluent in assembler)? I'm aware of the page on this
subject on the CMUCL web site, but it is really short.
Thanks.
--
Didier Verna, did...@lrde.epita.fr, http://www.lrde.epita.fr/~didier
EPITA / LRDE, 14-16 rue Voltaire Tel.+33 (1) 44 08 01 85
94276 Le Kremlin-Bicêtre, France Fax.+33 (1) 53 14 59 22 did...@xemacs.org
> This bit (and the consing sequence that I've snipped, down at L1) is
> _returning_ the 32-bit value. The caller of I must get a tagged Lisp
> object back, rather than a raw 32-bit value; since (LOGXOR X (LOGNOT
> Y)) can have any 32-bit value, the return sequence must test the raw
> 32-bit value to see if it fits in a fixnum, and otherwise must
> allocate a bignum for it.
I see. Thanks a lot for the explanation!
--
Tord Romstad