The epic floating-point battle

Joel Reymont

unread,

Oct 20, 2006, 3:35:48 PM10/20/06

to

Folks,

There's a thread in the Lispworks Users Group that I started:

http://thread.gmane.org/gmane.lisp.lispworks.general/6012/focus=6012

This is the type of thread that I'm sure Lispers are gonna love: I'm
pitting ACL 8.0 vs LW 5.0 in a floating point pseudo-benchmark.

I would love to get stats from SBCL, CMUCL and as many other Lisps as
possible. I would also love to improve the benchmark itself and measure
GCC on the same code.

Thanks, Joel

--
http://whylisp.com

Pascal Bourguignon

unread,

Oct 20, 2006, 3:54:13 PM10/20/06

to

"Joel Reymont" <joe...@gmail.com> writes:

times (see listing below):

clisp 2.39 110.53891 seconds
sbcl 0.9.17 0.476
cmucl 19c 0.77
gcl 2.6.7 13.250
ecl 0.9g 18.460
Allegro 8.0 20.820

disassembles:

[pjb@thalassa lisp]$ clall '(disassemble (defun digamma/setq-x (x)
(declare (:explain :types :inlining :variables)
(optimize (speed 3) (safety 0) (debug 0))
((double-float (0.0d0) *) x))
(incf x 6d0)
(let* ((p (/ 1d0 (* x x)))
(logx (log x))
(one 1d0))
(declare (double-float one p))
(setq p (* (- (* (+ (* p (- (* p 0.004166666666667d0)
0.003968253986254d0))
0.008333333333333d0) p)
0.083333333333333d0) p))
(+ p (- logx
(/ 0.5d0 x)
(/ one (setq x (- x one)))
(/ one (setq x (- x one)))
(/ one (setq x (- x one)))
(/ one (setq x (- x one)))
(/ one (setq x (- x one)))
(/ one (setq x (- x one)))))
)))'
> > > > > > > > > > > > > > > > > > > > >
------------------------------------------------------------------------
CLISP 2.39 (2006-07-16) (built 3364813332) (memory 3364813914)

WARNING in DIGAMMA/SETQ-X :
Unknown declaration (DOUBLE-FLOAT (0.0d0) *).
The whole declaration will be ignored.
WARNING in DIGAMMA/SETQ-X :
Unknown declaration :EXPLAIN.
The whole declaration will be ignored.

Disassembly of function DIGAMMA/SETQ-X
(CONST 0) = 6.0d0
(CONST 1) = 1.0d0
(CONST 2) = -0.083333333333333d0
(CONST 3) = 0.008333333333333d0
(CONST 4) = -0.003968253986254d0
(CONST 5) = 0.004166666666667d0
(CONST 6) = 0.5d0
1 required argument
0 optional arguments
No rest parameter
No keyword parameters
70 byte-code instructions:
0 (CONST&PUSH 0) ; 6.0d0
1 (LOAD&PUSH 2)
2 (CALLSR&STORE 2 53 1) ; +
6 (CONST&PUSH 1) ; 1.0d0
7 (LOAD&PUSH 2)
8 (LOAD&PUSH 3)
9 (CALLSR&PUSH 2 55) ; *
12 (CALLSR&PUSH 1 56) ; /
15 (LOAD&PUSH 2)
16 (PUSH-UNBOUND 1)
18 (CALLS2&PUSH 156) ; LOG
20 (CONST&PUSH 2) ; -0.083333333333333d0
21 (CONST&PUSH 3) ; 0.008333333333333d0
22 (LOAD&PUSH 3)
23 (CONST&PUSH 4) ; -0.003968253986254d0
24 (CONST&PUSH 5) ; 0.004166666666667d0
25 (LOAD&PUSH 6)
26 (CALLSR&PUSH 2 55) ; *
29 (CALLSR&PUSH 2 53) ; +
32 (CALLSR&PUSH 2 55) ; *
35 (CALLSR&PUSH 2 53) ; +
38 (LOAD&PUSH 3)
39 (CALLSR&PUSH 2 55) ; *
42 (CALLSR&PUSH 2 53) ; +
45 (LOAD&PUSH 2)
46 (CALLSR&STORE 2 55 1) ; *
50 (PUSH)
51 (LOAD&PUSH 1)
52 (CONST&PUSH 6) ; 0.5d0
53 (LOAD&PUSH 6)
54 (CALLSR&PUSH 1 56) ; /
57 (CONST&PUSH 1) ; 1.0d0
58 (LOAD&PUSH 7)
59 (CONST&PUSH 1) ; 1.0d0
60 (CALLSR&STORE 1 54 7) ; -
64 (PUSH)
65 (CALLSR&PUSH 1 56) ; /
68 (CONST&PUSH 1) ; 1.0d0
69 (LOAD&PUSH 8)
70 (CONST&PUSH 1) ; 1.0d0
71 (CALLSR&STORE 1 54 8) ; -
75 (PUSH)
76 (CALLSR&PUSH 1 56) ; /
79 (CONST&PUSH 1) ; 1.0d0
80 (LOAD&PUSH 9)
81 (CONST&PUSH 1) ; 1.0d0
82 (CALLSR&STORE 1 54 9) ; -
86 (PUSH)
87 (CALLSR&PUSH 1 56) ; /
90 (CONST&PUSH 1) ; 1.0d0
91 (LOAD&PUSH 10)
92 (CONST&PUSH 1) ; 1.0d0
93 (CALLSR&STORE 1 54 10) ; -
97 (PUSH)
98 (CALLSR&PUSH 1 56) ; /
101 (CONST&PUSH 1) ; 1.0d0
102 (LOAD&PUSH 11)
103 (CONST&PUSH 1) ; 1.0d0
104 (CALLSR&STORE 1 54 11) ; -
108 (PUSH)
109 (CALLSR&PUSH 1 56) ; /
112 (CONST&PUSH 1) ; 1.0d0
113 (LOAD&PUSH 12)
114 (CONST&PUSH 1) ; 1.0d0
115 (CALLSR&STORE 1 54 12) ; -
119 (PUSH)
120 (CALLSR&PUSH 1 56) ; /
123 (CALLSR&PUSH 7 54) ; -
126 (CALLSR 2 53) ; +
129 (SKIP&RET 4)
(DISASSEMBLE
(DEFUN DIGAMMA/SETQ-X (X)
(DECLARE (:EXPLAIN :TYPES :INLINING :VARIABLES)
(OPTIMIZE (SPEED 3) (SAFETY 0) (DEBUG 0)) ((DOUBLE-FLOAT (0.0d0) *) X))
(INCF X 6.0d0)
(LET* ((P (/ 1.0d0 (* X X))) (LOGX (LOG X)) (ONE 1.0d0))
(DECLARE (DOUBLE-FLOAT ONE P))
(SETQ P
(*
(-
(*
(+ (* P (- (* P 0.004166666666667d0) 0.003968253986254d0))
0.008333333333333d0)
P)
0.083333333333333d0)
P))
(+ P
(- LOGX (/ 0.5d0 X) (/ ONE (SETQ X (- X ONE))) (/ ONE (SETQ X (- X ONE)))
(/ ONE (SETQ X (- X ONE))) (/ ONE (SETQ X (- X ONE)))
(/ ONE (SETQ X (- X ONE))) (/ ONE (SETQ X (- X ONE))))))))
--> NIL

------------------------------------------------------------------------
SBCL 0.9.17

; in: DEFUN DIGAMMA/SETQ-X
; (DEFUN DIGAMMA/SETQ-X (X)
; (DECLARE (:EXPLAIN :TYPES :INLINING :VARIABLES)
; (OPTIMIZE (SPEED 3) (SAFETY 0) (DEBUG 0)) ((DOUBLE-FLOAT (0.0d0) *) X))
; (INCF X 6.0d0)
; (LET* ((P (/ 1.0d0 #)) (LOGX (LOG X)) (ONE 1.0d0))
; (DECLARE (DOUBLE-FLOAT ONE P))
; (SETQ P (* (- # 0.083333333333333d0) P))
; (+ P
; (- LOGX (/ 0.5d0 X) (/ ONE #) (/ ONE #) (/ ONE #) (/ ONE #) (/ ONE #)
; (/ ONE #)))))
; --> PROGN EVAL-WHEN SB-IMPL::%DEFUN SB-INT:NAMED-LAMBDA
; ==>
; #'(SB-INT:NAMED-LAMBDA DIGAMMA/SETQ-X (X)
; (DECLARE (:EXPLAIN :TYPES :INLINING :VARIABLES)
; (OPTIMIZE (SPEED 3) (SAFETY 0) (DEBUG 0))
; ((DOUBLE-FLOAT (0.0d0) *) X))
; (BLOCK DIGAMMA/SETQ-X
; (INCF X 6.0d0)
; (LET* ((P #) (LOGX #) (ONE 1.0d0))
; (DECLARE (DOUBLE-FLOAT ONE P))
; (SETQ P (* # P))
; (+ P (- LOGX # # # # # # #)))))
;
; caught WARNING:
; unrecognized declaration (:EXPLAIN :TYPES :INLINING :VARIABLES)
;
; note: doing float to pointer coercion (cost 13) to "<return value>"
;
; compilation unit finished
; caught 1 WARNING condition
; printed 1 note
; 0A6AF68D: 8B05A4F56A0A MOV EAX, [#xA6AF5A4] ; 6.0d0
; no-arg-parsing entry point
; 693: DDD8 FSTPD FR0
; 695: DD4001 FLDD [EAX+1]
; 698: DCC2 FADD-STI FR2
; 69A: 9B WAIT
; 69B: DDD8 FSTPD FR0
; 69D: D9C1 FLDD FR1
; 69F: D8CA FMULD FR2
; 6A1: 9B WAIT
; 6A2: DDD9 FSTPD FR1
; 6A4: D9E8 FLD1
; 6A6: D9C9 FXCH FR1
; 6A8: D8F9 FDIVRD FR1
; 6AA: DDD3 FSTD FR3
; 6AC: DDD8 FSTPD FR0
; 6AE: DDD8 FSTPD FR0
; 6B0: D9ED FLDLN2
; 6B2: D9C1 FLDD FR1
; 6B4: D9F1 FYL2X
; 6B6: D9C0 FLDD FR0
; 6B8: 8B05A8F56A0A MOV EAX, [#xA6AF5A8] ; 0.004166666666667d0
; 6BE: DDD8 FSTPD FR0
; 6C0: DD4001 FLDD [EAX+1]
; 6C3: D8CB FMULD FR3
; 6C5: 9B WAIT
; 6C6: 8B05ACF56A0A MOV EAX, [#xA6AF5AC] ; 0.003968253986254d0
; 6CC: DDDC FSTPD FR4
; 6CE: DD4001 FLDD [EAX+1]
; 6D1: D9CC FXCH FR4
; 6D3: D8E4 FSUBD FR4
; 6D5: 9B WAIT
; 6D6: D8CB FMULD FR3
; 6D8: 9B WAIT
; 6D9: 8B05B0F56A0A MOV EAX, [#xA6AF5B0] ; 0.008333333333333d0
; 6DF: DDDC FSTPD FR4
; 6E1: DD4001 FLDD [EAX+1]
; 6E4: D9CC FXCH FR4
; 6E6: D8C4 FADDD FR4
; 6E8: 9B WAIT
; 6E9: D8CB FMULD FR3
; 6EB: 9B WAIT
; 6EC: 8B05B4F56A0A MOV EAX, [#xA6AF5B4] ; 0.083333333333333d0
; 6F2: DDDC FSTPD FR4
; 6F4: DD4001 FLDD [EAX+1]
; 6F7: D9CC FXCH FR4
; 6F9: D8E4 FSUBD FR4
; 6FB: 9B WAIT
; 6FC: DCCB FMUL-STI FR3
; 6FE: 9B WAIT
; 6FF: 8B05B8F56A0A MOV EAX, [#xA6AF5B8] ; 0.5d0
; 705: DDD8 FSTPD FR0
; 707: DD4001 FLDD [EAX+1]
; 70A: D8F2 FDIVD FR2
; 70C: 9B WAIT
; 70D: DCE9 FSUB-STI FR1
; 70F: 9B WAIT
; 710: DDD8 FSTPD FR0
; 712: D9E8 FLD1
; 714: D8EA FSUBRD FR2
; 716: 9B WAIT
; 717: DDD2 FSTD FR2
; 719: DDDC FSTPD FR4
; 71B: D9E8 FLD1
; 71D: D9CC FXCH FR4
; 71F: D8FC FDIVRD FR4
; 721: 9B WAIT
; 722: DCE9 FSUB-STI FR1
; 724: 9B WAIT
; 725: DDD8 FSTPD FR0
; 727: D9E8 FLD1
; 729: D8EA FSUBRD FR2
; 72B: 9B WAIT
; 72C: DDD2 FSTD FR2
; 72E: DDDC FSTPD FR4
; 730: D9E8 FLD1
; 732: D9CC FXCH FR4
; 734: D8FC FDIVRD FR4
; 736: 9B WAIT
; 737: DCE9 FSUB-STI FR1
; 739: 9B WAIT
; 73A: DDD8 FSTPD FR0
; 73C: D9E8 FLD1
; 73E: D8EA FSUBRD FR2
; 740: 9B WAIT
; 741: DDD2 FSTD FR2
; 743: DDDC FSTPD FR4
; 745: D9E8 FLD1
; 747: D9CC FXCH FR4
; 749: D8FC FDIVRD FR4
; 74B: 9B WAIT
; 74C: DCE9 FSUB-STI FR1
; 74E: 9B WAIT
; 74F: DDD8 FSTPD FR0
; 751: D9E8 FLD1
; 753: D8EA FSUBRD FR2
; 755: 9B WAIT
; 756: DDD2 FSTD FR2
; 758: DDDC FSTPD FR4
; 75A: D9E8 FLD1
; 75C: D9CC FXCH FR4
; 75E: D8FC FDIVRD FR4
; 760: 9B WAIT
; 761: DCE9 FSUB-STI FR1
; 763: 9B WAIT
; 764: DDD8 FSTPD FR0
; 766: D9E8 FLD1
; 768: D8EA FSUBRD FR2
; 76A: 9B WAIT
; 76B: DDD2 FSTD FR2
; 76D: DDDC FSTPD FR4
; 76F: D9E8 FLD1
; 771: D9CC FXCH FR4
; 773: D8FC FDIVRD FR4
; 775: 9B WAIT
; 776: DCE9 FSUB-STI FR1
; 778: 9B WAIT
; 779: DDD8 FSTPD FR0
; 77B: D9E8 FLD1
; 77D: D8EA FSUBRD FR2
; 77F: 9B WAIT
; 780: DDD2 FSTD FR2
; 782: DDDA FSTPD FR2
; 784: D9E8 FLD1
; 786: D9CA FXCH FR2
; 788: D8FA FDIVRD FR2
; 78A: 9B WAIT
; 78B: D8E9 FSUBRD FR1
; 78D: 9B WAIT
; 78E: D8C3 FADDD FR3
; 790: 9B WAIT
; 791: 800D2403000504 OR BYTE PTR [#x5000324], 4
; 798: BA10000000 MOV EDX, 16
; 79D: 0315A05F1708 ADD EDX, [#x8175FA0] ; boxed_region
; 7A3: 3B15A45F1708 CMP EDX, [#x8175FA4] ; boxed_region
; 7A9: 7607 JBE L0
; 7AB: E85CD79AFD CALL #x805CF0C ; alloc_overflow_edx
; 7B0: EB09 JMP L1
; 7B2: L0: 8915A05F1708 MOV [#x8175FA0], EDX ; boxed_region
; 7B8: 83EA10 SUB EDX, 16
; 7BB: L1: C70216030000 MOV DWORD PTR [EDX], 790
; 7C1: 8D5207 LEA EDX, [EDX+7]
; 7C4: DD5201 FSTD [EDX+1]
; 7C7: 80352403000504 XOR BYTE PTR [#x5000324], 4
; 7CE: 7402 JEQ L2
; 7D0: CC09 BREAK 9 ; pending interrupt trap
; 7D2: L2: 8D65F8 LEA ESP, [EBP-8]
; 7D5: F8 CLC
; 7D6: 8B6DFC MOV EBP, [EBP-4]
; 7D9: C20400 RET 4
; 7DC: 90 NOP
; 7DD: 90 NOP
; 7DE: 90 NOP
; 7DF: 90 NOP
; 7E0: CC0A BREAK 10 ; error trap
; 7E2: 02 BYTE #X02
; 7E3: 18 BYTE #X18 ; INVALID-ARG-COUNT-ERROR
; 7E4: 0D BYTE #X0D ; EAX
;
(DISASSEMBLE
(DEFUN DIGAMMA/SETQ-X (X)
(DECLARE (:EXPLAIN :TYPES :INLINING :VARIABLES)
(OPTIMIZE (SPEED 3) (SAFETY 0) (DEBUG 0)) ((DOUBLE-FLOAT (0.0d0) *) X))
(INCF X 6.0d0)
(LET* ((P (/ 1.0d0 (* X X))) (LOGX (LOG X)) (ONE 1.0d0))
(DECLARE (DOUBLE-FLOAT ONE P))
(SETQ P
(*
(-
(*
(+ (* P (- (* P 0.004166666666667d0) 0.003968253986254d0))
0.008333333333333d0)
P)
0.083333333333333d0)
P))
(+ P
(- LOGX (/ 0.5d0 X) (/ ONE (SETQ X (- X ONE)))
(/ ONE (SETQ X (- X ONE))) (/ ONE (SETQ X (- X ONE)))
(/ ONE (SETQ X (- X ONE))) (/ ONE (SETQ X (- X ONE)))
(/ ONE (SETQ X (- X ONE))))))))
--> NIL

------------------------------------------------------------------------
CMU Common Lisp 19c (19C)

; In: DEFUN DIGAMMA/SETQ-X

; (DEFUN DIGAMMA/SETQ-X (X) (DECLARE # # #) (INCF X 6.0d0) ...)
; Warning: Unrecognized declaration: (:EXPLAIN :TYPES :INLINING :VARIABLES).
;
; Note: Abbreviated type declaration: ((DOUBLE-FLOAT # *) X).
;
; Converted DIGAMMA/SETQ-X.

; In: LAMBDA (X)

; #'(LAMBDA (X) (DECLARE # # #) (BLOCK DIGAMMA/SETQ-X # #))
; Warning: Unrecognized declaration: (:EXPLAIN :TYPES :INLINING :VARIABLES).
;
; Note: Abbreviated type declaration: ((DOUBLE-FLOAT # *) X).
;
; Compiling LAMBDA (X):

; In: LAMBDA (X)

; #'(LAMBDA (X) (DECLARE # # #) (BLOCK DIGAMMA/SETQ-X # #))
; Note: Doing float to pointer coercion (cost 13) to "<return value>".
;
; Compiling Top-Level Form:

; Compilation unit finished.
; 1 warning
; 2 notes

5808A4D0: .ENTRY "LAMBDA (X)"() ; FUNCTION
4E8: POP DWORD PTR [EBP-8]
4EB: LEA ESP, [EBP-32]
4EE: FSTPD ST(0)
4F0: FLDD [EDX+1]
4F3: FSTD ST(2)
4F5: MOV EAX, [#x5808A4B8] ; 6.0d0
; No-arg-parsing entry point
4FB: FSTPD ST(0)
4FD: FLDD [EAX+1]
500: FADD ST(2), ST(0)
502: FSTPD ST(0)
504: FLDD ST(1)
506: FMULD ST(2)
508: FSTPD ST(1)
50A: FLD1
50C: FXCH ST(1)
50E: FDIVRD ST(1)
510: FSTD ST(3)
512: FSTPD ST(0)
514: FSTPD ST(0)
516: FLDLN2
518: FLDD ST(1)
51A: FYL2X
51C: FLDD ST(0)
51E: MOV EAX, [#x5808A4BC] ; 0.004166666666667d0
524: FSTPD ST(0)
526: FLDD [EAX+1]
529: FMULD ST(3)
52B: MOV EAX, [#x5808A4C0] ; 0.003968253986254d0
531: FSTPD ST(4)
533: FLDD [EAX+1]
536: FXCH ST(4)
538: FSUBD ST(4)
53A: FMULD ST(3)
53C: MOV EAX, [#x5808A4C4] ; 0.008333333333333d0
542: FSTPD ST(4)
544: FLDD [EAX+1]
547: FXCH ST(4)
549: FADDD ST(4)
54B: FMULD ST(3)
54D: MOV EAX, [#x5808A4C8] ; 0.083333333333333d0
553: FSTPD ST(4)
555: FLDD [EAX+1]
558: FXCH ST(4)
55A: FSUBD ST(4)
55C: FMUL ST(3), ST(0)
55E: MOV EAX, [#x5808A4CC] ; 0.5d0
564: FSTPD ST(0)
566: FLDD [EAX+1]
569: FDIVD ST(2)
56B: FSUB ST(1), ST(0)
56D: FSTPD ST(0)
56F: FLD1
571: FSUBRD ST(2)
573: FSTD ST(2)
575: FSTPD ST(4)
577: FLD1
579: FXCH ST(4)
57B: FDIVRD ST(4)
57D: FSUB ST(1), ST(0)
57F: FSTPD ST(0)
581: FLD1
583: FSUBRD ST(2)
585: FSTD ST(2)
587: FSTPD ST(4)
589: FLD1
58B: FXCH ST(4)
58D: FDIVRD ST(4)
58F: FSUB ST(1), ST(0)
591: FSTPD ST(0)
593: FLD1
595: FSUBRD ST(2)
597: FSTD ST(2)
599: FSTPD ST(4)
59B: FLD1
59D: FXCH ST(4)
59F: FDIVRD ST(4)
5A1: FSUB ST(1), ST(0)
5A3: FSTPD ST(0)
5A5: FLD1
5A7: FSUBRD ST(2)
5A9: FSTD ST(2)
5AB: FSTPD ST(4)
5AD: FLD1
5AF: FXCH ST(4)
5B1: FDIVRD ST(4)
5B3: FSUB ST(1), ST(0)
5B5: FSTPD ST(0)
5B7: FLD1
5B9: FSUBRD ST(2)
5BB: FSTD ST(2)
5BD: FSTPD ST(4)
5BF: FLD1
5C1: FXCH ST(4)
5C3: FDIVRD ST(4)
5C5: FSUB ST(1), ST(0)
5C7: FSTPD ST(0)
5C9: FLD1
5CB: FSUBRD ST(2)
5CD: FSTD ST(2)
5CF: FSTPD ST(2)
5D1: FLD1
5D3: FXCH ST(2)
5D5: FDIVRD ST(2)
5D7: FSUBRD ST(1)
5D9: FADDD ST(3)
5DB: MOV BYTE PTR [#x28000234], 0 ; LISP::*PSEUDO-ATOMIC-INTERRUPTED*
5E2: MOV BYTE PTR [#x2800021C], 4 ; LISP::*PSEUDO-ATOMIC-ATOMIC*
5E9: MOV EDX, 16
5EE: ADD EDX, [#x2800057C] ; X86::*CURRENT-REGION-FREE-POINTER*
5F4: CMP EDX, [#x28000594] ; X86::*CURRENT-REGION-END-ADDR*
5FA: JBE L0
5FC: CALL #xBE000010 ; #xBE000010: alloc_overflow_edx
601: L0: XCHG EDX, [#x2800057C] ; X86::*CURRENT-REGION-FREE-POINTER*
607: MOV DWORD PTR [EDX], 790
60D: LEA EDX, [EDX+7]
610: FSTD [EDX+1]
613: MOV BYTE PTR [#x2800021C], 0 ; LISP::*PSEUDO-ATOMIC-ATOMIC*
61A: CMP BYTE PTR [#x28000234], 0 ; LISP::*PSEUDO-ATOMIC-INTERRUPTED*
621: JEQ L1
623: BREAK 9 ; Pending interrupt trap
625: L1: MOV ECX, [EBP-8]
628: MOV EAX, [EBP-4]
62B: ADD ECX, 2
62E: MOV ESP, EBP
630: MOV EBP, EAX
632: JMP ECX
634: NOP
635: NOP
636: NOP
637: NOP

(DISASSEMBLE
(DEFUN DIGAMMA/SETQ-X (X)
(DECLARE (:EXPLAIN :TYPES :INLINING :VARIABLES)
(OPTIMIZE (SPEED 3) (SAFETY 0) (DEBUG 0))
((DOUBLE-FLOAT (0.0d0) *) X))
(INCF X 6.0d0)
(LET* ((P (/ 1.0d0 (* X X))) (LOGX (LOG X)) (ONE 1.0d0))
(DECLARE (DOUBLE-FLOAT ONE P))
(SETQ P
(*
(-
(*
(+ (* P (- (* P 0.004166666666667d0) 0.003968253986254d0))
0.008333333333333d0)
P)
0.083333333333333d0)
P))
(+ P
(- LOGX
(/ 0.5d0 X)
(/ ONE (SETQ X (- X ONE)))
(/ ONE (SETQ X (- X ONE)))
(/ ONE (SETQ X (- X ONE)))
(/ ONE (SETQ X (- X ONE)))
(/ ONE (SETQ X (- X ONE)))
(/ ONE (SETQ X (- X ONE))))))))
-->

------------------------------------------------------------------------
GNU Common Lisp (GCL) GCL 2.6.7

Compiling gazonk0.lsp.
Warning:
The OPTIMIZE quality DEBUG is unknown.
Warning:
The declaration specifier :EXPLAIN is unknown.
End of Pass 1.
End of Pass 2.
OPTIMIZE levels: Safety=0 (No runtime error checking), Space=0, Speed=3
Finished compiling gazonk0.lsp.

#include "gazonk0.h"
void init_code(){do_init(VV);}
/* function definition for DIGAMMASETQ-X */

static void L1()
{register object *base=vs_base;
register object *sup=base+VM1; VC1
vs_check;
{register object V1;
V1=(base[0]);
vs_top=sup;
goto TTL;
TTL:;
V1= number_plus((V1),VV[0]);
{register double V2;
object V3;
register double V4;
base[2]= VV[1];
base[3]= number_times((V1),(V1));
vs_top=(vs_base=base+2)+2;
Ldivide();
vs_top=sup;
V2= lf(vs_base[0]);
base[2]= (V1);
vs_top=(vs_base=base+2)+1;
Llog();
vs_top=sup;
V3= vs_base[0];
V4= 1. ;
V2= (double)((double)((double)((double)((double)(V2)*(double)((double)((double)(V2)*(double)(4.16666666666700140e-3))-(double)(3.96825398625400010e-3)))+(double)(8.33333333333300020e-3))*(double)(V2))-(double)(8.33333333333329960e-2))*(double)(V2);
V5 = make_longfloat(V2);
base[2]= (V3);
base[4]= VV[6];
base[5]= (V1);
vs_top=(vs_base=base+4)+2;
Ldivide();
vs_top=sup;
base[3]= vs_base[0];
base[5]= make_longfloat(V4);
V7 = make_longfloat(V4);
V1= number_minus((V1),V7);
base[6]= (V1);
vs_top=(vs_base=base+5)+2;
Ldivide();
vs_top=sup;
base[4]= vs_base[0];
base[6]= make_longfloat(V4);
V8 = make_longfloat(V4);
V1= number_minus((V1),V8);
base[7]= (V1);
vs_top=(vs_base=base+6)+2;
Ldivide();
vs_top=sup;
base[5]= vs_base[0];
base[7]= make_longfloat(V4);
V9 = make_longfloat(V4);
V1= number_minus((V1),V9);
base[8]= (V1);
vs_top=(vs_base=base+7)+2;
Ldivide();
vs_top=sup;
base[6]= vs_base[0];
base[8]= make_longfloat(V4);
V10 = make_longfloat(V4);
V1= number_minus((V1),V10);
base[9]= (V1);
vs_top=(vs_base=base+8)+2;
Ldivide();
vs_top=sup;
base[7]= vs_base[0];
base[9]= make_longfloat(V4);
V11 = make_longfloat(V4);
V1= number_minus((V1),V11);
base[10]= (V1);
vs_top=(vs_base=base+9)+2;
Ldivide();
vs_top=sup;
base[8]= vs_base[0];
base[10]= make_longfloat(V4);
V12 = make_longfloat(V4);
V1= number_minus((V1),V12);
base[11]= (V1);
vs_top=(vs_base=base+10)+2;
Ldivide();
vs_top=sup;
base[9]= vs_base[0];
vs_top=(vs_base=base+2)+8;
Lminus();
vs_top=sup;
V6= vs_base[0];
base[2]= number_plus(V5,V6);
vs_top=(vs_base=base+2)+1;
return;}
}
}

#(
#(6.0F0 1.0F0 0.0041666666666670014F0 0.0039682539862540001F0 0.0083333333333330002F0 0.083333333333332996F0 0.5F0 (system::%init . #((system::mf (lisp::quote common-lisp-user::digamma/setq-x) 0) (system::debug (lisp::quote common-lisp-user::digamma/setq-x) (lisp::quote (common-lisp-user::x common-lisp-user::logx))))))
)

static void L1();
#define VC1 object V12 ,V11 ,V10 ,V9 ,V8 ,V7 ,V6 ,V5;
#define VM1 12
static char * VVi[8]={
#define Cdata VV[7]
(char *)(L1)
};
#define VV ((object *)VVi)

gazonk0.o: file format elf32-i386

Disassembly of section .text:

00000000 <init_code>:
init_code():
0: 83 ec 18 sub $0x18,%esp
3: 68 00 00 00 00 push $0x0
8: e8 fc ff ff ff call 9 <init_code+0x9>
d: 83 c4 1c add $0x1c,%esp
10: c3 ret
11: eb 0d jmp 20 <L1>
13: 90 nop
14: 90 nop
15: 90 nop
16: 90 nop
17: 90 nop
18: 90 nop
19: 90 nop
1a: 90 nop
1b: 90 nop
1c: 90 nop
1d: 90 nop
1e: 90 nop
1f: 90 nop

00000020 <L1>:
L1():
20: 55 push %ebp
21: 57 push %edi
22: 56 push %esi
23: 53 push %ebx
24: 83 ec 5c sub $0x5c,%esp
27: a1 00 00 00 00 mov 0x0,%eax
2c: 8b 2d 00 00 00 00 mov 0x0,%ebp
32: 39 05 00 00 00 00 cmp %eax,0x0
38: 8d 75 30 lea 0x30(%ebp),%esi
3b: 0f 83 6f 03 00 00 jae 3b0 <L1+0x390>
41: 83 ec 08 sub $0x8,%esp
44: 8b 1d 00 00 00 00 mov 0x0,%ebx
4a: 8b 45 00 mov 0x0(%ebp),%eax
4d: 53 push %ebx
4e: 50 push %eax
4f: 89 35 00 00 00 00 mov %esi,0x0
55: e8 fc ff ff ff call 56 <L1+0x36>
5a: 89 44 24 38 mov %eax,0x38(%esp,1)
5e: a1 04 00 00 00 mov 0x4,%eax
63: 89 45 08 mov %eax,0x8(%ebp)
66: 58 pop %eax
67: 5a pop %edx
68: 8b 44 24 30 mov 0x30(%esp,1),%eax
6c: 50 push %eax
6d: 8b 44 24 34 mov 0x34(%esp,1),%eax
71: 50 push %eax
72: e8 fc ff ff ff call 73 <L1+0x53>
77: 8d 7d 08 lea 0x8(%ebp),%edi
7a: 8d 4d 10 lea 0x10(%ebp),%ecx
7d: 89 45 0c mov %eax,0xc(%ebp)
80: 89 4c 24 54 mov %ecx,0x54(%esp,1)
84: 89 0d 00 00 00 00 mov %ecx,0x0
8a: 89 7c 24 58 mov %edi,0x58(%esp,1)
8e: 89 3d 00 00 00 00 mov %edi,0x0
94: e8 fc ff ff ff call 95 <L1+0x75>
99: a1 00 00 00 00 mov 0x0,%eax
9e: 8b 00 mov (%eax),%eax
a0: dd 40 04 fldl 0x4(%eax)
a3: 8b 44 24 38 mov 0x38(%esp,1),%eax
a7: 8d 55 0c lea 0xc(%ebp),%edx
aa: 89 45 08 mov %eax,0x8(%ebp)
ad: 8b 44 24 58 mov 0x58(%esp,1),%eax
b1: 89 54 24 50 mov %edx,0x50(%esp,1)
b5: 89 35 00 00 00 00 mov %esi,0x0
bb: dd 5c 24 10 fstpl 0x10(%esp,1)
bf: 89 15 00 00 00 00 mov %edx,0x0
c5: a3 00 00 00 00 mov %eax,0x0
ca: e8 fc ff ff ff call cb <L1+0xab>
cf: dd 44 24 10 fldl 0x10(%esp,1)
d3: d9 c0 fld %st(0)
d5: dc 0d 00 00 00 00 fmull 0x0
db: dc 25 08 00 00 00 fsubl 0x8
e1: d8 c9 fmul %st(1),%st
e3: dc 05 10 00 00 00 faddl 0x10
e9: d8 c9 fmul %st(1),%st
eb: dc 25 18 00 00 00 fsubl 0x18
f1: a1 00 00 00 00 mov 0x0,%eax
f6: de c9 fmulp %st,%st(1)
f8: 8b 18 mov (%eax),%ebx
fa: 89 35 00 00 00 00 mov %esi,0x0
100: dd 1c 24 fstpl (%esp,1)
103: e8 fc ff ff ff call 104 <L1+0xe4>
108: 89 5d 08 mov %ebx,0x8(%ebp)
10b: 89 44 24 5c mov %eax,0x5c(%esp,1)
10f: a1 18 00 00 00 mov 0x18,%eax
114: 89 45 10 mov %eax,0x10(%ebp)
117: 8b 44 24 38 mov 0x38(%esp,1),%eax
11b: 8d 7d 18 lea 0x18(%ebp),%edi
11e: 89 45 14 mov %eax,0x14(%ebp)
121: 8b 44 24 54 mov 0x54(%esp,1),%eax
125: 89 7c 24 4c mov %edi,0x4c(%esp,1)
129: a3 00 00 00 00 mov %eax,0x0
12e: 89 3d 00 00 00 00 mov %edi,0x0
134: e8 fc ff ff ff call 135 <L1+0x115>
139: a1 00 00 00 00 mov 0x0,%eax
13e: 8b 00 mov (%eax),%eax
140: 89 45 0c mov %eax,0xc(%ebp)
143: 59 pop %ecx
144: 5b pop %ebx
145: 68 00 00 f0 3f push $0x3ff00000
14a: 6a 00 push $0x0
14c: 89 35 00 00 00 00 mov %esi,0x0
152: e8 fc ff ff ff call 153 <L1+0x133>
157: 89 45 14 mov %eax,0x14(%ebp)
15a: 58 pop %eax
15b: 5a pop %edx
15c: 68 00 00 f0 3f push $0x3ff00000
161: 6a 00 push $0x0
163: e8 fc ff ff ff call 164 <L1+0x144>
168: 5b pop %ebx
169: 5f pop %edi
16a: 50 push %eax
16b: 8b 44 24 34 mov 0x34(%esp,1),%eax
16f: 50 push %eax
170: e8 fc ff ff ff call 171 <L1+0x151>
175: 8d 5d 1c lea 0x1c(%ebp),%ebx
178: 8d 4d 14 lea 0x14(%ebp),%ecx
17b: 89 45 18 mov %eax,0x18(%ebp)
17e: 89 0d 00 00 00 00 mov %ecx,0x0
184: 89 44 24 34 mov %eax,0x34(%esp,1)
188: 89 5c 24 48 mov %ebx,0x48(%esp,1)
18c: 89 1d 00 00 00 00 mov %ebx,0x0
192: e8 fc ff ff ff call 193 <L1+0x173>
197: a1 00 00 00 00 mov 0x0,%eax
19c: 8b 00 mov (%eax),%eax
19e: 89 45 10 mov %eax,0x10(%ebp)
1a1: 58 pop %eax
1a2: 5a pop %edx
1a3: 68 00 00 f0 3f push $0x3ff00000
1a8: 6a 00 push $0x0
1aa: 89 35 00 00 00 00 mov %esi,0x0
1b0: e8 fc ff ff ff call 1b1 <L1+0x191>
1b5: 89 45 18 mov %eax,0x18(%ebp)
1b8: 5b pop %ebx
1b9: 5f pop %edi
1ba: 68 00 00 f0 3f push $0x3ff00000
1bf: 6a 00 push $0x0
1c1: e8 fc ff ff ff call 1c2 <L1+0x1a2>
1c6: 5a pop %edx
1c7: 59 pop %ecx
1c8: 50 push %eax
1c9: 8b 44 24 30 mov 0x30(%esp,1),%eax
1cd: 50 push %eax
1ce: e8 fc ff ff ff call 1cf <L1+0x1af>
1d3: 89 c3 mov %eax,%ebx
1d5: 8d 55 20 lea 0x20(%ebp),%edx
1d8: 89 45 1c mov %eax,0x1c(%ebp)
1db: 8b 44 24 4c mov 0x4c(%esp,1),%eax
1df: 89 54 24 44 mov %edx,0x44(%esp,1)
1e3: 89 15 00 00 00 00 mov %edx,0x0
1e9: a3 00 00 00 00 mov %eax,0x0
1ee: e8 fc ff ff ff call 1ef <L1+0x1cf>
1f3: a1 00 00 00 00 mov 0x0,%eax
1f8: 8b 00 mov (%eax),%eax
1fa: 89 45 14 mov %eax,0x14(%ebp)
1fd: 59 pop %ecx
1fe: 5f pop %edi
1ff: 68 00 00 f0 3f push $0x3ff00000
204: 6a 00 push $0x0
206: 89 35 00 00 00 00 mov %esi,0x0
20c: e8 fc ff ff ff call 20d <L1+0x1ed>
211: 89 45 1c mov %eax,0x1c(%ebp)
214: 58 pop %eax
215: 5a pop %edx
216: 68 00 00 f0 3f push $0x3ff00000
21b: 6a 00 push $0x0
21d: e8 fc ff ff ff call 21e <L1+0x1fe>
222: 59 pop %ecx
223: 5f pop %edi
224: 50 push %eax
225: 53 push %ebx
226: e8 fc ff ff ff call 227 <L1+0x207>
22b: 8d 7d 24 lea 0x24(%ebp),%edi
22e: 89 44 24 30 mov %eax,0x30(%esp,1)
232: 89 45 20 mov %eax,0x20(%ebp)
235: 8b 44 24 48 mov 0x48(%esp,1),%eax
239: 89 7c 24 40 mov %edi,0x40(%esp,1)
23d: a3 00 00 00 00 mov %eax,0x0
242: 89 3d 00 00 00 00 mov %edi,0x0
248: e8 fc ff ff ff call 249 <L1+0x229>
24d: a1 00 00 00 00 mov 0x0,%eax
252: 8b 00 mov (%eax),%eax
254: 89 45 18 mov %eax,0x18(%ebp)
257: 58 pop %eax
258: 5a pop %edx
259: 68 00 00 f0 3f push $0x3ff00000
25e: 6a 00 push $0x0
260: 89 35 00 00 00 00 mov %esi,0x0
266: e8 fc ff ff ff call 267 <L1+0x247>
26b: 89 45 20 mov %eax,0x20(%ebp)
26e: 5b pop %ebx
26f: 5f pop %edi
270: 68 00 00 f0 3f push $0x3ff00000
275: 6a 00 push $0x0
277: e8 fc ff ff ff call 278 <L1+0x258>
27c: 5a pop %edx
27d: 59 pop %ecx
27e: 50 push %eax
27f: 8b 44 24 2c mov 0x2c(%esp,1),%eax
283: 50 push %eax
284: e8 fc ff ff ff call 285 <L1+0x265>
289: 89 44 24 2c mov %eax,0x2c(%esp,1)
28d: 8d 4d 28 lea 0x28(%ebp),%ecx
290: 89 45 24 mov %eax,0x24(%ebp)
293: 8b 44 24 44 mov 0x44(%esp,1),%eax
297: a3 00 00 00 00 mov %eax,0x0
29c: 89 4c 24 3c mov %ecx,0x3c(%esp,1)
2a0: 89 0d 00 00 00 00 mov %ecx,0x0
2a6: e8 fc ff ff ff call 2a7 <L1+0x287>
2ab: a1 00 00 00 00 mov 0x0,%eax
2b0: 8b 00 mov (%eax),%eax
2b2: 89 45 1c mov %eax,0x1c(%ebp)
2b5: 59 pop %ecx
2b6: 5b pop %ebx
2b7: 68 00 00 f0 3f push $0x3ff00000
2bc: 6a 00 push $0x0
2be: 89 35 00 00 00 00 mov %esi,0x0
2c4: e8 fc ff ff ff call 2c5 <L1+0x2a5>
2c9: 89 45 24 mov %eax,0x24(%ebp)
2cc: 58 pop %eax
2cd: 5a pop %edx
2ce: 68 00 00 f0 3f push $0x3ff00000
2d3: 6a 00 push $0x0
2d5: e8 fc ff ff ff call 2d6 <L1+0x2b6>
2da: 5b pop %ebx
2db: 5f pop %edi
2dc: 50 push %eax
2dd: 8b 5c 24 28 mov 0x28(%esp,1),%ebx
2e1: 53 push %ebx
2e2: e8 fc ff ff ff call 2e3 <L1+0x2c3>
2e7: 89 44 24 28 mov %eax,0x28(%esp,1)
2eb: 8d 55 2c lea 0x2c(%ebp),%edx
2ee: 89 45 28 mov %eax,0x28(%ebp)
2f1: 8b 44 24 40 mov 0x40(%esp,1),%eax
2f5: a3 00 00 00 00 mov %eax,0x0
2fa: 89 15 00 00 00 00 mov %edx,0x0
300: e8 fc ff ff ff call 301 <L1+0x2e1>
305: a1 00 00 00 00 mov 0x0,%eax
30a: 8b 00 mov (%eax),%eax
30c: 89 45 20 mov %eax,0x20(%ebp)
30f: 58 pop %eax
310: 5a pop %edx
311: 68 00 00 f0 3f push $0x3ff00000
316: 6a 00 push $0x0
318: 89 35 00 00 00 00 mov %esi,0x0
31e: e8 fc ff ff ff call 31f <L1+0x2ff>
323: 89 45 28 mov %eax,0x28(%ebp)
326: 5b pop %ebx
327: 5f pop %edi
328: 68 00 00 f0 3f push $0x3ff00000
32d: 6a 00 push $0x0
32f: e8 fc ff ff ff call 330 <L1+0x310>
334: 5a pop %edx
335: 59 pop %ecx
336: 50 push %eax
337: 8b 44 24 24 mov 0x24(%esp,1),%eax
33b: 50 push %eax
33c: e8 fc ff ff ff call 33d <L1+0x31d>
341: 89 45 2c mov %eax,0x2c(%ebp)
344: 8b 44 24 3c mov 0x3c(%esp,1),%eax
348: a3 00 00 00 00 mov %eax,0x0
34d: 89 35 00 00 00 00 mov %esi,0x0
353: e8 fc ff ff ff call 354 <L1+0x334>
358: a1 00 00 00 00 mov 0x0,%eax
35d: 8b 00 mov (%eax),%eax
35f: 89 45 24 mov %eax,0x24(%ebp)
362: 8b 44 24 58 mov 0x58(%esp,1),%eax
366: a3 00 00 00 00 mov %eax,0x0
36b: 8b 44 24 3c mov 0x3c(%esp,1),%eax
36f: a3 00 00 00 00 mov %eax,0x0
374: e8 fc ff ff ff call 375 <L1+0x355>
379: 59 pop %ecx
37a: a1 00 00 00 00 mov 0x0,%eax
37f: 5b pop %ebx
380: 8b 08 mov (%eax),%ecx
382: 51 push %ecx
383: 8b 44 24 58 mov 0x58(%esp,1),%eax
387: 50 push %eax
388: 89 35 00 00 00 00 mov %esi,0x0
38e: e8 fc ff ff ff call 38f <L1+0x36f>
393: 89 45 08 mov %eax,0x8(%ebp)
396: 8b 44 24 58 mov 0x58(%esp,1),%eax
39a: a3 00 00 00 00 mov %eax,0x0
39f: 8b 44 24 50 mov 0x50(%esp,1),%eax
3a3: a3 00 00 00 00 mov %eax,0x0
3a8: 83 c4 6c add $0x6c,%esp
3ab: 5b pop %ebx
3ac: 5e pop %esi
3ad: 5f pop %edi
3ae: 5d pop %ebp
3af: c3 ret
3b0: e8 fc ff ff ff call 3b1 <L1+0x391>
3b5: e9 87 fc ff ff jmp 41 <L1+0x21>

(DISASSEMBLE
(DEFUN DIGAMMA/SETQ-X (X)
(DECLARE (:EXPLAIN :TYPES :INLINING :VARIABLES)
(OPTIMIZE (SPEED 3) (SAFETY 0) (DEBUG 0))
((DOUBLE-FLOAT (0.0) *) X))
(INCF X 6.0)
(LET* ((P (/ 1.0 (* X X))) (LOGX (LOG X)) (ONE 1.0))
(DECLARE (DOUBLE-FLOAT ONE P))
(SETQ P
(* (- (* (+ (* P
(- (* P 0.0041666666666670005)
0.0039682539862540001))
0.0083333333333330002)
P)
0.083333333333332996)
P))
(+ P
(- LOGX (/ 0.5 X) (/ ONE (SETQ X (- X ONE)))
(/ ONE (SETQ X (- X ONE))) (/ ONE (SETQ X (- X ONE)))
(/ ONE (SETQ X (- X ONE))) (/ ONE (SETQ X (- X ONE)))
(/ ONE (SETQ X (- X ONE))))))))
--> T

------------------------------------------------------------------------
ECL 0.9g

;;; Loading #P"/usr/local/languages/ecl-cvs/lib/ecl/cmp.fas"
;;; Loading #P"/usr/local/languages/ecl-cvs/lib/ecl/sysfun.lsp"

Name: DIGAMMA/SETQ-X
Required: X
Documentation: NIL
Declarations: ((EXPLAIN TYPES INLINING VARIABLES) (OPTIMIZE (SPEED 3) (SAFETY 0) (DEBUG 0)) ((DOUBLE-FLOAT (0.0d0) *) X))
0 QUOTE 6.0d0
2 BIND #:G42
4 PUSHV 2
6 PUSHV 0
8 CALLG 2,+
11 SETQ 2
13 UNBIND 1
15 PUSH '1.0d0
17 PUSHV 1
19 PUSHV 1
21 PCALLG 2,*
24 CALLG 2,/
27 BIND P
29 PUSHV 2
31 CALLG 1,LOG
34 BIND LOGX
36 QUOTE 1.0d0
38 BIND ONE
40 PUSHV 2
42 PUSHV 2
44 PUSH '0.004166666666667d0
46 PCALLG 2,*
49 PUSH '0.003968253986254d0
51 PCALLG 2,-
54 PCALLG 2,*
57 PUSH '0.008333333333333d0
59 PCALLG 2,+
62 PUSHV 2
64 PCALLG 2,*
67 PUSH '0.083333333333333d0
69 PCALLG 2,-
72 PUSHV 2
74 CALLG 2,*
77 SETQ 2
79 PUSHV 2
81 PUSHV 1
83 PUSH '0.5d0
85 PUSHV 4
87 PCALLG 2,/
90 PUSHV 0
92 PUSHV 4
94 PUSHV 0
96 CALLG 2,-
99 SETQ 4
101 PUSH VALUES(0)
102 PCALLG 2,/
105 PUSHV 0
107 PUSHV 4
109 PUSHV 0
111 CALLG 2,-
114 SETQ 4
116 PUSH VALUES(0)
117 PCALLG 2,/
120 PUSHV 0
122 PUSHV 4
124 PUSHV 0
126 CALLG 2,-
129 SETQ 4
131 PUSH VALUES(0)
132 PCALLG 2,/
135 PUSHV 0
137 PUSHV 4
139 PUSHV 0
141 CALLG 2,-
144 SETQ 4
146 PUSH VALUES(0)
147 PCALLG 2,/
150 PUSHV 0
152 PUSHV 4
154 PUSHV 0
156 CALLG 2,-
159 SETQ 4
161 PUSH VALUES(0)
162 PCALLG 2,/
165 PUSHV 0
167 PUSHV 4
169 PUSHV 0
171 CALLG 2,-
174 SETQ 4
176 PUSH VALUES(0)
177 PCALLG 2,/
180 PCALLG 8,-
183 CALLG 2,+
186 UNBIND 3
188 EXIT
(DISASSEMBLE
(DEFUN DIGAMMA/SETQ-X (X)
(DECLARE (:EXPLAIN :TYPES :INLINING :VARIABLES)
(OPTIMIZE (SPEED 3) (SAFETY 0) (DEBUG 0)) ((DOUBLE-FLOAT (0.0d0) *) X))
(INCF X 6.0d0)
(LET* ((P (/ 1.0d0 (* X X))) (LOGX (LOG X)) (ONE 1.0d0))
(DECLARE (DOUBLE-FLOAT ONE P))
(SETQ P
(*
(-
(*
(+ (* P (- (* P 0.004166666666667d0) 0.003968253986254d0))
0.008333333333333d0)
P)
0.083333333333333d0)
P))
(+ P
(- LOGX
(/ 0.5d0 X)
(/ ONE (SETQ X (- X ONE)))
(/ ONE (SETQ X (- X ONE)))
(/ ONE (SETQ X (- X ONE)))
(/ ONE (SETQ X (- X ONE)))
(/ ONE (SETQ X (- X ONE)))
(/ ONE (SETQ X (- X ONE))))))))
--> NIL

------------------------------------------------------------------------
International Allegro CL Free Express Edition 8.0 [Linux (x86)] (Jun 6, 2006 16:01)

;Tgen1:Examined a (possibly unboxed) call to +_2OP with arguments:
;Targ2: symeval X type (DOUBLE-FLOAT (0.0d0) *)
;Tinf1: VARIABLE-information: LEXICAL: ((TYPE (DOUBLE-FLOAT (0.0d0) *)))
;Targ3: constant 6.0d0 type (DOUBLE-FLOAT 6.0d0 6.0d0)

[...a lot of output....]

;Igen2: Arg1: type is unboxable as DOUBLE-FLOAT
;Igen4: Inline attempt succeeded.
;Vflt1: Variables stored in floating point registers: (X P LOGX ONE X X X X X X).
;Vflt2: Variables stored in floating point memory: (X).
;; disassembly of An unhandled error occurred during initialization:
An error occurred
(Unable to print
#<Function (:ANONYMOUS-LAMBDA 2) @ #x71587c12>
readably and *PRINT-READABLY* is true.)
during the load "/tmp/clall-7736.lisp"

[pjb@thalassa lisp]$ clall '(progn (compile (defun digamma/setq-x (x)
(declare (:explain :types :inlining :variables)
(optimize (speed 3) (safety 0) (debug 0))
(type (double-float (0.0d0) *) x))
(incf x 6d0)
(let* ((p (/ 1d0 (* x x)))
(logx (log x))
(one 1d0))
(declare (double-float one p))
(setq p (* (- (* (+ (* p (- (* p 0.004166666666667d0)
0.003968253986254d0))
0.008333333333333d0) p)
0.083333333333333d0) p))
(+ p (- logx
(/ 0.5d0 x)
(/ one (setq x (- x one)))
(/ one (setq x (- x one)))
(/ one (setq x (- x one)))
(/ one (setq x (- x one)))
(/ one (setq x (- x one)))
(/ one (setq x (- x one)))))
))) (time (loop for i from 10000 to 1000000
do (digamma/setq-x (coerce i (quote double-float))))))'
> > > > > > > > > > > > > > > > > > > > > >
------------------------------------------------------------------------
CLISP 2.39 (2006-07-16) (built 3364813332) (memory 3364813914)

WARNING in DIGAMMA/SETQ-X :
Unknown declaration :EXPLAIN.
The whole declaration will be ignored.
Real time: 123.51723 sec.
Run time: 110.53891 sec.
Space: 2714225608 Bytes
GC: 5177, GC time: 42.89465 sec.
(PROGN
(COMPILE
(DEFUN DIGAMMA/SETQ-X (X)
(DECLARE (:EXPLAIN :TYPES :INLINING :VARIABLES)
(OPTIMIZE (SPEED 3) (SAFETY 0) (DEBUG 0))
(TYPE (DOUBLE-FLOAT (0.0d0) *) X))
(INCF X 6.0d0)
(LET* ((P (/ 1.0d0 (* X X))) (LOGX (LOG X)) (ONE 1.0d0))
(DECLARE (DOUBLE-FLOAT ONE P))
(SETQ P
(*
(-
(*
(+ (* P (- (* P 0.004166666666667d0) 0.003968253986254d0))
0.008333333333333d0)
P)
0.083333333333333d0)
P))
(+ P
(- LOGX (/ 0.5d0 X) (/ ONE (SETQ X (- X ONE))) (/ ONE (SETQ X (- X ONE)))
(/ ONE (SETQ X (- X ONE))) (/ ONE (SETQ X (- X ONE)))
(/ ONE (SETQ X (- X ONE))) (/ ONE (SETQ X (- X ONE))))))))
(TIME
(LOOP FOR I FROM 10000 TO 1000000 DO
(DIGAMMA/SETQ-X (COERCE I 'DOUBLE-FLOAT)))))
--> NIL

------------------------------------------------------------------------
SBCL 0.9.17

; in: DEFUN DIGAMMA/SETQ-X
; (DEFUN DIGAMMA/SETQ-X (X)
; (DECLARE (:EXPLAIN :TYPES :INLINING :VARIABLES)
; (OPTIMIZE (SPEED 3) (SAFETY 0) (DEBUG 0))
; (TYPE (DOUBLE-FLOAT (0.0d0) *) X))
; (INCF X 6.0d0)
; (LET* ((P (/ 1.0d0 #)) (LOGX (LOG X)) (ONE 1.0d0))
; (DECLARE (DOUBLE-FLOAT ONE P))
; (SETQ P (* (- # 0.083333333333333d0) P))
; (+ P
; (- LOGX (/ 0.5d0 X) (/ ONE #) (/ ONE #) (/ ONE #) (/ ONE #) (/ ONE #)
; (/ ONE #)))))
; --> PROGN EVAL-WHEN SB-IMPL::%DEFUN SB-INT:NAMED-LAMBDA
; ==>
; #'(SB-INT:NAMED-LAMBDA DIGAMMA/SETQ-X (X)
; (DECLARE (:EXPLAIN :TYPES :INLINING :VARIABLES)
; (OPTIMIZE (SPEED 3) (SAFETY 0) (DEBUG 0))
; (TYPE (DOUBLE-FLOAT (0.0d0) *) X))
; (BLOCK DIGAMMA/SETQ-X
; (INCF X 6.0d0)
; (LET* ((P #) (LOGX #) (ONE 1.0d0))
; (DECLARE (DOUBLE-FLOAT ONE P))
; (SETQ P (* # P))
; (+ P (- LOGX # # # # # # #)))))
;
; caught WARNING:
; unrecognized declaration (:EXPLAIN :TYPES :INLINING :VARIABLES)
;
; note: doing float to pointer coercion (cost 13) to "<return value>"
;
; compilation unit finished
; caught 1 WARNING condition
; printed 1 note
Evaluation took:
0.476 seconds of real time
0.340021 seconds of user run time
0.032002 seconds of system run time
[Run times include 0.012 seconds GC run time.]
0 calls to %EVAL
0 page faults and
15,843,328 bytes consed.

(PROGN
(COMPILE
(DEFUN DIGAMMA/SETQ-X (X)
(DECLARE (:EXPLAIN :TYPES :INLINING :VARIABLES)
(OPTIMIZE (SPEED 3) (SAFETY 0) (DEBUG 0))
(TYPE (DOUBLE-FLOAT (0.0d0) *) X))
(INCF X 6.0d0)
(LET* ((P (/ 1.0d0 (* X X))) (LOGX (LOG X)) (ONE 1.0d0))
(DECLARE (DOUBLE-FLOAT ONE P))
(SETQ P
(*
(-
(*
(+ (* P (- (* P 0.004166666666667d0) 0.003968253986254d0))
0.008333333333333d0)
P)
0.083333333333333d0)
P))
(+ P
(- LOGX (/ 0.5d0 X) (/ ONE (SETQ X (- X ONE)))
(/ ONE (SETQ X (- X ONE))) (/ ONE (SETQ X (- X ONE)))
(/ ONE (SETQ X (- X ONE))) (/ ONE (SETQ X (- X ONE)))
(/ ONE (SETQ X (- X ONE))))))))
(TIME
(LOOP FOR I FROM 10000 TO 1000000 DO
(DIGAMMA/SETQ-X (COERCE I 'DOUBLE-FLOAT)))))
--> NIL

------------------------------------------------------------------------
CMU Common Lisp 19c (19C)

; In: DEFUN DIGAMMA/SETQ-X

; (DEFUN DIGAMMA/SETQ-X (X) (DECLARE # # #) (INCF X 6.0d0) ...)
; Warning: Unrecognized declaration: (:EXPLAIN :TYPES :INLINING :VARIABLES).
;
; Converted DIGAMMA/SETQ-X.

; In: LAMBDA (X)

; #'(LAMBDA (X) (DECLARE # # #) (BLOCK DIGAMMA/SETQ-X # #))
; Warning: Unrecognized declaration: (:EXPLAIN :TYPES :INLINING :VARIABLES).
;
; Compiling LAMBDA (X):

; In: LAMBDA (X)

; #'(LAMBDA (X) (DECLARE # # #) (BLOCK DIGAMMA/SETQ-X # #))
; Note: Doing float to pointer coercion (cost 13) to "<return value>".
;
; Compiling Top-Level Form:

; Compilation unit finished.
; 1 warning
; 1 note

; Compiling LAMBDA NIL:
; Compiling Top-Level Form:
; [GC threshold exceeded with 12,012,864 bytes in use. Commencing GC.]
; [GC completed with 323,128 bytes retained and 11,689,736 bytes freed.]
; [GC will next occur when at least 12,323,128 bytes are in use.]
; [GC threshold exceeded with 12,331,640 bytes in use. Commencing GC.]
; [GC completed with 327,216 bytes retained and 12,004,424 bytes freed.]
; [GC will next occur when at least 12,327,216 bytes are in use.]

; Evaluation took:
; 0.77 seconds of real time
; 0.520032 seconds of user run time
; 0.080005 seconds of system run time
; 925,499,781 CPU cycles
; [Run times include 0.1 seconds GC run time]
; 3 page faults and
; 31,685,280 bytes consed.
;

(PROGN
(COMPILE
(DEFUN DIGAMMA/SETQ-X (X)
(DECLARE (:EXPLAIN :TYPES :INLINING :VARIABLES)
(OPTIMIZE (SPEED 3) (SAFETY 0) (DEBUG 0))
(TYPE (DOUBLE-FLOAT (0.0d0) *) X))
(INCF X 6.0d0)
(LET* ((P (/ 1.0d0 (* X X))) (LOGX (LOG X)) (ONE 1.0d0))
(DECLARE (DOUBLE-FLOAT ONE P))
(SETQ P
(*
(-
(*
(+ (* P (- (* P 0.004166666666667d0) 0.003968253986254d0))
0.008333333333333d0)
P)
0.083333333333333d0)
P))
(+ P
(- LOGX
(/ 0.5d0 X)
(/ ONE (SETQ X (- X ONE)))
(/ ONE (SETQ X (- X ONE)))
(/ ONE (SETQ X (- X ONE)))
(/ ONE (SETQ X (- X ONE)))
(/ ONE (SETQ X (- X ONE)))
(/ ONE (SETQ X (- X ONE))))))))
(TIME
(LOOP FOR I FROM 10000 TO 1000000
DO (DIGAMMA/SETQ-X (COERCE I 'DOUBLE-FLOAT)))))
--> NIL

------------------------------------------------------------------------
GNU Common Lisp (GCL) GCL 2.6.7

Compiling gazonk0.lsp.
Warning:
The OPTIMIZE quality DEBUG is unknown.
Warning:
The declaration specifier :EXPLAIN is unknown.
End of Pass 1.
End of Pass 2.
OPTIMIZE levels: Safety=0 (No runtime error checking), Space=0, Speed=3
Finished compiling gazonk0.lsp.
real time : 13.250 secs
run-gbc time : 9.590 secs
child run time : 0.000 secs
gbc time : 2.460 secs

(PROGN
(COMPILE (DEFUN DIGAMMA/SETQ-X (X)
(DECLARE (:EXPLAIN :TYPES :INLINING :VARIABLES)
(OPTIMIZE (SPEED 3) (SAFETY 0) (DEBUG 0))
(TYPE (DOUBLE-FLOAT (0.0) *) X))
(INCF X 6.0)
(LET* ((P (/ 1.0 (* X X))) (LOGX (LOG X)) (ONE 1.0))
(DECLARE (DOUBLE-FLOAT ONE P))
(SETQ P
(* (- (* (+ (* P
(- (* P 0.0041666666666670005)
0.0039682539862540001))
0.0083333333333330002)
P)
0.083333333333332996)
P))
(+ P
(- LOGX (/ 0.5 X) (/ ONE (SETQ X (- X ONE)))
(/ ONE (SETQ X (- X ONE)))
(/ ONE (SETQ X (- X ONE)))
(/ ONE (SETQ X (- X ONE)))
(/ ONE (SETQ X (- X ONE)))
(/ ONE (SETQ X (- X ONE))))))))
(TIME (LOOP
FOR
I
FROM
10000
TO
1000000
DO
(DIGAMMA/SETQ-X (COERCE I 'DOUBLE-FLOAT)))))
--> NIL

------------------------------------------------------------------------
ECL 0.9g

;;; Loading #P"/usr/local/languages/ecl-cvs/lib/ecl/cmp.fas"
;;; Loading #P"/usr/local/languages/ecl-cvs/lib/ecl/sysfun.lsp"
;;; Compiling (LET* (# #) ...).
;;; Warning: The declaration specifier :EXPLAIN is unknown.
;;; Note: Replacing variable G58 by its value #<form LOCATION 82EF870>
;;; End of Pass 1.
;;; Note: Replacing variable G53 by its value
;;; Calling the C compiler...
;;; Note:
;;; Invoking external command: gcc -I../include -g -O2 -fPIC -fstrict-aliasing -Dlinux -O "-I/usr/local/languages/ecl-cvs/lib/ecl//h" -w -c "/a6/users/pjb/src/public/lisp/ECL001przwXx.c" -o "/a6/users/pjb/src/public/lisp/ECL001przwXx.o"

;;; Note:
;;; Invoking external command: gcc -o "/a6/users/pjb/src/public/lisp/ECL001przwXx.fas" -L"/usr/local/languages/ecl-cvs/lib/ecl/" "/a6/users/pjb/src/public/lisp/ECL001przwXx.o" -Wl,--rpath,/usr/local/languages/ecl-cvs/lib/ecl/ -shared -lecl -ldl -lm

;;; OPTIMIZE levels: Safety=2, Space=0, Speed=3
real time : 19.000 secs
run time : 18.460 secs

(PROGN
(COMPILE
(DEFUN DIGAMMA/SETQ-X (X)
(DECLARE (:EXPLAIN :TYPES :INLINING :VARIABLES)
(OPTIMIZE (SPEED 3) (SAFETY 0) (DEBUG 0))
(TYPE (DOUBLE-FLOAT (0.0d0) *) X))
(INCF X 6.0d0)
(LET* ((P (/ 1.0d0 (* X X))) (LOGX (LOG X)) (ONE 1.0d0))
(DECLARE (DOUBLE-FLOAT ONE P))
(SETQ P
(*
(-
(*
(+ (* P (- (* P 0.004166666666667d0) 0.003968253986254d0))
0.008333333333333d0)
P)
0.083333333333333d0)
P))
(+ P
(- LOGX
(/ 0.5d0 X)
(/ ONE (SETQ X (- X ONE)))
(/ ONE (SETQ X (- X ONE)))
(/ ONE (SETQ X (- X ONE)))
(/ ONE (SETQ X (- X ONE)))
(/ ONE (SETQ X (- X ONE)))
(/ ONE (SETQ X (- X ONE))))))))
(TIME
(LOOP FOR
I
FROM
10000
TO
1000000
DO
(DIGAMMA/SETQ-X (COERCE I 'DOUBLE-FLOAT)))))
--> NIL

------------------------------------------------------------------------
International Allegro CL Free Express Edition 8.0 [Linux (x86)] (Jun 6, 2006 16:01)

;Tgen1:Examined a (possibly unboxed) call to +_2OP with arguments:
;Targ2: symeval X type (DOUBLE-FLOAT (0.0d0) *)
;Tinf1: VARIABLE-information: LEXICAL: ((TYPE (DOUBLE-FLOAT (0.0d0) *)))
;Targ3: constant 6.0d0 type (DOUBLE-FLOAT 6.0d0 6.0d0)
;Tres1: which returns a value of type (DOUBLE-FLOAT (6.0d0) *)

[... a lot of output ...]

;Igen1: Arg1: checking for implied unboxable type (floats, or machine-integer):
;Igen2: Arg1: type is unboxable as DOUBLE-FLOAT
;Igen4: Inline attempt succeeded.
;Vflt1: Variables stored in floating point registers: (X P LOGX ONE X X X X X X).
;Vflt2: Variables stored in floating point memory: (X).
; cpu time (non-gc) 19,130 msec user, 20 msec system
; cpu time (gc) 1,690 msec user, 0 msec system
; cpu time (total) 20,820 msec user, 20 msec system
; real time 22,630 msec
; space allocation:
; 15,840,172 cons cells, 150,483,824 other bytes, 0 static bytes
(PROGN (COMPILE (DEFUN DIGAMMA/SETQ-X (X) (DECLARE (:EXPLAIN :TYPES :INLINING :VARIABLES) (OPTIMIZE (SPEED 3) (SAFETY 0) (DEBUG 0)) (TYPE (DOUBLE-FLOAT (0.0d0) *) X)) (INCF X 6.0d0) (LET* ((P (/ 1.0d0 (* X X))) (LOGX (LOG X)) (ONE 1.0d0)) (DECLARE (DOUBLE-FLOAT ONE P)) (SETQ P (* (- (* (+ (* P (- (* P 0.004166666666667d0) 0.003968253986254d0)) 0.008333333333333d0) P) 0.083333333333333d0) P)) (+ P (- LOGX (/ 0.5d0 X) (/ ONE (SETQ X (- X ONE))) (/ ONE (SETQ X (- X ONE))) (/ ONE (SETQ X (- X ONE))) (/ ONE (SETQ X (- X ONE))) (/ ONE (SETQ X (- X ONE))) (/ ONE (SETQ X (- X ONE)))))))) (TIME (LOOP FOR I FROM 10000 TO 1000000 DO (DIGAMMA/SETQ-X (COERCE I (QUOTE DOUBLE-FLOAT))))))
--> NIL

------------------------------------------------------------------------

[pjb@thalassa lisp]$

--
__Pascal Bourguignon__ http://www.informatimago.com/

"A TRUE Klingon warrior does not comment his code!"

Joel Reymont

unread,

Oct 20, 2006, 4:04:26 PM10/20/06

to

Pascal,

Pascal Bourguignon wrote:

> Allegro 8.0 20.820

Thanks for the stats! The ACL timing above is wrong, though, and
happens when log is used instead of double-log. Please modify (logx
(log x) to read (logx (double-log x)). So long as you include the
optimizatio hints for ACL you'll have awesome performance.

> (let* ((p (/ 1d0 (* x x)))
> (logx (log x))
> (one 1d0))

It seems that Lisps other than ACL and LW are not using SSE2/SSE3
instructions. LW also seems to use them partially, i.e. mix SSE and x87
in the code for digamma.

This is my latest version of the code with the change above:

(in-package :cl-user)

#+lispworks
(declaim (optimize (float 0))
(ftype (function (double-float) double-float) digamma/setq-x)
)

#+allegro
(eval-when (compile load eval)
(setf (get 'digamma/setq-x 'sys::immed-args-call) ; now say some
magic
'((double-float) double-float)))

#+allegro
(ff:def-foreign-call (double-log "log") ((x :double))
:returning :double
:call-direct t
:arg-checking nil)

(defun digamma/setq-x (x)
(declare (optimize (speed 3) (safety 0) (debug 0) (float 0))
(type (double-float (0d0) *) x))
(let ((x (+ x 6d0)))
(declare (type (double-float (6d0) *) x))

(let* ((p (/ 1d0 (* x x)))

#+lispworks (logx (log x))
#+allegro (logx (double-log x))
(one 1d0))
(declare (type double-float one p))

(setq p (* (- (* (+ (* p (- (* p 0.004166666666667d0)
0.003968253986254d0))
0.008333333333333d0) p)
0.083333333333333d0) p))

(+ p (- (- (- (- (- (- (- logx
(/ 0.5d0 x))
(/ one (setq x (- x one))))
(/ one (setq x (- x one))))
(/ one (setq x (- x one))))
(/ one (setq x (- x one))))
(/ one (setq x (- x one))))

(/ one (setq x (- x one)))))
)))

(defun timeit ()
(declare (optimize (speed 3) (safety 0) (debug 0) (float 0)))
(loop for i from 10000 to 10000000
do (digamma/setq-x (coerce (the fixnum i) 'double-float))))

Richard J. Fateman

unread,

Oct 20, 2006, 4:17:45 PM10/20/06

to Joel Reymont

I guess I don't understand the double-log business.
I ran your "latest version" and then I changed the line
#+allegro (logx (double-log x))

to
#+allegro (logx (log x))

and it ran faster. On my Windows machine the first version was
about 6.5 seconds. The second was about 2.5 seconds. [Timing (timeit)..]

RJF

Joel Reymont

unread,

Oct 20, 2006, 4:31:43 PM10/20/06

to

Richard J. Fateman wrote:
> On my Windows machine the first version was
> about 6.5 seconds. The second was about 2.5 seconds. [Timing (timeit)..]

I will let Duane comment on this. What type of architecture are you on?
What version of ACL?

Thanks, Joel

Richard J. Fateman

unread,

Oct 20, 2006, 5:01:38 PM10/20/06

to

(I thought "Windows" gave it away.. but my benchmark machine is a 2.6GHz
Pentium 4 running Windows XP. I'm running Allegro CL 8.0 [licensed,
though I think the free trial "express" version would be identical])

I subsequently looked at the gmane discussion and found it had to do
with lisp running on Macintosh. So maybe it's irrelevant.

My own interest in doing floating point in lisp has been trying to make
better use of some of the subtleties in the IEEE floating point
implementations (rounding modes, traps); speed of properly declared code
has become much less of an issue in comparing Lisp to C (etc).

Joel Reymont

unread,

Oct 20, 2006, 5:47:04 PM10/20/06

to

Richard J. Fateman wrote:
> (I thought "Windows" gave it away.. but my benchmark machine is a 2.6GHz
> Pentium 4 running Windows XP. I'm running Allegro CL 8.0 [licensed,
> though I think the free trial "express" version would be identical])

The P4 bit is what I was interested in. Can you send me the disassembly
from your digamma or post it here? I wonder if SSE instructions are
being used. The Mac Intel bit should not matter as it's still x86. At
least I don't think it should matter.

mark.h...@gmail.com

unread,

Oct 21, 2006, 1:59:06 AM10/21/06

to

On Oct 20, 2:47 pm, "Joel Reymont" <joe...@gmail.com> wrote:
> The P4 bit is what I was interested in. Can you send me the disassembly
> from your digamma or post it here? I wonder if SSE instructions are
> being used. The Mac Intel bit should not matter as it's still x86. At
> least I don't think it should matter.

One thing to remember is that just because you see that SSE
instructions are being used, it doesn't mean that the compiler is
taking advantage of their parallelism. I've seen recent versions of
the Intel C compiler emit SSE2 instructions but only using one-half of
the 128-bit pipe. The reason for this is that on the Pentium 4, the
SSE2 instructions have one cycle less latency than the double-precision
x87 instructions. Intel engineers (and presumably also AMD, though I
haven't checked) don't seem to consider the x87 pipe a priority for
optimization (alas for poor Prof. Kahan, who helped design it and
insisted on the 80-bit internal registers -- the SSE2 pipes are 64-bit
all the way).

To tell if the SSE2 instructions are actually being exploited for their
full parallelism, you'll need to make sure that both halves of the SSE2
register are being filled.

Best,
mfh

Douglas Crosher

unread,

Oct 21, 2006, 6:19:10 AM10/21/06

to

Hello Joel,

The Scieneer Common Lisp is 10% faster than optimize C code for your
problem. The Scieneer Common Lisp takes advantage of SSE2 and SSE3
instructions but does not yet taking advantage of SIMD operations.
Extended precision floating point is also supported using the x87 FPU
instructions. The results below were run on a 2000MHz Athlon 64.

File: digamma.c
-=-
#include "math.h"

static inline double digamma(double x)
{
double p;
x=x+6;
p=1/(x*x);
p=(((0.004166666666667*p-0.003968253986254)*p+
0.008333333333333)*p-0.083333333333333)*p;
p=p+log(x)-0.5/x-1/(x-1)-1/(x-2)-1/(x-3)-1/(x-4)-1/(x-5)-1/(x-6);
return p;
}

double timeit()
{
int i;
double result;

for (i = 10000; i < 10000000; i++)
{
result = digamma((double) i);
}

return result;
}

main()
{
timeit();
}
-=-

cc -O3 -o digamma digamma.c -lm
time digamma
1.624u 0.004s 0:01.61 100.6% 0+0k 0+0io 0pf+0w

File: digamma1.lisp
-=-
(in-package :cl-user)

(declaim (inline digamma/setq-x))
(defun digamma/setq-x (x)
(declare (optimize (speed 3) (safety 0) (debug 0))

(type (double-float (0d0) *) x))
(let ((x (+ x 6d0)))
(declare (type (double-float (6d0) *) x))
(let* ((p (/ 1d0 (* x x)))

(logx (log x))
(one 1d0))

(declare (type double-float one p))
(setq p (* (- (* (+ (* p (- (* p 0.004166666666667d0)
0.003968253986254d0))
0.008333333333333d0) p)
0.083333333333333d0) p))
(+ p (- (- (- (- (- (- (- logx
(/ 0.5d0 x))
(/ one (setq x (- x one))))
(/ one (setq x (- x one))))
(/ one (setq x (- x one))))
(/ one (setq x (- x one))))
(/ one (setq x (- x one))))
(/ one (setq x (- x one)))))
)))

(defun timeit ()
(declare (optimize (speed 3) (safety 0) (debug 0)))
(let ((result 0d0))
(declare (double-float result))

(loop for i from 10000 to 10000000

do (setf result (digamma/setq-x (coerce (the fixnum i) 'double-float))))
result))
-=-

* (time (timeit))
Compiling lambda nil:
Compiling Top-Level Form:

Evaluation took:
1.4379311 seconds of real time
1.4306431 seconds of thread run time
1.4306521 seconds of process run time
1.432089 seconds of user run time
0.004 seconds of system run time
0 page faults
16 bytes consed by this thread and
16 total bytes consed.

Thibault Langlois

unread,

Oct 21, 2006, 7:55:27 AM10/21/06

to

I get similar results with cmucl 19c (2GHz Pentium M):

CL-USER> (time (timeit))
; Compiling LAMBDA NIL:
; Compiling Top-Level Form:

; Evaluation took:
; 1.55 seconds of real time
; 1.544097 seconds of user run time
; 0.0 seconds of system run time
; 3,081,616,411 CPU cycles
; 0 page faults and
; 16 bytes consed.

C version on the same machine:
$ time digamma

real 0m2.237s
user 0m2.228s
sys 0m0.000s

William D Clinger

unread,

Oct 21, 2006, 9:31:19 AM10/21/06

to

Joel Reymont wrote:
> The P4 bit is what I was interested in. Can you send me the disassembly
> from your digamma or post it here? I wonder if SSE instructions are
> being used. The Mac Intel bit should not matter as it's still x86. At
> least I don't think it should matter.

Apple's C compiler generates SSE2 instructions. I looked
at this just yesterday, while we were trying to figure out
why double precision (sqrt 0.9999999999999999) was giving
us the correct answer (according to IEEE-754) under Mac
OS X but was returning 1.0 under Linux.

Under Linux, the C compiler was generating the old-style
instructions, probably because there are so many Linux
machines in the field that don't support SSE and SSE2.
Apple doesn't have that problem.

Will

Marcin 'Qrczak' Kowalczyk

unread,

Oct 21, 2006, 10:47:05 AM10/21/06

to

Douglas Crosher <d...@scieneer.com> writes:

> The Scieneer Common Lisp is 10% faster than optimize C code for your
> problem.

Adding -ffast-math to gcc invocation makes the C code 10% faster on my
machine.

--
__("< Marcin Kowalczyk
\__/ qrc...@knm.org.pl
^^ http://qrnik.knm.org.pl/~qrczak/

Terry Sullivan

unread,

Oct 21, 2006, 4:24:09 PM10/21/06

to

Just to make sure that the challenge isn't using unoptimized C to
compare with, I played arount with the C version a bit just to see if I
could make it any faster and came up with this:

#include "math.h"

static inline double digamma(double x)
{
double p;
x=x+6;
p=1/(x*x);
p=(((0.004166666666667*p-0.003968253986254)*p+
0.008333333333333)*p-0.083333333333333)*p;
p=p+log(x)-0.5/x-1/(x-1)-1/(x-2)-1/(x-3)-1/(x-4)-1/(x-5)-1/(x-6);
return p;

}

#define EnableDuffDevice 1
#define DuffCount 2

#if DuffCount > 10
#define EnableDuffDevice 0
#endif

#define DuffCase(n) case n: result = digamma(d += 1.0)

double timeit()
{
double d;
double result;

d = 10000.0;
while (d < 10000000.0)
{
#if EnableDuffDevice == 1
switch ((int) d % DuffCount)
{
DuffCase (0);
#if DuffCount > 9
DuffCase (9);
#endif
#if DuffCount > 8
DuffCase (8);
#endif
#if DuffCount > 7
DuffCase (7);
#endif
#if DuffCount > 6
DuffCase (6);
#endif
#if DuffCount > 5
DuffCase (5);
#endif
#if DuffCount > 4
DuffCase (4);
#endif
#if DuffCount > 3
DuffCase (3);
#endif
#if DuffCount > 2
DuffCase (2);
#endif
#if DuffCount > 1
DuffCase (1);
#endif
}
#else
result = digamma(d += 1.0);
#endif
}

return result;

}

main()
{
timeit();
}

On my system (PPC G4 1Ghz MacOSX), the original gave the following times

time ./digamma
real 0m6.033s
user 0m5.782s
sys 0m0.039s

The my version gives:

time ./digamma2
real 0m0.115s
user 0m0.076s
sys 0m0.007s

Play with the DuffCount to find a setup where the code in the loop fits
in L1 or at worst L2 cache.

I would be interested to know how this version does on a newer faster
machine. (Assuming I haven't made a horribly stupid coding error)

Ts

Terry Sullivan

unread,

Oct 21, 2006, 4:31:53 PM10/21/06

to

The compiler options I used were:

cc -O3 -ffast-math -o digamma digamma.c -lm

Ts

Brian Downing

unread,

Oct 21, 2006, 5:51:02 PM10/21/06

to

In article <a0f84$453a8193$186091bf$14...@KNOLOGY.NET>,

Terry Sullivan <MyNameI...@drifter.net> wrote:
> Just to make sure that the challenge isn't using unoptimized C to
> compare with, I played arount with the C version a bit just to see if I
> could make it any faster and came up with this:

[snip]

> On my system (PPC G4 1Ghz MacOSX), the original gave the following times
> time ./digamma
> real 0m6.033s
>

> The my version gives:
> time ./digamma2
> real 0m0.115s

This is a good object lesson in that you must be very careful benchmarking
things with modern optimizing compilers - you've managed to confuse the
compiler into "optimizing" the whole algorithm away!

%% 154 tiamat:~/projects/c/scratch
:; gcc -O3 -Wall -ffast-math -o digamma2-orig digamma2-orig.c -lm
digamma2-orig.c:74: warning: return type defaults to ‘int’
digamma2-orig.c: In function ‘main’:
digamma2-orig.c:76: warning: control reaches end of non-void function
digamma2-orig.c: In function ‘timeit’:
digamma2-orig.c:27: warning: ‘result’ may be used uninitialized in this function

Notice the last warning there.

Let's try something here:

:; valgrind --tool=cachegrind ./digamma2-orig
==28188== Cachegrind, an I1/D1/L2 cache profiler.
[...]
==28188== I refs: 65,061,501
==28188== I1 misses: 541
==28188== L2i misses: 536

Only 65 million instruction fetches? Let's see, with 9990000 runs of
digamma2 that makes for:

CL-USER> (float (/ 65061501 9990000))
6.512663

about 6.5 instruction fetches per loop! I purport that it's impossible
to compile the algorithm down to six amd64 instructions.

Stepping with GDB, the main() loop looks like this:

1: x/i $rip 0x400510 <main+48>: cvttsd2si %xmm0,%eax
1: x/i $rip 0x400514 <main+52>: mov %eax,%edx
1: x/i $rip 0x400516 <main+54>: shr $0x1f,%edx
1: x/i $rip 0x400519 <main+57>: add %edx,%eax
1: x/i $rip 0x40051b <main+59>: and $0x1,%eax
1: x/i $rip 0x40051e <main+62>: sub %edx,%eax
1: x/i $rip 0x400520 <main+64>: jne 0x400500 <main+32>
1: x/i $rip 0x400522 <main+66>: mov %rcx,0xfffffffffffffff8(%rsp)
1: x/i $rip 0x400527 <main+71>: movlpd 0xfffffffffffffff8(%rsp),%xmm1
1: x/i $rip 0x40052d <main+77>: addsd %xmm1,%xmm0
1: x/i $rip 0x400531 <main+81>: addsd %xmm1,%xmm0
1: x/i $rip 0x400535 <main+85>: comisd %xmm2,%xmm0
1: x/i $rip 0x400539 <main+89>: jb 0x400510 <main+48>
1: x/i $rip 0x400510 <main+48>: cvttsd2si %xmm0,%eax
1: x/i $rip 0x400514 <main+52>: mov %eax,%edx
1: x/i $rip 0x400516 <main+54>: shr $0x1f,%edx
1: x/i $rip 0x400519 <main+57>: add %edx,%eax
1: x/i $rip 0x40051b <main+59>: and $0x1,%eax
1: x/i $rip 0x40051e <main+62>: sub %edx,%eax
1: x/i $rip 0x400520 <main+64>: jne 0x400500 <main+32>
1: x/i $rip 0x400522 <main+66>: mov %rcx,0xfffffffffffffff8(%rsp)
1: x/i $rip 0x400527 <main+71>: movlpd 0xfffffffffffffff8(%rsp),%xmm1
1: x/i $rip 0x40052d <main+77>: addsd %xmm1,%xmm0
1: x/i $rip 0x400531 <main+81>: addsd %xmm1,%xmm0
1: x/i $rip 0x400535 <main+85>: comisd %xmm2,%xmm0
1: x/i $rip 0x400539 <main+89>: jb 0x400510 <main+48>
[repeat]

There's nothing left of the algorithm here.

Indeed, with this patch (which sums the results from inside the inline
function, thereby forcing the code to be compiled in), you'll see that
the Duff's Device case is just as slow as the naive loop:

--- digamma2-orig.c 2006-10-21 16:26:46.000000000 -0500
+++ digamma2.c 2006-10-21 16:25:04.000000000 -0500
@@ -1,13 +1,23 @@
+#include "stdio.h"
#include "math.h"

+double tot_x = 0.0;
+double tot_p = 0.0;
+
+#define SumResults 1
+

static inline double digamma(double x)
{
double p;

+ tot_x += x;

x=x+6;
p=1/(x*x);
p=(((0.004166666666667*p-0.003968253986254)*p+
0.008333333333333)*p-0.083333333333333)*p;
p=p+log(x)-0.5/x-1/(x-1)-1/(x-2)-1/(x-3)-1/(x-4)-1/(x-5)-1/(x-6);

+ #if SumResults == 1
+ tot_p += p;
+ #endif
return p;

}
@@ -66,6 +76,9 @@
#endif
}

+ printf("tot_x: %f\n", tot_x);
+ printf("tot_p: %f\n", tot_p);
+
return result;

}

-bcd

Terry Sullivan

unread,

Oct 22, 2006, 1:35:56 AM10/22/06

to

Given the speed difference that I was seeing between the new and the
old, I wondered if something like that wasn't taking place. I didn't
make changes int the digamma function it's self, because that would have
been changing the algorithm that was being benchmarked. I just wanted to
eliminate inefficiencies in the supporting code.

Brian Downing wrote:
> In article <a0f84$453a8193$186091bf$14...@KNOLOGY.NET>,
> Terry Sullivan <MyNameI...@drifter.net> wrote:
>
>>Just to make sure that the challenge isn't using unoptimized C to
>>compare with, I played arount with the C version a bit just to see if I
>>could make it any faster and came up with this:
>
>
> [snip]
>
>
>>On my system (PPC G4 1Ghz MacOSX), the original gave the following times
>>time ./digamma
>>real 0m6.033s
>>
>>The my version gives:
>>time ./digamma2
>>real 0m0.115s
>
>
> This is a good object lesson in that you must be very careful benchmarking
> things with modern optimizing compilers - you've managed to confuse the
> compiler into "optimizing" the whole algorithm away!
>
> %% 154 tiamat:~/projects/c/scratch
> :; gcc -O3 -Wall -ffast-math -o digamma2-orig digamma2-orig.c -lm

> digamma2-orig.c:74: warning: return type defaults to â€˜intâ€™
> digamma2-orig.c: In function â€˜mainâ€™:

> digamma2-orig.c:76: warning: control reaches end of non-void function

> digamma2-orig.c: In function â€˜timeitâ€™:
> digamma2-orig.c:27: warning: â€˜resultâ€™ may be used uninitialized in this function

Terry Sullivan

unread,

Oct 22, 2006, 2:09:48 AM10/22/06

to

With these changes in both my code and the original, minus the argument
summation, I now get:

Original Code:
time ./digamma
tot_p: 151098846.198064
Result = 16.118096

real 0m6.081s
user 0m5.892s
sys 0m0.036s

My Code with Duffs @ 2:
time ./digamma2
tot_p: 151098853.105869
Result = 16.118096

real 0m5.955s
user 0m5.581s
sys 0m0.041s

My code with out Duffs:
time ./digamma2
tot_p: 151098853.105869
Result = 16.118096

real 0m5.845s
user 0m5.631s
sys 0m0.037s

So I would agree, Duff's device doesn't actually help much in this case
on my CPU.

Oh well.
Had fun anyway.

Ts

mark.h...@gmail.com

unread,

Oct 22, 2006, 1:25:49 PM10/22/06

to

Douglas Crosher wrote:
> The Scieneer Common Lisp is 10% faster than optimize C code for your
> problem. The Scieneer Common Lisp takes advantage of SSE2 and SSE3
> instructions but does not yet taking advantage of SIMD operations.
> Extended precision floating point is also supported using the x87 FPU
> instructions. The results below were run on a 2000MHz Athlon 64.

Say, you wouldn't happen to have a Scieneer Common Lisp manual
on your hands, would you? I'm working on a project that I'm trying
to make as portable as possible, and I'd like to know if and how SCL
supports static arrays (i.e. memory pinned and not moved around
when the system GC's) and/or turning off GC temporarily.

Best,
mfh

Douglas Crosher

unread,

Oct 22, 2006, 6:43:52 PM10/22/06

to mark.h...@gmail.com

The recommended approach is to use the 'ext:with-pinned-object macro which
pins the address of a lisp object within the context of the macro, and
by default also inhibits write protection of pages occupied by the object.
This allows other threads to execute garbage collection while the current
thread is executing the foreign function. For example:

(defun example (array)
(ext:with-pinned-object (array)
(let ((sap (sys:vector-sap array)))
... foreign call that may read or write directly to the lisp array ...
)))

If the foreign function does not write to the lisp object then the following
faster form may be used:

(ext:with-pinned-object (array :inhibit-write-protection nil) ...)

Please follow up in private if you have any more questions.

Regards
Douglas Crosher

Joel Reymont

unread,

Oct 23, 2006, 5:30:59 AM10/23/06

to

For the record, ACL 8.0 turns out to be 40% faster on my floating point
benchmark.

See http://article.gmane.org/gmane.lisp.lispworks.general/6093

; cpu time (non-gc) 1,940 msec user, 0 msec system
; cpu time (gc) 0 msec user, 0 msec system
; cpu time (total) 1,940 msec user, 0 msec system
; real time 1,947 msec
; space allocation:
; 0 cons cells, 0 other bytes, 0 static bytes