Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

need help with fast bit twiddling, can someone suggest faster CL code?

68 views
Skip to first unread message

Jim Newton

unread,
May 18, 2018, 11:47:16 AM5/18/18
to
I have some bit manipulation code which runs in a time-critical loop of my program,
it is part of an n^2 closest-neighbor sorting function.
Can someone suggest a way to make this code faster, particularly in SBCL?


(defun count-1-bits (n &aux (bits 0))
(declare (optimize (speed 3) (debug 0))
(type (and unsigned-byte fixnum) bits)
(type unsigned-byte n))
(if (typep n 'fixnum)
(let ()
(declare (type fixnum n))
(while (plusp n)
(when (oddp n)
(incf bits))
(setf n (ash n -1))))
(let ()
(declare (type bignum n))
(while (plusp n)
(when (oddp n)
(incf bits))
(setf n (ash n -1)))))
bits)

(defun count-bit-diffs (a b)
(declare (type unsigned-byte a b) ; warning maybe bignums
(optimize (speed 3) (debug 0)))
(count-1-bits (boole boole-xor a b)))


Jim Newton

unread,
May 18, 2018, 12:11:16 PM5/18/18
to
ouch, there's a bug in that code. I can't declare N a BIGNUM
because it is decreasing (thanks to ASH), so it eventually becomes a FIXNUM.
but the question still remains, for the best strategy for make this as fast as possible.

Kaz Kylheku

unread,
May 18, 2018, 12:54:46 PM5/18/18
to
On 2018-05-18, Jim Newton <jimka...@gmail.com> wrote:
> I have some bit manipulation code which runs in a time-critical loop of my program,
> it is part of an n^2 closest-neighbor sorting function.
> Can someone suggest a way to make this code faster, particularly in SBCL?

Extract bit fields instead of bits, look up through table, add up.

Also see this: https://stackoverflow.com/a/109025/1250772

int numberOfSetBits(int i)
{
// Java: use >>> instead of >>
// C or C++: use uint32_t
i = i - ((i >> 1) & 0x55555555);
i = (i & 0x33333333) + ((i >> 2) & 0x33333333);
return (((i + (i >> 4)) & 0x0F0F0F0F) * 0x01010101) >> 24;
}

Could be done one 32 bits at a time or maybe more.
>
>
> (defun count-1-bits (n &aux (bits 0))
> (declare (optimize (speed 3) (debug 0))
> (type (and unsigned-byte fixnum) bits)
> (type unsigned-byte n))
> (if (typep n 'fixnum)
> (let ()
> (declare (type fixnum n))
> (while (plusp n)
> (when (oddp n)
> (incf bits))
> (setf n (ash n -1))))

bits is bounded to a small value in this case; bits can be declared
fixnum also. In the bignum case, the number of bits is also unlikely
ever to be a bignum.

> (let ()
> (declare (type bignum n))
> (while (plusp n)
> (when (oddp n)
> (incf bits))
> (setf n (ash n -1)))))
> bits)

A minimal change would be to have this function extract fixnum-sized chunks in
the bignum case and recurse.

Pascal J. Bourguignon

unread,
May 18, 2018, 1:36:30 PM5/18/18
to
(let ()
(declare …)
…)

==>

(locally
(declare …)
…)


Don't try to optimize, it doesn't work. Just think.


---- (count-1.lisp) ----------------------------------------------------

(defun count-1-bits/optimized (n &aux (bits 0))
(declare (optimize (speed 3) (debug 0))
(type (and unsigned-byte fixnum) bits)
(type unsigned-byte n))
(if (typep n 'fixnum)
(locally
(declare (type fixnum n))
(loop :while (plusp n) :do
(when (oddp n)
(incf bits))
(setf n (ash n -1))))
(loop :while (plusp n) :do
(when (oddp n)
(incf bits))
(setf n (ash n -1))))
bits)


(defun count-1-bits/not-optimized (n)
(check-type n unsigned-byte)
(labels ((count-bits (n)
(if (< n 256)
(aref #.(coerce (loop
:for n :below 256
:collect (loop :for i :below 8 :when (logbitp i n) :sum 1))
'vector)
n)
(let ((divisor (max (truncate (log (integer-length n) 2) 2)
256)))
(multiple-value-bind (left right) (truncate n divisor)
(+ (count-bits left)
(count-bits right)))))))
(count-bits n)))

(defun test ()
(loop
:with sa := 0
:with sb := 0
:with rep := 20
:repeat rep
:do (let ((n (random (expt 2 4096))))
(let* ((a)
(b)
(ta (com.informatimago.common-lisp.cesarum.time:chrono-run-time
(setf a (count-1-bits/optimized n))))
(tb (com.informatimago.common-lisp.cesarum.time:chrono-run-time
(setf b (count-1-bits/not-optimized n)))))
(assert (= a b))
(incf sa ta)
(incf sb tb)))
:finally (format t "~&average optimized = ~,9f~%average non-optimized = ~,9f~%"
(/ sa rep) (/ sb rep))))
------------------------------------------------------------------------



cl-user> (load (compile-file #P"~/Desktop/lisp/count-1.lisp"))
#P"/Users/pjb/Desktop/lisp/count-1.dx64fsl"
cl-user> (test)
average optimized = 0.002871750
average non-optimized = 0.000687450
nil
cl-user> (test)
average optimized = 0.002862700
average non-optimized = 0.000653150
nil
cl-user> (test)
average optimized = 0.002799100
average non-optimized = 0.000717150
nil
cl-user> (test)
average optimized = 0.002888150
average non-optimized = 0.000745800
nil
cl-user> (test)
average optimized = 0.002842350
average non-optimized = 0.000666450
nil
cl-user> (test)
average optimized = 0.002860200
average non-optimized = 0.000701350
nil


--
__Pascal J. Bourguignon
http://www.informatimago.com

Robert Munyer

unread,
May 18, 2018, 2:22:39 PM5/18/18
to
#'LOGCOUNT

--
-- Robert Munyer code below generates e-mail address

(format nil "~(~{~a~^ ~}~)" (reverse `(com dot munyer at ,(* 175811 53922))))

Kaz Kylheku

unread,
May 18, 2018, 2:47:14 PM5/18/18
to
On 2018-05-18, Robert Munyer <rob...@not-for-mail.invalid> wrote:
> #'LOGCOUNT

Doh!

Robert L.

unread,
May 18, 2018, 6:04:29 PM5/18/18
to
On 5/18/2018, Robert Munyer wrote:

> (format nil "~(~{~a~^ ~}~)" (reverse `(com dot munyer at ,(* 175811 53922))))

===>
"9480080742 at munyer dot com"

Incomplete transformation.


(match `(com dot munyer at ,(* 42 53922))
[`(,z dot ,y at ,x) (format "~a@~a.~a" x y z)])

===>
"226...@munyer.com"

--
[Amazon banned a multitude of history books, including one that received 300
5-star reviews.]
https://archive.org/details/the-day-amazon-murdered-history
http://www.tomatobubble.com/worldwarii.html
http://archive.org/details/nolies

Robert L.

unread,
May 18, 2018, 6:20:54 PM5/18/18
to
On 5/18/2018, Jim Newton wrote:

> need help with fast bit twiddling, can someone suggest faster CL code?

A comma isn't correct here. Did you not never get no education?

Need help with fast bit twiddling; can someone suggest faster CL code?


> (defun count-1-bits (n &aux (bits 0))
> (declare (optimize (speed 3) (debug 0))
> (type (and unsigned-byte fixnum) bits)
> (type unsigned-byte n))
> (if (typep n 'fixnum)
> (let ()
> (declare (type fixnum n))
> (while (plusp n)


SBCL doewn't have "while". Do you not never test no code before
you post it?


> (when (oddp n)
> (incf bits))
> (setf n (ash n -1))))
> (let ()
> (declare (type bignum n))
> (while (plusp n)
> (when (oddp n)
> (incf bits))
> (setf n (ash n -1)))))
> bits)


(defun count-1-bits (n &aux (bits 0))
(declare (optimize (speed 3) (debug 0))
(type (and unsigned-byte fixnum) bits)
(type unsigned-byte n))
(if (typep n 'fixnum)
(let ()
(declare (type fixnum n))
(do () ((zerop n))
(when (oddp n)
(incf bits))
(setf n (ash n -1))))
(let ()
(declare (type bignum n))
(do () ((zerop n))
(when (oddp n)
(incf bits))
(setf n (ash n -1)))))
bits)

(time
(let ((small-int (parse-integer "11101111111111111111111111" :radix 2))
(reps 3999888))
(do ((rep reps (- rep 1)))
((zerop rep))
(count-1-bits small-int))))

Evaluation took:
0.375 seconds of real time
0.375000 seconds of total run time (0.375000 user, 0.000000 system)
100.00% CPU
816,463,747 processor cycles
0 bytes consed





;; Compile with
;; csc -O5 bitcount.scm

(declare
(block)
(extended-bindings)
(unsafe)
(disable-interrupts))

(define (count-1-bits n)
(let go ((n n) (count 0))
(cond ((fxodd? n)
(go (fxshr n 1) (fx+ count 1)))
((fx= 0 n) count)
(else (go (fxshr n 1) count)))))

(time
(let ((small-int (string->number "11101111111111111111111111" 2))
(reps 3999888))
(do ((rep reps (- rep 1)))
((zero? rep))
(count-1-bits small-int))))

===>
0.266s CPU time


--
Mr. Porter disclosed the fact that his ... staff in the Embassy ... had ...
decoded communications between the commander ... and the Israeli High Command,
which proved that the latter knew that the Liberty was an unarmed American
naval vessel, and ... ordered that the American ship be attacked and sunk.
http://archive.org/details/nolies

Pascal J. Bourguignon

unread,
May 18, 2018, 6:52:39 PM5/18/18
to
Sorry, there's a bug (log wasn't the function to use); with the right
divisor, it's ten time faster than the original. Here's the correction
with a third (but not faster) solution. Thanks to
mak...@irc.freenode.org for the code review ;-)
(let ((divisor (expt 2 (truncate (integer-length n) 2))))
(multiple-value-bind (left right) (truncate n divisor)
(+ (count-bits left)
(count-bits right)))))))
(count-bits n)))



(defun count-1-bits/not-optimized-loop (n)
(check-type n unsigned-byte)
(loop
:for i :below (integer-length n) :by 8
:for byte := (ldb (byte 8 i) n)
:sum (aref #.(coerce (loop
:for n :below 256
:collect (loop :for i :below 8 :when (logbitp i n) :sum 1))
'vector)
byte)))


(defun test ()
(loop
:with sa := 0
:with sb := 0
:with sc := 0
:with sd := 0
:with rep := 20
:repeat rep
:do (let ((n (random (expt 2 4096))))
(let* ((a)
(b)
(c)
(d)
(ta (com.informatimago.common-lisp.cesarum.time:chrono-run-time
(setf a (count-1-bits/optimized n))))
(tb (com.informatimago.common-lisp.cesarum.time:chrono-run-time
(setf b (count-1-bits/not-optimized n))))
(tc (com.informatimago.common-lisp.cesarum.time:chrono-run-time
(setf c (count-1-bits/not-optimized-loop n))))
(td (com.informatimago.common-lisp.cesarum.time:chrono-run-time
(setf d (logcount n)))))
(assert (= a b c d))
(incf sa ta)
(incf sb tb)
(incf sc tc)
(incf sd td)))
:finally (format t "~&average optimized = ~,9f~
~%average non-optimized = ~,9f~
~%average non-optimized-loop = ~,9f~
~%average logcount = ~,9f~%"
(/ sa rep) (/ sb rep) (/ sc rep) (/ sd rep))))


#|
cl-user> (load (compile-file #P"~/Desktop/lisp/count-1.lisp"))
#P"/Users/pjb/Desktop/lisp/count-1.dx64fsl"
cl-user> (test)
average optimized = 0.002937250
average non-optimized = 0.000252450
average non-optimized-loop = 0.000401200
average logcount = 0.000003650
nil
cl-user>
|#

Kaz Kylheku

unread,
May 19, 2018, 11:39:53 AM5/19/18
to
On 2018-05-18, Pascal J. Bourguignon <p...@informatimago.com> wrote:
> average logcount = 0.000003650

3.k microseconds (right?) for logcount do do a 4096 bit number seems
quite slow for any sort of modern machine. You don't have enough
repetitions here, maybe?

This thread prompted me to implement log countfor TXR Lisp (gapingly
absent, when you're made aware of the gape).

I have to logcount a 655360 bit number 1000 times just to rack up around
70-80 ms.

1> (let ((r (rand (expt 2 655360)))) (pprof (dotimes (x 1000) (logcount
r))))
malloc bytes: 0
gc heap bytes: 80
total: 80
milliseconds: 70

This is in a VirtualBox; cpuinfo:

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 42
model name : Intel(R) Core(TM) i5-2310 CPU @ 2.90GHz
stepping : 7
cpu MHz : 2777.496
cache size : 6144 KB

Kaz Kylheku

unread,
May 19, 2018, 1:06:14 PM5/19/18
to
On 2018-05-19, Kaz Kylheku <157-07...@kylheku.com> wrote:
> On 2018-05-18, Pascal J. Bourguignon <p...@informatimago.com> wrote:
>> average logcount = 0.000003650
>
> 3.k microseconds (right?) for logcount do do a 4096 bit number seems
^^^
??? Should have been 3.5.
0 new messages