can binary arithmetic guru help improve these "bitmasking" ops?

120 views
Skip to first unread message

danl...@gmail.com

unread,
Feb 23, 2015, 1:59:32 PM2/23/15
to clo...@googlegroups.com
So, much of the pain involved in handling UUID's correctly on the JVM
relates to the fact that there is no primitive unsigned numeric type
that can represent the full range of possible values of the msb and lsb.
Ie., we need to always deal with the unpleasant "am I negative?" approach to
reading (writing) that 64th bit.  To avoid the complexity of all the 
edge cases, we encapsulate the basic primitives of working with
unsigned numbers entirely within the abstraction of "mask" and
"mask offset".  Using these, we built the two fundamental bitwise operations
that are used for most of the UUID calculation: ldb (load-byte) and
dpb (deposit-byte).

This scrap of code from my clj-uuid.bitmop library is extremely useful for working 
with "unsigned" long/binary values (analogously to how one might using the common-lisp
functions by the same name).  And, it has been "good enough" to do pretty well
so far in terms of performance.  But I'm sure that there are gifted binariticians
in the audience that can improve this. (Note, the namespace uses ztellman/primitive-math
which changes the semantics of some arithmetic operations and some type hinting.  Also
some of the 'let's are there for that reason. It may be helpful to refer to the link.



(defn ^long expt2 [^long pow]
  (bit-set 0 pow))

(defn ^long mask [^long width ^long offset]
  (if (< (+ width offset) 64)
    (bit-shift-left (dec (bit-shift-left 1 width)) offset)
    (let [x (expt2 offset)]
      (bit-and-not -1 (dec ^long x)))))

(declare ^long mask-offset ^long mask-width)

(defn ^long mask-offset [^long m]
  (cond
    (zero? m) 0
    (neg?  m) (- 64 ^long (mask-width m))
    :else     (loop [c 0]
                (if (pos? (bit-and 1 (bit-shift-right m c)))
                  c
                  (recur (inc c))))))

(defn ^long mask-width [^long m]
  (if (neg? m)
    (let [x (mask-width (- (inc m)))]
      (- 64  ^long x))
    (loop [m (bit-shift-right m (mask-offset m)) c 0]
      (if (zero? (bit-and 1 (bit-shift-right m c)))
        c
        (recur m (inc c))))))

(defn ^long ldb
  "Load Byte"
  [^long bitmask ^long num]
  (let [off (mask-offset bitmask)]
    (bit-and (>>> bitmask ^long off)
      (bit-shift-right num off))))

(defn ^long dpb
  "Deposit Byte"
  [^long bitmask ^long num ^long value]
  (bit-or (bit-and-not num bitmask)
    (bit-and bitmask
      (bit-shift-left value (mask-offset bitmask)))))

Mikera

unread,
Feb 23, 2015, 8:39:19 PM2/23/15
to clo...@googlegroups.com
Bit operations is one area where I'd actually recommend writing in Java before wrapping in Clojure for several reasons:
1) You can port / compare directly with canonical C implementation
2) You can be pretty sure of avoiding boxing / other overheads by sticking to Java primitive maths
3) Clojure still doesn't quite have the full range of bitwise functionality as far as I'm aware

Feel free to use any of my Java bitwise code, if helpful:

Dan Lentz

unread,
Feb 23, 2015, 9:18:20 PM2/23/15
to clo...@googlegroups.com
Actually, yes, your code is exactly what I was looking for. Your "countTrailingZeros" is what I'm calling "mask-offset".  Very helpful, thank you.  Although, I gotta say, I'm still gonna do it in Clojure.  This is supposed to be programming for enjoyment! :)

Best,
Dan

--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+u...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
---
You received this message because you are subscribed to a topic in the Google Groups "Clojure" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/clojure/XIN2ZerhIcI/unsubscribe.
To unsubscribe from this group and all its topics, send an email to clojure+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Leif

unread,
Feb 23, 2015, 9:20:12 PM2/23/15
to clo...@googlegroups.com

If you're trying to squeeze every last bit of performance, and you don't mind some messiness and macros, I noticed that many of these functions are called with only a few values in the first argument.  So you are doing a lot of bit-fiddling at runtime that could be done at compile-time.  So you could partially evaluate (i.e. unroll, i.e. ~make lookup table for) 'ldb' and 'dpb' for the 8 or 9 different first arguments you actually call it with.  I unrolled your 'mask' fn into a 8-way case statement, and got about 6% speedup.

--Leif

Leif

unread,
Feb 23, 2015, 10:33:13 PM2/23/15
to clo...@googlegroups.com
Hello, again.  Just wanted to clarify, in case the below speedup didn't seem worth it.  That 6% speedup was in the top-level uuid/v1 function, not mask!

I'll try to give a clearer picture of my suggestion:  Consider

(let [bitmask (mask 8 (* 8 (dec 7)))
      off (mask-offset bitmask)
      moff (>>> bitmask off)]
  (defn ^long ldb6 [^long num]
    (bit-and moff (bit-shift-right num off))))


This function gives a speedup of 370% over calling (ldb (mask ...) num), because you don't really need to calculate the mask, offset, or offset-mask at runtime.

I envisioned writing a macro that would write the function

(let [bitmask0 (mask 8 (* 8 (dec 1)))
      off0 (mask-offset bitmask0)
      moff0 (>>> bitmask0 off0)
      ...]
  (defn ^long ldb-j [^long j ^long num]
    (case j
      0 (bit-and moff0 (bit-shift-right num off0))
      ...
      7 (bit-and moff7 (bit-shift-right num off7))
      (throw IAE))))

Or just write the unrolled version by hand.  I think you would get a fair speedup if you did that for ldb and dpb.

Hoping that's clearer,
Leif

danl...@gmail.com

unread,
Feb 24, 2015, 7:26:30 AM2/24/15
to clo...@googlegroups.com
Lief you have a really valid point here. I was definitely aware that there are only about a dozen actual masks that get used,
so I had been looking for a good way to unroll that computation. So, you definitely have showed me something useful
with your macro approach. I think doing that though is something I will want to revisit a little bit down the line -- ldb and dpb
Have proven themselves useful enough to try to first look at ways to optimize them for the "general use" case. The
.countTrailingZeros code seems to take a much more efficient approach to "mask-offset" and IIRC may completely eliminate
"mask-width". I'm going to play with it today and I'll report back any interesting results.

This group is really a fantastic resource. You guys are awesome.

Reply all
Reply to author
Forward
Message has been deleted
0 new messages