Help optimizing array-to-integer operation?

162 views
Skip to first unread message

Raph

unread,
Mar 23, 2010, 3:55:57 AM3/23/10
to Clojure

I need a little help. I need to convert a 4-byte array to an integer
as quickly as possible. (I use this operation in a tool I am writing,
and I need to process millions of records per second, so performance
matters.)

In a half-Clojure, half-Java operation, I see:

(time
(dotimes [x 24000000]
(let [foo (com.foo.BinToInt/makeByte4FromInt x)]
(com.martelles.BinToInt/makeLongFromByte4 foo))))

; 880 msecs

where, in java, I have implemented:

public static final long makeLongFromByte4(byte[] b) {
return (long)(b[0]<<24 | (b[1]&0xff)<<16 | (b[2]&0xff)<<8 |
(b[3]&0xff));
}

public static final byte[] makeByte4FromInt(int i) {
return new byte[] { (byte)(i>>24), (byte)(i>>16), (byte)
(i>>8), (byte)i };
}

I like the 880 msecs number. For 8 fields per record, that give me
over 3,000,000 records per second.

But, in clojure:

Array definition:

(def tba #^ints (make-array Integer/TYPE 4))
(aset tba 0 (byte 0x1f))
(aset tba 2 (byte 0x01))
(aset tba 3 (byte 0xFF))

(defn arr-to-int [#^ints x]
(reduce
bit-or
[(bit-shift-left (bit-and 0xFF (aget x 0)) 24)
(bit-shift-left (bit-and 0xFF (aget x 1)) 16)
(bit-shift-left (bit-and 0xFF (aget x 2)) 8)
(bit-and 0xFF (aget x 3))
]))

(time (dotimes [x 8000000] (arr-to-int tba)))

; 4,736 msecs

I have tried various hinting methods, but I cannot increase the speed.
What can I do to make the clojure operations as fast as the native
Java operations? If I can't, that's okay, but since I am new to this
(wonderful) language, I feel like I am missing something.

Thank you!

Raph

Konrad Hinsen

unread,
Mar 23, 2010, 8:08:01 AM3/23/10
to clo...@googlegroups.com
On 23.03.2010, at 08:55, Raph wrote:

> I have tried various hinting methods, but I cannot increase the speed.
> What can I do to make the clojure operations as fast as the native
> Java operations?

Get rid of the reduce and unroll the bit-or computations instead:

(defn arr-to-int [#^ints x]
(bit-or (bit-shift-left (bit-and 0xFF (aget x 0)) 24)
(bit-or (bit-shift-left (bit-and 0xFF (aget x 1)) 16)
(bit-or (bit-shift-left (bit-and 0xFF (aget x 2)) 8)
(bit-and 0xFF (aget x 3))))))

That makes for a factor 3.3 on my machine.

Konrad.

Per Vognsen

unread,
Mar 23, 2010, 8:35:27 AM3/23/10
to clo...@googlegroups.com
In addition, right now his Clojure code isn't directly analogous to
his Java code. If he used an array of bytes instead of integers, he
wouldn't need the bit-masking.

Getting this to work revealed a big surprise to me. Java has only
signed integer types: byte, short, int and long. Clojure reads 0xFF as
the _signed_ integer 255. I would expect it to be -1. For example,
(every? #(= % (bit-and -1 %)) (range -128 128)) evaluates to true.
This looks like a genuine bug.

-Per

> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clo...@googlegroups.com
> Note that posts from new members are moderated - please be patient with your first post.
> To unsubscribe from this group, send email to
> clojure+u...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
>
> To unsubscribe from this group, send email to clojure+unsubscribegooglegroups.com or reply to this email with the words "REMOVE ME" as the subject.
>

Per Vognsen

unread,
Mar 23, 2010, 8:40:57 AM3/23/10
to clo...@googlegroups.com
Sorry, I didn't put that right. 0xFF would only be -1 as a signed
byte. What I'm saying is that the interaction between the type system
of integers and the reader's hexadecimal notation is pretty surprising
to me. In particular, (byte 0xFF) throws an error.

-Per

Mark J. Reed

unread,
Mar 23, 2010, 9:39:08 AM3/23/10
to clo...@googlegroups.com
On Tue, Mar 23, 2010 at 8:40 AM, Per Vognsen <per.v...@gmail.com> wrote:
> Sorry, I didn't put that right. 0xFF would only be -1 as a signed
> byte. What I'm saying is that the interaction between the type system
> of integers and the reader's hexadecimal notation is pretty surprising
> to me. In particular, (byte 0xFF) throws an error.

What version? It works here:

Clojure 1.1.0
user=> (byte 0xff)
-1

In fact, it seems that (byte) doesn't check the range at all:

user=> (byte -129)
127

--
Mark J. Reed <mark...@gmail.com>

Per Vognsen

unread,
Mar 23, 2010, 10:41:08 AM3/23/10
to clo...@googlegroups.com
Interesting. It's Clojure 1.2.0-master-SNAPSHOT as of last week:

Clojure 1.2.0-master-SNAPSHOT
user=> 0xff
255
user=> (byte 0xff)
java.lang.IllegalArgumentException: Value out of range for byte: 255
(NO_SOURCE_FILE:0)

So, this looks like a new issue. Rich?

-Per

ataggart

unread,
Mar 24, 2010, 2:19:03 AM3/24/10
to Clojure
It was added here: http://github.com/richhickey/clojure/commit/d0e1eef26abd215668ed27357839b2bba4ba095c

Not automatically overflowing numbers is a Good Thing. For example:
http://seclists.org/fulldisclosure/2010/Mar/447

Though the behavior now deviates from the java casting behavior where
the more significant bits are just truncated. I can't think of many
cases where a downcast is used *and* the original value is expected to
ever be larger than the target type. If you really need that, then
bit-masking prior to casting seems appropriate.

On Mar 23, 7:41 am, Per Vognsen <per.vogn...@gmail.com> wrote:
> Interesting. It's Clojure 1.2.0-master-SNAPSHOT as of last week:
>
> Clojure 1.2.0-master-SNAPSHOT
> user=> 0xff
> 255
> user=> (byte 0xff)
> java.lang.IllegalArgumentException: Value out of range for byte: 255
> (NO_SOURCE_FILE:0)
>
> So, this looks like a new issue. Rich?
>
> -Per
>

> On Tue, Mar 23, 2010 at 8:39 PM, Mark J. Reed <markjr...@gmail.com> wrote:


>
>
>
> > On Tue, Mar 23, 2010 at 8:40 AM, Per Vognsen <per.vogn...@gmail.com> wrote:
> >> Sorry, I didn't put that right. 0xFF would only be -1 as a signed
> >> byte. What I'm saying is that the interaction between the type system
> >> of integers and the reader's hexadecimal notation is pretty surprising
> >> to me. In particular, (byte 0xFF) throws an error.
>
> > What version?  It works here:
>
> > Clojure 1.1.0
> > user=> (byte 0xff)
> > -1
>
> > In fact, it seems that (byte) doesn't check the range at all:
>
> > user=> (byte -129)
> > 127
>
> > --

> > Mark J. Reed <markjr...@gmail.com>

Per Vognsen

unread,
Mar 24, 2010, 5:42:57 AM3/24/10
to clo...@googlegroups.com
I never said automatic overflowing is a good thing per se.

The problem is that the expectations set by hexadecimal notation are
violated. When I write 0xFF, I don't mean the abstract mathematical
number 255; I mean the number whose bit representation (e.g. one's
complement for signed integers) has the 8 least significant bits set
to 1 and any higher bits set to 0. Unfortunately this isn't a single
definite number but depends on the type. For signed bytes it is -1 but
for shorts, ints and longs it is 255. In a statically typed language
like Java, the compiler can treat a hexadecimal literal like 0xFF
initially as uninterpreted bits and then interpret it appropriately at
different types. In a low-level sense, this works out to simple bit
truncation ("automatic overflow").

-Per

Josh Arnold

unread,
Mar 24, 2010, 11:21:41 AM3/24/10
to Clojure
> I can't think of many
> cases where a downcast is used *and* the original value is expected to
> ever be larger than the target type.

I can. Particularly when casting to a byte, I'm often more interested
in the bit pattern than the integer value. (Admittedly, sometimes
this is due to java API idiosynchrocies. For example,
InputStream.read() reads a single byte, but returns it as an integer
from 0 to 255, which is out of range for a java byte.) This will be a
breaking change for me.

I was tempted to argue that explicitly calling a coercion function
shouldn't qualify as 'automatic overflow', but in Clojure these
functions are sometimes used to achieve an optimization (forcing the
use of java primitives) rather than for their behavior. In those use
cases, I see that overflow would be unexpected and hence, harmful.

Note that because of the sign, bit-masking alone can't be used to get
a truncating conversion. For example, to force the lower 8 bits of
an integer into a java byte you'd need something like:

(defn truncating-byte [n]
(byte (if (bit-test n 7) (bit-or n -256) (bit-and n 255))))


On Mar 24, 2:19 am, ataggart <alex.tagg...@gmail.com> wrote:
> It was added here:http://github.com/richhickey/clojure/commit/d0e1eef26abd215668ed27357...

Raph

unread,
Mar 23, 2010, 8:19:13 PM3/23/10
to Clojure
Thank you, Konrad. Unrolling the bit-or computations helped; total
time for 8,000,000 iterations was 1305 msecs.

Per, you are correct. The byte issue (0xFF out of range) is why I have
to include the mask. I think this is more Java's limitation than
anything. 0xFF is not a valid value for a byte due to the JVM's broken
implementation of bytes. (My opinion, anyway.I think a byte should be
8 bits and I should be able to use all of them.)

On a more hideous note:

(defn arr-to-int [#^ints x]
(bit-or

(int (bit-or (int (bit-shift-left (int (bit-and (int 0x9F) (int
(aget x 0)))) 24)) ; don't want to overflow to long, hence 0x9F
(int (bit-shift-left (int (bit-and (int 0xFF) (int
(aget x 1)))) 16))))
(int (bit-or (int (bit-shift-left (int (bit-and (int 0xFF) (int
(aget x 2)))) 8))
(int (bit-and (int 0xFF) (int (aget x 3))))))))


(time (dotimes [x 8000000] (arr-to-int tba)))

executes in 785 msecs. This is a substantial speedup over 4,736 msecs
(and even 1305 msecs).

Still not as good as native java, but you can't have everything. Thank
you, everyone, for the assistance!

Raph


Mark J. Reed

unread,
Mar 24, 2010, 1:57:13 PM3/24/10
to clo...@googlegroups.com
On Tue, Mar 23, 2010 at 8:19 PM, Raph <mar...@gmail.com> wrote:
> (My opinion, anyway.I think a byte should be 8 bits and I should be able to use all of them.)

Er, it is, and you can. A Java byte still gives you all 8 bits' worth
of 256 different possible values; the interpretation of those values
is all that differs here. Whereas C lets you pick between signed and
unsigned (with the default unfortunately not always well-defined),
Java gives you no choice but to use the signed interpretation. But
you still get to use all 8 bits of the byte; it's just that the
numbers mapped to [128, 255] in unsigned interpretations map to
[-128,-1] instead.

The dissonance here comes from the fact that there's no real tradition
of negative hexadecimal numbers in programming. We've typically used
the hexadecimal form of the unsigned integer interpretation to
represent the corresponding bit patterns no matter how they're being
used in a given program or context. So anyone experienced with
manipulating things at the bit level comes in expecting things like
(byte 0xff) to just work, and is surprised when they don't.

Still, the nice thing about Clojure vs Java is it's not hard to write a fix:

(defmacro unsigned-byte [bval] (byte (if (> bval 127) (- bval 256) bval)))

Or call it (ubyte) for less wordiness...

ataggart

unread,
Mar 24, 2010, 3:28:27 PM3/24/10
to Clojure
Quite right. My comment was mainly aimed at something like casting
0xABC to a byte, and didn't consider the signed value problem. I
think this is a good case for a separate function, such as the one you
provided.

ataggart

unread,
Mar 24, 2010, 4:29:10 PM3/24/10
to Clojure

Raph

unread,
Mar 25, 2010, 6:54:20 PM3/25/10
to Clojure

On Mar 24, 10:57 am, "Mark J. Reed" <markjr...@gmail.com> wrote:

> On Tue, Mar 23, 2010 at 8:19 PM, Raph <mart...@gmail.com> wrote:
> > (My opinion, anyway.I think a byte should be 8 bits and I should be able to use all of them.)
>
> Er, it is, and you can.  A Java byte still gives you all 8 bits' worth
> of 256 different possible values; the interpretation of those values
> is all that differs here.  Whereas C lets you pick between signed and
> unsigned (with the default unfortunately not always well-defined),
> Java gives you no choice but to use the signed interpretation.  But
> you still get to use all 8 bits of the byte; it's just that the
> numbers mapped to [128, 255] in unsigned interpretations map to
> [-128,-1] instead.

Right, should have been more specific. The 0xFF byte doesn't work the
way I expect it to. I have to use ints to get the correct answer.

(bit-or (bit-shift-left (byte 0x01) 16)
(bit-shift-left (byte 0x7F) 8)) => 98048

(bit-or (bit-shift-left (int 0x01) 16)
(bit-shift-left (int 0x7F) 8)) => 98048

But...

(bit-or (bit-shift-left (byte 0x01) 16)
(bit-shift-left (byte 0xFF) 8)) => -256

(bit-or (bit-shift-left (int 0x01) 16)
(bit-shift-left (int 0xFF) 8)) => 130816

So I can't use the bits the way I'd expect.

Raph

ataggart

unread,
Mar 26, 2010, 4:51:16 AM3/26/10
to Clojure
The problem you're seeing is again due to bytes being a signed number,
in particular because widening conversions preserve the sign, thus
left-padding with ones if the MSB is one (i.e., a negative number).
Since bit-shift-left converts the byte to an int, the 0xFF byte
becomes a 0xFFFFFFFF int. When you start with ints instead of bytes,
the int is just 0xFF, thus no leading ones.

If you're playing with bytes, you always have to bit-mask the widening
conversions to deal with sign issues. In fact you handled this in
your original java code by using &0xFF against each byte prior to
shifting.

Reply all
Reply to author
Forward
0 new messages