In a half-Clojure, half-Java operation, I see:
(time
(dotimes [x 24000000]
(let [foo (com.foo.BinToInt/makeByte4FromInt x)]
(com.martelles.BinToInt/makeLongFromByte4 foo))))
; 880 msecs
where, in java, I have implemented:
public static final long makeLongFromByte4(byte[] b) {
return (long)(b[0]<<24 | (b[1]&0xff)<<16 | (b[2]&0xff)<<8 |
(b[3]&0xff));
}
public static final byte[] makeByte4FromInt(int i) {
return new byte[] { (byte)(i>>24), (byte)(i>>16), (byte)
(i>>8), (byte)i };
}
I like the 880 msecs number. For 8 fields per record, that give me
over 3,000,000 records per second.
But, in clojure:
Array definition:
(def tba #^ints (make-array Integer/TYPE 4))
(aset tba 0 (byte 0x1f))
(aset tba 2 (byte 0x01))
(aset tba 3 (byte 0xFF))
(defn arr-to-int [#^ints x]
(reduce
bit-or
[(bit-shift-left (bit-and 0xFF (aget x 0)) 24)
(bit-shift-left (bit-and 0xFF (aget x 1)) 16)
(bit-shift-left (bit-and 0xFF (aget x 2)) 8)
(bit-and 0xFF (aget x 3))
]))
(time (dotimes [x 8000000] (arr-to-int tba)))
; 4,736 msecs
I have tried various hinting methods, but I cannot increase the speed.
What can I do to make the clojure operations as fast as the native
Java operations? If I can't, that's okay, but since I am new to this
(wonderful) language, I feel like I am missing something.
Thank you!
Raph
> I have tried various hinting methods, but I cannot increase the speed.
> What can I do to make the clojure operations as fast as the native
> Java operations?
Get rid of the reduce and unroll the bit-or computations instead:
(defn arr-to-int [#^ints x]
(bit-or (bit-shift-left (bit-and 0xFF (aget x 0)) 24)
(bit-or (bit-shift-left (bit-and 0xFF (aget x 1)) 16)
(bit-or (bit-shift-left (bit-and 0xFF (aget x 2)) 8)
(bit-and 0xFF (aget x 3))))))
That makes for a factor 3.3 on my machine.
Konrad.
Getting this to work revealed a big surprise to me. Java has only
signed integer types: byte, short, int and long. Clojure reads 0xFF as
the _signed_ integer 255. I would expect it to be -1. For example,
(every? #(= % (bit-and -1 %)) (range -128 128)) evaluates to true.
This looks like a genuine bug.
-Per
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clo...@googlegroups.com
> Note that posts from new members are moderated - please be patient with your first post.
> To unsubscribe from this group, send email to
> clojure+u...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
>
> To unsubscribe from this group, send email to clojure+unsubscribegooglegroups.com or reply to this email with the words "REMOVE ME" as the subject.
>
-Per
What version? It works here:
Clojure 1.1.0
user=> (byte 0xff)
-1
In fact, it seems that (byte) doesn't check the range at all:
user=> (byte -129)
127
--
Mark J. Reed <mark...@gmail.com>
Clojure 1.2.0-master-SNAPSHOT
user=> 0xff
255
user=> (byte 0xff)
java.lang.IllegalArgumentException: Value out of range for byte: 255
(NO_SOURCE_FILE:0)
So, this looks like a new issue. Rich?
-Per
Not automatically overflowing numbers is a Good Thing. For example:
http://seclists.org/fulldisclosure/2010/Mar/447
Though the behavior now deviates from the java casting behavior where
the more significant bits are just truncated. I can't think of many
cases where a downcast is used *and* the original value is expected to
ever be larger than the target type. If you really need that, then
bit-masking prior to casting seems appropriate.
On Mar 23, 7:41 am, Per Vognsen <per.vogn...@gmail.com> wrote:
> Interesting. It's Clojure 1.2.0-master-SNAPSHOT as of last week:
>
> Clojure 1.2.0-master-SNAPSHOT
> user=> 0xff
> 255
> user=> (byte 0xff)
> java.lang.IllegalArgumentException: Value out of range for byte: 255
> (NO_SOURCE_FILE:0)
>
> So, this looks like a new issue. Rich?
>
> -Per
>
> On Tue, Mar 23, 2010 at 8:39 PM, Mark J. Reed <markjr...@gmail.com> wrote:
>
>
>
> > On Tue, Mar 23, 2010 at 8:40 AM, Per Vognsen <per.vogn...@gmail.com> wrote:
> >> Sorry, I didn't put that right. 0xFF would only be -1 as a signed
> >> byte. What I'm saying is that the interaction between the type system
> >> of integers and the reader's hexadecimal notation is pretty surprising
> >> to me. In particular, (byte 0xFF) throws an error.
>
> > What version? It works here:
>
> > Clojure 1.1.0
> > user=> (byte 0xff)
> > -1
>
> > In fact, it seems that (byte) doesn't check the range at all:
>
> > user=> (byte -129)
> > 127
>
> > --
> > Mark J. Reed <markjr...@gmail.com>
The problem is that the expectations set by hexadecimal notation are
violated. When I write 0xFF, I don't mean the abstract mathematical
number 255; I mean the number whose bit representation (e.g. one's
complement for signed integers) has the 8 least significant bits set
to 1 and any higher bits set to 0. Unfortunately this isn't a single
definite number but depends on the type. For signed bytes it is -1 but
for shorts, ints and longs it is 255. In a statically typed language
like Java, the compiler can treat a hexadecimal literal like 0xFF
initially as uninterpreted bits and then interpret it appropriately at
different types. In a low-level sense, this works out to simple bit
truncation ("automatic overflow").
-Per
I can. Particularly when casting to a byte, I'm often more interested
in the bit pattern than the integer value. (Admittedly, sometimes
this is due to java API idiosynchrocies. For example,
InputStream.read() reads a single byte, but returns it as an integer
from 0 to 255, which is out of range for a java byte.) This will be a
breaking change for me.
I was tempted to argue that explicitly calling a coercion function
shouldn't qualify as 'automatic overflow', but in Clojure these
functions are sometimes used to achieve an optimization (forcing the
use of java primitives) rather than for their behavior. In those use
cases, I see that overflow would be unexpected and hence, harmful.
Note that because of the sign, bit-masking alone can't be used to get
a truncating conversion. For example, to force the lower 8 bits of
an integer into a java byte you'd need something like:
(defn truncating-byte [n]
(byte (if (bit-test n 7) (bit-or n -256) (bit-and n 255))))
On Mar 24, 2:19 am, ataggart <alex.tagg...@gmail.com> wrote:
> It was added here:http://github.com/richhickey/clojure/commit/d0e1eef26abd215668ed27357...
Per, you are correct. The byte issue (0xFF out of range) is why I have
to include the mask. I think this is more Java's limitation than
anything. 0xFF is not a valid value for a byte due to the JVM's broken
implementation of bytes. (My opinion, anyway.I think a byte should be
8 bits and I should be able to use all of them.)
On a more hideous note:
(defn arr-to-int [#^ints x]
(bit-or
(int (bit-or (int (bit-shift-left (int (bit-and (int 0x9F) (int
(aget x 0)))) 24)) ; don't want to overflow to long, hence 0x9F
(int (bit-shift-left (int (bit-and (int 0xFF) (int
(aget x 1)))) 16))))
(int (bit-or (int (bit-shift-left (int (bit-and (int 0xFF) (int
(aget x 2)))) 8))
(int (bit-and (int 0xFF) (int (aget x 3))))))))
(time (dotimes [x 8000000] (arr-to-int tba)))
executes in 785 msecs. This is a substantial speedup over 4,736 msecs
(and even 1305 msecs).
Still not as good as native java, but you can't have everything. Thank
you, everyone, for the assistance!
Raph
Er, it is, and you can. A Java byte still gives you all 8 bits' worth
of 256 different possible values; the interpretation of those values
is all that differs here. Whereas C lets you pick between signed and
unsigned (with the default unfortunately not always well-defined),
Java gives you no choice but to use the signed interpretation. But
you still get to use all 8 bits of the byte; it's just that the
numbers mapped to [128, 255] in unsigned interpretations map to
[-128,-1] instead.
The dissonance here comes from the fact that there's no real tradition
of negative hexadecimal numbers in programming. We've typically used
the hexadecimal form of the unsigned integer interpretation to
represent the corresponding bit patterns no matter how they're being
used in a given program or context. So anyone experienced with
manipulating things at the bit level comes in expecting things like
(byte 0xff) to just work, and is surprised when they don't.
Still, the nice thing about Clojure vs Java is it's not hard to write a fix:
(defmacro unsigned-byte [bval] (byte (if (> bval 127) (- bval 256) bval)))
Or call it (ubyte) for less wordiness...
On Mar 24, 10:57 am, "Mark J. Reed" <markjr...@gmail.com> wrote:
> Mark J. Reed <markjr...@gmail.com>
On Mar 24, 10:57 am, "Mark J. Reed" <markjr...@gmail.com> wrote:
> On Tue, Mar 23, 2010 at 8:19 PM, Raph <mart...@gmail.com> wrote:
> > (My opinion, anyway.I think a byte should be 8 bits and I should be able to use all of them.)
>
> Er, it is, and you can. A Java byte still gives you all 8 bits' worth
> of 256 different possible values; the interpretation of those values
> is all that differs here. Whereas C lets you pick between signed and
> unsigned (with the default unfortunately not always well-defined),
> Java gives you no choice but to use the signed interpretation. But
> you still get to use all 8 bits of the byte; it's just that the
> numbers mapped to [128, 255] in unsigned interpretations map to
> [-128,-1] instead.
Right, should have been more specific. The 0xFF byte doesn't work the
way I expect it to. I have to use ints to get the correct answer.
(bit-or (bit-shift-left (byte 0x01) 16)
(bit-shift-left (byte 0x7F) 8)) => 98048
(bit-or (bit-shift-left (int 0x01) 16)
(bit-shift-left (int 0x7F) 8)) => 98048
But...
(bit-or (bit-shift-left (byte 0x01) 16)
(bit-shift-left (byte 0xFF) 8)) => -256
(bit-or (bit-shift-left (int 0x01) 16)
(bit-shift-left (int 0xFF) 8)) => 130816
So I can't use the bits the way I'd expect.
Raph
If you're playing with bytes, you always have to bit-mask the widening
conversions to deal with sign issues. In fact you handled this in
your original java code by using &0xFF against each byte prior to
shifting.