Handling of unsigned bytes

timc

Feb 11, 2011, 12:22:11 PM
to Clojure
How on earth is one supposed to do communication programming (not to
mention handling binary files, etc.) without an unsigned byte type?

I see that this issue has been talked about vaguely - is there a
solution?

Thanks

Andy Fingerhut

Feb 11, 2011, 12:24:51 PM
to clo...@googlegroups.com
What can you not do with the signed byte type and arrays of bytes
(Java byte[] and Clojure (byte-array ...))?

I believe these are frequently used for Java I/O, and can be used for
Clojure I/O as well.

Andy

Miki

Feb 11, 2011, 12:47:53 PM
to clo...@googlegroups.com

Stuart Sierra

Feb 11, 2011, 2:34:54 PM
to clo...@googlegroups.com
Java doesn't have any unsigned types, and that's not really something we can change.  Java libraries that need to do binary I/O tend to work with byte arrays and handle individual bytes as ints.

-Stuart Sierra
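A minimal Java sketch of the bytes-as-ints approach, assuming the data arrives through a java.io.InputStream: InputStream.read() already returns each byte as an int from 0 to 255, or -1 at end of stream. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical helper: collects the bytes of a stream as ints in 0..255.
final class UnsignedRead {
    static int[] readAll(InputStream in) {
        int[] buf = new int[16];
        int n = 0;
        try {
            int v;
            // read() returns the next byte as an int in 0..255, or -1 at end of stream
            while ((v = in.read()) != -1) {
                if (n == buf.length) buf = Arrays.copyOf(buf, n * 2);
                buf[n++] = v;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Arrays.copyOf(buf, n);
    }

    public static void main(String[] args) {
        byte[] raw = {(byte) 0xAB, 0x01, (byte) 0xFF};   // stored as -85, 1, -1
        System.out.println(Arrays.toString(readAll(new ByteArrayInputStream(raw))));
        // prints [171, 1, 255]
    }
}
```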

B Smith-Mannschott

Feb 11, 2011, 2:41:22 PM
to clo...@googlegroups.com

Well, except for char. That's unsigned. An odd language, Java.

// Ben
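A short Java illustration of Ben's point, using a hypothetical class name: widening a char to int zero-extends it, while widening a short holding the same 16-bit pattern sign-extends it.

```java
// Hypothetical demo class: char is Java's only unsigned primitive.
final class CharIsUnsigned {
    // Widening a char to int zero-extends: the result is always 0..65535.
    static int widenChar(char c) { return c; }

    // Widening a short to int sign-extends: the result is -32768..32767.
    static int widenShort(short s) { return s; }

    public static void main(String[] args) {
        // Same 16-bit pattern (0xFFFF) in both types
        System.out.println(widenChar((char) 0xFFFF));   // 65535
        System.out.println(widenShort((short) 0xFFFF)); // -1
    }
}
```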

jk

Feb 11, 2011, 8:25:22 PM
to Clojure
Well, until you start doing "bit-shift-right"s and the sign bit (high
bit) doesn't go to 0 after shifting it down. Actually, you typically
need to represent individual bytes as ints and write them back out
using writeByte() when you're trying to do low-level bit-twiddling.

It's a pain but it works.
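To make the shift problem concrete, here is a small Java sketch (hypothetical class and method names). A byte is sign-extended to int before any shift, so both >> and >>> on a raw byte keep stray high bits; masking with 0xFF first gives the expected result.

```java
// Hypothetical demo class contrasting shifts on a raw byte with shifts on a masked byte.
final class ByteShift {
    // b is sign-extended to int first, so a "negative" byte keeps its high 1 bits.
    static int shiftSigned(byte b, int n) { return b >> n; }

    // >>> does not help on its own: the sign-extended bits are already there.
    static int shiftLogicalRaw(byte b, int n) { return b >>> n; }

    // Masking with 0xFF first gives the unsigned value 0..255, then the shift behaves.
    static int shiftMasked(byte b, int n) { return (b & 0xFF) >> n; }

    public static void main(String[] args) {
        byte b = (byte) 0x80;  // bit pattern 1000 0000, value -128
        System.out.println(Integer.toHexString(shiftSigned(b, 4)));     // fffffff8
        System.out.println(Integer.toHexString(shiftLogicalRaw(b, 4))); // ffffff8
        System.out.println(Integer.toHexString(shiftMasked(b, 4)));     // 8
    }
}
```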

Richard Lyman

Feb 11, 2011, 10:42:04 PM
to clo...@googlegroups.com
I have to deal with them when processing AMF packets, and I use the
Netty library - it's amazing, you should look into it.

http://www.jboss.org/netty

-Rich

timc

Feb 12, 2011, 8:08:00 AM
to Clojure
Sorry I did not make myself clear - I thought it was obvious given the
previous postings on this subject.
This Java program:

public class TestByte {
    public static void main(String[] args)
    {
        int i = 0x123456ab;
        byte[] b = new byte[1];
        b[0] = (byte) i;
        showInt("i", i);
        showByte("b[0]", b[0]);
    }

    private static void showInt(String s, int x)
    {
        System.out.println(String.format("%s=%d(0x%08x 0x%02x)", s, x, x, x));
    }

    private static void showByte(String s, byte x)
    {
        System.out.println(String.format("%s=%d(0x%08x 0x%02x)", s, x, x, x));
    }
}

compiles and runs, producing this output:

i=305419947(0x123456ab 0x123456ab)
b[0]=-85(0x000000ab 0xab)
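As a sketch of what the Java cast above is doing (hypothetical class name): a narrowing (byte) cast keeps only the low 8 bits of the int and reads them as a signed value, which is why 0x123456ab becomes 0xab, i.e. -85. The second method spells out the same thing with a mask.

```java
// Hypothetical demo class: what Java's narrowing (byte) cast does.
final class NarrowingCast {
    // The cast keeps the low 8 bits and reinterprets them as a signed byte.
    static byte viaCast(int i) { return (byte) i; }

    // Same result written out: mask to the low 8 bits, then map 128..255 to -128..-1.
    static int viaMask(int i) {
        int low = i & 0xFF;
        return low >= 128 ? low - 256 : low;
    }

    public static void main(String[] args) {
        int i = 0x123456ab;
        System.out.println(viaCast(i)); // -85
        System.out.println(viaMask(i)); // -85
    }
}
```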

But this bit of Clojure:

(defn show [s x] (println (format "%s=%d(0x%08x 0x%02x)" s, x, x, x)))
(def i 0x123456ab)
(def b (byte-array 1))
(aset-byte b 0 (byte i))
(show "i" i)
(show "b[0]" (aget b 0))

throws this exception:
java.lang.IllegalArgumentException: Value out of range for byte: 305419947

Shouldn't these two programs be equivalent?
Presumably they are not because the effect of

byte b = (byte) i;

is NOT the same as

(def b (byte i))

-- but surely it should be the same?

timc

Feb 12, 2011, 8:28:08 AM
to Clojure
Further investigation reveals that

(def b (byte i))

is doing something equivalent to this internally:

byte b = Byte.parseByte(String.format("%d",i));

which does indeed throw a NumberFormatException if the decimal integer
representation given to it produces an out-of-range value (as it
should).

So what I'm pleading for is that (byte b), (int i), (short s),
etc. should simply perform a masking operation (on the appropriate
number of least significant bits), the way Java clearly does.

Aaron Cohen

Feb 12, 2011, 9:53:27 AM
to clo...@googlegroups.com
For the record, unchecked coercions are in progress:
http://dev.clojure.org/jira/browse/CLJ-441

Aaron Cohen

Feb 12, 2011, 10:22:25 AM
to clo...@googlegroups.com
I should also mention that for this sort of stuff, people often get
tired of raw bit-twiddling and move to something like a wrapper around
protocol buffers (such as https://github.com/ninjudd/clojure-protobuf)
or a DSL such as gloss (https://github.com/ztellman/gloss/wiki).

--Aaron

Brian Hurt

Feb 12, 2011, 12:46:09 PM
to clo...@googlegroups.com

Java guarantees two's complement representations for integers, and in two's complement the only operations that differ between signed and unsigned are shift right, divide, modulo and comparison. Java provides a >>> operator for doing an unsigned shift right; I'm not sure how to access it from Clojure, as I haven't needed to do that yet. For div, mod and comparison you need to up the data size (convert ints to longs, or longs to bigints) and do them there.

By the way, a nice way to think about 2's complement arithmetic is the following: it stores -x as the unsigned number 2^32 - x (or 2^64 - x for 64-bit longs), and then does everything as unsigned arithmetic modulo 2^32 (or 2^64).
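Brian's recipe can be sketched in Java as below, assuming 32-bit ints widened to long (class and method names are hypothetical). The main method also checks the modular view: -5 is held as 2^32 - 5, and addition gives the same bits whether they are read as signed or unsigned.

```java
// Hypothetical helpers for treating a Java int as an unsigned 32-bit value.
final class UnsignedInt {
    private static final long MASK = 0xFFFFFFFFL;   // 2^32 - 1

    // Read the 32 bits of x as a value in 0..2^32-1 (widen to long without sign extension).
    static long toUnsigned(int x) { return x & MASK; }

    // Unsigned shift right: >>> fills the vacated high bits with zeros.
    static int shiftRight(int x, int n) { return x >>> n; }

    // Divide, remainder and compare in the wider type, where every unsigned
    // 32-bit value is non-negative. divide/remainder throw ArithmeticException if y == 0.
    static int divide(int x, int y) { return (int) (toUnsigned(x) / toUnsigned(y)); }
    static int remainder(int x, int y) { return (int) (toUnsigned(x) % toUnsigned(y)); }
    static int compare(int x, int y) { return Long.compare(toUnsigned(x), toUnsigned(y)); }

    public static void main(String[] args) {
        // -5 is held as 2^32 - 5
        System.out.println(toUnsigned(-5) == (1L << 32) - 5);   // true
        // addition produces the same bits whether read as signed or unsigned
        int a = -5, b = 7;
        System.out.println(toUnsigned(a + b) == (toUnsigned(a) + toUnsigned(b)) % (1L << 32)); // true
        System.out.println(divide(-2, 2));       // 2147483647, i.e. 0xFFFFFFFE / 2
        System.out.println(compare(-1, 1) > 0);  // true: 0xFFFFFFFF is the largest unsigned value
    }
}
```

On Java 8 and later, Integer.toUnsignedLong, Integer.divideUnsigned, Integer.remainderUnsigned and Integer.compareUnsigned cover the same ground.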

Brian

Andy Fingerhut

Feb 12, 2011, 1:20:35 PM
to clo...@googlegroups.com

And until those are available, I have sometimes worked around it by
making a tiny function that takes an int in the range 0..255 and
returns a byte:

(defn ubyte [val]
  (if (>= val 128)
    (byte (- val 256))
    (byte val)))

Andy
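For comparison, a hypothetical Java counterpart of the ubyte helper above. An explicit 0..255 check is added here; the Clojure version instead relies on byte's own range check, which does not reject every value outside 0..255.

```java
// Hypothetical Java counterpart of the Clojure ubyte helper.
final class UByte {
    // Map an unsigned value 0..255 to the signed byte with the same bit pattern.
    static byte ubyte(int val) {
        // explicit range check (an addition, not part of the Clojure helper)
        if (val < 0 || val > 255) {
            throw new IllegalArgumentException("Value out of range for unsigned byte: " + val);
        }
        return (byte) (val >= 128 ? val - 256 : val);
    }

    public static void main(String[] args) {
        System.out.println(ubyte(255)); // -1
        System.out.println(ubyte(128)); // -128
        System.out.println(ubyte(127)); // 127
    }
}
```

In Java itself the subtraction is not strictly needed, since (byte) val already yields the same bit pattern for 0..255.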


Ken Wesson

Feb 12, 2011, 4:42:01 PM
to clo...@googlegroups.com
On Sat, Feb 12, 2011 at 8:28 AM, timc <timg...@gmail.com> wrote:
> Further investigation reveals that
>
> (def b (byte i))
>
> is doing something equivalent to this internally:
>
> byte b = Byte.parseByte(String.format("%d",i));

What the HELL?

That's incredibly icky and inefficient. :)

Why not

if (i < -128 || i > 127)
    throw new IllegalArgumentException("Value out of range for byte: " + i);
else
    b = (byte) i;

which still bounds-checks the conversion but has got to be orders of
magnitude faster?
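A Java sketch of the two behaviours under discussion, with hypothetical method names: a checked conversion along the lines suggested here (the exception message mirrors the one timc saw), and the unchecked narrowing cast timc is asking for. This is an illustration, not Clojure's actual RT.java code.

```java
// Hypothetical helpers contrasting a checked and an unchecked byte conversion.
final class ByteCasts {
    // Checked: reject anything outside -128..127 instead of truncating it.
    static byte checkedByte(long n) {
        if (n < Byte.MIN_VALUE || n > Byte.MAX_VALUE) {
            throw new IllegalArgumentException("Value out of range for byte: " + n);
        }
        return (byte) n;
    }

    // Unchecked: plain Java narrowing, keeps the low 8 bits.
    static byte uncheckedByte(long n) { return (byte) n; }

    public static void main(String[] args) {
        System.out.println(uncheckedByte(0x123456abL)); // -85
        try {
            checkedByte(0x123456abL);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Value out of range for byte: 305419947
        }
    }
}
```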

Aaron Cohen

Feb 12, 2011, 4:54:11 PM
to clo...@googlegroups.com
Don't worry, it's not actually doing that:
https://github.com/clojure/clojure/blob/f128af9d36dfcb268b6e9ea63676cf254c0f1c40/src/jvm/clojure/lang/RT.java#L902

Ken Wesson

Feb 12, 2011, 5:04:26 PM
to clo...@googlegroups.com

Oh, good. That's basically what I said it should do.

timc

Feb 12, 2011, 6:31:07 PM
to Clojure
Thanks for the help. Sorry I got agitated about this - it was just
that my code (that was doing lots of byte handling) worked with a
previous version of Clojure and then stopped working.
Thanks Ken, I shall use your little workaround.

Rasmus Svensson

Feb 14, 2011, 2:21:58 AM
to clo...@googlegroups.com
To turn a signed byte (-128 to 127) into an unsigned one:

(bit-and the-byte 0xff)

The byte (for example 0x80, which is negative) will be extended to an
int (0xffffff80) and ANDed with 0x000000ff (and you get 0x00000080,
which is positive).
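The same explanation as a Java sketch (hypothetical class name): assigning a byte to an int sign-extends it, and & 0xFF strips the extended bits. On Java 8 and later, Byte.toUnsignedInt does the same job.

```java
// Hypothetical demo class for the bit-and trick.
final class ByteMask {
    // b is sign-extended to int (0x80 becomes 0xffffff80); & 0xFF keeps only the low 8 bits.
    static int toUnsigned(byte b) { return b & 0xFF; }

    public static void main(String[] args) {
        byte b = (byte) 0x80;   // -128
        int widened = b;        // sign extension
        System.out.println(Integer.toHexString(widened));           // ffffff80
        System.out.println(Integer.toHexString(toUnsigned(b)));     // 80
        System.out.println(toUnsigned(b) == Byte.toUnsignedInt(b)); // true (Java 8+)
    }
}
```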

The javadoc for the methods of DataInput[1] contains formulas for how
to convert shorts, ints and longs too. DataOutput[2] contains
comments for the opposite operation.

1. http://download.oracle.com/javase/6/docs/api/java/io/DataInput.html
2. http://download.oracle.com/javase/6/docs/api/java/io/DataOutput.html
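The DataInput/DataOutput formulas mentioned above, written out as a Java sketch for big-endian data. The class and method names are hypothetical; the bodies follow the formulas given in the javadoc for readUnsignedShort, readInt and writeShort. The main method cross-checks readInt against DataInputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical helpers applying the DataInput/DataOutput formulas to raw bytes.
final class BigEndian {
    // DataInput.readUnsignedShort formula: high byte first, each byte masked to 0..255.
    static int readUnsignedShort(byte a, byte b) {
        return ((a & 0xff) << 8) | (b & 0xff);
    }

    // DataInput.readInt formula: four masked bytes, most significant first.
    static int readInt(byte a, byte b, byte c, byte d) {
        return ((a & 0xff) << 24) | ((b & 0xff) << 16) | ((c & 0xff) << 8) | (d & 0xff);
    }

    // Opposite direction, as described for DataOutput.writeShort: high byte, then low byte.
    static byte[] writeShort(int v) {
        return new byte[] {(byte) (0xff & (v >> 8)), (byte) (0xff & v)};
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = {(byte) 0xAB, (byte) 0xCD, (byte) 0xEF, 0x01};
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw));
        System.out.println(in.readInt() == readInt(raw[0], raw[1], raw[2], raw[3])); // true
    }
}
```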
