String.getBytes does not translate to byte[]?

tsuraan

May 12, 2009, 2:45:12 PM
to clo...@googlegroups.com
I'm trying to encode a java string into utf-8 for encapsulation within
an OtpErlangBinary
(http://erlang.org/doc/apps/jinterface/java/com/ericsson/otp/erlang/OtpErlangBinary.html).
When I try to construct an OtpErlangBinary from the results of
String.getBytes(encoding), I get bad data. A string (pure ascii) with
20 characters becomes a 47-byte OtpErlangBinary, and none of the bytes
in that binary seem to correspond to the bytes of the string. The
simple function that I have looks like this:

(defmethod to-otp String [ s ]
  (new OtpErlangBinary (.getBytes s "utf-8")))

And the OtpErlangBinary gets 47 bytes of data for a 20 byte string.
However, if I change that code to read:

(defmethod to-otp String [ s ]
  (new OtpErlangBinary (.getBytes (str s) "utf-8")))

(notice the (str s)), the code works. It seems really strange to me
that this should happen, but I don't know clojure well enough to
determine what's going on under the hood. Is there some way to dump
the java code equivalent of those two functions, so I can compare
them? I've tried making a minimal test case for converting strings
into OtpErlangBinaries, but I can't get this bug to manifest in any
circumstances other than this one program.

Any tips for debugging would be much appreciated. I'm running clojure
1.0.0, with java 1.6.0, and erlang jinterface 1.4.

Adrian Cuthbertson

May 12, 2009, 8:33:19 PM
to clo...@googlegroups.com
Using a java.nio ByteBuffer to simulate what you're doing, the
following works OK for me:

(defmulti t-str class)
(defmethod t-str String [s]
  (java.nio.ByteBuffer/wrap (.getBytes s "us-ascii")))

(t-str "abcde")
#<HeapByteBuffer java.nio.HeapByteBuffer[pos=0 lim=5 cap=5]>

(defmethod t-str String [s]
  (java.nio.ByteBuffer/wrap (.getBytes s "utf-8")))

(t-str "abcde")
#<HeapByteBuffer java.nio.HeapByteBuffer[pos=0 lim=5 cap=5]>

Maybe there's something about the particular [ s ] object that you're
passing in?

Rgds, Adrian.

tsuraan

May 12, 2009, 10:59:10 PM
to clo...@googlegroups.com
> Maybe there's something about the particular [ s ] object that you're
> passing in?

I believe that you're right; in general, the getBytes seems to work.
It is just in this one freakish case that it doesn't, but I have no
idea how to tell what's special about my string. I'm not exactly
proficient in Java, and I'm a complete clojure newbie, so I really
have no idea how to proceed with debugging this.

Under what circumstances would (str s) give something different from
s, when (.getClass s) gives java.lang.String?
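
One way to probe this at a REPL, sketched against a current Clojure rather than 1.0.0: for a value that really is a java.lang.String, (str s) hands back the very same object, so the two versions cannot differ in the data they see. What can differ is what the compiler knows. The var for str carries a :tag of String, while an unhinted parameter s carries no type information at all. The name same-object? is made up for this illustration.

```clojure
;; Hypothetical helper: does (str s) return the identical object for a String?
(defn same-object? [s]
  (identical? s (str s)))

(println (same-object? "hi there"))   ; true
(println (:tag (meta #'str)))         ; java.lang.String
```

So (str s) would not change the string itself; it would only give the compiler a known type for the expression that .getBytes is called on.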

Also, sorry for the horrible subject line; that was my initial guess,
and I had a big email written up about it, but then I started testing
my assumptions and they were all garbage. I rewrote the email, but
forgot about the subject...

Adrian Cuthbertson

May 13, 2009, 12:53:37 AM
to clo...@googlegroups.com
Well, under the covers the str function calls the Java toString
method on whatever object it is given, so the result could for some
reason differ from the original String object passed in. I think
this could occur if the object subclasses String but has a different
representation (i.e. a different toString method).

Are you able to post what s contains when this happens?
You could try printing s in your calling function and then also in the
defmethod, e.g.:

(defmethod t-str String [s] (prn "cls:" (class s) "s:" s "chars:" (vec s)))

Maybe that'll shed some light?

tsuraan

May 13, 2009, 10:35:43 AM
to clo...@googlegroups.com
> Well, under the covers the str function applies the java "toString"
> method to any passed in object and hence the result could for some
> reason be different to the original String object passed in. I think
> this could occur if the object subclasses String, but has a different
> representation (i.e. a different toString method).
>
> Are you able to post what s contains when this happens?
> You could try printing s in your calling function and then also in the
> defmethod, e.g.:
>
> (defmethod t-str String [s] (prn "cls:" (class s) "s:" s "chars:" (vec s)))
>
> Maybe that'll shed some light?

I thought I sent this to the list, but I think I just sent it to a
single person, so I'm trying again...

Ok, here's a stripped down set of code that has the problem. It does
require that erlang be installed with java support. I couldn't figure
out a way to duplicate the problem without it, unfortunately.

To test this out, first start erlang like this:

erl -setcookie cookie -sname erl@localhost

Then, run the attached demo.clj. Once that's started (it won't print
anything, so just give it a sec to get running), run the following
from the erlang REPL:

{ echo, echo@localhost } ! { self(), "hi there" }.

and then, after the demo.clj has exited,

(fun() -> receive X -> X after 0 -> nil end end)().

As the code is submitted, you will get a reply that looks like

<<172,237,0,5,117,114,0,2,91,66,172,243,23,248,6,8,84,224,
2,0,0,120,112,0,0,0,8,104,105,...>>

If you replace the (let [ inside s ...]) with (let [ inside (str s)
...]) and rerun the test, you will get the string "hi there", as a
binary. On the clojure side, the following is printed

with [ inside s ]:

I was given a class java.lang.String
Chars are [\h \i \space \t \h \e \r \e]
Given string is 'hi there'
Bytes length is 8
OtpBinary size is 35

with [ inside (str s) ]

I was given a class java.lang.String
Chars are [\h \i \space \t \h \e \r \e]
Given string is 'hi there'
Bytes length is 8
OtpBinary size is 8

So, I'm hoping this gives somebody an idea, because I'm stumped.

demo.clj

Boris Mizhen - 迷阵

May 13, 2009, 10:42:53 AM
to clo...@googlegroups.com
> Well, under the covers the str function applies the java "toString"
> method to any passed in object and hence the result could for some
> reason be different to the original String object passed in. I think
> this could occur if the object subclasses String, but has a different
> representation (i.e. a different toString method).

You can't subclass String in Java because String is a final class.
This is enforced by the JVM (not just the compiler).
I don't know Java dynamic proxies well, but I would be surprised if they
allowed a proxy for String, as that would circumvent the JVM security
mechanisms that rely on immutable strings...

Boris

Christophe Grand

May 13, 2009, 11:05:29 AM
to clo...@googlegroups.com
tsuraan wrote:

> I was given a class java.lang.String
> Chars are [\h \i \space \t \h \e \r \e]
> Given string is 'hi there'
> Bytes length is 8
> OtpBinary size is 35
>
> with [ inside (str s) ]
>
> I was given a class java.lang.String
> Chars are [\h \i \space \t \h \e \r \e]
> Given string is 'hi there'
> Bytes length is 8
> OtpBinary size is 8
>
> So, I'm hoping this gives somebody an idea, because I'm stumped.
>

I guess that if you enable reflection warnings, you'll get a warning on
the line where you invoke the constructor.

I think the reflective dispatch doesn't pick the right constructor and
invokes OtpErlangBitstr(Object) instead of OtpErlangBitstr(byte[]).
Thus your byte[] is serialized and the 35 above is the length of the
serialized array.
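
That guess can be checked without jinterface. A minimal sketch, assuming only the JDK: Java-serializing the 8-byte array for "hi there" should yield 35 bytes that begin with the stream header 172,237,0,5, which matches the Erlang output above. The helper name serialize is hypothetical.

```clojure
(import '(java.io ByteArrayOutputStream ObjectOutputStream))

(defn serialize
  "Java-serializes obj and returns the resulting byte array."
  [obj]
  (let [buf (ByteArrayOutputStream.)]
    (with-open [out (ObjectOutputStream. buf)]
      (.writeObject out obj))
    (.toByteArray buf)))

(def raw (.getBytes "hi there" "utf-8"))
(def wrapped (serialize raw))

(println (count raw))                                ; 8
(println (count wrapped))                            ; 35
;; Bytes are signed on the JVM, so mask them to compare with Erlang's 0..255.
(println (map #(bit-and % 0xff) (take 4 wrapped)))   ; (172 237 0 5)
```

The fixed overhead is 27 bytes, which also accounts for the 20-character string that became a 47-byte binary in the first message.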

A type hint on s should fix the issue:

(defmethod to-otp String [ #^String s ]
(OtpErlangBinary. (.getBytes s "utf-8")))
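
To see what the compiler can and cannot resolve, reflection warnings can be captured directly. This is a rough sketch for a current Clojure, where the hint is spelled ^String (the #^String form above is the 1.0-era spelling). The helper reflection-warnings is a made-up name, and only the .getBytes call is compiled, since OtpErlangBinary needs jinterface on the classpath.

```clojure
(defn reflection-warnings
  "Compiles form with reflection warnings enabled and returns, as a string,
  whatever the compiler wrote to *err* while doing so."
  [form]
  (let [err (java.io.StringWriter.)]
    (binding [*err* err
              *warn-on-reflection* true]
      (eval form))
    (str err)))

;; Same call as in to-otp, without and with a type hint on s.
(def unhinted '(fn [s] (.getBytes s "utf-8")))
(def hinted   '(fn [^String s] (.getBytes s "utf-8")))

(println (reflection-warnings unhinted))          ; a warning that mentions getBytes
(println (pr-str (reflection-warnings hinted)))   ; ""
```

In a source file, the usual equivalent is to put (set! *warn-on-reflection* true) near the top and watch the compiler output.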

hth

Christophe

--
Professional: http://cgrand.net/ (fr)
On Clojure: http://clj-me.blogspot.com/ (en)


tsuraan

May 13, 2009, 12:03:09 PM
to clo...@googlegroups.com
> I guess that if you enable reflection warnings, you'll get a warning on
> the line where you invoke the constructor.
>
> I think the reflective dispatch doesn't pick the right constructor and
> invokes OtpErlangBitstr(Object) instead of OtpErlangBitstr(byte[]).
> Thus your byte[] is serialized and the 35 above is the length of the
> serialized array.
>
> A type hint on s should fix the issue:
>
> (defmethod to-otp String [ #^String s ]
> (OtpErlangBinary. (.getBytes s "utf-8")))

Ok, that worked, but I don't understand why. How does forcing the
type of s to be String cause the dispatch to use the byte[]
constructor instead of the Object constructor? This would make sense
if we were somehow coercing the return value of .getBytes to be a
byte[] instead of an Object, but I don't see how the type checker is
calling any constructor other than a byte[] constructor when that's
the return value of getBytes already.

Does this have anything to do with the fact that (class (first
(.getBytes "hi"))) gives java.lang.Byte instead of just plain byte? I
had noticed that (and thus the subject of this topic), but in my
smaller tests it didn't seem to matter.
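
On the java.lang.Byte observation, a quick check (a sketch, assuming only the JDK and a current Clojure) suggests it is unrelated: the array returned by .getBytes is a primitive byte[], and first simply boxes the element it hands back.

```clojure
(def bs (.getBytes "hi" "utf-8"))

;; The array itself is a primitive byte[] (JVM class name "[B").
(println (= (Class/forName "[B") (class bs)))            ; true
(println (= Byte/TYPE (.getComponentType (class bs))))   ; true

;; Elements pulled out through the seq API are boxed as java.lang.Byte.
(println (= Byte (class (first bs))))                    ; true
```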

tsuraan

May 13, 2009, 12:12:12 PM
to clo...@googlegroups.com
> I guess that if you enable reflection warnings, you'll get a warning on
> the line where you invoke the constructor.

As a bit of an aside, is there a reason that using multimethods with
class-based dispatch doesn't add type hints by itself? It seems sort
of strange that it's necessary to define your methods with (defmethod
name Class [ #^Class arg ] ...) instead of just being able to do
(defmethod name Class [ arg ] ...).

Christophe Grand

May 13, 2009, 12:17:01 PM
to clo...@googlegroups.com
tsuraan wrote:

>> I guess that if you enable reflection warnings, you'll get a warning on
>> the line where you invoke the constructor.
>>
>> I think the reflective dispatch doesn't pick the right constructor and
>> invokes OtpErlangBitstr(Object) instead of OtpErlangBitstr(byte[]).
>> Thus your byte[] is serialized and the 35 above is the length of the
>> serialized array.
>>
>> A type hint on s should fix the issue:
>>
>> (defmethod to-otp String [ #^String s ]
>> (OtpErlangBinary. (.getBytes s "utf-8")))
>>
>
> Ok, that worked, but I don't understand why. How does forcing the
> type of s to be String cause the dispatch to use the byte[]
> constructor instead of the Object constructor? This would make sense
> if we were somehow coercing the return value of .getBytes to be a
> byte[] instead of an Object, but I don't see how the type checker is
> calling any constructor other than a byte[] constructor when that's
> the return value of getBytes already.
>

It's a type hint, not a type coercion.

Without the type hint, the compiler doesn't know the type of s, so it
can't find the .getBytes method (nor, of course, its return type) and,
when in doubt, picks the "broader" constructor: OtpErlangBinary(Object).
With the type hint, the compiler knows that s must be treated as a
String; it finds .getBytes, sees that it returns a byte[], and is able to
select the right constructor for OtpErlangBinary.
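
The same effect can be tried without jinterface. java.lang.String happens to have a similar pair of overloads, valueOf(Object) and valueOf(char[]), so it is used here as a stand-in for the two OtpErlangBinary constructors. This is only a sketch against a current Clojure, and 1.0.0 may differ in detail; the function names are hypothetical.

```clojure
;; Stand-in for the OtpErlangBinary constructors: String/valueOf is
;; overloaded on both Object and char[].

;; No hint: the compiler has no static type for cs.
(defn describe-unhinted [cs]
  (String/valueOf cs))

;; Hinted: the compiler knows cs is a char[] and can pick that overload.
(defn describe-hinted [^chars cs]
  (String/valueOf cs))

(def letters (char-array "abc"))

(println (describe-hinted letters))     ; abc
(println (describe-unhinted letters))
```

If the unhinted call prints something like [C@1b2c3d rather than abc, the Object overload was chosen for it, which is the same kind of mismatch as the 35-byte binary.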

tsuraan

May 13, 2009, 12:56:35 PM
to clo...@googlegroups.com
> It's a type hint, not a type coercion.
>
> Without the type hint, the compiler doesn't know the type of s, so it
> can't find the .getBytes method (nor, of course, its return type) and,
> when in doubt, picks the "broader" constructor: OtpErlangBinary(Object).
> With the type hint, the compiler knows that s must be treated as a
> String; it finds .getBytes, sees that it returns a byte[], and is able to
> select the right constructor for OtpErlangBinary.

Is the OtpErlangBinary constructor that's being used actually
determined at compile time, rather than at run time?
