Pesky Java interop bug with 0xFFFFFFFF error in BufferedReader


Hank Lenzi

Dec 25, 2021, 2:23:37 PM
to Clojure

Hello --

I'm learning Clojure and its Java interop stuff. I am trying to emulate this function:

 public String readAllCharsOneByOne(BufferedReader bufferedReader) throws IOException {
    StringBuilder content = new StringBuilder();
       
    int value;
    while ((value = bufferedReader.read()) != -1) {
        content.append((char) value);
    }
       
    return content.toString();
}


So far, I've been able to reason that the parts I need are:


(def myfile "/path/to/svenska_sample.txt")
(import java.io.BufferedReader)
(import java.io.FileReader)
(import java.lang.StringBuilder)
(import java.lang.Character)
 
(def a-FileReader (FileReader. myfile))
(def bufferedReader (BufferedReader. a-FileReader))
(def content (StringBuilder.))

which works

user> (.append content (Character/toChars (.read bufferedReader)))
#object[java.lang.StringBuilder 0x490447d0 "\nDe"]
user> (.append content (Character/toChars (.read bufferedReader)))
#object[java.lang.StringBuilder 0x490447d0 "\nDen"]
user> (.append content (Character/toChars (.read bufferedReader)))
#object[java.lang.StringBuilder 0x490447d0 "\nDen "]
user> (.append content (Character/toChars (.read bufferedReader)))
#object[java.lang.StringBuilder 0x490447d0 "\nDen t"]
user> (.append content (Character/toChars (.read bufferedReader)))
#object[java.lang.StringBuilder 0x490447d0 "\nDen ty"]
user> (.append content (Character/toChars (.read bufferedReader)))
#object[java.lang.StringBuilder 0x490447d0 "\nDen typ"]
user> (.append content (Character/toChars (.read bufferedReader)))
#object[java.lang.StringBuilder 0x490447d0 "\nDen typi"]

The file is a small text file UTF-8 encoded in Linux with the following content:

❯ cat svenska_sample.txt

Den typiska impulsiva olycksfågeln är en ung man som kraschar flera bilar, och ofta skryter lite med det, i varje fall när han är tillsammans med sina vänner._. För dem har otur i det närmaste blivit en livsstil, och de råkar konstant ut för olyckor, stora som små. Olycksfåglar kallar vi dem. Hur många det finns kan ingen med säkerhet säga, för det finns inga konkreta definitioner på denna grupp, och heller ingen given avgränsning av den. Att de finns, råder det emellertid ingen tvekan om, varken på sjukhusens akutmottagningar eller i försäkringsbranschen

I wrote a function, with my brand new Clojure Java interop chops, that looks like this:

;; COMPILES  - BUT CAN'T GET AROUND THE 0XFFFFFFF BUG
(defn pt%% [file]
   (let [afr (FileReader. file); instances of FileReader, BufferedReader, StringBuffer
         bfr (BufferedReader. afr)
         ct (StringBuilder.)
         val (.read bfr)
         this-list (list afr bfr ct)]
         ; (apply println this-list)
         (loop []
               (when (not (= val -1))
                 (.append ct (Character/toChars (.read bfr))))
               (recur))
                ; when finished...
         (.toString ct)))
         
but it borks with the following error:

user> (pt%% myfile)
Execution error (IllegalArgumentException) at java.lang.Character/toChars (Character.java:8572).
Not a valid Unicode code point: 0xFFFFFFFF

What in the world could be causing this (NOTE: I am not a Java programmer)?
Here is the hex dump of the text file:

❯ cat svenska_sample.hexdump
00000000: 0a44 656e 2074 7970 6973 6b61 2069 6d70  .Den typiska imp
00000010: 756c 7369 7661 206f 6c79 636b 7366 c3a5  ulsiva olycksf..
00000020: 6765 6c6e 20c3 a472 2065 6e20 756e 6720  geln ..r en ung
00000030: 6d61 6e20 736f 6d20 6b72 6173 6368 6172  man som kraschar
00000040: 2066 6c65 7261 2062 696c 6172 2c20 6f63   flera bilar, oc
00000050: 6820 6f66 7461 2073 6b72 7974 6572 206c  h ofta skryter l
00000060: 6974 6520 6d65 6420 6465 742c 2069 2076  ite med det, i v
00000070: 6172 6a65 2066 616c 6c20 6ec3 a472 2068  arje fall n..r h
00000080: 616e 20c3 a472 2074 696c 6c73 616d 6d61  an ..r tillsamma
00000090: 6e73 206d 6564 2073 696e 6120 76c3 a46e  ns med sina v..n
000000a0: 6e65 722e 5f2e 2046 c3b6 7220 6465 6d20  ner._. F..r dem
000000b0: 6861 7220 6f74 7572 2069 2064 6574 206e  har otur i det n
000000c0: c3a4 726d 6173 7465 2062 6c69 7669 7420  ..rmaste blivit
000000d0: 656e 206c 6976 7373 7469 6c2c 206f 6368  en livsstil, och
000000e0: 2064 6520 72c3 a56b 6172 206b 6f6e 7374   de r..kar konst
000000f0: 616e 7420 7574 2066 c3b6 7220 6f6c 7963  ant ut f..r olyc
00000100: 6b6f 722c 2073 746f 7261 2073 6f6d 2073  kor, stora som s
00000110: 6dc3 a52e 204f 6c79 636b 7366 c3a5 676c  m... Olycksf..gl
00000120: 6172 206b 616c 6c61 7220 7669 2064 656d  ar kallar vi dem
00000130: 2e20 4875 7220 6dc3 a56e 6761 2064 6574  . Hur m..nga det
00000140: 2066 696e 6e73 206b 616e 2069 6e67 656e   finns kan ingen
00000150: 206d 6564 2073 c3a4 6b65 7268 6574 2073   med s..kerhet s
00000160: c3a4 6761 2c20 66c3 b672 2064 6574 2066  ..ga, f..r det f
00000170: 696e 6e73 2069 6e67 6120 6b6f 6e6b 7265  inns inga konkre
00000180: 7461 2064 6566 696e 6974 696f 6e65 7220  ta definitioner
00000190: 70c3 a520 6465 6e6e 6120 6772 7570 702c  p.. denna grupp,
000001a0: 206f 6368 2068 656c 6c65 7220 696e 6765   och heller inge
000001b0: 6e20 6769 7665 6e20 6176 6772 c3a4 6e73  n given avgr..ns
000001c0: 6e69 6e67 2061 7620 6465 6e2e 2041 7474  ning av den. Att
000001d0: 2064 6520 6669 6e6e 732c 2072 c3a5 6465   de finns, r..de
000001e0: 7220 6465 7420 656d 656c 6c65 7274 6964  r det emellertid
000001f0: 2069 6e67 656e 2074 7665 6b61 6e20 6f6d   ingen tvekan om
00000200: 2c20 7661 726b 656e 2070 c3a5 2073 6a75  , varken p.. sju
00000210: 6b68 7573 656e 7320 616b 7574 6d6f 7474  khusens akutmott
00000220: 6167 6e69 6e67 6172 2065 6c6c 6572 2069  agningar eller i
00000230: 2066 c3b6 7273 c3a4 6b72 696e 6773 6272   f..rs..kringsbr
00000240: 616e 7363 6865 6e0a                      anschen.


Any ideas?
TIA
-- Hank



LaurentJ

Dec 25, 2021, 7:11:46 PM
to Clojure
Hi,

Your loop/recur usage is wrong; your error may be because your loop has no halting condition.


Regards
Laurent

Mark Nutter

Dec 25, 2021, 7:22:46 PM
to clo...@googlegroups.com
I think at least part of the problem is your use of val in the let form. Inside the loop you're testing (not (= val -1)), but val is an immutable value bound above the loop to the first character read from the buffer, so the test never changes. The loop keeps reading until .read returns -1 at end of file, and Character/toChars rejects that as 0xFFFFFFFF, which makes it crash. You probably want to modify your loop like this:

(loop [val (.read bfr)]

  (when (not (= val -1))
    (.append ct (Character/toChars val)))
    (recur (.read bfr))


Now this rebinds val to the next character from the buffer each time you go through the loop.

I'm not a guru when it comes to Java interop, but I think the above is the main problem you're having right now, so hopefully that will get you back on track.
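A minimal sketch of where the 0xFFFFFFFF comes from, assuming an in-memory java.io.StringReader as a stand-in for the file (the name `eof-demo` is made up for illustration). At end of stream `.read` returns -1, and -1 as a 32-bit int is 0xFFFFFFFF, which `Character/toChars` refuses to treat as a code point.

```clojure
(import 'java.io.StringReader)

(defn eof-demo
  "Reads from an empty reader and reports what happens to the result."
  []
  (let [rdr (StringReader. "")
        v   (.read rdr)]                    ; end of stream: .read returns -1
    {:value v
     :hex   (Integer/toHexString (int v))   ; -1 viewed as a 32-bit int
     :error (try
              (Character/toChars (int v))   ; -1 is not a valid code point
              nil
              (catch IllegalArgumentException e
                (.getMessage e)))}))
```

Calling `(eof-demo)` should give `:value` -1, `:hex` "ffffffff", and an `:error` message like the one in the original report.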



Hank Lenzi

Dec 25, 2021, 8:35:47 PM
to Clojure
Thanks for the answers.
Trying to recur with '(recur (.read bfr))' resulted in a:
Syntax error (UnsupportedOperationException) compiling recur at (*cider-repl ~:localhost:41097(clj)*:237:9).
Can only recur from tail position
So I changed the code (see below). 

And now a form that was working before, '(.append ...)', no longer works, and the same error remains.

user> (pt5 myfile)

Execution error (IllegalArgumentException) at java.lang.Character/toChars (Character.java:8572).
Not a valid Unicode code point: 0xFFFFFFFF

(defn pt5 [file]

   (let [afr (FileReader. file); instances of FileReader, BufferedReader, StringBuffer
         bfr (BufferedReader. afr)
         ct (StringBuilder.)
         this-list (list afr bfr ct)]
         ; (apply println this-list)
         (loop [val (.read bfr)]
               (when (not (= val -1))
                 (.append ct (Character/toChars (.read bfr))))
               (recur val))
                ; when finished...
                (.toString ct)))

Harder than it seemed at first sight...
-- Hank


Harold

Dec 25, 2021, 11:31:40 PM
to Clojure
Hank,

Welcome. Great effort; it certainly seems like you're learning a lot, and quickly.

`clojure.core/slurp` is related, in case you haven't seen it yet: https://clojuredocs.org/clojure.core/slurp


It's natural when coming from non-functional languages to write `loop`-y code like you did, and while it's neat that Clojure enables writing code in that style (essential, actually, to the stated goal of pragmatism), it's not always helpful. In this case, the implementation of `slurp` also interops with Java to produce strings, but does so at a more helpful level of abstraction.
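For comparison, a minimal sketch of the `slurp` route, assuming a throwaway temp file (the function name `slurp-round-trip` and the file prefix are made up for illustration). `slurp` accepts an `:encoding` option and defaults to UTF-8, whereas `(FileReader. path)` uses the JVM's default charset, which may matter for the å/ä/ö in a file like the one above.

```clojure
(import 'java.io.File)

(defn slurp-round-trip
  "Writes text to a temporary file as UTF-8 and reads it back with slurp."
  [text]
  (let [tmp (File/createTempFile "slurp-demo" ".txt")] ; hypothetical scratch file
    (try
      (spit tmp text :encoding "UTF-8")
      (slurp tmp :encoding "UTF-8")   ; whole file as one String
      (finally
        (.delete tmp)))))
```

`(slurp-round-trip "olycksfågeln")` should hand back the same string it was given.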

I'll also note here that the https://ask.clojure.org/ site may be more fun for these types of discussions than the mailing list, though getting help anywhere (there are quite a few places actually) is of course fine.

Best of luck, and warm wishes,
-Harold

Hank Lenzi

Dec 26, 2021, 8:12:17 AM
to Clojure
Thanks, Harold.
You see, that was an exercise in Java interop - I know about slurp, but I was trying to understand what was going on there.
-- Hank

Hank Lenzi

Dec 26, 2021, 8:41:15 AM
to Clojure
2021-12-25, 21:11:46 UTC-3, LaurentJ wrote:
"Hi,

Your loop/recur usage is wrong, your error may be because your loop has no halting condition."

Hi Laurent --
I actually took inspiration from one of the sources you posted:
(import '(javax.sound.sampled AudioSystem AudioFormat$Encoding))

(let [mp3-file (java.io.File. "tryout.mp3")
      audio-in (AudioSystem/getAudioInputStream mp3-file)
      audio-decoded-in (AudioSystem/getAudioInputStream AudioFormat$Encoding/PCM_SIGNED audio-in)
      buffer (make-array Byte/TYPE 1024)]
  (loop []
    (let [size (.read audio-decoded-in buffer)]
      (when (> size 0)
        ;do something with PCM data
        (recur)))))


LaurentJ

Dec 26, 2021, 8:57:41 AM
to Clojure
Hi,

In the quoted example the `recur` call is *inside* the `when`, which is a huge difference: in that case there is a halting condition to get out of the loop ;)
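A small sketch of the same shape as the quoted audio example, assuming a java.io.ByteArrayInputStream in place of the audio stream (the name `count-bytes` and the 4-byte buffer are arbitrary choices for illustration). Because `recur` sits inside the `when`, the loop simply falls through once `.read` stops returning a positive size.

```clojure
(import 'java.io.ByteArrayInputStream)

(defn count-bytes
  "Reads the stream in small chunks and returns the total number of bytes seen."
  [^java.io.InputStream in]
  (let [buffer (byte-array 4)
        total  (atom 0)]
    (loop []
      (let [size (.read in buffer)]
        (when (> size 0)          ; halting condition: .read returns -1 at the end
          (swap! total + size)
          (recur))))              ; recur only while there was data
    @total))
```

For instance, `(count-bytes (ByteArrayInputStream. (.getBytes "0123456789" "UTF-8")))` reads 4, 4 and 2 bytes, then stops.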

regards
Laurent

LaurentJ

Dec 26, 2021, 9:35:54 AM
to Clojure
Hank

Your loop/recur in your pt5 function is still not good. Take the time to read the loop/recur documentation and to understand examples.

A Clojure loop/recur is not really a loop like in other procedural languages.
It is more akin to a new function call at the `loop` point with new args provided by the `recur` call.

If you continue with the function call metaphor, the `loop` call defines 3 things:
  - the starting point of the function
  - the argument name of the function
  - the **initial values** of those arguments

When you need to call that function again with new arguments, you use `recur` with **new values**.
When you don't need to recur, well... don't call `recur` :) and just evaluate a last expression, which is the result of the `loop` expression.
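To make the metaphor concrete, here is a minimal sketch with made-up names (`countdown-loop`, `countdown-fn`): the first version uses `loop`/`recur`, the second writes the same thing as an explicit function that recurs on itself.

```clojure
;; loop/recur form: `n` and `acc` are the "arguments",
;; `start` and [] are their initial values.
(defn countdown-loop [start]
  (loop [n   start
         acc []]
    (if (zero? n)
      acc                               ; no recur: this is the loop's result
      (recur (dec n) (conj acc n)))))   ; "call again" with new values

;; The same logic as a function whose two-argument arity recurs on itself.
(defn countdown-fn
  ([start] (countdown-fn start []))
  ([n acc]
   (if (zero? n)
     acc
     (recur (dec n) (conj acc n)))))

(countdown-loop 3)
;; => [3 2 1]
```

In both versions `recur` is handed plain values, in the same order as the names they replace.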

regards
Laurent

Hank Lenzi

Dec 26, 2021, 12:12:56 PM
to Clojure
Hi --
Thanks for taking the time to help me.
As far as I understand the examples, loop has this template:

loop [binding]
  (condition
     (statement)
         (recur (binding)))
And in 'recur' the loop is re-executed with new bindings.

There was indeed an issue with the 'recur' outside 'when'. Thanks for pointing that out.
I corrected that in the version below.

I also changed to a smaller file (UTF-8 encoded in Linux), called 'ribs.txt', with the following content:

source/txt on  master [!?]
❯ cat ribs.txt
Try my delicious pork-chop ribs!

source/txt on  master [!?]
❯ cat ribs.hexdump
00000000: 54 72 79 20 6d 79 20 64 65 6c 69 63 69 6f 75 73  Try my delicious
00000010: 20 70 6f 72 6b 2d 63 68 6f 70 20 72 69 62 73 21   pork-chop ribs!
00000020: 0a                                               .


(defn pt8 [file]

   (let [afr (FileReader. file); instances of FileReader, BufferedReader, StringBuffer
         bfr (BufferedReader. afr)
         ct (StringBuilder.)
         this-list (list afr bfr ct)]
         ; (apply println this-list)
         ; put recur INSIDE THE WHEN

         (loop [val (.read bfr)]
               (when (not (= val -1))
                 (.append ct (Character/toChars (.read bfr)))
               (recur [val (.read bfr)])))
                ; when finished...
                (.toString ct)))

I think this fixed the 'recur', because it does rebinding to a new call to "read()".
However, the error remains.
               
user> (pt8 ribs)

Execution error (IllegalArgumentException) at java.lang.Character/toChars (Character.java:8572).
Not a valid Unicode code point: 0xFFFFFFFF

The file used is sufficiently small so that we can walk the bytes using jshell:

jshell> FileReader afr = new FileReader("/home/hank/source/txt/ribs.txt/")
afr ==> java.io.FileReader@1698c449

jshell> BufferedReader bfr = new BufferedReader(afr)
bfr ==> java.io.BufferedReader@5ef04b5

jshell> StringBuilder ct = new StringBuilder()
ct ==>

The loop below increments i twice per pass, so it calls read() 16 times. The file is plain ASCII (one byte per character), so that gets us to the end of the first hexdump line:

jshell> for (int i=0; i < 31; i++) {
   ...> if ((value = bfr.read()) != -1) { ct.append((char) value); }
   ...> i++;
   ...> }

jshell> ct
ct ==> Try my delicious

00000000: 54 72 79 20 6d 79 20 64 65 6c 69 63 69 6f 75 73  Try my delicious
          T  r  y  SP m  y  SP d  e  l  i  c  i  o  u  s  (<---- YOU ARE HERE)

00000000: 54 72 79 20 6d 79 20 64 65 6c 69 63 69 6f 75 73  Try my delicious
00000010: 20 70 6f 72 6b 2d 63 68 6f 70 20 72 69 62 73 21   pork-chop ribs!
00000020: 0a                                               .

Now we read 15 more characters, stopping just short of the '!':
jshell> for (int i=0; i < 30; i++) {
   ...> if ((value = bfr.read()) != -1) { ct.append((char) value); }
   ...> i++;
   ...> }

jshell> ct
ct ==> Try my delicious pork-chop ribs

00000000: 54 72 79 20 6d 79 20 64 65 6c 69 63 69 6f 75 73  Try my delicious
00000010: 20 70 6f 72 6b 2d 63 68 6f 70 20 72 69 62 73 21   pork-chop ribs!
                                                    ^ we stopped here  

We just advance one more:

jshell> if ((value = bfr.read()) != -1) { ct.append((char) value); }
   ...>

jshell> ct
ct ==> Try my delicious pork-chop ribs!

And one more time, to see if it borks:
jshell> if ((value = bfr.read()) != -1) { ct.append((char) value); }

jshell> ct
ct ==> Try my delicious pork-chop ribs!

Nope, everything looks fine. Now where does that '0xFFFFFFFF' come from?!
-- Hank

LaurentJ

Dec 26, 2021, 2:12:48 PM
to Clojure
Hi Hank,

That loop/recur is still wrong because `loop` sets bindings to define names and give them initial values, but `recur` does *not set bindings*; it just provides new values.
So `recur` does not need a vector of bindings like `loop`.
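A minimal sketch of what happens when `recur` is handed a vector anyway (the name `recur-with-vector` and the keywords are made up for illustration): the vector is just an ordinary value, so the loop variable becomes that vector on the next pass.

```clojure
(defn recur-with-vector
  "Returns the successive values of `x` when recur is handed a vector."
  []
  (loop [x    :first
         seen []]
    (if (= (count seen) 2)
      seen
      ;; `[x :next]` is an ordinary vector value, not a binding form,
      ;; so on the next pass x *is* that vector.
      (recur [x :next] (conj seen x)))))

(recur-with-vector)
;; => [:first [:first :next]]
```

As far as I can tell, this is why a test like `(= val -1)` can never succeed after the first pass in pt8: from then on `val` is a vector, not a number.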

The pattern is as follows:
  (loop [a-local-var "initial-value"]
    (if (should-stop-looping? a-local-var)
      (execute-last-expression a-local-var)
      (recur a-new-local-var)))


In your pt8 function, you are reading your stream too many times in the loop. If I write your pt8 function in procedural pseudo-code, this is what I get:

bfr = input-stream()
ct = string-buffer()

set loop-start       // set loop point
    with val,           // set loop args
    val = read(bfr) // set args initial value, first read

if !(val == -1) {
    ct.append(char(read(bfr)))  // read again! and lose previous byte!!
    goto loop-start
         with val = vector(val, read(bfr))  // read again! this byte will be lost soon :(
}
ct.toString()
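The lost characters in the pseudo-code above can be observed with a small sketch, assuming an in-memory java.io.StringReader instead of the file (the name `read-twice-per-pass` is made up). It tests one read but appends a second one, so every other character disappears. Unlike pt8 it guards the second read, so it returns instead of crashing.

```clojure
(import 'java.io.StringReader)

(defn read-twice-per-pass
  "Tests one read but appends another, the way the pseudo-code above does."
  [s]
  (let [rdr (StringReader. s)
        sb  (StringBuilder.)]
    (loop [v (.read rdr)]               ; this value is tested...
      (if (neg? v)
        (.toString sb)
        (let [v2 (.read rdr)]           ; ...but a second read is what gets appended
          (when-not (neg? v2)           ; guard that pt8 does not have
            (.append sb (char v2)))
          (recur (.read rdr)))))))

(read-twice-per-pass "abcdef")
;; => "bdf"
```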



Regards,
Laurent

LaurentJ

Dec 26, 2021, 3:03:33 PM
to Clojure
Hank,

Just a message to give you the solution [spoiler alert]
Don't read it, if you still want to search :)






SPOILER












SPOILER





  ;; ugly version using the fact that java objects are mutable in place
  (defn ugly-read-chars-one-by-one
    [reader]
    (let [sb (StringBuilder.)]
      (loop []
        (let [v (.read reader)]
          (if (neg? v)
            (.toString sb)
            (do (.append sb (char v))
                (recur)))))))


  ;; "better" version using loop bindings
  (defn read-chars-one-by-one
    [reader]
    (loop [sb (StringBuilder.)
           v (.read reader)]
      (if (neg? v)
        (.toString sb)
        (recur (.append sb (char v))
               (.read reader)))))


  ;; usage
  (require '[clojure.java.io :as io])
  (with-open [rdr (io/reader "some-file.txt")]
    (read-chars-one-by-one rdr))
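  A side note on `(char v)` versus `Character/toChars`, as a hedged sketch (the name `code-unit->string` is made up): `Reader.read` returns a single UTF-16 code unit, so `(char v)` is the direct counterpart of Java's `(char) value`, while `Character/toChars` takes a code point and returns a char array. For values such as 229 (å) both give the same text.

```clojure
(defn code-unit->string
  "Converts a value returned by Reader.read to text in both ways."
  [v]
  {:via-char    (str (char v))                          ; like Java's (char) value
   :via-toChars (String. (Character/toChars (int v)))}) ; char[] wrapped in a String

(code-unit->string 229)
;; => {:via-char "å", :via-toChars "å"}
```

  The difference only shows at end of stream: `(char -1)` and `(Character/toChars -1)` both throw, so either way the -1 has to be tested before converting.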



Regards,
Laurent
On Sunday, December 26, 2021 at 18:12:56 UTC+1, hank....@gmail.com wrote:

Hank Lenzi

Dec 27, 2021, 9:13:57 AM
to Clojure
Hi --
Thanks so much, Laurent!
I was actually kind of close (you told me not to peep, so I didn't hehe).

(defn pt30 [file]
    (def ^:dynamic *asb* (StringBuilder.))
    (let [afr (FileReader. file)
         bfr (BufferedReader. afr)]
       (loop [x (.read bfr)
              *asb* (StringBuilder.)]
             (when (not (= x -1))
             (recur (.read bfr) (.append *asb* (java.lang.Character/toChars x)))))))

with  (def ribs "/path/to/ribs.txt")

which looks similar. But, for some reason, the *asb* variable didn't populate with rib advertisements:

user> (pt30 ribs)
nil
user> *asb*
#object[java.lang.StringBuilder 0x29f4eb0c ""]

When I convert this to practically the same thing you did,
user> (defn pt34 [file]
    (let [afr (FileReader. file)
         bfr (BufferedReader. afr)]
       (loop [x (.read bfr)
              sb (StringBuilder.)]
             (if (not (= x -1))
             (.toString sb1)
               (recur (.read bfr) (.append sb (char x)))))))
#'user/pt34
user> (pt34 ribs)

""
I still get no beef (err, ribs). Doesn't print anything either. 

Weird. Just weird.

-- Hank


Hank Lenzi

Dec 27, 2021, 3:34:05 PM
to Clojure
Oops, my bad: there's a typo in '(.toString sb1)', which should be 'sb'.
It doesn't change anything, it still won't work, only Laurent's version works. 

user> (defn pt%% [file]

    (let [afr (FileReader. file)
         bfr (BufferedReader. afr)]
       (loop [x (.read bfr)
              sb (StringBuilder.)]
             (if (not (= x -1))
             (.toString sb)
             (recur  (.append sb (char x)) (.read bfr))))))

#'user/pt%%
user> (pt%% ribs)
""

Which makes no sense to me...
-- Hank

LaurentJ

Dec 27, 2021, 7:16:18 PM
to Clojure
Hank,

Your last version does not work because your `if` condition is inverted: `(not (= x -1))` is true on the very first read, so your code returns the still-empty string right away ;)
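A sketch of that last version with the test flipped, assuming the reader is passed in rather than built from a path (the name `read-all` is made up for illustration). Note also that the arguments to `recur` need to come in the same order as the `loop` bindings, here `x` first and `sb` second.

```clojure
(import 'java.io.StringReader)

(defn read-all
  "Reads every character from rdr and returns them as one String."
  [^java.io.Reader rdr]
  (loop [x  (.read rdr)
         sb (StringBuilder.)]
    (if (= x -1)                  ; stop at end of stream...
      (.toString sb)              ; ...and return what was collected
      (recur (.read rdr)          ; new value for x
             (.append sb (char x))))))  ; new value for sb

(read-all (StringReader. "Try my delicious pork-chop ribs!"))
;; => "Try my delicious pork-chop ribs!"
```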

Laurent

Hank Lenzi

Dec 27, 2021, 8:06:20 PM
to Clojure
This is embarrassing. ;-) Thanks