Hey guys,
First of all, I would like to ask something. In Clojure, the following expressions evaluate to:
(identical? "foo" (str "f" "oo"))
;;=> false
(= "foo" (str "f" "oo"))
;;=> true
Everything is ok with that. The next one on the other hand is what puzzles me:
(identical? \f (first (str "f" "oo")))
;;=> true
If my guess is right, the number of chars that exist is finite, so Clojure treats them as a "pool of chars". The question then is: why are strings not implemented as vectors of chars instead of using the underlying Java String class? With Java Strings, a new allocation has to be performed every time a new string is created, even if it contains exactly the same information as an existing string.
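To make the comparison concrete, here is a minimal sketch of the three cases side by side. It assumes the behaviour comes from the JVM rather than from Clojure itself: the Java documentation for Character.valueOf only promises a cache for \u0000 to \u007F, so chars outside that range may or may not be identical, and String.intern is the standard Java way to get one canonical instance per distinct string. The var names are made up for illustration.

```clojure
;; Chars in the ASCII range: both sides end up as the same cached
;; java.lang.Character instance (assumption: JVM Character cache).
(def ascii-same?
  (identical? \f (first (str "f" "oo"))))        ;=> true

;; A freshly built string is a new object, even if its content is equal
;; to an existing one.
(def fresh-same?
  (identical? "foo" (str "f" "oo")))             ;=> false

;; String.intern returns the canonical instance for a given content,
;; so two equal strings become identical after interning.
(def interned-same?
  (identical? (.intern "foo") (.intern (str "f" "oo"))))  ;=> true
```

So something like a "pool of strings" already exists on the JVM, but it is opt-in rather than applied to every newly created string.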
This might not be a big deal if the number of strings is small or they are lazily produced, but (at least in my case) when I needed to load a relatively small text file (120 MB) fully into memory*, I started having memory problems because (from what I saw with YourKit) I had lots of repeated strings.
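One possible workaround for the repeated strings, as a rough sketch only: pass every string through a small cache while loading, so that equal strings share a single instance. The names make-canonicalizer, canon and words are hypothetical, and the sketch assumes the input strings are never nil (ConcurrentHashMap rejects nil keys).

```clojure
(import 'java.util.concurrent.ConcurrentHashMap)

(defn make-canonicalizer
  "Hypothetical helper: returns a function that maps every string equal to
  one seen before onto that first instance."
  []
  (let [cache (ConcurrentHashMap.)]
    (fn [^String s]
      ;; putIfAbsent returns the previously stored value, or nil if s is new.
      (let [prev (.putIfAbsent cache s s)]
        (if (nil? prev) s prev)))))

(def canon (make-canonicalizer))

;; Two separately allocated but equal strings collapse into one object.
(def words (mapv canon [(str "f" "oo") (str "fo" "o") "bar"]))

(identical? (words 0) (words 1)) ;=> true
```

The cache keeps a strong reference to every distinct string, so this only helps when the data really does contain many duplicates.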
I would like to know your thoughts on this idea and implications/problems with it.
---------------------------------------------------------------------- notes
* Yes, I know that I could analyze the whole file lazily and thus avoid memory problems, but in some cases, such as when using sort-by or group-by, there is no alternative to holding the whole thing in memory and then processing it.