On May 16, 5:20 am, Yuvi Masory <ymas...@gmail.com> wrote:
> > I noticed today that the standard Arbitrary[String] will never
> > generate strings with characters that lie outside the base
> > multilingual plane
>
> The Scala spec states that only characters in the basic multilingual plane
> are supported, so that's probably a good default for ScalaCheck.
Actually, it says that "Scala programs are written using the Unicode
Basic Multilingual Plane (BMP) character set". Char, to the extent it's
specified at all, is merely an unsigned 16-bit integer type, and String
is ("usually", says the spec, but in practice always) the underlying
platform's string class, and therefore UTF-16. Anyway, since there are
no non-BMP characters in the actual source code, it's fine according to
my reading.
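
To make the Char/String point concrete, here is a minimal sketch using
only the JVM standard library. The object and value names are my own,
chosen for illustration. It shows a single non-BMP code point taking up
two Chars (a surrogate pair) inside a String:

```scala
object SurrogateDemo {
  // U+1D11E MUSICAL SYMBOL G CLEF lies outside the BMP.
  val clefCodePoint: Int = 0x1D11E

  // Character.toChars encodes a code point as one or two UTF-16 code units.
  val clef: String = new String(Character.toChars(clefCodePoint))

  def main(args: Array[String]): Unit = {
    println(clef.length)                               // prints 2 (UTF-16 code units)
    println(clef.codePointCount(0, clef.length))       // prints 1 (actual code point)
    println(Character.isHighSurrogate(clef.charAt(0))) // prints true
    println(Character.isLowSurrogate(clef.charAt(1)))  // prints true
  }
}
```

So a generator that only ever picks individual BMP Chars will never
produce a string like this one.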
> But what
> you've written could be a valuable addition for when you're testing Java or
> doing something else with Strings. I've been writing a bunch of generators
> for Unicode if you're interested:
https://github.com/quala/qualac/blob/master/src/main/scala/lex/Charac...
...and that's actually very useful indeed. Dealing with Unicode at the
code point level instead of the UTF-16 stuff the JVM and CLR force on
us is... well, still not exactly a walk in the park, but better.
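
For what it's worth, this is roughly what I mean by working at the code
point level: a rough sketch, not taken from the linked generators. The
helper names `toCodePoints` and `fromCodePoints` are hypothetical; the
calls underneath (`codePointAt`, `Character.charCount`, and the
`String(int[], int, int)` constructor) are plain JVM API.

```scala
object CodePoints {
  // Walk the string one code point at a time, skipping over both halves
  // of a surrogate pair in a single step.
  def toCodePoints(s: String): List[Int] = {
    val out = List.newBuilder[Int]
    var i = 0
    while (i < s.length) {
      val cp = s.codePointAt(i)
      out += cp
      i += Character.charCount(cp) // 1 for BMP, 2 for supplementary
    }
    out.result()
  }

  // Build a String back from code points; the JVM handles the UTF-16 encoding.
  def fromCodePoints(cps: Seq[Int]): String =
    new String(cps.toArray, 0, cps.length)
}
```

With these two in hand, a test can generate or inspect lists of Ints
and only convert to String at the edges.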