Unfortunately the generator I made only tested a much smaller subset of the legal characters. The issue is, as you are probably aware, the surrogate pair system used in Java. My suggestion would be to try to generate random ints (which correspond to unicode code points) then build a string up from the unicode representation.
This is a good read if you haven't already:
I haven't tried this thoroughly but it might work:
class TestStringGenerator extends Suite {
import org.scalacheck.Gen._
def unicodeScalar: Gen[Int] = choose(0, 0x10FFFF)
def unicodeChars: Gen[Array[Char]] = unicodeScalar map (cp => Character.toChars(cp))
def unicodeStr: Gen[String] = for (css <- listOf(unicodeChars)) yield {
css.flatten.mkString
}
def testStringGen {
for (i <- 0 until 10000) {
println(unicodeStr.sample.get)
}
}
}