Invalid generated Strings when testing Java


Ingvar Bogdahn

Jul 27, 2011, 7:04:17 AM
to scala...@googlegroups.com
Hi,

I'm new to ScalaCheck and am trying to test a Java program. I get errors that I suspect are false negatives, and while searching I found someone reporting that ScalaCheck generates invalid UTF-16 strings (under the third code sample):

Then I saw the thread "Arbitrary instance for Strings" on this forum, in which "rm" suggested a string generator and Yuvi Masory commented that it may be useful for testing Java. However, I don't really understand what's going on, or whether it applies to me, because rm suggests including characters that lie outside the Basic Multilingual Plane, whereas I would rather restrict generation to valid UTF-16.

Is there an easy trick to get fully Java-compatible Strings?

Thanks

Ingvar

Ingvar Bogdahn

Aug 8, 2011, 3:10:12 PM
to scala...@googlegroups.com
I still need an answer to the question of how to obtain a generator that produces valid Java Strings.
I'd appreciate any hints.
Thanks

Ben Jackman

Aug 17, 2011, 6:53:56 PM
to scala...@googlegroups.com
I've had similar issues and just made my own generator for strings. It's pretty simple to do, so that's what I would suggest.

Ingvar Bogdahn

Aug 18, 2011, 2:48:41 AM
to scala...@googlegroups.com
Do you mind sharing how you did it, or just posting the code? In
particular, I still want to test all characters Java supports, not
just a selection of them such as ASCII. I don't know which characters
are the troublemakers, and even then I don't know how to create an
'all except the troublemaker characters' generator. I'd appreciate some
hints, thanks

Ben Jackman

Aug 23, 2011, 11:23:03 AM
to scala...@googlegroups.com
Unfortunately, the generator I made only covered a much smaller subset of the legal characters. The issue, as you are probably aware, is the surrogate pair system used in Java. My suggestion would be to generate random ints (which correspond to Unicode code points) and then build the string up from the UTF-16 chars of each code point.

This is a good read if you haven't already:

I haven't tried this thoroughly but it might work:

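import org.scalacheck.Gen
import org.scalacheck.Gen._
import org.scalatest.Suite // assuming Suite here is ScalaTest's

class TestStringGenerator extends Suite {
  // Any code point in the Unicode range. Note that this range also contains
  // the surrogate code points U+D800 to U+DFFF, which Character.toChars
  // turns into a single, unpaired surrogate char.
  def unicodeScalar: Gen[Int] = choose(0, 0x10FFFF)
  // One or two UTF-16 chars per code point
  def unicodeChars: Gen[Array[Char]] = unicodeScalar map (cp => Character.toChars(cp))
  // Concatenate the chars of a random list of code points
  def unicodeStr: Gen[String] = for (css <- listOf(unicodeChars)) yield {
    css.flatten.mkString
  }

  // Print sample strings for manual inspection
  def testStringGen(): Unit = {
    for (i <- 0 until 10000) {
      println(unicodeStr.sample.get)
    }
  }

}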
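
The range 0 to 0x10FFFF used above also contains the surrogate code points U+D800 to U+DFFF, so that generator can emit a lone surrogate. Below is a minimal sketch of a variant that skips that block. It assumes ScalaCheck's `Gen.choose`, `Gen.frequency` and `Gen.listOf`; the names `ValidUnicodeStrings`, `scalarValue` and `validString` are made up for illustration and I have not run it against ScalaCheck 1.8 or 1.9.

```scala
import org.scalacheck.Gen

object ValidUnicodeStrings {
  // Code points on either side of the surrogate block U+D800..U+DFFF.
  val belowSurrogates: Gen[Int] = Gen.choose(0x0000, 0xD7FF)
  val aboveSurrogates: Gen[Int] = Gen.choose(0xE000, 0x10FFFF)

  // Weight each range by its size so that code points are picked
  // roughly uniformly over everything except the surrogates.
  val scalarValue: Gen[Int] = Gen.frequency(
    (0xD800, belowSurrogates),
    (0x10FFFF - 0xE000 + 1, aboveSurrogates)
  )

  // appendCodePoint writes one char for BMP code points and a proper
  // high/low surrogate pair for supplementary ones.
  val validString: Gen[String] =
    Gen.listOf(scalarValue).map { cps =>
      val sb = new java.lang.StringBuilder
      cps.foreach(cp => sb.appendCodePoint(cp))
      sb.toString
    }
}
```

It should be usable like any other generator, for example `Prop.forAll(ValidUnicodeStrings.validString)(...)`. Uniform sampling produces mostly supplementary-plane characters, so the weights may need adjusting if ASCII or BMP text matters more for the code under test.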

Ingvar Bogdahn

Sep 4, 2011, 8:07:07 AM
to scala...@googlegroups.com
Hi Ben, 
thanks for your answer and for sharing the code. I tested it; it works, but unreliably. This property fails (though only after several successful tests):
    val prop = Prop.forAll(combo)(s => s == new String(s.getBytes))
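
A round-trip check like this one can fail for two separate reasons: the string contains an unpaired surrogate, or the platform default charset used by the no-argument `getBytes` cannot represent some character. The sketch below isolates the first cause by pinning the charset to UTF-8. The names `RoundTrip`, `roundTrips`, `lone` and `pair` are hypothetical; only JDK calls are used.

```scala
import java.nio.charset.StandardCharsets.UTF_8

object RoundTrip {
  // Encode to UTF-8 and decode again, independent of the platform default.
  def roundTrips(s: String): Boolean =
    s == new String(s.getBytes(UTF_8), UTF_8)

  // U+D800 is a surrogate code point: toChars returns it as one lone char,
  // which is not well-formed UTF-16 on its own.
  val lone: String = new String(Character.toChars(0xD800))

  // U+1F600 lies outside the BMP: toChars returns a high/low surrogate pair.
  val pair: String = new String(Character.toChars(0x1F600))
}

// RoundTrip.roundTrips(RoundTrip.lone) is false: the encoder substitutes
// a replacement byte for the lone surrogate.
// RoundTrip.roundTrips(RoundTrip.pair) is true, and pair.length is 2.
```

If the property still fails with the charset pinned to UTF-8, an unpaired surrogate in the generated string is the likely culprit.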

However, I realized that I was using ScalaCheck 1.8, and that version 1.9 creates correct Java Strings, so this works and I could test my stuff.
The only limitation is that leading / trailing surrogates are not generated.

However, the page linked in the first post:
has generator code that should also work for those.

ingvar