Goal: automatically parse JSON into nested case classes, OR accumulated errors

Marc Siegel

Feb 18, 2015, 11:50:39 AM
to raptur...@googlegroups.com
Hi Jon,

Following up on our discussion at nescala 2015 and then on Twitter, I see that you are making progress towards the goal we discussed: to automatically parse JSON into an expected nested aggregation of case classes, OR return the accumulated errors. Working with Bill Venners's Scalactic library and its Or/Every classes, I'm hoping that we can find an easy way to also add filters (e.g., age > 21) to this process.

Very excited to see where this goes,
-Marc

Jon Pretty

Feb 18, 2015, 2:19:15 PM
to raptur...@googlegroups.com
Hi Marc,

Thanks for the heads up! Here's a brief outline of the problem, and the solution I've come up with.

Currently, it's possible to import a "mode" in Rapture which determines the return type of standard calls which might fail, for example,

  import modes.returnTry._
  myJson.as[CaseClass] // returns Try[CaseClass]

  import modes.returnFuture._
  myJson.as[CaseClass] // returns Future[CaseClass]

  import modes.returnEither._
  myJson.as[CaseClass] // returns Either[JsonGetException, CaseClass]
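
One way to picture how a mode can change a call's return type is a value that carries a type member and a wrapping function. This is only a minimal sketch, not Rapture's implementation: `Mode`, `Wrap`, `ThrowMode`, `TryMode`, `EitherMode` and `parseInt` are hypothetical names chosen for illustration.

```scala
import scala.util.Try

object ModeSketch {
  // Hypothetical, simplified stand-in for a mode: it decides how the result
  // of a possibly-failing computation is presented to the caller.
  trait Mode {
    type Wrap[T]
    def wrap[T](body: => T): Wrap[T]
  }

  // Returns the bare value and lets any exception propagate.
  object ThrowMode extends Mode {
    type Wrap[T] = T
    def wrap[T](body: => T): T = body
  }

  // Captures any non-fatal exception in a Try.
  object TryMode extends Mode {
    type Wrap[T] = Try[T]
    def wrap[T](body: => T): Try[T] = Try(body)
  }

  // Captures any non-fatal exception on the left of an Either.
  object EitherMode extends Mode {
    type Wrap[T] = Either[Throwable, T]
    def wrap[T](body: => T): Either[Throwable, T] = Try(body).toEither
  }

  // A method "using a mode": its return type depends on the mode supplied.
  def parseInt(s: String)(implicit mode: Mode): mode.Wrap[Int] =
    mode.wrap(s.trim.toInt)
}
```

With `TryMode` passed explicitly, `ModeSketch.parseInt("42")(ModeSketch.TryMode)` has type `Try[Int]`; with `EitherMode` the same call has type `Either[Throwable, Int]`.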

This all works well, though rather than getting only the first JsonGetException that occurs, we would like to be able to collect *all* the errors that occur when extracting a `CaseClass`.

Unfortunately, this wasn't possible, because errors were dealt with as thrown exceptions, which may or may not be caught, but either way resulted in processing being aborted at the point of the first error.

I've changed this so that modes may define their own "throw" method. By default, it will continue to throw, but it enables implementations that register the exceptions as a side effect, return `null` (because we have to return something), and attempt to carry on. When we return a value to the user, the `null`s don't matter because we've already established that there's been at least one failure.

I've called the result `Outcome`, with two subtypes: `Result` (for success) and `Problems` (containing all the issues that occurred).
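
The "record the error, return `null`, carry on" idea can be sketched roughly as follows. This is not the code from Rapture or the Gist: `ErrorHandler`, `raise`, `Throwing`, `Accumulating`, `Person`, `field` and `extract` are hypothetical names, and `Problems` here simply holds a list of exceptions rather than encoding their types.

```scala
import scala.collection.mutable.ListBuffer

object OutcomeSketch {
  sealed trait Outcome[+T]
  final case class Result[+T](value: T) extends Outcome[T]
  final case class Problems(errors: List[Exception]) extends Outcome[Nothing]

  // Hypothetical hook that extraction code calls instead of `throw`.
  trait ErrorHandler {
    def raise[T](e: Exception): T
  }

  // Default behaviour: abort at the first error.
  object Throwing extends ErrorHandler {
    def raise[T](e: Exception): T = throw e
  }

  // Accumulating behaviour: record the error, hand back a null placeholder
  // and let the caller carry on.
  final class Accumulating extends ErrorHandler {
    val errors: ListBuffer[Exception] = ListBuffer.empty[Exception]
    def raise[T](e: Exception): T = {
      errors += e
      null.asInstanceOf[T]
    }
  }

  final case class Person(name: String, email: String)

  // Reads one field, delegating the failure case to the handler.
  def field(data: Map[String, String], key: String)(h: ErrorHandler): String =
    data.get(key) match {
      case Some(v) => v
      case None    => h.raise[String](new NoSuchElementException("missing field: " + key))
    }

  def extract(data: Map[String, String]): Outcome[Person] = {
    val h = new Accumulating
    // May contain nulls; it is only exposed when no error was recorded.
    val person = Person(field(data, "name")(h), field(data, "email")(h))
    if (h.errors.isEmpty) Result(person) else Problems(h.errors.toList)
  }
}
```

Calling `OutcomeSketch.extract(Map.empty)` reports both missing fields in one `Problems` value, whereas the `Throwing` handler would stop at the first.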

That explanation probably sounds a bit abstract, so have a look at this Gist:

  https://gist.github.com/propensive/b0ec2a9a22e9fe9834de

Using this new mode style does require methods which use modes to be rewritten, and I've tried it out as a test while implementing the Rapture CSV library, which is similar in some ways to Rapture JSON.

Note that the Gist also demonstrates how errors can be extracted from a `Problems` type based on their type (and indeed how the type of `Problems` encodes the exceptions it may contain).

So, Marc, this is the first step in getting what you need. The next step is getting this working on Rapture JSON, and then I will need to work out how to get extraction to potentially cause validation exceptions.

One way I've found to do this is to use artificial marker traits to drive type inference to use alternative extractors, like this:

  trait UpperCase
  implicit def extractor: Extractor[String with UpperCase, Json] =
    Json.extractor[String].filter(_.forall(_.isUpper))

  case class Foo(str: String, upperStr: String with UpperCase, upperStr2: String with UpperCase)

  import modes.returnOutcome._
  json"""{ "str": "Hello world", "upperStr": "lower!", "upperStr2": "x" }""".as[Foo]

In this particular example, validation should fail, so I'd expect to get back something like this:

  val res: Outcome[Foo, DataGetException] =
  Problems(
    rapture.data.ValidationException:
      validation failed accessing <value>.upperStr
      validation failed accessing <value>.upperStr2
  )
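
The marker-trait idea can be tried out without Rapture. In this minimal sketch, `Extractor`, `filterAs` and `as` are hypothetical stand-ins (the real `Extractor` also takes the data type, and its filtering method is `filter` in the example above), and errors are plain strings rather than exceptions.

```scala
object MarkerSketch {
  // Marker only: it is never instantiated, it just selects another extractor.
  trait UpperCase

  // Hypothetical simplified extractor: reads a T from a raw string or
  // reports an error message.
  trait Extractor[T] { self =>
    def extract(raw: String): Either[String, T]

    // Builds an extractor for a marked subtype S that succeeds only when
    // the predicate holds for the extracted value.
    def filterAs[S <: T](p: T => Boolean): Extractor[S] = new Extractor[S] {
      def extract(raw: String): Either[String, S] =
        self.extract(raw) match {
          case Right(v) if p(v) => Right(v.asInstanceOf[S]) // marker has no runtime footprint
          case Right(_)         => Left("validation failed")
          case Left(e)          => Left(e)
        }
    }
  }

  object Extractor {
    implicit val string: Extractor[String] = new Extractor[String] {
      def extract(raw: String): Either[String, String] = Right(raw)
    }

    implicit val upperString: Extractor[String with UpperCase] =
      string.filterAs[String with UpperCase](_.forall(_.isUpper))
  }

  // The requested type alone decides which extractor is used.
  def as[T](raw: String)(implicit ex: Extractor[T]): Either[String, T] =
    ex.extract(raw)
}
```

Here `as[String]("lower!")` succeeds, while `as[String with UpperCase]("lower!")` fails validation, because the type asked for drives the implicit lookup.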

That's roughly it.

It's only a brief outline of how it will work, but I think I can get this working. :D

Cheers,
Jon



--
Jon Pretty | @propensive

Marc Siegel

Feb 20, 2015, 5:48:10 PM
to raptur...@googlegroups.com
Hi Jon, this is cool. I'd like to discuss potentially:

  1.  Having an Or / Every output of the accumulated errors
  2.  Finding best available syntax and semantics for writing the validation functions

For point 2, I believe your suggestion to use marker traits, if I understand it correctly, is probably not exactly what I would want. I am thinking of other ways to do it, such as a typeclass instance in scope for the type, which has either all the validation functions or one single composed validation function that can return multiple failures.

So perhaps more like:

// We want some semantic (not syntactic) validations on these fields depending on context
case class TelevisionWatcher(name: String, age: Int)

// We're looking for that 18-49 target demo in this context
// see section "using when" at http://www.scalactic.org/user_guide/OrAndEvery
def notEmpty(s: String): Validation[ErrorMessage] =
  if (!s.isEmpty) Pass else Fail("blank name")

def hasTwoNames(s: String): Validation[ErrorMessage] =
  if (s.split(" ").length >= 2) Pass else Fail("no last name: " + s)

def inDemo(i: Int): Validation[ErrorMessage] =
  if (i >= 18 && i <= 49) Pass else Fail(i + " not in target 18-49 demo")

implicit object targetDemographic extends ValidateEvery[TelevisionWatcher]( /* specify the validation of fields somehow */ )

import modes.returnOrEvery._
json"""{ "name": "Marc", "age": 50 }""".as[TelevisionWatcher]

val res: Or[TelevisionWatcher, Every[ErrorMessage]] =
  Bad(Many("no last name: Marc", "50 not in target 18-49 demo"))

This is more or less what I want to get to -- not just exceptional outcomes or failing to "be" the expected type, but also parsing into types and applying context-dependent validations, whether explicit or implicit, and accumulating semantically meaningful errors.
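
The accumulation itself can be tried without any library. The following is a dependency-free sketch that swaps Scalactic's `Or`/`Every`/`Validation` for a plain `Either[List[ErrorMessage], A]`; `Validated`, `Check`, `validate` and `zip` are hypothetical helper names, and `targetDemographic` is written here as an ordinary function rather than an implicit. It uses the 18-49 range named in the comments above.

```scala
object ValidationSketch {
  type ErrorMessage = String
  // Stand-in for `A Or Every[ErrorMessage]`: all failures, or the value.
  type Validated[A] = Either[List[ErrorMessage], A]
  // A check yields None when it passes and Some(message) when it fails.
  type Check[A] = A => Option[ErrorMessage]

  final case class TelevisionWatcher(name: String, age: Int)

  val notEmpty: Check[String] =
    s => if (s.nonEmpty) None else Some("blank name")
  val hasTwoNames: Check[String] =
    s => if (s.trim.split("\\s+").length >= 2) None else Some("no last name: " + s)
  val inDemo: Check[Int] =
    i => if (i >= 18 && i <= 49) None else Some(s"$i not in target 18-49 demo")

  // Runs every check on the value and keeps all failures, not just the first.
  def validate[A](value: A)(checks: Check[A]*): Validated[A] = {
    val failures = checks.toList.flatMap(check => check(value))
    if (failures.isEmpty) Right(value) else Left(failures)
  }

  // Combines two independent results, concatenating the errors of both sides.
  def zip[A, B](a: Validated[A], b: Validated[B]): Validated[(A, B)] =
    (a, b) match {
      case (Right(x), Right(y)) => Right((x, y))
      case (Left(e1), Left(e2)) => Left(e1 ++ e2)
      case (Left(e), _)         => Left(e)
      case (_, Left(e))         => Left(e)
    }

  def targetDemographic(name: String, age: Int): Validated[TelevisionWatcher] =
    zip(validate(name)(notEmpty, hasTwoNames), validate(age)(inDemo))
      .map { case (n, a) => TelevisionWatcher(n, a) }
}
```

`ValidationSketch.targetDemographic("Marc", 50)` yields `Left(List("no last name: Marc", "50 not in target 18-49 demo"))`, the same two messages as in the desired output above. What it leaves open is the actual question of this thread: how to have `.as[TelevisionWatcher]` pick such a validation up implicitly.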

Does this seem like an interesting path as well? 

-Marc

Ryan Tanner

Mar 22, 2015, 12:30:06 PM
to raptur...@googlegroups.com
Jon,

Where is modes.returnOutcome._?  I can't find it in rapture-core but it looks very relevant to a small library I'm thinking about.

Jon Pretty

Mar 22, 2015, 12:44:00 PM
to raptur...@googlegroups.com
Hi Ryan,

So, `returnOutcome`, after I discussed it with Marc, became `returnResult`, and it was the subject of my talk at Scala Days on Wednesday.

The implementation is mostly there, but it's not finished yet, so I'll need to devote a bit more time to it this month before anyone should use it... But I'd be interested to hear about your use case. Would you be able to describe it here?

Cheers,
Jon

Ryan Tanner

Mar 22, 2015, 2:43:41 PM
to raptur...@googlegroups.com
One of the problems I had to tackle a lot at my previous job was tasks which involve lots of finicky IO with really bad third-party servers.  For instance, processing IMAP message headers.  

1) Connect/authenticate
2) List folders
3) For each folder..
   3a) Get a list of message IDs
   3b) For each message ID...
      3b.1) Get headers

Pretty straightforward, but there are lots of different potential errors, some that indicate the entire process needs to be aborted, some that can be ignored.  Others indicate the folder is invalid, others that messages aren't really messages (e.g., calendar entries).  And so on.  This can of course all be done as a few nested while loops but refactoring that gets dangerous fast and these third-party servers aren't static so error handling is forever evolving.  And it's different for every server, so being able to change it at runtime is important.  To complicate matters, with some errors you might want to retain the incremental results gathered thus far and know where you left off, with others everything is invalidated and you need to throw it away.

Each step listed above can fail a number of different ways.  Something like modes.returnResult or Scalactic's Or could be the basis for a much saner form of control flow, especially if each step can use whatever mode is appropriate and then the parent step can compose those as needed.  Sometimes you want to carry an error forward as a value, sometimes you just want to throw it up to the top because you can't recover from it (Yahoo's servers *love* to throw those).

I wound up creating a structure similar to what you've described in Result/Outcome (I called it Result as well) in which the sealed trait had two subtypes, Completed and Failed, where Failed included the gathered state of the last successful step, the remainder of steps not yet processed and the reason for failure.  I mostly took my inspiration from the failure handling in Play's iteratee/enumerator classes.  I also used those as the inspiration for a non-async "unfold" which took a list of inputs, an initial state and a function which took an Input element and a State and returned a Step of Stop, Error or Continue, where Stop was a clean completion, Error was a fatal stop condition and Continue meant (of course) continue, providing the new state.
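
A minimal reconstruction of that shape might look like the following. It is a sketch based only on the description above, not the original code: the field names, the choice to let `Stop` carry a final state, and the choice to include the failing input in `remaining` are all assumptions.

```scala
object UnfoldSketch {
  // What the step function decides after looking at one input.
  sealed trait Step[+S]
  final case class Continue[S](state: S) extends Step[S] // carry on with the new state
  final case class Stop[S](state: S) extends Step[S]     // clean early completion
  final case class Error(reason: String) extends Step[Nothing] // fatal stop condition

  // Overall result of the run.
  sealed trait Result[+S, +I]
  final case class Completed[S](state: S) extends Result[S, Nothing]
  final case class Failed[S, I](lastGood: S, remaining: List[I], reason: String)
      extends Result[S, I]

  // Synchronous unfold over a list of inputs. On Error it keeps the state of
  // the last successful step and the inputs not yet processed.
  @annotation.tailrec
  def unfold[I, S](inputs: List[I], state: S)(f: (I, S) => Step[S]): Result[S, I] =
    inputs match {
      case Nil => Completed(state)
      case head :: tail =>
        f(head, state) match {
          case Continue(next) => unfold(tail, next)(f)
          case Stop(last)     => Completed(last)
          case Error(reason)  => Failed(state, inputs, reason)
        }
    }
}
```

For example, summing `List(1, -2, 3)` with a step function that treats negative inputs as fatal gives `Failed(1, List(-2, 3), "negative input: -2")`, so the caller can see both the partial total and where processing left off.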

Ultimately I was never entirely pleased with it and it was written specifically around IMAP processing so it didn't compose well.  I'm toying with ways of clarifying it.  My goal is to be able to write the sort of complicated control flow and error handling I described while making it all very obvious and very safe to change.