Lift-json and the dreaded U+2028 character: to escape or not?

143 views
Skip to first unread message

Matt Feury

unread,
Oct 31, 2012, 12:55:45 AM10/31/12
to lif...@googlegroups.com
Howdy Hi Hello,

We ran into an interesting issue on our servers tonight, involving the dreaded u\2028 character. After some research, I learned something new: There are some characters that are valid javascript but not valid json, including U+2028.

Essentially, someone inserted some data to our db with one of these characters, and once we send this down as json (a JObject), the client throws an 'Unexpected Token ILLEGAL' error. As the above link describes, this is character is not valid json.

Now, I could easily escape these characters on my end, but I wonder if it isn't a better idea to have lift-json handle this case. It would avoid a lot of potential complications, methinks. Antonio recommended a potential solution being to have a toggle on the implicit formats value, but I would go so far as to say, since 'lift-json' handles json, it should *always* escape these characters since they are invalid json.

Thoughts?

Matt

Antonio Salazar Cardozo

unread,
Oct 31, 2012, 2:18:35 AM10/31/12
to lif...@googlegroups.com
Whoopsies, reading too quickly there. Looks like some characters are valid JSON but not valid JavaScript (U+202[89], for example).

So the question is, do we:
(1) make the user do their own escaping
(2) bake it in so u+202[89] are always escaped by lift-json
(3) insert some sort of toggle that allows us to enable this escaping, so that it is disabled by default when lift-json is just dealing in plain JSON.

Thoughts?
Thanks,
Antonio

Joni Freeman

unread,
Oct 31, 2012, 4:11:59 AM10/31/12
to lif...@googlegroups.com
Hi,

Comments inline...


On Wednesday, October 31, 2012 8:18:35 AM UTC+2, Antonio Salazar Cardozo wrote:
Whoopsies, reading too quickly there. Looks like some characters are valid JSON but not valid JavaScript (U+202[89], for example).

So the question is, do we:
(1) make the user do their own escaping

I believe the following should work at the moment:

json transform {
  case JString(s) => JString(myOwnEscapeFunc(s))
}


(2) bake it in so u+202[89] are always escaped by lift-json

I don't think there should be any Javascript specific behaviour bolted in. JSON is often
used as a transport format in machine-to-machine REST APIs too. There Javascript is not relevant.
 
(3) insert some sort of toggle that allows us to enable this escaping, so that it is disabled by default when lift-json is just dealing in plain JSON.

This is something we can consider. Solution (1) comes with a performance hit because
JSON AST is walked and each String is processed twice. More performance optimal
solution would be to do custom escaping at 'quote' function:

  private[json] def quote(s: String): String = {
    val buf = new StringBuilder
    for (i <- 0 until s.length) {
      val c = s.charAt(i)
      buf.append(c match {
         ....
        case c if ((c >= '\u0000' && c < '\u0020')) => "\\u%04x".format(c: Int)
        case c => c
      })
    }
    buf.toString
  }

We could add a new case expression which would consult implicitly passed 'format' to check
if given character should be escaped:

   case c if format.shouldEscape(c) => "\\u%04x".format(c: Int)

If this change feels useful it can be scheduled to Lift 3.0. After all, it requires a small API change:

  def render(value: JValue): Document
  ->
  def render(value: JValue)(implicit format: Formats): Document

Cheers Joni
 

Jeppe Nejsum Madsen

unread,
Oct 31, 2012, 6:02:58 AM10/31/12
to lif...@googlegroups.com
Matt Feury <matt...@gmail.com> writes:

> Howdy Hi Hello,
>
> We ran into an interesting issue on our servers tonight, involving the
> dreaded u\2028 character. After some research, I learned something new: There
> are some characters that are valid javascript but not valid json, including
> U+2028 <http://timelessrepo.com/json-isnt-a-javascript-subset>.
>
> Essentially, someone inserted some data to our db with one of these
> characters, and once we send this down as json (a JObject), the client
> throws an 'Unexpected Token ILLEGAL' error. As the above link describes,
> this is character is not valid json.
>
> Now, I could easily escape these characters on my end, but I wonder if it
> isn't a better idea to have lift-json handle this case. It would avoid a
> lot of potential complications, methinks. Antonio recommended a potential
> solution being to have a toggle on the implicit formats value, but I would
> go so far as to say, since 'lift-json' handles json, it should *always*
> escape these characters since they are invalid json.

Actually, it's a valid JSON character, but an invalid Javascript
character.

It's only when you convert the JSON to use as javascript in the browser
that you'll see this issue.

One other option would be for lift to escape when writing a
JsonResponse. But this would then also escape when writing a pure
rest-service.

/Jeppe

Antonio Salazar Cardozo

unread,
Oct 31, 2012, 11:51:27 AM10/31/12
to lif...@googlegroups.com
Hmm… Yeah, the double-traversal hit is mildly annoying, though we can insert custom escaping along the way for now (i.e., before it hits lift-json).

I'm okay with the formats solution, or, because this is a fairly specific case, could we just have an extra parameter to render in an overload to avoid API changes and roll this in now rather than later? Mostly I'm keeping in mind the fact that for Lift, as a web framework, there are two core/critical consumers of JSON: other JSON libraries, and JavaScript. Thus, I think we should make sure that JS is a well-considered consumer as soon as possible, though I'm fine with avoiding inefficiencies (i.e., longer encodings) for non-JS consumers.

I should mention, according to this StackOverflow question, it looks like these are the characters that need JS escaping:

\u0000\u00ad\u0600-\u0604\u070f\u17b4\u17b5\u200c-\u200f\u2028-\u202f\u2060-\u206f\ufeff\ufff0-\uffff

Thanks,
Antonio

rmat0n

unread,
Jan 27, 2015, 8:40:53 AM1/27/15
to lif...@googlegroups.com
Hi guys,

A little "up" for that case as I recently had this issue.
I just patched lift 3 by added 2028, 2029.. in pattern matching of appendEscapedString which correspond to the solution 1 but solution 3 would be a good/better choice.

Is a fix still planned for Lift 2 or 3? Did not see a issue on it.

Thanks,
Rom1

Antonio Salazar Cardozo

unread,
Jan 27, 2015, 9:29:56 AM1/27/15
to lif...@googlegroups.com
We didn't end up filing an issue; I actually don't remember why we ended up not needing
it, though…

Nothing is currently planned for Lift 3 that I know of.
Thanks,
Antonio

Antonio Salazar Cardozo

unread,
Dec 1, 2015, 10:51:13 PM12/1/15
to Lift
A bit later, but there is now a fix for this in the latest Lift 3 snapshots. You can use
`render(<JValue>, RenderSettings.compactJs)` (or prettyJs) to render with escaping
that is friendly to JS engine direct evaluation of JSON.
Thanks,
Antonio
Reply all
Reply to author
Forward
0 new messages