Message from discussion Lift-json and the dreaded U+2028 character: to escape or not?
From: Antonio Salazar Cardozo <savedfastc...@gmail.com>
Date: Wed, 31 Oct 2012 08:51:27 -0700 (PDT)
Local: Wed, Oct 31 2012 11:51 am
Subject: Re: Lift-json and the dreaded U+2028 character: to escape or not?

Hmm… Yeah, the double-traversal hit is mildly annoying, though we can
insert custom escaping along the way for now (i.e., before it hits
lift-json).

I'm okay with the formats solution, or, because this is a fairly specific
case, could we just add an overload of render that takes an extra parameter,
so we avoid breaking API changes and can roll this in now rather than later? Mostly I'm
keeping in mind the fact that for Lift, as a web framework, there are two
core/critical consumers of JSON: other JSON libraries, and JavaScript.
Thus, I think we should make sure that JS is a well-considered consumer as
soon as possible, though I'm fine with avoiding inefficiencies (i.e.,
longer encodings) for non-JS consumers.

I should mention, according to this StackOverflow question<http://stackoverflow.com/questions/2965293/javascript-parse-error-on-...>,
it looks like these are the characters that need JS escaping:

\u0000\u00ad\u0600-\u0604\u070f\u17b4\u17b5\u200c-\u200f\u2028-\u202f\u2060-\u206f\ufeff\ufff0-\uffff
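
A minimal sketch of escaping that character class, in plain Scala with no lift-json dependency. The names `JsEscape`, `shouldEscape`, and `escape` are hypothetical, and the ranges are copied from the list above without checking that they are complete. The sketch assumes it is applied to already-rendered JSON text, where these characters can only occur inside string literals; running it on strings before lift-json quotes them would likely get the inserted backslashes escaped a second time.

```scala
object JsEscape {
  // Inclusive code point ranges taken from the character class quoted above.
  // Hex Ints are used instead of '\uXXXX' literals to keep the source plain ASCII.
  private val ranges: Seq[(Int, Int)] = Seq(
    0x0000 -> 0x0000,
    0x00ad -> 0x00ad,
    0x0600 -> 0x0604,
    0x070f -> 0x070f,
    0x17b4 -> 0x17b5,
    0x200c -> 0x200f,
    0x2028 -> 0x202f,
    0x2060 -> 0x206f,
    0xfeff -> 0xfeff,
    0xfff0 -> 0xffff
  )

  // True when the character falls into one of the ranges above.
  def shouldEscape(c: Char): Boolean =
    ranges.exists { case (lo, hi) => c >= lo && c <= hi }

  // Replaces every matching character with its \uXXXX form and leaves
  // everything else untouched.
  def escape(s: String): String = {
    val buf = new StringBuilder(s.length)
    for (c <- s) {
      if (shouldEscape(c)) buf.append("\\u%04x".format(c.toInt))
      else buf.append(c)
    }
    buf.toString
  }
}
```

For example, `JsEscape.escape(renderedJson)` could run on the rendered string just before it is written into a script block, where `renderedJson` stands for whatever text the renderer produced.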

Thanks,
Antonio

On Wednesday, October 31, 2012 4:11:59 AM UTC-4, Joni Freeman wrote:

> Hi,

> Comments inline...

> On Wednesday, October 31, 2012 8:18:35 AM UTC+2, Antonio Salazar Cardozo
> wrote:

>> Whoopsies, reading too quickly there. Looks like some characters are
>> valid JSON but not valid JavaScript (U+202[89], for example).

>> So the question is, do we:
>> (1) make the user do their own escaping

> I believe the following should work at the moment:

> json transform {
>   case JString(s) => JString(myOwnEscapeFunc(s))
> }

>> (2) bake it in so U+202[89] are always escaped by lift-json

> I don't think there should be any JavaScript-specific behaviour bolted in.
> JSON is often used as a transport format in machine-to-machine REST APIs
> too. There, JavaScript is not relevant.

>> (3) insert some sort of toggle that allows us to enable this escaping, so
>> that it is disabled by default when lift-json is just dealing in plain JSON.

> This is something we can consider. Solution (1) comes with a performance
> hit because the JSON AST is walked and each String is processed twice. A
> more performant solution would be to do the custom escaping in the 'quote'
> function:

>   private[json] def quote(s: String): String = {
>     val buf = new StringBuilder
>     for (i <- 0 until s.length) {
>       val c = s.charAt(i)
>       buf.append(c match {
>          ....
>         case c if ((c >= '\u0000' && c < '\u0020')) => "\\u%04x".format(c: Int)
>         case c => c
>       })
>     }
>     buf.toString
>   }

> We could add a new case expression which would consult an implicitly
> passed 'format' to check whether a given character should be escaped:

>    case c if format.shouldEscape(c) => "\\u%04x".format(c: Int)

> If this change feels useful, it can be scheduled for Lift 3.0. After all,
> it requires a small API change:

>   def render(value: JValue): Document
>   ->
>   def render(value: JValue)(implicit format: Formats): Document
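
The toggle idea quoted above can be sketched without lift-json. This is only an illustration: `EscapePolicy`, `PlainJson`, `JsSafe`, and `QuoteSketch` are hypothetical names standing in for the proposed hook on the implicit format, not an existing lift-json API, and the `quote` below handles only a few of the usual cases.

```scala
object QuoteSketch {
  // Hypothetical stand-in for the proposed format.shouldEscape hook.
  trait EscapePolicy {
    def shouldEscape(c: Char): Boolean
  }

  // Default behaviour: plain JSON, no extra escaping.
  object PlainJson extends EscapePolicy {
    def shouldEscape(c: Char): Boolean = false
  }

  // Opt-in behaviour for JavaScript consumers: escape U+2028 and U+2029.
  object JsSafe extends EscapePolicy {
    def shouldEscape(c: Char): Boolean = c == 0x2028 || c == 0x2029
  }

  // Simplified quote: escapes quotes, backslashes, and control characters,
  // then asks the implicit policy about everything else, in a single pass.
  def quote(s: String)(implicit policy: EscapePolicy): String = {
    val buf = new StringBuilder
    for (c <- s) {
      c match {
        case '"'  => buf.append("\\\"")
        case '\\' => buf.append("\\\\")
        case c if c < 0x20 => buf.append("\\u%04x".format(c.toInt))
        case c if policy.shouldEscape(c) => buf.append("\\u%04x".format(c.toInt))
        case c => buf.append(c)
      }
    }
    buf.toString
  }
}
```

Because the policy is consulted inside the same loop that does the standard escaping, each string is traversed once, which is the point of moving the check into 'quote' rather than transforming the AST first.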

> Cheers Joni

>> Thoughts?
>> Thanks,
>> Antonio

>> On Wednesday, October 31, 2012 12:55:45 AM UTC-4, Matt Feury wrote:

>>> Howdy Hi Hello,

>>> We ran into an interesting issue on our servers tonight, involving the
>>> dreaded U+2028 character. After some research, I learned something new: There
>>> are some characters that are valid javascript but not valid json, including
>>> U+2028 <http://timelessrepo.com/json-isnt-a-javascript-subset>.

>>> Essentially, someone inserted some data to our db with one of these
>>> characters, and once we send this down as json (a JObject), the client
>>> throws an 'Unexpected Token ILLEGAL' error. As the above link describes,
>>> this character is not valid json.

>>> Now, I could easily escape these characters on my end, but I wonder if
>>> it isn't a better idea to have lift-json handle this case. It would avoid a
>>> lot of potential complications, methinks. Antonio recommended a potential
>>> solution being to have a toggle on the implicit formats value, but I would
>>> go so far as to say, since 'lift-json' handles json, it should *always*
>>> escape these characters since they are invalid json.

>>> Thoughts?

>>> Matt


 