Jonathan Castello wrote:
> On Wed, Apr 21, 2010 at 12:44 AM, Petri Lehtinen <pe...@digip.org> wrote:
> > json_string() requires a UTF-8 encoded string as the argument. Byte
> > 255 is not valid UTF-8. The UTF-8 encoded representation of 255
> > consists of two bytes: 195 and 191.
>
> Hmm, I see. Apparently that goes for characters 128 through 254 as well.
>
> > To use Jansson with arbitrary binary data in
> > strings, you must encode your data in UTF-8 before JSON encoding, and
> > decode the UTF-8 strings back to raw binary after JSON decoding. You
> > must also escape zero bytes somehow, as Jansson doesn't allow them
> > (even though JSON does).
>
> I meant to mention that too, actually. Jansson claims full UTF-8
> support, yet it doesn't directly support U+0000. Is this something to
> look forward to in Jansson 2.0?
I'll see if it's easy to implement support for strings with embedded
zero bytes. Other people have requested this, too.
Actually, I know it's quite easy, but I'm only willing to do it if
I manage to invent a smart API for it.
> Also, I notice that there's a utf8_encode function defined in utf.c,
> but either it's not used anywhere or IntelliSense has bailed on me
> again. If I called that before using json_string(), would that encode
> 128 - 255 properly?
utf8_encode() is used in src/load.c to convert \u escapes to UTF-8. It
would do the trick for you, yes, although the Windows APIs might have
functions that UTF-8 encode whole strings at once, whereas
utf8_encode() encodes only a single Unicode code point (or byte) per call.
Petri