Hello,
I am having difficulty understanding why I cannot decode JSON data that was originally encoded with jansson and then dumped to hex bytes (using the xxd hex dumper). I get a jansson error when I try to load JSON that contains German strings (i.e. characters outside the basic ASCII character set). The same error occurs whether I load from a string or from a buffer.
The string element reads something like:
“unable to decode byte 0xf6”
I have seen the error occur on 0xdf too, and I assume it will occur on any byte above 0x7f.
When encoding using json_dumps(), if I pass the JSON_ENSURE_ASCII flag, then all of the bytes are in standard ASCII range and the json_loads() function works okay. Since I want to load the JSON in UTF-8 anyway, why should I have to escape all of the extended characters?
Why can't I seem to load JSON that is UTF-8 encoded?
Happy to hear your suggestions or advice...
Doug
Sorry. I should clarify that I call xxd with the -i option to output in C include file style. This outputs a C header file which looks like this...
unsigned char info_json[] = {
0x5b, 0x0d, 0x0a, 0x20, 0x20, 0x7b, 0x0d, 0x0a, 0x20, 0x20, 0x20, 0x20,
0x22, 0x65, 0x6e, 0x74, 0x72, 0x79, 0x5f, 0x69, 0x64, 0x22, 0x3a, 0x20,
// cut a lot of bytes...
};
unsigned int info_json_len = 179314;
I'm fairly sure this does not change the text in any way; it is just a hex representation of the JSON data.
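One way to check whether the bytes in the array really are UTF-8 is to scan them independently of jansson. The following is only a minimal sketch: first_invalid_utf8 is a hypothetical helper (not part of jansson or xxd) that applies the well-formedness rules of RFC 3629 and reports where the first offending byte sits.

```c
#include <stddef.h>

/* Hypothetical helper, not a jansson function.
 * Returns the offset of the first byte that starts an ill-formed UTF-8
 * sequence (per RFC 3629), or -1 if the whole buffer is valid UTF-8. */
long first_invalid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char b = buf[i];
        size_t need;        /* continuation bytes expected after the lead byte */
        unsigned long cp;   /* code point being decoded */
        unsigned long min;  /* smallest code point allowed for this length */

        if (b < 0x80) {                       /* plain ASCII */
            i++;
            continue;
        } else if (b >= 0xc2 && b <= 0xdf) {  /* two-byte sequence */
            need = 1; cp = b & 0x1f; min = 0x80;
        } else if (b >= 0xe0 && b <= 0xef) {  /* three-byte sequence */
            need = 2; cp = b & 0x0f; min = 0x800;
        } else if (b >= 0xf0 && b <= 0xf4) {  /* four-byte sequence */
            need = 3; cp = b & 0x07; min = 0x10000;
        } else {
            return (long)i;  /* stray continuation byte or invalid lead byte */
        }

        if (len - i <= need)
            return (long)i;  /* sequence is cut off by the end of the buffer */

        for (size_t k = 1; k <= need; k++) {
            unsigned char c = buf[i + k];
            if ((c & 0xc0) != 0x80)
                return (long)i;  /* expected a continuation byte (10xxxxxx) */
            cp = (cp << 6) | (c & 0x3f);
        }

        /* Reject overlong forms, surrogates, and values past U+10FFFF. */
        if (cp < min || cp > 0x10ffff || (cp >= 0xd800 && cp <= 0xdfff))
            return (long)i;

        i += need + 1;
    }
    return -1;
}
```

Calling first_invalid_utf8(info_json, info_json_len) on the array above would return -1 if the data is valid UTF-8, and otherwise the offset of the byte to inspect in the original file.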
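0xdf is a valid first byte of a two-byte UTF-8 sequence, but it must be
followed by a continuation byte, and the characters it introduces
(U+07C0 to U+07FF) are not German. 0xf6 shouldn't appear in German text
at all: as a lead byte it would start a sequence for a code point
outside the BMP (beyond U+10FFFF, in fact, so RFC 3629 does not permit
it in UTF-8).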
What do you mean here by outside the BMP? Is 0xf6 not a valid UTF-8 code? It is certainly in the German text I have.
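For reference, 0xf6 and 0xdf are the ISO-8859-1 (Latin-1) codes for ö and ß, whereas UTF-8 encodes those two characters as the byte pairs 0xc3 0xb6 and 0xc3 0x9f. If the source file was saved as Latin-1 (an assumption; Windows-1252 is also possible and differs in the 0x80 to 0x9f range), the bytes would need converting before loading. A rough sketch of such a conversion, using a hypothetical helper named latin1_to_utf8:

```c
#include <stddef.h>

/* Hypothetical helper: converts ISO-8859-1 bytes to UTF-8.
 * Assumes the input really is Latin-1, where each byte value equals
 * its Unicode code point.
 * `out` must have room for 2 * len bytes. Returns the bytes written. */
size_t latin1_to_utf8(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t n = 0;

    for (size_t i = 0; i < len; i++) {
        unsigned char b = in[i];

        if (b < 0x80) {
            /* ASCII is identical in both encodings. */
            out[n++] = b;
        } else {
            /* U+0080..U+00FF become a two-byte sequence:
             * 110000xx 10xxxxxx */
            out[n++] = (unsigned char)(0xc0 | (b >> 6));
            out[n++] = (unsigned char)(0x80 | (b & 0x3f));
        }
    }
    return n;
}
```

With this sketch the single byte 0xf6 becomes 0xc3 0xb6 and 0xdf becomes 0xc3 0x9f. The output is not NUL-terminated, so the returned length has to be passed along with the buffer.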