Hi,
On 30 December 2016 at 20:07, Nigel <
nigel....@gmail.com> wrote:
> The Content-Type that the API currently returns for JSON indicates ISO
> 8859-1 which is apparently not allowed by the RFC for JSON content.
This is why the API does not at present claim the results are JSON :-)
I believe it uses “JS” in the documentation, and the Content-Type
header is “text/javascript”, not “application/json”.
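For anyone inspecting this programmatically, here is a small illustration (in Python, using the header value as described above, not fetched from the live API) of how a client that parses the Content-Type would see it:

```python
from email.message import Message

# Build a message carrying the Content-Type header the API currently sends.
msg = Message()
msg["Content-Type"] = "text/javascript; charset=ISO-8859-1"

# The media type is text/javascript, not application/json...
assert msg.get_content_type() == "text/javascript"

# ...and the declared charset is ISO-8859-1 (normalised to lowercase here).
assert msg.get_content_charset() == "iso-8859-1"
```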
> I am going to see if I can get the Servant library to accept it by creating
> a new MIME type for JSON received from this service but it would be good if
> people didn't have to; additionally, it could be confusing some other
> libraries and causing decoding issues if they actually attempt to honour the
> charset and decode UTF-8 data as ISO 8859-1.
I fully agree; do note the data *is* (for MP data) ISO-8859-1, e.g.
see the output of getMP with id=11148 (constituency), or id=11863
(given name). So there shouldn’t be decoding issues if that is
honoured (though I would of course understand if a JSON library
refused to act on non-UTF-8 data). A tiny number of speeches do appear
to contain raw ISO-8859-1 or UTF-8 characters, but those could be
classed as mistakes in either case: almost all special characters are
stored as HTML entities, which of course isn't great either!
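To make the decoding point concrete, here is a sketch (Python; "Siân" is just an illustrative non-ASCII name, not a value taken from the API):

```python
import html

# Illustrative value containing a non-ASCII character, encoded as the
# API currently encodes its output.
raw = "Siân".encode("iso-8859-1")  # b'Si\xe2n'

# A client that honours the declared charset decodes it correctly.
assert raw.decode("iso-8859-1") == "Siân"

# Treating the same bytes as UTF-8 fails outright here: 0xE2 opens a
# multi-byte sequence that the following 'n' does not complete.
try:
    raw.decode("utf-8")
except UnicodeDecodeError:
    print("UTF-8 decode failed, as expected")

# Most special characters in the data are stored as HTML entities, so a
# further unescaping step is needed after the charset is decoded.
assert html.unescape("Si&acirc;n") == "Siân"
```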
> If this was a conscious decision on the part of the maintainers I'd be
> interested to hear the reasoning.
It is only a conscious decision insofar as it is historic attrition
due to the age of the site and source data. It would obviously be
better if the output were in UTF-8 and could therefore be true
JSON, but it is not currently so; we haven’t had the time to do
anything about it, and clearly no-one else has yet either :) Perhaps a
straightforward conversion before API output would be all that is
necessary for the API, rather than trying to solve it at a deeper
internal level (though I think the name/constituency source data is in
UTF-8, so that at least would hopefully not be complex).
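A minimal sketch of what such a boundary conversion could look like (Python; the function and field names are hypothetical, not part of the actual API code):

```python
def to_utf8_body(body: bytes) -> bytes:
    """Re-encode an ISO-8859-1 response body as UTF-8 at the API boundary.

    ISO-8859-1 maps every byte to a code point, so the decode step can
    never fail; the re-encoded result could then be served with a
    proper application/json Content-Type.
    """
    return body.decode("iso-8859-1").encode("utf-8")

# Hypothetical response body as the API would currently produce it.
iso_body = '{"given_name": "Siân"}'.encode("iso-8859-1")
utf8_body = to_utf8_body(iso_body)
assert utf8_body.decode("utf-8") == '{"given_name": "Siân"}'
```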
ATB,
Matthew