'utf8' codec can't decode byte 0xfa in position 15: invalid start byte

Caden Howell

Aug 28, 2015, 6:11:06 PM

to Learning Registry Developers List

Hi,

We have also been getting a lot of errors recently of the form:

'utf8' codec can't decode byte 0xfa in position 15: invalid start byte

where 0xfa and 15 aren't always the same. It looks like we're probably sending Latin-1 (or a similar single-byte encoding) where UTF-8 is expected. Have the server's assumptions about the expected encoding changed recently? We're using the com.navnorth.learningregistry package.
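For anyone who wants to reproduce the message locally, here is a minimal sketch in Python 3. The sample text is made up for illustration, and this is not the node's actual code path; it only shows that decoding Latin-1 bytes as UTF-8 produces this kind of error. Note that Python 3 spells the codec name 'utf-8' in the message, while older Python 2 builds print 'utf8' as quoted above.

```python
# Hypothetical sample text: "ú" (U+00FA) sits at index 15.
text = "Recursos de la única escuela"

# Encode as Latin-1, where "ú" becomes the single byte 0xfa.
latin1_bytes = text.encode("latin-1")

try:
    # 0xfa cannot start a UTF-8 sequence, so this decode fails.
    latin1_bytes.decode("utf-8")
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(error_message)

# The same text encoded as UTF-8 round-trips cleanly.
round_tripped = text.encode("utf-8").decode("utf-8")
```

The byte value and position in the message change with the input, which matches the observation that 0xfa and 15 aren't always the same.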

Thanks,

Caden

Steve Midgley

Aug 28, 2015, 6:54:10 PM

to learnin...@googlegroups.com

They may have changed in the sense that the spec is now more strongly enforced. UTF-8 is the required charset for submitted envelopes. Version 0.51, which was deployed to production recently, might be enforcing that more strictly; Joe will have to comment.

So there is a plausible reason why your stuff started breaking recently: we upgraded to spec/implementation version 0.51 very recently.

Steve

Jim Klo

Aug 28, 2015, 10:51:14 PM

to learnin...@googlegroups.com

Greetings, long time no see everyone!

I've been here in the background helping Joe with some of the testing of .51 and assessment of the interoperability between .49 and .51.

Yes, there is a lot more spec enforcement taking place in the new .51 build, as well as some significant new features.

UTF8 is likely one of them, since the JSON spec requires UTF8 encoding, but I don't recall offhand. The two most notable new features are the change in the .51 spec (the resource_data field on the envelope is now a string; JSON objects are disallowed) and the support for whitelisted public keys, which permits updating documents that have been abandoned by users or whose original signing key has been lost.

I haven't had much chance to follow up with Joe, so I don't know the full details of the upgrade or the problems he encountered during it; however, he did mention in an email to me that he had some difficulties. He is probably getting paraded on by the current entomological residents of Blackrock, so I don't expect to hear back from him for another week or so.

I know he had asked me for some advice regarding the upgrade, and the instructions I provided him were based upon the testbed environment I created for the work he asked me to help out on. I'm not sure if he updated uwsgi as part of this; I do know that I had to use the latest version of uwsgi available, as the version specified was no longer available. It's possible that the version on Sandbox does not mirror the one on the testbed (which is why a bug may exist). The version used on the test environment does require significant changes to the configuration for launching LR, but testing showed that it resolves the forking and threading issues with background jobs that have historically plagued LR.
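On the resource_data change mentioned above, a rough sketch of what a client might do before publishing is below. Only the field name resource_data comes from the spec discussion here; the helper name and the sample payload are hypothetical, and this is not code from any LR library.

```python
import json


def stringify_resource_data(envelope: dict) -> dict:
    """Return a copy of the envelope with resource_data as a JSON string.

    Hypothetical helper: if resource_data is a dict or list, serialize it;
    if it is already a string (or absent), leave it untouched.
    """
    result = dict(envelope)  # shallow copy, so the caller's dict is not modified
    data = result.get("resource_data")
    if isinstance(data, (dict, list)):
        # json.dumps escapes non-ASCII characters by default (ensure_ascii=True),
        # so the resulting string is plain ASCII.
        result["resource_data"] = json.dumps(data)
    return result


# Hypothetical payload, for illustration only.
sample = {"resource_data": {"title": "Recursos de la única escuela"}}
converted = stringify_resource_data(sample)
```

Since json.dumps emits ASCII-only output by default, a string produced this way should also avoid the decode error discussed in this thread, though the rest of the envelope would still need to be valid UTF-8.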

By the way, anyone interested in inspecting the testbed (or trying it out) can do so here, as I've recently made it available: https://bitbucket.org/jimklo/lr-vagrant/overview

Included in that repo is a Vagrant configuration for launching a local 0.49 and 0.51 mini-LR network. Note that I believe it is still configured to update against my LR GitHub branch for .49 and .51; however, I believe all my changes have been merged, so it should theoretically behave the same. Some minor script changes could be made to install against the official distribution. The configuration as-is is fine for standalone testing. However, because it is pre-configured with "bogus" signing keys and some other test-specific configuration, you should not use it as-is in the production LR network without careful updates to the configuration.

joe hobson

Sep 18, 2015, 5:25:33 PM

to learnin...@googlegroups.com

Thanks for reporting the bug, and to everyone for trying to work through it. I don't recall any distinct updates around UTF8, but it could have snuck in as a byproduct of stricter compliance on other things. I still need to run some more tests with LRJavaLib and hopefully I'll be able to put out an update in the next week or two. The code is open, so if you get a chance to look at it before I do, feel free to submit a pull request.

I'd also like to make sure everyone sees Jim's mention of his Vagrant test nodes. This is the easiest way I've seen to have your own LR node up and running quickly. Most people can probably get by just developing against the sandbox node, but if you run into trouble and you need to see what errors the node is throwing internally, having your own setup can save you lots of time. We're looking at making some updates to those in the near future, including an option to preload with a set of resource data to test against, for those building apps that consume from the LR, but don't want to wade through a million records on the public nodes.

joe hobson

Oct 21, 2015, 6:43:29 PM

to Learning Registry Developers List

I looked into it further and can verify that these errors are not due to the 0.51.0 update. Trying to publish non-UTF-8 text to node01 results in the same error. I guess the real question is whether the string should be verified/converted on the application side (by LRJavaLib) or on the node itself. If the node is going to require UTF-8, then it would be nice if the error trapping were a little clearer. I'm surprised that this hasn't come up before, but it might just be something the encoding libraries we're using in LRJavaLib allow that other implementations don't. I tried a Latin string with LRphpLib and it wouldn't get past the signing stage because the OpenPGP library didn't like it.
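To make the application-side option concrete, here is a minimal sketch in Python (not LRJavaLib code; the function name and the Latin-1 fallback are assumptions for illustration). It checks whether the bytes are already valid UTF-8 and, if not, transcodes them from an assumed source encoding.

```python
def to_utf8_bytes(raw: bytes, fallback_encoding: str = "latin-1") -> bytes:
    """Return raw unchanged if it is valid UTF-8, else transcode it.

    Hypothetical helper. The fallback encoding is a guess: Latin-1 can
    decode any byte sequence, so a wrong guess yields garbled text
    rather than an error.
    """
    try:
        raw.decode("utf-8")  # validation only; the decoded text is discarded
        return raw
    except UnicodeDecodeError:
        # Reinterpret the bytes in the assumed encoding, then re-encode as UTF-8.
        return raw.decode(fallback_encoding).encode("utf-8")


# Hypothetical inputs: the same word in two encodings.
from_latin1 = to_utf8_bytes("público".encode("latin-1"))
from_utf8 = to_utf8_bytes("público".encode("utf-8"))
```

A stricter variant would reject invalid input with a clear error instead of guessing, which may be the safer choice if the node itself ends up enforcing UTF-8.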
