CouchDB Invalid JSON UTF-8


pjmorce

Apr 25, 2012, 2:09:33 AM
to sql-to-nosql-importer-discuss
Hello,

I am trying to use SQLToNoSQLImporter to import data from a PostgreSQL
database into a CouchDB database.

I have configured the import.properties and db-data-config files correctly.

When I run the run.bat command (I am on Windows), I get the
following output:

07:50:14,568 INFO DataImporter:134 - Data Configuration loaded successfully
07:50:18,477 ERROR DataImporter:178 - ***** Data import failed. **********
Reason is :
org.apache.http.HttpException: HTTP/1.1 400 Bad Request
    at net.sathis.export.sql.couch.CouchWriter.post(CouchWriter.java:68)
    at net.sathis.export.sql.couch.CouchWriter.writeToNoSQL(CouchWriter.java:52)
    at net.sathis.export.sql.DocBuilder.execute(DocBuilder.java:142)
    at net.sathis.export.sql.DataImporter.doFullImport(DataImporter.java:174)
    at net.sathis.export.sql.DataImporter.doDataImport(DataImporter.java:93)
    at net.sathis.export.sql.SQLToNoSQLImporter.main(SQLToNoSQLImporter.java:19)


As you can see, the configuration file loads correctly. The CouchDB
log file shows the following error:

[debug] [<0.147.0>] Invalid JSON: {{error,
    {126,
     "lexical error: invalid bytes in UTF8 string.\n"}},
    <<"{\"docs\":[{\"_id\":\"0\",\"label\":\"Pas de taches\"},{\"_id\":\"1\",\"description\":\"Le pourcentage de recouvrement est < 2 %\",\"label\":\"Très peu nombreuses\"},{\"_id\":\"2\",\"description\":\"Le p.......

I think the problem occurs because the text in the table contains
accented characters ("è", etc.).

The PostgreSQL database is encoded in UTF-8.
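The CouchDB log shows its JSON parser rejecting the request body because it contains bytes that are not valid UTF-8. One way this can happen even with a UTF-8 database: if the JSON string is encoded with ISO-8859-1 on the way out, "è" becomes the single byte 0xE8, which is not a complete UTF-8 sequence, whereas its UTF-8 form is the two bytes 0xC3 0xA8. The minimal Java sketch below (class and method names are illustrative, not part of SQLToNoSQLImporter) uses a strict decoder that reports malformed input instead of replacing it, to check which encoding produces valid UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {
    /** Returns true if the bytes form valid UTF-8; malformed input is reported, not replaced. */
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Unicode escape keeps the example independent of the source file encoding
        String label = "Tr\u00e8s peu nombreuses";
        for (Charset cs : new Charset[] {StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8}) {
            byte[] body = label.getBytes(cs);
            System.out.println(cs.name() + ": " + body.length + " bytes, valid UTF-8 = "
                    + isValidUtf8(body));
        }
    }
}
```

Running it should report the ISO-8859-1 bytes as invalid UTF-8 and the UTF-8 bytes as valid, which matches the kind of error CouchDB logs.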

Can anyone help me solve this issue?

Thank you

Best regards

sathis kumar

Apr 25, 2012, 3:55:14 AM
to sql-to-nosql-i...@googlegroups.com
Can you please post the input record for which the import fails? Then I can find out where the problem is (i.e. in sql-to-nosql-importer or in CouchDB).
--
Regards
 sathis


pjmorce

Apr 25, 2012, 4:09:17 AM
to sql-to-nosql-importer-discuss
Hello

Thanks for your answer.

This is the data I am trying to send to CouchDB:

[debug] [<0.147.0>] Invalid JSON: {{error,
    {126,
     "lexical error: invalid bytes in UTF8 string.\n"}},
    <<"{\"docs\":[{\"_id\":\"0\",\"label\":\"Pas de taches\"},{\"_id\":\"1\",\"description\":\"Le pourcentage de recouvrement est < 2 %\",\"label\":\"Très peu nombreuses\"},{\"_id\":\"2\",\"description\":\"Le pourcentage de recouvrement est compris entre 2 et 5 %\",\"label\":\"Peu nombreuses\"},{\"_id\":\"3\",\"description\":\"Le pourcentage de recouvrement est compris entre 5 et 15 %\",\"label\":\"Assez nombreuses\"},{\"_id\":\"4\",\"description\":\"Le pourcentage de recouvrement est compris entre 15 et 40 %\",\"label\":\"Nombreuses\"},{\"_id\":\"5\",\"description\":\"Le pourcentage de recouvrement est compris entre 40 et 80 %\",\"label\":\"Très nombreuses\"},{\"_id\":\"6\",\"description\":\"Le pourcentage de recouvrement est supérieur à 80 %\",\"label\":\"Dominantes\"}]}">>}


Regards

sathis kumar

Apr 25, 2012, 5:03:38 AM
to sql-to-nosql-i...@googlegroups.com
It looks like the error comes from the CouchDB side. Try inserting this JSON into CouchDB manually (http://piratepad.net/jfwjpZLtXI). If you still get the same error, you can post it to the couchdb-user list.
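For a manual test from Java that bypasses the importer, one option is to build the POST request yourself and encode the body explicitly as UTF-8. This is a minimal sketch under stated assumptions: it swaps in the JDK 11+ `java.net.http` client instead of the Apache `DefaultHttpClient` that CouchWriter uses, and the URL, the database name `mydb`, and the `_bulk_docs` endpoint are hypothetical (the `{"docs":[...]}` payload has the shape that endpoint expects).

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

public class ManualBulkInsert {
    /** Builds a POST whose body bytes are always UTF-8, whatever the platform default is. */
    static HttpRequest buildRequest(String bulkDocsUrl, String json) {
        return HttpRequest.newBuilder(URI.create(bulkDocsUrl))
                .header("Content-Type", "application/json; charset=utf-8")
                .POST(HttpRequest.BodyPublishers.ofString(json, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical local CouchDB URL and database name
        String url = "http://localhost:5984/mydb/_bulk_docs";
        // Unicode escape keeps the example independent of the source file encoding
        String json = "{\"docs\":[{\"_id\":\"1\",\"label\":\"Tr\u00e8s peu nombreuses\"}]}";
        HttpRequest request = buildRequest(url, json);
        long bodyBytes = request.bodyPublisher().orElseThrow().contentLength();
        System.out.println(request.method() + " " + request.uri() + " (" + bodyBytes + " bytes)");
        // To actually send it against a running server:
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```

The point of the design is that the charset is chosen in exactly one place (`BodyPublishers.ofString(json, UTF_8)`), so the platform's default encoding never touches the request body.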
--
Regards
 sathis


pjmorce

Apr 25, 2012, 6:51:27 AM
to sql-to-nosql-importer-discuss
OK, thanks.

I tested it using the curl command.

I simplified the JSON file to this:
{"docs":[{"_id":"0","label ":"Pas de taches"}]}

The result was: {"ok":true,"id":"doc_id","rev":"1-ffaec7bc2aa548ca8e5a9c697ea3eb64"}

Then I modified the JSON file to:
{"docs":[{"_id":"1","label ":"Pas de tâches"}]}

The result was: {"error":"bad_request","reason":"invalid_json"}
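The curl results narrow the problem down to the accented character. One plausible cause, which the thread does not confirm, is that the JSON file was saved in a Windows code page rather than UTF-8, so "â" was sent as the single byte 0xE2 instead of the UTF-8 pair 0xC3 0xA2. A minimal Java sketch that writes the test payload with an explicit UTF-8 encoding (the file name `test.json` is hypothetical; `Files.writeString` needs Java 11 or later):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class WriteUtf8Json {
    /** Writes the payload as UTF-8 regardless of the editor's or the JVM's default encoding. */
    static Path writeUtf8(Path target, String json) throws IOException {
        return Files.writeString(target, json, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // "tâches" is written with a Unicode escape so the source file encoding does not matter
        String json = "{\"docs\":[{\"_id\":\"1\",\"label\":\"Pas de t\u00e2ches\"}]}";
        Path file = writeUtf8(Path.of("test.json"), json);
        byte[] bytes = Files.readAllBytes(file);
        // "â" takes two bytes (0xC3 0xA2) in UTF-8, so the file is one byte longer than the string
        System.out.println(json.length() + " chars, " + bytes.length + " bytes written to " + file);
    }
}
```

The resulting file can then be posted byte for byte with curl's `--data-binary @test.json` option and a `Content-Type: application/json` header.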

Thanks


pjmorce

Apr 25, 2012, 8:22:42 AM
to sql-to-nosql-importer-discuss
To fix the problem on the SQLToNoSQLImporter side, we changed the
post method of net.sathis.export.sql.couch.CouchWriter:

// These lines were added to convert from ISO-8859-1 to UTF-8 because of the error
// (... lexical error: invalid bytes in UTF8 string ...) that was happening on the
// couchDB database server
final byte[] b = content.getBytes("ISO-8859-1");
final String s = new String(b, "UTF-8");
httpclient = new DefaultHttpClient();
HttpEntity entity = new StringEntity(s, ContentType.APPLICATION_JSON);

instead of:

// Original code line
//HttpEntity entity = new StringEntity(content, ContentType.APPLICATION_JSON);

With this change, the problem was solved.

Thank you.

Regards.

pjmorce

Apr 25, 2012, 8:25:43 AM
to sql-to-nosql-importer-discuss
Correction: the problem is not entirely solved. No error occurs any more, but
special characters are not displayed correctly in the Futon interface...
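A possible explanation for the Futon symptom: if `content` already holds correctly decoded text, `getBytes("ISO-8859-1")` followed by `new String(b, "UTF-8")` does not convert anything. The single byte 0xE8 produced for "è" is not valid UTF-8, so Java's decoder silently replaces it with U+FFFD (the Unicode replacement character). That removes the parser error but also destroys the character. A minimal sketch reproducing the round trip (the class name is illustrative):

```java
import java.nio.charset.StandardCharsets;

public class Latin1RoundTrip {
    /** Reproduces the first workaround: encode as ISO-8859-1, then decode those bytes as UTF-8. */
    static String workaround(String content) {
        byte[] b = content.getBytes(StandardCharsets.ISO_8859_1);
        // Malformed UTF-8 sequences are replaced with U+FFFD instead of raising an error
        return new String(b, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "Tr\u00e8s nombreuses";
        String sent = workaround(original);
        System.out.println("contains U+FFFD: " + (sent.indexOf('\uFFFD') >= 0));
        System.out.println("unchanged: " + sent.equals(original));
    }
}
```

Plain ASCII text such as "Pas de taches" passes through unchanged, which would explain why only the accented values looked wrong.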

I will keep looking...

sathis kumar

Apr 25, 2012, 8:30:37 AM
to sql-to-nosql-i...@googlegroups.com
Cool, glad to hear it. I think you can also solve this problem by changing the charset of your database to UTF-8.

Regards
 Sathis


pjmorce

Apr 25, 2012, 9:38:10 AM
to sql-to-nosql-importer-discuss
I think we finally found the solution: we read the system file encoding
(the file.encoding property), change it to utf-8, run the process, and then
restore the original encoding:

// These lines were added to convert from ISO-8859-1 to UTF-8 because of the error
// (... lexical error: invalid bytes in UTF8 string ...) that was happening on the
// couchDB database server
final String currentEnc = System.getProperties().getProperty("file.encoding");
System.getProperties().setProperty("file.encoding", "utf-8");
final byte[] b = content.getBytes("utf-8");
final String s = new String(b);
httpclient = new DefaultHttpClient();
HttpEntity entity = new StringEntity(s, ContentType.APPLICATION_JSON);

// Original code line
//HttpEntity entity = new StringEntity(content, ContentType.APPLICATION_JSON);
post.setEntity(entity);
post.setHeader(new BasicHeader("Content-Type", "application/json"));
HttpResponse response = httpclient.execute(post);
if (response.getStatusLine().getStatusCode() != HttpStatus.SC_CREATED)
    throw new HttpException(response.getStatusLine().toString());

// To put the file encoding as it was
System.getProperties().setProperty("file.encoding", currentEnc);
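One caveat about this workaround: the JDK determines `Charset.defaultCharset()` during virtual-machine startup and caches it, so setting the `file.encoding` property at runtime should not change the charset used by `new String(b)` or by `getBytes()` without an argument. On a Windows JVM, `new String(b)` would therefore most likely still decode the UTF-8 bytes with the Windows code page. The minimal sketch below (class and method names are illustrative) checks whether the default charset follows a runtime change of the property.

```java
import java.nio.charset.Charset;

public class FileEncodingProperty {
    /** Changes file.encoding at runtime and reports whether Charset.defaultCharset() followed it. */
    static boolean defaultCharsetChangesAfter(String newEncoding) {
        Charset before = Charset.defaultCharset();
        String saved = System.getProperty("file.encoding");
        try {
            System.setProperty("file.encoding", newEncoding);
            return !Charset.defaultCharset().equals(before);
        } finally {
            // Restore the property in a finally block so it is reset even on failure
            if (saved != null) {
                System.setProperty("file.encoding", saved);
            } else {
                System.clearProperty("file.encoding");
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("changed by setting file.encoding: "
                + defaultCharsetChangesAfter("UTF-16"));
    }
}
```

Expect the second line to print `false`. A more robust alternative is to avoid the default charset entirely by passing an explicit charset to every `getBytes` and `new String` call.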

Regards

pjmorce

Apr 26, 2012, 8:25:59 AM
to sql-to-nosql-importer-discuss
Hello

I'm back with the same problem.
The solution I proposed yesterday only works as long as the database
contains no uppercase accented characters!
For example, when my table/field contains a value such as "Échantillon",

...a "lexical error: invalid bytes in UTF8 string" is generated.

However, when the same table/field contains "Echantillon", with exactly
the same code, no error is generated and the data is imported
correctly...

Any idea what causes this strange behaviour, or how to
resolve it?
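A plausible explanation, under two assumptions the thread does not confirm: the JVM's default charset is windows-1252 (so `new String(b)` decodes with it), and the HTTP entity then encodes the string as ISO-8859-1. Under those assumptions the UTF-8 bytes of "è" (0xC3 0xA8) decode to "Ã¨", both characters exist in ISO-8859-1, and the original bytes are restored by accident. The second UTF-8 byte of "É" is 0x89, which windows-1252 maps to "‰"; that character has no ISO-8859-1 encoding, so it is replaced by "?" and the body again contains invalid UTF-8. The same happens for several other uppercase letters such as "À", "È" and "Ê". The sketch simulates both steps with explicit charsets (names are illustrative):

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class DoubleEncodingDemo {
    // Assumption: the JVM default charset on the Windows machine was windows-1252
    static final Charset PLATFORM_DEFAULT = Charset.forName("windows-1252");

    /** Simulates the bytes sent by the second workaround under the assumptions above. */
    static byte[] bytesOnTheWire(String content) {
        byte[] utf8 = content.getBytes(StandardCharsets.UTF_8); // content.getBytes("utf-8")
        String s = new String(utf8, PLATFORM_DEFAULT);          // new String(b) with the default charset
        return s.getBytes(StandardCharsets.ISO_8859_1);         // assumed entity charset; unmappable -> '?'
    }

    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Unicode escapes keep the example independent of the source file encoding
        for (String word : new String[] {"Tr\u00e8s", "\u00c9chantillon", "Echantillon"}) {
            byte[] wire = bytesOnTheWire(word);
            System.out.println(word + ": valid UTF-8 = " + isValidUtf8(wire)
                    + ", round-trips = " + new String(wire, StandardCharsets.UTF_8).equals(word));
        }
    }
}
```

If this is what happens, a more robust fix is to encode `content` directly as UTF-8 once and make sure the HTTP entity declares and uses the same charset, rather than round-tripping through the platform default.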

Thank you

Regards.