Problem with invalid UTF-8 [Mongo-Ruby]


Andrew Timberlake

Oct 15, 2010, 1:32:59 AM
to mongod...@googlegroups.com
I'm having a lot of problems with data already in our database.
When I try to update a record, I'm getting the following error: String not valid utf-8: "Andri?tte"
When I load the model via MongoMapper, I get the following value in irb: "Andri�tte"
When I export the data from mongodb as json, I have the following: "Andriëtte"

If I load the record in irb, manually change the value from "Andri�tte" to "Andriëtte", then it saves correctly.
If I then reload the record, it displays correctly and saves fine.

I have 500k records in the database so manually fixing this does not sound like fun :-(

I have tried dumping and restoring the entire database, which completes successfully but doesn't fix anything, nor does it report any invalid data.
I'm using the latest versions of MongoDB, Mongo-ruby and MongoMapper.
This happens whether I use bson-ext or not (although the error message is nicer without bson-ext: at least it tells me which value is invalid).

Thanks for the help

Andrew Timberlake

Chuck Remes

Oct 15, 2010, 8:36:35 AM
to mongod...@googlegroups.com

What platform are you running on (Windows, OSX, Linux)? And what Ruby runtime (MRI, REE, JRuby, etc) and version are you running?

cr

Andrew Timberlake

Oct 15, 2010, 11:49:50 AM
to mongod...@googlegroups.com

On Fri, Oct 15, 2010 at 2:36 PM, Chuck Remes <cremes....@mac.com> wrote:

What platform are you running on (Windows, OSX, Linux)? And what Ruby runtime (MRI, REE, JRuby, etc) and version are you running?


OS: Linux, Ubuntu 8.04
Ruby: REE 2010.01

Andrew

Kyle Banker

Oct 15, 2010, 9:49:21 PM
to mongod...@googlegroups.com
Did you insert the data using MongoMapper or the Ruby driver and not receive any error then?


Andrew Timberlake

Oct 15, 2010, 11:36:49 PM
to mongod...@googlegroups.com
On Sat, Oct 16, 2010 at 3:49 AM, Kyle Banker <ky...@10gen.com> wrote:
Did you insert the data using MongoMapper or the Ruby driver and not receive any error then?

All of the data was inserted using MongoMapper. I've been using it for quite a while now: multiple MM versions, and MongoDB since at least the 1.0 branch (perhaps a little before).
I can't tell you exactly when the problem started. Originally I was picking the problem up when users were importing CSV files. I spent quite a while on various iconv conversions until I realised it was the updating of existing data that was causing the problem.

Andrew

Kyle Banker

Oct 18, 2010, 2:12:13 PM
to mongod...@googlegroups.com
Andrew,

Were you using mongoimport? Before 1.6, mongoimport didn't validate
for utf-8. The only way to fix the data, unfortunately, is to manually
re-convert it.

Kyle
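
For anyone landing on this thread later, the "manually re-convert it" step can be sketched roughly as below. This is only an illustration, not code from the Ruby driver or MongoMapper. `repair_utf8` is a made-up helper name, and the guess that the stray bytes are ISO-8859-1 is an assumption based on the "ë" in the example. It also assumes Ruby 2.0 or later; on Ruby 1.8.7 (such as REE) these String methods do not exist and the Iconv library would be the usual route.

```ruby
# Hypothetical helper, not part of the Ruby driver or MongoMapper.
# Assumes Ruby 2.0+ and that invalid bytes are ISO-8859-1 (a guess based on
# the "ë" example in this thread; other legacy encodings are possible).
def repair_utf8(raw, assumed_source = Encoding::ISO_8859_1)
  candidate = raw.dup.force_encoding(Encoding::UTF_8)
  # Strings that are already valid UTF-8 are returned untouched.
  return candidate if candidate.valid_encoding?

  # Otherwise reinterpret the same bytes as the assumed legacy encoding
  # and transcode them to UTF-8.
  raw.dup.force_encoding(assumed_source).encode(Encoding::UTF_8)
end

# 0xEB is "ë" in ISO-8859-1 but is not a valid UTF-8 sequence on its own.
broken = "Andri\xEBtte".b
fixed  = repair_utf8(broken)
puts fixed
```

Checking for valid UTF-8 first matters: running the Latin-1 conversion over a string that is already correct would double-encode it.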

Andrew Timberlake

Oct 19, 2010, 7:17:43 AM
to mongod...@googlegroups.com
On Mon, Oct 18, 2010 at 8:12 PM, Kyle Banker <ky...@10gen.com> wrote:
Andrew,

Were you using mongoimport? Before 1.6, mongoimport didn't validate
for utf-8. The only way to fix the data, unfortunately, is to manually
re-convert it.

Kyle

No, I tried mongodump and mongorestore. I'll try mongoexport and mongoimport and see if that highlights the errors as it imports (so I can find all the occurrences).

Andrew
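
As a rough sketch of the "find all the occurrences" idea, a document can also be checked on the Ruby side once it has been loaded as a plain Hash. Nothing here comes from the thread: `invalid_utf8_paths` is a hypothetical helper, the sample document is made up, and how the documents are fetched is left out entirely. It assumes Ruby 2.0 or later.

```ruby
# Hypothetical helper: returns the dotted paths of every String value in a
# nested Hash/Array whose bytes are not valid UTF-8. Assumes Ruby 2.0+.
def invalid_utf8_paths(value, path = "")
  join = lambda { |key| path.empty? ? key.to_s : "#{path}.#{key}" }

  case value
  when String
    # Check the raw bytes as UTF-8 without modifying the original string.
    value.dup.force_encoding(Encoding::UTF_8).valid_encoding? ? [] : [path]
  when Hash
    value.flat_map { |key, child| invalid_utf8_paths(child, join.call(key)) }
  when Array
    value.each_with_index.flat_map { |child, i| invalid_utf8_paths(child, join.call(i)) }
  else
    [] # numbers, nil, dates and so on cannot carry bad string bytes
  end
end

# Made-up sample document; "\xEB" and "\xE9" are lone ISO-8859-1 bytes.
doc = {
  "name"  => "Andri\xEBtte".b,
  "notes" => "plain ascii",
  "tags"  => ["ok", "caf\xE9".b]
}
bad = invalid_utf8_paths(doc)
puts bad.inspect
```

Collecting paths rather than fixing values in place keeps this a read-only pass, so the list can be reviewed before anything is rewritten.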
