Same Field Name Twice In Document


Eric Lubow

Jun 26, 2011, 6:12:33 PM
to mongo...@googlegroups.com
I wanted to send this to the mongodb-dev list before creating a Jira willy-nilly.  I have a record that contains the same field name twice:
{ "_id" : ObjectId("4dfa73f8c2b2195307000027"), "_type" : "PublisherAccount", "_type" : "PublisherAccount" }

I have no idea how this happened but this seems like it is a major issue.  I am not even sure how to go about reproducing it.

This data is typically entered through a Rails app using MongoID or through one of our Node.js apps.  We also have some scripts that hit the DB that use the mongo-ruby-driver.  Either way, I don't think that a document that looks like that is valid.  The above document is a snippet of an actual document in the database.  Is this normal behavior or is this something that needs to be troubleshot (or that I should be worried about)?  Any help would be appreciated.  Thanks.

-e

Dwight Merriman

Jun 26, 2011, 6:21:24 PM
to mongo...@googlegroups.com
Probably some application code added the same field twice and the driver didn't enforce unique field names.

You can probably just fetch those docs and then write them back without the duplicate.



Eric Lubow

Jun 26, 2011, 6:38:56 PM
to mongo...@googlegroups.com
I guess I am curious how the engine lets that happen and why it's the driver's responsibility to enforce that.  Couldn't this potentially be an issue affecting all drivers if the engine isn't de-duping field names?

Since I can't reproduce this using the JS console and I can't query for all documents that have the same field twice, how would I go about finding how many (and which) documents have duplicate field names?  What do you recommend for dealing with this going forward?

Thanks.

-e

Dwight Merriman

Jun 26, 2011, 6:57:50 PM
to mongo...@googlegroups.com
Running the server with --objcheck might prevent it (not sure).

I am not sure you can query for it. I would perhaps iterate over everything from a client (possibly a client on the same server).
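
A minimal sketch of what such a client-side pass could look like, assuming you can get at the raw BSON bytes of each document (a driver that decodes documents into a hash may already have collapsed the duplicates). The helper names `duplicate_field_names` and `value_size` are hypothetical, the value sizes follow the BSON spec, and only top-level field names are checked:

```ruby
# Widths of fixed-size BSON value types, keyed by type byte (per the BSON spec).
FIXED_SIZES = {
  0x01 => 8, 0x06 => 0, 0x07 => 12, 0x08 => 1, 0x09 => 8, 0x0A => 0,
  0x10 => 4, 0x11 => 8, 0x12 => 8, 0x13 => 16, 0xFF => 0, 0x7F => 0
}.freeze

# Hypothetical helper: number of bytes occupied by the value starting at pos.
def value_size(bytes, pos, type)
  int32 = ->(at) { bytes[at, 4].unpack1("l<") }
  case type
  when 0x02, 0x0D, 0x0E then 4 + int32.(pos)       # string-like: length prefix + bytes
  when 0x03, 0x04, 0x0F then int32.(pos)           # document/array/code with scope: length includes itself
  when 0x05 then 4 + 1 + int32.(pos)               # binary: length + subtype + payload
  when 0x0B                                        # regex: two NUL-terminated strings
    first_end = bytes.index("\x00".b, pos)
    second_end = bytes.index("\x00".b, first_end + 1)
    second_end + 1 - pos
  when 0x0C then 4 + int32.(pos) + 12              # DBPointer: string + 12-byte id
  else FIXED_SIZES.fetch(type) { raise ArgumentError, "unknown BSON type 0x%02x" % type }
  end
end

# Hypothetical helper: top-level field names that occur more than once
# in a single raw BSON document.
def duplicate_field_names(raw)
  bytes = raw.b
  total = bytes.unpack1("l<")                      # leading int32 is the document length
  pos = 4
  seen = Hash.new(0)
  while pos < total - 1                            # last byte is the document terminator
    type = bytes.getbyte(pos)
    name_end = bytes.index("\x00".b, pos + 1)
    name = bytes[(pos + 1)...name_end].force_encoding(Encoding::UTF_8)
    seen[name] += 1
    pos = name_end + 1
    pos += value_size(bytes, pos, type)
  end
  seen.select { |_, count| count > 1 }.keys
end

# Hand-assembled BSON resembling the document from the first post.
oid = ["4dfa73f8c2b2195307000027"].pack("H*")
str = "PublisherAccount"
type_elem = "\x02".b + "_type\x00".b + [str.bytesize + 1].pack("l<") + str.b + "\x00".b
body = "\x07".b + "_id\x00".b + oid + type_elem + type_elem
sample = [body.bytesize + 5].pack("l<") + body + "\x00".b

puts duplicate_field_names(sample).inspect   # prints ["_type"]
```

Embedded documents are skipped rather than recursed into, so a duplicate inside a nested document would not be reported by this sketch.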

Eric Lubow

Jun 27, 2011, 11:45:15 AM
to mongo...@googlegroups.com
I've narrowed it down and it turns out to be a result of the mongo-ruby-driver via MongoID.

The Ruby driver accepts both symbols and strings as hash keys, so I end up with:
doc[:_type] = "PublisherAccount"
doc["_type"] = "PublisherAccount"

and both eventually reach Mongo as the same string field name.
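
The effect can be reproduced in plain Ruby without a database. This is only a sketch: calling `to_s` on each key stands in for whatever the driver does when it writes field names, which is an assumption here.

```ruby
doc = {}
doc[:_type]  = "PublisherAccount"
doc["_type"] = "PublisherAccount"

# Ruby treats the symbol and the string as two distinct hash keys.
key_count = doc.size

# Assumption: the serializer writes every key as a string, so the two
# distinct keys become the same field name.
field_names = doc.keys.map(&:to_s)
duplicated = field_names.select { |name| field_names.count(name) > 1 }.uniq

puts "hash keys: #{key_count}"
puts "duplicated field names: #{duplicated.inspect}"
```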

I might be wrong in thinking this, but wouldn't using --objcheck on the server have a write performance impact?

Should I file a Jira for this for the server to enforce de-duplication of field names?

-e

Eliot Horowitz

Jun 27, 2011, 4:54:04 PM
to mongo...@googlegroups.com
Multiple fields with the same name are valid in both JSON and BSON, so
this won't be something we'll be enforcing at the db level.

Kyle Banker

Jun 28, 2011, 10:57:45 AM
to mongo...@googlegroups.com
The Ruby driver does not normalize keys; thus, you can save a document
with a symbol key and the "same" key as a string, and you'll get two of
the same keys in the BSON document.

There are a couple of solutions to this:

1. Always use either symbols or strings as hash keys, never both.

2. Use HashWithIndifferentAccess. I believe that this is what MongoMapper does.
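
A minimal pure-Ruby sketch of the same idea, for code that does not pull in ActiveSupport: normalize every key to a string before the document is handed to the driver. The helper name `stringify_keys_deep` is hypothetical, and raising on a collision is a design choice made for this sketch, not driver behavior.

```ruby
# Hypothetical helper: returns a copy of the value with all hash keys
# converted to strings, recursing into nested hashes and arrays.
def stringify_keys_deep(value)
  case value
  when Hash
    value.each_with_object({}) do |(key, inner), out|
      name = key.to_s
      # Fail loudly if a symbol key and a string key collapse to the same name.
      if out.key?(name)
        raise ArgumentError, "duplicate field name after normalization: #{name}"
      end
      out[name] = stringify_keys_deep(inner)
    end
  when Array
    value.map { |item| stringify_keys_deep(item) }
  else
    value
  end
end

clean = stringify_keys_deep({ :_type => "PublisherAccount", :tags => [{ :name => "a" }] })
puts clean.inspect
```

Raising instead of silently keeping one of the two values makes the mixed-key bug visible at the point where the document is built, rather than later in the database.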

We could check for these kinds of duplicates on the driver level, but
it would introduce overhead.

Kyle

Eric Lubow

Jun 29, 2011, 10:50:24 AM
to mongo...@googlegroups.com
Ok.  I understand that the spec allows dupes, and I agree with Kyle that de-duping in the driver may add overhead.  I just couldn't see a use case for something that incurs extra data storage on documents (even if by mistake) and isn't really queryable or updatable.

-e