Index field order

279 views
Skip to first unread message

Glenn Maynard

unread,
Feb 20, 2012, 3:19:00 PM2/20/12
to mongodb-user
http://www.mongodb.org/display/DOCS/Indexes#Indexes-UsingDocumentsasKeys
says:

> // this query does not match the document because the order of fields is significant
> db.factories.find( { metro: { state: "NY" , city: "New York" } } );

How can the order of fields be significant? I don't believe the
ordering of object properties is guaranteed to be preserved in
JavaScript (though it happens to behave that way in most
implementations), and I'm certain it's not guaranteed in most other
languages. That would make natural language bindings impossible; eg.
Python uses a hash table:

>>> {"a": 1, "b": 1, "c": 1}.keys()
['a', 'c', 'b']

This makes many documents inserted in JS impossible to look up in this
way under other language bindings; you have to switch to individual
field comparisons. Worse, it may sometimes work, or work until a
CPython upgrade changes the hash function.

--
Glenn Maynard

Scott Hernandez

unread,
Feb 20, 2012, 3:22:30 PM2/20/12
to mongod...@googlegroups.com
http://api.mongodb.org/python/1.7/api/pymongo/son.html

In every languages driver there should be an order preserving map/dict/etc.

If you don't use them then you don't care about order... and live with
the consequences. Up to you.

> --
> You received this message because you are subscribed to the Google Groups "mongodb-user" group.
> To post to this group, send email to mongod...@googlegroups.com.
> To unsubscribe from this group, send email to mongodb-user...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/mongodb-user?hl=en.
>

Glenn Maynard

unread,
Feb 20, 2012, 4:01:44 PM2/20/12
to mongodb-user
On Feb 20, 2:22 pm, Scott Hernandez <scotthernan...@gmail.com> wrote:
> http://api.mongodb.org/python/1.7/api/pymongo/son.html
>
> In every languages driver there should be an order preserving map/dict/etc.
>
> If you don't use them then you don't care about order... and live with
> the consequences. Up to you.

The consequences are nondeterministic behavior which may only manifest
after system upgrades. That's inherently brittle and essentially
untestable. I shouldn't have to care about order; dictionaries are by
nature unordered, and comparisons on them shouldn't depend on the
order of keys. I shouldn't have to uglify my code with non-native,
and probably slow, basic containers. It also means you can't even use
a standard decoder to convert a JSON blob (eg. from an API) to a query
dict, because it'll output a regular dictionary. Dictionary key order
should be normalized before comparison.

--
Glenn Maynard

Scott Hernandez

unread,
Feb 20, 2012, 4:15:39 PM2/20/12
to mongod...@googlegroups.com
Totally up to you. Don't index on documents and don't worry about
order; if you can't define the order then indexing a document as a key
won't help you.

Wes Freeman

unread,
Feb 20, 2012, 4:44:00 PM2/20/12
to mongod...@googlegroups.com
If you want to have an index on a "document" that doesn't care about
order, index all of the fields within, and change your query to query
on all of the fields instead of the document as a whole. That is
essentially the functionality you desire, from what I can
tell--although it won't work with nested objects and arrays. Do you
have an example use case?

Also, be aware of the ~800 byte index key limit (which hopefully will go away).
http://www.mongodb.org/display/DOCS/Indexes#Indexes-KeysTooLargeToIndex

Wes

Glenn Maynard

unread,
Feb 21, 2012, 12:23:35 PM2/21/12
to mongodb-user
On Feb 20, 3:15 pm, Scott Hernandez <scotthernan...@gmail.com> wrote:
> Totally up to you. Don't index on documents and don't worry about
> order; if you can't define the order then indexing a document as a key
> won't help you.

The problem is that it's hard for tests to ensure that this isn't
happening.  (Inserting every document type with different field orders
isn't very practical.)

(Note that this can manifest without creating any indexes.)

On Feb 20, 3:44 pm, Wes Freeman <freeman....@gmail.com> wrote:
> If you want to have an index on a "document" that doesn't care about
> order, index all of the fields within, and change your query to query
> on all of the fields instead of the document as a whole. That is
> essentially the functionality you desire, from what I can
> tell--although it won't work with nested objects and arrays. Do you
> have an example use case?

I'd prefer to at least be able to disable features which are only
going to bite me--possibly months later--if someone on my team
accidentally uses it. (This also affects unique indexes; I don't know
what else.)

But what I don't understand is why it's done this way. Why would you
ever want {a: 1, b: 2} to be unequal to {b: 2, a: 1}? The keys are
always strings, so defining equality in this way is straightforward.
It's much more natural, maps cleanly to native dictionary types in all
languages I know of that support deep comparisons on dicts (eg.
Python, Ruby), and doesn't require using non-native containers to get
reliable behavior.

--
Glenn Maynard

Wes Freeman

unread,
Feb 21, 2012, 1:24:45 PM2/21/12
to mongod...@googlegroups.com
Just so we're clear, you're only talking about indexes on
documents--which would only be used when querying to match a whole
document or subdocument. Native collection support for maintaining
insertion order in a Hash/Dict:
Python: OrderedDict
Ruby: Hash (since 1.9)
Java: LinkedHashMap

Or you could use the MongoDB BSON ordered containers in those
respective languages:
Python: http://api.mongodb.org/python/current/api/bson/son.html
(subclass of dict)
Ruby: Looks like the Ruby driver used to have an OrderedHash for Ruby
1.8 support, but maybe they took that out?
Java: http://api.mongodb.org/java/2.2/org/bson/BasicBSONObject.html
(uses LinkedHashMap)

I don't profess to be an expert on the database internals, but
consider the performance of the backing index implementation. In order
to do it your way, it would probably need to be a multikey index,
which is not nearly as efficient as one big key that can be directly
compared byte by byte. There wouldn't be any benefit
(performance-wise) in using a document index over using a multikey
index.

Maybe it could work (without too much performance sacrifice) with some
sort of normalization of the BSON before it goes into the index
key--sort the documents on their keys as they're fed into the index?
I'm happy with the way it is, but you could create a JIRA issue to see
if anyone else shares your concern.

Wes

Glenn Maynard

unread,
Feb 21, 2012, 4:16:54 PM2/21/12
to mongodb-user
On Feb 21, 12:24 pm, Wes Freeman <freeman....@gmail.com> wrote:
> Just so we're clear, you're only talking about indexes on
> documents--which would only be used when querying to match a whole
> document or subdocument.

I'm talking about document equality depending on insertion order. It
doesn't bother me if Mongo happens to preserve insertion order (most
JavaScript implementations also happen to do that, though the spec
doesn't require it). I just don't think that should affect equality.
This does affect indexes, but it also affects searching when no
indexes are in use.

> Native collection support for maintaining
> insertion order in a Hash/Dict:
> Python: OrderedDict

(I'm focusing on Python since it's the language I use the most. Ruby
is the only language I know of that actually defines its builtin
dictionary type as order-preserving. Java doesn't have dictionary
builtins, so it's inconvenient no matter what you do.)

OrderedDict isn't a native dictionary, it's a library one, which is
very second-class. Native dictionaries, dict, are what you get when
you use the language's dictionary syntax. For example, it means you
can't simply say:

coll.find({"user": "glenn", "year": 2012})

Instead, you have to do this, which is painful:

coll.find(OrderedDict([("user", "glenn"), ("year", 2012)]))

OrderedDict is also slower than dict.

> Maybe it could work (without too much performance sacrifice) with some
> sort of normalization of the BSON before it goes into the index
> key--sort the documents on their keys as they're fed into the index?
> I'm happy with the way it is, but you could create a JIRA issue to see
> if anyone else shares your concern.

Right; normalize {"c": 1, "a", 3, "b": 2} to {"a": 3, "b": 2, "c": 3},
and use that for comparison and sorting purposes.

Since the original order probably does need to be retained in the
index (for covered-indexes lookups), it would need to store the
original ordering, which would give a small (but easily packed)
overhead:

{"a": 3, "b": 2, "c": 3}, [2, 0, 1]

where each item is the index of the value at that point.

https://jira.mongodb.org/browse/SERVER-5030

--
Glenn Maynard

Wes Freeman

unread,
Feb 21, 2012, 5:50:26 PM2/21/12
to mongod...@googlegroups.com
I like the idea for indexes, maybe as an option, but making it work
like that for normal equality sounds like a waste of space for an
uncommon use case. :(

Wes

Glenn Maynard

unread,
Feb 22, 2012, 10:13:31 AM2/22/12
to mongodb-user
On Feb 21, 4:50 pm, Wes Freeman <freeman....@gmail.com> wrote:
> I like the idea for indexes, maybe as an option, but making it work
> like that for normal equality sounds like a waste of space for an
> uncommon use case. :(

I suppose that if the "original order" mapping isn't used, the only
penalty is that you couldn't do covered-index lookups for whole
subdocuments, which itself is probably very rare. The normalization
itself shouldn't cause any waste of space.

(I don't know anything about MongoDB internals and storage, so I'll
put the probably useless technical speculation aside for now.)

--
Glenn Maynard

danieljsinclair

unread,
Oct 4, 2013, 8:03:38 AM10/4/13
to mongod...@googlegroups.com
I have to say I'm in complete agreement with Glenn - this is just *nuts*. I was completely blown away to discover this myself today, and I'm curious as to why anyone would defend it. Smells of fundamental bug to me.

It's not just indexes by the way, although that's where you're likely to run into it - or worse still, you won't run into it. No, it seems a more fundamental quirk than that.

Try this in the shell;

db.xxx.drop();
db.xxx.insert({             id: {bar:2, foo:1}, name:"bob"});
db.xxx.insert({ name:"bob", id: {bar:2, foo:1}});
db.xxx.insert({             id: {foo:1, bar:2}, name:"bob"}); 
db.xxx.find({               id: {bar:2, foo:1}, name:"bob"}); // only returns the first two

The first two ARE equivalent, because field ordering is unimportant. As such, the following line executed before you did any insert()s would cause the second insert to fail (as well it should);

db.xxx.ensureIndex({id:1},{unique:1});

However, the THIRD insert() where foo and bar are swapped in order, *should* be equivalent to both the two previous ones, but clearly it isn't. It is not returned from a find() and would be deemed unique by the index.

The Mongo docs have this to say;

"MongoDB does not make guarantees regarding the order of fields in a BSON document. Drivers and MongoDB will reorder the fields of a documents upon insertion and following updates."

Which is clearly only true for top-level fields. However, id is *also* considered a BsonDocument. I have to say, things like this are really starting to make me lose confidence in MongoDB.

Sam Millman

unread,
Oct 4, 2013, 9:06:48 AM10/4/13
to mongod...@googlegroups.com
> However, id is *also* considered a BsonDocument.

It is?


--
--
You received this message because you are subscribed to the Google
Groups "mongodb-user" group.
To post to this group, send email to mongod...@googlegroups.com
To unsubscribe from this group, send email to
mongodb-user...@googlegroups.com
See also the IRC channel -- freenode.net#mongodb
 
---
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mongodb-user...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

danieljsinclair

unread,
Oct 4, 2013, 1:05:22 PM10/4/13
to mongod...@googlegroups.com
yes. At least that's how the c# driver seems to interpret it.

Glenn Maynard

unread,
Oct 4, 2013, 3:19:58 PM10/4/13
to mongod...@googlegroups.com
On Fri, Oct 4, 2013 at 7:03 AM, danieljsinclair <daniel....@nupe.com> wrote:
db.xxx.drop();
db.xxx.insert({             id: {bar:2, foo:1}, name:"bob"});
db.xxx.insert({ name:"bob", id: {bar:2, foo:1}});
db.xxx.insert({             id: {foo:1, bar:2}, name:"bob"}); 
db.xxx.find({               id: {bar:2, foo:1}, name:"bob"}); // only returns the first two

The first two ARE equivalent, because field ordering is unimportant. As such, the following line executed before you did any insert()s would cause the second insert to fail (as well it should);

(Some experimentation suggests that Mongo reorders the top-level dictionary's keys to put _id as the first field of the document.  Is that an optimization to allow always retrieving a document's _id without reading the whole document?)
 
db.xxx.ensureIndex({id:1},{unique:1});

However, the THIRD insert() where foo and bar are swapped in order, *should* be equivalent to both the two previous ones, but clearly it isn't. It is not returned from a find() and would be deemed unique by the index.

You're comparing two very different things: document insertion and a query.  When you say { _id: x, name: y }, you're not saying "find documents equal to { _id: x, name: y }", you're saying "find documents where _id == x && name == y".  A query is a set of things to match, not a document to match.

The key ordering issue is a problem, but this isn't related to it.

The Mongo docs have this to say;

"MongoDB does not make guarantees regarding the order of fields in a BSON document. Drivers and MongoDB will reorder the fields of a documents upon insertion and following updates."

I was aware of the reordering on updates, but I've never heard of it changing the order on insertion.  It's odd that key order is preserved in some cases but not others.  (FWIW, I've been using Mongo for a year and a half, and I've spent a fair bit of time looking over and discussing the key ordering problem, and I still didn't have a complete understanding of Mongo's ordering behavior...)

MongoDB's semantics shouldn't differentiate between "documents" and "subdocuments".  To users they're all just dictionaries, and should have identical behavior.
 

However, id is *also* considered a BsonDocument.

I'm not sure what this means.  If your _id is a dictionary, then I believe it's the same as any other subdocument, and always preserves order on insertion.

--
Glenn Maynard

Reply all
Reply to author
Forward
0 new messages