[mongodb-user] possible to increase max key size for a index?


harryh

May 11, 2010, 1:06:57 PM
to mongodb-user
This is 1k by default right? Can I increase it to avoid errors like
this:

Thu Apr 22 16:06:42 foursquare.venues Btree::insert: key too large to
index, skipping foursquare.venues.
$latlng__closed_1_aliases_1_keywords_1 1078 { : BinData, : false, :
{ 0: "jfk int'l airport - terminal 4 (international terminal)", 1:
"jfk intl airport terminal 4 international terminal", 2: "jfk airport-
terminal 4", 3: "jfk airportterminal 4", 4: "jfk virgin america
terminal", 5: "jfk virginamerica terminal", 6: "jfk virgin america
terminal 4", 7: "jfk terminal 4 (jfk airport)", 8: "jfk terminal 4 jfk
airport", 9: "jfk- virgin america", 10: "virgin america- jfk", 11:
"jfk airport - terminal 4 (jfk)", 12: "jfk airport terminal 4 jfk",
13: "virgin america @ jfk", 14: "virgin america jfk", 15: "virgin
america - jfk", 16: "jfk airport - terminal 4", 17: "jfk airport
terminal 4", 18: "aer lingus terminal 4 jfk", 19: "virgin america
jfk", 20: "jfk virgin america", 21: "virgin america terminal @ jfk",
22: "virgin america terminal jfk", 23: "kennedy airport - terminal
4", 24: "gate a7 terminal 4 jfk", 25: "terminal 4, gate 20", 26:
"virgin america terminal jfk", 27: "jfk t4", 28: "virgin america", 29:
"jfk terminal 4" }, :
"jfkintlairportterminalfourinternationalterminal" }

-harryh


Eliot Horowitz

May 11, 2010, 1:12:23 PM
to mongod...@googlegroups.com
Only by modifying a constant in the code.

There isn't a config option because usually it's a bad thing, but in
your case I think it's fine for now, at least until we get a better
solution.

Do you want to try that? If so - it's KeyMax in db/btree.cpp - could
change to 2048

That's for the list of aliases? Kind of a strange case with geo...

harryh

May 11, 2010, 1:20:16 PM
to mongodb-user
> Do you want to try that?  If so - its KeyMax in db/btree.cpp - could
> change to 2048

Eh. This might not be urgent enough for us to deploy a custom mongo
binary, but I'll talk about it with my guys. Prolly a good idea to
get this in as a config option for the future if possible.

> That's for the list of aliases?  Kind of a strange case with geo...

Ya, what we're doing is wonky for sure. This is what's backing our
geo search right now (show me all the nearby places named X).

Eliot Horowitz

May 11, 2010, 1:30:30 PM
to mongod...@googlegroups.com
In your case I think the best thing to do is change the schema a tiny
bit - will make it faster too.

Right now you have an array of aliases, right?
I think you should keep that, but also add an array of alias_keywords.
Basically, split all the aliases into words and put them in a set.
At least in your example that will be a lot smaller.

Then you can query on that AND aliases. So the result will be just as
accurate, and most things will be filtered by the set.
So you would split the query up and use $all. There would only be a
few mismatches on that causing a couple of extra object loads, but
probably negligible.

That will also make the index size smaller, and make the normal query
faster since it will have to look at less data.
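
A minimal sketch of the schema change described above, in Python. The helper names `alias_keywords` and `build_query` and the word-splitting regex are assumptions made for illustration, not anything from the thread; only the `alias_keywords` field and the `$all` operator come from the discussion. It builds plain dicts and does not talk to a server.

```python
import re


def alias_keywords(aliases):
    """Split every alias into lowercase words and return the distinct words, sorted.

    The regex is an illustrative choice: it keeps letters, digits and
    apostrophes, and drops punctuation such as "-", "@" and parentheses.
    """
    words = set()
    for alias in aliases:
        words.update(re.findall(r"[a-z0-9']+", alias.lower()))
    return sorted(words)


def build_query(name):
    """Filter on the keyword set with $all, and also match the full alias."""
    return {
        "alias_keywords": {"$all": alias_keywords([name])},
        "aliases": name,
    }


aliases = [
    "jfk airport - terminal 4",
    "jfk terminal 4 (jfk airport)",
    "virgin america @ jfk",
]

# Document keeps the original aliases and adds the much smaller keyword set.
doc = {"aliases": aliases, "alias_keywords": alias_keywords(aliases)}

# Query for one place name: keywords narrow the candidates, aliases confirms.
query = build_query("jfk terminal 4 (jfk airport)")
```

In this example the three aliases collapse to six distinct words, which is the reason the keyword array would tend to produce smaller index keys than the alias list.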

harryh

May 11, 2010, 6:07:36 PM
to mongodb-user
> but also add an array of alias_keywords....

This seems like a good idea and we are going to pursue this option.
One additional question though:

Even with an array of alias_keywords it will still be possible to
exceed the 1k limit. We plan to avoid this by culling the list of
alias_keywords based on word frequency in the aliases list. The
question is where to set the cutoff. If we have an index that
consists of:

1) a lat/long
2) a closed flag (boolean)
3) a single keywords string
4) alias_keywords, an array of strings

Is there a reliable way to calculate what the index key size would be
so that we can cull words from #4 as necessary?

Eliot Horowitz

May 11, 2010, 11:05:44 PM
to mongod...@googlegroups.com
The 1k refers to the bson index object.
So it should be easy to compute. Just remember field names are always
"" in the index.

roughly:
bson overhead: 5 bytes
lat/long: 2 + 9
closed: 2 + 1
keyword: 2 + strlen() + 1
alias_keywords: 7 + ( N * ( 2 + strlen() ) )
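
The rough figures above can be turned into a small estimator, sketched here in Python. The function names, the `KEY_LIMIT` constant, and the frequency-based culling rule are assumptions for illustration (the culling follows the idea raised earlier in the thread); the byte counts are the rough ones listed above, so this is an approximation and it is worth leaving some headroom below the limit.

```python
from collections import Counter

KEY_LIMIT = 1024  # the "1k" limit discussed in this thread


def estimate_key_size(keyword, alias_keywords):
    """Rough index key size in bytes, using the per-field figures listed above."""
    size = 5                                        # bson overhead
    size += 2 + 9                                   # lat/long
    size += 2 + 1                                   # closed
    size += 2 + len(keyword.encode("utf-8")) + 1    # keyword
    # alias_keywords: 7 + N * (2 + strlen())
    size += 7 + sum(2 + len(w.encode("utf-8")) for w in alias_keywords)
    return size


def cull_keywords(keyword, aliases, limit=KEY_LIMIT):
    """Keep the most frequent alias words, dropping rare ones until the estimate fits."""
    counts = Counter(word for alias in aliases for word in alias.split())
    # Most frequent first; ties broken alphabetically so the result is deterministic.
    kept = sorted(counts, key=lambda w: (-counts[w], w))
    while kept and estimate_key_size(keyword, kept) > limit:
        kept.pop()  # remove the least frequent remaining word
    return kept


aliases = ["jfk airport", "jfk terminal", "jfk airport gate"]
size = estimate_key_size("jfk", ["jfk", "airport"])
kept = cull_keywords("jfk", aliases, limit=50)
```

Byte lengths are taken from the UTF-8 encoding rather than the character count, since the limit is in bytes.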

Nikhil Naib

Nov 12, 2014, 6:54:31 AM
to mongod...@googlegroups.com
What about the text index? Does it mean that a document won't get indexed if the field covered by a text index contains more than 1024 bytes of text? Also, when does the stoplist kick in?

Thanks,
Nikhil 

Will Berkeley

Nov 12, 2014, 12:33:34 PM
to mongod...@googlegroups.com
Hi Nikhil. When posting to the list, please start a new thread instead of responding to one that's been inactive for more than 4 years.

The 1024 byte index entry limit doesn't apply directly to text indexes. An entry in a text index has a different form from an entry in a Btree, which is the data structure used for "standard" MongoDB indexes - the field value isn't (generally) an index key in a text index like it is in a Btree. Roughly, a text field is tokenized and the tokens are indexed. The stoplist "kicks in" to filter out common, relatively meaningless language-specific terms like "a", "the", and "or" in English.

-Will