Lucene Indexing Best Practices

Adam Hill

Dec 5, 2011, 2:38:22 PM
to ravendb
Is it better to have fewer indexes with larger numbers of fields in
the index or smaller indexes clustered by similar datatypes or types
of searches I am performing?

For example, my users can do a query against string and datetime
fields. Some of the string searches are exact, some are substring
queries (i.e. I am doing a few evil *"term"* style searches), a few
are date ranges, and at least one specific set is always going to be
grouped together in an .Open/CloseSubclause() block.

Also, do compressed indexes help in general, or should they only be
employed after the number of docs or the size of the uncompressed
indexes gets above a certain threshold?

Thanks.

Chris Marisic

Dec 5, 2011, 4:13:53 PM
to rav...@googlegroups.com
You always want as few indexes as possible. Generally you will never need more than one index per document type, unless you use Map/Reduce or projection indexes to do something other than query a normal stored document.

Matt Warren

Dec 5, 2011, 4:51:29 PM
to rav...@googlegroups.com
With regard to compressed/uncompressed indexes, generally you don't have to worry about this. For instance, the default is for Raven to set Field.Store.NO, i.e. the field itself isn't stored in Lucene. So you have to specify Field.Store.YES before compression even matters. Storing fields can be useful if you want to pull them directly from the Lucene index, but normally this isn't needed.

All Raven (internally) stores in the Lucene index is the ID of the doc, so that it can look up the doc by ID when it is matched in a query.
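
To make that split concrete, here is a minimal sketch in plain Java. It is not the RavenDB or Lucene API: the class and method names (`IdOnlyIndexSketch`, `put`, `matchIds`, `query`) are hypothetical, and the tokenizer is a naive stand-in for a real analyzer. The point is only that the index maps terms to document IDs, and the document body comes from a separate store.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical illustration only: not the RavenDB or Lucene API.
final class IdOnlyIndexSketch {
    // term -> IDs of the documents containing it (plays the role of the index)
    private final Map<String, TreeSet<String>> index = new HashMap<>();
    // ID -> full document body (plays the role of the separate document store)
    private final Map<String, String> documents = new HashMap<>();

    // Stores the body once, and indexes its terms against the ID only.
    void put(String id, String body) {
        documents.put(id, body);
        for (String term : body.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                index.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
            }
        }
    }

    // The index can answer "which documents match?" but nothing more.
    List<String> matchIds(String term) {
        TreeSet<String> ids = index.get(term.toLowerCase(Locale.ROOT));
        return ids == null ? new ArrayList<>() : new ArrayList<>(ids);
    }

    // A query resolves IDs through the index, then loads bodies by ID.
    List<String> query(String term) {
        List<String> results = new ArrayList<>();
        for (String id : matchIds(term)) {
            results.add(documents.get(id));
        }
        return results;
    }

    public static void main(String[] args) {
        IdOnlyIndexSketch sketch = new IdOnlyIndexSketch();
        sketch.put("users/1", "Adam likes Lucene");
        sketch.put("users/2", "Matt likes Raven");
        System.out.println(sketch.matchIds("likes"));
        System.out.println(sketch.query("raven"));
    }
}
```

Because the index never holds the field values themselves, there is nothing in it for field compression to shrink unless you opt in to storing fields.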

Itamar Syn-Hershko

Dec 5, 2011, 5:20:51 PM
to rav...@googlegroups.com
In addition to that, compressed fields in Lucene are a deprecated feature.

Matt Warren

Dec 5, 2011, 5:44:49 PM
to rav...@googlegroups.com
Yeah, I saw that bit, although the docs seem to indicate that there is a newer way of doing it using CompressionTools; see http://lucene.apache.org/java/2_9_2/api/all/org/apache/lucene/document/CompressionTools.html.
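
For anyone curious what that approach amounts to: as far as I know, CompressionTools is a thin wrapper over `java.util.zip`, and you compress the value yourself before storing it as a binary field. The sketch below uses only the JDK, so it shows the round trip and where the CPU cost is paid without claiming to be Lucene's code. The class and helper names (`FieldCompressionSketch`, `compress`, `decompress`) are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper, JDK only: not Lucene's CompressionTools itself.
final class FieldCompressionSketch {

    // Deflates the value; this work would be paid once per stored field at index time.
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        while (!deflater.finished()) {
            int written = deflater.deflate(buffer);
            out.write(buffer, 0, written);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Inflates the value; this work would be paid every time the stored field is read back.
    static byte[] decompress(byte[] input) {
        Inflater inflater = new Inflater();
        inflater.setInput(input);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        try {
            while (!inflater.finished()) {
                int read = inflater.inflate(buffer);
                if (read == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported input");
                }
                out.write(buffer, 0, read);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid deflate data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] raw = "raven ".repeat(200).getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(raw);
        System.out.println(raw.length + " bytes raw, " + packed.length + " bytes compressed");
        System.out.println("round trip ok: " + Arrays.equals(raw, decompress(packed)));
    }
}
```

The saving only applies to values you actually store in the index, and it costs extra work on every write and every read of that field.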

Adam Hill

Dec 5, 2011, 6:06:59 PM
to ravendb
Does the current RavenDB unstable automatically use the correct index
when you don't specify it in the query now? The current Static
Indexing docs on beta.ravendb.net seem to say it will pick the right
one.

And if I am *really* paranoid and think it is not using my index, is
there a way I can run stats or something to see that it is?

And are there any manual perf tools for queries (other than LinqPad)
and/or seeing the impact of an index on inserts and the length of
index creation?

Thanks.

Matt Warren

Dec 5, 2011, 6:18:50 PM
to rav...@googlegroups.com


On Monday, 5 December 2011 23:06:59 UTC, Adam Hill wrote:
> Does the current RavenDB unstable automatically use the correct index
> when you don't specify it in the query now? The current Static
> Indexing docs on beta.ravendb.net seem to say it will pick the right
> one.

Yeah, the query optimiser will do this for you; if it doesn't, it's a bug!
 

> And if I am *really* paranoid and think it is not using my index, is
> there a way I can run stats or something to see that it is?
>
> And are there any manual perf tools for queries (other than LinqPad)
> and/or seeing the impact of an index on inserts and the length of
> index creation?

Generally you don't worry about it, because indexing happens in the background. You only have to wait for the index to complete if you call WaitForNonStaleResultsAsOfNow. But you can take a look at the /stats endpoint to get some info.

Oren Eini (Ayende Rahien)

Dec 6, 2011, 3:34:26 AM
to rav...@googlegroups.com
inline

On Tue, Dec 6, 2011 at 1:06 AM, Adam Hill <adam...@gmail.com> wrote:
> Does the current RavenDB unstable automatically use the correct index
> when you don't specify it in the query now? The current Static
> Indexing docs on beta.ravendb.net seem to say it will pick the right
> one.


Yes
 
> And if I am *really* paranoid and think it is not using my index, is
> there a way I can run stats or something to see that it is?


Look at the query statistics; they will tell you which index was used.

Itamar Syn-Hershko

Dec 6, 2011, 3:40:34 AM
to rav...@googlegroups.com
Yes, the rationale is here: https://issues.apache.org/jira/browse/LUCENE-652

I don't see any benefit to using this in RavenDB though; it will just slow down the indexing operation.

Matt Warren

Dec 6, 2011, 4:28:00 AM
to rav...@googlegroups.com
Thanks for the link, it's always interesting to see the discussion behind these things.

I agree that it's not really that useful in RavenDB, as you store the doc in a separate store, so there are very few situations where you want to store info in Lucene directly.