Solution for IN clause


Nitesh Gupta

Apr 7, 2021, 7:49:03 PM
to bleve
Hi,
Is there a way to implement an IN clause in Bleve search?

Thanks
Nitesh

Marty Schoch

Apr 7, 2021, 8:03:19 PM
to bl...@googlegroups.com
There are basically two possibilities, one at query time (most flexible) and one at index time (increases size of the index, but may perform better).

At query time, the values you're interested IN would simply be expanded to a disjunction query.  For example, status IN (open, reopened, in-progress) would be answered by using:

- Disjunction
  - Term "open" field "status"
  - Term "reopened" field "status"
  - Term "in-progress" field "status"

Disjunction (or) queries tend to be expensive because there are more matches, and bleve has to produce document matches for all of them.
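As a rough illustration of the query-time expansion above, here is a minimal Go sketch using only the standard library. The helper inClause and the two struct types are hypothetical names made up for this example. The JSON keys (disjuncts, term, field) are meant to mirror bleve's JSON query representation, but treat that as an assumption and check it against the bleve version you use.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// termQuery is a hypothetical stand-in for a single term query:
// match documents whose Field contains exactly Term.
type termQuery struct {
	Term  string `json:"term"`
	Field string `json:"field"`
}

// disjunctionQuery is a hypothetical stand-in for an OR over term queries.
type disjunctionQuery struct {
	Disjuncts []termQuery `json:"disjuncts"`
}

// inClause expands "field IN (values...)" into one term query per value.
func inClause(field string, values ...string) disjunctionQuery {
	q := disjunctionQuery{Disjuncts: make([]termQuery, 0, len(values))}
	for _, v := range values {
		q.Disjuncts = append(q.Disjuncts, termQuery{Term: v, Field: field})
	}
	return q
}

func main() {
	// status IN (open, reopened, in-progress)
	q := inClause("status", "open", "reopened", "in-progress")
	out, err := json.Marshal(q)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Term queries are not analyzed, so this assumes each status is indexed as a single term (for example "in-progress" is not split into separate tokens by the field's analyzer).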

Generally, queries with lots of disjunction clauses may start to take longer than desired to execute.  The solution is to change the index definition so that all of the terms you're interested in can be found by using a smaller number of terms at query time.  Using the previous example, we could define a new field in our index based on the original status field, that gives us higher-level pseudo-statuses.

So, let's define a new field called status-group.  If the status field has the value "open", "reopened" or "in-progress", we index the value "not-closed" in status-group.  Now, at query time, I can find all the same documents as before using a single term query:

- Term "not-closed" field "status-group"
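One way the status-group value could be computed in application code before indexing is sketched below. The names statusGroups and toDocument are hypothetical, and the document is built as a plain map so the field names "status" and "status-group" are explicit. Passing the result to the index is left out.

```go
package main

import "fmt"

// statusGroups maps each raw status to the pseudo-status indexed in
// the "status-group" field. Only the "not-closed" group comes from
// the discussion; add further groups as your IN clauses require.
var statusGroups = map[string]string{
	"open":        "not-closed",
	"reopened":    "not-closed",
	"in-progress": "not-closed",
}

// toDocument builds the document to index for a given status. The
// "status-group" field is only added when the status belongs to a group.
func toDocument(status string) map[string]interface{} {
	doc := map[string]interface{}{"status": status}
	if group, ok := statusGroups[status]; ok {
		doc["status-group"] = group
	}
	return doc
}

func main() {
	for _, s := range []string{"open", "reopened", "in-progress", "closed"} {
		fmt.Println(toDocument(s))
	}
}
```

Every document in a group carries one extra indexed value, which is where the additional index size comes from.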

Obviously, there are limitations to this approach, but also ways that you can vary it to apply it to other similar situations.

1.  You need to know which values are commonly grouped together for your IN clauses.  You can get creative and define lots of these, but they also contribute to making the index larger.
2.  You can still use a hybrid approach for some queries, meaning using the "not-closed" query to match the 3 statuses, but still use a disjunction with other term queries on the status field for other values.  This can reduce the overall number of disjunction clauses, without always trying to reduce everything to a single value.
3.  You have to decide how you want to produce this new field.  The most flexible thing would be to simply add a field to your domain objects and write your own code to create these values.  But, it is also possible to write custom analysis pipeline components to do this within the bleve mapping phase.  There is no right or wrong way, just more trade-offs to consider.
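The hybrid approach from point 2 might look like the following sketch. It is only an illustration: hybridTerms and termQuery are hypothetical names, "resolved" is a made-up status used for the example, and turning the resulting terms into an actual bleve disjunction query is left out.

```go
package main

import "fmt"

// termQuery is a hypothetical stand-in for a single term query.
type termQuery struct {
	Term  string
	Field string
}

// notClosed lists the statuses indexed as "not-closed" in status-group.
var notClosed = []string{"open", "reopened", "in-progress"}

// hybridTerms rewrites "status IN (wanted...)" into fewer term queries.
// If every member of the not-closed group is requested, those members
// collapse into one term on status-group; all other values stay as
// individual terms on status.
func hybridTerms(wanted ...string) []termQuery {
	want := make(map[string]bool, len(wanted))
	for _, w := range wanted {
		want[w] = true
	}

	// The group may only replace its members if all of them were requested;
	// otherwise the query would match statuses that were not asked for.
	grouped := make(map[string]bool, len(notClosed))
	covered := true
	for _, s := range notClosed {
		grouped[s] = true
		if !want[s] {
			covered = false
		}
	}

	var terms []termQuery
	if covered {
		terms = append(terms, termQuery{Term: "not-closed", Field: "status-group"})
	}
	for _, w := range wanted {
		if covered && grouped[w] {
			continue // already matched by the status-group term
		}
		terms = append(terms, termQuery{Term: w, Field: "status"})
	}
	return terms
}

func main() {
	for _, t := range hybridTerms("open", "reopened", "in-progress", "resolved") {
		fmt.Printf("Term %q field %q\n", t.Term, t.Field)
	}
}
```

Here four requested values reduce to two disjunction clauses. When only some members of the group are requested, the sketch falls back to plain per-value terms.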

marty


Amnon BC

Apr 8, 2021, 2:59:08 AM
to bleve
Is there a limit to the number of disjunction terms bleve can handle?
In our application we often have 50 terms, and it works well.

Marty Schoch

Apr 8, 2021, 10:19:03 AM
to bl...@googlegroups.com
On Thu, Apr 8, 2021 at 2:59 AM Amnon BC <amn...@gmail.com> wrote:
> Is there a limit to the number of disjunction terms bleve can handle?
> In our application we often have 50 terms, and it works well.

There is no specific limit, but performance may degrade as you increase the number of terms.

marty