Fanout Index vs field concatenation

90 views
Skip to first unread message

Michael Schmid

unread,
Apr 7, 2021, 5:30:02 AM4/7/21
to RavenDB - an awesome database

Hey,

most of my collections contain documents that contain arrays of key/value pairs that always follow this structure:

{
...
     "dictionary": [
          {
                  "key":  "somekey",
                  "value: "somevalue"
          },
          {
                  "key":  "anotherkey",
                  "value: "anothervalue"
          }
     ]
}

To be more precise, the "dictionary" structure pops up all over the place in my documents, the structure (key/value pairs in an array) is always the same. Among other things, users can query documents via the dictionary like so:

  1. Return all documents where the dictionary has a specific key
  2. Return all documents where the dictionary has a specific value
  3. Return all documents where ONE of the dictionary entries has a specific key AND value.
Case 1 and 2 are trivial, case 3 requires a fanout index. Fanout indexes might be problematic if the dictionary size grows, so I came up with an alternative solution: combining "key" and "value" into a single string separated by "@". As endsWith searches cannot use the index efficiently my index now contains two values for each key/value pair: "key@value" and the same but with switched key and value position. In the example above, the index terms look like follow:

"somekey@somevalue", "anotherkey@anothervalue", "somevalue@somekey", "anothervalue@anotherkey".

The switching is done to be able to support case 2 efficiently (startsWith instead of endsWith), the concatenation with the "@" separator is done to support case 3 (simple equals comparison instead of endsWith). My question now is: which one is better in terms of performance and storage efficiency? Is there a certain array size where the concatenation approach would be more beneficial than fanout indexes? My guess is that concatenation will be more storage-efficient than fanout indexes, but is there any recommendation on what solution is "better"?

Oren Eini (Ayende Rahien)

unread,
Apr 7, 2021, 8:28:27 AM4/7/21
to ravendb
I would actually recommend doing that with dynamic fields, that way, you'll key using:

somekey = 'somevalue'


--
You received this message because you are subscribed to the Google Groups "RavenDB - an awesome database" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ravendb+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ravendb/cf96f30f-d49b-4d3e-baee-c4a9d98e148bn%40googlegroups.com.
Message has been deleted
Message has been deleted

Michael Schmid

unread,
Apr 8, 2021, 4:43:53 AM4/8/21
to RavenDB - an awesome database
Ah,

thanks for that, looks way better than my solutions! Just to be sure, you mean index the dictionary like that:

_ = doc.dictionary.Select(p => CreateField(p.key, p.key + "=" + p.value, false, true))

This solves cases 1 (exists query) and 3 (equals query) perfectly. Am I right that for case 2 (query only using "value") I would still require a separate index field?

EDIT:

Actually I do not understand this completely: why would i need to concatenate "key" and "value" as the field name? Wouldn't simply indexing like this achieve the same result: _ = doc.dictionary.Select(p => CreateField(p.key, p.value, false, true)) ?

I'd still need to index "value" separately (for query case 2), but that is OK i guess.

Oren Eini (Ayende Rahien)

unread,
Apr 9, 2021, 4:07:44 AM4/9/21
to ravendb
You don't need to, use:

_ = doc.dictionary.Select(p => CreateField(p.key, value, false, true))


Michael Schmid

unread,
Apr 9, 2021, 5:16:20 AM4/9/21
to RavenDB - an awesome database
Thanks!

yes that's what I was going with. In addition to that I have indexed the "value" property separately, so i am also able to query only on value - works like a charm.
Reply all
Reply to author
Forward
0 new messages