Hi Martin,
Hopefully some more answers:
> The value is often NULL (e.g. MySQL does that). For unique indices, we can move the document ID to the value column. Since we know for each index table which document key was used to create them, we don't need to encode that in every entry.
That's how we encode our _id index, but it's (no longer) how our secondary unique indexes get encoded. Those do keep the document ID as part of the key. That's not relevant to your multikey question so I'll leave it at that, but I'm happy to provide more detail on the "why" there.
> This may change upon document insertion, but it's a one-directional thing; an index keeps being multi-key even if all documents that caused it to become multi-key in the first place are long gone and deleted.
> The only way to get back to a regular index that I can see is to drop it and create a new one (assuming that there are no arrays left in the indexed documents).
That used to be correct. I'm not perfectly familiar with the details, but the validate command can now unset multikey. There may be some limitations associated with it, though (my hunch is that it may be a standalone-only operation).
> I assume that one of them simply gets discarded during indexing and we persist no information in the index that we saw this once at a[0] and once at a[1]?
That's correct. I'm not an expert in our key generation, so I couldn't tell you whether we intentionally "generate a single key" or whether we just insert two entries into WT (WiredTiger) with the same key and value, and one is "accidentally" discarded.
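To make the "one entry survives" outcome concrete, here is a minimal Python sketch of multikey key generation. It is a toy model, not the server's key generator: `index_keys` is a hypothetical helper, and using a set simply models the "generate a single key" possibility (the other possibility, two identical inserts into the storage engine, would end up with the same visible result).

```python
def index_keys(doc, path):
    """Toy model: return the set of index keys a dotted path yields for one document.

    Assumptions (not taken from the server code): arrays are expanded at every
    level, a missing field yields None (standing in for null), and leaf values
    are hashable scalars.
    """
    def walk(value, parts):
        if isinstance(value, list):
            # Expand each array element; identical keys collapse in the set.
            keys = set()
            for item in value:
                keys |= walk(item, parts)
            return keys
        if not parts:
            return {value}
        if isinstance(value, dict) and parts[0] in value:
            return walk(value[parts[0]], parts[1:])
        return {None}

    return walk(doc, path.split("."))


# Same value at a[0] and a[1]: only one key remains, with no record of the positions.
print(index_keys({"a": [{"b": 1}, {"b": 1}]}, "a.b"))  # {1}
```

Nothing in the resulting key set says which array positions produced the value, which matches the behaviour described above.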
To give detail on the state you're interested in: the index is considered multikey and the multikey path for "a.b" is set:
{'backgroundSecondary': False,
 'head': 0,
 'multikey': True,
 'multikeyPaths': {'x.b': b'\x01\x00'},
 'ready': True,
 'spec': {'key': {'x.b': 1.0}, 'name': 'x.b_1', 'v': 2}}],
And there is a single item in the index for that document (not shown, but verified locally on a modern version).
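For reading the `multikeyPaths` value in that dump, a small Python sketch follows. It assumes the bytes hold one flag per component of the dotted field path, with a non-zero byte meaning that component was seen as an array; that is my reading of the format rather than something stated above, and `multikey_components` is a hypothetical helper name.

```python
def multikey_components(field_path, flags):
    """Pair each path prefix with its multikey flag.

    Assumption: `flags` carries one byte per dotted path component,
    and a non-zero byte marks that component as multikey.
    """
    parts = field_path.split(".")
    if len(parts) != len(flags):
        raise ValueError("expected one flag byte per path component")
    result = []
    for i, flag in enumerate(flags):  # iterating bytes yields ints
        prefix = ".".join(parts[: i + 1])
        result.append((prefix, bool(flag)))
    return result


print(multikey_components("x.b", b"\x01\x00"))
# [('x', True), ('x.b', False)]
```

Under that assumption, the dump says the first component is the array and the leaf `b` is not.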
> Does that mean that we cannot use this index for this query at all?
My understanding is that MongoDB cannot use the index for this query. I don't think that's related to the multikey-ness; rather, it's a quirk of the query language: the notation used for array matching is ambiguous with sub-object matching. Consider another document that the `{"a.0.b": 1}` query must return:
{ _id: ...,
"a": {"0": {"b": 1}}}
The `{"a.b": 1}` index generates a "null" key for that document. So using the index isn't sufficient for identifying that result.
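The ambiguity can be sketched with a simplified Python path matcher. This is an illustration only, not the server's matching code: `matches` is a hypothetical helper that treats a numeric path component as either an array position or a literal field name, depending on what it encounters in the document.

```python
def matches(doc, path, target):
    """Toy equality match of a dotted path against a document."""
    def walk(value, parts):
        if not parts:
            if isinstance(value, list):
                return target in value or value == target
            return value == target
        head, rest = parts[0], parts[1:]
        if isinstance(value, dict):
            # On a sub-document, "0" is just a field name.
            return head in value and walk(value[head], rest)
        if isinstance(value, list):
            # On an array, "0" can be a position...
            if head.isdigit() and int(head) < len(value):
                if walk(value[int(head)], rest):
                    return True
            # ...and any component can descend into embedded documents.
            return any(isinstance(item, dict) and walk(item, parts)
                       for item in value)
        return False

    return walk(doc, path.split("."))


array_doc = {"a": [{"b": 1}]}
object_doc = {"a": {"0": {"b": 1}}}

print(matches(array_doc, "a.0.b", 1))   # True
print(matches(object_doc, "a.0.b", 1))  # True
print(matches(object_doc, "a.b", 1))    # False
```

Both documents satisfy `{"a.0.b": 1}`, but only the array document is reachable through the "a.b" path, so an index on "a.b" alone cannot produce the full result set.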
Hope that helps,
Dan