Johannes Berg <joha...@sipsolutions.net> writes:
> I think he's talking about the sqlite index thoughts/work, which would
> in fact make this easier - today I think it would require rewriting the
> index.
Right, it doesn't have to be sqlite, but that's what I've been toying
with. The main purpose is to make it easier to handle various
operations (and enhancements), including deletes, and even more
importantly, to be able to handle additional information, like the
current metadata, safely.
Right now (extended) metadata is stored separately, because the
current index is an efficient, mmapped data structure (array
representation of a tree with inter-node pointers represented by array
offsets) that has no easy way to handle variable length data that might
change size -- without something like a rewrite (merge). That's why the
extended metadata was originally stored externally with a fixed length
integer pointer field linking the index entry to its (extended)
metadata.
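
As a rough illustration of that layout (not the actual index format: the
field names, widths, and the MetaStore class are all made up for this
sketch), a fixed-length entry can carry an integer id that points into a
separate, deduplicated store for the variable-length metadata:

```python
import struct

# Hypothetical fixed-length entry: mtime (u64), size (u64), meta_id (u32).
# The real index layout differs; this only shows the idea.
ENTRY = struct.Struct("<QQI")


class MetaStore:
    """Append-only, deduplicated store for variable-length metadata blobs."""

    def __init__(self):
        self._blobs = []  # id -> blob
        self._ids = {}    # blob -> id

    def add(self, blob: bytes) -> int:
        # Identical blobs share one id; stored blobs are never modified.
        if blob not in self._ids:
            self._ids[blob] = len(self._blobs)
            self._blobs.append(blob)
        return self._ids[blob]

    def get(self, meta_id: int) -> bytes:
        return self._blobs[meta_id]


store = MetaStore()
# The entry itself stays fixed-size no matter how large the metadata is.
entry = ENTRY.pack(1700000000, 4096, store.add(b"owner=root;mode=0644"))
mtime, size, meta_id = ENTRY.unpack(entry)
```

Because every entry has the same size, entry N sits at offset
N * ENTRY.size, which is what makes the array/mmap representation cheap.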
Updates to existing index entries (mtime changes, etc. -- basic
fixed-length stat data is in the main index) happen in-place via direct
array writes to the mmapped data structure, and so there's no
(pedestrian) way to avoid the potential for data races, "torn" updates,
etc. The integer value linking an entry to its externally stored,
extended metadata is also updated that way whenever the metadata
changes. We change the "id" because the extended metadata entries are
deduplicated and we never modify one in-place.
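
To make the hazard concrete, here is a minimal sketch of an in-place
update through an mmap, again assuming a made-up entry layout (mtime,
size, meta_id) and a hypothetical update_entry helper. The two field
writes are independent, so nothing makes them atomic as a pair:

```python
import mmap
import struct
import tempfile

# Hypothetical fixed-length entry: mtime (u64), size (u64), meta_id (u32).
ENTRY = struct.Struct("<QQI")
META_ID_OFFSET = 16  # meta_id follows the two 8-byte fields


def update_entry(buf, index, mtime, meta_id):
    """Overwrite two fields of entry `index` directly in the mapped file."""
    off = index * ENTRY.size
    struct.pack_into("<Q", buf, off, mtime)
    # A crash or a concurrent reader at this point would observe the new
    # mtime paired with the old meta_id (a "torn" update).
    struct.pack_into("<I", buf, off + META_ID_OFFSET, meta_id)


with tempfile.TemporaryFile() as f:
    f.write(ENTRY.pack(100, 4096, 0) + ENTRY.pack(200, 10, 1))
    f.flush()
    with mmap.mmap(f.fileno(), 0) as buf:
        update_entry(buf, 1, 250, 7)
        result = ENTRY.unpack_from(buf, ENTRY.size)
```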
I believe the separation between the index proper, and the metadata
store is the source of some of the "broken index" problems we see
reported periodically. With sqlite, of course, it'd be easy to handle
those updates transactionally.
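
For comparison, a sketch of the same kind of update with sqlite, using
Python's standard sqlite3 module. The schema, table names, and the update
helper are assumptions for illustration only, not a proposed design. The
entry row and its metadata row change inside one transaction, so either
both changes land or neither does:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: deduplicated metadata plus entries that reference it.
conn.execute("CREATE TABLE meta (id INTEGER PRIMARY KEY, data BLOB UNIQUE)")
conn.execute(
    "CREATE TABLE entry ("
    " path TEXT PRIMARY KEY,"
    " mtime INTEGER,"
    " meta_id INTEGER REFERENCES meta(id))"
)


def update(conn, path, mtime, data):
    """Record new stat data and metadata for `path` in a single transaction."""
    with conn:  # commits on success, rolls back if an exception escapes
        # Existing metadata rows are reused rather than modified.
        conn.execute("INSERT OR IGNORE INTO meta (data) VALUES (?)", (data,))
        (meta_id,) = conn.execute(
            "SELECT id FROM meta WHERE data = ?", (data,)
        ).fetchone()
        conn.execute(
            "INSERT OR REPLACE INTO entry (path, mtime, meta_id)"
            " VALUES (?, ?, ?)",
            (path, mtime, meta_id),
        )


update(conn, "/etc/passwd", 100, b"mode=0644")
update(conn, "/etc/hosts", 200, b"mode=0644")  # shares the metadata row
```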
> But I think you're right - I'm not sure we _should_ be making semantic
> changes when the format changes, seems that should be orthogonal. OTOH,
> the biggest reason for having them linger in the index is that it
> requires rewriting and that's harder than just newly indexing/updating.
Right, I'd been preserving the existing semantics, though I expected we
might want to discuss deletes if some of the current semantics were
partially a result of the storage method.
I've vaguely wondered about what semantics I think I might *want* with
respect to deletes. Offhand, conceptually, I could imagine wanting them
to disappear after the next save that includes their parent directory,
but given the fact (for example) that you can save arbitrary individual
paths...
> If I find time later, I might take a look at just implementing something
> like --clear-deleted (probably better called --prune-deleted?)
If there's no obvious way we already think we should handle deletes more
implicitly, then that sounds plausible, and might be plausible anyway.
But I'd want to give it a bit of thought if we have any suspicions that
we might not want/need to keep the option/semantics if we were to rework
the index.