On 7/16/2020 4:29 AM, David Storrs wrote:
> tl;dr
> Can anyone recommend a data structure that is ordered and supports
> efficient reordering, insertion at arbitrary location, and deletion?
>
> Long form:
>
> I'm working on an operation-log reconciliation problem, where each
> operation is one of:
>
> File-Create P H
> File-Update P H
> File-Delete P H
> Folder-Create P
> Folder-Delete P
>
> P = path
> H = hash of the file (e.g. md5)
>
> Operation messages travel over the network, meaning that they could
> arrive out of order. (They could also be missed entirely, but that's
> a separate problem that I'm not working on yet.)
Shouldn't be a problem: "at-least-once" and "at-most-once" semantics
both are pretty easy. It's the "exactly-once" that is the problem.
Hopefully you have no need for that.
> Specifically, I want to be able to take a series of messages and
> collapse them where appropriate. For example:
>
> File-Update P H1 -> H2
> File-Create P1 H1
> Result after collapse:
> '(File-Create P1 H2)
>
> File-Create P H1
> File-Delete P H1
> Result after collapse:
> '()
>
> File-Delete P X
> File-Create P X
> Result after collapse:
> '()
> File-Delete P1 H1
> File-Create P2 H2
> File-Create P1 H1
> Result after collapse:
> '(File-Create P2 H2)
>
> I've been mulling over various ways to handle all of this and digging
> around to find examples of other people doing it, but I'm wondering if
> there's a data structure or existing algorithm that will handle it
> cleanly. Does anyone know of such a thing?
What if messages are lost permanently, e.g., due to hardware crash?
What it you receive a create but a corresponding delete or update is
lost - then your information / picture of the file system state is wrong.
What if you receive a file delete without a corresponding create? In the
absence of other information, can you even assume there *was* a create?
If these messages are sent in response to user actions, can they ever be
sent mistakenly?
The problem seems under-specified. Can you say more about the real purpose?
I can think of some ways to keep an index of the file system and track
changes made to it ... but at best that could provide a snapshot in time
rather than an ongoing audit log. And it seems like a log would not be
terribly useful without all the information.
George