Best data structure for ordered data set with insertion and reordering?


David Storrs

Jul 16, 2020, 4:29:36 AM
to Racket Users
tl;dr
Can anyone recommend a data structure that is ordered and supports efficient reordering, insertion at arbitrary location, and deletion?

Long form:

I'm working on an operation-log reconciliation problem, where each operation is one of:

  File-Create   P H
  File-Update   P H
  File-Delete   P H
  Folder-Create P
  Folder-Delete P

P = path
H = hash of the file (e.g. md5)

Operation messages travel over the network, meaning that they could arrive out of order.  (They could also be missed entirely, but that's a separate problem that I'm not working on yet.)

Specifically, I want to be able to take a series of messages and collapse them where appropriate.  For example:

  File-Update P1 H1 -> H2
  File-Create P1 H1
Result after collapse:  
  '(File-Create P1 H2)

  File-Create P H1
  File-Delete P H1
Result after collapse:
  '()

  File-Delete P X
  File-Create P X
Result after collapse:
  '()
 
  File-Delete P1 H1
  File-Create P2 H2
  File-Create P1 H1
Result after collapse:
  '(File-Create P2 H2)

I've been mulling over various ways to handle all of this and digging around to find examples of other people doing it, but I'm wondering if there's a data structure or existing algorithm that will handle it cleanly.  Does anyone know of such a thing?
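To make the intent concrete, here's a rough sketch of the pairwise collapse I have in mind (in Python rather than Racket, and with made-up tuple shapes — purely illustrative):

```python
from collections import defaultdict

def collapse(ops):
    """Pairwise-collapse a batch of operations, keyed by path.

    Ops are tuples:
      ("File-Create", path, hash)
      ("File-Delete", path, hash)
      ("File-Update", path, old_hash, new_hash)
    """
    pending = defaultdict(list)  # path -> uncancelled ops, in arrival order

    def take(path, kind, h):
        # Find and remove one pending op of `kind` whose hash matches h.
        for p in pending[path]:
            if p[0] == kind and p[2] == h:
                pending[path].remove(p)
                return p
        return None

    for op in ops:
        kind, path = op[0], op[1]
        if kind == "File-Delete" and take(path, "File-Create", op[2]):
            continue  # Create + Delete of the same content cancel out
        if kind == "File-Create":
            if take(path, "File-Delete", op[2]):
                continue  # Delete + Create (arrived out of order) cancel
            upd = take(path, "File-Update", op[2])
            if upd:  # an Update arrived before its Create:
                op = ("File-Create", path, upd[3])  # keep Create, new hash
        pending[path].append(op)

    return [op for at_path in pending.values() for op in at_path]
```

This handles the four examples above, but it's linear in pending-ops-per-path per message and says nothing yet about Folder ops or timestamps.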

evdubs

Jul 16, 2020, 5:23:08 AM
to Racket Users
Do you think you'll need to try to identify the order that the events were created in?

What if user A does:

$ touch file.txt
$ rm file.txt

And what if user A had separately done:

$ rm file.txt
Error: file not found
$ touch file.txt

Would those operations both potentially create File-Create P H1 ; File-Delete P H1 events when things are allowed to be out of order? Don't they result in different states for the filesystem? If you also sent along the time of the operation (or maybe some sequence number), won't that let you keep things ordered and not make decisions that might be ambiguous?
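For example (hypothetical message shapes), a per-sender sequence number makes restoring creation order trivial on the receiving side:

```python
# Hypothetical wire format: (sequence_number, operation). The receiver
# sorts by sequence number to restore the order the events were created
# in, before attempting any collapsing.
messages = [(3, "File-Delete file.txt"),
            (1, "File-Create file.txt"),
            (2, "File-Update file.txt")]
in_order = [op for _, op in sorted(messages)]
```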

David Storrs

Jul 16, 2020, 8:41:49 AM
to evdubs, Racket Users


On Thu, Jul 16, 2020, 5:23 AM evdubs <evd...@gmail.com> wrote:
Do you think you'll need to try to identify the order that the events were created in?

What if user A does:

$ touch file.txt
$ rm file.txt

And what if user A had separately done:

$ rm file.txt
Error: file not found
$ touch file.txt

You meant for those to be user A and user B, right?


Would those operations both potentially create File-Create P H1 ; File-Delete P H1 events when things are allowed to be out of order? Don't they result in different states for the filesystem? If you also sent along the time of the operation (or maybe some sequence number), won't that let you keep things ordered and not make decisions that might be ambiguous?

Fair question. Yes, time will be sent along with the message. I was trying to boil things down as far as possible and perhaps I boiled too long.

Of course, that raises its own issues about different timezones etc, but those are separate from the issue at hand.

To answer your specific question: That sequence of events results in a conflict that will be flagged for human review.


George Neuner

Jul 16, 2020, 10:09:44 AM
to David Storrs, Racket Users

On 7/16/2020 4:29 AM, David Storrs wrote:
> tl;dr
> Can anyone recommend a data structure that is ordered and supports
> efficient reordering, insertion at arbitrary location, and deletion?
>
> Long form:
>
> I'm working on an operation-log reconciliation problem, where each
> operation is one of:
>
>   File-Create   P H
>   File-Update   P H
>   File-Delete   P H
>   Folder-Create P
>   Folder-Delete P
>
> P = path
> H = hash of the file (e.g. md5)
>
> Operation messages travel over the network, meaning that they could
> arrive out of order.  (They could also be missed entirely, but that's
> a separate problem that I'm not working on yet.)

Shouldn't be a problem:  "at-least-once" and "at-most-once" semantics
are both pretty easy.  It's "exactly-once" that is the problem.
Hopefully you have no need for that.


> Specifically, I want to be able to take a series of messages and
> collapse them where appropriate.  For example:
>
>   File-Update P1 H1 -> H2
>   File-Create P1 H1
> Result after collapse:
>   '(File-Create P1 H2)
>
>   File-Create P H1
>   File-Delete P H1
> Result after collapse:
>   '()
>
>   File-Delete P X
>   File-Create P X
> Result after collapse:
>   '()
>
>   File-Delete P1 H1
>   File-Create P2 H2
>   File-Create P1 H1
> Result after collapse:
>   '(File-Create P2 H2)
>
> I've been mulling over various ways to handle all of this and digging
> around to find examples of other people doing it, but I'm wondering if
> there's a data structure or existing algorithm that will handle it
> cleanly.  Does anyone know of such a thing?

What if messages are lost permanently, e.g., due to hardware crash?

What if you receive a create but a corresponding delete or update is
lost?  Then your information / picture of the file system state is wrong.

What if you receive a file delete without a corresponding create? In the
absence of other information, can you even assume there *was* a create? 
If these messages are sent in response to user actions, can they ever be
sent mistakenly?

The problem seems under-specified.  Can you say more about the real purpose?


I can think of some ways to keep an index of the file system and track
changes made to it ... but at best that could provide a snapshot in time
rather than an ongoing audit log.  And it seems like a log would not be
terribly useful without all the information.

George

David Storrs

Jul 16, 2020, 11:44:37 AM
to George Neuner, Racket Users
On Thu, Jul 16, 2020 at 10:09 AM George Neuner <gneu...@comcast.net> wrote:

On 7/16/2020 4:29 AM, David Storrs wrote:

The problem seems under-specified.  Can you say more about the real purpose?

Basic version:  It's a peer-to-peer encrypted swarmed file-sharing system that presents like Dropbox on the front end (i.e. "make a change to the filesystem on peer A and peers B-Z will replicate that change").  On the back end it works something like BitTorrent in that files are sent in chunks, but it offers functionality that BitTorrent does not, such as encrypted transfer, WoT authentication, etc.

What if messages are lost permanently, e.g., due to hardware crash?

What if you receive a create but a corresponding delete or update is
lost?  Then your information / picture of the file system state is wrong.

What if you receive a file delete without a corresponding create? In the
absence of other information, can you even assume there *was* a create? 
If these messages are sent in response to user actions, can they ever be
sent mistakenly?


The ultimate answer to these questions is "If things get out of sync in a way that the system cannot resolve, it will be flagged for a human to resolve."  There are things we do to mitigate them -- for example, a write-ahead log for messages received from peers -- but we acknowledge that we cannot resolve 100% of situations automatically.  Neither can any other file replication service (Dropbox, Box.com, etc.).

Also relevantly, differences are reconciled across multiple peers.  If there are 5 peers in your replication set and the other 4 agree that there should be a file at path P but you don't have one, then it's safe to assume that you missed a File-Create message.  And yes, that comes with issues of its own (Q: What if it was deleted on your machine and none of the others got your File-Delete because you crashed before sending it?  A: Worst case, the file gets recreated and the user deletes it again.  Also, move files to a Trash folder in response to a File-Delete and don't actually delete them for a certain period of time) but again we fall back to human resolution.
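As a sketch of what I mean by majority reconciliation (the names and the 50% threshold here are illustrative, not what we actually ship, and it's Python rather than Racket):

```python
from collections import Counter

def reconcile(local_state, peer_states, quorum=0.5):
    """Flag paths where a strict majority of peers disagree with us.

    States are dicts mapping path -> hash (absent path = no file).
    Returns (path, consensus_hash) pairs we probably missed a message
    for; a consensus_hash of None means the majority deleted the file.
    """
    suspects = []
    all_paths = set(local_state) | {p for s in peer_states for p in s}
    for path in sorted(all_paths):
        votes = Counter(s.get(path) for s in peer_states)
        consensus, count = votes.most_common(1)[0]
        if count / len(peer_states) > quorum and local_state.get(path) != consensus:
            suspects.append((path, consensus))
    return suspects
```

Anything flagged here would go through the "assume a missed message, else escalate to a human" path described above.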
 

George Neuner

Jul 16, 2020, 11:21:17 PM
to David Storrs, Racket Users

Hi David,


On 7/16/2020 11:44 AM, David Storrs wrote:
On Thu, Jul 16, 2020 at 10:09 AM George Neuner <gneu...@comcast.net> wrote:

The problem seems under-specified.  Can you say more about the real purpose?

Basic version:  It's a peer-to-peer encrypted swarmed file-sharing system that presents like Dropbox on the front end (i.e. "make a change to the filesystem on peer A and peers B-Z will replicate that change").  On the back end it works something like BitTorrent in that files are sent in chunks, but it offers functionality that BitTorrent does not, such as encrypted transfer, WoT authentication, etc.

Interesting.  So I'm guessing your problem is to (compactly) represent the state of the shared space.

Do you plan on having index servers, or are you aiming for a fully distributed solution?  And, if distributed, do you want each node to maintain its own state picture of the shared space, or were you thinking that nodes could just snoop admin broadcasts looking for mention of data they don't currently have?  [Your question about how to pair / collapse messages suggests you might be considering a snoopy solution.]

Asking because keeping a state picture has scalability issues, a snoopy solution has complexity issues, and (depending on latency) both have issues with performing unnecessary work.  In any event, I have some suggestions.



Snoopy is the more interesting case.  You start with a queue of file operations to be done as gleaned from the admin messages - mkdir, rmdir, fetch a file, delete a file, etc. - in whatever order the messages were received. 

Separately, you maintain a (hash table) mapping from pathnames to a list of queue nodes that operate on that object.  The map should use weak references so that nodes can safely be removed from the queue and discarded without also needing to update the map.  If queue processing gets to some operation first, any map reference to it will dissolve (be replaced by #f). 

When a message is received, you look up the pathname in the map, and if a complementary operation is found in the queue, you remove and discard it.  [You can also remove references in the map or just let them dissolve, depending on your handling.]  Then simply discard the message.

Otherwise you queue whatever operation the message indicates and add a reference to the queue node under the object's pathname in the map.

Extra complexity comes from having to notice when map entries (pathnames) have no operations left in the queue.  Weak references don't just disappear - they are changed to #f when the referenced object is no longer reachable - however, AFAICT there is no hashtable variant that permits weak reference values, so you have to use weak-boxes, and those continue to exist even if the objects they reference are gone.  Useless map entries will need to be identified and removed somehow.
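Here is roughly what I mean, sketched in Python for brevity (weakref.ref standing in for Racket's weak-boxes, and Python's None for #f; all names are illustrative):

```python
import weakref
from collections import deque

class OpNode:
    """One queued file operation (a class, so it can be weakly referenced)."""
    def __init__(self, kind, path):
        self.kind, self.path = kind, path

COMPLEMENT = {"File-Create": "File-Delete", "File-Delete": "File-Create"}

queue = deque()  # operations awaiting execution, in arrival order
index = {}       # path -> list of weakref.ref to queue nodes

def on_message(kind, path):
    # If a complementary op for this path is still queued, cancel it
    # and discard this message instead of queueing anything.
    for ref in index.get(path, []):
        node = ref()  # dereference; None if the node was already processed
        if node is not None and node.kind == COMPLEMENT.get(kind):
            queue.remove(node)
            index[path].remove(ref)
            return
    node = OpNode(kind, path)
    queue.append(node)
    index.setdefault(path, []).append(weakref.ref(node))
```

Once queue processing pops and drops a node, the corresponding weakref in the index dissolves to None on its own; a periodic sweep can then drop map entries whose lists hold only dead references.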


Modeling the filesystem can be done rather simply with a trie in which folders are represented by mutable hash tables and files by structures.  You can combine this with the operation queue above, but in this case lookups can be done in the trie and queue references kept in the trie nodes.  And the trie provides a snapshot of the current state which may be useful for other purposes.
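For example (a minimal sketch, with Python dicts standing in for Racket's mutable hash tables and a class for the file structure):

```python
class FileEntry:
    """Leaf node: a file's latest known hash plus queued-op references."""
    def __init__(self, hash_):
        self.hash = hash_
        self.pending_ops = []  # queue references kept right in the node

def trie_insert(root, parts, entry):
    """Folders are plain dicts; walk/create them, then attach the leaf."""
    *folders, name = parts
    node = root
    for part in folders:
        node = node.setdefault(part, {})  # create folders on demand
    node[name] = entry

def trie_lookup(root, parts):
    node = root
    for part in parts:
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node
```

Lookups walk one hash table per path component, and iterating the trie gives you the filesystem snapshot for free.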


The trick in either case is processing latency: you don't want to wait too long, but if you really want to avoid unnecessary work you need to delay performing file operations long enough that complementary messages are likely to be received.




What if messages are lost permanently, e.g., due to hardware crash?

What if you receive a create but a corresponding delete or update is
lost?  Then your information / picture of the file system state is wrong.

What if you receive a file delete without a corresponding create? In the
absence of other information, can you even assume there *was* a create? 
If these messages are sent in response to user actions, can they ever be
sent mistakenly?


The ultimate answer to these questions is "If things get out of sync in a way that the system cannot resolve, it will be flagged for a human to resolve."  There are things we do to mitigate them -- for example, a write-ahead log for messages received from peers -- but we acknowledge that we cannot resolve 100% of situations automatically.  Neither can any other file replication service (Dropbox, Box.com, etc.).

Also relevantly, differences are reconciled across multiple peers.  If there are 5 peers in your replication set and the other 4 agree that there should be a file at path P but you don't have one, then it's safe to assume that you missed a File-Create message.  And yes, that comes with issues of its own (Q: What if it was deleted on your machine and none of the others got your File-Delete because you crashed before sending it?  A: Worst case, the file gets recreated and the user deletes it again.  Also, move files to a Trash folder in response to a File-Delete and don't actually delete them for a certain period of time) but again we fall back to human resolution.


Are you considering eavesdropping on multicast?  That may work fine on a LAN ... but for wide area, the variability in UDP complicates maintaining a consistent global state picture.  For a real replication system I think you really will want a reliable, causal delivery mechanism.  And if large scalability is an issue, you will want an adaptive topology that limits the number of connections.


YMMV.  Hope this sparks a good idea.
George

David Storrs

Jul 17, 2020, 11:14:08 AM
to George Neuner, Racket Users
Thanks George.  Much appreciated.