Deletion Strategy

James Norman

Nov 10, 2014, 11:17:47 PM
to mobile-c...@googlegroups.com
I have a few questions about the best strategy for deleting and purging documents.  The goal is to keep the mobile device lean and to reduce the amount of data sent when a user logs in on a new device.

Currently, without any purging, all activity over the lifetime of the user account is synced to the relevant channels, and the database can grow without bounds.  Users can also share data with other users, so the data on a device wasn't necessarily created by that device's user.

1) I can purge a document from the device easily, but what is a good strategy for purging documents that were sent to the device via a sync?  Should I add a change listener to the database and, if a document has a deleted revision and meets some purge criteria, purge it from that device as well?

2) The second, and critical, issue is that all the documents over the lifetime of the channel are sent to the device via a sync.  If a user creates and deletes thousands of documents over time and then logs in on a different device, all of those revisions are sent to that device's database, even though the documents were deleted.  There may be very little relevant data, yet a large amount of data may still be transferred.
I can't think of a good strategy for this.  Ideally, I think the Sync Gateway should send only documents that don't have a deleted revision, skipping any document that was both created and deleted since that device+channel last synced.  We have use cases where users log in from their other devices, and we have no way to limit the amount of data sent to those devices.
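
The change-listener idea in question 1 could be sketched roughly as follows. This is a minimal model in plain Python, not Couchbase Lite code: `LocalDatabase`, `Change`, `meets_purge_criteria`, and the `from_replication` flag are all hypothetical names invented for illustration. It also assumes that a locally created tombstone should be kept, since it may still need to be pushed to the server.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Change:
    """One change notification (hypothetical shape, not a real API)."""
    doc_id: str
    deleted: bool           # True when the current revision is a tombstone
    from_replication: bool  # True when the change arrived via a pull sync


@dataclass
class LocalDatabase:
    """Stand-in for the on-device database."""
    docs: Dict[str, dict] = field(default_factory=dict)
    listeners: List[Callable[["LocalDatabase", Change], None]] = field(
        default_factory=list
    )

    def apply_change(self, change: Change, body: dict) -> None:
        # Store the new current revision, then notify every listener.
        self.docs[change.doc_id] = body
        for listener in self.listeners:
            listener(self, change)

    def purge(self, doc_id: str) -> None:
        # Remove the document from this device only.
        self.docs.pop(doc_id, None)


def meets_purge_criteria(change: Change) -> bool:
    # Placeholder for application-specific rules (age, type, owner, ...).
    return True


def purge_synced_tombstones(db: LocalDatabase, change: Change) -> None:
    # Only purge tombstones that came from the server; a tombstone created
    # locally is assumed to still need pushing, so it is left alone.
    if change.deleted and change.from_replication and meets_purge_criteria(change):
        db.purge(change.doc_id)


db = LocalDatabase()
db.listeners.append(purge_synced_tombstones)
db.apply_change(Change("todo-1", deleted=False, from_replication=True), {"title": "x"})
db.apply_change(Change("todo-1", deleted=True, from_replication=True), {"_deleted": True})
```

This only addresses local storage on the device. It does nothing to stop the tombstones from being transferred in the first place, which is the subject of question 2.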

A simple example is the ToDo app.  If I create and delete thousands of items over a few months and then log into my account on my other devices, I receive all of the old deleted items.

Hopefully there is an easy way around this.  Let me know if you have any ideas or need some clarification.

-james

James Norman

Nov 11, 2014, 1:01:44 PM
to mobile-c...@googlegroups.com
The only way I can think of to handle issue 2 right now is to have a service on the server that deletes the documents from Couchbase Server directly so they don't get re-synced to the clients, and then to provide a REST call the clients can hit to get the IDs of the docs that were deleted so they can purge them locally.

It's not very elegant, but it's the only option I can think of.  Otherwise, users will log in from their other devices and it could take 10+ minutes to sync all the old revisions, only to purge them later.  I want to make sure only the relevant information is synced.

Let me know if anyone has any ideas for handling this.  I would think this is a fairly common use case.  Has anyone else run into an issue like this?

James Norman

Nov 17, 2014, 7:00:31 PM
to mobile-c...@googlegroups.com
I'm starting to implement a strategy for this.  Let me know if you think it will work or if I'm missing something obvious.  Has anyone else run into this issue?  It seems it would show up immediately once a user logs in on a new device.

To restate the issue: all of the documents and revisions over the lifetime of a channel are sent to the client, even if they have a deleted revision.  If a user creates and deletes data, purging it locally over time, and then logs in on a new device, all of that data is re-sent to the new device.  If they used the app for chat over a year and then logged in on a new device, all of that data would be sent again, potentially gigabytes of it.

Ideally, I think the Sync Gateway should not send a document if it has a deleted revision and was both created and deleted since the last sync.  Another option would be a way to delete or archive a document in the Sync Gateway that purges it from existing clients and does not re-sync it to new clients.

My workaround is to have a service on the server that decides which documents to delete, for example documents that are old and can be archived, or that have a deleted revision.  The service will delete or archive the document directly in Couchbase Server so that the Sync Gateway never syncs it.  It will also need to tell the clients to purge the recently deleted document, either through a REST call or, using the existing Couchbase architecture, by creating a document that carries the purged ID as a property.
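
The last option, a document that carries purged IDs as a property, might look something like the sketch below. The document type `purge_notice` and the field `purged_ids` are invented for illustration, and the local database is modeled as a plain dictionary rather than any real client API.

```python
from typing import Dict, Iterable, List

# Hypothetical document type used to announce purges to clients.
PURGE_NOTICE_TYPE = "purge_notice"


def build_purge_notice(purged_ids: Iterable[str]) -> dict:
    """Server side: build the body of a notice listing the purged IDs."""
    return {"type": PURGE_NOTICE_TYPE, "purged_ids": sorted(set(purged_ids))}


def apply_purge_notice(local_docs: Dict[str, dict], notice: dict) -> List[str]:
    """Client side: purge every listed ID that exists locally.

    Returns the IDs that were actually removed from this device.
    """
    if notice.get("type") != PURGE_NOTICE_TYPE:
        return []  # not a purge notice, nothing to do
    removed = []
    for doc_id in notice.get("purged_ids", []):
        if local_docs.pop(doc_id, None) is not None:
            removed.append(doc_id)
    return removed


# Usage: the server announces two purged documents; the client holds one.
notice = build_purge_notice(["chat-17", "chat-42", "chat-17"])
local_docs = {"chat-42": {"text": "old message"}, "chat-99": {"text": "recent"}}
removed = apply_purge_notice(local_docs, notice)
```

One design question this leaves open is how long the notices themselves are kept, since they are ordinary documents and would otherwise accumulate in the same way.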

As it is, there's really no way to sync deletions, and the data can grow without bounds.  I understand the reason, and that deletions are just revisions, but there has to be a way to limit the amount of data that is sent to a new device, and a way to sync deletions to other clients so the database doesn't grow without bounds.

Thanks for any advice -james

Jens Alfke

Nov 18, 2014, 12:58:39 PM
to mobile-c...@googlegroups.com

On Nov 10, 2014, at 8:17 PM, James Norman <james....@gmail.com> wrote:

2) The second, and critical issue, is that all the documents over the lifetime of the channel are sent to the device via a sync.  If a user creates and deletes thousands of documents over time, then logs into a different device all of those revisions are sent to the database, even if they were deleted.  There may be minimal relevant data but tons of data may still be sent to the database.  

To be clear: only the "tombstone" revisions will be sent. The replication protocol only transfers current revisions, not replaced ones.
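
As a rough illustration of that point, here is a simplified model in Python. It treats each document as a linear revision history and ignores conflicts and branching revision trees, which the real protocol has to handle. The function name `revisions_sent_to_new_device` is made up for this sketch; `_deleted` is the property CouchDB-style databases use to mark a tombstone.

```python
from typing import Dict, List

# Each document is modeled as its revision history, oldest first.
RevisionHistory = List[dict]


def revisions_sent_to_new_device(
    server_docs: Dict[str, RevisionHistory],
) -> Dict[str, dict]:
    """Return what a fresh pull transfers in this simplified model:
    only the current (last) revision of each document."""
    return {
        doc_id: history[-1]
        for doc_id, history in server_docs.items()
        if history
    }


server_docs = {
    # Edited twice, then deleted: only the tombstone is current.
    "todo-1": [{"title": "milk"}, {"title": "oat milk"}, {"_deleted": True}],
    # Still live: the latest body is current.
    "todo-2": [{"title": "bread"}],
}
sent = revisions_sent_to_new_device(server_docs)
```

In this model a deleted document costs one small tombstone per document on a new device, not its full edit history, although thousands of tombstones still add up.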

Ideally I think the Sync Gateway should only send documents that don't have a deleted revision, if the document was created and deleted in the timeframe since the last sync of that device+channel. 

The replication algorithm (which comes from CouchDB) is designed for multi-master systems, not just client/server. Some parts of it that may seem unnecessary in the basic client/server case, like what you're describing, are necessary in other cases where revisions can be transferred across an arbitrary directed graph. We're not really making use of that functionality yet, but there are some very important scenarios we want to be able to address in the future, like peer-to-peer, so we can't optimize away behaviors that are needed for them. (We also want to preserve compatibility with CouchDB, PouchDB, Cloudant, etc.)

—Jens

J. Chris Anderson

Dec 5, 2014, 12:17:39 PM
to mobile-c...@googlegroups.com


On Monday, November 17, 2014 4:00:31 PM UTC-8, James Norman wrote:
I'm starting to implement a strategy for this.  Let me know if you think this may work or if I'm missing something obvious.  Has anyone else ran into this issue?  It seems it would be immediate once a user logs into a new device.

The issue again is that all of the documents and revisions over the lifetime of a channel are sent to the client, even if they have a deleted revision.  If a user creates and deletes data and purges it locally over time, then logs into a new device, all of this data is re-sent to the new device.  If they were to use this as a chat application over a year, then log into a new device all the data will be sent again, which could potentially be GB of data.

Ideally I think the sync-gateway should not send a document if it has a deleted revision and the document was created and deleted in the time frame since the last sync.  Another option would be to have a way to delete or archive a document in the sync gateway that would purge it from the clients and not re-sync it to new clients.


I like this idea of a special kind of purge tombstone, something like a delete, that Sync Gateway uses to tell databases to purge a document locally. I think this would be different from deletes, which you might want to keep around for a while, especially in the p2p case, where you want clients to be able to tell each other to delete stuff, not just listen to the server for deletes.

But if the server has a worker that takes any document that's been deleted for more than 2 weeks (and meets whatever application criteria you insert) and turns it into one of these purge tombstones, then clients can avoid fetching its content via revs diff. The purge could even be communicated in the changes feed.
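
The selection step of such a worker could be sketched like this. Nothing here is an existing Sync Gateway feature: the `deleted_at` timestamp is an assumption (the application would have to record it when deleting), and `select_for_purge` and `app_criteria` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Dict, List

PURGE_AFTER = timedelta(weeks=2)  # the two-week threshold from the text


def select_for_purge(
    docs: Dict[str, dict],
    now: datetime,
    app_criteria: Callable[[dict], bool] = lambda doc: True,
) -> List[str]:
    """Return IDs of tombstones deleted more than PURGE_AFTER ago.

    Assumes each tombstone carries a hypothetical `deleted_at` datetime
    written by the application at deletion time.
    """
    cutoff = now - PURGE_AFTER
    return sorted(
        doc_id
        for doc_id, doc in docs.items()
        if doc.get("_deleted")
        and doc.get("deleted_at") is not None
        and doc["deleted_at"] < cutoff
        and app_criteria(doc)
    )


now = datetime(2014, 12, 5, tzinfo=timezone.utc)
docs = {
    "old": {"_deleted": True, "deleted_at": now - timedelta(weeks=3)},
    "recent": {"_deleted": True, "deleted_at": now - timedelta(days=2)},
    "live": {"title": "still in use"},
}
to_purge = select_for_purge(docs, now)
```

Recent deletes stay as ordinary tombstones, so peers can still relay them to each other, and only the older ones become candidates for the purge-tombstone treatment.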

Chris