Best data model for large content-sharing application with Sync Gateway and CBL

Jakob Hoydis

Jul 18, 2014, 1:08:43 PM
to mobile-c...@googlegroups.com
Hi,

We are currently building a platform for content sharing, where every user has access to a subset of all available documents. The access rights for each document and user can change. The number of documents is much larger than the number of users (a ratio of 100:1).

From my understanding, Sync Gateway offers two ways to handle this problem:

A - Every user account has its own channel and each document has a channel property which contains an array of all the channels (i.e. users) to which it belongs. The sync function then maps the document to all the listed channels.
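
A minimal model of option A in Python, assuming per-user channels named `user_<name>` and a `channels` property on each document (both naming choices are illustrative). Real sync functions are JavaScript and call the built-in `channel()`; the local `channel()` below only records where the document would be routed.

```python
from typing import Any


def sync_option_a(doc: dict[str, Any]) -> list[str]:
    """Model of an option-A sync function: route the doc to every listed user channel."""
    routed: list[str] = []

    def channel(names: list[str]) -> None:
        # Stand-in for the sync function's built-in channel(); just records routing.
        routed.extend(names)

    channel(doc.get("channels", []))
    return routed


doc = {"_id": "report42", "title": "Q3 report", "channels": ["user_alice", "user_bob"]}
print(sync_option_a(doc))  # ['user_alice', 'user_bob']
```

The access list sits in the document body, so every client that receives the document also receives the list, and editing it creates a new revision.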

B - Each document has its own channel and we give the users the access rights to the different document-channels through a sync function or the Admin API.
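
A toy model of option B, assuming per-document channels named `doc_<id>`; the class and method names are hypothetical. `grant()` corresponds to an `access(user, channel)` call in a sync function, or to updating the user's channels through the Admin API.

```python
from collections import defaultdict


class OptionBModel:
    """Toy model of option B: one channel per document, per-user grants."""

    def __init__(self) -> None:
        self.doc_channel: dict[str, str] = {}  # doc id -> that doc's own channel
        self.grants: defaultdict[str, set[str]] = defaultdict(set)  # user -> channels

    def put_doc(self, doc_id: str) -> None:
        # A real sync function would call channel("doc_" + doc._id) here.
        self.doc_channel[doc_id] = f"doc_{doc_id}"

    def grant(self, user: str, doc_id: str) -> None:
        # Corresponds to access(user, channel) or an Admin API grant.
        self.grants[user].add(f"doc_{doc_id}")

    def visible_docs(self, user: str) -> set[str]:
        chans = self.grants.get(user, set())
        return {d for d, c in self.doc_channel.items() if c in chans}


m = OptionBModel()
for d in ("a", "b", "c"):
    m.put_doc(d)
m.grant("alice", "a")
m.grant("alice", "c")
print(sorted(m.visible_docs("alice")))  # ['a', 'c']
```

The number of channels equals the number of documents, and each user's channel set grows with every grant, which is the scaling concern with B.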

The problems we see with A are:
  1. we do not want to store a channel array in each document, since this information should be hidden from the user;
  2. updating the channel array creates a new revision of the document, which triggers an unnecessary replication to all users who already have access to it;
  3. there is currently no Admin API call that would let us assign a list of document IDs to one or more channels, e.g. assign([docId1,docId2,...],[channel1,...]). This would be similar to the method 'channel([channel1,channel2,...])' that is available in the sync function.

The problem with B is that the number of documents is much larger than the number of users, so we will end up with a very large number of channels overall, and many channels per user (of course, we could bundle all the channels a user has access to into a per-user role, but this does not solve the problem). For the moment, we are not sure how this solution will scale.

Additionally, we need to treat the case where a user loses access to a document. Depending on the chosen solution, we end up with the following situation:

A - The document will be removed from a user's channel. Sync Gateway will communicate this to CBL as a new revision of the document with a deleted-property. This event can be detected on any device, such that the document can be deleted locally.

B - The user loses access to the document's channel. Currently, Sync Gateway does not trigger any event which would inform the user that the document should be deleted from its local database. One needs to handle this problem separately.
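
One way to see the difference: under option A the gateway can compare a document's old and new channel lists, and every channel that was dropped gets a removal entry for that document. The helper below is a hypothetical model of that diff, not Sync Gateway code.

```python
def removal_notices(old_channels: list[str], new_channels: list[str]) -> set[str]:
    """Option A: channels the new revision no longer routes to.

    Clients subscribed to each returned channel should see the doc as removed.
    """
    return set(old_channels) - set(new_channels)


print(removal_notices(["user_alice", "user_bob"], ["user_alice"]))  # {'user_bob'}
```

Under option B, revoking access changes only the user's grants; no document gets a new revision, so there is nothing to diff per document and no removal entry is produced.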

Overall, neither solution seems optimal, although this should be a classic use case for Sync Gateway.

So I am curious whether there are any better solutions to our problem, or whether I am missing something important.

Thanks,

Jakob

Jens Alfke

Jul 18, 2014, 7:04:00 PM
to mobile-c...@googlegroups.com, J Chris Anderson
On Jul 18, 2014, at 10:08 AM, Jakob Hoydis <jakob....@gmail.com> wrote:

> A - Every user account has its own channel and each document has a channel property which contains an array of all the channels (i.e. users) to which it belongs. The sync function then maps the document to all the listed channels.

I think A would be our default answer to this question. But I understand your concerns about not exposing the access list in the document, and avoiding unnecessary replications.

> B - Each document has its own channel and we give the users the access rights to the different document-channels through a sync function or the Admin API.

The channel scalability should mostly affect RAM usage, but to be honest this isn’t a scenario we’ve done performance/scalability testing on yet.
And you’re right about the problem of docs not being removed locally when access to channels is lost. Hm.

This is a scenario that might benefit from a mechanism for one doc to assign another doc to a channel. But I’m not sure whether that’s doable with our current architecture.

JChris, do you have any ideas about this?

—Jens

ja...@spraed.net

Jul 19, 2014, 5:57:37 PM
to mobile-c...@googlegroups.com, jch...@couchbase.com
Thanks for your reply. 

> I think A would be our default answer to this question. But I understand your concerns about not exposing the access list in the document, and avoiding unnecessary replications.

The best solution we came up with is to use cryptic channel names that do not reveal anything about a user. However, the worst part is that the more users have access to a document, the larger it gets, and the more unnecessary replications are triggered each time another user is granted access to it.
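
A sketch of how such cryptic channel names could be derived, assuming a server-side secret that never reaches clients: an HMAC-SHA256 of the user ID gives a stable name that outsiders cannot reverse or recompute. The `u_` prefix and the 32-character truncation are arbitrary choices. This hides identities but does nothing about document growth or the extra revisions.

```python
import hashlib
import hmac


def opaque_channel(user_id: str, secret: bytes) -> str:
    """Derive a per-user channel name that does not reveal the user ID.

    The same user always maps to the same channel, but without the secret
    the mapping can't be reversed or recomputed.
    """
    digest = hmac.new(secret, user_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return "u_" + digest[:32]  # truncated (128 bits) to keep channel names short


secret = b"server-side-secret"  # hypothetical; must stay on the server
print(opaque_channel("alice@example.com", secret))
```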
 
> This is a scenario that might benefit from a mechanism for one doc to assign another doc to a channel. But I’m not sure whether that’s doable with our current architecture.
 
This is indeed a functionality which would be great to have. Are there any concrete plans to have such a feature anytime soon?


J. Chris Anderson

Jul 22, 2014, 12:27:17 AM
to mobile-c...@googlegroups.com, jch...@couchbase.com
The idea of an API where some documents can set channels for other document IDs is super interesting. The same goes for a REST admin API for that.

Until we get an API like that, I might lean toward B, unless I'm wrong about some of the runtime / performance requirements of your app. B will use less disk space, as each document is only indexed once. The other side of the tradeoff is that the queries run when users sync all their channels can be expensive.
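
A back-of-the-envelope sketch of that tradeoff, counting (document, channel) pairs as a rough proxy for index size. The sizes in the example are hypothetical apart from the 100:1 ratio mentioned at the top of the thread, and real index costs depend on the gateway's internals.

```python
def model_costs(num_docs: int, num_users: int, users_per_doc: int) -> dict[str, dict[str, int]]:
    """Rough bookkeeping for options A and B.

    index_entries: (doc, channel) pairs the gateway has to track.
    channels_per_user: average channels per user, rounded down.
    """
    grants = num_docs * users_per_doc  # total (user, doc) access pairs
    return {
        "A": {"index_entries": grants, "channels_per_user": 1},
        "B": {"index_entries": num_docs, "channels_per_user": grants // num_users},
    }


# Hypothetical sizes: a 100:1 doc/user ratio, each doc shared with 5 users.
print(model_costs(num_docs=100_000, num_users=1_000, users_per_doc=5))
```

With these numbers, A tracks 500,000 entries against B's 100,000, while each user under B holds about 500 channels instead of one.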

What you say here is definitely a bug we need to fix:

> B - The user loses access to the document's channel. Currently, Sync Gateway does not trigger any event which would inform the user that the document should be deleted from its local database. One needs to handle this problem separately.

If you don't mind having lots of channels per user, you can create a channel for each document without listing the users on the document. Instead, have a meta document with an ID based on the doc ID (maybe append "-meta" to the ID); when the sync function sees that meta document, it grants the users listed on it access to the channel for that document.
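
A Python model of the sync-function logic for this meta-document pattern, assuming the "-meta" suffix, a `users` array on the meta document, per-document channels named `doc_<id>`, and an admin-only channel called `admin` (all illustrative). In an actual JavaScript sync function, the returned `channels` and `grants` would be calls to `channel()` and `access()`. A real deployment would also need to restrict who may write `-meta` documents, which this model leaves out.

```python
from typing import Any

META_SUFFIX = "-meta"  # suffix suggested in the thread


def sync_meta_pattern(doc: dict[str, Any]) -> dict[str, Any]:
    """Model of a sync function for the meta-document pattern.

    'channels' ~ channel(...) calls, 'grants' ~ access(user, channel) calls.
    """
    doc_id: str = doc["_id"]
    if doc_id.endswith(META_SUFFIX):
        target = doc_id[: -len(META_SUFFIX)]
        # Grant each listed user access to the target doc's channel; the meta
        # doc itself is routed only to an admin channel, never to the users.
        return {"channels": ["admin"],
                "grants": [(u, f"doc_{target}") for u in doc.get("users", [])]}
    # Ordinary content doc: its own channel, and no user list inside it.
    return {"channels": [f"doc_{doc_id}"], "grants": []}


print(sync_meta_pattern({"_id": "report42-meta", "users": ["alice", "bob"]}))
# {'channels': ['admin'], 'grants': [('alice', 'doc_report42'), ('bob', 'doc_report42')]}
```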

This way the list of users lives on the meta document, which isn't synced to any of them (it could be synced to the document owner, or maybe only to admins), yet it still controls user access to the document's channel. That avoids leaking the user IDs or having to obfuscate them, but you still have to deal with lots of channels. You'd use the changes worker pattern to watch documents and send push notifications to the users who should sync a given channel, so they can schedule a one-time sync for the latest update.
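
The notification step of such a changes worker could look roughly like this, assuming the worker sees each change as a document ID plus its channels (with per-document channels named `doc_<id>`, the channel follows from the ID) and keeps a channel-to-users map built from the meta documents. The function name is hypothetical, and actually sending the push notifications is left out.

```python
from collections import defaultdict
from typing import Iterable


def users_to_notify(changes: Iterable[tuple[str, list[str]]],
                    channel_members: dict[str, set[str]]) -> dict[str, set[str]]:
    """For a batch of changes, work out which users should sync which channels.

    `changes` holds (doc_id, channels) pairs seen by the worker;
    `channel_members` maps each channel to the users granted access.
    Returns user -> channels to mention in that user's push message.
    """
    notify: defaultdict[str, set[str]] = defaultdict(set)
    for _doc_id, channels in changes:
        for ch in channels:
            for user in channel_members.get(ch, set()):
                notify[user].add(ch)
    return dict(notify)
```

Batching per user means a burst of updates produces one notification per device owner rather than one per document.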

Unless you expect these documents all to be equally long-lived and active, you could have the client keep a priority queue of channels to sync proactively, and run something on the server that sends push notifications when channels are updated. This avoids a user always trying to sync all channels.
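
A minimal client-side sketch of that priority queue, assuming the client learns each channel's latest update time (from a push notification or its own activity) and syncs the most recently updated channels first. The class and method names are hypothetical, and a real app might weigh other signals too.

```python
import heapq


class ChannelSyncQueue:
    """Client-side queue of channels to sync proactively, newest updates first.

    Callers invoke bump() whenever they hear that a channel was updated.
    """

    def __init__(self) -> None:
        self._heap: list[tuple[float, str]] = []
        self._latest: dict[str, float] = {}

    def bump(self, channel: str, updated_at: float) -> None:
        # heapq is a min-heap, so store negated timestamps for newest-first.
        self._latest[channel] = updated_at
        heapq.heappush(self._heap, (-updated_at, channel))

    def next_batch(self, n: int) -> list[str]:
        """Pop up to n distinct channels, skipping superseded heap entries."""
        batch: list[str] = []
        while self._heap and len(batch) < n:
            neg_ts, channel = heapq.heappop(self._heap)
            if self._latest.get(channel) != -neg_ts:
                continue  # stale entry: a later bump() replaced it
            del self._latest[channel]
            batch.append(channel)
        return batch


q = ChannelSyncQueue()
q.bump("doc_a", 1.0)
q.bump("doc_b", 3.0)
q.bump("doc_a", 5.0)
print(q.next_batch(2))  # ['doc_a', 'doc_b']
```

Stale heap entries are skipped lazily instead of being removed on every bump, which keeps bump() at O(log n).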

I'm not sure what scale you'd need to start scheduling sync. In demos I've run a handful of devices with hundreds of channels for each user just by using the default sync methods. It's definitely not as easy on the system as syncing just a few channels per user, but it didn't seem to hurt anything in my testing. So by having each document be accompanied by a secret meta document that can set access rights on the first document, you get the security you are looking for, and you can probably get pretty far before you have to worry about scheduling sync of subsets of a user's channels.

You could even do this just by updating the documents themselves: if you had a robot that tags documents that have been idle, the sync function could react by not calling channel(), so that revision doesn't sync and the document is cleaned up from phones when they connect (unless there has been offline activity). A conflict-handling bot could then resolve those conflicts in favor of the non-idle revision.
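
A small model of that idea, assuming a hypothetical `idle` field set by the server-side robot and per-document channels named `doc_<id>`. Returning an empty list stands in for a sync function that skips `channel()`; the resolver simply picks the first non-idle revision, which is only one possible policy.

```python
from typing import Any


def sync_with_idle(doc: dict[str, Any]) -> list[str]:
    """Route a doc to its channel unless the robot marked it idle.

    An empty list models a sync function that doesn't call channel(), so the
    idle revision lands in no channel.
    """
    if doc.get("idle"):
        return []
    return [f"doc_{doc['_id']}"]


def resolve_idle_conflict(revisions: list[dict[str, Any]]) -> dict[str, Any]:
    """Pick a winner among conflicting revisions, preferring non-idle ones."""
    active = [r for r in revisions if not r.get("idle")]
    return (active or revisions)[0]


print(sync_with_idle({"_id": "r1", "idle": True}))  # []
```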

I think many channels per user (one channel per document in your case) results in smaller indexes than many channels per document (each channel for a user), but it's a matter of space / time tradeoff.

If you are going to have a channel per document, then you might as well put as many documents as you need into that channel; splitting the content across several documents also helps avoid conflicts, since users are less likely to edit the same document concurrently.

Chris

Jens Alfke

Jul 22, 2014, 1:33:48 AM
to mobile-c...@googlegroups.com, J Chris Anderson

On Jul 21, 2014, at 9:27 PM, J. Chris Anderson <jch...@couchbase.com> wrote:

> What you say here is definitely a bug we need to fix:

>> B - The user loses access to the document's channel. Currently, Sync Gateway does not trigger any event which would inform the user that the document should be deleted from its local database. One needs to handle this problem separately.

The problem with fixing #264 is that I can’t think of any way to do it cheaply. After the user is removed from a channel, the gateway will have to query to find all documents that are in the removed channel but not in any other channel the user still has access to, and then list each of those. And that query is likely to be expensive. So I’m worried about this situation happening too often.
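
To make the cost concrete, here is a model of that query over an in-memory map from document ID to channels. The names are illustrative; the point is that the check needs every document in the removed channel together with its full channel list, which is why it gets expensive at scale.

```python
def docs_to_remove(removed_channel: str,
                   remaining_channels: set[str],
                   doc_channels: dict[str, set[str]]) -> set[str]:
    """Docs a user can no longer see after losing `removed_channel`.

    A doc disappears only if it was in the removed channel and is in none
    of the channels the user still has. This scans every doc.
    """
    return {
        doc_id
        for doc_id, chans in doc_channels.items()
        if removed_channel in chans and not (chans & remaining_channels)
    }


docs = {"d1": {"team_a"}, "d2": {"team_a", "team_b"}, "d3": {"team_b"}}
print(docs_to_remove("team_a", {"team_b"}, docs))  # {'d1'}
```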

—Jens

J. Chris Anderson

Jul 22, 2014, 12:47:11 PM
to mobile-c...@googlegroups.com
The cheaper solution would be to send the client a message that says "drop channel X", and have the client keep some kind of reference count, so that when no remaining channel references a doc, it is purged.
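
A sketch of that client-side reference counting, assuming the client can record which channels delivered each local document and receives an explicit "drop channel X" signal. Both are assumptions, and the class name is hypothetical.

```python
from collections import defaultdict


class LocalChannelRefs:
    """Per-doc reference counts: how many local channels still hold each doc."""

    def __init__(self) -> None:
        self.docs_by_channel: defaultdict[str, set[str]] = defaultdict(set)
        self.refcount: defaultdict[str, int] = defaultdict(int)

    def add(self, doc_id: str, channel: str) -> None:
        # Count each (doc, channel) pair once, even if seen repeatedly.
        if doc_id not in self.docs_by_channel[channel]:
            self.docs_by_channel[channel].add(doc_id)
            self.refcount[doc_id] += 1

    def drop_channel(self, channel: str) -> set[str]:
        """Forget a channel and return the doc IDs that should now be purged."""
        to_purge: set[str] = set()
        for doc_id in self.docs_by_channel.pop(channel, set()):
            self.refcount[doc_id] -= 1
            if self.refcount[doc_id] == 0:
                del self.refcount[doc_id]
                to_purge.add(doc_id)
        return to_purge


refs = LocalChannelRefs()
refs.add("d1", "team_a")
refs.add("d2", "team_a")
refs.add("d2", "team_b")
print(refs.drop_channel("team_a"))  # {'d1'}
```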

ja...@spraed.net

Jul 23, 2014, 9:49:34 AM
to mobile-c...@googlegroups.com
Thanks for your detailed responses. For the moment, I think that we will stick to solution B. 

Concerning your last comment, we are currently facing the problem that a user might be logged in on multiple devices and we must ensure that a document gets deleted from all of them. Sending messages from the server to the clients is risky, as some of them might get lost, e.g., when a device is in sleep mode or the app is closed. 

We are currently considering a solution with a replicated meta-document per user, which contains the number of documents the user has access to and a list of all the documents that have been removed. When the number of docs in the client database is larger than the one reported in this document, we figure out which document IDs need to be removed locally. One problem with this solution is that the list of removed documents grows over time.
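
A sketch of that client-side check, assuming the per-user meta-document has the hypothetical shape `{"doc_count": ..., "removed": [...]}`. It purges only IDs that are both stored locally and listed as removed; it ignores documents that were removed and later re-granted, which a real version would need to handle.

```python
from typing import Any


def ids_to_purge(local_ids: set[str], user_meta: dict[str, Any]) -> set[str]:
    """Compare the local database with the per-user meta-document.

    Only when the client holds more docs than the server reports does it
    look for local IDs that appear in the removed list.
    """
    if len(local_ids) <= user_meta.get("doc_count", 0):
        return set()
    return local_ids & set(user_meta.get("removed", []))


print(ids_to_purge({"a", "b", "c"}, {"doc_count": 2, "removed": ["c", "x"]}))  # {'c'}
```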

Jens Alfke

Jul 23, 2014, 11:42:17 AM
to mobile-c...@googlegroups.com

On Jul 23, 2014, at 6:49 AM, ja...@spraed.net wrote:

> Concerning your last comment, we are currently facing the problem that a user might be logged in on multiple devices and we must ensure that a document gets deleted from all of them. Sending messages from the server to the clients is risky, as some of them might get lost, e.g., when a device is in sleep mode or the app is closed. 

I’m sure that when Chris wrote “send the client a message” he meant a notification in the _changes feed. That’s guaranteed to be delivered to all clients, because of the way the _changes feed works.

 And actually it would be pretty easy to add such a notification. Then we'd also need to include in the feed a list of the channels that every document is in, which is also straightforward (but verbose). The harder part would be on the client side — the database would need to be extended to track channels, and on channel removal it’d need to run a fairly expensive query to find which docs to purge.

Sorry we don’t have a good solution for this yet! That’s the nature of 1.0 designs, unfortunately. As we get more experience with real-world requirements like yours we’ll steadily improve the design and implementation to optimize it for more use cases.

—Jens

Jakob Hoydis

Jul 24, 2014, 8:36:37 AM
to mobile-c...@googlegroups.com
Ok, makes sense. Looking forward to having this feature!

James Norman

Oct 13, 2014, 6:32:08 PM
to mobile-c...@googlegroups.com, jch...@couchbase.com
+1 on this; I'm running into this issue as well. We have groups or teams of users, and when we add a user to a team, that team's channel is added to their channels. Removing the channel from the user, however, does not cause the documents to be removed on the client. I have a workaround for this, but it's not as clean as adding and removing channels from the user on the server.

-james