Complex sync requirements (sync_gateway)


Nick Wood

Oct 4, 2013, 10:17:12 AM
to mobile-c...@googlegroups.com
We have some complex sync requirements and I haven't been able to wrap my head around how to use the sync function.

The basic use case is that there is a group of mobile users from a company.  The company's docs are in the same database as many other companies' docs.  Some company users will have access to all documents; others will only have access to some documents.  Access can change along the way.

Maybe some example docs would help to illustrate and work through this.

Let's pretend this is a task management system and multiple notes can be added to each task.  We have task documents, and note documents.

{
 "id" : "1",
 "name" : "task1",
 "company" : "Acme",
 "type" : "task",
 "owners" : ["user1"]
}

{
 "id" : "2",
 "name" : "task2",
 "company" : "Acme",
 "type" : "task",
 "owners" : ["user2"]
}

{
 "id" : "3",
 "name" : "task3",
 "company" : "Different Company",
 "type" : "task",
 "owners" : ["user5"]
}

{
 "id" : "4",
 "name" : "note1",
 "type" : "note",
 "parent" : "1"
}

{
 "id" : "5",
 "name" : "note2",
 "type" : "note",
 "parent" : "1"
}

{
 "id" : "6",
 "name" : "note3",
 "type" : "note",
 "parent" : "2"
}

user1 is a "regular" user, so they should only have access to the tasks that they own (task 1) and their child notes (note1 and note2, but not note3)
user2 is a "manager" user, so they should have access to all tasks and all child notes for their company (not task3 or note3 because they're for another company)

1. I can envision one solution where I use a channels property in every document and our application logic can handle distinguishing between "regular" users and "manager" users.  Thinking through the drawbacks to this approach:
 - If a "regular" user gets promoted to a "manager", we would have to manually update the channels property for all of that company's documents.  In our case, we'll have 10,000+ documents per company and even though nothing substantial will have changed, this will trigger re-syncing all documents to every mobile user.

One possible solution to this would be to have direct API access to manipulate channel access from our app servers.  My theory is that by manipulating the "access control list" directly, it wouldn't cause each document to be updated and therefore wouldn't trigger unnecessary syncing to all of the mobile clients.  I don't understand the internals well enough to know if 1) such an api exists or 2) manipulating it this way would trigger appropriate syncing or "unsyncing" in the case of reduced access.

2. Can sync() function logic be based on a user's role?  For example, if a user is a "manager" can we sync all documents regardless of the "owners" property?  Then if the user gets demoted to a "regular" user, will sync_gateway know to remove their access to all of the documents that they aren't an "owner" on?

3. I don't have an "owners" property on the note documents and the parent "task" documents don't point to the child "note" documents, it's the other way around.  When the sync() function comes across a note document, is there any way to have it grant access based on the referenced parent document's "owners" property?  I think I know the answer to this but wanted to bring up the use case.  Again, the "possible solution" for point #1 above would be helpful for something like this.

4. The other question this brings to mind is how well the channels concept scales.  How many channels can we reasonably have?  Hundreds of thousands, millions?

Sorry for the long post - I can only imagine that these use cases will be very common so solving them would be helpful.

fyi - we're currently solving this by running a separate CouchDB database per user, but with CouchDB not being our "master" database, keeping Couchbase and CouchDB in sync isn't very fun.

  Nick

Jens Alfke

Oct 5, 2013, 2:51:31 PM
to mobile-c...@googlegroups.com
On Oct 4, 2013, at 7:17 AM, Nick Wood <nwoo...@gmail.com> wrote:

> {
>  "id" : "1",
>  "name" : "task1",
>  "company" : "Acme",
>  "type" : "task",
>  "owners" : ["user1"]
> }

OK, looks like the sync function would define a channel for this task and give the owners and company admins access to it:
var docChannel = "task-" + doc._id;
channel(docChannel);
access(doc.owners, docChannel);
access("role:manager-" + doc.company, docChannel);

{
 "id" : "4",
 "name" : "note1",
 "type" : "note",
 "parent" : "1"
}

A note document would be assigned to the channel of its parent task.
channel("task-" + doc.parent);

> user1 is a "regular" user, so they should only have access to the tasks that they own (task 1) and their child notes (note1 and note2, but not note3)
> user2 is a "manager" user, so they should have access to all tasks and all child notes for their company (not task3 or note3 because they're for another company)

All you need for this is to define roles of the form “manager-company” and give each manager user access to her company’s manager role.
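
As a rough illustration of that setup, the sketch below only builds the admin requests that would create a per-company manager role and attach it to a user; it does not send anything. The `/_role/` and `/_user/` paths and the `admin_roles` field reflect my understanding of the gateway's admin REST API and should be checked against the version you run. The helper names (`managerRoleName`, `roleRequest`, `userRequest`) and the database name `db` are made up for the example.

```javascript
// Assumption: roles are named "manager-<company>", matching the
// "role:manager-" + doc.company grant made in the sync function.
// Role names may be restricted to certain characters, so a company name
// containing spaces might need normalizing first (not handled here).
function managerRoleName(company) {
  return "manager-" + company;
}

// Request that would create the manager role for one company.
function roleRequest(db, company) {
  var name = managerRoleName(company);
  return {
    method: "PUT",
    path: "/" + db + "/_role/" + encodeURIComponent(name),
    body: { name: name }
  };
}

// Request that would give a user the manager role for each listed company.
// Other user fields (password and so on) are left out of this sketch.
function userRequest(db, username, companiesManaged) {
  return {
    method: "PUT",
    path: "/" + db + "/_user/" + encodeURIComponent(username),
    body: {
      name: username,
      admin_roles: companiesManaged.map(managerRoleName)
    }
  };
}

console.log(JSON.stringify(roleRequest("db", "Acme")));
console.log(JSON.stringify(userRequest("db", "user2", ["Acme"])));
```

Promoting or demoting a user then means changing that one user record rather than touching any task or note document.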

> 2. Can sync() function logic be based on a user's role?  For example, if a user is a "manager" can we sync all documents regardless of the "owners" property?  Then if the user gets demoted to a "regular" user, will sync_gateway know to remove their access to all of the documents that they aren't an "owner" on?

Yes, this is what the last access() call in the example function does.

> 3. I don't have an "owners" property on the note documents and the parent "task" documents don't point to the child "note" documents, it's the other way around.  When the sync() function comes across a note document, is there any way to have it grant access based on the referenced parent document's "owners" property?

It’s sort of implicit. The parent task document defines a channel for everything related to that task and assigns access to it. The child note document just adds itself to that channel.
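
Putting the two snippets from this reply together, the whole sync function might look roughly like the sketch below. Inside the gateway, `channel()` and `access()` are provided for you; here they are passed in as parameters, and the `runSync` harness that records their calls is hypothetical, added only so the logic can be tried outside the gateway. The sample documents use `_id` rather than the `id` field shown in the thread, on the assumption that the real document ID is what matters.

```javascript
// Sketch combining the task rule and the note rule.
function syncSketch(doc, channel, access) {
  if (doc.type === "task") {
    var docChannel = "task-" + doc._id;
    channel(docChannel);                                // the task gets its own channel
    access(doc.owners, docChannel);                     // owners may read that channel
    access("role:manager-" + doc.company, docChannel);  // so may the company's managers
  } else if (doc.type === "note") {
    channel("task-" + doc.parent);                      // a note joins its parent's channel
  }
}

// Hypothetical harness: records which channels each doc lands in
// and which users or roles were granted access to which channel.
function runSync(docs) {
  var channelsByDoc = {};
  var grants = [];
  docs.forEach(function (doc) {
    channelsByDoc[doc._id] = [];
    syncSketch(
      doc,
      function (ch) { channelsByDoc[doc._id].push(ch); },
      function (who, ch) {
        [].concat(who).forEach(function (w) {
          grants.push({ who: w, channel: ch });
        });
      }
    );
  });
  return { channelsByDoc: channelsByDoc, grants: grants };
}

var result = runSync([
  { _id: "1", type: "task", company: "Acme", owners: ["user1"] },
  { _id: "4", type: "note", parent: "1" }
]);
console.log(result.channelsByDoc);
console.log(result.grants);
```

The note never mentions owners or a company; it ends up readable by the same people as task 1 only because it shares the channel that the task document set up.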

This channel stuff is devilishly clever, and the credit goes to JChris. He sketched it on a whiteboard and I just implemented it, and we kept finding more things it could do.

> 4. The other question this brings to mind is how well the channels concept scales.  How many channels can we reasonably have?  Hundreds of thousands, millions?

I think so. It’s hard to answer because the ‘weight’ of a channel keeps shifting somewhat as I change the implementation to optimize for different things.

Currently every channel creates a document that basically caches the recent _changes-feed entries for it. Every channel() call that a sync function makes ends up appending an entry to the document for each channel. Couchbase Server is pretty good at handling vast numbers of documents — we have a potential customer right now who’s talking about hundreds of billions in a very large cluster.

Off the top of my head, things that will cause overhead are assigning a doc to lots of channels, or giving a user access to lots of channels. Here ‘lots’ is an intentionally vague quantity, because we haven’t done in-depth performance testing yet.

> Sorry for the long post - I can only imagine that these use cases will be very common so solving them would be helpful.

Yeah, we are discovering common design patterns as we run across them. There’s probably a book to be written about this, so I should get O’Reilly on the phone…

—Jens

Nick Wood

Oct 7, 2013, 1:17:26 PM
to mobile-couchbase
Very helpful, thank you.  I'll be the first one to buy the O'Reilly book once you're finished ;)

Based on "Off the top of my head, things that will cause overhead are assigning a doc to lots of channels, or giving  a user access to lots of channels. " - if a user has 500 tasks, then the approach you suggested would create 500 channels for that user.  We actually have other "parent" object types besides tasks that follow the same concept, so in our use case there will be thousands of parent objects or in other words, thousands of channels per user.  Does thousands of channels count as "lots of channels"?

Also, if we give a role access to a document using what you suggested here:

access("role:manager-" + doc.company, docChannel);

Then if a user is removed from that role, would sync_gateway know to revoke access to those documents if they had previously been synced?

  Nick




--
You received this message because you are subscribed to a topic in the Google Groups "Mobile Couchbase" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/mobile-couchbase/52-19oUnCTc/unsubscribe.
To unsubscribe from this group and all its topics, send an email to mobile-couchba...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Jens Alfke

Oct 7, 2013, 2:03:06 PM
to mobile-c...@googlegroups.com

On Oct 7, 2013, at 10:17 AM, Nick Wood <nwoo...@gmail.com> wrote:

> Does thousands of channels count as "lots of channels"?

Currently yes; that’s one of the things I need to optimize before 1.0.

> if a user is removed from that role, would sync_gateway know to revoke access to those documents if they had previously been synced?

It can’t do anything about the docs already on the user’s device; that horse has left the barn. But the user can’t access the docs on the server anymore, nor will they get updates.
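
A simplified model of the server-side half of that answer, assuming a user's readable channels are just the union of grants made to the user directly and to each role the user currently holds. This is a toy calculation to show the effect of dropping a role, not the gateway's actual implementation; `channelsFor` and the shape of `grants` are invented for the example.

```javascript
// grants: list of { who, channel } entries as produced by access() calls,
// where who is either a username or "role:<name>".
function channelsFor(user, grants) {
  var principals = [user.name].concat(
    user.roles.map(function (r) { return "role:" + r; })
  );
  var channels = {};
  grants.forEach(function (g) {
    if (principals.indexOf(g.who) !== -1) {
      channels[g.channel] = true;
    }
  });
  return Object.keys(channels).sort();
}

var grants = [
  { who: "user1", channel: "task-1" },
  { who: "role:manager-Acme", channel: "task-1" },
  { who: "user2", channel: "task-2" },
  { who: "role:manager-Acme", channel: "task-2" }
];

// user2 while holding the manager role, and after losing it.
var asManager = channelsFor({ name: "user2", roles: ["manager-Acme"] }, grants);
var demoted = channelsFor({ name: "user2", roles: [] }, grants);
console.log(asManager, demoted);
```

No document is rewritten in either direction; only the user's role list changes, and the channel set follows from it.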

—Jens

Nick Wood

Oct 7, 2013, 2:27:32 PM
to mobile-couchbase
For the "lots of channels" issue, how would you recommend we move forward?  We don't need to launch this for ~3 months, and can even shift that a little based on the 1.0 release.  Should we move forward with the "thousands of channels per user" and assume that will be worked out, or should we go for a different approach, like a channel per user?

In the situation where a user is removed from a role, I realize that what's already on the client is out of the server's control.  In our case though, it should act as if the document were deleted so that upon the next sync, the documents that the user no longer has access to are removed.  Is there any way to accomplish this without deleting the client database and re-syncing from scratch?

  Nick




Jens Alfke

Oct 7, 2013, 2:54:25 PM
to mobile-c...@googlegroups.com
On Oct 7, 2013, at 11:27 AM, Nick Wood <nwoo...@gmail.com> wrote:

> Should we move forward with the "thousands of channels per user" and assume that will be worked out, or should we go for a different approach, like a channel per user?

Fewer channels per user would be a safer bet right now since it’s better suited to today’s version of the software. But (circling back to the original topic) it’d require a different design of your sync function, and I’m not sure it could handle everything you need.

> In our case though, it should act as if the document were deleted so that upon the next sync, the documents that the user no longer has access to are removed.

Couchbase Lite should [but doesn’t yet] do that for you — that's issue 29. Should be easy to implement but it hasn’t made it to the top of my to-do list yet.

—Jens

Nick Wood

Oct 7, 2013, 3:27:09 PM
to mobile-couchbase
Ok, so back to my original proposed solution - would it be possible to get direct "api" access to the access control lists?  This way we could create a channel per user and our application logic could add and remove documents from channels based on the more complex logic that we use.

  Nick



Nick Wood

Oct 16, 2013, 8:12:50 PM
to mobile-c...@googlegroups.com
Looks like we have two options:

1. Manage a "channels" property for every document and update children whenever a parent changes (triggering a re-sync for all of the other channels, even though the meat of the document didn't change).

2. Continue to use CouchDB with a separate database per user like we've been doing, which is a little more resource intensive, but a better experience for the end users.

Are there any other reasonable solutions that I'm not seeing?

  Nick

MikeL

Feb 7, 2014, 4:10:35 PM
to mobile-c...@googlegroups.com
Nick or Jens, any further thoughts for minimizing channel overload?

I am trying to architect a similar design where users might share documents as individuals and teams. The major stumbling block is that there will be many sub-documents that will be continuously generated which may impact channel performance.

Solution 1:
document (channel-doc-id)
sub-document (channel-doc-id)
user subscribes to many channel-doc-id channels (possibly too many channels per user)

Solution 2:
document ([channel-user-id, ...])
sub-document ([channel-user-id, ...]) (becomes prohibitive when there are changes in parent document channels)
users would only have to subscribe to one channel per user
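
To make the trade-off in Solution 2 concrete, here is a minimal sketch of a one-channel-per-user sync function plus a count of how many documents must be rewritten when a parent's user list changes. The `users` field, the `user-` channel prefix, and the `docsToRewrite` helper are all hypothetical; `channel` and `access` are passed in as parameters only so the sketch runs outside the gateway.

```javascript
// Solution 2 sketch: every document, including each sub-document, carries
// the full list of users who may see it (hypothetical "users" field).
function syncPerUser(doc, channel, access) {
  (doc.users || []).forEach(function (u) {
    var ch = "user-" + u;
    channel(ch);    // put the document in that user's channel
    access(u, ch);  // and let that user read the channel
  });
}

// When the parent's user list changes, the parent and every sub-document
// that copied the list must be updated, and each update is re-synced.
function docsToRewrite(parentId, docs) {
  return docs.filter(function (d) {
    return d._id === parentId || d.parent === parentId;
  }).length;
}

var docs = [
  { _id: "1", users: ["alice", "bob"] },
  { _id: "4", parent: "1", users: ["alice", "bob"] },
  { _id: "5", parent: "1", users: ["alice", "bob"] },
  { _id: "6", parent: "2", users: ["carol"] }
];
console.log(docsToRewrite("1", docs));
```

Each user subscribes to a single channel here, but the rewrite count grows with the number of sub-documents, which is the cost noted in the list above.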

Any thoughts?

Thanks in advance

Nick Wood

Feb 7, 2014, 4:24:50 PM
to mobile-couchbase
Yes, your use case is the same as mine.  I haven't found a good solution yet, other than writing our own sync system, which isn't good...  I believe that Chris said he was working on a proof of concept that could handle many channels and scale, but I'm not sure what the status is.

  Nick



Jens Alfke

Feb 8, 2014, 4:00:30 PM
to mobile-c...@googlegroups.com
We've been doing a bunch of optimization work on the gateway lately, mostly involving the _changes feed. Some caching has been added that may make it cheaper to subscribe to multiple channels. There are so many possible variables in a system like this that I don't want to promise anything in general, though.

It would be interesting if someone built an experiment to see how this would perform. We've written a test program called gateload that connects to the sync gateway and simulates multiple clients doing push and pull replications. We're testing with up to 10,000 active clients currently. Gateload is somewhat configurable, but I think doing a test of your scenario would require modifying the code. (It's not very complicated.) During tests you can poll the gateway admin port's "/_expvar" URI to get statistics on what it's doing.
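
For anyone trying this, a small sketch of polling that URI and comparing two snapshots might look like the following. The base URL is whatever your admin interface listens on (something like `http://localhost:4985` is an assumption here), and no particular statistic names are assumed: the helper just flattens whatever numeric values come back. `fetchExpvar`, `flattenNumbers`, and `statDeltas` are names invented for this example.

```javascript
// Fetches the expvar JSON. fetchImpl defaults to the global fetch
// and can be replaced with a stub for testing.
function fetchExpvar(baseUrl, fetchImpl) {
  var doFetch = fetchImpl || fetch;
  return doFetch(baseUrl + "/_expvar").then(function (res) {
    return res.json();
  });
}

// Flattens nested stats into { "a.b.c": number } so two snapshots can be
// compared without knowing the key names in advance.
function flattenNumbers(obj, prefix, out) {
  out = out || {};
  Object.keys(obj).forEach(function (k) {
    var key = prefix ? prefix + "." + k : k;
    var v = obj[k];
    if (typeof v === "number") {
      out[key] = v;
    } else if (v && typeof v === "object") {
      flattenNumbers(v, key, out);
    }
  });
  return out;
}

// Difference between two snapshots, keeping only values that changed.
function statDeltas(before, after) {
  var a = flattenNumbers(before);
  var b = flattenNumbers(after);
  var deltas = {};
  Object.keys(b).forEach(function (k) {
    var delta = b[k] - (a[k] || 0);
    if (delta !== 0) {
      deltas[k] = delta;
    }
  });
  return deltas;
}
```

Taking one snapshot before a test run and one after, then passing both to `statDeltas`, gives a quick view of which counters moved during the run.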

—Jens

J. Chris Anderson

Feb 11, 2014, 10:14:25 PM
to mobile-c...@googlegroups.com
If your app has each user subscribed to a handful of channels, it should "just work" unless you have more than a few thousand connected users. At that point you will need to start optimizing. 

I think effective optimizations would look something like this: keep a continuous feed for only the one or two channels that are active in the UI (e.g., the to-do list the user is viewing), and for everything else run a background replication (maybe a pull once every few minutes). You could also supplement this with Apple / Android push notifications to trigger code to sync channels with urgent changes.
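
One way to picture that split, as a sketch only: given all of a user's channels and the ones currently on screen, produce one continuous pull and one periodic pull. The descriptor shape (`channels`, `continuous`, `everyMinutes`) and the five-minute default are hypothetical; how a channel list is actually handed to the replicator depends on the client library.

```javascript
// Splits a user's channels into a continuous pull for what is on screen
// and a periodic background pull for everything else.
function planPulls(allChannels, activeChannels, intervalMinutes) {
  var active = allChannels.filter(function (c) {
    return activeChannels.indexOf(c) !== -1;
  });
  var background = allChannels.filter(function (c) {
    return activeChannels.indexOf(c) === -1;
  });
  var plan = [];
  if (active.length) {
    plan.push({ channels: active, continuous: true });
  }
  if (background.length) {
    plan.push({
      channels: background,
      continuous: false,
      everyMinutes: intervalMinutes || 5
    });
  }
  return plan;
}

var plan = planPulls(["task-1", "task-2", "task-3"], ["task-2"]);
console.log(JSON.stringify(plan));
```

When the user navigates to a different list, the plan is recomputed, so at most a couple of channels are ever held open continuously per client.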

We are still learning at what scale you have to start thinking about these things, versus smaller scales where you can pretty much do whatever. (I was able to do a continuous sync of 500+ channels with just one client, but I wouldn't recommend it.)

Chris
