We have an app which has a single CouchDB database for multiple users, each of whom requires specific "live" data filtered to TouchDB.
As this CouchDB database grows larger (into the millions of sequence numbers), the first login experience of users on devices becomes incredibly painful, as Couch runs the filtered replication over every single document. Out of the millions of changes, there may only be a few thousand at most which need replication to devices.
Are there any optimisations out there for this sort of filtered replication starting from a sequence number of 0? We're considering maintaining a separate indexed changes feed in SQL, based on our filter parameters, to return the changes feed in this special case. It seems like filtered replication was never designed with this sort of "replicate from scratch fast" in mind, but it seems incredibly useful, almost required, when dealing with mobile devices and shared data.
On 10 October 2012 00:03, Layton Duncan <lay...@polarbearfarm.com> wrote:
> We have an app which has a single couch database for multiple users, each of which requires specific "live" data filtered to TouchDB.
> As this couchdb database grows larger (into the millions of sequence numbers) the first login experience of users on devices becomes incredibly painful as couch runs the filtered replication over every single document. Out of the millions of changes, there may only be a few thousand at most which need replication to devices.
> Are there any optimisations out there for this sort of filtered replication starting from a sequence number of 0. We're considering maintaining a separate indexed changes feed in SQL, based on our filter parameters to return the changes feed in this special case, it seems like filtered replication was never designed with this sort of "replicate from scratch fast" in mind, but seems incredibly useful, almost required when dealing with mobile devices and shared data.
I have 2 tricks to this, in normal CouchDB land.
1. Put an intermediary, separate DB per end user to use as a replication
point for the TouchDB. Then let the server do the hard work of
filtering from the master copy back to the end users. All the concerns
about duplicated storage, increased inefficiency etc. apply. This
pattern doesn't fit for some use cases.
2. Set up a view that implements that filter. The key advantage is
that the view is pre-sorted and pre-calculated so your request to get
the relevant doc IDs is a nice tidy fast range query. Then you use
named replication to pull them down to the device. The downside?
TouchDB doesn't support named replication. Maybe Jens feels like
adding this in some future release; I think this is a very useful
feature for mobile.
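
To make the two tricks a bit more concrete, here is a rough Python sketch that only builds the requests involved. It assumes a filter function and a view both named "by_user" in a design doc called "app", and a view that emits keys shaped like [user, ...]. Those names, the key shape and the database URLs are all made up for illustration. The field names (source, target, filter, query_params, continuous) and the view parameters (startkey, endkey, update_seq) are standard CouchDB ones.

```python
import json
from urllib.parse import urlencode


def per_user_replication(master_url, user_db_url, user, filter_name="app/by_user"):
    """Trick 1: body for POST /_replicate, filtering master into a per-user DB."""
    return {
        "source": master_url,
        "target": user_db_url,
        "filter": filter_name,           # hypothetical "ddoc/filter" name
        "query_params": {"user": user},  # visible to the filter as req.query
        "continuous": True,              # keep the per-user DB up to date
    }


def view_range_url(db_url, user, ddoc="app", view="by_user"):
    """Trick 2: URL for a range query over view keys that start with [user]."""
    params = {
        "startkey": json.dumps([user]),
        "endkey": json.dumps([user, {}]),  # {} collates after other JSON values
        "update_seq": "true",              # also return the seq the view reflects
    }
    return f"{db_url}/_design/{ddoc}/_view/{view}?{urlencode(params)}"
```

The device would then replicate from the per-user DB (trick 1), or use the row IDs from the view response to decide what to fetch (trick 2).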
The workaround: given the replicator is "just" an HTTP client, you can
do the same thing as the replicator yourself, pulling the docs down and
then poking them into TouchDB. I can't speak for how easy that would
be to do, nor whether that's a dirty trick, but I think it's doable.
WRT the _changes feed, the update sequence from the DB / view can
also be used directly in the changes feed (it's the same pointer to
the DB).
So the high-level pattern becomes:
- query the view to make sure it's up to date
- stash the DB seq number for future changes feed
- pull all the docs identified as relevant in the view to the DB &
stash them (Magical Step Occurs Here)
- switch back to using changes?since=seqnum feed for subsequent updates
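
The steps above could be sketched roughly like this in Python. It is only a sketch under assumptions: the design doc "app", the view and filter "by_user", and the [user, ...] key shape are invented, and http_get_json and store_doc are hypothetical hooks standing in for the HTTP client and for whatever inserts a document into the local store. A real replicator also transfers revision histories, which include_docs alone does not give you, so the "magical step" is glossed over here as well.

```python
import json
from urllib.parse import urlencode


def bootstrap(db_url, user, http_get_json, store_doc, ddoc="app", view="by_user"):
    """First sync via the view; returns the seq to resume the changes feed from.

    http_get_json(url) must return parsed JSON; store_doc(doc) saves one doc.
    Both are caller-supplied, hypothetical hooks.
    """
    params = {
        "startkey": json.dumps([user]),
        "endkey": json.dumps([user, {}]),
        "include_docs": "true",  # return doc bodies along with the rows
        "update_seq": "true",    # return the DB seq this view result reflects
    }
    result = http_get_json(f"{db_url}/_design/{ddoc}/_view/{view}?{urlencode(params)}")
    seq = result["update_seq"]   # stash for the later changes feed
    for row in result["rows"]:
        store_doc(row["doc"])    # the "magical step"
    return seq


def changes_url(db_url, since, user, filter_name="app/by_user"):
    """URL for the filtered changes feed, starting after the stashed seq."""
    params = {"since": since, "filter": filter_name, "user": user}
    return f"{db_url}/_changes?{urlencode(params)}"
```

Because the seq comes from the same response as the rows, any change made after the view was read should show up in the later changes feed rather than being missed.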
I have a dream that one day, post BigCouch/Refuge merges back into
CouchDB, all the various filters (replication, views, changes etc)
will all run from the same core code and can use either live updates
or an incrementally updated view, as needs require.
I'd expect a measurable benefit in using Erlang for filters on the
changes feed, simply as it avoids the need to run a JS function (via
couchjs), which with many concurrent users will eventually become a
bottleneck. I've not had the need personally to do this though.
On Oct 9, 2012, at 3:27 PM, Dave Cottlehuber <d...@jsonified.com> wrote:
> 1. Put an intermediary, separate DB per end user to use a replication
> point for the TouchDB. Then let the server do the hard work of
> filtering from the master copy back to the endusers.
This is pretty much a requirement for those apps that need to read-protect data: if user A shouldn’t be able to see user B’s data, there’s no effective way to enforce that in CouchDB if the documents are in the same database (well, without encrypting the documents.)
> 2. Set up a view that implements that filter. The key advantage is
> that the view is pre-sorted and pre-calculated so your request to get
> the relevant doc IDs is a nice tidy fast range query. Then you use
> named replication to pull them down to the device. The downside?
> TouchDB doesn't support named replication. Maybe Jens feels like
> adding this in some future release, I think this is a very useful
> feature for mobile.
Um, yeah, I think this fell between the cracks somehow. I think on the pull side it should be pretty trivial to implement; just another URL parameter on the _changes feed. If so I could squeeze it in before 1.0.
On Oct 9, 2012, at 3:27 PM, Dave Cottlehuber <d...@jsonified.com> wrote:
> 2. Set up a view that implements that filter. The key advantage is
> that the view is pre-sorted and pre-calculated so your request to get
> the relevant doc IDs is a nice tidy fast range query. Then you use
> named replication to pull them down to the device.
Is this documented anywhere? I’m looking through the API docs (http://wiki.apache.org/couchdb/Complete_HTTP_API_Reference) on the wiki, and tried some random googling too, but can’t find any reference to a _changes or replication parameter that specifies a view. That’s probably why I never implemented it o_O
On 11 October 2012 19:53, Jens Alfke <j...@couchbase.com> wrote:
> On Oct 9, 2012, at 3:27 PM, Dave Cottlehuber <d...@jsonified.com> wrote:
>> 2. Set up a view that implements that filter. The key advantage is
>> that the view is pre-sorted and pre-calculated so your request to get
>> the relevant doc IDs is a nice tidy fast range query. Then you use
>> named replication to pull them down to the device.
> Is this documented anywhere? I’m looking through the API docs on the wiki,
> and tried some random googling too, but can’t find any reference to a
> _changes or replication parameter that specifies a view. That’s probably why
> I never implemented it o_O
Aah that's not quite what I meant, sorry for sending you on a
misguided google hunt!
The missing feature I was referring to was named doc replication. Hope
that makes more sense.
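
For reference, named doc replication on the CouchDB side is driven by a doc_ids list in the replication request. A minimal Python sketch of building that request body follows; the function name and URLs are placeholders, and only the source, target and doc_ids fields are CouchDB's own.

```python
def named_doc_replication(source_url, target_url, doc_ids):
    """Body for POST /_replicate that copies only the listed document IDs."""
    return {
        "source": source_url,
        "target": target_url,
        "doc_ids": sorted(set(doc_ids)),  # de-duplicated; order is not significant
    }
```

The IDs would typically come from the rows of the view query, so the server never has to run a filter function over the whole database.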