On 10 October 2012 00:03, Layton Duncan <lay...
> We have an app which has a single couch database for multiple users, each of which requires specific "live" data filtered to TouchDB.
> As this couchdb database grows larger (into the millions of sequence numbers) the first login experience of users on devices becomes incredibly painful as couch runs the filtered replication over every single document. Out of the millions of changes, there may only be a few thousand at most which need replication to devices.
> Are there any optimisations out there for this sort of filtered replication starting from a sequence number of 0. We're considering maintaining a separate indexed changes feed in SQL, based on our filter parameters to return the changes feed in this special case, it seems like filtered replication was never designed with this sort of "replicate from scratch fast" in mind, but seems incredibly useful, almost required when dealing with mobile devices and shared data.
I have two tricks for this, in normal CouchDB land.
1. Put an intermediary in the middle: a separate DB per end user that
serves as the replication point for TouchDB. Then let the server do
the hard work of filtering from the master copy down to each end
user's DB. All the usual concerns about duplicated storage, increased
inefficiency etc. apply, and this pattern doesn't fit some use cases.
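To make the server-side half concrete, here's a sketch of the request body you'd POST to the server's /_replicate endpoint to keep one user's intermediary DB filtered-and-fresh. All names (DB URLs, filter name, query param) are illustrative:

```python
import json

def per_user_replication_body(master_db, user_db, filter_name, user_id):
    # Continuous filtered replication: master -> this user's DB.
    return {
        "source": master_db,
        "target": user_db,
        "continuous": True,
        "filter": filter_name,              # "ddoc/filtername" in the master DB
        "query_params": {"user": user_id},  # handed to the filter function
    }

body = per_user_replication_body(
    "http://localhost:5984/master",
    "http://localhost:5984/user_alice",
    "app/by_user",
    "alice",
)
print(json.dumps(body, sort_keys=True))
```

The device then does plain unfiltered replication against its own user DB, which is cheap.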
2. Set up a view that implements the filter. The key advantage is
that the view is pre-sorted and pre-calculated, so the request to get
the relevant doc IDs is a nice tidy fast range query. Then you use
named-document (doc_ids) replication to pull them down to the device.
The downside? TouchDB doesn't support named-document replication.
Maybe Jens feels like adding this in some future release; I think it's
a very useful feature for mobile.
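The two halves of that trick, sketched below: a view query returns rows whose ids feed a one-shot doc_ids replication. The view rows and URLs here are made up for illustration:

```python
import json

def doc_ids_replication_body(source, target, doc_ids):
    # One-shot "named document" replication: pull only the listed docs.
    return {"source": source, "target": target, "doc_ids": doc_ids}

# Suppose GET /master/_design/app/_view/by_user?key="alice" returned
# these rows; the ids are exactly what the replicator needs.
view_rows = [
    {"id": "doc-1", "key": "alice", "value": None},
    {"id": "doc-2", "key": "alice", "value": None},
]
ids = [row["id"] for row in view_rows]

body = doc_ids_replication_body("http://server:5984/master", "local-db", ids)
print(json.dumps(body))
```

Because the view index is maintained incrementally, the range query stays fast no matter how many millions of sequence numbers the master DB has accumulated.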
The workaround: given that the replicator is "just" an HTTP client,
you can do the same thing yourself, pulling the docs down and then
poking them into TouchDB. I can't speak for how easy that would be,
nor whether it's a dirty trick, but I think it's doable.
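At the HTTP level, "doing the replicator's job by hand" boils down to two requests per doc; here's a sketch of the URLs involved (server and local addresses are illustrative):

```python
def fetch_url(server_db, doc_id):
    # revs=true returns the doc's revision history, which the local store
    # needs if later, real replications are to merge cleanly.
    return f"{server_db}/{doc_id}?revs=true&attachments=true"

def store_url(local_db, doc_id):
    # new_edits=false keeps the doc's existing _rev, exactly as the real
    # replicator does, instead of minting a new local revision.
    return f"{local_db}/{doc_id}?new_edits=false"

print(fetch_url("http://server:5984/master", "doc-1"))
print(store_url("http://localhost:59840/local", "doc-1"))
```

The new_edits=false part is the bit that makes this replication rather than just copying; skip it and you fork the revision tree.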
WRT the _changes feed, the update sequence from the DB / view can
also be used directly in the changes feed (it's the same pointer into
the database's update history).
So the high-level pattern becomes:
- query the view to make sure it's up to date
- stash the DB seq number for the future changes feed
- pull all the docs the view identified as relevant and stash them
(Magical Step Occurs Here)
- switch to the _changes?since=seqnum feed for subsequent updates
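The steps above can be written down as a request plan (all names illustrative; the "Magical Step" of storing the pulled docs locally is left as a comment):

```python
def bootstrap_plan(server, db, ddoc, view, key, seq):
    return [
        # 1. query the view, which also brings its index up to date
        ("GET", f'{server}/{db}/_design/{ddoc}/_view/{view}?key="{key}"'),
        # 2. GET the DB info and stash its update_seq field
        ("GET", f"{server}/{db}"),
        # 3. pull each doc the view identified and store it locally
        #    (Magical Step Occurs Here)
        # 4. tail the changes feed from the stashed seq onwards
        ("GET", f"{server}/{db}/_changes?since={seq}"),
    ]

plan = bootstrap_plan("http://server:5984", "master", "app", "by_user",
                      "alice", 123456)
for method, url in plan:
    print(method, url)
```

The payoff is that the expensive from-zero filtered scan never happens on the device; the only full pass is the one the server already did building the view.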
I have a dream that one day, post BigCouch/Refuge merges back into
CouchDB, all the various filters (replication, views, changes etc)
will all run from the same core code and can use either live updates
or an incrementally updated view, as needs require.
I'd expect a measurable benefit in using Erlang for filters on the
changes feed, simply because it avoids running a JS function (via
couchjs); with many concurrent users that will eventually become a
bottleneck. I've not had the need to do this personally, though.
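For the curious, a native Erlang filter lives in a design doc with "language" set to "erlang" (this assumes the server has the Erlang native query server enabled; the filter body is a minimal illustration):

```python
import json

ddoc = {
    "_id": "_design/native",
    "language": "erlang",
    "filters": {
        # Pass only docs whose "type" field is "post"; no couchjs involved.
        "posts": (
            "fun({Doc}, _Req) ->\n"
            "    couch_util:get_value(<<\"type\">>, Doc) =:= <<\"post\">>\n"
            "end."
        )
    },
}
print(json.dumps(ddoc, sort_keys=True))
```

The trade-off is the usual one: Erlang filters run in-process, so they're faster but have none of the sandboxing couchjs gives you.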
Oh and greets from another mainlander!