Filtered replication performance
From: Layton Duncan <lay...@polarbearfarm.com>
Date: Wed, 10 Oct 2012 11:03:48 +1300
Local: Tues, Oct 9 2012 6:03 pm
Subject: Filtered replication performance
We have an app which has a single CouchDB database for multiple users, each of whom requires specific "live" data filtered to TouchDB.

As this CouchDB database grows larger (into the millions of sequence numbers), the first login experience for users on devices becomes incredibly painful, because CouchDB runs the replication filter over every single document. Out of the millions of changes, there may be only a few thousand at most which need replicating to a device.

Are there any optimisations out there for this sort of filtered replication starting from a sequence number of 0? We're considering maintaining a separate indexed changes feed in SQL, based on our filter parameters, to return the changes feed in this special case. It seems like filtered replication was never designed with this sort of "replicate from scratch, fast" in mind, but it seems incredibly useful, almost required, when dealing with mobile devices and shared data.

Layton


 
From: Dave Cottlehuber <d...@jsonified.com>
Date: Wed, 10 Oct 2012 00:27:28 +0200
Local: Tues, Oct 9 2012 6:27 pm
Subject: Re: Filtered replication performance
On 10 October 2012 00:03, Layton Duncan <lay...@polarbearfarm.com> wrote:

> We have an app which has a single couch database for multiple users, each of which requires specific "live" data filtered to TouchDB.

> As this couchdb database grows larger (into the millions of sequence numbers) the first login experience of users on devices becomes incredibly painful as couch runs the filtered replication over every single document. Out of the millions of changes, there may only be a few thousand at most which need replication to devices.

> Are there any optimisations out there for this sort of filtered replication starting from a sequence number of 0. We're considering maintaining a separate indexed changes feed in SQL, based on our filter parameters to return the changes feed in this special case, it seems like filtered replication was never designed with this sort of "replicate from scratch fast" in mind, but seems incredibly useful, almost required when dealing with mobile devices and shared data.

I have 2 tricks to this, in normal CouchDB land.

1. Put an intermediary, separate DB per end user in place to use as a
replication point for TouchDB. Then let the server do the hard work of
filtering from the master copy down to the end users. All the usual
concerns about duplicated storage, reduced efficiency etc. apply, and
this pattern doesn't fit some use cases.

2. Set up a view that implements that filter. The key advantage is
that the view is pre-sorted and pre-calculated so your request to get
the relevant doc IDs is a nice tidy fast range query. Then you use
named replication to pull them down to the device. The downside?
TouchDB doesn't support named replication. Maybe Jens feels like
adding this in some future release, I think this is a very useful
feature for mobile.
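
A minimal sketch of the lookup side of trick 2, in Python. It assumes a hypothetical design document "app" with a view "by_user" whose map function emits the owning user as the key (something like emit(doc.owner, null)); those names, and the host, are made up for illustration. Only the URL construction and the response parsing are shown, so nothing here talks to a server.

```python
import json
from urllib.parse import quote, urlencode


def view_url(db_url, ddoc, view, user_id):
    """URL for a single-key view lookup that also reports the DB sequence."""
    params = urlencode({
        "key": json.dumps(user_id),   # view keys are JSON-encoded
        "update_seq": "true",         # include the sequence the view reflects
    })
    return f"{db_url}/_design/{quote(ddoc)}/_view/{quote(view)}?{params}"


def doc_ids_from_view(response):
    """Unique document IDs from a view response, in first-seen order."""
    seen = set()
    ordered = []
    for row in response["rows"]:
        if row["id"] not in seen:
            seen.add(row["id"])
            ordered.append(row["id"])
    return ordered


# Hypothetical names: database "master", design doc "app", view "by_user".
url = view_url("http://localhost:5984/master", "app", "by_user", "alice")

# Canned response in the shape CouchDB returns for a view query.
sample = {
    "total_rows": 3,
    "offset": 0,
    "update_seq": 1234,
    "rows": [
        {"id": "doc-1", "key": "alice", "value": None},
        {"id": "doc-2", "key": "alice", "value": None},
        {"id": "doc-1", "key": "alice", "value": None},
    ],
}
ids = doc_ids_from_view(sample)
print(url)
print(ids)
```

The resulting ID list is what would be handed to named document replication, or fetched by hand as described below.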

The workaround: given the replicator is "just" an HTTP client, you can
do the same thing the replicator does, pulling the docs down and
then poking them into TouchDB. I can't speak for how easy that would
be to do, nor whether that's a dirty trick, but I think it's doable.

WRT the _changes feed, the update sequence from the DB / view can
also be used directly in the changes feed (it's the same pointer into
the DB).

So the high-level pattern becomes:

- query the view to make sure it's up to date
- stash the DB seq number for the future changes feed
- pull all the docs identified as relevant by the view down to the
device & stash them (Magical Step Occurs Here)
- switch back to using changes?since=seqnum feed for subsequent updates
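
The four steps above could be sketched roughly as follows, in Python. This is an illustration under assumptions, not a replicator: http_get, http_post and store are hypothetical hooks supplied by the caller, the view path is made up, and the bulk fetch through _all_docs with include_docs=true only brings down the current revision of each document, so it glosses over the revision history a real replicator would transfer in the Magical Step.

```python
import json
from urllib.parse import urlencode


def initial_sync(http_get, http_post, db_url, view_path, user_id, store):
    """One-off bootstrap: view query, bulk fetch, then a changes URL to resume from.

    http_get(url) and http_post(url, body) are caller-supplied functions that
    return parsed JSON; store(doc) writes one document into the local DB.
    All three are hypothetical hooks, not part of any CouchDB client library.
    """
    # 1. Query the view; update_seq=true asks for the DB sequence it reflects.
    query = urlencode({"key": json.dumps(user_id), "update_seq": "true"})
    view = http_get(f"{db_url}/{view_path}?{query}")

    # 2. Stash the sequence number for the later changes feed.
    since = view["update_seq"]

    # 3. Fetch the relevant docs in one request and store them locally.
    ids = sorted({row["id"] for row in view["rows"]})
    fetched = http_post(f"{db_url}/_all_docs?include_docs=true", {"keys": ids})
    for row in fetched["rows"]:
        doc = row.get("doc")  # absent or None for missing/deleted documents
        if doc is not None:
            store(doc)

    # 4. Subsequent updates come from the changes feed, starting at `since`.
    changes_url = f"{db_url}/_changes?" + urlencode({"since": since})
    return since, changes_url


# Dry run against canned responses instead of a live server.
def fake_get(url):
    return {"update_seq": 42, "rows": [
        {"id": "a", "key": "alice", "value": None},
        {"id": "b", "key": "alice", "value": None},
    ]}


def fake_post(url, body):
    return {"rows": [
        {"id": "a", "key": "a", "doc": {"_id": "a", "_rev": "1-x"}},
        {"key": "b", "error": "not_found"},
    ]}


stored = []
since, changes_url = initial_sync(
    fake_get, fake_post, "http://localhost:5984/master",
    "_design/app/_view/by_user", "alice", stored.append)
print(since, changes_url, stored)
```

A document that changes between step 1 and step 3 simply shows up again in the changes feed after the stashed sequence number, so resuming from that number errs on the side of re-sending rather than missing an update. In practice the changes request would presumably still carry the per-user filter.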

I have a dream that one day, post BigCouch/Refuge merges back into
CouchDB, all the various filters (replication, views, changes etc)
will all run from the same core code and can use either live updates
or an incrementally updated view, as needs require.

I'd expect a measurable benefit from writing changes-feed filters in
Erlang, simply because it avoids running a JS function (via couchjs),
which with many concurrent users will eventually become a
bottleneck. I've not had the need personally to do this though.

Oh and greets from another mainlander!

A+
Dave


 
From: Layton Duncan <lay...@polarbearfarm.com>
Date: Wed, 10 Oct 2012 12:22:52 +1300
Local: Tues, Oct 9 2012 7:22 pm
Subject: Re: Filtered replication performance
Thanks Dave! Useful approaches to consider there.

Layton



 
From: Jens Alfke <j...@couchbase.com>
Date: Thu, 11 Oct 2012 10:48:13 -0700
Local: Thurs, Oct 11 2012 1:48 pm
Subject: Re: Filtered replication performance

On Oct 9, 2012, at 3:27 PM, Dave Cottlehuber <d...@jsonified.com> wrote:

> 1. Put an intermediary, separate DB per end user to use a replication
> point for the TouchDB. Then let the server do the hard work of
> filtering from the master copy back to the endusers.

This is pretty much a requirement for those apps that need to read-protect data: if user A shouldn’t be able to see user B’s data, there’s no effective way to enforce that in CouchDB if the documents are in the same database (well, without encrypting the documents.)
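
For what it's worth, the server-side half of that setup might look something like this minimal Python sketch, which only builds the JSON body for a POST to /_replicate. The filter name "app/by_user", the "user" query parameter and the database URLs are hypothetical and would have to match a filter function in a design document on the master database.

```python
import json


def per_user_replication_body(master_url, user_db_url, user_id,
                              filter_name="app/by_user"):
    """Body for POST /_replicate: master DB -> one user's own DB, filtered.

    filter_name and the "user" parameter are placeholders; the filter
    function itself lives in a design document on the master database.
    """
    return {
        "source": master_url,
        "target": user_db_url,
        "filter": filter_name,                # "designdoc/filtername"
        "query_params": {"user": user_id},    # passed through to the filter
        "create_target": True,                # create the per-user DB if missing
        "continuous": True,                   # keep the per-user DB live
    }


body = per_user_replication_body(
    "http://localhost:5984/master",
    "http://localhost:5984/userdb-alice",
    "alice",
)
print(json.dumps(body, indent=2))
```

The device then replicates from the small per-user database without any filter, so the expensive filtering runs once on the server rather than on every first login.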

> 2. Set up a view that implements that filter. The key advantage is
> that the view is pre-sorted and pre-calculated so your request to get
> the relevant doc IDs is a nice tidy fast range query. Then you use
> named replication to pull them down to the device. The downside?
> TouchDB doesn't support named replication. Maybe Jens feels like
> adding this in some future release, I think this is a very useful
> feature for mobile.

Um, yeah, I think this fell between the cracks somehow. I think on the pull side it should be pretty trivial to implement; just another URL parameter on the _changes feed. If so I could squeeze it in before 1.0.

—Jens


 
From: Jens Alfke <j...@couchbase.com>
Date: Thu, 11 Oct 2012 10:53:56 -0700
Local: Thurs, Oct 11 2012 1:53 pm
Subject: Re: Filtered replication performance

On Oct 9, 2012, at 3:27 PM, Dave Cottlehuber <d...@jsonified.com> wrote:

> 2. Set up a view that implements that filter. The key advantage is
> that the view is pre-sorted and pre-calculated so your request to get
> the relevant doc IDs is a nice tidy fast range query. Then you use
> named replication to pull them down to the device.

Is this documented anywhere? I’m looking through the API docs (http://wiki.apache.org/couchdb/Complete_HTTP_API_Reference) on the wiki, and tried some random googling too, but can’t find any reference to a _changes or replication parameter that specifies a view. That’s probably why I never implemented it o_O

—Jens


 
From: Dave Cottlehuber <d...@jsonified.com>
Date: Fri, 12 Oct 2012 11:53:15 +0200
Local: Fri, Oct 12 2012 5:53 am
Subject: Re: Filtered replication performance
On 11 October 2012 19:53, Jens Alfke <j...@couchbase.com> wrote:

> On Oct 9, 2012, at 3:27 PM, Dave Cottlehuber <d...@jsonified.com> wrote:

> 2. Set up a view that implements that filter. The key advantage is
> that the view is pre-sorted and pre-calculated so your request to get
> the relevant doc IDs is a nice tidy fast range query. Then you use
> named replication to pull them down to the device.

> Is this documented anywhere? I’m looking through the API docs on the wiki,
> and tried some random googling too, but can’t find any reference to a
> _changes or replication parameter that specifies a view. That’s probably why
> I never implemented it o_O

Aah that's not quite what I meant, sorry for sending you on a
misguided google hunt!

The missing feature I was referring to was named doc replication. Hope
that makes more sense.
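
To make that concrete: in CouchDB, named document replication is requested with a doc_ids array in the body posted to /_replicate. A minimal Python sketch of such a body follows; the URLs and IDs are made up, and pointing the target at a TouchDB database is an assumption, since that is exactly the piece said to be missing above.

```python
import json


def named_doc_replication_body(source_url, target_url, doc_ids):
    """Body for POST /_replicate that copies only the listed document IDs."""
    return {
        "source": source_url,
        "target": target_url,
        "doc_ids": list(doc_ids),  # e.g. the IDs returned by the filter view
    }


# Hypothetical source/target names and document IDs.
body = named_doc_replication_body(
    "http://localhost:5984/master",
    "device-db",
    ["doc-1", "doc-2"],
)
print(json.dumps(body))
```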

A+
Dave


 