Best way to pull a small subset of related documents via their IDs

Toby UP

Aug 27, 2014, 12:13:11 PM
to mobile-c...@googlegroups.com
I have been developing an app using Couchbase Server, Sync Gateway and the iOS SDK. Having been through the documentation several times, I am still confused about the best way to tackle a particular use case, so I would really appreciate some feedback.

I have documents of type 'object', each of which may be owned by a 'user' document via an 'owned_by' property on the 'object' document. All 'object' documents also have 'latitude' and 'longitude' properties and a 'source_ref' property that references another document of type 'source'.

In iOS I am querying 'object' documents using two different views and then setting them up as live queries that run in the background. The first view's map function selects all documents whose 'owned_by' property contains the current user's email, and the second uses the boundingbox function to get all documents within close proximity to the user via their 'latitude' and 'longitude' properties. The properties are constantly changing, so the live queries will potentially be returning new 'object' documents several times a minute.

This all works fine, but the issue I am having is that every time the app pulls the documents of type 'object' using these live queries, I then need to use their 'source_ref' properties to pull a very small subset of 'source' documents from the server, specifically by their IDs. Up to now I have just been pulling all the 'source' documents so that I have them to hand when I need them. Over time, however, there could be tens of thousands of documents of type 'source', and I am only interested in pulling a very small subset by ID every time the live queries return results.

It is my understanding that I can't create a dynamic condition for a view (i.e. the 'source_ref' properties of the 'object' documents as they are returned by a live query). Is there a simple way I am missing to achieve this using a view's map/reduce functions, or can I simply create an array of the document IDs I want to pull from the server to my local db and query them one by one in the background? I've looked at the documentation on simulating relationships, but this seems a very SQL-like approach and maybe not the best way to achieve what I want to do.

Any advice on how you would approach this would be much appreciated.

Toby

Alexander Gabriel

Aug 28, 2014, 12:03:25 PM
to mobile-c...@googlegroups.com
Have you tried joining with views? http://docs.couchdb.org/en/latest/couchapp/views/joins.html

--
You received this message because you are subscribed to the Google Groups "Couchbase Mobile" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mobile-couchba...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/mobile-couchbase/56909e25-2235-4e1c-8db5-8a83b8163374%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Jens Alfke

Aug 28, 2014, 12:37:28 PM
to mobile-c...@googlegroups.com

On Aug 27, 2014, at 9:13 AM, Toby UP <tob...@gmail.com> wrote:

In iOS I am querying 'object' documents using 2 different views and then setting them up as live queries that run in the background. The first view map function queries all documents where the 'owned_by' property contains the current users email and the second uses the boundingbox function to get all documents within close proximity to the user via their 'latitude' and 'longitude' properties. The properties are constantly changing and thus the live query will potentially be returning new 'object' documents several times a minute.

I'm sort of confused by that last statement. If the live query is returning new results, that implies the database changed. But from your use case it sounds like it's the user's location that changes. That wouldn't cause the live query to update; instead you'd presumably be running a new query with different lat/lon coords.

It is my understanding that I can't create a dynamic condition for a view (i.e. the 'source_ref' properties of the 'object' documents as they are returned by a live query) so is there a simple way I am missing that this can be achieved using views map reduce function or can I simply create an array of the document ID's I want to pull from the server to my local db and query them one by one in the background?

I think there are two different issues here (or maybe I'm just confused.)

First there's the issue of joins. You can't directly join from the object docs to the source docs. You'll need to collect the set of source_ref values and then get those documents. If they're local, just call [database documentWithID:] for each one. (CBLDatabase has a document cache and will never instantiate multiple CBLDocument objects for the same ID.)
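
For illustration, the collect-then-look-up step described above might be sketched roughly as follows. This is only a sketch under assumptions: the helper name MissingSourceIDs is hypothetical, the rows are assumed to come from a query whose rows belong to the 'object' documents themselves (no linked '_id' in the emitted value), and it uses -existingDocumentWithID:, which as far as I recall returns nil rather than an empty CBLDocument when the ID is not stored locally.

```objective-c
#import <CouchbaseLite/CouchbaseLite.h>

// Hypothetical helper (not part of Couchbase Lite): returns the source_ref IDs
// referenced by the query's rows that are not yet in the local database.
static NSSet* MissingSourceIDs(CBLDatabase* database, CBLQueryEnumerator* rows) {
    NSMutableSet* missing = [NSMutableSet set];
    for (CBLQueryRow* row in rows) {
        // Assumption: row.document is the 'object' doc that emitted this row.
        id ref = row.document.properties[@"source_ref"];
        if (![ref isKindOfClass: [NSString class]])
            continue;                       // no usable source_ref on this object
        // Only IDs with no local document are reported as missing.
        if ([database existingDocumentWithID: ref] == nil)
            [missing addObject: ref];
    }
    return missing;
}
```

Using a set means each 'source' ID is looked up and reported once, even when many 'object' documents share the same source_ref.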

Second, it sounds like you want to pull the source documents lazily from the server instead of replicating the entire data set to the device. You'll need to do that by setting the documentIDs property of the replication. Two things to keep in mind:
  1. The replicator only checks this property when the replication starts. That means that, if you change the property on a running CBLReplication, you'll need to stop and re-start it.
  2. If you want to keep getting updates of all the source documents you've downloaded, you'll need to persistently store the entire set of source document IDs somewhere so you can set the replication's documentIDs on re-launch of the app.
(I know this isn't ideal, especially condition 2. Our architecture isn't yet optimized for keeping a subset of the server database; that's a high priority feature for version 2.0.)

Hope this helps. If I've misunderstood your issues, please clarify!

—Jens

Toby UP

Sep 1, 2014, 9:47:41 AM
to mobile-c...@googlegroups.com


On Thursday, 28 August 2014 17:37:28 UTC+1, Jens Alfke wrote:

On Aug 27, 2014, at 9:13 AM, Toby UP <tob...@gmail.com> wrote:

In iOS I am querying 'object' documents using 2 different views and then setting them up as live queries that run in the background. The first view map function queries all documents where the 'owned_by' property contains the current users email and the second uses the boundingbox function to get all documents within close proximity to the user via their 'latitude' and 'longitude' properties. The properties are constantly changing and thus the live query will potentially be returning new 'object' documents several times a minute.

I'm sort of confused by that last statement. If the live query is returning new results, that implies the database changed. But from your use case it sounds like it's the user's location that changes. That wouldn't cause the live query to update; instead you'd presumably be running a new query with different lat/lon coords.

First, thank you so much for getting back to me about this. Your help in understanding is very much appreciated.

Yes, your presumption above is correct. Each time the user's location changes, a new query is generated based on the user's new latitude / longitude coordinates. The reason it is set up as a live query each time is that other properties of these 'object' documents can be changed by other users, and those changes will then trigger the local document to be updated on the device as they happen.


It is my understanding that I can't create a dynamic condition for a view (i.e. the 'source_ref' properties of the 'object' documents as they are returned by a live query) so is there a simple way I am missing that this can be achieved using views map reduce function or can I simply create an array of the document ID's I want to pull from the server to my local db and query them one by one in the background?

I think there are two different issues here (or maybe I'm just confused.)

First there's the issue of joins. You can't directly join from the object docs to the source docs. You'll need to collect the set of source_ref values and then get those documents. If they're local, just call [database documentWithID:] for each one. (CBLDatabase has a document cache and will never instantiate multiple CBLDocument objects for the same ID.)

The majority of the 'source' documents won't exist locally. However, if they don't and I call [database documentWithID:] on the local database, will it create a blank document and then subsequently pull the existing document with that ID from the server, seeing as it now exists locally? Each 'source' document has a custom predefined '_id' property, as it relates to the third-party data it references, so I can guarantee there should only be one 'source' document with the _id I specify.

Also, just to clarify: is there no way to do what Alexander suggested in the reply above (have you tried joining with views? http://docs.couchdb.org/en/latest/couchapp/views/joins.html) and create a view that includes the full 'source' document as an adjacent row for each 'object' document row, using its 'source_ref' property, every time the live query returns results?


Second, it sounds like you want to pull the source documents lazily from the server instead of replicating the entire data set to the device. You'll need to do that by setting the documentIDs property of the replication. Two things to keep in mind:
  1. The replicator only checks this property when the replication starts. That means that, if you change the property on a running CBLReplication, you'll need to stop and re-start it.
  2. If you want to keep getting updates of all the source documents you've downloaded, you'll need to persistently store the entire set of source document IDs somewhere so you can set the replication's documentIDs on re-launch of the app.
(I know this isn't ideal, especially condition 2. Our architecture isn't yet optimized for keeping a subset of the server database; that's a high priority feature for version 2.0.)

As a rapidly growing number of documents will potentially be created on the server database that will never need to be seen or accessed locally by individual devices (as relevance is purely based on the user's geo-location), I feel the only way to safeguard future performance from unnecessary replications and an enormous local database on every device is to pull only a subset of the relevant data from the server rather than replicating all data to the local device. I originally thought 'channels' might be the answer, but the parameters for what documents need to be replicated by a device are so variable and ever-changing (being geo-based) that I can't see how channels could be used for this.

So I have two questions regarding the above:

1. For my understanding, when creating a view within iOS, where exactly does it run: on the device or on the server? For example, if not using channels, does sync_gateway attempt to push replications of all documents to the device, and does the device then have to constantly update its locally defined views to do any filtering? If there is a lot of data on the server, will each device suffer performance-wise with this approach, seeing as it only ever needs a small subset of data?

2. Is the solution, as you suggest above, to specify the documentIDs property for replications? What exactly does this involve if the documents that need accessing change constantly as the user's location updates? Will I be continually destroying and recreating replications for this to work?

Many thanks again for your help.

Jens Alfke

Sep 1, 2014, 1:53:57 PM
to mobile-c...@googlegroups.com

On Sep 1, 2014, at 6:47 AM, Toby UP <tob...@gmail.com> wrote:

The majority of the 'source' documents won't exist locally. However if they don't and I make a call to the local database via [database documentWithID:] will it create a blank document and then subsequently pull an existing document with that ID from the server seeing as it is now created locally.

Calling -documentWithID doesn't create a document in the database; it just returns an empty instance of CBLDocument. The document will only be added to the database if you subsequently save it.

Also, the pull replicator doesn't care whether a document already exists locally: it just downloads revisions created since the last replication, which match the filter (if any). By creating a blank doc locally, all you're doing is creating a conflict with the real doc that gets pulled in, and it'll be random which of those revisions 'wins' the conflict.

If you have a set of doc IDs that you want to pull immediately, you should create a new one-shot pull replication and set its documentIDs property. Thereafter, to keep the docs up to date, you should run a continuous replication whose documentIDs is set to all the documents that you've ever brought in this way. (This is awkward but as I said, there isn't a clean way to do it yet.)
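
The one-shot-then-continuous pattern described above could look something like the sketch below. It is not a definitive implementation: the function names, the NSUserDefaults key and the choice of NSUserDefaults as the persistent store are all assumptions made for illustration. Only -createPullReplication:, the documentIDs and continuous properties, and -start are Couchbase Lite API.

```objective-c
#import <CouchbaseLite/CouchbaseLite.h>

// Hypothetical NSUserDefaults key under which every pulled doc ID is remembered.
static NSString* const kTrackedIDsKey = @"TrackedSourceDocIDs";

// Pulls the given docs once, right away, and adds their IDs to the persisted set.
static CBLReplication* PullSourceDocsNow(CBLDatabase* database, NSURL* remoteURL, NSSet* newIDs) {
    NSUserDefaults* defaults = [NSUserDefaults standardUserDefaults];
    NSMutableSet* tracked = [NSMutableSet setWithArray: [defaults arrayForKey: kTrackedIDsKey] ?: @[]];
    [tracked unionSet: newIDs];
    [defaults setObject: tracked.allObjects forKey: kTrackedIDsKey];

    CBLReplication* oneShot = [database createPullReplication: remoteURL];
    oneShot.documentIDs = newIDs.allObjects;   // .continuous is left at its default, NO
    [oneShot start];
    return oneShot;                            // caller should keep a reference
}

// Starts a continuous pull limited to every ID remembered so far.
// Returns nil when nothing has been tracked yet.
static CBLReplication* StartKeepFreshReplication(CBLDatabase* database, NSURL* remoteURL) {
    NSArray* tracked = [[NSUserDefaults standardUserDefaults] arrayForKey: kTrackedIDsKey];
    if (tracked.count == 0)
        return nil;
    CBLReplication* pull = [database createPullReplication: remoteURL];
    pull.continuous = YES;
    pull.documentIDs = tracked;
    [pull start];
    return pull;
}
```

Because the replicator only reads documentIDs when a replication starts, the continuous replication would need to be stopped and started again (for example by calling StartKeepFreshReplication once more after stopping the old one) whenever the tracked set grows.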

—Jens

Jens Alfke

Sep 1, 2014, 1:56:24 PM
to mobile-c...@googlegroups.com

On Sep 1, 2014, at 6:47 AM, Toby UP <tob...@gmail.com> wrote:

Also just to clarify there is no way to do what Alexander suggested in the reply above (have you tried joining with views? http://docs.couchdb.org/en/latest/couchapp/views/joins.html) and create a view that includes the 'source' document (in full) as an adjacent row for each 'object' document row using it's 'source_ref' property every time the live query returns results? 

That feature is supported.

—Jens

Toby UP

Sep 1, 2014, 5:45:57 PM
to mobile-c...@googlegroups.com
Ok, thank you, I understand. I think I may have to find another way rather than keeping a persistent record of every document received locally using a specific . Perhaps I could include the full dictionary of the 'source' document within the 'object' document until another solution comes to light. I think it could get very messy otherwise. Alternatively, perhaps I could use CouchQuery's linked document functionality in my view, as mentioned below, to retrieve 'source' document data when querying 'object' documents.

Toby UP

Sep 1, 2014, 5:56:04 PM
to mobile-c...@googlegroups.com
I have just been trying this and am not getting the full document dictionary.

My view emit looks like this:

emit(@[doc[@"_id"], @"1"], @{@"_id": doc[@"source_ref"]});

And this is what I'm getting when iterating through my rows:

CBLQueryRow[key=["321D285B-E684-43FF-AA84-B3661732A654","1"]; value={"_id":"68B3E7A6-5391-449B-8D83-17C83EA7BBDC"}; id=321D285B-E684-43FF-AA84-B3661732A654] 

There is obviously a lot more data in the 'source' document than just the _id, so it doesn't seem to be getting the full document. Am I missing something? Do I have to set the equivalent of include_docs=true on the CBLQuery in iOS, or should it pick up that it's a linked document automatically because I am using '_id'? This is using a live query, in case that makes any difference.

Jens Alfke

Sep 1, 2014, 6:55:30 PM
to mobile-c...@googlegroups.com

On Sep 1, 2014, at 2:56 PM, Toby UP <tob...@gmail.com> wrote:

Do I have to set the equivalent of include_docs=true on the CBLQuery in iOS

Yup. That's called "preload" in the CBLQuery API. Then CBLQueryRow.document will return the linked document.

—Jens

Alexander Gabriel

Sep 2, 2014, 12:00:05 PM
to mobile-c...@googlegroups.com
In vanilla CouchDB I have to add include_docs=true.
I don't know for sure about Couchbase Lite, but I bet it's the same.




Toby UP

Sep 12, 2014, 9:06:28 AM
to mobile-c...@googlegroups.com
Excellent thanks Jens, that worked great btw. I think the property is 'prefetch' in iOS as that seemed to work.

Thanks again.

Toby UP

Sep 12, 2014, 9:10:24 AM
to mobile-c...@googlegroups.com
Thanks Alexander,

Yes I did have to set the equivalent of include_docs=true to get it working.

For anyone else interested: in the iOS SDK you have to set the boolean 'prefetch' property on the CBLQuery to YES for joining to work.
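
Putting the pieces from this thread together, a minimal sketch of the linked-document view plus a prefetching query might look like this. The view name, the helper function names and the 'type' property check are assumptions for illustration; the emit call and the 'prefetch' property are the ones discussed above.

```objective-c
#import <CouchbaseLite/CouchbaseLite.h>

// Hypothetical helper: defines the view and returns a prefetching query over it.
static CBLQuery* ObjectsWithSourcesQuery(CBLDatabase* database) {
    CBLView* view = [database viewNamed: @"objectsWithSources"];  // hypothetical view name
    [view setMapBlock: ^(NSDictionary* doc, CBLMapEmitBlock emit) {
        // Assumption: a 'type' property marks 'object' docs; adjust to your schema.
        if ([doc[@"type"] isEqual: @"object"] && doc[@"source_ref"]) {
            // A value of the form {"_id": ...} links the row to another document.
            emit(@[doc[@"_id"], @"1"], @{@"_id": doc[@"source_ref"]});
        }
    } version: @"1"];

    CBLQuery* query = [view createQuery];
    query.prefetch = YES;   // the equivalent of include_docs=true
    return query;
}

// Hypothetical usage: logs each object ID with its linked 'source' properties.
static void LogLinkedSources(CBLDatabase* database) {
    NSError* error = nil;
    CBLQueryEnumerator* rows = [ObjectsWithSourcesQuery(database) run: &error];
    if (!rows) {
        NSLog(@"Query failed: %@", error);
        return;
    }
    for (CBLQueryRow* row in rows) {
        NSArray* key = row.key;   // [object _id, "1"], as emitted above
        // With prefetch set, row.document is the linked 'source' document.
        NSLog(@"object %@ -> source %@", key[0], row.document.properties);
    }
}
```

Note that this only joins against 'source' documents that are already in the local database; it does not fetch anything from the server.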



Rajagopal V

Sep 13, 2014, 4:52:07 AM
to mobile-c...@googlegroups.com
Apologies for asking on an existing thread, but I have a similar problem.

On Monday, September 1, 2014 11:23:57 PM UTC+5:30, Jens Alfke wrote:


If you have a set of doc IDs that you want to pull immediately, you should create a new one-shot pull replication and set its documentIDs property. Thereafter, to keep the docs up to date, you should run a continuous replication whose documentIDs is set to all the documents that you've ever brought in this way. (This is awkward but as I said, there isn't a clean way to do it yet.)


I have a continuous replication that brings in some specified "types" of documents. Now when the user does a Search on his mobile for other "types", a REST call goes to my PHP server, which returns document IDs back to the mobile (these documents are not yet available on the mobile). If I were to do a one-shot pull replication using the returned IDs, I assume that those documents will be available on the mobile and will subsequently be replicated continuously when I update them through the mobile.

Is my assumption valid? (I was going to try it this afternoon, but wanted to check if it was possible.)

Thanks
Raja


  

Jens Alfke

Sep 13, 2014, 12:47:22 PM
to mobile-c...@googlegroups.com

On Sep 13, 2014, at 1:52 AM, Rajagopal V <raja...@gmail.com> wrote:

I have a continuous replication that brings some specified "types" of documents. Now when the user does a Search on his mobile for other "types", there is a REST call that goes to my PHP server that returns document ids back to the mobile(these are not yet available in the mobile). If I were to do a one-shot pull replication using the returned ids, I assume that those will be available in the mobile and subsequently be replicated continuously when I update them through the mobile. 

In 1.0.2 the answer is no; a one-shot replication will only update the documents once. To keep all the docs you've pulled fresh, you'll have to keep track of all of their IDs (persistently) and run a continuous pull replication with those doc IDs.

Now, on the iOS branch feature/subset I've added a couple of new methods on CBLReplication to make what you're doing easier:

  • -createQueryOfRemoteView: returns a CBLQuery that will access a view on the server. And if you get the .document property of any of the returned query rows, that document will be pulled asynchronously. (So be prepared for it to have no properties initially, until the download completes.) Note: This works with CouchDB and Cloudant, but to use it with Sync Gateway you need to build its feature/query_api branch.
  • -pullDocumentIDs: will immediately pull the documents with the given IDs.
  • Setting a pull replication's .customProperties property to @{@"add_docs": @NO} will prevent that replication from adding documents to the local database. So if you run a continuous replication with this property, but no filters or docIDs, it will keep all the docs you've downloaded in sync (which is what you asked for above.)

Of course this is experimental, we haven't committed to ship it, the API might change, etc. But I'd love for people to try it and give feedback.

—Jens

Rajagopal V

Sep 13, 2014, 1:34:30 PM
to mobile-c...@googlegroups.com


On Saturday, September 13, 2014 10:17:22 PM UTC+5:30, Jens Alfke wrote:

On Sep 13, 2014, at 1:52 AM, Rajagopal V <raja...@gmail.com> wrote:

I have a continuous replication that brings some specified "types" of documents. Now when the user does a Search on his mobile for other "types", there is a REST call that goes to my PHP server that returns document ids back to the mobile(these are not yet available in the mobile). If I were to do a one-shot pull replication using the returned ids, I assume that those will be available in the mobile and subsequently be replicated continuously when I update them through the mobile. 

In 1.0.2 the answer is no; a one-shot replication will only update the documents once. To keep all the docs you've pulled fresh, you'll have to keep track of all of their IDs (persistently) and run a continuous pull replication with those doc IDs.


Thanks Jens. Is it possible to run another continuous replication with just the docIDs set to the ones that I want to replicate? That way, one of my continuous replications will pull specific "types" and the other one (or more) will run with specified document IDs.

I will try out the feature branch you mentioned, but I have to get this working on both iOS and Android, so I am looking for something that will work on both.

Thanks
Raja

Rajagopal V

Sep 18, 2014, 1:36:56 PM
to mobile-c...@googlegroups.com

On Saturday, September 13, 2014 10:17:22 PM UTC+5:30, Jens Alfke wrote:

In 1.0.2 the answer is no; a one-shot replication will only update the documents once. To keep all the docs you've pulled fresh, you'll have to keep track of all of their IDs (persistently) and run a continuous pull replication with those doc IDs.

Now, on the iOS branch feature/subset I've added a couple of new methods on CBLReplication to make what you're doing easier:


I think the feature/query branch contains createQueryOfRemoteView as well as the other two methods (which were in the feature/subset branch). I'm trying to do a similar thing on Android since I'm more comfortable with Java than Objective-C. I will post results when I make some more progress.

In CBLRemoteQuery, the live query mechanism is not yet implemented. If I want to bring in documents that satisfy the view in the future, would a live query be the way forward? For example, I have a view that emits certain sets of documents, which I can query using a start/endkey. If new documents matching the start/endkey appear later, the mobile doesn't know about them unless it does another createQueryOfRemoteView. Would a live query be the only solution to this? Is there another way to solve the problem?

Thanks
Raja

Jens Alfke

Sep 18, 2014, 2:18:13 PM
to mobile-c...@googlegroups.com

On Sep 18, 2014, at 10:36 AM, Rajagopal V <raja...@gmail.com> wrote:

In CBLRemoteQuery, the live query mechanism is not yet implemented.

Right. The problem is that it's a lot more expensive to do this over the network than locally. The implementation will need to poll the remote database checking to see when the latest-sequence value changes; when it does, it will need to re-issue the query. If the remote database changes frequently, this could result in constant activity.

I'm not saying we won't implement it, just that it needs some more thought. For now, if you use this I would design your UI with an explicit Refresh or Search button that re-issues the query, instead of trying to do it automatically.

—Jens