FYI: Deferred attachment downloading, in progress


Jens Alfke

Aug 6, 2015, 4:42:50 PM
to Couchbase Mobile
I’m working on support for deferring attachment downloads: you’ll be able to tell a pull replication not to download attachment contents, and then request that individual attachments be downloaded later. This feature has been requested many times.

My work so far is checked into the couchbase-lite-ios repo as a branch named ‘feature/lazy_attachments’. You’re welcome to try it out or critique it. There are some preliminary docs here on the wiki. There’s also a C# pull request by a 3rd party that uses a somewhat different API.

This is preliminary and the API and features are subject to change. I’m particularly interested in which of the limitations/issues people consider to be deal-breakers or highest priority. Quoting from the wiki page:

• There's not currently any way to pull some attachments automatically but not others. It's all or nothing.
• Multiple calls to download the same attachment will issue redundant downloads, wasting network bandwidth. (Only one copy will be saved to disk, though.)
• Attachment downloading isn't as fault-tolerant as regular replication: if there's no network connectivity or the server isn't reachable, the request will immediately fail. If the request fails, it won't be retried automatically.
• If you retry an interrupted download, it starts over from the beginning instead of where it left off.
• There's no way yet to cancel or pause a download.
• There's no way to "un-download" an attachment, i.e. purge an attachment from local storage.

—Jens

Brett Harrison

Aug 7, 2015, 1:08:18 PM
to Couchbase Mobile
Hello,

I submitted the C# pull request.  I believe I could alter my API to more closely match yours from what I have seen.  I would most likely wait until yours is more finalized.

Currently the major differences between my features and yours are:
- I don't currently expose a progress callback
- I do combine requests that would pull the same object based on the digest (so only one network request would be made)
- I throttle the number of simultaneous network requests to 4 (configurable), so that the network does not get choked with too many requests at once.

- I created a new replication "AttachmentPuller" to do this work instead of adding to the current Puller.

As a side note, I was able to point the attachment puller at an AWS CloudFront distribution that in turn fetches its data from the Sync Gateway.
This allows a quick CDN setup for the attachment data.
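
For readers following along, here is a minimal Python sketch of the two behaviors listed above: combining requests for the same digest into one network fetch, and capping the number of simultaneous requests (4 by default). It is not the code from the C# pull request; `AttachmentFetcher`, `fetch`, `max_concurrent`, and `demo` are hypothetical names, and `fetch` stands in for whatever actually performs the HTTP request.

```python
import asyncio


class AttachmentFetcher:
    """Coalesces downloads by digest and caps concurrent requests."""

    def __init__(self, fetch, max_concurrent=4):
        self._fetch = fetch                  # hypothetical: async callable, digest -> bytes
        self._max_concurrent = max_concurrent
        self._limit = None                   # semaphore, created on first use
        self._in_flight = {}                 # digest -> asyncio.Task

    async def _run(self, digest):
        # At most `max_concurrent` fetches run at once; the rest wait here.
        async with self._limit:
            return await self._fetch(digest)

    async def download(self, digest):
        if self._limit is None:
            self._limit = asyncio.Semaphore(self._max_concurrent)
        task = self._in_flight.get(digest)
        if task is None:
            # First caller for this digest starts the one real request.
            task = asyncio.ensure_future(self._run(digest))
            self._in_flight[digest] = task
            task.add_done_callback(lambda _: self._in_flight.pop(digest, None))
        # Later callers share the same task; shield() keeps one caller's
        # cancellation from cancelling the download for everyone else.
        return await asyncio.shield(task)


async def demo():
    calls = []

    async def fake_fetch(digest):
        # Stand-in for a network request; records each real fetch.
        calls.append(digest)
        await asyncio.sleep(0.01)
        return b"data-" + digest.encode()

    fetcher = AttachmentFetcher(fake_fetch)
    results = await asyncio.gather(
        *(fetcher.download("sha1-abc") for _ in range(3))
    )
    return calls, results
```

In `demo`, three callers ask for the same digest at once, but `fake_fetch` is invoked only once and all three receive the same bytes. A progress callback would need to fan out to every caller sharing a task, which is one reason combined requests complicate progress reporting.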

Jens Alfke

Aug 7, 2015, 2:06:10 PM
to mobile-c...@googlegroups.com
On Aug 7, 2015, at 10:08 AM, Brett Harrison <brett.h...@zyamusic.com> wrote:

I submitted the C# pull request.  I believe I could alter my API to more closely match yours from what I have seen.  I would most likely wait until yours is more finalized.

Hi Brett! That sounds great.

Currently the major differences between my features and yours are:
- I don't currently expose a progress callback

I assumed that the ‘Task’ object, returned from your public method that starts the download, includes a way for the app to monitor progress. (There’s a similar Cocoa class called NSProgress which I’m thinking of using.) I’m not familiar with the .NET frameworks and I assume it’s a standard class?

- I do combine requests that would pull the same object based on the digest (so only one network request would be made)

Yup, I’ll be adding that.

- I throttle the number of simultaneous network requests to 4 (configurable), so that the network does not get choked with too many requests at once.

The Cocoa network framework already takes care of that; like a browser, it will open a limited number of sockets to a single host, and queues requests onto those sockets.

- I created a new replication "AttachmentPuller" to do this work instead of adding to the current Puller.

I didn’t see a need to change the public API to do this. The app should be able to use the same Replication it already uses to pull docs. It’s fine to use a new puller object internally, but that should be just an implementation detail.


I’m currently working on making it possible to resume interrupted downloads. This requires extending Sync Gateway to support “Range:” headers on requests for attachments; I’ve got that mostly complete.
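
As a rough illustration of how a client might use a “Range:” header to resume, here is a small Python sketch. It is not the Couchbase Lite or Sync Gateway code; the function names are hypothetical, and it assumes the server answers 206 (Partial Content) when it honors the range and 200 with the full body when it does not.

```python
def resume_request_headers(bytes_on_disk):
    """Headers for continuing a download after `bytes_on_disk` bytes."""
    if bytes_on_disk <= 0:
        return {}                      # nothing saved yet: plain request
    # An open-ended byte range: everything from this offset to the end.
    return {"Range": f"bytes={bytes_on_disk}-"}


def apply_response(status, body, partial):
    """Combine a response body with the partial data already saved."""
    if status == 206:
        # Server honored the range: append to what we already have.
        return partial + body
    if status == 200:
        # Server ignored the range and sent everything: start over.
        return body
    raise ValueError(f"unexpected HTTP status {status}")
```

Handling the 200 case matters because a server or proxy that does not support ranges will simply send the whole attachment again.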

Also, I found that when a response has “Content-Encoding: gzip” — as with an attachment that was originally stored in compressed form — there’s no way to prevent Cocoa’s NSURLConnection from unzipping the body. This is a problem since the attachment needs to be stored as-is, otherwise the digest breaks. I’m adding a “?content_encoding=false” query option to Sync Gateway’s attachment handler to disable using Content-Encoding for this; instead it’ll set the Content-Type to “application/gzip”.
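
To make the digest problem concrete, here is a short Python sketch. It assumes the digest is a SHA-1 of the stored bytes written as “sha1-” plus base64, which is an assumption for illustration; `attachment_digest` is a hypothetical helper, not a library function.

```python
import base64
import gzip
import hashlib


def attachment_digest(data):
    """Assumed digest format: 'sha1-' + base64 of the SHA-1 of the bytes."""
    return "sha1-" + base64.b64encode(hashlib.sha1(data).digest()).decode("ascii")


original = b"hello attachment " * 100

# The attachment was stored in compressed form, so its digest covers
# the gzipped bytes.
stored = gzip.compress(original)
declared = attachment_digest(stored)

# If the HTTP layer transparently unzips the body, the client ends up
# with different bytes, and the digest no longer matches.
unzipped = gzip.decompress(stored)
matches_as_is = attachment_digest(stored) == declared
matches_unzipped = attachment_digest(unzipped) == declared
```

Here `matches_as_is` is true and `matches_unzipped` is false, which is why the body has to reach the client without being decoded along the way.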

—Jens

atom992

Aug 9, 2015, 10:43:51 PM
to Couchbase Mobile
Is there a plan to bring ‘feature/lazy_attachments’ to Android?

Jens Alfke

Aug 10, 2015, 7:09:46 PM
to Couchbase Mobile

On Aug 9, 2015, at 7:43 PM, atom992 <yangzi...@gmail.com> wrote:

Is there a plan to bring ‘feature/lazy_attachments’ to Android?

I don’t know the specific answer. In general, you can look at the project’s bug tracker on GitHub. If there’s an issue for this, look at what milestone it’s assigned to, and look for tags like “current sprint” or “backlog”. If there isn’t an issue, feel free to submit one.

—Jens

Brett Harrison

Aug 10, 2015, 9:11:06 PM
to Couchbase Mobile


On Friday, August 7, 2015 at 11:06:10 AM UTC-7, Jens Alfke wrote:

On Aug 7, 2015, at 10:08 AM, Brett Harrison <brett.h...@zyamusic.com> wrote:

I submitted the C# pull request.  I believe I could alter my API to more closely match yours from what I have seen.  I would most likely wait until yours is more finalized.

Hi Brett! That sounds great.

Currently the major differences between my features and yours are:
- I don't currently expose a progress callback

I assumed that the ‘Task’ object, returned from your public method that starts the download, includes a way for the app to monitor progress. (There’s a similar Cocoa class called NSProgress which I’m thinking of using.) I’m not familiar with the .NET frameworks and I assume it’s a standard class?

The Task object is basically a Job for a thread pool.  If I convert to your API, it would be removed. 

I do have a progress callback internally, but I didn't expose it because:
  - It would have to work properly with combined requests
  - I was undecided if it should include the data stream
  - We didn't need it yet
 

- I do combine requests that would pull the same object based on the digest (so only one network request would be made)

Yup, I’ll be adding that.

- I throttle the number of simultaneous network requests to 4 (configurable), so that the network does not get choked with too many requests at once.

The Cocoa network framework already takes care of that; like a browser, it will open a limited number of sockets to a single host, and queues requests onto those sockets.

- I created a new replication "AttachmentPuller" to do this work instead of adding to the current Puller.

I didn’t see a need to change the public API to do this. The app should be able to use the same Replication it already uses to pull docs. It’s fine to use a new puller object internally, but that should be just an implementation detail.

I may be able to merge the AttachmentPuller back into the normal Puller; I just want to be able to make a Puller that only does attachments.  I don't want to have to call Start on it, because I don't want it ever to try to sync the documents (I have another puller for that).  I plan to have the attachment puller point at a different URL that is a CDN (the CDN will pull from the original sync gateway as needed).

Anyway, I will take a look at eliminating the AttachmentPuller to keep the APIs the same.

Jens Alfke

Aug 12, 2015, 3:33:03 PM
to Couchbase Mobile

On Aug 10, 2015, at 6:11 PM, Brett Harrison <brett.h...@zyamusic.com> wrote:

The Task object is basically a Job for a thread pool.  If I convert to your API, it would be removed. 

Not exactly sure what that means, but anyway it’s not required that we do this identically — the API spec assumes things like notifications, async calls or delegation will work differently between platforms.

I’ve now decided that in Cocoa it would be better to have the download method return an NSProgress object. This is more standardized and allows the caller to cancel or pause the download. (It will make observing progress a bit more complicated for the caller, though.)

I plan to have the attachment puller point at a different URL that is a CDN (the CDN will pull from the original sync gateway as needed).

Hm. That relates to another open issue, #78, although only on the downloading side. I take it you’d be uploading attachments to Sync Gateway as usual, just the downloads would be routed through the CDN?

This would be a really minor change to my code (altering the base URL for the CBLAttachmentDownloader) … it’s just a question of how to hook it into the API. It could at least go in the CBLReplication’s options dictionary.
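
A rough Python sketch of the idea, just to show how small the change is: attachment downloads use an alternate base URL when one is given in the options, and fall back to the replication’s own URL otherwise. The option key `attachment_base_url` and both function names are made up for illustration, and the `/db/doc/attachment` URL layout is assumed.

```python
from urllib.parse import quote


def download_base_url(replication_url, options):
    """Pick the base URL for attachment downloads.

    `attachment_base_url` is a hypothetical option key, not a real one.
    """
    return options.get("attachment_base_url", replication_url)


def attachment_url(base_url, doc_id, name):
    """Build <base>/<doc id>/<attachment name>, percent-encoding each part."""
    return "/".join([
        base_url.rstrip("/"),
        quote(doc_id, safe=""),
        quote(name, safe=""),
    ])


# Uploads and document sync keep using the replication URL; only the
# attachment download is routed through the CDN.
url = attachment_url(
    download_base_url(
        "https://sync.example.com/db",
        {"attachment_base_url": "https://cdn.example.com/db/"},
    ),
    "doc 1",
    "photo.jpg",
)
```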

—Jens

Brett Harrison

Aug 14, 2015, 1:03:39 PM
to Couchbase Mobile


On Wednesday, August 12, 2015 at 12:33:03 PM UTC-7, Jens Alfke wrote:

On Aug 10, 2015, at 6:11 PM, Brett Harrison <brett.h...@zyamusic.com> wrote:

The Task object is basically a Job for a thread pool.  If I convert to your API, it would be removed. 

Not exactly sure what that means, but anyway it’s not required that we do this identically — the API spec assumes things like notifications, async calls or delegation will work differently between platforms.

I’ve now decided that in Cocoa it would be better to have the download method return an NSProgress object. This is more standardized and allows the caller to cancel or pause the download. (It will make observing progress a bit more complicated for the caller, though.)

I plan to have the attachment puller point at a different URL that is a CDN (the CDN will pull from the original sync gateway as needed).

Hm. That relates to another open issue, #78, although only on the downloading side. I take it you’d be uploading attachments to Sync Gateway as usual, just the downloads would be routed through the CDN?

Exactly, any uploads would go directly to the gateway.