Attachment size and number


devlist

Sep 26, 2017, 2:06:01 PM
to Couchbase Mobile
We have some extra JSON data that is variable in size. It’s usually in the KB range, but in a few cases it will be a few hundred MB.

It’s an optional part of the document, but syncing it would be great.
I’ve been testing serialising it and storing it as an attachment, since this will sync using Sync Gateway and also allow on-demand loading by the local app.

However, it appears that the 20MB limit applies to attachments when using Sync Gateway with a Couchbase bucket?

If correct, this seems to negate some of the advantage of attachments. Any plans for this to change?

Anyway, assuming the limit is 20MB per attachment, I thought a possible workaround would be to split larger data into smaller 10MB attachments.

I’ve tried this and it seems to work, but the upload speed is very slow.
Say I have a single local CBL document with 20 x 10MB attachments: it takes about 4 minutes to replicate to SG.
In Activity Monitor I can see that it uploaded about 2GB of data, 10 times what I expected.

So it looks like having multiple attachments per document causes it to upload a lot more data. Possibly a bug?

When I delete the local CBL store to force SG to send back all the data, it takes only a few seconds to sync all the documents and attachments, so replicating back down is working fine.

Finally, creating lots of docs, each with a single attachment, seems to work fine with syncing.

Thanks,

Martin

Jens Alfke

Sep 26, 2017, 4:11:30 PM
to mobile-c...@googlegroups.com

On Sep 26, 2017, at 11:05 AM, devlist <dev...@mac.com> wrote:

> However, it appears that the 20MB limit applies to attachments when using Sync Gateway with a Couchbase bucket?

Yes. Couchbase Server has a limit of 20MB for any document stored in a bucket. You’re correct that you can work around this by splitting attachments into pieces, since every attachment gets stored as a separate ‘document’ in the bucket.
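
A minimal sketch of the splitting workaround described above, assuming the serialised data is available as a single bytes object. The 10MB chunk size comes from the original question; the `part-NNNN` naming scheme and both function names are hypothetical, and no Couchbase Lite API is used here.

```python
from typing import Dict, Iterator, Tuple

# 10 MB per piece: an assumed value, comfortably under the 20 MB limit.
CHUNK_SIZE = 10 * 1024 * 1024


def split_into_chunks(data: bytes,
                      chunk_size: int = CHUNK_SIZE) -> Iterator[Tuple[str, bytes]]:
    """Yield (attachment_name, chunk) pairs, each at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for index, offset in enumerate(range(0, len(data), chunk_size)):
        # Zero-padded names keep the pieces in order when sorted as strings.
        yield f"part-{index:04d}", data[offset:offset + chunk_size]


def join_chunks(chunks: Dict[str, bytes]) -> bytes:
    """Reassemble the original data from the named chunks."""
    return b"".join(chunks[name] for name in sorted(chunks))
```

Each yielded pair would then be stored as one attachment on the document. As reported later in this thread, adding all of the pieces in a single revision gave the expected upload size.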

> If correct, this seems to negate some of the advantage of attachments. Any plans for this to change?

It’s not going to change at the server level; the server is highly optimized for speed and tries to keep data in RAM as much as possible, and allowing arbitrarily large documents would greatly complicate this.

In the long run we want to move Sync Gateway attachment storage outside of buckets, but there aren’t concrete plans to do this yet.

> I’ve tried this and it seems to work, but the upload speed is very slow.
> Say I have a single local CBL document with 20 x 10MB attachments: it takes about 4 minutes to replicate to SG.
> In Activity Monitor I can see that it uploaded about 2GB of data, 10 times what I expected.

That definitely shouldn’t be happening. Data traffic should be approximately equal to the size of the JSON plus the total size of the attachments, as you’d expect. (The document will be uploaded via an HTTP PUT with a MIME multipart body, one part for the JSON and another part for each attachment.)

Could you file an issue on GitHub? (From your reference to Activity Monitor I’m guessing this is on macOS or iOS.)
Please include the steps to reproduce, including what sequence of API calls you used to construct this document, and at what point you started the replicator. Thanks.

—Jens

Jens Alfke

Sep 26, 2017, 4:17:20 PM
to mobile-c...@googlegroups.com


On Sep 26, 2017, at 1:11 PM, Jens Alfke <je...@couchbase.com> wrote:

> In the long run we want to move Sync Gateway attachment storage outside of buckets, but there aren’t concrete plans to do this yet.

I forgot to mention the usual workaround we suggest when this comes up as a problem: You can use an external data hosting service, such as Amazon S3, to store attachments. Actually you’re no longer using attachments as far as CBL is concerned. Instead the client first uploads the attachment data to the hosting service, then adds the resulting URL to the document as an ordinary property value. Other clients that receive the document can download the attachment via that URL. (The server doesn’t have to know or care about these attachments at all.)

It does end up being more work for the client implementation, but you get total flexibility as to where/how you store attachments, and no limits on data size.
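
A rough sketch of that external-storage pattern, assuming the document is handled as a plain dictionary. The `upload` callable stands in for whatever hosting service is used; `fake_upload`, the `external_files` property name, and the example URL are all hypothetical placeholders, not part of any Couchbase or Amazon S3 API.

```python
from typing import Any, Callable, Dict


def attach_externally(doc: Dict[str, Any], name: str, data: bytes,
                      upload: Callable[[str, bytes], str]) -> Dict[str, Any]:
    """Upload data first, then record its URL as an ordinary document property."""
    url = upload(name, data)  # step 1: send the data to the hosting service
    updated = dict(doc)       # copy so the caller's document is left untouched
    links = dict(updated.get("external_files", {}))
    links[name] = {"url": url, "length": len(data)}
    updated["external_files"] = links  # step 2: the URL syncs like any other property
    return updated


def fake_upload(name: str, data: bytes) -> str:
    """Stand-in uploader that only builds a URL; a real one would send the bytes."""
    return f"https://storage.example.com/{name}"
```

Other clients that receive the document would read the URL from the property and download the data themselves, so the size of the synced document stays small regardless of how large the external data is.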

—Jens

devlist

Sep 29, 2017, 10:35:20 AM
to Couchbase Mobile
Thanks for the detailed reply, which made me realise what I was doing.
I was making a fresh revision when adding each attachment, so each revision and its new attachment were being uploaded along with all the previous attachments.

In my case it’s easy to add them all in one go, and doing this I now see the expected upload size.

However, if attachments are added later on then the entire thing will be re-uploaded.
A suggested optimisation would be to check the previous revision and upload only attachments that have a new signature?

Anyway, thanks again,

Martin

Jens Alfke

Sep 29, 2017, 12:36:35 PM
to mobile-c...@googlegroups.com


On Sep 29, 2017, at 7:35 AM, devlist <dev...@mac.com> wrote:

> However, if attachments are added later on then the entire thing will be re-uploaded.
> A suggested optimisation would be to check the previous revision and upload only attachments that have a new signature?

It already does this. Each attachment tracks what revision it was added/modified in (it’s a “revpos” attribute in the JSON) and the replicator only transfers attachments that have a newer revpos.
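
As a simplified illustration of the revpos rule described above (not the replicator's actual code), the selection could look like this. The function name, the `remote_generation` parameter (the revision number the other side is already known to have), and the example metadata are assumptions made for the sketch.

```python
from typing import Any, Dict, List


def attachments_to_send(attachments: Dict[str, Dict[str, Any]],
                        remote_generation: int) -> List[str]:
    """Return names of attachments added or modified after remote_generation."""
    return sorted(
        name for name, meta in attachments.items()
        if meta["revpos"] > remote_generation
    )


# Hypothetical metadata for a document at revision 4, where one
# attachment was added in each revision.
example_attachments = {
    "part-0000": {"revpos": 1, "length": 7600000},
    "part-0001": {"revpos": 2, "length": 7600000},
    "part-0002": {"revpos": 3, "length": 7600000},
    "part-0003": {"revpos": 4, "length": 7600000},
}
```

Under this rule, if the remote already has revision 3, only `part-0003` would need to be transferred; the other three would be referenced without their bodies.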

So I don’t understand why the attachments would be transferred multiple times, unless you were re-adding every attachment in each revision. What does the document JSON look like, in the original code where you created multiple revisions?

—Jens

devlist

Sep 30, 2017, 9:10:01 AM
to mobile-c...@googlegroups.com

When making multiple revisions, I’m definitely only adding one new attachment to each new revision.

The revpos values in the docs all look fine; see the attached screenshots for the middle example with 4 attachments.
I have run this multiple times and, very consistently, the more revisions with new attachments there are, the more extra data is uploaded.

2 x 7.6MB attachments in 1 revision upload 15MB
2 x 7.6MB attachments in 2 revisions upload 22MB

4 x 7.6MB attachments in 1 revision upload 29MB
4 x 7.6MB attachments in 4 revisions upload 76MB (see screen shots)

8 x 7.6MB attachments in 1 revision upload 58MB
8 x 7.6MB attachments in 8 revisions upload 274MB
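
As a quick arithmetic check on the figures above, the multi-revision totals can be compared against two simple models: every attachment sent once, or every revision re-sending all attachments present so far. This only tests whether the numbers fit a pattern and says nothing about the cause; the function names are made up for the sketch.

```python
ATTACHMENT_MB = 7.6  # size of each attachment in the measurements above


def upload_if_sent_once(count: int, size_mb: float = ATTACHMENT_MB) -> float:
    """Total MB if each attachment is transferred exactly once."""
    return count * size_mb


def upload_if_all_resent(count: int, size_mb: float = ATTACHMENT_MB) -> float:
    """Total MB if revision k re-sends all k attachments present so far."""
    return sum(k * size_mb for k in range(1, count + 1))


# Measured uploads (MB) for N attachments added over N revisions.
measured = {2: 22, 4: 76, 8: 274}

for count, observed in measured.items():
    print(count,
          round(upload_if_sent_once(count), 1),
          round(upload_if_all_resent(count), 1),
          observed)
```

The second model gives about 22.8MB, 76MB, and 273.6MB, which lines up closely with the measured 22MB, 76MB, and 274MB. The same sum for the 20 x 10MB case from the first message is 2100MB, close to the 2GB seen in Activity Monitor.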

This is on macOS

Martin

