Excessive metadata created through CBL REST API

Chris Fuentes

Jul 1, 2014, 6:51:07 PM
to couc...@googlegroups.com
We are using Couchbase Lite, and we have to update certain types of documents frequently. Over time, a document in the remote Couchbase database accumulates several hundred or even thousands of revisions. We run compaction daily to get rid of old revisions, since our application does not need them. However, the current revision of each document still carries a large amount of metadata about the old (deleted) revisions, which we also do not need. For example, it lists the old bodies of the document for several hundred deleted revisions.

Is there a clean and automated way to get rid of all this extra metadata?

(N.B. this is all under the "_sync.history" field). 

Todd Freese

Jul 1, 2014, 9:09:39 PM
to couc...@googlegroups.com
I'm no expert and have not tried this, but this is from CBLDatabase.h.

/** The maximum depth of a document's revision tree (or, max length of its revision history.)
    Revisions older than this limit will be deleted during a -compact: operation.
    Smaller values save space, at the expense of making document conflicts somewhat more likely. */
@property NSUInteger maxRevTreeDepth;


Maybe Jens can provide a little more detail on this.


T

Aliaksey Kandratsenka

Jul 1, 2014, 9:21:54 PM
to couc...@googlegroups.com
BTW, there's a small chance that Jens is not following this mailing list and is only active on the Couchbase Mobile mailing list, i.e. here: https://groups.google.com/forum/#!forum/mobile-couchbase

Chris Fuentes

Jul 2, 2014, 12:28:26 AM
to couc...@googlegroups.com
To my knowledge, the problem is not with Couchbase Lite itself. I'm talking about the actual Couchbase database, where excessive metadata is generated automatically as a result of using the CBL API.

Even if there is some constant in the app settings that I can change, that would not resolve the issue when documents are modified through the HTTP REST API.

Chris Fuentes

Jul 2, 2014, 1:38:08 PM
to couc...@googlegroups.com
For example, all of my documents start sort of like this:

{
  "_sync": {
    "rev": "207-11dfb93b2901d71a3da15d5802e03e0a",
    "flags": 24,
    "sequence": 80270,
    "history": {
      "revs": [
        "140-00d8f5719bb00155a892a14b396ff65b",
        "55-46969ad25b17cf2004832754a298b194",
        "71-2a86d746ecf72b262aa671b5ec1eb33c",
        "181-3d2313bf2387e7fcbae8061e5a45c78f",
        "59-af47aba983d300ef087226623a9d15ed",
        "141-81d651e967cd5aefaa56febf44ce7653",
        "90-7ba7f24c0446ebb845a55787b9438273",
        "125-b55e591f8a805858a636692d29d2ecd6",
        "202-d901d9545765d66ccb20b21e70fc9eb5",
        "133-9b30d03a7ee83269c351f6b474689f93",
        "119-0f19e66a796cfbed3fa69e0d398da529",
        "178-c37d6f45e62598e0e295324bdacbd7bc",
        "114-f000a8de00dc1416ad7b434090fde08d",
        "6-4a3c9cb9d43754372d88f14cc07fcbe5",
        "109-9a6908f3487591d7db85f86d1c8f50a5",
        "192-c410fb168ec5a8e7331bcb0cb00aaf30",
        "41-2b504d9f4aebcba50f79c0b8c23d0e48",
        "161-d6848144bd4422727dc89050aca09b84",
        "67-a726721154900de3c45a7b5bd973b165",

etc...
They have literally hundreds of lines of historical metadata that we don't need.
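
To put a number on the bloat described above, here is a minimal Python sketch that measures how much of a document's JSON is taken up by `_sync.history`. The sample document is hypothetical: it reuses only a few of the revision IDs from the post, and the `body` field and the `history_overhead` helper are made up for illustration.

```python
import json

# Hypothetical sample shaped like the document above; only a few of the
# revision IDs from the post are reproduced, and "body" is a made-up field.
doc = {
    "_sync": {
        "rev": "207-11dfb93b2901d71a3da15d5802e03e0a",
        "flags": 24,
        "sequence": 80270,
        "history": {
            "revs": [
                "140-00d8f5719bb00155a892a14b396ff65b",
                "55-46969ad25b17cf2004832754a298b194",
                "71-2a86d746ecf72b262aa671b5ec1eb33c",
            ]
        },
    },
    "body": "application data",
}


def history_overhead(document):
    """Return (history_bytes, total_bytes) for the compact JSON encoding."""
    total = len(json.dumps(document, separators=(",", ":")).encode("utf-8"))
    # Documents without a _sync.history field count as an empty object.
    history = document.get("_sync", {}).get("history", {})
    hist = len(json.dumps(history, separators=(",", ":")).encode("utf-8"))
    return hist, total


hist_bytes, total_bytes = history_overhead(doc)
print(f"_sync.history: {hist_bytes} of {total_bytes} bytes "
      f"({100 * hist_bytes / total_bytes:.0f}%)")
```

Running something like this over a sample of real documents would show whether the history field, rather than the application data, dominates document size.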

Matt Ingenthron

Jul 2, 2014, 1:42:09 PM
to couc...@googlegroups.com, Jens Alfke, J Chris Anderson
Hi Chris,

(cc'ing some other folks)

On 7/1/14, 9:28 PM, "Chris Fuentes" <ch...@crowdcomfort.com> wrote:

>The problem is not with CouchBaseLite to my knowledge - I'm talking about
>in the actual CouchBase database, there are excessive metadata auto
>generated as a result of using the CBL API.

This is true; however, it's really part and parcel of the Couchbase Mobile
team's domain, including Couchbase Lite and SyncGateway. That's what provides
the REST interface you mention.

>
>Even if there is some constant in the app settings that I can change,
>this wouldn't resolve the issue for using the http REST api to modify
>documents.

Hopefully Jens and JChris can help address how they expect to handle this
with SyncGateway on Couchbase Server.

Thanks,

Matt

--
Matt Ingenthron
Couchbase, Inc.




Matt Ingenthron

Jul 2, 2014, 1:44:29 PM
to couc...@googlegroups.com, Jens Alfke, J Chris Anderson
By the way, on re-reading that, I’m not 100% certain this is
SyncGateway-related, but based on your last message I believe it is. If you
could clarify, Chris, that’d be great.

Thanks,

Matt

Chris Fuentes

Jul 2, 2014, 1:52:06 PM
to couc...@googlegroups.com, je...@couchbase.com, jch...@couchbase.com
To be honest, I'm not sure what portion of the Couchbase stack it is most directly related to: 

CBL (mobile)/REST API is what generates and updates the documents, but as I understand it the sync_gateway is responsible for handling revisions/history. I would have expected the old revisions and the old metadata to be deleted during the compaction cycle that we run on the sync_gateway every 24 hours. While this does delete the old revisions of the documents, the metadata pertaining to the old documents is still present in the current revisions. 

So I'm not really sure which service (Couchbase Server, sync_gateway, or CBL) is most responsible for the metadata bloat on the server. That's really what I'm trying to figure out. I was hoping there was some configurable field somewhere on Couchbase Server or the sync gateway that would make it stop keeping track of this old data, like "maintain_sync_history: false" or something.

Chris Anderson

Jul 2, 2014, 3:25:11 PM
to Chris Fuentes, couc...@googlegroups.com, Jens Alfke
It sounds like the revs_limit parameter is what you are looking for. I couldn't find it in the docs but you can see it in the code here: https://github.com/couchbase/sync_gateway/blob/master/src/github.com/couchbaselabs/sync_gateway/rest/config.go#L79
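
As a rough sketch of where such a setting might go, the snippet below builds a Sync Gateway-style JSON config containing a `revs_limit` entry. It assumes the parameter belongs in the per-database section, as the linked config.go suggests. The database name `db` and the value 20 are placeholders, not values from this thread.

```python
import json

# Assumption: revs_limit is a per-database setting. The database name "db"
# and the value 20 are placeholders; other database settings are omitted.
config = {
    "databases": {
        "db": {
            "revs_limit": 20,
        }
    }
}

print(json.dumps(config, indent=2))
```

Check the key name and placement against the config code for the Sync Gateway version actually in use before relying on this.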

You might find more people working on this stuff in the mobile group: https://groups.google.com/forum/#!forum/mobile-couchbase

Note that the metadata is tracked so that when a mobile device that has been disconnected for a while reconnects, it can agree with everyone about where its changes fit in the history. Setting rev stemming to a small value is fine when you know you won't have documents changed on more than one device. By default we ship with a large revs_limit parameter to err on the side of robustness.

Chris
--

Chris Anderson  @jchris
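
The trade-off described in the post above can be illustrated with a toy model in Python. This is not Sync Gateway's actual algorithm: it treats the history as a simple oldest-first list rather than a revision tree, and the revision IDs and the `prune_history` and `common_ancestor` helpers are hypothetical. It only shows why a short history makes it harder for a long-disconnected device to find a revision it shares with the server.

```python
def prune_history(history, revs_limit):
    """Keep only the newest `revs_limit` entries of an oldest-first history."""
    return history[-revs_limit:] if revs_limit > 0 else []


def common_ancestor(server_history, client_history):
    """Return the newest revision ID both sides know about, or None."""
    known = set(server_history)
    for rev in reversed(client_history):
        if rev in known:
            return rev
    return None


# The server has seen revisions 1..207 of a document (IDs are made up).
server = [f"{n}-abc" for n in range(1, 208)]

# A device went offline at revision 50 and then made one local edit.
client = [f"{n}-abc" for n in range(1, 51)] + ["51-local"]

# With a large limit the shared revision is still in the server's history.
print(common_ancestor(prune_history(server, 1000), client))

# With a small limit the shared revision has been pruned away.
print(common_ancestor(prune_history(server, 20), client))
```

In this model, when no shared revision is found, the two sides cannot tell where the device's edit fits in the history, which matches the reason given for shipping a large default.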

Chris Fuentes

Jul 2, 2014, 3:41:42 PM
to couc...@googlegroups.com, ch...@crowdcomfort.com, je...@couchbase.com
Thanks Chris, that's really helpful. The follow-up question would be this:

What happens if a device is disconnected for some absurd amount of time and then reconnects to find that all of its revisions are gone from the DB? Even if the revs_limit param is quite high, it's possible someone could be hundreds or thousands of revisions behind if they disconnect their device for long enough.