CouchDB maintenance using Couchdbkit

Nils Breunese

Feb 1, 2011, 5:05:48 AM

to couch...@googlegroups.com

Hi,

I have a small Python script which uses Couchdbkit (0.4.2) and which is run daily for CouchDB (0.10.1) maintenance. It gets a list of all databases and for each database it runs db.compact() and db.view_cleanup(). Now I learned that db.compact() only compacts a database and not its views. Looking at the API docs I believe I can call db.compact(design_doc_name) to compact the views for a particular design document. How do I get a list of all design document names for a database so I can run compact() on all of them?

Thanks, Nils.
------------------------------------------------------------------------
VPRO www.vpro.nl
------------------------------------------------------------------------

Benoit Chesneau

Feb 1, 2011, 5:51:55 AM

to couchdbkit

On Feb 1, 11:05 am, Nils Breunese <N.Breun...@vpro.nl> wrote:
> Hi,
>
> I have a small Python script which uses Couchdbkit (0.4.2) and which is run daily for CouchDB (0.10.1) maintenance. It gets a list of all databases and for each database it runs db.compact() and db.view_cleanup(). Now I learned that that db.compact() only compacts a database and not its views. Looking at the API docs I believe I can call db.compact(design_doc_name) to compact the views for a particular design document. How do I get a list of all design document names for a database so I can run compact() on all of them?
>

Hi,

To get lists of design documents you can do:

all_ddocs = db.all_docs(startkey="_design",
                        endkey="_design/" + u"\u9999",
                        include_docs=True)

Then, to compact them, use the db.compact method and pass the design
document name as the dname keyword argument:

db.compact(dname="someddocid")
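
A minimal sketch of how these calls could fit together in a maintenance pass. It assumes db is a Couchdbkit Database object and uses only the calls mentioned in this thread (all_docs, compact, view_cleanup). The helper names design_doc_names and compact_database_and_views are made up for illustration and are not part of Couchdbkit.

```python
DESIGN_PREFIX = "_design/"


def design_doc_names(db):
    """Return design document names without the '_design/' prefix."""
    # Key range taken from the reply above; rows are assumed to be
    # dict-like with an "id" field, as in the raw _all_docs result.
    rows = db.all_docs(startkey="_design", endkey="_design/" + u"\u9999")
    return [row["id"][len(DESIGN_PREFIX):] for row in rows]


def compact_database_and_views(db):
    """Compact the database, then the views of each design document."""
    db.compact()
    for name in design_doc_names(db):
        db.compact(dname=name)
    # Remove index files that no longer belong to any design document.
    db.view_cleanup()
```

Include_docs is left out here because only the document IDs are needed to build the names.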

That could be an interesting tool anyway. Any chance you could
open-source it?

- benoît

Nils Breunese

Feb 1, 2011, 7:38:56 AM

to couch...@googlegroups.com

Benoit Chesneau wrote:

> On Feb 1, 11:05 am, Nils Breunese <N.Breun...@vpro.nl> wrote:
>> I have a small Python script which uses Couchdbkit (0.4.2) and which is run daily for CouchDB (0.10.1) maintenance. It gets a list of all databases and for each database it runs db.compact() and db.view_cleanup(). Now I learned that that db.compact() only compacts a database and not its views. Looking at the API docs I believe I can call db.compact(design_doc_name) to compact the views for a particular design document. How do I get a list of all design document names for a database so I can run compact() on all of them?
>

> To get lists of design documents you can do:
>
> all_ddocs = db.all_docs(startkey="_design",
> endkey="_design/"+u"\u9999",
> include_docs=True)

Ah right, thanks. I just need the IDs, so I can do without include_docs.

> Then to compact them use the db.compact method and give the dname as
> kwargs:
>
> db.compact(dname="someddocid")

This got me further:

----
design_docs = db.all_docs(startkey="_design", endkey="_design/"+u"\u9999")
design_doc_names = [ d["id"][8:] for d in design_docs ]
----

>
> That could be an interesting tool anyway, any chance you opensource
> it ?

Once I have it working correctly I'll see if I can get it up on GitHub or something. I'm getting some connection problems though, although that might be related to our new internal proxy (Squid) and I'd like to get rid of those first.

My current version calls compact on all databases and design documents. We don't have too many databases and design documents in production, but since compaction is an async process I'm wondering if it's smart to start compaction on all databases and design docs at once. Any ideas about that? Should I make this a synchronous process maybe? I know I can check for compact_running on a database; is something like that also available for view compaction? Or should it be fine to start compaction on all databases and design docs at the same time?

Benoit Chesneau

Feb 1, 2011, 10:54:11 AM

to couch...@googlegroups.com

On Feb 1, 2011, at 1:38 PM, Nils Breunese wrote:

> Benoit Chesneau wrote:
>
>> On Feb 1, 11:05 am, Nils Breunese <N.Breun...@vpro.nl> wrote:
>>> I have a small Python script which uses Couchdbkit (0.4.2) and which is run daily for CouchDB (0.10.1) maintenance. It gets a list of all databases and for each database it runs db.compact() and db.view_cleanup(). Now I learned that that db.compact() only compacts a database and not its views. Looking at the API docs I believe I can call db.compact(design_doc_name) to compact the views for a particular design document. How do I get a list of all design document names for a database so I can run compact() on all of them?
>>
>> To get lists of design documents you can do:
>>
>> all_ddocs = db.all_docs(startkey="_design",
>> endkey="_design/"+u"\u9999",
>> include_docs=True)
>
> Ah right, thanks. I just need the id's, so I can do without include_docs.
>
>> Then to compact them use the db.compact method and give the dname as
>> kwargs:
>>
>> db.compact(dname="someddocid")
>
> This got me further:
>
> ----
> design_docs = db.all_docs(startkey="_design", endkey="_design/"+u"\u9999")
> design_doc_names = [ d["id"][8:] for d in design_docs ]
> ----

You could do it in one pass I think::

def dname(row):
    return row['id'][8:]

design_doc_names = list(db.all_docs(startkey="_design", endkey="_design/"+u"\u9999", wrapper=dname))

>
>>
>> That could be an interesting tool anyway, any chance you opensource
>> it ?
>
> Once I have it working correctly I'll see if I can get it up on Github or something. I'm getting some connection problems though, although that might be related to our new internal proxy (Squid) and I'd like to get rid of those first.

Which kind? A new restkit version is coming out today, which should improve connection reliability.

> My current version calls compact on all databases and design documents. We don't have too many databases and design documents in production, but since compaction is an async process I'm wondering if it's smart to start compaction on all db's and design docs at once. Any ideas about that? Should I make this a synchronous process maybe? I know I can check for compact_running on a database, is something like that also available for view compaction? Or should it be fine to start compaction on all db's and design docs at the same time?

I wouldn't start all compactions at the same time; it could use too much CPU or slow down I/O. I'm not sure what the best approach is, though. It needs some testing, I guess.

- benoît

Nils Breunese

Feb 1, 2011, 11:03:20 AM

to couch...@googlegroups.com

I wrote:

>> That could be an interesting tool anyway, any chance you opensource
>> it ?
>
> Once I have it working correctly I'll see if I can get it up on Github or something.

I put the code on https://github.com/vpro/couchdb_maintenance

Benoit, could you take a look at it and tell me if this makes sense? If this looks OK to you I'll e-mail the CouchDB users mailing list about it.

Nils Breunese

Feb 1, 2011, 11:47:25 AM

to couch...@googlegroups.com

Benoit Chesneau wrote:

>> ----
>> design_docs = db.all_docs(startkey="_design", endkey="_design/"+u"\u9999")
>> design_doc_names = [ d["id"][8:] for d in design_docs ]
>> ----
>
> You could do it in one pass I think::
>
> def dname(row):
>     return row['id'][8:]
>
> design_doc_names = list(db.all_docs(startkey="_design", endkey="_design/"+u"\u9999", wrapper=dname))

I used a lambda now. :o)

----
for design_doc_name in list(db.all_docs(startkey="_design", endkey="_design0", wrapper=lambda row: row['id'][len('_design/'):])):
    logging.info("Compacting design document %s..." % design_doc_name)
    db.compact(dname=design_doc_name)
----

I'm not sure readability is better though. That looks like a pretty complex smiley at the end of that first line. :o)

>> I'm getting some connection problems though, although that might be related to our new internal proxy (Squid) and I'd like to get rid of those first.
>
> Which kind ? A new restkit version is coming today, we should improve connection reliability.

Seems like the server's HTTP proxy was causing problems unrelated to Couchdbkit.

> I wouldn't start all compaction at the same time, it could use too much cpu or slow ios. Not sure what's the best though. It needs some tests I guess.

We run this maintenance script nightly and haven't had problems with CPU or I/O, but yes, it might be better not to compact everything at once. Does anyone have a better idea that solves this in a generic way? Random sleeps could help, but they don't sound like a really generic solution.
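
One possible alternative to random sleeps, sketched under assumptions: start one database compaction, poll until it finishes, then move on to the next. It relies on the compact_running flag mentioned earlier in this thread and assumes db.info() returns CouchDB's database info document as a dict. The function name compact_sequentially and its parameters are hypothetical. It only covers database compaction; whether an equivalent flag is available for view compaction is left open here.

```python
import time


def compact_sequentially(dbs, poll_interval=5.0, sleep=time.sleep):
    """Compact databases one at a time instead of all at once."""
    for db in dbs:
        # Compaction is asynchronous: this call only starts it.
        db.compact()
        # Assumption: db.info() returns a dict containing the
        # compact_running flag from the database info document.
        while db.info().get("compact_running"):
            sleep(poll_interval)
```

The sleep parameter exists only so the waiting can be replaced in tests; in normal use the default time.sleep applies.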

Benoit Chesneau

Feb 1, 2011, 11:55:29 AM

to couch...@googlegroups.com

On Feb 1, 2011, at 5:47 PM, Nils Breunese wrote:

> Benoit Chesneau wrote:
>
>>> ----
>>> design_docs = db.all_docs(startkey="_design", endkey="_design/"+u"\u9999")
>>> design_doc_names = [ d["id"][8:] for d in design_docs ]
>>> ----
>>
>> You could do it in one pass I think::
>>
>> def dname(row):
>>     return row['id'][8:]
>>
>> design_doc_names = list(db.all_docs(startkey="_design", endkey="_design/"+u"\u9999", wrapper=dname))
>
> I used a lambda now. :o)
>

:)

If you use it in a for loop, you can also drop the list() call, since the result is already an iterator. As for the script, it looks fine to me. Maybe it could be improved with some kind of journalling (an option to set the logging handler may be enough), but I already find it useful as is :)
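
A rough sketch of what such a logging option might look like, using only the standard logging module. The function name configure_logging, the logger name "couchdb_maintenance" and the logfile parameter are assumptions for illustration, not taken from the actual script.

```python
import logging


def configure_logging(logfile=None, level=logging.INFO):
    """Log to a file when logfile is given, otherwise to stderr."""
    if logfile:
        handler = logging.FileHandler(logfile)
    else:
        handler = logging.StreamHandler()
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    # Hypothetical logger name; the script could use any name it likes.
    logger = logging.getLogger("couchdb_maintenance")
    logger.setLevel(level)
    logger.addHandler(handler)
    return logger
```

A command-line flag could then pass the file name through, so a nightly run keeps a record of what was compacted.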

- benoît
