Bug? Database compaction keeps re-running continuously on CouchDB 1.4

Calle Arnesten

Oct 4, 2013, 4:17:49 AM
to us...@couchdb.apache.org
Hi,

I recently upgraded from CouchDB 1.2 to 1.4. I have noticed that database compaction runs more or less continuously during the allowed compaction window. Is this a known issue in 1.4?

Each compaction run completes, and the reported database size is smaller after the first run in the window. But then compaction starts again for the same database, and when that completes, it starts again, and so on. It's as if CouchDB thinks the database is still fragmented even though it's not.

The databases are quite large (~5GB), so it's not the case that many documents have had time to change during the compaction time.

These are my settings:
[{db_fragmentation, "20%"}, {view_fragmentation, "20%"}, {from, "03:00"}, {to, "11:00"}]

The hard drive is not full; it has about 70GB of free space.

I have a large percentage of deleted documents, if that might be a reason for the issue/bug.

I don't have the same problem for view compaction.

Best regards
Calle Arnesten

Calle Arnesten

Oct 5, 2013, 2:43:51 AM
to us...@couchdb.apache.org
I tried changing db_fragmentation to different levels. If I raise it to 70% the compaction stops, but at 60% and lower it keeps running all the time.

So there seems to be something weird with how CouchDB calculates the fragmentation level. As I said, I have a large percentage of deleted documents in the database, so perhaps it is not including them correctly in the calculation? Deleted documents could well account for close to 70% of the database size.
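For anyone who wants to check the number themselves, here is a minimal sketch of the fragmentation calculation. It assumes the formula given in the comments of CouchDB's compaction daemon configuration, (file_size - data_size) / file_size * 100, and that `db_info` is the parsed JSON from `GET /{db}` with `disk_size` and `data_size` fields in bytes. The function name and the sample numbers are hypothetical, not taken from this thread.

```python
def fragmentation_percent(db_info):
    """Return fragmentation as a percentage of the database file size.

    Assumption: db_info is the parsed JSON body of GET /{db} on
    CouchDB 1.x, with integer "disk_size" and "data_size" in bytes.
    """
    disk_size = db_info["disk_size"]
    data_size = db_info["data_size"]
    if disk_size <= 0:
        # Avoid dividing by zero for an empty or unreadable file.
        return 0.0
    return (disk_size - data_size) / disk_size * 100.0


# Hypothetical sizes for illustration only.
info = {"disk_size": 5_000_000_000, "data_size": 1_750_000_000}
print(round(fragmentation_percent(info), 1))
```

Comparing this value right after a compaction run against the configured db_fragmentation shows whether the daemon would consider the database fragmented again.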

Robert Newson

Oct 5, 2013, 4:26:42 AM
to us...@couchdb.apache.org

It makes intuitive sense that setting that % too low will cause endless (and pointless) compactions (the ratio of disk_size to data_size exceeding your % immediately after compaction). I'm fairly sure, for example, that the data_size value does not include the space consumed by the many database footers in the file.
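A rough sketch of why this can loop, assuming the daemon triggers compaction when (disk_size - data_size) / disk_size * 100 is greater than or equal to the configured percentage. The function name and the post-compaction sizes are hypothetical; they are chosen only to show how a threshold of 60% could retrigger while 70% would not.

```python
def needs_compaction(disk_size, data_size, threshold_percent):
    """Return True if the file counts as fragmented at this threshold.

    Assumption: the trigger is (disk_size - data_size) / disk_size * 100
    compared with ">=" against the configured db_fragmentation value.
    """
    fragmentation = (disk_size - data_size) / disk_size * 100.0
    return fragmentation >= threshold_percent


# Hypothetical sizes measured immediately AFTER a compaction run.
# If data_size leaves out overhead that is still in the file, the
# computed fragmentation stays high even though nothing more can be reclaimed.
disk_size = 5_000_000_000
data_size = 1_750_000_000

for threshold in (20, 60, 70):
    print(threshold, needs_compaction(disk_size, data_size, threshold))
```

With these made-up numbers the computed fragmentation is 65%, so every threshold up to 60% would start another run straight away.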

B.

Calle Arnesten

Oct 5, 2013, 10:33:18 AM
to us...@couchdb.apache.org
Robert, thanks for your reply.

I wasn't aware of the database footers, and given those I can understand that endless compaction could happen if the value is set too low. But I get these endless loops even if I raise it as high as 60%. To me that's not intuitive.

Before, I had it set to 70% and didn't get these endless compaction loops, but then I generally consumed a lot more disk space than I do now.

To me, at least, it would be more intuitive if the number stood for how much unnecessary space is allowed before compaction takes place. So, for example, if I had a 10GB database file and it was 20% fragmented, it would be 8GB and 0% fragmented after compaction. It might (?) be harder to calculate the numbers that way, but it would be much easier to reason about when configuring your database server.
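The arithmetic behind that example, as a small sketch. The function name is hypothetical, and it models the idealized behaviour described above (compaction reclaims exactly the fragmented share), not what CouchDB 1.4 actually does.

```python
def size_after_compaction(file_size_gb, fragmentation_percent):
    """Idealized file size after compaction, in GB.

    Assumption: the fragmented share is pure waste and compaction
    reclaims all of it, leaving a file that is 0% fragmented.
    """
    return file_size_gb * (1 - fragmentation_percent / 100.0)


# The 10GB, 20% fragmented example from the paragraph above.
print(size_after_compaction(10, 20))
```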

/Calle

Paul J Davis

Oct 7, 2013, 7:19:43 AM
to us...@couchdb.apache.org, us...@couchdb.apache.org
IIRC, we're not exactly right on the free space calculation, but more importantly, we also generate garbage while compacting. Specifically, the id_tree updates cause a lot of fragmentation when docs are updated in a random order.

The compactor on the Nebraska-merge branch was rewritten to avoid this and was a significant improvement in many cases.

Calle Arnesten

Oct 7, 2013, 12:48:54 PM
to us...@couchdb.apache.org
Thanks Paul! Then I will look forward to the compaction improvements in the Nebraska branch, as well as the other BigCouch stuff.

/Calle