OK, so far this looks exactly like what I have for my hashes databases:
data_size: 557537537
disk_size: 1542664311
doc_count: 1298255
doc_del_count: 18
avg doc size: ~350 bytes
Although the disk_size/data_size ratio is almost 3, this database is
uncompactable: CouchDB isn't able to get it down to 500MB, leaving it
at 1.5GB. This looks like a peculiarity of the underlying database
file format, which can't allocate space rationally for a huge number
of tiny documents... But CouchDB provides two interesting options for
configuring database compaction: doc_buffer_size and checkpoint_after.
http://docs.couchdb.org/en/latest/config/compaction.html#database_compaction
By default they have the following values:
checkpoint_after = 5242880
doc_buffer_size = 524288
And this makes my hashes database stop at the 1.5GB point. If I
multiply them both by 10, the database size after compaction will be
~900MB - yay! If I do this again, ending up with the following config:
checkpoint_after = 524288000
doc_buffer_size = 52428800
then the database sizes will be much better:
disk_size: 633688183
data_size: 556759808
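
To put the two results side by side, here is a small sketch that only does
the arithmetic on the numbers quoted in this message. The function name and
the labels are mine, nothing here talks to CouchDB.

```python
# Sizes in bytes, copied from the stats quoted in this message.
BEFORE = {"disk_size": 1542664311, "data_size": 557537537}
AFTER = {"disk_size": 633688183, "data_size": 556759808}


def overhead(stats):
    """Return (disk_size / data_size, wasted bytes) for one database."""
    ratio = stats["disk_size"] / stats["data_size"]
    wasted = stats["disk_size"] - stats["data_size"]
    return ratio, wasted


for label, stats in (("default config", BEFORE), ("x100 config", AFTER)):
    ratio, wasted = overhead(stats)
    print(f"{label}: ratio {ratio:.2f}, overhead {wasted / 1024 ** 2:.0f} MiB")
```
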
Almost no overhead! Why does this happen? Paul or Robert may correct me,
but it seems that most of the space wasted after compaction is
consumed by checkpoint headers and btree rebalancing. Asking CouchDB to
make compaction checkpoints less often and to use a bigger buffer for
docs allows it to build the resulting btree in the new database file in
a more optimal way. The downside of such a configuration is that if
your compaction fails, it has to restart from much further back, and a
bigger buffer requires more memory.
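
The scaling I did by hand can be sketched like this. It is just a helper that
multiplies the defaults quoted above and prints a fragment you could paste
into your config; I'm assuming the section is named [database_compaction], as
in the docs link, and that you put it into local.ini. The helper names are
made up for illustration.

```python
# Default values as quoted earlier in this message.
DEFAULTS = {"checkpoint_after": 5242880, "doc_buffer_size": 524288}


def scaled(factor):
    """Multiply both compaction options by the same factor."""
    return {key: value * factor for key, value in DEFAULTS.items()}


def ini_fragment(values):
    """Render the options as an ini section (section name is an assumption)."""
    lines = ["[database_compaction]"]
    lines += [f"{key} = {value}" for key, value in values.items()]
    return "\n".join(lines)


# x10 twice, i.e. x100, which is the config that gave ~630MB for me.
print(ini_fragment(scaled(100)))
```

Keep the memory trade-off above in mind before going to even bigger factors.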
Try playing with these options and see how they affect your databases.
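
If you'd rather experiment over HTTP than edit the config file, a rough
sketch follows. It assumes a 1.x server at localhost:5984 with the
/_config/{section}/{key} endpoint and a database called "hashes" (both are
placeholders of mine), and it only builds the requests. Sending them with
urllib.request.urlopen() will need admin credentials unless you run in admin
party mode.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5984"  # assumption: local CouchDB 1.x
DB_NAME = "hashes"                  # hypothetical database name


def config_request(key, value):
    """Build a PUT for one [database_compaction] option (not sent here)."""
    url = f"{BASE_URL}/_config/database_compaction/{key}"
    # The config API takes the new value as a JSON string.
    body = json.dumps(str(value)).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="PUT",
        headers={"Content-Type": "application/json"},
    )


def compact_request(db):
    """Build the POST that triggers compaction of one database."""
    return urllib.request.Request(
        f"{BASE_URL}/{db}/_compact", data=b"", method="POST",
        headers={"Content-Type": "application/json"},
    )


requests = [
    config_request("checkpoint_after", 524288000),
    config_request("doc_buffer_size", 52428800),
    compact_request(DB_NAME),
]
for req in requests:
    print(req.get_method(), req.full_url)
```

After compaction finishes, GET the database info again and compare disk_size
with data_size.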
P.S. This issue has finally been solved for the upcoming 2.0 with the
default config.
--
,,,^..^,,,