A problem we've encountered is that, for our data, the HashDB size on disk grows exponentially with the number of records. With a mere 40,000 records, the file is already close to 400 MB.
The problem seems to be specific to our data: if we use purely random data, the growth is linear in the number of records.
What we've done so far to optimise is to switch from HashDB to TreeDB, and to load 100k records at a time, with a defrag after each load. This seems to help somewhat, but we're still ending up with very large database files.
The tuning we do is to set the TLINEAR option and call tune_buckets with 200% of the maximum number of records we expect.
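For reference, here is a simplified sketch of that setup using the Kyoto Cabinet C++ API (the file name, record count, and payload are placeholders, not our real data):

#include <kchashdb.h>

#include <iostream>
#include <string>

using namespace kyotocabinet;

int main() {
  TreeDB db;

  // Tuning calls must be made before open().
  const int64_t max_records = 40000;   // placeholder for the expected maximum
  db.tune_options(TreeDB::TLINEAR);    // linear collision chains
  db.tune_buckets(max_records * 2);    // 200% of the expected record count

  if (!db.open("records.kct", TreeDB::OWRITER | TreeDB::OCREATE)) {
    std::cerr << "open error: " << db.error().name() << std::endl;
    return 1;
  }

  // Load in batches of 100k records, defragmenting after each batch.
  const int64_t batch_size = 100000;
  for (int64_t i = 0; i < max_records; ++i) {
    db.set(std::to_string(i), std::string(1000, 'x'));  // placeholder payload
    if ((i + 1) % batch_size == 0) {
      db.defrag(0);  // step <= 0 defragments the whole file
    }
  }
  db.defrag(0);  // final pass after the last (partial) batch

  if (!db.close()) {
    std::cerr << "close error: " << db.error().name() << std::endl;
  }
  return 0;
}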
Any thoughts on this would be greatly appreciated.
Sincerely,
Jonas