We are running memcachedb 1.2.1 on a machine with 4GB of memory, using the
"-m 1836" option. It had accumulated over 11GB of data when it started to
get slower and slower. By this time we had about 12 million data items. We
didn't want to shut down the site to do the repartitioning, so we ran
some scripts to delete about 1/3 of the data. But the data file sizes
didn't change:
-rw-r--r-- 1 dxu dxu 11796185088 Apr 12 07:55 data.db
-rw-r----- 1 dxu dxu 24576 Apr 11 18:02 __db.001
-rw-r----- 1 dxu dxu 76775424 Apr 11 18:02 __db.002
-rw-r----- 1 dxu dxu 1925185536 Apr 11 18:02 __db.003
-rw-r----- 1 dxu dxu 4259840 Apr 11 18:02 __db.004
-rw-r----- 1 dxu dxu 16252928 Apr 11 18:02 __db.005
-rw-r----- 1 dxu dxu 3891200 Apr 11 18:02 __db.006
We restarted memcachedb, and it is still the same: memcachedb is still
quite slow. It seems deleting data items does not affect the data
file sizes or the performance. Is there an easy way to force
memcachedb to clean up the fragmentation in the backend B-tree and reduce
the data file size? Or what else should we do to improve performance in
this case?
A lot of thanks in advance.
-Donghua
A B-tree does not actually remove data when you delete an item. This is a
limitation of the current BerkeleyDB implementation (and of other B-tree
implementations). There are several ways to reduce the db size, but I am
afraid they are either offline or a bit complicated if you want to stay
online.
Offline: run a checkpoint, db_dump your database, and db_load it into a new
file. Put the newly loaded db into a new directory and start memcachedb
with it (be careful with the db name: -f <new db name>).
Online: relay your write operations to a queue (such as memcacheq) while
continuing to read from your original db. Copy the database to another
place and compact the copy using db_dump and db_load as above. There will
be an inconsistency window between reads and writes. When the new db is
finished, load the queued updates from the queue and apply them to the
new db.
Hi, Greg,
Are there any changes in BDB 5.0 for B-tree disk reclamation? As far as I
know, the btree compact API (DB->compact()) did not work well; most of the
time it fails and blocks the db. What do you think?
Regards,
Steve
--
Best Regards,
Steve Chu
http://stvchu.org
A lot of work went into DB->compact() between 4.8 and 5.0. It works for hash databases as well as btrees, and if you have multiple databases in a single file we can now move root pages around to compact those (I forget, is this something you do in MemcacheDB?). Also, to the best of my knowledge it *worked* in 4.8.x (do you have a bug SR# or URL to reference for the issues you mention?). :)
We don't reclaim allocated disk space for performance reasons (really!). It is expensive to shrink a file, especially when it's about to grow again. So I'd not call this a problem, it's really an optimization. That's why we added compact(), for those cases where you explicitly wanted to incur the overhead of a) reclaiming the space and b) potentially re-allocating that space later. That said, there can be cases where performance degrades - this may be one of those.
To understand the performance problem mentioned by Donghua we really need to know more about the state of the system. Gather the statistics and post them somewhere. Also you might try using DTrace (or SystemTap) to find out what's really happening. We can help with all of this, start a thread on the OTN Forums for BDB and engineering/support will pick it up.
Thanks for asking, we're happy to help. :)
-greg
Product Manager, Oracle Berkeley DB
I guess Donghua thought that the large, never-shrinking database file makes
the cache ineffective (which is why he wants to reclaim the db space). When
an item is deleted, it still occupies space on a page where valid items may
be colocated. If a valid item is read, the whole page is swapped into the
memory pool, so the deleted items come along with the valid ones (because
they are on the same page) and do occupy memory-pool space, right?
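Steve's page-colocation point can be illustrated with a toy model. This is
plain Python; the page layout, sizes, and names are made up for
illustration and none of it is BDB code:

```python
# Toy model of page-granular caching: deleted items that share a page with
# live items still consume cache memory when that page is read, because
# the cache holds whole pages, not individual items (illustrative only).
pages = [
    [("a", "live"), ("b", "deleted"), ("c", "live"), ("d", "deleted")],
    [("e", "deleted"), ("f", "deleted"), ("g", "live"), ("h", "deleted")],
]

cache = []

def read(key):
    """Reading one live item swaps its entire page into the cache."""
    for page in pages:
        if any(k == key for k, _ in page):
            if page not in cache:
                cache.append(page)
            return dict(page)[key]
    raise KeyError(key)

read("g")  # one live item is requested...
# ...but the whole page, including three deleted slots, now sits in cache.
cached_items = sum(len(p) for p in cache)
dead_in_cache = sum(1 for p in cache for _, state in p if state == "deleted")
assert cached_items == 4
assert dead_in_cache == 3
```

In this model, one read of a live item pulls four slots into the cache,
three of which are dead space, which is exactly the cache pollution the
question describes.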
Besides, does DB->compact() block the whole database? I tried the compact
API when I used BDB 4.7, but after waiting for a long time the call finally
failed because it hit the lock-count limit, and it also used a lot of
memory. Can the compact API work with very large db files (GBs in size)?
What we want is for compaction to run slowly no matter how large the db
file is. We can afford the service degradation (for example, we can run
compaction at night), but it should not block the whole db, or not block
it much.
Regards,
Steve
The cache and the size of the on-disk database file are unrelated. To BDB the disk file is just a set of pages; the cache holds the most active pages, so newly emptied pages are quickly evicted from the cache as more relevant pages are accessed. The DB->compact() API can operate on small chunks of the key space, reducing and bounding the locks required to finish the task and preventing a total database lock-up. It accomplishes two things separately: 1) returning free pages in the disk file to the OS, and 2) optimizing the btree itself.
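The chunking idea can be sketched abstractly. This is plain Python, not
the BDB C API; the key names and chunk size are illustrative. The point is
that iterating compaction over bounded key ranges means each step only
needs locks for the pages under that range, never the whole database:

```python
# Toy model of chunked compaction: walk the key space in small ranges so
# no single step holds locks across the whole database. The sorted list
# stands in for the btree's key space, and each (start, stop) pair models
# one bounded compaction call over that range (illustrative only).
keys = sorted(f"user:{i:04d}" for i in range(1000))

def chunk_ranges(sorted_keys, chunk_size):
    """Yield (start_key, stop_key) pairs covering the key space."""
    for i in range(0, len(sorted_keys), chunk_size):
        chunk = sorted_keys[i:i + chunk_size]
        yield chunk[0], chunk[-1]

compacted = []
for start, stop in chunk_ranges(keys, 100):
    # In real BDB this step would be one bounded compaction call over
    # [start, stop]; only the pages under this range need locking,
    # and locks are released between iterations.
    compacted.append((start, stop))

assert len(compacted) == 10
assert compacted[0] == ("user:0000", "user:0099")
```

Because locks are released between iterations, other readers and writers
can interleave with the compaction loop, which is what bounds the
worst-case blocking.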
Why don't you ask these questions on the OTN Forum? Be specific, provide example code, you know the drill. ;-) Maybe the easiest thing would be to build an example, 'ex_compact.c' or something, so that everyone can benefit from it. Base it on ex_access.c and add code to illustrate the problem. Post that to the OTN BDB Forum. :)
-greg
> > On Mon, Apr 12, 2010 at 10:48 PM, Greg Burd <GREG.B...@oracle.com>
> > >> On Mon, Apr 12, 2010 at 4:03 PM, Donghua Xu <dong...@gmail.com>