LSM rate limits

38 views
Skip to first unread message

MARK CALLAGHAN

unread,
Oct 23, 2012, 7:37:30 PM10/23/12
to wiredtig...@googlegroups.com
Do you have any thoughts on applying rate limits to Put operations to
prevent an LSM from getting too many files? For now I have been adding
a simple hack into __clsm_put to get that behavior.

__clsm_put(...)
WT_SESSION_IMPL *session, WT_CURSOR_LSM *clsm, WT_ITEM *key, WT_ITEM *value)
{
WT_BTREE *btree;
WT_CURSOR *primary;
WT_DECL_RET;
WT_LSM_TREE *lsm_tree;
uint32_t *memsizep;

lsm_tree = clsm->lsm_tree;

while (lsm_tree->nchunks > 30) {
__wt_sleep(0, 100000);
}

...

--
Mark Callaghan
mdca...@gmail.com

Michael Cahill

unread,
Oct 24, 2012, 7:05:35 AM10/24/12
to wiredtig...@googlegroups.com
Hi Mark,

> Do you have any thoughts on applying rate limits to Put operations to prevent an LSM from getting too many files?

I understand the idea, but the fundamental issue with the current implementation is that some merges can take a long time to complete. That is, there is no upper limit on the size of a merge in the way there is in LevelDB. We intend to address to address this in future by adding range partitioning within levels. In the meantime, the change you're suggesting could pause inserts for unpredictable amounts of time.

There is some throttling of inserts already -- they cannot go faster than the data is written to disk. If the cache becomes full (because writing new files has not kept up), the application threads will pause and may also attempt to evict pages from cache.

Another thing that might help here is having multiple merge threads, at different levels. If new, level-0 files can be merged quickly, that should prevent the tree from becoming too deep. We've tried to address this by having the merge thread wait for a while for more level-0 files to be created before looking for merges at lower levels, but multiple threads would be a more general solution.

Michael.

MARK CALLAGHAN

unread,
Oct 24, 2012, 12:44:01 PM10/24/12
to wiredtig...@googlegroups.com
Multiple merge threads will be a big deal. For disk-based servers I
need more threads to get more IO throughput. For very fast flash I
still want multiple threads because merge with compression &
decompression needs more CPU.

Can you use the notion of debt for rate limits? I define amplification
as the amount of sequential IO (reads & writes) required to eventually
migrate a change from the in-memory table to the oldest file. For a
classic LSM with each level 10X larger than the previous, then the
amplification will be ~20X per level. The LSM style used by WT allows
for a tradeoff -- more files increases the cost for reads (at least
for range scans) and decreases the amplification for writes.

So there is the option for two types of tuning. The first is the
choice to favor faster range scans (few files to check in the worst
case) or write amplification. The second is to keep the amplification
debt from becoming too big. Assuming you estimate the amplification at
run time, and the estimate is X, then each a write of 100 bytes incurs
a debt of 100 * X. When the debt grows more CPU and IO can be devoted
to merges and in the worst case writes could be throttled.

--
Mark Callaghan
mdca...@gmail.com
Reply all
Reply to author
Forward
0 new messages