Hitting the wall with b+ tree at about 100mm records

Sam Tingleff

Oct 9, 2009, 1:44:16 PM

to Tokyo Cabinet Users

We seem to be hitting a performance wall with TC/TT and b+ tree
databases at about 100 million records of about 200 bytes each. Write
performance slows dramatically, reads start to time out, and memory
usage increases constantly until the oom-killer kills the process.
The hardware is EBS volumes on "large" instances inside AWS.

Before we start to build a replacement on mysql I thought I would
ask... are there people breaking through this wall successfully? What
kind of memory behavior do you see? What tuning parameters should we
be using?

Martin Sarsale

Oct 9, 2009, 2:26:09 PM

to tokyocabi...@googlegroups.com

Sam:

what's the bnum you're using?
have you tried compressing your data?
from:
http://1978th.net/tokyocabinet/spex-en.html
"Each pages of B+ tree can be stored with compressed. Two compression
method; Deflate of ZLIB and Block Sorting of BZIP2, are supported.
Because each record in a page has similar patterns, high efficiency of
compression is expected due to the Lempel-Ziv or the BWT algorithms.
In case handling text data, the size of a database is reduced to about
25%. If the scale of a database is large and disk I/O is the
bottleneck, featuring compression makes the processing speed improved
to a large extent."

--
Martin Sarsale
msn: mar...@malditainternet.com
jabber: martin....@gtalk.com
twitter: http://twitter.com/runixo
linkedin: http://www.linkedin.com/in/msarsale
sumavisos: http://www.sumavisos.com
blog: http://runa.tumblr.com

Sam Tingleff

Oct 9, 2009, 3:22:49 PM

to Tokyo Cabinet Users

We've tried bnum values in the neighborhood of 10 million to 100
million and I don't think the difference has been really noticeable.
"more than 1/128 of the number of records to be stored" does not
provide a lot of guidance on what this value really _should_ be when
your goal is X records. Is there a reasonable upper limit?

Compression is a good idea. I'll try an optimize with compression and
see how it goes.

Nicolae Mihalache

Oct 9, 2009, 3:32:27 PM

to tokyocabi...@googlegroups.com

What tuning parameters do you set with tcbdbtune? Have you tried with a
large bnum?

nicolae

Sam Tingleff

Oct 9, 2009, 4:05:37 PM

to Tokyo Cabinet Users

Here's the options we've settled on
#nmemb=256#ncnum=10240#lmemb=512#lcnum=65536#bnum=100000000#xmsiz=536870912#opts=l

Is 100 million "large" for bnum? One would think... Is there some
relationship between these parameters that we have wrong?
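
For readers unfamiliar with the syntax, the option string above is a '#'-separated list of name=value pairs. A minimal Python sketch of how the values relate to each other follows; parse_tuning is a hypothetical helper (not part of Tokyo Cabinet), and the 200-byte record size is an assumption taken from the first post. It estimates a ceiling for the leaf cache by multiplying lcnum (leaf nodes cached) by lmemb (members per leaf page), ignoring any per-record overhead.

```python
def parse_tuning(spec):
    """Split a '#name=value#...' tuning suffix into a dict (ints where possible)."""
    params = {}
    for part in spec.split("#"):
        if not part:
            continue  # skip the empty piece before the leading '#'
        name, _, value = part.partition("=")
        params[name] = int(value) if value.isdigit() else value
    return params


spec = ("#nmemb=256#ncnum=10240#lmemb=512#lcnum=65536"
        "#bnum=100000000#xmsiz=536870912#opts=l")
params = parse_tuning(spec)

record_bytes = 200  # assumption: the "about 200 bytes" figure from the first post

# Upper bound on records held by the leaf cache if every cached leaf were full.
cached_records = params["lcnum"] * params["lmemb"]
cache_gib = cached_records * record_bytes / 2**30
xmsiz_mib = params["xmsiz"] / 2**20

print(f"leaf cache ceiling: {cached_records:,} records, about {cache_gib:.1f} GiB")
print(f"xmsiz: {xmsiz_mib:.0f} MiB")
```

Under these assumptions the leaf cache alone could grow to several GiB, which may be worth comparing against the instance's RAM before concluding that the memory growth is a leak.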

Sam Tingleff

Oct 13, 2009, 4:42:45 PM

to Tokyo Cabinet Users

Am I the only one willing to open the kimono? Is anyone (successfully)
using TT at high throughput with 50m+ records? What tuning parameters
are YOU using?

Nicolae Mihalache

Oct 13, 2009, 11:48:26 PM

to tokyocabi...@googlegroups.com

It seems not...

I have a few questions, though they may not be helpful:
1. how did you select the xmsiz?
2. how big is the file when it starts slowing down?
3. have you tried running a standalone test on a standalone computer
with a standalone disk?

The memory increase suggests some sort of bug in TT. If you have a
simple C program that can populate such a b+ tree, I'm willing to try it
out on my computer and help you debug the problem.

nicolae

Zev Blut

Oct 13, 2009, 11:57:35 PM

to tokyocabi...@googlegroups.com

Hi Sam,

I am not sure I can help much, but just to confirm: what type of server
are you running TT on? In particular, what are the specs of the disk
that holds the underlying TC file? I was once partially burned due to
having a TT server on an underspec disk for the type of load we were
sending to it. I was doing a lot of partial key deletes, so that
stressed the disk quite a bit.

Zev

Sam Tingleff

Oct 14, 2009, 11:50:03 AM

to Tokyo Cabinet Users

@Nicolae
We selected the xmsiz using the only known documentation ("more than
1/128..."). It's probably too high so we're currently testing smaller
values. I'd say file size is typically about 35GB when we begin to
really see degraded performance. Running optimize seems to help quite
a bit while compression does not. I've not yet tried to reproduce
outside the cloud. Great idea.

@Zev
The server is an Amazon EC2 "m1.large" instance using an EBS volume.
It's in the cloud so exact specs are unknown but lots of people are
successful using EBS for high volume database-type workloads. Our use
case is constant inserts, lots of reads and almost zero deletes.

Thanks,
Sam

Neal Richter

Oct 15, 2009, 12:39:37 PM

to Tokyo Cabinet Users

I worry that increasing the bnum is becoming 'lore' in TT w/ a
b+tree.

One should not need a 100M anything with a b+tree. From everything
our team can see/read, the hashtable in a B+tree cabinet is used as a
page cache, where a page is a set of leaf nodes. I've read the code
and there are no hashtables inside the leaves to store records; a
binary search is used to locate records within a leaf.

If one assumes that the minimum branching factor is 4 and 128 records
per leaf, then
log4((500 x 10^6)/128) ~ 11 comparisons to find a leaf with 100+
records.

Thus, when the tree is full and reasonably balanced, the number of
nodes in the b+tree should be something like
n = 4^11 - 1 ~ 4 million

Does this sound wrong?
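
The arithmetic in this estimate can be reproduced with a short Python sketch. The inputs (500 million records, 128 records per leaf, branching factor 4) are the assumptions stated in the post, not measured values, and the internal-node estimate uses the geometric-series property of a complete tree, which a real B+ tree only approximates.

```python
import math

records = 500 * 10**6   # record count used in the post
per_leaf = 128          # assumed records per leaf
branching = 4           # assumed minimum branching factor

leaves = records / per_leaf
depth = math.log(leaves, branching)   # levels needed to address every leaf

# In a complete tree, internal nodes number roughly leaves / (branching - 1).
internal = leaves / (branching - 1)
post_estimate = branching ** 11 - 1   # the n = 4^11 - 1 figure from the post

print(f"leaves:            {leaves:,.0f}")
print(f"log4(leaves):      {depth:.2f}")
print(f"internal (approx): {internal:,.0f}")
print(f"4^11 - 1:          {post_estimate:,}")
```

Whichever variant is used, the node count comes out in the low millions, well below a bnum of 100 million.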

I've emailed Mikio with questions on how the hashtable (which bnum
sizes) interacts with a b+tree. He has responded with very little.
I'm still reading the C code..

Nicolae Mihalache

Oct 16, 2009, 10:55:26 AM

to tokyocabi...@googlegroups.com

Can you run a "tcbmgr inform <your-btree-file>" and send us the output?

igrigorik

Oct 17, 2009, 10:32:03 AM

to Tokyo Cabinet Users

> We've tried bnum values in the neighborhood of 10 million to 100
> million and I don't think the difference has been really noticeable.
> "more than 1/128 of the number of records to be stored" does not
> provide a lot of guidance on what this value really _should_ be when
> your goal is X records. Is there a reasonable upper limit?

"'bnum' specifies the number of elements of the bucket array. If it is
not more than 0, the default value is specified. The default value is
131071. Suggested size of the bucket array is about from 0.5 to 4
times of the number of all records to be stored."
- http://1978th.net/tokyocabinet/spex-en.html

I know of a few TC deployments with 300M+ tyrants. bnum is the main
parameter you want to look at, and I'm not sure where you're getting
the 1/128th of the number of records.

TC picks a prime number closest to your specified bnum -- if your bnum
is << records, you will get a lot of collisions, which are extremely
expensive. Which, by the looks of it, is exactly what you're seeing.
If you want to store 100M records, set your bnum to 200-400M.

Running optimize basically resets your bnum to the closest prime to the
actual number of records.

ig

Sam Tingleff

Oct 19, 2009, 11:47:08 PM

to Tokyo Cabinet Users

@igrigorik

That sounds like the docs for a hash db rather than b+ tree, which
we're using. tcbdbtune says (for bnum):
"The default value is 32749. Suggested size of the bucket array is
about from 1 to 4 times of the number of all pages to be stored."

I'm getting the 1/128 from http://1978th.net/tokyotyrant/spex.html
which says:
"If you use a B+ tree database, set the tuning parameters
"#lcnum=xxx#bnum=yyy" to improve performance. The former specifies the
maximum number of leaf nodes to be cached. It should be larger as long
as the capacity of RAM on the system allows. The latter specifies the
bucket number and should be more than 1/128 of the number of records
to be stored."

I would agree, 0.5 to 4 might make sense for a hash db. I'd love to
hear directly from someone with a 200-400m record db, either hash or
b+ tree, because so far nobody else has provided actual, known working
configs for a large scale database.
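
To make the difference between the three quoted rules concrete, here is a rough Python sketch that applies each one to 100 million records. The function names are hypothetical, and records_per_page=256 is an assumption (leaves half full with lmemb=512); real page counts depend on fill and on non-leaf nodes.

```python
def hash_bnum_range(records):
    # Hash database rule quoted from the Tokyo Cabinet spec:
    # 0.5x to 4x the number of records.
    return records // 2, records * 4


def btree_bnum_range(records, records_per_page):
    # tcbdbtune rule quoted above: 1x to 4x the number of pages.
    # records_per_page is an assumption about how full the leaves are.
    pages = -(-records // records_per_page)  # ceiling division
    return pages, pages * 4


def tyrant_btree_minimum(records):
    # Tokyo Tyrant rule quoted above: more than 1/128 of the record count.
    return records // 128


records = 100_000_000
print("hash db bnum range:   ", hash_bnum_range(records))
print("b+ tree bnum range:   ", btree_bnum_range(records, records_per_page=256))
print("tyrant b+ tree floor: ", tyrant_btree_minimum(records))
```

Under these assumptions the B+ tree rules point to a bnum in the hundreds of thousands to low millions, while the hash rule points to tens or hundreds of millions, so the two sets of advice are not interchangeable.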

Jeremy Hinegardner

Oct 20, 2009, 10:59:07 AM

to tokyocabi...@googlegroups.com

On Mon, Oct 19, 2009 at 08:47:08PM -0700, Sam Tingleff wrote:
> I would agree, 0.5 to 4 might make sense for a hash db. I'd love to
> hear directly from someone with a 200-400m record db, either hash or
> b+ tree, because so far nobody else has provided actual, known working
> configs for a large scale database.

Does this help? We have 6 master-master pairs of hash databases each
with a bnum value of 1 billion. We are planning for these to have
approximately 250 million documents in each database.

[tyrant@fs6 ~]$ tyrantmanager replication-status
10:46:38 INFO : storage-001 is replicating from tt1.collectiveintellect.com:11000
10:46:38 INFO : tt1.collectiveintellect.com:11000 is replicating from tt2.collectiveintellect.com:11001
10:46:38 INFO : Primary master : 10.10.12.15:11000 -> 89480612 records, primary since 2009-10-20 10:46:38
10:46:38 INFO : Failover master : 10.10.12.16:11001 -> 89480615 records, last replicated 0.102007 seconds ago
10:46:38 INFO :
10:46:39 INFO : storage-003 is replicating from tt1.collectiveintellect.com:11002
10:46:39 INFO : tt1.collectiveintellect.com:11002 is replicating from tt2.collectiveintellect.com:11003
10:46:39 INFO : Primary master : 10.10.12.16:11003 -> 87715752 records, primary since 2009-10-20 10:46:38
10:46:39 INFO : Failover master : 10.10.12.15:11002 -> 87715751 records, last replicated 0.036739 seconds ago
10:46:39 INFO :
10:46:39 INFO : storage-005 is replicating from tt1.collectiveintellect.com:11004
10:46:39 INFO : tt1.collectiveintellect.com:11004 is replicating from tt2.collectiveintellect.com:11005
10:46:39 INFO : Primary master : 10.10.12.15:11004 -> 176595425 records, primary since 2009-10-20 10:46:38
10:46:39 INFO : Failover master : 10.10.12.16:11005 -> 176595437 records, last replicated 0.141081 seconds ago
10:46:39 INFO :
10:46:39 INFO : storage-007 is replicating from tt1.collectiveintellect.com:11006
10:46:39 INFO : tt1.collectiveintellect.com:11006 is replicating from tt2.collectiveintellect.com:11007
10:46:39 INFO : Primary master : 10.10.12.15:11006 -> 176178472 records, primary since 2009-10-20 10:46:38
10:46:39 INFO : Failover master : 10.10.12.16:11007 -> 176178497 records, last replicated 0.126130 seconds ago
10:46:39 INFO :
10:46:39 INFO : storage-009 is replicating from tt1.collectiveintellect.com:11008
10:46:39 INFO : tt1.collectiveintellect.com:11008 is replicating from tt2.collectiveintellect.com:11009
10:46:39 INFO : Primary master : 10.10.12.16:11009 -> 88625774 records, primary since 2009-10-20 10:46:38
10:46:39 INFO : Failover master : 10.10.12.15:11008 -> 88625774 records, last replicated 0.012528 seconds ago
10:46:39 INFO :
10:46:39 INFO : storage-011 is replicating from tt1.collectiveintellect.com:11010
10:46:39 INFO : tt1.collectiveintellect.com:11010 is replicating from tt2.collectiveintellect.com:11011
10:46:39 INFO : Primary master : 10.10.12.16:11011 -> 89604842 records, primary since 2009-10-20 10:46:38
10:46:39 INFO : Failover master : 10.10.12.15:11010 -> 89604831 records, last replicated 0.233051 seconds ago

This is a sample commandline for each of our tyrants.

[tyrant@fs6 ~]$ tyrantmanager start --dry-run storage-001
10:51:56 INFO : Starting storage-001 : ttserver -host 0.0.0.0 -port 11001 -thnum 8 \
-tout 15 -dmn -pid /data10/storage-001/storage-001.pid \
-log /data10/storage-001/log/storage-001.log -le \
-ulog /data10/storage-001/ulog -ulim 1g -sid 11001 \
-mhost tt1.collectiveintellect.com -mport 11000 \
-rts /data10/storage-001/storage-001.rts -mask vanish \
/data10/storage-001/data/storage-001.tch#opts=ld#mode=wc#bnum=1000000000 : (dry-run)

And here is the directory layout for each tyrant instance:

Each tyrant gets its own disk for its data, and at some point we may
split out update logs to separate disks also.

[tyrant@fs6 /]$ tree -I *.ulog /data10/storage-001
/data10/storage-001
|-- config.rb
|-- data
|   `-- storage-001.tch
|-- log
|   `-- storage-001.log
|-- lua
|-- storage-001.pid
|-- storage-001.rts
`-- ulog

I keep meaning to put together a blog post about this setup.

enjoy,

-jeremy

--
========================================================================
Jeremy Hinegardner jer...@hinegardner.org

Sam Tingleff

Oct 20, 2009, 4:52:10 PM

to Tokyo Cabinet Users

Looks like you currently have between 87m and 176m documents on each
database. What is the throughput? What kind of concurrency do you
have? Median response time?

We're mostly using b+ tree but would be happy to switch if we knew
hash would "work" for our use case.

Brian Hammond

Oct 30, 2009, 3:17:14 PM

to Tokyo Cabinet Users

Did you ever figure this out? I just did some capacity planning and
reached the conclusion that my project will need about 100MM keys.

Mike Dierken

Oct 30, 2009, 8:28:46 PM

to tokyocabi...@googlegroups.com

Brian,
We have multiple TT servers running with several hundred million keys,
so as long as you have a multi-machine strategy, support for 100M
should be okay.
However, the problems we have run into with ttserver have to do with
increasing memory usage. There is no way to set a hard limit on total
resources used. Even with lowering runtime parameters (lcnum, ncnum)
usage grows over time. We have started running a periodic process to
call 'sync' on the ttserver (which seems to flush changes from memory
to disk) but that doesn't recover memory.

The real problem with this is that if the Linux out-of-memory killer stops
the ttserver process, the datafile becomes corrupt and data is lost
from that file. There is no known solution to this corruption-on-crash
situation from the community or the author.

Mike

Jeremy Hinegardner

Oct 31, 2009, 9:56:08 PM

to tokyocabi...@googlegroups.com

We are not hitting the tyrants directly for requests. Requests go
through HAproxy -> Varnish -> Tyrants. Our Varnish configuration has
the distribution algorithm in it so it knows which of the 6 tyrants
to forward the request to.

So stats here are from the haproxy logs which show the requests between
haproxy and varnish. Varnish caches the results of all the GET requests
to the tyrants so this does not show direct load on the tyrant servers
themselves. We're not actively recording the varnish -> tyrant data.

This is for one 24 hour period:

average bytes / sec    : 1153759
max concurrency        : 229
average requests / sec : 944
daily throughput       : 92.7GiB
median response time   : 1ms -- most GETs will be cache hits
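
As a sanity check on how these figures relate, a short Python sketch derives the daily volume and the average response size from the per-second averages. It assumes the averages cover the full 24 hours; the small gap from the reported daily total is presumably rounding in the source numbers.

```python
bytes_per_sec = 1153759      # reported average bytes / sec
requests_per_sec = 944       # reported average requests / sec
seconds_per_day = 24 * 60 * 60

daily_gib = bytes_per_sec * seconds_per_day / 2**30
daily_requests = requests_per_sec * seconds_per_day
avg_response_bytes = bytes_per_sec / requests_per_sec

print(f"daily volume:     {daily_gib:.1f} GiB")
print(f"daily requests:   {daily_requests:,}")
print(f"average response: {avg_response_bytes:.0f} bytes")
```

These numbers describe traffic between haproxy and varnish, so they are an upper bound on what the tyrants themselves serve, not a measurement of it.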

And here are some graphs of some of the data:

http://skitch.com/copiousfreetime/nfteg/varnish-requests-minute-as-reported-by-haproxy
http://skitch.com/copiousfreetime/nfts8/bytes-transferrred-per-minute-for-varnish-as-reported-by-haproxy
http://skitch.com/copiousfreetime/nftey/concurrency-of-varnish-over-one-day-as-reported-by-haproxy

We haven't really done any optimizations for access other than
setting an appropriate bnum, and every tokyo cabinet file is on its
own physical disc.

enjoy,

-jeremy

On Tue, Oct 20, 2009 at 01:52:10PM -0700, Sam Tingleff wrote:
> Looks like you currently have between 87m and 176m documents on each
> database. What is the throughput? What kind of concurrency do you
> have? Median response time?
>
> We're mostly using b+ tree but would be happy to switch if we knew
> hash would "work" for our use case.
>


Neal Richter

Nov 2, 2009, 6:40:51 PM

to Tokyo Cabinet Users

Nice work. Can you post some more details on your configuration?

- Neal

slacket

Nov 5, 2009, 10:38:41 AM

to Tokyo Cabinet Users

Thanks for sharing. With the given number of records in your TT
instance, do you know how many "puts" said instance can do?

When we exceed, say, a million records on b+ tree over TT, our puts
drop by an order of magnitude, from roughly thousands per second to
hundreds per second. I've played around with bnum but it doesn't seem
to affect this behavior.

Is this the normal behavior for TT/TC for a large number of records?

