Learning memcached


punkish

May 21, 2011, 1:07:11 PM5/21/11
to memcached
I am trying to learn memcached, so I installed it on my laptop and
fired it up. I also created a simple SQLite db 'mem.sqlite' like so
and filled the table with 20_000 random strings.

CREATE TABLE t (id INTEGER PRIMARY KEY, str TEXT)

I started memcached with a simple `memcached -p 11212 -m 48` and used
the following simplistic Perl script to benchmark.

use strict;
use warnings;

use Cache::Memcached;
my $memd = Cache::Memcached->new({'servers' => [ "localhost:11212" ]});

use DBI qw(:sql_types);
my $dbh = DBI->connect("dbi:SQLite:dbname=mem.sqlite");

use Benchmark qw(:all);
my $count = 40_000;

cmpthese($count, {
    'query_dbh' => sub {
        my $id = int(rand(20_000));
        my $sth = $dbh->prepare("SELECT str FROM t WHERE id = ?");
        $sth->execute($id);
        my ($str) = $sth->fetchrow_array;

        open F, ">", "foo.txt" or die $!;
        print F "id: $id, str: $str\n";
        close F;
    },
    'query_mem' => sub {
        my $id = int(rand(20_000));

        open F, ">", "foo.txt" or die $!;
        my $str = $memd->get($id);
        unless ($str) {
            my $sth = $dbh->prepare("SELECT str FROM t WHERE id = ?");
            $sth->execute($id);
            ($str) = $sth->fetchrow_array;

            $memd->set($id, $str);
        }
        print F "id: $id, str: $str\n";
        close F;
    },
});

I consistently get results such as the ones below, while I expected the
memcache to slowly fill up and make the queries way faster than only
accessing the file-based db. Whatever I do, it is far faster to open up
the SQLite database every time and query it than it is to query the
memory-cached value. (I am creating a $dbh and $sth every time, and
also opening a file and writing to it every time, to roughly emulate a
CGI process that starts afresh on every hit of the browser.)

What am I doing wrong, or are my expectations wrong?

            Rate query_mem query_dbh
query_mem 2723/s        --      -29%
query_dbh 3846/s       41%        --

It seems logical to me that retrieving even the simplest of results
from memory should be faster than opening a file on disk, preparing a
db handle, querying, and then returning the result. At worst, it
should be about the same, not 30% worse!

Dustin

May 21, 2011, 2:40:10 PM5/21/11
to memcached

On May 21, 10:07 am, punkish <punk.k...@gmail.com> wrote:

> I consistently get results such as above, while I expected the
> memcache to slowly fill up and speed up the queries way faster than
> only accessing the file based db. Whatever I am doing, it is far
> faster to open up the SQLite database every time and query it than it
> is to query the memory cached value. (I am creating a $dbh and $sth
> every time, and also opening a file and writing to it every time to
> kinda emulate a CGI process that starts afresh on every hit of the
> browser.

Minor nit: You're not opening the file every time.

> What am I doing wrong, or are my expectations wrong?

The thing you're doing wrong is taking something very cheap (i.e.
not performance critical) and trying to make it faster with memcached.

First note that SQLite is quite fast. Any DB whose entire contents
can fit into your filesystem cache will return primary key lookups
around as fast as memcached. In this case, the entire database is in
memory in perl's process space and you're doing non-concurrent access
to it and memcached and comparing speeds. SQLite has an advantage
here and is not performance critical to your app. For this exact app,
you're better off storing the data in a perl-based in-memory cache
(hash table with some eviction logic).
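Dustin's in-process alternative can be sketched in a few lines of Perl.
This is a minimal illustration, not code from the thread: a plain hash
with crude FIFO eviction. The names (%cache, @order, $MAX_KEYS) are
invented here.

```perl
use strict;
use warnings;

# A plain hash with crude FIFO eviction, as an in-process stand-in for
# memcached. All names below are illustrative, not from the thread.
my %cache;
my @order;              # insertion order, used to pick an eviction victim
my $MAX_KEYS = 10_000;  # cap, loosely analogous to memcached's -m limit

sub cache_get {
    my ($key) = @_;
    return $cache{$key};
}

sub cache_set {
    my ($key, $val) = @_;
    if (!exists $cache{$key}) {
        delete $cache{ shift @order } if @order >= $MAX_KEYS;  # evict oldest
        push @order, $key;
    }
    $cache{$key} = $val;
}
```

In a benchmark like the one above, `cache_get($id)` would stand in for
the `$memd->get($id)` call, with no network round trip at all.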

I've deployed memcached over SQLite in a Twisted app that had lots
of outstanding read and write queries to be processed through a
separate thread. Before sending them off to that thread, I could fire
many concurrent requests to memcached and avoid the need to hit the
scarce resource at all as concurrency is not an issue there. While it
might be slower for what I'm doing, it could do it all in parallel.

The basic problem with benchmarking is that it often doesn't start
with a problem. In this case, I think you've simplified the problem
to the level where you can't really see the benefits. Had this been
an actual application, for example, you might want to get 10 keys at a
time. That would give you a 10x increase in throughput. You may also
have more than one thread/DB doing the work, or your DB may be larger
than fits in RAM, in which case you can spread the cache across many
machines, and the workers across many machines/processes/threads/events.

punkish

May 21, 2011, 6:08:29 PM5/21/11
to memcached

On May 21, 2011, at 1:40 PM, Dustin wrote:


pk> On May 21, 10:07 am, punkish <punk.k...@gmail.com> wrote:
pk>
pk> I consistently get results such as above, while I expected the
pk> memcache to slowly fill up and speed up the queries way faster
pk> than only accessing the file based db. Whatever I am doing, it is
pk> far faster to open up the SQLite database every time and query it
pk> than it is to query the memory cached value. (I am creating a $dbh
pk> and $sth every time, and also opening a file and writing to it
pk> every time to kinda emulate a CGI process that starts afresh on
pk> every hit of the browser.

ds> Minor nit: You're not opening the file every time.
ds>

Interesting. I didn't realize that, and still don't. Anyway, that is
unimportant for this discussion.



pk> What am I doing wrong, or are my expectations wrong?

ds> The thing you're doing wrong is taking something very cheap (i.e.
ds> not performance critical) and trying to make it faster with
ds> memcached.
ds>
ds> First note that SQLite is quite fast. Any DB whose entire
ds> contents can fit into your filesystem cache will return primary
ds> key lookups around as fast as memcached. In this case, the entire
ds> database is in memory in perl's process space and you're doing
ds> non-concurrent access to it and memcached and comparing speeds.
ds> SQLite has an advantage here and is not performance critical to
ds> your app. For this exact app, you're better off storing the data
ds> in a perl-based in-memory cache (hash table with some eviction
ds> logic).


Ok. First, this app is completely cooked up. As noted in my OP, these
are my baby steps with memcached.

I understand that SQLite is very fast. I understand that there is a
filesystem cache that speeds up db lookup. That said, as I noted in my
OP, I would expect memcache to be at least as fast if not faster. What
I am seeing is that memcache is actually significantly slower.

My worry is not that memcache is not all that it is cracked up to be.
My worry is that I am not understanding memcache, and am doing
something wrong with it.



ds> The basic problem with benchmarking is that it often doesn't start
ds> with a problem. In this case, I think you've simplified the
ds> problem to the level where you can't really see the benefits. Had
ds> this been an actual application, for example, you might want to
ds> get 10 keys at a time. That would give you a 10x increase in
ds> throughput.

Fair enough. So, I modified my script to grab 20 random, non-
sequential rows without using the SQL IN clause so that I have 20
separate SQL queries.

use feature 'say';    # out() below uses say

my @ids = map { int(rand(19_999)) || 1 } (0 .. 19);
my $sql = "SELECT Ifnull(str, 'none') FROM t WHERE id = ?";

sub out {
    my @res = @_;

    open F, ">", "foo.txt" or die $!;
    say F "id: " . $_->[0] . ", str: " . $_->[1] for (@res);
    close F;
}

timethese($count, {
    'query_mem' => sub {
        my $sth = $dbh->prepare($sql);
        my @res = ();
        for (@ids) {
            my $str = $memd->get($_);
            unless ($str) {
                $sth->execute($_);
                ($str) = $sth->fetchrow_array;
                $memd->set($_, $str);
            }

            push @res, [$_, $str];
        }

        out(@res);
    },
    'query_dbh' => sub {
        my $sth = $dbh->prepare($sql);
        my @res = ();
        for (@ids) {
            $sth->execute($_);
            my ($str) = $sth->fetchrow_array;
            push @res, [$_, $str];
        }

        out(@res);
    },
});

Well, now memcache is even worse. As the results below show, while
earlier on I was getting circa 2700 results per second with memcache,
now I am getting a tenth of that. I must be doing something wrong. I
am trying to determine what.

Benchmark: timing 10000 iterations of query_dbh, query_mem...
 query_dbh:  6 wallclock secs ( 3.29 usr +  1.03 sys =  4.32 CPU) @ 2314.81/s (n=10000)
 query_mem: 54 wallclock secs (30.02 usr +  8.24 sys = 38.26 CPU) @ 261.37/s (n=10000)

dormando

May 21, 2011, 6:23:19 PM5/21/11
to memcached
>
> timethese($count, {
> 'query_mem' => sub {
> my $sth = $dbh->prepare($sql);
> my @res = ();
> for (@ids) {
> my $str = $memd->get($_);
> unless ($str) {
> $sth->execute($_);
> ($str) = $sth->fetchrow_array;
> $memd->set($_, $str);
> }
>
> push @res, [$_, $str];
> }
>
> out(@res);
> }

The confusion here probably comes from comparing the fastest,
nearly-fully-C DB path against the slowest possible memcached path.
Asking DBI for a simple query and then calling fetchrow_array (its
fastest deserializer) is very hard to beat.

If you're looking to see if an actual loaded DB would improve or not, you
need to make the DB queries as slow as you see in production. You might as
well reduce them to timed sleeps that roughly represent what you'd see in
prod... If your dataset really is this fast, small, single-threaded, and
in memory, there is no purpose to memcached.
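Dormando's timed-sleep idea can be sketched like this. It is a
self-contained illustration, not code from the thread; the 5 ms per
"query" is an invented figure -- substitute whatever your production
queries actually cost.

```perl
use strict;
use warnings;
use Time::HiRes qw(usleep);
use Benchmark qw(timethese);

# Stand in for a production-speed DB query with a timed sleep, per
# dormando's suggestion. The 5 ms cost and the 50-row keyspace are
# assumptions for illustration only.
my %cache;    # skip the sleep on a cache hit

sub fake_db_query {
    my ($id) = @_;
    usleep(5_000);                # pretend the query costs ~5 ms
    return "row $id";
}

timethese(200, {
    'no_cache' => sub {
        fake_db_query(int(rand(50)));
    },
    'cached' => sub {
        my $id = int(rand(50));
        $cache{$id} //= fake_db_query($id);   # pay the cost once per id
    },
});
```

With a realistically slow "query", the cached path pulls ahead as soon
as the hot keys are populated, which the in-memory SQLite benchmark
never shows.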

> Benchmark: timing 10000 iterations of query_dbh, query_mem...
> query_dbh: 6 wallclock secs ( 3.29 usr + 1.03 sys = 4.32 CPU) @
> 2314.81/s (n=10000)
> query_mem: 54 wallclock secs (30.02 usr + 8.24 sys = 38.26 CPU) @
> 261.37/s (n=10000)

Cache::Memcached is a slowish pure-perl implementation. It helps speed
things up with very slow DB queries, or with a very large hot data set
that won't fit in your DB's memory (i.e., even fastish DB queries start
to slow down as they hit disk more often than not). It's also great for
relieving concurrency pressure on your DB.

If you're trying to compare raw speed against raw speed, you probably
want to start with the Memcached::libmemcached library. That's the
"faster" C-based one. There's also a wrapper that makes it look more
like Cache::Memcached's interface...

-Dormando

punkish

May 21, 2011, 6:29:33 PM5/21/11
to memcached
Thanks. This begins to explain some. I will move my baby steps toward
using the Memcached::libmemcached interface and then report back what
I find.

The funny thing is that, while in real production the queries are not
this simple, in most web apps I make the queries are really not all
that complicated. They do retrieve data from large data stores, but the
SQL itself is relatively straightforward. Besides, none of the web
sites I make get thousands of hits a day. These are scientific web
sites, so even if a query is complicated, it is performed only a few
tens, at most hundreds, of times a day.

After all this, it may well be that memcache is not my immediate step
toward better speeds; nevertheless, it is a great concept and well
worth learning.

> -Dormando

dormando

May 21, 2011, 6:34:21 PM5/21/11
to memcached
> The funny thing is, while in real production, the queries are not this
> simple, in most web apps I make, the queries are really not all that
> complicated. They do retrieve data from large data stores, but the SQL
> itself is relatively straightforward. Besides, none of the web sites I
> make are hosting thousands of hits a day. These are scientific web
> sites, so even if the query is complicated, it being performed only a
> few 10s, at the most 100s of times a day.
>
> After all this, it may well be that memcache may not be my immediate
> step toward better speeds, nevertheless, it is a great concept worth
> learning for me.

It's a great tool to have around when the shit hits the fan. It's also
great for random other things, like caching the results of multiple
queries, rendered page templates, full page renderings, etc.

So if you compare 5 DB fetches (over the network! or at least not
in-process) vs one memcached fetch, it might start looking better.

But that's all moot, if your site is fast enough as is there's no point in
working on it. If it's slow, profile why it's slow and cache or fix those
queries.

Dustin

May 21, 2011, 6:38:08 PM5/21/11
to memcached

On May 21, 3:08 pm, punkish <punk.k...@gmail.com> wrote:
> ds>  Minor nit:  You're not opening the file every time.
> ds>
>
> Interesting. I didn't realize that, and still don't. Anyway, that is
> unimportant for this discussion.

You open the file one time before looping through your tests.

> My worry is that I am not understanding memcache, and am doing
> something wrong with it.

Understood. That's why we're trying to tell you why you're seeing
results that aren't meeting your expectations.

I can't imagine why it wouldn't be worse. On each iteration, you
compile a SQL query then block on network activity 20 times in the
memcached case. You're doing almost the entire SQLite workload in the
memcached case and then mixing in a gang of network round trips.

In the SQLite case, you should've expected a performance gain since
you're compiling queries less frequently than you did in the previous
version.

memcached is a network service. You'd never structure code such
that the inner loop of your program requires a full round trip over
the network. Perform a single get for all of the keys you want and if
you miss any, compile the query at that time, backfill them, and fill
in the holes.
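As a sketch of the pattern Dustin describes (my illustration, reusing
the thread's $memd/$dbh/$sql names): one get_multi up front, then walk
the original key list -- not the returned hash, since get_multi returns
only the keys that hit -- and backfill the holes, compiling the query
only if at least one key missed.

```perl
use strict;
use warnings;

# One round trip for all keys, then backfill only the misses.
# $memd, $dbh, and $sql are assumed to be set up as in the thread.
sub fetch_all {
    my ($memd, $dbh, $sql, @ids) = @_;

    my $hits = $memd->get_multi(@ids);   # hashref of only the keys found
    my $sth;                             # compiled lazily, on first miss
    my @res;

    for my $id (@ids) {
        my $str = $hits->{$id};
        unless (defined $str) {
            $sth ||= $dbh->prepare($sql);
            $sth->execute($id);
            ($str) = $sth->fetchrow_array;
            $memd->set($id, $str);       # fill in the hole
        }
        push @res, [$id, $str];
    }
    return @res;
}
```

On a warm cache this costs a single network round trip and zero SQL
compilations, which is the shape the inner loop should have.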

punkish

May 21, 2011, 7:45:06 PM5/21/11
to memcached


On May 21, 5:38 pm, Dustin <dsalli...@gmail.com> wrote:
..
>
>   I can't imagine why it wouldn't be worse.  On each iteration, you
> compile a SQL query then block on network activity 20 times in the
> memcached case.  You're doing almost the entire SQLite workload in the
> memcached case and then mixing in a gang of network round trips.
>
>   In the SQLite case, you should've expected a performance gain since
> you're compiling queries less frequently than you did in the previous
> version.
>
>   memcached is a network service.  You'd never structure code such
> that the inner loop of your program requires a full round trip over
> the network.  Perform a single get for all of the keys you want and if
> you miss any, compile the query at that time, backfill them, and fill
> in the holes.

Nice. I changed the code to

sub {
    my $sth = $dbh->prepare($sql);
    my @res = ();

    my $hashref = $memd->get_multi(@ids);

    # get_multi returns only the keys that hit, so walk @ids to find misses
    for my $id (@ids) {
        my $str = $hashref->{$id};
        unless (defined $str) {
            $sth->execute($id);
            ($str) = $sth->fetchrow_array;
            $memd->set($id, $str);
        }

        push @res, [$id, $str];
    }

    out(@res);
}

and now I get



Benchmark: timing 10000 iterations of query_dbh, query_mem...
 query_dbh:  6 wallclock secs ( 3.28 usr +  1.03 sys =  4.31 CPU) @ 2320.19/s (n=10000)
 query_mem: 10 wallclock secs ( 6.57 usr +  1.45 sys =  8.02 CPU) @ 1246.88/s (n=10000)

instead of

Benchmark: timing 10000 iterations of query_dbh, query_mem...
 query_dbh:  6 wallclock secs ( 3.29 usr +  1.03 sys =  4.32 CPU) @ 2314.81/s (n=10000)
 query_mem: 54 wallclock secs (30.02 usr +  8.24 sys = 38.26 CPU) @ 261.37/s (n=10000)

an almost 5 times improvement.

punkish

May 21, 2011, 7:53:22 PM5/21/11
to memcached


On May 21, 5:23 pm, dormando <dorma...@rydia.net> wrote:
..
>
> > Benchmark: timing 10000 iterations of query_dbh, query_mem...
> >  query_dbh:  6 wallclock secs ( 3.29 usr +  1.03 sys =  4.32 CPU) @
> > 2314.81/s (n=10000)
> >  query_mem: 54 wallclock secs (30.02 usr +  8.24 sys = 38.26 CPU) @
> > 261.37/s (n=10000)
>
> Cache::Memcached is a slowish pure-perl implementation. It helps speed
> things up with very slow DB queries, or very large hot data set that won't
> fit in your DB (ie; even fastish DB queries start to slow down as they hit
> disk more often than not). It's also great to relieve concurrency off of
> your DB.
>
> If you're trying to compare raw speed vs speed you probably want to start
> with the Memcached::libmemcached library. That's the "faster" C based guy.
> There's also a wrapper to make it look more like Cache::Memcached's
> interface...


Ha ha! I installed Cache::Memcached::Fast, which seems to be a C-based
drop-in replacement for C::M, and now I get the following results
(this includes the get_multi method to get many keys in one shot)

 query_dbh:  5 wallclock secs ( 3.31 usr +  1.03 sys =  4.34 CPU) @ 2304.15/s (n=10000)
 query_mem:  6 wallclock secs ( 1.87 usr +  1.46 sys =  3.33 CPU) @ 3003.00/s (n=10000)

much, much nicer. In fact, the memcache option is now slightly faster
than pure SQLite.

dormando

May 21, 2011, 7:58:16 PM5/21/11
to memcached
>
> Ha ha! I installed Cache::Memcached::Fast, which seems to be a C based
> drop in replacement for C::M, and now I get the following results
> (this includes the get_multi method to get many keys in one shot)
>
> query_dbh: 5 wallclock secs ( 3.31 usr + 1.03 sys = 4.34 CPU) @
> 2304.15/s (n=10000)
> query_mem: 6 wallclock secs ( 1.87 usr + 1.46 sys = 3.33 CPU) @
> 3003.00/s (n=10000)
>
> much, much nicer. In fact, the memcache option is now slightly faster
> than pure SQLite.

*cough*. I hope nobody jumps up my ass for this, but C::M::F is based
on some faulty mutation optimizations. I would highly recommend
Memcached::libmemcached instead. If you want a drop-in interface, try
this out:
http://search.cpan.org/~timb/Cache-Memcached-libmemcached-0.02011/

libmemcached is a much more robust library; if you end up stuck with
something, it's probably best to be stuck over there.

punkish

May 21, 2011, 8:24:05 PM5/21/11
to memcached
Yeah, pretty good

Benchmark: timing 10000 iterations of query_dbh, query_mem...
 query_dbh:  5 wallclock secs ( 3.20 usr +  0.99 sys =  4.19 CPU) @ 2386.63/s (n=10000)
 query_mem:  6 wallclock secs ( 2.37 usr +  1.26 sys =  3.63 CPU) @ 2754.82/s (n=10000)

Dustin

May 22, 2011, 1:26:14 AM5/22/11
to memcached

On May 21, 5:24 pm, punkish <punk.k...@gmail.com> wrote:
> Yeah, pretty good
>
> Benchmark: timing 10000 iterations of query_dbh, query_mem...
>  query_dbh:  5 wallclock secs ( 3.20 usr +  0.99 sys =  4.19 CPU) @
> 2386.63/s (n=10000)
>  query_mem:  6 wallclock secs ( 2.37 usr +  1.26 sys =  3.63 CPU) @
> 2754.82/s (n=10000)

Just for my sanity, can you do another pass where you're not compiling
SQL in the cases where you have cache hits and don't need it?

punkish

May 22, 2011, 11:20:15 AM5/22/11
to memcached
Preparing a SQL statement costs SQLite next to nothing, so it makes no
significant difference. Nevertheless, I modified my script (see the
complete script below) per your request above. I get the following
result (note: I am now using Cache::Memcached::libmemcached, as
suggested by Dormando) --

punkish@lucknow ~/Projects/Learning_memcached$ perl test.pl
Benchmark: timing 10000 iterations of query_dbh, query_mem...
 query_dbh:  6 wallclock secs ( 3.87 usr +  0.92 sys =  4.79 CPU) @ 2087.68/s (n=10000)
 query_mem:  6 wallclock secs ( 2.51 usr +  1.23 sys =  3.74 CPU) @ 2673.80/s (n=10000)

#!/usr/local/bin/perl

use strict;
use warnings;
use feature 'say';
use Cache::Memcached::libmemcached;
use DBI qw(:sql_types);
use Benchmark qw(:all);

my $dbh = DBI->connect(
    "dbi:SQLite:dbname=mem.sqlite", "", "",
    {RaiseError => 1, AutoCommit => 0}
);

my $memd = Cache::Memcached::libmemcached->new({
    'servers' => [ "localhost:11212" ]
});

my $sql = "SELECT Ifnull(str, 'none') FROM t WHERE id = ?";
my @ids = map { int(rand(19_999)) || 1 } (0 .. 20);

timethese(10_000, {
    'query_mem' => sub {
        my $hashref = $memd->get_multi(@ids);

        my @res = ();

        # get_multi returns only the keys that hit; walk @ids for misses
        for my $id (@ids) {
            my $str = $hashref->{$id};
            unless (defined $str) {
                my $sth = $dbh->prepare($sql);
                $sth->execute($id);
                ($str) = $sth->fetchrow_array;
                $memd->set($id, $str);
            }

            push @res, [$id, $str];
        }
        out(@res);
    },
    'query_dbh' => sub {
        my $sth = $dbh->prepare($sql);

        my @res = ();
        for (@ids) {
            $sth->execute($_);
            my ($str) = $sth->fetchrow_array;
            push @res, [$_, $str];
        }
        out(@res);
    },
});

sub out {
    my @res = @_;

    open F, ">", "foo.txt" or die $!;
    say F "id: " . $_->[0] . ", str: " . $_->[1] for (@res);
    close F;
}

sub create_db {
    my $sth = $dbh->prepare(
        "CREATE TABLE t (id INTEGER PRIMARY KEY, str TEXT)"
    );

    $sth->execute;
    $dbh->commit;
}

sub pop_db {
    my $sth = $dbh->prepare("INSERT INTO t (str) VALUES (?)");

    $sth->execute(randstr(rand(100))) for (0 .. 20_000);
    $dbh->commit;
}

sub randstr {
    my $len = shift;

    my @chr = ('a'..'z', 'A'..'Z', '0'..'9', '_', ' ');
    return join "", map { $chr[rand @chr] } 1 .. $len;
}