memcached slower than file IO


Ved

May 29, 2009, 2:51:38 AM
to memcached
Hi

I have a webpage in which I read a text file for processing. My
benchmark (ab) results show a much higher requests/second for the file
than for the same contents stored in and accessed through memcached.
Also, the number of failed requests is 0 when I am using disk IO,
whereas when I am using memcached the number keeps climbing. What could
be the possible reasons for memory IO being slower than disk IO, and
for the failed requests?

I would also like to know how the -c, -n, and -b settings affect
overall performance.

Thanks

Ved

David Stanek

May 29, 2009, 6:32:03 AM
to memc...@googlegroups.com
On Fri, May 29, 2009 at 2:51 AM, Ved <prakas...@gmail.com> wrote:
>
> I have a webpage in which I read a text file for processing. My
> benchmark (ab) results show a much higher requests/second for the file
> than for the same contents stored in and accessed through memcached.
> Also, the number of failed requests is 0 when I am using disk IO,
> whereas when I am using memcached the number keeps climbing. What could
> be the possible reasons for memory IO being slower than disk IO, and
> for the failed requests?

I think some of your assumptions may be incorrect. If you are using
the same file in every request, your OS is probably serving a cached
copy from memory; it won't hit the disk every time. And while memcached
does store everything in memory, which is fast, it transmits data over
a socket, which is slow. You are really comparing using local memory
vs. using memory on a different machine.

I use memcached to reduce hits to my database, which is much slower
than memcached. Sometimes I also use it to store objects that are
expensive to create. This is just a trade-off between CPU and network
access.
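A minimal sketch of the comparison David describes, assuming a memcached on localhost and a small file the OS has already cached (both hypothetical; the timing helper itself is generic):

```php
<?php
// timeCalls: run a callable $iterations times and return elapsed seconds.
function timeCalls(callable $fn, $iterations) {
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return microtime(true) - $start;
}

// Hypothetical usage: both reads come out of RAM, but the memcached
// loop pays a socket round trip per get, so it is usually slower.
// $fileSecs  = timeCalls(function () { file_get_contents('/tmp/bench.txt'); }, 10000);
// $cacheSecs = timeCalls(function () use ($memcache) { $memcache->get('bench'); }, 10000);
```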

--
David
blog: http://www.traceback.org
twitter: http://twitter.com/dstanek

Ved

May 29, 2009, 8:39:37 AM
to memcached
Thanks, David, for your reply, but what I mentioned are not my
assumptions:

1. memcached is installed on the same machine as my files, so that
takes care of your concern (memory on a different machine).
2. I am not just reading a single file every time; I have 3 data files
that I access in the script (the system may be caching the files in memory)
3. Even if the files are not changing, does that mean the disk cache is
faster than memcached? And by as much as almost 50%?

I am sorry, but I don't think your explanation addresses my concerns.

Ved

On May 29, 3:32 pm, David Stanek <dsta...@dstanek.com> wrote:

Les Mikesell

May 29, 2009, 8:55:25 AM
to memc...@googlegroups.com
Ved wrote:
> Thanks, David, for your reply, but what I mentioned are not my
> assumptions:
>
> 1. memcached is installed on the same machine as my files, so that
> takes care of your concern (memory on a different machine).

That doesn't matter; you still go through the same client/server
motions to access it through a socket as if you had distributed storage.

> 2. I am not just reading a single file every time; I have 3 data files
> that I access in the script (the system may be caching the files in memory)
> 3. Even if the files are not changing, does that mean the disk cache is
> faster than memcached? And by as much as almost 50%?

Yes, if you have sufficient RAM, all recently accessed file data will be
cached at the OS level for fast repeated access.

> I am sorry, but I don't think your explanation addresses my concerns.

The part that didn't make sense was that you mentioned memcached having
many failures. Unless you have not provided sufficient RAM, you should
only fail on the first access to new or expired data. Perhaps with
memcached running you don't have enough memory to hold your active data
set in either the memcached cache or the now-reduced filesystem buffers,
and you end up making them both thrash.

--
Les Mikesell
lesmi...@gmail.com


Brian Moon

May 29, 2009, 9:36:42 AM
to memc...@googlegroups.com
Yeah, everything David said. If this had been a real production
environment where lots of file IO was happening, you could have seen a
difference, too. We replaced several file-based caches with memcached
requests for our ad server system. Those files were being loaded
hundreds of times per second, and the IO became a bottleneck.

Brian.
--------
http://brian.moonspot.net/

Henrik Schröder

May 29, 2009, 9:46:52 AM
to memc...@googlegroups.com
Testing the performance of memcached on a local machine is pretty useless, since memcached is useful as a distributed cache accessed over the network from many client machines, and that is the case you should benchmark.

On the other hand, if you actually intend to use it locally only, then you should consider a different solution; that's not what memcached is really for. If you only have local requests, then in-process memory access is always going to be faster than talking to a local memcached over sockets. Same memory, lots more overhead.


/Henrik

Brian Moon

May 29, 2009, 9:47:53 AM
to memc...@googlegroups.com
> 1. memcached is installed on the same machine as my files, so that
> takes care of your concern (memory on a different machine).

Network is network. Localhost and local LAN requests have the same
overhead for requests as small as memcached's.

> 2. I am not just reading a single file every time; I have 3 data files
> that I access in the script (the system may be caching the files in memory)

Yeah, try hundreds of files being read hundreds of times per second.

> 3. Even if the files are not changing, does that mean the disk cache is
> faster than memcached? And by as much as almost 50%?

Oh hell yes. This is the entire basis behind Varnish, the caching proxy
server. The kernel and/or the filesystem manages the file cache; you
don't get more low-level than that.

> I am sorry, but I don't think your explanation addresses my concerns.

If your file based approach works, why are you looking at memcached?

As for failures, I assume you mean failures reported by Apache Bench.
My guess is that you are not using threaded mode, or are not tuning
your thread count appropriately for memcached, and hitting the same 3
keys with, say, 1000 concurrent requests could cause some contention.

Brian.

Dustin

May 29, 2009, 11:25:24 AM
to memcached

On May 28, 11:51 pm, Ved <prakash.ve...@gmail.com> wrote:
> Hi
>
> I have a webpage in which I read a text file for processing. My
> benchmark (ab) results show a much higher requests/second for the file
> than for the same contents stored in and accessed through memcached.
> Also, the number of failed requests is 0 when I am using disk IO,
> whereas when I am using memcached the number keeps climbing. What could
> be the possible reasons for memory IO being slower than disk IO, and
> for the failed requests?

If you're getting errors, you're likely doing something wrong. If
you're doing something wrong, you're not likely getting the
performance you could be getting.

And, as was mentioned, if you're trying to see if memcached will
outrun a single machine's filesystem cache, you'll be disappointed.
However, if you add lots of files, lots of machines *and* try to
change some of that data, then you'll start to see the benefits.

Jay Paroline

May 29, 2009, 11:45:41 AM
to memcached
I agree with the others here that memcached can very well be slower
than cached local file access, and that sounds like *part* of your
issue. But your failed requests concern me; more details would be
extremely helpful.

1. How large is the file/data you are caching, and exactly how are you
caching it?
2. How are you determining failure?
3. If you write a test script with a bunch of arbitrary sets and then
gets of the same keys, do they fail?
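Jay's third suggestion could look something like the following rough sketch. The memcached calls are commented out since they need a live server, and the key names are made up:

```php
<?php
// countMismatches: compare what we expected to read back against what
// the cache actually returned; each missing or wrong value is a failure.
function countMismatches(array $expected, array $actual) {
    $failures = 0;
    foreach ($expected as $key => $value) {
        if (!array_key_exists($key, $actual) || $actual[$key] !== $value) {
            $failures++;
        }
    }
    return $failures;
}

// Hypothetical test loop against a live memcached:
// $expected = array();
// for ($i = 0; $i < 1000; $i++) {
//     $expected["testkey_$i"] = "value_$i";
//     $memcache->set("testkey_$i", "value_$i");
// }
// $actual = array();
// for ($i = 0; $i < 1000; $i++) {
//     $actual["testkey_$i"] = $memcache->get("testkey_$i");
// }
// echo countMismatches($expected, $actual), " failures\n";
```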

Jay

Ved

May 30, 2009, 11:07:29 AM
to memcached
Thanks all for your replies,

Les : Thanks for clarifying that memcached would access data as client/
server thru sockets.

There is 8 GB of RAM on the server, and on the first run Apache Bench
doesn't show any failed requests, but if I execute the program again,
failed requests do show up. This is something I couldn't understand:
why do no failed requests show on the first run, yet they appear on
every subsequent run? (Total requests per run: 10,000 at 200
concurrency.)

"Perhaps with memcached running you don't have enough memory to ...":
this is not the case in any way. I have checked the memory, and had
that been an issue, even file IO would have given me more or less the
same performance.


Brian: if there is a lot of file IO it certainly would become a
bottleneck, but say I have sufficient RAM to hold most of the files in
the file cache; memcached would still not give me any better results
than I am getting now. You may be right in pointing out the overheads
involved with the network, but even the FS has some overheads, like
checking on every read whether the file has changed. Moreover, had the
filesystem-managed cache made such a big difference, why do I see
fluctuations in requests/sec while benchmarking the script that uses
file I/O? The fluctuations have been as high as 30% in some tests.

memcached runs in threaded mode as recommended in the docs (1 thread
per CPU core); in my case that's 2.

Yes, I am very much interested in learning in depth about fine-tuning
memcached.

"If your file based approach works...": because I don't want to stop
working on optimization just because something works. And since, as you
mentioned, I still have not managed to fine-tune memcached, how can I
say the FS approach works better for me?

Henrik: I am sorry for the term I used, "local machine". I am using a
server on EC2 and I run my tests from another server on EC2. Also, I am
not planning to use memcached on a local machine.

Dustin: I couldn't figure out where I was going wrong, and that's the
reason I would like you guys to help me find out.

Jay: "3. If you write a test script with a bunch of arbitrary sets and
then gets of the same keys, do they fail?"
I didn't try this, but what are we trying to achieve with this
exercise? In my script I need to invalidate the cache once every 15
minutes, and after that only do gets, once this goes to production.
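For what it's worth, the 15-minute invalidation could also be expressed as an explicit expiry at set time (pecl/memcache extension, as in the posted code; the commented-out key and value are hypothetical):

```php
<?php
// With the pecl/memcache extension, set() takes ($key, $value, $flags, $expire).
// An expiry of 900 seconds makes entries lapse on their own every 15 minutes.
$ttl = 15 * 60; // 900 seconds

// $memcache->set('inclarr', $get_result, 0, $ttl);
```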


Ved

Dustin

May 30, 2009, 2:24:32 PM
to memcached

On May 30, 8:07 am, Ved <prakash.ve...@gmail.com> wrote:

> Dustin: I couldn't figure out where I was going wrong, and that's the
> reason I would like you guys to help me find out.

That was kind of my point. You've only vaguely said what you're
doing in the first place. You're expecting us all to guess what
you're doing and tell you how it's wrong.

If you don't find our guesses helpful, you should perhaps show us
exactly what you're doing so we can offer specific advice.

Syed Ali

May 30, 2009, 3:54:47 PM
to memc...@googlegroups.com
I was thinking about you getting failures on subsequent runs. It could
be that the entries expired. What is the TTL you are using?

Syed

PS: I know I'm late to the party; if someone has already pointed in
that direction, I apologize.


--
Best,
- Ali

Ved

May 30, 2009, 5:29:33 PM
to memcached
Dustin: I don't know if posting the code I have written, and the way I
start the memcached daemon, will help you figure out what's causing
this, but I'll post it for your perusal.

function array_diff_fast($data1, $data2) {
    $data1 = array_flip($data1);
    $data2 = array_flip($data2);

    foreach ($data2 as $hash => $key) {
        if (isset($data1[$hash])) unset($data1[$hash]);
    }

    return array_flip($data1);
}

$req = explode(";", $_SERVER['QUERY_STRING']);
$sizestr = $_REQUEST['sz'];
$sizes = explode(",", $sizestr);
$memcache = new Memcache;
$memcache->connect('localhost', 11211) or die("Could not connect");
foreach ($sizes as $sz) {
    $get_result = $memcache->get('inclarr');
    if (!empty($get_result)) {
        $incl_array = unserialize($get_result);
    } else {
        $get_result = file_get_contents($path.'/data/incl_'.$sz.'.ser');
        $memcache->set('inclarr', $get_result);
        $incl_array = unserialize($get_result);
    }
    $incl_key_array = array_keys($incl_array);
    $incl_req_filter = array_intersect($req, $incl_key_array);
    if (!empty($incl_req_filter)) {
        foreach ($incl_req_filter as $key) {
            if (!empty($incl_array[$key])) {
                $include_array[$sz][$key] = $incl_array[$key];
            }
        }
    }
    $get_result = $memcache->get('exclarr');
    if (!empty($get_result)) {
        $excl_array = unserialize($get_result);
    } else {
        $get_result = file_get_contents($path.'/data/excl_'.$sz.'.ser');
        $memcache->set('exclarr', $get_result);
        $excl_array = unserialize($get_result);
    }

    $excl_key_array = array_keys($excl_array);
    $excl_req_filter = array_intersect($req, $excl_key_array);
    if (!empty($excl_req_filter)) {
        foreach ($excl_req_filter as $key) {
            if (!empty($excl_array[$key])) {
                $excl_key_array = $excl_array[$key];
                foreach ($include_array[$sz] as $key => $value) {
                    $include_array[$sz][$key] = array_diff_fast($value, $excl_key_array);
                }
            }
        }
    }
    $get_result = $memcache->get('keymatcharr');
    if (!empty($get_result)) {
        $fixed_match_cnt = unserialize($get_result);
    } else {
        $get_result = file_get_contents($path.'/data/keyd_match_cnt_data.ser');
        $memcache->set('keymatcharr', $get_result);
        $fixed_match_cnt = unserialize($get_result);
    }
    $cntarr = $include_array[$sz];
    if (!empty($cntarr)) {
        foreach ($include_array[$sz] as $key => $value) {
            foreach ($value as $key_id) {
                $largest = max($fixed_match_cnt[$key_id]);
                if (!empty($testkey_match[$key_id])) {
                    if ($largest > $testkey_match[$key_id]) {
                        $testkey_match[$key_id]++;
                    }
                } else {
                    $testkey_match[$key_id] = 1;
                }
            }
        }
    }
}
arsort($testkey_match);
foreach ($testkey_match as $key => $value) {
    for ($k = 0; $k < count($fixed_match_cnt[$key]); $k++) {
        if ($value == $fixed_match_cnt[$key][$k]) {
            if (!in_array($key, $final_array)) {
                $final_array[] = $key;
            }
        }
    }
}


And here is the command I use to start memcached:

/usr/local/memcached/bin/memcached -u root -d -c 4096 -n 8192 -b 2048

Size of data in memory: 15.3 KB
memcached cache size: 64 MB


-- apache bench results

Concurrency Level: 200
Time taken for tests: 33.716885 seconds
Complete requests: 10000
Failed requests: 47
(Connect: 0, Length: 47, Exceptions: 0)
Write errors: 0
Total transferred: 8112328 bytes
HTML transferred: 2041164 bytes
Requests per second: 296.59 [#/sec] (mean)
Time per request: 674.338 [ms] (mean)
Time per request: 3.372 [ms] (mean, across all concurrent requests)
Transfer rate: 234.96 [Kbytes/sec] received

Please let me know if you need any more information ..

Ved

Bobo

May 30, 2009, 7:54:27 PM
to memcached
>   foreach ($sizes as $sz) {
>         $get_result = $memcache->get('inclarr');
>         if(!empty($get_result)){
>                 $incl_array = unserialize($get_result);
>         }
>         else
>         {
>                 $get_result = file_get_contents($path.'/data/incl_'.
> $sz.'.ser');
>                 $memcache->set('inclarr', $get_result);
>                 $incl_array = unserialize($get_result);
>         }

Sorry, I don't know how to solve your problem, but I see something
odd in this snippet (and the other foreach loops). You're either
retrieving the same object 'inclarr' many times, when it could just as
well be retrieved once before the loop, or you're not retrieving the
right element from the cache. In the else branch, when you're reading
from the filesystem, you differentiate between files with the $sz
variable; the same should be done with the memcache key, I think.
I'm sorry if I didn't really understand your code and there is a
perfectly valid reason for what you did.
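A sketch of the per-size keying this suggests, mirroring the incl_<size>.ser naming on disk; the helper name is made up:

```php
<?php
// sizedKey: build a cache key that carries the size, so each size's
// file gets its own cache entry instead of all sizes sharing 'inclarr'.
function sizedKey($base, $sz) {
    return $base . '_' . $sz;
}

// Hypothetical use in the loop from the thread:
// $get_result = $memcache->get(sizedKey('inclarr', $sz));
// if ($get_result === false) {
//     $get_result = file_get_contents($path . '/data/incl_' . $sz . '.ser');
//     $memcache->set(sizedKey('inclarr', $sz), $get_result);
// }
```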

--
Andrea Baron
bo...@elbrigante.it
and...@bhweb.it

Ved

May 31, 2009, 1:20:04 AM
to memcached
No, the snippet doesn't have an issue; this loop runs only once, so
there is no way the same object is retrieved more than once. If the
loop seems confusing, you can just remove the foreach ($sizes as $sz)
{ and its closing }.

Les Mikesell

May 31, 2009, 1:48:49 AM
to memc...@googlegroups.com
Ved wrote:

> There is 8 GB of RAM on the server, and on the first run Apache Bench
> doesn't show any failed requests, but if I execute the program again,
> failed requests do show up. This is something I couldn't understand:
> why do no failed requests show on the first run, yet they appear on
> every subsequent run? (Total requests per run: 10,000 at 200
> concurrency.)

Are these Apache errors or errors from memcached? If you are running at
high concurrency in Apache Bench you may be running the server out
of sockets or some other resource.

> "Perhaps with memcached running you don't have enough memory to ...":
> this is not the case in any way. I have checked the memory, and had
> that been an issue, even file IO would have given me more or less the
> same performance.

I think you are missing the point: if you had just enough room for
the data in the file buffers without memcached, and then allocated a
portion to memcached, neither one would have enough space and both
would have to continuously reload the data as it is evicted.
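One way to check for this eviction scenario is to look at memcached's own counters after a benchmark run; this is a rough sketch, and the helper name is made up:

```php
<?php
// cachePressure: given a stats array as returned by Memcache::getStats(),
// report whether the server has had to evict items, i.e. the cache is
// too small for the working set.
function cachePressure(array $stats) {
    return isset($stats['evictions']) && $stats['evictions'] > 0;
}

// Hypothetical use against a live server:
// $stats = $memcache->getStats();
// if (cachePressure($stats)) {
//     echo "cache too small, items being evicted\n";
// }
```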

--
Les Mikesell
lesmi...@gmail.com


Brian Moon

May 31, 2009, 11:51:45 PM
to memc...@googlegroups.com
> "If your file based approach works...": because I don't want to stop
> working on optimization just because something works. And since, as you
> mentioned, I still have not managed to fine-tune memcached, how can I
> say the FS approach works better for me?

It's not a question of whether or not memcached *could* be faster. The
question you should be asking when optimizing is "What is the
bottleneck in my application?" If it is the file-based cache, you need
to fix it. But if it's not, why focus on that?

As for your issues: if you only have one web server, memcached is not
the right tool here and you are wasting your time. Since you use PHP,
use APC or XCache if you want a memory-based cache for PHP on a single
web server. If you have more than one web server, you need to be
performing your tests using more than one web server.

Brian.

Ved

Jun 1, 2009, 12:22:13 AM
to memcached
Thanks, Les and Brian. I will surely try what you have suggested;
thanks again, everyone, for all your help. I would have been more
pleased had we discussed memcached tuning even remotely, which we
didn't. I guess someone here must have done it and would be pleased to
share.


- V

Henrik Schröder

Jun 1, 2009, 6:40:51 AM
to memc...@googlegroups.com
Tuning memcached mostly consists of figuring out how best to use it in your application. For the overwhelming majority of memcached users, changing the startup parameters does absolutely nothing.


/Henrik