Number of Instances - Details | Average QPS | Average Latency | Average Memory
3 total (1 Resident)          | 1.017       | 568.7 ms        | 61.5 MBytes

Memcache:
Hit count:        237801
Miss count:       1940100
Hit ratio:        10%
Item count:       43785 item(s)
Total cache size: 78798944 byte(s)
Oldest item age:  8 hour(s) 36 min(s) 47 second(s)
The same app serving the same content, but in a different market:
Number of Instances - Details | Average QPS | Average Latency | Average Memory
5 total                       | 41.062      | 98.7 ms         | 51.6 MBytes

Memcache:
Hit count:        2610910
Miss count:       308020
Hit ratio:        89%
Item count:       54803 item(s)
Total cache size: 92765915 byte(s)
Oldest item age:  8 hour(s) 27 min(s) 31 second(s)
--
On Wed, Aug 8, 2012 at 5:44 PM, Waleed Abdulla <wal...@ninua.com> wrote:
> Thanks, Jeff. Is it possible to repeat the test with qps < 10 to rule out
> the limit that Johan pointed out? In other words, how big is the performance
> difference if you had fewer requests that do more work?

You must mean concurrency less than 10?
I'm not really certain how concurrency relates to this. All the tests
I ran (Node.js, Twisted, Tornado, Simple) were nonblocking servers
with a concurrency of 1.
Maybe - just maybe - it would be possible to
increase throughput by using multiple system threads up to the number
of cores available... but then you would lose performance due to
synchronization. Probably significantly. Optimal hardware
utilization is one isolated, single-threaded, nonblocking server per
core.
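For reference, every server under test was shaped roughly like this minimal Tornado noop server; an illustrative sketch, not my actual benchmark code:

import tornado.ioloop
import tornado.web

class NoopHandler(tornado.web.RequestHandler):
    def get(self):
        self.write("ok")  # static response, no I/O, no threads

application = tornado.web.Application([(r"/noop", NoopHandler)])

if __name__ == "__main__":
    application.listen(8080)
    tornado.ioloop.IOLoop.instance().start()  # one thread, one event loop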
I really don't know why backends are slow. Maybe it has something to
do with the request queueing system? Throughput sucks even when
backends are doing noops. Maybe "increased concurrency" would allow
more requests to travel through the queueing system at once... but
it's hard to imagine this helping out the actual server process at
all.
More timeslicing and synchronization on a cpu- and memory-bound
problem will reduce performance, not improve it.
Jeff
--
I think I mentioned this before, but there is no I/O in the problem.
It just collects data in RAM (thousands of individual submissions) and
waits for the reaper to come get the entire set in one fetch. This is
why I do not expect concurrency to help.
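To make that concrete, here's a rough sketch of the collect-in-RAM pattern (hypothetical handler names and payload format, not the actual game code):

import json

import tornado.ioloop
import tornado.web

scores = []  # every submission accumulates in process memory

class SubmitHandler(tornado.web.RequestHandler):
    def post(self):
        # e.g. body: {"player": "abc", "score": 123}
        scores.append(json.loads(self.request.body))
        self.write("ok")

class ReapHandler(tornado.web.RequestHandler):
    def get(self):
        global scores
        batch, scores = scores, []  # hand over the whole set in one fetch
        self.write(json.dumps(batch))

application = tornado.web.Application([
    (r"/submit", SubmitHandler),
    (r"/reap", ReapHandler),
])

if __name__ == "__main__":
    application.listen(8080)
    tornado.ioloop.IOLoop.instance().start()  # single thread, no locking needed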
If you've been reading various threads on this list you know that
Richard has been having trouble getting his mobile game to run
smoothly on GAE. It's a little unusual because timing is coordinated
precisely:
* At T+0, all clients submit scores
* At T+5s, a reaper process aggregates the scores and builds a result set
* At T+10s, all clients fetch scores
The question is: Where to submit the score data so that the reaper
can fetch and aggregate it?
Here are some answers that didn't work:
* The datastore. Eventual consistency is too eventual: a query run
right after the writes doesn't reliably return all the scores.
* Pull queues. There's too much of a delay between task insertion
and when it appears for leasing (see the sketch after this list).
* A single backend. One backend cannot handle more than ~80qps.
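For what it's worth, the pull-queue attempt was shaped roughly like this (GAE Python taskqueue API; the queue name and helper functions here are illustrative):

# 'scores' must be declared as a pull queue in queue.yaml.
from google.appengine.api import taskqueue

def submit_score(payload):
    # Each client request inserts its score as a pull task.
    taskqueue.Queue('scores').add(
        taskqueue.Task(payload=payload, method='PULL'))

def reap_scores():
    # The reaper leases whatever is visible... but a task can take
    # longer than the 5s budget to become visible for leasing after add().
    queue = taskqueue.Queue('scores')
    tasks = queue.lease_tasks(lease_seconds=30, max_tasks=1000)
    queue.delete_tasks(tasks)
    return [t.payload for t in tasks]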
He eventually got a system working reliably, sharded across ten B1
instances, at a cost (beyond other charges) of ~$600/mo. It can
collect a couple thousand scores within the 5s deadline (barely).
I thought this was insane, so I built a few experiments to see what
other technologies can do, using the exact program logic of Richard's
collector. Here are the results:
The environment: 256MB Rackspacecloud VPS running Ubuntu 10.04.4 LTS
The cost: $11/mo
The command: ab -c 10000 -n 10000 -r http://theurl (that's 10k
requests, all concurrent).
Node.js: ~2500 qps. Rock solid through multiple test runs, all
complete before the deadline.
Java SimpleHTTP: ~2100 qps. Had to bump heap up to 128MB.
Python Twisted: ~1600 qps. Failed a lot of requests on most test runs.
Python Tornado: ~1500 qps, but rock solid through multiple test runs.
So basically, an $11/mo VPS server running Javascript vastly exceeds
the capabilities of 10 backends at $60/mo each.
Jeff
--
On Mon, Aug 13, 2012 at 6:33 AM, Johan Euphrosine <pro...@google.com> wrote:
>
> Hi Jeff,
>
> Can you share your test methodology?
>
> I would like to reproduce this testing on my side before escalating the
> results.
>
> In particular I'm wondering if you are calling ab on the backends directly,
> or on an App Engine frontend that urlfetches to the backend in a handler?

I tried both. Going ab -> frontend -> backend is _much_ slower than
going directly to a backend.
You can see the code for testing backends here:
https://github.com/stickfigure/wh-test
There is a bunch of extra code in there you should ignore. The
important methods are:
/noop - returns static string from frontend
/bnoop - urlfetch from frontend to backend which returns static string
/backend/noop - directly to backend to return static string
/away - just does a urlfetch to a rackspacecloud vps instance
There's pretty much no code here. Just handlers that do nothing.
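The repo itself is Java, but each handler amounts to little more than this Python-runtime equivalent (illustrative; the backend hostname is a placeholder):

import webapp2
from google.appengine.api import urlfetch

class Noop(webapp2.RequestHandler):
    def get(self):
        self.response.write('ok')  # static string, no work at all

class BackendNoop(webapp2.RequestHandler):
    def get(self):
        # frontend -> backend hop via urlfetch
        result = urlfetch.fetch('http://somebackend.someapp.appspot.com/backend/noop')
        self.response.write(result.content)

app = webapp2.WSGIApplication([
    ('/noop', Noop),
    ('/bnoop', BackendNoop),
])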
Jeff
--
Memcache doesn't work for this application. Completely aside from
reliability issues, there's no memcache instruction for "give me all
the data". At best you can fetch some number of individual keys, but
that brings up two problems:
1) Trying to do a batch fetch of 10k keys (or more)
2) How do you know what keys to fetch in the first place?
#2 is intractable: any shared structure that could track 10k keys could
just as easily hold the scores themselves, which eliminates the need
for memcache in the first place.
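To spell out why #2 is circular, here's what a memcache-based collector would have to look like (GAE Python memcache API; the key scheme is made up):

from google.appengine.api import memcache

def submit_score(player_id, score):
    memcache.set('score:%s' % player_id, score)
    # ...but now the reaper has to learn this key somehow. Any shared
    # index of keys needs the same 10k-writes-in-5s throughput that
    # memcache was supposed to provide in the first place.

def reap_scores(known_player_ids):
    keys = ['score:%s' % pid for pid in known_player_ids]
    return memcache.get_multi(keys)  # batch fetch, but only of known keys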
The problem is pretty easily stated: Collect 10k score submissions in
5s and be able to provide a sorted leaderboard 5s later. GAE does not
offer any practical facility capable of accomplishing this.