Finally, benchmark results! Community, feedback here!

PatrickNegri

Apr 13, 2008, 6:06:57 PM
to Google App Engine
Hello guys,

I have completed the first benchmark tests.

Read it formatted at:
http://aeuniversity.appspot.com/

Also, I am a bit disappointed with the overall performance.

The read speed will only be acceptable if it scales well, meaning
queries run at the same speed regardless of how many clients are
connected.
The write speed is horrible.

Read it and comment!

Regards
Patrick

Hubert Chen

Apr 13, 2008, 7:52:46 PM
to Google App Engine
I think benchmarking the developer environment doesn't really make
much sense. The infrastructure and scale behind the production
environment is vastly different than the dev environment. Your dev
environment is probably a single slow disk which is bottlenecked by
the CPU. The real BigTable and GFS are large, distributed environments
that have much different characteristics. The dev environment probably
just pickles the datastore in a simplistic fashion. It wouldn't
surprise me if the development datastore didn't support transactions
or any kind of fault tolerance.

We don't know much about the production environment except that the
datastore uses BigTable. Bigtable and GFS are designed primarily with
scalability, large concurrent operations, and distributed fault
tolerance in mind. It's not clear to me whether individual writes and index
lookups will be fast.

But I'm very interested in what you've seen so far and I am eager to
see more benchmarks of deployed applications.

Good Luck,
hubert

Lee O

Apr 13, 2008, 7:55:08 PM
to google-a...@googlegroups.com
I too eagerly await good benchmarking results on the live app.
--
Lee Olayvar
http://www.leeolayvar.com

PatrickNegri

Apr 13, 2008, 8:11:58 PM
to Google App Engine
The main problem is that I can't benchmark the deployed application:
any script has a time limit of 8 seconds.
:(

But I also have bad news... testing within the 8-second window, the
results were the same. Also, my dev environment doesn't suck :P. I have a
cluster of 12 servers running a MapReduce/GlusterFS implementation,
and my dev environment runs on that.

Now that the test results match the deployed app, the only thing I
think matters here is how the app scales.

But it's confirmed: insertion speed is the same on appspot.com.
Sorry guys.

I don't know how insertion is handled in the backend. I am taking a
break to study the datastore Python sources in the google directory.

Regards
Patrick

Josh Flanagan

Apr 13, 2008, 8:20:47 PM
to google-a...@googlegroups.com
I don't think benchmarking the dev environment tells us anything
either. It's a simulation of the platform, not the actual platform.

PatrickNegri

Apr 13, 2008, 8:30:48 PM
to Google App Engine
Yes, but now I have benchmarked on appspot too, with the same results
for insert operations.

I was trying to do a bulk load of data, but is the bulk load menu
gone now?

I'm trying to create a distributed insertion using my cluster.
Since I can do 50 inserts in each 8-second window, I'm configuring
the cluster to call the insertion script at appspot simultaneously.
I'm just waiting for one of my crawlers to finish a job, and then I will
dispatch 250 simultaneous requests, each one adding 4 records.
Will post results soon.

Patrick

PatrickNegri

Apr 13, 2008, 8:46:02 PM
to Google App Engine
Has anyone had success using bulk loading?

I can't make it work on Windows or FC6.
Lots of errors.

Patrick

PatrickNegri

Apr 13, 2008, 9:12:51 PM
to Google App Engine
Heya guys,

New benchmark tests.
Finally I got it to work on appspot.com.

IT SCALES!!! :)
But add about 0.47 seconds for every 1,000 records you need.

Testing data below:

Benchmarking 10 inserts: 0.01000000 secs

Benchmarking single delete (100): 0.07000000 secs
Benchmarking 100 inserts: 0.14000000 secs
Benchmarking all select (100): 100 - 0.04000000 secs

Benchmarking single delete (200): 0.14000000 secs
Benchmarking 200 inserts: 0.25000000 secs
Benchmarking all select (200): 200 - 0.09000000 secs

Benchmarking single delete (250): 0.16000000 secs
Benchmarking 250 inserts: 0.31000000 secs
Benchmarking all select (250): 250 - 0.12000000 secs

Benchmarking single delete (400): 0.24000000 secs
Benchmarking 400 inserts: 0.48000000 secs
Benchmarking all select (400): 400 - 0.19000000 secs

Adding 1000 records in Parallel:
Benchmarking 200 inserts: 0.24000000 secs
Benchmarking all select (200): 200 - 0.09000000 secs
Benchmarking 200 inserts: 0.25000000 secs
Benchmarking all select (200): 400 - 0.18000000 secs
Benchmarking 200 inserts: 0.26000000 secs
Benchmarking all select (200): 600 - 0.32000000 secs
Benchmarking 200 inserts: 0.26000000 secs
Benchmarking all select (200): 800 - 0.37000000 secs
Benchmarking 200 inserts: 0.25000000 secs
Benchmarking all select (200): 1000 - 0.47000000 secs

Testing only select in 1000:
Benchmarking all select (1000): 1000 - 0.49000000 secs
Benchmarking all select (1000): 1000 - 0.46000000 secs
Benchmarking all select (1000): 1000 - 0.50000000 secs
Benchmarking all select (1000): 1000 - 0.47000000 secs
Benchmarking all select (1000): 1000 - 0.52000000 secs

Testing only select in 1000 limiting 200:
Benchmarking all select (200): 1000 - 0.46000000 secs
Benchmarking all select (200): 1000 - 0.51000000 secs
Benchmarking all select (200): 1000 - 0.46000000 secs
Benchmarking all select (200): 1000 - 0.52000000 secs
Benchmarking all select (200): 1000 - 0.46000000 secs

Populating the Sample to 3000:
Benchmarking 200 inserts: 0.25000000 secs
Benchmarking all select (200): 1200 - 0.47000000 secs
Benchmarking 200 inserts: 0.30000000 secs
Benchmarking all select (200): 1400 - 0.50000000 secs
Benchmarking 200 inserts: 0.28000000 secs
Benchmarking all select (200): 1600 - 0.47000000 secs
Benchmarking 200 inserts: 0.26000000 secs
Benchmarking all select (200): 1800 - 0.50000000 secs
Benchmarking 200 inserts: 0.27000000 secs
Benchmarking all select (200): 2000 - 0.46000000 secs
Benchmarking 200 inserts: 0.25000000 secs
Benchmarking all select (200): 2200 - 0.50000000 secs
Benchmarking 200 inserts: 0.26000000 secs
Benchmarking all select (200): 2400 - 0.46000000 secs
Benchmarking 200 inserts: 0.26000000 secs
Benchmarking all select (200): 2600 - 0.51000000 secs
Benchmarking 200 inserts: 0.26000000 secs
Benchmarking all select (200): 2800 - 0.46000000 secs
Benchmarking 200 inserts: 0.25000000 secs
Benchmarking all select (200): 3000 - 0.52000000 secs

Testing only select in 3000:
Benchmarking all select (1000): 1000 - 0.47000000 secs
Benchmarking all select (1000): 1000 - 0.46000000 secs
Benchmarking all select (1000): 1000 - 0.49000000 secs
Benchmarking all select (1000): 1000 - 0.46000000 secs
Benchmarking all select (1000): 1000 - 0.50000000 secs
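To turn the raw timings above into per-record figures, a small Python sketch can help. The numbers are copied from the output above. `per_record_ms` and `summarize` are illustrative helpers and are not part of the benchmark script. The script itself is not shown in the thread.

```python
# Timings copied from the appspot.com output above (seconds per batch).
INSERT_RUNS = {100: 0.14, 200: 0.25, 250: 0.31, 400: 0.48}
SELECT_1000_RUNS = [0.49, 0.46, 0.50, 0.47, 0.52]


def per_record_ms(count, seconds):
    """Convert one batch timing into milliseconds per record."""
    return seconds / count * 1000.0


def summarize():
    """Return per-record insert costs and the mean 1,000-row select time."""
    insert_costs = {n: per_record_ms(n, s) for n, s in INSERT_RUNS.items()}
    mean_select = sum(SELECT_1000_RUNS) / len(SELECT_1000_RUNS)
    return insert_costs, mean_select


insert_costs, mean_select = summarize()
for n in sorted(insert_costs):
    print(f"{n:4d} inserts: {insert_costs[n]:.2f} ms/record")
print(f"select of 1000: mean {mean_select:.3f} s, "
      f"{per_record_ms(1000, mean_select):.2f} ms/record")
```

With these figures, inserts cost roughly 1.2 to 1.4 ms per record. The 1,000-row select averages about 0.49 s, which is close to the "0.47 seconds per 1,000 records" rule of thumb quoted above.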

----------------
Comment

Patrick

ryan

Apr 14, 2008, 3:42:34 AM
to Google App Engine
To confirm the conclusions here, the dev_appserver is intended for
rapid offline development, not scalability, high performance, or high
data volume. On each write, it pickles and writes the entire datastore
to a file. On each query, it filters, sorts, and prepares all of the
query results in memory, then maps over the entire query history to
determine whether it needs to generate new indexes into index.yaml.
Needless to say, these are expensive, and they'll slow down noticeably
under heavy usage or large amounts of data.
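The Python toy model below shows why that design slows down as data grows. It follows the write path ryan describes: every `put` re-pickles and rewrites the whole store. `ToyPickleStore` and `total_bytes_after` are hypothetical illustrations, not the actual dev_appserver code. They only count how many bytes end up being written.

```python
import os
import pickle
import tempfile


class ToyPickleStore:
    """Hypothetical file-backed store that rewrites everything on each put."""

    def __init__(self, path):
        self.path = path
        self.entities = {}
        self.bytes_written = 0  # running total of bytes flushed to disk

    def put(self, key, entity):
        self.entities[key] = entity
        # Serialise the entire store, not just the entity that changed.
        data = pickle.dumps(self.entities)
        with open(self.path, "wb") as f:
            f.write(data)
        self.bytes_written += len(data)

    def query(self, predicate):
        # Filter and sort in memory over every stored entity.
        return sorted(k for k, v in self.entities.items() if predicate(v))


def total_bytes_after(n_puts, path):
    """Bytes written to disk after n_puts single-entity writes."""
    store = ToyPickleStore(path)
    for i in range(n_puts):
        store.put(i, {"value": i})
    return store.bytes_written


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "datastore.pickle")
    for n in (100, 200, 400):
        print(n, "puts ->", total_bytes_after(n, path), "bytes written")
```

Doubling the number of entities roughly quadruples the total bytes written. This happens because each write costs time proportional to the current store size. `query` also scans every entity on each call.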

PatrickNegri

Apr 14, 2008, 4:07:08 AM
to Google App Engine
updated
Our conclusion on this topic is:

* Inserts are slow, locally or remotely, but their speed seems to
hold steady.
* Selects also hold their speed when running on Google's architecture.
* Don't compare the performance with relational databases like MySQL:
  o Databases like that degrade in performance as you add
  connections.
  o While the App Engine datastore can't beat the speed of an RDBMS
  under a SINGLE LOAD, it maintains its speed when you bulk load from
  many clients, which makes this a major feature of the system.

My final word on this topic is:
The results were good.

Cameron Singe

Apr 14, 2008, 5:51:49 AM
to Google App Engine
I wonder if the db stuff is thread-safe? I'm still new to Python myself.

For larger inserts, you could do the inserts in a thread, then send the
response back to the user.

Brett Morgan

Apr 14, 2008, 5:57:34 AM
to google-a...@googlegroups.com
Which bit of the "db stuff" are you thinking about being thread-safe?
The Python sandboxes are designed to allow only a single thread,
so there are no threading issues there. The datastore back end,
being built atop GFS and BigTable, is designed from the ground up to
handle concurrent access.

So yeah, I'd say it's thread-safe. Hell, this entire architecture has
been designed from the ground up to eliminate threading issues.

Duncan

Apr 14, 2008, 6:46:48 AM
to Google App Engine
Patrick, could you clarify please what you mean by something like '400
inserts'? Are all these inserts inside a single transaction or are
they separate, and if separate then what sort of entity groups are
they arranged in?

I think (though I have no figures yet) that for real-world performance
it is going to be very critical how you decide to structure the entity
groups, and certainly to get good scalability the implication is that
you want to group related data into the same entity group so it can be
updated within a single transaction, but you also want many separate
entity groups to allow simultaneous inserts from different users.

PatrickNegri

Apr 14, 2008, 6:54:46 AM
to Google App Engine
Duncan, you can't do 400 inserts in a single script call.
These tests used simultaneous calls.

For example, I configured an insertion script on the server where I
can specify the number of insertions, something like:
http://localhost/insert?qnt=200 (the insert function populates the
datastore with random data).

Then I called it 15 times simultaneously using a web crawler:
15*200 = 3000 insertions.

Also, entity groups are handled entirely by App Engine.
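The load pattern Patrick describes can be sketched in Python with a thread pool: one insert script, called by many clients at the same time. The URL mirrors the example above. `http_insert`, `fire_simultaneous` and `fake_insert` are hypothetical names, because the real insertion script is not shown in the thread. The demo uses a fake sender so that it runs without a server.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Hypothetical endpoint modelled on the example URL above.
INSERT_URL = "http://localhost/insert?qnt={qnt}"


def http_insert(qnt):
    """Ask the server-side script to insert `qnt` random records."""
    with urllib.request.urlopen(INSERT_URL.format(qnt=qnt), timeout=30) as resp:
        resp.read()
    return qnt


def fake_insert(qnt):
    """Stand-in sender for offline runs: pretends the server took 10 ms."""
    time.sleep(0.01)
    return qnt


def fire_simultaneous(send, requests=15, qnt=200):
    """Issue `requests` calls of send(qnt) concurrently; return (records, seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=requests) as pool:
        inserted = sum(pool.map(send, [qnt] * requests))
    return inserted, time.perf_counter() - start


total, elapsed = fire_simultaneous(fake_insert)
print(total, "records requested in", round(elapsed, 3), "s")
```

To drive a real endpoint, pass `http_insert` instead of `fake_insert`. `fire_simultaneous(send, requests=250, qnt=4)` reproduces the earlier plan of 250 * 4 = 1,000 records. Threads suit this job because each call spends its time waiting on the network.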

Duncan

Apr 14, 2008, 7:20:36 AM
to Google App Engine
So by 'entity groups entire handled by appengine' do I assume that you
didn't specify any parent objects, so each insert is in its own entity
group? What I'm getting at is that I would expect that inserting say
100 objects under the same parent will be massively faster than
inserting 100 objects at the root level.

A useful set of benchmarks might be to time:

a 200 inserts with no parent
b 200 inserts under a single parent (same one every time) but no
transactions
c 200 inserts under a single parent (same one every time) in a single
transaction
d 200 inserts under a single parent (different for each web request)
but no transactions
e 200 inserts under a single parent (different for each web request)
in a single transaction

If you do those with your 15 simultaneous requests then I think that d
and e ought to scale better than b and c. It would be really great,
given that you have most of the benchmark framework set up already, if
you could do some tests along these lines.
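Before timing anything, it helps to be explicit about what each of Duncan's five cases changes. The Python sketch below is a hypothetical model, not App Engine code. It only counts how many entity groups the 15 requests of 200 inserts touch in each case. On the real SDK, cases b to e would pass a `parent` to the model constructor, and cases c and e would wrap the loop in `db.run_in_transaction`. The sketch does not exercise those calls, and the key names it uses are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Scenario:
    label: str
    parent: Optional[str]  # None, "shared" or "per_request"
    transactional: bool    # would each request's 200 puts run in one transaction?


SCENARIOS = [
    Scenario("a", None, False),
    Scenario("b", "shared", False),
    Scenario("c", "shared", True),
    Scenario("d", "per_request", False),
    Scenario("e", "per_request", True),
]


def group_for(scenario, request_id, insert_id):
    """Name of the entity group one insert lands in (hypothetical key names)."""
    if scenario.parent is None:
        return f"Item:{request_id}-{insert_id}"  # each entity is its own root
    if scenario.parent == "shared":
        return "Parent:shared"  # every request writes to the same group
    return f"Parent:req-{request_id}"  # one group per web request


def groups_touched(scenario, requests=15, inserts=200):
    """Count distinct entity groups written by `requests` * `inserts` puts."""
    return len({group_for(scenario, r, i)
                for r in range(requests) for i in range(inserts)})


for s in SCENARIOS:
    mode = "one txn per request" if s.transactional else "no txn"
    print(f"{s.label}: {groups_touched(s):5d} groups, {mode}")
```

Cases b and c put all 3,000 writes into a single group, so the concurrent requests would likely contend for it. Cases d and e spread the writes over 15 groups. That is the intuition behind Duncan's expectation that d and e scale better, and the real timings would confirm it or not.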

PatrickNegri

Apr 14, 2008, 7:24:08 AM
to Google App Engine
I will try it out, but you know that putting inserts in the same entity
group is going to slow down the search speed, right?

Duncan

Apr 14, 2008, 8:03:31 AM
to Google App Engine
On Apr 14, 12:24 pm, PatrickNegri <patrickne...@gmail.com> wrote:
> I will try out, but you know by getting inserts at same entity group
> is going to slow down the search speed, right?

I know it is likely to have an effect on search speed but that could
be a slow down or speed up depending on the pattern of use, and it may
be tricky using random data to mimic a realistic use pattern.

For example, if I have a bunch of data owned by each user and store
it under an object for that user, then when I do a search that
specifies the user's root as an ancestor, I think that will be faster
than a search that specifies no ancestor, or any kind of search where
the data is all in separate groups. Or of course I could be way wrong,
which is why some figures would be nice. :)

As I understand it, all data within a single entity group is stored on
a single server, and any data you access is copied to a database
server near the web server. So that means if you put everything in a
single group any data access could involve copying the entire database
onto a local server, and any update means a large copy onto any other
servers holding a replica of the data (all of which sounds bad).
Putting everything in a separate entity group means every access just
involves grabbing a lot of small items, but will have overhead from a
lot of small copies (also bad). Using appropriately chosen entity
groups should mean that when you first start accessing the data you
also make available related data that is likely to be accessed, so
subsequent hits won't have to fetch anything new to the local server,
and if someone updates some unrelated data, that doesn't invalidate
your local copy.

If I've understood it correctly the implication is definitely that you
need to group data to get performance, and you need to structure the
data so that as many searches as possible are restricted to a single
entity group. Since you cannot move data around once it has been
created it may be tricky to get the right groupings.
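Duncan's picture of entity groups can be made concrete with a toy model in Python. Keys are modelled as paths of (kind, name) pairs, and the root pair names the entity group. An ancestor query only has to look inside the ancestor's group. The `ToyStore` class and its method names are hypothetical. They illustrate the reasoning above, not how the real datastore is implemented.

```python
from collections import defaultdict


def entity_group(key_path):
    """The root (first) element of a key path identifies its entity group."""
    return key_path[0]


def is_descendant(key_path, ancestor_path):
    """True if key_path equals ancestor_path or lies underneath it."""
    return key_path[:len(ancestor_path)] == ancestor_path


class ToyStore:
    """Hypothetical store that keeps each entity group in its own bucket."""

    def __init__(self):
        self.groups = defaultdict(dict)  # group root -> {key path: entity}
        self.buckets_scanned = 0  # groups the most recent query had to read

    def put(self, key_path, entity):
        self.groups[entity_group(key_path)][key_path] = entity

    def ancestor_query(self, ancestor_path):
        # Only the ancestor's own group can contain its descendants.
        self.buckets_scanned = 1
        bucket = self.groups.get(entity_group(ancestor_path), {})
        return sorted(k for k in bucket if is_descendant(k, ancestor_path))

    def query(self, predicate):
        # With no ancestor, every group has to be examined.
        self.buckets_scanned = len(self.groups)
        return sorted(k for bucket in self.groups.values()
                      for k, v in bucket.items() if predicate(v))


store = ToyStore()
alice = (("User", "alice"),)
bob = (("User", "bob"),)
store.put(alice, {"name": "alice"})
store.put(alice + (("Note", "n1"),), {"text": "hello"})
store.put(bob + (("Note", "n2"),), {"text": "hi"})

print(store.ancestor_query(alice), store.buckets_scanned)
print(store.query(lambda e: "text" in e), store.buckets_scanned)
```

With two users, the difference is one group scanned versus two. With thousands of users, it becomes one group versus thousands. That is the effect Duncan suggests measuring.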

PatrickNegri

Apr 14, 2008, 8:09:22 AM
to Google App Engine
Very nice understanding of the system, Duncan. I am currently trying to
contact Ryan (Google) for a chat; he has deep knowledge of the
DataStore and can help us understand the system and best practices.
I know there will be a large event next month where he will speak,
but I don't think we can wait until next month. Also, if we can get
enough information, we can put it on our wiki and create some
screencasts showing how to use it.

Lee O

Apr 22, 2008, 1:14:17 AM
to google-a...@googlegroups.com
Yeah, I have a feeling I'll be rewriting a lot of code once we get official "best practices" on how data is stored. It's sort of surprising we got GAE without good docs on how to manage the DataStore properly.

Brett Morgan

Apr 22, 2008, 1:43:38 AM
to google-a...@googlegroups.com
I suspect it's a new platform, and the great thing about new platforms
is that best practice is still to be discovered.