Bulk uploading to App Engine is faster than you think

Kugutsumen

Apr 30, 2009, 4:32:38 AM
to Google App Engine
http://blog.nkill.com/2009/04/bulk-uploading-to-app-engine-is-faster.html

One of the things that really worried me when I started porting the
nkill project to App Engine was the speed at which I could upload data
to the app engine datastore.

I kept seeing threads indicating that it was a slow and painful
process.
Luckily, bulkloader.py isn't bad at all! I suspect that the bottleneck
is upload speed. Home users typically have asymmetric bandwidth,
where the download speed is significantly higher than the upload
speed, which is often capped at 256-512 kbit/s.

Here are some stats:

2332970 entities in 21112.7 seconds (that's 2.3M in just under 6 hours,
or about 110 entities per second).
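
For anyone who wants to check the arithmetic, here is a quick sketch in
Python. The entity count and elapsed time are the figures quoted above;
the per-entity byte budget is only a rough upper bound and assumes the
uplink was fully saturated at the 256-512 kbit/s caps mentioned earlier,
which I did not measure.

```python
# Figures quoted in the post.
entities = 2332970
seconds = 21112.7

rate = entities / seconds      # entities uploaded per second
hours = seconds / 3600.0       # total run time in hours

print("%.1f entities/s over %.2f hours" % (rate, hours))

# Assumption: the uplink is the bottleneck and is fully used.
# Under that assumption each entity can cost at most this many bytes
# on the wire (HTTP overhead included).
for kbit in (256, 512):
    bytes_per_second = kbit * 1000 / 8.0
    budget = bytes_per_second / rate
    print("%d kbit/s uplink: at most %.0f bytes per entity" % (kbit, budget))
```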

I split my input CSV into 10,000-line files and used the following
bulkloader.py (SDK 1.2.0) options:

--rps_limit=250 --batch_size=50
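
The splitting step can be sketched in a few lines of Python. This is
not the script I actually used, just a minimal version of the idea; the
function name split_csv and the chunk file naming are made up for
illustration. It assumes the CSV has no header row and no quoted fields
containing newlines, since it cuts purely on line boundaries.

```python
import os


def split_csv(path, lines_per_file=10000, out_dir="."):
    """Split a text file into numbered chunks of at most lines_per_file lines.

    Returns the list of chunk paths in order. Hypothetical helper, not
    part of the App Engine SDK.
    """
    base = os.path.splitext(os.path.basename(path))[0]
    chunks = []
    out = None
    # newline="" keeps the original line endings untouched.
    with open(path, newline="") as src:
        for i, line in enumerate(src):
            if i % lines_per_file == 0:
                # Start a new chunk file every lines_per_file lines.
                if out is not None:
                    out.close()
                name = os.path.join(out_dir, "%s_%05d.csv" % (base, len(chunks)))
                out = open(name, "w", newline="")
                chunks.append(name)
            out.write(line)
    if out is not None:
        out.close()
    return chunks
```

Each resulting chunk can then be fed to bulkloader.py in turn, so a
failed run only costs you one 10,000-line file rather than the whole
upload.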

I am pretty sure there is room for improvement. I tried to use
conservative values to minimize CPU usage and stay under the quota
radar (I still managed to get in the red).

The following parameters will affect the speed at which you can
upload:

--batch_size= (max 500)
--num_threads= (default 10)
--rps_limit= (default 20)
--http_limit= (default 8)
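
To get a feel for how these limits interact, here is a rough
back-of-the-envelope sketch. It rests on assumptions that are not
stated above: that --rps_limit caps records per second, that
--http_limit caps HTTP requests per second, that each request carries
one batch, and that --http_limit was left at its default of 8. It
ignores --num_threads and bandwidth entirely, so treat the result as a
ceiling, not a prediction.

```python
def record_ceiling(rps_limit, http_limit, batch_size):
    """Upper bound on records/second implied by the throttling options.

    Assumes rps_limit is in records/s and http_limit in requests/s,
    with one batch of batch_size records per request.
    """
    return min(rps_limit, http_limit * batch_size)


# Options from the run above; http_limit=8 is assumed (the default).
ceiling = record_ceiling(rps_limit=250, http_limit=8, batch_size=50)

# Observed throughput from the stats quoted in the post.
observed = 2332970 / 21112.7

print("ceiling: %d records/s" % ceiling)
print("observed: %.1f records/s (%.0f%% of ceiling)"
      % (observed, 100.0 * observed / ceiling))
```

If those assumptions hold, the observed rate sits well below what the
throttling options would allow, which fits the suspicion that something
else (most likely upload bandwidth) is the real limit.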

I'll do a follow-up post since I have several million records to
upload. Hopefully I'll find the sweet spot.

K.