Bulk uploading to App Engine is faster than you think



Apr 30, 2009, 4:32:38 AM
to Google App Engine

One of the things that really worried me when I started porting the
nkill project to App Engine was the speed at which I could upload data
to the App Engine datastore.

I kept seeing threads indicating that it was a slow and painful
process. Luckily, bulkloader.py isn't bad at all! I suspect that the
bottleneck is upload speed: home users typically have asymmetric
bandwidth, where the download speed is significantly higher than the
upload speed, which is often capped at 256-512 kbit/s.

Here are some stats:

2332970 entities in 21112.7 seconds (that's 2.3M in just under 6 hours,
or about 110 entities per second).
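For anyone who wants to check the arithmetic, here is a rough Python sketch. The entity count and elapsed time are the figures above; the helper names are made up for illustration. The second helper assumes 1 kbit = 1000 bits and a fully saturated uplink with no HTTP overhead, so it only gives an upper bound on how many bytes each entity could take on the wire if upload bandwidth really were the bottleneck.

```python
ENTITIES = 2332970   # entities uploaded (from the stats above)
SECONDS = 21112.7    # elapsed time in seconds (from the stats above)


def throughput(entities, seconds):
    """Return (entities per second, elapsed hours)."""
    return entities / seconds, seconds / 3600.0


def bytes_per_entity(uplink_kbit_s, entities_per_second):
    """Upper bound on bytes per entity, assuming the uplink is saturated.

    Assumption: 1 kbit = 1000 bits, and protocol overhead is ignored.
    """
    bytes_per_second = uplink_kbit_s * 1000 / 8.0
    return bytes_per_second / entities_per_second


rate, hours = throughput(ENTITIES, SECONDS)
print("%.1f entities/s over %.2f hours" % (rate, hours))
for kbit in (256, 512):
    budget = bytes_per_entity(kbit, rate)
    print("%d kbit/s uplink: at most %.0f bytes per entity" % (kbit, budget))
```

If your entities are much smaller than that per-entity budget, the uplink is probably not the limiting factor and the bulkloader options are the more likely place to look.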

I split my input CSV into 10,000-line files and used the following
bulkloader.py (SDK 1.2.0) options:

--rps_limit=250 --batch_size=50
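The splitting step can be done with a few lines of Python. This is only a minimal sketch, not the script I actually used: the function name and the output file naming are hypothetical, it splits on physical lines (so a quoted CSV field containing a newline would be broken across files), and it assumes the input has no header row that needs repeating in each chunk.

```python
import itertools
import os


def split_csv(path, out_dir, lines_per_file=10000):
    """Split a text file into numbered chunks of at most lines_per_file lines.

    Returns the list of chunk paths, in order.
    """
    os.makedirs(out_dir, exist_ok=True)
    base = os.path.splitext(os.path.basename(path))[0]
    chunks = []
    # newline="" keeps the original line endings untouched.
    with open(path, "r", newline="") as src:
        for index in itertools.count():
            # Pull the next block of lines from the open file.
            lines = list(itertools.islice(src, lines_per_file))
            if not lines:
                break
            chunk_path = os.path.join(out_dir, "%s_%05d.csv" % (base, index))
            with open(chunk_path, "w", newline="") as dst:
                dst.writelines(lines)
            chunks.append(chunk_path)
    return chunks
```

Each resulting chunk can then be fed to bulkloader.py in turn with the options above, which also makes it easy to resume from the last file that completed if an upload is interrupted.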

I am pretty sure there is room for improvement. I tried to use
conservative values to minimize CPU usage and stay under the quota
radar (I still managed to get in the red).

The following parameters will affect the speed at which you can
upload data:

--batch_size= (max 500)
--num_threads= (default 10)
--rps_limit= (default 20)
--http_limit= (default 8)

I'll do a follow-up post since I have several million records to
upload. Hopefully I'll find the sweet spot.
