Importing data to datastore with bulkloader : error and long back off, but import finish ok

209 views
Skip to first unread message

Mathieu Clavel

unread,
Nov 22, 2011, 9:35:30 AM11/22/11
to google-a...@googlegroups.com
Hello,

I'm importing data to my app datastore (slave/master) with the bulkloader.
I'm on Windows Xp, with Python 2.7.1 and Appengine SDK 1.6.0

appcfg.py upload_data --num_threads=1 --batch_size=1000 --rps_limit=1000 --config_file=bulkloader.yaml --filename=contrats.csv --kind=Contrat --url "https://XXX.appspot.com/remote_api"

Uploading data records.
[INFO    ] Logging to bulkloader-log-20111122.102427
[INFO    ] Throttling transfers:
[INFO    ] Bandwidth: 250000 bytes/second
[INFO    ] HTTP connections: 8/second
[INFO    ] Entities inserted/fetched/modified: 1000/second
[INFO    ] Batch Size: 1000
[INFO    ] Opening database: bulkloader-progress-20111122.102427.sql3
[INFO    ] Connecting to XXX.appspot.com/remote_api
[INFO    ] Starting import; maximum 1000 entities per post
.[INFO    ] [WorkerThread-0] Backing off due to errors: 1.0 seconds
.[INFO    ] [WorkerThread-0] Backing off due to errors: 2.0 seconds
.[INFO    ] [WorkerThread-0] Backing off due to errors: 4.0 seconds
.[INFO    ] [WorkerThread-0] Backing off due to errors: 8.0 seconds
.[INFO    ] [WorkerThread-0] Backing off due to errors: 16.0 seconds
.[INFO    ] [WorkerThread-0] Backing off due to errors: 32.0 seconds
.[INFO    ] [WorkerThread-0] Backing off due to errors: 64.0 seconds
.[INFO    ] [WorkerThread-0] Backing off due to errors: 128.0 seconds
.[INFO    ] [WorkerThread-0] Backing off due to errors: 256.0 seconds
.[INFO    ] [WorkerThread-0] Backing off due to errors: 512.0 seconds
.[INFO    ] [WorkerThread-0] Backing off due to errors: 1024.0 seconds
.[INFO    ] [WorkerThread-0] Backing off due to errors: 2048.0 seconds
.[INFO    ] [WorkerThread-0] Backing off due to errors: 4096.0 seconds
.[INFO    ] [WorkerThread-0] Backing off due to errors: 8192.0 seconds
.[INFO    ] [WorkerThread-0] Backing off due to errors: 16384.0 seconds

[INFO    ] 14115 entities total, 0 previously transferred
[INFO    ] 14115 entities (17642966 bytes) transferred in 17328.7 seconds
[INFO    ] All entities successfully transferred

When looking in the log, I have this :
[DEBUG    2011-11-22 10:24:31,710 adaptive_thread_pool.py] [WorkerThread-0] WorkerThread: started
[DEBUG    2011-11-22 10:24:32,444 bulkloader.py] [Thread-2] ProgressTrackerThread: started
[DEBUG    2011-11-22 10:24:32,444 bulkloader.py] [Thread-3] DataSourceThread: started
[INFO     2011-11-22 10:24:32,444 bulkloader.py] Starting import; maximum 1000 entities per post
[DEBUG    2011-11-22 10:24:33,085 adaptive_thread_pool.py] [WorkerThread-0] Got work item [1-1000]
[DEBUG    2011-11-22 10:24:37,944 bulkloader.py] [Thread-3] DataSourceThread: exiting
[DEBUG    2011-11-22 10:24:37,960 bulkloader.py] Waiting for worker threads to finish...
[DEBUG    2011-11-22 10:25:45,318 bulkloader.py] [WorkerThread-0] [1-1000] Transferred 1000 entities in 71.1 seconds
[INFO     2011-11-22 10:25:45,318 adaptive_thread_pool.py] [WorkerThread-0] Backing off due to errors: 1.0 seconds
[DEBUG    2011-11-22 10:25:46,318 adaptive_thread_pool.py] [WorkerThread-0] Got work item [1001-2000]
[DEBUG    2011-11-22 10:26:39,582 bulkloader.py] [WorkerThread-0] [1001-2000] Transferred 1000 entities in 53.2 seconds
[INFO     2011-11-22 10:26:39,582 adaptive_thread_pool.py] [WorkerThread-0] Backing off due to errors: 2.0 seconds
[DEBUG    2011-11-22 10:26:41,582 adaptive_thread_pool.py] [WorkerThread-0] Got work item [2001-3000]
[DEBUG    2011-11-22 10:27:36,644 bulkloader.py] [WorkerThread-0] [2001-3000] Transferred 1000 entities in 55.0 seconds
[INFO     2011-11-22 10:27:36,644 adaptive_thread_pool.py] [WorkerThread-0] Backing off due to errors: 4.0 seconds
[DEBUG    2011-11-22 10:27:40,644 adaptive_thread_pool.py] [WorkerThread-0] Got work item [3001-4000]
[DEBUG    2011-11-22 10:28:34,846 bulkloader.py] [WorkerThread-0] [3001-4000] Transferred 1000 entities in 54.2 seconds
[INFO     2011-11-22 10:28:34,846 adaptive_thread_pool.py] [WorkerThread-0] Backing off due to errors: 8.0 seconds
[DEBUG    2011-11-22 10:28:42,846 adaptive_thread_pool.py] [WorkerThread-0] Got work item [4001-5000]
[DEBUG    2011-11-22 10:29:36,253 bulkloader.py] [WorkerThread-0] [4001-5000] Transferred 1000 entities in 53.4 seconds
[INFO     2011-11-22 10:29:36,253 adaptive_thread_pool.py] [WorkerThread-0] Backing off due to errors: 16.0 seconds
[DEBUG    2011-11-22 10:29:52,253 adaptive_thread_pool.py] [WorkerThread-0] Got work item [5001-6000]
[DEBUG    2011-11-22 10:30:45,611 bulkloader.py] [WorkerThread-0] [5001-6000] Transferred 1000 entities in 53.3 seconds
[INFO     2011-11-22 10:30:45,611 adaptive_thread_pool.py] [WorkerThread-0] Backing off due to errors: 32.0 seconds
[DEBUG    2011-11-22 10:31:17,627 adaptive_thread_pool.py] [WorkerThread-0] Got work item [6001-7000]
[DEBUG    2011-11-22 10:32:18,844 bulkloader.py] [WorkerThread-0] [6001-7000] Transferred 1000 entities in 61.2 seconds
[INFO     2011-11-22 10:32:18,844 adaptive_thread_pool.py] [WorkerThread-0] Backing off due to errors: 64.0 seconds
[DEBUG    2011-11-22 10:33:22,891 adaptive_thread_pool.py] [WorkerThread-0] Got work item [7001-8000]
[DEBUG    2011-11-22 10:34:21,124 bulkloader.py] [WorkerThread-0] [7001-8000] Transferred 1000 entities in 58.2 seconds
[INFO     2011-11-22 10:34:21,124 adaptive_thread_pool.py] [WorkerThread-0] Backing off due to errors: 128.0 seconds
[DEBUG    2011-11-22 10:36:30,684 adaptive_thread_pool.py] [WorkerThread-0] Got work item [8001-9000]
[DEBUG    2011-11-22 10:38:00,964 bulkloader.py] [WorkerThread-0] [8001-9000] Transferred 1000 entities in 90.2 seconds
[INFO     2011-11-22 10:38:00,964 adaptive_thread_pool.py] [WorkerThread-0] Backing off due to errors: 256.0 seconds
[DEBUG    2011-11-22 10:42:16,993 adaptive_thread_pool.py] [WorkerThread-0] Got work item [9001-10000]
[DEBUG    2011-11-22 10:44:01,756 bulkloader.py] [WorkerThread-0] [9001-10000] Transferred 1000 entities in 104.7 seconds
[INFO     2011-11-22 10:44:01,756 adaptive_thread_pool.py] [WorkerThread-0] Backing off due to errors: 512.0 seconds
[DEBUG    2011-11-22 10:52:34,328 adaptive_thread_pool.py] [WorkerThread-0] Got work item [10001-11000]
[DEBUG    2011-11-22 10:53:32,483 bulkloader.py] [WorkerThread-0] [10001-11000] Transferred 1000 entities in 58.1 seconds
[INFO     2011-11-22 10:53:32,483 adaptive_thread_pool.py] [WorkerThread-0] Backing off due to errors: 1024.0 seconds
[DEBUG    2011-11-22 11:10:37,174 adaptive_thread_pool.py] [WorkerThread-0] Got work item [11001-12000]
[DEBUG    2011-11-22 11:11:34,361 bulkloader.py] [WorkerThread-0] [11001-12000] Transferred 1000 entities in 57.2 seconds
[INFO     2011-11-22 11:11:34,361 adaptive_thread_pool.py] [WorkerThread-0] Backing off due to errors: 2048.0 seconds
[DEBUG    2011-11-22 11:45:43,578 adaptive_thread_pool.py] [WorkerThread-0] Got work item [12001-13000]
[DEBUG    2011-11-22 11:46:42,576 bulkloader.py] [WorkerThread-0] [12001-13000] Transferred 1000 entities in 59.0 seconds
[INFO     2011-11-22 11:46:42,576 adaptive_thread_pool.py] [WorkerThread-0] Backing off due to errors: 4096.0 seconds
[DEBUG    2011-11-22 12:55:00,586 adaptive_thread_pool.py] [WorkerThread-0] Got work item [13001-14000]
[DEBUG    2011-11-22 12:56:00,555 bulkloader.py] [WorkerThread-0] [13001-14000] Transferred 1000 entities in 59.9 seconds
[INFO     2011-11-22 12:56:00,555 adaptive_thread_pool.py] [WorkerThread-0] Backing off due to errors: 8192.0 seconds
[DEBUG    2011-11-22 15:12:41,778 adaptive_thread_pool.py] [WorkerThread-0] Got work item [14001-14115]
[DEBUG    2011-11-22 15:13:15,792 bulkloader.py] [WorkerThread-0] [14001-14115] Transferred 115 entities in 17.1 seconds
[INFO     2011-11-22 15:13:15,823 adaptive_thread_pool.py] [WorkerThread-0] Backing off due to errors: 16384.0 seconds
[DEBUG    2011-11-22 15:13:15,871 adaptive_thread_pool.py] Waiting for Anonymous_WorkerThread to exit
[DEBUG    2011-11-22 15:13:16,089 adaptive_thread_pool.py] [WorkerThread-0] WorkerThread: exiting
[DEBUG    2011-11-22 15:13:16,167 bulkloader.py] Waiting for progress_thread to terminate...
[DEBUG    2011-11-22 15:13:16,371 bulkloader.py] [Thread-2] ProgressTrackerThread: exiting
[DEBUG    2011-11-22 15:13:16,417 bulkloader.py] ... done.
[INFO     2011-11-22 15:13:16,434 bulkloader.py] 14115 entities total, 0 previously transferred
[INFO     2011-11-22 15:13:16,434 bulkloader.py] 14115 entities (17642966 bytes) transferred in 17328.7 seconds
[INFO     2011-11-22 15:13:16,434 bulkloader.py] All entities successfully transferred

The import was working ok before, but I can't tell since when the problem started : SDK 1.6.0 or GAE new pricing.

Does anyone know how to fix it ?

Thanks,

Mathieu

Brian

unread,
Nov 6, 2012, 2:51:49 PM11/6/12
to google-a...@googlegroups.com
Yes, I ran into the same problem a couple of years ago with the local SDK, and again recently when I first deployed my app to appspot.com. The solution that worked for me was to batch upload my data with no more that 60 records in each batch. (Had to write some local tools just to work around this problem!) Otherwise the delays kept expanding exponentially until hours became days and days became weeks -- it would never finish the upload. Hope this helps!

Brian

unread,
Jun 6, 2013, 10:35:21 AM6/6/13
to google-a...@googlegroups.com
Hi Vincent,

The tools I hacked together are pretty rough, but not difficult.

I used the Python CSV module to read the source data CSV file and rewrite it into a tmp folder as multiple CSV part files with no more than 60 records. Also included the writing of a command file that I can call to upload each part.

Good luck!

On Saturday, June 1, 2013 5:04:39 PM UTC-5, Vincent Lee wrote:
brian,

Can I borrow your local tools?
Reply all
Reply to author
Forward
0 new messages