Threading Heisenbug in MapReduce-style financial simulation


justin worrall

Jul 8, 2014, 9:58:06 AM
to app-engine-...@googlegroups.com
Hi,

I'm using the Pipeline API plus cloudstorage to parallelise a large financial simulation, rather like this:


The above is just an example; it uses a simplified algo (summing two numbers) and saves to memcache rather than GCS.

The key detail is that the SimMapPipeline makes many calls to an external URL, which performs the actual calculations.

This sample runs fine. I start to have problems when I replace the simple summing algo with the real one (which takes approx. 1 second to run) and then increase the task queue bucket size to anything above 1.

The problem seems to be at the MapPipeline level; the entire process runs fine *but all the results are jumbled up*.

I am not a threading expert and have no deep understanding of the Pipeline API internals, but the problem seems to be multiple threads running MapPipeline.run() at once; they seem to be mixing up either the request files or the response destinations.
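One common way concurrent handlers end up with jumbled results is shared mutable state: if request-specific data (such as a request filename or an output destination) is held in a module-level or class-level variable, every thread serving requests in the same instance reads and writes the same slot. The sketch below is a hypothetical, App Engine-free illustration of that failure mode, not the code from the sample: `run_unsafe`, `run_safe`, `simulate` and `_current_request` are invented names, and a Python 3 `threading.Barrier` forces both threads to write before either reads so that the bad interleaving is reproducible.

```python
import threading

# Hypothetical module-level state, shared by every thread in one process.
_current_request = None

def run_unsafe(request, barrier, results):
    global _current_request
    _current_request = request                # each thread overwrites the shared slot
    barrier.wait()                            # both writes happen before either read
    results[request] = _current_request * 2   # may use ANOTHER thread's request

def run_safe(request, barrier, results):
    local_request = request                   # state lives in this call's own frame
    barrier.wait()
    results[request] = local_request * 2      # always uses this thread's own request

def simulate(worker, requests):
    """Run one thread per request and collect {request: result}."""
    barrier = threading.Barrier(len(requests))
    results = {}
    threads = [threading.Thread(target=worker, args=(r, barrier, results))
               for r in requests]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With the shared global, both results are computed from whichever request was written last, so `simulate(run_unsafe, [1, 2])` returns the same value for both keys; `simulate(run_safe, [1, 2])` returns `{1: 2, 2: 4}` because each result depends only on that thread's own input.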

I tried wrapping MapPipeline.run() in a threading.Lock(). This mitigated the problem somewhat, but not completely; some errors still get through.
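A `threading.Lock` only serializes threads inside one Python process, so on App Engine it can only coordinate requests handled by the same instance; tasks running on different instances each have their own lock. That would be consistent with a lock that helps but does not remove every error, though this is an assumption about the deployment, not a confirmed diagnosis. The sketch below models it with a hypothetical `Instance` class that owns its own lock, plus counters that record how many `run()` calls overlap; `time.sleep` stands in for the ~1 second calculation.

```python
import threading
import time

class Instance:
    """Hypothetical stand-in for one App Engine instance (one Python process)."""
    def __init__(self):
        # A threading.Lock only excludes threads that share this same object.
        self.lock = threading.Lock()

_stats_guard = threading.Lock()  # protects the overlap counters below
active = 0                       # run() calls currently inside the critical section
max_active = 0                   # highest overlap observed

def run(instance, work_seconds=0.2):
    global active, max_active
    with instance.lock:          # serializes callers on the SAME instance only
        with _stats_guard:
            active += 1
            max_active = max(max_active, active)
        time.sleep(work_seconds) # stand-in for the ~1 s calculation
        with _stats_guard:
            active -= 1

def max_overlap(instances):
    """Start one thread per entry in `instances` and return the peak overlap."""
    global active, max_active
    active = max_active = 0
    threads = [threading.Thread(target=run, args=(inst,)) for inst in instances]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return max_active
```

When both threads share one `Instance`, `max_overlap` returns 1 because the lock forces them to take turns. With two separate `Instance` objects, both threads normally enter the "protected" section at the same time, which is the situation a per-process lock cannot prevent.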

Can anyone advise on what might be happening, and how the threading model works here, particularly in conjunction with urlfetch?

Thank you.
