Re: Digest for picloud@googlegroups.com - 1 Message in 1 Topic

Eric Lofgren

Dec 5, 2011, 6:43:04 PM

to pic...@googlegroups.com

Ken,

Thanks for the quick reply. I've been out of town and am just getting back to it. The input data for the function call is fairly small, and is actually coded into the function. Essentially, the code takes a system of ordinary differential equations and solves it for a few thousand time points. So there's essentially no data going in, and a largish set of rows and columns coming out that then needs to be saved as...something. CSV just happens to be probably the most flexible format.

A single run of the model with a single iteration, with no batching or multiple runs, takes about 0.007 seconds on one of your c1 machines using my mockup code. The actual version is slightly more complex, but a single run still likely won't take even on the order of 1 second. The problem is that drawing samples from a huge number of distributions and still getting coverage means running the simulation quite a few times. I'd also like to run a more complex form of the model that should take longer to run, but by how much, I don't know.
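To make the batching idea concrete, here is a rough sketch of packing many sub-second runs into one job. Everything in it is a stand-in rather than the actual model: the toy ODE dy/dt = -k*y, the forward Euler solver, the uniform parameter distribution, and the names solve_decay and run_batch are all assumptions for illustration. It is written for Python 3.

```python
import random


def solve_decay(k, y0=1.0, t_end=10.0, n_steps=2000):
    """Integrate the toy ODE dy/dt = -k*y with forward Euler.

    Returns a list of (t, y) pairs with n_steps + 1 entries.
    """
    dt = t_end / n_steps
    y = y0
    out = [(0.0, y)]
    for i in range(1, n_steps + 1):
        y += dt * (-k * y)
        out.append((i * dt, y))
    return out


def run_batch(batch_id, runs_per_batch, n_steps=2000):
    """Run several short simulations inside a single job.

    Each run draws its own rate constant, and every output row is tagged
    with the batch and run it came from so results can be merged later.
    """
    rng = random.Random(batch_id)  # seeded per batch so a batch is repeatable
    rows = []
    for run in range(runs_per_batch):
        k = rng.uniform(0.1, 1.0)  # hypothetical parameter distribution
        for t, y in solve_decay(k, n_steps=n_steps):
            rows.append((batch_id, run, k, t, y))
    return rows


# Small demo: 3 runs of 100 steps each in one batch
rows = run_batch(batch_id=0, runs_per_batch=3, n_steps=100)
```

With runs this short, the number of runs per batch is mostly a trade-off between per-job overhead and how much output one job has to hold in memory.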

But for right now, it seems my limiting problem is how to handle the output NumPy array and put it into something usable.
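One possible way to handle the output step, sketched under assumptions: treat the result as a sequence of rows and serialize it to CSV text in memory with the standard csv module. The helper name rows_to_csv_text and the column names are hypothetical. A NumPy array could be passed as array.tolist(); the demo uses plain lists so it runs without NumPy. This is Python 3, where io.StringIO takes the place of the Python 2 cStringIO module.

```python
import csv
import io


def rows_to_csv_text(rows, header=None):
    """Serialize an iterable of rows to CSV text without touching disk."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    if header is not None:
        writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


# Stand-in for a solver result: one row per time point
result = [[0.0, 1.0, 0.0], [0.5, 0.8, 0.2]]
csv_text = rows_to_csv_text(result, header=["t", "y1", "y2"])
```

The returned string can then be wrapped in a file-like object for upload, or written to a local file as-is.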

Eric

On Nov 28, 2011, at 9:20 AM, pic...@googlegroups.com wrote:

Ken Elkabany <k...@picloud.com> Nov 27 03:26PM -0800

Hi Eric,

Yes, you're correct. Each job should be responsible for saving its own data
to cloud.files.

An efficient way to do what you want is to generate the CSV in memory using
the Python csv module and then upload it using cloud.files.putf. This
avoids writing the file to disk, and then redundantly reading the file from
disk to upload it. Here's a quick example:

import csv
from cStringIO import StringIO

import cloud  # PiCloud client library, needed for the cloud.files calls below

# create a file-like object that resides purely in memory
f = StringIO()

# create a csv writer object
w = csv.writer(f)

# you can call this function repeatedly to write as many rows as you want
# this writes a row with values from 0 to 9
w.writerow(range(10))

# (optional) you can see that the csv writer is writing to the StringIO obj
print f.getvalue()

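# rewind first so the upload starts at the beginning of the buffer
# rather than at the current position, which is the end after writing
f.seek(0)

# save your csv to cloud.files
cloud.files.putf(f, 'name_for_data')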

# you can retrieve the data at a later time
# get will save the csv to a file of the same name
cloud.files.get('name_for_data')
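Once the file is on the local machine, reading it back for analysis could look like the sketch below. It assumes a CSV with one header row and purely numeric data; the helper name load_csv_as_floats is hypothetical, and the demo writes a small stand-in file instead of downloading anything. It is written for Python 3.

```python
import csv
import os
import tempfile


def load_csv_as_floats(path, has_header=True):
    """Read a numeric CSV file into a header list and a list of float rows."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader) if has_header else None
        data = [[float(x) for x in row] for row in reader]
    return header, data


# Demo: create a stand-in for a downloaded file, then load it
path = os.path.join(tempfile.mkdtemp(), "name_for_data")
with open(path, "w", newline="") as f:
    csv.writer(f).writerows([["t", "y"], [0.0, 1.0], [0.5, 0.75]])

header, data = load_csv_as_floats(path)
```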

Helpful links:
http://docs.python.org/library/stringio.html
http://docs.python.org/library/csv.html
http://docs.picloud.com/moduledoc.html#module-cloud.files

How large is the input data for each function call? Are the functions
taking in a CSV and outputting a CSV? How long does each run take when you
haven't batched jobs together?

Ken

> the end of the day, it would be nice to have the collected data - or
> chunks of it - in a nice convenient CSV file for later analysis on my
> local machine.

Batching any number of runs so I don't have to call 10,000 jobs means
