Re: Digest for picloud@googlegroups.com - 1 Message in 1 Topic

Eric Lofgren

Dec 5, 2011, 6:43:04 PM
to pic...@googlegroups.com
Ken,

Thanks for the quick reply. I've been out of town and am just getting back to this. The input data for the function call is fairly small and is actually coded into the function. Essentially, the code takes a system of ordinary differential equations and solves it for a few thousand time points. So there's essentially no data going in, and a largish set of rows and columns coming out that then needs to be saved as... something. CSV just happens to be probably the most flexible format.

A single run of the model with a single iteration, with no batching or multiple runs, takes about 0.007 seconds on one of your c1 machines using my mockup code. The actual version is slightly more complex, but a single run still likely won't take even on the order of 1 second. The problem is that drawing samples from a huge number of distributions and still getting good coverage means running the simulation quite a few times. I'd also like to run a more complex form of the model that should take longer to run, but I don't know by how much.
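
To make the batching idea concrete, here is a rough sketch and not the actual model. It assumes a toy one-equation decay model (dy/dt = -rate * y, solved with forward Euler) in place of the real ODE system. The names `run_model` and `run_batch`, the step count, and the uniform parameter range are all hypothetical and chosen only for illustration.

```python
import random


def run_model(rate, n_steps=2000, dt=0.01, y0=1.0):
    """Toy stand-in for the real model: forward-Euler solve of dy/dt = -rate * y.

    Returns one row per time point as [t, y].
    """
    rows = []
    y = y0
    for i in range(n_steps):
        rows.append([i * dt, y])
        y += dt * (-rate * y)
    return rows


def run_batch(n_runs, seed):
    """Run several sampled simulations inside one job and pool the rows."""
    rng = random.Random(seed)  # seeded so a batch can be reproduced
    results = []
    for run_id in range(n_runs):
        # Hypothetical parameter distribution, not taken from the real model.
        rate = rng.uniform(0.1, 2.0)
        for t, y in run_model(rate):
            results.append([run_id, rate, t, y])
    return results


batch = run_batch(n_runs=5, seed=42)
```

Because a single run takes well under a second, grouping many runs into each job like this should keep per-job overhead from dominating, at the cost of larger output per job.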

But for right now, it seems my limiting problem is how to handle the output NumPy array and put it into something usable.
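
One possible way to turn the output into CSV text without touching disk is sketched below. It assumes the result can be treated as a sequence of rows; a 2-D NumPy array can be converted with its `tolist()` method first. The helper name `rows_to_csv_text` and the column names are made up for this example, and it uses Python 3's `io.StringIO`, where Ken's quoted example below uses Python 2's `cStringIO` for the same purpose.

```python
import csv
import io


def rows_to_csv_text(rows, header=None):
    """Write rows (any iterable of row sequences) to CSV text held in memory."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    if header is not None:
        writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


# Small stand-in for the solver output: one row per time point.
result = [[0.0, 1.0], [0.01, 0.99]]
csv_text = rows_to_csv_text(result, header=["t", "y"])
```

The resulting text could then be wrapped in a file-like object and uploaded with cloud.files.putf, as in Ken's example quoted below.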

Eric


On Nov 28, 2011, at 9:20 AM, pic...@googlegroups.com wrote:
    Ken Elkabany <k...@picloud.com> Nov 27 03:26PM -0800  

    Hi Eric,
     
    Yes, you're correct. Each job should be responsible for saving its own data
    to cloud.files.
     
    An efficient way to do what you want is to generate the CSV in memory using
    the Python csv module and then upload it using cloud.files.putf. This
    avoids writing the file to disk, and then redundantly reading the file from
    disk to upload it. Here's a quick example:
     
    import cloud
    import csv
    from cStringIO import StringIO
     
    # create a file-like object that resides purely in memory
    f = StringIO()
     
    # create a csv writer object
    w = csv.writer(f)
     
    # you can call this function repeatedly to write as many rows as you want
    # this writes a row with values from 0 to 9
    w.writerow(range(10))
     
    # (optional) you can see that the csv writer is writing to the StringIO obj
    print f.getvalue()
     
    # save your csv to cloud.files
    cloud.files.putf(f, 'name_for_data')
     
    # you can retrieve the data at a later time
    # get will save the csv to a file of the same name
    cloud.files.get('name_for_data')
     
    Helpful links:
    http://docs.python.org/library/stringio.html
    http://docs.python.org/library/csv.html
    http://docs.picloud.com/moduledoc.html#module-cloud.files
     
    How large is the input data for each function call? Are the functions
    taking in a CSV and outputting a CSV? How long does each run take when you
    haven't batched jobs together?
     
    Ken
     
    > the end of the day, it would be nice to have the collected data - or
    > chunks of it - in a nice convenient CSV file for later analysis on my
    > local machine.
     
    Batching any number of runs so I don't have to call 10,000 jobs means
     
