Hi Jeff:
Merely replacing one 6-line segment of code with a Cython equivalent has reduced my program's execution time by about 70%, so this quest is definitely worthwhile! There is a bit more to do along these lines, but now I have a new issue. The profiler shows that the majority of execution time is now spent in the pandas "merge" command (to calculate the objective function for the entire dataset, I need to run this command over a million times to merge lots of chunks of data). I need to reduce this time substantially. I strongly suspect that a lot of the time is now being spent allocating memory, so I'd like to avoid those repeated allocations.
Since I use the merged chunks of data sequentially, my C code allocates one buffer up front, large enough to hold the largest conceivable merged output, and then loops through the chunks: merging, putting the results into this pre-allocated memory, processing, then moving on to the next chunk. I'd like to replicate this in my Python code, so ideally I'd start by allocating 3 large ndarrays for strings, ints, and floats respectively, then loop through my data, merging and using these ndarrays to store the results. In principle this doesn't seem too hard, but I do have a few questions:
1) Is there any way to get pandas' existing merge command to use pre-allocated memory for its output? I assume I'm going to have to code this myself, but it can't hurt to ask...
2) Once I have my output in these ndarrays, how do I turn them into a dataframe? There are (at least) two issues I can see here:
a. I'm using three blocks of different types, not just a single homogeneous ndarray - I'm not sure how to turn heterogeneous ndarrays into a single dataframe.
b. Because the blocks are preallocated to fit the largest possible output, when I actually perform a merge and put the output into these blocks, I'll only want to use a subset of the blocks in my dataframe.
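To make question 2 concrete, here is a rough sketch of what I have in mind. Everything in it is made up for illustration: the buffer sizes, the column names, and the fill_buffers function, which is only a stand-in for the hand-written merge I'd still have to code. I believe building the dataframe from a dict of slices like this copies the data, which is part of what I'd like to avoid, so please treat it as a picture of the goal rather than a solution.

```python
import numpy as np
import pandas as pd

MAX_ROWS = 1000  # hypothetical upper bound on the size of any merged chunk

# Allocate one buffer per dtype, once, outside the main loop.
str_buf = np.empty((MAX_ROWS, 1), dtype=object)      # string columns
int_buf = np.empty((MAX_ROWS, 2), dtype=np.int64)    # integer columns
flt_buf = np.empty((MAX_ROWS, 2), dtype=np.float64)  # float columns


def fill_buffers(n_rows):
    """Hypothetical stand-in for a hand-written merge.

    Writes n_rows rows of dummy data into the pre-allocated buffers
    and returns the number of rows actually used.
    """
    str_buf[:n_rows, 0] = ["k%d" % i for i in range(n_rows)]
    int_buf[:n_rows, 0] = np.arange(n_rows)
    int_buf[:n_rows, 1] = np.arange(n_rows) * 2
    flt_buf[:n_rows, 0] = np.arange(n_rows) * 0.5
    flt_buf[:n_rows, 1] = 1.0
    return n_rows


def buffers_to_frame(n_rows):
    """Build a dataframe from only the first n_rows of each buffer."""
    return pd.DataFrame({
        "key": str_buf[:n_rows, 0],
        "i0": int_buf[:n_rows, 0],
        "i1": int_buf[:n_rows, 1],
        "f0": flt_buf[:n_rows, 0],
        "f1": flt_buf[:n_rows, 1],
    })


n = fill_buffers(3)        # pretend this merge produced 3 rows
df = buffers_to_frame(n)   # uses only the filled part of each block
```

The slices themselves are just NumPy views on the buffers, so the open questions are whether the dataframe can be made to share that memory and whether the three blocks can be handed over without going column by column.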
Thanks again for your help here. Hopefully this discussion will be of some value to others interested in similarly speeding up their pandas code, but I don't want to hog too much bandwidth, so do let me know if there's a more appropriate forum for this discussion.
Best,
Richard