This uses PyList_Append under the hood, which has O(1) amortized
complexity (you can think of it as doubling the list's capacity
whenever it runs out of room, though CPython's actual formula is
slightly more complicated than that). Usually this is fast enough.
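If you're curious, you can actually watch the over-allocation happen
from pure Python; sys.getsizeof reports the list's current allocation,
so it only jumps when the backing array is reallocated (a CPython-specific
sketch):

    import sys

    xs = []
    last = sys.getsizeof(xs)
    for i in xrange(64):
        xs.append(i)
        size = sys.getsizeof(xs)
        if size != last:
            # A size jump means CPython just reallocated the backing array.
            print len(xs), size
            last = size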
> I tried using PyList_New and PyList_SetItem but got back garbage.
> Is there something wrong with my Cython code here?...
>
> # === Cython Code ===
>
> from cpython cimport PyObject, PyList_New, PyList_SetItem
> def hello_n_times(int n):
>     o = PyList_New(n)
>     for i in xrange(n):
>         s = 'hi there %d' % i
>         print s
>         PyList_SetItem(o, i, s)
>     return o
>
> # === resulting output ===
>
> hi there 0
> hi there 1
> hi there 2
> hi there 3
> hi there 4
> ['Py_Repr', 'stdout', 'Py_Repr', 'stdout', 'Py_Repr']
You need to manually incref s: PyList_SetItem steals a reference, so
without the incref the list ends up holding references to objects that
have already been deallocated, which is where the garbage comes from.
However, unless you really need this speed (profile it), the list
comprehension should be nearly as fast and much cleaner. I've often
considered doing this optimization when the size of the resulting list
is known ahead of time (though note that PyList_Append will shrink a
list that is too large, so just initializing the list to a larger
capacity is insufficient).
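For the record, a sketch of the fixed loop (Py_INCREF comes from
cpython.ref):

    from cpython cimport PyObject, PyList_New, PyList_SetItem
    from cpython.ref cimport Py_INCREF

    def hello_n_times(int n):
        o = PyList_New(n)
        for i in xrange(n):
            s = 'hi there %d' % i
            print s
            # PyList_SetItem steals a reference, so give it one to steal;
            # otherwise the list holds borrowed references that die with s.
            Py_INCREF(s)
            PyList_SetItem(o, i, s)
        return o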
- Robert
Yep.
> I looked at the Objects/listobject.c file in Python's source, and indeed the
> formula is complicated / weird.
> It basically increases the size by about an eighth of the current size.
> The source doesn't claim O(1); it claims to have linear behavior, which would
> be O(n).
It's the amortized cost that's O(1): the "linear behavior" in that
comment means that n appends take O(n) time in total. Though an
individual insert might be O(n), those expensive inserts are
exponentially rare, so the average insert costs O(n) / n = O(1). (The
actual constant depends on your resizing parameter, as well as the
expense of allocation and copying, how likely it is that realloc can
avoid a copy, etc. It's a balance between reallocation frequency and
memory waste.)
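To see why, here's a toy count of element copies over n appends under
a simple doubling rule (the growth factor is a made-up parameter, not
CPython's actual formula):

    def copies_for(n, growth=2):
        # Total element copies performed by n appends with geometric growth.
        cap, total = 1, 0
        for size in xrange(1, n + 1):
            if size > cap:
                total += cap   # a realloc copies all existing elements
                cap *= growth
        return total

    for n in (10, 100, 1000, 10000):
        print n, copies_for(n)   # stays ~n, so O(1) amortized per append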
> In any case it isn't doing anything boneheaded so I guess I should just
> leave my list comprehension alone.
Yep, that's what I'd do. You might also want to look into using NumPy
or memory views to provide Python-level access to the underlying data
(avoiding creating a Python object for each element of your list).
yes, do this!
I was just discussing this morning how it wasn't immediately obvious to
me that Numpy is fantastic for a big set of things beyond numerical
computation. Basically, anywhere you need to deal with a block of
memory in Python in a clean, Pythonic way, Numpy's your friend. It's just
wonderful, and I really can't stress that enough!
cheers,
hen
How is your array being filled? Do you invoke the data getter (written
in C) from within Numpy?
The easiest way to do this is to call your C code with an array that is
already being managed by Numpy. So you create the Numpy array, then you
call your C code, passing it a pointer to the Numpy array, which it then
fills.
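An untested sketch of that first option (mydata.h and fill_data are
placeholders for your actual C API):

    import numpy as np
    cimport numpy as np

    cdef extern from "mydata.h":           # hypothetical header
        void fill_data(double *buf, int n)

    def get_data(int n):
        cdef np.ndarray[np.double_t, ndim=1] arr = np.empty(n, dtype=np.double)
        # Numpy owns the memory; the C code just writes into it.
        fill_data(<double *> arr.data, n)
        return arr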
Another option is to do a memcpy from the C array into a preallocated
Numpy array within Cython. This obviously has the overhead of a copy
(albeit a very fast, low-level copy).
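Something along these lines for the copy option (again a sketch; src
and n are whatever your C code hands back):

    from libc.string cimport memcpy
    import numpy as np
    cimport numpy as np

    cdef object copy_to_numpy(double *src, int n):
        cdef np.ndarray[np.double_t, ndim=1] arr = np.empty(n, dtype=np.double)
        # One fast, low-level copy from the C buffer into Numpy-managed memory.
        memcpy(arr.data, src, n * sizeof(double))
        return arr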
Finally, if the memory is necessarily being allocated within the C code
and you don't want to do a copy, then you're outside my experience, but
I understand it's possible. https://gist.github.com/1253698 seems to be
a neat example (though I defer to someone more knowledgeable on this
one).
Rough sketches of the first two options are above; if you want a
fuller, working example for either, I can do that.
cheers,
hen
You could also go through the buffer interface using memoryviews:
https://sage.math.washington.edu:8091/hudson/job/cython-docs/doclinks/1/src/userguide/memoryviews.html#cython-array
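E.g. (a sketch, and it needs a Cython with memoryview support; the
returned object supports the buffer protocol, so numpy.asarray can wrap
it without a copy):

    from cython cimport view

    def make_buffer(int n):
        # A Cython array allocated on the heap, exposed as a typed memoryview.
        cdef double[:] mv = view.array(shape=(n,), itemsize=sizeof(double),
                                       format="d")
        cdef int i
        for i in xrange(n):
            mv[i] = 0.5 * i
        return mv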
Great!
hen
Neat!
Yes, you can clone Cython from GitHub or wait for the
soon-to-be-released Cython 0.16.