How to declare a 1d array in cython

1,690 views
Skip to first unread message

Raphael C

unread,
Jul 15, 2018, 10:23:36 AM7/15/18
to cython-users
I have the following lines in my code:

    cdef np.ndarray[DTYPE_t, ndim = 1] v0 = np.zeros(lenT+1, dtype=DTYPE)
    cdef np.ndarray[DTYPE_t, ndim = 1] v1 = np.zeros(lenT+1, dtype=DTYPE)


Naturally cython -a shows these as yellow. How can I convert them so they don't use numpy and hence are faster?  

This is the full pyx code, it is to compute the edit distance.

import numpy as np
cimport numpy as np
import cython
ctypedef np.int_t DTYPE_t
DTYPE=np.int

@cython.boundscheck(False) # turn off bounds-checking for entire function
@cython.wraparound(False)  # turn off negative index wrapping for entire function
def levenshtein(np.ndarray[DTYPE_t, ndim = 1] S, np.ndarray[DTYPE_t, ndim = 1] T):
    cdef int lenS = S.shape[0]
    cdef int lenT = T.shape[0]
    cdef int delcost = 2
    cdef int inscost = 1
    cdef int subscost
    if np.array_equal(S, T): return 0
    elif lenS == 0: return lenT
    elif lenT == 0: return lenS
    cdef np.ndarray[DTYPE_t, ndim = 1] v0 = np.zeros(lenT+1, dtype=DTYPE)
    cdef np.ndarray[DTYPE_t, ndim = 1] v1 = np.zeros(lenT+1, dtype=DTYPE)
    cdef int i,j
    for i in range(lenT+1):
        v0[i] = i
    for i in range(lenS):
        v1[0] = i + 1
        for j in range(lenT):
            if S[i] == T[j]:
                subscost = 0
            else:
                subscost = 1
            v1[j + 1] = min(v1[j] + delcost, v0[j + 1] + inscost, v0[j] + subscost)
        for j in range(lenT+1):
            v0[j] = v1[j]
    return v1[lenT]

Marcel Martin

unread,
Jul 15, 2018, 12:30:04 PM7/15/18
to cython...@googlegroups.com
On 2018-07-15 13:12, Raphael C wrote:
> I have the following lines in my code:
>
>     cdef np.ndarray[DTYPE_t, ndim = 1] v0 = np.zeros(lenT+1, dtype=DTYPE)
>     cdef np.ndarray[DTYPE_t, ndim = 1] v1 = np.zeros(lenT+1, dtype=DTYPE)
>
>
> Naturally cython -a shows these as yellow. How can I convert them so
> they don't use numpy and hence are faster?  
>
> This is the full pyx code, it is to compute the edit distance.

Hi,

have you profiled this? I suspect memory allocation is not your bottleneck.

I use cython array objects (cython.view.array) to allocate memory, see
<https://cython.readthedocs.io/en/latest/src/userguide/memoryviews.html#cython-arrays>.
You can also use normal Python arrays (array.array), just read the
section following the one in the link.

However, both options still create a Python object and so the lines
should stay yellow, but the other option would be to manage memory
manually. That introduces more opportunities for bugs and it’s probably
not even faster.

Regarding your algorithm: Note that you only need a single column of the
alignment matrix, not two. You just need to store the cell you are
working on in a temporary variable before overwriting it. Unless the
compiler is optimizing this already, this may give you a speedup.

And there are probably lots of edit distance libraries out there in case
you are ready to add an external dependency.

Regards,
Marcel

Raphael C

unread,
Jul 15, 2018, 1:22:28 PM7/15/18
to cython-users
Thank you. This is very helpfui.

Raphael 
Reply all
Reply to author
Forward
0 new messages