Sort a list of object by an atribute in a cdef class

131 views
Skip to first unread message

Javier Panadero Martínez

unread,
Jun 2, 2016, 12:36:12 AM6/2/16
to cython-users

I would like to sort the list inner of this class. The list contains objects of a cdef class called "Edge". The edge class contains a member variable called "savings". I would like sort the list by this variable.


In order to do that, I create the function sort inside de class, but when I execute the function, I obtain the next error message:


Exception AttributeError: "'fib.Edge' object has no attribute 'savings'" in 'fib.Alist.sort' ignored


Any of you could help me?


cdef class Alist:
 def __init__(self):
        self.inner = []
 cdef list inner
 cdef void append(self, Edge a):
    self.inner.append(a)
 cdef void pop(self, int a):
    self.inner.pop(a)
 cdef void insert(self,int pos,Edge a):
    self.inner.insert(pos,a)
 cdef int index(self,Edge a):
    if a in self.inner:
         return self.inner.index(a)
    return -1
 cdef Edge get(self, int i):
    return <Edge> self.inner[i]
 cdef void sort(self):
    self.inner.sort(key = lambda c : c.Savings)
    #self.inner.sort()
 def __len__(self):
    return len(self.inner)

 def __richcmp__(Edge self, Edge other,int op):
    if op == 0:
        if self.inner.savings < other.inner.savings:
            return True
    return False

Stefan Behnel

unread,
Jun 2, 2016, 12:52:02 AM6/2/16
to cython...@googlegroups.com
Javier Panadero Martínez schrieb am 02.06.2016 um 01:15:
> I would like to sort the list inner of this class. The list contains
> objects of a cdef class called "Edge". The edge class contains a member
> variable called "savings". I would like sort the list by this variable.
>
> In order to do that, I create the function sort inside de class, but when I
> execute the function, I obtain the next error message:
>
> *Exception AttributeError: "'fib.Edge' object has no attribute 'savings'"
> in 'fib.Alist.sort' ignored*
>
> Any of you could help me?
>
>
> cdef class Alist:
> def __init__(self):
> self.inner = []
> cdef list inner

I'd always move attributes first in the class definition.


> cdef void append(self, Edge a):
> self.inner.append(a)

Note that list.append() and the other list operations below can raise
exceptions, just like pretty much all other Python operations. Declaring
this method as "void" prevents their propagation. I'd either just remove
the "void" or declare the method as

cdef int append(self, Edge a) except -1

if it's *really* performance critical. Which I guess it isn't. So just
avoid "void". Cython actually gives you a warning for this. Might not be
visible by default, though.


> cdef void pop(self, int a):
> self.inner.pop(a)
> cdef void insert(self,int pos,Edge a):
> self.inner.insert(pos,a)
> cdef int index(self,Edge a):
> if a in self.inner:
> return self.inner.index(a)
> return -1

Same here. You should add an "except? -2" to the method declaration.


> cdef Edge get(self, int i):
> return <Edge> self.inner[i]
> cdef void sort(self):
> self.inner.sort(key = lambda c : c.Savings)

Assuming that "c.savings" (your error message says it starts with a
lower-case "s") is a cdef attribute, you'll have to either declare it as
"public" or "readonly", e.g. "cdef readonly int savings", or cast "c" to
the <Edge> type to let Cython know about it's C-level attributes. Otherwise
it has to assume that it's an arbitrary object, with no hint which cdef
attributes it might have and how to access them.

Overall, while I'm aware that this is stripped down example code, I wonder
what the purpose of this class is. Is its only purpose to use a list in a
type-safe way? And is it really storing a list of integers?

Stefan

Vern Muhr

unread,
Jun 2, 2016, 2:30:41 PM6/2/16
to cython-users


On Wednesday, June 1, 2016 at 9:36:12 PM UTC-7, Javier Panadero Martínez wrote:

I would like to sort the list inner of this class. The list contains objects of a cdef class called "Edge". The edge class contains a member variable called "savings". I would like sort the list by this variable.


In order to do that, I create the function sort inside de class, but when I execute the function, I obtain the next error message:


Exception AttributeError: "'fib.Edge' object has no attribute 'savings'" in 'fib.Alist.sort' ignored


Any of you could help me?


I spent some time on a Cython sorting problem a while back. In my case the equivalent of your Edge class was an Event class with attributes of time and priority. I wanted to sort on Time first then priority. Event was defined in a Python class. I had similar error messages until I kept track of reference counts. Here is my code:
#----------------------------------------------------------------------------
#
# sort_example.pyx  -  6/2/16
#
# A Cython implementation of the minheap algorithm
#
#----------------------------------------------------------------------------

# Memory allocation for the heap array uses cpython.mem functions
from cpython.mem cimport PyMem_Malloc, PyMem_Realloc, PyMem_Free

# For managing reference counts
from cpython.ref cimport Py_INCREF, Py_DECREF 

#----------------------------------------------------------------------------
#
# The type managed by CyMinHeap

ctypedef struct q_item:

    float t         # Event time as a float
    int   priority  # Event priority as an int    
    void* evt       # The Event object
  
#----------------------------------------------------------------------------
#
# Functions for < and > comparison of q_item objects (for priority sorting)
# The sort order is first by t then by priority
    
cdef inline int q_item_lt(q_item a, q_item b):
    
    if a.t == b.t:
        return a.priority < b.priority
    
    return a.t < b.t

#----
    
cdef inline int q_item_gt(q_item a, q_item b):
    
    if a.t == b.t:
        return a.priority > b.priority
    
    return a.t > b.t

        
#----------------------------------------------------------------------------
#
# Functions compatible with the Python heapq module

def heappush(heap, item):
    heap.push(item)

#----

def heappop(heap):
    return heap.pop()

#----

def heappeek(heap):
    return heap.peek()

#----------------------------------------------------------------------------
#
# Replacement for python heapq module specialized for q_item objects.
# 7.2x faster than python heapq implementation

DEF HEAP_SIZE = 1000
   
cdef class CyMinHeap:

    cdef q_item* heap
    cdef size_t  heap_len
    cdef public int position
    
    #----
    
    def __cinit__(self):
        # allocate uninitialized memory for the heap
        
        self.heap = <q_item*> PyMem_Malloc(HEAP_SIZE * sizeof(q_item))
        if not self.heap:
            raise MemoryError()
        
        self.heap_len = HEAP_SIZE
        self.heap[0] = q_item(0.0, 0, NULL)
        self.position = 0
    
    #----
    
    def __dealloc__(self):
        PyMem_Free(self.heap)
    
    #----
    
    def resize(self, size_t new_number):
        
        # Allocates new_number * sizeof(q_item) bytes,
        # preserving the current content and making a
        # best-effort to re-use the original data location
        
        mem = <q_item*> PyMem_Realloc(self.heap, new_number * sizeof(q_item))
        if not mem:
            raise MemoryError()
        self.heap = mem
        self.heap_len = new_number
         
    #------------------------------------------------------------------------
    #
    # Python compatible push/pop/peek methods
    
    def push(self, obj):
                
        if self.position < HEAP_SIZE:           
                        
            # Protect obj from garbage collection
            Py_INCREF(obj)
            
            self.position += 1                        
            self.heap[self.position] = q_item(<float>obj.time, <int>obj.priority, <void *>obj)                             
            self.percUp(self.position)
            
        else:
            raise IndexError()
        
    #----
 
    def pop(self):
     
        cdef object obj
         
        if self.position < 1:
            raise IndexError()
        
        obj = <object>self.heap[1].evt
        
        # Remove garbage collection protection from obj 
        Py_DECREF(obj)
               
        self.heap[1] = self.heap[self.position]
        self.position -= 1
        self.percDown(1)
        
        return obj
     
    #----
     
    def peek(self):

        if self.position < 1:
            raise IndexError()
    
        return <object>self.heap[1].evt
    
    #----
    
    def __len__(self):
        return self.position
    
    #------------------------------------------------------------------------
    #
    # Methods implementing minheap sorting
    
    cdef inline percUp(self,int i):
        
        while i / 2 > 0:
            if q_item_lt(self.heap[i], self.heap[i / 2]):                
                self.heap[i / 2], self.heap[i] = self.heap[i], self.heap[i / 2]               
            i = i / 2

    #----

    cdef inline percDown(self, int i):
        
        cdef int mc
        
        while (i * 2) <= self.position:
            mc = self.minChild(i)
            if q_item_gt(self.heap[i], self.heap[mc]):                
                self.heap[i], self.heap[mc] = self.heap[mc], self.heap[i]
            i = mc

    #----

    cdef inline int minChild(self, int i):
    
        if i * 2 + 1 > self.position:
            return i * 2
        else:
            if q_item_lt(self.heap[i*2], self.heap[i*2+1]):
                return i * 2
            else:
                return i * 2 + 1
      

Here is a test program, and the results:

#----------------------------------------------------------------------------
#----------------------------------------------------------------------------
#
# test_sort.py  -  6/2/16  Test sort_example.pyx

from cy_minheap import CyMinHeap, heappush, heappop
 
#------------------------------------------------------------------------------
#
# Define the object to be sorted
    
class Event(object):
        
    def __init__(self, t, p):      
        self.time = t
        self.priority = p

    #----
    
    def __str__(self):
        return 'Event(%.3f, %d)' % (self.time, self.priority)

#------------------------------------------------------------------------------
    
def test():
        
    heap = CyMinHeap()
    
    try:
        for i in [5,7,0,4,9,2,8,6,1,3]:
            heappush(heap, Event(0.0, i))
            print 'pushed Event(0.000, %d)' % i
            
    except IndexError:
        print 'push on full heap'
    
    print
    while 1:
        try:
            obj = heappop(heap)
            print 'popped %s' % obj 
            
        except IndexError:
            print 'pop from empty heap'
            break

test()
---------------------------------------------------------------
>python test_sort.py
pushed Event(0.000, 5)
pushed Event(0.000, 7)
pushed Event(0.000, 0)
pushed Event(0.000, 4)
pushed Event(0.000, 9)
pushed Event(0.000, 2)
pushed Event(0.000, 8)
pushed Event(0.000, 6)
pushed Event(0.000, 1)
pushed Event(0.000, 3)

popped Event(0.000, 0)
popped Event(0.000, 1)
popped Event(0.000, 2)
popped Event(0.000, 3)
popped Event(0.000, 4)
popped Event(0.000, 5)
popped Event(0.000, 6)
popped Event(0.000, 7)
popped Event(0.000, 8)
popped Event(0.000, 9)
pop from empty heap

Vern 

Stefan Behnel

unread,
Jun 5, 2016, 12:33:21 PM6/5/16
to cython...@googlegroups.com
Vern Muhr schrieb am 02.06.2016 um 20:06:
> I spent some time on a Cython sorting problem a while back. In my case the
> equivalent of your Edge class was an Event class with attributes of time
> and priority. I wanted to sort on Time first then priority. Event was
> defined in a Python class. I had similar error messages until I kept track
> of reference counts. Here is my code:
> [...]
>>
>> ctypedef struct q_item:
>>
>> float t # Event time as a float
>> int priority # Event priority as an int
>> void* evt # The Event object

Note that a C float is not very accurate if you want to store a time value,
assuming this is a Unix timestamp (e.g. time.time()). If this is counting
seconds since the epoch, you'd already have lost the precision of one
second way back in April 1970. Out of the 30 significant bits that the
current Unix timestamp (1465144318) requires, a C float only provides 23.
This means that the current accuracy of your timestamps would be about two
minutes (2**(30-23) = 128 seconds).

>> [...]
>> cdef class CyMinHeap:
>> cdef q_item* heap
>> cdef size_t heap_len
>> cdef public int position
>>
>> def push(self, obj):
>> # Protect obj from garbage collection
>> Py_INCREF(obj)
>>
>> self.position += 1
>> self.heap[self.position] = q_item(<float>obj.time, <int>obj.priority, <void *>obj)
>> self.percUp(self.position)

This is a different problem. The OP was using a Python list as container.
You are storing Python references in a struct, which is a C container and
not a Python container, meaning that it does not know anything about
reference counting and does not keep track of owned references itself. It
only stores pointers, and the object it points to may die at any time, as
soon as its last owned reference is gone.

Personally, I'd first consider using a Python list instead of a C array of
structs, and cdef classes instead of C structs as items. You seem to
require the (Python) input objects to have certain attributes already, so
making them pre-defined cdef classes might not even hurt the users. Storing
objects instead of your structs would require a little more memory per item
(which may or may not be acceptable in your case), but it would simplify
things otherwise.

Your solution is a valid one, though, if memory constraints are tight. In
that case, a "short int" priority and packed structs would reduce your
memory footprint even further (assuming you don't have more than 65536
priority levels).

Stefan

Reply all
Reply to author
Forward
0 new messages