Converting python byte-string to unsigned char* array?

Keith Hughitt

May 10, 2012, 4:18:46 PM
to cython...@googlegroups.com
Hey all,

I'm trying to read in a binary file using Python and convert the result to a C-style array, but am not having any luck.

I tried casting the data manually using angle brackets and also letting Cython automatically cast the data, e.g.:

        pydata = open(filename, 'rb').read()
        cdef unsigned char* raw_data = pydata

but neither approach has worked.

Any suggestions?

Thanks,
Keith


Stefan Behnel

May 10, 2012, 4:24:45 PM
to cython...@googlegroups.com
Keith Hughitt, 10.05.2012 22:18:
> I'm trying to read in a binary file using Python and convert the result to
> a C-style array, but am not having any luck.
>
> I tried casting the data manually using angle brackets and also letting
> Cython automatically cast the data, e.g.:
>
> pydata = open(filename, 'rb').read()
> cdef unsigned char* raw_data = pydata
>
> but neither approach has worked.

That should work, although you might need to type (or cast) pydata as a
bytes object.

What Cython version are you using and what error message does it give you?
(note that it's always best to provide this information in a problem report)

Stefan

Keith Hughitt

May 11, 2012, 6:56:11 AM
to cython...@googlegroups.com
Hi Stefan,

I'm using Cython 0.15.1. There is no error message indicating that the operation isn't working, but when I attempt to pass the array into a function that expects the unsigned char* array, it doesn't work.

The reason I suspect that it has to do with how I'm creating the array is that the function works (sometimes...) when I create the array using C-style syntax, e.g.:

from libc.stdlib cimport calloc, malloc, free

cdef unsigned char *data = NULL
cdef int file_length

cdef unsigned char* read_file(char* filename, int* file_length):
    cdef int length
    cdef fsrc = fopen(filename, "rb")

    # Determine length of file
    fseek(fsrc, 0, SEEK_END)
    length = ftell(fsrc)
    fseek(fsrc, 0, SEEK_SET)

    *file_length = &length

    # Set aside memory
    data = <unsigned char *> malloc(length)

    # Read file in
    fread(data, 1, length, fsrc)

    fclose(fsrc)

    return data

data = read_file('test.jpg', &file_length)
        
I think I may also be making some mistakes relating to passing values by reference, which is complicating the debugging, but I'll ask about that separately.

Any insight would be greatly appreciated.

Thanks!
Keith

Stefan Behnel

May 11, 2012, 7:11:58 AM
to cython...@googlegroups.com
Keith Hughitt, 11.05.2012 12:56:
> I'm using Cython 0.15.1. There is no error message indicating that the
> operation isn't working, but when I attempt to pass the array into a
> function that expects the unsigned char* array, it doesn't work.

That's what I meant when I wrote that "it's best to provide this
information". How does "it doesn't work" look exactly?


> The reason I suspect that it has to do with how I'm creating the array is
> that the function works (sometimes...) when I create the array using
> C-style syntax, e.g.:
>
> from libc.stdlib cimport calloc, malloc, free
>
> cdef unsigned char *data = NULL
> cdef int file_length
>
> cdef unsigned char* read_file(char* filename, int* file_length):
> cdef int length
> cdef fsrc = fopen(filename, "rb")
>
> # Determine length of file
> fseek(fsrc, 0, SEEK_END)
> length = ftell(fsrc)
> fseek(fsrc, 0, SEEK_SET)

Hmm, that seems a rather uncommon way to determine the file length. Why not
ask the file system?
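
For illustration, here is a rough sketch of what "asking the file system" could look like. It is plain Python rather than Cython, and the helper names and the temporary test file are made up for this example; in C or Cython code the analogous POSIX call would be fstat() or stat().

```python
import os
import tempfile


def file_size_via_seek(path):
    """Seek to the end of the file and report the position (the approach quoted above)."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        return f.tell()


def file_size_via_stat(path):
    """Ask the file system for the size without opening the file."""
    return os.stat(path).st_size


# Hypothetical sample file, created only so the sketch is self-contained.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00\x01\x02\x03" * 256)
    sample_path = tmp.name

seek_size = file_size_via_seek(sample_path)
stat_size = file_size_via_stat(sample_path)
os.remove(sample_path)

print(seek_size, stat_size)
```

Both helpers report the same size for a regular file; the stat variant simply avoids opening the file and moving the file position.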


> *file_length = &length

I'm sure you want to assign the value here, not the address.


> # Set aside memory
> data = <unsigned char *> malloc(length)

... which may return NULL ...


> # Read file in
> fread(data, 1, length, fsrc)
>
> fclose(fsrc)
>
> return data
>
> data = read_file('test.jpg', &file_length)

Let me express a very educated guess about the code that you did not show
us yet. It does not care about the length of the Python bytes object, does
it? And since you're dealing with binary data, it stops processing at the
first null byte...
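
A small sketch of the failure mode guessed at above, written in plain Python with ctypes instead of Cython so that it can be run as-is. The sample bytes are invented; the point is only that a NUL-terminated view of binary data stops at the first zero byte, while a buffer that travels with an explicit length keeps everything.

```python
import ctypes

# Invented sample: binary data with embedded zero bytes, as in most image files.
payload = b"\xff\xd8\x00\x10JFIF\x00"

# Treated as a NUL-terminated C string, the data ends at the first zero byte.
as_c_string = ctypes.c_char_p(payload).value

# Copied into an unsigned char array of known size, every byte survives.
length = len(payload)
buffer = (ctypes.c_ubyte * length).from_buffer_copy(payload)

# A pointer to the first element is what a C function taking
# `unsigned char*` would receive; the length has to be passed alongside it.
raw_data = ctypes.cast(buffer, ctypes.POINTER(ctypes.c_ubyte))

print(len(as_c_string), length)
```

In Cython itself the same idea means passing len(pydata) along with the pointer, and keeping pydata referenced for as long as the pointer is in use, since the pointer refers to memory owned by the bytes object.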

Did you take a look at the string processing tutorial?

http://docs.cython.org/src/tutorial/strings.html

Stefan

Keith Hughitt

May 11, 2012, 10:13:48 AM
to cython...@googlegroups.com
Hi Stefan,

I just finished writing up a lengthy response with the exact errors I was encountering when I discovered that the casting *is* working: it turns out that I had forgotten to remove an earlier call to "free()" from when I was reading in the data using C. At some point I was also getting a different error (the result of a call to a third-party library was NULL, indicating a problem with my input), which is why I thought the problem might be with the type-casting. I'm not sure what change I made to fix that.

Your feedback was still very helpful, however, and I think I'm slowly becoming reacquainted with C coding and some of the caveats and pitfalls I haven't had to deal with after so many years of working in higher-level languages.

Thanks,
Keith

Stefan Behnel

May 11, 2012, 10:29:21 AM
to cython...@googlegroups.com
Keith Hughitt, 11.05.2012 16:13:
> I just finished writing up a lengthy response with the exact errors I was
> encountering when I discovered that the casting *is* working: It turns out
> that I had forgotten to remove an earlier call to "free()" from when I was
> reading in the data using C. At some point I was also getting a different
> error (the result of a call to a third-party library was NULL indicating a
> problem with my input) which is why I thought the problem may be with the
> type-casting. I'm not sure what change I made to fix that.

Well, that's teddy bear debugging in action. ;)


> Your feedback was still very helpful, however, and I think I'm slowly
> becoming reacquainted with C coding and some of the caveats and pitfalls I
> haven't had to deal with after so many years of working in higher-level
> languages.

Yep, and that's one of the selling points of Cython: you can descend to
that level at any point, but you aren't forced to stay there.

Stefan

Keith Hughitt

May 11, 2012, 10:44:10 AM
to cython...@googlegroups.com
Yeah, I can definitely see its strength now. When I first started on the project, I wrote some C code and then tried to port it line-for-line into Cython, even though C is not my strength to begin with. Fortunately, most of the performance-demanding code is in the library I'm wrapping, so it looks like I can work mostly in the language I'm most comfortable with :)