Any examples out there of reading and writing binary files from Cython?


pytrade

Apr 20, 2011, 3:33:20 PM
to cython-users
Using struct pack and unpack is going to be too slow.

Sage finance was mentioned in an earlier post. Is that still a good
place to start? Any other examples?

Matthew Brett

Apr 20, 2011, 6:17:03 PM
to cython...@googlegroups.com, pytrade
Hi,

Using numpy structured array dtypes is quick iff your binary data has
a constant memory format. There's an example for one format here:

http://www.trackvis.org/docs/?subsect=fileformat
https://github.com/nipy/nibabel/blob/master/nibabel/trackvis.py

Best,

Matthew
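
A minimal sketch of the structured-dtype approach, assuming NumPy is installed. The record layout (`id`, `price`, `size`) is hypothetical and is not the TrackVis format linked above; an in-memory buffer stands in for a real file.

```python
import io

import numpy as np

# Hypothetical fixed-layout record: little-endian int32, float64, uint16.
# Structured dtypes are packed by default (no padding between fields).
record_dtype = np.dtype([("id", "<i4"), ("price", "<f8"), ("size", "<u2")])

# Build three records and serialise them to an in-memory "file".
records = np.array(
    [(1, 10.5, 100), (2, 11.25, 200), (3, 9.75, 50)], dtype=record_dtype
)
buf = io.BytesIO(records.tobytes())

# Read the whole block back in one call; there is no per-record unpack loop.
loaded = np.frombuffer(buf.read(), dtype=record_dtype)

print(loaded["price"])        # field access by name, as a column
print(record_dtype.itemsize)  # bytes per record
```

With a file on disk, `records.tofile(path)` and `np.fromfile(path, dtype=record_dtype)` play the same roles. This only works when every record in the block shares one layout.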

Robert Bradshaw

Apr 20, 2011, 8:17:43 PM
to cython...@googlegroups.com

This is what I would recommend as well. From Cython you can type your
arrays to get fast access to the underlying C data element by element
if you want that as well.

- Robert

Vineet Jain

Apr 20, 2011, 8:22:20 PM
to cython...@googlegroups.com
Yes, this would work well when all the items are of the same type. However, the binary file will have items with different structures, so I would need to be able to read individual items.

Robert Bradshaw

Apr 20, 2011, 8:30:54 PM
to cython...@googlegroups.com
On Wed, Apr 20, 2011 at 5:22 PM, Vineet Jain <vinj...@gmail.com> wrote:
> Yes this would work well when all the items are of the same type. However
> the binary file will have items with different structures. So would need to
> be able read individual items.

You can use all the low-level C stdio methods, but then you have to
worry (more) about issues like endianness, integer sizes, and struct
packing manually.
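
To make the endianness and packing concerns concrete, here is a small sketch using Python's standard `struct` module purely as a layout calculator (not as the suggested I/O path, since the original question rules it out on speed). The format strings are illustrative assumptions.

```python
import struct

# The same 32-bit value has a different byte order depending on the prefix.
little = struct.pack("<I", 1)  # least significant byte first
big = struct.pack(">I", 1)     # most significant byte first

# "<" and ">" use standard sizes with no padding. "@" (native) follows the
# platform's C sizes and alignment, so its result can differ per platform.
packed_size = struct.calcsize("<bi")  # 1 + 4 bytes, no padding
native_size = struct.calcsize("@bi")  # often larger: padding before the int

print(little, big, packed_size, native_size)
```

A C struct read with `fread` follows the native layout, so a file written on one platform may not match the struct on another unless the layout is pinned down explicitly.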

Vineet Jain

Apr 20, 2011, 8:33:16 PM
to cython...@googlegroups.com
Cool. Any examples (or existing applications) of how to call C stdio functions from Cython?

Robert Bradshaw

Apr 20, 2011, 8:41:02 PM
to cython...@googlegroups.com
On Wed, Apr 20, 2011 at 5:33 PM, Vineet Jain <vinj...@gmail.com> wrote:
> Cool. Any examples (or existing applications) on how to call c stdio
> functions from cython.

You can import them and use them just as you would from C.

from libc.stdio cimport *

cdef my_struct data
cdef FILE *f = fopen("path", "rb") # binary mode; or get this from a Python file object
if f != NULL: # fopen returns NULL on failure
    fread(&data, sizeof(my_struct), 1, f)
    fclose(f)

- Robert
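
For readers who want to try the "read one struct at a time" pattern without compiling anything, here is a rough pure-Python analogue using the standard `ctypes` module. It is a sketch under assumptions: the `Trade` and `Quote` layouts and the one-byte type tag in front of each record are hypothetical, chosen only to illustrate a file that mixes record types.

```python
import ctypes
import io


class Trade(ctypes.LittleEndianStructure):
    # Hypothetical record: 8 + 8 bytes, naturally aligned, so no padding.
    _fields_ = [("price", ctypes.c_double), ("size", ctypes.c_int64)]


class Quote(ctypes.LittleEndianStructure):
    # Hypothetical record: 8 + 8 + 8 bytes.
    _fields_ = [
        ("bid", ctypes.c_double),
        ("ask", ctypes.c_double),
        ("depth", ctypes.c_int64),
    ]


# Assumed file convention: each record is preceded by a 1-byte type tag.
RECORD_TYPES = {1: Trade, 2: Quote}


def read_records(f):
    """Yield one structure per record until the stream is exhausted."""
    while True:
        tag = f.read(1)
        if not tag:
            return
        cls = RECORD_TYPES[tag[0]]
        body = f.read(ctypes.sizeof(cls))
        if len(body) != ctypes.sizeof(cls):
            raise EOFError("truncated record")
        # Copy the raw bytes into a new structure, much like fread(&data, ...).
        yield cls.from_buffer_copy(body)


# Write two records of different types to an in-memory "file", then read back.
stream = io.BytesIO()
stream.write(b"\x01" + bytes(Trade(101.5, 300)))
stream.write(b"\x02" + bytes(Quote(101.25, 101.75, 5)))
stream.seek(0)
records = list(read_records(stream))

for rec in records:
    print(type(rec).__name__, ctypes.sizeof(rec))
```

Declaring the byte order on the structure class addresses the endianness concern raised earlier in the thread; the Cython version with `fread` reads whatever layout the C compiler gives `my_struct`.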

Vineet Jain

Apr 20, 2011, 9:11:33 PM
to cython...@googlegroups.com
Great. What is the memory overhead of using extension types, so that I can read in structs and then map them to extension types that I can use in regular Python data structures?

Stefan Behnel

Apr 20, 2011, 11:35:25 PM
to cython...@googlegroups.com
[Vineet, I took the time to refactor your posting order to make it
readable. This thread is a great example why top-posting is evil. Please
avoid it.]

Vineet Jain, 21.04.2011 03:11:
> On Wed, Apr 20, 2011 at 8:41 PM, Robert Bradshaw wrote:
>> On Wed, Apr 20, 2011 at 5:33 PM, Vineet Jain wrote:
>>> On Wed, Apr 20, 2011 at 8:30 PM, Robert Bradshaw wrote:
>>>> On Wed, Apr 20, 2011 at 5:22 PM, Vineet Jain wrote:
>>>>> On Wed, Apr 20, 2011 at 8:17 PM, Robert Bradshaw wrote:
>>>>>> On Wed, Apr 20, 2011 at 3:17 PM, Matthew Brett wrote:
>>>>>>> On Wed, Apr 20, 2011 at 12:33 PM, pytrade wrote:


>>>>>>>> Using struct pack and unpack is going to be too slow.
>>>>>>>> Sage finance was mentioned in an earlier post. Is that still a
>>>>>>>> good place to start? Any other examples?
>>>>>>>
>>>>>>> Using numpy structured array dtypes is quick iff your binary data
>>>>>>> has a constant memory format. There's an example for one format here:
>>>>>>>
>>>>>>> http://www.trackvis.org/docs/?subsect=fileformat
>>>>>>> https://github.com/nipy/nibabel/blob/master/nibabel/trackvis.py
>>>>>>
>>>>>> This is what I would recommend as well. From Cython you can type your
>>>>>> arrays to get fast access to the underlying C data element by element
>>>>>> if you want that as well.
>>>>>>

>>>>> Yes this would work well when all the items are of the same
>>>>> type. However the binary file will have items with different
>>>>> structures. So would need to be able read individual items.
>>>>
>>>> You can use all the low-level C stdio methods, but then you have to
>>>> worry (more) about issues like endianness, integer sizes, and struct
>>>> packing manually.
>>>>

>>> Cool. Any examples (or existing applications) on how to call c stdio
>>> functions from cython.
>>
>> You can import them and use them just as you would from C.
>>
>> from libc.stdio cimport *
>>
>> cdef my_struct data
>> cdef FILE *f = fopen("path", "r") # or get this from a Python file object
>> fread(&data, sizeof(my_struct), 1, f)
>> fclose(f)
>>

> Great. What is the over head of creating (memory wise) to use extension
> types so that I read in structs and then map them to extension types
> which I can then use in regular python data structures.

Depends. The minimal memory overhead of a Python object is on the order of
32 bytes IIRC, but it depends on the platform. You can look it up in the
CPython header files. Cython adds a pointer to this for cdef methods if
you use them.

If that's too much for you (i.e. you have a large number of tiny structs),
consider instead writing your own container type that keeps a dynamically
allocated array of the structs.

Stefan
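
A rough pure-Python sketch of the container idea, under stated assumptions: the record layout (`int32` id plus `float64` price) and the class names `Tick` and `TickArray` are hypothetical. In Cython the buffer would more likely be a `malloc`-ed array of C structs inside a `cdef class`; a `bytearray` stands in for it here.

```python
import struct
import sys

# Hypothetical record: little-endian int32 id + float64 price, 12 bytes.
RECORD = struct.Struct("<id")


class Tick:
    """One Python object per record (the per-object overhead case)."""

    __slots__ = ("id", "price")

    def __init__(self, id, price):
        self.id = id
        self.price = price


class TickArray:
    """Container that keeps all records in one contiguous buffer."""

    def __init__(self, count):
        self._count = count
        self._buf = bytearray(count * RECORD.size)

    def __len__(self):
        return self._count

    def __getitem__(self, i):
        if not 0 <= i < self._count:
            raise IndexError(i)
        return RECORD.unpack_from(self._buf, i * RECORD.size)

    def __setitem__(self, i, value):
        if not 0 <= i < self._count:
            raise IndexError(i)
        RECORD.pack_into(self._buf, i * RECORD.size, *value)


ticks = TickArray(1000)
ticks[0] = (7, 99.5)

# getsizeof counts only the instance itself, not the int and float it refers to.
per_object = sys.getsizeof(Tick(7, 99.5))
per_record = RECORD.size
print(per_object, per_record)
```

The exact per-object figure depends on the Python version and platform, which is why no number is given here; the point is only that it exceeds the raw record size, and that the container pays the object overhead once rather than once per record.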

Robert Bradshaw

Apr 20, 2011, 11:56:49 PM
to cython...@googlegroups.com

There is also the allocation overhead of allocating one struct (object)
at a time vs. a whole array of them in one go. It really depends
on what exactly you're trying to do, but for simple, homogeneous data
types that's one of the advantages of a NumPy array over, e.g., a list.

- Robert
