Using numpy structured array dtypes is quick iff your binary data has
a constant memory format. There's an example for one format here:
http://www.trackvis.org/docs/?subsect=fileformat
https://github.com/nipy/nibabel/blob/master/nibabel/trackvis.py
Best,
Matthew
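[A minimal sketch of the structured-dtype approach above. The record layout (id/price/volume fields) is invented for illustration, not taken from the TrackVis format; match the dtype to your actual file layout.]

```python
import os
import tempfile

import numpy as np

# Hypothetical fixed-layout record; field names and sizes are made up
# here, and must match the actual binary format byte for byte.
record_dtype = np.dtype([
    ("id", "<i4"),      # little-endian 32-bit int
    ("price", "<f8"),   # little-endian 64-bit float
    ("volume", "<f8"),
])

# Write a few sample records, then read the whole file back in one call.
records = np.zeros(3, dtype=record_dtype)
records["id"] = [1, 2, 3]
records["price"] = [10.5, 11.0, 9.75]

path = os.path.join(tempfile.gettempdir(), "ticks.bin")
records.tofile(path)
loaded = np.fromfile(path, dtype=record_dtype)
print(loaded["price"])
```

Note that the explicit "<" byte-order marks in the dtype make the file readable on any platform, regardless of native endianness.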
This is what I would recommend as well. From Cython you can type your
arrays to get fast access to the underlying C data element by element
if you want that as well.
- Robert
You can use all the low-level C stdio methods, but then you have to
worry (more) about issues like endianness, integer sizes, and struct
packing manually.
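[A small pure-Python illustration of the caveats above: the struct module spells out the byte-order, size, and padding decisions that C stdio leaves to the platform.]

```python
import struct

# "<" = little-endian, ">" = big-endian, both with standard sizes
# and no padding; the same value packs to reversed byte sequences.
le = struct.pack("<i", 1)
be = struct.pack(">i", 1)
print(le)  # b'\x01\x00\x00\x00'
print(be)  # b'\x00\x00\x00\x01'

# Native mode ("@", the default) uses the platform's C struct padding:
# a char followed by an int is typically padded out, while the
# standard layout is exactly 1 + 4 = 5 bytes.
print(struct.calcsize("<bi"))  # 5
print(struct.calcsize("@bi"))  # platform-dependent (often 8)
```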
You can import them and use them just as you would from C.
from libc.stdio cimport *
cdef my_struct data
cdef FILE *f = fopen("path", "rb") # or get this from a Python file object
fread(&data, sizeof(my_struct), 1, f)
fclose(f)
- Robert
Vineet Jain, 21.04.2011 03:11:
> On Wed, Apr 20, 2011 at 8:41 PM, Robert Bradshaw wrote:
>> On Wed, Apr 20, 2011 at 5:33 PM, Vineet Jain wrote:
>>> On Wed, Apr 20, 2011 at 8:30 PM, Robert Bradshaw wrote:
>>>> On Wed, Apr 20, 2011 at 5:22 PM, Vineet Jain wrote:
>>>>> On Wed, Apr 20, 2011 at 8:17 PM, Robert Bradshaw wrote:
>>>>>> On Wed, Apr 20, 2011 at 3:17 PM, Matthew Brett wrote:
>>>>>>> On Wed, Apr 20, 2011 at 12:33 PM, pytrade wrote:
>>>>>>>> Using struct pack and unpack is going to be too slow.
>>>>>>>> Sage finance was mentioned in an earlier post. Is that still a
>>>>>>>> good place to start? Any other examples?
>>>>>>>
>>>>>>> Using numpy structured array dtypes is quick iff your binary data
>>>>>>> has a constant memory format. There's an example for one format here:
>>>>>>>
>>>>>>> http://www.trackvis.org/docs/?subsect=fileformat
>>>>>>> https://github.com/nipy/nibabel/blob/master/nibabel/trackvis.py
>>>>>>
>>>>>> This is what I would recommend as well. From Cython you can type your
>>>>>> arrays to get fast access to the underlying C data element by element
>>>>>> if you want that as well.
>>>>>>
>>>>> Yes, this would work well when all the items are of the same
>>>>> type. However, the binary file will have items with different
>>>>> structures, so I would need to be able to read individual items.
>>>>
>>>> You can use all the low-level C stdio methods, but then you have to
>>>> worry (more) about issues like endianness, integer sizes, and struct
>>>> packing manually.
>>>>
>>> Cool. Any examples (or existing applications) of how to call C stdio
>>> functions from Cython?
>>
>> You can import them and use them just as you would from C.
>>
>> from libc.stdio cimport *
>>
>> cdef my_struct data
>> cdef FILE *f = fopen("path", "rb") # or get this from a Python file object
>> fread(&data, sizeof(my_struct), 1, f)
>> fclose(f)
>>
> Great. What is the memory overhead of using extension types, so that
> I can read in structs, map them to extension types, and then use them
> in regular Python data structures?
Depends. The minimal memory overhead of a Python object is something on the
order of 32 bytes IIRC, but it depends on the platform. You can look it up in
the CPython header files. Cython adds a pointer to this for cdef methods if
you use them.
If that's too much for you (i.e. you have a large number of tiny structs),
consider writing your own container type instead that keeps a dynamically
allocated array of the structs.
Stefan
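[Stefan's container suggestion could be sketched in Cython roughly like this; the struct fields and class name are hypothetical, and error handling is kept minimal.]

```cython
from libc.stdlib cimport malloc, free

# Hypothetical record layout; adjust to the actual data.
cdef struct tick_t:
    int id
    double price

cdef class TickBuffer:
    """One Python object wrapping a C array of structs,
    instead of one Python object per struct."""
    cdef tick_t *data
    cdef Py_ssize_t n

    def __cinit__(self, Py_ssize_t n):
        self.n = n
        self.data = <tick_t *>malloc(n * sizeof(tick_t))
        if self.data == NULL:
            raise MemoryError()

    def __dealloc__(self):
        free(self.data)

    def price(self, Py_ssize_t i):
        if not 0 <= i < self.n:
            raise IndexError(i)
        return self.data[i].price
```

This keeps the per-record cost at sizeof(tick_t) with a single object header for the whole buffer.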
There is also the allocation overhead: allocating one struct (object)
at a time vs. a whole array of them in a row. It really depends
on what exactly you're trying to do, but for simple, homogeneous data
types that's one of the advantages of a NumPy array over, e.g., a list.
- Robert
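[The per-object overhead being discussed is easy to measure in pure Python; the numbers below assume CPython and are platform-dependent.]

```python
import sys

import numpy as np

n = 100_000

# One Python float object per element, allocated one at a time:
# each object carries its own header on top of the 8-byte value.
as_list = [float(i) for i in range(n)]
list_bytes = sys.getsizeof(as_list) + sum(sys.getsizeof(x) for x in as_list)

# One contiguous allocation: 8 bytes per element plus a small fixed header.
as_array = np.arange(n, dtype=np.float64)
array_bytes = as_array.nbytes

print(list_bytes, array_bytes)  # the list costs several times more
```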