Re: Howto map a Python dictionary to a typed memory view

Andreas van Cranenburgh

Apr 25, 2013, 5:18:24 AM
to cython...@googlegroups.com


On Thursday, April 25, 2013 8:05:12 AM UTC+2, Martin Bammer wrote:
I've read the documentation for the typed memory views, but I have no clue how to map a dictionary, which contains addresses as keys and byte values as data, to a typed memory view.

I think it is something like:

cdef int[::view.indirect_contiguous, ::1] memory_view

But how do I map the contents of the dictionary to this view and how to read from this view?

You could do this, but I don't think you should want to. While the items of a dictionary are stored in an array, they are not stored contiguously. More importantly, the representation is not part of the public API; you're supposed to use the functions in the C-API (which Cython does). You can find out things like this by reading the Python C-API documentation [1] and the source code.

Depending on what you want to do, you could switch to a C++ unordered_map, use a C implementation of a hash table, or stick with Python dictionaries. Python dictionaries are well optimized, so the real question is whether this is your bottleneck (profile!). The items in Python dictionaries are not typed, but if you're after speed rather than type safety, you can cast the values you get out of dictionary lookups to avoid type checks.
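
To make the C++ unordered_map option concrete, here is a minimal sketch of an address-to-byte table. The alias MemoryImage, the helper read_byte, the 32-bit address type, and the 0xFF fallback for unmapped addresses are all assumptions made for this example; none of them come from the original question.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical alias: a sparse memory image mapping address -> byte value.
using MemoryImage = std::unordered_map<std::uint32_t, std::uint8_t>;

// Return the byte stored at `address`, or `fallback` if nothing is mapped there.
// find() is used instead of operator[] so that a lookup never inserts a key.
inline std::uint8_t read_byte(const MemoryImage& image, std::uint32_t address,
                              std::uint8_t fallback = 0xFF) {
    const auto it = image.find(address);
    return it != image.end() ? it->second : fallback;
}
```

Unlike a dict, both keys and values here are fixed-width C types, so lookups involve no Python objects.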

Sturla Molden

Apr 29, 2013, 5:55:31 AM
to cython...@googlegroups.com
If you have a C++11 compiler (e.g. GCC), the easiest solution is just to use STL containers. You will find STL containers that correspond to most Python types. For example:

str -> std::string
unicode -> std::wstring
list -> std::vector
tuple -> std::tuple
dict -> std::unordered_map
set -> std::unordered_set
collections.deque -> std::deque
Queue.Queue -> std::queue + std::mutex
heapq module -> std::priority_queue

(STL types are typed, and do not depend on the GIL.)
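
As an illustration of the "std::queue + std::mutex" pairing in the list above, here is a minimal sketch of a lock-protected queue. The class name LockedQueue and its put/try_get methods are hypothetical names chosen for this example. It only offers a non-blocking get; a blocking get like Queue.Queue.get() would additionally need something like std::condition_variable.

```cpp
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical minimal counterpart of Python's Queue.Queue (non-blocking only).
template <typename T>
class LockedQueue {
public:
    // Append an item; the lock is held only for the duration of the push.
    void put(T item) {
        std::lock_guard<std::mutex> lock(mutex_);
        items_.push(std::move(item));
    }

    // Move the oldest item into `out` and return true, or return false if empty.
    bool try_get(T& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.empty()) {
            return false;
        }
        out = std::move(items_.front());
        items_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::queue<T> items_;
};
```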

Sturla



On Apr 29, 2013, at 06:56, Martin Bammer <mrb...@gmail.com> wrote:

Yes of course, dictionaries are quite fast in Python, but the problem is the GIL. I would like to calculate the values in parallel with threads. This needs to avoid Python types.
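
One common way to compute values in parallel without locks is to copy the data into a contiguous buffer and give each thread its own disjoint slice of indices. The sketch below shows only the partitioning step; split_range is a hypothetical helper written for this example, and the thread creation itself (e.g. one std::thread or one prange chunk per slice) is left out.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Split the index range [0, n) into at most `parts` contiguous, non-overlapping
// half-open slices [begin, end). Slice lengths differ by at most one.
inline std::vector<std::pair<std::size_t, std::size_t>>
split_range(std::size_t n, std::size_t parts) {
    std::vector<std::pair<std::size_t, std::size_t>> slices;
    if (n == 0 || parts == 0) {
        return slices;
    }
    if (parts > n) {
        parts = n;  // never produce empty slices
    }
    const std::size_t base = n / parts;
    const std::size_t extra = n % parts;  // the first `extra` slices get one more item
    std::size_t begin = 0;
    for (std::size_t i = 0; i < parts; ++i) {
        const std::size_t end = begin + base + (i < extra ? 1 : 0);
        slices.emplace_back(begin, end);
        begin = end;
    }
    return slices;
}
```

Because no two slices share an index, workers that write only inside their own slice do not need to synchronize with each other.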





Chris Barker - NOAA Federal

Apr 29, 2013, 2:47:43 PM
to cython-users
On Mon, Apr 29, 2013 at 2:55 AM, Sturla Molden <stu...@molden.no> wrote:
If you have a C++11 compiler (e.g. GCC), the easiest solution is just to use STL containers. You will find STL containers that correspond to most Python types. For example:
<snip> 
 
unicode -> std::wstring

Is this at all the case anymore? My understanding is that a wstring is simply an array of wide "char"s (wchar_t, whose size depends on the platform: 2 bytes on Windows, typically 4 bytes on Linux) -- and encoding, interpretation, etc. are all up to you (or the libraries you're using).

whereas a Python unicode object really is unicode, and if you use the Python API you don't need to care about encoding, etc. -- it does all that for you. Internally:

Py2: unicode objects are stored internally in buffers of either two-byte or four-byte code units (UCS-2 or UCS-4), determined when Python is compiled.

Py3: The latest version supports multiple internal encodings depending on what data you actually have in the object -- very cool, but a bit hard to pass directly to any other library.

So: what you need to do is determine what encoding you want to use in your C or C++ code, choose an appropriate container (maybe std::string or std::wstring, or maybe something from a proper unicode library). Then in your Cython, encode the unicode object to a bytes object, and pass that bytes object off to your C/C++ datatype.

I sure wish it were easier, but it's just not.
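
To illustrate the point that a std::string is just bytes whose interpretation is up to you, here is a minimal sketch that counts code points in a string. It assumes the bytes are well-formed UTF-8 (for example, the result of calling .encode('utf-8') on the Python side); count_code_points is a hypothetical helper written for this example and does no validation.

```cpp
#include <cstddef>
#include <string>

// Count Unicode code points in `utf8`, ASSUMING it holds well-formed UTF-8.
// Continuation bytes have the bit pattern 10xxxxxx; every other byte starts
// a new code point.
inline std::size_t count_code_points(const std::string& utf8) {
    std::size_t count = 0;
    for (const unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For the UTF-8 encoding of "héllo", size() reports 6 bytes while this function reports 5 code points, which is exactly the kind of bookkeeping the Python unicode type hides from you.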

-Chris



--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris....@noaa.gov

Robert Bradshaw

Apr 30, 2013, 1:29:55 AM
to cython...@googlegroups.com
On Mon, Apr 29, 2013 at 11:47 AM, Chris Barker - NOAA Federal
<chris....@noaa.gov> wrote:
> On Mon, Apr 29, 2013 at 2:55 AM, Sturla Molden <stu...@molden.no> wrote:
>>
>> If you have a C++11 compiler (e.g. GCC), the easiest solution is just to
>> use STL containers. You will find STL containers that correspond to most
>> Python types. For example:
>
> <snip>
>
>>
>> unicode -> std::wstring
>
>
> Is this at all the case anymore? My understanding is that a wstring is
> simply an array of wide "char"s (2 or 4 bytes each, depending on the
> platform) -- and encoding, interpretation, etc. are all up to you (or the
> libraries you're using).
>
> whereas a Python unicode object really is unicode, and if you use the Python
> API you don't need to care about encoding, etc. -- it does all that for you.
> Internally:

I agree, wstring is almost certainly not what you want.

> Py2: unicode objects are stored internally in buffers of either two-byte or
> four-byte code units (UCS-2 or UCS-4), determined at compile time.
>
> Py3: The latest version supports multiple internal encodings depending on
> what data you actually have in the object -- very cool, but a bit hard to
> pass directly to any other library.
>
> So: what you need to do is determine what encoding you want to use in your C
> or C++ code, choose an appropriate container (maybe std::string or
> std::wstring, or maybe something from a proper unicode library). Then in
> your Cython, encode the unicode object to a bytes object, and pass that
> bytes object off to your C/C++ datatype.
>
> I sure wish it were easier, but it's just not.

It's better with this last release:
http://docs.cython.org/src/tutorial/strings.html#auto-encoding-and-decoding

It should be noted that if you're trying to avoid the GIL you may end
up having to do your own synchronization of the (not completely
thread-safe) STL containers.
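
A minimal sketch of what "doing your own synchronizing" could look like: every access to the map goes through one mutex. The class name LockedByteMap, its store/load methods, and the address/byte types are assumptions made for this example. A single coarse lock is the simplest option, not necessarily the fastest one.

```cpp
#include <cstdint>
#include <mutex>
#include <unordered_map>

// Hypothetical wrapper: all reads and writes of the map are serialized by one mutex.
class LockedByteMap {
public:
    // Insert or overwrite the byte stored at `address`.
    void store(std::uint32_t address, std::uint8_t value) {
        std::lock_guard<std::mutex> lock(mutex_);
        bytes_[address] = value;
    }

    // Copy the value into `out` and return true if `address` is present.
    bool load(std::uint32_t address, std::uint8_t& out) const {
        std::lock_guard<std::mutex> lock(mutex_);
        const auto it = bytes_.find(address);
        if (it == bytes_.end()) {
            return false;
        }
        out = it->second;
        return true;
    }

private:
    mutable std::mutex mutex_;  // mutable so that const readers can lock too
    std::unordered_map<std::uint32_t, std::uint8_t> bytes_;
};
```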

- Robert

Sturla Molden

Apr 30, 2013, 3:29:12 PM
to cython...@googlegroups.com
On Apr 30, 2013, at 07:29, Robert Bradshaw <robe...@gmail.com> wrote:

> I agree, wstring is almost certainly not what you want.


Ok. Never mind unicode. But the rest still applies:

str -> std::string
list -> std::vector
tuple -> std::tuple
dict -> std::unordered_map
set -> std::unordered_set
collections.deque -> std::deque
Queue.Queue -> std::queue + std::mutex
heapq module -> std::priority_queue
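
One caveat for the heapq mapping: std::priority_queue is a max-heap by default, while heapq is a min-heap, so std::greater is needed to get the same ordering. A minimal sketch, where MinHeap and heap_sorted are hypothetical names chosen for this example:

```cpp
#include <functional>
#include <queue>
#include <vector>

// Min-heap of ints: std::greater flips std::priority_queue's default max-heap order.
using MinHeap = std::priority_queue<int, std::vector<int>, std::greater<int>>;

// Return the input values in ascending order by building a heap from them and
// popping until it is empty, much like repeated heapq.heappop calls.
inline std::vector<int> heap_sorted(const std::vector<int>& values) {
    MinHeap heap(values.begin(), values.end());
    std::vector<int> result;
    result.reserve(values.size());
    while (!heap.empty()) {
        result.push_back(heap.top());
        heap.pop();
    }
    return result;
}
```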


Sturla
