Message from discussion Best Practices for passing numpy data pointer to C ?

Message-ID: <50140BE6.8040907@yahoo.no>
Date: Sat, 28 Jul 2012 17:57:26 +0200
From: Sturla Molden <sturlamol...@yahoo.no>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2
MIME-Version: 1.0
To: cython-users@googlegroups.com
Subject: Re: [cython-users] Re: Best Practices for passing numpy data pointer
 to C ?
References: <CALGmxEJOLLx8VtNtK3jBt-AJ=kNU5i1kbEqgyTv9OErBgOG...@mail.gmail.com> <4666d921-974c-4236-83b3-5dd6ed98d09c@googlegroups.com> <CALGmxE+oftOv4sN6PGE=sp+sVLnmarLTEopEF85QCof9_GS...@mail.gmail.com> <5012E12A.7040...@yahoo.no> <070ba075-6fec-4f47-a293-a753ca4a1...@email.android.com> <50131250.9050...@yahoo.no> <CAMKS98_9KphiTfmKb_6qFh7uXLCAW55+21PN87DYgcj+kpZ...@mail.gmail.com> <5013F9E6.3080...@yahoo.no> <501402E1.7010...@yahoo.no> <CAMKS98_PuTddqs4SGMLLyODB9KR2VnSAz7bge_9LzCHKMgb...@mail.gmail.com>
In-Reply-To: <CAMKS98_PuTddqs4SGMLLyODB9KR2VnSAz7bge_9LzCHKMgb...@mail.gmail.com>
Content-Type: multipart/alternative;
 boundary="------------070405010601070103070805"

This is a multi-part message in MIME format.
--------------070405010601070103070805
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

I finally managed to make git/github work...
https://github.com/sturlamolden/memview_benchmarks

You got a pull request. I'm not sure if you already updated your code.

I'm very happy with the speed of memoryviews too, particularly slicing. 
The slowness of slicing np.ndarray was the reason I never could use 
Cython+NumPy instead of Fortran 95.

I now want to see a more realistic benchmark. I'm not sure whether porting 
Scimark would be too much work. Ideally I would like to compare these on a 
set of real-world problems:

Python
Python with NumPy
C
C++ using STL
Fortran 77
Fortran 95
Cython with memoryviews
Java (perhaps)
C#.NET (perhaps)
MATLAB (perhaps)

Or perhaps we could use the Debian shootout?


Sturla



On 28.07.2012 17:39, Jake Vanderplas wrote:
> Sturla,
> Thanks for looking at this.  I'm still learning the details of 
> optimizing memviews - these are very impressive benchmarks!  I've 
> updated my github repository with your changes:
> https://github.com/jakevdp/memview_benchmarks
> Thanks
>    Jake
>
> On Sat, Jul 28, 2012 at 8:18 AM, Sturla Molden <sturlamol...@yahoo.no> wrote:
>
>     I found another issue: the memoryview slices were not declared
>     contiguous. Fixing this reduced the runtime from 1.86 to 1.83 seconds.
>     That puts the overhead of using memoryview slices at 2.2%
>     compared to raw C pointer arithmetic. The benchmark creates two
>     million memoryview slices and computes one million dot products,
>     each with vector lengths of 1000. I am more than willing to accept
>     those 2.2% to avoid those pesky pointers, but it remains to be
>     seen how memoryviews perform on a more realistic problem.
>
>     Sturla
>
>
>
>
>     On 28.07.2012 16:40, Sturla Molden wrote:
>
>
>             I prepared some quick-and-dirty benchmarks of the behavior
>             I need at https://github.com/jakevdp/memview_benchmarks/
>             -- I'd be interested if people more familiar with
>             memory-views could take a look and let me know if I'm
>             missing anything there.
>                Jake
>
>
>
>         I took the liberty of updating your benchmarks (see attachment).
>         For example, I noticed that GCC was clever enough to optimize
>         out all the loops in your pointer_arith.pyx...
>
>         Here are the timings I got from the updated version in the
>         attachment. I think this gives the correct picture:
>
>         D:\memview-benchmarks\new>python runme.py
>         numpy_only: 6.86 sec
>         cythonized_numpy: 5.74 sec
>         cythonized_numpy_2: 10.4 sec
>         cythonized_numpy_2b: 6.25 sec
>         cythonized_numpy_3: 2.43 sec
>         cythonized_numpy_4: 1.78 sec
>         pointer_arith: 1.79 sec
>         memview: 1.86 sec
>
>         There is a table in the attached PDF that should be easier to
>         read.
>
>         The overhead in the numpy versions comes from slicing the
>         ndarray. In comparison, slicing the memoryview has a very
>         small overhead. If we slice the ndarray in Cython, this is not
>         much better than just using plain numpy in Python. But if we
>         use memoryviews, slicing is just a little bit slower than
>         using C-style pointer arithmetic.
>
>         And consider this: numerical code using array slicing in
>         Fortran 90 with gfortran is often 2x slower than the same code
>         using pointer arithmetic in C with GCC, at least in my
>         experience. (Fortran 77 is another matter.)
>
>         If you wonder why using np.dot was faster than writing out the
>         loop in Cython, that is due to Intel MKL in Enthought.
>
>         Conclusion:
>
>         Memoryviews are extremely fast, comparable to pointer
>         arithmetic in C.
>
>         Now we need a real benchmark, e.g. some linear algebra solver
>         or an FFT. Something like Scimark perhaps. Cython vs. C vs.
>         Fortran 90.
>
>         Sturla

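The overhead figures quoted above can be checked directly from the reported
timings. What follows is only a minimal sketch in plain Python: the numbers
are copied from the quoted runme.py output, and the helper name
overhead_percent is made up for illustration.

```python
# Timings in seconds, copied from the quoted benchmark output.
pointer_arith = 1.79        # raw C pointer arithmetic
memview = 1.86              # memoryview slices, not declared contiguous
memview_contiguous = 1.83   # memoryview slices declared contiguous


def overhead_percent(t, baseline):
    """Relative slowdown of t against baseline, in percent."""
    return 100.0 * (t - baseline) / baseline


before = overhead_percent(memview, pointer_arith)
after = overhead_percent(memview_contiguous, pointer_arith)

print(f"memview vs. pointers:            {before:.1f}%")
print(f"contiguous memview vs. pointers: {after:.1f}%")
```

With these inputs the second figure rounds to the 2.2% mentioned in the
quoted message, and the first to about 3.9%.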

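The benchmark itself uses Cython typed memoryviews, where a 1-D slice is
declared contiguous as double[::1]. That code needs a compiler, so the sketch
below uses Python's built-in memoryview from the standard library as a
stand-in. It is not the benchmark code and says nothing about speed; it only
illustrates, under that assumption, that slicing a view does not copy the
data and that a strided slice is no longer contiguous. The sizes and the dot
helper are hypothetical.

```python
from array import array


def dot(a, b):
    """Plain-Python dot product over two equal-length buffers."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical sizes; the quoted benchmark used vectors of length 1000.
n_rows, n_cols = 4, 3
data = array("d", range(n_rows * n_cols))  # flat buffer of doubles
view = memoryview(data)

# Slicing the view creates new views onto the same memory, no copy.
row0 = view[0:n_cols]
row1 = view[n_cols:2 * n_cols]
result = dot(row0, row1)

# A write through the original buffer is visible through the slice.
data[0] = 10.0
shares_memory = row0[0] == 10.0

# A unit-stride slice stays contiguous, a strided one does not.
strided = view[::2]
print(result, shares_memory, row0.c_contiguous, strided.c_contiguous)
```

In Cython the contiguity is a compile-time declaration rather than a runtime
attribute, which is what lets the generated C index the buffer without
multiplying by a stride.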
--------------070405010601070103070805--