A compiler infrastructure for data visualization


Cyrille Rossant

Jul 25, 2015, 5:07:57 AM
to Vispy dev list
Hi all,

I finally took the time to write down my thoughts about the state of
VisPy, WebGL, and Vulkan in a somewhat thought-provoking blog post.
Feel free to have a look:
http://cyrille.rossant.net/compiler-data-visualization/

There are some radical (some might say crazy) ideas in there;
nevertheless I'd love to hear your thoughts and trigger the
discussion!

Best,
Cyrille

Eric Larson

Jul 27, 2015, 8:53:19 AM
to visp...@googlegroups.com
Those are interesting ideas. Have you benchmarked how much overhead is due to multiple GL calls for objects that could in principle be collectionized? I've assumed the slowdown was almost entirely due to extra GL calls, not Python overhead, but you mention it might be the Python overhead causing slowdowns. I'd be curious to find out. I guess one simple way would be to write a simple GLSL visualization in C and in Python, and benchmark a "collectionized" version versus making 1e1, 1e2, ..., 1e5 calls to do the same amount of drawing in each language.
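For what it's worth, the Python side of such a benchmark can be sketched in a few lines. Everything below is made up for illustration: `fake_draw` is a no-op stub standing in for the real GL draw call, so this isolates the pure Python call overhead (swap in an actual `glDrawArrays`-style call to measure the GL side instead):

```python
import timeit

def fake_draw():
    """No-op stub standing in for a real GL draw call; replace with an
    actual GL call to separate GL cost from Python call cost."""
    pass

def draw_scene(n_calls):
    # Same scene, split across n_calls draw calls: a "collectionized"
    # scene would be n_calls=1, a naive one n_calls=number of objects.
    for _ in range(n_calls):
        fake_draw()

# Compare 1 collectionized call per frame vs. many per-object calls.
for n in (1, 10, 100, 1000, 10000):
    t = timeit.timeit(lambda: draw_scene(n), number=100)
    print(f"{n:>5} calls per frame: {t * 10:.3f} ms/frame")
```

Repeating the same loop in C (or JS) against the same GL calls would then separate the language overhead from the driver overhead.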

Eric




Cyrille Rossant

Jul 27, 2015, 8:55:45 AM
to Vispy dev list
I think Almar has done these benchmarks recently...?

Almar Klein

Jul 27, 2015, 10:23:37 AM
to visp...@googlegroups.com
http://www.almarklein.org/gl_collections.html

Most overhead is on the Python side, and remarkably, the overhead is
much smaller on Windows. I think it shows that if you're not using
Python, but JS or C++, you do not need collections (unless you're crazy
like Luke and plot 2000 plots at the same time, but that example
probably warrants a special LineCollectionVisual).

Luke Campagnola

Jul 29, 2015, 12:07:50 PM
to visp...@googlegroups.com
Nice, provocative article :)

I completely agree that OpenGL makes our lives needlessly difficult,
and abandoning the shader system is bound to bring peace to the world.

> This is not more complicated than creating a NumPy universal function (ufunc) in pure Python with Numba: it's really the same idea of stream processing, but in a context of data visualization. You describe how your data is stored on the GPU, and how it's converted to vertices and pixels.

Spot on. I've been thinking a lot about this lately. We need a more
generic way to represent functions and expressions that can be
recompiled between platforms. That might be my next focus after the
Big PR goes through.


However, I have some gripes on the near-term issues:

> I'm now thinking that the whole "pure Python" idea is a bit overrated.

Pure Python is important up until you have a community large enough to
take care of the packaging and distribution for you. Anaconda itself
does not fix this unless CA has taken on the burden of packaging. I
would love to have C-compiled code to boost performance, but I don't
think we're there yet...

> What does "pure Python" even mean, really?

It means users don't need a C compiler, and we don't need to compile
across the full matrix of platforms.
It means that VisPy is more likely to just work on many platforms,
without hassle (we already have enough OpenGL issues to deal with
across platforms).

> Because it is in pure Python, there have always been significant performance issues.

I'm not sure that's accurate. I've spent a lot of time profiling
VisPy, and the performance issues almost always appear where we are
doing unnecessary work--rebuilding large structures when we should be
modifying the existing structures. It just takes a lot more work to
implement it the right way, and we cut corners in many places in order
to have a functional prototype. Throwing C++ or LLVM at this problem
isn't going to automatically solve it, especially given how much work
we have left to do in Python. As we continue to streamline the
codebase, though, we will definitely come to places that no longer
benefit from optimization in Python. Some parts of gloo might already
be there.
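The rebuild-vs-modify point can be illustrated with a toy example (the names below are made up; in VisPy terms, the in-place path would roughly correspond to updating an existing gloo buffer rather than recreating it):

```python
import numpy as np

n = 1_000_000
# Persistent vertex data: x in column 0, y in column 1.
positions = np.column_stack([
    np.arange(n, dtype=np.float32),
    np.zeros(n, dtype=np.float32),
])

def rebuild(new_y):
    # Anti-pattern: reallocate the whole structure on every update.
    return np.column_stack([np.arange(len(new_y), dtype=np.float32),
                            new_y])

def update_in_place(buf, new_y):
    # Cheaper: mutate the existing array; on the GPU side this would
    # map to uploading only the changed region of the buffer.
    buf[:, 1] = new_y
    return buf

new_y = np.random.rand(n).astype(np.float32)
positions = update_in_place(positions, new_y)
```

Profiling both paths in a tight update loop makes the difference obvious long before any C++ enters the picture.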

> This means we cannot support recent OpenGL features like geometry shaders, tessellation shaders, or compute shaders.

Did you mean we cannot support them on the browser or on the desktop?
It has always been the plan to support them wherever the OpenGL
implementation allows it, and to fall back to slower techniques when
necessary.


On Sat, Jul 25, 2015 at 5:07 AM, Cyrille Rossant
<cyrille...@gmail.com> wrote:

Luke Campagnola

Jul 29, 2015, 12:14:47 PM
to visp...@googlegroups.com
On Mon, Jul 27, 2015 at 7:23 AM, Almar Klein <almar...@gmail.com> wrote:
> http://www.almarklein.org/gl_collections.html
>
> Most overhead is on the Python side, and remarkably, the overhead is much
> smaller on Windows. I think it shows that if you're not using Python, but JS
> or C++, you do not need collections (unless you're crazy like Luke and plot
> 2000 plots at the same time, but that example probably warrants a special
> LineCollectionVisual).

There's a lot of speculation going on here... It might very well be the
case that the performance differences stem from OpenGL / Angle /
WebGL, but this is confounded by the mixture of Python and JS in the
measurements (it wouldn't surprise me to find that Python is a real
bottleneck, but it's important not to optimize until you know for
sure).

What we need are some control measurements: how long does it take to
execute various GL calls directly in Python+OpenGL, Python+Angle, and
JS (all sans gloo). Without that it is difficult to determine the
actual cause of the performance differences.
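As a starting point, the per-call cost of crossing the ctypes boundary can be measured with no GL at all. This is only a sketch: it times a trivial libc math call against an equivalent pure-Python call, with libm standing in for the GL library (the fallback library name assumes a Unix-like system):

```python
import ctypes
import ctypes.util
import timeit

# Load the C math library as a stand-in for an OpenGL library.
libm = ctypes.CDLL(ctypes.util.find_library("m") or "libm.so.6")
libm.pow.argtypes = [ctypes.c_double, ctypes.c_double]
libm.pow.restype = ctypes.c_double

def py_pow(x, y):
    return x ** y

n = 100_000
t_ctypes = timeit.timeit(lambda: libm.pow(2.0, 10.0), number=n)
t_py = timeit.timeit(lambda: py_pow(2.0, 10.0), number=n)
print(f"ctypes: {t_ctypes / n * 1e9:.0f} ns/call, "
      f"pure Python: {t_py / n * 1e9:.0f} ns/call")
```

Running the same measurement on Windows and Unix would directly test Almar's ctypes-binding hypothesis, independently of gloo.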

Cyrille Rossant

Jul 29, 2015, 12:30:52 PM
to Vispy dev list
> However, I have some gripes on the near-term issues:

What do you mean by "near-term"? The whole article proposes a really
long-term view.

> Pure python is important up until you have a community large enough to
> take care of the packaging and distribution for you. Anaconda itself
> does not fix this unless CA has taken on the burden of packaging. I
> would love to have C-compiled code to boost performance, but I don't
> think we're there yet..

In my lab we've been working on a cross-platform matrix build system,
and it is not that complicated. It does require some upfront work and
investment, but then you get a system that builds conda packages
automatically for you at every commit (or PR, or release...). Note
that this is made relatively easy thanks to conda and binstar.

I agree that pure Python leads to very straightforward packaging and
distribution, but I don't think it's a sufficient reason to justify
the lock-in into Python. I think other platforms like R, Julia, or
even MATLAB could potentially benefit from VisPy, and that will never
happen if the entire codebase is in Python. But, sticking with pure
Python totally makes sense in the first few years (say, 5 years) of
the project.

> I'm not sure that's accurate. I've spent a lot of time profiling
> VisPy, and the performance issues almost always appear where we are
> doing unnecessary work--rebuilding large structures when we should be
> modifying the existing structures. It just takes a lot more work to
> implement it the right way, and we cut corners in many places in order
> to have a functional prototype. Throwing C++ or LLVM at this problem
> isn't going to automatically solve it, especially given how much work
> we have left to do in Python.

To be clear, I wasn't proposing throwing C++ or LLVM at the current
approach, and, as you say, it would make absolutely no sense at all.
Rather, the proposed idea is radically different, and in a sense more
limited in that you probably lose some dynamic aspects of
visualizations. For example, I am not proposing to implement a dynamic
scene graph at all, at least not in a first approach. I think you can
get a good deal of interesting visualizations without a dynamic scene
graph.

In other words, if VisPy can cover 100% of the use cases, the system I
propose would only cover 90% or something. But I expect it to be much
less complex, in a sense. Again, it's all rather speculative, and the
only way to know would be to experiment with the idea once Vulkan is
released. That's something I'd like to do, and it can be done
completely independently from the normal development of VisPy.

> Did you mean we cannot support them on the browser or on the desktop?
> It has always been the plan to support them wherever the OpenGL
> implementation allows it, and to fall back to slower techniques when
> necessary.

I meant on the desktop. I know it's the plan, but no one has ever done
it as far as I know? It might not be that simple; for example, I guess
you'd have to bypass GLIR somehow and resort to raw OpenGL...

Almar Klein

Jul 29, 2015, 5:22:19 PM
to visp...@googlegroups.com


>> Most overhead is on the Python side, and remarkably, the overhead is much
>> smaller on Windows. I think it shows that if you're not using Python, but JS
>> or C++, you do not need collections (unless you're crazy like Luke and plot
>> 2000 plots at the same time, but that example probably warrants a special
>> LineCollectionVisual).
>
> There's a lot of speculation going on here.. It might very well be the
> case that the performance differences stem from OpenGL / Angle /
> WebGL, but this is confounded by the mixture of Python and JS with the
> measurements (it wouldn't surprise me to find that Python is a real
> bottleneck, but I it's important not to optimize until you know for
> sure).

If anything, I would expect the extra Angle layer to reduce
performance, yet we see the opposite. My guess is that it's either the
GL driver itself or the way the ctypes binding works on Windows vs.
Unix. But because the performance of JS-on-Win and Python-on-Win is
rather similar, I suspect the former has the bigger influence.

> What we need are some control measurements: how long does it take to
> execute various GL calls directly in Python+OpenGL, Python+Angle, and
> JS (all sans gloo). Without that it is difficult to determine the
> actual cause of the performance differences.

That's true; in Python, gloo is not really a thin layer anymore.


Almar Klein

Jul 29, 2015, 5:28:53 PM
to visp...@googlegroups.com

> In my lab we've been working on a cross-platform matrix build system,
> and it is not that complicated. It does require some upfront work and
> investment, but then you get a system that builds conda packages
> automatically for you at every commit (or PR, or release...). Note
> that this is made relatively easy thanks to conda and binstar.

As much as I like conda, I agree with Luke here; not everyone uses
(ana)conda. And there are people who use PyPy, or a Raspberry Pi, or an
OS for which there are no conda packages. If we ever want to move to
mobile, we can probably do it relatively easily via Kivy *if* VisPy is
pure Python.

Alexander Taylor

Jul 29, 2015, 6:24:27 PM
to vispy-dev, almar...@gmail.com
Just to note about the mobile thing: I got VisPy working on Android a few days ago. There's a screenshot at http://imgur.com/ojJPWqd ; it looks pretty much as one would hope. But actually, for what it's worth, the python-for-android tools handle compiled C/C++ very well if VisPy does eventually go in that direction (and Kivy itself uses a lot of Cython).

Almar Klein

Jul 29, 2015, 6:44:20 PM
to Alexander Taylor, vispy-dev
> Just to note about the mobile thing, I got vispy working on android a
> few days ago. There's a screenshot at http://imgur.com/ojJPWqd ; it
> looks pretty much as one would hope.

Dude, that's awesome! So how does this work? Did you modify glir.py to
let it make use of Kivy's GL API? Or does ctypes just work?


> But actually, for what it's worth
> the python-for-android tools handle compiled C/C++ very well if Vispy
> does eventually go in that direction (and Kivy itself uses a lot of cython).

Ok, that's good to know. Thanks for correcting me there :)

Alexander Taylor

Jul 29, 2015, 7:17:35 PM
to Almar Klein, vispy-dev
I've submitted a new thread to vispy-dev about it, to avoid derailing
this one: https://groups.google.com/forum/#!topic/vispy-dev/bItKmo9AgcA

Luke Campagnola

Jul 30, 2015, 5:18:42 AM
to visp...@googlegroups.com
On Wed, Jul 29, 2015 at 12:30 PM, Cyrille Rossant
<cyrille...@gmail.com> wrote:
>> However, I have some gripes on the near-term issues:
>
> What do you mean by "near-term"? The whole article proposes a really
> long-term view.
>
>> Pure python is important up until you have a community large enough to
>> take care of the packaging and distribution for you. Anaconda itself
>> does not fix this unless CA has taken on the burden of packaging. I
>> would love to have C-compiled code to boost performance, but I don't
>> think we're there yet..
>
> In my lab we've been working on a cross-platform matrix build system,
> and it is not that complicated. It does require some upfront work and
> investment, but then you get a system that builds conda packages
> automatically for you at every commit (or PR, or release...). Note
> that this is made relatively easy thanks to conda and binstar.

Have you written this up somewhere? I would be thrilled if we can get
a pipeline like this running for vispy. Particularly if Almar's
analysis turns out to be correct (which is very likely) and we decide
to reimplement vispy.gloo in C++ / cython / other. Here's a related
question: does anyone know if cython-compiled code can be loaded like
a normal shared object / DLL in other languages? If that turns out to
be difficult, then we might want to consider going straight to C++.


> I agree that pure Python leads to very straightforward packaging and
> distribution, but I don't think it's a sufficient reason to justify
> the lock-in into Python. I think other platforms like R, Julia, or
> even MATLAB could potentially benefit from VisPy, and that will never
> happen if the entire codebase is in Python. But, sticking with pure
> Python totally makes sense in the first few years (say, 5 years) of
> the project.

Agreed!

>> Did you mean we cannot support them on the browser or on the desktop?
>> It has always been the plan to support them wherever the OpenGL
>> implementation allows it, and to fall back to slower techniques when
>> necessary.
>
> I meant on the desktop. I know it's the plan, but no one has ever done
> it as far as I know? It might not be that simple; for example, I guess
> you'd have to bypass GLIR somehow and resort to raw OpenGL...

I don't think it will be terribly complicated. We need to implement
some new program types and VAOs in gloo, which will be most of the
effort. The shader system will also need some minor changes to allow
more than 2 types of shader. Aside from that, the rest of the effort is
in the visuals, and we don't really need to touch those at first; the
point is just to make sure this is an option when someone tells us
they'd like to implement X fancy-new rendering algorithm they saw at
SIGGRAPH.

On the other hand, maybe it makes more sense to skip over OpenGL 3/4
and focus our effort on Vulkan..

Cyrille Rossant

Jul 30, 2015, 6:37:58 PM
to Vispy dev list, Max Hunter
> Have you written this up somewhere? I would be thrilled if we can get
> a pipeline like this running for vispy

No, but we definitely should. CC Max.

> I don't think it will be terribly complicated. We need to implement
> some new program types and VAOs in gloo, which will be most of the
> effort. The shader system will also need some minor changes to allow
> more than 2 types of shader. Aside from that, the rest of the effort is
> in the visuals, and we don't really need to touch those at first; the
> point is just to make sure this is an option when someone tells us
> they'd like to implement X fancy-new rendering algorithm they saw at
> SIGGRAPH.
>
> On the other hand, maybe it makes more sense to skip over OpenGL 3/4
> and focus our effort on Vulkan..

Vulkan has support for all the latest GPU techniques, including all
sorts of shaders...

I forgot to say something about C/Python. I think you could stay in
pure Python even with a C++ core, using ctypes/cffi instead of Cython.
Instead of wrapping OpenGL with ctypes, you'd wrap a C++ lib with
ctypes too. What would remain in pure Python would essentially be the
scene graph system, I guess. The point is that compiling a C++ lib
might be easier than compiling a CPython extension...
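The wrapping pattern itself is straightforward. As a sketch (the compiled gloo core is hypothetical, so libc's `strlen` stands in for it here; this assumes a Unix-like system), declaring `argtypes`/`restype` is essentially the whole binding:

```python
import ctypes
import ctypes.util

# Stand-in for loading a compiled core, e.g. a hypothetical
# "libvispy_gloo" shared library; we load libc so the example runs.
libc = ctypes.CDLL(ctypes.util.find_library("c") or "libc.so.6")

# Declaring the signature is the entire "binding": no C compiler
# is needed on the user's machine, only the prebuilt shared library.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

print(libc.strlen(b"vispy"))  # 5
```

The same pattern works for a C-compatible API exported from a C++ library, which is why the Python layer on top can stay pure Python.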

alou...@umich.edu

Jan 22, 2016, 3:08:02 PM
to vispy-dev, m...@huntershome.org
By the way, I have recently explored interfacing python with C++ code (I seem to try it about once a year). The best lightweight option I found was pybind11 (http://pybind11.readthedocs.org/en/latest/). An example of this interface is in the docs, or in my test project (https://github.com/antonl/varpro-blocks/blob/master/src/varpro_module.cpp). I have a hacked together CMake script there too.

It seems to do a lot less magical stuff than Cython, and is more elegant than doing ctypes (in my opinion, anyway). There are some complications with using it on Windows, though: you need Visual Studio 2015 to use C++11. This is a problem because of the way Python is compiled on Windows. If you're using Python 3.5, though, it's not a big deal, and you're going to have this problem anyway if you're using C++11 on Windows.

I've found that a build system involving (say) CMake makes compiling a CPython extension about the same as compiling a lib. You also get dependency management for free, and it is (mostly) cross-platform. You can also build out a python-independent library for other languages and a python module from the same-ish code.

> Here's a related 
> question: does anyone know if cython-compiled code can be loaded like 
> a normal shared object / DLL in other languages? If that turns out to 
> be difficult, then we might want to consider going straight to C++. 

In principle, because Cython just creates a DLL with the proper Python module hooks, it should be possible to load it in other languages, but I suspect the build-out option is better. The relevant documentation for Python modules is here (https://docs.python.org/3.5/extending/extending.html). Basically, you just need a module init function (PyInit_<name> in Python 3) with the right name. The code would be available to call from other languages, but I'm not sure it would make sense to do all of the type conversion back and forth between Python types and C types.

- Anton