I'm one of the core developers of Cython, a tool for translating Cython
code (a blend of Python and C) into C code using the CPython API,
loadable into the standard CPython interpreter. It's become somewhat of
a de facto standard in the scientific Python community when Python
itself isn't enough.
I recently learned about Julia -- and so far I love it! Congratulations!
To some of the Cython developers and users, Cython has been something of
a stop-gap solution. It gets the job done, but it's not pretty. So
while I have invested lots of time in Cython and have a heavy emotional
investment as well, I still wish for Cython to be swept away -- or the
need for it to disappear.
What I hope can be done here is to avoid walled gardens. We already have
R and Python as somewhat separate open source gardens (although Python
folks will occasionally call into R) and it's very destructive. Open
source computation doesn't have more cycles than we need!
I believe Julia and Python are a match made in heaven:
Python has:
- an enormous number of libraries
- a pretty large userbase in scientific computing and engineering
- some enthusiastic developers/companies/foundations wishing to drive
the platform further for scientific computing
But, there's also
- a crappy interpreter speed-wise (no JIT), and the best bet for a
better one, PyPy, has its own problems
- a CPython development community who simply doesn't "get" scientific
computing (not their problem at all, but it can cause problems for us).
And Julia appears to have exactly a) a nice language geared for
performance, b) the JIT technology that we lack, c) developers who "get it".
It's a common theme when we scientific Python users talk that we don't
really use Python for the *language*. We use it for the community and
the libraries. This is why I'm afraid to jump to Julia outright (and
won't do it) -- it's way too easy to underestimate the pain of building
the userbase and the thousands of libraries for every little task. Just
consider the amount of pain it is for the Python community to make the
transition from Python 2 to Python 3!
So I believe both Julia and "SciPy" (as a term for the scientific Python
community) would gain hugely by a close cooperation. And I don't think
this is an arbitrary match at all! -- I believe Python to be the
best-of-breed scientific ecosystem (in competition with R), and it
looks like Julia may become the best-of-breed scientific language and
interpreter.
Here's an example roadmap. I'm not saying anything about what *you*
should do, it's just food for thought. Perhaps this is more about what
Python users or Cython developers should do. I just want to get the
conversation started. (Also there's a discussion going on on the
Cython-dev list *right now* that's sort of relevant, more below.)
1) Make sure Julia is easily callable from Python.
Scientific Python users are quite used to diving into other languages
(earlier C and C++, now more and more Cython) to get speedups. Julia
should be one of those choices, and it should be well enough done that
Julia gets promoted by core scientific Python teachers (instead of
Cython as is often the case today).
2) Make sure Python can be transparently called from Julia. Even ship
Python with Julia! You don't need to reimplement all the libraries under
the sun in Julia -- it may be faster, it may be purer, but using
existing libraries gets the job done now, and the important ones can
always be ported to Julia later for speed.
This is not a one-way transaction -- once there's a single non-trivial
but useful piece of code implemented in Julia only (and under a non-GPL
open source license), you can bet that scientific Python distributions
will start to bundle it.
Technology:
CPython has a nice C API. If Julia can both be called from and call into
C (fast!), an "interpreter-level" binding will not be a problem.
On the Cython list we're currently discussing a proposal to annotate
Python objects with C-level function signatures, so that when native
functions are "boxed" and passed through Python, you can unbox the
function pointer instead of boxing and unboxing the arguments.
This could be used to make sure Julia routines are callable *without*
argument boxing and unboxing from Cython, and that scipy routines, such
as ODE solvers etc. (implemented in C or Fortran) can likewise call back
into callbacks written in Julia without argument boxing/unboxing.
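
The lookup a consumer (say, an ODE solver) would do before choosing a call path can be sketched in Python with ctypes. This is only an illustration under stated assumptions, not the CEP 1000 design: the `NativeCallable` class, its `signature` attribute and the `"d->d"` encoding are invented for this sketch, and a pure-Python caller still boxes its arguments when it goes through ctypes. Only a C-level consumer would actually skip the boxing.

```python
import ctypes

# Hypothetical encoding of "double f(double)"; the real spec may differ.
SIG_D_D = "d->d"
C_DOUBLE_FN = ctypes.CFUNCTYPE(ctypes.c_double, ctypes.c_double)


class NativeCallable:
    """A Python callable that also advertises a C function pointer."""

    def __init__(self, c_func, signature):
        self.c_func = c_func        # ctypes function pointer
        self.signature = signature  # advertised C-level signature

    def __call__(self, x):
        # Ordinary Python call path (arguments arrive boxed).
        return self.c_func(x)


def integrate(f, a, b, n=1000):
    """Midpoint rule; picks the native pointer if the signature matches."""
    if getattr(f, "signature", None) == SIG_D_D:
        call = f.c_func  # fast path: use the advertised pointer
    else:
        call = f         # fallback: generic Python call
    h = (b - a) / n
    return h * sum(call(a + (i + 0.5) * h) for i in range(n))


# The lambda stands in for a routine compiled elsewhere (e.g. by Julia).
square = NativeCallable(C_DOUBLE_FN(lambda x: x * x), SIG_D_D)
area = integrate(square, 0.0, 1.0)
```

The same `integrate` also accepts a plain Python function, which is the point: objects without a matching signature simply take the generic path.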
Spec draft:
http://wiki.cython.org/enhancements/cep1000
Best point of entry for ongoing discussion:
http://article.gmane.org/gmane.comp.python.cython.devel/13557
Finally, for some light background reading, here are some comments on the
clashes scientific Python has had with PyPy, and the dangers to be avoided:
http://technicaldiscovery.blogspot.com/2011/10/thoughts-on-porting-numpy-to-pypy.html
http://blog.streamitive.com/2011/10/19/more-thoughts-on-arrays-in-pypy/
Any and all comments appreciated.
Dag Sverre Seljebotn
A great starting point would be for someone to take a crack at embedding a Python interpreter in a Julia process using ccall (I might do this because it looks fun and easy :-). I think that calling Python from Julia is the more fruitful direction anyway because there's obviously far more Python code out there that could be useful to call from Julia than there is Julia code that might be useful to call from Python. It's also the easier direction to get working immediately.
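
As a rough sketch of what such an embedding boils down to, the snippet below makes one of the C API calls an embedding host would make, `PyRun_SimpleStringFlags`, through `ctypes.pythonapi`. It runs inside an existing Python process, so the `Py_Initialize()` and `Py_Finalize()` calls that a real host (such as a Julia process using ccall) would need are omitted; the helper name `run_python` and the variable `embed_demo_value` are made up for illustration.

```python
import ctypes

# ctypes.pythonapi exposes the C API of the running CPython interpreter.
api = ctypes.pythonapi
api.PyRun_SimpleStringFlags.argtypes = [ctypes.c_char_p, ctypes.c_void_p]
api.PyRun_SimpleStringFlags.restype = ctypes.c_int


def run_python(source):
    """Run source in the __main__ module; 0 means success, -1 an error."""
    return api.PyRun_SimpleStringFlags(source.encode("utf-8"), None)


# An embedding host would pass exactly this kind of C string.
status = run_python("embed_demo_value = 2 + 2")
```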
I also think it's about more than the tools -- the scientific Python
community is, I believe, what comes closest to prior art for what you
want to achieve. High-level scientific computing but conscious about
performance.
For instance, packaging is something SciPy people have probably
spent hundreds of hours thinking about. Of course, Python packaging
is broken in a hundred ways -- but you should talk to people who know
about that (e.g., David Cournapeau) just to be warned about how NOT to
do it and what you should avoid.
Perhaps some Julia devs could come and give a talk about Julia at a
SciPy conference this summer? (provided the committee find that on topic
enough...I'd hope so).
> We want to be able to compile Julia code into C-callable shared
> libraries, but that's going to take a fair amount of work; this would
> make it as easy for Python to call Julia code as it is to call C code.
> I'm not sure how much boilerplate and magic incantation is required for
> calling shared libraries from Python, but from what I recall it's pretty
> minimal. Whatever incantations there are can certainly be easily
> assisted with from the Julia side when generating libraries.
Hopefully it could be made totally transparent; at least that's my hope.
The set of language idioms/semantics that can sanely be mapped between
Julia and Python is much larger than what C provides.
The reason I stress this is because I believe your payoff would be
rather large -- it's *much* less work to create a Python bridge than
port over all the web frameworks, networking libraries, machine learning
libraries, and whatnot. Believe me -- a Python bridge is easy :-)
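
For reference, the "incantation" for calling a C-callable shared library from Python is roughly the following. No Julia-generated library exists to load here, so this sketch uses libpython itself (via `ctypes.pythonapi`) as a stand-in; for any other library one would load it with `ctypes.CDLL(path)` and declare the functions the same way.

```python
import ctypes

# Stand-in for ctypes.CDLL("path/to/library"): the already loaded libpython.
lib = ctypes.pythonapi

# The boilerplate: declare argument and return types of the C function.
get_version = lib.Py_GetVersion
get_version.argtypes = []
get_version.restype = ctypes.c_char_p

# After that, the C function is called like any Python function.
version = get_version().decode("utf-8")
```

Everything beyond the two type declarations is what a tool on the Julia side could generate automatically.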
> I'm not sure what's necessary to call Python from Julia. I've never
> embedded a python interpreter in a C process, but it should be basically
> the same exact thing: just write ccall code that does whatever the
> embedding C program would do. Making sure data structures are compatible
> is a trickier proposition, but fortunately the lingua franca of basic C
> data types (i.e. the stuff that Fortran 77 can do) gets you pretty
> damned far — that's actually all our ccall interface even supports at
> this point. Passing native Python data structures to and from Julia is,
> I suspect, never going to work, but passing NumPy array memory around
> might very well, precisely because the layout is C/Fortran compatible.
Well, you can just use proxy objects, a "PyObject" type in Julia. The
proxy aspect would hopefully mostly disappear through macros/JITing, and
you're left with basically what Cython does. Using Python libraries
through Julia could easily be *faster* than through the Python
interpreter (unless your FFI to C is something silly).
The structures of Python objects are *very* easily accessible from C,
and there is no reason you can't make them so from Julia. In Cython,
we're able to turn almost any Python code into C that is compiled by
gcc. We're 100% in the Python environment, doing everything just as the
CPython interpreter would in every way, but it is C-compiled code.
But yes, having a native Python list and a native Julia list map to the
same thing, and be modifiable simultaneously in both languages without
copying, would not work.
You should definitely be able to do
pycall(pyobj, "methodname", 3, 4.4)
rather easily.
However, to make it 100% transparent and allow
methodname(pyobj, 3, 4.4)
you may perhaps need to add a feature or two to the Julia language
(though probably nothing specific to Python).
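
In C API terms, the `pycall` form is an attribute lookup followed by a call. A minimal sketch, written in Python but going through `ctypes.pythonapi` so that it makes the same two C calls (`PyObject_GetAttrString` and `PyObject_CallObject`) that a foreign runtime would make; the Python-side `pycall` helper here is only an illustration, not an existing API.

```python
import ctypes

api = ctypes.pythonapi
api.PyObject_GetAttrString.argtypes = [ctypes.py_object, ctypes.c_char_p]
api.PyObject_GetAttrString.restype = ctypes.py_object
api.PyObject_CallObject.argtypes = [ctypes.py_object, ctypes.py_object]
api.PyObject_CallObject.restype = ctypes.py_object


def pycall(obj, method_name, *args):
    """Look up a method by name on obj and call it with positional args."""
    method = api.PyObject_GetAttrString(obj, method_name.encode("utf-8"))
    # PyObject_CallObject expects the arguments as a tuple.
    return api.PyObject_CallObject(method, args)


parts = pycall("a,b", "split", ",")
```

A missing attribute surfaces as the usual `AttributeError`, because `ctypes.pythonapi` re-raises pending Python exceptions.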
I regret that I don't have time to dive into this myself the coming
month. But I'm more than willing to answer any questions you may have
about this and the CPython API, if any Julia developers want to play
with it.
Dag
>
> A great starting point would be for someone to take a crack at embedding
> a Python interpreter
> <http://docs.python.org/py3k/extending/embedding.html> in a Julia
On the other hand, almost nothing in Python libraries relies on using
the exact right type, but just duck typing the right interface -- in 99%
of the cases, if you call a Python library function that expects a list,
it would be happy with a proxy for a Julia list instead.
But that requires proxying Julia objects Python-side -- does Julia have,
or will it in time get, an API for interacting with Julia
types/objects/functions from C? (I guess I should just look in the docs
and code...)
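
To illustrate the duck-typing point, here is a sketch of a read-only proxy that library code accepts wherever it expects a list. The class name `ForeignListProxy` and its two callbacks are hypothetical; in a real bridge they would call into the other runtime, while here a Python list stands in for the foreign storage.

```python
from collections.abc import Sequence
import statistics


class ForeignListProxy(Sequence):
    """Read-only view of a container owned by another runtime."""

    def __init__(self, get_item, get_length):
        # Hypothetical callbacks standing in for calls into that runtime.
        self._get_item = get_item
        self._get_length = get_length

    def __len__(self):
        return self._get_length()

    def __getitem__(self, index):
        if isinstance(index, slice):
            return [self[i] for i in range(*index.indices(len(self)))]
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError("proxy index out of range")
        return self._get_item(index)


# Stand-in for foreign storage; a real bridge would not hold a Python list.
_backing = [4.0, 1.0, 2.5]
proxy = ForeignListProxy(_backing.__getitem__, _backing.__len__)

# Library functions written for lists work unchanged on the proxy.
ordered = sorted(proxy)
average = statistics.mean(proxy)
```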
Dag
You have an excellent appraisal of the situation. What julia loves to
do is generate native code, so using it together with python for
top-level integration, plus other possible combinations, would be a
great match and I think I can see your vision there. We do want to be
able to generate C-callable shared objects, which of course would be
usable from python, but that is probably not enough.
We should probably do some deeper integration and implement things
like pycall()/PyObject. Many python libraries are in a "sweet spot"
where there is not much performance to gain, e.g. because they do lots
of I/O. I don't know when we will get around to this, but it's very
much worth thinking about. Would we be able to focus exclusively on
python 3 at this point, or should we worry about both?
I can't really think through all the details now, but this will be an
ongoing process.
Thanks for the very helpful message.
-Jeff
On Sat, Apr 21, 2012 at 7:29 AM, Dag Sverre Seljebotn
+1. I think it would be great to have this, I'd already floated that
same idea in idle chat with some colleagues a few days ago. If some
Julia devs want to do so, I'm sure we could talk to the program
committee to ensure that the talk is accepted, as I think it's an
extremely relevant and valuable topic.
Cheers,
f
Remember that for success in the long term, one needs success in the
short term. Anybody in the SciPy community who would be interested in
adopting and helping out in implementation will be on 2.x for another
couple of years.
BUT, we don't need to have this discussion: Breakage is mainly in the
Python language. The C API, which is what is relevant for Julia+Python
integration, only changed superficially between Python 2 and Python 3.
If you focus on Python 2 first, getting Python 3 support is a couple of
days' work.
The same Cython-generated C files compile with both Python 2 and 3 with
the help of a couple of #ifdefs.
Dag
> Hopefully it could be made totally transparent; at least that's my hope.
> The set of language idioms/semantics that can sanely be mapped between
> Julia and Python is much larger than what C provides.
Indeed. A "python object" proxy type in Julia could implement interfaces to
iterators, indexing, etc. Attribute access (and thus method call) requires
a bit more thought, because Julia doesn't have real equivalents. Access to
fields in Julia types is determined by static typing, not dynamic lookup.
Access from Python to functions defined in Julia should be no problem at all.
Access to Julia-defined data types should be easy if Julia has reflection
mechanisms, which I haven't explored yet.
I agree that a Julia-Python bridge would be *very* nice to have!
Konrad.
Not quite true; we do dynamic field lookup in some cases as necessary.
But it is true that julia objects are not dictionaries.
pycall(obj, :method, ...) can be implemented, and if we allow
overloading dot then obj.method might even be made to work.
method(obj, ...) would be more idiomatic though?
But I don't know whether it is possible (or desirable) to implement
support for that -- guess it would require something like a
"findunknownfunction" function/hook that one could write a PyObject
method for?
Dag
About the performance, you're of course right that that's often not
needed. PyPI has 20000 packages now (want to control a browser? Write
OpenOffice documents? Parse BibTeX? Work with JSON or XML? Almost
anything is there.) For most such "utility nice-to-haves" performance
doesn't matter at all.
However, there's another class of more computational libraries where
performance matters: PyZMQ, mpi4py, petsc4py, PyTables, Pandas,
scikit-learn, SciPy, all the dozen wrappers around mathematics
libraries in SAGE, etc.
And, incidentally, these are mostly written in Cython (with SciPy only a
subset, but that may in time grow). Which gives you some opportunities:
- We want to define a stable ABI for Cython, so that through some
introspection you'd get the corresponding C function for a Python
callable that you can call directly (the CEP 1000 discussion I linked to
in my introductory post is a humble beginning on that). For something
like ZeroMQ that gives you one C wrapper function around the raw library.
- Write a Julia backend for Cython. This is not as insane as it sounds
-- 1 1/2 years ago Enthought funded a .NET/IronPython backend for Cython
and used it to port a subset of SciPy to .NET (I was part of that effort
myself). Basically, Cython would generate Julia code that calls the
CPython API or uses the C FFI -- and in the case of wrappers, could allow
you to go straight from Julia to the wrapped library. (A robust Julia
backend would be harder than the C++/CLR one, but not impossible, and
PyPy+Cython faces some of the same challenges -- I'll write more when/if
you become interested).
Of course, this is a lot of work, but we're talking several months, not
several years. I'm not saying this affects anything in the *present*, I
just want to paint you a picture of what may be possible.
You've got some major strategic decisions ahead of you with how you're
going to grow a library base and a user base for Julia, good luck with
that! Feel free to ask if there's anything (when this thread dies I may
forget to check this list that often, so CC me or the cython-dev list).
Dag
You seem to have an assumption that the Julia language stays the way it
is and will never be extended. But the Julia language is whatever the
authors make it.
> But more importantly, I think this is not a good idea because it
> raises false expectations. Julia functions do multiple dispatch based
> on the types of all arguments, whereas Python dispatches only on the
> object type. pycall(obj, :method, ...) makes this clearer.
I don't have an opinion on whether it is desirable, I'm way too new to
Julia to say.
I don't agree with your specific concern though -- you could make it so
that you would be totally free to create additional methods in Julia
taking other arguments into account. It just happens that modules
imported from Python (as seen from Julia) only contain function
overloads on the first argument, and don't specify a type for the rest.
The real issue I see in mapping Python semantics to Julia is that
Python is so namespace-based. I don't think it's a problem to map
"animal.make_sound(x, y)" to Julia, but I do think it's a problem to map
"os.path.realpath", or "scipy.special.sph_jn" to Julia.
None of the following is natural in current Julia; you'd be
painfully aware that you are calling into Python in each case:
pycall(pygetattr(os, :path), :realpath, x)
pycall(os.path, :realpath, x)
os.path.realpath(x)
But I see that namespacing is not out of the question for Julia:
https://github.com/JuliaLang/julia/issues/57
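
On the Python side, a dotted name is just a chain of attribute lookups starting from an imported module, which is what any bridge would have to replay. A small sketch, where `resolve` is a hypothetical helper name:

```python
import importlib


def resolve(dotted_name):
    """Turn a name like "os.path.realpath" into the object it refers to."""
    parts = dotted_name.split(".")
    # Import the top-level module, then walk the remaining attributes.
    # Note: submodules that are not imported by their parent package
    # would need an explicit import_module call; this sketch skips that.
    obj = importlib.import_module(parts[0])
    for name in parts[1:]:
        obj = getattr(obj, name)
    return obj


realpath = resolve("os.path.realpath")
```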
Dag
os.path.realpath(x)  # requires . to be defined as a special field
                     # access operator that each type can implement
                     # arbitrarily, with current field selection
                     # as the default case.
symbol("os.path.realpath") seems to work. Also, this does something
interesting,
julia> :(os.path.realpath)
os.:path.:realpath
but I'm not sure if it's good or not...
--Tim
That's such a strange statement to me, it feels like we're on entirely
different planets! Interacting with other languages (not specifically
"Python", but X) is not "sole", it's the alpha and omega of succeeding
as a language these days.
Look, I didn't really come here to discuss the specifics of a
Julia<->Python bridge (though I find that interesting). What I wondered
about was the overall strategy of Julia. How will Julia be able to
thrive, unlike so many other LISP-inspired languages that came and went?
Will it:
a) focus on a narrow niche (a "glorified MATLAB") with few libraries
b) full steam ahead on implementing packages in Julia (problem being
that "the last 5%" is different for every user -- e.g., Python has 20000
packages now)
c) focus on quick, easy, transparent usage of libraries from Python
(and/or Ruby, Perl, R...) to quickly make up for being a newcomer to
the language smorgasbord
Anyway, I think Jeff and Stefan understood my point and have answered
this to my satisfaction.
Dag
Konrad Hinsen <google...@khinsen.fastmail.net> wrote:
Yes, on rereading what you wrote I see that I read a lot more between the lines than on them. My fault, I apologise.
Dag
>Konrad.
We want to be able to compile Julia code into C-callable shared libraries, but that's going to take a fair amount of work; this would make it as easy for Python to call Julia code as it is to call C code. I'm not sure how much boilerplate and magic incantation is required for calling shared libraries from Python, but from what I recall it's pretty minimal. Whatever incantations there are can certainly be easily assisted with from the Julia side when generating libraries.
> Python is an excellent first point to consider for interop with
> other dynamic languages. Cython, because of its focus on C
> interop, performance, and heavy use in scientific computing is an
> even better starting point.
Cython uses the "Python platform", so it's best to consider both in
parallel in my opinion.
Something that could turn out to be tricky is interfacing Julia arrays
and NumPy arrays. That's pretty much a "must have" feature for
scientific applications, so it's worth investing some effort
there. While the basic storage layout is the same on both sides, the
mapping of element types is likely to be platform dependent, and
NumPy's record arrays probably require a Julia type definition for
each array's elements. Nothing looks impossible, but it's going to be
some work.
Konrad.
> That sounds like a drag. We'd have to reimplement numpy's interface,
> in addition to everything else we need/want to do. I'd put my hopes on
> using lowest-common-denominator fortran storage, and standard data
> formats (like HDF5 maybe, but I don't know enough about it).
In-memory data and out-of-memory data are different issues for me.
For the former, you have to be able to work with the data as the
supplier (whoever that is) has prepared it, for performance
reasons. For the latter, portability among platforms and languages is
the prime issue (for me at least). I'd rather discuss them separately.
For in-memory arrays, we can certainly start with the most common simple
cases, which come down to Fortran-style storage. This still requires
correct element type mapping, but that's true for any inter-language
interface, even with C and Fortran. Type mapping isn't really difficult,
it's just a pain to test on all platforms.
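
A small sketch of why the element type mapping is platform dependent: the C type `long` is 32 bits on some platforms and 64 bits on others, so a bridge has to pick the fixed-width counterpart by inspecting the size at run time. The function name `julia_int_name` is made up for this sketch; the returned strings follow Julia's fixed-width integer naming (`Int32`, `UInt8`, ...).

```python
import ctypes


def julia_int_name(c_type):
    """Pick a fixed-width Julia integer type name for a C integer type."""
    bits = 8 * ctypes.sizeof(c_type)
    # An unsigned type wraps -1 around to a large positive value.
    is_signed = c_type(-1).value < 0
    return ("Int" if is_signed else "UInt") + str(bits)


# The answer for plain C "long" depends on the platform and compiler.
long_name = julia_int_name(ctypes.c_long)
```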
In the long run, I am almost sure there will be pressure to support
more complex arrays, such as NumPy's record arrays, which are used
much like R's data frames and are just too convenient to give up once
you are used to them. The "impedance mismatch" is that the element
type information is dynamic in Python but static in Julia.
> Most of all, we (SciPy and julia) should look at the big picture of
> how data is accessed and used. There should be some kind of
> cloud-oriented platform where data can be pushed and pulled from
> multiple environments, and allow parallel computations to move to the
That's certainly an important topic, but one I'd see addressed outside
any specific programming language. OPeNDAP (http://opendap.org/) looks
like a good starting point.
One problem is to support very different storage and access modes with
different priorities. Data stored in the cloud is used very
differently from data stored on an IBM BlueGene where fast coordinated
parallel access from thousands of nodes is the priority.
Konrad.
You have the case where the data is in memory, it just happens to be so
in different machines all over the world.
These days the vogue is all about pushing computations to the data, not
the data to the computations.
Dag
But the element types do not have to be mapped to separate julia
composite types. It's just a collection of arrays with some extra
metadata and fancier indexing behavior. And the element type of a
julia array can certainly be determined at run time.
We are planning to add more flexible struct and array-of-struct layout
support to the core language, which will allow us to generate fast
code for those cases, but will not support all the same formats as
numpy record arrays, at least not for a while. So some user-level
implementation might have to be part of the picture.
I have seen cases where "fields" are stored as separate homogeneous
arrays, but of course what you want there depends on the access
pattern.
Is OPeNDAP something worth seriously integrating with?
> Something similar to his code generators for pack and unpack could
> be used to access struct elements within a byte array. Or one could
That's what I was thinking of as well.
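
For comparison, Python's standard `struct` module does this kind of pack/unpack access on a plain byte buffer. The sketch below assumes a made-up record layout (a little-endian 32-bit integer followed by a 64-bit float, no padding), reads one record in place, and then splits the records into per-field columns, i.e. the "separate homogeneous arrays" layout.

```python
import struct

# Hypothetical record layout: int32 id + float64 value, packed, little-endian.
RECORD = struct.Struct("<id")


def pack_records(rows):
    """Store (id, value) rows back to back in one byte buffer."""
    buf = bytearray(RECORD.size * len(rows))
    for i, (ident, value) in enumerate(rows):
        RECORD.pack_into(buf, i * RECORD.size, ident, value)
    return buf


def read_field_columns(buf):
    """Convert the array-of-structs buffer into one list per field."""
    ids, values = [], []
    for ident, value in RECORD.iter_unpack(buf):
        ids.append(ident)
        values.append(value)
    return ids, values


buf = pack_records([(1, 0.5), (2, 1.5), (3, 2.5)])
second = RECORD.unpack_from(buf, 1 * RECORD.size)  # access record 1 in place
ids, values = read_field_columns(buf)
```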
> We are planning to add more flexible struct and array-of-struct layout
> support to the core language, which will allow us to generate fast
> code for those cases, but will not support all the same formats as
That sounds good!
> Is OPeNDAP something worth seriously integrating with?
It's pretty easy to do (they provide a C library, and the interface is made
to resemble netCDF), and I don't think it requires any action from language
developers. Anyone who wants OPeNDAP can just write a Julia interface layer.
Apparently OPeNDAP is well accepted in some fields of science, but not in mine,
so I can't say how important it is to support.
Konrad.
> 2) Make sure Python can be transparently called from Julia. Even ship
> Python with Julia! You don't need to reimplement all the libraries under
> the sun in Julia -- it may be faster, it may be purer, but using
> existing libraries gets the job done now, and the important ones can
> always be ported to Julia later for speed.
using PyCall
@pyimport pylab
x = linspace(0,2*pi,1000); y = sin(3*x + 4*cos(2*x));
pylab.plot(x, y, @pykw color="red" linewidth=2.0 linestyle="--")
pylab.show()
All the data types here are translated automatically, bi-directionally, under the hood. Multidimensional arrays (via NumPy) and
dictionaries can be shared without making a copy. There are also lower-level interfaces, e.g. to improve performance in cases
where you know the return types of the Python functions, and a PyObject wrapper for data types that cannot yet be translated
directly to native Julia types.
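
What "shared without making a copy" means can be shown from the Python side with standard-library pieces only. This is not how PyCall is implemented (it goes through NumPy); it is a sketch of the underlying idea that two views can wrap one block of memory, with a ctypes array playing the role of the foreign runtime's view.

```python
import array
import ctypes

data = array.array("d", [1.0, 2.0, 3.0])

# A memoryview exposes the array's buffer without copying it.
view = memoryview(data)
view[0] = 10.0

# A ctypes array laid over the same memory, similar to what a foreign
# runtime would construct from a raw pointer and a length.
c_view = (ctypes.c_double * len(data)).from_buffer(data)
c_view[1] = 20.0

# Both writes are visible through the original object.
result = data.tolist()
```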
Thanks to Julia's ability to natively call C functions and Python's well-documented C API, this could be implemented entirely
in Julia---the ability to mix direct C calls with a high-level language is quite powerful. (I'm quite certain that orders of magnitude
more effort would have been required to add similar functionality to Octave, Matlab, or Scilab, having hacked on all three. In
fact, I don't have to guess: the PIMS project implemented a Python interface for Scilab, and required thousands of lines of C++
code (http://www.ohloh.net/p/pims), and it's not clear that they have as much functionality as PyCall, which took a week and a
thousand lines of Julia. It's just a lot harder to bridge two high-level languages by writing code in a low-level language.)
--SGJ
--On 26 February 2013 20:14:17 -0500 "Steven G. Johnson" <ste...@alum.mit.edu> wrote:
> There are at least three audiences here:
Let me add one that doesn't exist yet but will in a few years:
4) People whose problem is best solved by combining libraries written in
Julia and libraries written in Python.
...Pythonic
(/"Julianic"? "Julish?")
Perhaps julia could be used to implement some python-callable code? It's nice
that it has some good batteries included - some good performing libraries (some
look even better than what I'm currently using).
This is very sweet! And mindbending in the sense that you are using plt in julia to plot matplotlib. Very very sweet.
Just got the "plain ipython" stuff working on my Mac, but am getting the same crash with the remote notebook usage that you are seeing. (I had never tried the remote notebook before.) Any idea which version of Julia/PyCall was working so that we can do a git bisect?