import os
import libjulia as jl
jl_home = os.environ["JULIA_HOME"]
jl.init(jl_home)
assert jl.eval_string("1") == 1
import julia
assert julia.eval("1") == 1
1. Startup time
Scripts in Python should start up quickly, but I guess loading and JIT-compiling the PyCall.jl code takes a little time.
On Thursday, March 20, 2014 1:46:36 AM UTC-4, Kenta Sato wrote:
> 1. Startup time
> Scripts in Python should start up quickly, but I guess loading and JIT-compiling the PyCall.jl code takes a little time.

You can configure Julia to precompile PyCall, in which case this overhead goes away.
> 2. Overhead of calling Julia
> Once a program starts running, we want a fast FFI system to call Julia from your Python program.
> Using Cython, I can avoid the extra overhead of calling Julia code.

It's not clear to me what precisely you mean by "extra overhead". The conversion between Python and Julia types has to happen somewhere, either on the Python side or on the Julia side. What makes you think that this will be faster if it happens to execute on the Cython side?
> I measured the evaluation time of a Julia literal using IPython:

I'm not sure that this is a fair comparison, since your code is not as full-featured as pyjulia. In particular, you are not dynamically determining the return type of eval("1") and are not converting it to a native Python object, as I understand it.
assert jl.eval_string("1") is 1
In exchange for these type-conversion features, it looks like pyjulia has less than a factor of 2 overhead compared to the bare-bones evaluation, which seems pretty good.
In [1]: run sample.py
OK

In [2]: import numpy as np
In [3]: rs = np.random.RandomState(0)
In [4]: arr = rs.randn(10000)
In [5]: sum = jl.get_base_function("sum")
In [6]: arr
Out[6]:
array([ 1.76405235, 0.40015721, 0.97873798, ..., 0.51687218,
-0.03292069, 1.29811143])
In [7]: sum(arr)
Out[7]: -184.33720158265817
In [8]: arr.sum()
Out[8]: -184.33720158265783
In [9]: timeit sum(arr)
100000 loops, best of 3: 5.44 µs per loop
In [10]: timeit arr.sum()
100000 loops, best of 3: 18.3 µs per loop
In [1]: import julia
In [2]: import numpy as np
In [3]: rs = np.random.RandomState(0)
In [4]: arr = rs.randn(10000)
In [5]: sum = julia.eval("Base.sum")
In [6]: arr
Out[6]:
array([ 1.76405235, 0.40015721, 0.97873798, ..., 0.51687218,
-0.03292069, 1.29811143])
In [7]: sum(arr)
Out[7]: -184.33720158265817
In [8]: timeit sum(arr)
1000 loops, best of 3: 392 µs per loop
> Using low level tools like Cython and C extensions, I can touch the internal fields of NumPy arrays. Also, the NumPy C API is well documented (http://docs.scipy.org/doc/numpy/reference/c-api.html).
> I think it is not so difficult to implement zero-copy conversion between arrays in Julia and NumPy.

Consider the fact that NumPy arrays are typically in row-major order. To represent this without a copy in Julia, you need to define a new AbstractArray type in Julia that wraps around your NumPy array. It's going to be extremely tricky to define a new subtype using the Julia C API, and will require a lot of digging around in the Julia internals.
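To make the layout issue concrete, here is a small NumPy-only sketch (no Julia binding is assumed): a freshly created NumPy array is row-major, and obtaining the column-major layout that a plain Julia Array expects forces a copy.

import numpy as np

a = np.arange(6, dtype=np.float64).reshape(2, 3)
print(a.flags['C_CONTIGUOUS'])   # True -- NumPy's default layout is row-major
print(a.flags['F_CONTIGUOUS'])   # False

# Reinterpreting this buffer as a Julia Array{Float64,2} without copying would
# effectively yield the 3x2 transpose, since Julia arrays are column-major.
# Producing column-major data on the Python side requires a copy:
b = np.asfortranarray(a)
print(b.flags['F_CONTIGUOUS'])          # True
print(a.ctypes.data == b.ctypes.data)   # False -- a new buffer was allocated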
Even for column-major arrays, you can't just convert it into a Julia object and forget about it, because you have to be careful about garbage collection--you have to make sure that the Julia object is not garbage-collected before the Python object, or vice versa. Again, this is a surmountable difficulty, but the easiest solution is defining a wrapper type in Julia that holds a reference to the Python object.
Matters are even trickier going in the opposite direction (passing a Julia array to Python without making a copy), because Julia's garbage collection is not simply reference counting, so there is no easy way for the Python object to "hold a reference" to a Julia object. (In PyCall, this is implemented by keeping a global Julia dictionary of objects that are needed by Python, and the Python destructor is set up to remove the object from the Julia dictionary.)

It would be easier to define the AbstractArray type by writing a little glue code in Julia, but that is starting to go in the direction of reimplementing PyCall. Again, all of these are surmountable difficulties in theory, but I think it may be more tricky than you think, and I think you will find yourself re-implementing a lot of stuff in PyCall.
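As a minimal illustration of the scheme described above (a global Julia dictionary plus a Python-side destructor; this is not PyCall's actual code), the Python proxy can register a finalizer that tells Julia to drop its dictionary entry. julia_retain and julia_release are hypothetical binding calls, assumed to add and delete entries in a global Julia dict keyed by an integer handle.

import weakref

class JuliaRef:
    """Python-side handle to a Julia value rooted in a global Julia dictionary."""

    def __init__(self, handle, release):
        self._handle = handle
        # When this proxy is collected on the Python side, ask Julia to remove
        # the corresponding dictionary entry so the Julia value can be freed.
        self._finalizer = weakref.finalize(self, release, handle)

# Usage sketch (hypothetical binding API):
#   handle = julia_retain(jl_value)         # Julia side: __refs[handle] = value
#   ref = JuliaRef(handle, julia_release)   # on collection: delete!(__refs, handle)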
> In addition, Cython supports the new buffer protocol written in PEP 3118 (http://legacy.python.org/dev/peps/pep-3118/, http://docs.cython.org/src/userguide/memoryviews.html).

This runs into the same issues; you need to define a new Julia type to wrap a buffer. (There is an issue for PyCall to support this protocol: https://github.com/stevengj/PyCall.jl/issues/38)
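For what it's worth, a memoryview already exposes the metadata such a Julia wrapper type would have to consume; this is plain CPython plus NumPy, with no Julia binding assumed.

import numpy as np

arr = np.zeros((2, 3), dtype=np.float64)
view = memoryview(arr)    # PEP 3118 buffer view

print(view.format)        # 'd' -- C double, i.e. Float64
print(view.itemsize)      # 8
print(view.shape)         # (2, 3)
print(view.strides)       # (24, 8) -- row-major strides, in bytes
print(view.readonly)      # False

# A zero-copy Julia wrapper would have to honor exactly this metadata
# (pointer, format, shape, strides) rather than assuming Julia's own
# dense column-major layout.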
> I know there is a suitable mechanism to call Python functions from C code (http://docs.python.org/3.3/extending/extending.html#calling-python-functions-from-c).
> I think this and the ccall method in Julia can solve the problem, but I'm not sure at the current moment.

The main difficulty here is that creating a Julia Function object from the C API looks extremely hairy: the jl_function_t type is fairly Julia-specific, much more than just a wrapper around a C function pointer. (And again, you need to keep a reference to the Python object in the Julia Function object to prevent the former from being garbage collected.) Probably this requires some modification to the Julia C API. (Again, it would be easier to implement this on the Julia side, which is what PyCall does.)
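For reference, the Python half of that ccall idea is straightforward with ctypes, which can expose a Python callable as a plain C function pointer; this is only a sketch, the Julia line in the comment is indicative rather than tested, and it sidesteps the jl_function_t issue above entirely.

import ctypes

# C-callable wrapper with signature double (*)(double)
CALLBACK = ctypes.CFUNCTYPE(ctypes.c_double, ctypes.c_double)

def square(x):
    return x * x

c_square = CALLBACK(square)                            # C-callable thunk around the Python function
addr = ctypes.cast(c_square, ctypes.c_void_p).value    # raw address to hand to Julia

# On the Julia side one could then call through the pointer, e.g.
#   ccall(convert(Ptr{Void}, addr), Cdouble, (Cdouble,), 2.0)
# Note that c_square must be kept alive on the Python side for as long as Julia
# may call through it, which is exactly the garbage-collection coupling discussed above.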
I don't want to discourage you too much; it's great that you are interested in this. I just want to make sure that you understand the scope of the problem here. Part of the difficulty is that, unlike CPython, Julia is not really set up to be easily extensible from the C side... extending Julia with C code is really designed to be done by calling C from Julia rather than the other way around, and the C API is currently very minimal. Julia is similar to PyPy in this way.

In some sense this is a good thing: writing the glue code in the high-level language is more flexible, and doesn't constrain future implementation choices the way CPython's API has constrained CPython (necessitating the breakage of backward compatibility in PyPy). On the other hand, it means that when you interface Julia with another high-level language, it is much easier if the *other* high-level language has a well-defined C API. Interfacing Julia with PyPy would be harder than with CPython, for example (it would probably require one to write glue code in both languages rather than just in one).
I'm sorry for my poor explanation; I'm afraid you may have misunderstood my prototype program. It also does type conversion between Julia and Python, and the returned value is an ordinary Python object.
Yes, the difference is not so large for primitive values. But in other benchmarks, pyjulia is much slower than my program or NumPy's utilities.
The utilities that the Julia C API exports seem to be enough to create arrays that are usable from Julia.
The previous benchmark used a `jl_array_t*` as the argument to the `sum` function, and it worked without any special wrapper type in Julia. I agree that the row-major versus column-major difference for multidimensional arrays is confusing.
Yes, garbage collection is a source of concern for me. I think Julia should have a mechanism to protect objects from garbage collection on the Julia side, so that objects allocated in Julia can be kept alive for a long time.
A data buffer passed from Python to Julia can have its ownership handled thanks to the `jl_ptr_to_array` function, which takes an `own_buffer` argument indicating ownership of the data.
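For concreteness, here is roughly what that looks like when driven from Python through ctypes instead of Cython. This is an untested sketch: the entry points used (jl_init, jl_eval_string, jl_apply_array_type, jl_ptr_to_array_1d, jl_call1, jl_unbox_float64) are part of the Julia C API, but their exact signatures differ between Julia versions, and the library path is an assumption.

import ctypes
import numpy as np

lib = ctypes.CDLL("libjulia.so", mode=ctypes.RTLD_GLOBAL)   # path is system-specific

# Declare the few C API entry points we use (signatures per the 0.3-era API).
lib.jl_init.argtypes = [ctypes.c_char_p]
lib.jl_eval_string.argtypes = [ctypes.c_char_p]
lib.jl_eval_string.restype = ctypes.c_void_p
lib.jl_apply_array_type.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
lib.jl_apply_array_type.restype = ctypes.c_void_p
lib.jl_ptr_to_array_1d.argtypes = [ctypes.c_void_p, ctypes.c_void_p,
                                   ctypes.c_size_t, ctypes.c_int]
lib.jl_ptr_to_array_1d.restype = ctypes.c_void_p
lib.jl_call1.argtypes = [ctypes.c_void_p, ctypes.c_void_p]
lib.jl_call1.restype = ctypes.c_void_p
lib.jl_unbox_float64.argtypes = [ctypes.c_void_p]
lib.jl_unbox_float64.restype = ctypes.c_double

lib.jl_init(None)    # NULL lets Julia locate its own install directory

arr = np.ascontiguousarray(np.random.randn(1000), dtype=np.float64)

float64_t = ctypes.c_void_p.in_dll(lib, "jl_float64_type")   # element type object
array_t = lib.jl_apply_array_type(float64_t, 1)               # Array{Float64,1}

# own_buffer = 0: NumPy keeps ownership, so Julia must not free the data.
jl_arr = lib.jl_ptr_to_array_1d(array_t, arr.ctypes.data_as(ctypes.c_void_p),
                                arr.size, 0)

jl_sum = lib.jl_eval_string(b"sum")
result = lib.jl_unbox_float64(lib.jl_call1(jl_sum, jl_arr))
print(result, arr.sum())   # the two sums should agree

# arr (and its buffer) must outlive jl_arr, and a real binding would also root
# jl_arr against Julia's GC (e.g. JL_GC_PUSH on the C side) between calls.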
I think it would be a great journey to implement the full feature set of something like PyCall.jl. In my use case, what I really want is a programming language that eliminates the bottlenecks in my Python code. I've used C/C++ to solve those problems, but I think Julia is a more suitable language for that purpose. So I want to create a lightweight, fast binding library with enough functionality to implement my ideas.