Python 3 is very close to becoming a holy grail of programming languages, in the sense that almost everything can be redefined. However, one thing is still missing: immutable copy-on-assign numeric types.
Consider this piece of code:
a = 1
b = a
a += 1
assert a == b + 1
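The name "a" is bound to the integer object 1, and "b" is then bound to that same object. Because integers are immutable, "a += 1" rebinds "a" to a new object and leaves "b" untouched, so the two names behave as if they held independent copies.
The problem is that this behaviour does not carry over to a user-defined class that updates itself in place (for instance a mutable MyInteger whose __iadd__ modifies self):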
a = MyInteger(1)
b = a
a += 1
assert a == b + 1
The "a" and "b" variables both point to the same object. This differs from what one might expect of numeric types.
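To make the difference concrete, here is a minimal sketch. Both classes are hypothetical stand-ins for MyInteger, not anything from a real library: MutableInteger implements __iadd__ in place, while ImmutableInteger defines only __add__, so "+=" falls back to it and rebinds the name, much as it does for the built-in int.

```python
class MutableInteger:
    """Hypothetical MyInteger whose += modifies the object in place."""

    def __init__(self, value):
        self.value = value

    def __iadd__(self, other):
        self.value += int(other)
        return self  # same object, so every name bound to it sees the change

    def __add__(self, other):
        return MutableInteger(self.value + int(other))

    def __int__(self):
        return self.value

    def __eq__(self, other):
        return self.value == int(other)


class ImmutableInteger:
    """Hypothetical variant with no __iadd__: += falls back to __add__."""

    def __init__(self, value):
        self._value = value

    def __add__(self, other):
        return ImmutableInteger(self._value + int(other))

    def __int__(self):
        return self._value

    def __eq__(self, other):
        return self._value == int(other)


a = MutableInteger(1)
b = a
a += 1
mutable_shares_object = a is b      # both names still refer to one object

c = ImmutableInteger(1)
d = c
c += 1
immutable_shares_object = c is d    # c was rebound to a new object
```

With the mutable class the assertion "a == b + 1" fails, because "b" sees the increment too; with the immutable class "c == d + 1" holds.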
But apart from that, I think allowing overloading of the binding operator "=" might be a good idea. A special method __bind__ could return the object to be bound:
a = b
should then bind the name "a" to the return value of
b.__bind__()
if b implements __bind__.
Sure, it could be used to implement copy on assignment. But it would also allow other things, such as lazy evaluation of an expression.
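Since __bind__ does not exist in Python, the idea can only be emulated. The sketch below assumes a hypothetical helper bind() that stands in for what "b = a" would do under the proposal, and a hypothetical class CopyOnBindInt that uses the hook to copy itself on assignment. Neither name comes from any real library.

```python
import copy


def bind(value):
    """Emulate the proposed semantics of `name = value` (hypothetical helper)."""
    hook = getattr(type(value), "__bind__", None)
    return hook(value) if hook is not None else value


class CopyOnBindInt:
    """Hypothetical mutable number that asks to be copied when it is bound."""

    def __init__(self, value):
        self.value = value

    def __bind__(self):
        # Not a real special method; only bind() above ever calls it.
        return copy.copy(self)

    def __iadd__(self, other):
        self.value += other
        return self


a = CopyOnBindInt(1)
b = bind(a)   # under the proposal this would simply be written `b = a`
a += 1        # in-place update of a; b is a separate copy
```

Objects without the hook pass through bind() unchanged, which mirrors the "if b implements __bind__" condition above.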
NumPy code like
z = a*x + b*y + c
could avoid creating three temporary arrays if there was a __bind__ function called on "=". This is a big thing, cf. the difference between NumPy and numexpr:
z = numexpr.evaluate("""a*x + b*y + c""")
The reason numerical expressions must be written as strings to be efficient in Python is that there is no __bind__ function.
lambda: a*x + b*y + c
lhs @= rhs
lhs.__bind__(lambda: rhs)
t = a + b
typo = a+b
t @= a+b
typo @= a+b
d = {}
for i in range(2):
    d['x'] = a + i
>
> Why? z could just be a "lazy value" at this point, basically a manual
> building of thunks, only reifying them when necessary (whenever that
> is). It's not like numpy *has* to create three temporary arrays, just
> that it *does*.
>
It has to, because it does not know when to flush an expression. This, strangely enough, accounts for most of the speed difference between Python/NumPy and e.g. Fortran 95. A Fortran 95 compiler can compile an array expression as a single loop. NumPy cannot, because the binary operators do not tell it when an expression is "finalized". That is why the numexpr JIT compiler evaluates Python expressions as strings, and needs to include a parser and whatnot. Today, most numerical code is memory bound, not compute bound, as CPUs are immensely faster than RAM. So what keeps numerical/scientific code written in Python slower than C or Fortran today is mostly the creation of temporary array objects (i.e. memory access), not the computations per se. If we could get rid of temporary arrays, Python code could possibly achieve 80 % of Fortran 95 speed. For scientists that would mean we don't need to write any more Fortran or C.
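The point about temporaries can be illustrated without NumPy. The sketch below uses plain Python lists as stand-ins for arrays (an assumption made purely for illustration, and the function names are made up). The first function mimics operator-by-operator evaluation, allocating one intermediate list per binary operation; the second does what a compiler can do when it sees the whole expression: a single loop with no intermediates.

```python
def evaluate_with_temporaries(a, x, b, y, c):
    """Evaluate a*x + b*y + c one operator at a time, as eager operators do."""
    t1 = [a * xi for xi in x]                  # temporary 1: a*x
    t2 = [b * yi for yi in y]                  # temporary 2: b*y
    t3 = [u + v for u, v in zip(t1, t2)]       # temporary 3: a*x + b*y
    return [w + c for w in t3]                 # final result


def evaluate_fused(a, x, b, y, c):
    """Evaluate the same expression in a single loop, with no temporaries."""
    return [a * xi + b * yi + c for xi, yi in zip(x, y)]


x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]
z_eager = evaluate_with_temporaries(2.0, x, 0.5, y, 1.0)
z_fused = evaluate_fused(2.0, x, 0.5, y, 1.0)
```

Both give the same values; the difference is only in how many full-length intermediate sequences are written to and read back from memory.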
But perhaps it is possible to do this with AST magic? I don't know. Nor do I know if __bind__ is the best way to do this. Perhaps not. But I do know that automatically detecting when to "flush a compound expression with (NumPy?) arrays" would be the holy grail for scientific computing with Python. A binary operator x+y would just return a symbolic representation of the expression, but when the full expression needs to be flushed we can e.g. ask OpenCL or LLVM to generate the code on the fly. It would turn numerical computing into something similar to dynamic HTML. And we know how good Python is at generating structured text on the fly.
Sturla
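As one possible reading of "AST magic", the standard ast module can already parse an expression and report which names it uses, which is enough to build a fused elementwise function from it. The sketch below is only an illustration with plain lists; fuse() is a hypothetical helper, not part of NumPy or numexpr. It still takes the expression as a string, so it does not remove the need for an explicit evaluation call, and it assumes at least one operand is a list.

```python
import ast


def fuse(expression):
    """Compile an arithmetic expression string into one elementwise loop."""
    tree = ast.parse(expression, mode="eval")
    # Variable names found in the expression, e.g. for checking operands.
    names = sorted({n.id for n in ast.walk(tree) if isinstance(n, ast.Name)})
    code = compile(tree, "<fused>", "eval")

    def run(**operands):
        # Lists are treated as arrays; everything else is a broadcast scalar.
        arrays = {k: v for k, v in operands.items() if isinstance(v, list)}
        length = len(next(iter(arrays.values())))
        result = []
        for i in range(length):  # one loop for the whole expression
            scalars = {k: (v[i] if k in arrays else v)
                       for k, v in operands.items()}
            result.append(eval(code, {"__builtins__": {}}, scalars))
        return result

    return run, names


run, names = fuse("a*x + b*y + c")
z = run(a=2.0, x=[1.0, 2.0, 3.0], b=0.5, y=[10.0, 20.0, 30.0], c=1.0)
```

A real implementation would hand the parsed tree to a code generator rather than call eval() per element; the point here is only that the full expression is visible before anything is computed.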
Today that can be achieved by crafting a class that overrides all ops
to perform literal transforms and with a "flush" or "calculate"
method. Sympy does something like that, and it would not be hard to
have a numpy module to perform like that with numpy arrays. In this
particular use case, we'd have the full benefit of "explicit is better
than implicit".
js
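A minimal sketch of that approach, assuming plain lists in place of NumPy arrays: the hypothetical Lazy class below overrides "+" and "*" to build an expression tree, and nothing is computed until the explicit calculate() call. The class and method names are made up for illustration and are not SymPy's or NumPy's API.

```python
import operator


class Lazy:
    """Hypothetical deferred elementwise expression over lists and scalars."""

    def __init__(self, op=None, left=None, right=None, value=None):
        self.op, self.left, self.right, self.value = op, left, right, value

    @staticmethod
    def wrap(obj):
        return obj if isinstance(obj, Lazy) else Lazy(value=obj)

    def __add__(self, other):
        return Lazy(operator.add, self, Lazy.wrap(other))

    def __radd__(self, other):
        return Lazy(operator.add, Lazy.wrap(other), self)

    def __mul__(self, other):
        return Lazy(operator.mul, self, Lazy.wrap(other))

    def __rmul__(self, other):
        return Lazy(operator.mul, Lazy.wrap(other), self)

    def _length(self):
        """Length of the first list found in the tree, or None for scalars."""
        if self.op is None:
            return len(self.value) if isinstance(self.value, list) else None
        left = self.left._length()
        return left if left is not None else self.right._length()

    def _at(self, i):
        """Value of the expression at index i; scalars are broadcast."""
        if self.op is None:
            return self.value[i] if isinstance(self.value, list) else self.value
        return self.op(self.left._at(i), self.right._at(i))

    def calculate(self):
        """The explicit flush: one pass over the elements, no temporaries."""
        return [self._at(i) for i in range(self._length())]


x = Lazy(value=[1.0, 2.0, 3.0])
y = Lazy(value=[10.0, 20.0, 30.0])
expr = 2.0 * x + 0.5 * y + 1.0   # builds a tree, computes nothing yet
z = expr.calculate()
```

The cost of this design is the one being debated in the thread: the user has to remember to call calculate(), whereas a __bind__ hook would trigger it on assignment.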
On 5 December 2012 18:09, Sturla Molden <stu...@molden.no> wrote:
>
> On 5 Dec 2012, at 19:51, Masklinn <mask...@masklinn.net> wrote:
>
>>
>> Why? z could just be a "lazy value" at this point, basically a manual
>> building of thunks, only reifying them when necessary (whenever that
>> is). It's not like numpy *has* to create three temporary arrays, just
>> that it *does*.
>>
>
> It has to, because it does not know when to flush an expression. This, strangely enough, accounts for most of the speed difference between Python/NumPy and e.g. Fortran 95. A Fortran 95 compiler can compile an array expression as a single loop. NumPy cannot, because the binary operators do not tell it when an expression is "finalized". That is why the numexpr JIT compiler evaluates Python expressions as strings, and needs to include a parser and whatnot. Today, most numerical code is memory bound, not compute bound, as CPUs are immensely faster than RAM. So what keeps numerical/scientific code written in Python slower than C or Fortran today is mostly the creation of temporary array objects (i.e. memory access), not the computations per se. If we could get rid of temporary arrays, Python code could possibly achieve 80 % of Fortran 95 speed. For scientists that would mean we don't need to write any more Fortran or C.
>
> But perhaps it is possible to do this with AST magic? I don't know. Nor do I know if __bind__ is the best way to do this. Perhaps not. But I do know that automatically detecting when to "flush a compound expression with (NumPy?) arrays" would be the holy grail for scientific computing with Python. A binary operator x+y would just return a symbolic representation of the expression, but when the full expression needs to be flushed we can e.g. ask OpenCL or LLVM to generate the code on the fly. It would turn numerical computing into something similar to dynamic HTML. And we know how good Python is at generating structured text on the fly.