Michele Simionato
Combinators.
> For instance suppose I want to define a 'str_repr' function returning
> the string representation of *any* object. Such generic function
> already exists, since when I type an object in the REPL I get its
> string representation. Can I define my own 'str_repr'? How?
> How do I access the builtin function that gives the string
> representation?
In SML and OCaml, the idiomatic solution is to parameterize your IO
functions over other IO functions. For example, a function to convert a
list with elements of any type into a string would be parameterized over
the string_of function of an element:
# let string_of_list string_of list =
"["^String.concat "; " (List.map string_of list)^"]";;
val string_of_list : ('a -> string) -> 'a list -> string = <fun>
# string_of_list string_of_int [1; 2; 3];;
- : string = "[1; 2; 3]"
> And since I am at it, I would like to know if there is a standard way
> to serialize/pickle/marshal SML objects, and how it works.
I don't believe SML has standard marshalling. OCaml does but it is not type
safe. F# has type safe marshalling.
F# also solves the above problem because it retains type information and you
can resort to dynamic typing, which is exactly what the
built-in "any_to_string" function does:
> any_to_string [1; 2; 3];;
val it : string = "[1; 2; 3]"
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?u
Can you suggest a reference and/or examples?
> For example, a function to convert a
> list with elements of any type into a string would be parameterized over
> the string_of function of an element:
>
> # let string_of_list string_of list =
> "["^String.concat "; " (List.map string_of list)^"]";;
> val string_of_list : ('a -> string) -> 'a list -> string = <fun>
Uhm ... I do not understand that (yet).
> > And since I am at it, I would like to know if there is a standard way
> > to serialize/pickle/marshal SML objects, and how it works.
>
> I don't believe SML has standard marshalling. OCaml does but it is not type
> safe. F# has type safe marshalling.
If there is not a standard way, there should be some implementation-dependent
way to achieve persistence, right? I would be curious to see some example.
Is it possible to serialize closures? How is it done?
Michele Simionato
Yes.
> Can you suggest a reference and/or examples?
See http://mlton.org/TypeIndexedValues for starters. It contains
references to some relevant articles. I have also written an article
on the topic:
I can e-mail you a copy of the article if you'd like. Just send me an
e-mail privately.
> If there is not a standard way, there should be some
> implementation-dependent way to achieve persistence, right?
I know of at least three (or four) serialization libraries for SML.
SML.NET has one (http://mlton.org/References#Kennedy04). MLKit has
one (http://mlton.org/References#Elsman04). SML# has one (based on
SML.NET's lib/techniques, IIRC). And then there is also my library:
Alice ML has advanced built-in support for pickling (serialization).
> I would be curious to see some example. Is it possible to serialize
> closures? How is it done?
Serializing closures within pure SML'97 is impossible, I'm afraid.
However, you can, for example, write a simple embedded interpreter and
serialize programs for the interpreter. See here for a toy example:
http://mlton.org/pipermail/mlton-user/2007-October/001210.html
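The gist of that approach, sketched here in OCaml (the expr type and eval function are my own toy illustration, not the code from the linked post): instead of serializing a closure, you serialize a first-order description of the program and interpret it on the other side.

```ocaml
(* A program is a plain datatype, so it can be written out in any
   format you like, unlike a closure of type int -> int. *)
type expr =
  | Const of int
  | Add of expr * expr
  | Mul of expr * expr

(* The embedded interpreter reconstitutes the behaviour on arrival. *)
let rec eval = function
  | Const n -> n
  | Add (a, b) -> eval a + eval b
  | Mul (a, b) -> eval a * eval b

let program = Mul (Add (Const 1, Const 2), Const 7)
(* eval program = 21 *)
```

You lose the ability to capture arbitrary host-language environments, but you gain a representation you fully control.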
-Vesa Karvonen
But the program doesn't know the types at runtime, which is when you
would have to dispatch. Due to polymorphism, you would need to pass
around runtime type information (or something equivalent) to make that
possible. A variation of this is the approach taken by Haskell with
its mechanism of so called type classes.
In ML, you generally have to use the module system for that purpose.
More precisely, functors are the mechanism to abstract over types plus
operations. This should be explained in the books.
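For illustration, here is a minimal sketch of that idea, written in OCaml syntax (the module names are my own; the same structure transliterates directly to SML functors):

```ocaml
(* A signature packaging a type together with its operations. *)
module type SHOW = sig
  type t
  val show : t -> string
end

(* A functor: code abstracted over any structure matching SHOW. *)
module ShowList (S : SHOW) = struct
  let show xs = "[" ^ String.concat "; " (List.map S.show xs) ^ "]"
end

module ShowInt = struct
  type t = int
  let show = string_of_int
end

module SL = ShowList (ShowInt)
(* SL.show [1; 2; 3] = "[1; 2; 3]" *)
```

Swapping in a different SHOW structure changes the element conversion without touching ShowList, which is the sense in which functors abstract over types plus operations.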
However, often it is not necessary to fall back to such advanced
features - first-class functions take you a long way. For example,
here is a little library for converting values to strings:
val int = Int.toString
val bool = Bool.toString
fun list a xs = "[" ^ String.concatWith ", " (map a xs) ^ "]"
fun pair (a, b) (x, y) = "(" ^ a x ^ ", " ^ b y ^ ")"
For exposition, I use the name of the respective type for each
conversion function. The list and pair functions are called
combinators, because they take other functions (a and b) to construct
new functions of similar purpose.
You can now transform, say, a list of pairs:
val xs = [(3, true), (66, false), (0, true)]
val s = list(pair(int, bool)) xs
Note how I plug together the combinators to get a conversion function
of the right type - the plugging directly mirrors the type of xs. This
scales to arbitrarily complex values, and you can build quite
sophisticated libraries in this style (see below).
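For readers more at home in OCaml, the same combinators transliterate almost verbatim (the value names int, bool, list, pair shadow nothing important, since OCaml keeps type and value namespaces separate):

```ocaml
(* Conversion functions named after their types, as in the SML version. *)
let int = string_of_int
let bool = string_of_bool

(* Combinators: each takes conversion functions and builds a new one. *)
let list a xs = "[" ^ String.concat ", " (List.map a xs) ^ "]"
let pair (a, b) (x, y) = "(" ^ a x ^ ", " ^ b y ^ ")"

let xs = [(3, true); (66, false); (0, true)]
let s = list (pair (int, bool)) xs
(* s = "[(3, true), (66, false), (0, true)]" *)
```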
> For instance suppose I want to define a 'str_repr' function returning
> the string representation of *any* object. Such generic function
> already exists, since when I type an object in the REPL I get its
> string representation. Can I define my own 'str_repr'? How?
> How do I access the builtin function that gives the string
> representation?
The REPL usually has access to the type information from the compiler,
so it can use some internal magic to print values based on this info.
This won't work for your own code, unless the language supports
runtime type passing (which most don't).
> And since I am at it, I would like to know if there is a standard way
> to serialize/pickle/marshal SML objects, and how it works.
Most ML's provide some unsafe way to marshal/unmarshal values. For
example, in SML/NJ there is the following pair of functions:
Unsafe.blastWrite : 'a -> Word8Vector.vector
Unsafe.blastRead : Word8Vector.vector -> 'a
A Word8Vector.vector is basically a string of bytes, which you can
write to a file. Note however that the type of blastRead is completely
unsafe! Using features like this is thus strongly discouraged!
Instead, there are combinator libraries for pickling, which are safe.
One is described in this paper:
http://research.microsoft.com/~akenn/fun/picklercombinators.pdf
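To give a flavour of the idea, here is a toy sketch of pickler combinators in OCaml. This is a deliberately simplified string-based version of my own devising, not the paper's actual API (Kennedy's picklers work over byte streams and carry more structure):

```ocaml
(* A pickler for 'a is a pair of inverse functions; combinators build
   picklers for compound types from picklers for their parts. *)
type 'a pu = { pick : 'a -> string; unpick : string -> 'a * string }

let int = {
  pick = (fun n -> string_of_int n ^ ";");
  unpick = (fun s ->
    let i = String.index s ';' in
    (int_of_string (String.sub s 0 i),
     String.sub s (i + 1) (String.length s - i - 1)));
}

let pair a b = {
  pick = (fun (x, y) -> a.pick x ^ b.pick y);
  unpick = (fun s ->
    let x, s = a.unpick s in
    let y, s = b.unpick s in
    ((x, y), s));
}
```

Because the pickler's type records the type it handles, composing picklers that mirror the type of your data (as with the string combinators above) gives you serialization that the type checker verifies, with no unsafe casts.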
Last, I myself am involved in the development of Alice ML, an SML
dialect that supports type-safe higher-order pickling natively (by
implementing a form of runtime type passing). In Alice ML, you can not
only pickle values and closures, but in fact whole modules, and all
required code will travel along in a platform-independent form:
http://www.ps.uni-sb.de/alice/manual/pickling.html
- Andreas
> I can e-mail you a copy of the article if you'd like. Just send me an
> e-mail privately.
I forgot to mention that you can see the slides from my presentation at
the ML workshop here:
http://research.microsoft.com/~crusso/ml2007/slides/ml08rp-karvonen-slides.pdf
-Vesa Karvonen
Suppose I want to define a product of a string times a positive integer
such that "a"*3 = "aaa". In principle I could say to the compiler (in
pseudo language):
declare s: string;
declare n: integer>0;
declare operator * (s: string, n: integer>0): string = s + s + ... + s
(* n times *)
so that the compiler would know what to do to compute the product of a
string times an integer, *at compile time*. Trying to access s and n
before initialization would be an error; s and n would be initialized at
runtime, only once, and with type checking.
I don't see why this would not work and I wonder why it is not done this
way. I am sure I am overlooking something, please educate me ;)
Michele Simionato
This only works for simple first-order examples, but breaks down as
soon as polymorphism and first-class functions come into play.
Consider the following toy example:
fun apply (f : 'a -> 'b, x : 'a) = f x
Now I do
apply (op*, ("a", 3))
The actual application of op* is performed inside the apply function,
and at that point you have no compile-time knowledge about the
argument types or even the function itself - they may be anything. You
would have to make the choice in the second line, by looking at the
way op* is used. In other words, you need some contextual type
information. This can be done, and it is basically what Haskell does
with type classes, but in the general case it requires introducing a
more elaborate notion of polymorphic types. Consider:
fun f (x) = x * 3
What is the type of f? Haskell's answer would be something along the
lines of
'a -> 'b for all 'a, 'b where op* is defined on 'a * int -> 'b
You see this is far from simple.
Also, I should add that many consider the kind of overloading you
propose pretty bad style that just serves to obfuscate code. Just
consider Java and the vast differences between saying
s + s
i + i
s + i
i + s
When the types of the arguments aren't immediately apparent this isn't
very helpful in reading programs.
- Andreas
Ok, I stand educated ;)
> Also, I should add that many consider the kind of overloading you
> propose pretty bad style that just serves to obfuscate code. Just
> consider Java and the vast differences between saying
>
> s + s
> i + i
> s + i
> i + s
>
> When the types of the arguments aren't immediately apparent this isn't
> very helpful in reading programs.
One might perhaps defend this thesis for enterprise programming
(even if I would not defend it even in that case) but
for scientific programming operator overloading and generic functions
are invaluable. BTW, what is the situation of SML
with respect to scientific programming? Are there scientific
libraries or bindings to standard libraries for number crunching
(I mean libraries like LAPACK, ATLAS, etc)?
Michele Simionato
michele....@gmail.com wrote:
> One might perhaps defend this thesis for enterprise programming
> (even if I would not defend it even in that case) but
> for scientific programming operator overloading and generic functions
> are invaluable.
As this is relevant: I am a consultant specializing in scientific computing
using modern functional programming languages (primarily OCaml and F#).
In scientific computing, type inference is preferable to operator
overloading. There have been many attempts to reconcile these two
conflicting language features.
OCaml takes the simple approach of using completely different operators
(e.g. + for integer addition and +. for float addition). This has the
advantages of being conceptually simple and unambiguous, so error messages
are clear. OCaml code is typically typeset by the editor to make +. look
similar to +, so you only notice the difference when you write code and the
use of +. replaces an explicit declaration of the "float" type.
SML takes the slightly more sophisticated approach of ad-hoc polymorphic
arithmetic operators. So + defaults to int addition but can also be applied
to floats. This makes non-typeset code more readable but requires explicit
type annotations.
Haskell takes a more heavy-weight approach of full-blown type classes. This
makes the code as elegant as possible (e.g. 0 can mean zero or empty over a
variety of types). I have too little practical experience of this to
comment but type classes complicate optimization and separate compilation.
F# is most closely related to OCaml but implements a statically-resolved
form of type classes. So you do not get the complete dynamic freedom that
type classes offer but you are guaranteed good performance, i.e. each +
means addition for a single type that is known at compile time. The F# team
chose this approach because they believe (as you said) that operator
overloading is essential in technical computing.
A simple example might help to elucidate the differences. Consider this C++:
double square(double x) { return x*x; }
In OCaml, the *. operator is used to perform double-precision float
multiplication:
# let square x = x *. x;;
val square : float -> float = <fun>
In F#, you can use the overloaded * operator but you must then explicitly
specify the type:
> let square (x: float) = x * x;;
val square : float -> float
As you can see, operator overloading is largely a red herring. You traded
the "." in *. for a type annotation ": float". More complicated functions
might replace several uses of *. with a single type annotation but the
winning approach is sufficiently unclear that I would certainly not state
that "for scientific programming operator overloading and generic functions
are invaluable".
Speaking from experience, you will not need to spend a lot of time learning
ML before you realise why it is such a productive family of languages. For
me, it took only 4 months before OCaml completely replaced C++ and
Mathematica. My OCaml programs are much faster than C++ and more concise
and robust than Mathematica.
> BTW, what is the situation of SML
> with respect to scientific programming? Are there scientific
> libraries or bindings to standard libraries for number crunching
> (I mean libraries like LAPACK, ATLAS, etc)?
I wrote the book "OCaml for Scientists" which covers all major aspects of
scientific computing in OCaml:
http://www.ffconsultancy.com/products/ocaml_for_scientists/?clf
LAPACK and FFTW are among the topics discussed as well as real-time
visualization using OpenGL, with complete examples (many are freely
available from the above page along with the whole of the first chapter).
Christophe Troestler's excellent FFTW3 bindings for OCaml even include a
modified implementation of one of the examples from my book.
If you would like some simpler and more fun projects to play with then there
are several freely available on our site:
http://www.ffconsultancy.com/ocaml/?clf
There are much more polished examples with thorough descriptions published
every two weeks in the OCaml Journal:
http://www.ffconsultancy.com/products/ocaml_journal/?clf
We are just completing a new book "F# for Scientists" that will be published
by Wiley next year.
HTH.
Just to clarify, I wasn't referring to overloading in general, but to
overloading of unrelated operations like addition and string
concatenation, and particularly having mixed versions of them.
> BTW, what is the situation of SML
> with respect to scientific programming? Are there scientific
> libraries or bindings to standard libraries for number crunching
> (I mean libraries like LAPACK, ATLAS, etc)?
I have to pass on that one.
Just to clarify: I have nothing against OCaml and I may just
switch to OCaml in the future, but for the present I would like
to stick with SML, since it looks simpler, with less features,
and easier to grasp for a newbie. For instance, contrast the
situation of OCaml vs SML with respect to 'printf': in OCaml it is
just there; in SML I am learning a lot by studying how to implement
it. Sometimes I am in a practical mood and I would rant
against the lack of such basic facilities; however in
these days I happen to be in a learning mood ;)
Seriously, the primary reason why I am looking at the ML-family
is that my experience has always been with dynamic languages and I
am completely ignorant about modern statically typed languages. I
just don't like to feel ignorant. Also, I have come to dislike
inheritance very much and I am looking for alternatives. And
I am also curious to know what life is like in a world without
type errors.
> As you can see, operator overloading is largely a red herring. You traded
> the "." in *. for a type annotation ": float". More complicated functions
> might replace several uses of *. with a single type annotation but the
> winning approach is sufficiently unclear that I would certainly not state
> that "for scientific programming operator overloading and generic functions
> are invaluable".
I was suggesting type annotations myself in a previous post, but
Andreas Rossberg pointed out that this approach does not
work well with higher order functions.
> Speaking from experience, you will not need to spend a lot of time learning
> ML before you realise why it is such a productive family of languages. For
> me, it took only 4 months before OCaml completely replaced C++ and
> Mathematica. My OCaml programs are much faster than C++ and more concise
> and robust than Mathematica.
Uhm ... maybe you meant "much faster than Mathematica and more
concise and robust than C++" which I would have no trouble to
believe.
> I wrote the book "OCaml for Scientists" which covers all major aspects of
> scientific computing in OCaml:
>
> http://www.ffconsultancy.com/products/ocaml_for_scientists/?clf
I will have a look at it. I am not doing much scientific computing,
but occasionally I have the need to process large datasets to perform
statistics. In the kind of analysis I do the
bottleneck is usually the database, not the language used :-(
BTW, what is the situation of OCaml with respect to databases?
Specifically Postgresql, Mysql, SQLite and SQL Server?
Thanks for your time,
Michele Simionato
I am confused: are you implying that Haskell type classes require
runtime information? If so, then runtime type errors may happen,
right? OTOH, if Haskell type classes do not require runtime
information (since in another post you say that Haskell is
able to figure out the type of overloaded operators from the
context) then the approach is not equivalent to runtime type
checking.
Just to check my understanding,
Michele Simionato
> I am confused: are you implying that Haskell type classes require
> runtime information? If so, then runtime type errors may happen,
> right? OTOH, if Haskell type classes do not require runtime
> information (since in another post you say that Haskell is
> able to figure out the type of overloaded operators from the
> context) then the approach is not equivalent to runtime type
> checking.
I believe that Haskell type classes do represent runtime
*information*, but not runtime *type* information - not in the sense
of runtime type checking or errors, which indeed do not happen.
For example, saying "my type t belongs to the type class Num(eric)"
means "there is a *dictionary* which tells e.g. what is the meaning of
the operation (+) in t". The Haskell type system checks at compile
time that the meaning of (+) in t has the correct type t->t->t. Thus
the dictionary for type t=Double has (+)::Double->Double->Double,
whereas the dictionary for type t=Int has (+)::Int->Int->Int, and so
on. This is how Haskell handles overloading: (+) is overloaded for all
types t=Double,Int,... in its class Num.
The compiler attaches the corresponding dictionary to each value. Thus
the compiled code for "x+y" says "look up the meaning of (+) from the
dictionary attached to x and call it with x and y". The compiler also
checks that the type of x = the type of y.
However, note that this meaning in the dictionary attached to x is
guaranteed to have
(+)::(type of x and y)->(type of x and y)->(type of x and y)
so no type checking is needed any more, and no type errors can happen.
Thus these dictionaries are the required runtime information, but the
types themselves can be omitted.
[If we know the exact type of x and y at compile time, then we can of
course further optimize the dictionary lookup away. But if we don't -
if "x+y" is generic code over all Num types - then we can do the above
instead.]
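The dictionary translation can be written out by hand in ML, which may make the mechanism concrete. This is an illustrative OCaml sketch (the num record and its field names are my own, standing in for the compiler-generated Num dictionary):

```ocaml
(* A "dictionary" for a Num-like class: the operations for one type. *)
type 'a num = { add : 'a -> 'a -> 'a; zero : 'a }

(* One dictionary value per instance. *)
let int_num = { add = ( + ); zero = 0 }
let float_num = { add = ( +. ); zero = 0.0 }

(* "sum :: Num a => [a] -> a" becomes a function that takes the
   dictionary as an explicit extra argument. *)
let sum d xs = List.fold_left d.add d.zero xs
```

Then sum int_num [1; 2; 3] gives 6 and sum float_num [1.0; 2.5] gives 3.5; the type checker guarantees at compile time that the dictionary matches the element type, so no run-time type check ever happens, exactly as described above.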
--
Matti Nykänen
professori, Tietojenkäsittelytieteen laitos, Kuopion yliopisto
professor, Department of Computer Science, University of Kuopio, Finland
But if I understand correctly, the dictionary is just an
implementation technique, not required by the language. JHC gets rid
of it, apparently by using typed intermediate languages to keep more
type info around until code generation time. Note that I don't actually
understand what the above means in any detail, so I've been meaning to
ask about it here. See the typeclass section of:
Haskell (IOW, GHC and *not* Haskell '98) allows, among other things,
polymorphic recursion and higher-rank polymorphism. These features
make it impossible to eliminate the run-time aspect (some kind of
dispatch using, for example, method dictionary or type representation
passing) of type classes in all cases.
-Vesa Karvonen
michele....@gmail.com wrote:
> On Nov 8, 3:47 pm, rossb...@ps.uni-sb.de wrote:
>> But the program doesn't know the types at runtime, which is when you
>> would have to dispatch. Due to polymorphism, you would need to pass
>> around runtime type information (or something equivalent) to make that
>> possible. A variation of this is the approach taken by Haskell with
>> its mechanism of so called type classes.
>
> I am confused: are you implying that Haskell type classes require
> runtime information?
Effectively, yes.
> If so, then runtime type errors may happen, right?
No.
> OTOH, if Haskell type classes do not require runtime
> information (since in another post you say that Haskell is
> able to figure out the type of overloaded operators from the
> context) then the approach is not equivalent to runtime type
> checking.
There is no run-time type checking, just run-time dispatch to the
appropriate type-specialized function in the basic case. Of course, you
don't want to incur a run-time dispatch every time you add a pair of
integers so any decent Haskell compiler will go to great lengths to
statically resolve the lookup and avoid it completely.
In general, this is very successful and most instances can be resolved. The
problem is that it is not clear when they have been resolved, which makes
it more difficult to optimize code in the presence of abstractions.
Haskell 98 does allow polymorphic recursion.
Lauri
Hmm, so what's the situation with JHC? Does it use a dictionary
in those cases?
Indeed, I was confused. Before posting the above, I quickly looked at
GHC's list of extensions, which mentions polymorphic recursion, but in
a specific context that I missed.
-Vesa Karvonen
One correction: dictionaries are *not* attached to values. They are
passed independently (where needed, potentially wherever you have a
class constraint on a polymorphic type variable). That is the major
difference between type classes and object-oriented classes and their
use of method tables.
The type class approach is more general, since it allows dispatch
independent of the existence of a particular value. For example, you
can dispatch a call on the *result* type of the respective function,
or any other part of the function's type. Moreover, the approach
naturally scales to multiple dispatch, while this is not the case for
conventional OO-style methods.
- Andreas
michele....@gmail.com <michele....@gmail.com> wrote:
> [...] The thing that bugs me, is that everybody is saying that
> overloading is not supported in SML, i.e. user-defined functions cannot
> accept a generic type. [...]
> For instance, if I wanted to define the sum of two vectors (in the
> mathematical sense, as entities in a vector space) or of two matrices,
> why can't I do it? I am puzzled since the compiler knows the type of
> the variables (and the type cannot change) so it should be able to
> dispatch to the right sum function at compile time. I am so used at
> thinking in terms of generic functions that it is difficult for me to
> understand how one can live without them. Are there tricks to somewhat
> emulate generic functions? For instance suppose I want to define a
> 'str_repr' function returning the string representation of *any* object.
And, in another post:
> [...] for scientific programming operator overloading and generic
> functions are invaluable. [...]
Let me just first briefly rephrase what you are saying. You feel that
overloading and generic functions are important, because you are used to
thinking in terms of them. In particular, you feel that overloading and
generic functions are essential for scientific programming. You are
puzzled at how people programming in SML are able to do without
overloading and generic functions.
Actually, I have been at a very similar position. Before learning ML, I
programmed mostly in C++. For example, I wrote libraries and application
code for dealing with 3D graphics, vectors, matrices, rays, planes,
triangles, quaternions, etc... In C++, one uses *operator* overloading in
such code heavily and, frankly, it improves the readability of basic
graphics / geometric algorithms significantly to be able to write them
with a notation that resembles typical mathematical notation. Also,
through overloading and templates, one can write "generic" functions such
as a function for converting values to strings. When I first started
learning ML, O'Caml at first, it felt rather limiting that one couldn't
have *overloaded* functions. Perhaps fortunately, at the time, I wasn't
writing code that would have been dealing with concepts where overloading
seemed crucial to me (my first O'Caml project was a compiler).
So, how do we (ML'ers) actually get by without (operator) overloading and
generic functions? In this post I'll concentrate mainly on operator
overloading from the perspective of scientific programming or graphics
programming. ("Generic" has many meanings, for one interpretation see the
following page http://mlton.org/TypeIndexedValues.) First of all, most of
us, but there are exceptions, of course, probably aren't writing the kinds
of applications where operator overloading would be crucial. But let's
ignore that. What if you want to do something like scientific programming
or 3D graphics programming in ML?
I think that many people, when they speak of overloading, are thinking
about operator overloading in particular. This is because a number of
(more or less) mainstream languages provide a fixed set of "operators",
that is, symbolic identifiers, some of which are used as infix operators.
To be able to make "mathematical" code readable, you want to be able to
overload those operators, because they are the only symbolic and/or infix
operators in the language. If you couldn't overload the operators, you'd
have to write expressions with ordinary prefix function calls, using
possibly quite long function names, and it would significantly decrease
the readability of the code.
Now, an important reason why ML'ers get by in such situations is that
while ML doesn't directly support overloading, it does support user
definable, symbolic, infix operators. So, while you might not be able to
overload +, -, *, and / for vectors, you can, say, define the operators
:+:, :-:, :*:, and :/: for element-wise addition, subtraction,
multiplication, and division of vectors. You could further adopt the
convention that colon, ":", when it appears on the left or right side of
an operator, means that the corresponding operand is a vector and add a
few more operators, like *: and :* for concise mixed arithmetic with
scalars and vectors. Now, consider the following artificial example that
takes two lists of vectors ({x, y, z} vectors) and performs ad hoc
operations on them and accumulates the results:
val test =
ListPair.foldl
(fn (v, w, s) => s + mag (2.0 *: v :-: update#y (w, 2.0 * sub#y w)))
0.0
With overloading, one could write it as
val test =
ListPair.foldl
(fn (v, w, s) => s + mag (2.0 * v - update#y (w, 2.0 * sub#y w)))
0.0
but the notational difference is arguably rather insignificant. At the
end of this message is nearly the full code (and a bit of stuff not used
in the above) for compiling the above snippet of code with MLton. If
necessary, I can provide a single, self-contained SML source file.
If you take a look at the code, you can see that vector operations have
been defined quite concisely, making extensive use of simple higher-order
functions and several layers of abstraction. For example, the *:
operator, used above, is defined in terms of quite a few simpler functions
(can you count them?). Also, the above expression computes three
intermediate vector results (one by scaling, another by functionally
updating an element of a vector, and the last one by vector subtraction)
before computing the magnitude of the vector and summing it with the
accumulator.
Now, an interesting question is what kind of code the compiler actually
generates for the above expression. Here is the loop produced by MLton:
L_883:
movl %edi,%ecx
andl $0x3,%ecx
jnz L_2836
L_1401:
movl 0x4(%edi),%ecx
fldL 0x0(%edx)
fldL 0x8(%edx)
fldL 0x10(%edx)
fldL (globalReal64+0x10)
fmul %st, %st(1)
fmul %st, %st(2)
fmul %st, %st(3)
movl 0x0(%edi),%ebx
fldL 0x8(%ebx)
fmulp %st, %st(1)
fldL 0x0(%ebx)
fxch %st(2)
fsubL 0x10(%ebx)
fxch %st(3)
fsubp %st, %st(1)
fxch %st(3)
fsubp %st, %st(1)
fxch %st(1)
fmul %st, %st
fxch %st(2)
fmul %st, %st
fxch %st(1)
fmul %st, %st
faddp %st, %st(1)
faddp %st, %st(1)
fsqrt
faddp %st, %st(1)
cmpl $0x1,%esi
je L_2784
L_1402:
movl 0x0(%esi),%edx
movl 0x4(%esi),%esi
movl %ecx,%edi
jmp L_883
While the above is certainly not optimal (in particular, it could be
improved significantly by switching to 4-element vectors and using
SIMD instructions) it is pretty good, IMHO. In particular, the
compiler is smart enough to effectively eliminate all intermediate
vectors. Essentially, there would be almost zero benefit to be had by
tweaking the code further by manual inlining, for example. The
programming technique is clearly viable.
Here is a challenge: transliterate the SML code (i.e. write it in the
(almost) exact same way with a trivial one-to-one correspondence
between each expression and sub-expression of the code) to your
favorite language and look at the code produced by your favorite
compiler. Are the programming techniques expressible (e.g. can you
define several operators (e.g. |*| & |* & *|) with a single function
call and multiple binding) and viable (i.e. compared to manually
inlined and tweaked code, how does the concisely written
transliterated code perform) in your favorite language? And remember
to have fun! I certainly did. I spent all of the time working on
that example thinking about how to implement the vector operations as
concisely as possible using as small a set of primitive operations as
possible. The code is representative of how I would approach the task
of implementing vector operations in SML/MLton. (Oh, and for the
record, I do know that there are a couple of compilers that are
probably up to the challenge.)
Back to the topic.
So, I think that it is debatable whether it is infix operators or
overloading or both at the same time that is really crucial in domains
like scientific programming. There is also a trick for creating operator
bindings for situations where you have a wide collection of "number-like"
types and you might want to use different sets of operators in different
sections of your code. See the following link for a brief mention of the
basic technique:
http://mlton.org/pipermail/mlton/2005-January/026663.html
But what if you really need overloading? It turns out that, if you know
what you are doing, you can even have that too (even without any run-time
overhead). See the discussion starting from the following message for
details of the very simple technique:
http://mlton.org/pipermail/mlton/2005-October/028127.html
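To make the user-defined-infix-operator point concrete outside the full SML listing below, here is a minimal OCaml analogue. The operator names +|, -|, *| are my own invention, chosen so that OCaml's precedence-by-first-character rule gives them the usual additive/multiplicative precedences:

```ocaml
type v3 = { x : float; y : float; z : float }

(* Vector-vector addition and subtraction (additive precedence). *)
let ( +| ) a b = { x = a.x +. b.x; y = a.y +. b.y; z = a.z +. b.z }
let ( -| ) a b = { x = a.x -. b.x; y = a.y -. b.y; z = a.z -. b.z }

(* Scalar-vector scaling (multiplicative precedence). *)
let ( *| ) s a = { x = s *. a.x; y = s *. a.y; z = s *. a.z }

let mag a = sqrt ((a.x *. a.x) +. (a.y *. a.y) +. (a.z *. a.z))
```

With these definitions an expression like mag (2.0 *| v -| w) parses as mag ((2.0 *| v) -| w), mirroring the mag (2.0 *: v :-: ...) style of the SML example, with no overloading involved.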
-Vesa Karvonen
infix 7 :*: :* *: :/: :/ /:
infix 6 :+: :+ +: :-: :- -:
signature SCALAR = sig
type t
val ~ : t UnOp.t
val + : t BinOp.t
val - : t BinOp.t
val * : t BinOp.t
val / : t BinOp.t
val sqrt : t UnOp.t
val fromInt : Int.t -> t
end
signature SEQ_CORE = sig
type 'a t
val map : ('a -> 'b) -> 'a t -> 'b t
val selector : ('a t -> 'a) t
val foldr : ('a * 'b -> 'b) -> 'b -> 'a t -> 'b
end
signature SEQ = sig
include SEQ_CORE
val app : 'a Effect.t -> 'a t Effect.t
val toList : 'a t -> 'a List.t
val dup : 'a -> 'a t
val zipWith : ('a * 'b -> 'c) -> 'a t * 'b t -> 'c t
type 'a r
val sub : ('a r t -> 'a r) -> 'a t -> 'a
val update : ('a r t -> 'a r) -> 'a t * 'a -> 'a t
val sumWith : ('a * 'a -> 'a) -> 'a t -> 'a
end
functor MkSeq (Core : SEQ_CORE) :> SEQ where type 'a t = 'a Core.t = struct
   open Core
   fun zipWith f (l, r) = let
      val l = map INL l
      val r = map INR r
   in
      map (fn s => f (Sum.outL (s l), Sum.outR (s r))) selector
   end
   fun app e = ignore o map e
   fun dup v = map (const v) selector
   fun toList v = foldr op :: [] v
   type 'a r = 'a ref
   fun sub f v = case map ref v of r => ! (f r)
   fun update f (v, s) = case map ref v of r => (f r := s ; map ! r)
   fun sumWith f =
      Sum.outR o foldr (fn (v, INL ()) => INR v
                         | (v, INR s) => INR (f (s, v))) (INL ())
end
(* This signature isn't really necessary; it could be eliminated for brevity. *)
signature VEC = sig
   structure Scalar : SCALAR and Seq : SEQ
   type s = Scalar.t and v = Scalar.t Seq.t
   val diag : s -> v -> v Seq.t
   val e : v Seq.t
   val ~| : v UnOp.t
   val :+: : v BinOp.t
   val :+ : v * s -> v
   val +: : s * v -> v
   val :-: : v BinOp.t
   val :- : v * s -> v
   val -: : s * v -> v
   val :*: : v BinOp.t
   val :* : v * s -> v
   val *: : s * v -> v
   val :/: : v BinOp.t
   val :/ : v * s -> v
   val /: : s * v -> v
   val dot : v Sq.t -> s
   val norm : v -> s
   val mag : v -> s
   val lerp : v Sq.t -> s -> v
   val unitize : v UnOp.t
end
functor Vec (structure Scalar : SCALAR and Seq : SEQ_CORE) : VEC = struct
   structure Scalar = Scalar and Seq = MkSeq (Seq)
   open Scalar Seq
   type s = Scalar.t and v = Scalar.t Seq.t
   fun diag s v = map (fn f => update f (dup s, sub f v)) selector
   val e = diag (fromInt 0) (dup (fromInt 1))
   val ~| = map Scalar.~
   local
      fun mk f =
         case zipWith f
          of vv => vv & vv o Pair.map (id, dup) & vv o Pair.map (dup, id)
   in
      val op :+: & op :+ & op +: = mk op +
      val op :-: & op :- & op -: = mk op -
      val op :*: & op :* & op *: = mk op *
      val op :/: & op :/ & op /: = mk op /
   end
   val dot = sumWith op + o op :*:
   val norm = dot o Sq.mk
   val mag = sqrt o norm
   fun lerp (l, r) s = l :* (fromInt 1 - s) :+: r :* s
   fun unitize v = v :* (fromInt 1 / mag v)
end
functor RealToScalar (Real : REAL) = struct
   open Real
   type t = real
   val sqrt = Math.sqrt
end

structure Real = RealToScalar (Real)

structure XYZ : SEQ_CORE = struct
   type 'a t = {x : 'a, y : 'a, z : 'a}
   val selector : ('a t -> 'a) t = {x = #x, y = #y, z = #z}
   fun map f {x, y, z} = {x = f x, y = f y, z = f z}
   fun foldr f s {x, y, z} = f (x, f (y, f (z, s)))
end
structure V3R = Vec (structure Scalar = Real and Seq = XYZ)

open V3R V3R.Seq

val test =
   ListPair.foldl
      (fn (v, w, s) => s + mag (2.0 *: v :-: update#y (w, 2.0 * sub#y w)))
      0.0

val s2r = valOf o Real.fromString

val () =
   recur [] (fn lp =>
      fn vs =>
         case TextIO.inputLine TextIO.stdIn
          of NONE => println (Real.toString (test (vs, rev vs)))
           | SOME l =>
             case String.tokens Char.isSpace l
              of [x, y, z] => lp (map s2r {x=x, y=y, z=z} :: vs)
               | _ => fail "Must have 3 reals per row")
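For a sense of how the resulting library reads, here is a hypothetical
usage sketch (assuming `open V3R V3R.Seq` as in the listing above, with
`Seq = XYZ`, so vectors are `{x, y, z}` records of reals):

```sml
val v = {x = 1.0, y = 2.0, z = 3.0}
val w = {x = 0.0, y = 1.0, z = 0.0}

val sum   = v :+: w      (* componentwise sum: {x = 1.0, y = 3.0, z = 3.0} *)
val twice = 2.0 *: v     (* scalar on the left: {x = 2.0, y = 4.0, z = 6.0} *)
val d     = dot (v, w)   (* 1.0 * 0.0 + 2.0 * 1.0 + 3.0 * 0.0 = 2.0 *)
val len   = mag w        (* sqrt (norm w) = 1.0 *)
```

Because `Vec` is parameterized over `SCALAR` and `SEQ_CORE`, the same
operators work unchanged for other scalar types or other sequence shapes
(say, a 2D `{x, y}` record) by instantiating the functor differently.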
So let's say that I am convinced that you can live without operator
overloading once you have the ability to define custom infix
operators.
As you know, I am not particularly concerned about performance, but
since you insist on it, let me ask a few questions out of idle
curiosity (I may have a need for performance in the future, who
knows). I was looking at the Debian shootout, which is performing
vector computations using the trick you are describing:
http://shootout.alioth.debian.org/gp4/benchmark.php?test=nbody&lang=mlton
I see that MLton takes 27 seconds versus 15 seconds of C
and 22 seconds of SBCL; Java is doing a pretty respectable
job at 16 seconds, maybe it is the JIT, anyway, quite impressive
(I am disappointed by Fortran G95 at 20 seconds, it is probably
not optimized as it was in the old days).
Are these figures significant, according to your experience? Should
I expect a performance within a factor of two from C for
numerical code in MLton? I know that benchmarks are lies, so
I am asking your opinion.
Michele Simionato
Yes. However, MLton does not yet have a good backend codegen for 64-bit, so
it typically generates slower code than OCaml on my 2.2GHz Athlon64 X2. On
the nbody benchmark I get:
C++: 8.2s
OCaml: 9.2s
MLton: 15.8s
SBCL: 23.0s
Look at the relative speedup compared to the results on the shootout:
C++: 1.8x faster
OCaml: 4.0x faster
MLton: 1.7x faster
SBCL: 1.0x faster
So if you want your numerical functional programs to run fast in absolute
terms then use OCaml on an Athlon64 in 64-bit.
> I know that benchmarks are lies, so I am asking your opinion.
You also need to be aware that many of the languages can do a lot
better, but the shootout maintainers subjectively reject submissions,
making certain languages look artificially bad. This is often the case
with OCaml, IME. For example:
http://ocamlnews.blogspot.com/2007/09/spectral-norm.html
My ray tracer language comparison is a much more objective benchmark and,
consequently, I have never had to reject any submissions:
http://www.ffconsultancy.com/languages/ray_tracer/results.html
I would have agreed until I tried F#, which adds operator overloading to an
OCaml-like language. The result is much nicer for scientific computing!
F# uses a kind of statically-resolved type class to implement
overloading, and provides many overloaded operators and some functions
(e.g. sin and cos apply to all float types, and the "float_of_*" casts
become just "float").
You can add your own overloads to operators but not to functions, which I
think is an excellent trade-off.
I'm working on my own language now and this is certainly one of the features
that I will steal from F#...
> I was looking at the Debian shootout, which is performing vector
> computations using the trick you are describing:
> http://shootout.alioth.debian.org/gp4/benchmark.php?test=nbody&lang=mlton
Actually, it isn't. The code was originally translated to SML by
Matthias Blume. You might be confused by the comment "Ported to MLton
by Vesa Karvonen". I believe I merely removed some of the boilerplate
required to compile the benchmark as a stand-alone program with
SML/NJ.
Now that you mention it, I think that this is a perfect example of
what I've been trying to communicate here. As a case in point, I
rewrote the implementation for clarity, using a simpler representation
of the system (a list of records), and essentially a reusable vector
arithmetic library (the same code I posted here). Performance stayed
essentially the same. In fact, performance of my version is slightly
better on my laptop. The amount of code taken by the actual
(non-reusable) simulation code was reduced significantly (from 535
words to 354 words; it could be further reduced via obfuscation, but
I'll refrain from that). IMO, the readability of the simulation code
also improved significantly. You can find my version here:
http://mlton.org/cgi-bin/viewsvn.cgi/mltonlib/trunk/org/mlton/vesak/toys/n-body/
> I see that MLton takes 27 seconds versus 15 seconds of C and 22
> seconds of SBCL; Java is doing a pretty respectable job at 16
> seconds, maybe it is the JIT, anyway, quite impressive (I am
> disappointed by Fortran G95 at 20 seconds, it is probably not
> optimized as it was in the old days).
SBCL is doing quite well here. It would be interesting to see someone
transliterate my version to SBCL and see the resulting performance.
The current SBCL version is fairly carefully tuned. In particular, it
uses manually specialized code for the 3D vector arithmetic. I would
guess that the main reason that many compilers are doing better here
than MLton is that they are doing better at scheduling the
instructions for good pipeline usage. In fact, using MLton's C
codegen, I observe a minor performance improvement in my version of
the toy. This is not typical. My guess (I haven't looked at the
code) is that gcc is simply better at scheduling FP instructions for
good pipeline usage. A factor of 2-4 in performance could easily be
explained by that.
BTW, here is the code produced by MLton (just) for the inner(most)
loop of the simulation:
L_950:
movl 0x20(%esi),%edx
fldL 0x0(%edx)
fldL 0x8(%edx)
movl 0x10(%ebp),%ecx
movl 0x20(%ecx),%ebx
fldL 0x10(%ebx)
fsubL 0x10(%edx)
fxch %st(1)
fsubrL 0x8(%ebx)
fxch %st(2)
fsubrL 0x0(%ebx)
fld %st(1)
fmul %st, %st
fld %st(2)
fld %st(4)
fmul %st, %st
fld %st(5)
fld %st(4)
fmul %st, %st
faddp %st, %st(2)
fxch %st(3)
faddp %st, %st(1)
fsqrt
fld %st
fmul %st, %st
fmulp %st, %st(1)
fdivrL (globalReal64+0x108)
fldL 0x18(%esi)
fmul %st(1), %st
fmul %st, %st(2)
fmul %st, %st(3)
fmul %st(4), %st
fxch %st(2)
fsubrL 0x10(%ecx)
fstpL 0x10(%ecx)
fxch %st(2)
fsubrL 0x8(%ecx)
fstpL 0x8(%ecx)
fsubrL 0x0(%ecx)
fstpL 0x0(%ecx)
fldL 0x8(%esi)
fxch %st(1)
fmulL 0x18(%ecx)
fmul %st, %st(3)
fmul %st, %st(4)
fmulp %st, %st(2)
fxch %st(2)
faddL 0x10(%esi)
fstpL 0x10(%esi)
fxch %st(2)
faddp %st, %st(1)
fstpL 0x8(%esi)
faddL 0x0(%esi)
fstpL 0x0(%esi)
cmpl $0x1,%edi
je L_1640
L_951:
movl 0x0(%edi),%esi
movl 0x4(%edi),%edi
jmp L_950
As you can see, MLton has again done a very good job at eliminating
temporaries and using an efficient representation for the simulation
objects. Tuning the code by specializing the vector arithmetic for
this toy would be mostly fruitless.
> Are these figures significant, according to your experience? Should
> I expect a performance within a factor of two from C for numerical
> code in MLton?
Generally, yes. If you get significantly worse relative performance
than that, you should report it. There are some exceptions. MLton
is, by design (for correctness), somewhat more conservative in FP
optimizations than gcc, for example. Some of these issues could
probably be solved with relatively minor amounts of compiler hacking.
> I know that benchmarks are lies
I totally agree. Benchmarks are fine as a basis for further analysis
(e.g. to find out weaknesses of a particular compiler). Not as
objective results in themselves.
-Vesa Karvonen
Do you claim that the program shown in that 15 September 2007 blog
entry was contributed to the benchmarks game and rejected?
> > I know that benchmarks are lies
>
> I totally agree. Benchmarks are fine as a basis for further analysis
> (e.g. to find out weaknesses of a particular compiler). Not as
> objective results in themselves.
When X was measured and Y is reported the benchmark is a lie.
In other cases, it is not the benchmark that is at fault but the
interpretations made from it: it's just data, and we make our own
meanings from it.
I totally agree that benchmarks are more a source of questions than
answers.
(The "benchmark lie" that has surprised me the most was this: someone
claimed the benchmarks game showed X, yet following the link it became
obvious that it did not show X and never had shown X, and afaict the
link to the benchmarks game was simply a bluff based on the assumption
that hardly anyone would follow the link and check.)
No, but you are most welcome to use it.
Anyone interested can see when I stopped contributing to the shootout from
my posts to the shootout mailing list.
If X, Y and Z were all measured but the shootout maintainer doesn't like X
and Z for some subjective reason and, therefore, pretends that only Y
exists then he is also deceiving his audience.
This is the reality of the current shootout and is just as much a lie.
Thank you for the clarification.
Some readers may have thought that your previous comment suggested you
had contributed that OCaml code, and now we all know that is untrue.
As you already know, we don't screen-scrape code snippets from
webpages. If you have a complete tested program that you would like
to contribute then please follow the FAQ instructions:
In your mind that may be reality.
Yet you failed to attribute my work correctly.
> If you have a complete tested program that you would like
> to contribute then please follow the FAQ instructions:
So that you can "deoptimize" it:
http://shootout.alioth.debian.org/gp4/benchmark.php?test=binarytrees&lang=ocaml
or reject it for being too efficient.
No thanks.
Is that a reference to something specific?
> > If you have a complete tested program that you would like
> > to contribute then please follow the FAQ instructions:
>
> So that you can "deoptimize" it:
>
> http://shootout.alioth.debian.org/gp4/benchmark.php?test=binarytrees&...
The way the shootout never attributed my code as agreed and then removed it
because you didn't want to refer to this company.
You seem to be talking about specific programs - which programs?
The spectral-norm C++ and nsieve-bits C++ programs?
(My apologies to long-suffering comp.lang.functional - I don't know
why I keep trying to make sense of Jon Harrop's endless stream of
accusations.)
This was already detailed on the shootout mailing list.
We're left chasing phantoms again.
When we do catch hold of something specific it turns out like this -
"Some readers may have thought that your previous comment suggested
you had contributed that OCaml code and now we all know that is
untrue. "
It's not interesting.
>http://shootout.alioth.debian.org/gp4/faq.php#play
Dalmatian is misspelled.
>If X, Y and Z were all measured but the shootout maintainer doesn't like X
>and Z for some subjective reason and, therefore, pretends that only Y
>exists then he is also deceiving his audience.
Oops, I thought for a moment that you were talking about your own ray
tracer benchmark.
George
Constructive criticism! Thank you.
If you find people detailing your plagiarism and deceit uninteresting, sure.
Hopefully some people will take heed and not waste their time with you.
I have never rejected a single submission. Indeed, I have no need to reject
submissions precisely because the ray tracer is a sound benchmark.
>On Dec 7, 6:12 pm, George Neuner <gneuner2/@/comcast.net> wrote:
>> On Thu, 6 Dec 2007 08:06:12 -0800 (PST), Isaac Gouy <igo...@yahoo.com>
>> wrote:
>>
>> >http://shootout.alioth.debian.org/gp4/faq.php#play
>>
>> Dalmatian is misspelled.
>
>Constructive criticism! Thank you.
Didn't see anything else to criticize.
George
You haven't detailed any plagiarism or deceit; it's yet another vague
denigration. You have selectively edited my comment - here's what I
wrote:
It isn't vague at all: everything is detailed in my posts to the shootout
mailing list.
You probably made hundreds of posts to the shootout mailing list;
specifically, which posts do you claim substantiate your accusations?
160.
> specifically which posts do you claim substantiate your accusations?
This post of mine cites several programs by myself and others that you chose
to ignore despite the improvements they offered:
http://osdir.com/ml/lang.shootout.general/2005-06/msg00161.html
This post explains why your standard rebuttal "but your program works by
different means" is undefined and, consequently, devoid of merit:
http://osdir.com/ml/lang.shootout.general/2005-09/msg00023.html
If anyone wishes to spend their weekend reading Jon Harrop's old posts
then I recommend reading the mailing-list archive directly - there
aren't any ads -
http://lists.alioth.debian.org/pipermail/shootout-list/2005-June/002331.html
On Dec 7, 9:10 pm, Jon Harrop <use...@jdh30.plus.com> wrote:
> Isaac Gouy wrote:
> > It's not interesting.
> If you find people detailing your plagiarism and deceit uninteresting, sure.
-snip-
You've made an accusation of plagiarism but those mailing-list
discussions are about someone having accused /you/ of plagiarism.
Yes. You (and only you) made a variety of non-specific and unsubstantiated
accusations on your list, not coincidentally just before I stopped
contributing.
You've made an accusation of plagiarism - clearly the specific emails
you have suggested as evidence do not substantiate your accusation -
it's another one of your false accusations (haven't you anything more
amusing to do?)
I suppose it depends what you consider your contribution to have been
- you were still contributing your opinions to the mailing-list 3
months later.
Everything is in those posts. I won't go over it again here. You are well
aware of how you work the shootout.
> I suppose it depends what you consider your contribution to have been
> - you were still contributing your opinions to the mailing-list 3
> months later.
3 months later than what?
Here, on this mailing-list, you accused me of plagiarism - please make
clear whether or not you have now withdrawn that accusation.
> > I suppose it depends what you consider your contribution to have been
> > - you were still contributing your opinions to the mailing-list 3
> > months later.
>
> 3 months later than what?
3 months later than the June 2005 post that you linked to.
> I have never rejected a single submission. Indeed, I have no need to reject
> submissions precisely because the ray tracer is a sound benchmark.
By the way,
<http://www.ffconsultancy.com/languages/ray_tracer/code/1/ray.ml>
doesn't type-check.
Works fine here. Are you using the compile line I gave? In particular, are you
using -rectypes?