
various forms of generic programming?


raould

Feb 8, 2010, 7:38:26 PM
hello

the term generic seems to be over-used, and i'm not sure i grok all
the uses and/or how they overlap. but anyway, i'm attempting to learn
more about the subject. some things i've read [1] make it sound to me
like c++ is the most complete language when it comes to generic
approaches. of course that makes me sad. (maybe d does well, i dunno.)
supposedly scala fares better than most, but still falls at some
point. any thoughts? i'm looking to learn about such things in a
concrete real world language rather than e.g. an extension to ghc.

gracias.

[1] http://repository.tamu.edu/bitstream/handle/1969.1/5008/etd-tamu-2006C-CPSC-Nguessan.pdf

Michael Ekstrand

Feb 8, 2010, 8:30:32 PM
On 02/08/2010 06:38 PM, raould wrote:
> the term generic seems to be over-used, and i'm not sure i grok all
> the uses and/or how they overlap. but anyway, i'm attempting to learn
> more about the subject. some things i've read [1] make it sound to me
> like c++ is the most complete language when it comes to generic
> approaches. of course that makes me sad. (maybe d does well, i dunno.)
> supposedly scala fares better than most, but still falls at some
> point. any thoughts? i'm looking to learn about such things in a
> concrete real world language rather than e.g. an extension to ghc.

C++ does have some very powerful generic programming features. I find,
however, that I can do the vast majority of useful generic programming
things that I might do with C++ templates by using functors and
higher-order functions in OCaml (or Standard ML).

Functors allow modules to be parameterized over other modules,
frequently data types and associated operations. Practically, this
allows you to have code which can operate over any structures/backends
that meet particular requirements, and the backend can be switched by
simply instantiating the functor over a different module. Further, you
have very strong typing guarantees which are far easier to debug than
C++ template error messages.

The OCamlGraph library is a very good example of how ML-style generic
programming can be implemented and used. It provides a variety of
graph-related algorithms, each functorized over a module interface
exposing the graph functionality it needs. It also provides a variety
of graph implementations. You can then assemble the pieces you need.
For example:

(* let's use persistent directed graphs *)
module G = Graph.Persistent.Digraph.Concrete(VertexModule)
(* everything has weight 1 *)
module W = struct
  type label = unit
  type t = int
  let weight () = 1
  let compare = Pervasives.compare
  let add = (+)
  let zero = 0
end
(* instantiate Dijkstra's algorithm with our graph and weight *)
module SP = Path.Dijkstra(G)(W)

Now we have access to Dijkstra's shortest-path algorithm over our choice
of graph implementation. Many other things don't even need the
syntactic overhead of modules and functors - passing around closures
will suffice.

- Michael

Keith H Duggar

Feb 9, 2010, 12:20:50 AM
On Feb 8, 7:38 pm, raould <rao...@gmail.com> wrote:
> the term generic seems to be over-used, and i'm not sure i grok all
> the uses and/or how they overlap. but anyway, i'm attempting to learn
> more about the subject. some things i've read [1] make it sound to me
> like c++ is the most complete language when it comes to generic
> approaches. of course that makes me sad. (maybe d does well, i dunno.)
> supposedly scala fares better than most, but still falls at some
> point. any thoughts? i'm looking to learn about such things in a
> concrete real world language rather than e.g. an extension to ghc.

You might find these useful:

"A Comparative Study of Language Support for Generic Programming"
Garcia et al
http://www.osl.iu.edu/publications/prints/2003/comparing_generic_programming03.pdf

"An Extended Comparative Study of Language Support for Generic
Programming" Garcia et al
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.110.122&rep=rep1&type=pdf

KHD

Ertugrul Söylemez

Feb 9, 2010, 12:43:29 AM
raould <rao...@gmail.com> wrote:

> the term generic seems to be over-used, and i'm not sure i grok all
> the uses and/or how they overlap. but anyway, i'm attempting to learn
> more about the subject. some things i've read [1] make it sound to me
> like c++ is the most complete language when it comes to generic
> approaches. of course that makes me sad. (maybe d does well, i dunno.)
> supposedly scala fares better than most, but still falls at some
> point. any thoughts? i'm looking to learn about such things in a
> concrete real world language rather than e.g. an extension to ghc.

For most generic programming tasks the type system of standard Haskell
is totally sufficient. If you really need GHC extensions, you're
already doing something more sophisticated. Note that the type systems
of most other languages can't even express the most basic Haskell
constructs.
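
For a feel of what plain Haskell 98 already gives you, a trivial sketch
(the function name is arbitrary):

nub' :: Eq a => [a] -> [a]
nub' []     = []
nub' (x:xs) = x : nub' (filter (/= x) xs)

This one definition works over lists of any element type with equality,
and the Eq constraint is checked entirely at compile time.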

In fact, C++ isn't even a good choice when it comes to generic
programming. Besides its ugly syntax, it also lacks type inference,
which can be very annoying. Further, it has no support for anonymous or
even lexically scoped functions, not to mention closures.

Another important fact is that C++ has no contracts, contexts,
interfaces or whatever you call them, so you need dynamic casts, which
are bad. Contracts catch a lot of problems at compile time; because C++
lacks them, most type errors can only be found at run time, and only if
the programmer remembers to check for them.

Finally, it's impossible to separate template interface and
implementation in a sane way. There is no module system of any kind;
you get header files and the classic dumb linker.

To see all of these problems in action at the same time, try to
implement Haskell's Functor class as a C++ template. It's about the
simplest example of useful generic programming, and C++ makes it a
nightmare.
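
For reference, the class in question is tiny in Haskell; this is the
standard Prelude definition, together with the usual Maybe instance:

class Functor f where
    fmap :: (a -> b) -> f a -> f b

instance Functor Maybe where
    fmap _ Nothing  = Nothing
    fmap g (Just x) = Just (g x)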

No type inference means that you have to specify your container type all
the time. No anonymous functions, not even lexically scoped functions,
so your mapping function will be a named top-level function. No
contracts, so your fmap function doesn't even know whether your
container type is in fact a container type. It needs dynamic_cast, so
type errors are not caught at compile time. Furthermore, it's
impossible to extend existing container classes to support your new
Functor template, not to mention primitive types like arrays. You need
to subclass existing classes and make a completely new class for arrays.


Greets
Ertugrul


--
nightmare = unsafePerformIO (getWrongWife >>= sex)
http://blog.ertes.de/

Paul Rubin

Feb 9, 2010, 3:24:55 AM
raould <rao...@gmail.com> writes:
> the term generic seems to be over-used, and i'm not sure i grok all
> the uses and/or how they overlap. but anyway, i'm attempting to learn
> more about the subject. some things i've read [1] make it sound to me
> like c++ is the most complete language when it comes to generic
> approaches.

There was a study by Garcia, Jarvi et al that Keith mentioned. A
working URL is here:

http://www.osl.iu.edu/publications/prints/2005/garcia05:_extended_comparing05.pdf

It compared a bunch of different languages and Haskell basically came
out on top, except in one area, which was that it didn't support
associated types as well as could be done in C++ through template
programming. In response to the paper, SPJ et al added type families to
GHC:

http://haskell.org/haskellwiki/Simonpj/Talk:FunWithTypeFuns

and that took care of the weak spot. It mostly seems to me that we hear
more about generics in connection with C++ than with functional languages,
because the C++ community uses the term a lot more, and puts a lot of
energy into the template hacking needed to make it happen in C++. In
functional languages like ML or Haskell, it's just called "polymorphism"
and taken for granted. The Haskell community actually uses the term
"generic" to mean something different, i.e. the automatic derivation of
code from algebraic types, like the "deriving" mechanism or in libraries
like Scrap Your Boilerplate.
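
To make that concrete, here is a minimal sketch using the syb package
(Data.Generics); the Expr type and the bump function are made-up
illustrations, not library code:

{-# LANGUAGE DeriveDataTypeable #-}
import Data.Generics (Data, Typeable, everywhere, mkT)

data Expr = Lit Int | Add Expr Expr
    deriving (Show, Data, Typeable)

-- Rewrite every Int anywhere in the structure, without writing the
-- traversal by hand:
bump :: Expr -> Expr
bump = everywhere (mkT ((+ 1) :: Int -> Int))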

I only spent about a minute looking at the thesis about Scala that you
cited, but it looks interesting and I will put it on my reading list.
The thesis advisor, Jaakko Jarvi, is one of the co-authors of the
comparison paper that I linked to.

raould

Feb 9, 2010, 1:50:15 PM
hi,

> programming. In response to the paper, SPJ et al added type families to
> GHC:
> http://haskell.org/haskellwiki/Simonpj/Talk:FunWithTypeFuns

thanks!

Andreas Rossberg

Feb 9, 2010, 2:56:04 PM
Unfortunately, I can't remember the exact quote, nor who said it, but:
people always use "generic" to refer to the kind of polymorphism their
favorite language does not have yet (or acquired late).

Dan Doel

Feb 9, 2010, 3:15:24 PM
raould wrote:
> the term generic seems to be over-used, and i'm not sure i grok all
> the uses and/or how they overlap. but anyway, i'm attempting to learn
> more about the subject. some things i've read [1] make it sound to me
> like c++ is the most complete language when it comes to generic
> approaches. of course that makes me sad. (maybe d does well, i dunno.)
> supposedly scala fares better than most, but still falls at some
> point. any thoughts? i'm looking to learn about such things in a
> concrete real world language rather than e.g. an extension to ghc.

Perhaps this is not relevant to what you're interested in, but "generic
programming" has a rather different meaning in the Haskell community (at
least; I suspect elsewhere as well) than in C++/Java/C#..., probably
because parametric polymorphism is so fundamental and almost trivial in a
Haskell (or ML, or...)-like language.

Anyhow, generic programming in Haskell is typically meant to refer to
functions that are able to work over multiple different structures, by
somehow using a description of how the structure is built from its
constituent parts. One basic way of doing this is to treat all data as fixed
points of functors, and so you get functions like:

cata :: forall f r. Functor f => (f r -> r) -> Mu f -> r
ana :: forall f s. Functor f => (s -> f s) -> s -> Nu f

which are in some sense parameterized by the shape of the structure (f),
rather than just being parameterized by what you can put in some fixed
structure. I think there is at least one framework building on the above
(the name of which escapes me), but there are others that use other ideas,
like Uniplate and Scrap Your Boilerplate.
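
To spell out the first half of that, a minimal sketch (NatF is a
made-up functor standing in for the shape of some data type):

newtype Mu f = In (f (Mu f))

cata :: Functor f => (f r -> r) -> Mu f -> r
cata phi (In x) = phi (fmap (cata phi) x)

data NatF r = Zero | Succ r

instance Functor NatF where
    fmap _ Zero     = Zero
    fmap g (Succ n) = Succ (g n)

toInt :: Mu NatF -> Int
toInt = cata phi
  where phi Zero     = 0
        phi (Succ k) = k + 1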

Clean also has built-in support for this kind of generic programming, but
finding information on it is difficult, as I recall, because the section on
it in the Clean report is blank for some reason.

With regard to another message, GHC at least now implements associated
types, so that bubble can be filled in all the way for it now. :)
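
For concreteness, an associated type in GHC looks like the following
(the Collection class here is an illustration, not a library class):

{-# LANGUAGE TypeFamilies #-}

class Collection c where
    type Elem c
    empty  :: c
    insert :: Elem c -> c -> c

instance Collection [a] where
    type Elem [a] = a
    empty  = []
    insert = (:)

The Elem family plays the role that nested typedefs/traits play on the
C++ side of the comparison.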

-- Dan

raould

Feb 10, 2010, 1:07:39 PM
> The OCamlGraph library is a very good example of how ML-style generic
> programming can be implemented and used.

thanks for the note! graph libraries are what are used in the paper(s)
referenced, and ml and haskell and scala turn out to not be too bad
compared to java and c#.

sincerely.

(mls have been a favourite family of mine since sml in college, but i
have never had super great luck with ocaml.)

raould

Feb 10, 2010, 1:08:24 PM
> You might find these useful

ja, i think some of those were precursors to the scala paper?

raould

Feb 10, 2010, 1:10:07 PM
> In fact, C++ isn't even a good choice, when it comes to generic
> programming.

hrm, part of this might be the "which kind of generic are we talking
about" question? the generic was for STL-style generic, rather than
data-shape-style generic.

(yes, personally, i would rather stay well away from C++ in just about
any shape or form, if i have the option ;-)

raould

Feb 10, 2010, 1:12:57 PM
> Clean also has built-in support for this kind of generic programming, but
> finding information on it is difficult, as I recall, because the section on
> it in the Clean report is blank for some reason.

i wish clean were doing better. it seemed like a nice idea in lots of
ways, but i think it really seriously fell down in lots of other ways,
mostly usability related (their IDE is horrible evilness, and the
uniqueness typing apparently becomes a bear according to the DDC
thesis).

sounds like they are "not dead yet" and really /are/ turning clean
into a competitor for ghc; clean will be a compiler for haskell.

> With regard to another message, GHC at least now implements associated
> types, so that bubble can be filled in all the way for it now. :)

yes, that is very cool, it seems to me!

raould

Feb 10, 2010, 1:13:32 PM

although i have to say that originally i was hoping not to have to use
ghc specificness ;-) oh well.

Jon Harrop

Feb 10, 2010, 6:35:18 PM

Interesting statements given Haskell's lack of higher-order modules and
comparatively poor type inference.

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u

Jon Harrop

Feb 10, 2010, 8:02:53 PM
raould wrote:
> the term generic seems to be over-used, and i'm not sure i grok all
> the uses and/or how they overlap.

Everyone uses the term "generic programming" to refer to making definitions
more widely applicable (i.e. more general), which to me seems
indistinguishable from code reuse and/or factoring. This is, in turn,
strongly related
to brevity.

Lisp's macros were one of the earliest forms of generic programming, and
yet modern uses of the term "generic programming" often refer specifically
to the expressiveness of static type systems. This seems to be particularly
prevalent in the context of Haskell where generic programming is conflated
with type constraints and any relevance to the true objective of code reuse
is lost.

For example, Haskellers Garcia et al. published a paper "An extended
comparative study of language support for generic programming" that was
actually more about static type constraints than it was about generalizing
definitions or improving brevity and reuse. Although they claimed to have
implemented graph theoretic code in several languages, their OCaml code
seems to have remained secret:

http://www.osl.iu.edu/research/comparing/

Garcia had published research on Haskell and apparently wrote great Haskell
solutions in this paper but I think their OCaml code leaves a lot to be
desired (and probably all of the other non-Haskell solutions as well) and
their relevant conclusions are simply wrong. Apparently they managed to get
this published without obtaining peer review from anyone familiar with the
other languages.

Even their first example (complete with superfluous parentheses and
semicolons):

class type comparable = object ('a) method better : 'a -> bool end

class apple init = object (self: 'a)
  val value_ : int = init
  method better (y: 'a) = self#value > y#value
  method value = value_;
end

let pick ((x: #comparable as 'a), (y: 'a)) : 'a =
  if x#better y then x else y

let a1 = (new apple 3);;
let a2 = (new apple 1);;
let a3 = pick (a1, a2);;

would normally be written:

let apple i = object
  method better y = i > y#value
  method value = i
end

let pick x y = if x#better y then x else y

let a1 = apple 3 and a2 = apple 1
let a3 = pick a1 a2

The essential difference is that OCaml can infer the relevant types and,
consequently, there is no need to define these types and their
relationships by hand. Ironically, their OCaml code inherited superfluous
definitions from the Haskell and they then concluded that these were a
disadvantage for OCaml because non-trivial combinations must be maintained
by hand (which is simply wrong: they are all automatically inferred).

Now look at their keystone OCaml solution for the graph theoretic problem:

class type ['vertex_t] vertex_list_graph = object
  method vertices : 'vertex_t list
  method num_vertices : int
end

class type ['vertex_t] edge = object
  method source : 'vertex_t
  method target : 'vertex_t
end

class type ['vertex_t, 'edge_t] incidence_graph = object
  constraint 'edge_t = 'vertex_t #edge
  method out_edges : 'vertex_t -> 'edge_t list
end

In the context of generic programming, these type definitions are entirely
superfluous. Sadly, they use these superfluous definitions to justify the
conclusion that OCaml is needlessly verbose.

They do give the following concrete implementation of an adjacency-list
based graph type in their paper:

class algraph_edge (s: int) (t: int) = object
  val src = s
  val tgt = t
  method source = src
  method target = tgt
end

class adjacency_list (num_vertices_) = object
  val g = Array.make num_vertices_ []

  method add_edge (src, tgt: int * int) = g.(src) <- (tgt::g.(src))

  method vertices =
    let rec floop (i: int) =
      if i = num_vertices_ then [] else i::floop(i+1) in
    floop 0

  method num_vertices = num_vertices_

  method out_edges v = List.map (fun n -> new algraph_edge v n) g.(v)

  method adjacent_vertices v = g.(v)

  method edges =
    let (_, result) = Array.fold_left
      (fun (src, (sofar: (algraph_edge) list)) (tgts: int list) ->
        (src+1, List.append
                  (List.map (fun n -> new algraph_edge src n) tgts)
                  sofar))
      (0, []) g in
    result

  method create_property_map : 'a. 'a -> 'a array =
    fun def -> Array.make num_vertices_ def
end

let graph_search
    (graph: (('vertex_t, 'edge_t) #incidence_graph) as 'graph_t)
    (v: 'vertex_t)
    (q: 'value_t #buffer)
    (vis: ('graph_t, 'vertex_t, 'edge_t) #visitor)
    (map: ('color_t, 'key_t) #color_map) = ...

This is just abhorrent and could not be less representative of a real OCaml
solution.

Why the superfluous immutable private data in the "algraph_edge" class?

Why are they defining classes that only have the effect of making the code
less generic?

Why are they mixing mutable (array) and immutable (list) data structures
like this?

Why the unnecessary nested function in the "vertices" method to create the
list [0..n-1]?

Why did they not annotate the type of "g" where it might actually help but
did annotate the arguments to the inner anonymous function in the "edges"
method?

You can rewrite their 46-line OCaml solution with the following 14-line
OCaml solution:

let edge s t = object method source = s method target = t end

let adjacency_list n = object
  val g = Array.init n (fun _ -> Stack.create())

  method add_edge (src, tgt) = Stack.push tgt g.(src)
  method vertices = Array.init n id
  method num_vertices = n
  method out_edges v = g.(v)
  method adjacent_vertices v = g.(v)
  method edges =
    let es = Stack.create() in
    Array.iteri (fun u -> Stack.iter (fun v -> Stack.push (edge u v) es));
    es
end

This uses more idiomatic data structures but that is irrelevant in this
context.

Finally, they claim that the OCaml solution cannot support separate
compilation. Is that true? I doubt it...

> but anyway, i'm attempting to learn
> more about the subject. some things i've read [1] make it sound to me
> like c++ is the most complete language when it comes to generic
> approaches.

The problem with C++ (and Haskell) is that the Turing complete type system
gives you complete generality in some uselessly-impractical sense, i.e. it
is a Turing argument.

> of course that makes me sad. (maybe d does well, i dunno.)
> supposedly scala fares better than most, but still falls at some
> point. any thoughts? i'm looking to learn about such things in a
> concrete real world language rather than e.g. an extension to ghc.

You must take into account the applicability of these "solutions" in real
programs. The .NET guys are acutely aware of many of the academic features
found in languages like Haskell but they choose not to adopt them because
they do not solve real problems.

Jon Harrop

Feb 10, 2010, 8:37:00 PM
Jon Harrop wrote:
> For example, Haskellers Garcia et al. published a paper "An extended
> comparative study of language support for generic programming" that was
> actually more about static type constraints than it was about generalizing
> definitions or improving brevity and reuse.

Incidentally, I was surprised they did not even mention genericity over data
structures because this is a well-known example where Haskell falls down on
something MLs have handled well for decades using their higher-order module
system (which Haskell lacks).

See the article "A Somewhat Failed Adventure in Haskell Abstraction" by
Lennart Augustsson, for example:

http://augustss.blogspot.com/2008/12/somewhat-failed-adventure-in-haskell.html

Ertugrul Söylemez

Feb 11, 2010, 3:59:42 AM
Jon Harrop <j...@ffconsultancy.com> wrote:

Ah, it's Dr. Harrop again. I know that you love making things up, but
frankly I don't care. Firstly Haskell doesn't need higher order
modules, but you've always failed to use Haskell properly. Secondly
Haskell's type inference isn't poor, especially not compared to C++.

Ertugrul Söylemez

Feb 11, 2010, 4:17:53 AM
Jon Harrop <j...@ffconsultancy.com> wrote:

> > but anyway, i'm attempting to learn more about the subject. some
> > things i've read [1] make it sound to me like c++ is the most
> > complete language when it comes to generic approaches.
>
> The problem with C++ (and Haskell) is that the Turing complete type
> system gives you complete generality in some uselessly-impractical
> sense, i.e. it is a Turing argument.

Once again you're just proving that you have no idea what you're talking
about. Haskell's type system is not Turing-complete. You specifically
need language extensions (like UndecidableInstances) for that. Secondly
if you decide to make it Turing-complete, you get a few interesting
features like type arithmetic.
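
For instance, the classic type-level Peano arithmetic goes like this
(just a sketch; Z, S and Add are arbitrary names):

{-# LANGUAGE EmptyDataDecls, MultiParamTypeClasses,
             FunctionalDependencies, FlexibleInstances,
             UndecidableInstances #-}

data Z
data S n

class Add a b c | a b -> c
instance Add Z b b
instance Add a b c => Add (S a) b (S c)

The addition is carried out by the instance resolver at compile time,
and UndecidableInstances is exactly the switch that trades guaranteed
termination of that process for this kind of power.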

You may want to tell us what's wrong with a Turing-complete type system
anyway.


> > of course that makes me sad. (maybe d does well, i dunno.)
> > supposedly scala fares better than most, but still falls at some
> > point. any thoughts? i'm looking to learn about such things in a
> > concrete real world language rather than e.g. an extension to ghc.
>
> You must take into account the applicability of these "solutions" in
> real programs. The .NET guys are acutely aware of many of the academic
> features found in languages like Haskell but they choose not to adopt
> them because they do not solve real problems.

What do you think C# LINQ and F# computation expressions are? Yes,
Haskell monads! Why does F# support a lazy evaluation construct
(even though it's just function wrapping)? Some other F# features
include ADTs and pattern matching. There are not many more "academic"
features left.

Clearly even Microsoft knows the practical value of Haskell's language
features. You on the other hand don't.

Jon Harrop

Feb 11, 2010, 10:52:08 AM
Ertugrul Söylemez wrote:

> Jon Harrop <j...@ffconsultancy.com> wrote:
>> Interesting statements given Haskell's lack of higher-order modules and
>> comparatively poor type inference.
>
> Ah, it's Dr. Harrop again. I know that you love making things up, but
> frankly I don't care. Firstly Haskell doesn't need higher order
> modules, but you've always failed to use Haskell properly. Secondly
> Haskell's type inference isn't poor, especially not compared to C++.

Haskell cannot even infer sum types. You actually have to define and
maintain them by hand.

Ertugrul Söylemez

Feb 11, 2010, 10:22:09 AM
Jon Harrop <j...@ffconsultancy.com> wrote:

> Ertugrul Söylemez wrote:
> > Jon Harrop <j...@ffconsultancy.com> wrote:
> >> Interesting statements given Haskell's lack of higher-order modules
> >> and comparatively poor type inference.
> >
> > Ah, it's Dr. Harrop again. I know that you love making things up,
> > but frankly I don't care. Firstly Haskell doesn't need higher order
> > modules, but you've always failed to use Haskell properly. Secondly
> > Haskell's type inference isn't poor, especially not compared to C++.
>
> Haskell cannot even infer sum types. You actually have to define and
> maintain them by hand.

As I said, you're making things up again. Type inference is the process
of finding out the type of an expression.

raould

Feb 11, 2010, 2:36:14 PM
On Feb 11, 1:17 am, Ertugrul Söylemez <e...@ertes.de> wrote:
> You may want to tell us what's wrong with a Turing-complete type system
> anyway.

this is a complete aside, but i think Qi-Lisp purports to have a
turing-complete and user-extensible-ish type system.

raould

Feb 11, 2010, 2:43:06 PM
> Clearly even Microsoft knows the practical value of Haskell's language
> features.  You on the other hand don't.

i do not currently work at MSFT, and when i was there i was not in
the .NET team, but nevertheless i believe i have enough insight to
say:

As with any team of size and age, the .NET team does not have one
single perspective on programming language design or theory. i would
say that the original C# team was less PLT, and the later folks who
came along and influenced .NET, like Meijer and Syme and SPL,
obviously were more PLT.

point being that when you look at .NET as it is today, it is a
melange.

sincerely.

raould

Feb 11, 2010, 2:44:24 PM
> Garcia had published research on Haskell and apparently wrote great Haskell
> solutions in this paper but I think their OCaml code leaves a lot to be
> desired (and probably all of the other non-Haskell solutions as well) and
> their relevant conclusions are simply wrong. Apparently they managed to get
> this published without obtaining peer review from anyone familiar with the
> other languages.

thanks for the review, it is interesting to me, and reassuring that
O'Caml isn't out of the running as it were.

sincerely.

Jon Harrop

Feb 11, 2010, 5:09:12 PM
Ertugrul Söylemez wrote:
> Jon Harrop <j...@ffconsultancy.com> wrote:
>> Ertugrul Söylemez wrote:
>> > Jon Harrop <j...@ffconsultancy.com> wrote:
>> >> Interesting statements given Haskell's lack of higher-order modules
>> >> and comparatively poor type inference.
>> >
>> > Ah, it's Dr. Harrop again. I know that you love making things up,
>> > but frankly I don't care. Firstly Haskell doesn't need higher order
>> > modules, but you've always failed to use Haskell properly. Secondly
>> > Haskell's type inference isn't poor, especially not compared to C++.
>>
>> Haskell cannot even infer sum types. You actually have to define and
>> maintain them by hand.
>
> As said, you're making things up again. Type inference is the process of
> finding out the type of an expression.

Try to translate the following simple OCaml expression into Haskell, for
example:

# `Add(`Mul(`Int 3, `Int 4), `Int 5);;
- : [> `Add of
       [> `Mul of [> `Int of int ] * [> `Int of int ] ] * [> `Int of int ] ]
= `Add (`Mul (`Int 3, `Int 4), `Int 5)

Note how OCaml infers the algebraic datatype.

Ertugrul Söylemez

Feb 11, 2010, 4:59:59 PM
Jon Harrop <j...@ffconsultancy.com> wrote:

I'm sorry, but my knowledge of OCaml is too limited. I don't understand
your code. Could you please describe in words, what it does?

Dirk Thierbach

unread,
Feb 11, 2010, 5:24:57 PM2/11/10
to
Ertugrul Söylemez <e...@ertes.de> wrote:
> Jon Harrop <j...@ffconsultancy.com> wrote:

>> # `Add(`Mul(`Int 3, `Int 4), `Int 5);;
>> - : [> `Add of
>> [> `Mul of [> `Int of int ] * [> `Int of int ] ] * [> `Int of
>> int ] ]
>> = `Add (`Mul (`Int 3, `Int 4), `Int 5)
>>
>> Note how OCaml infers the algebraic datatype.

> I'm sorry, but my knowledge of OCaml is too limited. I don't understand
> your code. Could you please describe in words, what it does?

These are polymorphic variants. In Haskell, one can implement them
with a straightforward extension of the typeclass system. See
"A Polymorphic Type System for Extensible Records and Variants" by
Gaster and Jones, 1996. GHC doesn't implement this, Hugs implements at
least the extensible records (Trex); I'm not sure about the variants.

- Dirk

Jon Harrop

Feb 11, 2010, 6:43:07 PM

That is just a value representing the symbolic expression 3*4+5.

A Haskell equivalent is:

data Expr = Int Int
          | Add Expr Expr
          | Mul Expr Expr
          deriving Show

main = do
  print $ Add (Mul (Int 3) (Int 4)) (Int 5)

Note that you must define the algebraic datatype (called "Expr" here) in
Haskell but not in OCaml because it can infer them.

OCaml also infers object types.

Jon Harrop

Feb 11, 2010, 7:01:41 PM
Dirk Thierbach wrote:
> Ertugrul Söylemez <e...@ertes.de> wrote:
>> Jon Harrop <j...@ffconsultancy.com> wrote:
>>> # `Add(`Mul(`Int 3, `Int 4), `Int 5);;
>>> - : [> `Add of
>>> [> `Mul of [> `Int of int ] * [> `Int of int ] ] * [> `Int of
>>> int ] ]
>>> = `Add (`Mul (`Int 3, `Int 4), `Int 5)
>>>
>>> Note how OCaml infers the algebraic datatype.
>
>> I'm sorry, but my knowledge of OCaml is too limited. I don't understand
>> your code. Could you please describe in words, what it does?
>
> These are polymorphic variants. In Haskell, one can implement them
> with a straightforward extension of the typeclass system.

No, you cannot.

> See "A Polymorphic Type System for Extensible Records and Variants" by
> Gaster and Jones, 1996. GHC doesn't implement this, Hugs implements at
> least the extensible records (Trex); I'm not sure about the variants.

That paper describes how you can implement open (extensible) sum types, not
polymorphic variants. You're still missing the type inference and there is
no way to implement that within Haskell.

Note that this discussion was specifically about the type inference and had
nothing to do with open sum types.

Jon Harrop

Feb 11, 2010, 7:17:22 PM
Ertugrul Söylemez wrote:
> Jon Harrop <j...@ffconsultancy.com> wrote:
>> The problem with C++ (and Haskell) is that the Turing complete type
>> system gives you complete generality in some uselessly-impractical
>> sense, i.e. it is a Turing argument.
>
> ...

> Haskell's type system is not Turing-complete. You specifically
> need language extensions (like UndecidableInstances) for that.

So Haskell's type system can be Turing complete, just as I said.

> You may want to tell us what's wrong with a Turing-complete type system
> anyway.

They are only used to emulate features that are better implemented directly,
e.g. metaprogramming using C++ templates.

>> You must take into account the applicability of these "solutions" in
>> real programs. The .NET guys are acutely aware of many of the academic
>> features found in languages like Haskell but they choose not to adopt
>> them because they do not solve real problems.
>
> What do you think C# LINQ and F# computation expressions are? Yes,
> Haskell monads! Why does F# support a lazy evaluation construct
> (despite that it's just function wrapping)? Some other F# features
> include ADTs and pattern matching. There are not much more "academic"
> features left.

To the extent which they are present, I do not consider those features to be
academic.

I was referring to features such as:

. Purity.
. Non-strict evaluation as the default.
. Multiparameter type classes.
. Monad transformers.
. Stream fusion.
. Generalized algebraic datatypes.
...

> Clearly even Microsoft knows the practical value of Haskell's language
> features. You on the other hand don't.

I write commercial literature on all of the topics you just mentioned and
none of the ones I just mentioned. And I'm not the only one...

Ertugrul Söylemez

Feb 12, 2010, 8:54:54 AM
Jon Harrop <j...@ffconsultancy.com> wrote:

Okay, OCaml doesn't need me to define my types, but what's wrong with
having to do so? I mean, C doesn't require you to write function
prototypes either, which is more bad than good. I'm not saying that
that OCaml feature is bad, but that I'd rather define my type
explicitly. After all, that's where type safety comes from. We know how
awful PHP is, because it tries hard to make sense of your statements,
even if they _are_ totally pointless and _should_ cause an error.

Also it's not the same kind of type inference that I'm talking about.
Usually type inference finds the type of an expression instead of
creating one that fits. Haskell's type system is very powerful at that.

Dirk Thierbach

Feb 12, 2010, 9:13:04 AM
Jon Harrop <j...@ffconsultancy.com> wrote:
> That paper describes how you can implement open (extensible) sum types, not
> polymorphic variants. You're still missing the type inference and

Read the paper again. I recommend section 4.2 "A type inference algorithm".

> there is no way to implement that within Haskell.

One cannot implement that "within" Haskell, in the same way one
cannot implement that "within" OCaml (i.e., without using the built-in
polymorphic variants). That's why I wrote "a straightforward extension".
Hugs implements this algorithm, not "within" Haskell, but "within" Hugs,
at least for extensible records. Polymorphic variants are just the dual.

The GHC people apparently didn't consider this extension worthwhile,
and I tend to agree.

It's also rather old news. Now, back into my killfile.

- Dirk

Jon Harrop

Feb 12, 2010, 2:33:56 PM
Dirk Thierbach wrote:
> ...One cannot implement that "within" Haskell...

This is the only relevant point. Everything else you wrote seems to have
been a Turing argument shrouded in irrelevancies.

Jon Harrop

Feb 12, 2010, 4:11:04 PM
Ertugrul Söylemez wrote:
> Okay, OCaml doesn't need me to define my types, but what's wrong with
> having to do so?

Having to write out definitions for types manually can damage productivity.

The first-order effect is that these unnecessary type definitions bloat the
code, making it harder to read, write and maintain. In the "Expr" example I
just gave, the Haskell code is several times longer. This is exactly the
same effect you cited with Haskell vs C++.

A secondary effect is that there is no definition to refer to when
constructing a value of the type. For example, using .NET from F# you might
write:

System.Windows.Media.Effects.SamplingMode.Bilinear

whereas with polymorphic variants you can just write:

`Bilinear

In essence, tuples in languages like Haskell can be hugely beneficial over
the nearest equivalent in C++ (e.g. std::pair<..>) or record types for
exactly the same reason that polymorphic variants can be better than
ordinary variants.

There is no logical reason to stop at tuples and force programmers to
declare sum and class types. Moreover, OCaml has proven that polymorphic
variants are useful.

> I mean, C doesn't require you to write function prototypes, too, which is
> rather bad than good.

I disagree. I don't want to have to write and maintain function prototypes
by hand. I prefer lots of inference with throwback of inferred types from
the compiler to the IDE.

> I'm not saying that that OCaml feature is bad, but that I'd rather define
> my type explicitly.

That is often the right thing to do but I still want the choice.

> After all that's where type safety comes from.

To some extent, yes. Polymorphic variants are still completely type safe, of
course. There is a larger possibility of error but, in practice, that isn't
their main disadvantage. The main problem is that inference over sum types
(particularly recursive sum types) gives the compiler more freedom to be
misled and, consequently, can lead to obfuscated error messages. Hence
explicitly-declared ordinary variant types are still very common in OCaml.

However, by far the most common application of polymorphic variants in
practice is non-recursive sum types like enumerations. For example, the
following type from LablGL represents the different OpenGL buffers:

type read_buffer =
  [ `aux of int
  | `back
  | `back_left
  | `back_right
  | `front
  | `front_left
  | `front_right
  | `left
  | `right ]

As this is almost an enumeration, there is almost no room for error. Many
practical applications fall into this category.

My use of polymorphic variants for the recursive type representing HLVM
types is more adventurous but requires minimal type annotations to make it
easy to develop with:

https://forge.ocamlcore.org/scm/viewvc.php/hlvm.ml?view=markup&revision=42&root=hlvm

Ertugrul Söylemez

Feb 14, 2010, 6:13:33 PM
Jon Harrop <j...@ffconsultancy.com> wrote:

> Ertugrul Söylemez wrote:
> > Okay, OCaml doesn't need me to define my types, but what's wrong
> > with having to do so?
>
> Having to write out definitions for types manually can damage
> productivity.
>
> The first order effect is that these unnecessary type definitions
> bloat the code, making it harder to read, write and maintain. In the
> "Expr" example I just gave, the Haskell code is several times
> longer. This is exactly the same effect you cited with Haskell vs C++.
>
> A secondary effect is that there is no definition to refer to when
> constructing a value of the type. For example, using .NET from F# you
> might write:
>
> System.Windows.Media.Effects.SamplingMode.Bilinear
>
> whereas with polymorphic variants you can just write:
>
> `Bilinear
>
> In essence, tuples in languages like Haskell can be hugely beneficial
> over the nearest equivalent in C++ (e.g. std::pair<..>) or record
> types for exactly the same reason that polymorphic variants can be
> better than ordinary variants.
>
> There is no logical reason to stop at tuples and force programmers to
> declare sum and class types. Moreover, OCaml has proven that
> polymorphic variants are useful.

Well, when you choose Haskell, you specifically choose type safety over
convenience at certain places. Haskell code is often way shorter than
equivalent code in other languages (especially in recent versions of
GHC, which are much smarter and require far less explicit performance
hints than earlier), but it may not always be the shortest code, and
when it isn't, that may be because the type system gets in your way.
But the average Haskell programmer will prefer to define the type
explicitly anyway.

Whether or not you consider this good or bad is probably a matter of
personal preference. There are certain things which are easier/faster
to do/express in OCaml, and other things which are easier/faster to
do/express in Haskell. I don't think that either of the two languages is
less powerful than the other, but rather that they make different
performance, safety and productivity tradeoffs.

An interesting consequence of this is that few people will like both
languages at the same time. People preferring OCaml will write their
code in it regardless of whether it would be more convenient to write in
Haskell and vice versa.


> > I mean, C doesn't require you to write function prototypes, too,
> > which is rather bad than good.
>
> I disagree. I don't want to have to write and maintain function
> prototypes by hand. I prefer lots of inference with throwback of
> inferred types from the compiler to the IDE.

It's bad the way C does it. It just assumes you are right. OCaml
assumes you're right, too, but you can't be right in two different,
contradicting ways at the same time. In C you can, so the prototype is
actually needed to enforce correctness. Otherwise you may end up with
strange bugs, which are hard to find.

Yes, it can be more productive and more convenient, too. I still
wouldn't use it, because if your type is simple, it won't hurt you to
define it explicitly. On the other hand, if your type is complex, you
generally want an explicit, clear description of it in your code anyway.
This is, of course, Haskell thinking.

If you prefer to use it, there is probably nothing wrong with it. OCaml
is still ages ahead of most other languages in wide-spread use today.


Side note: I like your new tone. Discussing with you has become more
pleasant than in the past. Keep it that way.

Jon Harrop

Feb 14, 2010, 9:46:50 PM
Ertugrul Söylemez wrote:
> Jon Harrop <j...@ffconsultancy.com> wrote:
>> There is no logical reason to stop at tuples and force programmers to
>> declare sum and class types. Moreover, OCaml has proven that
>> polymorphic variants are useful.
>
> Well, when you choose Haskell, you specifically choose type safety over
> convenience at certain places.

Yes.

> Haskell code is often way shorter than
> equivalent code in other languages (especially in recent versions of
> GHC, which are much smarter and require far less explicit performance
> hints than earlier), but it may not always be the shortest code, and it
> may be so, because the type system gets in your way.

Other languages can obviously be far more concise than Haskell in specific
domains. In general, Haskell is clearly more concise than C++ but I'm not
sure where it sits among languages such as APL and Mathematica or even F#.

On the u64 shootout, Haskell is ~2x shorter than OCaml on chameneos-redux,
pidigits and binary-trees but ~2x longer on spectral-norm,
reverse-complement, k-nucleotide, fannkuch and regex-dna. However, these
are optimized programs. Moreover, F# is usually significantly more concise
than OCaml.

On Rosetta code, my F# is shorter than the Haskell on Anagrams, Bresenham,
Game of Life and Evolutionary Algorithm. Haskell is shorter than F# on LZW.
These programs are not optimized.

The two most compelling examples of Haskell I have ever seen were LZW:

http://www.haskell.org/haskellwiki/Toy_compression_implementations

and power series:

http://www.cs.dartmouth.edu/~doug/powser.html

> But the average Haskell programmer will prefer to define the type
> explicitly anyway.

Yes, of course. They are a self-selected group of people who are, by
definition, willing to endure that.

> Whether or not you consider this good or bad is probably a matter of
> personal preference. There are certain things, which are easier/faster
> to do/express in OCaml and other things are easier/faster to do/express
> in Haskell. I don't think that any of the two languages is less
> powerful than the other, but rather that they make different
> performance, safety and productivity tradeoffs.

Yes. OCaml is unusually unsafe among MLs.

> An interesting consequence of this is that few people will like both
> languages at the same time. People preferring OCaml will write their
> code in it regardless of whether it would be more convenient to write in
> Haskell and vice versa.

There seem to be a growing number of people fluent in both but, as you say,
they do seem to stick with one regardless.

>> > I mean, C doesn't require you to write function prototypes, too,
>> > which is rather bad than good.
>>
>> I disagree. I don't want to have to write and maintain function
>> prototypes by hand. I prefer lots of inference with throwback of
>> inferred types from the compiler to the IDE.
>
> It's bad the way C does it. It just assumes you are right. OCaml
> assumes you're right, too, but you can't be right in two different,
> contradicting ways at the same time. In C you can, so the prototype is
> actually needed to enforce correctness. Otherwise you may end up with
> strange bugs, which are hard to find.

Yes. C is absurd. OCaml tried to address that problem by including MD5 sums
of APIs in object files but that shifts the problem to one of brittle
bindings. Now, a new minor-minor OCaml release forces all downstream
developers to completely recompile from scratch, which takes days for the
Debian maintainers because there is so much OCaml code out there now. For
example, my commercial Smoke Vector Graphics library for OCaml requires
different binaries to be shipped for every different permutation of minor
versions of libraries (including the compiler) that it depends upon. That
is obviously not practicable.

> Yes, it can be more productive and more convenient, too. I still
> wouldn't use it, because if your type is simple, it won't hurt you to
> define it explicitly.

Surely you use tuples without explicitly naming them? What's the difference?

Ertugrul Söylemez

Feb 16, 2010, 5:28:21 AM
Jon Harrop <j...@ffconsultancy.com> wrote:

> The two most compelling examples of Haskell I have ever seen were LZW:
>
> http://www.haskell.org/haskellwiki/Toy_compression_implementations
>
> and power series:
>
> http://www.cs.dartmouth.edu/~doug/powser.html

I'd like to add another one, which I wrote myself. This code is
insanely difficult to write in other languages:

isPrime :: Integral i => i -> Bool
isPrime n = all (\d -> rem n d /= 0) . takeWhile (\d -> d^2 <= n) $ primes

primes :: Integral i => [i]
primes = 2 : filter isPrime [3..]

Note that this is not a prime sieve, but more intelligent than the naive
filtering approach. If you replace 'primes' in the second line by
'[2..]', the code runs considerably slower.


> > Yes, it can be more productive and more convenient, too. I still
> > wouldn't use it, because if your type is simple, it won't hurt you
> > to define it explicitly.
>
> Surely you use tuples without explicitly naming them? What's the
> difference?

If that is what you mean, Haskell can well infer sum types:

> let n = (Left (Right 15), Right 17)
> :t n
n :: (Either (Either a Integer) b, Either a1 Integer)

So there is also a straightforward translation of your example to
Haskell:

> let add = Left
> let mul = Right
> let n = add (mul (3, 4), 5)
> :t n
n :: Either (Either a (Integer, Integer), Integer) b

Unfortunately you can't pattern-match against 'add' and 'mul', but need
to match against Left and Right, so it's still better to make explicit
what you mean:

data Term a
  = Literal a
  | Add (Term a) (Term a)
  | Mul (Term a) (Term a)

Dan Doel

Feb 16, 2010, 6:50:06 AM
Ertugrul Söylemez wrote:

> So there is also a straightforward translation of your example to
> Haskell:
>
> > let add = Left
> > let mul = Right
> > let n = add (mul (3, 4), 5)
> > :t n
> n :: Either (Either a (Integer, Integer), Integer) b
>
> Unfortunately you can't pattern-match against 'add' and 'mul'

You can if you download the she preprocessor:

{-# OPTIONS_GHC -F -pgmF she #-}
{-# LANGUAGE TypeOperators #-}

data Free f a = Unit a | Roll (f (Free f a))

newtype Join f a = J (f a a)
newtype (f :. g) a = O (f (g a))

pattern Add x y = Roll (O (J (Left (J (x, y)))))
pattern Mul x y = Roll (O (J (Right (J (x, y)))))

type Term a = Free (Join Either :. Join (,)) a

n = Add (Mul (Unit 3) (Unit 4)) (Unit 5)

q = case n of Add x y -> x


*Main> :t n
n :: (Num t) => Free (Join Either :. Join (,)) t
*Main> :t q
q :: (Num t) => Free (Join Either :. Join (,)) t

Of course, that's not all it does.

http://personal.cis.strath.ac.uk/~conor/pub/she/

Jon Harrop

Feb 17, 2010, 1:18:19 AM
Ertugrul Söylemez wrote:
> Jon Harrop <j...@ffconsultancy.com> wrote:
>> The two most compelling examples of Haskell I have ever seen were LZW:
>>
>> http://www.haskell.org/haskellwiki/Toy_compression_implementations
>>
>> and power series:
>>
>> http://www.cs.dartmouth.edu/~doug/powser.html
>
> I'd like to add another one, which I wrote myself. This code is
> insanely difficult to write in other languages:
>
> isPrime :: Integral i => i -> Bool
> isPrime n = all (\d -> rem n d /= 0) . takeWhile (\d -> d^2 <= n) $
> primes
>
> primes :: Integral i => [i]
> primes = 2 : filter isPrime [3..]
>
> Note that this is not a prime sieve, but more intelligent than the naive
> filtering approach. If you replace 'primes' in the second line by
> '[2..]', the code runs considerably slower.

Interesting challenge. I'll have a fiddle! :-)

>> > Yes, it can be more productive and more convenient, too. I still
>> > wouldn't use it, because if your type is simple, it won't hurt you
>> > to define it explicitly.
>>
>> Surely you use tuples without explicitly naming them? What's the
>> difference?
>
> If that is what you mean, Haskell can well infer sum types:
>
> > let n = (Left (Right 15), Right 17)
> > :t n
> n :: (Either (Either a Integer) b, Either a1 Integer)

Does it handle recursive types?

For example, what happens if you try to write a function to compute a
symbolic derivative in that style:

# let rec d f x =
    match f with
    | `Int n -> `Int 0
    | `Var y -> `Int(if x=y then 1 else 0)
    | `Add(f, g) -> `Add(d f x, d g x)
    | `Mul(f, g) -> `Add(`Mul(f, d g x), `Mul(g, d f x));;
val d :
  ([< `Add of 'a * 'a | `Int of 'b | `Mul of 'a * 'a | `Var of 'c ] as 'a) ->
  'c -> ([> `Add of 'd * 'd | `Int of int | `Mul of 'a * 'd ] as 'd) = <fun>

For example:

# let x = `Var "x";;
val x : [> `Var of string ] = `Var "x"

# let f = `Add(`Add(`Mul(`Mul(x, x), x), `Mul(`Int(-1), x)), `Int(-1));;
val f :
  [> `Add of
       [> `Add of
            [> `Mul of
                 [> `Mul of [> `Var of string ] * [> `Var of string ] ] *
                 [> `Var of string ] ] *
            [> `Mul of [> `Int of int ] * [> `Var of string ] ] ] *
       [> `Int of int ] ] =
  `Add
    (`Add
       (`Mul (`Mul (`Var "x", `Var "x"), `Var "x"),
        `Mul (`Int (-1), `Var "x")),
     `Int (-1))

# d f "x";;
- : [> `Add of 'a * 'a
| `Int of int
| `Mul of
([ `Add of 'b * 'b | `Int of int | `Mul of 'b * 'b | `Var of
string ]
as 'b) *
'a ]
as 'a
=
`Add
(`Add
(`Add
(`Mul (`Mul (`Var "x", `Var "x"), `Int 1),
`Mul
(`Var "x",
`Add (`Mul (`Var "x", `Int 1), `Mul (`Var "x", `Int 1)))),
`Add (`Mul (`Int (-1), `Int 1), `Mul (`Var "x", `Int 0))),
`Int 0)

> So there is also a straightforward translation of your example to
> Haskell:
>
> > let add = Left
> > let mul = Right
> > let n = add (mul (3, 4), 5)
> > :t n
> n :: Either (Either a (Integer, Integer), Integer) b
>
> Unfortunately you can't pattern-match against 'add' and 'mul', but need
> to match against Left and Right, so it's still better to make explicit
> what you mean:
>
> data Term a
> = Literal a
> | Add (Term a) (Term a)
> | Mul (Term a) (Term a)

A view pattern would handle that.

Ertugrul Söylemez

Feb 18, 2010, 7:32:27 AM
Jon Harrop <j...@ffconsultancy.com> wrote:

> Ertugrul Söylemez wrote:
> > Jon Harrop <j...@ffconsultancy.com> wrote:
> >> The two most compelling examples of Haskell I have ever seen were LZW:
> >>
> >> http://www.haskell.org/haskellwiki/Toy_compression_implementations
> >>
> >> and power series:
> >>
> >> http://www.cs.dartmouth.edu/~doug/powser.html
> >
> > I'd like to add another one, which I wrote myself. This code is
> > insanely difficult to write in other languages:
> >
> > isPrime :: Integral i => i -> Bool
> > isPrime n = all (\d -> rem n d /= 0) . takeWhile (\d -> d^2 <= n) $
> > primes
> >
> > primes :: Integral i => [i]
> > primes = 2 : filter isPrime [3..]
> >
> > Note that this is not a prime sieve, but more intelligent than the
> > naive filtering approach. If you replace 'primes' in the second
> > line by '[2..]', the code runs considerably slower.
>
> Interesting challenge. I'll have a fiddle! :-)

Have fun. This is one case, where laziness really pays off. I'd like
to see your results. =)


> >> > Yes, it can be more productive and more convenient, too. I still
> >> > wouldn't use it, because if your type is simple, it won't hurt
> >> > you to define it explicitly.
> >>
> >> Surely you use tuples without explicitly naming them? What's the
> >> difference?
> >
> > If that is what you mean, Haskell can well infer sum types:
> >
> > > let n = (Left (Right 15), Right 17)
> > > :t n
> > n :: (Either (Either a Integer) b, Either a1 Integer)
>
> Does it handle recursive types?
>
> For example, what happens if you try to write a function to compute
> symbolic derivative in that style:
>

> [...]

Unfortunately no. At least I haven't found a way to encode it (easily).


> > So there is also a straightforward translation of your example to
> > Haskell:
> >
> > > let add = Left
> > > let mul = Right
> > > let n = add (mul (3, 4), 5)
> > > :t n
> > n :: Either (Either a (Integer, Integer), Integer) b
> >
> > Unfortunately you can't pattern-match against 'add' and 'mul', but
> > need to match against Left and Right, so it's still better to make
> > explicit what you mean:
> >
> > data Term a
> > = Literal a
> > | Add (Term a) (Term a)
> > | Mul (Term a) (Term a)
>
> A view pattern would handle that.

Well, it would, but not very nicely I think.

Jon Harrop

Feb 18, 2010, 8:23:27 PM
Ertugrul Söylemez wrote:
> Jon Harrop <j...@ffconsultancy.com> wrote:
>> Ertugrul Söylemez wrote:
>> > I'd like to add another one, which I wrote myself. This code is
>> > insanely difficult to write in other languages:
>> >
>> > isPrime :: Integral i => i -> Bool
>> > isPrime n = all (\d -> rem n d /= 0) . takeWhile (\d -> d^2 <= n) $
>> > primes
>> >
>> > primes :: Integral i => [i]
>> > primes = 2 : filter isPrime [3..]
>> >
>> > Note that this is not a prime sieve, but more intelligent than the
>> > naive filtering approach. If you replace 'primes' in the second
>> > line by '[2..]', the code runs considerably slower.
>>
>> Interesting challenge. I'll have a fiddle! :-)
>
> Have fun. This is one case, where laziness really pays off. I'd like
> to see your results. =)

That is a nice program. :-)

Here is a simple translation to F#:

let rec isPrime n =
  primes
  |> Seq.takeWhile (fun d -> d*d <= n)
  |> Seq.forall (fun d -> n%d <> 0)
and primes =
  let rec f n =
    if isPrime n then Some(n, n+1) else f(n+1)
  Seq.unfold f 3
  |> Seq.append [2]
  |> Seq.cache

Takes about the same time as the Haskell to find the 100,000th prime.

Would be interesting to convert it into imperative style, accumulating a
resizeable array.

Keith H Duggar

Feb 20, 2010, 1:57:08 AM
On Feb 18, 8:23 pm, Jon Harrop <j...@ffconsultancy.com> wrote:

> Ertugrul Söylemez wrote:
> > Jon Harrop <j...@ffconsultancy.com> wrote:

How much time?

> Would be interesting to convert it into imperative style, accumulating a
> resizeable array.

Ignoring integer overflow, here is a C++ program that uses
the isPrime function to print primes:

#include <iostream>
#include <vector>

bool isPrime ( unsigned n ) ;

// Primes is an iterator over the prime numbers
class Primes
{
    unsigned _idx ;
    static std::vector<unsigned> _primes ;
    void grow ( ) {
        unsigned t = _primes[_idx] ;
        while ( not isPrime(++t) ) ;
        _primes.push_back(t) ;
    }
public :
    Primes ( ) : _idx(0) { }
    unsigned operator* ( ) { return _primes[_idx] ; }
    Primes & operator++ ( ) {
        if ( _idx+1 == _primes.size() ) grow() ;
        ++_idx ;
        return *this ;
    }
} ;

std::vector<unsigned> Primes::_primes(1,2) ;

bool isPrime ( unsigned n ) {
    Primes primes ;
    unsigned d = *primes ;
    while ( d*d <= n and n % d != 0 ) d = *(++primes) ;
    return d*d > n ;
}

int main ( )
{
    for ( unsigned i = 2 ; i < ~0 ; ++i ) {
        if ( isPrime(i) ) {
            std::cout << i << '\n' ;
        }
    }

    return 0 ;
}

frankenstein

Feb 20, 2010, 2:33:38 PM
> > Jon Harrop <j...@ffconsultancy.com> wrote:

> Here is a simple translation to F#:
>
>   let rec isPrime n =
>     primes
>     |> Seq.takeWhile (fun d -> d*d <= n)
>     |> Seq.forall (fun d -> n%d <> 0)
>   and primes =
>     let rec f n =
>       if isPrime n then Some(n, n+1) else f(n+1)
>     Seq.unfold f 3
>     |> Seq.append [2]
>     |> Seq.cache
>
> Takes about the same time as the Haskell to find the 100,000th prime.
>
> Would be interesting to convert it into imperative style, accumulating a
> resizeable array.

Hi: I somehow managed to understand the Haskell version based on lazy
evaluation but gave up on grasping the OCaml version.

What does 'Some' do? What is 'f' supposed to stand for in 'rec f n', and
how is it all processed (what is the starting point)? I do not
understand how the 'guards |>' in the OCaml code accomplish the
behavior of 'laziness' here.

Thanks,
Europaer

Ertugrul Söylemez

Feb 20, 2010, 5:39:28 PM
frankenstein <klohm...@yahoo.de> wrote:

Note that it's F# code. It doesn't emulate laziness, but takes the
imperative approach of establishing a cache explicitly. It's quite
elegant code, but the elegance of the Haskell version can only be
matched with true laziness.

As an interesting side note, this example shows that the "laziness"
construct in F# is not real laziness, but rather just a wrapper around
unary functions, because it doesn't help here. A "lazy" value in F# is
the same as a function which returns the value in question when called.
In fact, using functions directly leads to more readable code in F#, at
least in my experience.

Jon Harrop

Feb 21, 2010, 12:20:34 AM
frankenstein wrote:
>> > Jon Harrop <j...@ffconsultancy.com> wrote:
>> Here is a simple translation to F#:
>>
>> let rec isPrime n =
>> primes
>> |> Seq.takeWhile (fun d -> d*d <= n)
>> |> Seq.forall (fun d -> n%d <> 0)
>> and primes =
>> let rec f n =
>> if isPrime n then Some(n, n+1) else f(n+1)
>> Seq.unfold f 3
>> |> Seq.append [2]
>> |> Seq.cache
>>
>> Takes about the same time as the Haskell to find the 100,000th prime.
>>
>> Would be interesting to convert it into imperative style, accumulating a
>> resizeable array.
>
> Hi: I somehow managed to understand the Haskell version based on lazy
> evaluation but gave up on grasping the OCaml version.

Note that this is F# code and not OCaml code.

> What does 'Some' do? What is 'f' supposed to stand for in 'rec f n', and
> how is it all processed (what is the starting point)? I do not
> understand how the 'guards |>' in the OCaml code accomplish the
> behavior of 'laziness' here.

This code is confusing and fragile primarily because I (ab)used the .NET
IEnumerable (called Seq in F#) as a poor man's lazy infinite sequence.

The function "f" is a generator that takes one prime and produces the next
prime. The Seq.unfold function uses "f" to create the infinite sequence of
primes. The "|>" operator just pipes a value through a function.
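
For comparison with the Haskell side, Data.List.unfoldr plays the same
role as Seq.unfold; a minimal sketch:

import Data.List (unfoldr)

-- unfoldr steps a seed: Just (x, s) emits x and continues from s;
-- Nothing would end the list. This generator never stops, so the
-- resulting list is infinite and lazy.
from3 :: [Int]
from3 = unfoldr (\n -> Just (n, n + 1)) 3   -- 3, 4, 5, ...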

Jon Harrop

unread,
Feb 21, 2010, 12:35:50 AM2/21/10
to
Jon Harrop wrote:
> Here is a simple translation to F#:
>
> let rec isPrime n =
> primes
> |> Seq.takeWhile (fun d -> d*d <= n)
> |> Seq.forall (fun d -> n%d <> 0)
> and primes =
> let rec f n =
> if isPrime n then Some(n, n+1) else f(n+1)
> Seq.unfold f 3
> |> Seq.append [2]
> |> Seq.cache
>
> Takes about the same time as the Haskell to find the 100,000th prime.

Here is a shorter, simpler and more direct translation that just uses F#'s
LazyList data structure:

open LazyList

let rec isPrime n =
  Seq.takeWhile (fun d -> d*d <= n) primes
  |> Seq.forall (fun d -> n%d <> 0)
and primes =
  unfold (fun n -> Some(n, n+1)) 3
  |> filter isPrime
  |> cons 2

As you can see, the differences are merely syntactic. Moreover, the
performance is still comparable.

Jon Harrop

unread,
Feb 21, 2010, 12:51:45 AM2/21/10
to
Ertugrul Söylemez wrote:
> Note that it's F# code. It doesn't emulate laziness, but takes the
> imperative approach of establishing a cache explicitly.

There is no mutation so it is not really "imperative" code.

> It's quite elegant code, but the elegance of the Haskell version can only
> be matched with true laziness.

Well, your original Haskell was 171 bytes of code whereas my F# is 186
bytes. Moreover, Haskell's 8% brevity advantage only appears to apply in
the tiny niche of performance-agnostic infinite lazy sequences.

> BTW, as an interesting side note, this example shows that the "laziness"
> construct in F# is not real laziness, but rather just a wrapper around
> unary functions, because it doesn't help here.

My code didn't even use F#'s laziness construct. I think you are referring
to .NET's IEnumerable sequences which, as you say, are not lazy.

> A "lazy" value in F# is the same as a function, which returns the value in
> question when called.

No, a "lazy" value in F# is a real lazy value exactly the same as Haskell.

Jon Harrop

unread,
Feb 21, 2010, 1:35:16 AM2/21/10
to
Keith H Duggar wrote:
> On Feb 18, 8:23 pm, Jon Harrop <j...@ffconsultancy.com> wrote:
>> Here is a simple translation to F#:
>>
>> let rec isPrime n =
>> primes
>> |> Seq.takeWhile (fun d -> d*d <= n)
>> |> Seq.forall (fun d -> n%d <> 0)
>> and primes =
>> let rec f n =
>> if isPrime n then Some(n, n+1) else f(n+1)
>> Seq.unfold f 3
>> |> Seq.append [2]
>> |> Seq.cache
>>
>> Takes about the same time as the Haskell to find the 100,000th prime.
>
> How much time?

About 4.5s for GHC 6.12.1 and F# 2.0, IIRC.

>> Would be interesting to convert it into imperative style, accumulating a
>> resizeable array.
>
> Ignoring integer overflow, here is a C++ program that uses
> the isPrime function to print primes:

I'm not loving it. ;-)

Here's an imperative F# implementation of the same algorithm that is only
50% longer than the pure lazy Haskell but ~25x faster (0.18s):

let primes =
  let a = ResizeArray[2]
  let rec loop i p =
    let d = a.[i]
    if d*d>p then p else
    if p%d=0 then loop 0 (p+1) else
    loop (i+1) p
  fun n ->
    while n >= a.Count do
      a.[a.Count - 1] + 1 |> loop 0 |> a.Add
    a.[n]

You can get the 100,000th prime with "primes 100000".

Jon Harrop

unread,
Feb 21, 2010, 1:58:57 AM2/21/10
to

Here is the Sieve of Eratosthenes implemented in imperative F# such that the
number of primes sieved roughly doubles with each iteration:

let primes =
  let a = ResizeArray[2]
  let grow() =
    let p0 = a.[a.Count-1]+1
    let b = Array.create p0 true
    for di in a do
      let i0 = p0/di*di
      let i0 = if i0<p0 then i0+di else i0
      let rec loop i =
        if i<b.Length then
          b.[i] <- false
          loop(i+di)
      loop(i0-p0)
    for i=0 to b.Length-1 do
      if b.[i] then a.Add(p0+i)
  fun n ->
    while n >= a.Count do
      grow()
    a.[n]

This is 2.5x more chars and over 100x faster than the original Haskell. For
example, you can get the 10,000,000th prime in only a few seconds.

Ertugrul Söylemez

unread,
Feb 21, 2010, 1:43:43 AM2/21/10
to
Jon Harrop <j...@ffconsultancy.com> wrote:

> Ertugrul Söylemez wrote:
> > Note that it's F# code. It doesn't emulate laziness, but takes the
> > imperative approach of establishing a cache explicitly.
>
> There is no mutation so it is not really "imperative" code.

It doesn't need to use mutation to be in an imperative style. You're
establishing a cache explicitly.


> > It's quite elegant code, but the elegance of the Haskell version can
> > only be matched with true laziness.
>
> Well, your original Haskell was 171 bytes of code whereas my F# is 186
> bytes. Moreover, Haskell's 8% brevity advantage only appears to apply
> in the tiny niche of performance-agnostic infinite lazy sequences.

Note that the type signatures in the Haskell version are strictly
optional. Without them it has only 107 bytes (not counting line feeds).
Infinite lazy sequences are not a niche. I use them a lot for all kinds
of stuff. Remember that in Haskell you encode algorithms in a different
way than in F#.


> > BTW, as an interesting side note, this example shows that the
> > "laziness" construct in F# is not real laziness, but rather just a
> > wrapper around unary functions, because it doesn't help here.
>
> My code didn't even use F#'s laziness construct. I think you are
> referring to .NET's IEnumerable sequences which, as you say, are not
> lazy.

No, I really meant the Lazy type. Your code doesn't use it, that's why
it's just an interesting side note.


> > A "lazy" value in F# is the same as a function, which returns the
> > value in question when called.
>
> No, a "lazy" value in F# is a real lazy value exactly the same as
> Haskell.

You can't pass a lazy value to a strict function. There is no implicit
demand-driven execution, but rather explicit forcing. That's the reason
why the laziness construct in F# does not help to solve this problem.
You can't really have useful laziness without nonstrict semantics.

Jon Harrop

unread,
Feb 21, 2010, 12:19:19 PM2/21/10
to
Jon Harrop wrote:
> Keith H Duggar wrote:
>> On Feb 18, 8:23 pm, Jon Harrop <j...@ffconsultancy.com> wrote:
>>> Here is a simple translation to F#:
>>>
>>> let rec isPrime n =
>>> primes
>>> |> Seq.takeWhile (fun d -> d*d <= n)
>>> |> Seq.forall (fun d -> n%d <> 0)
>>> and primes =
>>> let rec f n =
>>> if isPrime n then Some(n, n+1) else f(n+1)
>>> Seq.unfold f 3
>>> |> Seq.append [2]
>>> |> Seq.cache
>>>
>>> Takes about the same time as the Haskell to find the 100,000th prime.
>>
>> How much time?
>
> About 4.5s for GHC 6.12.1 and F# 2.0, IIRC.

Sorry, with GHC 6.12 the Haskell actually takes 6.6s.

Jon Harrop

unread,
Feb 21, 2010, 1:30:02 PM2/21/10
to
Ertugrul Söylemez wrote:
> Jon Harrop <j...@ffconsultancy.com> wrote:
>> Ertugrul Söylemez wrote:
>> > Note that it's F# code. It doesn't emulate laziness, but takes the
>> > imperative approach of establishing a cache explicitly.
>>
>> There is no mutation so it is not really "imperative" code.
>
> It doesn't need to use mutation to be in an imperative style. You're
> establishing a cache explicitly.

If Haskell had an "unfold" function that took a stepping function (e.g.
Int -> Int) and returned the infinite lazy list of its iterates, you
would not claim it was imperative because it used caching. The F# is not
imperative in exactly the same way.
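
As it happens, the Prelude's iterate is exactly such a function; a small
illustration:

-- iterate f x produces the infinite lazy list [x, f x, f (f x), ...];
-- the prefix that has been forced once is shared (cached) thereafter.
from3 :: [Int]
from3 = iterate (+ 1) 3   -- 3, 4, 5, ...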

>> > It's quite elegant code, but the elegance of the Haskell version can
>> > only be matched with true laziness.
>>
>> Well, your original Haskell was 171 bytes of code whereas my F# is 186
>> bytes. Moreover, Haskell's 8% brevity advantage only appears to apply
>> in the tiny niche of performance-agnostic infinite lazy sequences.
>
> Note that the type signatures in the Haskell version are strictly
> optional.

You annotated the types by hand because that is idiomatic Haskell. That is
idiomatic Haskell because the type system is too easily misled by design
(flaw). Type annotations are one of the main sources of verbosity in real
Haskell code.

Just look at the source code to darcs, for example:

http://allmydata.org/trac/darcs-2/browser/src/Progress.hs

Every single definition there has a type annotation. I see no merit in
pretending that doesn't bloat real Haskell source code.

> Without them it has only 107 bytes (not counting line feeds).

There are still only two significant differences in verbosity between your
Haskell and my equivalent F#. Firstly, your Haskell used the syntactic
sugar [3..] whereas my F# used:

unfold (fun n -> Some(n, n+1)) 3

Secondly, your Haskell benefitted from functions being in the global
namespace whereas my F# qualified them explicitly.

Excluding these mere syntactic differences, my F# is still only 18% longer
than your Haskell. In contrast, the real Sieve of Eratosthenes can be
implemented in APL in only 17 chars:

http://www.vaxman.de/publications/apl_slides.pdf

And that will be a *lot* faster than your 6x longer Haskell solution.

> Infinite lazy sequences are not a niche. I use them a lot for all kinds
> of stuff. Remember that in Haskell you encode algorithms in a different
> way than in F#.

As I've shown here, you can encode algorithms the same way in F# if you want
to. The fact that F# programmers rarely choose to is a statement about the
practicality of the Haskell-style of encoding algorithms.

Indeed, implementing the Sieve of Eratosthenes in Haskell was an unsolved
problem for many years and, even today, is still the stuff of research.
Melissa O'Neill wrote a seminal paper on it:

http://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf

The nearest she came to a decent Haskell implementation is 4,690 bytes of
Haskell code (!) and still runs several times slower than my 436 byte F#
solution.

> You can't pass a lazy value to a strict function. There is no implicit
> demand-driven execution, but rather explicit forcing. That's the reason
> why the laziness construct in F# does not help to solve this problem.

I solved the problem in F# using laziness.

> You can't really have useful laziness without nonstrict semantics.

Can you have useful laziness with any semantics?

As I've shown here, F# is more than capable of expressing lazy Haskell-like
solutions. I have implemented many as academic exercises but I have never
adopted that style in production code precisely because it always suffers
from these kinds of problems.

Even when I want something like the sequence of prime numbers (and our F#
for Numerics product does provide one), there's no way I'm going to use a
two line solution that is hundreds of times slower than a dozen line
solution. Nor will I adopt a style that saves me a couple of lines of code
at the cost of orders of magnitude in performance or costs me an order of
magnitude in bloat when I come to optimize it.

Jon Harrop

unread,
Feb 21, 2010, 2:07:41 PM2/21/10
to

Compare with the Haskell implementation of the same simple sieve by Melissa
O'Neill (that is slower and 10x longer!):

module ONeillPrimes (primes, sieve, calcPrimes, primesToNth,
                     primesToLimit) where

data PriorityQ k v = Lf
                   | Br {-# UNPACK #-} !k v !(PriorityQ k v) !(PriorityQ k v)
     deriving (Eq, Ord, Read, Show)

emptyPQ :: PriorityQ k v
emptyPQ = Lf

isEmptyPQ :: PriorityQ k v -> Bool
isEmptyPQ Lf = True
isEmptyPQ _  = False

minKeyValuePQ :: PriorityQ k v -> (k, v)
minKeyValuePQ (Br k v _ _) = (k,v)
minKeyValuePQ _            = error "Empty heap!"

minKeyPQ :: PriorityQ k v -> k
minKeyPQ (Br k v _ _) = k
minKeyPQ _            = error "Empty heap!"

minValuePQ :: PriorityQ k v -> v
minValuePQ (Br k v _ _) = v
minValuePQ _            = error "Empty heap!"

insertPQ :: Ord k => k -> v -> PriorityQ k v -> PriorityQ k v
insertPQ wk wv (Br vk vv t1 t2)
    | wk <= vk  = Br wk wv (insertPQ vk vv t2) t1
    | otherwise = Br vk vv (insertPQ wk wv t2) t1
insertPQ wk wv Lf = Br wk wv Lf Lf

siftdown :: Ord k => k -> v -> PriorityQ k v -> PriorityQ k v -> PriorityQ k v
siftdown wk wv Lf _ = Br wk wv Lf Lf
siftdown wk wv (t @ (Br vk vv _ _)) Lf
    | wk <= vk  = Br wk wv t Lf
    | otherwise = Br vk vv (Br wk wv Lf Lf) Lf
siftdown wk wv (t1 @ (Br vk1 vv1 p1 q1)) (t2 @ (Br vk2 vv2 p2 q2))
    | wk <= vk1 && wk <= vk2 = Br wk wv t1 t2
    | vk1 <= vk2             = Br vk1 vv1 (siftdown wk wv p1 q1) t2
    | otherwise              = Br vk2 vv2 t1 (siftdown wk wv p2 q2)

deleteMinAndInsertPQ :: Ord k => k -> v -> PriorityQ k v -> PriorityQ k v
deleteMinAndInsertPQ wk wv Lf             = error "Empty PriorityQ"
deleteMinAndInsertPQ wk wv (Br _ _ t1 t2) = siftdown wk wv t1 t2

leftrem :: PriorityQ k v -> (k, v, PriorityQ k v)
leftrem (Br vk vv Lf Lf) = (vk, vv, Lf)
leftrem (Br vk vv t1 t2) = (wk, wv, Br vk vv t t2) where
    (wk, wv, t) = leftrem t1
leftrem _ = error "Empty heap!"

deleteMinPQ :: Ord k => PriorityQ k v -> PriorityQ k v
deleteMinPQ (Br vk vv Lf _)  = Lf
deleteMinPQ (Br vk vv t1 t2) = siftdown wk wv t2 t where
    (wk,wv,t) = leftrem t1
deleteMinPQ _ = error "Empty heap!"

type HybridQ k v = (PriorityQ k v, [(k,v)])

initHQ :: PriorityQ k v -> [(k,v)] -> HybridQ k v
initHQ pq feeder = (pq, feeder)

insertHQ :: (Ord k) => k -> v -> HybridQ k v -> HybridQ k v
insertHQ k v (pq, q) = (insertPQ k v pq, q)

deleteMinAndInsertHQ :: (Ord k) => k -> v -> HybridQ k v -> HybridQ k v
deleteMinAndInsertHQ k v (pq, q) = postRemoveHQ (deleteMinAndInsertPQ k v pq, q)
  where
    postRemoveHQ mq@(pq, []) = mq
    postRemoveHQ mq@(pq, (qk,qv) : qs)
        | qk < minKeyPQ pq = (insertPQ qk qv pq, qs)
        | otherwise        = mq

minKeyHQ :: HybridQ k v -> k
minKeyHQ (pq, q) = minKeyPQ pq

minKeyValueHQ :: HybridQ k v -> (k, v)
minKeyValueHQ (pq, q) = minKeyValuePQ pq

{-# SPECIALIZE wheel :: [Int] #-}
{-# SPECIALIZE wheel :: [Integer] #-}
wheel :: Integral a => [a]
wheel = 2:4:2:4:6:2:6:4:2:4:6:6:2:6:4:2:6:4:6:8:4:2:4:2:4:8:6:4:6:2:4:6
        :2:6:6:4:2:4:6:2:6:4:2:4:2:10:2:10:wheel

{-# SPECIALIZE calcPrimes :: () -> [Int] #-}
{-# SPECIALIZE calcPrimes :: () -> [Integer] #-}
calcPrimes :: Integral a => () -> [a]
calcPrimes () = 2 : 3 : 5 : 7 : sieve 11 wheel

{-# SPECIALIZE primes :: [Int] #-}
{-# SPECIALIZE primes :: [Integer] #-}
primes :: Integral a => [a]
primes = calcPrimes ()

{-# SPECIALIZE primesToNth :: Int -> [Integer] #-}
{-# SPECIALIZE primesToNth :: Int -> [Int] #-}
primesToNth :: Integral a => Int -> [a]
primesToNth n = take n (calcPrimes ())

{-# SPECIALIZE primesToLimit :: Integer -> [Integer] #-}
{-# SPECIALIZE primesToLimit :: Int -> [Int] #-}
primesToLimit :: Integral a => a -> [a]
primesToLimit limit = takeWhile (< limit) (calcPrimes ())

{-# SPECIALIZE sieve :: Int -> [Int] -> [Int] #-}
{-# SPECIALIZE sieve :: Integer -> [Integer] -> [Integer] #-}
sieve :: Integral a => a -> [a] -> [a]
sieve n []           = []
sieve n wheel@(d:ds) = n : (map (\(p,wheel) -> p) primes1) where
    primes1 = sieve' (n+d) ds initialTable
    primes2 = sieve' (n+d) ds initialTable
    initialTable = initHQ (insertPQ (n*n) (n, wheel) emptyPQ)
                          (map (\(p,wheel) -> (p*p,(p,wheel))) primes2)
    sieve' n [] table = []
    sieve' n wheel@(d:ds) table
        | nextComposite <= n = sieve' (n+d) ds (adjust table)
        | otherwise          = (n,wheel) : sieve' (n+d) ds table
      where
        nextComposite = minKeyHQ table
        adjust table
            | m <= n    = adjust (deleteMinAndInsertHQ m' (p, ds) table)
            | otherwise = table
          where
            (m, (p, d:ds)) = minKeyValueHQ table
            m' = m + p * d

frankenstein

unread,
Feb 21, 2010, 3:49:07 PM2/21/10
to


Sod! What's that? As far as I remember, the Sieve could be computed
using a list comprehension. I haven't been using Clean (a language
similar to Haskell) for years, but the Clean book showed how to
calculate the Sieve in a clever way. I am not 100% sure, though I think
I came across the Sieve in the Clean book.

Doesn't Haskell have built-in capabilities similar to Clean's?

frankenstein

unread,
Feb 21, 2010, 4:02:42 PM2/21/10
to
On Feb 21, 7:07 pm, Jon Harrop <j...@ffconsultancy.com> wrote:

nicked from the Clean book (page 70 in the Acrobat reader):
http://www.st.cs.ru.nl/papers/cleanbook/CleanBookI.pdf

==
This algorithm to compute prime numbers is called the sieve of
Eratosthenes. Eratosthenes was a greek mathematician born in Cyrene who
lived 276-196 BC. His algorithm can be expressed slightly more elegant
using list comprehensions:

primes :: [Int]
primes = sieve [2..]

sieve :: [Int] -> [Int]
sieve [prime:rest] = [prime: sieve [i \\ i <- rest | i mod prime <> 0]]
==
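
In Haskell syntax the same definition reads (a direct transliteration of
the Clean snippet above):

primes :: [Int]
primes = sieve [2..]

sieve :: [Int] -> [Int]
sieve (p:rest) = p : sieve [i | i <- rest, i `mod` p /= 0]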

Jon Harrop

unread,
Feb 21, 2010, 11:09:35 PM2/21/10
to
frankenstein wrote:
> nicked from the Clean book (page 70 in the Acrobat reader):
> http://www.st.cs.ru.nl/papers/cleanbook/CleanBookI.pdf
>
> ==
> This algorithm to compute prime numbers is called the sieve of
> Eratosthenes. Eratosthenes
> was a greek mathematician born in Cyrene who lived 276-196 BC. His
> algorithm can be
> expressed slightly more elegant using list comprehensions:
>
> primes :: [Int]
> primes = sieve [2..]
>
> sieve :: [Int] -> [Int]
> sieve [prime:rest] = [prime: sieve [i \\ i <- rest | i mod prime <>
> 0]]

That is actually Turner's Sieve and not the Sieve of Eratosthenes. This
common misconception was described in detail in the following paper:

http://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf

frankenstein

unread,
Feb 22, 2010, 5:43:13 AM2/22/10
to

Hi

I am left in the dark. Why might it be that hard to translate your F#
version into a similar concise Haskell version? Or are people trying
here (the posted Haskell version) to avoid imperative styles in
Haskell by any means? Or is it just impossible imitating the
imperative F# code in Haskell?


Ertugrul Söylemez

unread,
Feb 22, 2010, 5:55:30 AM2/22/10
to
Jon Harrop <j...@ffconsultancy.com> wrote:

I have no F# compiler here, but I don't believe it. I have just written
a non-growing (i.e. faster) sieve in both C and Haskell and both need a
lot of time just sieving the first 100 million numbers, which don't even
contain the 10 millionth prime number. The C version takes 4.4 seconds,
the Haskell version takes 7.0 seconds. Following are the source codes.

Haskell version:

import Control.Monad
import Control.Monad.ST
import Data.Array.ST
import Data.Array.Unboxed

soeST :: Int -> ST s (STUArray s Int Bool)
soeST n = do
  arr <- newArray (0,n) True
  writeArray arr 0 False
  writeArray arr 1 False
  forM_ [2..n] $ \i -> do
    b <- readArray arr i
    when b $ forM_ [2*i, 3*i .. n] (\j -> writeArray arr j False)
  return arr

soe :: Int -> [Int]
soe n = map fst . filter snd . assocs . runSTUArray $ soeST n
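
(A small driver mirroring the C main below, added here for completeness,
could be:)

-- Count the primes below 100 million.
main :: IO ()
main = print (length (soe 100000000))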

C version:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SIEVE (100000000)
#define SIEVEARRAY (SIEVE / 32 + 1)

inline void setBit(uint32_t *p, uint64_t i) {
    p[i / 32] |= 1 << (i & 31);
}

inline int getBit(uint32_t *p, uint64_t i) {
    return p[i / 32] & (1 << (i % 32));
}

int main() {
    static uint32_t sieve[SIEVEARRAY];
    uint32_t primes = 0;

    memset(sieve, 0, SIEVEARRAY * sizeof(uint32_t));
    for (uint64_t i = 2; i < SIEVE; i++) {
        if (getBit(sieve, i)) continue;
        primes++;
        for (uint64_t j = 2*i; j < SIEVE; j += i)
            setBit(sieve, j);
    }
    printf("%u\n", primes);
    return 0;
}

Ertugrul Söylemez

unread,
Feb 22, 2010, 6:13:19 AM2/22/10
to
Jon Harrop <j...@ffconsultancy.com> wrote:

> >> Well, your original Haskell was 171 bytes of code whereas my F# is
> >> 186 bytes. Moreover, Haskell's 8% brevity advantage only appears to
> >> apply in the tiny niche of performance-agnostic infinite lazy
> >> sequences.
> >
> > Note that the type signatures in the Haskell version are strictly
> > optional.
>
> You annotated the types by hand because that is idiomatic
> Haskell. That is idiomatic Haskell because the type system is too
> easily misled by design (flaw). Type annotations are one of the main
> sources of verbosity in real Haskell code.
>
> Just look at the source code to darcs, for example:
>
> http://allmydata.org/trac/darcs-2/browser/src/Progress.hs
>
> Every single definition there has a type annotation. I see no merit in
> pretending that doesn't bloat real Haskell source code.

Unless you use Haskell extensions, annotating types is strictly
optional. In general, the type system will infer the type of an
expression even better (i.e. more polymorphic) than you. The type
system of standard Haskell is designed such that it always succeeds or
the code is wrong.

There is one exception to this rule, though: the monomorphism
restriction, but it seldom applies.
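
A minimal illustration of the restriction:

-- Under the monomorphism restriction this pattern binding gets a
-- single, defaulted type (Integer) rather than the general Num a => a:
n = 2 + 2

-- An explicit signature restores the polymorphic type:
m :: Num a => a
m = 2 + 2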


> > Without them it has only 107 bytes (not counting line feeds).
>
> There are still only two significant differences in verbosity between
> your Haskell and my equivalent F#. Firstly, your Haskell used the
> syntactic sugar [3..] whereas my F# used:
>
> unfold (fun n -> Some(n, n+1)) 3
>
> Secondly, your Haskell benefitted from functions being in the global
> namespace whereas my F# qualified them explicitly.
>
> Excluding these mere syntactic differences, my F# is still only 18%
> longer than your Haskell. In contrast, the real Sieve of Eratosthenes
> can be implemented in APL in only 17 chars:
>
> http://www.vaxman.de/publications/apl_slides.pdf
>
> And that will be a *lot* faster than your 6x longer Haskell solution.

That may be because my Haskell version is not the sieve of
Eratosthenes. ;)

Of course you cannot implement it that short in Haskell, because APL has
good support for matrix operations built into the language. But anyway,
see my other post. The SoE is very fast in Haskell (the C version is
only 1.6 times faster) and needs little code. The performance
difference is related to the fact that standard Haskell arrays are safe.
I left out the bounds checking from the C code, which makes it faster.
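
(For completeness, GHC's array libraries also expose unchecked accesses
through Data.Array.Base, so the safety overhead can be removed on the
Haskell side as well; a sketch of the same loop as soeST with the bounds
checks elided:)

import Control.Monad (forM_, when)
import Control.Monad.ST (ST)
import Data.Array.ST (STUArray, newArray)
import Data.Array.Base (unsafeRead, unsafeWrite)

-- Identical to soeST above, but with unchecked reads and writes.
soeST' :: Int -> ST s (STUArray s Int Bool)
soeST' n = do
  arr <- newArray (0, n) True
  unsafeWrite arr 0 False
  unsafeWrite arr 1 False
  forM_ [2 .. n] $ \i -> do
    b <- unsafeRead arr i
    when b $ forM_ [2*i, 3*i .. n] (\j -> unsafeWrite arr j False)
  return arr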


> > Infinite lazy sequences are not a niche. I use them a lot for all
> > kinds of stuff. Remember that in Haskell you encode algorithms in a
> > different way than in F#.
>
> As I've shown here, you can encode algorithms the same way in F# if
> you want to. The fact that F# programmers rarely choose to is a
> statement about the practicality of the Haskell-style of encoding
> algorithms.
>
> Indeed, implementing the Sieve of Eratosthenes in Haskell was an
> unsolved problem for many years and, even today, is still the stuff of
> research. Melissa O'Neil wrote a seminal paper on it:
>
> http://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf
>
> The nearest she came to a decent Haskell implementation is 4,690 bytes
> of Haskell code (!) and still runs several times slower than my 436
> byte F# solution.

See my other post. Haskell sieve of Eratosthenes in 7 lines of code.
Two unnecessary writeArrays are there just for efficiency. And it gets
near C speed.


> > You can't pass a lazy value to a strict function. There is no
> > implicit demand-driven execution, but rather explicit forcing.
> > That's the reason why the laziness construct in F# does not help to
> > solve this problem.
>
> I solved the problem in F# using laziness.

You used features of Seq, not laziness.


> > You can't really have useful laziness without nonstrict semantics.
>
> Can you have useful laziness with any semantics?
>
> As I've shown here, F# is more than capable of expressing lazy
> Haskell-like solutions. I have implemented many as academic exercises
> but I have never adopted that style in production code precisely
> because it always suffers from these kinds of problems.
>
> Even when I want something like the sequence of prime numbers (and our
> F# for Numerics product does provide one), there's no way I'm going to
> use a two line solution that is hundreds of times slower than a dozen
> line solution. Nor will I adopt a style that saves me a couple of
> lines of code at the cost of orders of magnitude in performance or
> costs me an order of magnitude in bloat when I come to optimize it.

Then use the imperative style SoE. It's short, concise and fast.

frankenstein

unread,
Feb 22, 2010, 8:52:46 AM2/22/10
to
> > Jon Harrop wrote:
> > > Here is the Sieve of Eratosthenes implemented in imperative F# such that
> > > the number of primes sieved roughly doubles with each iteration:
>
> > >   let primes =
> > >     let a = ResizeArray[2]
> > >     let grow() =
> > >       let p0 = a.[a.Count-1]+1
> > >       let b = Array.create p0 true
> > >       for di in a do
> > >         let i0 = p0/di*di
> > >         let i0 = if i0<p0 then i0+di else i0
> > >         let rec loop i =
> > >           if i<b.Length then
> > >             b.[i] <- false
> > >             loop(i+di)
> > >         loop(i0-p0)
> > >       for i=0 to b.Length-1 do
> > >         if b.[i] then a.Add(p0+i)
> > >     fun n ->
> > >       while n >= a.Count do
> > >         grow()
> > >       a.[n]


Over the weekend I'd like to give it a try with Bigloo. However, the
only issue I have: what does ResizeArray[2] do? Will it create a
resizable list of [0,1,2], or [1,2], or just [2], or [0,0], or [1,1], or
what?

I cannot find any information at
http://research.microsoft.com/en-us/um/cambridge/projects/fsharp/manual/fsharp.powerpack/Microsoft.FSharp.Collections.ResizeArray.html

regarding ResizeArray[]. Last but not least, in "a.[a.Count-1]+1", is
indexing in F# zero-based (like in Scheme) or 1-based (like in
Fortran)?

Thanks,
Foerster vom Silberwald

Ertugrul Söylemez

unread,
Feb 22, 2010, 10:06:06 AM2/22/10
to
frankenstein <klohm...@yahoo.de> wrote:

> On Feb 22, 4:09 am, Jon Harrop <j...@ffconsultancy.com> wrote:
> > frankenstein wrote:
> >
> > > This algorithm to compute prime numbers is called the sieve of
> > > Eratosthenes. Eratosthenes was a greek mathematician born in
> > > Cyrene who lived 276-196 BC. His algorithm can be expressed
> > > slightly more elegant using list comprehensions:
> > >
> > > primes :: [Int]
> > > primes = sieve [2..]
> >
> > > sieve :: [Int] -> [Int]
> > > sieve [prime:rest] = [prime: sieve [i \\ i <- rest | i mod prime <>
> > > 0]]
> >
> > That is actually Turner's Sieve and not the Sieve of
> > Eratosthenes. This common misconception was described in detail in
> > the following paper:
> >
> >  http://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf
>

> I am left in the dark. Why might it be that hard to translate your F#
> version into a similar concise Haskell version? Or are people trying
> here (the posted Haskell version) to avoid imperative styles in
> Haskell by any means? Or is it just impossible imitating the
> imperative F# code in Haskell?

See the Haskell code in my other post. It is not a direct translation,
because my code uses a fixed sieve, but making it self-growing wouldn't
be difficult using the 'vector' package. It's also quite fast, slower
than C by a factor of only 1.6.

Also note when talking to Dr. Harrop that he's a notorious Haskell
hater. =)

Jon Harrop

unread,
Feb 22, 2010, 4:36:01 PM2/22/10
to

Creates an extensible array containing the single element 2.

> regarding ResizeArray[]. Last but not least "a.[a.Count-1]+1' is
> indexing in F# zero based (like in Scheme) or 1 based (like in
> Fortran).

Zero based. That expression gives 1 + the last known prime.

Jon Harrop

unread,
Feb 22, 2010, 4:36:42 PM2/22/10
to

It isn't significantly faster.

> sieve in both C and Haskell and both need a
> lot of time just sieving the first 100 million numbers, which don't even
> contain the 10 millionth prime number.

Your C code takes only a few seconds to sieve 200M numbers and find the 10
millionth prime here.

Ertugrul Söylemez

unread,
Feb 22, 2010, 6:02:36 PM2/22/10
to
Jon Harrop <j...@ffconsultancy.com> wrote:

Sure, but that's not the point. I'm just saying that a nongrowing sieve
is not slower, which justifies the comparison.


> > sieve in both C and Haskell and both need a lot of time just sieving
> > the first 100 million numbers, which don't even contain the 10
> > millionth prime number.
>
> Your C code takes only a few seconds to sieve 200M numbers and find
> the 10 millionth prime here.

Then the Haskell version should behave similarly. My point is that it's
easy to write the imperative-style, fast sieve of Eratosthenes in
Haskell, and if your F# version is really that fast, then you've got a
much faster computer than I have and C and Haskell should behave
similarly.

To sieve the first 200M numbers the C version takes 9.4 seconds and the
Haskell version takes 14.7 seconds. Note that I'm compiling with GHC
6.10.4, which is a bit outdated. As soon as GHC 6.12 is in my
distribution, I'll test again.

Juan Pedro Bolivar Puente

unread,
Feb 22, 2010, 6:35:01 PM2/22/10
to

>
> You annotated the types by hand because that is idiomatic Haskell. That is
> idiomatic Haskell because the type system is too easily misled by design
> (flaw). Type annotations are one of the main sources of verbosity in real
> Haskell code.
>

How can one single line per top-level declaration ever be considered
bloat? Really, I did not want to participate in this discussion because
I am far from an expert on FP; but you keep repeating this argument and
it is complete nonsense. As Ertugrul has just said, the type
annotations are **completely optional**; the type system will always
infer the most general type -- except with certain odd extensions
and in the monomorphism restriction case.

Why then are type annotations widely used? Because they help the
programmer. I don't know what kind of code you write, but I guess
that you write comments. It is idiomatic in any language to write one
comment per top-level function unless you want to make everybody else in
your team angry. Just consider those type annotations formal
documentation. A very nice form of documentation though, because if the
code is wrong the compiler will tell you that it doesn't do what you
meant in the type signature.
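
A tiny illustration of that last point:

-- The signature documents the intent; the compiler holds the body to it.
average :: [Double] -> Double
average xs = sum xs / fromIntegral (length xs)
-- Writing the body as `sum xs / length xs` would be rejected by GHC,
-- because length returns Int, not Double, so the slip is caught at
-- compile time instead of silently doing the wrong thing.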

My two cents,

JP

Jon Harrop

unread,
Feb 22, 2010, 8:07:51 PM2/22/10
to

Parallelizing the loop over "a" requires tiny changes and provides a ~3x
performance improvement on my 8 core:

let primes =
  let a = ResizeArray[2]
  let grow() =
    let p0 = a.[a.Count-1]+1
    let b = Array.create p0 true
    System.Threading.Tasks.Parallel.For(0, a.Count, fun j ->
      let di = a.[j]
      let rec loop i =
        if i<b.Length then
          b.[i] <- false
          loop(i+di)
      let i0 = p0/di*di
      loop(if i0<p0 then i0+di-p0 else i0-p0)) |> ignore
    for i=0 to b.Length-1 do
      if b.[i] then a.Add(p0+i)
  fun n ->
    while n >= a.Count do
      grow()
    a.[n]
--

Jon Harrop

unread,
Feb 22, 2010, 9:36:18 PM2/22/10
to
Juan Pedro Bolivar Puente wrote:
>> You annotated the types by hand because that is idiomatic Haskell. That
>> is idiomatic Haskell because the type system is too easily misled by
>> design (flaw). Type annotations are one of the main sources of verbosity
>> in real Haskell code.
>
> How can one single line per top-level declaration ever be considered
> bloat?

Type annotations are a significant proportion of the total number of lines
of code in real Haskell code bases (but not in other FPLs).

> Really, I did not want to participate in this discussion because
> I am far from an expert on FP; but you keep repeating this argument and
> it is complete nonsense.

I already gave you one example where this is the case. Look at any
production Haskell code and you will find the same thing. In Ertugrul's
original Haskell code on this thread, these unnecessary type annotations
*doubled* the size of his code. That is not uncommon.

> As Ertugrul has just said, the type annotations are **completely
> optional**; the type system will always infer the most general
> type -- except with certain odd extensions and in the monomorphism
> restriction case.

You are describing a property of Hindley-Milner yet other languages based on
HM are not afflicted by these type annotations. How do you explain that?

> Why then are type annotations widely used? Because they help the
> programmer. I don't know what kind of code you write, but I guess
> that you write comments. It is idiomatic in any language to write one
> comment per top-level function unless you want to make everybody else in
> your team angry. Just consider those type annotations formal
> documentation. A very nice form of documentation though, because if the
> code is wrong the compiler will tell you that it doesn't do what you
> meant in the type signature.

You are just repeating the same bullshit "explanation" that Ertugrul posted.
If that really were the motivation for bloating every definition with a
type annotation in production code you would see the same effect in all
FPLs but you do not. Why?

The real reason is, of course, that those theoretically-optional type
annotations are practically essential in Haskell for developer
productivity but not in most other FPLs. They are a manifestation of a
design flaw in Haskell's type system: its "power" makes it too easily
misled so it needs that extra guidance. If you removed those type
annotations you would suffer from incomprehensible error messages in
Haskell but not in other FPLs with simpler type systems.

Jon Harrop

unread,
Feb 22, 2010, 9:38:31 PM2/22/10
to
Ertugrul Söylemez wrote:
> Jon Harrop <j...@ffconsultancy.com> wrote:
>> >> Well, your original Haskell was 171 bytes of code whereas my F# is
>> >> 186 bytes. Moreover, Haskell's 8% brevity advantage only appears to
>> >> apply in the tiny niche of performance-agnostic infinite lazy
>> >> sequences.
>> >
>> > Note that the type signatures in the Haskell version are strictly
>> > optional.
>>
>> You annotated the types by hand because that is idiomatic
>> Haskell. That is idiomatic Haskell because the type system is too
>> easily misled by design (flaw). Type annotations are one of the main
>> sources of verbosity in real Haskell code.
>>
>> Just look at the source code to darcs, for example:
>>
>> http://allmydata.org/trac/darcs-2/browser/src/Progress.hs
>>
>> Every single definition there has a type annotation. I see no merit in
>> pretending that doesn't bloat real Haskell source code.
>
> Unless you use Haskell extensions, annotating types is strictly
> optional.

Type annotations do bloat real Haskell code. There is no merit in trying to
pretend otherwise.

> The type system of standard Haskell is designed such that it always
> succeeds or the code is wrong.

The following traditional implementation of the y-combinator is correct
code:

fix f = (\x -> f (x x)) (\x -> f (x x))

Yet the type system of standard Haskell fails on it.

>> And that will be a *lot* faster than your 6x longer Haskell solution.
>
> That may be because my Haskell version is not the sieve of
> Eratosthenes. ;)

Exactly.

> Of course you cannot implement it that short in Haskell, because APL has
> good support for matrix operations built into the language. But anyway,
> see my other post. The SoE is very fast in Haskell (the C version is
> only 1.6 times faster) and needs little code. The performance
> difference is related to the fact that standard Haskell arrays are safe.
> I left out the bounds checking from the C code, which makes it faster.

That speculation is obviously wrong: my F# is bounds checked and is still
much faster than your Haskell.

>> The nearest she came to a decent Haskell implementation is 4,690 bytes
>> of Haskell code (!) and still runs several times slower than my 436
>> byte F# solution.
>
> See my other post. Haskell sieve of Eratosthenes in 7 lines of code.

No. Your incomplete Haskell solution is already 15 lines of code.

> Two unnecessary writeArrays are there just for efficiency. And it gets
> near C speed.

On my machine your Haskell is over 3x slower than your C.

>> > You can't pass a lazy value to a strict function. There is no
>> > implicit demand-driven execution, but rather explicit forcing.
>> > That's the reason why the laziness construct in F# does not help to
>> > solve this problem.
>>
>> I solved the problem in F# using laziness.
>
> You used features of Seq, not laziness.

No, I used features of Seq in one solution and laziness in another.

Erik de Castro Lopo

unread,
Feb 22, 2010, 8:40:38 PM2/22/10
to
Jon Harrop wrote:

> Type annotations are a significant proportion of the total number of lines
> of code in real Haskell code bases (but not in other FPLs).

Comments take up far more lines than type annotations. Are they also
bloat?

Idiot!

Erik
--
----------------------------------------------------------------------
Erik de Castro Lopo
http://www.mega-nerd.com/

Jon Harrop

unread,
Feb 22, 2010, 10:07:11 PM2/22/10
to
Ertugrul Söylemez wrote:
> Jon Harrop <j...@ffconsultancy.com> wrote:
>> Ertugrul Söylemez wrote:
>> > I have no F# compiler here, but I don't believe it. I have just
>> > written a non-growing (i.e. faster)
>>
>> It isn't significantly faster.
>
> Sure, but that's not the point. I'm just saying that a nongrowing sieve
> is not slower, which justifies the comparison.

There is no evidence that the nongrowing sieves are not slower.

>> > sieve in both C and Haskell and both need a lot of time just sieving
>> > the first 100 million numbers, which don't even contain the 10
>> > millionth prime number.
>>
>> Your C code takes only a few seconds to sieve 200M numbers and find
>> the 10 millionth prime here.
>
> Then the Haskell version should behave similarly.

With that hardcoded limit, your Haskell takes 36.4s and your C takes 12.7s.

> My point is that it's easy to write the imperative-style, fast sieve of
> Eratosthenes in Haskell,

Given that your Haskell solution introduced a hard-coded limit that my F#
did not have and your Haskell runs several times slower than everything
else, how can you possibly justify your claim?

> To sieve the first 200M numbers the C version takes 9.4 seconds

In your last post you said the C programs took "a lot of time" but now
you're quoting times under 10s. Is that what you meant by "a lot of time"?

> and the
> Haskell version takes 14.7 seconds. Note that I'm compiling with GHC
> 6.10.4, which is a bit outdated. As soon as GHC 6.12 is in my
> distribution, I'll test again.

I managed to get GHC 6.12 working. Even if I statically encode the *optimal*
sieve size (a wildly unfair advantage) into your incomplete Haskell
implementation it is still 6x slower than my F#:

F#: 5.9s (parallel)
F#: 15.9s (serial)
C++: 7.9s
C: 10.8s (with optimal limit)
Haskell: 33.4s (with optimal limit)

Here is my C++ with the extensible sieve and no hard-coded limit:

#include <iostream>
#include <vector>
#include <algorithm>
#include <cstring>
#include <cstdlib>

std::vector<int> a;

void grow() {
  const int p0 = a[a.size()-1]+1;
  std::vector<bool> b(p0);
  std::fill(b.begin(), b.end(), true);
  for (int j=0; j<a.size(); ++j) {
    const int di = a[j];
    const int i0 = p0/di*di;
    for (int i=(i0<p0 ? i0+di-p0 : i0-p0); i<b.size(); i+=di)
      b[i] = false;
  }
  for (int i=0; i<b.size(); ++i)
    if (b[i]) a.push_back(p0+i);
}

int primes(int n) {
  while (n >= a.size())
    grow();
  return a[n];
}

int main(int argc, char *argv[]) {
  if (argc != 2) {
    std::cerr << "Usage: sieve <n>\n";
    return 1;
  }
  int n = atoi(argv[1]);
  a.push_back(2);
  std::cout << "Prime " << n << " is " << primes(n) << std::endl;
  return 0;
}

Jon Harrop

unread,
Feb 22, 2010, 10:22:37 PM2/22/10
to
Erik de Castro Lopo wrote:
> Jon Harrop wrote:
>> Type annotations are a significant proportion of the total number of
>> lines of code in real Haskell code bases (but not in other FPLs).
>
> Comments take up far more lines than type annotations.

Where did this "fact" come from?

Ertugrul Söylemez

unread,
Feb 22, 2010, 10:14:32 PM2/22/10
to
Jon Harrop <j...@ffconsultancy.com> wrote:

> >>> sieve in both C and Haskell and both need a lot of time just
> >>> sieving the first 100 million numbers, which don't even contain
> >>> the 10 millionth prime number.
> >>
> >> Your C code takes only a few seconds to sieve 200M numbers and find
> >> the 10 millionth prime here.
> >
> > Then the Haskell version should behave similarly.
>
> With that hardcoded limit, your Haskell takes 36.4s and your C takes
> 12.7s.

I wonder why Haskell code is always much slower as soon as you
compile/use it. I'm using an older GHC version and yet the speed
difference factor between C and Haskell is much smaller here. Also note
that my code is not parallelized, so claiming that your F# version is
six times faster is not fair. It's only two times faster, namely
comparing unparallelized to unparallelized.


> > To sieve the first 200M numbers the C version takes 9.4 seconds
>
> In your last post you said the C programs took "a lot of time" but now
> you're quoting times under 10s. Is that what you meant by "a lot of
> time"?

You said "a few seconds" to find the 10 millionth prime number. I
reached "a few seconds" already to sieve 100 million numbers, which
contain only around the first five million primes. Hence "a lot of
time".

Paul Rubin

unread,
Feb 22, 2010, 10:05:31 PM2/22/10
to
Jon Harrop <j...@ffconsultancy.com> writes:
>> As Ertugrul has just said, the type annotations are **completely
>> optional**; the type system will always infer the most general
>> type -- except with certain odd extensions and in the monomorphism
>> restriction case.
>
> You are describing a property of Hindley-Milner yet other languages based on
> HM are not afflicted by these type annotations. How do you explain that?

That claim is pretty bogus. ML, for example, has an entire separate
sublanguage (its module system) for writing type annotations, which
Haskell handles with a class constraint here or there. Even to add two
plus two in Ocaml, you have to annotate both the "+" and the "2"
(i.e. "2+2" or "2. +. 2.") to distinguish integer from floating versions
of the numeric literal and the addition function. I think other ML
dialects get around that problem somehow, but Haskell's scheme for it is
quite natural and extensible and the automatic inference works pretty
well.

Writing a top-level function in Haskell, I usually either write the type
annotation first, which then guides writing the code; or else I write
the code first, then let ghci infer the type, then paste the type back
into the program for later reference. Either way, the annotation feels
natural and useful as machine-checked documentation, even though it's
not necessary.
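
For instance (an illustrative session, not taken from this thread):

-- ghci> let isPrime n = all (\d -> mod n d /= 0) [2 .. n - 1]
-- ghci> :type isPrime
-- isPrime :: Integral a => a -> Bool
--
-- The inferred signature is then pasted back into the source file.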

Ertugrul Söylemez

unread,
Feb 22, 2010, 10:16:56 PM2/22/10
to
Jon Harrop <j...@ffconsultancy.com> wrote:

> [a lot of bullshit, which is entirely wrong]

Ertugrul Söylemez

unread,
Feb 22, 2010, 10:18:16 PM2/22/10
to
Jon Harrop <j...@ffconsultancy.com> wrote:

> [a lot of bullshit, which is entirely wrong]

Jon Harrop

unread,
Feb 22, 2010, 11:38:43 PM2/22/10
to
Ertugrul Söylemez wrote:
> Jon Harrop <j...@ffconsultancy.com> wrote:
>> >>> sieve in both C and Haskell and both need a lot of time just
>> >>> sieving the first 100 million numbers, which don't even contain
>> >>> the 10 millionth prime number.
>> >>
>> >> Your C code takes only a few seconds to sieve 200M numbers and find
>> >> the 10 millionth prime here.
>> >
>> > Then the Haskell version should behave similarly.
>>
>> With that hardcoded limit, your Haskell takes 36.4s and your C takes
>> 12.7s.
>
> I wonder why Haskell code is always much slower as soon as you
> compile/use it. I'm using an older GHC version and yet the speed
> difference factor between C and Haskell is much smaller here.

I've discovered why:

ghc 6.10: 16.5s
ghc 6.12: 36.4s

So the new version of GHC is generating code over 2x slower than before. How
unpredictable...

> Also note
> that my code is not parallelized, so claiming that your F# version is
> six times faster is not fair. It's only two times faster, namely
> comparing unparallelized to unparallelized.

Let's make it "fair" by parallelizing the Haskell rather than crippling my
F#. If you cannot parallelize it, let's keep it fair by comparing the
fastest implementations.

>> > To sieve the first 200M numbers the C version takes 9.4 seconds
>>
>> In your last post you said the C programs took "a lot of time" but now
>> you're quoting times under 10s. Is that what you meant by "a lot of
>> time"?
>
> You said "a few seconds" to find the 10 millionth prime number. I
> reached "a few seconds" already to sieve 100 million numbers, which
> contain only around the first five million primes. Hence "a lot of
> time".

Ok. When you said "a lot of time" I thought you meant minutes or hours rather
than 10 seconds.

Jon Harrop

unread,
Feb 23, 2010, 12:44:08 AM2/23/10
to
Paul Rubin wrote:
> Jon Harrop <j...@ffconsultancy.com> writes:
>>> As Ertugrul has just said, the type annotations are **completely
>>> optional**; the type system will always infer the most general
>>> type -- except with certain odd extensions and in the monomorphism
>>> restriction case.
>>
>> You are describing a property of Hindley-Milner yet other languages based
>> on HM are not afflicted by these type annotations. How do you explain
>> that?
>
> That claim is pretty bogus. ML, for example, has an entire separate
> sublanguage (its module system) for writing type annotations,

No, that is not what the module system is for at all. The primary use of the
module system is to generate new data structures from one or more existing
data structures using higher-order modules. Consequently, type annotations
appear only in the library's abstract signatures and not in the user's
concrete structures at all.

For example, using the ocamlgraph library to create a data structure
representing an undirected network of coordinates:

module G = Imperative.Graph.Abstract(struct type t = int * int end)

From: http://ocamlgraph.lri.fr/sudoku.ml

There is a type definition but no annotation.

Creating an undirected network with integer coordinates for nodes and
integer labels on edges:

module IntInt = struct
  type t = int * int
end
module Int = struct
  type t = int
  let compare = compare
  let hash = Hashtbl.hash
  let equal = (=)
  let default = 0
end
module G = Imperative.Graph.AbstractLabeled(IntInt)(Int)

From: http://ocamlgraph.lri.fr/color.ml

Again, two type definitions and the rest are definitions without
annotations.

As the types appear only in the reusable abstract interfaces, they are
comparatively rare whereas type annotations in Haskell appear on almost
every definition. Annotating the signature argument to a higher-order
module is not comparable to annotating the type of a definition.

> that Haskell handles with a class constraint here or there.

No, Haskell cannot express this because it doesn't have a higher-order
module system. See:

http://augustss.blogspot.com/2008/12/somewhat-failed-adventure-in-haskell.html

> Even to add two
> plus two in Ocaml, you have to annotate both the "+" and the "2"
> (i.e. "2+2" or "2. +. 2.") to distinguish integer from floating versions
> of the numeric literal and the addition function. I think other ML
> dialects get around that problem somehow,

Standard ML ad-hoc overloads "+" for int and float but no other types and
you must still write 2.0 instead of 2 if you want a float. F# has a limited
form of type classes for operators but, again, you must write the literals
correctly.

> but Haskell's scheme for it is quite natural and extensible and the
> automatic inference works pretty well.

Type classes are great for some very specific problems like operator
overloading but they are abused for many other things in Haskell because it
lacks alternatives like a higher-order module system, polymorphic variants
an object system and so on.

You're right that there are other forms of bloat buried in OCaml code
though. Perhaps the only logical course is to compare verbosity.

> Writing a top-level function in Haskell, I usually either write the type
> annotation first, which then guides writing the code; or else I write
> the code first, then let ghci infer the type, then paste the type back
> into the program for later reference. Either way, the annotation feels
> natural and useful as machine-checked documentation, even though it's
> not necessary.

What about source files for batch compilation? Don't you annotate almost all
definitions? What happens if you try to develop a complicated program
without writing and maintaining those type annotations by hand?

Keith H Duggar

unread,
Feb 23, 2010, 1:12:55 AM2/23/10
to
On Feb 22, 10:07 pm, Jon Harrop <j...@ffconsultancy.com> wrote:

> Ertugrul Söylemez wrote:
> > Jon Harrop <j...@ffconsultancy.com> wrote:

Which is twice as slow (on my machine) as my equivalent C++
version (code below) for the 10meg prime

g++ -O3 harrop.cpp -o harrop
g++ -O3 duggar.cpp -o duggar

time ./harrop 10000000
Prime 10000000 is 179424691

real 0m20.05s
user 0m19.76s
sys 0m0.25s

time ./duggar 10000000
179424691

real 0m8.12s
user 0m7.95s
sys 0m0.14s

time ./harrop 50000000
Prime 50000000 is 982451707

real 3m27.12s
user 3m11.42s
sys 0m2.14s

time ./duggar 50000000
982451707

real 0m45.55s
user 0m44.34s
sys 0m0.66s

and 4.6 times slower for the 50meg prime. I think the main reason
is the reduced memory consumption and improved memory locality in
my version due to scanning a (more) minimal gap size when growing.
Also it can be further optimized (at least in C++) should one need to:

#include <cstdlib>
#include <iostream>
#include <vector>

using std::vector ;

class Primes
{
   unsigned _idx ;
   static unsigned g ;
   static vector<unsigned> _primes ;

   void grow ( ) {
      unsigned g0 = g*g - g ; ++g ;
      unsigned g1 = g*g - g ;
      sieve(g0,g1) ;
   }

   void sieve ( unsigned g0, unsigned g1 ) {
      vector<bool> cross(g1-g0,0) ;
      vector<unsigned>::iterator pi = _primes.begin() ;
      vector<unsigned>::iterator pe = _primes.end() ;
      for ( ; pi != pe ; ++pi ) {
         unsigned p = *pi ;
         if ( p*p >= g1 ) break ;
         unsigned m = (g0+p-1)/p*p ;
         for ( ; m < g1 ; m += p ) cross[m-g0] = true ;
      }
      for ( unsigned ci = 0 ; ci < cross.size() ; ++ci ) {
         if ( not cross[ci] ) _primes.push_back(g0+ci) ;
      }
   }

public :
   Primes ( ) : _idx(0) { }
   unsigned operator * ( ) { return _primes[_idx] ; }
   Primes & operator ++ ( ) {
      ++_idx ;
      if ( _idx >= _primes.size() ) grow() ;
      return *this ;
   }
} ;

unsigned Primes::g = 2 ;
std::vector<unsigned> Primes::_primes(1,2) ;

unsigned nthPrime ( unsigned n ) {
   Primes primes ;
   for ( unsigned i = 0 ; i < n ; ++i ) ++primes ;
   return *primes ;
}

int main ( int argc, char * argv[])
{
   unsigned const i = std::atoi(argv[1]) ;
   std::cout << nthPrime(i) << '\n' ;
   return 0 ;
}

KHD

frankenstein

unread,
Feb 23, 2010, 2:07:11 AM2/23/10
to
On Feb 23, 1:07 am, Jon Harrop <j...@ffconsultancy.com> wrote:
> Jon Harrop wrote:

> Parallelizing the loop over "a" requires tiny changes and provides a ~3x
> performance improvement on my 8 core:
>
>   let primes =
>     let a = ResizeArray[2]
>     let grow() =
>       let p0 = a.[a.Count-1]+1
>       let b = Array.create p0 true
>       System.Threading.Tasks.Parallel.For(0, a.Count, fun j ->
>         let di = a.[j]
>         let rec loop i =
>           if i<b.Length then
>             b.[i] <- false
>             loop(i+di)
>         let i0 = p0/di*di
>         loop(if i0<p0 then i0+di-p0 else i0-p0)) |> ignore
>       for i=0 to b.Length-1 do
>         if b.[i] then a.Add(p0+i)
>     fun n ->
>       while n >= a.Count do
>         grow()
>       a.[n]
>
> --
> Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?u

I am not interested in micro-benchmark times because they are
fairly useless in real life; how a language handles memory
and large data sets is what matters.

However, general programming concepts are still important.

It seems to be the case that F# is way ahead of, let's say, Bigloo when
it comes to parallelizing, since it comes out of the box as far as I
can tell from your code. I would have no idea how to parallelize the
Bigloo version by using POSIX.

But F#, like any other Lisp and Scheme out there: no one can tell me F#
leads to user-friendly code which can easily be read by others.

No one knows the type of this by just reading the code:

==
i0 = p0/di*di
==

Is it an int or a float? Not to speak of how F# will parse it: i0 =
(p0/di)*di or p0/(di*di)?

Context implying that i0 should be an integer is no excuse here. It is
just funny: a strict type system which only the compiler knows.

This is one of the reasons why I would never use Common Lisp or any
other Scheme for writing numerical code. Granted, Bigloo only has basic
types on top of defined class types. In Bigloo I would write:

==
(let ((p0::int (* (/ p0 di) di)
==

Clean paved the way for how to document your code: every example in
the Clean book is annotated by its respective types. This is good
style.

frankenstein

unread,
Feb 23, 2010, 2:13:23 AM2/23/10
to
On Feb 23, 7:07 am, frankenstein <klohmusc...@yahoo.de> wrote:
In Bigloo I would write:
>
> ==
> (let ((p0::int (* (/ p0 di) di)
> ==
>
> Clean paved the way for how to document your code: every example in
> the Clean book is annotated by its respective types. This is good
> style.

I think it should be:

==
(let ((p0::int (inexact->exact (* (/ p0 di) di))
==

I still haven't figured out what the F# line actually does.

Ertugrul Söylemez

unread,
Feb 23, 2010, 5:51:33 AM2/23/10
to
Jon Harrop <j...@ffconsultancy.com> wrote:

> > The type system of standard Haskell is designed such that it always
> > succeeds or the code is wrong.
>
> The following traditional implementation of the y-combinator is
> correct code:
>
> fix f = (\x -> f (x x)) (\x -> f (x x))
>
> Yet the type system of standard Haskell fails on it.

This is the traditional Y combinator of untyped lambda calculus.
Haskell implements the typed lambda calculus, in which you can't express
the Y combinator. It is correct and expected that this code does not
typecheck.

This is not related to any shortcoming of Haskell's type system. It
does its job properly. It's a shortcoming of your understanding of the
lambda calculus. Note that the typed lambda calculus is not Turing
complete.
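
Indeed, general recursion in Haskell comes from recursive definitions
rather than self-application, and the self-application form itself can
be typed once the recursion is moved into a datatype. A minimal sketch:

-- The idiomatic fixed-point combinator: recursion via a recursive let.
fix :: (a -> a) -> a
fix f = let x = f x in x

-- Self-application made typeable with a recursive newtype.
newtype Rec a = Roll { unroll :: Rec a -> a }

y :: (a -> a) -> a
y f = (\x -> f (unroll x x)) (Roll (\x -> f (unroll x x)))

-- Example: factorial through fix.
fact :: Integer -> Integer
fact = fix (\rec n -> if n <= 1 then 1 else n * rec (n - 1))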


> >> And that will be a *lot* faster than your 6x longer Haskell
> >> solution.
> >
> > That may be because my Haskell version is not the sieve of
> > Eratosthenes. ;)
>
> Exactly.

Good. Then your whole statement is void, because you're comparing the
running times of two different algorithms with different complexities,
which is obviously pointless.


> > Of course you cannot implement it that short in Haskell, because APL
> > has good support for matrix operations built into the language. But
> > anyway, see my other post. The SoE is very fast in Haskell (the C
> > version is only 1.6 times faster) and needs little code. The
> > performance difference is related to the fact that standard Haskell
> > arrays are safe. I left out the bounds checking from the C code,
> > which makes it faster.
>
> That speculation is obviously wrong: my F# is bounds checked and is
> still much faster than your Haskell.

You're saying that F# outperforms C. I used to program in F# at work.
My F# code was always much slower than the equivalent C code.


> >> The nearest she came to a decent Haskell implementation is 4,690
> >> bytes of Haskell code (!) and still runs several times slower than
> >> my 436 byte F# solution.
> >
> > See my other post. Haskell sieve of Eratosthenes in 7 lines of
> > code.
>
> No. Your incomplete Haskell solution is already 15 lines of code.

My solution is not incomplete and I could remove what would be "bloat"
in your terminology: type signatures, convenience functions and
imports. But I find the code more readable the way I've written it.
The core sieve can be implemented in only three lines. With combinators
and less syntactic sugar I can even squeeze it further:

-- (This assumes the imports from the earlier post, plus
-- System.Environment for getArgs and Data.Functor for (<$).)
main = fmap (map read) getArgs >>=
       mapM_ (\n -> print . last . map fst . filter snd . assocs . runSTUArray $
                    newArray (2,n) True >>= \arr ->
                    arr <$ forM_ [2..n] (\i -> readArray arr i >>=
                      (`when` forM_ [2*i, 3*i .. n] (flip (writeArray arr) False))))

This is not merely the sieve. It's a complete program, which expects
sieve sizes as parameters, runs the sieve for each of these sizes and
prints the largest prime found. It will be slower than the original
sieve by a small constant factor, because this one uses a 2-based array.


> > Two unnecessary writeArrays are there just for efficiency. And it
> > gets near C speed.
>
> On my machine your Haskell is over 3x slower than your C.

Haskell code is always orders of magnitude slower, as soon as you
compile it.

Jon Harrop

unread,
Feb 23, 2010, 9:46:37 AM2/23/10
to
frankenstein wrote:
> It seems to be the case that F# is way ahead of lets say Bigloo when
> it comes to parallelizing since it comes out of the box as far as I
> can tell from your code. I would have no idea how to parallelize the
> Bigloo version by using POSIX.

F# is way ahead of everything open source when it comes to parallelism.

> But F# is like any other Lisp and Scheme out there: no one can tell me F#
> leads to user-friendly code which can easily be read by others.
>
> No one knows the type of this by just reading the code:
>
> ==
> i0 = p0/di*di
> ==

An F# developer would hover the mouse over those identifiers to see that
they correspond to variables with the inferred type "int".

> Clean paved the way for how to document your code: every example in
> the Clean book is annotated by its respective types. This is good
> style.

I prefer automatic inference with throwback of inferred data in the IDE.

frankenstein

Feb 23, 2010, 11:55:38 AM
On Feb 23, 2:46 pm, Jon Harrop <j...@ffconsultancy.com> wrote:

> > Clean paved the way for how to document your code: every example in
> > the Clean book is annotated by its respective types. This is good
> > style.
>
> I prefer automatic inference with throwback of inferred data in the IDE.
>
> --
> Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?u


Hi Jon

It took me a long time to figure out how to avoid too much consing.
Push in C++ seems to be VERY, VERY efficient, and so is the cache in F#
(if one wants to believe your timings)! I do not think that any Common
Lisp out there would stand a chance against C++'s push.

My Bigloo version (a mix of your F# and the C++ version; code attached)
for 1e7 (10 million) follows. The version might be up for some more
improvements.


==
bigloo -Obench sieve.scm

time ./a.out
==

delivers:
==
10466504
prime 179424691
#<output_port:stdout>

real 0m22.969s
user 0m20.678s
sys 0m1.068s
==


The C version you posted:

==
g++ -O3 sieve.c

time ./a.out
==


delivers:

==
a size 10466504
Prime 10000000 is 179424691 1

real 0m14.299s
user 0m13.310s
sys 0m0.237s
==


==
(module sieve)

(define-inline (acc::vector i::bint p0::bint b::vector a::pair)
  (cond ((=fx i (vector-length b))
         (list->vector (reverse a)))
        ((= (vector-ref b i) 1)
         (acc (+fx i 1) p0 b (append! (list (+fx i p0)) a)))
        (else
         (acc (+fx i 1) p0 b a))))

(define (primes::obj n::bint)
  (let ([a::vector (vector 2)])
    (define (grow a)
      (let* ([p0::bint (+fx (vector-ref a (-fx (vector-length a) 1)) 1)]
             [b::vector (make-vector p0 1)]
             [dim::bint p0]
             [dim-a::bint (vector-length a)])
        (do ([jj 0 (+fx jj 1)])
            ([=fx jj dim-a])
          (let* ([di::bint (vector-ref a jj)]
                 [i0::bint (*fx (inexact->exact (/ p0 di)) di)]
                 [i0::bint (if (<fx i0 p0)
                               (-fx (+fx i0 di) p0)
                               (-fx i0 p0))])
            (do ([j i0 (+fx j di)])
                ([>fx j (-fx dim 1)])
              (vector-set! b j 0))))
        (acc 0 p0 b (vector->list a))))
    (let while ((a a))
      (cond ([>= n (vector-length a)]
             (print "Doing n " (vector-length a))
             (while (grow a)))
            (else
             (print (vector-length a))
             (print "prime " (vector-ref a n)))))))

(print (primes 10000000))
==

frankenstein

Feb 23, 2010, 11:57:55 AM
On Feb 23, 4:55 pm, frankenstein <klohmusc...@yahoo.de> wrote:

> The C version you posted:
>
> ==
> g++ -O3 sieve.c
>
> time ./a.out
> ==

typo: the C++ version:

I meant:

g++ -O3 sieve.cpp


frankenstein

Feb 23, 2010, 12:07:57 PM
On Feb 23, 4:57 pm, frankenstein <klohmusc...@yahoo.de> wrote:


> typo: the C++ version:
>
> I ment:
>
> g++ -O3 sieve.cpp

To put the figures into the right light:

my machine: a MacBook laptop running Mac OS X, dual-core 2GHz processor
with 4GB RAM and the GNU gcc compiler (both for Bigloo and C++)

I am not so interested in the raw-metal timings because they are
useless. However, after all these years consing is still a crucial
problem in every Scheme and Lisp implementation out there.

frankenstein

Feb 23, 2010, 12:24:07 PM
On Feb 23, 4:55 pm, frankenstein <klohmusc...@yahoo.de> wrote:

Additional note: using Bigloo's (vector-append) would shift the
timings towards infinity. It seems vector-append has to traverse the
vector every time a new element is attached:

==

(define-inline (acc::vector i::bint p0::bint b::vector a::vector)
  (cond ((=fx i (vector-length b))
         a)
        ((= (vector-ref b i) 1)
         (acc (+fx i 1) p0 b (vector-append a (vector (+fx i p0)))))
        (else
         (acc (+fx i 1) p0 b a))))

==


The secret lies here: a) convert the passed vector into a list, b)
accumulate the list using append!, c) upon exit convert the list back
into a vector. The main program (grow) will suffer if you use lists
instead of vectors, because access time is not constant and you need a
vector here.

==


(define-inline (acc::vector i::bint p0::bint b::vector a::pair)
  (cond ((=fx i (vector-length b))
         (list->vector (reverse a)))
        ((= (vector-ref b i) 1)
         (acc (+fx i 1) p0 b (append! (list (+fx i p0)) a)))
        (else
         (acc (+fx i 1) p0 b a))))

==

Jussi Piitulainen

Feb 23, 2010, 1:08:52 PM
frankenstein writes:

> ... (append! (list (+fx i p0)) a) ...

You could use cons here if cons wasn't such a performance killer.
Right?

frankenstein

Feb 23, 2010, 1:21:11 PM
On Feb 23, 6:08 pm, Jussi Piitulainen <jpiit...@ling.helsinki.fi>
wrote:

The strange thing: it doesn't make any difference whether I use cons,
append, or append! here; the timings are still fast and more or less
the same: 23 seconds.

At least for Bigloo. I haven't observed the memory cost whilst using
cons, though.

Again a comment on vector-append:

Even when appending the element at the beginning of the vector:

(vector-append (vector value) a)

and reversing it upon exit, the timings do not come down; they are
still clearly n^2 or n^3. Not sure why vector-append in Bigloo is such
a dog.


Jussi Piitulainen

Feb 23, 2010, 2:24:06 PM
frankenstein <klohm...@yahoo.de> writes:

> On Feb 23, 6:08 pm, Jussi Piitulainen <jpiit...@ling.helsinki.fi>
> wrote:
> > frankenstein writes:
> > >            ... (append! (list (+fx i p0)) a) ...
> >
> > You could use cons here if cons wasn't such a performance killer.
> > Right?
>
> The strange thing: it wouldn't make any difference (the timings are
> still fast and more or less the same: 23 seconds): using cons,
> append, or append! here.

(cons x a) simply allocates the cell you want.

(append (list x) a) allocates at least two cells, at least one of
which can be collected immediately: append copies all but the last
argument.

(append! (list x) a) allocates one cell and then mutates it.

(append! a (list x)) would also only allocate one cell, but it needs
to mutate the last cell of a, which requires traversal of whole a.

> At least for Bigloo. I haven't observed the cost of memory whilst
> using cons though.

Are you aware that (list x) is equivalent to (cons x '())?

> Again a comment to the append vector:
>
> Even when appending the element at the beginning of the vector:
>
> (vector-append (vector value) a)
>
> and reversing it upon exit will not put the timings down and they
> are still highly n^2 or n^3. Not sure why append-vector in Bigloo is
> such a dog.

It allocates a whole new vector and copies the contents of both old
vectors into it.

Least work and least allocation here is done by cons, though I suppose
when lispers talk about consing they may refer to allocation of memory
in general.

frankenstein

Feb 23, 2010, 2:32:21 PM
On Feb 23, 7:24 pm, Jussi Piitulainen <jpiit...@ling.helsinki.fi>
wrote:

Thanks for the lesson, Jussi. Believe it or not, over the years I have
always admired your deep knowledge of the internals of the Scheme
language and how things work.

I haven't yet had a deeper look into the other C++ version to learn
whether some more of its tricks might be applicable to the Scheme
version. However, I am not sure why the guy (no pun intended) wrote a
C++ version that shows all the signs of bad C++ programming just to
show off, even if it might be very fast. His version reminds me of old
Fortran code with lots of GOTOs. I liked Jon's version for that matter:
it is readable.

Jon Harrop

Feb 23, 2010, 4:05:36 PM
Ertugrul Söylemez wrote:
> Jon Harrop <j...@ffconsultancy.com> wrote:
>> > The type system of standard Haskell is designed such that it always
>> > succeeds or the code is wrong.
>>
>> The following traditional implementation of the y-combinator is
>> correct code:
>>
>> fix f = (\x -> f (x x)) (\x -> f (x x))
>>
>> Yet the type system of standard Haskell fails on it.
>
> This is the traditional Y combinator of untyped lambda calculus.
> Haskell implements the typed lambda calculus, in which you can't express
> the Y combinator. It is correct and expected that this code does not
> typecheck.
>
> This is not related to any shortcoming of Haskell's type system. It
> does its job properly. It's a shortcoming of your understanding of the
> lambda calculus.

If it is not a shortcoming of Haskell's type system, why does it work in
OCaml:

$ ocaml -rectypes
Objective Caml version 3.11.1

# let fix f = (fun x -> f(x x)) (fun x y -> f(x x) y);;
val fix : (('a -> 'b) -> 'a -> 'b) -> 'a -> 'b = <fun>

# fix (fun f -> function 0 -> 1 | n -> n*f(n-1)) 10;;
- : int = 3628800

>> > That may be because my Haskell version is not the sieve of
>> > Eratosthenes. ;)
>>
>> Exactly.
>
> Good. Then your whole statement is void, because you're comparing the
> running times of two different algorithms with different complexities,
> which is obviously pointless.

You cannot see the point of observing that other languages let you solve
problems more succinctly and efficiently than Haskell?

>> That speculation is obviously wrong: my F# is bounds checked and still
>> much faster than your Haskell.
>
> You're saying that F# outperforms C.

C compiled with gcc, yes.

> I used to program in F# at work. My code was always much slower than the
> equivalent C code.

My F# code often outperforms C. My F# code has even outperformed
vendor-tuned Fortran on linear algebra.

>> >> The nearest she came to a decent Haskell implementation is 4,690
>> >> bytes of Haskell code (!) and still runs several times slower than
>> >> my 436 byte F# solution.
>> >
>> > See my other post. Haskell sieve of Eratosthenes in 7 lines of
>> > code.
>>
>> No. Your incomplete Haskell solution is already 15 lines of code.
>
> My solution is not incomplete

Your solution does not extend its sieve => it is incomplete.

> and I could remove what would be "bloat"
> in your terminology: type signatures, convenience functions and
> imports. But I find the code more readable the way I've written it.
> The core sieve can be implemented in only three lines. With combinators
> and less syntactic sugar I can even squeeze it further:
>
> main = fmap (map read) getArgs >>=
> mapM_ (\n -> print . last . map fst . filter snd . assocs .
> runSTUArray $
> newArray (2,n) True >>= \arr ->
> arr <$ forM_ [2..n] (\i -> readArray arr i >>=
> (`when` forM_ [2*i, 3*i .. n] (flip (writeArray arr)
> False))))

That's 7 lines of code, is still incomplete by design and doesn't even
compile:

$ ghc-6.12.1 --make -O2 sieve.hs -o sieve
[1 of 1] Compiling Main ( sieve.hs, sieve.o )

sieve.hs:1:23: Not in scope: `getArgs'

sieve.hs:2:59: Not in scope: `assocs'

sieve.hs:2:68: Not in scope: `runSTUArray'

sieve.hs:3:10: Not in scope: `newArray'

sieve.hs:4:14: Not in scope: `<$'

sieve.hs:4:17: Not in scope: `forM_'

sieve.hs:4:37: Not in scope: `readArray'

sieve.hs:5:13: Not in scope: `when'

sieve.hs:5:20: Not in scope: `forM_'

sieve.hs:5:49: Not in scope: `writeArray'

> This is not merely the sieve.

Indeed, it doesn't even implement the extensible sieve correctly.

> It's a complete program,

Then why doesn't it compile or work?

> which expects sieve sizes as parameters,

A complete program would not even require sieve sizes.

>> > Two unnecessary writeArrays are there just for efficiency. And it
>> > gets near C speed.
>>
>> On my machine your Haskell is over 3x slower than your C.
>
> Haskell code is always orders of magnitude slower, as soon as you
> compile it.

I suppose it's my fault your programs are so long and incomplete as well?

Jon Harrop

Feb 23, 2010, 7:30:19 PM
Keith H Duggar wrote:
> On Feb 22, 10:07 pm, Jon Harrop <j...@ffconsultancy.com> wrote:
>> F#: 5.9s (parallel)
>> F#: 15.9s (serial)
>> C++: 7.9s
>> C: 10.8s (with optimal limit)
>> Haskell: 33.4s (with optimal limit)
>
> Which is twice slower (on my machine) than my equivalent C++
> version (code below) for the 10meg prime

Your C++ is 4x faster than my C++ on this machine, taking only 2s.

> and 4.6 times slower for the 50meg prime. I think the main reason
> is the reduced memory consumption and improved memory locality in
> my version due to scanning a (more) minimal gap size when growing.

I'll have to check it out. Thanks!

toby

Feb 23, 2010, 8:03:30 PM
On Feb 23, 9:46 am, Jon Harrop <j...@ffconsultancy.com> wrote:
> frankenstein wrote:
> > It seems to be the case that F# is way ahead of, let's say, Bigloo when
> > it comes to parallelizing since it comes out of the box as far as I
> > can tell from your code. I would have no idea how to parallelize the
> > Bigloo version by using POSIX.
>
> F# is way ahead of everything open source when it comes to parallelism.


How about Erlang?

System.Threading.Tasks.Parallel.For(0, a.Count, fun j ->

^^
looks like something only a Microsoftie could love.

Jon Harrop

Feb 24, 2010, 7:12:04 AM
toby wrote:
> On Feb 23, 9:46 am, Jon Harrop <j...@ffconsultancy.com> wrote:
>> frankenstein wrote:
>> > It seems to be the case that F# is way ahead of, let's say, Bigloo when
>> > it comes to parallelizing since it comes out of the box as far as I
>> > can tell from your code. I would have no idea how to parallelize the
>> > Bigloo version by using POSIX.
>>
>> F# is way ahead of everything open source when it comes to parallelism.
>
> How about Erlang?

Erlang is great for concurrency but useless for parallelism.

Ertugrul Söylemez

Feb 24, 2010, 9:42:13 AM
Jon Harrop <j...@ffconsultancy.com> wrote:

> >> The following traditional implementation of the y-combinator is
> >> correct code:
> >>
> >> fix f = (\x -> f (x x)) (\x -> f (x x))
> >>
> >> Yet the type system of standard Haskell fails on it.
> >
> > This is the traditional Y combinator of untyped lambda calculus.
> > Haskell implements the typed lambda calculus, in which you can't
> > express the Y combinator. It is correct and expected that this code
> > does not typecheck.
> >
> > This is not related to any shortcoming of Haskell's type system. It
> > does its job properly. It's a shortcoming of your understanding of
> > the lambda calculus.
>
> If it is not a shortcoming of Haskell's type system, why does it work
> in OCaml:
>
> $ ocaml -rectypes
> Objective Caml version 3.11.1
>
> # let fix f = (fun x -> f(x x)) (fun x y -> f(x x) y);;
> val fix : (('a -> 'b) -> 'a -> 'b) -> 'a -> 'b = <fun>
>
> # fix (fun f -> function 0 -> 1 | n -> n*f(n-1)) 10;;
> - : int = 3628800

I don't know OCaml's type system, but obviously it doesn't implement the
typed lambda calculus exactly, at least not with the '-rectypes'
argument you're passing to it. Note that I'm talking about standard
Haskell without extensions.


> >>> That may be because my Haskell version is not the sieve of
> >>> Eratosthenes. ;)
> >>
> >> Exactly.
> >
> > Good. Then your whole statement is void, because you're comparing
> > the running times of two different algorithms with different
> > complexities, which is obviously pointless.
>
> You cannot see the point of observing that other languages let you
> solve problems more succinctly and efficiently than Haskell?

If your point is something completely different and unrelated to your
actual statement, then no, I can't see it. I don't care either, because
your point is wrong anyway.


> >>>> The nearest she came to a decent Haskell implementation is 4,690
> >>>> bytes of Haskell code (!) and still runs several times slower
> >>>> than my 436 byte F# solution.
> >>>
> >>> See my other post. Haskell sieve of Eratosthenes in 7 lines of
> >>> code.
> >>
> >> No. Your incomplete Haskell solution is already 15 lines of code.
> >
> > My solution is not incomplete
>
> Your solution does not extend its sieve => it is incomplete.

It is complete, because we were talking about the sieve of Eratosthenes.
The original description does not include extension. If it did, I would
have written a growing sieve and it wouldn't have 4690 bytes. Just
because someone else fails to do it properly doesn't mean it's not
possible.


> > and I could remove what would be "bloat" in your terminology: type
> > signatures, convenience functions and imports. But I find the code
> > more readable the way I've written it. The core sieve can be
> > implemented in only three lines. With combinators and less syntactic
> > sugar I can even squeeze it further:
> >
> > main = fmap (map read) getArgs >>=
> > mapM_ (\n -> print . last . map fst . filter snd . assocs .
> > runSTUArray $
> > newArray (2,n) True >>= \arr ->
> > arr <$ forM_ [2..n] (\i -> readArray arr i >>=
> > (`when` forM_ [2*i, 3*i .. n] (flip (writeArray arr)
> > False))))
>
> That's 7 lines of code, is still incomplete by design and doesn't even
> compile:

I posted five lines of code, not seven. It's not my fault if your
newsreader messes up my post. Also, I don't count imports, because I
could use fully qualified names just as well, but it would be braindead
to do so.
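
For the record, these are the imports my snippet assumes (nothing
exotic, just the standard base and array libraries shipped with GHC);
with them in scope, the five lines should compile as posted:

import Control.Applicative ((<$))
import Control.Monad (forM_, when)
import Data.Array.ST (newArray, readArray, writeArray, runSTUArray)
import Data.Array.Unboxed (assocs)
import System.Environment (getArgs)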


> > This is not merely the sieve.
>
> Indeed, it doesn't even implement the extensible sieve correctly.
>
> > It's a complete program,
>
> Then why doesn't it compile or work?
>
> > which expects sieve sizes as parameters,
>
> A complete program would not even require sieve sizes.

Thousands of implementations in history disagree with you.


> >>> Two unnecessary writeArrays are there just for efficiency. And it
> >>> gets near C speed.
> >>
> >> On my machine your Haskell is over 3x slower than your C.
> >
> > Haskell code is always orders of magnitude slower, as soon as you
> > compile it.
>
>> I suppose it's my fault your programs are so long and incomplete as
> well?

If I were to write a "complete" (in your sense) implementation, it would
be shorter than your F# code. But I don't feel like wasting my time
proving things I don't care about to people I don't care about. I've
already done much more than I wanted to.

Jon Harrop

Feb 24, 2010, 4:09:57 PM

The problem is that you have been confusing "the typed lambda calculus" with
the *simply* typed lambda calculus.

>> >>> See my other post. Haskell sieve of Eratosthenes in 7 lines of
>> >>> code.
>> >>
>> >> No. Your incomplete Haskell solution is already 15 lines of code.
>> >
>> > My solution is not incomplete
>>
>> Your solution does not extend its sieve => it is incomplete.
>
> It is complete, because we were talking about the sieve of Eratosthenes.

No, we were talking specifically about my extensible implementation of it
here:

http://groups.google.com/group/comp.lang.functional/msg/3659ea9adaa46653?hl=en

You can tell that by the way everyone else posted solutions equivalent to
that one.

> The original description does not include extension.

Yes, it did. I know that because I posted the original description.

> If it did, I would
> have written a growing sieve and it wouldn't have 4690 bytes. Just
> because someone else fails to do it proper doesn't mean it's not
> possible.

Nobody claimed it was impossible to implement in Haskell. You claimed it
would be "easy" to write and "fast" to run and then failed to produce any
correct or performant solutions in Haskell.

>> > and I could remove what would be "bloat" in your terminology: type
>> > signatures, convenience functions and imports. But I find the code
>> > more readable the way I've written it. The core sieve can be
>> > implemented in only three lines. With combinators and less syntactic
>> > sugar I can even squeeze it further:
>> >
>> > main = fmap (map read) getArgs >>=
>> > mapM_ (\n -> print . last . map fst . filter snd . assocs .
>> > runSTUArray $
>> > newArray (2,n) True >>= \arr ->
>> > arr <$ forM_ [2..n] (\i -> readArray arr i >>=
>> > (`when` forM_ [2*i, 3*i .. n] (flip (writeArray arr)
>> > False))))
>>
>> That's 7 lines of code, is still incomplete by design and doesn't even
>> compile:
>
> I posted five lines of code, not seven. Not my fault, if your
> newsreader messes up my post.

But it is your fault that your code overran 80 char lines. You simply put
more code on fewer lines and switched from counting chars to counting lines
in order to substantiate the pro-Haskell conclusion you started out with.

You'll notice that my original F# solution was properly formatted and its
longest line was only 43 chars.

> Also I don't count imports, because...

because you want to conclude that Haskell is concise.

>> > This is not merely the sieve.
>>
>> Indeed, it doesn't even implement the extensible sieve correctly.
>>
>> > It's a complete program,
>>
>> Then why doesn't it compile or work?
>>
>> > which expects sieve sizes as parameters,
>>
>> A complete program would not even require sieve sizes.
>
> Thousands of implementations in history disagree with you.

Three other people responded with competing implementations to the challenge
I set. Only you got it wrong.

>> >>> Two unnecessary writeArrays are there just for efficiency. And it
>> >>> gets near C speed.
>> >>
>> >> On my machine your Haskell is over 3x slower than your C.
>> >
>> > Haskell code is always orders of magnitude slower, as soon as you
>> > compile it.
>>
>> I suppose it's my fault your programs are so long and incomplete as
>> well?
>
> If I were to write a "complete" (in your sense) implementation, it would
> be shorter than your F# code...

Wow.

Andreas Rossberg

Feb 24, 2010, 6:22:09 PM
I can't refrain from a couple of comments on Jon's agitation, though
I'm sure I'm going to regret posting them...


> [...] Look at any
> production Haskell code and you will find the same thing. In Ertugral's
> original Haskell code on this thread, these unnecessary type annotations
> *doubled* the size of his code. That is not uncommon.

That is a wild claim, and I doubt that you have any serious statistics
to back it up.

Moreover, if the declaration-to-definition ratio is really higher in
Haskell code than in ML, then that probably says more about the
high-levelness of Haskell code than about alleged weaknesses of its
type inference.


> > As Ertugrul has just said, the type annotations are **completely
> > optional**; the type system will always infer the right most general
> > type --except with certain odd extensions and in the monomorphic
> > restriction case.
>
> You are describing a property of Hindley-Milner yet other languages based on
> HM are not afflicted by these type annotations. How do you explain that?

Haskell's monomorphism restriction is a relaxed version of ML's value
restriction. And while Haskell at least allows you to get around it
with a type annotation, there is no way to do so in ML. Are you truly
arguing that that is an advantage and makes ML type inference "more
powerful"?

Generally speaking, annotations are a moot point anyway. It has long
been acknowledged by most experts that complete type inference is a
dead end. As a trade-off for increased expressiveness, future type
systems will almost certainly require some amount of annotation. In
fact, they already do in all existing languages, including all MLs.


> > Why are then the type annotations widely used? Because they help the
> > programmer. [...]
>
> You are just repeating the same bullshit "explanation" that Ertugrul posted.
> If that really were the motivation for bloating every definition with a
> type annotation in production code you would see the same effect in all
> FPLs but you do not. Why?

A reasonable ML programmer will typically put a similar amount of type
declarations into module signatures. The fact that you never do that
probably says more about your approach to software engineering than
about the alleged verbosity of ML vs Haskell.


[in another subthread:]

> The following traditional implementation of the y-combinator is correct
> code:
> fix f = (\x -> f (x x)) (\x -> f (x x))
> Yet the type system of standard Haskell fails on it.

And so does any of the MLs, except if you intentionally run OCaml with
rectypes on. Arbitrary recursive types are usually disallowed for very
good reasons that you should be aware of before using them as an
argument. Incidentally, these reasons have to do with too much power
and incomprehensible error messages.


> No, Haskell cannot express this because it doesn't have a higher order
> module system.

Interestingly, with the same verve, you have denied any need for
functors in previous discussions (where you happened to pitch F#,
which does not have them either).

Matthias

Feb 24, 2010, 9:16:48 PM
On Feb 24, 8:42 am, Ertugrul Söylemez <e...@ertes.de> wrote:

> I don't know OCaml's type system, but obviously it doesn't implement the
> typed lambda calculus exactly, at least not with the '-rectypes'
> argument you're passing to it.

You are talking about "the" typed lambda calculus as if there were
only one. There are many typed lambda calculi, some for which your
claim is true, and others for which your claim is false. In
particular, in a lambda-calculus with equi-recursive types the Y
combinator can be typed. OCaml with -rectypes happens to implement
such a lambda calculus.

(However, it is widely acknowledged that the behavior that you get
from -rectypes is not usually desirable for many practical reasons.
That's why the option is not on by default.)

Cheers,
Matthias

Jon Harrop

Feb 24, 2010, 10:38:42 PM
Andreas Rossberg wrote:
>> [...] Look at any
>> production Haskell code and you will find the same thing. In Ertugrul's
>> original Haskell code on this thread, these unnecessary type annotations
>> *doubled* the size of his code. That is not uncommon.
>
> That is a wild claim, and I doubt that you have any serious statistics
> to back it up.

I meant that type annotations often amount to a significant proportion of
Haskell code bases, not that they commonly amount to 50%.

> Moreover, if the declaration-to-definition ratio is really higher in
> Haskell code than in ML, then that probably says more about the
> high-levelness of Haskell code than about alleged weaknesses of its
> type inference.

What exactly do you mean by "high-levelness" there?

>> > As Ertugrul has just said, the type annotations are **completely
>> > optional**; the type system will always infer the right most general
>> > type --except with certain odd extensions and in the monomorphic
>> > restriction case.
>>
>> You are describing a property of Hindley-Milner yet other languages based
>> on HM are not afflicted by these type annotations. How do you explain
>> that?
>
> Haskell's monomorphism restriction is a relaxed version of ML's value
> restriction. And while Haskell at least allows you to get around it
> with a type annotation, there is no way to do so in ML. Are you truly
> arguing that that is an advantage and makes ML type inference "more
> powerful"?

No, I was not referring to the "monomorphic restriction" part.

My point was that there are trade-offs between weak and powerful type
systems in terms of usability. More powerful => incomprehensible errors =>
add more type annotations to keep it usable.

> Generally speaking, annotations are a moot point anyway. It has long
> been acknowledged by most experts that complete type inference is a
> dead end. As a trade-off for increased expressiveness, future type
> systems will almost certainly require some amount of annotation.

Sure.

> In fact, they already do in all existing languages, including all MLs.

What problem requires a solution using type annotations in OCaml?

>> > Why are then the type annotations widely used? Because they help the
>> > programmer. [...]
>>
>> You are just repeating the same bullshit "explanation" that Ertugrul
>> posted. If that really were the motivation for bloating every definition
>> with a type annotation in production code you would see the same effect
>> in all FPLs but you do not. Why?
>
> A reasonable ML programmer will typically put a similar amount of type
> declarations into module signatures. The fact that you never do that
> probably says more about your approach to software engineering than about
> the alleged verbosity of ML vs Haskell.

Firstly, Standard ML programmers might put that density of annotations in
their code but OCaml programmers certainly do not. Look at the examples I
already cited.

Secondly, I said I don't annotate every definition not that I never annotate
any definition. Look at my open source code.

Thirdly, why the strawman-powered ad-hominem?

> [in another subthread:]
>
>> The following traditional implementation of the y-combinator is correct
>> code:
>> fix f = (\x -> f (x x)) (\x -> f (x x))
>> Yet the type system of standard Haskell fails on it.
>
> And so does any of the MLs, except if you intentionally run OCaml with
> rectypes on. Arbitrary recursive types are usually disallowed for very
> good reasons that you should be aware of before using them as an
> argument. Incidentally, these reasons have to do with too much power
> and incomprehensible error messages.

Sure. And because the workaround is generally easy:

# let fix f =
    (fun (`X x) -> f(x (`X x))) (`X(fun (`X x) y -> f(x (`X x)) y));;
val fix : (('a -> 'b) -> 'a -> 'b) -> 'a -> 'b = <fun>

>> No, Haskell cannot express this because it doesn't have a higher order
>> module system.
>
> Interestingly, with the same verve, you have denied any need for
> functors in previous discussions (where you happened to pitch F#,
> which does not have them either).

There are alternatives to a higher-order module system. Haskell doesn't have
them either.

frankenstein

Feb 25, 2010, 11:20:56 AM
On Feb 23, 4:55 pm, frankenstein <klohmusc...@yahoo.de> wrote:

Update: using strict SRFI-4 vectors (s8 and u32, respectively) results
in timings nearly on par with Jon's C++ version. The code structure,
though, is still the same:


$ bigloo -Obench sieve.scm
$ time ./a.out

==
10466504
prime 179424691
#<output_port:stdout>

real 0m17.240s
user 0m15.086s
sys 0m0.640s
==

==
(module Jon)

(define (primes-u32::obj n::int)
  (let ([a::u32vector (u32vector 2)])
    (define (grow a)
      (let* ([p0::int (+fx (u32vector-ref a
                             (-fx (u32vector-length a) 1)) 1)]
             [b::s8vector (make-s8vector p0 1)]
             [dim::int p0]
             [dim-a::int (u32vector-length a)])
        (do ([jj 0 (+fx jj 1)])
            ([=fx jj dim-a])
          (let* ([di::int (u32vector-ref a jj)]
                 [i0::int (*fx (/fx p0 di) di)])
            (do ([j (if (<fx i0 p0)
                        (-fx (+fx i0 di) p0)
                        (-fx i0 p0))
                    (+fx j di)])
                ([>fx j (-fx dim 1)])
              (s8vector-set! b j 0))))
        (acc-u32 0 p0 b (u32vector->list a))))
    (let while ((a a))
      (cond ([>= n (u32vector-length a)]
             (print "Doing n " (u32vector-length a))
             (while (grow a)))
            (else
             (print (u32vector-length a))
             (print "prime " (u32vector-ref a n)))))))

(define-inline (acc-u32::u32vector i::bint p0::bint b::s8vector
                                   a::pair)
  (cond ((=fx i (s8vector-length b))
         (list->u32vector (reverse a)))
        ((=fx (s8vector-ref b i) 1)
         (acc-u32 (+fx i 1) p0 b (cons (+fx i p0) a)))
        (else
         (acc-u32 (+fx i 1) p0 b a))))

(print (primes-u32 10000000))
==

Ertugrul Söylemez

Feb 25, 2010, 11:22:14 AM
Jon Harrop <j...@ffconsultancy.com> wrote:

> >> No, Haskell cannot express this because it doesn't have a higher
> >> order module system.
> >
> > Interestingly, with the same verve, you have denied any need for
> > functors in previous discussions (where you happened to pitch F#,
> > which does not have them either).
>
> There are alternatives to a higher-order module system. Haskell
> doesn't have them either.

It has type classes, which cover 90% of use cases in standard Haskell.
If they don't cover your use case, there are numerous extensions that
will.
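
A minimal sketch of the correspondence (the class, instance and
function names are mine): the class plays the role of an ML signature,
instances play the role of structures, and code written against the
class is what you would otherwise put in a functor:

class Graph g where
  vertices   :: g -> [Int]
  neighbours :: g -> Int -> [Int]

-- written once against the interface
degreeSum :: Graph g => g -> Int
degreeSum g = sum [length (neighbours g v) | v <- vertices g]

-- one possible representation
newtype AdjList = AdjList [(Int, [Int])]

instance Graph AdjList where
  vertices   (AdjList xs)   = map fst xs
  neighbours (AdjList xs) v = maybe [] id (lookup v xs)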

Now be a mature boy and don't try to make up a difficult use case,
forcing us to prove things with actual code. I don't feel like wasting
my time anymore. Haskell works great for me for all of my use cases,
much better than OCaml ever will.
