[Caml-list] Hash clash in polymorphic variants

Jon Harrop

Jan 10, 2008, 12:17:17 PM

to caml...@yquem.inria.fr

ISTR advice that constructors sharing the first few characters should be
avoided in order to reduce the likelihood of clashing hash values for
polymorphic variants. Is that right?

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e

_______________________________________________
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs

Eric Cooper

Jan 10, 2008, 3:36:01 PM

to caml...@yquem.inria.fr

On Thu, Jan 10, 2008 at 05:09:13PM +0000, Jon Harrop wrote:
> ISTR advice that constructors sharing the first few characters should be
> avoided in order to reduce the likelihood of clashing hash values for
> polymorphic variants. Is that right?

I don't think it's worth worrying about.

I wrote a program a while ago to look into this. I never saw any
"human-sensible" collisions (between two identifiers that a person
might have chosen). And if you're producing gensyms in a program, you
can just check ahead of time.

To find a collision with a given identifier, consider each bignum N
that differs by a multiple of 2^31 from the identifier's hash value.
Compute the radix-223 representation of N. If that forms a legal
OCaml identifier, then you've found a collision.

For example, Eric_Cooper collides with azdwbie, c7diagq, hlChrkt,
NSaServ, and SaupDOF, to pick just a few.
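
The search described above can be sketched in OCaml. This is only an illustration, not Eric's program: `hash_variant` follows the function of that name as I understand it from the compiler sources (typing/btype.ml), while `decode` and `collisions` are hypothetical helpers written for this example. It assumes a 64-bit OCaml, so that the candidate values N fit in a native int and no bignum library is needed for identifiers of up to 7 characters.

```ocaml
(* Tag hash as I understand it from the compiler sources (hash_variant in
   typing/btype.ml): a radix-223 polynomial reduced to 31 bits. *)
let hash_variant s =
  let accu = ref 0 in
  for i = 0 to String.length s - 1 do
    accu := (223 * !accu) + Char.code s.[i]
  done;
  (* keep the low 31 bits *)
  accu := !accu land ((1 lsl 31) - 1);
  (* make the result signed so 32-bit and 64-bit systems agree *)
  if !accu > 0x3FFFFFFF then !accu - (1 lsl 31) else !accu

let is_ident_char = function
  | 'A' .. 'Z' | 'a' .. 'z' | '0' .. '9' | '_' | '\'' -> true
  | _ -> false

let is_letter = function 'A' .. 'Z' | 'a' .. 'z' -> true | _ -> false

(* Radix-223 digits of n read as characters, most significant first.
   Returns None as soon as a digit is not an identifier character. *)
let decode n =
  let rec go n acc =
    if n = 0 then Some acc
    else
      let c = Char.chr (n mod 223) in
      if is_ident_char c then go (n / 223) (String.make 1 c ^ acc) else None
  in
  go n ""

(* Hypothetical helper: try N = hash + k * 2^31 for k = 0 .. max_k.
   Assumes 63-bit native ints (a 64-bit OCaml). *)
let collisions ~max_k name =
  let base = hash_variant name land ((1 lsl 31) - 1) in
  let found = ref [] in
  for k = 0 to max_k do
    match decode (base + (k * (1 lsl 31))) with
    | Some s when s <> "" && is_letter s.[0] && s <> name ->
        found := s :: !found
    | _ -> ()
  done;
  List.rev !found

let () =
  let found = collisions ~max_k:6_000_000 "Eric_Cooper" in
  Printf.printf "%d candidates\n" (List.length found);
  List.iter
    (fun s -> Printf.printf "%s: %b\n" s (List.mem s found))
    [ "azdwbie"; "c7diagq"; "hlChrkt" ]
```

The bound on k is arbitrary; it is just large enough to reach 7-character identifiers starting with a lowercase letter early in the alphabet.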

--
Eric (call me SaupDOF) Cooper e c c @ c m u . e d u

Jon Harrop

Jan 10, 2008, 4:32:30 PM

to caml...@yquem.inria.fr

On Thursday 10 January 2008 20:35:34 Eric Cooper wrote:
> On Thu, Jan 10, 2008 at 05:09:13PM +0000, Jon Harrop wrote:
> > ISTR advice that constructors sharing the first few characters should be
> > avoided in order to reduce the likelihood of clashing hash values for
> > polymorphic variants. Is that right?
>
> I don't think it's worth worrying about.
>
> I wrote a program a while ago to look into this. I never saw any
> "human-sensible" collisions (between two identifiers that a person
> might have chosen). And if you're producing gensyms in a program, you
> can just check ahead of time.

I'm interested in automatically translating the GL_* enum from OpenGL into
polymorphic variants. So although it is generated code I have little control
over it, e.g. I cannot change the translation as OpenGL gets extended because
code will already be using the existing names.

Still, maybe I'm over-reacting. ;-)

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e

David Allsopp

Jan 10, 2008, 4:41:37 PM

to caml...@yquem.inria.fr

Jon Harrop wrote:
> On Thursday 10 January 2008 20:35:34 Eric Cooper wrote:
> > On Thu, Jan 10, 2008 at 05:09:13PM +0000, Jon Harrop wrote:
> > > ISTR advice that constructors sharing the first few characters should
> > > be avoided in order to reduce the likelihood of clashing hash values
> > > for polymorphic variants. Is that right?
> >
> > I don't think it's worth worrying about.
> >
> > I wrote a program a while ago to look into this. I never saw any
> > "human-sensible" collisions (between two identifiers that a person
> > might have chosen). And if you're producing gensyms in a program, you
> > can just check ahead of time.
>
> I'm interested in automatically translating the GL_* enum from OpenGL into
> polymorphic variants. So although it is generated code I have little
> control over it, e.g. I cannot change the translation as OpenGL gets
> extended because code will already be using the existing names.
>
> Still, maybe I'm over-reacting. ;-)

I presume you're worried about the bindings clashing internally rather than
someone who uses the library happening to use a variant that clashes?

You can do something about it - when you're generating your bindings, you
can use the hash_variant() C function to detect the collisions yourself. If
you detect one, you can either issue *your own* warning while generating the
bindings allowing you to specify specific renaming for the program
generating your bindings or you could append digits to the names until the
collisions disappear (which is likely, though not guaranteed, to happen
quickly).
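
A minimal sketch of such a check inside a binding generator, written in OCaml rather than against the C function. `hash_variant` is re-implemented here as I understand it from the compiler sources, `find_clashes` is a hypothetical helper, and the tag names in the usage line are purely illustrative.

```ocaml
(* Same hash as the compiler's hash_variant, as far as I know. *)
let hash_variant s =
  let accu = ref 0 in
  for i = 0 to String.length s - 1 do
    accu := (223 * !accu) + Char.code s.[i]
  done;
  accu := !accu land ((1 lsl 31) - 1);
  if !accu > 0x3FFFFFFF then !accu - (1 lsl 31) else !accu

(* Group tag names by hash and return the groups with more than one member,
   each as (hash, names in input order). *)
let find_clashes names =
  let tbl = Hashtbl.create 64 in
  List.iter
    (fun name ->
      let h = hash_variant name in
      let prev = Option.value (Hashtbl.find_opt tbl h) ~default:[] in
      if not (List.mem name prev) then Hashtbl.replace tbl h (name :: prev))
    names;
  Hashtbl.fold
    (fun h group acc ->
      if List.length group > 1 then (h, List.rev group) :: acc else acc)
    tbl []

let () =
  find_clashes [ "GL_POINTS"; "GL_LINES"; "Eric_Cooper"; "azdwbie" ]
  |> List.iter (fun (h, group) ->
         Printf.printf "hash %d shared by: %s\n" h (String.concat ", " group))
```

A generator could run this over the full tag list and either stop with its own warning or rename the offending tags.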

It's slightly ugly, but then the possibility of collisions in the first
place is IMHO ugly too!

David

Jacques Garrigue

Jan 10, 2008, 7:15:58 PM

to j...@ffconsultancy.com, caml...@yquem.inria.fr

From: Jon Harrop <j...@ffconsultancy.com>

> ISTR advice that constructors sharing the first few characters should be
> avoided in order to reduce the likelihood of clashing hash values for
> polymorphic variants. Is that right?

Not at all. If the first characters are identical it just means that an
identical value will be added to the hashes of the suffixes, which
actually means that you lower the probability of getting conflicts :-)
The hash function guarantees that all keys of strictly fewer than 5
characters will map to different values.

The probability of getting clashes being really low, you should not be
concerned by this. Just check afterwards. A simple way to do it is to
produce a big type containing all the tags, and feed it to ocamlc.
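
One way to produce such a type, as a rough sketch: the function name `type_of_tags`, the type name `all_tags`, and the sample tags are all made up for this example. The output is meant to be written to a .ml file and compiled with ocamlc, which should then report any pair of clashing tags.

```ocaml
(* Build the source text of one closed polymorphic variant type that
   mentions every tag. Assumes a non-empty list of valid tag names. *)
let type_of_tags tags =
  "type all_tags =\n  [ "
  ^ String.concat "\n  | " (List.map (fun t -> "`" ^ t) tags)
  ^ " ]\n"

let () = print_string (type_of_tags [ "GL_POINTS"; "GL_LINES"; "GL_TRIANGLES" ])
```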

> I'm interested in automatically translating the GL_* enum from
> OpenGL into polymorphic variants. So although it is generated code I
> have little control over it, e.g. I cannot change the translation as
> OpenGL gets extended because code will already be using the existing
> names.

In the event you get a conflict when OpenGL is extended, you can still
add a special case for the newly added tags. I hope this does not
happen, but the birthday theorem tells you that when you get enough
participants, clashes are hard to avoid.

Cheers,

Jacques Garrigue

Kuba Ober

Jan 11, 2008, 8:30:57 AM

to caml...@yquem.inria.fr

> > > > ISTR advice that constructors sharing the first few characters should
> > > > be avoided in order to reduce the likelihood of clashing hash values
> > > > for polymorphic variants. Is that right?
> > >
> > > I don't think it's worth worrying about.
> >
> > I'm interested in automatically translating the GL_* enum from OpenGL
> > into polymorphic variants. So although it is generated code I have little
>
> I presume you're worried about the bindings clashing internally rather than
> someone who uses the library happening to use a variant that clashes?
>
> You can do something about it - when you're generating your bindings, you
> can use the hash_variant() C function to detect the collisions yourself. If
> you detect one, you can either issue *your own* warning while generating
> the bindings allowing you to specify specific renaming for the program
> generating your bindings or you could append digits to the names until the
> collisions disappear (which is likely, though not guaranteed, to happen
> quickly).
>
> It's slightly ugly, but then the possibility of collisions in the first
> place is IMHO ugly too!

Are those collisions of any real importance? I mean, do they break anything?
If all they do is imply linearly searching a list of a few elements, for the
colliding entry, then it's a non-issue?

Cheers, Kuba

Jon Harrop

Jan 11, 2008, 8:56:25 AM

to caml...@yquem.inria.fr

On Friday 11 January 2008 13:30:29 Kuba Ober wrote:
> Are those collisions of any real importance? I mean, do they break
> anything? If all they do is imply linearly searching a list of a few
> elements, for the colliding entry, then it's a non-issue?

It would prevent code from compiling so it would be a complete show-stopper.

In this case, there is a chance that a hash clash in names that I have no
control over would break my OpenGL bindings at some point in the future.

A theoretical solution would be to grow the bindings and avoid clashes in
identifiers included in later versions of OpenGL by adding random suffixes.
Although this works in theory, in practice it places the burden of a linear
search on the programmer who must then sift through the bindings to find out
if the identifier they want to use happens to have had an internal clash in
my bindings and, therefore, would require them to use a different identifier.

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e

Kuba Ober

Jan 11, 2008, 11:14:42 AM

to caml...@yquem.inria.fr

On Friday 11 January 2008, Jon Harrop wrote:
> On Friday 11 January 2008 13:30:29 Kuba Ober wrote:
> > Are those collisions of any real importance? I mean, do they break
> > anything? If all they do is imply linearly searching a list of a few
> > elements, for the colliding entry, then it's a non-issue?
>
> It would prevent code from compiling so it would be a complete
> show-stopper.

So what you're saying is that the implementation uses the hash with bucket
size of 1? That's kinda poor decision, methinks.

Maybe perfect hashes should be used, computed at link time (and at runtime
whenever a module is linked in). The perfect hashing function could probably
implement some sort of a table, so that no real code would need to be
generated, just recomputing of decision tree table. Gperf code could be
adapted for that. The benefit is that there would be no collisions, the hashed
data structure would be very compact, and the cost to regenerate the hash is
amortized. Ideally, one would generate the actual perfect hashing function,
but this is currently only possible in bytecode, right? I mean, toplevel won't
run in native code? Or am I mistaken?

Kuba

David Allsopp

Jan 11, 2008, 1:41:27 PM

to caml...@yquem.inria.fr

Kuba Ober wrote:
> On Friday 11 January 2008, Jon Harrop wrote:
> > On Friday 11 January 2008 13:30:29 Kuba Ober wrote:
> > > Are those collisions of any real importance? I mean, do they break
> > > anything? If all they do is imply linearly searching a list of a few
> > > elements, for the colliding entry, then it's a non-issue?
> >
> > It would prevent code from compiling so it would be a complete
> > show-stopper.
>
> So what you're saying is that the implementation uses the hash with bucket
> size of 1? That's kinda poor decision, methinks.

I think you're missing the context - there's no hash table. See 18.3.6 in
the manual - the hashed values (and resulting collisions) are to do with the
internal representation of polymorphic variants.

The compiler cannot process code that uses two polymorphic variants whose
tag names will have the same internal representation (and therefore be
incorrectly viewed as having the same value). The test is probably performed
somewhere in the type checker...

An alternative implementation might have been to lookup the tags (in a
perfect hash table) using a system similar to caml_named_value but I imagine
that the present method was preferred because it's simpler (and quite
possibly faster) and collisions are rare (as Eric pointed out) - although in
Jon's case the lack of a guarantee is unfortunate.

Incidentally, and off-the-subject here, using a hash table with a bucket
size of 1 is very important if you need performance guarantees on your hash
table and have some other way of coping with collisions.

David

Kuba Ober

Jan 14, 2008, 7:20:24 AM

to caml...@yquem.inria.fr

On Friday 11 January 2008, David Allsopp wrote:
> Kuba Ober wrote:
> > On Friday 11 January 2008, Jon Harrop wrote:
> > > On Friday 11 January 2008 13:30:29 Kuba Ober wrote:
> > > > Are those collisions of any real importance? I mean, do they break
> > > > anything? If all they do is imply linearly searching a list of a few
> > > > elements, for the colliding entry, then it's a non-issue?
> > >
> > > It would prevent code from compiling so it would be a complete
> > > show-stopper.
> >
> > So what you're saying is that the implementation uses the hash with
> > bucket size of 1? That's kinda poor decision, methinks.
>
> I think you're missing the context - there's no hash table. See 18.3.6 in
> the manual - the hashed values (and resulting collisions) are to do with
> the internal representation of polymorphic variants.
>
> The compiler cannot process code that uses two polymorphic variants whose
> tag names will have the same internal representation (and therefore be
> incorrectly viewed as having the same value). The test is probably
> performed somewhere in the type checker...

Yeah, I sort of put the wagon ahead of the horse. Of course the hashing
function doesn't imply a hash table.

What I meant was simply that instead of using some fixed hash function, one
could use a perfect hashing function which is optimal for its known set of
inputs, and won't ever generate a collision.

The tables that such a function uses to hash its input have to be generated at
link-time, which means run-time too.

Cheers, Kuba

Kuba Ober

Jan 14, 2008, 9:56:51 AM

to caml...@yquem.inria.fr

On Monday 14 January 2008, Stefan Monnier wrote:
> > What I meant was simply that instead of using some fixed hash function,
> > one could use a perfect hashing function which is optimal for its known
> > set of inputs, and won't ever generate a collision.
>
> The problem is that the set of inputs is not known at compile time, only
> at link time.

As I've said in the cited post, the perfect hash generator would have to be
invoked at link time, which shouldn't be a big deal.

David Allsopp

Jan 14, 2008, 10:38:32 AM

to caml...@yquem.inria.fr

Kuba Ober wrote:
> On Monday 14 January 2008, Stefan Monnier wrote:
> > > What I meant was simply that instead of using some fixed hash
> > > function, one could use a perfect hashing function which is optimal
> > > for its known set of inputs, and won't ever generate a collision.
> >
> > The problem is that the set of inputs is not known at compile time, only
> > at link time.
>
> As I've said in the cited post, the perfect hash generator would have to
> be invoked at link time, which shouldn't be a big deal.

Assuming you're talking hypothetically and designing a new runtime then,
yes, it's not a big deal.

However, this scheme could not just be dropped into the present system - it
would not work with dynamic linking because once you've hashed a polymorphic
variant tag-name you drop the name so you can't re-hash when you update your
perfect hashing function... unless you can devise a perfect hashing scheme
that hashes all the old keys to their old values and new ones to
non-clashing new values ;o)

Internally, `Foo is indistinguishable from the int 3505894* - so if
caml_hash_variant("Foo") suddenly changes value mid-program then any
previous instances of `Foo in memory cease to be equal to it!

David

* Try:
# (Obj.magic `Foo : int);;
- : int = 3505894
# (Obj.magic 3505894) = `Foo;;
- : bool = true

I don't know whether caml_hash_variant varies between versions or even
platforms, so the actual number may be different on other systems.

Kuba Ober

Jan 14, 2008, 10:44:35 AM

to caml...@yquem.inria.fr

On Monday 14 January 2008, David Allsopp wrote:
> Kuba Ober wrote:
> > On Monday 14 January 2008, Stefan Monnier wrote:
> > > > What I meant was simply that instead of using some fixed hash
> > > > function, one could use a perfect hashing function which is optimal
> > > > for its known set of inputs, and won't ever generate a collision.
> > >
> > > The problem is that the set of inputs is not known at compile time, only
> > > at link time.
> >
> > As I've said in the cited post, the perfect hash generator would have to
> > be invoked at link time, which shouldn't be a big deal.
>
> Assuming you're talking hypothetically and designing a new runtime then,
> yes, it's not a big deal.
>
> However, this scheme could not just be dropped into the present system - it
> would not work with dynamic linking because once you've hashed a
> polymorphic variant tag-name you drop the name so you can't re-hash when
> you update your perfect hashing function...

A trivial solution to that is to keep both, as obviously each time an
equivalent of dlopen() is made, everything has to be rehashed. gperf
is "slightly" memory-hungry, so surely it'd need to be something using a
different algorithm. I'm talking hypothetically, but I also think it's a
weird design decision to use those possibly-colliding hashes. String
sorting/comparison isn't exactly a CPU killer, so couldn't the original names
have been used instead? I admit not to knowing too many details of the
current implementation of course ;(

Cheers, Kuba

Stefan Monnier

Jan 14, 2008, 10:45:54 AM

to caml...@inria.fr

>> > What I meant was simply that instead of using some fixed hash function,
>> > one could use a perfect hashing function which is optimal for its known
>> > set of inputs, and won't ever generate a collision.
>>
>> The problem is that the set of inputs is not known at compile time, only
>> at link time.

> As I've said in the cited post, the perfect hash generator would have to be
> invoked at link time, which shouldn't be a big deal.

That would require postponing the execution of the hash-function to
link-time or run-time. Run-time is clearly undesirable, and link-time
adds yet-more complexity to the linker.

It's not a bad idea, obviously, but AFAICT the linker currently is kept
very simple.

Stefan

David Allsopp

Jan 14, 2008, 11:04:02 AM

to caml...@yquem.inria.fr

Kuba Ober wrote:
> A trivial solution to that is to keep both, as obviously each time an
> equivalent of dlopen() is made, everything has to be rehashed. gperf
> is "slightly" memory-hungry, so surely it'd need to be something using a
> different algorithm. I'm talking hypothetically, but I also think it's a
> weird design decision to use those possibly-colliding hashes.

I agree that it's a bit weird - but the clashes are very rare (and the
function was designed to keep them rare for "normal" usage).

> String sorting/comparison isn't exactly a CPU killer, so couldn't the
> original names have been used instead?

String comparison is much slower than integer comparison... we're talking
about one CPU instruction compared to a for loop! Jon would never use them
again :o) Not to mention the storage overhead of keeping the tag names in
memory - not great if you've got long lists of `YetAnotherTag.

David

Jon Harrop

Jan 14, 2008, 12:22:12 PM

to caml...@yquem.inria.fr

On Monday 14 January 2008 14:44:58 Stefan Monnier wrote:
> > What I meant was simply that instead of using some fixed hash function,
> > one could use a perfect hashing function which is optimal for its known
> > set of inputs, and won't ever generate a collision.
>
> The problem is that the set of inputs is not known at compile time, only
> at link time.

Yes. I think this is another case where OCaml would really benefit from a
symbol table and this is something else that seems much easier to do with JIT
compilation.

Also, what happens if you try to dynamically load two libraries that use
polymorphic variants that clash?

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e

Alain Frisch

Jan 14, 2008, 12:36:42 PM

to Jon Harrop, caml-list

Jon Harrop wrote:
> Also, what happens if you try to dynamically load two libraries that use
> polymorphic variants that clash?

AFAIK, this is ok. The problematic clashes can always be detected at
type-checking time.

-- Alain

Jacques Garrigue

Jan 14, 2008, 10:37:05 PM

to obe...@osu.edu, caml...@yquem.inria.fr

From: Kuba Ober <obe...@osu.edu>

> On Monday 14 January 2008, Stefan Monnier wrote:
> > > What I meant was simply that instead of using some fixed hash function,
> > > one could use a perfect hashing function which is optimal for its known
> > > set of inputs, and won't ever generate a collision.
> >
> > The problem is that the set of inputs is not known at compile time, only
> > at link time.
>
> As I've said in the cited post, the perfect hash generator would have to be
> invoked at link time, which shouldn't be a big deal.

Unfortunately, this would make marshalling between different programs
much more complicated...

Another advantage of knowing the hash function at compile time is
that you can generate efficient code for pattern matching. Since you
already know the ordering of tags, it is easy to generate a decision
tree. I didn't check very recently about efficiency for polymorphic
variants, but the depth of the decision tree is logarithmic in the
number of tags involved in the pattern matching, and if you can keep
it below 3 or 4 (about 10 tags) you can be actually faster than a
jump table.
Another comparison is with the old implementation for method calls.
Originally ocaml used your idea for methods: method hashes were
generated at initialization time. The scheme for dispatch was a two
level array, compressed by reusing buckets so that you don't use too
much memory. This meant actually 3 array accesses for a method call.
The current scheme reuses variant hashes, and implements a simple
dichotomic search, together with an index cache for each call site.
This doesn't look very efficient, but on small method tables, the
search is almost as fast as the old approach, and if the cache hits
this is much faster...
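
To illustrate the dichotomic search mentioned above, here is a small OCaml model of a table sorted by hash. It is not the runtime's actual code: the per-call-site index cache is left out, `make_table` and `lookup` are names invented for this sketch, and `hash_variant` is re-implemented as I understand it from the compiler sources.

```ocaml
(* Same hash as the compiler's hash_variant, as far as I know. *)
let hash_variant s =
  let accu = ref 0 in
  for i = 0 to String.length s - 1 do
    accu := (223 * !accu) + Char.code s.[i]
  done;
  accu := !accu land ((1 lsl 31) - 1);
  if !accu > 0x3FFFFFFF then !accu - (1 lsl 31) else !accu

(* Build an array of (hash, value) pairs sorted by hash. *)
let make_table entries =
  let a =
    Array.of_list (List.map (fun (name, v) -> (hash_variant name, v)) entries)
  in
  Array.sort (fun (h1, _) (h2, _) -> compare h1 h2) a;
  a

(* Binary search on the hash: O(log n) integer comparisons. *)
let lookup table h =
  let rec go lo hi =
    if lo > hi then None
    else
      let mid = (lo + hi) / 2 in
      let k, v = table.(mid) in
      if h = k then Some v
      else if h < k then go lo (mid - 1)
      else go (mid + 1) hi
  in
  go 0 (Array.length table - 1)

let () =
  let table = make_table [ ("draw", 1); ("move", 2); ("hide", 3) ] in
  match lookup table (hash_variant "move") with
  | Some slot -> Printf.printf "move -> slot %d\n" slot
  | None -> print_endline "move not found"
```

Because the hashes are fixed at compile time, the same ordering can be baked into a decision tree for pattern matching instead of being searched at run time.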

Now concerning the risks of name conflicts. The main point of
polymorphic variants is that there is only a conflict if the two tags
appear in the same type. And logically the type should stay small.
If you want to put all GLenum's inside the same type, then you may
well end up with conflicts. But what LablGL shows is that in practice
only a small number of tags are used together. So if you can partition
your set of tags so that each type has at most 64 tags, then you get
a conflict probability of less than 1 per million for each type. This
seems safe enough. But if you have one type with 2000 tags, then the
probability is 1 per thousand. Not that much, but it can happen.
(p(n) is n*n / 2**32)
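
The two figures can be checked directly from that approximation. A tiny sketch, where the function name `clash_probability` is just for illustration:

```ocaml
(* Birthday-style estimate quoted above: p(n) = n*n / 2**32. *)
let clash_probability n =
  let n = float_of_int n in
  n *. n /. (2.0 ** 32.0)

let () =
  List.iter
    (fun n -> Printf.printf "p(%d) = %.3g\n" n (clash_probability n))
    [ 64; 2000 ]
```

For 64 tags this gives a little under one in a million, and for 2000 tags a little under one in a thousand, which matches the numbers in the paragraph above.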

Jacques Garrigue

Jon Harrop

Jan 15, 2008, 12:06:31 AM

to caml...@yquem.inria.fr

On Tuesday 15 January 2008 03:36:21 Jacques Garrigue wrote:
> Unfortunately, this would make marshalling between different programs
> much more complicated...

Do people marshal polymorphic variants between different programs?

> Another advantage of knowing the hash function at compile time is
> that you can generate efficient code for pattern matching. Since you
> already know the ordering of tags, it is easy to generate a decision
> tree. I didn't check very recently about efficiency for polymorphic
> variants, but the depth of the decision tree is logarithmic in the
> number of tags involved in the pattern matching, and if you can keep
> it below 3 or 4 (about 10 tags) you can be actually faster than a
> jump table.

For 3-16 tags on AMD64, jump tables (ordinary variants) are 2x slower than
decision trees (polymorphic variants) when branches are taken at random.
However, jump tables are consistently up to 2x faster when a single branch is
taken repeatedly. So caching jump tables is more effective at run-time
optimizing pattern matches over ordinary variants than branch prediction is
at optimizing decision trees for pattern matches over polymorphic variants.

So the advantage of a decision tree is probably insignificant on real code
because it will lie between these two extremes.

> Now concerning the risks of name conflicts. The main point of
> polymorphic variants is that there is only a conflict if the two tags
> appear in the same type. And logically the type should stay small.
> If you want to put all GLenum's inside the same type, then you may
> well end up with conflicts. But what LablGL shows is that in practice
> only a small number of tags are used together.

Can LablGL's design support OpenGL extensions?

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e

Jacques Garrigue

Jan 15, 2008, 4:02:09 AM

to j...@ffconsultancy.com, caml...@yquem.inria.fr

From: Jon Harrop <j...@ffconsultancy.com>

> On Tuesday 15 January 2008 03:36:21 Jacques Garrigue wrote:
> > Unfortunately, this would make marshalling between different programs
> > much more complicated...
>
> Do people marshal polymorphic variants between different programs?

Do people marshal data between different programs (or different
versions of the same program)?

> For 3-16 tags on AMD64, jump tables (ordinary variants) are 2x slower than
> decision trees (polymorphic variants) when branches are taken at random.
> However, jump tables are consistently up to 2x faster when a single branch is
> taken repeatedly. So caching jump tables is more effective at run-time
> optimizing pattern matches over ordinary variants than branch prediction is
> at optimizing decision trees for pattern matches over polymorphic variants.
>
> So the advantage of a decision tree is probably insignificant on real code
> because it will lie between these two extremes.

Since the goal was never to be faster than ordinary variants, but just
obtain comparable speed, this seems good :-)

> > Now concerning the risks of name conflicts. The main point of
> > polymorphic variants is that there is only a conflict if the two tags
> > appear in the same type. And logically the type should stay small.
> > If you want to put all GLenum's inside the same type, then you may
> > well end up with conflicts. But what LablGL shows is that in practice
> > only a small number of tags are used together.
>
> Can LablGL's design support OpenGL extensions?

I'm not sure what this means.
Since LablGL was coded by hand, adding extensions would mean modifying
it.
One might want to add a way to detect whether an extension is
available or not, but making it static does not seem a good idea: one
wouldn't even be able to compile code using an extension that is not
available.
Also, one might want to make code generation automatic, particularly
for C wrappers, to allow adding cases to functions easily. This should
be doable, but there is no infrastructure for that currently
(using CPP macros was simpler to start with...)

Jacques Garrigue

Jon Harrop

Jan 15, 2008, 1:25:13 PM

to caml...@yquem.inria.fr

On Tuesday 15 January 2008 09:01:42 Jacques Garrigue wrote:
> From: Jon Harrop <j...@ffconsultancy.com>
> > On Tuesday 15 January 2008 03:36:21 Jacques Garrigue wrote:
> > > Unfortunately, this would make marshalling between different programs
> > > much more complicated...
> >
> > Do people marshal polymorphic variants between different programs?
>
> Do people marshal data between different programs (or different
> versions of the same program)?

I suspect OCaml's marshalling is used almost entirely between same versions of
the same programs.

In particular, I was advised against marshalling data between different
versions of the same program because this is unsafe (not just type safety but
the format used by Marshal is not ossified).

> > So the advantage of a decision tree is probably insignificant on real
> > code because it will lie between these two extremes.
>
> Since the goal was never to be faster than ordinary variants, but just
> obtain comparable speed, this seems good :-)

Yes. This would probably also work ok if you used a symbol table to store
exact identifier names rather than just a hash. The symbol's index in the
table would serve the same purpose as the hash.

> > > Now concerning the risks of name conflicts. The main point of
> > > polymorphic variants is that there is only a conflict if the two tags
> > > appear in the same type. And logically the type should stay small.
> > > If you want to put all GLenum's inside the same type, then you may
> > > well end up with conflicts. But what LablGL shows is that in practice
> > > only a small number of tags are used together.
> >
> > Can LablGL's design support OpenGL extensions?
>
> I'm not sure what this means.

OpenGL has an extension mechanism that can be queried at run-time. If a given
extension is available then you can do things that you could not do before,
such as pass a GLenum to a function that might not have accepted it without
the extension.

> Since LablGL was coded by hand, adding extensions would mean modifying
> it.

Exactly, that is a limitation of LablGL's design and, therefore, I think it
was quite wrong of you to claim "what LablGL shows is that in practice only a
small number of tags are used together" when LablGL's use of small, closed
sum types is actually a design limitation that would not be there if it
supported all of OpenGL, i.e. the extension mechanism.

Incidentally, Xavier made a statement based upon what appears to me to be a
similar logical error in the CUFP notes from last year that I read recently:

"On the other hand, certain features seem somewhat unsurprisingly to be
unimportant to industrial users. GUI toolkits are not an issue, because GUIs
tend to be built using more mainstream tools; it seems that different
competencies are involved in Caml and GUI development and companies "don't
want to squander their precious Caml expertise aligning pixels". Rich
libraries don't seem to matter in general; presumably companies are happy to
develop these in-house. And no-one wants yet another IDE; the applications of
interest are usually built using a variety of languages and tools anyway, so
consistency of development environment is a lost cause."
- http://cufp.galois.com/CUFP-2007-Report.pdf (page 3)

Xavier appears to have taken the biased sample of industrialists who already
use OCaml despite its limitations and has drawn the conclusion that these
limitations are not important to industrialists. I was really horrified to
see this because, in my experience, companies are turning away from OCaml in
droves because of exactly the limitations Xavier enumerated and I for one
would dearly love to see them fixed.

OCaml will continue to go from strength to strength regardless but its uptake
would be vastly faster if these problems were addressed. To take them point by
point:

GUIs are incredibly important (LablGTK is the world's favorite OCaml
library!) and tens of thousands of OCaml programmers are crying out for
proper LablGTK documentation as a first priority, many of whom are in
industry.

Rich libraries are incredibly important and OCaml has the potential to
become a hugely successful commercial platform where people can buy and sell
cross-platform libraries but OCaml needs support for shared run-time DLLs (or
something equivalent) before this can happen.

An easy-to-use IDE would be an excellent way to kick-start people learning
OCaml even if an industrial-strength IDE is intractable.

> Also, one might want to make code generation automatic, particularly
> for C wrappers, to allow adding cases to functions easily. This should
> be doable, but there is no infrastructure for that currently
> (using CPP macros was simpler to start with...)

Yes. A better FFI could also be enormously beneficial. Improving upon OCaml's
FFI is one of the most alluring aspects of a reimplementation on LLVM, IMHO.

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e

Gerd Stolpmann

Jan 15, 2008, 2:20:33 PM

to Jon Harrop, caml...@yquem.inria.fr

Jon Harrop wrote:
> Incidentally, Xavier made a statement based upon what appears to me to be a
> similar logical error in the CUFP notes from last year that I read recently:
>
> "On the other hand, certain features seem somewhat unsurprisingly to be
> unimportant to industrial users. GUI toolkits are not an issue, because GUIs
> tend to be built using more mainstream tools; it seems that different
> competencies are involved in Caml and GUI development and companies "don't
> want to squander their precious Caml expertise aligning pixels". Rich
> libraries don't seem to matter in general; presumably companies are happy to
> develop these in-house. And no-one wants yet another IDE; the applications of
> interest are usually built using a variety of languages and tools anyway, so
> consistency of development environment is a lost cause."
> - http://cufp.galois.com/CUFP-2007-Report.pdf (page 3)

An interesting thesis, right? Although I wouldn't go that far, there is
some truth in it. The point, IMHO, is that OCaml will never replace
other languages in the sense that a company who uses language X for
years in product Y rewrites the code in OCaml. For what reason? The
company would run into big educational problems (learning a new
environment), would have high initial costs, and it is questionable
whether the result is better. Of course, for rewriting existing software
the company would profit from GUIs, from rich libraries etc. But I think
this does not happen.

What I see, however, is that OCaml is used where new software is
developed, in ambitious projects that start from scratch. It is simply a
fact that GUIs are not crucial in these areas (at least for the
companies I know). GUIs are seen as standard tools where nothing new
happens where OCaml could shine. If you need one, you develop it in one
of the mainstream languages.

IDEs aren't interesting right now because OCaml is mainly used by
(computer & related) scientists (and I include scientists working for
companies outside academia). IDEs are nice for beginners and for people
who do not want to know what's happening inside. They are not
interesting for companies that invent completely new types of products,
because they've hired experts that can live without (and want to live
without).

> Xavier appears to have taken the biased sample of industrialists who already
> use OCaml despite its limitations and has drawn the conclusion that these
> limitations are not important to industrialists. I was really horrified to
> see this because, in my experience, companies are turning away from OCaml in
> droves because of exactly the limitations Xavier enumerated and I for one
> would dearly love to see them fixed.

Which companies?

I fully understand that OCaml is not well-suited for the average
company. But it is not because of missing GUIs and IDEs, but because the
language itself is too ambitious. Sorry to say that, but this is not the
mainstream and it will never be.

(I have a good friend who works for an average company, so I know what
I'm talking of. They program business apps for a commercial platform
from CA. A horrible language, but they can manage it. They are experts
for the models they use, and simply take a platform from industry.)

> OCaml will continue to go from strength to strength regardless but its uptake
> would be vastly faster if these problems are addressed. To take them point by
> point:
>

> . GUIs are incredibly important (LablGTK is the world's favorite OCaml
> library!) and tens of thousands of OCaml programmers are crying out for
> proper LablGTK documentation as a first priority, many of whom are in
> industry.

See this as opportunity for your next book :-)

GTK is already poorly documented, so this is not only the problem of the
LablGTK creators. Nevertheless, GTK is widely used. I don't think it's a
real problem.

> . Rich libraries are incredibly important and OCaml has the potential to
> become a hugely successful commercial platform where people can buy and sell
> cross-platform libraries but OCaml needs support for shared run-time DLLs (or
> something equivalent) before this can happen.

Do you dream or what?

I don't think that selling libraries in binary form is that important...
It is difficult anyway to do that, and why do you expect you could be
successful in a niche language? As a customer I would demand to get the
source code - to lower the risks of the investment into a small
platform.

> . An easy-to-use IDE would be an excellent way to kick-start people learning
> OCaml even if an industrial-strength IDE is intractable.
>
> > Also, one might want to make code generation automatic, particularly
> > for C wrappers, to allow adding cases to functions easily. This should
> > be doable, but there is no infrastructure for that currently
> > (using CPP macros was simpler to start with...)
>
> Yes. A better FFI could also be enormously beneficial. Improving upon OCaml's
> FFI is one of the most alluring aspects of a reimplementation on LLVM, IMHO.

A general question to you: When you are complaining about so many
aspects of OCaml, why don't you invest time & money to fix them? We
would all be very thankful.

Gerd
--
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany
ge...@gerd-stolpmann.de http://www.gerd-stolpmann.de
Phone: +49-6151-153855 Fax: +49-6151-997714
------------------------------------------------------------

Jon Harrop

Jan 15, 2008, 5:12:16 PM

to Gerd Stolpmann, caml...@yquem.inria.fr

I believe many more companies would migrate to OCaml if it had well-documented
GUI APIs and rich libraries. Indeed, Microsoft are gambling on people
migrating to F# in exactly the same way.

> What I see, however, is that OCaml is used where new software is
> developed, in ambitious projects that start from scratch. It is simply a
> fact that GUIs are not crucial in these areas (at least for the
> companies I know).

But the companies you know were already self-selected to be the ones who do
not care about OCaml's limitations, so it is a biased sample?

> GUIs are seen as standard tools where nothing new happens where OCaml could
> shine.

I have no doubt that OCaml would shine in GUIs just as it does elsewhere.

> If you need one, you develop it in one of the mainstream languages.

Actually I would either choose F# on Windows or give up on any other OS.

> IDEs aren't interesting right now because OCaml is mainly used by
> (computer & related) scientists (and I include scientists working for
> companies outside academia).

Many of the world's most sophisticated IDEs are targeted solely at technical
users. Look at Mathematica's notebook interface, for example. I believe that
is a great example to aspire to.

> IDEs are nice for beginners and for people
> who do not want to know what's happening inside. They are not
> interesting for companies that invent completely new types of products,
> because they've hired experts that can live without (and want to live
> without).

I couldn't disagree more. Pharmaceuticals are a trillion dollar industry where
many scientists would benefit enormously from being able to use a tool like
OCaml without knowing anything about how it works in order to create their
next generation products (drugs). The same is true of most industries where
scientists and engineers work, and there are many such industries and they
are extremely profitable.

> > Xavier appears to have taken the biased sample of industrialists who
> > already use OCaml despite its limitations and has drawn the conclusion
> > that these limitations are not important to industrialists. I was really
> > horrified to see this because, in my experience, companies are turning
> > away from OCaml in droves because of exactly the limitations Xavier
> > enumerated and I for one would dearly love to see them fixed.
>
> Which companies?

General Electric, Microsoft, Wolfram Research and various bioinformatics
institutes for example.

Look at General Electric. They build some of the world's most sophisticated
medical scanners and that large-scale embedded market is ideal for using
languages like OCaml for its high-performance numerics because you have
complete control over the environment. However, they desperately need GUI
toolkits to provide a front-end for users.

I'd like to know what Alex Barretta makes of this, for example. His glass
cutters must have the same characteristics in this respect...

> I fully understand that OCaml is not well-suited for the average
> company. But it is not because of missing GUIs and IDEs, but because the
> language itself is too ambitious. Sorry to say that, but this is not the
> mainstream and it will never be.

I still think OCaml has the best chance of any FPL to become a mainstream tool
in technical computing.

Indeed, I recently tried to quantify how far OCaml has already come and I
believe it is already as popular as C# among technical users, for example.
That is quite an achievement!

> (I have a good friend who works for an average company, so I know what
> I'm talking of. They program business apps for a commercial platform
> from CA. A horrible language, but they can manage it. They are experts
> for the models they use, and simply take a platform from industry.)

Yes. I do not believe OCaml will make significant inroads into displacing
COBOL and relatives but there are a lot of other big opportunities out there
for such a language.

> > OCaml will continue to go from strength to strength regardless but its
> > uptake would be vastly faster if these problems are addressed. To take
> > them point by point:
> >
> > . GUIs are incredibly important (LablGTK is the world's favorite OCaml
> > library!) and tens of thousands of OCaml programmers are crying out for
> > proper LablGTK documentation as a first priority, many of whom are in
> > industry.
>
> See this as opportunity for your next book :-)

Indeed. Even after the announcement that Microsoft are productizing F#, OCaml
for Scientists continues to be our biggest earning product. Consequently, I
am very tempted to write a "sequel" that covers many of the important aspects
of the language that I did not cover in the original, including GUI
programming, XML, parallelism and so forth. If anyone has ideas for subjects
they would like to see covered, please e-mail me!

> GTK is already poorly documented, so this is not only the problem of the
> LablGTK creators. Nevertheless, GTK is widely used. I don't think it's a
> real problem.

Yes. I'm really not sure what the best course of action would be here. Would
Qt bindings be preferable? Is it worth the hassle? How long would it be
before they reached the maturity of GTK?

I think we would really need more high-profile open source programs with
hundreds of thousands of users testing the bindings (as GTK has had) before
you could really gamble on it.

> > . Rich libraries are incredibly important and OCaml has the potential to
> > become a hugely successful commercial platform where people can buy and
> > sell cross-platform libraries but OCaml needs support for shared run-time
> > DLLs (or something equivalent) before this can happen.
>
> Do you dream or what?

One man's reality is another man's dream. :-)

> I don't think that selling libraries in binary form is that important...

If it were possible then it would be important to me because I could earn a
living from it. I'm sure the same is true for many other people.

> It is difficult anyway to do that, and why do you expect you could be
> successful in a niche language?

Because I already am. :-)

> As a customer I would demand to get the source code - to lower the risks of
> the investment into a small platform.

Nobody ever got fired for buying IBM.

Historically, we've made a lot more money from sales of binaries than from
sales of source code. Consequently, I would be more than willing to gamble on
selling shared run-time DLLs for OCaml users if it were possible.

> > Yes. A better FFI could also be enormously beneficial. Improving upon
> > OCaml's FFI is one of the most alluring aspects of a reimplementation on
> > LLVM, IMHO.
>
> A general question to you: When you are complaining about so many
> aspects of OCaml, why don't you invest time & money to fix them?

An excellent idea!

So I wrote to Xavier Leroy and asked about contributing to INRIA's OCaml
distribution. Xavier explained that French copyright law makes it
prohibitively difficult for him to include my code contributions so this will
never be possible. The best I could think of was to suggest that they make it
possible for users to pay to get certain bugs fixed or functionality
implemented. I'm not sure that will happen though.

I wrote to Pierre Weis and asked what the likelihood of getting some tweaks
into the language was. He said that it is unlikely I could even get a "try ..
finally" construct put in.

So there's no way I can improve INRIA's OCaml distribution. Next, I thought
perhaps a complete fork of OCaml would be a viable alternative. This is
complicated by OCaml's license which requires variants to be distributed with
the core sources intact and everything else as patches to it. This is not an
insurmountable problem, of course, you just distribute the core and a giant
autogenerated patch instead. So I asked Sylvain about getting Debian to adopt
the fork rather than INRIA's upstream. He said this will almost certainly not
happen.

So I can't develop or contribute to INRIA's OCaml implementation and I can't
fork it without starting with zero users. What about reimplementing it?

So I wondered what I could build upon that would make this as painless as
possible. This led me to the Smoke VM, Mono, the JVM and LLVM. I enumerated
each of these in turn and came to the conclusion that LLVM is preferable, not
least because several other people had already drawn the same conclusion and
started work on similar projects themselves.

That's when I wrote my 100LOC test program calling LLVM from OCaml. Since
then, Gordon has been working hard on the OCaml bindings and example
programs, which are now nothing short of incredible. Dozens of people have
e-mailed me expressing their desire to contribute to such an effort.

This will take time, of course, but I believe it is the future of the OCaml
language.

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e


Jacques GARRIGUE

Jan 15, 2008, 10:27:14 PM

to j...@ffconsultancy.com, caml...@yquem.inria.fr

> > From: Jon Harrop <j...@ffconsultancy.com>
> > > On Tuesday 15 January 2008 03:36:21 Jacques Garrigue wrote:
> > > > Unfortunately, this would make marshalling between different programs
> > > > much more complicated...
> > >
> > > Do people marshal polymorphic variants between different programs?
> >
> > Do people marshal data between different programs (or different
> > versions of the same program)?
>
> I suspect OCaml's marshalling is used almost entirely between same
> versions of the same programs.

I'm not so sure. Actually, I do it all the time when recompiling
ocaml. Otherwise I would have to bootstrap after any modification in
the compiler. Fortunately, this is not the case, and one only needs to
bootstrap when the data structures are modified (or semantics changed).

> In particular, I was advised against marshalling data between different
> versions of the same program because this is unsafe (not just type
> safety but the format used by Marshal is not ossified).

Marshalling data between different versions of the same program is ok,
but you're on your own concerning compatibility. You must be careful
concerning changes in ocaml versions, but I don't remember any change
in representation, and if one were to happen it would be amply
documented.
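
A minimal sketch of the round trip being discussed, assuming a hypothetical `shape` type shared by the writer and the reader (the names `shape`, `save` and `load` are illustrative only). It shows why compatibility is the programmer's responsibility: the type annotation on the reading side is not checked against the bytes.

```ocaml
(* Hypothetical type that both the writing and the reading program agree on. *)
type shape = [ `Circle of float | `Rect of float * float ]

(* Serialize a value to a string using the standard Marshal module. *)
let save (v : shape list) : string = Marshal.to_string v []

(* Unmarshalling is not type-checked: the annotation is a promise made by
   the programmer, so a changed type definition goes undetected here. *)
let load (s : string) : shape list = Marshal.from_string s 0

let () =
  let original = [ `Circle 1.0; `Rect (2.0, 3.0) ] in
  let copy = load (save original) in
  (* Structural equality holds after the round trip. *)
  assert (copy = original)
```

If the reader were compiled against a different definition of `shape`, `load` would still return a value, which is the sense in which one is "on one's own".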

> > > So the advantage of a decision tree is probably insignificant on real
> > > code because it will lie between these two extremes.
> >
> > Since the goal was never to be faster than ordinary variants, but just
> > obtain comparable speed, this seems good :-)
>
> Yes. This would probably also work ok if you used a symbol table to store
> exact identifier names rather than just a hash. The symbol's index in the
> table would serve the same purpose as the hash.

No, because in order to produce efficient code you have to know the
hash at compile time, and in your scheme you only know it at link time
or runtime.
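
To make the compile-time point concrete, here is a sketch of a tag hash consistent with the scheme described earlier in the thread (radix 223, reduced modulo 2^31). The function name `hash_variant` is chosen for illustration, and the final check assumes a 64-bit OCaml, where a constant tag is represented by an immediate integer.

```ocaml
(* Sketch of the tag hash: interpret the name in radix 223, keep 31 bits.
   Assumes a 64-bit OCaml, where [int] is 63 bits wide. *)
let hash_variant (s : string) : int =
  let accu = ref 0 in
  String.iter (fun c -> accu := 223 * !accu + Char.code c) s;
  (* keep the low 31 bits *)
  accu := !accu land ((1 lsl 31) - 1);
  (* reinterpret the result as a signed 31-bit value *)
  if !accu > 0x3FFFFFFF then !accu - (1 lsl 31) else !accu

let () =
  (* The value depends only on the name, so it can be computed while
     compiling; on the assumed 64-bit runtime it matches the immediate
     integer that represents the constant tag. *)
  assert (hash_variant "Foo" = (Obj.magic `Foo : int))
```

Because the value depends only on the spelling of the tag, separately compiled modules agree on it without consulting any shared table, which an index into a link-time symbol table could not offer.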

> OpenGL has an extension mechanism that can be queried at
> run-time. If a given extension is available then you can do things
> that you could not do before, such as pass a GLenum to a function
> that might not have accepted it without the extension.
>
> > Since LablGL was coded by hand, adding extensions would mean modifying
> > it.
>
> Exactly, that is a limitation of LablGL's design and, therefore, I think it
> was quite wrong of you to claim "LablGL shows is that in practice only a
> small number of tags are used together" when LablGL's use of small, closed
> sum types is actually a design limitation that would not be there if it
> supported all of OpenGL, i.e. the extension mechanism.

I don't see your point. Even with the extension mechanism, extra
GLenum's are still only allowed for some specific functions. So you
can still define some subsets of GLenum that should be conflict free,
you don't need to prohibit all conflicts in GLenum. This is what I
mean by lablGL's design.
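
A small sketch of what such subsets can look like. The tag and function names below are hypothetical and are not LablGL's actual API; the point is only that each function's argument type lists the tags it accepts, and an extension can be modelled as a larger type that includes the base one.

```ocaml
(* Hypothetical tag names; not LablGL's actual API. *)
type blend_factor = [ `zero | `one | `src_alpha ]

(* An "extension" modelled as a superset of the base tags. *)
type blend_factor_ext = [ blend_factor | `constant_color ]

let blend_name : blend_factor -> string = function
  | `zero -> "GL_ZERO"
  | `one -> "GL_ONE"
  | `src_alpha -> "GL_SRC_ALPHA"

(* The extended function reuses the base one for the shared tags. *)
let blend_name_ext : blend_factor_ext -> string = function
  | #blend_factor as b -> blend_name b
  | `constant_color -> "GL_CONSTANT_COLOR"

let () =
  assert (blend_name_ext `one = "GL_ONE");
  assert (blend_name_ext `constant_color = "GL_CONSTANT_COLOR")
```

As far as I know, the type checker only complains about a hash clash when the two clashing tags occur in the same variant type, so keeping these sets small keeps the chance of a conflict small.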

The problem with lablGL and extensions is the implementation, not the
API design. What we would need is some kind of AOP approach to the
stubs, where you could describe what functions are extended by which
extensions.

> Incidentally, Xavier made a statement based upon what appears to me to be a
> similar logical error in the CUFP notes from last year that I read recently:
>
> "On the other hand, certain features seem somewhat unsurprisingly to be
> unimportant to industrial users. GUI toolkits are not an issue, because GUIs
> tend to be built using more mainstream tools; it seems that different
> competencies are involved in Caml and GUI development and companies "don't
> want to squander their precious Caml expertise aligning pixels". Rich
> libraries don't seem to matter in general; presumably companies are happy to
> develop these in-house. And no-one wants yet another IDE; the applications of
> interest are usually built using a variety of languages and tools anyway, so
> consistency of development environment is a lost cause."
> - http://cufp.galois.com/CUFP-2007-Report.pdf (page 3)
>
> Xavier appears to have taken the biased sample of industrialists who already
> use OCaml despite its limitations and has drawn the conclusion that these
> limitations are not important to industrialists. I was really horrified to
> see this because, in my experience, companies are turning away from OCaml in
> droves because of exactly the limitations Xavier enumerated and I for one
> would dearly love to see them fixed.

I don't agree with all these points (otherwise I wouldn't be
maintaining a GUI toolkit), but there is some truth in it. I actually
got similar reactions from industry in Japan, if for different
reasons: they don't need the GUI, because they prefer to do it
themselves, to differentiate from others. People doing in-house
programming have a different point of view. I remember somebody from a
bank who told me he wrote a program to be used in all their branches
using labltk. In this case you don't need anything flashy, it just has
to be functional (err, to work).

Concerning IDEs, since eclipse is more and more used, good support
for it seems a must. But you won't have me use anything other than
emacs and ocamlbrowser!

> > Also, one might want to make code generation automatic, particularly
> > for C wrappers, to allow adding cases to functions easily. This should
> > be doable, but there is no infrastructure for that currently
> > (using CPP macros was simpler to start with...)
>
> Yes. A better FFI could also be enormously beneficial. Improving
> upon OCaml's FFI is one of the most alluring aspects of a
> reimplementation on LLVM, IMHO.

The current FFI works well, but it's true that the way it cuts the
work in small pieces (stubs in C on one side, externals on the other)
makes it difficult to automate its use. In my experience it is very
flexible, but badly lacks abstraction.
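
To illustrate the two halves, here is a minimal sketch of the OCaml side. It binds `caml_format_int`, a C primitive that ships with the OCaml runtime and is used by the standard library itself, purely so that the example links without extra C code; the name `format_int` is local to this example, and the C fragment in the comment is a rough outline rather than a complete stub.

```ocaml
(* OCaml side: an [external] declaration names the C symbol that implements
   the function and states its OCaml type. Nothing checks that the type
   matches what the C code really does. *)
external format_int : string -> int -> string = "caml_format_int"

(* C side, for a stub of your own, is written separately, roughly:
     CAMLprim value my_stub(value fmt, value n) { ... }
   and has to be kept in sync with the [external] above by hand. *)

let () = assert (format_int "%d" 42 = "42")
```

The piece that is hard to automate is the agreement between the type written after `external` and the conversions performed in the C stub, since the two live in different files and languages.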

Jacques Garrigue

Yaron Minsky

Jan 15, 2008, 10:35:11 PM

to Jacques GARRIGUE, caml...@yquem.inria.fr

On Jan 15, 2008 10:26 PM, Jacques GARRIGUE <garr...@math.nagoya-u.ac.jp>
wrote:

>
> I'm not so sure. Actually, I do it all the time when recompiling
> ocaml. Otherwise I would have to bootstrap after any modification in
> the compiler. Fortunately, this is not the case, and one only needs to
> bootstrap when the data structures are modified (or semantics changed).
>

I agree. We quite often use marshal to share data between different
programs that share a common library.

> I don't agree with all these points (otherwise I wouldn't be
> maintaining a GUI toolkit), but there is some truth in it. I actually
> got similar reactions from industry in Japan, if for different
> reasons: they don't need the GUI, because they prefer to do it
> themselves, to differentiate from others. People doing in-house
> programming have a different point of view. I remember somebody from a
> bank who told me he wrote a program to be used in all their branches
> using labltk. In this case you don't need anything flashy, it just has
> to be functional (err, to work).
>

We started out doing entirely back-end processes using OCaml, but as time
went on, we started building more and more GUIs. The fact that OCaml has
lablgtk makes it much more useful for us, without a doubt. The main reason
we like to do GUIs in OCaml is that we see a lot of value in sharing type
definitions and code between the GUIs and the back-end services they connect
to.

y

Jon Harrop

Jan 15, 2008, 10:49:38 PM

to caml...@yquem.inria.fr

On Wednesday 16 January 2008 03:34:54 Yaron Minsky wrote:
> We started out doing entirely back-end processes using OCaml, but as time
> went on, we started building more and more GUIs. The fact that OCaml has
> lablgtk makes it much more useful for us, without a doubt. The main reason
> we like to do GUIs in OCaml is that we see a lot of value in sharing type
> definitions and code between the GUIs and the back-end services they
> connect to.

Yes, this is exactly the kind of thing I was referring to. I think a lot of
people want simple GUIs that are perfectly feasible to construct entirely in
OCaml and the overhead of splitting a project across languages is much
higher. Fortunately, LablGTK makes this feasible in OCaml.

There must be some reason why LablGTK is so popular! ;-)

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e


Jon Harrop

Jan 15, 2008, 11:47:33 PM

to caml...@yquem.inria.fr

On Wednesday 16 January 2008 03:26:27 Jacques GARRIGUE wrote:
> > I suspect OCaml's marshalling is used almost entirely between same
> > versions of the same programs.
>
> I'm not so sure. Actually, I do it all the time when recompiling
> ocaml. Otherwise I would have to bootstrap after any modification in
> the compiler. Fortunately, this is not the case, and one only needs to
> bootstrap when the data structures are modified (or semantics changed).

Interesting.

> > Yes. This would probably also work ok if you used a symbol table to store
> > exact identifier names rather than just a hash. The symbol's index in the
> > table would serve the same purpose as the hash.
>
> No, because in order to produce efficient code you have to know the
> hash at compile time, and in your scheme you only know it at link time
> or runtime.

You could still use the same hashing scheme but you could fall back to linear
search of symbols by name in the event of a clash.
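
A rough sketch of that fallback, written as an ordinary run-time table rather than anything inside the compiler. All names here (`table`, `create`, `add`, `find`) are hypothetical, and the hash function is passed in as a parameter so that clashes can be forced for demonstration.

```ocaml
(* Hypothetical symbol table: hash -> bucket of (full name, payload). *)
type 'a table = {
  hash : string -> int;
  buckets : (int, (string * 'a) list) Hashtbl.t;
}

let create hash = { hash; buckets = Hashtbl.create 16 }

(* Store the full name next to the payload so clashes can be resolved. *)
let add t name v =
  let h = t.hash name in
  let bucket = Option.value (Hashtbl.find_opt t.buckets h) ~default:[] in
  Hashtbl.replace t.buckets h ((name, v) :: List.remove_assoc name bucket)

(* Fast path: jump to the bucket by hash. Fallback: linear search by name. *)
let find t name =
  match Hashtbl.find_opt t.buckets (t.hash name) with
  | None -> None
  | Some bucket -> List.assoc_opt name bucket

let () =
  (* A constant hash makes every name clash, which exercises the fallback. *)
  let t = create (fun _ -> 0) in
  add t "Red" 1;
  add t "Green" 2;
  assert (find t "Green" = Some 2);
  assert (find t "Blue" = None)
```

With a good hash the buckets hold a single entry almost always, so the name comparison costs little; the price is that full names have to be kept somewhere at run time.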

> > Exactly, that is a limitation of LablGL's design and, therefore, I think
> > it was quite wrong of you to claim "LablGL shows is that in practice
> > only a small number of tags are used together" when LablGL's use of
> > small, closed sum types is actually a design limitation that would not be
> > there if it supported all of OpenGL, i.e. the extension mechanism.
>
> I don't see your point. Even with the extension mechanism, extra
> GLenum's are still only allowed for some specific functions. So you
> can still define some subsets of GLenum that should be conflict free,
> you don't need to prohibit all conflicts in GLenum. This is what I
> mean by lablGL's design.

Provided you can enumerate which tags can be used with which functions
including the presence of extensions, yes. I suppose that would be possible
and you could end up with many small sets of tags and much less chance of
clashing.

> The problem with lablGL and extensions is the implementation, not the
> API design. What we would need is some kind of AOP approach to the
> stubs, where you could describe what functions are extended by which
> extensions.

I think it would be better to remove all complexity from the C stubs, have
them all autogenerated and then write a higher-level API on top entirely in
OCaml. GLCaml is the start of a good foundation for OpenGL, IMHO. I think it
would be very productive to merge the projects at some point.

> ...

> I don't agree with all these points (otherwise I wouldn't be
> maintaining a GUI toolkit), but there is some truth in it. I actually
> got similar reactions from industry in Japan, if for different
> reasons: they don't need the GUI, because they prefer to do it
> themselves, to differentiate from others. People doing in-house
> programming have a different point of view. I remember somebody from a
> bank who told me he wrote a program to be used in all their branches
> using labltk. In this case you don't need anything flashy, it just has
> to be functional (err, to work).
>
> Concerning IDEs, since eclipse is more and more used, good support
> for it seems a must. But you won't have me use anything other than
> emacs and ocamlbrowser!

Visual Studio's Intellisense makes GUI programming much easier in F# than
ocamlbrowser+ocaml. I think the single most productive thing that could be
added to ocamlbrowser is hyperlinks from the quoted definitions to all
related definitions.

Now that I come to think of it, you can just run ocamldoc on the LablGTK
sources and use a browser to do that. Is the ocamldoc HTML output for the
latest LablGTK2 on the web anywhere?

> > Yes. A better FFI could also be enormously beneficial. Improving
> > upon OCaml's FFI is one of the most alluring aspects of a
> > reimplementation on LLVM, IMHO.
>
> The current FFI works well, but it's true that the way it cuts the
> work in small pieces (stubs in C on one side, externals on the other)
> makes it difficult to automate its use. In my experience it is very
> flexible, but badly lacks abstraction.

What sorts of abstractions would you like?

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e


Richard Jones

Jan 16, 2008, 5:51:10 AM

to caml...@inria.fr

On Tue, Jan 15, 2008 at 06:17:32PM +0000, Jon Harrop wrote:
> . GUIs are incredibly important (LablGTK is the world's favorite OCaml
> library!) and tens of thousands of OCaml programmers are crying out for
> proper LablGTK documentation as a first priority, many of whom are in
> industry.

GTK itself is horribly undocumented. However SooHyoung Oh has done an
excellent job translating the C-based GTK 2.0 tutorial into OCaml,
here:

http://plus.kaist.ac.kr/~shoh/ocaml/lablgtk2/lablgtk2-tutorial/

> . Rich libraries are incredibly important and OCaml has the
> potential to become a hugely successful commercial platform where
> people can buy and sell cross-platform libraries but OCaml needs
> support for shared run-time DLLs (or something equivalent)
> before this can happen.

My requirement is similar to this: (1) to be able to take OCaml
libraries and automatically generate C bindings from them (ie.
translate the OCaml .mli file into a .h file, and generate stubs).
(2) to be able to ship the library as a DLL / .so file. Efficiency is
not so much of a concern for me - eg. if the generated stubs worked by
copying all strings passed, that would be OK for my requirements.

I actually did a little bit of work on a stub/wrapper generator, and I
think it is possible to implement it, especially now that ocamlopt can
generate PIC.

Rich.

--
Richard Jones
Red Hat

Kuba Ober

Jan 16, 2008, 8:48:34 AM

to caml...@yquem.inria.fr

In fact, after some initial thinking and looking around it seems that the
only "sane" GUI for OCaml, at this time, is Qt, but someone has to write a
machine translator to port it from C++ to OCaml. Qt is reasonably well
designed, and has the richest feature set of all GUI toolkits, even if you
combined all the competition and treated it as one "other" toolkit.

Using Qt with some machine (or not!) generated bindings is just a huge
waste -- it's a nice, clean design, which has recently been tweaked for
performance (some Qt4 apps start in 50% of the time just by having been
ported to Qt4 from Qt3).

Cheers, Kuba

Dario Teixeira

Jan 16, 2008, 10:03:11 AM

to Kuba Ober, caml...@yquem.inria.fr

Hi,

> In fact, after some initial thinking and looking around it seems that the
> only "sane" GUI for OCaml, at this time, is Qt, but someone has to write a
> machine translator to port it from C++ to OCaml. Qt is reasonably well
> designed, and has the richest feature set of all GUI toolkits, even if you
> combined all the competition and treated it as one "other" toolkit.
>
> Using Qt with some machine (or not!) generated bindings is just a huge
> waste -- it's a nice, clean design, which has recently been tweaked for
> performance (some Qt4 apps start in 50% of the time just by having been
> ported to Qt4 from Qt3).

I'm inclined to agree. I would even go as far as saying that the lack of
Qt bindings is perhaps the biggest open sore as far as Ocaml library support
is concerned.

The guys at Trolltech, however, seem quite keen on having Qt on as many
platforms as possible (Qt-Jambi, which brings Qt to the JVM is one of their
products). Couldn't this whole auto-generation of bindings be made easier
if they got involved? I am sure they already have plenty of tools in
place to facilitate it. Even if they were not to commit actual manpower
to the effort, they might still be able to help.

And incidentally, the aforementioned Qt-Jambi, together with the Ocamljava
project might provide a last-resort solution in the absence of native bindings.
Another possibility might be the Qyoto/Kimono project (which brings Qt/KDE
into .net) together with the OcamlIL project (if it's still alive). You would
then use Mono to run Ocaml programmes.

cheers,
Dario

__________________________________________________________
Sent from Yahoo! Mail - a smarter inbox http://uk.mail.yahoo.com

Jon Harrop

Jan 16, 2008, 2:07:53 PM

to caml...@yquem.inria.fr

On Wednesday 16 January 2008 15:02:54 Dario Teixeira wrote:
> I'm inclined to agree. I would even go as far as saying that the lack of
> Qt bindings is perhaps the biggest open sore as far as Ocaml library
> support is concerned.

As I understand it, OCaml's FFI makes writing Qt bindings an enormous
undertaking which is why we don't have any.

I'm happy with GTK for now and would rather see OpenGL 2 bindings instead.

> The guys at Trolltech, however, seem quite keen on having Qt on as many
> platforms as possible (Qt-Jambi, which brings Qt to the JVM is one of their
> products). Couldn't this whole auto-generation of bindings be made easier
> if they got involved? I am sure they already have plenty of tools in
> place to facilitate it. Even if they were not to commit actual manpower
> to the effort, they might still be able to help.

I found Trolltech's customer support awful when I was a customer, so I very
much doubt they will go out of their way to help a really obscure virgin corner
of the Qt market. That was a few years ago, though.

> And incidentally, the aforementioned Qt-Jambi, together with the OCaml-Java
> project, might provide a last-resort solution in the absence of native
> bindings. Another possibility might be the Qyoto/Kimono project (which
> brings Qt/KDE into .NET) together with the OCamIL project (if it's still
> alive). You would then use Mono to run OCaml programmes.

I evaluated various such options recently and decided that Mono is truly awful
(very poorly written, unreliable and slow) and LLVM is absolutely superb
(extremely well-written C++ with complete native OCaml bindings!). Moreover,
Mono appears to have no future in its current form whereas LLVM has serious
backers and is improving at a tremendous rate.

Even if you don't want to implement a whole new language or backend, using
LLVM's JIT compilation for code generation has great potential for OCaml,
e.g. regexps. I highly recommend giving it a play!

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e

Kuba Ober

Jan 17, 2008, 8:09:41 AM

to caml...@yquem.inria.fr

> > Using Qt with some machine (or not!) generated bindings is just a huge
> > waste -- it's a nice, clean design, which has recently been tweaked for
> > performance (some Qt4 apps start in 50% of the time just by having been
> > ported to Qt4 from Qt3).
>
> I'm inclined to agree. I would even go as far as saying that the lack of
> Qt bindings is perhaps the biggest open sore as far as OCaml library
> support is concerned.
>
> The guys at Trolltech, however, seem quite keen on having Qt on as many
> platforms as possible (Qt-Jambi, which brings Qt to the JVM, is one of their
> products). Couldn't this whole auto-generation of bindings be made easier
> if they got involved?

At some point, in order to "naturally" use Qt and benefit from its
performance, the machine translation will be easier than any bindings you
could think of. IMHO, of course. Qt's code itself will become smaller in
OCaml - I've hacked at porting QObject, and so far I've got the line count to
50% of Trolltech's. And I'm a total noob.

Cheers, Kuba

Kuba Ober

Jan 18, 2008, 12:19:39 AM

to caml...@yquem.inria.fr

Yeah, I wouldn't be using Qt if there was no source code for it. Quite a few
times over the years I had to tweak away at the implementation details.

In fact, I would never specify *any* mission-critical libraries or frameworks
if they didn't come with full sources.

Cheers, Kuba

Kuba Ober

Jan 18, 2008, 12:34:11 AM

to caml...@yquem.inria.fr

Making bindings for Qt is basically putting a beautiful architecture to waste.
Qt's architecture is good enough to be actually machine-translated into OCaml.
This would be an involved project, but not impossible.

Using Qt from OCaml via a set of bindings can be a short-term stop-gap measure
for trivial applications, but I would never deploy a Qt application written in
OCaml if the application was any bigger on the GUI side than a couple of simple
dialog boxes. There is a binding generator (forgot its name) which can
generate OCaml bindings for Qt, but you have to give it a list of
classes/methods/signals/slots to generate bindings for. So it's perfect for
trivial applications, but not much else.

Qt's API, once you start thinking about how it might look in OCaml, becomes
pretty cool, and I'm sure there are a few improvements you could make to it by
leveraging the power OCaml gives you, once you loose the shackles of C++.

Cheers, Kuba

Kuba Ober

Jan 18, 2008, 12:40:09 AM

to caml...@yquem.inria.fr

> > > . Rich libraries are incredibly important and OCaml has the potential
> > > to become a hugely successful commercial platform where people can buy
> > > and sell cross-platform libraries but OCaml needs support for shared
> > > run-time DLLs (or something equivalent) before this can happen.
> >
> > Do you dream or what?
> >
> > I don't think that selling libraries in binary form is that important...
> > It is difficult anyway to do that, and why do you expect you could be
> > successful in a niche language? As a customer I would demand to get the
> > source code - to lower the risks of the investment into a small
> > platform.
>
> Yeah, I wouldn't be using Qt if there was no source code for it. Quite a
> few times over the years I had to tweak away at the implementation details.
>
> In fact, I would never specify *any* mission-critical libraries or
> frameworks if they didn't come with full sources.

In other words, Jon: if you tried to sell me source-code-less libraries, I
simply wouldn't buy, and no amount of persuading could change that. I'd still
keep buying your books, though :)

Just look at what happened to scores of Delphi and OCX controls which became
abandonware, and how much of this stuff eventually had to be reimplemented
by the same people who had bought the controls precisely so they would not
have to implement them in the first place. I detest closed-source controls and
libraries; I simply don't use them. The whole idea of "here's the OCX and a
typelib, and a help file, take it or leave it" is preposterous. Well, maybe
it's fine if you're being contracted for a one-off job where the payer has no
clue, and your morals don't seem to interfere -- sure, then you can reuse all
the source-less crap you want. But as part of a long-term strategy? No way.

If there was one decision Trolls made right, it was to include the source
code.