Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

[Caml-list] Compiler feature - useful or not?

6 views
Skip to first unread message

Edgar Friendly

unread,
Nov 13, 2007, 6:42:33 PM11/13/07
to caml-list
When one writes

type row = int
type col = int

This allows one to use the type names "row" and "col" as synonyms of
int. But it doesn't prevent one from using a value of type row in the
place of a value of type col. OCaml allows us to enforce row as
distinct from int two ways:

1) Variants:
type row = Row of int
type col = Col of int

Downside: unnecessary boxing and tagging
conversion from row -> int: (fun r -> match r with Row i -> i)
conversion from int -> row: (fun i -> Row i)

2) Functors:
module type RowCol =
sig
type row
val int_of_row : row -> int
val row_of_int : int -> row
type col
val int_of_col : col -> int
val col_of_int : int -> col
end

module Main = functor (RC: RowCol) -> struct
(* REST OF PROGRAM HERE *)
end

Any code using rows and cols could be written to take a module as a
parameter, and because of the abstraction granted when doing so, type
safety is ensured.

Downside: functor overhead, misuse of functors, need to write
boilerplate conversion functions
conversion from row -> int, int -> row: provided by RowCol boilerplate

IS THE FOLLOWING POSSIBLE:
Modify the type system such that one can declare

type row = new int
type col = new int

Row and col would thus become distinct from int, and require explicit
casting/coercion (2 :> row). There would be no runtime overhead for use
of these types, only bookkeeping overhead at compilation.

Downside: compiler changes (hopefully not too extensive)
conversion from row -> int: (fun r -> (r :> int)) (* might need (r : row
:> int) if it's not already inferred *)
conversion from int -> row: (fun i -> (i :> row))

Thoughts? Do any of you use Variants or Functors to do this now? Do
you find this style of typing useful?

E.

_______________________________________________
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs

Yaron Minsky

unread,
Nov 13, 2007, 7:08:38 PM11/13/07
to Edgar Friendly, caml-list
There's a simple trick that Steven Weeks introduced to us and that we now
use at Jane Street for this kind of thing.

You write down a signature:

module type Abs_int : sig
type t
val to_int : t -> int
val of_int : int <- t
end

And then you write concrete module Int that implements this signature. You
can then write:

module Row : Abs_int = Int
module Col : Abs_int = Int

You can now use Row.t and Col.t as abstract types. The boilerplate is
written once, but can be used over and over. I've personally seen more
use-cases for this with strings than with ints (to separate out different
kinds of identifiers)

y

Martin Jambon

unread,
Nov 13, 2007, 7:22:40 PM11/13/07
to Yaron Minsky, caml-list
On Tue, 13 Nov 2007, Yaron Minsky wrote:

> There's a simple trick that Steven Weeks introduced to us and that we now
> use at Jane Street for this kind of thing.
>
> You write down a signature:
>
> module type Abs_int : sig
> type t
> val to_int : t -> int
> val of_int : int <- t
> end
>
> And then you write concrete module Int that implements this signature. You
> can then write:
>
> module Row : Abs_int = Int
> module Col : Abs_int = Int
>
> You can now use Row.t and Col.t as abstract types. The boilerplate is
> written once, but can be used over and over. I've personally seen more
> use-cases for this with strings than with ints (to separate out different
> kinds of identifiers)

That's the best solution I've seen so far.

Otherwise there's still the following:
http://martin.jambon.free.fr/ocaml.html#opaque

which in theory doesn't require new module or type declarations, but it
adds type parameters, which can be confusing.


Martin

--
http://wink.com/profile/mjambon
http://martin.jambon.free.fr

Pierre Weis

unread,
Nov 14, 2007, 2:59:01 AM11/14/07
to Martin Jambon, caml-list, Yaron Minsky
> On Tue, 13 Nov 2007, Yaron Minsky wrote:
>
> >There's a simple trick that Steven Weeks introduced to us and that we now
> >use at Jane Street for this kind of thing.
> >
> >You write down a signature:
> >
> >module type Abs_int : sig
> >type t
> >val to_int : t -> int
> >val of_int : int <- t
> >end
> >
> >And then you write concrete module Int that implements this signature. You
> >can then write:
> >
> >module Row : Abs_int = Int
> >module Col : Abs_int = Int
> >
> >You can now use Row.t and Col.t as abstract types. The boilerplate is
> >written once, but can be used over and over. I've personally seen more
> >use-cases for this with strings than with ints (to separate out different
> >kinds of identifiers)
>
> That's the best solution I've seen so far.
>
> Otherwise there's still the following:
> http://martin.jambon.free.fr/ocaml.html#opaque
>
> which in theory doesn't require new module or type declarations, but it
> adds type parameters, which can be confusing.
>
>
> Martin

In the next version of the compiler, you will have an extension to the
definition of types with private construction functions (aka ``private''
types) that just fulfills your programming concern: in addition to usual
private sum and record private type definitions, you can now define a type
abbreviation which is private to the implementation part of a module (see
note 1).

Since the values of a private type are concrete, it is much easier for the
programmer to use the values of the type abbreviation.

Using the new private type abbreviation feature, you can write for instance:

row.ml
------
type row = int;;

let make i =
if i < 0 then failwith "Row: cannot create a negative row" else
i;;

let from i = i;;

row.mli
-------
type row = private int;;

val make : int -> row;;
val from : row -> int;;

This solution does not require any additional tagging or boxing, only the
usage of injection/projection function for the type. As for usual private
types, the injection function is handy to provide useful invariants (in the
row type example case, we get ``a row value is always a positive integer'').

In addition, the private abbreviation is publicly exposed in the interface
file. Hence, the compiler knows about it: it can (and effectively does)
optimize the programs that use values of type row in the same way as if the
type row were defined as a regular public abbreviation.

Last but not least, being an instance of the identity function, the from
projection function is somewhat generic: we can dream to suppress the burden
of having to write it for each private type definition, if we find a way to
have it automatically associated to the new private type by the compiler.

Best regards,

Note 1: this is a generalization for regular type abbreviations of the
polymorphic variants private type definitions that Jacques Garrigue already
introduced to provide polymorphic variants (and object) types with private
row variables.

--
Pierre Weis

INRIA Rocquencourt, http://bat8.inria.fr/~weis/

Gerd Stolpmann

unread,
Nov 14, 2007, 6:01:52 AM11/14/07
to Edgar Friendly, caml-list
Am Dienstag, den 13.11.2007, 17:41 -0600 schrieb Edgar Friendly:
> When one writes
>
> type row = int
> type col = int
>
> This allows one to use the type names "row" and "col" as synonyms of
> int. But it doesn't prevent one from using a value of type row in the
> place of a value of type col. OCaml allows us to enforce row as
> distinct from int two ways:
>
> 1) Variants:
> type row = Row of int
> type col = Col of int
>
> Downside: unnecessary boxing and tagging
> conversion from row -> int: (fun r -> match r with Row i -> i)

Note that you can write much shorter:

(fun Row i -> i), or

let some_function (Row i) = ...

If the small runtime penalty is not essential, this is probably the best
way to do it.

Gerd

--
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany
ge...@gerd-stolpmann.de http://www.gerd-stolpmann.de
Phone: +49-6151-153855 Fax: +49-6151-997714
------------------------------------------------------------

Jon Harrop

unread,
Nov 14, 2007, 6:06:48 AM11/14/07
to caml...@yquem.inria.fr
On Wednesday 14 November 2007 11:01, Gerd Stolpmann wrote:
> (fun Row i -> i), or

Do you mean:

function Row i -> i

or:

fun (Row i) -> i

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e

Alain Frisch

unread,
Nov 14, 2007, 7:37:56 AM11/14/07
to Pierre Weis, caml-list
Pierre Weis wrote:
> type row = private int;;

The only difference with an abstract type is that some generic
operations (comparision, equality) can be optimized, and currently, it
happens only for some basic types. So exporting a private abbreviation
(instead of an abstract type) is useless when the type is not a basic
type. Is it correct?

I find it somewhat disturbing to expose low-level optimization features
as part of the type system. Couldn't a similar thing (specializing
operations on integers) be achieved by always storing manifest types in
cmxs files? (Within a single compilation unit, the compiler could keep
type definitions across module boundaries.)


Alain

Virgile Prevosto

unread,
Nov 14, 2007, 8:56:33 AM11/14/07
to caml...@yquem.inria.fr
Le mer 14 nov 2007 13:37:24 CET,
Alain Frisch <al...@frisch.fr> a écrit :

> Pierre Weis wrote:
> > type row = private int;;
>
> The only difference with an abstract type is that some generic
> operations (comparision, equality) can be optimized, and currently,
> it happens only for some basic types. So exporting a private
> abbreviation (instead of an abstract type) is useless when the type
> is not a basic type. Is it correct?
>

As far as I understand Pierre's original email, this is not simply
about optimization (which is in fact a simple by-product of the
feature). The main point of private is to distinguish between the
construction of a value (where you might want to enforce some invariant)
and its use (where you might rely on the invariant). You can use a
value of type row everywhere a plain int is expected (in particular for
addition, subtraction, etc.), without having to redefine the
corresponding functions as would have been necessary with an abstract
type. On the other hand, the construction of a value of type row from a
plain int is impossible outside of the constructor function provided in
the module, which checks that the invariant required for row is
satisfied. This can be useful for other types as well, consider for
instance the following module (not tested of course)

module Sorted:
sig
type sorted_list = private int list
val empty: sorted_list
val insert: int -> sorted_list -> sorted_list
end =
struct
type sorted_list = int list
let empty = []
let rec insert x = function
[] -> [x]
| y::_ as l when x < y -> x::l
| y::l -> y :: insert x l
end

A value of type Sorted.sorted_list is a list, for which all operations
of module List are available (as well as pattern-matching), but in
addition it is guaranteed that a sorted_list is sorted, since it must
be built with empty and insert.

--
E tutto per oggi, a la prossima volta.
Virgile

Zheng Li

unread,
Nov 14, 2007, 9:35:45 AM11/14/07
to caml...@inria.fr
Edgar Friendly <thele...@gmail.com> writes:
> When one writes
>
> type row = int
> type col = int
>
> This allows one to use the type names "row" and "col" as synonyms of
> int. But it doesn't prevent one from using a value of type row in the
> place of a value of type col. OCaml allows us to enforce row as
> distinct from int two ways:
>
> 1) Variants:
> type row = Row of int
> type col = Col of int
>
> Downside: unnecessary boxing and tagging
> conversion from row -> int: (fun r -> match r with Row i -> i)
> conversion from int -> row: (fun i -> Row i)

I agree with Gerd, this is probably the least-effort solution. If I'm
not making mistake, even with the new feature introduced by Pierre (Btw,
really nice feature!), you still have to make the restriction at the
module level. I.e., only the outside world of current module will
benefit from this protection, and if you want to use this protection
right now, you still have to wrap type row/col as modules.

You may consider using variant with the syntax convention in Gerd's
post, or you can consider using record. E.g.

type row = {r:int} and col = {c:int}

then the "from" method are just record construction like {c=3} and "to"
method are just field access like x.r.

On efficiency, note that if your single variant (record) has multiple
fields, e.g. "type row = Row of int * int", this encoding doesn't
introduce more (un)boxing than "type row = int * int". The difference
happens when the single variant (record) has single field itself. There
was discussion on this topic before. I do agree that eliminating extra
(un)boxing on single variant type (and single immutable field record
type) would be nice.

> 2) Functors:
> module type RowCol =
> sig
> type row
> val int_of_row : row -> int
> val row_of_int : int -> row
> type col
> val int_of_col : col -> int
> val col_of_int : int -> col
> end
>
> module Main = functor (RC: RowCol) -> struct
> (* REST OF PROGRAM HERE *)
> end
>

If you want to make the protection effective in "current" module, you
anyway needs some module-level abstraction in current OCaml. The following
functor tries to do the simple wrapping for any type, and can be defined
once for all:

module type ANY = sig type t end
module type ABS = sig type t type nat val box: nat -> t val unbox: t -> nat end
module Abstr(T:ANY) : ABS with type nat = T.t = struct
type t = T.t and nat = T.t
let box x = x and unbox x = x
end

(* Test *)
# module Row = Abstr(struct type t = int end)
# module Col = Abstr(struct type t = int end)
# let x = Row.box 3 and y = Col.box 5;;
val x : Row.t = <abstr>
val y : Col.t = <abstr>
# let f x y = Row.unbox x + Col.unbox y;;
val f : Row.t -> Col.t -> int = <fun>

--
Zheng Li
http://www.pps.jussieu.fr/~li

Pierre Weis

unread,
Nov 14, 2007, 9:35:52 AM11/14/07
to Alain Frisch, caml-list, Pierre Weis
Hi Alain,

> Pierre Weis wrote:
> >type row = private int;;
>
> The only difference with an abstract type is that

a private type is not an abstract type but a concrete type.

> some generic operations (comparision, equality) can be optimized, and
> currently, it happens only for some basic types. So exporting a private
> abbreviation (instead of an abstract type) is useless when the type is not
> a basic type. Is it correct?

I don't understand what you mean by ``exporting a private type abbreviation
is useless''. In fact, you cannot create a private abbreviation if you do not
`export'' it and define it as private when exporting, isn't it ?

(Clearly, if you define a private type abbreviation in an implementation
file, this definition is completely useless since you cannot define any
value of the private type).

However, if I interpret what you said as: ``if we only want to optimize the
compilation, there is no point to define a private type instead of an
abstract type in the case of a complex type abbreviation'' then the question
is somewhat equivalent to ask ``in which case the compiler can optimize when
faced to a type abbreviation ?'': the answer is complex but in particular the
compiler can optimize a type t defined as being a (concrete) abbreviation for
a float array or a (concrete) abbreviation for a record of floats.

In any case, the purpose of introducing private abbreviation type definitions
is not to optimize programs but to write clearer and safer programs. In
particular because private data types can ensure invariants the same way as
abstract data types do; and also because private data types support pattern
matching the same way as concrete types do.

Once more, private abbreviation types being concrete, they get all the
advantages of concrete types and all their drawbacks (in particular wrt
abstraction of value representation, evolution of implementation programs,
openness of algorithms, meta informations).

> I find it somewhat disturbing to expose low-level optimization features
> as part of the type system.

No low-level optimization is ``exposed in the type system'': the low level
(type-based) optimizations are just an automatic side effect of a private
data type being indeed a concrete type. In other words, would you say that
you ``expose low level optimization as part of the type system'' if you ever
define a vector as being a float array instead of being a M.t array value
with M.t being an abstract abbreviation for float ?

In some cases, the answer can be yes: you may need to ``unabstract'' types
for the sake of efficiency. But generally speaking, you just use the type
float because it is the type of values you need, the optimizations come then
from the compiler as a very welcome bonus.

> Couldn't a similar thing (specializing operations on integers) be achieved

> by always storing manifest types in .cmxs files? (Within a single


> compilation unit, the compiler could keep type definitions across module
> boundaries.)

That's another story: the problem with this approach is to prevent breaking
the separate compilation feature of the actual compiler.

In any case, private type abbreviations have been implemented as a new useful
tool for the programmer, not as a way to optimize programs. Type-based
compiler's optimizations are just an extra plus, I thought it was worth to
mention that private type abbreviations are fully compatible with these
optimizations.

But in the first place, private type abbreviations are useful to define proper
concrete sub types of another type.

All the best,

-- Pierre

Edgar Friendly

unread,
Nov 14, 2007, 11:10:12 AM11/14/07
to Yaron Minsky, caml-list
Yaron Minsky wrote:
> module type Abs_int : sig
> type t
> val to_int : t -> int
> val of_int : int <- t
> end
>
> And then you write concrete module Int that implements this signature. You
> can then write:
>
> module Row : Abs_int = Int
> module Col : Abs_int = Int
>
This approximates my idea of "optimal" well enough that I'll rewrite
some of my old code to use it. It loses some opportunities for compiler
optimization, but otherwise seems perfect. No repetitive boilerplate to
write over and over, no unnecessary boxing, efficient conversion to the
base type (well, it still requires a function call to find out that
nothing needs to change, and maybe there's a shallow copy created in
this process), easy on the eyes/fingers syntax for conversion and
declaration and a readable type name for ocamlc to report when things go
wrong.

Thanks for sharing this gem. Any chance it can be added to a/the FAQ?

Brian Hurt

unread,
Nov 14, 2007, 11:20:53 AM11/14/07
to Edgar Friendly, caml-list, Yaron Minsky
Edgar Friendly wrote:

>Yaron Minsky wrote:
>
>
>>module type Abs_int : sig
>>type t
>>val to_int : t -> int
>>val of_int : int <- t
>>end
>>
>>And then you write concrete module Int that implements this signature. You
>>can then write:
>>
>>module Row : Abs_int = Int
>>module Col : Abs_int = Int
>>
>>
>>
>This approximates my idea of "optimal" well enough that I'll rewrite
>some of my old code to use it. It loses some opportunities for compiler
>optimization, but otherwise seems perfect. No repetitive boilerplate to
>write over and over, no unnecessary boxing, efficient conversion to the
>base type (well, it still requires a function call to find out that
>nothing needs to change, and maybe there's a shallow copy created in
>this process), easy on the eyes/fingers syntax for conversion and
>declaration and a readable type name for ocamlc to report when things go
>wrong.
>
>

Actually, Ocaml is pretty good at cross-module inlining, so that
normally the function call is optimized out, and the whole conversion
becomes a no-op. Or, at least, such has been my experience.

Brian

Alain Frisch

unread,
Nov 14, 2007, 11:39:28 AM11/14/07
to Pierre Weis, caml-list
Pierre Weis wrote:
> a private type is not an abstract type but a concrete type.

I guess I got a wrong intuition of the new feature because of the "from"
function in your example. Is it the case that a value of type "abstract
int" can always be used as a value of type "int" automatically (you
mention that it is ok for pattern matching at least)?

-- Alain

Edgar Friendly

unread,
Nov 14, 2007, 11:57:39 AM11/14/07
to Pierre Weis, caml-list
Pierre Weis wrote:
> In the next version of the compiler, you will have an extension to the
> definition of types with private construction functions (aka ``private''
> types) that just fulfills your programming concern: in addition to usual
> private sum and record private type definitions, you can now define a type
> abbreviation which is private to the implementation part of a module (see
> note 1).
>
At first glance, this improvement does satisfy my concerns, but I think
it still falls short of optimal in ways that seem easy to fix.

> Since the values of a private type are concrete, it is much easier for the
> programmer to use the values of the type abbreviation.
>

This concerns me - I'd like to require explicit casting/coercion to
create or use the internal value of the abstract type, maybe with an
exception for literals. Could you elaborate on this?

> This solution does not require any additional tagging or boxing, only the
> usage of injection/projection function for the type. As for usual private
> types, the injection function is handy to provide useful invariants (in the
> row type example case, we get ``a row value is always a positive integer'').
>

With a bit of low-level support, I imagine it not difficult to implement
the following:

type row = private int constraint (fun i -> i >= 0)

such that the compiler uses the provided constraint function to check
any (x :> row) casts, throwing an exception(?) on false. This solution
wouldn't involve the module system just to have positive integer types,
and gets rid of the function call overhead on 'from'.

> In addition, the private abbreviation is publicly exposed in the interface
> file. Hence, the compiler knows about it: it can (and effectively does)
> optimize the programs that use values of type row in the same way as if the
> type row were defined as a regular public abbreviation.
>

Does this mean I can do:

let one (r:row) = r+1
let onea (r : row) = r = 1
let two (r:row) (i:int) = r+i
let three : row -> string = function 3 -> "three" | _ -> "not three"

> Last but not least, being an instance of the identity function, the from
> projection function is somewhat generic: we can dream to suppress the burden
> of having to write it for each private type definition, if we find a way to
> have it automatically associated to the new private type by the compiler.
>

The use case for this feature is restricted construction to enforce
invariants, no? Anything more complex should probably be totally
abstract and not simply private. In which case, the constraint keyword
seems an effective way to tell the compiler what to do, and just let :>
coercions do their magic.

E.

Pierre Weis

unread,
Nov 14, 2007, 1:44:23 PM11/14/07
to Alain Frisch, caml-list, Pierre Weis
> Pierre Weis wrote:
> >a private type is not an abstract type but a concrete type.
>
> I guess I got a wrong intuition of the new feature because of the "from"
> function in your example. Is it the case that a value of type "abstract
> int" can always be used as a value of type "int" automatically (you
> mention that it is ok for pattern matching at least)?

If we stick to the row example of positive values, we get:

- a value of type row is in fact a concrete integer (it is not hidden in any
way),
- a value of type row can only be created by the make function defined in the
implementation of the module that defines the private type,
- a value of type row can be projected out of type row to a value of type int
with a ``no-op'' identity function (I called it from in the example).

So, no: a value of type row is not of type int and you need a marker to
indicate the projection (for the time being the marker is a (identity)
function call to let the implemention as simple as possible, but a sub-typing
constraint makes sense and we can provide it if this is considered clearer).

Best regards,

-- Pierre

Edgar Friendly

unread,
Nov 14, 2007, 2:19:47 PM11/14/07
to Pierre Weis, caml-list, Alain Frisch
Pierre Weis wrote:
> If we stick to the row example of positive values, we get:
>
> - a value of type row is in fact a concrete integer (it is not hidden in any
> way),
> - a value of type row can only be created by the make function defined in the
> implementation of the module that defines the private type,
> - a value of type row can be projected out of type row to a value of type int
> with a ``no-op'' identity function (I called it from in the example).
>
> So, no: a value of type row is not of type int and you need a marker to
> indicate the projection (for the time being the marker is a (identity)
> function call to let the implemention as simple as possible, but a sub-typing
> constraint makes sense and we can provide it if this is considered clearer).
>
> Best regards,
>
> -- Pierre

Other than piggybacking on the private row types work by J. Garrigue and
probably simplicity of implementation, does there exist a reason for
involving the module system in this mechanism?

At the basic level, creation and projection will always be identity
functions (except that creation can throw exceptions), so it seems
reasonable to elide that repetitive code. For more complex examples (a
private type with more functions that work on the internal
representation of that type), since the representation is already
exposed, it seems appropriate to just let the client implement those
functions through the identity accessors. It'd probably be safer than
doing so in the Row module, because the compiler could enforce the
creation invariant in those functions instead of allowing them to bypass
the 'make' function.

Once all these functions are removed from the module all that's left is
the type definition and the creation constraint. Am I missing something?

E.

Pierre Weis

unread,
Nov 14, 2007, 4:04:36 PM11/14/07
to Edgar Friendly, caml-list, Pierre Weis
> Pierre Weis wrote:
> > In the next version of the compiler, you will have an extension to the
> > definition of types with private construction functions (aka ``private''
> > types) that just fulfills your programming concern: in addition to usual
> > private sum and record private type definitions, you can now define a type
> > abbreviation which is private to the implementation part of a module (see
> > note 1).
> >
> At first glance, this improvement does satisfy my concerns, but I think
> it still falls short of optimal in ways that seem easy to fix.
>
> > Since the values of a private type are concrete, it is much easier for the
> > programmer to use the values of the type abbreviation.
> >
> This concerns me - I'd like to require explicit casting/coercion to
> create or use the internal value of the abstract type, maybe with an
> exception for literals. Could you elaborate on this?

Values of a private type abbreviation are concrete in the sense that they are
public and not hidden to inspection. However, a value of the private type
abbreviation has a type that is the one of the private type, not the one f
the abbreviated expression. To project such a value you need some syntactic
construct: either a (identity) function call or a sub-typing constraint as
already proposed on this list.

To create a value of the private type abbreviation, you must use the
construction function (as usual with private types, you cannot freely crete
values, you must use the provided injection).

> > This solution does not require any additional tagging or boxing, only the
> > usage of injection/projection function for the type. As for usual private
> > types, the injection function is handy to provide useful invariants (in the
> > row type example case, we get ``a row value is always a positive integer'').
> >
> With a bit of low-level support, I imagine it not difficult to implement
> the following:
>
> type row = private int constraint (fun i -> i >= 0)
>
> such that the compiler uses the provided constraint function to check
> any (x :> row) casts, throwing an exception(?) on false. This solution
> wouldn't involve the module system just to have positive integer types,
> and gets rid of the function call overhead on 'from'.

The function call overhead can be avoided easily, if the from function is
provided by the compiler or if we use a sub-typing constraint row :> int.

On the other hand, the construction you proposed also applies to abstract
types: we need module to define an abstract type; we can have something
lighter such as a direct abstract type definition:

type rat = abstract {
numerator : int;
denominator : int;
} with

let make_rat n d = ...
and numerator {numerator = n} = n
and denominator ...

let plus_rat r1 r2 = ...
let mult_rat r1 r2 = ...

...

> > In addition, the private abbreviation is publicly exposed in the interface
> > file. Hence, the compiler knows about it: it can (and effectively does)
> > optimize the programs that use values of type row in the same way as if the
> > type row were defined as a regular public abbreviation.
> >
> Does this mean I can do:
>
> let one (r:row) = r+1
> let onea (r : row) = r = 1
> let two (r:row) (i:int) = r+i
> let three : row -> string = function 3 -> "three" | _ -> "not three"

No, you must explicitely project row values to int values (even if this is an
identity):

let one r = from r + 1
let onea r = from r = 1
let two (r:row) (i:int) = from r + i


let three : row -> string =

fun r -> match from r with


| 3 -> "three" | _ -> "not three"

> > Last but not least, being an instance of the identity function, the from
> > projection function is somewhat generic: we can dream to suppress the burden
> > of having to write it for each private type definition, if we find a way to
> > have it automatically associated to the new private type by the compiler.
> >
> The use case for this feature is restricted construction to enforce
> invariants, no?

Yes.

> Anything more complex should probably be totally
> abstract and not simply private. In which case, the constraint keyword
> seems an effective way to tell the compiler what to do, and just let :>
> coercions do their magic.

Given that the injection function (the make function or the constraint part
of your construction) must enforce arbitrarily complex invariants, we may
need a module for private abbreviation as well (imagine for instance a
private abbreviation for prime numbers, that only injects into the prime type
integer arguments that are indeed prime numbers: you may need some room to
define the predicate!).

Thank you for your suggestions.

Best regards,

--
Pierre Weis

INRIA Rocquencourt, http://bat8.inria.fr/~weis/

_______________________________________________

Edgar Friendly

unread,
Nov 14, 2007, 5:10:10 PM11/14/07
to Pierre Weis, caml-list
Pierre Weis wrote:
> Values of a private type abbreviation are concrete in the sense that they are
> public and not hidden to inspection.
>
<SNIP>

> No, you must explicitely project row values to int values (even if
> this is an identity):

I guess I don't understand what you mean by "public and not hidden to
inspection". To a programmer using such a type, the values don't seem
public. Do you only mean that to the compiler sees the full type (for
optimization purposes), or something more?

> The function call overhead can be avoided easily, if the from function is
> provided by the compiler or if we use a sub-typing constraint row :> int.
>

I'd love to use subtyping constraints more than functions. Applying a
function *could* do some other work or massage the value it gives to me,
whereas a compiler cast seems to indicate what's happening better. It
also doesn't depend on any cross-module inlining magic that may or may
not happen.

> On the other hand, the construction you proposed also applies to abstract
> types: we need module to define an abstract type; we can have something
> lighter such as a direct abstract type definition:
>
> type rat = abstract {
> numerator : int;
> denominator : int;
> } with
>
> let make_rat n d = ...
> and numerator {numerator = n} = n
> and denominator ...
>
> let plus_rat r1 r2 = ...
> let mult_rat r1 r2 = ...
>
> ...
>

This example seems like a great time to use a module and an abstract
type: there's lots of functions that deal with the data in a way that
they all need to use its internal representation. But there's a use for
private copies of builtin types, possibly with restrictions on their
construction, and it seems that *this*

> Given that the injection function (the make function or the constraint part
> of your construction) must enforce arbitrarily complex invariants, we may
> need a module for private abbreviation as well (imagine for instance a
> private abbreviation for prime numbers, that only injects into the prime type
> integer arguments that are indeed prime numbers: you may need some room to
> define the predicate!).
>

let is_prime i = ... (* return true if prime, false if not *)
type prime = private int constraint is_prime

This seems to suffice for the example of primes. For natural numbers, I
don't see any advantage of having a private type vs. an abstract type,
so using a module to encapsulate that type makes more sense to me.

I might point out that supporting arbitrarily complex invariants, while
theoretically satisfying, might not be necessary. I'd imagine that 90+%
of actual uses of this pattern fall in the case of "simple range
restriction", and taking care of those well changes the approach
significantly. I like the Perl idea of "make the common things easy,
and the hard things possible".

> Thank you for your suggestions.
>
> Best regards,
>

Thank *you* for treating my naive and amateur suggestions as if they had
worth.

All the Best,
Eric

Jacques Garrigue

unread,
Nov 14, 2007, 7:18:15 PM11/14/07
to thele...@gmail.com, caml...@yquem.inria.fr
From: Edgar Friendly <thele...@gmail.com>

> With a bit of low-level support, I imagine it not difficult to implement
> the following:
>
> type row = private int constraint (fun i -> i >= 0)
>
> such that the compiler uses the provided constraint function to check
> any (x :> row) casts, throwing an exception(?) on false. This solution
> wouldn't involve the module system just to have positive integer types,
> and gets rid of the function call overhead on 'from'.

This syntax could be nice, but it is just syntactic sugar for

module Private_row : sig


type row = private int

val new_row : int -> row
end = struct
type row = int
let new_row i = assert (i >= 0); i
end
include Private_row

I'm sure camlp4 can do that.

Direct compiler support couldn't give you more:

* you cannot use a coercion to create a row: coercions are purely
type-level features, and cannot execute any code; we don't want to
change this. On the other hand coercing row to int could be made ok.

* the "constraint ..." part cannot appear in an interface, since
interfaces cannot contain expressions

Changing any of these two would be difficult indeed.

(To be honest, the above results in
module Private_row : sig type row = private int val new_row : int -> row end
type row = Private_row.row
val new_row : int -> row = <fun>
meaning that Private_row is not completely hidden, eventhough we don't
need to mention it in an interface.)

Jacques Garrigue

Edgar Friendly

unread,
Nov 15, 2007, 1:24:02 AM11/15/07
to Jacques Garrigue, caml...@yquem.inria.fr
Jacques Garrigue wrote:
> This syntax could be nice, but it is just syntactic sugar for
>
> module Private_row : sig
> type row = private int
> val new_row : int -> row
> end = struct
> type row = int
> let new_row i = assert (i >= 0); i
> end
> include Private_row
>
Funny to define a module (hiding) just to include it (un-hiding). This
pair usually comes to a no-op, but here it doesn't.

> I'm sure camlp4 can do that.
>

It can. But camlp4 holds back real development of the OCaml Community
by dividing the language into incompatible splinter dialects or forcing
us to code in pure OCaml which has little sugar.

> Direct compiler support couldn't give you more:
>
> * you cannot use a coercion to create a row: coercions are purely
> type-level features, and cannot execute any code; we don't want to
> change this. On the other hand coercing row to int could be made ok.
>

Good point. And I guess there's no better way to pack the creation
assertion with the type than a module with a constructor function...

> * the "constraint ..." part cannot appear in an interface, since
> interfaces cannot contain expressions
>

Having interfaces contain arbitrary expressions... difficult. But I
guess that we would have a 90% solution by reducing the problem from
arbitrary constraints to simple range constraints. As long as it worked
for ints, floats and chars, .... Still not trivial, though.

Kila la heri,
E.

Alain Frisch

unread,
Nov 15, 2007, 1:33:19 AM11/15/07
to Pierre Weis, caml-list
Pierre Weis wrote:
> - a value of type row is in fact a concrete integer (it is not hidden in any
> way),

But you cannot apply integer operations to it, so it is not a normal
concrete integer, right?

> - a value of type row can only be created by the make function defined in the
> implementation of the module that defines the private type,
> - a value of type row can be projected out of type row to a value of type int
> with a ``no-op'' identity function (I called it from in the example).
>
> So, no: a value of type row is not of type int and you need a marker to
> indicate the projection (for the time being the marker is a (identity)
> function call to let the implemention as simple as possible, but a sub-typing
> constraint makes sense and we can provide it if this is considered clearer).

Now I'm lost :-)

Can you show an example where replacing all "type t = private ..."
definitions by "type t" changes a well-typed program into an ill-typed
one? I understand that if private types create real subtypes (w.r.t.
:>) then they are different than abstract types, but otherwise, I don't
see the difference for the type-checker.

-- Alain

Vincent Hanquez

unread,
Nov 15, 2007, 5:55:50 AM11/15/07
to Edgar Friendly, Jacques Garrigue, caml...@yquem.inria.fr
On Thu, Nov 15, 2007 at 12:23:25AM -0600, Edgar Friendly wrote:
> > I'm sure camlp4 can do that.
> >
> It can. But camlp4 holds back real development of the OCaml Community
> by dividing the language into incompatible splinter dialects or forcing
> us to code in pure OCaml which has little sugar.

I'm glad that some people feels the same way about camlp4 !
this is really sad, that people thinks camlp4 is a great idea ...

--
Vincent Hanquez

Pierre Weis

unread,
Nov 15, 2007, 8:27:21 AM11/15/07
to Alain Frisch, caml-list, Pierre Weis
> >- a value of type row is in fact a concrete integer (it is not hidden in
> >any way),
>
> But you cannot apply integer operations to it, so it is not a normal
> concrete integer, right?

Right: a value of type row has type row ... not type int!

> Can you show an example where replacing all "type t = private ..."
> definitions by "type t" changes a well-typed program into an ill-typed
> one? I understand that if private types create real subtypes (w.r.t.
> :>) then they are different than abstract types, but otherwise, I don't
> see the difference for the type-checker.

May be, I must recall what are private types in the first place: private
types has been designed to implement quotient data structure.

(*** Warning.

Wao: after re-reading this message I realize that it is really long.

You can skip it, if you already know something about mathematical quotient
structures, private types for record and variant, relational types and the
mocac compiler!

***)

What is a quotient ?
--------------------

Here quotient has to be understood in the mathematical sense of the term:
given a set S and an equivalence relation R, you consider the set S/R of the
equivalence classes modulo the relation R. S/R is named the quotient
structure of S by R. Quotient structures are fundamental in mathematics and
there properties have been largely studied, in particular to find the
relationship between operations defined on S and operations on S/R: which
operations on S can be extended to operations on S/R ? Which properties of
operations on S are still valid for there extension on S/R ? and so on.

Simple examples of quotient structures can be found everywhere in maths, for
instance consider the equivalence relation R on pairs of integer values
defined as

z_1 equiv z_2 if and only if z_1 and z_2 are both odd or are both even
(in Caml terms
z_1 equiv z_2 if and only if (z_1 mod 2) = (z_2 mod 2))

The set Z/R is named Z/2Z and it inherits properties of operations on Z: it
is an abelian group (and more).

A wonderful example of such inheritance of interesting properties by
inheritance of operations thanks to a definition by a quotient structure is
the hierarchy of sets of numbers: starting with N (the set of natural
numbers) we define Z (the set of relative integer numbers) as a quotient of
NxN), then Q (the rational numbers) as a quotient of ZxZ*, R (the set of real
numbers) as a quotient of Q^N (the sets of sequences of rational numbers), C
(the set of complex numbers) as a quotient of R[X] (the set of polynomials
with one unknown and real coefficients). Note here that at each step of the
hierarchy the quotient structure is richer (has more properties and/or more
elements) than the original structure: first we have a monoide, then a group
and a ring, then a field, then a complete field and so on.

Why quotient structures ?
-------------------------

So quotient structures are a fundamental tool to express mathematical
structures and properties. Exactly as mathematical functions, relations and
sets. As you may have noticed, programming languages are extremely related to
maths (due to their purely logical bases) even if, in some cases, the zealots
of the language at hand try to hide the mathematical fundation into a strange
anthropomorphic discurse to describe their favorite language features.

In the end, the computer programing languages try hard to incorporate
powerful features from mathematics, because these features have been polished
by mathematicians for centuries. As a consequence of this work, we know
facts, properties and theorems about the mathematical features and this is an
extremely valuable benefit.

Now, what is the next frontier ? What can we steal more to mathematics for the
benefit of our favorite programming language ?

If we review the most powerful tools of mathematicians, we found that
functions have been borrowed (hello functional programming, lambda-calculus
and friends :)), relations have been borrowed (data bases, hash tables,
association lists), sets have been more or less borrowed (hello setle,
pascal, and set definition facilities from various libraries of various
languages...). More or less, we have all those basic features.

>From the mathematical set construction tools, we have got in Caml:

- the cartesian product of sets (the * binary type constructor, the record
type definitions),
- the disjunct union of sets (the | of polymorphic variants, the sum (or
variant) type definitions).

Unfortunately, we have no powerful way to define a quotient data
structure. That what private type definitions have been designed to do.

What do we need for a quotient data structure ?
-----------------------------------------------

In the first place, we need the ``true'' thing, the real feature that is
indeed used in maths. Roughly speaking this means to assimilate the quotient
set S/R to a subset of S.

In the previous definition of quotient structures, there is a careful
distinction between the base set S and the quotient set S/R. In fact, there
always exists a canonical injection from S to S/R, and if we choose a
canonical representant in each equivalence class of S/R, we get another
canonical injection from S/R to S, so that S/R can be considered as a subset
of S (the story is a bit more complex but that's the idea). This
injection/projection trickery is intensively used in maths; for instance, in
the hierarchy of number sets, we say and consider that N is a subset of Z
that itself is a subset of Q, a subset of R, a subset of C. Rigourously, we
should say for instance, there is a subset of Z that is canonically
isomorphic to N; in fact, we abusively assimilate N to this subset of Z;
hence, we say that N is ``included'' in Z, when we should say ``the image of
N by the canonical isomorphism from N to Z'' is included in Z; then, we go
one step further in the assimilation of N to a subset of Z, by denoting the
same, the elements of N and there image in Z; for instance, ``the neutral
element of the multiplication in Z'' and the successor of 0 in N is denoted
``1''; and we now can say that 1 belongs to Z. Note here that, in the first
place, ``the neutral element of the multiplication in Z'' is an equivalence
class (as all elements in Z are). So it is a set. More exactly, the ``neutral
element of the multiplication in Z'' is the infinite set of pairs of natural
numbers (n, m) such that n - m = 1 (here ``-'' is an operation in N and ``1''
is the successor of the natural number ``0'', so that n - m = 1 is a
shorthand for n = succ m). The assimilation between N and its isomorphic
image allows to use 1 as the denotation of this infinite set of pairs of
natural numbers.

We understand why the mathematicians always write after having designed a
quotient structure: ``thanks to this isomorphism, and if no confusion may
arise, we always assimilate S to its canonical injection in S/R''.

This assimilation is what private type definitions allow.

How do we define a quotient data structure ?
--------------------------------------------

The mathematical problem:
- we have a set S and an equivalence relation R on SxS,
- we construct the quotient S/R.
- we state afterwards:
``if no confusion may arise, we always assimilate S to its canonical
injection in S/R''.

The programming problem:
- we have a data structure S (defined by a type s) and a relation R on SxS
(defined by a function r from s * s to bool).
- we construct a data structure that represents S/R.
- we have afterwards:
``No confusion may arise, we always assimilate S to its canonical injection
in S/R''.

The private data type solution:
- the data structure S is defined as any Caml data structure and the
relation R is implemented by the canonical injection(s) from S to S/R.
- the canonical projection from S/R to S is automatic.
- S (defined by s) is assimilated to S/R (which is then also s!).

We defined S/R as a subset of S, the set of canonical representants of
equivalence classes of S/R.

More exactly, the canonical injection from S to S/R maps each element of S to
its equivalent class in S/R; if we assimilate each equivalence class to its
canonical representant (an element of S), the canonical injection maps each
element of S to the canonical representant of its equivalence class. Hence
the canonical injection has type S -> S.

An example treated without private types:
-----------------------------------------

Let's take a simple example:

S is the following data structure that implements a mathematical (free) structure
generated by the constant 0, the unary symbol succ, and the binary symbol +.

type peano =
| Zero
| Succ of peano
| Plus of peano * peano

R is the (equivalence) relation that expresses that
- 0 is the neutral element for + (for all x in S, 0 + x = x and x + 0 = x),
- + is associative (for all x, y, z in Sł, (x + (y + z)) = ((x + y) + z)).

The canonical injection is a function from peano -> peano that maps each value
in S (the type peano) to the canonical representant of its class in S/R (an
element of S (the type peano)):

let rec make = function
| Zero -> Zero
| Plus (Succ n, p) -> Succ (make (Plus (n, p)))
| Plus (Zero, p) -> p
| Plus (p, Zero) -> p
| Plus (Plus (x, y), z) -> make (Plus (x, make (Plus (y, z))))
| Succ p -> Succ p
;;

(This function may be wrong but you see the idea :))

So, if you want to represent S/R for peano in Caml you must:
- (1) define the type peano
- (2) always use the make function to create a value in S/R
- (3) not to confuse S and S/R in your head (I mean in your programs)

Private data types permits (1), ensures (2), and helps for (3) concerning the
head part and ensures (3) for programs by means of compile-time type errors.

The same example with private types:
------------------------------------

To define a private data type you must define a module.
- in the implementation, you define the carrier S and its canonical injection.
- in the interface, you export the carrier and the injection.

The type checker will ensure transparent projection from the quotient to the
carrier and mandatory use of the canonical projection to build a value in
S/R.

interface peano.mli
-------------------
type peano = private
| Zero
| Succ of peano
| Plus of peano * peano
;;

val zero : peano;;
val succ : peano -> peano;;
val plus : peano * peano -> peano;;

implementation peano.ml
-----------------------
type peano =
| Zero
| Succ of peano
| Plus of peano * peano
;;

let rec make = function
...
;;

let zero = make Zero;;
let succ p = make (Succ p);;
let plus (n, m) = make (Plus (n, m));;

(See note (1) for a discussion on the design of this example.)

What is given by private types:
-------------------------------

- You cannot create a value of S/R if you do not use the canonical injection
# Zero;;
Cannot create values of the private type peano

- As a consequence, values in S (i.e. S/R) are always canonical:
# let one = succ zero;;
val one : M.peano = Succ Zero
# let three = plus (one, succ (plus (one, zero)));;
val three : M.peano = Succ (Succ (Succ Zero))

- Projection is automatic
# let rec int_of_peano = function
| Zero -> 0
| Succ n -> 1 + int_of_peano n
| Plus (n, p) -> int_of_peano n + int_of_peano p
;;
val int_of_peano : M.peano -> int = <fun>
# int_of_peano three;;
- : int = 3

What about private abbreviations ?
----------------------------------

Private abbreviation definitions are a natural extension of private data type
definitions to abbreviation type definitions. The same notions are carried
out mutatis-mutandis:

- we have a data structure S (defined by a type s) and a relation R on SxS
(defined by a function r from s * s to bool).
- we construct a data structure that represents S/R.
- we have afterwards:
``No confusion may arise, we always assimilate S to its canonical injection
in S/R''.

In the case of abbreviations:

- the data structure S/R is defined by a type s (which is an abbreviation) and
a relation defined by a function,
- the canonical injection should be defined in the implementation file of the
private data type module,
- we always assimilate S to its canonical injection in S/R.

In pratice, as for usual private types (for which the constructive part of
the data type is not available outside the implementation), the type abbreviation
is not known outside the implementation (it is really private to its
implementation). This prevents the construction of values of S/R without
using the canonical injection.

Th noticeable difference is the projection function: it cannot be fully
implicit (otherwise, as you said Alain, the assimilation will turn to complete
confusion: we would have S identical to S/R, hence row=int and no difference
between a regular abbreviation definition and a private abbreviation
definition). If not implicit, the injection function should granted to be the
identity function (something that we would get for free, if we allow
projection via sub-typing coercion).

In short: abstract data types provide values that cannot be inspected nor
safely manipulated without using the functions provided by the module that
defines the abstract data type. In contrast, private data types are
explicitely concrete and you can freely write any algorithm you need. A good
test is printing: you can always write a function to print values of a
private type, it is up to the implementor of an abstract type to give you the
necessary primitives to access the components of the abstracted values.

Automatic generation of the canonical injection:
------------------------------------------------

You may have realized that writing the canonical injection can be a real
challenge.

The moca compiler (see http://moca.inria.fr/) helps you to write the
canonical injection by generating one for you, provided you can express the
injection at hand via a set of predefined algebraic relations (and/or rewrite
rules you specify) attached to the constructors of the private type. Private
types with constructors having algebraic relations are named relational
types. Moca generated canonical injections for relation types.

For instance, you would write the peano example as the following peano.mlm
file:

type peano = private
| Zero
| Succ of peano
| Plus of peano * peano
begin
associative
neutral (Zero)
rule Plus (Succ n, p) -> Succ (Plus (n, p))
end;;

The mocac compiler will generate the interface and implementation of the
peano module for you, including the necessary canonical injection function.

Best regards,

--
Pierre Weis

INRIA Rocquencourt, http://bat8.inria.fr/~weis/

Note (1):
- we can't just export the canonical injection since you could not build any
value of the type without previously defined values!
- we provide specialized versions of the canonical injection function
introducing the convention that the lowercase name of a value constructor is
the name of its associated canonical injection.
- we do not export the plasin true canonical injection since it would be more
confusing than useful.

Jacques Carette

unread,
Nov 15, 2007, 8:49:25 AM11/15/07
to Vincent Hanquez, caml...@yquem.inria.fr
Vincent Hanquez wrote:
> On Thu, Nov 15, 2007 at 12:23:25AM -0600, Edgar Friendly wrote:
>
>> But camlp4 holds back real development of the OCaml Community
>> by dividing the language into incompatible splinter dialects or forcing
>> us to code in pure OCaml which has little sugar.
>>
>
> I'm glad that some people feels the same way about camlp4 !
> this is really sad, that people thinks camlp4 is a great idea ...
>

Come now, be fair, camlp4 has both advantages and disadvantages.

Advantages:
- Allows you to *prove* that some syntactic sugar can make a big
difference in writing code. This is 2 orders of magnitude more
effective in convincing people than just arguing for it (on caml-list,
in person, whatever). Some extensions become popular because they have
proved themselves this way. What is lacking is a process by which these
can be integrated into the language.
- Allows you to show that some boilerplate code can be eliminated. This
is even better than the above, because every such instance is really a
research problem in disguise: how does one design a type system that can
show that this purely syntactic manipulation is correct?
- Allows you to embed a DSL into OCaml with quite a bit of ease and
freedom. One of the reasons to use campl4 instead of OCaml directly is
that for embedding DSLs, it is very frequent that lazyness is needed,
something less comfortable in OCaml.

Disadvantages:
- It splinters the language into dialects.
- It makes errors in the source code of user programs difficult to track
[anyone who has ever debugged camlp4-enabled programs have seen this]
- Bugs in the extension itself are fantastically difficult to debug.

Personally, I think the best situation would be if camlp4 were
unnecessary. But there is a lot of PL research to be done before that's
possible. And in the meantime, the extensions that people choose to
write in camlp4 communicate very loudly what users of OCaml really want
[as they are willing to put in serious effort into ``adapting'' the
language to their needs]; when I used to be a language developer (on
another system and, sadly, before I got proper training to do so), I
weighted user requests by how much effort the user had put into the
request. This annoyed loudmouths and pleased hardcore users, which
seemed fine with me.

Jacques

Jon Harrop

unread,
Nov 15, 2007, 9:52:48 AM11/15/07
to caml...@yquem.inria.fr
On Thursday 15 November 2007 13:48, Jacques Carette wrote:
> Personally, I think the best situation would be if camlp4 were
> unnecessary. But there is a lot of PL research to be done before that's
> possible...

How much more PL research do we need to tell us that OCaml is crying out for
a "try .. finally" construct?

> - It splinters the language into dialects.

Fork the OCaml distribution instead of using camlp4.

There are now thousands of OCaml programmers dying for trivial additions like
this, many of whom would contribute if they could and a single forked dialect
would improve in practical terms much faster than the current OCaml is.

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e

_______________________________________________

Martin Jambon

unread,
Nov 15, 2007, 11:55:38 AM11/15/07
to Jon Harrop, caml...@yquem.inria.fr
On Thu, 15 Nov 2007, Jon Harrop wrote:

> On Thursday 15 November 2007 13:48, Jacques Carette wrote:
>> Personally, I think the best situation would be if camlp4 were
>> unnecessary. But there is a lot of PL research to be done before that's
>> possible...
>
> How much more PL research do we need to tell us that OCaml is crying out for
> a "try .. finally" construct?
>
>> - It splinters the language into dialects.
>
> Fork the OCaml distribution instead of using camlp4.
>
> There are now thousands of OCaml programmers dying for trivial additions like
> this,

As far as I know, nobody has died yet.

> many of whom would contribute if they could and a single forked dialect
> would improve in practical terms much faster than the current OCaml is.

That's only a theory. You sound like OCaml is the worst language on the
planet, while many people here think that it's the best.

Adding gazillions of extensions to the core language could just kill it.
I'm sure you know that.


Back to work...


Martin

_______________________________________________

Edgar Friendly

unread,
Nov 15, 2007, 12:30:51 PM11/15/07
to Pierre Weis, caml-list, Alain Frisch
Okay, let's see if I can summarize:

Private types have use because you can expose your implementation while
still having control over construction of values. This is important for
implementing quotient structures.

After reading everything about quotient types and the need for private
types, I have to ask "why not just completely abstract the type"? What
you seem to want from private types, you seem to gain pretty easily
through abstract types.

E.

Fernando Alegre

unread,
Nov 15, 2007, 3:29:29 PM11/15/07
to Edgar Friendly, caml-list, Pierre Weis, Alain Frisch

The main problem I have with abstract types is that they are heavyweight
since they need to be defined inside modules. In that particular, the
proposed private types are also heavyweight.

I would like to have private types be some kind of lightweight abstract
types, as illustrated in the following example with two different
injections and two different projections between integers and even
numbers. Note that no modules are needed. The only way to get into or out
of a private type would be by explicit coercion.

Private types defined this way would be sound, as expressions like
"half 10" would be forbidden by the compiler. The right expression
would be "half (10 :> even)". A programmer may break the semantics of
"even" by writing things such as "(11 :> even)", but he better know
what he is doing. The current abstract types can be used to make a private
type abstract too if that safety is needed.

type even = private int

let half (x:even) = (x :> int)/2
val half : even -> int

let int_of_even (x:even) = (x :> int)
val int_of_even : even -> int

let dbl (x:int) = (2*x :> even)
val dbl : int -> even

let even_of_int (x:int) =
if x mod 2 = 0 then (x :> even)
else invalid_arg "even_of_int"
val even_of_int : int -> even

There is another less natural way to encode even numbers that enforces
automatically the semantics: even number 2*n is encoded as integer n.
Then, the example becomes:

type even = private int

let half (x:even) = (x :> int)
let int_of_even (x:even) = 2*(x :> int)

let dbl (x:int) = (x :> even)
let even_of_int (x:int) =
let e = x/2 in
if 2*e = x then (e :> even)
else invalid_arg "even_of_int"

However, this kind of semantics-preserving encoding is just a matter of good
luck, and I don't think it can be generalized to many cases, at least
statically. You could always have a run-time check, but that misses the
point. Although, I don't see a way to avoid the run-time check in the
function even_of_int...

In summary, in my opinion private types should be completely orthogonal
to abstract types, so that both can be used either together or
separately.

Thanks,
Fernando

Michaël Le Barbier

unread,
Nov 15, 2007, 5:19:30 PM11/15/07
to Pierre Weis, caml-list, Alain Frisch
This long article is very interesting and pedagogic to the novice
programmer I am. The `int_of_peano' example is quite enlightening for
me. However, I must confess I was confused by the use of
`injection'/`projection', especially since the `canonical injection
S -> S/R' is usually not injective! The name `canonical surjection'
would suit it better, would'nt it? I am curious to know why this
strange terminology is used.
--
Regards,
Michaël LB

Michaël Le Barbier

unread,
Nov 15, 2007, 5:33:03 PM11/15/07
to Edgar Friendly, caml-list, Pierre Weis, Alain Frisch
Edgar Friendly <thele...@gmail.com> writes:

> After reading everything about quotient types and the need for private
> types, I have to ask "why not just completely abstract the type"? What
> you seem to want from private types, you seem to gain pretty easily
> through abstract types.

I think you have overlooked the following example in Pierre's article:

- Projection is automatic
# let rec int_of_peano = function
| Zero -> 0
| Succ n -> 1 + int_of_peano n
| Plus (n, p) -> int_of_peano n + int_of_peano p
;;
val int_of_peano : M.peano -> int = <fun>
# int_of_peano three;;
- : int = 3

Since we have access to the representation of the value we can filter
values (just like in the example) and since we are unable to create
ill-shaped values, we can assume that good invariants are true in the
values we are examining.

If we want to do the same with abstract data types, we would have
two types involved instead of one:

* the abstract type peano already described;
* a second type `good_representation' and an explicit application
`projection: peano -> good_representation'

Let's say that sets manipulated through the `Set' module are
represented by balanced trees. If the type set had been private
instead of abstract, we could investigate the representation of a
value of the type but could not produce an ill-formed value (an
`unbalenced tree').

(I also may be totally wrong :) )
--
Cheers,
Michaël

Yaron Minsky

unread,
Nov 15, 2007, 7:30:59 PM11/15/07
to Pierre Weis, caml-list, Alain Frisch
Most of what you said about quotient types made sense to me, but I must
admit to being deeply confused about the nature of the change to the
language. Consider the following trivial module and two possible
interfaces.

module Int = struct
type t = int
let of_int x = x
let to_int x = x
end

module type Abs_int = sig
type t


val of_int : int -> t
val to_int : t -> int

end

module type Priv_int = sig
type t = private int


val of_int : int -> t
val to_int : t -> int

end

So, what is the difference between (Int : Abs_int) and (Int : Priv_int)?
For instance, can I write (Priv_int.of_int 3) + (Priv_int.of_int 5)? Can
you point to anything concrete I can do with the private version that I
can't do with the abstract version? Is there any expression that is legal
for one but not for the other?

y

> INRIA Rocquencourt, http://bat8.inria.fr/~weis/<http://bat8.inria.fr/%7Eweis/>

Brian Hurt

unread,
Nov 15, 2007, 7:31:55 PM11/15/07
to Fernando Alegre, caml-list

On Thu, 15 Nov 2007, Fernando Alegre wrote:

>
> The main problem I have with abstract types is that they are heavyweight
> since they need to be defined inside modules. In that particular, the
> proposed private types are also heavyweight.

I don't see modules as being that heavyweight. Absent functors, they're
just namespaces. And Ocaml's cross-module inlining eliminates even most
of that overhead- identity conversions are generally inlined and become
no-ops. Even with functors, the overhead is small, approximately that of
calling a virtual function in C++ or Java.

Just my $0.02.

Brian

Christophe TROESTLER

unread,
Nov 15, 2007, 7:47:02 PM11/15/07
to caml...@yquem.inria.fr
Hi,

On Thu, 15 Nov 2007 14:26:49 +0100, Pierre Weis wrote:
>
> we define Z (the set of relative integer numbers) as a quotient of NxN),

I believe Z is generally defined as a quotient of the disjoint union
(borrowed as variants in programming languages) of N and N
(identifying the two 0). Of course you can also define it as a
quotient of N×N by the smaller equivalent relation "~" s.t. (n,m) ~
(n+1,m+1) and it is quite nice as addition is "inherited" from the one
on N×N but I am not sure it is the way it is usually done.

> In the previous definition of quotient structures, there is a
> careful distinction between the base set S and the quotient set
> S/R. In fact, there always exists a canonical injection from S to S/R,

Sorry for the nit-picking (I'm a mathematician, you know how we
are :) ), but unless R is diagonal, the function S -> S/R is not
injective (since we identify R-equivalent elements) but it is
surjective.

> We understand why the mathematicians always write after having designed a
> quotient structure: ``thanks to this isomorphism, and if no confusion may
> arise, we always assimilate S to its canonical injection in S/R''.

By the above remark, S can be identified to a subset of S'/R for some
S' but definitely not (of) S/R.

> (...)


> - the canonical projection from S/R to S is automatic.

> (...)


> More exactly, the canonical injection from S to S/R maps each element of S to
> its equivalent class in S/R; if we assimilate each equivalence class to its
> canonical representant (an element of S), the canonical injection maps each
> element of S to the canonical representant of its equivalence class. Hence
> the canonical injection has type S -> S.

This is rather the projection (and generally not an injection) as,
mathematically, p : S -> S is a projection means p(p x) = p x, forall x.
Moreover, in the abstract, there is no canonical representative of an
equivalent class -- that depends on the situation at hand.

Given your example (deleted), I suppose what you mean is that the
projection S -> S (representing S -> S/R) has to be written (since one
has to actually choose a representative) while the injection S/R -> S
comes for free or up to a coerecion (since we identified S/R to a
subset of S).

> What about private abbreviations ? (...)


> If not implicit, the injection function should granted to be the
> identity function (something that we would get for free, if we allow
> projection via sub-typing coercion).

I second the use of a subtyping coercion for that (after all, it would
be really annoying to have to use a private type abbreviation from a
module "forgetting" to provide a way out :) ).

> The moca compiler (see http://moca.inria.fr/) helps you to write the
> canonical injection by generating one for you, provided you can express the
> injection at hand via a set of predefined algebraic relations (and/or rewrite

May you tell a bit more ? That looks quite interesting ! Since there
is nothing canonical about the injection S/R -> S, what does moca do
if several representatives are possible (e.g. pick one by preferring
left associativity to the right one?). (I saw there is some paper on
the page but I have no time to dig through it right now.)

Best regards,
ChriS

Martin Jambon

unread,
Nov 15, 2007, 8:52:11 PM11/15/07
to Yaron Minsky, caml-list, Pierre Weis, Alain Frisch
On Thu, 15 Nov 2007, Yaron Minsky wrote:

> Most of what you said about quotient types made sense to me, but I must
> admit to being deeply confused about the nature of the change to the
> language. Consider the following trivial module and two possible
> interfaces.
>
> module Int = struct
> type t = int
> let of_int x = x
> let to_int x = x
> end
>
> module type Abs_int = sig
> type t
> val of_int : int -> t
> val to_int : t -> int
> end
>
> module type Priv_int = sig
> type t = private int
> val of_int : int -> t
> val to_int : t -> int
> end
>
> So, what is the difference between (Int : Abs_int) and (Int : Priv_int)?

Just like for other constructors of concrete types, "private" makes them
read-only, i.e. you can use them only in pattern-matching.

You can write
match (x : Priv_int.t) with 0 -> true | _ -> false

But not
(x : Priv_int.t) = 0


> For instance, can I write (Priv_int.of_int 3) + (Priv_int.of_int 5)?

No, Priv_int would have to define its own Priv_int.(+) function.

> Can
> you point to anything concrete I can do with the private version that I
> can't do with the abstract version? Is there any expression that is legal
> for one but not for the other?

You can do some pattern-matching over private versions of ints, strings,
chars, arrays, builtin lists, polymorphic variants, in addition to records
and variants, without the risk of passing such values to the wrong
function (if such wrong function expects a real "int" for instance).

You can't do any useful pattern matching with abstract types.


Martin

_______________________________________________

Andrej Bauer

unread,
Nov 16, 2007, 3:23:29 AM11/16/07
to Christophe TROESTLER, Caml
Christophe TROESTLER wrote:
> I believe Z is generally defined as a quotient of the disjoint union
> (borrowed as variants in programming languages) of N and N
> (identifying the two 0). Of course you can also define it as a
> quotient of N×N by the smaller equivalent relation "~" s.t. (n,m) ~
> (n+1,m+1) and it is quite nice as addition is "inherited" from the one
> on N×N but I am not sure it is the way it is usually done.

As another mathematician, also prone to nitpicking, I beg to differ. It
is more natural to define Z as a quotient of N x N, because this is just
a special case of a general construction of a (commutative) ring out of
a (commutative) rig.

A rig is a structure with 0, + and * but no subtraction.

A ring is a structure with 0, +, - and *.

(Of course you need to imagine suitable axioms for 0, +, * such as
commutativity, associativity and distributivity, but I am not writing
them down here.)

From every rig M we can construct a ring R as a quotient (M x M)/~ of M
x M where we define

(a,b) ~ (c,d) iff a+d = b+c

Think of (a,b) as the "non-existent" difference a-b. This construction
is natural in the technical sense that it is left adjoint to the
forgetful functor(*) from rings to rigs. (In non-mathematician language:
if we already have + and *, there is a _best_ way of getting - as well.)

Caveat: (*) the use of "functor" here was as in mathematics, not as in
ocaml.

In what way is natural, or canonical, the construction of Z defined by
sticking together two copies of N?

To add some value to this post, other than mathematical willy waving,
let me pose a puzzle. Implement the above in Ocaml, i.e., start with:

module type RIG =
sig
type t
val zero : t
val add : t -> t -> t
val mul : t -> t -> t
end

module type RING =
sig
include RIG
val sub : t -> t -> t
end

The puzzle is this: write down the module type for the construction
taking a rig and returning the free ring over it. It is _not_ just

module type ConstructRing = functor (M : RIG) -> RING

because that would allow _any_ ring whatsoever as the result. It has to
be the ring that comes from M via above construction, so there has to be
extra stuff, such as an embedding of M into the resulting R. But what else?

Andrej

Jean-Christophe Filliâtre

unread,
Nov 16, 2007, 3:59:03 AM11/16/07
to Andrej...@andrej.com, Christophe TROESTLER, Caml
Andrej Bauer a écrit :

> The puzzle is this: write down the module type for the construction
> taking a rig and returning the free ring over it. It is _not_ just
>
> module type ConstructRing = functor (M : RIG) -> RING
>
> because that would allow _any_ ring whatsoever as the result. It has to
> be the ring that comes from M via above construction, so there has to be
> extra stuff, such as an embedding of M into the resulting R. But what else?

Despite the embedding you're mentioning, which could be added as follows

module type ConstructRing =
functor (M : RIG) -> sig include RING val embed : M.t -> t end

I don't see what else you could add to the signature (unless you want to
expose that the type t of the RING is a pair of type M.t * M.t; and even
with that you couldn't expose the definitions of the operations in the
signature).

All the properties you're expecting for the resulting ring are part of
(logical) specifications, which cannot be expressed via ocaml types.
You could do that in a richer type system (e.g. Coq).

But may be I misunderstood your question...

--
Jean-Christophe

Andrej Bauer

unread,
Nov 16, 2007, 4:13:35 AM11/16/07
to Jean-Christophe Filliâtre, Caml
Jean-Christophe Filliâtre wrote:
> Despite the embedding you're mentioning, which could be added as follows
>
> module type ConstructRing =
> functor (M : RIG) -> sig include RING val embed : M.t -> t end
>
> I don't see what else you could add to the signature (unless you want to
> expose that the type t of the RING is a pair of type M.t * M.t; and even
> with that you couldn't expose the definitions of the operations in the
> signature).
>
> All the properties you're expecting for the resulting ring are part of
> (logical) specifications, which cannot be expressed via ocaml types.
> You could do that in a richer type system (e.g. Coq).

Not quite. The universal property of the resulting ring R is this:
for any ring T, if f : M -> T is a function preserving 0, + and * then
there is a unique function g : R -> T preserving 0, +, * and - such that
g (include x) = f x, where include is as above. This needs to be
accounted for, and it's not just a logical specification.

Andrej

Alain Frisch

unread,
Nov 16, 2007, 4:24:28 AM11/16/07
to caml-list
Martin Jambon wrote:
> You can write
> match (x : Priv_int.t) with 0 -> true | _ -> false

Actually, you cannot do that, at least with private types as implemented
in OCaml's CVS. And this is to be expected given the lack of implicit
subsumption in OCaml. If you were able to do such a thing, what type
schema would you give to:

let f = function 0 -> true | _ -> false

?

This function should work on integers as well as on values of type
Priv_int.t, but the type algebra cannot express that.


My understanding (Pierre will correct me if I'm wrong) is that private
type abbreviations, as they are currently implemented, can always be
replaced by abstract types without turning a well-typed program into an
ill-typed one. Doing so will prevent some type-based optimizations to
happen, though.

But the really interesting thing is that the new feature opens the door
to extending the subtyping operator :> to take into account the natural
(identity) injection from a private abbreviation to the underlying type.
This is especially useful when the value of the private type appears
deeply nested in a structure. With a normal function that implements the
injection, you need to lift it to the whole structure which forces
useless copies (and worse: the manual lifting may not be possible if
some module declares a covariant type without a corresponding map
function).

For instance:

(l : (Priv_int.t * Priv_int.t) list :> (int * int) list)

instead of

List.map (fun (x, y) -> (Priv_int.to_int x, Priv_int.to_int y)) l

-- Alain

Christophe TROESTLER

unread,
Nov 16, 2007, 4:49:35 AM11/16/07
to OCaml Mailing List, Andrej...@fmf.uni-lj.si, Andrej...@andrej.com
On Fri, 16 Nov 2007 09:23:28 +0100, Andrej Bauer wrote:
>
> This construction is natural in the technical sense that it is left
> adjoint to the forgetful functor(*) from rings to rigs.

OK.

> In what way is natural, or canonical, the construction of Z defined
> by sticking together two copies of N?

I do not think there is something natural about this construction in
the way you outlined it above. It's just ad hoc for this case.

> The universal property of the resulting ring R is this: for any ring
> T, if f : M -> T is a function preserving 0, + and * then there is a
> unique function g : R -> T preserving 0, +, * and - such that g
> (include x) = f x, where include is as above. This needs to be
> accounted for, and it's not just a logical specification.

Would this fit your bill?

module type To_Ring = functor (M: RIG) -> sig


include RING
val embed : M.t -> t

module F : functor (T:RING) -> sig
val lift : (M.t -> T.t) -> (t -> T.t)
end
end

Regards,
ChriS

ross...@ps.uni-sb.de

unread,
Nov 16, 2007, 9:17:38 AM11/16/07
to caml-list
"Alain Frisch" <al...@frisch.fr>:

>
> But the really interesting thing is that the new feature opens the door
> to extending the subtyping operator :> to take into account the natural
> (identity) injection from a private abbreviation to the underlying type.

Indeed. The first thing I was wondering when reading Pierre's description
was how that makes

type t = private int

any different from a bounded abstraction

type t <: int

With OCaml's explicit subtyping coercions, the former seems to be subsumed
by the latter.

- Andreas