[Caml-list] GADT constructor syntax

Jacques Le Normand

unread,

Dec 4, 2010, 2:25:09 PM12/4/10

to caml-list caml-list

Dear caml-list,
I would like to start a constructive discussion on the syntax of GADT
constructors of the ocaml gadt branch, which can be found at:

https://sites.google.com/site/ocamlgadt/

There are two separate issues:

1) general constructor form

option a)

type _ t =
TrueLit : bool t
| IntLit of int : int lit

option b)

type _ t =
TrueLit : bool t
| IntLit : int -> int lit

I'm open to other options. The branch has used option b) from the
start, but I've just switched to option a) to see what it's like

Personal opinion:

I slightly prefer option b), because it makes it clear that it's a
gadt constructor right from the start. This is useful because the type
variables in gadt constructors are independent of the type parameters
of the type, consider:

type 'a t = Foo of 'a : 'b t

this, counter intuitively, creates a constructor Foo of type forall 'd
'e. 'd t -> 'e t.

2) explicit quantification of existential variables

option a)

leave existential variables implicitly quantified. For example:

type _ u = Bar of 'a t : u
or
type _ u = Bar : 'a t -> u

option b)

specifically quantify existential variables. For example:

type _ u = Bar of 'a. 'a t : u
or
type _ u = Bar : 'a. 'a t -> u

Currently, the branch uses option a).

Personal opinion: I prefer option b). This is for four reasons:

I) the scope of the explicitly quantified variable is not clear. For
example, how do you interpret:

type _ u = Bar of 'a. 'a : 'a t
or
type _ u = Bar : 'a. 'a -> 'a t

In one interpretation bar has type forall 'a 'b. 'a -> 'b t and in
another interpretation it has type forall 'a. 'a -> 'a t. My
inclination would be to flag it as an error.

II)

In the example of option b), the 'a variable is quantified as a
universal variable but, in patterns, it is used as an existential
variable. This is something I found very confusing in Haskell where
they actually use the 'forall' keyword.

III) option a) is the current Haskell GADT syntax and I've never heard
anyone complain about it

IIII) I don't see how option b) improves either readability or bug prevention

I look forward to hearing your opinions.

--Jacques Le Normand

_______________________________________________
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs

gasche

unread,

Dec 4, 2010, 2:37:07 PM12/4/10

to Jacques Le Normand, caml-list caml-list

I prefer option a) for the constructor form, and b) for the quantification
syntax. I'm strongly sure that the explicit quantification is a much better
choice, while I think the first choice (constructor form) is more of a
matter of taste.

I think the constructor form using "of" is better as the general function
syntax suggests currying is possible (| Foo : a -> b -> c), while it is not
possible for constructors (or you maybe it is, and then should support it).
"of foo * bar * baz" is not perfect (I prefer the revised syntax "of foo and
bar and baz"), but at least it's a consistent defect we're used to.

Basile Starynkevitch

unread,

Dec 4, 2010, 2:39:53 PM12/4/10

to Jacques Le Normand, caml-list caml-list

On Sat, 4 Dec 2010 14:25:01 -0500
Jacques Le Normand <rathe...@gmail.com> wrote:

> Dear caml-list,
> I would like to start a constructive discussion on the syntax of GADT
> constructors of the ocaml gadt branch, which can be found at:
>
> https://sites.google.com/site/ocamlgadt/
>
> There are two separate issues:
>
> 1) general constructor form
>
> option a)
>
> type _ t =
> TrueLit : bool t
> | IntLit of int : int lit
>
> option b)
>
> type _ t =
> TrueLit : bool t
> | IntLit : int -> int lit
>

I would believe that we need more examples to choose the most readable
or natural syntax. Because both options have their valid point.

Maybe you might
1. provide a more convincing example, e.g. a 20 lines tutorial example,
and give it in both syntaxes. This let people choose really...

2. choose one syntax for the ocamlyacc (or menhir) based parser, and
provide an Ocamlp4 (or is it Ocamlp5) extension for the other syntax.

3. support both syntaxes at once, and have the user choose what fits him best.

Cheers.
--
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basile<at>starynkevitch<dot>net mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mine, sont seulement les miennes} ***

Lukasz Stafiniak

unread,

Dec 4, 2010, 2:42:07 PM12/4/10

to Jacques Le Normand, caml-list caml-list

Hi! Just a quick answer for now.

On Sat, Dec 4, 2010 at 8:25 PM, Jacques Le Normand <rathe...@gmail.com> wrote:
> Dear caml-list,
> I would like to start a constructive discussion on the syntax of GADT
> constructors of the ocaml gadt branch, which can be found at:
>
> https://sites.google.com/site/ocamlgadt/
>
> There are two separate issues:
>
> 1) general constructor form
>
> option a)
>
> type _ t =
> TrueLit : bool t
> | IntLit of int : int lit
>
> option b)
>
> type _ t =
> TrueLit : bool t
> | IntLit : int -> int lit
>
> I'm open to other options. The branch has used option b) from the
> start, but I've just switched to option a) to see what it's like

I like option (a) for consistency with the existing OCaml syntax, and
while I like option (b) for its conformance to standard notation, I
don't like your reasons for liking (b) ;-)

> I slightly prefer option b), because it makes it clear that it's a
> gadt constructor right from the start. This is useful because the type
> variables in gadt constructors are independent of the type parameters
> of the type, consider:
>
> type 'a t = Foo of 'a : 'b t
>
> this, counter intuitively, creates a constructor Foo of type forall 'd
> 'e. 'd t -> 'e t.

I think that the scope should propagate, i.e. that somehow the 'a
should really be bound, giving

Foo : forall 'a . 'a t -> 'a t

> 2) explicit quantification of existential variables

I don't even like the problem formulation. I think that existential
variables should not be differentiated from universal variables. (So I
think I like what you don't like about the Haskell solution).

>
> option a)
>
> leave existential variables implicitly quantified. For example:
>
> type _ u = Bar of 'a t : u
> or
> type _ u = Bar : 'a t -> u
>
> option b)
>
> specifically quantify existential variables. For example:
>
> type _ u = Bar of 'a. 'a t : u
> or
> type _ u = Bar : 'a. 'a t -> u
>
> Currently, the branch uses option a).

For me, it is a question for _all_ variables whether be implicitly or
explicitly quantified...

> I) the scope of the explicitly quantified variable is not clear. For
> example, how do you interpret:
>
> type _ u = Bar of 'a. 'a : 'a t
> or
> type _ u = Bar : 'a. 'a -> 'a t
>
> In one interpretation bar has type forall 'a 'b. 'a -> 'b t and in
> another interpretation it has type forall 'a. 'a -> 'a t.

Of course the "forall 'a. 'a -> 'a t" as far as I'm concerned!

> II)
>
> In the example of option b), the 'a variable is quantified as a
> universal variable but, in patterns, it is used as an existential
> variable. This is something I found very confusing in Haskell where
> they actually use the 'forall' keyword.

It often happens in logic! You have two sides of the turnstyle...

I'm sorry if I sounded harsh, not my intention!

Best Regards.

Jacques Le Normand

unread,

Dec 4, 2010, 3:14:57 PM12/4/10

to Lukasz Stafiniak, caml-list caml-list

see below

having

type 'a t = Foo of 'a : 'b t

creating a constructor of type forall 'a. 'a -> 'a t is really
confusing since the user explicitly gave the return type of the
constructor as 'b t.

Jacques Le Normand

unread,

Dec 4, 2010, 3:54:26 PM12/4/10

to Lukasz Stafiniak, caml-list caml-list

you may want to name your type parameters because you can mix GADT
constructors with normal constructors. consider:

type 'a term =
| Ignore of 'a term : int term
| Embed of 'a

I don't think propagating type parameters is a good idea. The current
rule is: ignore type parameters in GADT constructors. This is simpler
than unifying the type parameters with the arguments of the return
type of the constructor.

On Sat, Dec 4, 2010 at 3:22 PM, Lukasz Stafiniak <luks...@gmail.com> wrote:

> On Sat, Dec 4, 2010 at 9:14 PM, Jacques Le Normand <rathe...@gmail.com> wrote:
>>
>> having
>>
>> type 'a t = Foo of 'a : 'b t
>>
>> creating a constructor of type forall 'a. 'a -> 'a t is really
>> confusing since the user explicitly gave the return type of the
>> constructor as 'b t.
>

> But why would the user have named 'a in the first place then. In the
> context of Foo, it should be assumed that
>
> 'a t = 'b t

Lukasz Stafiniak

unread,

Dec 4, 2010, 4:00:53 PM12/4/10

to Jacques Le Normand, caml-list caml-list

On Sat, Dec 4, 2010 at 9:54 PM, Jacques Le Normand <rathe...@gmail.com> wrote:
> you may want to name your type parameters because you can mix GADT
> constructors with normal constructors. consider:
>
> type 'a term =
> | Ignore of 'a term : int term
> | Embed of 'a
>
> I don't think propagating type parameters is a good idea. The current
> rule is: ignore type parameters in GADT constructors. This is simpler
> than unifying the type parameters with the arguments of the return
> type of the constructor.

But it expands to:

> type 'a term =
> | Ignore of 'a term : int term

> | Embed of 'a : 'a term

Jacques Le Normand

unread,

Dec 4, 2010, 4:07:02 PM12/4/10

to Lukasz Stafiniak, caml-list caml-list

in this case, yes, but if you have constraints then it is different. consider:

type 'a term =
| Ignore of 'a : int term
| Embed of 'a
constraint 'a = int

this is different from:

type 'a term =
| Ignore of 'a : int term

| Embed of 'a : 'a term

constraint 'a = int

(in fact, an error is flagged on the second one)

Also, it can save the user some typing.

Lukasz Stafiniak

unread,

Dec 5, 2010, 3:10:31 AM12/5/10

to Jacques Le Normand, caml-list caml-list

On Sat, Dec 4, 2010 at 10:06 PM, Jacques Le Normand
<rathe...@gmail.com> wrote:
> in this case, yes, but if you have constraints then it is different. consider:
>
> type 'a term =
> | Ignore of 'a : int term
> | Embed of 'a
> constraint 'a = int
>
> this is different from:
>
> type 'a term =
> | Ignore of 'a : int term
> | Embed of 'a : 'a term
> constraint 'a = int
>
> (in fact, an error is flagged on the second one)
>
> Also, it can save the user some typing.

I don't think that it is very different. Standard-language constraints
are outside of implications, so they apply to each branch. It already
behaves this way in the standard language, basically restricting the
type family to "int term". So I don't see why

> | Embed of 'a : 'a term
> constraint 'a = int

shouldn't mean that Embed is "basically" Embed of int : int term. My
position is that if a type variable should be treated as local to a
branch, it should be explicitly quantified (using the dot notation,
for example "Ignore of 'a. 'a : int term", without any exceptions to
"existential" variables). And that the patterns

type X =
| Branch of Y

and

type X =
| Branch of Y : X

should be equivalent (where X could for example be 'a term).

I would accept concessions to my general outlook on the grounds of
being conservative over standard OCaml programmer intuitions and
conciseness of code... when they apply...

Lukasz Stafiniak

unread,

Dec 5, 2010, 3:17:20 AM12/5/10

to Jacques Le Normand, caml-list caml-list

On Sun, Dec 5, 2010 at 9:10 AM, Lukasz Stafiniak <luks...@gmail.com> wrote:
>
> I don't think that it is very different. Standard-language constraints
> are outside of implications, so they apply to each branch. It already
> behaves this way in the standard language, basically restricting the
> type family to "int term". So I don't see why
>
>> | Embed of 'a : 'a term
>> constraint 'a = int
>
> shouldn't mean that Embed is "basically" Embed of int : int term.

[To clarify: and the whole family restricted to "int term".]

Lukasz Stafiniak

unread,

Dec 5, 2010, 3:26:01 AM12/5/10

to Jacques Le Normand, caml-list caml-list

On Sun, Dec 5, 2010 at 9:10 AM, Lukasz Stafiniak <luks...@gmail.com> wrote:
>

> shouldn't mean that Embed is "basically" Embed of int : int term. My
> position is that if a type variable should be treated as local to a
> branch, it should be explicitly quantified (using the dot notation,
> for example "Ignore of 'a. 'a : int term", without any exceptions to
> "existential" variables). And that the patterns
>
> type X =
> | Branch of Y
>
> and
>
> type X =
> | Branch of Y : X
>
> should be equivalent (where X could for example be 'a term).

To clarify again: modulo variable capture! That is, the pattern

> type X =
> | Branch of 'a. Y : X

will be different (unless 'a is not used in Y nor X).

bluestorm

unread,

Dec 5, 2010, 3:35:42 AM12/5/10

to Lukasz Stafiniak, caml-list caml-list

I'm not sure I see the point of this long discussion between Lukasz and
Jacques.

It seems everybody agree that it is a good idea to explicitely quantify the
constructor-specific type variables.

The question Jacques raised is wether, when we write (| Foo of 'a . S : T)
or (| Foo : 'a . S -> T), the 'a is quantified on S only, or on both S and
T.

It think we all agree that, for semantical reasons, quantification should be
scoped over both S and T. However, the (of S : T) syntax does not make it
very obvious, and this should be rightfully considered as a defect of this
syntax.

I'm under the impression that your intense debate is about the edge case
where :
1. we don't use explicit quantification of constructor-specific variables
2. we reuse the type parameter variables in the type of a GADT constructor
(so they're implicitly shadowed by the implicit quantifications, or maybe
not)

If we reject possibly 1. (and ask for explicit quantification), all is well.
If you want to consider option 1., then I think the edge case 2. is evil and
shoud result in a warning.

Jacques Garrigue

unread,

Dec 5, 2010, 7:22:09 PM12/5/10

to bluestorm, caml...@yquem.inria.fr

On 2010/12/05, at 17:35, bluestorm wrote:

> I'm not sure I see the point of this long discussion between Lukasz and Jacques.
>
> It seems everybody agree that it is a good idea to explicitely quantify the constructor-specific type variables.
>
> The question Jacques raised is wether, when we write (| Foo of 'a . S : T) or (| Foo : 'a . S -> T), the 'a is quantified on S only, or on both S and T.
>
> It think we all agree that, for semantical reasons, quantification should be scoped over both S and T. However, the (of S : T) syntax does not make it very obvious, and this should be rightfully considered as a defect of this syntax.
>
> I'm under the impression that your intense debate is about the edge case where :
> 1. we don't use explicit quantification of constructor-specific variables
> 2. we reuse the type parameter variables in the type of a GADT constructor (so they're implicitly shadowed by the implicit quantifications, or maybe not)
>
> If we reject possibly 1. (and ask for explicit quantification), all is well. If you want to consider option 1., then I think the edge case 2. is evil and shoud result in a warning.

Actually I'm not sure that fully explicit quantification is necessary, or even desirable.
The reason is that the gadt extension introduces actually two kinds of case-specific
type variables: universals and existentials.

Namely, in the type

type _ term =
Int : int -> int term
| Bool : bool -> bool term
| Lam : ('a -> 'b) -> ('a -> 'b) term
| App : ('a -> 'b) * 'a -> 'b term

the type variables 'a and 'b in Lam are universals, but
in App only 'b is universal, while 'a is existential.

Personally, I would prefer this to be written:

type _ term =
Int of int : int term
| Bool of bool : bool term
| Lam of ('a -> 'b) : ('a -> 'b) term
| App of 'a. ('a -> 'b) * 'a : 'b term

That is, use a syntax close to the current ocaml one, which makes
easy to quantify clearly existential variables, so that they cannot
be confused with universal ones. If we use the arrow syntax,
the natural scope for quantification includes the return type, which
is incorrect for existential types. And using the "of" syntax, it is hard
to quantify type variables appearing in the return type, so I think
this is better to leave the universals implicit.

Considering the definition

type 'a t = Foo of 'a : 'b t

I think that is should either unify 'a and 'b, being in practice
equivalent to "type 'a t = Foo of 'a", or flag an error because 'a is not
available in this branch. I might actually prefer the unification approach,
because this conforms to the intuition that the scope of 'a is the whole
type definition, but maybe people can come to understand that an
explicit return type overrides this.

Jacques Garrigue

Daniel de Rauglaudre

unread,

Dec 6, 2010, 4:11:06 AM12/6/10

to caml...@yquem.inria.fr

Hi,

On Sat, Dec 04, 2010 at 08:25:07PM +0100, Jacques Le Normand wrote:

> option a)
> type _ t =
> TrueLit : bool t
> | IntLit of int : int lit
> option b)
> type _ t =
> TrueLit : bool t
> | IntLit : int -> int lit

For a:
A little bit easier to parse (as a Camlp5 designer), which is just
a parse of the 'of' part followed (or not) with the ':' and another
type. In the option b), in the "revised" syntax version, I decided
to also separate the constructor parameters with arrows, which forced
me to add a function separating the ending type, the parsing being
therefore more complicated.

For b:
1/ The difference between normal constructors and constructors with GADT
is very visible. All examples given here are often with definitions
which are relatively short, but I tried an example with constructors
with several long types and I like the idea of seing immediately
that they are GADTs, rather at the end of the definition line (or
lines).
2/ I like the idea of defininig them with a syntax like the one of
functions definitions, like 'val'. In the "revised" syntax version,
where the constructors parameters are in "curried" form, this is
even more readable (even if partially applied parameters are
not accepted in the OCaml compiler).

PS The latest version of Camlp5 works with Jacques' version. You can
download it at: http://pauillac.inria.fr/~ddr/camlp5/

PPS The version with GADT is very interesting to parse parsing rules. I
is what I was searching for many years. In Camlp[45], the EXTEND statement
generates calls to Obj.magic and types constraints to "extend" the
OCaml type system. (This cannot be simulated by first class modules.)

--
Daniel de Rauglaudre
http://pauillac.inria.fr/~ddr/