[Caml-list] Subtyping structurally-equivalent records, or something like it?

Anthony Tavener

unread,

May 1, 2010, 12:04:28 PM5/1/10

to caml...@yquem.inria.fr

I have this:

type kinematic = { lin: Vec.t; ang: Vec.t }

Which I've been using to represent a medley of physical attributes (force,
momentum, velocity, etc.).

As the physics code becomes increasingly substantial I'm running into
possible human-error, like passing a momentum where a force is expected, or
a mass instead of inverse-mass (mass is also a record though different, but
inv-mass has the same structure as mass). So I'd like to make distinct
types, such as:

type position = { r: Vec.t; theta: Vec.t }
type acceleration = { a: Vec.t; alpha: Vec.t }
type force = { f: Vec.t; tau: Vec.t }

They are structurally equivalent, and ideally I'd like to be able to treat
these as 'kinematic' too, since that is how I would express the arithmetic
and common functions on these types (add, mul, etc).

I'm sure I've seen posts on this before but I can't find them now (though
what I remember are questions about having distinct 'float' types, such as
for degrees vs radians, rather than records).

I know OCaml doesn't have subtypes for records, which is effectively what
I'm looking for. Though this case is a bit more specialized that that... all
the subtypes and base type are structurally equivalent. Code for one of
these types would technically work on any... but is there a way to inform
the type system of my intentions?

I hope someone has a better option than those I've considered, or that I
have a grave misunderstanding somewhere and one of these options is actually
good:

1. Objects. Subtyping makes these a natural fit, but in this case I don't
need anything else which objects provide, and a certainly don't need the
poor performance or method-calling mixed in with my computational code
(aesthetically... yucky, to me). Again, each type is structurally
equivalent. Just some functions want certain types.

2. Using distinct records for each type, but no 'kinematic' base type, so
all common operations are duplicated for each new type. No performance hit.
But the redundant code is horrible -- makes extensions a pain, and a
potential bug-source.

2b. Same as above, but putting the common code in a functor which we apply
on all the different types. I think this will add some overhead, since the
signature of the types (below) would demand accessor functions for the
record fields, in order to uniformly get the fields from the different types
(again, even though they are structurally equivalent) -- these calls
probably wouldn't get optimized out. But maybe there is a better way to do
this?

module type KINEMATIC = sig
type t
val lin : t -> Vec.t
val ang : t -> Vec.t
end

3. Making all the other types different aliases of 'kinematic'; then using
explicit type declarations in function parameters and coercion to
'kinematic' when needed. This makes some ugly code, and the added-typesafety
is almost illusory. This is kind-of like 'typesafe' C code doing typecasting
gymnastics.

4. Adapter functions: 'kinematic_of_force: force -> kinematic', etc. as a
way to use the common set of 'kinematic' functions. This is clunky and comes
with a big performance hit unless these functions became like
type-coercions. If there is a way this can be done with zero runtime cost,
I'd accept the clunkiness. :)

Any thoughts?

I've been using OCaml for a few years now, but this is my first post. I feel
many of you are familiar online personae through reading archives, blogs,
and websites. Thank-you for all the help I've absorbed through those various
channels. And thanks to those making the language I favor for most tasks!

Briefly introducing myself: I've been a professional video-game developer
for 15 years, most recently specializing in AI. I quit my last job almost
two years ago to travel and program (95% in OCaml!), and am developing a
game now. I've seen indications over the years of other game developers
taking the plunge and then parting ways with OCaml, surely back to C++. I
see OCaml as viable and certainly more pleasurable, even with avoiding
mutation. But within a pressure-cooker environment (working for $$ from
someone else) people fall back on what they are most familiar with... also
you can't go too rogue while still being part of a team. :)

-Anthony Tavener

Stéphane Lescuyer

unread,

May 1, 2010, 12:51:24 PM5/1/10

to Anthony Tavener, caml...@yquem.inria.fr

Hi Anthony,
I think that maybe using phantom types could do the trick : consider
defining empty types for all the different "kinds" of similar
constructs that you have, and then define the kinematic record with a
phantom parameter type.

type position
type acceleration
type force

type 'a kinematic = {lin : Vec.t; ang: Vec.t}

By adding some extra typing annotations, you can then constraint your
functions to accept (or produce) only a given kind of construct, say
for example a position kinematic :

let pos_0 : position kinematic = {lin = Vec.origin; ang = Vec.origin }

let double_force (v : force kinematic) : force kinematic = {lin =
Vec.mult 2. v.lin; ang = v.ang }

double_force pos_0 -> does not type check

You can write generic functions as add, norm, products, etc : they are
just polymorphic with respect to the phantom type parameter. By the
way you ensure that you are not multiplying apples and carrots :
let plus (v : 'a kinematic) (v' : 'a kinematic) : 'a kinematic = ...

Of course, the overhead is that you have to explicitely write some
type annotations, especially for constructors, but the runtime
overhead is exactly 0. And also, one limitation is that you can't use
different projection names for the different cosntructs, since it is
really always the same record type that you are using.

I hope this helps,

St�phane L.

> _______________________________________________
> Caml-list mailing list. Subscription management:
> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
> Archives: http://caml.inria.fr
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
>
>

--
I'm the kind of guy that until it happens, I won't worry about it. -
R.H. RoY05, MVP06

_______________________________________________
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs

Dario Teixeira

unread,

May 1, 2010, 3:55:25 PM5/1/10

to caml...@yquem.inria.fr, Anthony Tavener

Hi,

> type kinematic = { lin: Vec.t; ang: Vec.t }
>
> Which I've been using to represent a medley of physical attributes (force, > momentum, velocity, etc.).

I second St�phane's suggestion of using phantom types; moreover,
I recommend you read an article that discusses them to some detail
and covers their use for precisely this sort of problem:
http://camltastic.blogspot.com/2008/05/phantom-types.html

Cheers,
Dario Teixeira

Sylvain Le Gall

unread,

May 1, 2010, 4:01:33 PM5/1/10

to caml...@inria.fr

On 01-05-2010, Dario Teixeira <dariot...@yahoo.com> wrote:
> Hi,
>
>> type kinematic = { lin: Vec.t; ang: Vec.t }
>>
>> Which I've been using to represent a medley of physical attributes (force, > momentum, velocity, etc.).
>

> I second Stéphane's suggestion of using phantom types; moreover,

> I recommend you read an article that discusses them to some detail
> and covers their use for precisely this sort of problem:
> http://camltastic.blogspot.com/2008/05/phantom-types.html
>

I really like the use of private type abbreviation for phantom type:
http://ocaml.janestreet.com/?q=node/77

Regards,
Sylvain Le Gall

Anthony Tavener

unread,

May 1, 2010, 10:59:55 PM5/1/10

to Stéphane Lescuyer, caml...@yquem.inria.fr

Wow! Thanks St�phane... that was a little piece of magic I was
hoping for. It's a bit verbose, but at least it doesn't affect
performance and it allows all the control over types I need.

I now see I didn't really grok phantom types whenever they were
mentioned. A bit of "in one ear and out the other". Now I have
better grasp on yet another feature of the language.

I know I read that one before, Dario... that title about NASA
is familiar! Too bad I didn't remember it was a solution to the
problem I was now having. Thank-you for that link -- an
excellent description of the problem and a nice solution.

Ah, great stuff... that looks like something to consider too,
Sylvain! There's plenty of material on phantom types but I never
made the connection that they were what I was looking for.

Thank-you everyone for the help and clear explanations! I'm
going to go play with this now. :)

Anthony Tavener

Sylvain Le Gall

unread,

May 4, 2010, 7:55:23 AM5/4/10

to caml...@inria.fr

On 04-05-2010, AUGER Cédric <Cedric...@lri.fr> wrote:
> I am not expert in Ocaml, is the following the same in terms
> of performances as the phantom types?
>
> type kinematic = ...
>
> type force = Force of kinematic
> type momentum = Moment of kinematic
> ...
>
> That is does the constructor introduce an overhead or not?
> As there is only one constructor, no overhead should be done in an
> optimized compiler.
>

The variants are represented using a block. If you introduce a single
variant, it will create a block that points to kinematic. E.g. "Force of
kinematic" will create a pointer to the kinematic structure.

Your construction of force and momentum will add a level of indirection
for every use of the kinematic structure.

Phantom type doesn't add a level of indirection and left no trace in the
generated assembler.

This is not about optimized compiler in this case but about data
representation. Even if you use an optimized compiler (which is not
really the case with ocamlopt), you won't change datastructure
representation to optimize.

ross...@mpi-sws.org

unread,

May 4, 2010, 8:47:12 AM5/4/10

to Sylvain Le Gall, caml...@inria.fr

"Sylvain Le Gall" <syl...@le-gall.net>:

>
> This is not about optimized compiler in this case but about data
> representation. Even if you use an optimized compiler (which is not
> really the case with ocamlopt), you won't change datastructure
> representation to optimize.

What do you mean? There is no reason in general why a compiler cannot
optimize data representations, and some do in cases like this.

/Andreas

Edgar Friendly

unread,

May 4, 2010, 10:29:12 AM5/4/10

to caml...@yquem.inria.fr

On 05/04/2010 07:53 AM, Sylvain Le Gall wrote:
> On 04-05-2010, AUGER Cédric<Cedric...@lri.fr> wrote:
>> type momentum = Moment of kinematic
>>

>> That is does the constructor introduce an overhead or not?
>> As there is only one constructor, no overhead should be done in an
>> optimized compiler.
>>

> This is not about optimized compiler in this case but about data
> representation. Even if you use an optimized compiler (which is not
> really the case with ocamlopt), you won't change datastructure
> representation to optimize.
>

The OCaml compiler *could* special-case this kind of constructor, but as
there's the syntax:

type momentum = kinematic

Which produces the non-boxed kinematic values, the authors probably
decided to follow the maxim "Do what the programmer says" for singleton
variant types. The question becomes whether phantom types solve this
problem sufficiently or do we need another type-level construct -
explicit subtyping relationships. Forever ago I suggested this to
achieve a similar goal, and was given yet another solution:

module M : sig
type momentum
val of_kin : kinematic -> momentum
val to_kin : momentum -> kinematic
end = struct
type momentum = kinematic
let of_kin x = x
let to_kin x = x
end

Yes, it's a lot of boilerplate for each type, but you only have to write
it once (per type), and cross-module inlining should give zero runtime
cost. If not, use "%identity", and expose it in the interface. This
method is along the lines of Anthony's proposal #4.

E.

Sylvain Le Gall

unread,

May 4, 2010, 10:31:03 AM5/4/10

to caml...@inria.fr

On 04-05-2010, ross...@mpi-sws.org <ross...@mpi-sws.org> wrote:
> "Sylvain Le Gall" <syl...@le-gall.net>:
>>
>> This is not about optimized compiler in this case but about data
>> representation. Even if you use an optimized compiler (which is not
>> really the case with ocamlopt), you won't change datastructure
>> representation to optimize.
>
> What do you mean? There is no reason in general why a compiler cannot
> optimize data representations, and some do in cases like this.
>

Anyway, if it comes to data alignement and things like that, the
compiler should optimize data representations. But in this case, I
really don't think we are talking about data alignement.

Regards,
Sylvain Le Gall

Fabrice Le Fessant

unread,

May 4, 2010, 11:13:03 AM5/4/10

to Sylvain Le Gall, caml...@inria.fr

"ocamlopt" does optimize data representation: for example, it can unbox
floats into registers, or into arrays of floats. However, there is a
tradeoff between such optimizations and efficiency: when you do too much
representation optimisation, you might end up performing a lot of tests
because a given type can be represented in multiple formats, and
performing a lot of transformations between the representations,
especially as the FFI (Ocaml interface to C) specifies the
representation of data.

Of course, I am not saying it is not possible to do better: it requires
a lot of energy, the compiler code becomes more complex, and the
improvement on speed is not clear.

--Fabrice

--
Fabrice LE FESSANT
Chercheur, Equipe ASAP
(As Scalable As Possible)
http://www.lefessant.net/

INRIA-Futurs, Bat P - 112
Parc Orsay Universit�
2-4, rue Jacques Monod
F-91893 Orsay Cedex, FRANCE

Goswin von Brederlow

unread,

May 5, 2010, 5:31:47 AM5/5/10

to ross...@mpi-sws.org, Sylvain Le Gall, caml...@inria.fr

ross...@mpi-sws.org writes:

> "Sylvain Le Gall" <syl...@le-gall.net>:
>>
>> This is not about optimized compiler in this case but about data
>> representation. Even if you use an optimized compiler (which is not
>> really the case with ocamlopt), you won't change datastructure
>> representation to optimize.
>
> What do you mean? There is no reason in general why a compiler cannot
> optimize data representations, and some do in cases like this.

How could it? At least for any type that is public in a module.

The data representation is part of the ABI. As such it is fixed and can
in no way be optimized by the compiler. Only thing you can do is change
the ABI and define a more optimized representation in the first place.

As example

type foo = { x : int; y : int; }
type bar = Bar of foo

Currently a bar is represented as

foo {
tag = 0
size = 2
field[0] = int(x)
field[1] = int(y)
}
bar {
tag = 0 (for Bar)
size = 1
field[0] = &foo
}

2 Blocks taking up 5 words.

A better representation would be to combine the two:

bar {
tag = 0 (for Bar)
size = 2
field[0] = int(x)
field[1] = int(y)
}

A single block of 3 words.

This also works for

type bar = Foo of foo | Blub of foo | Blubber of foo

but not for

type baz = Baz of bar

There are lots of cases where the representation could be improved for
special cases. But ocaml I think only does that for float arrays or
records that only contain floats. But that is a design decision and not
something the compiler can just optimize.

MfG
Goswin

Goswin von Brederlow

unread,

May 5, 2010, 5:33:55 AM5/5/10

to Edgar Friendly, caml...@yquem.inria.fr

Edgar Friendly <thele...@gmail.com> writes:

I think that can be cut down to:

module M = struct
type momentum = private kinematic
let of_kin = %identity
let to_kin = %identity
end

> Yes, it's a lot of boilerplate for each type, but you only have to
> write it once (per type), and cross-module inlining should give zero
> runtime cost. If not, use "%identity", and expose it in the
> interface. This method is along the lines of Anthony's proposal #4.
>
> E.

MfG
Goswin

Yaron Minsky

unread,

May 5, 2010, 7:10:48 AM5/5/10

to Edgar Friendly, caml...@yquem.inria.fr

On Tue, May 4, 2010 at 9:37 AM, Edgar Friendly <thele...@gmail.com> wrote:

> module M : sig
> type momentum
> val of_kin : kinematic -> momentum
> val to_kin : momentum -> kinematic
> end = struct
> type momentum = kinematic
> let of_kin x = x
> let to_kin x = x
> end
>
> Yes, it's a lot of boilerplate for each type, but you only have to write it
> once (per type), and cross-module inlining should give zero runtime cost.
> If not, use "%identity", and expose it in the interface. This method is
> along the lines of Anthony's proposal #4.

You can do the above solution with essentially no boilerplate:

module M = struct
type t = kinematic
val to_kin x = x
val of_kin x = x
end

module type S = sig
type t
val to_kin : t -> kinematic
val of_kin : kinematic -> t
end

module Momentum : S = M
module Position : S = M
module Force : S = M

and so on. Just one line per kinematic-like type you want to create.

That said, I still think the phantom type solution will end up cleaner.

y

ross...@mpi-sws.org

unread,

May 5, 2010, 8:12:25 AM5/5/10

to Goswin von Brederlow, caml...@inria.fr

"Goswin von Brederlow" <goswi...@web.de>:

>
>>> This is not about optimized compiler in this case but about data
>>> representation. Even if you use an optimized compiler (which is not
>>> really the case with ocamlopt), you won't change datastructure
>>> representation to optimize.
>>
>> What do you mean? There is no reason in general why a compiler cannot
>> optimize data representations, and some do in cases like this.
>
> How could it? At least for any type that is public in a module.
>
> The data representation is part of the ABI. As such it is fixed and can
> in no way be optimized by the compiler. Only thing you can do is change
> the ABI and define a more optimized representation in the first place.

Yes, and I didn't say that OCaml easily could, given external constraints
like the one you mention. I only was objecting to the statement that this is
not an optimization.

> A better representation would be to combine the two:
>
> bar {
> tag = 0 (for Bar)
> size = 2
> field[0] = int(x)
> field[1] = int(y)
> }

That is called flattening or unboxing, and in degenerate use cases it can
actually be costly because you have to copy the record if you extract it
first-class. However, for the original case there would be a much simpler
optimization: if a data type has only one constructor (more precisely, one
except for nullary ones), you don't need to represent its tag at all, so the
whole indirection is unnecessary.

/Andreas

Goswin von Brederlow

unread,

May 5, 2010, 12:46:52 PM5/5/10

to ross...@mpi-sws.org, caml...@inria.fr

ross...@mpi-sws.org writes:

Actualy in this case you don't. In foo the "tag" part of the block is
unused and in bar it is the constructor id. (foo) and (Bar foo) just
happen to have the same representation. Extracting a foo from a bar
means you just ignore the tag part, which is already ignored anyway.

The case of only having one constructor can be seen as special case for
this in the case of records. But type gnarf = Gnarf of int could be
represented as plain int as you say. The idea is the same.

The reason ocaml does not do this is that defining lots of exception for
the representation of data makes the compiler much more complicated and
ocamls compiler aims at being simple. At least that's what has been said
in the past.

MfG
Goswin

AUGER

unread,

May 7, 2010, 5:42:41 AM5/7/10

to Anthony Tavener, caml...@yquem.inria.fr

Le Sun, 02 May 2010 04:59:48 +0200, Anthony Tavener
<anthony...@gmail.com> a écrit:

> Wow! Thanks Stéphane... that was a little piece of magic I was

Just one last thing about this thread:
It is not really related to type checking,
but more on a way to convince the programmer that it uses rightfully a
force and not a position;
it is the use of labels. Imagine you have this function:

(** [next_position p v dt] is the next position of an object on position
[p] with
a speed [v] in a time [dt] *)
val next_position: kinematic -> kinematic -> float -> kinematic

as far as you have the documentation next to you, you won't make mistake
between the two `kinematic';
but if you don't have it, then you may not recall if you have to write
`next_position v p dt' or `next_position p v dt'

but now, imagine you have:

(** [next_position p v dt] is the next position of an object on position
[p] with
a speed [v] in a time [dt] *)
val next_position: pos:kinematic -> speed:kinematic -> laps:float ->
kinematic

then you just have to use the right labels to make it work:
`next_position ~speed:v ~pos:p ~laps:dt'.

As I said, this is not a type checking solution, but it can make the
program clearer and allows the user to forget parameters order (but he
needs to memorize label names)

Just beware with higher order:

# let g ~x ~y = x - y;;
val g : x:int -> y:int -> int = <fun>
# let app (f: y:int -> x:int -> int) a b = f ~x:a ~y:b;;
val app : (y:int -> x:int -> int) -> int -> int -> int = <fun>
# app g;;
Error: This expression has type x:int -> y:int -> int
but an expression was expected of type y:int -> x:int -> int

--
Cédric AUGER

Univ Paris-Sud, Laboratoire LRI, UMR 8623, F-91405, Orsay