Sum and recursive types

26 views
Skip to first unread message

Eric Holk

unread,
Oct 2, 2012, 7:03:20 PM10/2/12
to harla...@googlegroups.com
One of the major benefits we're touting of Harlan's region-based memory management is that it lets us handle richer data structures than nested arrays. So far, however, our only interesting type constructor is a vector, which can be nested to arbitrary (though fixed) depth. Sure, we can do ragged arrays instead of only strictly rectangular structures, but we are still missing out on a lot of things.

In order to typecheck things like lists and trees, we need some kind of recursive types. I think one of the reasons we haven't added them is that we're not sure what the best design is.

The standard way to do these is with algebraic data types. For example, we could define a type `List a`, which has constructors `Nil` and `Cons a (List a)`. I'm not super fond of this for Harlan, because so far we've been able to avoid any type declarations and annotations. With ADTs, we'd definitely need a way to declare new types, and chances are we'd eventually need type annotations. As much as possible, I'd like Harlan to feel like Scheme, even if it's a statically typed language.

One option would be to include a dynamic type, and use that as our only form of recursive types. This likely means we'll end up with a lot of type checks everywhere, which is probably not great for GPUs. In practice, I'm not sure it will be bad though, as truly heterogeneous data structures seem to be rare. Often if you have a list, its contents are all symbols or all ints, etc. This approach might also lead to some clever implementations strategies. For example, we could use a Big Bag of Pages style layout, where all the ints in a list are in one heap (or region?), the strings are in another heap, the symbols are in another, and so on. Then we could generate one kernel for each type, and run each kernel in succession as needed. In many cases, this would allow us to remove run time type checks.

Today we discussed trying to write something like a control flow analysis in Harlan. In this case, ADTs would make a lot of sense. The Expr type would have the usual (Var x), (Lambda x Expr), (App Expr Expr) forms. To go along with these, we could add a new form called something like match-kernel. It might look something like this:

(match-kernel e
  ((Var x) (list x))
  ((Lambda x e) (remove x e))
  ((App e1 e2) (union e1 e2)))

This kernel would end up calculating the free variables of an expression. Harlan would translate it into a bottom-up traversal. It'd be required to have an exhaustive set of patterns, and each body would return the same type. In terms of our usual Scheme match, it'd be like each pattern binding has a catamorphism automatically applied.

Of course, bottom-up isn't always the traversal we'd want. For example, we can't very easily thread an environment through in this example, making it hard to do an environment passing interpreter. However, we could probably write a term-rewriting style interpreter, where the substitution would be written as a kernel. For example:

(define (subst x v e)
  (match-kernel e
    ((Var y) (if (= x y) v (Var y)))
    ((Lambda y b) (if (= x y) e (Lambda y b))
    ((App e1 e2) (App e1 e2))))

In some ways, this form combines elements of both map and reduce.


Finally, we could probably make sum types work without explicit constructors. For example, the type of (if (odd? x) #t x) might be (sum bool int). An expression like (vector #t 5) might have the type (vec (sum bool int)). I think this could cause problems though... It's not really clear how to do elim forms (can we do them implicitly, or must they be explicit?), and type inference could also be tricky as it may not be clear whether mismatched types should be an error or whether the type inferencer should unify them as a sum type. We may end up with a lot of cases where we don't have a unique type.

What do people think about these suggestions? Are there other ways that our type system could express the things we need that I didn't mention here?

Eric Holk

unread,
Oct 12, 2012, 1:06:36 PM10/12/12
to harla...@googlegroups.com
I thought I'd post another followup after our last meeting. I've been advocating open union types as a way of having a scheme-like feel in a statically typed language. In our discussions in the meeting yesterday, we realized that doing this requires most of the machinery for ADTs anyway, and that it's straightforward to translate open unions into ADTs as another frontend pass. Because of this, I think we should use standard ADTs to start with, and possible add open unions later.
Reply all
Reply to author
Forward
0 new messages