Next steps

Geoffrey Irving

unread,

May 4, 2011, 9:34:27 PM5/4/11

to duck...@googlegroups.com, Dylan Simon

I'm not sure if there's an easy way to divide the programming task of the LLVM backend, so it may be worth looking forward a bit further so that we have fully decoupled pieces to work on. There will be a slew of tasks looking towards self-hosting, but in order to do them cleanly it'd be good to nail down modules first. We'll probably also want macros for printf, etc. That gives us three tasks to choose from:

1. LLVM backend
2. Clean modules and Koenig lookup
3. Macros.

(3) is lower priority than (1) and (2). I'm happy to work on any of them. Which would you prefer to tackle?

By the way, I think the initial LLVM backend should use the Haskell bindings. Writing our own Haskell bindings won't significantly lessen the work of writing the duck bindings once we get there.

Geoffrey

Dylan Simon

unread,

May 4, 2011, 10:06:04 PM5/4/11

to duck...@googlegroups.com

Aggregating responses due to laziness. Feel free to resplit.

> I'm not sure if there's an easy way to divide the programming task of the LLVM backend, so it may be worth looking forward a bit further so that we have fully decoupled pieces to work on. There will be a slew of tasks looking towards self-hosting, but in order to do them cleanly it'd be good to nail down modules first. We'll probably also want macros for printf, etc. That gives us three tasks to choose from:
>
> 1. LLVM backend
> 2. Clean modules and Koenig lookup
> 3. Macros.
>
> (3) is lower priority than (1) and (2). I'm happy to work on any of them. Which would you prefer to tackle?

I am interested in working on Macros at some point. I don't want to take on
too much at once, but side things that don't hold other stuff up are
reasonable. That said, between 1 and 2 I am more interested in how 1 goes.

> By the way, I think the initial LLVM backend should use the Haskell bindings. Writing our own Haskell bindings won't significantly lessen the work of writing the duck bindings once we get there.

Agreed.

What do you think about using a function-application-like syntax for
delay/macro, both to keep it out of the type constructor space and allow it to
fit into both types and lambdas? It sort of makes sense because these things
are functions on arguments (types) in a sense:

if :: Bool -> delay a -> delay a -> a
if = t -> delay x -> delay y -> ...

Obviously in overloads you should only need to specify it in the type, but in
lambdas it seems nicer than {x :: Delay a -> ...}, and it is an unused part of
the syntax (application in types and patterns). {delay -> ...} would still be
a (type) variable named "delay" of course, and {Delay a} would still be an
abstract type constructor for an (already) delayed thunk.

> 1. Replace the Var with a reference to the partially traversed Ptrie,
> so that each new applied argument only has to traverse through one
> Ptrie step, and (importantly) doesn't need access to progOverloads.
> Unfortunately, the naive way of doing this would require making the
> Ptrie's mutable, since we're now converting references into functions
> into ValClosure at Ir -> Lir conversion time. This is worth some
> thought to get right. It's clear that either some mutability or
> access to a global (essentially mutable) map is required. For
> example, we could load and compile a module defining a sum function,
> which contains an internal reference to (+). Later we might load a
> module of complex numbers, which defines another (+) overload. sum
> needs to be able to access this overload. The same applies to
> partially evaluated ValClosures created via constant folding.

Are there ever cases where we need to create a reference to overloads that
don't exist yet? In haskell, you'd often define a class, and functions that
reference it, without any instances in that module, which in duck's case would
lead to references to functions that don't have any overloads (yet). Maybe we
don't, but it will affect what kind of mutability is needed to support this.

> 2. Replace TypeVal with a thin TypeRep wrapper around an integer, and
> provide a module that caches TypeVals into unique TypeReps.

We've talked about this before. There might have even been a design document
for it?

Geoffrey Irving

unread,

May 5, 2011, 1:37:01 AM5/5/11

to duck...@googlegroups.com, Dylan Simon

On Wed, May 4, 2011 at 7:06 PM, Dylan Simon <dylan-...@dylex.net> wrote:
> Aggregating responses due to laziness. Feel free to resplit.

Merged is fine.

>> I'm not sure if there's an easy way to divide the programming task of the LLVM backend, so it may be worth looking forward a bit further so that we have fully decoupled pieces to work on. There will be a slew of tasks looking towards self-hosting, but in order to do them cleanly it'd be good to nail down modules first. We'll probably also want macros for printf, etc. That gives us three tasks to choose from:
>>
>> 1. LLVM backend
>> 2. Clean modules and Koenig lookup
>> 3. Macros.
>>
>> (3) is lower priority than (1) and (2). I'm happy to work on any of them. Which would you prefer to tackle?
>
> I am interested in working on Macros at some point. I don't want to take on
> too much at once, but side things that don't hold other stuff up are
> reasonable. That said, between 1 and 2 I am more interested in how 1 goes.

(1) is fairly huge and daunting, so why don't you toy with macros and
I'll slowly chip away at (1). For my first chip, I'll make duck
execution imperative and get rid of the low level monad stuff.

One wrinkle I can see with macros is how to preserve single pass type
inference / macro evaluation in the presence of variable definitions.
If you have something like

printf (complicated-expression) x y z

you know after type inferring the printf that the complicated
expression should be evaluated immediately, and all is well. However,
if you have

s = complicated-expression
printf s x y z

you'd have to go back and reevaluate s. Doable, but a bit unpleasant.
One option is to have a special syntax for compile time variable
declarations, perhaps

macro s = complicated-expression
printf s x y z

That parallels what you'd write for an anonymous macro:

macro x -> sif x then 1 else "not even close to 1"

where sif ("static if") has the definition

sif :: Macro x -> Delayed a -> Delayed b -> c
sif x a b = scase bool x of False -> a ; True -> b

and scase ("static case") is a compile time case statement. All the
names are only possibilities, of course. I'm fairly sure we want sif
and scase to be named separately from if and case, since one has to
specify that you don't want type unification of the branches. I'm
less sure about the separate variable definition syntax, though I do
like the parallel with anonymous functions.

Maybe "static" is a better name than "macro" for the transform? It
seems more analogous to delay(ed).

static x -> sif x then 1 else "I'm pretty sure 1 isn't a string"

Side note: I don't think we should try to defend against someone using
effects inside compile time evaluated functions. This is perfectly
reasonable for debugging, or memoization, etc. Or at least no less
reasonable than having effects at all.

> What do you think about using a function-application-like syntax for
> delay/macro, both to keep it out of the type constructor space and allow it to
> fit into both types and lambdas? It sort of makes sense because these things
> are functions on arguments (types) in a sense:
>
> if :: Bool -> delay a -> delay a -> a
> if = t -> delay x -> delay y -> ...
>
> Obviously in overloads you should only need to specify it in the type, but in
> lambdas it seems nicer than {x :: Delay a -> ...}, and it is an unused part of
> the syntax (application in types and patterns). {delay -> ...} would still be
> a (type) variable named "delay" of course, and {Delay a} would still be an
> abstract type constructor for an (already) delayed thunk.

That sounds reasonable. It seems very slightly unclean that "delay x"
is optional if delay is specified in the type, but not unclean enough
to not do.

>> 1. Replace the Var with a reference to the partially traversed Ptrie,
>> so that each new applied argument only has to traverse through one
>> Ptrie step, and (importantly) doesn't need access to progOverloads.
>> Unfortunately, the naive way of doing this would require making the
>> Ptrie's mutable, since we're now converting references into functions
>> into ValClosure at Ir -> Lir conversion time. This is worth some
>> thought to get right. It's clear that either some mutability or
>> access to a global (essentially mutable) map is required. For
>> example, we could load and compile a module defining a sum function,
>> which contains an internal reference to (+). Later we might load a
>> module of complex numbers, which defines another (+) overload. sum
>> needs to be able to access this overload. The same applies to
>> partially evaluated ValClosures created via constant folding.
>
> Are there ever cases where we need to create a reference to overloads that
> don't exist yet? In haskell, you'd often define a class, and functions that
> reference it, without any instances in that module, which in duck's case would
> lead to references to functions that don't have any overloads (yet). Maybe we
> don't, but it will affect what kind of mutability is needed to support this.

All that means is that we need some way of forward declaring
functions, so that they get initialized with an empty overload table
and don't trigger undeclared variable warnings. Otherwise the case of
zero overload declarations is that same as a nonzero set of overload
declarations none of which happen to match anything we're tried to
apply so far.

>> 2. Replace TypeVal with a thin TypeRep wrapper around an integer, and
>> provide a module that caches TypeVals into unique TypeReps.
>
> We've talked about this before. There might have even been a design document
> for it?

Yep, we've definitely mentioned it before, and in any case the idea is
stolen from LLVM.

Time to rip monads out of Lir.

Geoffrey

Dylan Simon

unread,

May 7, 2011, 11:51:12 AM5/7/11

to duck...@googlegroups.com

I pushed the delay stuff. One open question wrt Delay (thunk) vs delay
(argument transform) is which of these two options should we do:

(A) delay functions always create thunks. This is what we do currently, and
results in:

if :: Bool -> delay (Delay a) -> delay (Delay a) -> Delay a

The downside is if you have a thunk already, and want to pass it to a delay
function, you need to pass "force thunk", resulting in nestings "thunk(force
thunk)", possibly deep ones if it's passed along multiple functions.

(B) delay functions never delay Delay. This results in:

if :: Bool -> Delay a -> Delay a -> a

This is perhaps less natural in terms of types (Delay is special) but possibly
more convenient. This could also allow us, in the same breath, to not create
thunks when passing functions, since they are unnecessary. If you really want
to nest thunks, you can always add a Box layer.

Both of these approaches are type safe (I think) and retain identical function
bodies for delay functions regardless -- only application changes.

As for macros, I like the idea of "macro x = 5" assignments (though will have
to think about syntax, since that looks like a function definition). Related
questions that you may have already thought about:

How do macros interact with overloading? Of course we should be able to
overload "f :: macro Int -> ..." and "f :: macro Bool -> ...", but what about
non-macro arguments. Consider:

(1) f :: macro Bool -> a
(2) f :: macro Bool -> a -> b
(3) f :: macro Bool -> Int -> a

printf clearly needs something like (1) (and we might even want a prettier,
though meaningless "..." syntax like "f :: macro String -> ..."), while "sif"
could have (2). But should (2) and (3) be allowed to coexist as overloads?
If macros can be partially applied (which seems only reasonable), you need to
know which one to call before you get a second argument.

I think we should not support (3)-type overloading (or maybe should not allow
partial evaluation, but that may be confusing). However, it strikes me that
we could support an even more flexible type of overloading with functions like
"f :: macro [Type] -> ...", essentially custom overload resolution functions,
which may be called implicitly. But this is probably a separate topic.

This also raises the question of whether we allow macro argument to follow
non-macro arguments. This might be okay.

In terms of evaluation, consider:

f :: macro Bool -> ...
(1) f True = x -> y -> x
f False = x -> y -> y
(2) f t = if t
then x -> y -> x
else x -> y -> y
(3) f t x y = if t then x else y

Which of these are allowed and do the right thing? Note that (1) is
equivalent to the "case" ("scase"/"sif") version of (2), while (2) calls a
non-macro function "if". I am inclined to disallow (3) only.

I think the idea is, keep track of the set of "static/macro" variables. If an
expression does not contain any free argument variables that are not static,
evaluate it early (at type inference time). So, in (3), don't call "if"
because the call contains "x" and "y", but for (2), if can be called since all
its arguments are macroable expressions. One could similarly exclude certain
primitives from early evaluation, but as you say this might not be necessary.
Does this make sense? Is it too weak/strong?

(I presume the macroized printf always has to take a trailing () argument.)

Geoffrey Irving

unread,

May 7, 2011, 1:39:25 PM5/7/11

to duck...@googlegroups.com

On Sat, May 7, 2011 at 8:51 AM, Dylan Simon <dylan-...@dylex.net> wrote:
> I pushed the delay stuff. One open question wrt Delay (thunk) vs delay
> (argument transform) is which of these two options should we do:
>
> (A) delay functions always create thunks. This is what we do currently, and
> results in:
>
> if :: Bool -> delay (Delay a) -> delay (Delay a) -> Delay a
>
> The downside is if you have a thunk already, and want to pass it to a delay
> function, you need to pass "force thunk", resulting in nestings "thunk(force
> thunk)", possibly deep ones if it's passed along multiple functions.
>
> (B) delay functions never delay Delay. This results in:
>
> if :: Bool -> Delay a -> Delay a -> a
>
> This is perhaps less natural in terms of types (Delay is special) but possibly
> more convenient. This could also allow us, in the same breath, to not create
> thunks when passing functions, since they are unnecessary. If you really want
> to nest thunks, you can always add a Box layer.

Actually, you still have to create thunks when passing functions,
since the function could be created by an expression involving an
effect. Hmm. The same is true of delays, actually. Consider

weird :: a -> Delay a
weird x =
print "weird"
delay x

if c then weird 3 else 4

Since you can't determine whether to add the extra layer of delay in
terms of just the type, I think we need to always add the second
layer.

> Both of these approaches are type safe (I think) and retain identical function
> bodies for delay functions regardless -- only application changes.
>
>
> As for macros, I like the idea of "macro x = 5" assignments (though will have
> to think about syntax, since that looks like a function definition). Related
> questions that you may have already thought about:
>
> How do macros interact with overloading? Of course we should be able to
> overload "f :: macro Int -> ..." and "f :: macro Bool -> ...", but what about
> non-macro arguments. Consider:
>
> (1) f :: macro Bool -> a
> (2) f :: macro Bool -> a -> b
> (3) f :: macro Bool -> Int -> a
>
> printf clearly needs something like (1) (and we might even want a prettier,
> though meaningless "..." syntax like "f :: macro String -> ..."), while "sif"
> could have (2). But should (2) and (3) be allowed to coexist as overloads?
> If macros can be partially applied (which seems only reasonable), you need to
> know which one to call before you get a second argument.
>
> I think we should not support (3)-type overloading (or maybe should not allow
> partial evaluation, but that may be confusing). However, it strikes me that
> we could support an even more flexible type of overloading with functions like
> "f :: macro [Type] -> ...", essentially custom overload resolution functions,
> which may be called implicitly. But this is probably a separate topic.

I think you can overload based on the macro arguments, but not beyond.
So printf would be

printf :: Macro String -> a

and you can't have any other printf overloads that take other arguments.

Hmm. Actually I take that back. There are some weird cases involving
macros interacting with delay, but maybe arbitrary overloading works
fine alongside macros. Let's see:

f :: Macro Int -> Bool -> a
f i b = ...

f :: Macro Int -> Char -> a
f i b = ...

f :: Macro Char -> Bool -> a
f c = ...

Hmm. I think a problem would arise if you write f 3 and then try to
use it as an overloaded function (storing the result in a data
structure, passing it around, etc.). I think macro functions are not
first class, which probably means no overloads based on non-macro
arguments.

> This also raises the question of whether we allow macro argument to follow
> non-macro arguments. This might be okay.

I think it would work as long as one is forced to specify all the way
through all the macro arguments, but it seems confusing in the context
of currying and possible effects. One expects the earlier arguments
to be "passed first", but actually the macro arguments always would
be.

> In terms of evaluation, consider:
>
> f :: macro Bool -> ...
> (1) f True = x -> y -> x
> f False = x -> y -> y
> (2) f t = if t
> then x -> y -> x
> else x -> y -> y
> (3) f t x y = if t then x else y
>
> Which of these are allowed and do the right thing? Note that (1) is
> equivalent to the "case" ("scase"/"sif") version of (2), while (2) calls a
> non-macro function "if". I am inclined to disallow (3) only.
>
> I think the idea is, keep track of the set of "static/macro" variables. If an
> expression does not contain any free argument variables that are not static,
> evaluate it early (at type inference time). So, in (3), don't call "if"
> because the call contains "x" and "y", but for (2), if can be called since all
> its arguments are macroable expressions. One could similarly exclude certain
> primitives from early evaluation, but as you say this might not be necessary.
> Does this make sense? Is it too weak/strong?

This is what I'd imagined earlier, but I think defaulting to immediate
evaluation might be too surprising. It's essentially the extreme
opposite of lazy evaluation. A standard bad example would be 10
functions each of which constructs a gigantic table for use inside the
function. If we evaluate all of the tables at inference time, we
could massively grow the memory footprint of the program. Therefore,
I think it's best for the programmer to specify which expressions
should be static.

In terms of your example, this would mean that (1), (2), and (3) would
all be runtime evaluated, since you haven't used sif or scase. They
might be constant folded later, of course, but the branches would be
forced to have compatible types at inference time. If you wanted the
type checking but also the static evaluation, you could do something
like

f :: Macro Bool -> a
f t =
static g = if t then x -> y -> x else x -> y -> y
g

We naturally want a helper function which does exactly this: takes an
argument and evaluates it at compile time.

Geoffrey

> (I presume the macroized printf always has to take a trailing () argument.)

No, printf doesn't need a trailing () argument. If you want to delay
it, you can always write () -> printf ...

Geoffrey

Dylan Simon

unread,

May 7, 2011, 2:36:16 PM5/7/11

to duck...@googlegroups.com

> Since you can't determine whether to add the extra layer of delay in
> terms of just the type, I think we need to always add the second
> layer.

Okay, that's fine. I'm tempted to blame this on effect typing and its
interaction with delays, though. It at least suggests that to be
effect-type-safe, Delays need an arrow in them. "Delay (-effect-> a)"

> I think macro functions are not
> first class, which probably means no overloads based on non-macro
> arguments.

Yes, this makes sense.

> > In terms of evaluation, consider:
> >

> > ? ? ? ? ? ?f :: macro Bool -> ...
> > ? ? ? ?(1) f True ?= x -> y -> x
> > ? ? ? ? ? ?f False = x -> y -> y
> > ? ? ? ?(2) f t = if t
> > ? ? ? ? ? ? ? ?then x -> y -> x
> > ? ? ? ? ? ? ? ?else x -> y -> y
> > ? ? ? ?(3) f t x y = if t then x else y
> >
> > Which of these are allowed and do the right thing? ?Note that (1) is

> > equivalent to the "case" ("scase"/"sif") version of (2), while (2) calls a

> > non-macro function "if". ?I am inclined to disallow (3) only.
> >
> > I think the idea is, keep track of the set of "static/macro" variables. ?If an

> > expression does not contain any free argument variables that are not static,

> > evaluate it early (at type inference time). ?So, in (3), don't call "if"

> > because the call contains "x" and "y", but for (2), if can be called since all

> > its arguments are macroable expressions. ?One could similarly exclude certain

> > primitives from early evaluation, but as you say this might not be necessary.

> > Does this make sense? ?Is it too weak/strong?

>
> This is what I'd imagined earlier, but I think defaulting to immediate
> evaluation might be too surprising. It's essentially the extreme
> opposite of lazy evaluation. A standard bad example would be 10
> functions each of which constructs a gigantic table for use inside the
> function. If we evaluate all of the tables at inference time, we
> could massively grow the memory footprint of the program. Therefore,
> I think it's best for the programmer to specify which expressions
> should be static.
>
> In terms of your example, this would mean that (1), (2), and (3) would
> all be runtime evaluated, since you haven't used sif or scase. They
> might be constant folded later, of course, but the branches would be
> forced to have compatible types at inference time. If you wanted the
> type checking but also the static evaluation, you could do something
> like
>
> f :: Macro Bool -> a
> f t =
> static g = if t then x -> y -> x else x -> y -> y
> g
>
> We naturally want a helper function which does exactly this: takes an
> argument and evaluates it at compile time.

So (just thinking this through), then the rules are: the value of any static
variable/argument are fully evaluated, and any functions with macro arguments
are evaluated until... something. A function with only macro arguments is
fully evaluated, but once things are mixed it's unclear. Under this scheme,
the "macro" label really is more about the function itself (static f :: Bool
-> a) in some sense. We also need a new built-in "scase" construction, and
either macro function cases need to be converted to this instead, or you can't
use them. I guess that might be reasonable to be more explicit, but possibly
more restrictive.

> > (I presume the macroized printf always has to take a trailing () argument.)
>
> No, printf doesn't need a trailing () argument. If you want to delay
> it, you can always write () -> printf ...

Again, this is an issue of which arrow the effect lives in. Maybe I have to
think more about effect typing to understand all this, but it's not clear to
me how you define {printf ""} to not print immediately at compile time (vs
defining one that does), since in one case you have {printf -> macro "%d"
-IO-> Int} and in the other {printf -IO-> macro "%s"}.

On a completely separate note, I wonder if defining an FFI should be higher
priority, though maybe that depends on the backend.

Geoffrey Irving

unread,

May 7, 2011, 6:10:58 PM5/7/11

to duck...@googlegroups.com

On Sat, May 7, 2011 at 11:36 AM, Dylan Simon <dylan-...@dylex.net> wrote:
>> Since you can't determine whether to add the extra layer of delay in
>> terms of just the type, I think we need to always add the second
>> layer.
>
> Okay, that's fine. I'm tempted to blame this on effect typing and its
> interaction with delays, though. It at least suggests that to be
> effect-type-safe, Delays need an arrow in them. "Delay (-effect-> a)"

Agreed on both counts.

The idea of staticness being about the function rather than the
arguments is interesting. That gives us two options:

1. Static applies to the arguments. As a consequence, the function
itself is as dynamic as possible, and is evaluated at inference time
only as much as required by static variable assignments, scase, and
other macro calls. This plays nicely with effects (see below).

2. Static applies to the function itself. In this case, the entire
function is evaluated at inference time as much as possible. It'd be
natural to treat function definition pattern matches as scase.

(1) seems better to me, since it reflects the fact that the purpose of
macros is primarily to dodge type checking, not to evaluate anything
early. The only advantage I see in (2) is treating function cases as
scase, which seems minor.

>> > (I presume the macroized printf always has to take a trailing () argument.)
>>
>> No, printf doesn't need a trailing () argument. If you want to delay
>> it, you can always write () -> printf ...
>
> Again, this is an issue of which arrow the effect lives in. Maybe I have to
> think more about effect typing to understand all this, but it's not clear to
> me how you define {printf ""} to not print immediately at compile time (vs
> defining one that does), since in one case you have {printf -> macro "%d"
> -IO-> Int} and in the other {printf -IO-> macro "%s"}.

I think you're right that printf "" would have to print at compile
time in case (2) above, but that it works fine in case (1) as long as
the actual print statements aren't part of static variable
assignments.

> On a completely separate note, I wonder if defining an FFI should be higher
> priority, though maybe that depends on the backend.

Are you thinking of that in the context of writing printf? Defining
the FFI doesn't seem that hard linguistically, the only trick in doing
it before the backend is in place would be how to call a C function
without a Haskell FFI declaration. Or maybe that only involves a
dlopen call.

Geoffrey

Dylan Simon

unread,

May 7, 2011, 7:39:35 PM5/7/11

to duck...@googlegroups.com

> > So (just thinking this through), then the rules are: the value of any static
> > variable/argument are fully evaluated, and any functions with macro arguments

> > are evaluated until... something. ?A function with only macro arguments is
> > fully evaluated, but once things are mixed it's unclear. ?Under this scheme,

> > the "macro" label really is more about the function itself (static f :: Bool

> > -> a) in some sense. ?We also need a new built-in "scase" construction, and

> > either macro function cases need to be converted to this instead, or you can't

> > use them. ?I guess that might be reasonable to be more explicit, but possibly

> > more restrictive.
>
> The idea of staticness being about the function rather than the
> arguments is interesting. That gives us two options:
>
> 1. Static applies to the arguments. As a consequence, the function
> itself is as dynamic as possible, and is evaluated at inference time
> only as much as required by static variable assignments, scase, and
> other macro calls. This plays nicely with effects (see below).
>
> 2. Static applies to the function itself. In this case, the entire
> function is evaluated at inference time as much as possible. It'd be
> natural to treat function definition pattern matches as scase.
>
> (1) seems better to me, since it reflects the fact that the purpose of
> macros is primarily to dodge type checking, not to evaluate anything
> early. The only advantage I see in (2) is treating function cases as
> scase, which seems minor.

Maybe. We might be able to figure out that function cases on macro arguments
use scase anyway.

> >> > (I presume the macroized printf always has to take a trailing () argument.)
> >>

> >> No, printf doesn't need a trailing () argument. ?If you want to delay

> >> it, you can always write () -> printf ...
> >

> > Again, this is an issue of which arrow the effect lives in. ?Maybe I have to

> > think more about effect typing to understand all this, but it's not clear to
> > me how you define {printf ""} to not print immediately at compile time (vs
> > defining one that does), since in one case you have {printf -> macro "%d"
> > -IO-> Int} and in the other {printf -IO-> macro "%s"}.
>
> I think you're right that printf "" would have to print at compile
> time in case (2) above, but that it works fine in case (1) as long as
> the actual print statements aren't part of static variable
> assignments.

I'm still troubled by this. Let's take a stupid printf:

printf s = scase s of
[] -> print "\n"
... ->

Expanding "printf """, we get a function definition that takes no run-time
arguments (normally a variable definition). It's not clear how to support
fully-applied closures like this, or what their semantics should be. Let
alone where the effect arrow goes, which is an important issue.

Another question is how to pass "varargs" arguments along. How can I define
printf in terms of sprintf? This is an issue with any approach we have,
though.

> > On a completely separate note, I wonder if defining an FFI should be higher
> > priority, though maybe that depends on the backend.
>
> Are you thinking of that in the context of writing printf? Defining
> the FFI doesn't seem that hard linguistically, the only trick in doing
> it before the backend is in place would be how to call a C function
> without a Haskell FFI declaration. Or maybe that only involves a
> dlopen call.

You're probably right, we can leave it for now.

Geoffrey Irving

unread,

May 7, 2011, 9:17:25 PM5/7/11

to duck...@googlegroups.com

The theoretical resolution is that (printf "") is exactly the same as
writing (print "\n"), in both value and effects.

Practically, I'm not sure I see a clean way to structure the
implementation. It seems to break the simplicity of lifted IR, since
invocations of printf with different format strings need to have
different lifted names. Maybe we have to restructive Ir -> Lir
conversion so that it can happy dynamically as necessary during type
inference and macro expansion?

> Another question is how to pass "varargs" arguments along. How can I define
> printf in terms of sprintf? This is an issue with any approach we have,
> though.

The first solution that comes to mind is to

vsprintf :: Static string -> (string -> a) -> b
sprintf (static format) = vsprintf format id
printf (static format) = vsprintf format print

Another rather exotic solution is

applyIfType :: Type t -> a -> b
applyIfType t f x y = applyIfType t f (x y)

applyIfType :: Type t -> a -> t
applyIfType t f x = f x

printf (static format) = applyIfType String print sprintf

Geoffrey

Dylan Simon

unread,

May 8, 2011, 10:49:51 AM5/8/11

to duck...@googlegroups.com

> >> >> > (I presume the macroized printf always has to take a trailing () argument.)
> >> >>
> >> >> No, printf doesn't need a trailing () argument. ?If you want to delay
> >> >> it, you can always write () -> printf ...
> >> >
> >> > Again, this is an issue of which arrow the effect lives in. ?Maybe I have to
> >> > think more about effect typing to understand all this, but it's not clear to
> >> > me how you define {printf ""} to not print immediately at compile time (vs
> >> > defining one that does), since in one case you have {printf -> macro "%d"
> >> > -IO-> Int} and in the other {printf -IO-> macro "%s"}.
> >>
> >> I think you're right that printf "" would have to print at compile
> >> time in case (2) above, but that it works fine in case (1) as long as
> >> the actual print statements aren't part of static variable
> >> assignments.
> >

> > I'm still troubled by this. ?Let's take a stupid printf:
> >
> > ? ? ? ?printf s = scase s of
> > ? ? ? ? ?[] -> print "\n"
> > ? ? ? ? ?... ->

> >
> > Expanding "printf """, we get a function definition that takes no run-time

> > arguments (normally a variable definition). ?It's not clear how to support
> > fully-applied closures like this, or what their semantics should be. ?Let

> > alone where the effect arrow goes, which is an important issue.
>
> The theoretical resolution is that (printf "") is exactly the same as
> writing (print "\n"), in both value and effects.
>
> Practically, I'm not sure I see a clean way to structure the
> implementation. It seems to break the simplicity of lifted IR, since
> invocations of printf with different format strings need to have
> different lifted names. Maybe we have to restructive Ir -> Lir
> conversion so that it can happy dynamically as necessary during type
> inference and macro expansion?

It's worth thinking about what sprintf really looks like (at least a stupid
inefficient version)

sprintf :: macro String -> ...
sprintf s = sprintf' s "" where
sprintf' "" r = r
sprintf' ('%':'d':s) r = d -> sprintf' s (r ++ show d)
sprintf' (c:s) r = sprintf' s (r ++ [c])

A few things have to happen here: sprintf' has to be lifted with a "macro
String" first argument. The d -> lambda also has to be lifted with a macro
first argument:

lift_d :: macro String -> String -> Int -> ...
lift_d s r d = sprintf' s (r ++ show d)

Let's say that's all Lir does. During type inference, all of these functions
end up disappearing completely, to be replaced by one new function for every
application of sprintf. This process involves recursively expanding all the
macro functions and collecting all the unmacroed arguments.

This seems tractable, except for the tricky role of the "r" argument. Perhaps
this argument needs to be macro too, but of course it can't be evaluated
completely because it contains unmacroed arguments. What you want to end up
with for {sprintf "a%d,%d"} is something like:

sprintf1431 :: Int -> Int -> String
sprintf1432 d1 d2 = (((("" ++ ["a"]) ++ show d1) ++ [',']) ++ show d1)

Which really looks like what's happened is everything was evaluated as much as
possible, up to the arguments provided.

This implies to me that the rule does need to be, expand as far as you can by
"inlining" macro arguments. I have a hard time seeing how this can happen
with the "static" variable approach. Possibly you stop with any non-macro
function calls, or possibly not. (I suspect you can continue to expand
non-macro functions when they are passed only macro arguments -- consider {""
++ ["a"]} above.)

(Speaking of inlining, we will have to be careful that whatever semantics we
come up with are preserved by whatever inlining optimizations we have --
unless inlining happens after macro expansion.)

It does seem to be a mess, and I'm still not sure I understand fully any of
these proposals, but I do still believe printf needs a final ().

Geoffrey Irving

unread,

May 8, 2011, 1:59:03 PM5/8/11

to duck...@googlegroups.com

On Sun, May 8, 2011 at 7:49 AM, Dylan Simon <dylan-...@dylex.net> wrote:
> <snip>

> It's worth thinking about what sprintf really looks like (at least a stupid
> inefficient version)
>
> sprintf :: macro String -> ...
> sprintf s = sprintf' s "" where
> sprintf' "" r = r
> sprintf' ('%':'d':s) r = d -> sprintf' s (r ++ show d)
> sprintf' (c:s) r = sprintf' s (r ++ [c])

Rewritten using the expand-only-as-necessary version, this would be

vsprintf :: Macro String -> (String -> a) -> b
vsprintf s f = scase s of
"" -> f ""
"%d":s -> d -> vsprintf s (f .\ show d ++)
c:s -> vsprintf s (f .\ c :)

sprintf :: Macro String -> a
sprintf s = vsprintf s id

It's basically exactly the same except for the lack of function patterns.

> A few things have to happen here: sprintf' has to be lifted with a "macro
> String" first argument. The d -> lambda also has to be lifted with a macro
> first argument:
>
> lift_d :: macro String -> String -> Int -> ...
> lift_d s r d = sprintf' s (r ++ show d)
>
> Let's say that's all Lir does. During type inference, all of these functions
> end up disappearing completely, to be replaced by one new function for every
> application of sprintf. This process involves recursively expanding all the
> macro functions and collecting all the unmacroed arguments.
>
> This seems tractable, except for the tricky role of the "r" argument. Perhaps
> this argument needs to be macro too, but of course it can't be evaluated
> completely because it contains unmacroed arguments. What you want to end up
> with for {sprintf "a%d,%d"} is something like:
>
> sprintf1431 :: Int -> Int -> String
> sprintf1432 d1 d2 = (((("" ++ ["a"]) ++ show d1) ++ [',']) ++ show d1)
>
> Which really looks like what's happened is everything was evaluated as much as
> possible, up to the arguments provided.

It seems like you get largely the same thing in the
expand-only-as-necessary case. Specifically, you get

vsprintf :: Macro String -> (String -> a) -> b
vsprintf s f = scase s of
"" -> f ""
"%d":s -> d -> vsprintf s (f .\ show d ++)
c:s -> vsprintf s (f .\ c :)

sprintf :: Macro String -> a
sprintf s = vsprintf s id

--------------------------

sprintf "a%d,%d" = vsprintf "a%d,%d" id
= vsprintf "%d,%d" (id .\ 'a':)
= x -> vsprintf ",%d" (id . ('a':) . (show x ++))
= x -> vsprintf "%d" (id . ('a':) . (show x ++) . (',':))
= x -> y -> vsprintf "" (id . ('a':') . (show x ++) . (',':) .
(show y ++))
= x -> y -> (id . ('a':') . (show x ++) . (',':) . (show y ++)) ""

where the only expansions were those forced by vsprintf or scase.

> This implies to me that the rule does need to be, expand as far as you can by
> "inlining" macro arguments. I have a hard time seeing how this can happen
> with the "static" variable approach. Possibly you stop with any non-macro
> function calls, or possibly not. (I suspect you can continue to expand
> non-macro functions when they are passed only macro arguments -- consider {""
> ++ ["a"]} above.)
>
> (Speaking of inlining, we will have to be careful that whatever semantics we
> come up with are preserved by whatever inlining optimizations we have --
> unless inlining happens after macro expansion.)

Inlining probably does wait until after macro expansion, so it should be fine.

> It does seem to be a mess, and I'm still not sure I understand fully any of
> these proposals, but I do still believe printf needs a final ().

With my vsprintf version above, printf would be

printf :: Macro String -> a

printf s = vsprintf s print

------------------

printf "" = vsprintf "" print = print ""

where the last expression is not evaluated since it isn't in a static context.

In general, the expand-only-as-necessary version seems more flexible
than expand-as-much-as-possible. You can always turn the former into
the latter with a single static variable assignment (or call to the
evaluate-at-compile-time function, whose name I'm not sure of).
Moreover, I think there will be applications where runtime and compile
time stuff is mixed in the same function. Although I suppose
expand-as-much-as-possible would handle most of those cases just fine
as well, so maybe that last bit isn't a relevant point.

In any case, printf "" () is unsightly, so definitely worth avoiding.
Both seem implementable, with about the same level of difficulty. In
particular, they both require some sort of interleaving of lifting and
type inference. Thus, we should pick between them based on which is
nicer, even if just boils down to which one let's us avoid the
trailing () in printf. :)

Geoffrey

Reply all

Reply to author

Forward