A diatribe on code blocks

Doug

Jan 13, 2010, 7:07:29 PM
to Noop
Apropos of (apparently) nothing ... the following came about after a
conversation with Christian and Bobby a couple of months ago made me
start thinking about languages again. It may be off the mark with
respect to Noop (you decide), but there are some interesting ideas in
it, and Christian is making me post it. So enjoy. And post feedback.

(And if some of my complaints seem extreme, blame it on my C/C++
upbringing...)

Doug


Blocks and blocks...
-------------------------

I'm annoyed by code blocks - those things in braces. We use similar,
almost-but-not-quite-identical syntax to notate
similar-but-not-quite-identical things - why can't we simplify our lives
by making the "things" entirely identical and using a single syntax?

What do I mean? Consider a "code block" (perhaps a function body,
perhaps one branch of an "if", etc.). It consists of:
- an environment (set of variable bindings from outside the block -
perhaps params/returns, perhaps variables from an enclosing static
scope...),
- a (possibly empty) set of data declarations,
- a (possibly empty) set of nested function definitions, and
- some code that performs some computation on the data available to
it via the environment, and along the way, stores some values into the
data items declared in the block.

--> The declared data items have a "scope" limited to the inside of
the block.
--> The declared data items have a "lifetime" limited to the function
execution.

Now consider a class definition. It consists of:
- an environment (usually just the params passed to the
constructor),
- a set of data declarations (the object's aggregated fields),
- some function definitions (but we call 'em methods), and
- some code that performs some computation on the data available to
it via the environment, and along the way, stores some values into the
data items declared in the block (we call it a constructor).

--> The declared data items have scopes that depend on how they were
declared (public/private).
--> The declared data items have a lifetime that depends on the
disposition of each object created. (And "static" ones have the
program's lifetime.)

These are almost identical behaviors - only scopes and lifetimes are
really different. Yet for some reason, we write the code in a code
block in-line with the declarations, while for classes we feel a need
to enclose the corresponding code in a specially-named function
(named the same as the class in C++, or named __init__() in Python,
or ...). And that extra syntactic wrapper doesn't seem to do anything
useful for us. Do we need it?
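
To make the comparison concrete, here is a minimal Python sketch. The names `counter_block` and `Counter` are made up for illustration and are not part of any Noop design. The same declaration and setup code appears twice: once in-line in a function body, and once wrapped in `__init__` for a class.

```python
def counter_block(start):
    # Environment: the parameter `start`.
    # Data declaration and setup code sit in-line in the block.
    count = start * 2

    # Nested function definition, sharing the block's data.
    def bump(step):
        nonlocal count
        count += step
        return count

    return bump  # letting `bump` escape keeps `count` alive


class Counter:
    # The same setup code, but wrapped in a specially-named function.
    def __init__(self, start):
        self.count = start * 2

    # The same nested function, now called a method.
    def bump(self, step):
        self.count += step
        return self.count
```

Both `counter_block(1)(3)` and `Counter(1).bump(3)` compute the same value; only the wrapper around the setup code and the lifetime of `count` differ.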

Think about lisp for a minute, particularly w.r.t. lambdas and
closures. How might you implement objects in lisp, which lacks all
that specialized syntax? Here's a class definition (f1 ... fn are
just some user-provided blobs of computation code):

    (def c (lambda (params)
      (let* ((x1 (f1 params))        ;; or let / letrec, as appropriate
             ...
             (xn (fn params))
             (getter (lambda (field)
                       (cond ((= field 'x1) x1)
                             ((= field 'x2) x2)
                             ...
                             ((= field 'xn) xn))))
             (setter (lambda (field val)
                       (cond ((= field 'x1) (:= x1 val))
                             ((= field 'x2) (:= x2 val))
                             ...
                             ((= field 'xn) (:= xn val)))))
             (foo (lambda (z) (:= x1 (+ x2 x3 z)))))
        ;; the object itself: dispatch a command to the matching closure
        (lambda (cmd)
          (cond ((= cmd 'get) getter)
                ((= cmd 'set) setter)
                ((= cmd 'foo) foo))))))

That's a class with a constructor, setters, getters, and a method
named foo. Use it like:

    (def ob (c params))        ;; more usual (non-lisp) syntax: ob = c(...)
    (def x ((ob 'get) 'x1))    ;; more usual (non-lisp) syntax: x = ob.x1
    ((ob 'set) 'x1 val)        ;; more usual (non-lisp) syntax: ob.x1 = val
    ((ob 'foo) 3)              ;; more usual (non-lisp) syntax: ob.foo(3)

(Yes, there are undoubtedly more efficient, better implementations.
Sue me.)

Using the characterization(s) I started with above,
- the environment is the list of params passed when calling c to
construct an object. (though with lisp, you can also capture values
from enclosing static scopes if you want to.)
- the data declarations are the let/let*/letrec of the xi.
- the method declarations are exemplified by the function foo. (I
think of the getters and setters as part of the data declaration
boilerplate.)
- the code that operates on the input environment (what you probably
think of as the constructor code) is the f1 ... fn expressions that
compute the initial values of the xi.

The let/let*/letrec construct, the getter and setter functions, and
the final lambda/cond are boilerplate.
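
For readers who don't speak lisp, the same construction can be sketched in Python with closures. This is only an illustration under assumptions: a dict stands in for the individual let-bound xi, the initial values are arbitrary stand-ins for f1 ... fn, and only three fields are shown.

```python
def c(params):
    # "let" part: data declarations, initialised from the environment.
    # The initial values here are arbitrary stand-ins for f1 ... fn.
    fields = {"x1": params[0], "x2": params[1], "x3": params[2]}

    def getter(field):
        return fields[field]

    def setter(field, val):
        fields[field] = val

    def foo(z):
        # Same as the lisp foo: x1 := x2 + x3 + z
        fields["x1"] = fields["x2"] + fields["x3"] + z

    # Final lambda/cond: dispatch a command name to the matching closure.
    def dispatch(cmd):
        return {"get": getter, "set": setter, "foo": foo}[cmd]

    return dispatch


ob = c((1, 2, 3))        # ob = c(...)
ob("set")("x2", 10)      # ob.x2 = 10
ob("foo")(3)             # ob.foo(3), so x1 = 10 + 3 + 3
x = ob("get")("x1")      # x = ob.x1
```

Nothing here uses `class`; the "object" is just the closure returned by `c`, and its fields live as long as that closure does.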

Want to make an xi private to the class? Remove its cond entries from
the setter and getter functions. It's a change to the boilerplate.

Want to make an xi readonly? Remove just its entry from the setter
function. It's a change to the boilerplate.

Want a function instead of a class? Replace the lambda that's
returned with whichever xi you want to return as a value. It's a
change to the boilerplate.
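
One way to picture "a change to the boilerplate" is to make the boilerplate take the visibility choices as data. The sketch below is hypothetical: `make_object`, `readable` and `writable` are names invented for this example, not anything proposed in the post.

```python
def make_object(fields, readable, writable):
    # `state` is the object's private data; only these closures can reach it.
    state = dict(fields)

    def getter(field):
        if field not in readable:
            raise AttributeError(f"{field} is not readable")
        return state[field]

    def setter(field, val):
        if field not in writable:
            raise AttributeError(f"{field} is not writable")
        state[field] = val

    return {"get": getter, "set": setter}


ob = make_object({"x1": 1, "x2": 2, "x3": 3},
                 readable={"x1", "x2"},   # x3 is private
                 writable={"x1"})         # x2 is read-only
```

Returning `state["x1"]` instead of the dict of closures would turn the same boilerplate into a plain function, which is the third variation described above.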

So lisp can do this, and fairly simply, showing that code blocks and
class definitions really are specializations of some common, more
general concept.

So what I want is:

- one syntax for all blocks.
- all blocks are first-class entities.
- various options (scope and lifetime) can let you make a block act
like a function body or like a class definition. Or maybe like
something else.

This means (among other things):

- most (all?) control-flow constructs ("if", "while", "foreach",
"when", etc) can just be operators on blocks. So you can create new
ones. ("foreach.in_its_own_thread", anybody?)
- No more icky "return x" statement (anonymous return value - I've
always hated that syntax, for some reason). A block merely alters its
environment - some other compiler mechanism (having to do with
variable binding and "return values" declaration) has the
responsibility for returning the appropriate part of the environment
to a caller, if the block is being used as a body of a called
function. (This may make interprocedural data flow analysis a bit
more tractable?)
- There's only one constructor for a class. Want another (say, with
a different number of params or something)? Write an auxiliary
function. I think this seems cleaner to me, in terms of code hygiene,
as all object construction *must* happen in exactly one place.
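
As a rough illustration of control-flow constructs being ordinary operators on blocks, here is a Python sketch in which blocks are plain callables. The names `while_` and `foreach_in_its_own_thread` are invented for this example (the latter echoing the post's "foreach.in_its_own_thread"), and the host language's own `while` and `for` are used underneath, so this shows the shape of the idea rather than a real implementation.

```python
import threading


def while_(cond, body):
    # `cond` and `body` are blocks: zero-argument callables.
    while cond():
        body()


def foreach_in_its_own_thread(items, body):
    # User-defined control construct: run body(item) on one thread per item,
    # then wait for all of them to finish.
    threads = [threading.Thread(target=body, args=(item,)) for item in items]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


squares = {}

def record_square(n):
    squares[n] = n * n   # each thread writes a distinct key

foreach_in_its_own_thread([1, 2, 3], record_square)

countdown = [3]

def tick():
    countdown[0] -= 1

while_(lambda: countdown[0] > 0, tick)
```

Because the constructs are just functions over blocks, adding a new one needs no change to the language itself.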

Thoughts?

Alex Eagle

Mar 21, 2010, 7:11:52 PM
to no...@googlegroups.com
Hey Doug,

I spent the last quarter focused on 80% stuff and have done very little with Noop, but now it's time to pick it up again.

I like your ideas a lot. I want to incorporate this into the language model, which I'm rethinking a little right now. I want to ignore syntax for the moment and get a compelling and comprehensive model. I'd like to compress the mental model for the mundane parts of assembling control flows and method definitions, and make more room for things like dependency declaration and versioning in the language instead. So this fits well.

Some questions:
- I'd love for functions to be siblings of classes in the language model. Meaning that functions don't need to be declared "inside" of something, since they don't need a "this" reference. That means you can distinguish between methods and functions by where the declaration appears. Does that make sense?
- How do you deal with the else clause? Is "if" a function that takes either one or two blocks as arguments?
- Return value as merely a side-effect is counter-intuitive. What if you had pure functions, like http://smallwig.blogspot.com/2008/04/pure-functions.html - we need to guarantee there are no side effects.
- How would we distinguish a block which should capture its lexical context from one that shouldn't? For example, in Ruby I believe you have closures and also bare blocks.

Thanks for the input and sorry I let it sit for so long!
-Alex

Christian Edward Gruber

Apr 19, 2010, 10:15:05 AM
to no...@googlegroups.com
Ping?!  Inquiring minds want to know! :)

Christian.


Doug

Apr 27, 2010, 9:24:36 AM
to Noop
Too busy here too. Here are a few musings pretending to be answers.

I like the idea of not having a "this" (disambiguating funcs vs methods by
where they're declared) - having a special "this" variable has always felt
like a hack to me. (But I suppose there's still a need for a way to refer
to "this object" explicitly so that it can be passed to other functions,
etc...)

I don't have good syntax yet, but I'm thinking that "if" might be
implemented by applying some sort of "if" operator to the condition to be
evaluated and the block:
    { {"then" branch block} {"else" branch block} }
(and just leave the "else" block as the empty block if there's no "else"
clause). This needs some syntax sugaring, of course...
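
A minimal Python sketch of that shape, assuming blocks are zero-argument callables and the pair of branches is passed as a tuple. The names `if_` and `empty_block` are hypothetical. The operator picks a branch by indexing with the condition, so it does not lean on Python's own `if` statement.

```python
def empty_block():
    # The empty block: does nothing.
    return None


def if_(condition, blocks):
    # `blocks` is the pair {then-block, else-block} from the text.
    # Select one block by the condition, then run only that one.
    then_block, else_block = blocks
    chosen = (else_block, then_block)[bool(condition)]
    return chosen()


log = []
if_(3 > 2, (lambda: log.append("then"), lambda: log.append("else")))
if_(3 < 2, (lambda: log.append("then"), empty_block))   # no "else" clause
```

After both calls, `log` holds a single "then": the first call ran its "then" branch and the second ran the empty block.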

Designing a language around function purity makes for a nice, cleanly
defined language where you can parallelize stuff easily, do lots of nifty
optimizations, etc. But my experience has been that it also makes the
language a pain to program in. In practice, most functions tend to be
impure (people just love to do I/O at levels other than the top level of the
function tree, maintain state in their objects, etc).

(I should mention where I got the idea of having return values be
side-effects, and how that might work. I once invented a language (of which
python is reminiscent) that did its function definitions like this:

    def my_function(param1, param2, ...)(retval1, retval2, ...)
      compute...
      compute...

The params work as you think. The retvals behaved like other local variables
inside the function - except that their values were returned to the caller
when the function ended. So you could write something like:

    def factorial(n)(fact)
      fact = 1
      for i = 1 to n
        fact *= i

    x = factorial(9)

(allowing multiple retvals let you return a tuple.) So outside of the
function, calling works as you expect. Inside the function, you just
compute values for special "outie" variables. When you read the function
definition line, you already know what it's going to return. And there's no
need for an extra, special-case "return" statement that's only useful in one
context.)
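
The "outie" idea can be approximated in Python, with the caveat that this is an emulation and not the invented language described above. In this sketch the hypothetical decorator `outies` declares the return names up front, the body assigns to attributes of an `out` namespace, and no `return` statement appears in the function bodies.

```python
from types import SimpleNamespace


def outies(*names):
    # Decorator: declare which "outie" variables are returned to the caller.
    def wrap(body):
        def call(*args):
            out = SimpleNamespace()
            body(out, *args)          # the body only assigns to out.<name>
            values = tuple(getattr(out, name) for name in names)
            return values[0] if len(values) == 1 else values
        return call
    return wrap


@outies("fact")
def factorial(out, n):
    out.fact = 1
    for i in range(1, n + 1):
        out.fact *= i


@outies("quot", "rem")
def divide(out, a, b):
    out.quot = a // b
    out.rem = a % b


x = factorial(9)
q, r = divide(17, 5)      # multiple retvals come back as a tuple
```

From the caller's side nothing looks unusual, and the decorator line alone says what each function returns.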

Conceptually, all blocks capture their context. Except that some blocks
don't access anything up the chain - that can be analyzed fairly easily and
used for optimizing the block's allocation and code. Many blocks only
access their own locals and things belonging to their immediate container;
these blocks could be "inlined" into their containers (being careful to
make sure their namespaces remain separate though!), with the result that
their execution wouldn't require any setup. It's also fairly
straightforward to figure out whether a (first-class) block will escape the
context it was defined in - if it won't, then its local data can be
allocated on the stack rather than the heap. All these together get you the
memory allocation behavior of a language like C (mostly, variables live on
the stack, and the stack frame gets created once at function entry), and you
only pay extra costs if you use the more advanced features (propagating
a first-class block somewhere else for later execution, etc).
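
CPython already records the "does this block access anything up the chain" information, which makes for a small illustration. The sketch below uses function introspection as a stand-in for the compiler analysis described above; `captures`, `container` and the two nested functions are names invented for the example, and nothing here says how Noop would do it.

```python
def captures(block):
    # Names the block reads from enclosing scopes ("up the chain").
    return block.__code__.co_freevars


def container():
    total = 10

    def local_only(n):
        # Touches only its own params and locals: nothing to capture.
        doubled = n * 2
        return doubled

    def uses_container():
        # Reads `total` from its immediate container.
        return total + 1

    return local_only, uses_container


local_only, uses_container = container()
```

Here `captures(local_only)` is empty and the function carries no closure, while `captures(uses_container)` names `total`. Since `uses_container` escapes `container`, `total` has to outlive the call, which is the case where the extra allocation cost is paid.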


Doug



Christian Edward Gruber

Apr 27, 2010, 9:56:32 AM
to no...@googlegroups.com
Hey Doug

On Apr 27, 2010, at 9:24 AM, Doug wrote:

> def factorial(n)(fact)
>   fact = 1
>   for i = 1 to n
>     fact *= i
>
> x = factorial(9)

Man, the above really reminds me of Miranda...

Anyway, the above I have no problem with - it's not really a side effect - just an alternate declaration approach.

As to your musings about "this" or "self": it's because O-O is not functional, and classes (and their instances) are not quite just blocks above functions. They're first-class items, and they provide a kind of quasi-global that's encapsulated from the world, but accessible to everything below. Also, OO lets you model somewhat from the perspective of "entering the data" (which is why I like self more than this). You can ask, if you were that object, who you talk to and who asks you for what. But that's sort of an aside...

It's not that "OO folks 'like' to maintain state in their objects," it's that this is the point of O-O.  Data + behaviour, encapsulated.  It's orthogonal, or possibly contends directly with the pure-functional model by specific intent.  I think there are places where a mixed model makes sense, but you're never going to "wean" OO folks off of instance variables.  Just FYI. ;-)

Christian.
