Cat as a distributed programming language

Ian

Dec 16, 2008, 5:27:49 PM

to Cat Language

A few weeks ago I had an idea for a new way to create a transparently
distributed programming language, described here:

http://blog.locut.us/2008/10/06/swarm-a-true-distributed-programming-language/

The short version is that data is distributed across multiple
computers (basically objects which can have fields which contain
primitives, or pointers to other objects), and the execution will
migrate between computers (using a portable continuation) so that it
is always acting on local data.
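
A minimal sketch of that idea in Python, simulated inside one process. The names `Node`, `Cluster` and `Continuation` are made up for illustration and are not taken from the Swarm prototype; a real system would serialize the continuation and send it over the network instead of making a local call.

```python
from dataclasses import dataclass, field


@dataclass
class Continuation:
    """Everything needed to resume the computation somewhere else."""
    pending: list                              # ids of objects still to be read
    total: int = 0                             # running result
    hops: list = field(default_factory=list)   # nodes visited, for illustration


class Node:
    def __init__(self, name, objects):
        self.name = name
        self.objects = objects                 # object id -> value stored on this node

    def run(self, cont, cluster):
        """Run until the next object is remote, then hand the continuation over."""
        cont.hops.append(self.name)
        while cont.pending:
            oid = cont.pending[0]
            if oid not in self.objects:
                # The data is not local: move the execution, not the data.
                return cluster.owner_of(oid).run(cont, cluster)
            cont.total += self.objects[oid]
            cont.pending.pop(0)
        return cont


class Cluster:
    def __init__(self, nodes):
        self.nodes = nodes

    def owner_of(self, oid):
        # Assumes every object id lives on exactly one node.
        return next(n for n in self.nodes if oid in n.objects)


def sum_objects(cluster, ids):
    """Sum the values of the given objects, starting on the first node."""
    return cluster.nodes[0].run(Continuation(pending=list(ids)), cluster)


cluster = Cluster([Node("A", {"x": 1, "y": 2}), Node("B", {"z": 10})])
result = sum_objects(cluster, ["x", "z", "y"])
print(result.total, result.hops)
```

Here the computation starts on node A, hops to B when it needs `z`, and hops back to A for `y`, so every read is a local one.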

The "toy" language I created to test the idea was stack based, but
really it would be a serious encumbrance to have to invent "yet
another" new programming language for this.

Cat caught my interest as an easily implemented VM (at least the
untyped version) that could form a basis for Swarm. I'd be interested
in people's opinions...

Ian.

Christopher Diggins

Dec 20, 2008, 9:14:15 AM

to catla...@googlegroups.com

You may be interested in a new project to build a VM based on Cat, called CVML.
There is no documentation yet, but the source code is at
http://cvml.googlecode.com/
The goal is to produce a very compact byte-code by applying aggressive
optimizations.
Preliminary results show that it may be about 1/4 the size of Java bytecode.
So far the focus has been only on the optimization, for the purpose of
an upcoming paper, but I plan on throwing together an implementation
real soon now.
If you are interested, I'll post more here.

- Christopher

Ian Clarke

Dec 20, 2008, 11:45:56 AM

to catla...@googlegroups.com

On Sat, Dec 20, 2008 at 8:14 AM, Christopher Diggins <cdig...@gmail.com> wrote:
> You may be interested in a new project to build a VM based on Cat, called CVML.

I am interested. Really one of my goals is to minimize the amount of
time I need to spend on the language, so a simple VM with a
pre-existing set of tools is appealing. Really my project would be a
combination of a customized bytecode interpreter (that supports
portable continuations), and a Domain Specific Language.

I have a semi-related question:

Cat is a postfix or RPN language as it is stack-based. Would it be
very difficult to extend the interpreter such that commands could be
declared to be prefix or infix (with precedence, and support for
brackets to clarify ambiguity)?

It would seem that such an extension would provide far more potential
to build DSLs, although of course it would somewhat complicate the
interpreter?

Ian.

--
Ian Clarke
CEO, Uprizer Labs
Email: i...@uprizer.com
Ph: +1 512 422 3588
Fax: +1 512 276 6674

Christopher Diggins

Dec 20, 2008, 3:05:43 PM

to catla...@googlegroups.com

On Sat, Dec 20, 2008 at 11:45 AM, Ian Clarke <ian.c...@gmail.com> wrote:
>
> On Sat, Dec 20, 2008 at 8:14 AM, Christopher Diggins <cdig...@gmail.com> wrote:
>> You may be interested in a new project to build a VM based on Cat, called CVML.
>
> I am interested. Really one of my goals is to minimize the amount of
> time I need to spend on the language, so a simple VM with a
> pre-existing set of tools is appealing. Really my project would be a
> combination of a customized bytecode interpreter (that supports
> portable continuations), and a Domain Specific Language.

CVML currently has very little documentation. I have been developing
it primarily as a research project for a paper I am working on for
LCTES 09 (Languages, Compilers, and Tools for Embedded Systems). Since
you are interested, I will spend some time this week putting together
a bit of documentation about the project, and post it on this list.

> I have a semi-related question:
>
> Cat is a postfix or RPN language as it is stack-based. Would it be
> very difficult to extend the interpreter such that commands could be
> declared to be prefix or infix (with precedence, and support for
> brackets to clarify ambiguity)?

Not too difficult, but a bit time consuming. You would have to come up
with a syntax for expressing whether a new command is prefix, postfix,
or infix. You would also have to design a system for
expressing operator precedence. I have never seen such a system turn
out to be very elegant in practice. It is sometimes easier to just go
with the already established C operator hierarchy. Finally you have to
create a parse tree and parse it according to your operator rules.
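
To make those steps concrete, here is a rough Python sketch that rewrites prefix and infix input into the postfix order a stack-based interpreter expects. It uses precedence climbing and emits postfix directly rather than building an explicit tree. The operator tables, the `neg` command and the function names are all hypothetical; nothing here is part of Cat.

```python
import re

# Hypothetical operator declarations: name -> (precedence, associativity).
INFIX = {"+": (1, "left"), "-": (1, "left"),
         "*": (2, "left"), "/": (2, "left"),
         "^": (3, "right")}
# Hypothetical prefix command with its binding power.
PREFIX = {"neg": 4}


def tokenize(src):
    """Split the input into numbers, identifiers, operators and brackets."""
    return re.findall(r"\d+|[A-Za-z_]\w*|[-+*/^()]", src)


def to_postfix(src):
    """Return the tokens of `src` rearranged into postfix (RPN) order."""
    tokens = tokenize(src)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_atom():
        if peek() is None:
            raise SyntaxError("unexpected end of input")
        tok = advance()
        if tok == "(":
            out = parse_expr(0)
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return out
        if tok in PREFIX:
            # A prefix command is emitted after its single argument.
            return parse_expr(PREFIX[tok]) + [tok]
        if tok in INFIX or tok == ")":
            raise SyntaxError(f"unexpected token {tok!r}")
        return [tok]

    def parse_expr(min_prec):
        out = parse_atom()
        while peek() in INFIX and INFIX[peek()][0] >= min_prec:
            op = advance()
            prec, assoc = INFIX[op]
            # Left-associative operators require a tighter right-hand side.
            rhs = parse_expr(prec + 1 if assoc == "left" else prec)
            out = out + rhs + [op]
        return out

    result = parse_expr(0)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return result


print(" ".join(to_postfix("(1 + 2) * neg 3")))
```

The time-consuming part is less the parser than the declaration syntax: something still has to populate tables like `INFIX` and `PREFIX` from user code.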

> It would seem that such an extension would provide far more potential
> to build DSLs, although of course it would somewhat complicate the
> interpreter?

There are some languages which already do this, for example XMF (
http://www.ceteva.com/xmf.html ) and Katahdin (
http://www.chrisseaton.com/katahdin/ ). Personally I prefer to write
my own parsers and compilers if I want a DSL.

Incidentally the CVML project is designed to work with different
languages used as a front-end. Currently I am testing with an infix
language called Heron (http://www.heron-language.com) as a front-end.
Heron is similar to JavaScript but with optional typing and a module
system. Before Heron is converted into CVML, it is converted into an
s-expression (prefix) form. Some optimizations turn out to be easier
on the s-expression form than on the stack-based byte-code.

Do you already have a clear idea what your DSL is going to look like?

Craig Overend

Dec 21, 2008, 8:20:07 AM

to catla...@googlegroups.com

I've been a lurker on this list for a little while. I've had similar ideas for a stack-based distributed programming language and have been doing my own related research.

You may want to take a look at YASBL (Yet Another Stack Based Language) for a stack-based prefix, infix, postfix notation interpreted language in action. It does however require dual stacks, one for operators, the other for operands.
http://au.youtube.com/watch?v=uZxHAtMxg1E&feature=channel_page
I understand it was based on ideas from Andrew Cooke's otuto.
http://www.acooke.org/otuto.html
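
For anyone unfamiliar with the dual-stack approach, here is a small Python sketch of the general technique (one stack for operators, one for operands). It is not YASBL's or otuto's actual implementation; the operator table and function name are assumptions for illustration, and only left-associative binary operators are handled.

```python
import operator

# Operator -> (precedence, function); all left-associative in this sketch.
OPS = {"+": (1, operator.add), "-": (1, operator.sub),
       "*": (2, operator.mul), "/": (2, operator.truediv)}


def evaluate(tokens):
    """Evaluate a list of infix tokens using two stacks."""
    operands, operators = [], []

    def apply_top():
        # Pop one operator and its two operands, push the result.
        op = operators.pop()
        right = operands.pop()
        left = operands.pop()
        operands.append(OPS[op][1](left, right))

    for tok in tokens:
        if tok == "(":
            operators.append(tok)
        elif tok == ")":
            # Reduce everything back to the matching open bracket.
            while operators[-1] != "(":
                apply_top()
            operators.pop()
        elif tok in OPS:
            # Reduce operators that bind at least as tightly as the new one.
            while (operators and operators[-1] != "("
                   and OPS[operators[-1]][0] >= OPS[tok][0]):
                apply_top()
            operators.append(tok)
        else:
            operands.append(float(tok))

    while operators:
        apply_top()
    return operands.pop()


print(evaluate("( 1 + 2 ) * 3".split()))
```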

Craig.

Christopher Diggins

Dec 21, 2008, 11:08:28 AM

to catla...@googlegroups.com

That's an interesting approach, placing operators on a secondary stack.

So I should probably share my ideas about how I was planning on making
CVML (the Cat-based bytecode project) concurrent. What I plan on doing
is making the VMs very lightweight. Multiple instances could be run
on the same machine in different processes, or on multiple machines.
Each program would run on its own VM. Programs could be strung
together as in UNIX pipes. Asynchronous communication between VMs
could also be done using event passing. Each VM would have an event
stack. Each event could carry arbitrary data, including functions.
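
A rough sketch of what this could look like, written in Python and run in a single thread for simplicity. The `VM` class, its methods and the `run_all` scheduler are invented for illustration and are not CVML code. The sketch also assumes events are handled first-in, first-out, which is a simplification of the "event stack" described above.

```python
from collections import deque


class VM:
    """A tiny VM that applies one program to each incoming event payload."""

    def __init__(self, name, program, downstream=None):
        self.name = name
        self.program = program        # function applied to each payload
        self.downstream = downstream  # next VM in the pipe, if any
        self.events = deque()         # pending events (FIFO in this sketch)
        self.output = []              # results of the last VM in a pipe

    def post(self, payload):
        """Deliver an event asynchronously: just enqueue it."""
        self.events.append(payload)

    def step(self):
        """Handle one pending event; return False if there was none."""
        if not self.events:
            return False
        result = self.program(self.events.popleft())
        if self.downstream is not None:
            self.downstream.post(result)
        else:
            self.output.append(result)
        return True


def run_all(vms):
    """Round-robin scheduler: step every VM until all are idle."""
    while any([vm.step() for vm in vms]):
        pass


# Two VMs strung together like a UNIX pipe: double, then add one.
sink = VM("sink", lambda x: x + 1)
source = VM("source", lambda x: x * 2, downstream=sink)
source.post(1)
source.post(2)
run_all([source, sink])
print(sink.output)

# An event can also carry a function, which the receiving VM applies.
applier = VM("applier", lambda f: f(10))
applier.post(lambda n: n * n)
run_all([applier])
print(applier.output)
```

Because each VM only touches its own queue and state, nothing is shared between them; all communication goes through `post`.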

I have been on the fence regarding whether or not I should allow
continuations in CVML. I have never been a big fan of sharing data
between concurrent processes. One of my big concerns is the
burden on users to properly synchronize data access.

I'd be curious to hear people's thoughts on this.

- Christopher

Christopher Diggins

Dec 22, 2008, 10:35:54 AM

to catla...@googlegroups.com

I reread Ian Clarke's writing about Swarm (
http://blog.locut.us/main/2008/10/7/swarm-a-true-distributed-programming-language.html
). I think the idea has a lot of potential. I also realized that
adding continuations to my proposed VM for CVML will be quite easy, so
I plan on adding them. A VM object simply needs to be cloned, and
treated as data on the stack, or passed as a message to another VM.
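
A minimal sketch of that cloning idea, using a toy stack machine in Python. `StackVM` and its two instructions are hypothetical stand-ins, not the CVML design.

```python
import copy


class StackVM:
    """A toy stack machine: integers push themselves, words operate on the stack."""

    def __init__(self, program):
        self.program = list(program)
        self.pc = 0        # index of the next instruction
        self.stack = []

    def step(self):
        instr = self.program[self.pc]
        self.pc += 1
        if instr == "add":
            b, a = self.stack.pop(), self.stack.pop()
            self.stack.append(a + b)
        elif instr == "mul":
            b, a = self.stack.pop(), self.stack.pop()
            self.stack.append(a * b)
        else:
            self.stack.append(instr)

    def run(self):
        """Run to the end of the program and return the top of the stack."""
        while self.pc < len(self.program):
            self.step()
        return self.stack[-1]

    def capture(self):
        """A continuation is simply a deep copy of the whole VM."""
        return copy.deepcopy(self)


vm = StackVM([2, 3, "add", 10, "mul"])
for _ in range(3):
    vm.step()                 # stack is now [5]
k = vm.capture()              # snapshot taken mid-program

print(vm.run())               # the original VM finishes
print(k.pc, k.stack)          # the clone is still paused at the snapshot
print(k.run())                # and can be resumed independently

# The clone is ordinary data, so it can sit on another VM's stack.
other = StackVM([])
other.stack.append(vm.capture())
```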

- Christopher

Ian Clarke

Dec 22, 2008, 1:22:53 PM

to catla...@googlegroups.com

Great! Note that the Swarm proposal doesn't just need continuations;
they must be portable continuations (i.e. you must be able to serialize
them and send them to a remote computer where they can be
de-serialized and continue executing).
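
To illustrate what "portable" means here, the following Python sketch keeps the entire interpreter state in plain data, so a paused computation can be turned into bytes and resumed from those bytes. The toy instruction set, the state layout and the function names are assumptions made for the example; JSON is used only because it keeps the state free of host-specific pointers.

```python
import json


def step(state):
    """Execute one instruction of a toy postfix program held in a plain dict."""
    instr = state["program"][state["pc"]]
    state["pc"] += 1
    stack = state["stack"]
    if instr == "add":
        b, a = stack.pop(), stack.pop()
        stack.append(a + b)
    elif instr == "mul":
        b, a = stack.pop(), stack.pop()
        stack.append(a * b)
    else:
        stack.append(instr)   # anything else is a literal


def run(state, max_steps=None):
    """Run until the program ends or `max_steps` instructions have executed."""
    steps = 0
    while state["pc"] < len(state["program"]) and (
            max_steps is None or steps < max_steps):
        step(state)
        steps += 1
    return state


def serialize(state):
    """Turn a paused computation into bytes that can be sent to another machine."""
    return json.dumps(state).encode("utf-8")


def deserialize(data):
    """Rebuild a paused computation from bytes received over the wire."""
    return json.loads(data.decode("utf-8"))


# "Local" machine: start the program, pause after three instructions.
local = {"program": [2, 3, "add", 10, "mul"], "pc": 0, "stack": []}
run(local, max_steps=3)
wire = serialize(local)

# "Remote" machine: rebuild the state from bytes and carry on.
remote = run(deserialize(wire))
print(remote["stack"])
```

The hard part in a real VM is everything this toy avoids: closures, open files, and native frames that have no obvious serialized form.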

Portable continuations seem to be quite rare, Haskell doesn't have
them for example. Rhino (the Java JavaScript interpreter) does have
them, but only in interpreted mode.

If Cat supported Portable Continuations then I see no reason why it
couldn't form the basis for my Swarm proposal.

Actually, I've gone ahead and set up a Google Code project. I've also
renamed it from "Swarm" (which was a rather overloaded name) to
"Vega" (which is less overloaded, and sounds good IMHO).

You can find some very preliminary architectural brainstorming here:
http://code.google.com/p/vega/wiki/VegaArchitecture

It would be really great if we could use Cat for the Virtual Machine
and some of the Platform, that way we can bootstrap from Cat's growing
ecosystem, without having to invent yet another language.

A few vaguely related points/questions:

Have you considered how the compiler could provide support to IDEs -
for example when it comes to auto-completion? I'm not sure how this
is commonly handled (for example with Java's Eclipse IDE), but one
option would be for the IDE to come up with an ordered list of guesses
(perhaps determined statistically based on near-by keywords and other
things), and then feed them to the compiler which will filter the list
based on what is plausible given the types of the suggestions and
near-by keywords.
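
As a sketch of that filtering step for a stack-based language: the IDE supplies guesses in ranked order, and the "compiler" side keeps only the words whose inputs fit the types currently on top of the stack. The word names, their signatures and the `any` wildcard are all invented for illustration and do not reflect Cat's real type system.

```python
# Hypothetical table: word -> types it consumes, with the top of the stack last.
CONSUMES = {
    "add_int": ["int", "int"],
    "concat": ["string", "string"],
    "strlen": ["string"],
    "dup": ["any"],
}


def accepts(consumed, stack_types):
    """True if the top of the stack matches the types a word consumes."""
    if len(consumed) > len(stack_types):
        return False
    top = stack_types[len(stack_types) - len(consumed):]
    return all(c == "any" or c == t for c, t in zip(consumed, top))


def filter_completions(ranked_guesses, stack_types):
    """Keep the IDE's ranking, dropping guesses that cannot type-check here."""
    return [word for word in ranked_guesses
            if word in CONSUMES and accepts(CONSUMES[word], stack_types)]


# The IDE's statistical ranking, and the stack types at the cursor.
guesses = ["concat", "add_int", "strlen", "dup"]
print(filter_completions(guesses, ["int", "string"]))
print(filter_completions(guesses, ["int", "int"]))
```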

For Vega I'd love to see an in-browser IDE with functionality like
auto-completion. IMHO the ability to allow IDEs to do this kind of
thing is one of the major benefits of strongly typed languages.

A separate question: what thought have you given to namespaces, for
example some kind of module system analogous to Java's packages or
Haskell's Module system?

If I could make a suggestion: It would be wonderful if something like
the following could be supported (please ignore the syntax!):

import HTTP.* from http://modules.cat-language.com/net
signed by urn:sha1:79437f5edda13f9c0669b978dd7a9066dd2059f1
version 1.4.x

In this case, it would import a module called HTTP from the /net
directory on the Cat website, verifying the code with an SHA1
signature, and specifying that it must be version 1.4.x of the module.

Clearly, anything downloaded from here would be cached locally so it
would only need to be downloaded the first time it is used. The user
could specify how long modules are cached locally.
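
A rough Python sketch of the checks such an import would involve. Everything here is hypothetical: the `ModuleCache` class, the `fetch` callback standing in for an HTTP download, and the assumption that the server reports a version string alongside the code. The `urn:sha1:` value is treated as a plain digest of the module's bytes, and cache expiry is left out.

```python
import hashlib


def version_matches(pattern, version):
    """'1.4.x' matches '1.4.0' or '1.4.12'; 'x' is a wildcard component."""
    p, v = pattern.split("."), version.split(".")
    return len(p) == len(v) and all(a == "x" or a == b for a, b in zip(p, v))


def verify_sha1(urn, payload):
    """Check the payload bytes against a 'urn:sha1:<hex digest>' identifier."""
    prefix = "urn:sha1:"
    if not urn.startswith(prefix):
        raise ValueError(f"unsupported identifier: {urn}")
    return hashlib.sha1(payload).hexdigest() == urn[len(prefix):].lower()


class ModuleCache:
    def __init__(self, fetch):
        self.fetch = fetch   # callable(url) -> (version, payload bytes)
        self.store = {}      # url -> (version, payload), verified entries only

    def load(self, url, urn, version_pattern):
        """Return the module bytes, downloading only on first use."""
        if url not in self.store:
            version, payload = self.fetch(url)
            if not verify_sha1(urn, payload):
                raise ValueError("digest mismatch: refusing to load module")
            self.store[url] = (version, payload)
        version, payload = self.store[url]
        if not version_matches(version_pattern, version):
            raise ValueError(f"version {version} does not match {version_pattern}")
        return payload


# Stand-in for a network download, so the sketch runs offline.
module_bytes = b"placeholder module source"
module_urn = "urn:sha1:" + hashlib.sha1(module_bytes).hexdigest()
downloads = []


def fake_fetch(url):
    downloads.append(url)
    return "1.4.2", module_bytes


cache = ModuleCache(fake_fetch)
url = "http://modules.cat-language.com/net/HTTP"
cache.load(url, module_urn, "1.4.x")
cache.load(url, module_urn, "1.4.x")   # served from the cache
print(len(downloads))
```

Only verified downloads are stored, so a bad digest never poisons the cache.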

So what we are doing here is baking the type of functionality provided
by systems like Java's Maven into Cat itself (the concept of Maven is
nice, but not its nightmarishly over-engineered XML-everywhere
design!).

And of course, a future Cat IDE could provide lots of support to the
developer for this module system.

Anyway, perhaps such a thing is way-overkill for what you had in mind.

I'm happy to add anyone to the Vega project who wishes to join, just
let me know.

Ian.

Craig Overend

Dec 23, 2008, 1:09:16 AM

to catla...@googlegroups.com

An astute reader of my bookmarks sent me a reference to Kali Scheme, which you might like to read about:

Kali Scheme is a distributed implementation of Scheme that permits efficient transmission of higher-order objects such as closures and continuations. The integration of distributed communication facilities within a higher-order programming language engenders a number of new abstractions and paradigms for distributed computing. Among these are user-specified load-balancing and migration policies for threads, incrementally-linked distributed computations, and parameterized client-server applications. Kali Scheme supports concurrency and communication using first-class procedures and continuations. It integrates procedures and continuations into a message-based distributed framework that allows any Scheme object (including code vectors) to be sent and received in a message.

http://community.schemewiki.org/kali-scheme/

I also found what appears to be the original paper:

http://www.cs.purdue.edu/homes/suresh/papers/toplas95.ps.gz

Craig.

Christopher Diggins

Dec 23, 2008, 9:16:33 AM

to catla...@googlegroups.com

On Mon, Dec 22, 2008 at 1:22 PM, Ian Clarke <ian.c...@gmail.com> wrote:
>
> Great! Note that the Swarm proposal doesn't just need continuations;
> they must be portable continuations (i.e. you must be able to serialize
> them and send them to a remote computer where they can be
> de-serialized and continue executing).

So my current idea is to allow the current executor/VM object not
only to be stored on the stack, but also to be packed up and sent
across the network as data.

> Portable continuations seem to be quite rare, Haskell doesn't have
> them for example. Rhino (the Java JavaScript interpreter) does have
> them, but only in interpreted mode.
>
> If Cat supported Portable Continuations then I see no reason why it
> couldn't form the basis for my Swarm proposal.
>
> Actually, I've gone ahead and set up a Google Code project. I've also
> renamed it from "Swarm" (which was a rather overloaded name) to
> "Vega" (which is less overloaded, and sounds good IMHO).

I like the name.

> You can find some very preliminary architectural brainstorming here:
> http://code.google.com/p/vega/wiki/VegaArchitecture
>
> It would be really great if we could use Cat for the Virtual Machine
> and some of the Platform, that way we can bootstrap from Cat's growing
> ecosystem, without having to invent yet another language.

I'll keep the list informed of progress, and let you know when the
next-gen Cat (that is, CVML) executor is ready to be played with.

> A few vaguely related points/questions:
>
> Have you considered how the compiler could provide support to IDEs -
> for example when it comes to auto-completion?
>
> I'm not sure how this
> is commonly handled (for example with Java's Eclipse IDE), but one
> option would be for the IDE to come up with an ordered list of guesses
> (perhaps determined statistically based on near-by keywords and other
> things), and then feed them to the compiler which will filter the list
> based on what is plausible given the types of the suggestions and
> near-by keywords.

Eclipse already has Cat syntax coloring, at least, thanks to Adrian Savage:
http://code.google.com/p/cat-plugin/
Perhaps this tool can be extended.

> For Vega I'd love to see an in-browser IDE with functionality like
> auto-completion. IMHO the ability to allow IDEs to do this kind of
> thing is one of the major benefits of strongly typed languages.

Auto-completion is also possible for dynamic languages in many contexts.

> A separate question: what thought have you given to namespaces, for
> example some kind of module system analogous to Java's packages or
> Haskell's Module system?

I consider a module system critical.

> If I could make a suggestion: It would be wonderful if something like
> the following could be supported (please ignore the syntax!):
>
> import HTTP.* from http://modules.cat-language.com/net
> signed by urn:sha1:79437f5edda13f9c0669b978dd7a9066dd2059f1
> version 1.4.x

I like that. I was wondering, though: how important is signing of
modules? Java supports signed JARs, but most projects don't seem to
rely on them and get by fine.

I should point out that my current plan has been to extend the Heron
front end to CVML, and to finish the C++ implementation of the CVML
optimizer and VM.

For me Cat was little more than a proof of concept that functional
stack-based code could be strongly typed. I am not sure that
programmers will ever use stack-based languages in a significant way.
Personally I prefer to write code in an infix syntax and let my tools
generate stack-based back end code.

My thought is that perhaps Vega would be more of a language agnostic
project. It would be based on a common bytecode and VM model, that
could be shared by different languages. CVML is designed to be
applicable to far more languages than Java bytecode. Even though we
have seen many languages targeting Java bytecode (e.g. Scala, Groovy,
etc.) they are not very efficient because Java bytecode has terrible
support for functional languages.

How does this fit in with your vision?

> In this case, it would import a module called HTTP from the /net
> directory on the Cat website, verifying the code with an SHA1
> signature, and specifying that it must be version 1.4.x of the module.
>
> Clearly, anything downloaded from here would be cached locally so it
> would only need to be downloaded the first time it is used. The user
> could specify how long modules are cached locally.

I like it.

> So what we are doing here is baking the type of functionality provided
> by systems like Java's Maven into Cat itself (the concept of Maven is
> nice, but not its nightmarishly over-engineered XML-everywhere
> design!).
>
> And of course, a future Cat IDE could provide lots of support to the
> developer for this module system.
>
> Anyway, perhaps such a thing is way-overkill for what you had in mind.

Well, time is limited. I have a day job that doesn't permit me to spend
the time I would like on these things. Until Feb 6 my focus is going
to be finishing up my paper on space efficient byte-code for LCTES
2009. I am also going to try and finish the CVML VM as fast as I can.
