Simplest language implementation for the JVM

72 views
Skip to first unread message

Dibyendu Majumdar

unread,
Sep 5, 2010, 7:47:37 PM9/5/10
to JVM Languages
Hi,

I am starting work on implementing the Go like language for the JVM
and was wondering if there is a simple toy language implementation for
the JVM that I could look at. Is anyone aware of such an
implementation?

I am looking to implement a hand-coded lexer/parser for the language.

Thanks and Regards
Dibyendu

vruz

unread,
Sep 5, 2010, 7:55:01 PM9/5/10
to jvm-la...@googlegroups.com
one of the many LISP and LISP-like implementations perhaps?

http://www.is-research.de/info/vmlanguages/tag/lisp/

> --
> You received this message because you are subscribed to the Google Groups "JVM Languages" group.
> To post to this group, send email to jvm-la...@googlegroups.com.
> To unsubscribe from this group, send email to jvm-language...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/jvm-languages?hl=en.
>
>

--
---- vruz
---- alterius non sit qui suus esse potest

Tom Davies

unread,
Sep 5, 2010, 10:59:39 PM9/5/10
to JVM Languages
I've implemented the 'postfix' language from Design Concepts in
Programming Languages.

It's a *very* simple language, and I used ANTLR rather than doing my
own lexer/parser, but if you want to see an example of using asm to
build class files from an AST I'm happy to zip up the project and send
it to you.

Tom

Randall R Schulz

unread,
Sep 5, 2010, 11:08:48 PM9/5/10
to jvm-la...@googlegroups.com
On Sunday September 5 2010, Dibyendu Majumdar wrote:
> Hi,
>
> I am starting work on implementing the Go like language for the JVM
> and was wondering if there is a simple toy language implementation
> for the JVM that I could look at. Is anyone aware of such an
> implementation?

Clojure might qualify as a relatively simple language from the
perspective of code generation.


> I am looking to implement a hand-coded lexer/parser for the language.

Why?


> Thanks and Regards
> Dibyendu


Randall Schulz

vruz

unread,
Sep 5, 2010, 11:10:31 PM9/5/10
to jvm-la...@googlegroups.com
I guess a distinction must be drawn between a simple language and a
simple implementation of a language.

They're many times not interchangeable.

:-)

Richard L. Burton III

unread,
Sep 5, 2010, 11:12:10 PM9/5/10
to jvm-la...@googlegroups.com
Hey Tom,

I've been doing a language based upon VBScript. I need to write up something on what I've done and the various things to consider when writing a language.

Interesting stuff :)

--
You received this message because you are subscribed to the Google Groups "JVM Languages" group.
To post to this group, send email to jvm-la...@googlegroups.com.
To unsubscribe from this group, send email to jvm-language...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/jvm-languages?hl=en.




--
-Richard L. Burton III

Jim White

unread,
Sep 6, 2010, 1:33:23 AM9/6/10
to jvm-la...@googlegroups.com
I would say llava could be a good example for you even though it is
pretty much inactive. It is a Lisp whose operators map to JVM bytecodes
in a very direct manner so the compiler is very simple.

http://llava.org/

Jim

Ben Evans

unread,
Sep 6, 2010, 2:16:44 AM9/6/10
to jvm-la...@googlegroups.com

Sent from my iPhone

Patrick Wright

unread,
Sep 6, 2010, 3:03:16 AM9/6/10
to jvm-la...@googlegroups.com
I found the implementations of Talc (http://code.google.com/p/talc/)
and Pnuts (https://pnuts.dev.java.net/) fairly small and easy to
follow. I wouldn't call them toy languages or toy implementations but
the code is not overwhelming, I think.


Patrick

Tom Davies

unread,
Sep 6, 2010, 3:09:51 AM9/6/10
to JVM Languages

Attila Szegedi

unread,
Sep 6, 2010, 3:13:03 AM9/6/10
to jvm-la...@googlegroups.com
I'd say they are *never* interchangeable.

Whenever you want to have a computation task accomplished, you need to, in the end, get those electrons shuffled around the silicon in the right way. Every computational task has an inherent level of complexity that you can't reduce below. This complexity has to manifest itself somewhere between your source code that encodes the computation the way you expressed it, and the electrons being shuffled. You can push the complexity between the layers (your source code, libraries, language runtime, OS, hardware) but it has to be somewhere.

This is also a very theoretical minimum. In practice, our today's implementations of computational systems are still far above their minimal level of complexity - when we discover simplifications, we're incrementally (and likely, asymptotically) approaching the minimum.

(This morning's philosophical thought brought to you by a guy sitting home with flu, unable to do anything much more meaningful.)

Attila.


--
http://twitter.com/szegedi

Rémi Forax

unread,
Sep 6, 2010, 5:48:49 AM9/6/10
to jvm-la...@googlegroups.com
Le 06/09/2010 09:03, Patrick Wright a �crit :

Pseudo (for pseudo code) http://code.google.com/p/pseudo-language/
It uses Tatoo as parser generator and javac as generator (instead of ASM),
It builds javac Tree and let javac backend generates the bytecode :)

R�mi

Dibyendu Majumdar

unread,
Sep 6, 2010, 7:01:27 AM9/6/10
to JVM Languages
On Sep 6, 4:08 am, Randall R Schulz <rsch...@sonic.net> wrote:
> > I am looking to implement a hand-coded lexer/parser for the language.
>
> Why?

Primarily because I am up to speed with parser generators.

Regards

Dibyendu Majumdar

unread,
Sep 6, 2010, 7:16:43 AM9/6/10
to JVM Languages
Thanks all - looks like there are a few implementations I can look at.

My aim is to initially generate Java code, and later on implement byte
code generation.

The initial problem is how to load classes referred to in the source
code, i.e., how to handle imports. Do I need to use something like ASM
to locate and load references to classes, or is it possible to use a
custom class loader to do this? What is the general approach followed
by other implementations?

Many thanks for your input.

Regards
Dibyendu

On Sep 6, 10:48 am, Rémi Forax <fo...@univ-mlv.fr> wrote:
>   Le 06/09/2010 09:03, Patrick Wright a crit :
>
> > I found the implementations of Talc (http://code.google.com/p/talc/)
> > and Pnuts (https://pnuts.dev.java.net/) fairly small and easy to
> > follow. I wouldn't call them toy languages or toy implementations but
> > the code is not overwhelming, I think.
>
> > Patrick
>
> Pseudo (for pseudo code)http://code.google.com/p/pseudo-language/

Randall R Schulz

unread,
Sep 6, 2010, 9:21:13 AM9/6/10
to jvm-la...@googlegroups.com

I still don't get it. You're familiar with parser generators so you
don't want to use one?


> Regards


Randall Schulz

Dibyendu Majumdar

unread,
Sep 6, 2010, 9:49:10 AM9/6/10
to JVM Languages
Oops ! - that was a typo - I meant I am _not_ up to speed with them.

Attila Szegedi

unread,
Sep 6, 2010, 9:49:26 AM9/6/10
to jvm-la...@googlegroups.com

Listen,

parsing is hard (well, unless your language is LISP). You'll end up wasting lot of time on debugging your parser. Worse, if later you'll want to extend your grammar, you'll likely realize that too is hard in a handwritten parser as you'll have to modify a lot of it. Parser generators exist for a reason; they do a tremendous amount of mundane work instead of you. I would never write a lexer/parser by hand, unless the goal is to amuse myself with exactly that.

There, you can't say we didn't try to warn you :-)

Attila.

>
> Regards

Randall R Schulz

unread,
Sep 6, 2010, 9:59:07 AM9/6/10
to jvm-la...@googlegroups.com
On Monday September 6 2010, Attila Szegedi wrote:
> On 2010.09.06., at 13:01, Dibyendu Majumdar wrote:
> > On Sep 6, 4:08 am, Randall R Schulz <rsch...@sonic.net> wrote:
> >>> I am looking to implement a hand-coded lexer/parser for the
> >>> language.
> >>
> >> Why?
> >
> > Primarily because I am [not] up to speed with parser generators.

(Edit applied by RRS based on correction from DM.)


> Listen,
>
> parsing is hard (well, unless your language is LISP). You'll end up
> wasting lot of time on debugging your parser. Worse, if later you'll
> want to extend your grammar, you'll likely realize that too is hard
> in a handwritten parser as you'll have to modify a lot of it. Parser
> generators exist for a reason; they do a tremendous amount of mundane
> work instead of you. I would never write a lexer/parser by hand,
> unless the goal is to amuse myself with exactly that.

Seconded.


> There, you can't say we didn't try to warn you :-)
>
> Attila.


Randall Schulz

Dibyendu Majumdar

unread,
Sep 6, 2010, 10:20:44 AM9/6/10
to JVM Languages

On Sep 6, 2:59 pm, Randall R Schulz <rsch...@sonic.net> wrote:
> > parsing is hard (well, unless your language is LISP). You'll end up
> > wasting lot of time on debugging your parser. Worse, if later you'll
> > want to extend your grammar, you'll likely realize that too is hard
> > in a handwritten parser as you'll have to modify a lot of it. Parser
> > generators exist for a reason; they do a tremendous amount of mundane
> > work instead of you. I would never write a lexer/parser by hand,
> > unless the goal is to amuse myself with exactly that.
>
> Seconded.
>

I agree with you both - fortunately the grammer of the new language
isn't that difficult - and therefore should not be too difficult to
write a
parser for.

Regards

Rémi Forax

unread,
Sep 6, 2010, 10:23:14 AM9/6/10
to jvm-la...@googlegroups.com
Le 06/09/2010 15:59, Randall R Schulz a �crit :

Yeah, let's start writing a parser generator !

>> There, you can't say we didn't try to warn you :-)
>>
>> Attila.
>
> Randall Schulz

R�mi

Randall R Schulz

unread,
Sep 6, 2010, 10:28:06 AM9/6/10
to jvm-la...@googlegroups.com

Famous last words...

Good luck.


> Regards


RRS

Angel Java Lopez

unread,
Sep 6, 2010, 10:41:13 AM9/6/10
to jvm-la...@googlegroups.com
Hi people!

Dibyendu, writing your own parser (and lexer, I guess) is a nice training and experience. I would recommend you to use Test-Driven Development.

Angel "Java" Lopez
http://www.ajlopez.com
http://twitter.com/ajlopez

Dibyendu Majumdar

unread,
Sep 6, 2010, 10:39:53 AM9/6/10
to JVM Languages
From an initial look, Talc looks really something that I can learn
from;
its got a few similarities with Go as well (such as type inference).

Thanks very much!

Regards

Dibyendu Majumdar

unread,
Sep 6, 2010, 10:45:30 AM9/6/10
to JVM Languages


On Sep 6, 3:41 pm, Angel Java Lopez <ajlopez2...@gmail.com> wrote:
> Hi people!
>
> Dibyendu, writing your own parser (and lexer, I guess) is a nice training
> and experience. I would recommend you to use Test-Driven Development.
>

Absolutely - I plan to iteratively develop each language feature, and
have
a working testable compiler at every stage.

Regards

Kirk

unread,
Sep 6, 2010, 10:49:54 AM9/6/10
to jvm-la...@googlegroups.com
This is how Groovy started... no proper grammar and it was a freak'en mess to clean up BNF isn't that difficult... in fact it's simpler than rolling your own.

Regards,
Kirk

Dibyendu Majumdar

unread,
Sep 6, 2010, 10:56:43 AM9/6/10
to JVM Languages


On Sep 6, 3:49 pm, Kirk <kirk.pepperd...@gmail.com> wrote:
> This is how Groovy started... no proper grammar and it was a freak'en mess to clean up BNF isn't that difficult... in fact it's simpler than rolling your own.
>

Hi, I am planning to define a proper grammer - started work on it
already.
But I don't want to wait until the whole language is defined before
starting to code. So grammer, lexer, parser, compiler will all evolve
side
by side if that makes sense.

Regards

Per Bothner

unread,
Sep 6, 2010, 11:33:09 AM9/6/10
to jvm-la...@googlegroups.com
On 09/06/2010 06:49 AM, Attila Szegedi wrote:
> parsing is hard (well, unless your language is LISP). You'll end up wasting lot of time on debugging your parser. Worse, if later you'll want to extend your grammar, you'll likely realize that too is hard in a handwritten parser as you'll have to modify a lot of it. Parser generators exist for a reason; they do a tremendous amount of mundane work instead of you. I would never write a lexer/parser by hand, unless the goal is to amuse myself with exactly that.

As an old curmudgeon, I still prefer a hard-written recursive-descent
parser. It's simple, efficient, and much easier to debug. You can set
a breakpoint in the parser and inspect the parser state. You don't have
IDEs getting confused. Yes, some parser-generators have debugging tools
and IDE interfaces, but that's more stuff you need to install and
understand.

(For typical arithmetic with infix operators, one uses an operator
precedence sub-parser.)

Error recovery is a mixed issue: It is easier and more flexible to add
recovery in a hand-written parser, but this is where a generator could pay
off - but note that (for example) learning how to use ANTLR's error
recovery features is another learning hump.

More appealing seems to be a parser library (or language feature), as in
Perl 6 or Scala, but I don't experience with those.
--
--Per Bothner
p...@bothner.com http://per.bothner.com/

Randall R Schulz

unread,
Sep 6, 2010, 11:42:56 AM9/6/10
to jvm-la...@googlegroups.com
On Monday September 6 2010, Dibyendu Majumdar wrote:

All the more reason to use a parser generator tool!

If you have any intention that this language will be released into the
wild, you owe it to those who will take it up after you to put it on a
proper foundation. And even if you don't, you owe it to yourself not to
waste time and effort on a hand-written parser.


> Regards


Randall Schulz

John Wilson

unread,
Sep 6, 2010, 11:52:41 AM9/6/10
to jvm-la...@googlegroups.com
On 6 September 2010 16:33, Per Bothner <p...@bothner.com> wrote:
[snip]

> As an old curmudgeon, I still prefer a hard-written recursive-descent
> parser.  It's simple, efficient, and much easier to debug.  You can set
> a breakpoint in the parser and inspect the parser state.  You don't have
> IDEs getting confused.  Yes, some parser-generators have debugging tools
> and IDE interfaces, but that's more stuff you need to install and
> understand.
>
> (For typical arithmetic with infix operators, one uses an operator
> precedence sub-parser.)
>
> Error recovery is a mixed issue: It is easier and more flexible to add
> recovery in a hand-written parser, but this is where a generator could pay
> off - but note that (for example) learning how to use ANTLR's error
> recovery features is another learning hump.
>
> More appealing seems to be a parser library (or language feature), as in
> Perl 6 or Scala, but I don't experience with those.


I must agree; certainly if you're designing a language and fiddling
with the syntax. If you're implementing an existing language then you
will presumably have the BNF so a parser generator makes sense.

Groovy is a good example of this. It took us ages to get the small
details of the syntax right (especially the optional semicolon stuff).
An ANTLR grammer would actually have made it harder (unless we'd has a
world class ANTLR wrangler, which we didin't). We ended up with a hand
crafted parser which had numerous idiosyncrasies but got the basics
right. Then it was possible to produce a grammer which removed the
numerous parsing bugs but kept the important characteristics of the
syntax of the language. If you start with BNF you end up with a syntax
which is easy to define and hence easy to parse. What you really want
is a syntax that it's easy for humans to read and write - the two are
not normally compatible.

I think error recovery is less important than it was in the days when
we had one or two batch runs per day. Error location identification is
more important. With virtually instant compilation I can live with a
compiler that doesn't find all the errors in one parse.

John Wilson

Randall R Schulz

unread,
Sep 6, 2010, 12:02:44 PM9/6/10
to jvm-la...@googlegroups.com
On Monday September 6 2010, Per Bothner wrote:
> On 09/06/2010 06:49 AM, Attila Szegedi wrote:
> > parsing is hard (well, unless your language is LISP). You'll end up
> > wasting lot of time on debugging your parser. Worse, if later
> > you'll want to extend your grammar, you'll likely realize that too
> > is hard in a handwritten parser as you'll have to modify a lot of
> > it. Parser generators exist for a reason; they do a tremendous
> > amount of mundane work instead of you. I would never write a
> > lexer/parser by hand, unless the goal is to amuse myself with
> > exactly that.
>
> As an old curmudgeon, I still prefer a hard-written recursive-descent
> parser.

In that case, I'd go with a combinator parser approach such as Scala's
combinator parser library. (Now with Packrat Parsing!)


> ...
> --
> --Per Bothner


Randall Schulz

John Cowan

unread,
Sep 6, 2010, 1:08:56 PM9/6/10
to jvm-la...@googlegroups.com
On Mon, Sep 6, 2010 at 11:33 AM, Per Bothner <p...@bothner.com> wrote:

> As an old curmudgeon, I still prefer a hard-written recursive-descent
> parser.  It's simple, efficient, and much easier to debug.

I strongly agree. Lua, where code compactness is almost everything,
started out with a yacc parser (and hand-written lexer; in those days
lex produced slow and bulky code) because they were still mutating the
language quite a bit. When the syntax mostly stabilized in Lua 3.0,
they switched to the much smaller recursive descent parser they use
today.

> (For typical arithmetic with infix operators, one uses an operator
> precedence sub-parser.)

I wouldn't even bother, unless you are dealing with a very slow
processor or a language like C with way too many levels of precedence.

> Error recovery is a mixed issue: It is easier and more flexible to add
> recovery in a hand-written parser,

The Lua folks agree. And you have control over what to do, whereas
typical yacc-likes have at least partly hard-coded recovery
strategies.

Angel Java Lopez

unread,
Sep 6, 2010, 1:12:12 PM9/6/10
to jvm-la...@googlegroups.com
Well.... I usually don't write a proper grammar, if I'm using TDD.

I write the tests, and then implement the parser, step by step, recursive descent parser.

I implemented some features of Go, in a .NET interpreter (not compiler), using that approach:

http://ajlopez.wordpress.com/2009/12/28/channels-and-goroutines-in-ajsharp-part-1/
http://ajlopez.wordpress.com/2009/12/30/channels-and-goroutines-in-ajsharp-part-2/

and agents, distributed objects, functional values, futures.. without writing the grammar. The parser, lexer and supporting AST were extended test by test.

Modern languages has relative easy grammars. My guess: the "big work" is in the compiler.

Regards

Reply all
Reply to author
Forward
0 new messages