Is ANTLR the right tool for translating custom DSL source code to existing languages?

Róbert Kohányi

Jun 30, 2016, 6:47:26 AM

to antlr-discussion

Hi All,

My company has a heap of source code in an in-house DSL language which is interpreted and executed at runtime by a component in our software stack.

The DSL language is unmaintainable, nobody understands it, the interpreter contains some bugs and it's not extensible.

The DSL itself is kind of simple, a barebones expression language, nothing complicated like classes and such.

We want to switch over to something more sensible: JavaScript, Groovy, anything.

So basically we'd like to translate file.crap-dsl to file.groovy or file.js, replacing the "custom DSL interpreter".

Are there any examples of doing something like this with ANTLR?

Any thoughts on if this seems to be the right tool for the job?

As I've understood so far I would need to do the following to do source-to-source translation with ANTLR.

Define a grammar file for my DSL.
Write some custom Java code relying on ANTLR's API to output JavaScript/Groovy/etc.

I've no experience with ANTLR whatsoever, but I remember reading about it on StackOverflow and in several other places, and I have used StringTemplate successfully a few times in the past.

Thanks for the input.

Best,

Robert

Devlin Poster

Jun 30, 2016, 7:00:42 AM

to antlr-discussion

Any thoughts on if this seems to be the right tool for the job?

It might be, but sometimes there's an easier way:

If the interpreter has an AST visitor, you can simply change it (or provide a new visitor, if possible) to write out statements in the target language instead of interpreting.

Róbert Kohányi

Jun 30, 2016, 9:28:40 AM

to antlr-discussion

Nope, it doesn't have an AST visitor; it's complete and utter crap, unfortunately :(

Jim Idle

Jun 30, 2016, 10:18:46 AM

to antlr-di...@googlegroups.com

I have done exactly this at least six or seven times for production commercial environments ranging from IBM mainframes to your "standard" Linux. It's fairly trivial to do the language translation, but unless your existing language is directly translatable to your target, you will also need some kind of runtime library to reproduce the semantics of your existing language.

A better solution might be to translate to a much better thought-out DSL, with a runtime for it that executes in, say, the JVM or C++, or that is machine-translated to source such as Java that you never look at.

Lots of options but it depends what your outcome goals are. Do you want to spend time generating LLVM object codes if speed isn't really your issue, for instance? I would not translate to groovy or JavaScript - frying pans and fires if you ask me.

Jim


Jim Idle

Jun 30, 2016, 10:20:33 AM

to antlr-di...@googlegroups.com

Just guessing, of course, but as this DSL sounds trivial and the source for the interpreter sounds like a nightmare, I assume an ANTLR4 parser will be a heck of a lot easier ;)

Jim

Jim Idle

Jun 30, 2016, 10:22:31 AM

to antlr-di...@googlegroups.com

I take it neither you nor your current colleagues wrote this :D

If you post some example code it would help, but I can see no barriers to you doing this other than perhaps experience.

Devlin Poster

Jun 30, 2016, 10:40:06 AM

to antlr-discussion

Just guessing of course but as this DSL sounds trivial and the source for the interpreter sounds like a nightmare then I assume an Antlr4 Parser will be a heck of a lot easier ;)

A parser is only half the job; he has to generate target code as well. If he had an AST, he would already have a parser and a place to generate target code.

All moot, since he said there was no AST visitor, so going through ANTLR4 probably makes sense.

Róbert Kohányi

Jun 30, 2016, 11:40:19 AM

to antlr-discussion

I take it neither you nor your current colleagues wrote this :D

Right on the mark. :)

If you post some example code it would help, but I can see no barriers to you doing this other than perhaps experience.

Well, that's a tough one. Actually, I oversimplified my scenario when I wrote that I want to translate my file.crap-dsl to file.js. file.crap-dsl is non-existent right now. The DSL source code lies around in DB records and it's kind of hard to create a "single source file" from them, because there's no such thing right now. However, here are a few snippets.

IF(PADFIELD(LEFT,0,5,Message.VARIABLE1),>,00001)THEN(E)ELSE(IF(SUBSTRING(1,2,Message.VARIABLE2),!=,UT)THEN(D)ELSE(Z))

The E, D and Z are replaced with other snippets at runtime I believe, like

SUBSTRING(1,35,DROPCHAR(DROPCOMMAS(Message.VARIABLE3),')
...

A whole heap of this crap. Right now I can't include a "complete" snippet, but it's like this. Paren-delimited madness. DROPCHAR and the like are methods, Message.VARIABLEX are variables that are fed to the script.


Do you want to spend time generating LLVM object codes if speed isn't really your issue, for instance? I would not translate to groovy or JavaScript - frying pans and fires if you ask me.

I'm not sure what you are getting at with LLVM and speed, sorry; I'm not familiar with ANTLR, LLVM, and compiler-y stuff in general.
However, speed is not an issue, but why wouldn't you translate to Groovy/JS? Would you translate to something else, or not translate at all and instead create a better DSL and write a new interpreter for it? And use ANTLR to translate from the old DSL to the new one? I'm not sure ... a bit confused. Thanks for the input though, appreciate it.

Jim Idle

Jun 30, 2016, 10:19:50 PM

to antlr-di...@googlegroups.com

On Thu, Jun 30, 2016 at 11:40 PM, Róbert Kohányi <kohanyi...@gmail.com> wrote:

I take it neither you nor your current colleagues wrote this :D

Right on the mark. :)

If you post some example code it would help, but I can see no barriers to you doing this other than perhaps experience.

Well, that's a tough one. Actually, I oversimplified my scenario when I wrote that I want to translate my file.crap-dsl to file.js. file.crap-dsl is non-existent right now. The DSL source code lies around in DB records and it's kind of hard to create a "single source file" from them, because there's no such thing right now. However, here are a few snippets.

IF(PADFIELD(LEFT,0,5,Message.VARIABLE1),>,00001)THEN(E)ELSE(IF(SUBSTRING(1,2,Message.VARIABLE2),!=,UT)THEN(D)ELSE(Z))

The E, D and Z are replaced with other snippets at runtime I believe, like

SUBSTRING(1,35,DROPCHAR(DROPCOMMAS(Message.VARIABLE3),')
...

OK, so your lexer is going to be more complicated than a straight ANTLR one, as you will either need to write a preprocessor to produce the final source for parsing, or have the lexer do it as it goes. The former is probably better.
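A minimal sketch of the preprocessor idea, in Python for brevity. Everything specific here is an assumption made for illustration: that a placeholder such as E, D or Z is a bare name forming the whole body of THEN(...) or ELSE(...), and that the snippets can be loaded into a simple name-to-text table (the SNIPPETS contents below are invented, not taken from the real system).

```python
import re

# Hypothetical snippet table; in the real system these would come from DB records.
SNIPPETS = {
    "E": "Message.VARIABLE1",
    "D": "SUBSTRING(1,2,Message.VARIABLE2)",
    "Z": "PADFIELD(LEFT,0,5,Message.VARIABLE3)",
}

# Assumption: a placeholder is a bare name that is the entire body of THEN(...) or ELSE(...).
PLACEHOLDER = re.compile(r"\b(THEN|ELSE)\(([A-Za-z_]\w*)\)")


def expand(source, snippets, _active=()):
    """Return source with every known placeholder replaced by its snippet text."""

    def substitute(match):
        keyword, name = match.group(1), match.group(2)
        if name not in snippets:
            return match.group(0)  # unknown name: leave the text untouched
        if name in _active:
            # A snippet that includes itself (directly or indirectly) can never
            # be flattened into one static source, so fail loudly.
            raise ValueError("snippet cycle: " + " -> ".join(_active + (name,)))
        body = expand(snippets[name], snippets, _active + (name,))
        return f"{keyword}({body})"

    return PLACEHOLDER.sub(substitute, source)
```

Calling expand() on the IF(...)THEN(E)ELSE(...) snippet above with this table yields one flat source string that a parser can then consume. If the real interpreter picks snippets based on runtime values, a static pass like this is not enough.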


A whole heap of this crap. Right now I can't include a "complete" snippet, but it's like this. Paren-delimited madness. DROPCHAR and the like are methods, Message.VARIABLEX are variables that are fed to the script.

That's enough to see.

I think it would be trivial to parse, but obviously more complex to assemble the input beforehand.
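To give a feel for how little there is to parse, here is a small hand-written sketch in Python. It is a stand-in for what an ANTLR grammar plus a visitor would do, not ANTLR itself. The grammar is guessed from the one complete snippet in this thread: IF(left,op,right)THEN(x)ELSE(y), calls of the form NAME(arg,...), and bare words. The Groovy-style output, the Dsl helper class, and the rule that dotted names are variables while every other bare word is a string literal are all assumptions. Quote characters, as in the DROPCHAR example, are not handled.

```python
import re

# One token: an operator, a paren or comma, or a run of name/number characters.
TOKEN = re.compile(r"\s*(!=|>=|<=|[<>=]|[(),]|[A-Za-z0-9_.]+)")
WORD = re.compile(r"[A-Za-z0-9_.]+")
OPERATORS = {"!=", ">=", "<=", "<", ">", "="}


def tokenize(text):
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"cannot tokenize input at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        token = self.peek()
        if token is None or (expected is not None and token != expected):
            raise SyntaxError(f"expected {expected or 'a token'!r}, got {token!r}")
        self.pos += 1
        return token

    def expr(self):
        word = self.take()
        if not WORD.fullmatch(word):
            raise SyntaxError(f"expected a name or literal, got {word!r}")
        if self.peek() != "(":
            return ("atom", word)
        if word == "IF":
            return self.if_expr()
        self.take("(")
        args = [self.expr()]
        while self.peek() == ",":
            self.take(",")
            args.append(self.expr())
        self.take(")")
        return ("call", word, args)

    def if_expr(self):
        self.take("(")
        left = self.expr()
        self.take(",")
        op = self.take()
        if op not in OPERATORS:
            raise SyntaxError(f"unknown operator {op!r}")
        self.take(",")
        right = self.expr()
        self.take(")")
        self.take("THEN"); self.take("(")
        then = self.expr()
        self.take(")")
        self.take("ELSE"); self.take("(")
        otherwise = self.expr()
        self.take(")")
        return ("if", op, left, right, then, otherwise)


def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"trailing input starting at {parser.peek()!r}")
    return tree


def to_groovy(node):
    """Render the tree as Groovy-style source that calls a hypothetical Dsl class."""
    if node[0] == "atom":
        value = node[1]
        # Assumption: dotted names are variables, any other bare word is a string.
        return value if value[0].isalpha() and "." in value else f"'{value}'"
    if node[0] == "call":
        return f"Dsl.{node[1].lower()}({', '.join(to_groovy(a) for a in node[2])})"
    _, op, left, right, then, otherwise = node
    condition = f"Dsl.compare({to_groovy(left)}, '{op}', {to_groovy(right)})"
    return f"({condition} ? {to_groovy(then)} : {to_groovy(otherwise)})"
```

Comparisons are routed through a helper (Dsl.compare) rather than a native operator on purpose, since nothing in the thread says how the old interpreter compares values like 00001.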


Do you want to spend time generating LLVM object codes if speed isn't really your issue, for instance? I would not translate to groovy or JavaScript - frying pans and fires if you ask me. 

I'm not sure what you are getting at with LLVM and speed, sorry; I'm not familiar with ANTLR, LLVM, and compiler-y stuff in general.

Don't worry too much about it. LLVM is a system that allows you to generate machine code files in the same way that, say, a C compiler does. It is quite a lot of work, so you only need to do it if the end result of your compile should be as fast as possible, or should be linkable with C/C++/etc. Generally, most people are not writing compilers, but are looking to do something like what you need to do.

However speed is not an issue, but why wouldn't you translate to Groovy/JS?

Lots of people like Groovy and/or JS but I don't. I don't think JS is very maintainable and Groovy is a lot of syntactic sugar based upon dubious decisions in my experience.

However, that is the key. I was asked to translate something similar for a company and they wanted Groovy as output. But the problem with such translations is that, because Groovy will almost certainly not behave like your existing script/code, you will find yourself wrapping what would otherwise be simple expressions in strange-looking method calls. For instance, if + and / in your DSL don't give exactly the same results as they do in Groovy, then you will need to write methods that simulate the results from your DSL's runtime.
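As a rough illustration of what such wrapper methods look like, here is a sketch in Python. The legacy semantics are invented for the example (integer division that truncates toward zero and returns 0 on division by zero, and a + that concatenates when either side is a string); the point is only that the target language's native operators would give different answers, so translated code has to call helpers instead.

```python
def dsl_div(left, right):
    """Hypothetical legacy '/': truncates toward zero, yields 0 on divide-by-zero."""
    if right == 0:
        return 0
    quotient = abs(left) // abs(right)
    return quotient if (left < 0) == (right < 0) else -quotient


def dsl_add(left, right):
    """Hypothetical legacy '+': concatenates if either operand is a string."""
    if isinstance(left, str) or isinstance(right, str):
        return str(left) + str(right)
    return left + right


def translated_expression(a, b, c):
    # What a translator would have to emit for the DSL expression a / b + c.
    # The native form a // b + c would floor instead of truncate and would
    # raise on b == 0, so it cannot be used as a drop-in replacement.
    return dsl_add(dsl_div(a, b), c)
```

With these definitions, dsl_div(-7, 2) gives -3 where Python's native -7 // 2 gives -4, which is exactly the kind of silent mismatch that forces every operator through the runtime library.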

Basically, you will get a mess. The company in question would not believe me until they started to see the output. Then we changed to a better DSL that translated to JVM byte code and ran in the Java JVM with some runtime routines.

Your options are either to start again with a new DSL and rewrite the current source, or to translate the current source to your new DSL. Translating the existing code to a new DSL is probably the best bet, as it gets rid of these snippets of code stored in database tables.

It depends what you can run in your software stack, but if the JVM is not an issue, then your DSL would generate JVM byte codes (not very difficult - you use the ASM library to do this), or if you can't be bothered with that, then just generate Java code and compile that in the background. You will likely need a small runtime library jar that supports the idioms of your language.

It sounds like the language you have gets the job done, but organizationally and syntactically it is a mess. So it seems you need a 're-assembler' that starts with the base code snippets and replaces the insert points with the relevant snippets so you have a source, then a translator to the new DSL and a parser for the new DSL. The translator and parser will probably be the least of your issues.

One possible problem I can see for you is that the interpreter may do this snippet replacement dynamically. As in when you run it with one set of inputs it uses snippet A from the database but a different set of inputs uses snippet B. In other words it will have to be interpreted unless you can somehow predict all possible paths in the interpreter and generate if statements or switch statements to cover them. If it starts to get that complicated then I think it is likely better to throw it away and start again, maybe with a DSL or maybe you just bite the bullet and write that logic from scratch in something that you want.

Assuming that the programs do not change dynamically (but I am willing to bet, given how you say it is put together, that they do), then:

With all baseline/start points, use an 'assembler' to put together the source that runs as a blob
Use ANTLR to translate this to a new DSL form
Eventually, throw those tools away
Use ANTLR to write a new compiler for your code and generate Java or Java bytecode (you can of course pick anything)
Run the bytecode in your stack

Without more information on what this does at runtime/interpretation time, I can't begin to guess if it is even translatable in a static form.

Jim

Jim Idle

Jun 30, 2016, 10:23:22 PM

to antlr-di...@googlegroups.com

He may actually be forced to do this by hooking into the interpreter :(

But if the language does things dynamically at runtime, then he is stuck with an interpreted language. If it is not possible to predict what the interpreter will do such that a single static source can be generated, then it is essentially not translatable to any form that is any more maintainable than what he has.

I wonder what language the interpreter is written in and how old this system is.

Jim

Róbert Kohányi

Jul 1, 2016, 5:37:08 AM

to antlr-discussion

I wonder what language the interpreter is written in and how old this system is.

The interpreter is written in Java. I wouldn't even call it an interpreter; it's a huge class with static methods doing obscure and/or magical stuff. I think it was written some 10 years ago by someone who thought it a good idea to reinvent the wheel.

write a preprocessor to produce the final source for parsing, or have the lexer do it as it goes. The former is probably better.

Agreed.

Lots of people like Groovy and/or JS but I don't. I don't think JS is very maintainable and Groovy is a lot of syntactic sugar based upon dubious decisions in my experience.

There are two driving factors behind why my company wants to replace our crap DSL.

It's unmaintainable: code lives scattered around in a DB, and the syntax and behavior are obscure and crap enough that you don't want anybody to learn it. New employees shouldn't be exposed to this crap. They'll lose heart.
People who work with the current DSL are not developers (there are exceptions of course). They're business people with some development experience. They can handle SQL, because they edit the DSL code through SQL Developer. Yeah. They already know Groovy and/or JS, at least the basic syntax. Groovy is a winner for us because most of our people know it; we're a Java company. JS could also be good because it's very well known, it's everywhere. We want old and new guys to edit code with IntelliJ (or something that makes sense) that'll flag every error for them, and we want to check the code into a VCS.

Groovy will almost certainly not behave like your existing script/code, you will find yourself wrapping what would be otherwise simple expressions in strange looking method calls.

Yeah, unfortunately I think we can't work around that. The current DSL contains functions that have "quirks" which would need to be reimplemented in the same erroneous way for an identical translation. I think it's more flexible for us to use Groovy and such strange-looking method calls.

Existing code translated to the new format would use quirky helper methods like oldDROPCHARS (named that way just to make sure it's ugly enough to stand out) to retain backward compatibility. New scripts would use something more sensible. When we need to touch an old script we can update it or leave it as it is, whichever we like.

It'll be a mess, that's for sure. Anyway, I like the idea of a new DSL that translates to bytecode, don't get me wrong. I think my employer will prefer the Groovy approach, though, because they already failed with a custom DSL.

It sounds like the language you have gets the job done, but organizationally and syntactically it is a mess.

Yeah, it works, but we hate it. At this point I'm inclined to think that the best idea is to leave the current interpreter as it is and rewrite only the parts that fetch the code from the DB so that they fetch it from source files instead. We'd check those into version control and leave them be until further notice. New scripts would simply be Groovy scripts with common helper methods. We already have something like this in our system, so it would be well understood.

Anyway, not sure what it'll come down to, we're in a preliminary phase weighing our options. Your input on the matter is well received, thank you!

One possible problem I can see for you is that the interpreter may do this snippet replacement dynamically.

If you mean that it might be a problem if something like this happens

IF(PADFIELD(LEFT,0,5,Message.VARIABLE1),>,00001)THEN(MAGIC(Message.VARIABLE2))

where MAGIC will return a snippet chosen dynamically based on VARIABLE2's value, then I'm 87% sure at this point that it won't happen. If it does happen, I think I can still organize my source code format like this:

main {

IF(PADFIELD(LEFT,0,5,Message.VARIABLE1),>,00001)THEN(MAGIC(Message.VARIABLE2))

}

snippetA {

}

snippetB {

Róbert Kohányi

Jul 1, 2016, 5:41:03 AM

to antlr-discussion

Premature post.

So, I was writing that I could represent my source code like this

main {

  IF(PADFIELD(LEFT,0,5,Message.VARIABLE1),>,00001)THEN(MAGIC(Message.VARIABLE2))

}

snippetA {

<<doA>>

}

snippetB {

<<doB>>

}

and I would have my MAGIC method include the particular snippet dynamically. I'm not sure if that's a complicated thing to do ... I can see that it's more complicated than the other scenario though. :)
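A rough sketch of how that could work, in Python. It assumes the file layout shown above (a name, an opening brace on the same line, and a closing brace alone on its own line), and the RULES table that maps a variable's value to a snippet name is entirely made up. How the real interpreter chooses a snippet is unknown to me at this point.

```python
import re

# Assumption: each unit is "name {", a body, then "}" alone on its own line.
BLOCK = re.compile(r"^(\w+)[ \t]*\{[ \t]*\n(.*?)^\}[ \t]*$", re.MULTILINE | re.DOTALL)

# Hypothetical selection rules: value of the selector variable -> snippet name.
RULES = {"UT": "snippetA", "default": "snippetB"}


def load_units(text):
    """Split one source file into a {name: body} table, in file order."""
    return {name: body.strip() for name, body in BLOCK.findall(text)}


def magic(units, selector_value, rules=RULES):
    """Return the source of the snippet chosen for selector_value."""
    name = rules.get(selector_value, rules["default"])
    return units[name]
```

Because every snippet is a named block in the same file, the "dynamic" choice becomes a lookup in a fixed table, which is the kind of thing a translator could turn into an if or switch over all known snippet names.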

I'll have to dig into this a bit more; I only got exposed to this crap yesterday. Didn't know a single thing about it before (not that I wanted to :D).

Róbert Kohányi

Jul 1, 2016, 6:59:19 AM

to antlr-discussion

Some new information came in just now.

I hadn't seen all of the crap code yesterday, but now I've been introduced to a monster child of the DSL.

The code lines are stored in DB records, and some records hold lines that are 2000+ characters long. And there are about 1000 lines (for a single "script unit"), not all of them this long, but ... a big portion of them. Madness. I haven't seen anything like this before, I must admit :D

Earlier I was only shown cute little examples of the DSL and I thought translating that to something usable would benefit us.

My boss and I discussed it and agreed that it just isn't worth translating a big pile of crap into a smaller (but still huge) pile of crap. Just no. Complete bollocks.

The monster code solves a particular problem of ours. The conclusion was that this problem needs to be solved in a different manner with a different tool. The current madness still has to be checked into VCS, but I was shown that we already have something for that, the same as my earlier idea:

leave the current interpreter as it is and rewrite only the parts that fetch the code from the DB to fetch it from a source file instead

So it came down to what you've suggested Jim,

If it starts to get that complicated then I think it is likely better to throw it away and start again

I would have loved to use ANTLR because it seems interesting, but it's never gonna happen with this project. Unfortunate, because I was looking forward to it; it's always good to work on something interesting.

Thanks for the input again though, it was enlightening.

Best,

Robert
