(Manual) transformation of ANTLR 4 grammar into Scala parser combinator DSL - a viable solution ?

Jürgen Pfundt

Jun 29, 2013, 4:21:20 PM
to antlr-di...@googlegroups.com
At the moment, no Scala target is available for ANTLR 4. Suppose that building a bridge between Scala and a Java parser generated from an ANTLR 4 grammar is not an option. Is the (manual) conversion of an existing, well-written ANTLR 4 grammar into a Scala parser combinator a viable solution?

To illustrate this with a concrete example, I use the ANTLR 4 grammar for CSV from Terence Parr's book:

grammar CSV;
// start of parser
file  : hdr row+ ;
hdr   : row ;
row   : field (',' field)* '\r'? '\n' ;
field : TEXT | STRING | ;
// start of lexer
TEXT   : ~[,\n\r"]+ ;
STRING : '"' ('""'|~'"')* '"' ; // quote-quote is an escaped quote

Here is a brief description of the Scala notation used in the example. Given p1 and p2 of type scala.util.parsing.combinator.Parsers.Parser:

p1 ~ p2    // sequencing: must match p1 followed by p2
p1 | p2    // alternation: must match either p1 or p2,
           // with preference given to p1
p1 ||| p2  // alternation: if p1 and p2 both succeed,
           // the parser that consumed the most characters accepts
p1.?       // optionality: may match p1 or not
p1.+       // repetition: matches 1 or more repetitions of p1
p1.*       // repetition: matches any number of repetitions of p1
p1 ~> p2   // a parser combinator for sequential
           // composition which keeps only the right result
p1 <~ p2   // a parser combinator for sequential
           // composition which keeps only the left result
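
To make the difference between these combinators concrete, here is a minimal sketch that exercises each of them on tiny inputs. The names CombinatorDemo, first, longest, number, signed and list are made up for illustration and are not part of the CSV example. Since Scala 2.11 the combinators live in the separate scala-parser-combinators module; the first line is a scala-cli directive that fetches it, and both the directive and the version number are assumptions about the build setup.

```scala
//> using dep "org.scala-lang.modules::scala-parser-combinators:2.3.0"
import scala.util.parsing.combinator.RegexParsers

// Hypothetical demo object; default whitespace skipping is left enabled.
object CombinatorDemo extends RegexParsers {
  // `|` commits to the first alternative that succeeds,
  // `|||` keeps the alternative that consumed the most input.
  val first: Parser[String]   = "a" | "ab"
  val longest: Parser[String] = "a" ||| "ab"

  val number: Parser[Int] = """\d+""".r ^^ (_.toInt)

  // `.?` yields an Option, `~` pairs it with the following result.
  val signed: Parser[Int] = "-".? ~ number ^^ {
    case sign ~ n => if (sign.isDefined) -n else n
  }

  // `~>` and `<~` drop the brackets and commas, `.*` collects the tail.
  val list: Parser[List[Int]] =
    "[" ~> (number ~ ("," ~> number).*) <~ "]" ^^ { case n ~ ns => n :: ns }

  def main(args: Array[String]): Unit = {
    println(parseAll(first, "ab").successful)   // false: "a" wins, "b" is left over
    println(parseAll(longest, "ab").successful) // true
    println(parseAll(signed, "-42").get)        // -42
    println(parseAll(list, "[1, 2, 3]").get)    // List(1, 2, 3)
  }
}
```

The first two results show why the choice between | and ||| matters when one alternative is a prefix of another, which is the reason the field rule in the CSV parser combines TEXT and STRING with |||.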

Looking at the CSVParser trait below, each parser combinator rule closely resembles the original ANTLR 4 rule it was derived from. The Scala grammar is extended to a complete program.

import scala.util.parsing.combinator.RegexParsers

trait CSVParser extends RegexParsers {
  // adjust handling of white space to ANTLR 4 characteristics
  override val skipWhitespace = false
  override val whiteSpace = """[ \t]""".r

  // start of parser
  def file: Parser[List[List[String]]] = hdr ~ row.+ ^^ {
    case header ~ rows => header :: rows
  }
  def hdr: Parser[List[String]] = row
  def row: Parser[List[String]] = field ~ ("," ~> field).* <~ "\r".? <~ "\n" ^^ {
    case field ~ fields => field :: fields
  }
  def field: Parser[String] = TEXT ||| STRING | EMPTY

  // start of lexer
  lazy val TEXT: Parser[String] = ("[^,\n\r\"]".r).+ ^^ makeText
  lazy val STRING: Parser[String] = "\"" ~> ("\"\"" | "[^\"]".r).* <~ "\"" ^^ makeString
  lazy val EMPTY: Parser[String] = "" ^^ makeEmpty

  // signatures of the lexer actions, implemented in CSVLexerAction
  def makeText: List[String] => String
  def makeString: List[String] => String
  def makeEmpty: String => String
}

trait CSVLexerAction {
  // remove leading and trailing blanks
  def makeText = (text: List[String]) => text.mkString("").trim
  // the enclosing quotation marks are already dropped by ~> and <~ in STRING;
  // replace two consecutive quotes by a single quote
  def makeString = (string: List[String]) => string.mkString("").replaceAll("\"\"", "\"")
  // modify result of EMPTY token if required
  def makeEmpty = (string: String) => ""
}

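import java.io.FileReader

object CSVParserCLI extends CSVParser with CSVLexerAction {
  // parse the CSV file named by the first command-line argument and print the result
  def main(args: Array[String]): Unit = {
    println(parseAll(file, new FileReader(args(0))))
  }
}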

The transformation of the ANTLR 4 CSV grammar into Scala parser combinators is admittedly a simple example, but I believe that many existing ANTLR 4 grammars can be transformed into Scala using the parser combinator DSL.

Is this a viable way to work around the missing Scala language target for ANTLR 4?


Jul 3, 2013, 5:45:04 AM
to antlr-di...@googlegroups.com
If you are using listeners, you don't need a Scala target to be able to use an ANTLR 4 grammar with Scala.
If your grammar is named "Expr", then ANTLR will generate a Java file ExprBaseListener.java.
You just need to extend the ExprBaseListener class in your own Scala file and implement the various listener methods, and you are almost done.


