Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Grammar to automation translation at runtime (NOT at compile time)?

43 views
Skip to first unread message

Soumen Sarkar

unread,
May 14, 2003, 12:49:00 AM5/14/03
to
All,

Currently the parser generator technology allows grammar to automation
translation in compile time only. For example if I have to validate a
text stream that I am receiving, I can not load a grammar file
dynamically and validate the text stream. Please note that this has
been already accomplished in the world of XML. I can load a XML schema
(can be compared to grammar) and validate a XML document
dynamically. I understand this is possible in XML because of the
standardization of syntax (since XML is a meta language) and parser
reuse at meta language/language level for XML based languages. This
important syntactic standradization makes dynamic XML validation (in
the sense of a language conforming to a grammar) possible. I could be
wrong if XML schemas are refused to be considered as a grammar but
then there are many XML validation technologies are available.

Java/C# offer reflection features which allows some programs to be
authored /specified both at compile time or runtime.

Therefore, my question is if there is any thought/effort to make the
grammar to automation (a computational service) make available at run
time? I am interested in CFG based generators (current impls are
JavaCC, ANTLR).

If this would be possible, I just have to supply a grammar file at run
time to validate an ASCII stream -- I do not have to make a commitment
at compile time. Only commitment I make that I will supply a valid
grammar file to which the ASCII stream is validated (no nore grammar
to Java/C# codegen/compile/write code on top of generated code).

I took a look at Recursive Adaptive Grammar (which blurs the
distinction between grammar vs automation) and it may hold certian
potential towards dynamic grammar based validation.

Thanks,
Soumen Sarkar.
[Thirty years ago languages like IMP72 and EL/1 let you change the
syntax on the fly by including BNF-like stuff in the input. They
worked OK, but they didn't turn out to be useful, and they all
disappeared. The main effect was to make programs unreadable, since
no two programs used the same syntax. More generally, the set of
problems that require parsing a language where you can't specify the
grammar in advance seems to be vanishingly small other than contrived
examples. -John]

Quinn Tyler Jackson

unread,
May 16, 2003, 9:56:09 PM5/16/03
to
> Therefore, my question is if there is any thought/effort to make the
> grammar to automation (a computational service) make
> available at run
> time? I am interested in CFG based generators (current impls are
> JavaCC, ANTLR).
>
> If this would be possible, I just have to supply a grammar
> file at run
> time to validate an ASCII stream -- I do not have to make a
> commitment
> at compile time. Only commitment I make that I will supply a valid
> grammar file to which the ASCII stream is validated (no nore grammar
> to Java/C# codegen/compile/write code on top of generated code).
>
> I took a look at Recursive Adaptive Grammar (which blurs the
> distinction between grammar vs automation) and it may hold certian
> potential towards dynamic grammar based validation.

Meta-S allows grammars to be instantiated at run-time from strings
(which could be stored in text files). The CLpmGrammar class has a
member function:

bool InstantiateFromString(const CShuString& ssGrammar, bool
bDoSemanticChecks);

So, for instance, if your grammar is saved as an .abn file, which in
flat ASCII, looks something like this:

grammar TheGrammar
{
S ::= A B C;
A ::= /* whatever */;
// etc;
};

You do something like this:

CLpmGrammar gmr;

CShuString ssGrammar = LoadFromFile("the_grammar.abn");
CShuString ssTestInput = LoadFromFile("test_input.txt");

if(gmr.InstantiateFromString(ssGrammar, false))
{
if(gmr.Accept(ssTestInput))
{
// you can now traverse the parse tree
}
else
{
// grammar did not accept input
}
}

You can effectively do the above with the ABN2CPP utility (included
with Meta-S) thusly:

abn2cpp the_grammar.abn -i test_input.txt

You can develop your grammars either in a simple text editor, in
Meta-S Lite, or in Meta-S, a graphical grammar development environment
which allows you to debug the grammar against test input.

For more information, either visit http://www.meta-s.com or email me
directly.

Cheers.
--
Quinn Tyler Jackson

Vladimir Makarov

unread,
May 18, 2003, 1:30:49 AM5/18/03
to
Soumen Sarkar wrote:
> Currently the parser generator technology allows grammar to automation
> translation in compile time only. For example if I have to validate a
> text stream that I am receiving, I can not load a grammar file
> dynamically and validate the text stream.

Year ago I wrote Earley parser which can do that (you can use grammars
from files or form it in run time). It is designed as an abstract
data and implemented on C and C++. I spent much time to design and
implement it fast and using few memory. E.g. it parses 6,000 lines of
C code for 0.1 sec on 2.5GHz P4 or 0.35 sec on 550Mhz P3 and consumes
1MB of dynamic memory for this. This times includes all (processing
ANSI C grammar, lexical analysis and Earley parser work itself).

As Earley parser it can parse ambiguous grammar too. The description
is similar to YACC one but instead of actions there are simple
translations. So the output of its work can be abstract tree. For
ambiguous grammar it can give all abstract trees (in compact form by
reusing similar subtrees) or only one abstract tree (it can be chosen
random or based on rule costs). It also can do a minimal cost
recovery from syntactic errors.

Honestly, I don't see necessity of usage of other compiler compilers
for the most projects (except industrial compilers). And I don't use
them although long ago I wrote another compiler compiler MSTA faster
than YACC and Bison.

Earley parser is a part of COCOM toolset (see directory AMMUNITION).
You can find the toolset on

http://cocom.sf.net

The earley parser documentation is on

http://cocom.sourceforge.net/ammunition-13.html

The mentioned test (#41) is in file earley.tst.

The parser is distributed under GPL (not LGPL)

Usually I use earley parser in interpreter of DINO language for my
recent projects

http://cocom.sourceforge.net/dinoload.html

Best regards,
Vladimir Makarov

Sreenivas Viswanadha

unread,
May 18, 2003, 1:25:41 AM5/18/03
to
> Therefore, my question is if there is any thought/effort to make the
> grammar to automation (a computational service) make available at run
> time? I am interested in CFG based generators (current impls are
> JavaCC, ANTLR).

The JavaCC lexer generator builds a full FA during the process of
generating the lexer. I have a private prototype that can be used as a
dynamic lexer. The lexer generation is fast, but it runs slower than
the generated and compiled version. I haven't given much thought to
the CFG part though.

Sreeni.

VBDis

unread,
May 18, 2003, 11:52:41 PM5/18/03
to
sarkar...@yahoo.com (Soumen Sarkar) schreibt:

>Therefore, my question is if there is any thought/effort to make the
>grammar to automation (a computational service) make available at run
>time?

I'm thinking about such a thing myself, inspired by MetaS. In several
of my applications I'd appreciate some scripting feature, based upon a
grammar which can be input on the fly, from either manual input or a
text file.

It's no special problem to create automation tables at runtime, but
how to add actions to these tables? Once a grammar is customized with
actions, it's very hard to propagate eventual changes in the syntax to
all those customized grammars :-(

DoDi

Cedric LEMAIRE

unread,
May 18, 2003, 11:53:18 PM5/18/03
to
sarkar...@yahoo.com (Soumen Sarkar) wrote in message news:<03-0...@comp.compilers>...
> [snip...]

> Therefore, my question is if there is any thought/effort to make the
> grammar to automation (a computational service) make available at run
> time? I am interested in CFG based generators (current impls are
> JavaCC, ANTLR).

Code Worker is a scripting language distributed under LGPL
(http://codeworker.free.fr) that works on generative programming. It
allows building DSLs and their implementation, but also program
transformation, source-to-source translation, code generation and a
lot of other stuffs.

To acquire data, CodeWorker provides a declarative language for
describing the grammar of the data format: an extended BNF enriched of
some convenient directives. It allows just scanning, or scanning and
parsing together without binding to an external programming language
(such as C++/Java/C#).

> If this would be possible, I just have to supply a grammar file at run
> time to validate an ASCII stream -- I do not have to make a commitment
> at compile time. Only commitment I make that I will supply a valid
> grammar file to which the ASCII stream is validated (no nore grammar
> to Java/C# codegen/compile/write code on top of generated code).

CodeWorker doesn't work as an interpreter only. You can exploit the
CodeWorker's features via its C++ library. In C++, what you need is
(not tested):
...
#include "CppParsingTree.h"
#include "CGRuntime.h"
using namespace CodeWorker;

CppParsingTree_value aTree;
try {
CGRuntime::parseAsBNF("your-grammar-file.gen", aTree, "file-to-validate");
} catch(UtlException& error) {
std::cerr << error.getMessage() << std::endl;
}
...
In C, write a little library with a function that calls this piece of code.
In Java, use a little JNI. In C#, I don't know how it works, but it should be
possible too.

Example of grammar file in CodeWorker (just scanning):
translation_unit ::=
// directive to ignore C++/Java-like comments and blanks between tokens
#ignore(C++)
[class_declaration]* // scans simplified class declarations
// will raise an accurate syntax error if the rest of the sequence
// isn't valid
#continue
#empty; // end of file expected

class_declaration ::=
// predefined non-terminal that reads an identifier (very often used);
// <non-terminal>:"constant_str" means that the token must match the string
#readIdentifier:"class"
#continue
#readIdentifier
[':' #continue #readIdentifier]? // reads the superclass name if any
'{'
[member_declaration]*
'}';
member_declaration ::= ...

Parsing is also very natural, but your are just interested in scanning, no?

You can scan binary data too.

Hope that can help.

-- Cedric

Oliver Zeigermann

unread,
May 18, 2003, 11:53:53 PM5/18/03
to
> I can load a XML schema (can be compared to grammar) and validate a
> XML document dynamically. I understand this is possible in XML
> because of the standardization of syntax (since XML is a meta
> language) and parser reuse at meta language/language level for XML
> based languages. This important syntactic standradization makes
> dynamic XML validation (in the sense of a language conforming to a
> grammar) possible. I could be wrong if XML schemas are refused to
> be considered as a grammar but then there are many XML validation
> technologies are available.


Well, although neatly disguised you can only define regular languages
with XML DTDs or schemas. Construction of Finite State Automatons from
regular grammars is thoroughly understood and very efficient
algorithms exist for it.


> Therefore, my question is if there is any thought/effort to make the
> grammar to automation (a computational service) make available at run
> time? I am interested in CFG based generators (current impls are
> JavaCC, ANTLR).

I think the main problem is speed. The best known algorithm to contruct
a parser for a random CFG has polynomial time complexity. The trick
ANTLR, YACC, etc. use is to restrict the set of allowed grammars from CF
to LL(k) resp. LALR(1). Although construction of those parsers is not
linear in general, actually parsing with them is.

Oliver

Quinn Tyler Jackson

unread,
May 23, 2003, 1:31:34 AM5/23/03
to
VBDis said:

> It's no special problem to create automation tables at runtime, but
> how to add actions to these tables? Once a grammar is
> customized with
> actions, it's very hard to propagate eventual changes in
> the syntax to
> all those customized grammars :-(

Actually, Meta-S 5.0 *will* include interpreted scripted actions,
using a Lua-MetaS interface.

In other words, reduction actions and support functions will be
written in Lua with access to LPM objects via the Luna interface.

And, of course, since the actions will be interpreted, it will be
possible to modify them at parse time, as with productions.

I selected Lua for the interpreted language because it's compact,
simple, extensible, and because it has a liberal license.

By doing this, non-C++ hosted versions of Meta-S grammars will be
portable across various hosts without any changes in the reduction or
post-parse tree manipulation code.

--
Quinn Tyler Jackson

0 new messages