Creating a simple Java compiler (syntax checker) using Prolog


Nadeesha

Jul 9, 2015, 11:49:37 PM
to swi-p...@googlegroups.com
Basically, what I need is to develop a simple Java syntax-checking application using SWI-Prolog. Before implementing it, I need to look at existing applications of this kind. I have searched a lot, but I couldn't find anything related to my topic. So can you please give me any hints, suggestions, or useful links? Thanks

HelloFunk

Jul 10, 2015, 12:10:34 PM
to swi-p...@googlegroups.com
If you are googling for "compiler", this might explain why you are not finding anything, since what you describe is not a compiler. What you probably want is a "linter". A Java compiler would generate actual Java bytecode.

Blond Mammuth

Jul 24, 2015, 5:32:44 AM
to SWI-Prolog, hell...@floatingmachine.com
Hello Nadeesha,

it seems to me that you either need an existing parser that creates Prolog-readable output, or you find a Java parser written in Prolog that you can adapt, or you need to write a parser yourself.

I do not know of an existing parser of this kind (which does not mean there is none), but I can tell you what I would do in this case. Basically, Java is a C-style language, which means it has an LR grammar. Prolog offers the very flexible nondeterministic phrase mechanism (DCGs), which essentially allows us to define nondeterministic LL grammars. The difference lies in the way such languages must be parsed. LR languages are (I think) a proper superset of deterministic LL languages, but because phrases can be nondeterministic, phrase grammars are much more expressive than deterministic LL or LR languages.
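To give a tiny, made-up sketch of what such a phrase grammar looks like in SWI-Prolog (the names digit and digits are just my own example), here is a DCG that recognizes a sequence of one or more digits:

digit  --> [C], { code_type(C, digit) }.
digits --> digit.
digits --> digit, digits.

?- phrase(digits, `2015`).
true.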

Parsing an LR grammar usually works bottom up. The parser reads terminal symbols (characters, or tokens from a lexer) until it decides that what it has read so far is described (matched) by a known grammar rule. It then "reduces" that tail of the content to the rule's nonterminal. Again it reads terminals until the next rule is matched by a part of the input, and that part is again replaced by the corresponding nonterminal. These decisions, whether to "reduce" the content or to "shift" (read the next input symbol), are encoded in tables that are generated from the given LR grammar by tools like YACC or Bison (or related tools). So in general, at any time during parsing, the stack contains a mixture of terminals and nonterminals, which are step by step replaced by "higher" rules. The parser succeeds if the content of the stack can finally be matched by the top rule.
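To illustrate (this trace is my own addition, not from an actual parser run), a shift/reduce parse of the input "a a a" with the grammar s : s "a" | "a" proceeds roughly like this:

stack:         input: a a a    action: shift
stack: a       input: a a      action: reduce by  s : "a"
stack: s       input: a a      action: shift
stack: s a     input: a        action: reduce by  s : s "a"
stack: s       input: a        action: shift
stack: s a     input:          action: reduce by  s : s "a"
stack: s       input:          action: accept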

LL parsers do the opposite, working top down: they look at the next n input symbols (for an LL(n) grammar) to pick a production and try to match it against the input. If the symbols match, the remaining symbols of that production are processed in turn; otherwise the next alternative production is tried. If no production of the rule fits, the parser fails (this is the difference from phrase grammars, which can backtrack in this situation).
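A small, invented example of that backtracking with phrases: two alternatives share the same first token, and Prolog simply retries the second alternative when the first one fails.

greeting --> [hello], [world].
greeting --> [hello], [prolog].

?- phrase(greeting, [hello, prolog]).
true.

The first clause matches hello but fails at world; Prolog backtracks into the second clause and succeeds. A deterministic LL(1) parser would need such a grammar to be left-factored first.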

All this means that we can allow left recursion in an LR grammar. For example, a rule can have the form:

a-sequence : a-sequence "a" | empty.

The LR parser knows from its table that a-sequence "a" can be reduced to a-sequence. In C or Java grammars you will find a lot of these left-recursive rules. An LL parser, deterministic or nondeterministic, could not parse this. It would disappear in an endless descending loop and most probably end with a stack overflow.
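You can see this immediately if you translate the rule naively into a DCG (my own sketch, using atom tokens instead of characters):

a_sequence --> a_sequence, [a].
a_sequence --> [].

?- phrase(a_sequence, [a, a, a]).

The first clause calls a_sequence again before any input has been consumed, so the query recurses forever and eventually overflows the stack.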

So here is what I would do if I wanted to write a Java parser in Prolog: obtain an LR grammar for Java (which is certainly available somewhere on the internet), and transform it into a nondeterministic LL grammar that can be parsed by Prolog phrases.

This works because phrases are nondeterministic and therefore more expressive than LR grammars. What you have to do is remove all left recursions systematically and change the grammar rules so that a terminal symbol always has to be matched (sooner or later) before the same rule can be called again during the parse.

The example above would turn into:

a_sequence --> "a", a_sequence | [].
 
If you do it correctly (which will take some time, don't underestimate the effort! Perhaps you can support this step by implementing parts of it in Prolog), you obtain a phrase grammar which is certainly less efficient than the original LR grammar (because it has to backtrack a lot, which LR parsers never do), but it works.
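To make the transformation a bit more concrete, here is a hypothetical fragment in the style of a Java expression grammar. The left-recursive rule expr : expr '+' term | term becomes a DCG where an expression is a term followed by zero or more '+ term' continuations (the token format, e.g. int(1), is just my own convention):

expr      --> term, expr_rest.
expr_rest --> ['+'], term, expr_rest.
expr_rest --> [].
term      --> [int(_)].

?- phrase(expr, [int(1), '+', int(2), '+', int(3)]).
true.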

If you write the parser yourself, I recommend writing a lexer first. You will also find lexical definitions alongside the grammar definitions, perhaps in LEX form, on the net. I usually write a lexer that creates a list of tokens and then apply the parser to that list.
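As a very rough sketch of that pipeline (my own example, using library(dcg/basics) that ships with SWI-Prolog; a real Java lexer of course needs keywords, operators, comments and many more token types):

:- use_module(library(dcg/basics)).

% Turn a code list into a list of tokens, skipping whitespace.
tokens([T|Ts]) --> blanks, token(T), !, tokens(Ts).
tokens([])     --> blanks.

token(int(I)) --> integer(I).
token(id(A))  --> ident_code(C), ident_codes(Cs), { atom_codes(A, [C|Cs]) }.

ident_codes([C|Cs]) --> ident_code(C), !, ident_codes(Cs).
ident_codes([])     --> [].

ident_code(C) --> [C], { code_type(C, csym) }.

?- phrase(tokens(Ts), `foo 42 bar`).
Ts = [id(foo), int(42), id(bar)].

The parser is then applied to the resulting token list with phrase/2.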

Perhaps this helps you a bit?

best regards!