How about efficiency?

181 views
Skip to first unread message

Li Dong

unread,
Jul 2, 2013, 4:47:28 AM7/2/13
to antlr-di...@googlegroups.com
Dear all,

I have been creating a tool for translating Fortran codes, and written a Fortran parser by using Treetop and Parslet (Ruby). Unfortunately, the parser is too slow (spent ~10 seconds to parse a Fortran codes with 2000+ lines). So I am thinking to switch to ANTLR, but before diving in, I would like to estimate the potential outcome. Will ANTLR give me more efficiency? Btw, I know there is a OpenFortranParser using ANTLR 3.5, but I would like to implement some new things.

Best
Li

Jim Idle

unread,
Jul 2, 2013, 5:26:35 AM7/2/13
to antlr-di...@googlegroups.com
Comparing chalk and cheese really. Ruby is horribly slow, ANTLR 4 is very fast. 

In ten seconds you can parse hundreds of source files, maybe thousands. There will be no efficiency problems that I can think of. 

Jim
--
You received this message because you are subscribed to the Google Groups "antlr-discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to antlr-discussi...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

董理

unread,
Jul 2, 2013, 5:31:04 AM7/2/13
to antlr-di...@googlegroups.com
Hi Jim,

Hundreds of source files is what I want to hear about! Then I will dive into ANTLR. Thanks for quick replying!

Li

You received this message because you are subscribed to a topic in the Google Groups "antlr-discussion" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/antlr-discussion/zAy7hl3D0jo/unsubscribe.
To unsubscribe from this group and all its topics, send an email to antlr-discussi...@googlegroups.com.

Terence Parr

unread,
Jul 2, 2013, 12:31:48 PM7/2/13
to antlr-di...@googlegroups.com
This depends a LOT on the grammar; i.e., how you write the grammar.  I can parse the 7400 Java files from Java 7 lib in 7 seconds including load from SSD, traversing dirs, GC, etc...

I have another java grammar that is BRUTALLY slow for same version of ANTLR and same input.

Ter

July 2, 2013 2:31 AM
Hi Jim,

Hundreds of source files is what I want to hear about! Then I will dive into ANTLR. Thanks for quick replying!

Li



--
You received this message because you are subscribed to the Google Groups "antlr-discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to antlr-discussi...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 
July 2, 2013 2:26 AM
Comparing chalk and cheese really. Ruby is horribly slow, ANTLR 4 is very fast. 

In ten seconds you can parse hundreds of source files, maybe thousands. There will be no efficiency problems that I can think of. 

Jim

On Jul 2, 2013, at 16:47, Li Dong <dongl...@gmail.com> wrote:

--
You received this message because you are subscribed to the Google Groups "antlr-discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to antlr-discussi...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 
July 2, 2013 1:47 AM

董理

unread,
Jul 2, 2013, 9:15:12 PM7/2/13
to antlr-di...@googlegroups.com
Well, I wrote grammars in Parslet normally, and I have reviewed the grammar thoroughly, removed many small matches (make time cost drop from ~20 seconds to ~10 seconds). Then I can not squeeze any performance any more without messing the grammars. This is just for one file by the way. So I think ANTLR is my solution?

Li

You received this message because you are subscribed to a topic in the Google Groups "antlr-discussion" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/antlr-discussion/zAy7hl3D0jo/unsubscribe.
To unsubscribe from this group and all its topics, send an email to antlr-discussi...@googlegroups.com.

Terence Parr

unread,
Jul 2, 2013, 9:24:24 PM7/2/13
to antlr-di...@googlegroups.com
ANTLR is as faster or faster than competing tools if used properly.
Ter

July 2, 2013 6:15 PM
Well, I wrote grammars in Parslet normally, and I have reviewed the grammar thoroughly, removed many small matches (make time cost drop from ~20 seconds to ~10 seconds). Then I can not squeeze any performance any more without messing the grammars. This is just for one file by the way. So I think ANTLR is my solution?

Li



--
You received this message because you are subscribed to the Google Groups "antlr-discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to antlr-discussi...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 
July 2, 2013 9:31 AM

George S Cowan

unread,
Jul 3, 2013, 2:45:32 PM7/3/13
to
OK, Ter, give us a hint: What's the main difference between the two Java grammars that accounts for the radically different parse times.

I hope it's not dozens of little tweaks that only you know how to make, or exhaustive testing of dozens of alternatives.

The only advice I recall from the Antlr 4 Ref is to try the SLL(*) parsing option.

George

Terence Parr

unread,
Jul 3, 2013, 4:56:42 PM7/3/13
to antlr-di...@googlegroups.com
Hi George, well, starting with speed, the key is to avoid stack
sensitive rules. Take a look at the Tech report on ALL(*) parsing in the
empirical data section:

http://www.antlr.org/papers/allstar-techreport.pdf

if you look at figure 13, all you will see a difference between the C
grammar (C11) with a faster one called C11SLL where we tweaked a single
rule. We used ANTLR’s DiagnosticErrorListener
object to identify the stack-sensitivity at parse time.

if you look at the DFA size per files parsed, you'll see a huge
difference between Java (top graph) and JavaSable (bottom graph).
JavaSable is a literal copy of the Java specification from the book
without concern for efficiency. (taken from SableCC's website). we
have a TestPerformance.java file that tells us which decisions take the
most model look ahead and so on. We plan on building a simple interface
possibly inside AW2 that will help out. We will also introduce automatic
left factoring to reduce lookahead requirements.

The short answer is there are lots of obvious left factorings that help
like:

decl : method_head method_body
| method_head ';' // interface
;

Here, matter what, we have to look all the way past the full method
head. we will automatically left factor this in a future version. For
the moment, you can merge the method_head.

Ter


> George S Cowan <mailto:co...@acm.org>
> July 3, 2013 11:40 AM
> OK, Ter, give us a hint: What's the main difference between the two
> Java grammars that accounts for the radically different parse times.
>
> I hope it's not dozens of little tweaks that only you know how to
> make, or exhaustive testing of dozens of alternatives.
>
> George
>
>
> On Tuesday, July 2, 2013 12:31:48 PM UTC-4, the_antlr_guy wrote: --
parse-time-stats.pdf
dfasize-stats.pdf

Mike Lischke

unread,
Jul 4, 2013, 3:03:05 AM7/4/13
to antlr-di...@googlegroups.com

Ter,

> The short answer is there are lots of obvious left factorings that help like:
>
> decl : method_head method_body
> | method_head ';' // interface
> ;


I wonder what could be a reason to write a rule this way. I would expect that everybody with at least some parser background would left factor method_head.

Mike
--
www.soft-gems.net

Sam Harwell

unread,
Jul 4, 2013, 11:45:08 AM7/4/13
to antlr-di...@googlegroups.com
One of the driving goals of ANTLR 4 is allowing developers to write grammars that closely follow a language specification. If we can provide great performance for a wide variety of notations, the end result will be compilers and tools with fewer bugs and greater flexibility. If the language specification is written without left factoring method_head, then it would be preferable to write the grammar in the same way and allow the parser generator to perform the left factoring internally so everybody wins.

--
Sam Harwell
Owner, Lead Developer
http://tunnelvisionlabs.com

Mike Lischke

unread,
Jul 4, 2013, 12:11:52 PM7/4/13
to antlr-di...@googlegroups.com

> One of the driving goals of ANTLR 4 is allowing developers to write grammars that closely follow a language specification. If we can provide great performance for a wide variety of notations, the end result will be compilers and tools with fewer bugs and greater flexibility. If the language specification is written without left factoring method_head, then it would be preferable to write the grammar in the same way and allow the parser generator to perform the left factoring internally so everybody wins.


Sounds like a nice vision. But then it would be necessary to have full automatic left recursion handled too. I mean, left factoring is usually easy to accomplish. Multirule left recursion on the other hand is a real challenging problem.

Mike
--
www.soft-gems.net

George S Cowan

unread,
Jul 7, 2013, 8:13:31 PM7/7/13
to antlr-di...@googlegroups.com
Ter, I finally got around to upgrading my Java test system with your suggestion to use SLL parsing and failover to the full ALL* if there is an error, as described in the Antlr 4 Ref on p.243. I'm very impressed by the added efficiency. I went from 1 minute 12 seconds down to 15 seconds, but I think my paltry 8145 Java files isn't really enough for it to hardly get started.

I added sysout writes to see where your latest Java.g4 causes any failovers. But it doesn't. I can't find any correct Java programs that fail with SLL. I tried one of the older Java grammars and found that it issued a ParseCancellationException (that was a surprise) for the following program:

import java.util.Collections;
import java.util.List;

class TestExplicitGenericInvocation {

public static void main(String[] args) {
 
List<String[]> s1 = Collections.emptyList();
 
List<String[]> s2 = Collections.<String[]>emptyList();

 
List<int[]> i1 = Collections.emptyList();
 
List<int[]> i2 = Collections.<int[]>emptyList();
 
System.out.println("Objects: " +s1+ " " +s2+ " " +i1+ " " +i2);
}

}

But that was because the grammar, even with ALL*, actually thought it was an error.

So the question is: can you tell me where the grammar is that doesn't work well with SLL?

George

Terence Parr

unread,
Jul 9, 2013, 6:23:28 PM7/9/13
to antlr-di...@googlegroups.com
Hi.  The answer is that the grammar is SLL in fact but because of the left recursion elimination procedure, the predicates make it impossible for the SLL prediction to decide on a conflict so it fails over to full stack-sensitive LL even though it doesn't need to. the two-stage approach overcomes the problem :)

Ter
--
Dictation in use. Please excuse homophones, malapropisms, and nonsense. 
Reply all
Reply to author
Forward
0 new messages