What mean ATN and DFA ?

1,001 views
Skip to first unread message

gaulou...@gmail.com

unread,
Dec 23, 2016, 3:51:42 AM12/23/16
to antlr-discussion
Hello,

I have quickly searched in the comment in the source code, but I can't find what mean ATN, and DFA

Do you known what it mean ?

Thanks for you interest,

Marc Gorzala

unread,
Dec 23, 2016, 5:31:55 AM12/23/16
to antlr-discussion
Hi,

for DFA I want you to give this as a reference: https://en.wikipedia.org/wiki/Deterministic_finite_automaton

as an essence: I desribes a state-machine where each state transition happens in a deterministic way

for AT I guess that you could mean this: https://en.wikipedia.org/wiki/Alternating_Turing_machine
But you are giving so less context, which let me clearly know what you mean.

Cheers,
Marc

Mike Lischke

unread,
Dec 23, 2016, 9:52:11 AM12/23/16
to antlr-di...@googlegroups.com

I have quickly searched in the comment in the source code, but I can't find what mean ATN, and DFA

Do you known what it mean ?

DFA is the "DeterministicFinite Automata", like Marc already wrote.

ATN means "Augmented Transition Network" (https://en.wikipedia.org/wiki/Augmented_transition_network) - ANTLR uses the term "Augmented Recursive Transition Network" actually (even though there are RTNs as such, https://en.wikipedia.org/wiki/Recursive_transition_network). ATN and DFA represent the main networks as, for instance, described in the Dragon book (Aho, Sethi, Ullman), even though there they use a different term for the ATN (NFA, "Nondeterministic Finite Automaton").

Before ANTLR I have seen only one of the 2 in a compiler generator, but ANTLR uses both networks at the same time. Some parts are converted to a DFA and some are kept in the ATN, however I haven't found out why and where the line is drawn what to keep where (might have to do with Unicode input beyond a certain code point). Maybe Ter or Sam are willing to tell us a bit of the theory behind ANTLR and the story behind this usage. Alternatively, there is a scientific paper describing the ALL(*) algorithm here: http://www.antlr.org/papers/allstar-techreport.pdf.

The ATN is generated by ANTLR and stored as serialized ATN in the generated parser + lexer classes. These are then unserialized to static structures on first use of that class or during the static init of the application (depending on the target language). The DFA is then created on demand when the parser class doesn't find it for a given input symbol. This is what we refer to as the warmup phase. The DFA states are stored as well and can then directly be used on all following parse runs with the same input. There's one exception, however. If an alternative has a predicate on the left hand side the DFA for the rule it is in will not be cached, otherwise the predicate would not be evaluated each time. This means in such cases the DFA has to be created over and over again for that path.

This is in particular relevant for lexer rules, because all the standalone rules are handled as if they were alts of a big top level rule. So, when you have 900 keywords and only one of them has a predicate on the left hand side, all the DFAs for these keywords are regenerated on each run - a big performance hit as I had to find out (http://www.soft-gems.net/index.php/tools/49-the-antlr4-c-target-is-here). Hence the recommendation to avoid leading predicates in such situations, but use trailing ones instead (aka validating predicates). Of course there is a difference then: instead of keeping the lexer from using a rule at all, depending on the predicate, it will then first match the rule and then check the predicate to accept the match or not. Still better than the other way. Fortunately, it's not such a big hit for parser rules.
Reply all
Reply to author
Forward
0 new messages