Hello all,
A recently released version of the QSMM project features probabilistic adaptive top-down and bottom-up parsers that use a reinforcement learning approach to induce context-free grammars from a template. The reward function is based on the probability of parsing an input text with the probabilistic context-free grammar (PCFG) being learned. A template for learning a PCFG is an ambiguous grammar whose nonterminal symbols have regular expressions on their right-hand sides.
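
For illustration only, such a template can be pictured as a mapping from nonterminals to regular expressions over terminal and nonterminal symbols; the notation and the Python representation below are my own and do not reproduce QSMM's actual template syntax:

    # Hypothetical picture of a template grammar: each nonterminal maps to a
    # regular expression over terminal and nonterminal symbols.  The real QSMM
    # template format differs; this only illustrates the idea of an ambiguous
    # grammar with regex right-hand sides.
    template = {
        "EXPR":   "TERM (('+' | '-') TERM)*",
        "TERM":   "FACTOR (('*' | '/') FACTOR)*",
        "FACTOR": "NUMBER | '(' EXPR ')'",
    }

    # Learning a PCFG by this template means choosing concrete productions that
    # the regular expressions admit and assigning them probabilities so that
    # the probability of parsing the training text becomes high.
    for lhs, rhs in template.items():
        print(f"{lhs} -> {rhs}")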
The parsers internally generate probabilistic assembler programs for parsing the nonterminal symbols of a template grammar. A probabilistic assembler program can contain "jprob P, label" instructions specifying a jump to a location label with probability P. The assembler program defines the structure of a probabilistic finite automaton. The reinforcement learning process adjusts the probabilities of transitions between automaton states while the automaton performs actual transitions between them.
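
As a rough sketch of what a "jprob" instruction does (the interpreter below is hypothetical and much simpler than the actual QSMM assembler execution model):

    import random

    # Toy model (not the actual QSMM assembler): a "jprob p, label" instruction
    # transfers control to 'label' with probability p and falls through to the
    # next location otherwise.
    def jprob(p, label, fallthrough):
        return label if random.random() < p else fallthrough

    # Transition probabilities that the reinforcement learning process would
    # adjust while the automaton performs actual transitions.
    probs = {("parse_expr", "try_sum"): 0.6}

    state = "parse_expr"
    next_state = jprob(probs[(state, "try_sum")], "try_sum", "try_product")
    print(state, "->", next_state)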
The reinforcement learning algorithm accumulates cycle-summed values over cycles traversed in the automaton state transition graph and selects the next automaton state according to a relative probability that depends on the accumulated cycle-summed value. On finishing the traversal of a cycle in the graph, the accumulated cycle-summed value is incremented by the difference between the total reward at the end of the cycle and the total reward at the start of the cycle (the initial cycle-summed value).
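
A simplified sketch of this selection and update rule is given below; the exponential (softmax-like) dependence of the relative probability on the accumulated value is an assumption made only for illustration and is not necessarily the exact form used by QSMM:

    import math
    import random

    # Select the next state with a relative probability that grows with its
    # accumulated cycle-summed value (exponential weighting assumed here).
    def select_next_state(acc_values, temperature=1.0):
        weights = {s: math.exp(v / temperature) for s, v in acc_values.items()}
        total = sum(weights.values())
        r = random.uniform(0.0, total)
        for state, w in weights.items():
            r -= w
            if r <= 0.0:
                return state
        return state  # numerical fallback

    # On finishing a cycle through 'state', increment its accumulated value by
    # the reward gained while traversing that cycle.
    def on_cycle_finished(acc_values, state, reward_at_end, reward_at_start):
        acc_values[state] += reward_at_end - reward_at_start

    acc = {"s1": 0.0, "s2": 0.0, "s3": 0.0}
    on_cycle_finished(acc, "s2", reward_at_end=5.0, reward_at_start=3.5)
    print(select_next_state(acc))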
This algorithm supports scaling to large numbers of automaton states by generating helper finite automatons with tree-form state transition graphs. Since every tree node has a small number of child nodes (two for a binary tree), the number of relative probabilities to calculate while traversing a path from the root node to a leaf node is usually much smaller than the number of relative probabilities that would have to be calculated over all leaf nodes to select one of them directly.
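
For example, selecting one of 1024 leaf states directly would require 1024 relative probabilities, whereas descending a binary tree requires two weights at each of 10 levels. The sketch below illustrates such a descent; the way QSMM derives subtree weights from learned statistics is not shown and is replaced by a hypothetical weight function:

    import random

    # Descend from the root, making one two-way weighted choice per level,
    # instead of computing a relative probability for every leaf at once.
    def pick_leaf(num_leaves, weight_of):
        lo, hi = 0, num_leaves              # current subtree covers leaves [lo, hi)
        while hi - lo > 1:
            mid = (lo + hi) // 2
            w_left = weight_of(lo, mid)     # weight of the left subtree
            w_right = weight_of(mid, hi)    # weight of the right subtree
            if random.uniform(0.0, w_left + w_right) < w_left:
                hi = mid
            else:
                lo = mid
        return lo

    # Hypothetical subtree weight: just the number of leaves, so every leaf is
    # equally likely; in practice the weights would come from learned values.
    print(pick_leaf(1024, lambda a, b: float(b - a)))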
The direct use of an accumulated cycle-summed value to calculate the relative probability of selecting a particular automaton state leads to poor learning results: large cycle-summed values accumulated during long automaton execution reduce the ability to try alternative state transition paths that may turn out to be better.
To cope with this problem, the algorithm accumulates cycle-summed values and statistics on led-out PCFG productions only over a certain recent period of automaton execution time ending at the current time. Compared to the previous QSMM version, the new version implements a simpler algorithm for calculating and adjusting the length of this period.
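
As an illustration of accumulating only over a recent period of execution time (the window length below is a fixed parameter; the actual QSMM algorithm for adjusting it is not reproduced here):

    from collections import deque

    # Accumulate increments only over the last 'period' units of execution
    # time; older increments are discarded.
    class WindowedAccumulator:
        def __init__(self, period):
            self.period = period      # length of the accumulation period
            self.events = deque()     # (time, increment) pairs inside the window

        def add(self, time, increment):
            self.events.append((time, increment))
            self._evict(time)

        def value(self, now):
            self._evict(now)
            return sum(inc for _, inc in self.events)

        def _evict(self, now):
            # Drop increments that fall outside the accumulation period.
            while self.events and self.events[0][0] < now - self.period:
                self.events.popleft()

    acc = WindowedAccumulator(period=100.0)
    acc.add(10.0, 1.5)
    acc.add(150.0, 2.0)
    print(acc.value(200.0))   # only the increment at t=150 is still in the window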