This is a followup to the articles
"The Spinor Representation of Formal Languages"
"The Formal Language Representation of Algebras"
posted here recently.
Over the past several years I've developed a method of representing
formal languages by language expressions, that directly generalizes the
use of regular expressions. Every regular language can be represented as a
regular expression over the free monoid comprising the set of words. Likewise
every formal language can be represented as a regular expression over a
non-trivial algebra that contains the free monoid as a subalgebra. In this
representation, an automaton becomes the diagrammatic depiction of a system
of equations whose solution is the language itself. The axiom which makes the
correspondence possible is:
State = Expression.
In general, an automaton consists of the following:
Infinite State Automaton:
* Q: A set of states, possibly infinite.
* X: An input alphabet.
* D: The state transition relation, a subset of
(Q x (X union {1}) x Q
* F: A set of final states, a subset of Q.
* s: The start state, an element of Q.
with the notation:
q x |- q' iff ( (q, x), q' ) is in D.
q |- q' iff ( (q, 1), q' ) is in D.
with
q0 x1 x2 ... xn |= q{n+1} iff (qi xi |- q{i+1}) 0 <= i <= n
xi in X union {1}.
I'm using |= instead of the customary notation |-^* for convenience.
An expression w in X* is considered to be accepted by the automaton if
s w |= q for some q in F.
The language defined by the automaton is the set of expressions it accepts.
Different types of automaton can be obtained by placing restrictions on
Q, D and F. Examples:
FINITE STATE AUTOMATA: * Q is finite.
COUNTER AUTOMATA: * Q = Q0 x N
Q0 finite, N = { 0, 1, 2, 3, ... }
* F = { 0 }
* Only transitions (q,n) x |- (q,n')
allowed for |n - n'| < 2.
* If (q, n + 1) x |- (q, n + a)
then (q, n' + 1) x |- (q, n' + a)
PUSH DOWN AUTOMATA: * Q = Q0 x S*, S = stack alphabet.
Q0 finite
* F subset of Q0 x { 1 }
* Only transitions (q,w t) x |- (q,w u)
allowed for t in S, u in S*.
* If (q, w t) x |- (q, w u)
then (q, w' t) x |- (q, w' u)
PUSH DOWN AUTOMATA: * Q, F, s the same as above
(no lambda transitions) * D the same as above, except that no
lambda transitions
(q, w t) |- (q, w u)
are allowed.
with a similar set of constraints for a 2-stack automaton over the state
set Q0 x S* x T*, S and T being the respective stack alphabets.
Note the presence of symmetry requirements in the last two cases. This
allows the automaton to be given a finite presentation by factoring out the
finite state set Q0 and treating the automaton as a finite state automaton
with an extra device (counter, stack, 2 stacks, etc.).
Every automaton is a system of equations in disguise with the following
correspondence:
* The equations involve expressions denoting subsets of X*.
* The set of variables is Q, the main variable is s.
* q = 1 + ... iff q is in F
* q = ... + x q' + ... iff q x |- q' (if ((q,x),q') is in D).
* q = ... + q' + ... iff q |- q' (if ((q,1),q') is in D).
and the following result:
The language accepted by the automaton is the least fixed point
solution of the system for the main variable s.
Example:
The counter automaton
-- p --> -- p --> -- p --> ...
--> [[0]] [1] [2] [3]
<-- q -- <-- q -- <-- q -- ...
yields the following set of relations:
X = {p, q}, s = 0, F = {0}
n p |- n+1 n+1 q |- n
therefore corresponds to the following system:
q0 = 1 since 0 is in F
+ p q1 since 0 p |- 1
q{n+1} = q qn since n+1 q |- n
+ p q{n+2} since n+1 p |- n+2
thus yielding the system
q0 = 1 + p q1
q1 = q q0 + p q2
q2 = q q1 + p q3, etc.
The least fixed point solution to this system is:
qn = S (q S)^n
where S is the Dyck language D{p, q) defined by
S = 1 + p S q S.
In fact, you can verify this to be a solution by direct substitution. To
show that it's the *least* such solution means to show that the Dyck language
can be given by a construction sequence:
S0 = 0
S{k+1} = 1 + p Sk q Sk
S = union (Sk: k >= 0)
which is always true since the words in the Dyck language are all finite.
For each type of automaton, the corresponding system of equations will
take on a special form reflecting the constraints placed on the automaton.
For a finite state automaton, the system is a finite right-linear system. For
a counter automaton it is an infinite right linear system of the form:
q0 = 1 + a q0 + b q1
q1 = c q0 + d q1 + e q2
q2 = c q1 + d q2 + e q3
q3 = c q2 + d q3 + e q4
...
where a, b, c, d, e are finite sums over the set (X union {1}). The symmetry
requirement makes all but the first equation identical in form.
To arrive at this corerspondence, define the following:
|q| = { w in X*: q x |= q' for some q' in F }
The axiom
State = Expression.
is then realized by identifying q with |q|. Since |= is defined as the
RTC of the relation |-, the following properties then hold:
R (Reflexivity) A |= A
T (Transitivity) A |- B, B |= C ==> A |= C
C (Closure) |= is the smallest such relation
Thus an equation for |q| can be developed:
|q| = { w in X*: w = 1, q = q', q' in F }
+ { x w in X*: q x |- q', q' w |= q'', q'' in F }
+ { w in X*: q |- q', q' w |= q'', q'' in F }
The first term comes from R, the next two from T. Property C guarantees
that |q| will be the least solution to the system of equations developed.
These terms can be reduced as follows:
|q| = \q
+ union x ( { w in X*: w in |q'| }: for q x |- q' )
+ union { w in X*: w in |q'| }: for q |- q' )
where
\q = { 1 } if q in F
= 0 else
or
|q| = \q
+ union (x |q'|: q x |- q')
+ union (|q'|: q |- q')
Applying the axiom q = |q| results in the system:
q = \q + union (x q': q x |- q') + union (q': q |- q')
thus arriving at the correspondence.
As a consequence of these developments, we arrive at the correspondence
Language = Expression
yet again. In fact, it is not too hard to see the link to the previous
articles when one notes that the language given by the example above:
q0 = 1 + p q1
q1 = q q0 + p q2
q2 = q q1 + p q3, etc.
is the same as the one which was given in the previous articles by the context
free expression:
<0| (p v + q u)* |0>
whose matrix representation had the form:
/ \ * / \
| 0 p 0 0 0 ... | | 1 |
| q 0 p 0 0 ... | | 0 |
(1 0 0 0 ...) | 0 q 0 p 0 ... | | 0 |
| 0 0 q 0 p ... | | 0 |
... ...
Defining:
/ \ / \ * / \
| q0 | | 0 p 0 0 0 ... | | 1 |
| q1 | | q 0 p 0 0 ... | | 0 |
Q = | q2.|. = | 0 q 0 p 0 ... | | 0 |
| q3 | | 0 0 q 0 p ... | | 0 |
.... ... ...
and using the relation (A* B = B + A (A* B)), we can write:
/ \ / \ / \ / \
| q0 | | 1 | | 0 p 0 0 0 ... | | q0 |
| q1 | | 0 | | q 0 p 0 0 ... | | q1 |
| q2.|. = | 0 | + | 0 q 0 p 0 ... | | q2 |
| q3 | | 0 | | 0 0 q 0 p ... | | q3 |
.... ... ... ....
with
<0| (p v + q u)* |0> = (1 0 0 0... ) Q = q0
selecting out the main variable.