Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

The Equational Representation of Formal Languages

5 views
Skip to first unread message

Mark Hopkins

unread,
Jul 28, 1995, 3:00:00 AM7/28/95
to
This is a followup to the articles
"The Spinor Representation of Formal Languages"
"The Formal Language Representation of Algebras"

posted here recently.

Over the past several years I've developed a method of representing
formal languages by language expressions, that directly generalizes the
use of regular expressions. Every regular language can be represented as a
regular expression over the free monoid comprising the set of words. Likewise
every formal language can be represented as a regular expression over a
non-trivial algebra that contains the free monoid as a subalgebra. In this
representation, an automaton becomes the diagrammatic depiction of a system
of equations whose solution is the language itself. The axiom which makes the
correspondence possible is:

State = Expression.

In general, an automaton consists of the following:

Infinite State Automaton:
* Q: A set of states, possibly infinite.
* X: An input alphabet.
* D: The state transition relation, a subset of
(Q x (X union {1}) x Q
* F: A set of final states, a subset of Q.
* s: The start state, an element of Q.

with the notation:
q x |- q' iff ( (q, x), q' ) is in D.
q |- q' iff ( (q, 1), q' ) is in D.
with
q0 x1 x2 ... xn |= q{n+1} iff (qi xi |- q{i+1}) 0 <= i <= n
xi in X union {1}.

I'm using |= instead of the customary notation |-^* for convenience.

An expression w in X* is considered to be accepted by the automaton if

s w |= q for some q in F.

The language defined by the automaton is the set of expressions it accepts.

Different types of automaton can be obtained by placing restrictions on
Q, D and F. Examples:

FINITE STATE AUTOMATA: * Q is finite.

COUNTER AUTOMATA: * Q = Q0 x N
Q0 finite, N = { 0, 1, 2, 3, ... }
* F = { 0 }
* Only transitions (q,n) x |- (q,n')
allowed for |n - n'| < 2.
* If (q, n + 1) x |- (q, n + a)
then (q, n' + 1) x |- (q, n' + a)

PUSH DOWN AUTOMATA: * Q = Q0 x S*, S = stack alphabet.
Q0 finite
* F subset of Q0 x { 1 }
* Only transitions (q,w t) x |- (q,w u)
allowed for t in S, u in S*.
* If (q, w t) x |- (q, w u)
then (q, w' t) x |- (q, w' u)

PUSH DOWN AUTOMATA: * Q, F, s the same as above
(no lambda transitions) * D the same as above, except that no
lambda transitions

(q, w t) |- (q, w u)

are allowed.

with a similar set of constraints for a 2-stack automaton over the state
set Q0 x S* x T*, S and T being the respective stack alphabets.

Note the presence of symmetry requirements in the last two cases. This
allows the automaton to be given a finite presentation by factoring out the
finite state set Q0 and treating the automaton as a finite state automaton
with an extra device (counter, stack, 2 stacks, etc.).


Every automaton is a system of equations in disguise with the following
correspondence:

* The equations involve expressions denoting subsets of X*.
* The set of variables is Q, the main variable is s.
* q = 1 + ... iff q is in F
* q = ... + x q' + ... iff q x |- q' (if ((q,x),q') is in D).
* q = ... + q' + ... iff q |- q' (if ((q,1),q') is in D).

and the following result:

The language accepted by the automaton is the least fixed point
solution of the system for the main variable s.

Example:
The counter automaton

-- p --> -- p --> -- p --> ...
--> [[0]] [1] [2] [3]
<-- q -- <-- q -- <-- q -- ...

yields the following set of relations:

X = {p, q}, s = 0, F = {0}
n p |- n+1 n+1 q |- n

therefore corresponds to the following system:

q0 = 1 since 0 is in F
+ p q1 since 0 p |- 1

q{n+1} = q qn since n+1 q |- n
+ p q{n+2} since n+1 p |- n+2

thus yielding the system

q0 = 1 + p q1
q1 = q q0 + p q2
q2 = q q1 + p q3, etc.

The least fixed point solution to this system is:

qn = S (q S)^n

where S is the Dyck language D{p, q) defined by

S = 1 + p S q S.

In fact, you can verify this to be a solution by direct substitution. To
show that it's the *least* such solution means to show that the Dyck language
can be given by a construction sequence:

S0 = 0
S{k+1} = 1 + p Sk q Sk
S = union (Sk: k >= 0)

which is always true since the words in the Dyck language are all finite.


For each type of automaton, the corresponding system of equations will
take on a special form reflecting the constraints placed on the automaton.
For a finite state automaton, the system is a finite right-linear system. For
a counter automaton it is an infinite right linear system of the form:

q0 = 1 + a q0 + b q1
q1 = c q0 + d q1 + e q2
q2 = c q1 + d q2 + e q3
q3 = c q2 + d q3 + e q4
...

where a, b, c, d, e are finite sums over the set (X union {1}). The symmetry
requirement makes all but the first equation identical in form.

To arrive at this corerspondence, define the following:

|q| = { w in X*: q x |= q' for some q' in F }

The axiom
State = Expression.

is then realized by identifying q with |q|. Since |= is defined as the
RTC of the relation |-, the following properties then hold:

R (Reflexivity) A |= A
T (Transitivity) A |- B, B |= C ==> A |= C
C (Closure) |= is the smallest such relation

Thus an equation for |q| can be developed:

|q| = { w in X*: w = 1, q = q', q' in F }
+ { x w in X*: q x |- q', q' w |= q'', q'' in F }
+ { w in X*: q |- q', q' w |= q'', q'' in F }

The first term comes from R, the next two from T. Property C guarantees
that |q| will be the least solution to the system of equations developed.

These terms can be reduced as follows:

|q| = \q
+ union x ( { w in X*: w in |q'| }: for q x |- q' )
+ union { w in X*: w in |q'| }: for q |- q' )
where
\q = { 1 } if q in F
= 0 else
or
|q| = \q
+ union (x |q'|: q x |- q')
+ union (|q'|: q |- q')

Applying the axiom q = |q| results in the system:

q = \q + union (x q': q x |- q') + union (q': q |- q')

thus arriving at the correspondence.

As a consequence of these developments, we arrive at the correspondence

Language = Expression

yet again. In fact, it is not too hard to see the link to the previous
articles when one notes that the language given by the example above:

q0 = 1 + p q1
q1 = q q0 + p q2
q2 = q q1 + p q3, etc.

is the same as the one which was given in the previous articles by the context
free expression:
<0| (p v + q u)* |0>

whose matrix representation had the form:

/ \ * / \
| 0 p 0 0 0 ... | | 1 |
| q 0 p 0 0 ... | | 0 |
(1 0 0 0 ...) | 0 q 0 p 0 ... | | 0 |
| 0 0 q 0 p ... | | 0 |
... ...
Defining:
/ \ / \ * / \
| q0 | | 0 p 0 0 0 ... | | 1 |
| q1 | | q 0 p 0 0 ... | | 0 |
Q = | q2.|. = | 0 q 0 p 0 ... | | 0 |
| q3 | | 0 0 q 0 p ... | | 0 |
.... ... ...

and using the relation (A* B = B + A (A* B)), we can write:

/ \ / \ / \ / \
| q0 | | 1 | | 0 p 0 0 0 ... | | q0 |
| q1 | | 0 | | q 0 p 0 0 ... | | q1 |
| q2.|. = | 0 | + | 0 q 0 p 0 ... | | q2 |
| q3 | | 0 | | 0 0 q 0 p ... | | q3 |
.... ... ... ....
with
<0| (p v + q u)* |0> = (1 0 0 0... ) Q = q0

selecting out the main variable.

0 new messages