Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

The Untold Story of Formal Language Theory 3

12 views
Skip to first unread message

Mark Hopkins

unread,
Feb 20, 1996, 3:00:00 AM2/20/96
to
Context Free Expressions

In this article, we'll illustrate how the algebra C2 enables us to
directly represent context-free languages in a manner completely analogous
to how regular expressions are used to represent regular languages.

(3.1) Embedding Context-Free Languages in C2 x R(X).
Let X be a finite set (intended to denote an "alphabet"), and assume
that we have a context-free grammar, given by the following:

V = a set of variables (or "non-terminals"),
X = the "terminals"
S = the main variable (or "start symbol"),
P = a set of reduction rules of the form:
n -> w1 w2 ... wk,
where wi is in (X union V), i = 1, ..., k
and k >= 0.

Assume we have a complete postfix (b-p) code corresponding to the set
({0} union V union X) and denote the code of element i by <i|, and
let |i> denote the adjoint of <i|. By completeness we have the identity:

|0><0| + sum (|x><x|: x in X) + sum (|v><v|: v in V) = 1

and by the postfix property we have:

<i| |j> = delta[i,j] = 1 if i = j, 0 else

Corresponding to each rule (n -> w1 w2 ... wk) let

[n -> w1 w2 ... wk] = |n><wk| ... <w2|<w1|
and let
[P] = sum ([r]: r is in P)
Also, let
[X] = sum (|x> x: x is in X)

Then if L is the context-free set generated by the grammar, we may write:

<0| <S| ([X] + [P])* |0> = lub { w: w is in L }

Furthermore, this mapping from C(X), the class of all context-free
languages over X to C2 x R(X) is one-one. Therefore, we are justified
in calling the elements of C2 x R(X) "context free expressions".

To prove this, note that:

<0|<S| ([X] + [P])* |0> = lub { <0|<S| ([X]+[P])^n |0>: n >= 0 }

which in virtue of the relation:

([X]+[P])^n = lub { w1 w2 ... wn: wi is in [X] union [P] }

may be further reduced to the form:

lub { <0|<S| w1 w2 ... wn |0>: wi in [X] union [P], n >= 0 }

This is an enumeration of all the left-most derivation sequences. Exactly
those words that correspond to VALID sequences will cancel and yield a
non-zero result. All other words will cancel to 0 and we may eliminate
0 from the set in virtue of the identity:

lub ({0} union X) = lub X

The result is an enumeration of exactly the words recognized by the
context-free grammar.

(3.2) Example
An example will make this clearer. Suppose we have the following
context-free grammar:
E -> E T
E ->
T -> r E s
T -> x

where E plays the role of the start symbol, S.

Let { <0|, <E|, <T|, <r|, <s|, <x| } represent a complete postfix (b-p)
code of the set { 0, E, T, r, s, x }, and |i> the adjoint of <i|, e.g.

<0| = bb, <E| = bpb, <T| = ppb
<r| = bp, <s| = bpp, <x| = ppp

|0> = dd, |E> = dqd, |T> = dqq
|r> = qd, |s> = qqd, |x> = qqq

Completeness may be verified, e.g.

sum |i><i| = ddbb + qdbp + dqdbpb + qqdbpp + dqqppb + qqqppp
= ddbb + qdbp + dq(db + qp)pb + qq(db + qp)pp
= ddbb + qdbp + dqpb + qqpp
= d(db + qp)b + q(db + qp)p = db + qp = 1

You can verify the postfix property just as easily (i.e. <i||j> = delta[i,j])

For [P] and [X] we may then write:

[P] = |E> + |E><T|<E| + |T><s|<E|<r| + |T><x|
[X] = |r> r + |s> s + |x> x

and we may write:

<0|<E| ([X] + [P])^0 = <0| <E|

corresponding to the zero length derivation # E.

<0|<E| ([X] + [P])^1 = <0| <E| [P] + <0| <E| [X]
= <0| (1 + <T|<E|)
corresponding to the two length one derivations:

# E -> # 1
# E -> # E T

<0|<E| ([X] + [P])^2 = <0| (1 + <T|<E|) ([X] + [P])
= <0| (<T| + <T|<T|<E|)

which corresponds to the two length two derivations:

# E -> # E T -> # T
# E -> # E T -> # E E T

and you may work out the details of the next equality:

<0|<E| ([X] + [P])^3 = <0| (<s|<E|<r| + <x| + <s|<E|<r|<T|<E| + <x|<T|<E|)

which matches the length three derivation sequences:

# E -> # E T -> # T -> # r E s
# E -> # E T -> # T -> # x
# E -> # E T -> # E E T -> # r E s E T
# E -> # E T -> # E E T -> # x E T

Note that multiplying <0|<x| by one further ([X] + [P]) will yield <0| x,
and this represents the step where a "shift" is done on the symbol x:

# E -> # E T -> # T -> # x -> x #

Similarly, each non-zero word in <0|<E| ([X] + [P])^n of the form
<0| x1 ... xm <wk| ... <w1| = x1 ... xm <0| <wk| ... <w1| will represent
a length n leftmost derivation sequence of:

# E ->* x1 ... xm # w1 ... wk

The sequence is considered fully reduced when all the w's are gone, and
this corresonds to the case where the only operator in the word is <0|.
Therefore the only words in <0|<E| ([X] + [P])^n |0> that yield a non-zero
result are those which represent a fully reduced leftmost derivation.

(3.4) The Embedding Theorem
To round out the embedding theorem we also need to show that the
correspondence:

L <--> <0| <S| ([X] + [P])* |0>

is one-one. But this follows directly from working out the experssion in
the matrix representation provided in the first article:

b = 1 0 0 0 0 0 ... p = 0 1 0 0 0 0 ...
0 0 1 0 0 0 ... 0 0 0 1 0 0 ...
0 0 0 0 1 0 ... 0 0 0 0 0 1 ...
0 0 0 0 0 0 ... 0 0 0 0 0 0 ...
0 0 0 0 0 0 ... 0 0 0 0 0 0 ...
... ...

d = transpose of b, q = transpose of p.

x = diag(x, x, ..., x) for each x in X

Every expression of the form above will be a diagonal matrix

<0| <S| ([X] + [P])* |0> = diag(L, L, L, ...)

where L is the language recognized by the grammar.

Therefore the correspondence is one-one.


0 new messages