Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.

Dismiss

The Hardest Context Free Language (proof)

4 views

Skip to first unread message

Mark Hopkins

unread,

Jul 28, 1995, 3:00:00 AM7/28/95

This article is a continuation of the series:

The Spinor Representation of Formal Languages
The Formal Language Representation of Algebras
The Equational Representation of Formal Languages
Context Free Expressions

previously posted here. Using the results of the previous article, a
simplified proof of the existence of the Hardest CFL will be given.

(1) History
Back in the late 1960's, Greibach came up with a Context Free Language
(CFL) with the feature that every other Context Free Language has an O(n)
reduction to it. This language is therefore called the Hardest CFL, which I
will call HL. What this means is that you can map any CFL to this language
and test membership of input in the original language by testing membership
of the mapped string in HL. This language therefore provides the one and
only language that you ever need to consider when trying to develop and test
a new CFL Membership algorithm.

The following right linear system may be used to define HL:

S = <0| x A + 1
A = (q <1| + |1> p + d <2| + |2> b) A + y B + z C
B = (q + p + d + b + y) B + z C
C = (q + p + d + b + y) C + y A + x B + |0>

The automaton corresponding to this system can be depicted as an infinite
binary tree of state triples (A, B, C), threaded by arcs connecting each A
state in both directions with its child states. Alternatively, this may be
viewed as a 4 state PDA with 3 stack symbols.

This language can also be represented by the following Context Free
Grammar:
S = x T A
T = d T b T + q T p T + A y T + 1
A = y B + z C
B = b B + d B + q B + p B + y B + z C
C = b C + d C + q C + p C + x B + 1

(2) The Mapping
Assume the language to be mapped is given by a grammar in Greibach Normal
Form, as a system of equations of the form:

N = sum of terms of the form (a N ... N).

In addition, if the language accepts the empty string, 1, the RHS of S's
equation may include the term, 1.

Mapping this language consists of 3 steps: (1) mapping the non-terminals,
(2) mapping the rules, and (3) mapping the symbols. This process will be
illustrated with the following example language:

S = a b + a S b

presented in Greibach Normal Form:

S = a T + a S T
T = b

(2.1) Mapping the non-terminals
(a) Map each of the nonterminals using a binary coding in the symbols
q and d. This coding should have the property of unique decodability. One
way to accomplish this is that if the number of non-terminals is more than
2^(n-1) but no more than 2^n, then map each non-terminal into a distinct n-bit
binary number. If N is a non-terminal, denote the result by <N|.

Example: <S| = q, <T| = d

(b) For each non-terminal, define the complementary map |N> as follows:

|N> = <N| with all q's replaced by p, and all d's by b.

Example: |S> = p, |T> = b

(2.2) Mapping the rules
(a) In the equation for the non-terminal, N, for each term on the right hand
side of its equation:

N = ... + a A ... Z + ...

define the following term:

[N -> a A ... Z] = y |N> <Z| ... <A|

Example:
[S -> a T] = y |S> <T| = y p d
[S -> a S T] = y |S> <T| <S| = y p d q
[T -> b] = y |T> = y b

(2.3) Mapping the symbols
(a) For each symbol, a, define its Start Prefix, Pre(x) as follows:

Pre(a) = x <S| d, if a is a possible start symbol.
= 1 else

Because the grammar is listed in Greibach Normal Form, the set of all the
possible start symbols is contained the set of terminal symbols occurring
of the right hand side of the start symbol's equation.

Example:
Pre(a) = x <S| z = x q z
Pre(b) = 1

(b) For each symbol, a, define the following mapping:

h(a) = Pre(a) R1 R2 ... Rn z

where R1, R2, ..., Rn is a list of all the terms of the form

[N -> a N1 ... Nk]

presented in an arbitrary order.

Example:
h(a) = Pre(a) [S -> a T] [S -> a S T] = x q z y p d y p d q z
h(b) = Pre(b) [T -> b] = y b z

(3) Equivalence of CFL Membership to Membership on HL.
If the original CFL accepts the empty string, 1, then this string will be
mapped to 1, which is accepted by HL in virtue of the first equation:

S = <0| x A + 1

If the language accepts a non-empty string, a1 a2 ... an, then it follows
that Pre(a1) = x <S| z and that h(a1) has the form x <S| z ... Therefore,
every string in the language will have the prefix x <S| z.

For convenience we'll use the following sets of symbols:

q and <1|, p and |1>, d and <2|, and b and |2>

interchangeably and compress our notation, e.g. <0 2| = <0| <2|. Then the
following derivation sequence results:

S x <S| z -> <0| A <S| z -> ... -> <0 S| A z -> <0 S| C

The prefix serves to kick the automaton out of its starting state into the
initial holding state <0 S| C. It will be ignored in any holding state <W| C
as demonstrated by the following derivation sequence:

<W| C x <S| z -> <W| B <S| z -> ... -> <W| B z -> <W| C

Thus, all the strings but those that begin in a valid start symbol are rejected
by the automaton due to the handling of the prefix.

The state <0| <Nk| ... <N1| C effectively represents the following parse
state:

S ->* a1 ... an N1 ... Nk

in the original language in which the first n symbols, a1 ... an, of the input
word have been processed.

The following derivation sequences apply after processing a rule:

[N -> a N1 ... Nk] = y |N> <Nk| ... <N1|

<W N'| C y |N> <Nk| ... <N1|
-> <W N'| A |N> <Nk| ... <N1|
-> ...
-> <W N'| |N> <Nk ... N1| A
-> 0 if N' != N
<W Nk ... N1| A if N' = N

<W N'| C y |N> <Nk| ... <N1|
-> <W N'| C |N> <Nk| ... <N1|
-> ...
-> <W N'| C

Therefore a given rule will result in a non-deterministic branch. One branch
will effectively apply the rule and land in state A, and the other will remain
in state C awaiting further rules to try out.

From state A, the following derivation sequences will apply:

<W| A y |N> <Nk| ... <N1|
-> <W| B |N> <Nk| ... <N1|
-> ...
-> <W| B
and
<W| B y |N> <Nk| ... <N1|
-> <W| B |N> <Nk| ... <N1|
-> ...
-> <W| B

Therefore, once a rule has been applied it will be effectively stowed away in
state B as other rules are tried out.

Therefore, as the rules Ra, ..., Ry, Rz in the mapping of a symbol:

h(a) = Pre(a) Ra ... Ry Rz z

are processed all the rules amongst Ra ... Ry that correspond to valid rules
in this parse state will be captured in the corresponding states <W| B where
each of the W's effectively captures the parse state after the application
of the corresponding rule. The rule Rz is captured in state <W| A if valid.
All the invalid rules are filtered out. Thus, the automaton from a state of
the form <W| C will process h(a) non-deterministically leading to states
of the form

<W| C h(a) ->* <W'| B z
-> <W'| C

<W| C h(a) ->* <W''| A z
-> <W''| C

<W| C h(a) ->* <W| C z
-> 0

The final <W| C, having been carried over from one application to the next
is filtered out at this point.

As you can see, the grammar for HL is designed in such a way that it
exactly mimics the actions of a Push-Down automaton corresponding to
the original CFL.

(4) Example
Applying this to the example language presented above should make everything
clear.

(4.1) a
h(a) = x q z y p d y p d q z

S x q z y p d y p d q z
-> <0| A q z y p d y p d q z
-> <0 1| A z y p d y p d q z
-> <0 1| C y p d y p d q z The initial state reached.

-> (<0 1| A + <0 1| C) p d y p d q z
-> (<0| A + <0 1| C) d y p d q z
-> (<0 2| A + <0 1| C) y p d q z Rule [S -> a T] tried.

-> (<0 2| B + <0 1| A + <0 1| C) p d q z
-> (<0 2| B + <0| A + <0 1| C) d q z
-> (<0 2| B + <0 2| A + <0 1| C) q z
-> (<0 2| B + <0 2 1| A + <0 1| C) z Rule [S -> a S T] tried.

-> (<0 2| C + <0 2 1| C) Rules captured...

-> (<0 2| |0> + <0 2 1| |0>) = 0 ... and input rejected.

(4.2) a b
Applying the same input in string a b results in the following

h(a) h(b) = x q z y p d y p d q z y b z
A h(a) h(b)
-> ...
-> (<0 2| C + <0 2 1| C) y b z
-> (<0 2| A + <0 2 1| A + <0 2| C + <0 2 1| C) b z
-> (<0| A + 0 + <0 2| C + <0 2 1| C) z Rule [T -> b] tried.
-> (<0| C)
-> <0| |0> = 1 ... input accepted.

Note that the state <0 2 1| C, which corresponds to the parse state

S ->* a S T

has been filtered out and the state, <0 2| C, corresponding to the parse state

S ->* a T

has been converted to the state <0| C, which corresponds to the parse state:

S ->* a b

Thus, the input is accepted.

0 new messages