Yes, you're right as far as I can see.
The only way that I can see now to test for subset is the method I use
for testing for equivalence.
I will describe the algorithm starting from two NFAs.
Let B and C be two NFAs (possibly DFAs).
We want to test if L(B) is a subset of L(C) and conversely.
Let's construct the DFA of B | C.
By using the or operator '|' between NFAs I mean to use the method
used for the '|' operator
in Thomson's construction. That is: add a new start state and a new
end state and join
the new start state to each of the start states of B and C by
epsilon transitions;
then join the end states of B and C to the new end state by
epsilon transitions.
Let D be the DFA constructed from B | C using the subset
construction algorithm.
Let d be any state of D. As a result of the subset construction
algorithm,
d has associated with it a set of NFA states s(d) that distinguish
d from any other state of D.
s(d) is constructed from the states of B and C, such that
s(d) = s(d,B) disjointUnion s(d,C)
where s(d,B) is the subset of the states of B that are states of s
(d)
and s(d,C) similarly is the subset of states of C that are states
of s(d).
Let t be a string such that, after reading t, D is in state d.
Assume s(d,B) is empty. Then B will reach an error on reading t
before reaching the end of t.
But then s(d,C) is not empty since s(d) is not empty.
Thus, C will NOT reach an error on reading t before reaching the
end of t.
As a result, L(C) is not a subset of L(B) because there are strings
accepted by C that are not
accepted by B; some (possibly all) of those strings begin with t.
More generally, L(B) is a superset of L(C) if for every state e
of D, s(e,B) is not empty.
Similarly, L(C) is a superset of L(B) if for every state e
of D, s(e,C) is not empty.
If L(B) is a superset of L(C) and conversely, then B and C are
equivalent.
Thus the algorithm is:
1) Construct D from B | C.
2) Return "B and C equivalent" if for every state e of D, s
(e,B) and s(e,C) are not empty.
3) Return "B is a subset of C" if for every state e of D, s
(e,C) is not empty.
4) Return "C is a subset of B" if for every state e of D, s
(e,B) is not emtpy.
5) Return "B is not a subset of C and conversely".
6) Send Ralph Boland $1000.
OK; the last step is optional.
Notes:
1) It is not always necessary to construct all of D. As each state
of D is constructed
by subset construction the subset or equivalence test can be applied.
The algorithm can
be terminated immediately if a test failure occurs. In my case the
finite automata are
almost always equivalent so usually all states get tested.
2) In principle, this algorithm could be more expensive for testing
equivalence than constructing
miniminal state DFAs and testing for isomorphism, especially if B
and C are DFAs.
But in practice, I find the algorithm works quite well. I have never
implemented the isomorphism
approach so I can't provide comparisons. My algorithm is certainly
simpler though; especially if
you have all the tools in place to construct DFAs from regular
expressions but not the DFA
minimization algorithm as was the situation I was in at the time.
3) I am sure that I am not the first to come up with this algorithm
but I have no references.
4) If there is a better way, I'd love to hear it.
5) I never meant the above to be a formal proof but a brief
explanation of the algorithm so
if you want more rigor you're on your own though you could post
your result here.
6) The code I have referred to is Smalltalk code which I will
eventually release as Open source.
Ralph Boland