NFA and negation/conjunction

Stefan Monnier

unread,

May 13, 2010, 11:42:22 PM5/13/10

to

I'm looking for any work that tries to provide negation and/or
conjunction operations in regexps without using a DFA (or at least
while avoiding the DFA state explosion problem).

My search has currently come up empty (actually I pretty much haven't
seen mention of conjunction and negation in regexps other than in
Brzozowski's derivatives). So any pointer would be welcome,

Stefan

klyjikoo

unread,

May 14, 2010, 3:29:02 PM5/14/10

to

Recently I Built a tool for generating NFA's from simple regexes and
then reducing the NFA,i simply used the following rule for conjunction
of NFA's during construction:
If N1 and N2 are the NFAs associated with regular expression R1 and
R2 respectively, then we construct the N' associated with R1R2 as
follows:

a) For each final state s of N1, we make any transition that initiate
from start state of N2 also to initiate from s.

b) Start state of N1 would be start state of N'.

c) Final states of N2 would be final states of N'.

d) If start state of N2 is a final state then final states of N1 also
would be final states of N'.

e) We eliminate start state of N2 and all related transition if it is
a useless state in N'.

For negation of NFA you could interchange the start state and final
states of Original NFA .

Stephen Horne

unread,

May 15, 2010, 11:25:49 AM5/15/10

to

On Thu, 13 May 2010 23:42:22 -0400, Stefan Monnier
<mon...@iro.umontreal.ca> wrote:

>I'm looking for any work that tries to provide negation and/or
>conjunction operations in regexps without using a DFA (or at least
>while avoiding the DFA state explosion problem).

I've done this, but my method *does* cause a combinatorial explosion
because, basically, one of the steps is to eliminate nondeterminism.
The next step is to minimise, though - but of course DFA minimisation
only achieves so much.

I've never checked how big the intermediate results get because, in
practice, my inputs just aren't that complex anyway.

I based what I do on what Adrian Thurstons Ragel does, at least as
described in the manual.

I'm assuming that conjunction means set intersection, and negation is
set negation. Rather the implement negation, I think the set
difference is more practical - you can always do "any* - input" for a
negation. A true negation is a slippery thing anyway - it all seems
sensible while you have one fixed set of input tokens, then someone
says "now lets update that regex to support unicode". A "true"
negation would need to support an infinite set of input tokens.

Anyway, the basic principle is simple enough. Merge the two input
NFAs. No states or transitions are added - you just renumber the
states in one then combine the state and transition tables.

Then, do a closure that removes all the nondeterminism - each state in
the result mapping to a set of states in the merged input.

Then, check each state. If a state has end states from both the first
input and the second input, for a set intersection, it is itself an
end state - otherwise not. For set difference, you look for states
that have input 1 end states but no input 2 end states.

Then you do dead state removal and minimisation.

Thats the simplified explanation of course - in practice, a lot of the
work can be built into the one modified convert-to-DFA algorithm, so
that e.g. a lot of dead states are never even created.

I'm sure this isn't new to anyone here.

What I'd like to know is - is there any such thing as an efficient
algorithm to minimise an NFA?

Obviously DFA minimisation algorithms can be applied to NFAs, but in
general they don't give a minimal result - only as minimal as can be
achieved by merging equivalent states.

Consider, for example, a six state DFA loop where states are end
states if their distance from the start is divisible by two or three.
A minimal NFA would have two separate loops - a two-state loop for the
divisible-by-two case and a three-state loop for the
divisible-by-three case. Automatically spotting things like this is
interesting because it involves splitting states apart (e.g. to make a
separate six-state loop for the divisible-by-two case) as well as
merging states. And the trouble with splitting states is that there's
not necessarily any clear indicator of when to stop.

I'm guessing that this is an NP-hard or NP-complete problem - am I
right?

Stefan Monnier

unread,

May 15, 2010, 2:32:00 PM5/15/10

to

> If N1 and N2 are the NFAs associated with regular expression R1 and
> R2 respectively, then we construct the N' associated with R1R2 as
> follows:
> a) For each final state s of N1, we make any transition that initiate
> from start state of N2 also to initiate from s.
> b) Start state of N1 would be start state of N'.
> c) Final states of N2 would be final states of N'.
> d) If start state of N2 is a final state then final states of N1 also
> would be final states of N'.
> e) We eliminate start state of N2 and all related transition if it is
> a useless state in N'.

That seems to describe the rules for the concatenation of R1 and R2, but
not their conjunction, right (where the conjunction R1&R2 is a regexp
that describes the set of strings corresponding to the intersection of
the set of strings described by R1 and by R2).

Stefan

klyjikoo

unread,

May 16, 2010, 6:35:08 AM5/16/10

to

> That seems to describe the rules for the concatenation of R1 and R2, but
> not their conjunction,

Sorry for false responce,

I think For both intersection and complement of DFA's, there are
standard algorithms that can be found in textbooks, also for
elimination of useless states and minimization of DFA .

In case of NFA also applying the same algorithms is possible, except
that minimization of NFA is PSPACE-Complete.

specially for Negation of NFA, i could simple interchange the start
state and final states of original NFA, that this lead to several
start states and only one final state and a number of dead state. For
avoidance of several start state, i can follow this rules:
a) Adding a new state s ,we make any transition that initiate
from current start states also to initiate from s.
b) State s in above would be new start state.
c) Elimination of any non-reachable state in new NFA.

For intersection of NFA's i can use the standard algorithms that
produce an NFA with |n*m| states in the case of two input NFA with |n|
and |m| states. But a solution resulting less states would be
applying the deMorgon's Law. I could first negate each automata,
compute the union of them, then again negate the final result.

Then I have used the following rule for computing union of NFA's :

If N1 is the NFA associated with regular expression R1 and N2 is the
NFA associated with regular expression R2 ,then we construct the N'

associated with R1|R2 as follows:

a) We add a new state s and mark it as start state of N', then we make
any transition that initiate from start states of N1 and N2 also to
initiate from s.
b) Final states of N1 and N2 would be final states of N'.
c) If any of start state of N1 and N2 is a final state then start
state of N' also would be a final state.
d) We eliminate start state of N1 and N2 and all related transition if
they are useless states in N'.

Chris F Clark

unread,

May 16, 2010, 9:27:28 PM5/16/10

to

klyjikoo <klyj...@gmail.com> writes:

> I think For both intersection and complement of DFA's, there are
> standard algorithms that can be found in textbooks, also for
> elimination of useless states and minimization of DFA .
>
> In case of NFA also applying the same algorithms is possible, except
> that minimization of NFA is PSPACE-Complete.
>
> specially for Negation of NFA, i could simple interchange the start
> state and final states of original NFA, that this lead to several
> start states and only one final state and a number of dead state. For
> avoidance of several start state, i can follow this rules:
> a) Adding a new state s ,we make any transition that initiate
> from current start states also to initiate from s.
> b) State s in above would be new start state.
> c) Elimination of any non-reachable state in new NFA.

When working with NFAs, there is the "small" issue that one doesn't
generally put arrows to a failure state for transitions, one just
omits them and the machine halts without accepting. Epsilon
transitions are another aspect of NFAs that distinguish it from DFAs.
Resolving these issues generally gets one back to a DFA and one has to
face the state explosion issue.

So, while NFAs and DFAs have the same expressive power, they don't do
so with the exact same complexity level. Some things are easier with
NFAs, others with DFAs, and there aren't simple linear algorithms to
convert the two formulations. That holds true with both of those
compared to regular expressions also. If that wasn't true, my current
job would be a lot easier (and I'd have to do something else).

Hope this helps,
-Chris

******************************************************************************
Chris Clark email: christoph...@compiler-resources.com
Compiler Resources, Inc. Web Site: http://world.std.com/~compres
23 Bailey Rd voice: (508) 435-5016
Berlin, MA 01503 USA twitter: @intel_chris
------------------------------------------------------------------------------

Torben �gidius Mogensen

unread,

May 17, 2010, 5:19:11 AM5/17/10

to

Stefan Monnier <mon...@iro.umontreal.ca> writes:

> I'm looking for any work that tries to provide negation and/or
> conjunction operations in regexps without using a DFA (or at least
> while avoiding the DFA state explosion problem).

I assume you mean intersection when you say conjunctipon.

> My search has currently come up empty (actually I pretty much haven't
> seen mention of conjunction and negation in regexps other than in
> Brzozowski's derivatives). So any pointer would be welcome,

You can do intersection of epsilon-free NFAs using the construction
that you use for DFAs (constructing a product automaton). If you just
do a single intersection, this stays quadratic, but multiple
intersections gets exponential. This can't be avoided, as deciding
emptyness of the intersection of a set of regexps is PSPACE complete
in the combined size of the regexps. Since deciding emptyness is
linear in teh size of an automaton, you can't construct an automaton
faster.

Complement of complete DFAs (i.e., DFAs without undefined transitions)
is trivial: You just negate the accepting-state bit. If your DFA is not
complete, it is easy to make it so (just add a dead state and make all
undefined transitions go to this). But I don't think there is a
sub-exponential complement construction for regexps or NFAs.

Torben

klyjikoo

unread,

May 19, 2010, 4:10:56 AM5/19/10

to

Although the same method for complementation of DFA can not be applied
in the case of NFA, but I think the correct implementation can be
achieved in some way tricky. Applying the usual interchanging of final
states in the case of NFA would not work when there are some
acceptable string that is prefix of another acceptable string by NFA.
Usually after checking an input string against an NFA these three
situations can occur:

1) The NFA accepts the string
2) The NFA halts when checking string
3) Both situation 1 and 2 occurred

Comparing these situation it would be possible to simulate NFA
complementation by surrounding NFA with a complementation module that
works as follows: in the situation 1 and 3 of above, module simply
rejects the input string and in situation 2 the module accepts the
string.

The final note that using this implementation it is also possible
to work about intersection of NFA's with applying the deMorgan rule;
however it would need epsilon transitions.

Russ Cox

unread,

May 20, 2010, 12:31:15 AM5/20/10

to

On Thu, May 13, 2010 at 8:42 PM, Stefan Monnier
<mon...@iro.umontreal.ca> wrote:
> I'm looking for any work that tries to provide negation and/or
> conjunction operations in regexps without using a DFA (or at least
> while avoiding the DFA state explosion problem).
>
> My search has currently come up empty (actually I pretty much haven't
> seen mention of conjunction and negation in regexps other than in

> Brzozowski's derivatives). B So any pointer would be welcome,

Torben C gidius Mogensen's mail makes a pretty good case that
one can't avoid a lot of effort in the general case. However,
derivatives are a great way to avoid unnecessary effort, because
that formalism can model the concepts directly (in contrast to the
NFA formalism). http://www.ccs.neu.edu/home/turon/re-deriv.pdf
is a recent paper worth looking at. If I needed to implement support
for those concepts, I would start there.

Russ

Torben �gidius Mogensen

unread,

May 20, 2010, 5:34:05 AM5/20/10

to

klyjikoo <klyj...@gmail.com> writes:

> Although the same method for complementation of DFA can not be applied
> in the case of NFA, but I think the correct implementation can be
> achieved in some way tricky.

> Usually after checking an input string against an NFA these three

> situations can occur:
>
> 1) The NFA accepts the string
> 2) The NFA halts when checking string
> 3) Both situation 1 and 2 occurred
>
> Comparing these situation it would be possible to simulate NFA
> complementation by surrounding NFA with a complementation module that
> works as follows: in the situation 1 and 3 of above, module simply
> rejects the input string and in situation 2 the module accepts the
> string.

You can do that, but the result won't be an NFA. So you get into
trouble when you want to build on top of this, such as (not A) | B or
(not A)*.

The main problem with complementing an NFA is that the semantics of an
NFA is that if there exist a path from initial to final state spelling
out the input, then you accept. Complementing this means that there
does not exist any such path. This change of quantifier means that you
need an automaton that requires all paths that spell out the input to
reach a final state. In a DFA, this is the same (there is only one
path), but in an NFA, it is quite different.

You could use alternating aotomata: These have two kinds of states: OR
states and AND states. In an OR state, you just need one of the
transitions leading out of the state to reach a final state (as in an
NFA), but in an AND state, you need all the transitins out of the state
to lead to a final state.

> The final note that using this implementation it is also possible
> to work about intersection of NFA's with applying the deMorgan rule;
> however it would need epsilon transitions.

This basically shows that complementation is a harder problem than
intersection: With complement, you can get intersection, but you can't
get complement from intersection.

BTW, instead of looking at complement, you could look at set difference.
This has the advantage that you don't need to specify the alphabet
(which you do for complement). Obviously, (with some abuse of LaTeX
notation):

\complement A = \Sigma* \setminus A

A \setminus B = A \intersect (\complement B)
= \complement (\complement A \union B)

so once you have one, you have the other. But set difference can be
easier to work with, e.g., if you use derivatives.

Torben