NFA to DFA

95 views
Skip to first unread message

rpm

unread,
Mar 15, 2011, 5:27:03 AM3/15/11
to proglang-course-2011
Hi,
(a|b)* a (a|b)
Teacher said a very easy technique for DFA.Can any body describe the
basic rule for it ?
It would be a great help.

/A

Hugo

unread,
Mar 15, 2011, 9:47:30 AM3/15/11
to proglang-course-2011
Would be great to see how to get the DFA out from the NFA of (a|b)* a
(a|b) .

I just tried the technique on p 155 of the course book but I haven't
got the same answer as in the solutions (http://www.cse.chalmers.se/
edu/course/TIN321/exams/solutions-plt-2008-0.txt). Maybe I did it
wrong...

Maybe we are we should go to the DFA straight from the regexp? Are we
dumb taking the deviation this through the NFA?

klondike

unread,
Mar 15, 2011, 1:09:37 PM3/15/11
to proglang-c...@googlegroups.com
The technique I recall (the one I learnt in Spain is as follows).

For each state i of the NFA create a new one containing each of the
possible states sets from the nfa.

(ie, on a two state NFA with states 1,2 you get (1), (2) and (1,2)).

Now for each connection from states i to j you have to create a new
one from every state containing i to each state containing j.

Finally mark any state containing a final state in the original
automaton as final and mark the state conatining only the initial
state as initial.

The funny thing is that this causes a state explosion showing that
although not more descriptive NFA create more eficient automatons :D

Arnar Birgisson

unread,
Mar 15, 2011, 1:27:32 PM3/15/11
to proglang-c...@googlegroups.com, klondike
On Tue, Mar 15, 2011 at 18:09, klondike <franxi...@gmail.com> wrote:
> The technique I recall (the one I learnt in Spain is as follows).
>
> For each state i of the NFA create a new one containing each of the
> possible states sets from the nfa.
>
> (ie, on a two state NFA with states 1,2 you get (1), (2) and (1,2)).
>...

> The funny thing is that this causes a state explosion showing that
> although not more descriptive NFA create more eficient automatons :D


Yes, this is (for me) the easiest way of constructing a DFA, by doing
it from the NFA. However, as you point out you'll have to write many
many states if you start that way.

I start with the NFA start state, and then follow the edges to the
next possible states, and create the states in the DFA on demand.

E.g. if you have the following NFA with states 1,2, 3 and 4, with 1 as
the start state:

1 -- a --> 2
1 -- a --> 3
1 -- b --> 3
2 -- c --> 3
3 -- c --> 4

Then we start by making a state {1} in the DFA. From it we can follow
the a edge to both 2 and 3, so we create the state {2,3} and make an
"a" arrow from state {1} to it. Then we follow the b edge from 1 and
find that we can only land in 3, so we create a state {3} in the DFA
(and draw the edge). Now we have:

{1} -- a --> {2,3}
{1} -- b --> {3}

Looking at the NFA, I can see that if I'm in either state 2 or 3, the
only edges are c edges, and following (all of) the will put me in
state 3 or 4. So, I create state "3,4" and add the edge

{2,3} -- c --> {3,4}

From state 3 we can get only get to 4 via a c edge, so we add the
state {4} and edge

{3} -- c --> {4}

Finally we see if that we are in either state 3 or 4 (need to consider
this since there is a node {3,4} in our DFA), then the only arrow we
can follow is a c arrow to state 4. So we add also

{3,4} -- c --> {4}

Finally, to make sure, check every state in your DFA, and look at the
corresponding sets of states in the NFA to see if there are any
outgoing edges (from the _set_) that are not represented in the DFA.
If so, you must add them.

(Note that the above state machines don't correspond to any regular
expression, it's just a made up example)

You have to be careful also with epsilon edges. Whenever we can enter
a state in the NFA which has an outgoing epsilon edge, we must include
the possiblity of following that edge also. E.g. for the NFA:

1 -- a --> 2 -- eps --> 3

The corresponding DFA is

{1} -- a --> {2,3}

A state in the DFA is an accept state if any of the states it
represents in the NFA is an accept state.

hope this helps,
Arnar

klondike

unread,
Mar 15, 2011, 1:59:00 PM3/15/11
to Arnar Birgisson, proglang-c...@googlegroups.com
2011/3/15 Arnar Birgisson <arn...@gmail.com>:

> You have to be careful also with epsilon edges. Whenever we can enter
> a state in the NFA which has an outgoing epsilon edge, we must include
> the possiblity of following that edge also. E.g. for the NFA:
>
> 1 -- a --> 2 -- eps --> 3
>
> The corresponding DFA is
>
> {1} -- a --> {2,3}
>
> A state in the DFA is an accept state if any of the states it
> represents in the NFA is an accept state.
>
> hope this helps,
> Arnar
>
Well in Spain I recall calling the NFA with epsilon edges in another
way (amongst other things because we studied them before the othe
NFAs). So I thought he was refering to the same quin of NFAs.

David

unread,
Mar 15, 2011, 3:40:44 PM3/15/11
to proglang-course-2011
> The funny thing is that this causes a state explosion showing that
> although not more descriptive NFA create more eficient automatons :D

Just want to point out that NFA:s most of the time is more efficient
considering space complexity but almost
never considering the time complexity. DFA:s is always linear in the
size of the input (I may be wrong about this)
while NFA can be exponential or worse.

For example there is an old example considering perl:s implementation
of regexps vs for example grep which uses
DFA:s to match strings.

$ cat test.txt
aaaaaaaaaaaaaaaaaaaaaaaaaa

$ time perl -e "print "`cat test.txt`" =~ /^((a?){26})(a{26})$/"
aaaaaaaaaaaaaaaaaaaaaaaaaa
real 0m15.330s
user 0m15.320s
sys 0m0.000s

$ time grep -E "^((a?){26})(a{26})$" test.txt
aaaaaaaaaaaaaaaaaaaaaaaaaa

real 0m0.002s
user 0m0.000s
sys 0m0.000s

Try adding one 'a' to the text file and updating the regex in both
cases to look for 27 a:s instead, you will notice
that the perl version will need more then twice the amount of time to
match it while grep still will be very fast.

Hugo

unread,
Mar 15, 2011, 4:17:57 PM3/15/11
to proglang-course-2011
Thanks guys!

I will try it again.

Just to make sure I work on the right NFA of (a|b)*a(a|b). Is this the
one?
a b
0 {0,1} 0
1 2 2
2 - -

I not sure how to describe an acceptance state in the table. In my
diagram, state 2 does not have any outbound arrows. I know that
acceptance states in DFA's have outbound arrows though.

Arnar Birgisson

unread,
Mar 15, 2011, 4:30:27 PM3/15/11
to proglang-c...@googlegroups.com
There can be accepting states with outbound arrows, as well as stuck
states (states with now outbound arrows) that are not accepting.

I'd suggest you mark the accepting states in the table e.g. with an
asterisk or something, as long as you explain your notation. If you
draw the state machine on the exam, it is conventional to mark the
start state with an incoming arrow (not from any state) and an
accepting state with a double stroke circle.

cheers,
Arnar

fatemeh lashkari

unread,
Mar 15, 2011, 5:18:55 PM3/15/11
to proglang-c...@googlegroups.com
Hi
Is there any one know the DFA for question 4 of 2008-1 exam?

Write a regular expression that recognizes a sequence of tokens that are
of one of these two forms, and may be separated by spaces (and doesn’t
recognize anything else):
string literal: begins and ends with a double quote ", between which
it can contain any characters except the double quote itself (so there
are no escapes)
comment: begins with /* and ends with */; in between, any characters
except the sequence */ are allowed; the mark /* cannot be inside a
string literal

Aarne Ranta

unread,
Mar 16, 2011, 4:29:11 AM3/16/11
to proglang-c...@googlegroups.com
Dear All,

Thanks for the lively discussion of automata! My best advice is that the book indeed has an excellent coverage of all the issues concerning conversion algorithms, complexity, etc. This is traditionally an essential body of knowledge for all computer scientists. The lexing tools have made it less important in practice for compiler writers - but surely it is still valuable to know how the things work under the hood.

As for the DFA conversion below, you might remember that I got the same answer in the lecture both by informal reasoning and by the subset construction. The difference *can* result from using a different NFA as starting point. Or else it is very easy to do something wrong when following the algorithm manually. The first things to check then are (0) is the only difference the naming of states? (1) is the automaton really deterministic? (2) is it minimal? (3) does it really implement the same language?

With best regards

  Aarne.

David

unread,
Mar 16, 2011, 4:46:55 AM3/16/11
to proglang-course-2011
On Mar 15, 10:18 pm, fatemeh lashkari <fl.5...@gmail.com> wrote:
> Write a regular expression that recognizes a sequence of tokens that are
> of one of these two forms, and may be separated by spaces (and doesn’t
> recognize anything else):
> string literal: begins and ends with a double quote ", between which
> it can contain any characters except the double quote itself (so there
> are no escapes)
> comment: begins with /* and ends with */; in between, any characters
> except the sequence */ are allowed; the mark /* cannot be inside a
> string literal

This is non-trivial. Even bnfc gets this one wrong. Consider the
following grammar:

Program. Prog ::= [Ident] ;
separator Ident "" ;
comment "/*" "*/" ;

bnfc will produce a regular expression for comments working the same
way as this: ^/\*([^*]|\*[^/])\*+/$

This grammar will accept the input "/* **/ */ foo" but not "/* ***/ */
foo". It should fail on both.

Is it even possible to match comments on this form correctly without
look-ahead?

David

unread,
Mar 16, 2011, 6:01:43 AM3/16/11
to proglang-course-2011
> Is it even possible to match comments on this form correctly without
> look-ahead?
http://ostermiller.org/findcomment.html

Hugo

unread,
Mar 16, 2011, 7:06:05 AM3/16/11
to proglang-course-2011
I also wonder how to represent "all characters EXCEPT x,y,z"...

In this question we wanted to match everything but (/*|*/) and later
everything but ".

Also, do we refer to all ASCII characters or all alphanumeric
characters?

Hugo

Hugo

unread,
Mar 16, 2011, 7:05:26 AM3/16/11
to proglang-course-2011

Hugo

unread,
Mar 16, 2011, 7:03:23 AM3/16/11
to proglang-course-2011
I also wondering how to write "all characters (do we talk about all
ASCII or just all alphabetic characters...?) EXCEPT (*/|/*).

Hugo

unread,
Mar 16, 2011, 7:10:29 AM3/16/11
to proglang-course-2011
My current guess I just, supposing we mean all alphabetic characters,
match [:alpha:] or what we need.

( /* ([a-zA-z|space]) */ | " ([a-zA-z|space]) " )*

I.e either alternative 1 or alternative 2 any number of times: (alt1
| alt2) *

fatemeh lashkari

unread,
Mar 16, 2011, 7:43:51 AM3/16/11
to proglang-c...@googlegroups.com
My guess is base on all solution

/* (space | char - * | * (char -/)) * ** */ | " (space |char - " | char - /*) * "

Hugo

unread,
Mar 16, 2011, 12:06:28 PM3/16/11
to proglang-course-2011
Can we have an official answer from course folks? :)

On 16 mar, 12:43, fatemeh lashkari <fl.5...@gmail.com> wrote:
> My guess is base on all solution
>
> /* (space | char - * | * (char -/)) * ** */ | " (space |char - " | char -
> /*) * "
>

Aarne Ranta

unread,
Mar 16, 2011, 4:00:28 PM3/16/11
to proglang-c...@googlegroups.com
Hello,

One "official answer" (in any case, an accepted one), is best given by a sequence of definitions:

  space = (' ' | '\t' | '\n')
  code = (token space*) *
  token = string | comment
  string = '"' (char - '"')* '"'
  comment = '/' '*' ((char - '*') | '*' (char - '/'))* ('*')+ '/' 

Here char means, as in BNFC, any 8-bit ASCII character.

Notice the use of negation (-) between regexps, not supported by all tools but given in the lectures. Also notice that the regexp corresponds to a very non-deterministic automaton, but this is not an issue in the way the question was formulated.

Also thanks for the pointer to http://ostermiller.org/findcomment.html, David !

Regards

  Aarne.

klondike

unread,
Mar 17, 2011, 3:16:48 PM3/17/11
to proglang-c...@googlegroups.com
Wanted to do this before the exam but lacked time to do so: If you
want to make sure there are not /* markings inside the strings just do
this:

space = (' ' | '\t' | '\n')
code = (token space*) *
token = string | comment

string = '"' ((char - '"'-'*')| '*' (char - '"'-'/')))* '"'


comment = '/' '*' ((char - '*') | '*' (char - '/'))* ('*')+ '/'

Hope that helps although the notation may not be correct.

Reply all
Reply to author
Forward
0 new messages