# NFA to DFA conversion C++ implementation

I needed a C++ implementation of NFA to DFA conversion for my compilers class and could not find a simple implementation on the web so I thought I would provide one.

A DFA (Deterministic Finite Automaton) is a finite state machine where from each state and a given input symbol, the next possible state is uniquely determined. On the other hand, an NFA (Non-Deterministic Finite Automaton)  can move to several possible next states from a given state and a given input symbol. However, this does not add any more power to the machine. It still accepts the same set of languages, namely the regular languages. It is possible to convert an NFA to an equivalent DFA  using the powerset construction.

The intuition behind this scheme is that an NFA can be in several possible states at any time. We can simulate it with a DFA whose states correspond to sets of states of the underlying NFA.

Take a look at the C++ implementation. Note however that it is not designed for performance. It is my first attempt at a simple, readable and easy-to-understand implementation and I hope I succeeded in that regard.

## 40 thoughts on “NFA to DFA conversion C++ implementation”

1. gillerst says:

Hi, I don’t understand how to write input file “NFA.txt”.
Please teach me how to input an NFA into the program.

• Here’s the format of NFA.txt:
N M
F a1 a2 … af
T
s1 y1 T1 t1 t2 … tt1
s2 y2 T2 t1 t2 … tt2
:
The first line contains two integers N & M, representing the number of states and the number of input symbols respectively. The states are implicity 0, 1, …, N-1 and the input symbols are 1, 2, …, M. The integer 0 is used to represent epsilon.

The second line starts with an integer F, denoting the number of final states in the NFA, followed by F integers which represent the final states.

The third line contains an integer T denoting the number of transitions / no. of lines following this one in NFA.txt.

T lines follow. Each line represents a transition and starts with three integers, denoting the previous state si, the input symbol yi and the no. of states ti the NFA goes to from the previous state on that input symbol. ti integers follow representing the next states NFA can go to from the previous state si on the input symbol yi.

For example, NFA in example 1 on this site: http://www.cs.odu.edu/~toida/nerzic/390teched/regular/fa/nfa-2-dfa.html is represented as:
4 2
2 0 1
4
0 1 2 1 2
1 1 2 1 2
2 2 2 1 3
3 1 2 1 2
The only critical thing is that the symbols a & b have been relabeled as 1 & 2 respectively.
Hope this helps.

2. tammna says:

Thanks for this code. It helped me a lot. Have you worked on the conversion of regular expression into NFA in c++. Even some idea from where I can get this would be helpful.

• Converting regular expression to NFA is fairly easy. I might write a post on it in the near future but I’m a little swamped right now.
Anyway, you could learn it from Algorithms fourth edition by Robert Sedgewick and Kevin Wayne. They provide the code for regex to NFA conversion. If you would like to go deep, you could join the Algorithms II course on Coursera, taught by none other than Robert Sedgewick himself.
Hope this helps.

• tammna says:

The given code is in java , which is fairly new for me , but thanks a lot anyway!

3. Prajwal G says:

Please teach us the output format also

• “DFA.txt” will have the following format:
N M
F a1 a2 … af
s1 y1 t1
s2 y2 y2
:
:
The first line contains two integers N and M, the number of states in the equivalent DFA and the number of moves (alphabet size). M will have the same value as that for the underlying NFA, except that 0 (epsilon moves) won’t be used so the available moves will be 1, 2, …, M.

The second line starts with an integer F, denoting the number of final states in the DFA followed by F final states, a1, a2, …, af (0 <= ai <= N-1).

Each next line until the end of file contains 3 integers (si yi ti), denoting a transition from previous state si to next state ti on input yi (0 <= si, ti <= N-1 and 1 <= yi <= M).

For example, NFA, in example 1 on this site: http://www.cs.odu.edu/~toida/nerzic/390teched/regular/fa/nfa-2-dfa.html, represented as:
4 2
2 0 1
4
0 1 2 1 2
1 1 2 1 2
2 2 2 1 3
3 1 2 1 2

has the equivalent DFA:
4 2
3 0 1 3
0 1 1
0 2 2
1 1 1
1 2 3
2 1 2
2 2 2
3 1 1
3 2 2

where a and b are replaced with 1 and 2 respectively.
Hope this helps!

4. alban says:

Thank you for the code, please can you give the screenshots of a simulation?? Why the input string should be 12 instead of ab?? can we fix this problem in the code, so it is much more easier for the tester

• It’s a console program. There’s nothing to take screenshots of. It reads in the representation of an NFA from a file “NFA.txt” and writes out the representation of the corresponding DFA to a file “DFA.txt”. Nothing appears on the screen as such. You need to write the representation of an NFA in a file “NFA.txt” stored in the same folder as the program executable.

For an example, please take a look at my reply on comment by “Pragwal G”.

I made a design decision to use numbers to represent states and input symbols. It made indexing into arrays easier. Anyway, it’s very easy to modify the code to accept alphabets to represent states and input symbols. You can easily create a mapping a->1, b->2, … by adding (1-‘a’) to an alphabet. The reverse mapping can be created by adding (‘a’-1) to a number.

If you want to submit NFA with alphabets representing states/symbols, you can map alphabets to numbers while reading the file. Similarly, if you want to receive DFA with alphabets representing states/symbols, you can map numbers to alphabets while writing the file. Reply if you encounter any problem.

5. alban says:

Hey thank you very very much for the quick reply. I have to submit this homework at midnight and now is 5 pm. Thank you for the code. I used your code for reading the NFA file, converting it to DFA, and then used the other class DFA to display outputs “Accept or Reject”. Everything works fine. But i dont know how to test it. For example, i want to give the representation of an NFA which contains the empty transitions, and then want to test the program on different inputs.

Sorry for asking you again, to test the program with alphabets e.g abb, abba, aabbb etc what piece of code should i change?? or i just need to change the NFA file?

Im just into pressure of the homework, and i also have to report it and explain the flow execution.
Thank you again for the code, and if you can please reply me quick because i have to submit the homework within 5 hours.

Thank you

• It would be much easier to write a script, in whichever language you are comfortable with, to translate the NFA representation you want to accept into the representation my code accepts. Just write a python script to translate ur NFA containing alphabets into an NFA containing numbers and then run my code on it.

• If you just want to simulate the NFA on different inputs, you should take a look at this post: . If you want to first convert it to DFA and then simulate that DFA on different inputs, then you can use this program (after representation translation with an appropriate script) to create the DFA and then simulate that DFA on different inputs. For that, look at this post: . I’m sorry if I’m not being very helpful. I too have quite a lot on my plate.

6. alban says:

If you can, please email me the code changed so that it accepts alphabet input strings. My homework is about decidability,and i have to admit i am not very good at cpp.

I would really appreciate it.

7. alban says:

on input 01 the DFA outputs accepted, on input 02 outputs rejected and on every other outputs it displays wrong input and then Accepted. why does this happen?

i am using the same file representation as your code and same automaton.

• Did you implement the DFA yourself or are you using my implementation?

• When working with NFA, I’ve chosen 0 to represent a null move (epsilon move). When working with DFA, valid input symbols are from 1 through M. 0 is not a valid input symbol in DFA. Don’t give strings containing 0 to the DFA and give spaces between those integers. All this was explained in comments at the starting of the code. Hope this helps! Reply if you face any problem.

8. alban says:

im using your implementation, with the DFA file that is returned from the conversion

• Please see my other comment. 0 is not a valid input symbol for the DFA. It was used to represent epsilon moves in NFA. A DFA does not have epsilon moves.

9. alban says:

Yes i got it, and NFA.txt is same as the one you gave, and the output DFA i get is the same as the one in this page above. But when i test it , it does’t accepts anything else except everything and also displays the message “Invalid input”. Can you give me some rejected and accepted input strings for testing??? Or am i in pressure and everything just there ??!! ☺

• I just tried it and it works. Here’s a sample execution:
Enter a string (‘.’ to exit) : 1
String accepted.
Enter a string (‘.’ to exit) : 1 2
String accepted.
Enter a string (‘.’ to exit) : 2
String rejected.
Enter a string (‘.’ to exit) : 1 2 1 2
String accepted.
Enter a string (‘.’ to exit) : 1 2 1
String accepted.
Enter a string (‘.’ to exit) : 1 2 2
String rejected.
Enter a string (‘.’ to exit) : .

Make sure to give spaces between integers. Input symbols are integers from 1 through M. Don’t give anything else.

10. alban says:

for example
Test a string for acceptance …. 1122
Invalid input symbol 1122
String accepted

This is what i get for every string except from 1 and 2 alone

• You are not giving spaces between integers. Enter string: 1 1 2 2 (with spaces). It gets rejected btw.

11. alban says:

oh my god ……………

Thank you very very much for your time, and god bless you.
Sorry for being such under pressure.

12. alban says:

Sorry , one last question. Which line of code recognizes the start state from the file?

• 0 is the start state. It’s mentioned in the comments in all files. That thing is hard-coded so there’s no need to recognize it.

13. can i ask a question sir? I m not very familiar with c++, and can hardly differentiate between NFAstate and NFAstates in your code. Can you please write a little explanation about difference?

• NFAstate represents a single NFA state. It has a transition table which denotes whether we have a transition from this state to some other state, given an input symbol.

NFAstates is an array of all the NFA states; just a collection of them.

14. also if possible i’d like to ask u for the brief explanation of epsilonClosure functions please

• Epsilon closure of a state (or a set of states) is the set of states which can be reached by using epsilon moves, starting from that (or those) state(s).

The code has two overloads to calculate epsilon closure, one for a single state and one for a set of states. The latter basically calls the former for each of the component states.

We pass in a bitset (you can think of it as a boolean array), where a set bit indicates that the state corresponding to its index falls in the epsilon closure, and the function is supposed to fill in that bitset. For a given state, we check which states are reachable from it using epsilon moves, set their corresponding bit and recursively call the function for those states.

• I m really grateful,thanks for clear explanation.

• If you don’t mind, can u please explain me the usage of the following bitset “bitset transitions[MAX_ALPHABET_SIZE];”, i have troubles with understanding usage of this part “[MAX_ALPHABET_SIZE]”

• Each state can have transitions on any input symbol to any number of next states. MAX_ALPHABET_SIZE is the number of input symbols.

If you are familiar with graph representations, then it’s basically the adjacency matrix representation. Alternatively, we could’ve used an adjacency list representation, where we could map each input symbol to the list of next states.

15. jainam says:

i am including 0 as an input symbol as my NFA to be converted in a DFA also has epsilon moves.However it is not working. please could you help.

• Provide more details on what is your input. As described in comments in the code, 0 is reserved for epsilon moves in input NFA and is not an input symbol in the output DFA.

16. I am including 0 as an input symbol because the NFA has epsilon moves. However it is not displaying the correct DFA. Please could you help.

• Please provide more details. There’s no way to help if I don’t know what’s your input NFA and what error you are getting.

17. michal zaso says:
• To convert this NFA to the representation required by the program, consider states 1 through 4 to be states 0 through 3 respectively, and input symbols 0 and 1 to be 1 and 2 respectively. Nfa.txt file would then be:
4 2
2 2 3
6
0 0 1 2
0 1 1 1
1 2 2 1 3
2 0 1 1
2 1 1 3
3 1 1 2

If you run the program, it would create the following DfA.txt file:
5 2
4 0 1 2 3
0 1 1
0 2 1
1 1 2
1 2 1
2 1 3
2 2 1
3 1 2
3 2 4
4 1 4
4 2 4

Draw these two automata to better understand how they are equivalent.