FIRST set of each variable in a CFG C++ implementation

Here I provide a C++ implementation for finding the FIRST set of each variable in a Context-Free Grammar. This is useful in the parsing phase of compilers.

If w is any string of grammar symbols, then FIRST(w) is the set of terminals that can be found at the start of some string in w, plus ε if the empty string also belongs to w. To compute FIRST(X) for any grammar symbol X, following rules are applied until no more terminals or ε can be added to any FIRST set:

  1. If X is a terminal, then FIRST(X) is { X }.
  2. If X →  ε is a production, then add ε to FIRST(X).
  3. If X is a variable and X → Y1Y2…Yk is a production, then place ‘a’ in FIRST(X) if and only if for some ‘i’, ‘a’ is in FIRST(Yi) and ε is in all of FIRST(Y1), …, FIRST(Yi-1). If ε is in FIRST(Yj) for all j = 1, 2, …, k, then add ε to FIRST(X).

I used the bits of an integer to hold the elements of a FIRST set i.e., an integer represents a set; and simplified the task of identifying the variables and terminals by limiting the set of variables to uppercase alphabets and the set of terminals to lowercase alphabets. This puts an upper bound on the size of grammar this particular implementation can handle to 26 variables and 25 terminals (excluding ε).

Take a look at the C++ implementation.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s