Here I provide a C++ implementation for finding the FIRST set of each variable in a Context-Free Grammar. This is useful in the parsing phase of compilers.

If *w* is any string of grammar symbols, then FIRST(*w*) is the set of terminals that can be found at the start of some string in *w*, plus ε if the empty string also belongs to *w*. To compute FIRST(X) for any grammar symbol X, following rules are applied until no more terminals or ε can be added to any FIRST set:

- If X is a terminal, then FIRST(X) is { X }.
- If X → ε is a production, then add ε to FIRST(X).
- If X is a variable and X → Y
_{1}Y_{2}…Y_{k} is a production, then place ‘a’ in FIRST(X) if and only if for some ‘i’, ‘a’ is in FIRST(Y_{i}) and ε is in all of FIRST(Y_{1}), …, FIRST(Y_{i-1}). If ε is in FIRST(Y_{j}) for all j = 1, 2, …, k, then add ε to FIRST(X).

I used the bits of an integer to hold the elements of a FIRST set i.e., an integer represents a set; and simplified the task of identifying the variables and terminals by limiting the set of variables to uppercase alphabets and the set of terminals to lowercase alphabets. This puts an upper bound on the size of grammar this particular implementation can handle to 26 variables and 25 terminals (excluding ε).

Take a look at the C++ implementation.

28.635308
77.224960

### Like this:

Like Loading...

*Related*