Top-down Parser with Backtracking in C++

Another assignment in my Foundations of Computing class in college was to create a top-down parser that also supports backtracking. My implementation works for any arbitrary context-free grammar. Here’s the program with source code.

The program accepts any arbitrary context-free grammar as an input file, as well as a file that contains the input string to be parsed. When the program starts, it looks for a file called grammar.txt in the program’s directory. The grammar should have the following form:
<L> -> <L>;<A>|<A>
<A> -> <I>=<E>

<E> -> <E>-<T> | <E>+<T> | <T>
<T> -> <T>/<F> | <T>*<F> | <F>
<F> -> <P>^<F> | <P>
<P> -> (<E>) | <I>
<I> -> w|x|y|z

The example grammar above defines a language where any mathematical expression can be formed with the variables w, x, y, z, and the operators +, -, *, /, ^, and =. Multiple expressions can also be separated by a semicolon.

Symbols enclosed in brackets “<…>” are non-terminals. The arrow symbol “->” separates the two sides of the given rewrite rule (production). The “or” symbol “|” is used to separate different choices for a single rewrite rule. Any other symbol is a terminal. The special symbol “<epsilon>” represents a null or empty terminal symbol (sometimes referred to as lambda). Comments can be written in the grammar file by placing a # symbol at the beginning of the line.

Notice that the parser supports left-recursion in the grammar. The program will automatically eliminate all left-recursion from the grammar before parsing any data. After eliminating left-recursion, the program will output a file called grammar_after_left_recursion.txt (for debugging purposes), then another file called grammar_after_unit.txt (after eliminating unit productions), and a file called grammar_after_epsilon.txt (after eliminating any empty productions).

After loading and massaging the grammar, the program will ask you to type in the string to be parsed. Alternatively, you can create a file that contains the string, and pass it as a command line parameter to the program. The program will then state whether or not the data was parsed successfully. If the parsing was successful, the program will display a graphical window with a tree that depicts the parsing performed by the program.

Recursive Descent Parser in C++

One of the later assignments in my Foundations of Computing class in college was to implement a recursive descent parser that outputs assembly code (defined within the assignment). I implemented mine in C++. Here’s the program and the source code.

This program accepts any arbitrary grammar as an input file, as well as a file that contains the input string to be parsed. When the program starts, it looks for a file called grammar.txt in the program’s directory. The grammar should have the following form:
<L> -> <A><L'>
<L'> -> ;<A><L'> | <epsilon><A> -> <I>=<E><#gen>

<E> -> <T><E'>
<E'> -> -<T><#gen><E'> | +<T><#gen><E'> | <epsilon>

<T> -> <F><T'>
<T'> -> /<F><#gen><T'> | *<F><#gen><T'> | <epsilon>

<F> -> <P><F'>
<F'> -> ^<F><#gen> | <epsilon>

<P> -> (<E>) | <I>
<I> -> w|x|y|z

The example grammar above defines a language where any mathematical expression can be formed with the variables w, x, y, z, and the operators +, -, *, /, ^, and =. Multiple expressions can also be separated by a semicolon.

Symbols enclosed in brackets “<…>” are non-terminals. The arrow symbol “->” separates the two sides of the given rewrite rule (production). The “or” symbol “|” is used to separate different choices for a single rewrite rule. Any other symbol is a terminal. The special symbol <epsilon> represents a null or empty terminal symbol (sometimes referred to as lambda). The other special symbol <#gen> is an instruction for the parser to generate code at the point where this symbol appears. Comments can be written in the grammar file by placing a # symbol at the beginning of the line.

The parser supports exactly one non-terminal and no terminals on the LHS (left-hand side) of any rewrite rule. The RHS of any rewrite rule can be arbitrary.

After loading the grammar, the program will ask you to type in the string to be parsed. Alternatively, you can create a file that contains the string, and pass it as a command line parameter to the program. If the data is parsed successfully, the program will generate a file called out.txt that will contain the assembly code generated by the grammar.

Markov Chains in Maple

The following is a procedure for Maple that simulates a Markov chain by executing a specified number of trials:

MarkovSim := proc(Mat, startstage, ntrials)
  local count, nvisit, a, i, j, firstAbsorbingStage, run, stage, sumQuant;
  count:=Vector(ColumnDimension(Mat));
  nvisit:=Vector(ColumnDimension(Mat));
  firstAbsorbingStage:=0;
  for i from 1 to ColumnDimension(Mat) do
    for j from 1 to RowDimension(Mat) do
      if Mat[j,i]>=1.0 then firstAbsorbingStage:=i; break; fi;
    od;
    if firstAbsorbingStage>0 then break; fi;
  od;
  for i from 1 to ColumnDimension(Mat) do count[i]:=0; od;
  for run from 1 to ntrials do
    for i from 1 to ColumnDimension(Mat) do nvisit[i]:=0; od;
    nvisit[startstage]:=1;
    stage:=startstage;
    while stage
      a:=rand()/10.^12;
      sumQuant:=0;
        for i from 1 to RowDimension(Mat) do
        if a<(sumQuant+Mat[i,stage]) and a>sumQuant then
          stage:=i; nvisit[i]:=nvisit[i]+1; break;
        fi;
        sumQuant:=sumQuant+Mat[i,stage];
      od;
    od;
    count:=count+nvisit;
  od;
  printf("Average number of steps in different stages before absorption:\n");
  for i from 1 to firstAbsorbingStage-1 do printf("%g ", count[i]/ntrials); od;
  j:=0; printf("\n");
  for i from 1 to ColumnDimension(Mat) do j:=j+count[i]; od;
  printf("Average number of steps until absorption:\n %g", j/ntrials);
end proc:

DFA (Deterministic Finite Automaton) Builder

Here is a program I wrote for my Foundations of Computing class in college (here’s the source code). The program can be used to visually design and test deterministic finite automata (DFAs) of arbitrary size and complexity.

To create a new DFA from scratch, select “File » New DFA” from the main menu. A new, blank DFA window will be displayed.

The first step in creating the DFA is to define the alphabet over which the DFA will operate. The alphabet can be constructed from any combination of subsets of ASCII characters. From the right-hand portion of the window, select the subset of ASCII characters to include in the alphabet, and click “Add.” For example, you can add “Letter,” which will make the alphabet contain all letters (a-z, uppercase and lowercase), or you can add “Lowercase,” which will make the alphabet contain only lowercase letters. You may also add individual characters from the list, for example if you want the alphabet to consist only of the letters “a” and “b.”

The second step in creating the DFA is to define the possible states that the machine can be in. To begin defining states, go to the “State Transitions” tab. The states are organized in a state transition table, as shown below. To add a state, click “Add State.” For each state, the program will generate fields for all items in the alphabet, as well as the end-of-string character. By double-clicking in a field, you can change the transition rule for that state and character. The first state you add will be the start state of the machine. The states are automatically numbered in ascending order.

At this point the DFA is complete, and ready to accept strings. To start testing strings, go to the “Test String” tab. In the top field, type in the string you want to test, and click “Test.” The program will then list the configuration sequence that results from testing the string, and will finally say whether the string was accepted or rejected.

The program also allows the user to save the DFA to a file. To save the DFA you are currently working on, select “File » Save DFA” from the main menu. You may then select the directory and the file name for the DFA. To open a saved DFA file, select “File » Open DFA” from the main menu, and browse for the DFA file that you want to open.