Top-down Parser with Backtracking in C++

Another assignment in my Foundations of Computing class in college was to create a top-down parser that also supports backtracking. My implementation works for any arbitrary context-free grammar. Here’s the program with source code.

The program accepts any arbitrary context-free grammar as an input file, as well as a file that contains the input string to be parsed. When the program starts, it looks for a file called grammar.txt in the program’s directory. The grammar should have the following form:
<L> -> <L>;<A>|<A>
<A> -> <I>=<E>

<E> -> <E>-<T> | <E>+<T> | <T>
<T> -> <T>/<F> | <T>*<F> | <F>
<F> -> <P>^<F> | <P>
<P> -> (<E>) | <I>
<I> -> w|x|y|z

The example grammar above defines a language where any mathematical expression can be formed with the variables w, x, y, z, and the operators +, -, *, /, ^, and =. Multiple expressions can also be separated by a semicolon.

Symbols enclosed in brackets “<…>” are non-terminals. The arrow symbol “->” separates the two sides of the given rewrite rule (production). The “or” symbol “|” is used to separate different choices for a single rewrite rule. Any other symbol is a terminal. The special symbol “<epsilon>” represents a null or empty terminal symbol (sometimes referred to as lambda). Comments can be written in the grammar file by placing a # symbol at the beginning of the line.

Notice that the parser supports left-recursion in the grammar. The program will automatically eliminate all left-recursion from the grammar before parsing any data. After eliminating left-recursion, the program will output a file called grammar_after_left_recursion.txt (for debugging purposes), then another file called grammar_after_unit.txt (after eliminating unit productions), and a file called grammar_after_epsilon.txt (after eliminating any empty productions).

After loading and massaging the grammar, the program will ask you to type in the string to be parsed. Alternatively, you can create a file that contains the string, and pass it as a command line parameter to the program. The program will then state whether or not the data was parsed successfully. If the parsing was successful, the program will display a graphical window with a tree that depicts the parsing performed by the program.

Recursive Descent Parser in C++

One of the later assignments in my Foundations of Computing class in college was to implement a recursive descent parser that outputs assembly code (defined within the assignment). I implemented mine in C++. Here’s the program and the source code.

This program accepts any arbitrary grammar as an input file, as well as a file that contains the input string to be parsed. When the program starts, it looks for a file called grammar.txt in the program’s directory. The grammar should have the following form:
<L> -> <A><L'>
<L'> -> ;<A><L'> | <epsilon><A> -> <I>=<E><#gen>

<E> -> <T><E'>
<E'> -> -<T><#gen><E'> | +<T><#gen><E'> | <epsilon>

<T> -> <F><T'>
<T'> -> /<F><#gen><T'> | *<F><#gen><T'> | <epsilon>

<F> -> <P><F'>
<F'> -> ^<F><#gen> | <epsilon>

<P> -> (<E>) | <I>
<I> -> w|x|y|z

The example grammar above defines a language where any mathematical expression can be formed with the variables w, x, y, z, and the operators +, -, *, /, ^, and =. Multiple expressions can also be separated by a semicolon.

Symbols enclosed in brackets “<…>” are non-terminals. The arrow symbol “->” separates the two sides of the given rewrite rule (production). The “or” symbol “|” is used to separate different choices for a single rewrite rule. Any other symbol is a terminal. The special symbol <epsilon> represents a null or empty terminal symbol (sometimes referred to as lambda). The other special symbol <#gen> is an instruction for the parser to generate code at the point where this symbol appears. Comments can be written in the grammar file by placing a # symbol at the beginning of the line.

The parser supports exactly one non-terminal and no terminals on the LHS (left-hand side) of any rewrite rule. The RHS of any rewrite rule can be arbitrary.

After loading the grammar, the program will ask you to type in the string to be parsed. Alternatively, you can create a file that contains the string, and pass it as a command line parameter to the program. If the data is parsed successfully, the program will generate a file called out.txt that will contain the assembly code generated by the grammar.