Another assignment in my Foundations of Computing class in college was to create a top-down parser that also supports backtracking. My implementation works for any arbitrary context-free grammar. Here’s the program with source code.
The program accepts any arbitrary context-free grammar as an input file, as well as a file that contains the input string to be parsed. When the program starts, it looks for a file called grammar.txt
in the program’s directory. The grammar should have the following form:
<L> -> <L>;<A>|<A>
<A> -> <I>=<E>
<E> -> <E>-<T> | <E>+<T> | <T>
<T> -> <T>/<F> | <T>*<F> | <F>
<F> -> <P>^<F> | <P>
<P> -> (<E>) | <I>
<I> -> w|x|y|z
The example grammar above defines a language where any mathematical expression can be formed with the variables w, x, y, z, and the operators +, -, *, /, ^, and =. Multiple expressions can also be separated by a semicolon.
Symbols enclosed in brackets “<…>” are non-terminals. The arrow symbol “->” separates the two sides of the given rewrite rule (production). The “or” symbol “|” is used to separate different choices for a single rewrite rule. Any other symbol is a terminal. The special symbol “<epsilon>” represents a null or empty terminal symbol (sometimes referred to as lambda). Comments can be written in the grammar file by placing a # symbol at the beginning of the line.
Notice that the parser supports left-recursion in the grammar. The program will automatically eliminate all left-recursion from the grammar before parsing any data. After eliminating left-recursion, the program will output a file called grammar_after_left_recursion.txt
(for debugging purposes), then another file called grammar_after_unit.txt
(after eliminating unit productions), and a file called grammar_after_epsilon.txt
(after eliminating any empty productions).
After loading and massaging the grammar, the program will ask you to type in the string to be parsed. Alternatively, you can create a file that contains the string, and pass it as a command line parameter to the program. The program will then state whether or not the data was parsed successfully. If the parsing was successful, the program will display a graphical window with a tree that depicts the parsing performed by the program.