CS 503 - Advanced Programming I Programming Assignment 1

Programming Assignment 1 - Generating Regular Languages

Advanced Programming I, Spring 2007


Due Date

This assignment is due by 5:30 p.m. on Tuesday, 30 January.

See the assignment turn-in page (last modified on 14 January 2006) for instructions on turning in your assignment.

Background

Regular expressions are a well-known technique for selecting particular strings from a set of string; the strings selected are those matching the pattern described by a regular expression. A less well-known use of regular expressions is to generate a set of strings; the strings in the set are those that would match the pattern given by the regular expression. A set of strings generated by a regular expression is called a regular language.

For this assignment, a regular expression is a sequence of items. Items otherwise unidentified in the syntax are copied as is:

$ echo "a b c" | regexp-gen
"a b c"

$
An option item is a question mark followed by an item; it indicates that the item may or may not appear in strings:
$ echo "a ? 1 b" | regexp-gen
"a 1 b" 
"a b"

$
Note that the question mark precedes the optional item rather than following it as is usually done. This is true for all the regular-expression operations.

A choice item is a bar followed by two items; it indicates that exactly one of the two items may appear in strings:

$ echo "a | 1 2 c" | regexp-gen 
"a 2 c"
"a 1 c"

$
A repetition item is a star followed by an item; it indicates that zero, one or two copies of the item in succession may appear in strings:
$ echo "a * 1 b" | regexp-gen 
a 1 b
a 1 1 b
a b

$
Normally repetition generates infinitely many strings when applied to a non-empty item; quasi regular expressions restrict repetition to the three shortest strings.

Parentheses group a sequence of items into a single item:

$ echo "a ( b ( c ) ) d" | regexp-gen 
a b c d

$ echo "a ? ( b c ) d" | regexp-gen 
a b c d
a d

$ echo "a * ( b c ) d" | regexp-gen 
a b c d
a b c b c d
a d

$ echo  "a | b ( c d )" | regexp-gen
a c d
a b

$
Empty parenthesis and choice make option redundant:
$ echo "a | b ( )" | regexp-gen 
a
a b

$

The Problem

Write a program that reads regular expressions from std-in and writes the language generated by the regular expression to std-out.

Input

Your program should repeatedly read regular expressions from std-in. Each regular expressions occupies a single line; each line contains at most one regular expression. Your program should end when it receives an end-of-stream.

For parsing convenience, adjacent items have at least one space character between them; a space character is any character for which isspace() returns true.

Output

The language generated by the input regular expression should be written to std-out one string per line. If the input regular expression is incorrect, print an error message to std-err and continue reading from std-in; the error message should begin with "! ", an exclamation mark followed by a space.

Testing

My solution regexp-gen is available in the class assignment directory /export/home/class/cs-503/a1. Remember the purpose of this assignment is to implement a correct solution, not to copy the behavior of my solution. If my solution's wrong and you copy the incorrect behavior, you're going to lose points.

You'll also find a simple testing script called tests in the class assignment directory. It's a text file so you can read it using more (for example) or your text editor.

When you run the garbage test, you might want to send the output to /dev/null like so:

$ ./regexp-gen < regexp-gen > /dev/null
This will keep your terminal from being corrupted.


This page last modified on 14 January 2007.