CS 503 - Advanced Programming I Programming Assignment 4

Programming Assignment 4 - Neighbor Trees

Advanced Programming I, Spring 2006


Due Date

This assignment is due by 5:30 p.m. on Friday, 24 March.

See the assignment turn-in page (last modified on 18 January 2006) for instructions on turning in your assignment.

Background

Let w be a word appearing on each of the pages i - n/2 through i + (n - n/2) of a text for non-negative n. The n-neighbors of w are all the words that also appear on each of the pages i - n/2 through i + (n - n/2).

Given a word w in a text, the n-neighbor tree T of w is a tree with w at its root and has the following properties true for every subtree of T:

  1. The children of the root of the subtree contain the words that are n-neighbors of the word at the root of the subtree.

  2. The children of the root of the subtree contain the words that do not appear anywhere else in T.
Property 2 implies that the final shape of the tree depends on how the tree was constructed.

The Problem

Write a program that generates n-neighbor trees found in some text. The name of the file containing the text is given on the command line; the specifications for the n-neighbor trees to construct are are read from std-in; the program displays the n-neighbor trees by calling a procedure described below.

The program should accept one or two command-line arguments. If there is one command-line argument, it should be taken as the name of the file containing the text to be analyzed. If there are two command-line arguments, the first should be taken as the name of a file containing stop words and the second should be taken as the name of the file containing the text to be analyzed.

A stop word is a word that should be ignored if it appears in the text being analyzed. If no stop words are given, all words in the text should be part of the analysis.

For example, in the command

$ nn-tree wealth.txt
wealth.txt is the name of the file containing the text to be analyzed. In the command line
$ nn-tree ignore origins.txt
ignore is the name of the file containing stop words and origins.txt is the name of the file containing the text to be analyzed.

Input

There are three kinds of inputs: the text to be analyzed, the n-neighbor tree specifications read from std-in, and stop words to use.

The Text

The format of the input text is identical to that of Assignment 3.

Tree Specifications

A tree specification has the form
word n
where word is the word that should appear at the root of the n-neighbor tree created and n is a non-negative integer. For example, the specification
? nail 3
Specifies a 3-neighbor tree for the word "nail".

Stop Words

The stop-word file is a text file; each word in the file should be taken as a stop word. The the file stop-words in the /export/home/class/CS-503/a4 assignment directory contains an example stop-word file.

Output

The prompt "? " (that is, a question mark followed by a space) should be output to std-out each time the program is ready for a query.

Under non-error conditions, the program outputs an n-neighbor tree by calling the procedure

void show_tree(const tree_node * const);
Passing a pointer to the root of the tree or 0 if there is no tree. The file show-tree.cc in the /export/home/class/CS-503/a4 assignment directory contains the code for show_tree(). This file should be compiled and linked in to your code.

The definition for tree-node used by show_tree() can be found in the file tree-node.h, which can also be found in the assignment directory. You may extend this definition any way you see fit, but you should not change the declarations for the existing member functions (tree_node::child_count(), tree_node::get_child(), and tree_node:: get_word()).

Error messages should be brief and informative, and written to std-out (not std-err) preceded by "! " (that is, an exclamation point followed by a space).

Testing

You can continue to use the Assignment 3 test files for this assignment; they can be found in the Assignment 3 directory.


This page last modified on 22 March 2006.