See the assignment turn-in page (last modified on 18 January 2006) for instructions on turning in your assignment.
The following steps build a concept graph G from a seed word w:
show-graphs()
procedure described below.
The program should accept one or two command-line arguments. If there is one command-line argument, it should be taken as the name of the file containing the text to be analyzed. If there are two command-line arguments, the first should be taken as the name of a file containing stop words and the second should be taken as the name of the file containing the text to be analyzed.
For example, in the command
$ cgraph wealth.txt
wealth.txt is the name of the file containing the text to be analyzed. In the command line
$ cgraph ignore origins.txt
ignore is the name of the file containing stop words and origins.txt is the name of the file containing the text to be analyzed.
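The argument rule above can be sketched as follows. This is only an illustration: the cgraph_args structure and the parse_args() function are hypothetical names, not part of the assignment code, and the sketch assumes argc and argv arrive unchanged from main().

```cpp
#include <optional>
#include <string>

// Hypothetical holder for the two file names; not part of the assignment code.
struct cgraph_args {
  std::string stop_words_file;  // empty when no stop-word file was given
  std::string text_file;
};

// Returns the parsed arguments, or std::nullopt when the argument count is
// neither one nor two. argv[0] is the program name, as in main().
std::optional<cgraph_args> parse_args(int argc, const char * const argv[]) {
  if (argc == 2)
    return cgraph_args{"", argv[1]};        // text file only
  if (argc == 3)
    return cgraph_args{argv[1], argv[2]};   // stop words, then text file
  return std::nullopt;
}
```

A caller would typically print an error message and exit when parse_args() returns std::nullopt.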
The concept graph is weighted; the edge between two nodes has a weight equal to the non-negative integer n, where n is the size of the largest n-neighborhood containing both words associated with the two nodes. For example, if "Frank" and "Lucy" appear as both 3-neighbors and 2-neighbors, then the edge between the nodes associated with "Frank" and "Lucy" has a weight of 3, the size of the largest n-neighborhood containing both words.
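One possible way to keep edge weights while scanning neighborhoods is sketched below. The edge_weights table and record_neighborhood() are hypothetical names chosen for illustration; the assignment does not prescribe a representation, and how n-neighborhoods are detected is outside this sketch.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <utility>

// Hypothetical edge table: maps an unordered word pair to its edge weight.
using edge_weights = std::map<std::pair<std::string, std::string>, unsigned>;

// Record that words a and b were seen together in an n-neighborhood,
// keeping only the largest n observed so far for the pair.
void record_neighborhood(edge_weights & edges, std::string a, std::string b,
                         unsigned n) {
  if (b < a) std::swap(a, b);                      // one canonical order per pair
  unsigned & w = edges[std::make_pair(a, b)];      // starts at 0 on first use
  w = std::max(w, n);
}
```

With this sketch, recording "Frank" and "Lucy" as 3-neighbors and then as 2-neighbors leaves a single edge of weight 3.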
The nodes of the concept graph are also weighted. A node's weight is the average weight of all its edges. If a node has n edges, then its weight is the floating-point number s/n, where s is the sum of the weights on all the edges associated with the node. For example, if a node has three edges of weights 3, 3, and 1, then the node's weight is (3 + 3 + 1)/3 = 7/3 ≈ 2.33.
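The averaging rule can be written as a small function. The name node_weight() is hypothetical, and returning 0 for a node with no edges is an assumption, since the text does not say how such a node is weighted.

```cpp
#include <numeric>
#include <vector>

// Average of the weights on a node's edges, computed in floating point.
double node_weight(const std::vector<unsigned> & weights) {
  if (weights.empty()) return 0.0;  // assumption: an isolated node weighs 0
  const double sum = std::accumulate(weights.begin(), weights.end(), 0.0);
  return sum / weights.size();
}
```

Passing 0.0 as the initial value makes std::accumulate sum in double, so the division is not truncated to an integer.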
Node and edge weights give a measure of importance to words. High edge weights indicate the associated words frequently appear together; high node weights indicate words that are heavily connected to other words. The seed node used to start the concept graph is any of the nodes having the highest weight. Similarly, the next node added to the graph is any of the candidate nodes having the highest weight.
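Choosing a seed or next node then reduces to picking a maximum by weight. The candidate structure and heaviest() below are hypothetical, a minimal sketch assuming each candidate's weight has already been computed.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical record pairing a word with its precomputed node weight.
struct candidate {
  std::string word;
  double weight;
};

// Returns a pointer to a candidate of maximal weight, or nullptr when the
// list is empty. Ties are allowed by the rule; this returns the first one.
const candidate * heaviest(const std::vector<candidate> & cs) {
  if (cs.empty()) return nullptr;
  return &*std::max_element(
      cs.begin(), cs.end(),
      [](const candidate & a, const candidate & b) { return a.weight < b.weight; });
}
```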
start end min
start and end are a valid range of page numbers, start ≤ end; the pages start and end are part of the page range. min is a positive integer indicating the minimum number of times a word should appear on a page in the given range to be considered important; that is, each word appearing in the graph must appear in at least a min-neighborhood. For example, the specification
? 1 100 3
specifies that a concept graph should be constructed for all words appearing at least three times on any page in the range 1 through 100.
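A specification can be checked along the following lines. The page_spec structure and parse_spec() are hypothetical names; the sketch assumes the line passed in holds only the three integers, and it does not settle whether the leading "?" in the example is a prompt or part of the input.

```cpp
#include <optional>
#include <sstream>
#include <string>

// Hypothetical holder for one "start end min" specification.
struct page_spec {
  int start;
  int end;
  int min;
};

// Parses three integers and enforces start <= end and min >= 1.
// Anything following the third integer is ignored in this sketch.
std::optional<page_spec> parse_spec(const std::string & line) {
  std::istringstream in(line);
  page_spec s{};
  if (!(in >> s.start >> s.end >> s.min)) return std::nullopt;  // not 3 integers
  if (s.start > s.end || s.min < 1) return std::nullopt;        // invalid values
  return s;
}
```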
Under non-error conditions, the program outputs an n-neighbor tree by calling the procedure
void show_graph(const graph_node * const);
passing a pointer to a node of the graph. The file show-graph.cc in the /export/home/class/CS-503/a5 assignment directory contains the code for show_graph(). This file should be compiled and linked into your code.
The definition for graph_node used by show_graph() can be found in the file graph-node.h, which can also be found in the assignment directory. You may extend this definition any way you see fit, but you should not change the declarations for the existing member functions (graph_node::neighbor_count(), graph_node::get_neighbor(), and graph_node::get_word()).
Error messages should be brief and informative, and written to std-out (not std-err) preceded by "! " (that is, an exclamation point followed by a space).
This page last modified on 13 April 2006.