See the assignment turn-in page (last modified on 18 January 2006) for instructions on turning in your assignment.
Here is an example session:
$ concord verne.txt
? the an in
100 200 300
? 100 200 300
afternoon an in the
? laser

?
The question mark is the query prompt. The first query asks for all pages containing all three of the words "the", "an", and "in"; there are three of them. The second query asks for all the words in common on pages 100, 200 and 300; there are four of them. The third query asks for a word that doesn't exist.
combination of their different operations.
PAGE 5
In every other art and manufacture, the effects of the division of
while the following is not a page delimiter:
"Turn to PAGE 5?"
The page delimiters are not part of the text and should not be part of the concordance (that is, if the text itself doesn't contain the word "page", then a query for "page" should return no pages).
A page delimiter marks the start of a page; the text for a page follows the page delimiter. Any text before the first page (PAGE 1) should be ignored.
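The page-splitting rule can be sketched in Python as follows. This is only an illustration, and it assumes that a page delimiter is a line containing nothing but the word PAGE and a page number; the names PAGE_DELIMITER and split_pages are made up for the example.

```python
import re

# Assumption: a delimiter is a line holding only "PAGE" and a page number.
PAGE_DELIMITER = re.compile(r"^\s*PAGE\s+(\d+)\s*$")

def split_pages(lines):
    """Map each page number to the text lines that follow its delimiter."""
    pages = {}
    current = None  # stays None until the first delimiter is seen
    for line in lines:
        match = PAGE_DELIMITER.match(line)
        if match:
            # The delimiter starts a new page and is not stored as text.
            current = int(match.group(1))
            pages.setdefault(current, [])
        elif current is not None:
            pages[current].append(line)
        # Lines before the first delimiter are skipped.
    return pages
```

Because the delimiter line itself is never stored, the word "page" enters the concordance only when it appears in the text proper.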
Words in text are maximal sequences of one or more letters in either case; for example the text "it's" consists of the two words "it" and "s". Words need not be stemmed or otherwise transformed.
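One possible way to pull words out of a line, sketched in Python with a regular expression. The name words_in is hypothetical, and the sketch leaves letter case untouched, in keeping with the rule that words are not transformed.

```python
import re

# A word is a maximal run of one or more letters, upper or lower case.
WORD = re.compile(r"[A-Za-z]+")

def words_in(text):
    """Return the words of text in the order they appear."""
    return WORD.findall(text)

print(words_in("it's"))  # prints ['it', 's']
```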
lucy scotland virago
10 20 30
Input read from std-in that is in neither of the described formats should produce a brief error message on std-out (not std-err).
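A rough Python sketch of sorting a line of input into one of the two formats. It assumes the two formats are a list of words and a list of page numbers, lets the first item decide which format is expected, and reports the first item that does not fit. The function names, and the handling of an empty line, are choices made for this example and not part of the assignment.

```python
def is_word(token):
    return token.isascii() and token.isalpha()

def is_number(token):
    return token.isascii() and token.isdigit()

def garbled(token):
    return f'! Input garbled: "{token}" not understood.'

def parse_query(line):
    """Return ('page', words), ('word', page_numbers) or ('error', message)."""
    tokens = line.split()
    if not tokens:
        # Assumption: an empty line counts as garbled input.
        return ("error", "! Input garbled: empty query.")
    if is_word(tokens[0]):
        kind, fits = "page", is_word      # words in, page numbers out
    elif is_number(tokens[0]):
        kind, fits = "word", is_number    # page numbers in, words out
    else:
        return ("error", garbled(tokens[0]))
    for token in tokens:
        if not fits(token):
            return ("error", garbled(token))
    if kind == "word":
        return (kind, [int(t) for t in tokens])
    return (kind, tokens)
```

The message string would then be written with print, which goes to std-out by default.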
There are two forms of output corresponding to the two types of queries. The result of a page query should be a list of page numbers in increasing order. For example
? lucy scotland
23 128 240
The result of a word query should be a list of words in increasing order; each word listed should appear only once. For example
? 100 200 300
an in the
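Both kinds of result can be computed with set intersection. The Python sketch below assumes the text has already been reduced to a dictionary from page number to the set of words on that page; the function names and the tiny three-page data set are invented for illustration.

```python
def build_index(pages):
    """Invert {page number: set of words} into {word: set of page numbers}."""
    index = {}
    for number, words in pages.items():
        for word in words:
            index.setdefault(word, set()).add(number)
    return index

def pages_with_all(index, words):
    """Page numbers containing every given word, in increasing order."""
    sets = [index.get(word, set()) for word in words]
    return sorted(set.intersection(*sets)) if sets else []

def words_on_all(pages, numbers):
    """Words common to every given page, each listed once, in sorted order."""
    sets = [pages.get(number, set()) for number in numbers]
    return sorted(set.intersection(*sets)) if sets else []

# Hypothetical data, not taken from any of the input texts.
pages = {1: {"an", "in", "the"}, 2: {"afternoon", "the"}, 3: {"an", "the"}}
index = build_index(pages)
print(pages_with_all(index, ["the", "an"]))  # prints [1, 3]
print(words_on_all(pages, [1, 3]))           # prints ['an', 'the']
print(pages_with_all(index, ["laser"]))      # prints []
```

Using sets means a word can never be listed twice. Note that sorted orders strings by character code, so any capitalized words would come before lower-case ones.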
Error messages should be brief and informative, and written to std-out (not std-err) preceded by "! " (that is, an exclamation point followed by a space). For example
? it's
! Input garbled: "it's" not understood.
? 10 the
! input garbled: "the" not understood.
Testing
You can use the files smith.txt, trollope.txt, and darwin.txt as input texts for your program. The files can be found in the public class directory /export/home/class/CS-503/a3.
This page last modified on 11 March 2006.