See the assignment turn-in page (last modified on 14 January 2006) for instructions on turning in your assignment.
Here is an example session:

    $ concord verne.txt
    ? the an in
    100 200 300
    ? 100 200 300
    afternoon an in the
    ? laser
    ?

The question mark is the query prompt. The first query asks for all pages containing all three of the words "the", "an", and "in"; there are three of them. The second query asks for all the words in common on pages 100, 200 and 300; there are four of them. The third query asks for a word that doesn't exist.
In the following excerpt, the line PAGE 5 is a page delimiter:

    combination of their different operations.
    PAGE 5
    In every other art and manufacture, the effects of the division of

while the following is not a page delimiter:

    "Turn to PAGE 5?"

The page delimiters are not part of the text and should not be part of the concordance (that is, if the text itself doesn't contain the word "page", then a query for "page" should return no pages).
A page delimiter marks the start of a page; the text for a page follows the page delimiter. Any text before the first page (PAGE 1) should be ignored.
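A minimal sketch of splitting the input into pages, in Python. It assumes a page delimiter is a line consisting solely of the word PAGE followed by a page number; that exact format is inferred from the examples, not stated here. The names `PAGE_LINE` and `split_pages` are hypothetical.

```python
import re

# Assumption: a delimiter is a line holding only "PAGE <number>".
PAGE_LINE = re.compile(r"^\s*PAGE\s+(\d+)\s*$")

def split_pages(lines):
    """Map each page number to the list of text lines on that page."""
    pages = {}
    current = None  # stays None until the first delimiter is seen
    for line in lines:
        match = PAGE_LINE.match(line)
        if match:
            # The delimiter starts a new page and is not stored as text.
            current = int(match.group(1))
            pages.setdefault(current, [])
        elif current is not None:
            pages[current].append(line.rstrip("\n"))
        # Lines before the first delimiter are ignored.
    return pages
```

Under this assumption a line such as "Turn to PAGE 5?" is kept as ordinary text, because the delimiter pattern must match the whole line.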
Words in text are maximal sequences of one or more letters in either case; for example, the text "it's" consists of the two words "it" and "s". Words need not be stemmed or otherwise transformed.
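The word rule can be sketched in Python with a regular expression. This assumes "letters" means the ASCII letters a-z and A-Z, and it leaves case unchanged since no transformation is required. The function name `words` is hypothetical.

```python
import re

def words(text):
    """Return the maximal runs of ASCII letters in text, in order."""
    # Any non-letter (apostrophe, digit, space, punctuation) ends a word.
    return re.findall(r"[A-Za-z]+", text)
```

For instance, `words("it's")` yields the two words "it" and "s", matching the example above.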
    lucy scotland virago
    10 20 30

Input read from std-in that is in neither of the described formats should produce a brief error message on std-out (not std-err).
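One way to classify an input line is sketched below in Python. It assumes the two formats are a whitespace-separated list of words (a page query, answered with page numbers) and a whitespace-separated list of page numbers (a word query, answered with words). It also assumes that the first token decides the query type and that an empty line counts as an error. The name `parse_query` and its return shape are hypothetical.

```python
def parse_query(line):
    """Classify one input line.

    Returns ("page", [words]) for a query made only of words,
    ("word", [page numbers]) for a query made only of numbers,
    or ("error", token) naming the first token that does not fit.
    """
    tokens = line.split()
    if not tokens:
        return ("error", "")  # assumption: empty input is an error

    def is_word(token):
        return token.isascii() and token.isalpha()

    def is_number(token):
        return token.isascii() and token.isdigit()

    # The first token decides which kind of query this is.
    if is_word(tokens[0]):
        kind, fits = "page", is_word
    elif is_number(tokens[0]):
        kind, fits = "word", is_number
    else:
        return ("error", tokens[0])

    for token in tokens:
        if not fits(token):
            return ("error", token)

    values = tokens if kind == "page" else [int(t) for t in tokens]
    return (kind, values)
```

With these assumptions, the input `10 the` is rejected at the token "the", and `it's` is rejected as a whole because of the apostrophe.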
There are two forms of output corresponding to the two types of queries. The result of a page query should be a list of page numbers in increasing order. For example:

    ? lucy scotland
    23 128 240

The result of a word query should be a list of words in increasing order; each word listed should appear only once. For example:

    ? 100 200 300
    an in the

Error messages should be brief and informative, and written to std-out (not std-err) preceded by "! " (that is, an exclamation point followed by a space). For example:

    ? it's
    ! Input garbled: "it's" not understood.
    ? 10 the
    ! input garbled: "the" not understood.
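The output rules can be sketched in Python as set intersections followed by sorting. This assumes the concordance is held in two hypothetical dictionaries, `pages_of_word` (word to set of page numbers) and `words_of_page` (page number to set of words), and that "increasing order" for words means Python's default string ordering. All function names are hypothetical, and the error wording is borrowed from the example messages.

```python
def answer_page_query(pages_of_word, query_words):
    """Pages containing every query word, in increasing order."""
    sets = [pages_of_word.get(w, set()) for w in query_words]
    return sorted(set.intersection(*sets)) if sets else []

def answer_word_query(words_of_page, query_pages):
    """Words appearing on every query page, each listed once, sorted."""
    sets = [words_of_page.get(p, set()) for p in query_pages]
    return sorted(set.intersection(*sets)) if sets else []

def format_result(items):
    """Join page numbers or words into one space-separated output line."""
    return " ".join(str(item) for item in items)

def format_error(token):
    """Build an error line; the leading "! " is required on std-out."""
    return '! Input garbled: "%s" not understood.' % token
```

Storing sets gives the "each word listed should appear only once" property for free, and a word or page missing from the concordance simply produces an empty result.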
You can use smith.txt, trollope.txt, and darwin.txt as input texts for your program. The files can be found in the public class directory /export/home/class/cs-503/a3.
This page last modified on 11 March 2006.