Programming Assignment 5b - Searching

Computer Algorithms I, Fall 2003


Due Date

This assignment is due by 5:00 p.m. on Friday, 5 December.

See the assignment turn-in page (last modified on 3 November 2003) for instructions on turning in your assignment.

The Problem

Modify your page-logger to respond to a set of command URIs with the general form

http://pagelog/command?args

If the command portion of the URI is missing, the URI is syntactically incorrect. The commands your page-logger should recognize are output, find, and clear. The form and behavior of each command are as follows.

output:
The output page-log URI has the form

http://pagelog/output?filename

The output command writes the cumulative word list to a file with the given name, or to std-out (which corresponds to std::cout) if filename or ?filename is missing. The new file should replace any existing file of the same name. The filename may begin with a slash, as in, for example, the URI

http://pagelog/output?/tmp/words.out

In such cases, the initial slash is part of the filename; to continue the example, the filename is /tmp/words.out, which creates the file words.out in the directory /tmp. Filenames that are not absolute are interpreted relative to the directory in which the page-logger is running.

There are three filenames your code should recognize and treat specially:

  1. http://pagelog/output?std::cout - Output should be written to std-out (which corresponds to std::cout). This is the default behavior in the absence of a filename.

  2. http://pagelog/output?std::cerr - Output should be written to std-error (which corresponds to std::cerr).

  3. http://pagelog/output?html - Output should be written to (that is, stored in) a storage block allocated within mitm() and returned via the resource parameter in mitm(); the output should not be null-terminated. The size field of the resource contains the output's length in characters. The information in the block will be sent back to the requesting browser as an HTML page; the code calling mitm() will free the storage block.

    The special filenames must appear exactly as given above; the URI http://pagelog/output?cout creates a file called cout in the directory in which the page-logger is running and writes the word statistics to it.
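The destination rules above can be sketched roughly as follows. This is only an illustration, not the required design: the names OutputTarget, classify_output_target, and write_output are hypothetical, and the html case simply copies the text into a string that the caller would later move into the mitm() resource.

```cpp
#include <fstream>
#include <iostream>
#include <string>

// Hypothetical classification of the text following "output?".
enum class OutputTarget { StdOut, StdErr, Html, File };

// The special names must match exactly; anything else is an ordinary filename.
OutputTarget classify_output_target(const std::string& filename) {
    if (filename.empty() || filename == "std::cout") return OutputTarget::StdOut;
    if (filename == "std::cerr") return OutputTarget::StdErr;
    if (filename == "html") return OutputTarget::Html;
    return OutputTarget::File;
}

// Writes text to the chosen destination. For the html target the text is
// copied into html_out (an assumption made for this sketch). Returns false
// if the write fails.
bool write_output(const std::string& filename, const std::string& text,
                  std::string& html_out) {
    switch (classify_output_target(filename)) {
    case OutputTarget::StdOut:
        std::cout << text;
        return !std::cout.fail();
    case OutputTarget::StdErr:
        std::cerr << text;
        return !std::cerr.fail();
    case OutputTarget::Html:
        html_out = text;
        return true;
    case OutputTarget::File: {
        // trunc replaces any existing file of the same name.
        std::ofstream out(filename.c_str(), std::ios::out | std::ios::trunc);
        if (!out) return false;
        out << text;
        return !out.fail();
    }
    }
    return false;
}
```

With this sketch, the filename cout is classified as an ordinary file, while std::cout and an empty filename both select std-out.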

find:
The find page-log URI has the form

http://pagelog/find?[ keyword [ , keyword ]... ]

where the notation [ A ] means that A is optional (that is, it may or may not appear), and the notation A... means zero or more repetitions of A (that is, nothing, or A, or AA, or AAA, and so on).
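One way to read the keyword list is to split the text after the ? at each comma. The sketch below does that; the function name split_keywords is hypothetical, and it assumes that empty pieces (as in a,,b or a trailing comma) are skipped and that no whitespace trimming is done, neither of which the assignment specifies.

```cpp
#include <string>
#include <vector>

// Splits the argument text of a find command into keywords at commas.
// Assumption for this sketch: empty pieces are silently skipped.
std::vector<std::string> split_keywords(const std::string& args) {
    std::vector<std::string> keywords;
    std::size_t start = 0;
    while (start <= args.size()) {
        std::size_t comma = args.find(',', start);
        if (comma == std::string::npos) comma = args.size();
        if (comma > start) keywords.push_back(args.substr(start, comma - start));
        start = comma + 1;
    }
    return keywords;
}
```

An empty argument string yields an empty keyword list, which corresponds to the case in which all pages are listed.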

Each keyword is taken to be a word in a Web page. A keyword is not required to match the syntax used to define words in a Web page. For example, <300!!!> is a valid keyword; it won't be a useful keyword, but that's not the page-logger's problem.

In response to a find command, the page-logger should return a list of pages containing all the keywords given in the command. The list should be a sequence of URIs, each separated from the next by a newline character (the last URI, if any, need not be followed by a newline character). URIs in the list should be ordered by descending hit count, which is the total number of times all keywords in the find command match words in the page; URIs with identical hit counts can be arbitrarily ordered relative to one another. If a find command contains no keywords following the ?, or contains neither a ? nor keywords, then all pages collected so far should be listed.
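The selection and ordering rules for find might be sketched as below. The data layout is an assumption made for illustration: PageIndex maps each page URI to a per-page table of word counts, which may differ from how your page-logger stores its statistics, and find_pages is a hypothetical name.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Assumed layout: word -> occurrences in one page, and URI -> those counts.
using WordCounts = std::map<std::string, int>;
using PageIndex = std::map<std::string, WordCounts>;

// Returns the newline-separated URIs of pages containing every keyword,
// ordered by descending hit count. With no keywords, every page qualifies.
std::string find_pages(const PageIndex& index,
                       const std::vector<std::string>& keywords) {
    std::vector<std::pair<int, std::string>> hits;  // (hit count, URI)
    for (const auto& page : index) {
        int total = 0;
        bool has_all = true;
        for (const auto& keyword : keywords) {
            auto found = page.second.find(keyword);
            if (found == page.second.end() || found->second <= 0) {
                has_all = false;
                break;
            }
            total += found->second;
        }
        if (has_all) hits.push_back(std::make_pair(total, page.first));
    }
    // Ties may be ordered arbitrarily; stable_sort keeps them in map order.
    std::stable_sort(hits.begin(), hits.end(),
                     [](const std::pair<int, std::string>& a,
                        const std::pair<int, std::string>& b) {
                         return a.first > b.first;
                     });
    std::string out;
    for (std::size_t i = 0; i < hits.size(); ++i) {
        if (i > 0) out += '\n';
        out += hits[i].second;
    }
    return out;
}
```

No newline follows the last URI, which the assignment permits.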

The page list should be placed in a block of storage and returned via the mitm() resource parameter, just as is done for HTML output described above.
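Copying a result into a storage block could look roughly like the sketch below. The Resource struct here is a hypothetical stand-in, since the real resource type used by mitm() is not shown in this handout; only the existence of a size field comes from the assignment. The use of new[] is also an assumption, and the allocation must match however the code calling mitm() actually frees the block.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for the resource parameter of mitm().
struct Resource {
    char* data;        // storage block holding the output
    std::size_t size;  // length of the output in characters
};

// Copies text into a newly allocated block without a terminating null.
// Assumption: the caller releases the block with delete[].
void store_in_resource(const std::string& text, Resource& resource) {
    resource.size = text.size();
    resource.data = new char[text.size()];
    std::memcpy(resource.data, text.data(), text.size());
}
```

Because the block is not null-terminated, any code reading it should rely on the size field rather than on functions such as strlen.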

clear:
The clear page-log URI has the form

http://pagelog/clear

The clear command clears all cumulative page information gathered so far, effectively restarting the page-logger.

If an error occurs when processing a command, your code should pass back an informative error message using the same approach as described for HTML output above. The error message will be sent back to the requesting browser as an HTML page.
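As a rough illustration of how the general form http://pagelog/command?args could be taken apart and checked before dispatching, consider the sketch below. The names PagelogCommand, parse_pagelog_uri, and command_error are hypothetical, and the wording of the error messages is an assumption; the assignment only asks that the message be informative.

```cpp
#include <string>

// Hypothetical result of splitting a command URI.
struct PagelogCommand {
    std::string command;  // text between "http://pagelog/" and '?'
    std::string args;     // text after '?', empty if there is none
    bool has_query;       // true if a '?' was present
};

// Splits uri into command and args. Returns false if uri does not start
// with http://pagelog/ or if the command portion is missing.
bool parse_pagelog_uri(const std::string& uri, PagelogCommand& cmd) {
    const std::string prefix = "http://pagelog/";
    if (uri.compare(0, prefix.size(), prefix) != 0) return false;
    const std::string rest = uri.substr(prefix.size());
    const std::size_t question = rest.find('?');
    cmd.command = rest.substr(0, question);
    cmd.has_query = (question != std::string::npos);
    cmd.args = cmd.has_query ? rest.substr(question + 1) : std::string();
    return !cmd.command.empty();
}

// Returns an empty string if uri names a recognized command; otherwise
// returns an error message (its wording is an assumption of this sketch).
std::string command_error(const std::string& uri) {
    PagelogCommand cmd;
    if (!parse_pagelog_uri(uri, cmd))
        return "Error: the URI is not a well-formed page-log command.";
    if (cmd.command != "output" && cmd.command != "find" &&
        cmd.command != "clear")
        return "Error: unrecognized page-log command.";
    return std::string();
}
```

A non-empty result from command_error would then be passed back through the mitm() resource in the same way as HTML output.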


This page last modified on 8 December 2003.