Programming Assignment 4a - Inverted Indexes

Computer Algorithms I, Fall 2003


Due Date

This assignment is due by 5:00 p.m. on Friday, 7 November.

See the assignment turn-in page (last modified on 3 November 2003) for instructions on turning in your assignment.

The Problem

Invert the code you wrote for Assignment 3b so that it produces a list of the documents containing a word, rather than a list of the words contained in a document (the former list is known as an inverted index).

That is, rather than producing output that looks like this

1 university
2 students 
3 mu

your mitm() function should write output to standard output that looks like this:

university http://www.monmouth.edu 1 97
students http://www.monmouth.edu 2 97
mu http://www.monmouth.edu 3 97

The general form of an output line is

word uri count total

where

Each field is separated from the next by at least one whitespace character other than a newline. Lines may be output in any order, but each word in the document should appear in only one line.
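
As an illustration of the line format above, here is a minimal C++ sketch. It is not the required implementation: the helper names index_lines() and print_index_lines() are hypothetical, the words are assumed to have been extracted from the document already, and the sketch assumes that count is the number of times the word occurs in the document and that total is the number of words in the document.

```cpp
#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of the assignment's interface): builds one
// "word uri count total" line per distinct word in a single document.
// Assumption: count = occurrences of the word in the document,
//             total = number of words in the document.
std::vector<std::string> index_lines(const std::string& uri,
                                     const std::vector<std::string>& words) {
    // Tally each word; a map key is unique, so each word yields one line.
    std::map<std::string, int> counts;
    for (const std::string& w : words) {
        ++counts[w];
    }

    std::vector<std::string> lines;
    for (const auto& entry : counts) {
        std::ostringstream out;
        // Fields are separated by a single space.
        out << entry.first << ' ' << uri << ' '
            << entry.second << ' ' << words.size();
        lines.push_back(out.str());
    }
    return lines;
}

// Hypothetical helper: writes the lines for one document to standard output.
void print_index_lines(const std::string& uri,
                       const std::vector<std::string>& words) {
    for (const std::string& line : index_lines(uri, words)) {
        std::cout << line << '\n';
    }
}
```

Because the tally is kept in a std::map, the lines come out in alphabetical order by word; the assignment does not require any particular order, so an unordered container would serve equally well.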

For this assignment, mitm() should produce output each time it's called, and it should produce output only for the document passed in by the call. However, for the second half of this assignment, mitm() will keep a running inverted index of all documents it receives, so you might want to plan ahead for that.
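
One way to plan ahead for the running index is to keep the per-word tallies in a structure that outlives a single call. The following C++ sketch shows the idea only; the names Posting and RunningIndex are hypothetical, the interface for the second half of the assignment may differ, and count and total are again assumed to mean occurrences of the word in a document and the number of words in that document.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical record: one document in which a word appears.
struct Posting {
    std::string uri;    // document the word was found in
    int count;          // assumed: occurrences of the word in that document
    std::size_t total;  // assumed: number of words in that document
};

// Hypothetical running inverted index: maps each word to the documents
// containing it, accumulating entries across calls.
class RunningIndex {
public:
    // Record every distinct word of one document.
    void add_document(const std::string& uri,
                      const std::vector<std::string>& words) {
        std::map<std::string, int> counts;
        for (const std::string& w : words) {
            ++counts[w];
        }
        for (const auto& entry : counts) {
            postings_[entry.first].push_back(
                Posting{uri, entry.second, words.size()});
        }
    }

    // Documents containing the given word; empty if the word is unknown.
    const std::vector<Posting>& documents_for(const std::string& word) const {
        static const std::vector<Posting> none;
        auto it = postings_.find(word);
        return it == postings_.end() ? none : it->second;
    }

private:
    std::map<std::string, std::vector<Posting>> postings_;
};
```

With a structure like this, the per-document output for the first half can still be produced from the tallies of the current document alone, while the accumulated entries remain available for later lookups.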


This page last modified on 3 December 2003.