Programming Assignment 4 - KWIK Indexing

Concurrent Programming, Summer 2002


Due Date

This assignment is due by 7:00 a.m. on Tuesday, 23 July. See the assignment turn-in page (last modified on 23 July 2002) for instructions on turning in your assignment.

Background

Keyword in context (KWIC) indexing is a technique for indexing a collection of descriptions, each of which is a small (twenty to thirty words, or so) sentence such as article titles. The principle operation in KWIC indexing is rotation, which consists of circularly shifting the description right (or left) by one word to create a set of descriptions. For example, the title "Multithreaded, Parallel and Distributed Programming", when rotated, produces the following set:

Multithreaded, Parallel and Distributed Programming
Parallel and Distributed Programming Multithreaded,
and Distributed Programming Multithreaded, Parallel
Distributed Programming Multithreaded, Parallel and
Programming Multithreaded, Parallel and Distributed

The KWIC index of a set of descriptions is produced as follows:

  1. Rotate each description and add the rotations to the set.

  2. Sort the set to produce the index.

KWIC indexing can get fancier: rotated descriptions that begin with common words (such as "and Distributed Programming Multithreaded, Parallel" above, and known as stop words can be eliminated, the rotated descriptions can be unrotated back to the original description after sorting (in this case the first word of the rotated description is usually highlighted somehow in the unrotated description), and so on. For now though, the simple KWIC indexing is good enough.

The Problem

Write a Java program that accepts a set of descriptions from std-in and writes a KWIC index of the descriptions to std-out. Your implementation should use threads in a producer-consumer arrangement, and have at least four threads:

  1. An input thread that reads from std-in and produces descriptions, one per input line.

  2. A rotating thread that consumes descriptions and produces the rotated descriptions.

  3. A sorting thread that accepts descriptions and produces the descriptions in alphabetical order.

  4. An output thread the accepts descriptions and writes them to std-out.
The form of buffering between producer and consumer is up to you, although we'll be playing with this in later assignments.
You still need to work in groups of two, but you can chose whomever you want to work with, and you need not change between assignments if you don't want to.


This page last modified on 11 July 2002.