Programming Assignment 4b - Tag Clouds

Computer Algorithms II, Fall 2007


Due Date

This assignment is due by 5:30 p.m. on Tuesday, 11 December.

See the assignment turn-in page (last modified on 14 January 2006) for instructions on turning in your assignment.

Background

Given a set of words and their occurrence frequencies, a tag cloud for the words is a 2-D arrangement of the words that illustrates those frequencies. For example, this
[Image: a tag cloud]
is a tag cloud for the lecture on hash tables. In this case, frequency is indicated by a combination of font size and color: more frequent words are bigger and darker, with color breaking ties when size is the same.
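
One way to turn a frequency into a size and a shade is to scale it linearly between the smallest and largest frequencies in the cloud. The sketch below is only an illustration: the function name style_for, the 10 to 36 point range, and the gray scale are assumptions, not part of the assignment. Because the shade varies continuously while the size is rounded to whole points, two words that round to the same size can still differ in darkness.

```python
def style_for(freq, min_freq, max_freq, min_pt=10, max_pt=36):
    """Return (font size in points, CSS gray color) for one word.

    The point range and the gray scale are illustrative assumptions.
    """
    if max_freq == min_freq:
        scale = 1.0  # every word has the same frequency
    else:
        scale = (freq - min_freq) / (max_freq - min_freq)
    size = round(min_pt + scale * (max_pt - min_pt))
    # More frequent words are darker: gray level runs from 180 down to 0.
    gray = round(180 * (1 - scale))
    return size, "#{0:02x}{0:02x}{0:02x}".format(gray)
```

Each word could then be written out with an inline style built from the returned pair, for example font-size and color properties on a span element.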

The Problem

Modify the concordance proxy from Assignment 4a to return a tag cloud in response to a query. The tag cloud is built from the most frequent words appearing on each web page answering the query.

Input

The input remains the same as the previous assignment.

Output

The result of a successful query should be a tag cloud containing the most frequent words in each page matching the query. The tag cloud should contain about 50 or 64 words, so the number of words each page contributes to the cloud depends on the number of pages matching the query. For example, if five pages matched a query and the tag cloud contains 50 words, each page would contribute its ten most frequent words to the cloud; if ten pages matched, each would contribute five words.
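
The per-page arithmetic above can be sketched as follows. This is a minimal illustration, assuming each matching page is already available as a collections.Counter of word counts; the names words_per_page and cloud_words are hypothetical. Two choices here are assumptions the assignment does not settle: a word contributed by several pages is merged into one entry with its counts summed, and every page contributes at least one word even when more pages match than the cloud has room for.

```python
from collections import Counter

def words_per_page(cloud_size, page_count):
    """Number of words each matching page contributes to the cloud."""
    return max(1, cloud_size // page_count)

def cloud_words(page_counts, cloud_size=50):
    """Merge each page's most frequent words into one Counter.

    page_counts is a list of Counter objects, one per matching page.
    Summing counts for repeated words is an assumption, not a requirement.
    """
    if not page_counts:
        return Counter()
    quota = words_per_page(cloud_size, len(page_counts))
    cloud = Counter()
    for counts in page_counts:
        for word, n in counts.most_common(quota):
            cloud[word] += n
    return cloud
```

With a 50-word cloud, words_per_page(50, 5) gives 10 and words_per_page(50, 10) gives 5, matching the example above. Because repeated words are merged, the finished cloud may hold fewer than 50 distinct words.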

The tag cloud should be approximately square; for example, 7×7 (or so) for a 50-word cloud. The other details of the cloud design are up to you to decide; such details might include:

And so on.
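
For the approximately square shape, one simple approach is to take the number of columns as the square root of the word count, rounded up, and fill rows from left to right. The sketch below assumes that approach; grid_shape and layout_rows are hypothetical names, and the last row may come up short.

```python
import math

def grid_shape(word_count):
    """Return (rows, cols) for a roughly square grid holding word_count words."""
    if word_count <= 0:
        return 0, 0
    cols = math.isqrt(word_count)
    if cols * cols < word_count:
        cols += 1  # round the square root up
    rows = math.ceil(word_count / cols)
    return rows, cols

def layout_rows(words):
    """Split a list of words into rows of equal width (last row may be short)."""
    if not words:
        return []
    _, cols = grid_shape(len(words))
    return [words[i:i + cols] for i in range(0, len(words), cols)]
```

Under this rule a 49-word cloud is 7 by 7 and a 64-word cloud is 8 by 8, while a 50-word cloud comes out as 7 rows of up to 8 words. Each row could become one row of an HTML table or one line of text in the response.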

Building

No change from the previous assignment.

Running

No change from the previous assignment, although using a text-only browser like lynx or w3m is less useful for this assignment.


This page last modified on 10 November 2007.