This assignment is due by 5:30 p.m. on Tuesday, 11 December. See the assignment turn-in page (last modified on 14 January 2006) for instructions on turning in your assignment.
Given a set of words and their occurrence frequencies, a tag cloud for
the words is a 2-D arrangement of the words illustrating their occurrence
frequencies. For example, this
is a tag cloud for the lecture on hash tables. In this case,
frequency is indicated by a combination of font size and color (more frequent
words are bigger and darker, with color breaking ties when size is the same).
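As a rough illustration of the "bigger and darker" idea, the sketch below linearly maps a word's count onto a font size and a gray level. The function name `style_for`, the pixel range, and the gray scale are assumptions made for this example; they are not part of the assignment, and other mappings (logarithmic, bucketed) would work just as well.

```python
def style_for(count, min_count, max_count, min_px=12, max_px=36):
    """Return (font size in px, CSS gray color) for a word count.

    Hypothetical helper: more frequent words get a larger size
    and a darker shade of gray.
    """
    if max_count == min_count:
        scale = 0.5  # all words equally frequent: use the middle of the range
    else:
        scale = (count - min_count) / (max_count - min_count)
    size = round(min_px + scale * (max_px - min_px))
    # Gray level runs from 200 (light) for rare words down to 0 (black).
    gray = round(200 * (1 - scale))
    return size, f"#{gray:02x}{gray:02x}{gray:02x}"
```

Each word could then be emitted as something like `<span style="font-size:36px;color:#000000">hash</span>`.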
Modify the concordance proxy from Assignment 4a to return a tag cloud in
response to a query. The tag cloud is built from the most frequent words
appearing on each web page answering the query.
The input remains the same as in the previous assignment.
The result of a successful query should be a tag cloud containing the most
frequent words in each page matching the query. The tag cloud should contain
about 50 or 64 words, which means the number of words each page contributes
to the cloud depends on the number of pages matching the query. For
example, if five pages matched a query and the tag cloud contains 50 words,
each page would contribute its ten most frequent words to the cloud; if ten
pages matched, each would contribute five words.
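The per-page arithmetic can be sketched as follows. The names `words_per_page` and `top_words` are hypothetical, and the sketch assumes each page's concordance is available as a mapping from word to count; how the proxy from Assignment 4a actually stores its counts may differ.

```python
from collections import Counter


def words_per_page(cloud_size, page_count):
    """Number of words each page contributes to a cloud of cloud_size words."""
    if page_count <= 0:
        return 0
    # Integer division; every page contributes at least one word.
    return max(1, cloud_size // page_count)


def top_words(page_counts, cloud_size=50):
    """For each page's {word: count} mapping, return its most frequent words."""
    n = words_per_page(cloud_size, len(page_counts))
    return [Counter(counts).most_common(n) for counts in page_counts]
```

With a 50-word cloud, five matching pages give ten words per page and ten matching pages give five, as in the example above. When the division is not exact, the cloud ends up slightly under the target size, which the "or so" allows.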
The tag cloud should be approximately square; for example, 7×7 (or so) for a 50-word cloud. The other details of the cloud design are up to
you to decide; such details might include:
- Should words from separate pages be kept separate or combined in the
cloud?
- What does varying font size mean?
- What does varying font color mean?
- Does position in the tag cloud mean anything, or is it in
alphabetical order?
And so on.
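One way to get an approximately square cloud is to pick a column count near the square root of the number of words and break the word list into rows of that width. This is only a sketch under that assumption; `grid_rows` is a name invented for the example, and the order of the words passed in (alphabetical, by frequency, by page) remains one of the design decisions listed above.

```python
import math


def grid_rows(words, columns=None):
    """Split words into rows so the grid is roughly square."""
    if not words:
        return []
    if columns is None:
        # Nearest integer to the square root, e.g. 7 columns for 50 words.
        columns = max(1, round(math.sqrt(len(words))))
    return [words[i:i + columns] for i in range(0, len(words), columns)]
```

For 50 words this gives seven full rows of seven plus one leftover word in an eighth row; for 64 words it gives an exact 8×8 grid.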
No change from the previous assignment.
No change from the previous assignment, although
using a text-only browser like lynx or w3m is less useful for this assignment.
This page last modified on 10 November 2007.