Programming Assignment 2b - Search Spiders

Computer Algorithms I, Fall 2003


Due Date

This assignment is due by 5:00 p.m. on Thursday, 9 October.

See the assignment turn-in page (last modified on 3 September 2003) for instructions on turning in your assignment.

The Problem

Write a search spider that accepts a URL, known as the initial URL, and a word, known as the search word. The spider searches the tree of pages rooted at the initial URL for occurrences of the search word and prints the absolute URL of every visited page, including the root page, that contains the search word.
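
One way to organize the search is a breadth-first walk from the initial URL. The Python sketch below is only an illustration, and it rests on several assumptions. `search`, `LinkCollector`, `fetch` and `max_pages` are hypothetical names. `fetch` stands in for whatever HTTP retrieval you use and returns the page text or `None`. The search word is matched as a whole, case-sensitive word anywhere in the raw page text, markup included. URLs are compared as plain strings, without the case rules for protocol and host described in the Output section. Links are resolved to absolute URLs with `urllib.parse.urljoin`, and a `seen` set keeps the walk from revisiting a page when links form cycles.

```python
import re
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Iterator, List, Optional
from urllib.parse import urldefrag, urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def search(root: str, word: str, fetch: Callable[[str], Optional[str]],
           max_pages: int = 100) -> Iterator[str]:
    """Breadth-first walk from root, yielding the URL of each page containing word.

    fetch(url) is a hypothetical helper returning page text, or None on failure.
    max_pages is a hypothetical cap on how many pages are fetched.
    """
    # Whole-word, case-sensitive match against the raw page text (markup included).
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")
    seen = {root}
    queue = deque([root])
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        page = fetch(url)
        fetched += 1
        if page is None:
            continue  # could not retrieve this page; move on
        if pattern.search(page):
            yield url
        collector = LinkCollector()
        collector.feed(page)
        for href in collector.hrefs:
            # Resolve relative links against the current page and drop any #fragment.
            absolute = urldefrag(urljoin(url, href))[0]
            # Follow only http/https links not already queued or visited.
            if urlsplit(absolute).scheme in ("http", "https") and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
```

Passing `fetch` in as a parameter keeps the traversal testable without a network; for example, `list(search(root, word, pages.get))` walks a dictionary of fake pages. The sketch follows every `http` or `https` link it finds, so a real spider probably needs a rule for which links count as part of the tree of pages, and the hypothetical `max_pages` cap is there only to keep the walk finite.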

Your spider should exhibit the following searching behavior:

Input

Your spider should accept two command-line arguments, a URL and a word, in that order:

your-search-spider URL word
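
A minimal sketch of reading the two arguments, assuming Python's `sys.argv` convention in which `argv[0]` is the program name. The function name `parse_args` and the usage message are illustrative choices, not part of the assignment.

```python
from typing import Sequence, Tuple


def parse_args(argv: Sequence[str]) -> Tuple[str, str]:
    """Return (url, word) from an argv list whose first entry is the program name."""
    if len(argv) != 3:
        prog = argv[0] if argv else "your-search-spider"
        # Exit with a usage message when the URL or word is missing,
        # or when extra arguments are given.
        raise SystemExit(f"usage: {prog} URL word")
    url, word = argv[1], argv[2]
    return url, word
```

A program would call it as `url, word = parse_args(sys.argv)`. Raising `SystemExit` with a string prints the message to standard error and exits with status 1, which keeps standard output free for the result URLs.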

Output

Your spider should write to standard output (stdout) the absolute URL of each page on which it finds the search word. URLs should be written one per line, in any order, and each URL should be written only once.
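
Because the same page may be reached by more than one link, the spider needs to remember which URLs it has already printed. A minimal sketch, assuming the URLs arrive already in one canonical form (so that equal URLs are equal strings); `first_occurrences` and `report` are hypothetical names.

```python
import sys
from typing import Iterable, Iterator, TextIO


def first_occurrences(urls: Iterable[str]) -> Iterator[str]:
    """Yield each URL the first time it appears, skipping later repeats."""
    seen = set()
    for url in urls:
        if url not in seen:
            seen.add(url)
            yield url


def report(urls: Iterable[str], out: TextIO = sys.stdout) -> None:
    """Write each distinct URL on its own line, in the order first found."""
    for url in first_occurrences(urls):
        print(url, file=out)
```

Since `first_occurrences` is a generator, each URL can be printed as soon as its page is found rather than after the whole search finishes. The assignment allows any output order, so keeping discovery order is just a convenient choice.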

When comparing URLs, remember that the protocol and host parts are case insensitive. For example, the URLs

http://www.monmouth.edu

HtTp://www.monmouth.edu

http://WwW.mOnMoUtH.eDu

are all the same. The directory part of a URL is case sensitive. For example, the URLs

http://www.monmouth.edu/index.html

http://www.monmouth.edu/Index.html

are different.
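
One way to apply these rules is to reduce every URL to a canonical form before comparing or printing it. The sketch below, built on Python's `urllib.parse.urlsplit` and `urlunsplit`, lower-cases the protocol (scheme) and host while leaving the directory part alone. Two further choices are assumptions not stated in the assignment: an empty path is treated as `/` (the scheme-based normalization RFC 3986 suggests for `http`), and any user-information part before `@` keeps its case. `normalize` is a hypothetical name.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Canonical form of url: scheme and host lower-cased, path case preserved."""
    parts = urlsplit(url)
    # Only the host (and port) after any "user@" prefix is case-insensitive.
    userinfo, at, host = parts.netloc.rpartition("@")
    netloc = userinfo + at + host.lower()
    # Assumption: an empty path means the same page as "/".
    path = parts.path or "/"
    return urlunsplit((parts.scheme.lower(), netloc, path, parts.query, parts.fragment))
```

Comparing `normalize(a) == normalize(b)` then treats the first three example URLs as the same page and the last two as different pages.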


This page last modified on 11 October 2003.