See the assignment turn-in page (last modified on 14 January 2006) for instructions on turning in your assignment.
Given a web-page concordance, a query on the concordance is a list of words. The result of the query is a list of web pages containing all the words in the query list (that is, each web page in the result contains all the words in the query list). If no web page contains all the words in the query, the query result is empty (that is, contains no web pages).
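One way to read this definition: if the concordance maps each word to the set of pages containing it, a query's result is the intersection of the page sets for its words. The sketch below assumes a hypothetical `Concordance` type (a `std::map` from word to a `std::set` of URLs) and a hypothetical `answer_query()` function; neither comes from the assignment. It also treats an empty query as having an empty result, which is a design choice and not something the assignment specifies.

```cpp
#include <algorithm>
#include <iterator>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical concordance type: each word maps to the URLs of the pages containing it.
typedef std::map<std::string, std::set<std::string> > Concordance;

// Returns the URLs of the pages that contain every word in query.
// Assumption: an empty query, or any word missing from the concordance, gives an empty result.
std::set<std::string>
answer_query(const Concordance & conc, const std::vector<std::string> & query) {
  std::set<std::string> result;
  if (query.empty()) return result;

  for (std::vector<std::string>::size_type i = 0; i < query.size(); ++i) {
    Concordance::const_iterator entry = conc.find(query[i]);
    if (entry == conc.end()) return std::set<std::string>();  // no page has this word

    if (i == 0) {
      result = entry->second;  // start with the pages containing the first word
    } else {
      // Keep only the pages that also contain query[i].
      std::set<std::string> both;
      std::set_intersection(result.begin(), result.end(),
                            entry->second.begin(), entry->second.end(),
                            std::inserter(both, both.begin()));
      result.swap(both);
    }
  }
  return result;
}
```

Because `std::set` keeps its elements sorted, `std::set_intersection` can merge two page sets in a single linear pass.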
Your code will be part of a web-browser proxy. The interface between your code and the proxy is the procedure
extern void wbproxy(const std::string & url, std::string & document);
defined in /export/home/class/cs-306/a4a/wbproxy.h. wbproxy() is called with either a web page or a query. If the input to your code is a web page, then the first argument is the page's URL and the second argument contains the page itself. If the input to your code is a query, then the first argument is the query itself and the second argument is empty (that is, document = "").
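Since the same procedure receives both kinds of input, a natural first step inside wbproxy() is to decide which kind it was given. The following is a minimal sketch, not a required structure: the helpers `index_page()`, `answer_query_url()`, and `is_query_url()` and the counter `pages_indexed` are hypothetical stand-ins, and the check for the `http://wpconc/` prefix is an extra assumption based on the query-URL form used in this assignment.

```cpp
#include <string>

// Hypothetical counter so the stub's effect is observable.
static int pages_indexed = 0;

// Hypothetical stub: a real version would add the page's words to the concordance.
static void index_page(const std::string & url, const std::string & page) {
  (void) url;
  (void) page;
  ++pages_indexed;
}

// Hypothetical stub: a real version would parse the query and list the matching pages.
static void answer_query_url(const std::string & query_url, std::string & document) {
  document = "<p>Results for " + query_url + "</p>\n";
}

// Assumption: query URLs start with "http://wpconc/".
static bool is_query_url(const std::string & url) {
  static const std::string prefix = "http://wpconc/";
  return url.compare(0, prefix.size(), prefix) == 0;
}

void wbproxy(const std::string & url, std::string & document) {
  if (document.empty()) {
    // An empty document signals a query; ignore empty pages that are not queries.
    if (is_query_url(url)) answer_query_url(url, document);
  } else {
    index_page(url, document);  // an ordinary page passing through the proxy
  }
}
```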
A query URL has the form

http://wpconc/word1,word2,...,wordn

where word1 through wordn are the words in the query. For example, the query

http://wpconc/quicksort,optimization,smallest

returns a page containing a list of all pages that contain the three words “quicksort,” “optimization,” and “smallest.”
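Extracting the query words amounts to removing the `http://wpconc/` prefix and splitting the remainder at the commas. The sketch below uses a hypothetical `query_words()` function; it assumes the words contain no commas, skips empty pieces such as those produced by `,,`, and does not decode URL escapes such as `%20`.

```cpp
#include <string>
#include <vector>

// Splits a query URL of the form http://wpconc/word1,word2,...,wordn into its words.
// Returns an empty vector if url is not a query URL.
std::vector<std::string> query_words(const std::string & url) {
  static const std::string prefix = "http://wpconc/";
  std::vector<std::string> words;
  if (url.compare(0, prefix.size(), prefix) != 0) return words;  // not a query URL

  std::string::size_type start = prefix.size();
  while (start <= url.size()) {
    std::string::size_type comma = url.find(',', start);
    if (comma == std::string::npos) comma = url.size();  // last word runs to the end
    if (comma > start) words.push_back(url.substr(start, comma - start));
    start = comma + 1;  // skip past the comma
  }
  return words;
}
```

For the example query above, `query_words()` would yield `quicksort`, `optimization`, and `smallest`, in that order.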
If the input to your code was a query, your code should create the contents for a web page containing the answers, if any, to the query. The contents should be the HTML text that goes between the <body> and </body> tags in an HTML document. The contents need not contain a <body> tag, nor any of the tags that usually precede the <body> tag in an HTML document; these will be added by the proxy. Similarly, the contents need not contain a </body> tag nor any of the tags that usually follow the </body> tag.
The web-page contents returned can be simple; for example, just a list of web pages satisfying the query. The web page can be made more useful, but don't tart it up too much (for example, no JavaScript).
The contents are stored in the second argument to wbproxy(). Don't forget to make sure the size is set correctly; the size should not include the null byte at the end of the content string.
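As an illustration, the contents could be a simple HTML list of links stored directly into `document`. The hypothetical `write_results()` and `html_escape()` functions below are a sketch, not part of the proxy interface; escaping `&`, `<`, `>`, and `"` is an added precaution, since URLs can contain characters such as `&`. Building the contents with `std::string` operations never appends a null byte; the size goes wrong mainly when copying from a C character buffer with a length that counts the terminating `'\0'`.

```cpp
#include <set>
#include <string>

// Escapes the characters that are special in HTML text and attribute values.
static std::string html_escape(const std::string & s) {
  std::string out;
  for (std::string::size_type i = 0; i < s.size(); ++i) {
    switch (s[i]) {
      case '&': out += "&amp;"; break;
      case '<': out += "&lt;"; break;
      case '>': out += "&gt;"; break;
      case '"': out += "&quot;"; break;
      default:  out += s[i];
    }
  }
  return out;
}

// Stores body-only HTML (no <body> or </body> tags) listing the pages in results.
void write_results(const std::set<std::string> & results, std::string & document) {
  if (results.empty()) {
    document = "<p>No pages contain all the query words.</p>\n";
    return;
  }
  document = "<ul>\n";
  for (std::set<std::string>::const_iterator i = results.begin(); i != results.end(); ++i) {
    std::string u = html_escape(*i);
    document += "  <li><a href=\"" + u + "\">" + u + "</a></li>\n";
  }
  document += "</ul>\n";
}
```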
The library libwbproxy.a in the assignment directory contains the code implementing a browser proxy. You link your code with libwbproxy.a to create the browser-proxy executable. The makefile in the assignment directory shows the commands needed to create the executable.
To use the proxy, open two windows. In one window, start the browser-proxy executable wpconc; type

$ ./wpconc
The proxy normally runs without producing any output to std-out or std-err.

In the other window, run a browser connected to the proxy; this forces all traffic to and from the browser to pass through the proxy. The easiest way to do this is to use a text-only browser, such as lynx, set to use wpconc as a proxy:
$ http_proxy=http://localhost:10306 lynx
The proxy details may vary from one browser to another; check the browser's documentation for more information.
By default, wpconc uses TCP port 10306 for communication with a browser. TCP ports are system-wide, which means only one program can use a particular port at a time. This could be a problem if several people are working on the same system, such as rockhopper. In such cases, the first person to run wpconc will get port 10306 and everybody else will get an error message:

$ ./wpconc
Can't bind to 10306 port: Address already in use.
$
In such cases, use the -p option to your proxy to specify an alternative port. Use the port 10xxx, where xxx are the last three digits of your student id. For example, if the last three digits of your student id are 123, use port 10123:
$ ./wpconc -p 10123

Don't forget to specify the alternative port when starting a browser:
$ http_proxy=http://localhost:10123 lynx
This page last modified on 10 November 2007.