Computer Algorithms I CS 305 Class Notes


4 November 2003 - Code Review for Assignment 3

Assignment 3

Design
Tests
Implementation

Word Counting

If we had a list of words and counts in a document, we could solve this problem.

O.k. *poof* you've got it; now what?

mitm(document)

  word_list 
    * words = words_and_counts(document)
    * wp

  for (wp = words; wp; wp = wp->next)
    std::cout << wp->count << " " 
              << wp->word << "\n"

  wp = words
  while wp
    words = wp->next
    delete wp
    wp = words

Designing While Wishing

You can design a lot just by wishing.
What do we know about the list of words and counts?
- It's got three fields: count, word, and next.
- It's apparently a linked list.
Most of these features come directly from the problem statement.
Modify your word-scraping version of mitm() to output to std-out a count of the number of times each word appears in the given document. [ . . . ] Lines should be output in order of increasing count;
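Written out as compilable C++, the wished-for list and the two loops in mitm() might look like the sketch below. The field types, the name print_and_delete(), and the output-stream parameter are assumptions made for illustration; only the three field names come from the design above.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// One element of the word-count list: the three fields named above.
// The field types are assumptions.
struct word_list {
    unsigned    count;
    std::string word;
    word_list  *next;
};

// Print one "count word" line per element, then return the list to the heap.
// The stream parameter (an assumption) lets a test capture the output.
void print_and_delete(word_list *words, std::ostream &out) {
    for (word_list *wp = words; wp; wp = wp->next)
        out << wp->count << " " << wp->word << "\n";

    while (words) {
        word_list *wp = words;   // remember the element to free
        words = wp->next;        // step past it before deleting
        delete wp;
    }
}
```

Passing std::cout gives the behavior of mitm(); passing a std::ostringstream lets a test check the lines that were written.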

Designing The Word-Count List

From where does the word-count list come?
- If I had a list of every word in the document, I could make it.
O.k. *poof* you've got it.
- Oh, and a procedure to count the words, and a procedure to sort the counted words.

O.k. *poof* you've got them.

word_list *
words_and_counts(document)

  return
    sort_words_by_count(
      count_words(
        words_in_document(document)))

More Design

words_in_document() has already been (mostly) done.
- It has to be updated to put the words in a list.
sort_words_by_count() should be straightforward.
That leaves count_words().
- Counting words in a list seems hard.
  - There are duplicates to deal with.
  - And list elements to delete and move.

Redesign

count_words() seems like a lot of work.
- It seems easiest to build the list from scratch by counting words.

In that case, get rid of count_words() and redo words_in_document() to count the words too.

O.k.

word_list *
words_and_counts(document)

  return
    sort_words_by_count(
      words_in_document(document))
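A minimal sketch of a words_in_document() that counts as it builds the list. It assumes the text has already been stripped of tags and that words are separated by white space; the assignment's real word scraping is not shown, and the std::string parameter is an assumption.

```cpp
#include <cassert>
#include <sstream>
#include <string>

struct word_list {
    unsigned    count;
    std::string word;
    word_list  *next;
};

// Build the word-count list directly: bump the count of a word that is
// already in the list, otherwise push a new element with count 1.
// Assumption: text is tag-free and words are white-space separated.
word_list *words_in_document(const std::string &text) {
    word_list *words = nullptr;
    std::istringstream in(text);
    std::string w;

    while (in >> w) {
        word_list *wp = words;
        while (wp && wp->word != w)   // look for an existing element
            wp = wp->next;
        if (wp)
            ++wp->count;
        else
            words = new word_list{1, w, words};
    }
    return words;
}
```

No element is ever deleted or moved, which is the point of the redesign. The price is a list scan per word, so the cost grows with the number of distinct words.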

A Recap

One new function, two old functions modified, a new data structure.
- words_in_document() creates a word-count list.
- sort_words_by_count() is new.
- mitm() outputs words and counts.
- A linked list to bind them all together.
This is about 30-50 extra semicolons.
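The new function, sort_words_by_count(), could be as small as the insertion sort sketched below. Insertion sort is my choice for illustration, not necessarily what the assignment solution used; the field types are assumptions.

```cpp
#include <cassert>
#include <string>

struct word_list {
    unsigned    count;
    std::string word;
    word_list  *next;
};

// Insertion sort by relinking: take elements off the front of the input
// one at a time and splice each into a second list kept in increasing
// order of count. Elements with equal counts keep their relative order.
word_list *sort_words_by_count(word_list *words) {
    word_list *sorted = nullptr;

    while (words) {
        word_list *wp = words;        // detach the first unsorted element
        words = words->next;

        word_list **link = &sorted;   // the link wp will hang from
        while (*link && (*link)->count <= wp->count)
            link = &(*link)->next;
        wp->next = *link;
        *link = wp;
    }
    return sorted;
}
```

Walking a pointer to the link, rather than a pointer to the element, removes the special case for inserting at the head.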

Test Cases

The test cases were
- Empty page and body tests; broken body test.
  - <body broken body.
- One thousand unique words test.
- Fifty different words test, word_i has count i.
- Anti-garbage test.
  - Minimum space required for one run.
    - ulimit -v 2650
  - Space used for 700 runs.

Deadlines Kill!

Unless you're perfect, don't wait until the last minute to turn in your assignment.

Date: Thu, 30 Oct 2003 20:52:44 (EST)
Subject: Submitted files
From: s0------@monmouth.edu
To: rclayton@monmouth.edu

When I submitted my files today I first accidentally submitted files in the wrong directory. I then resubmitted with the correct files. I received the email for the first submission after I received the email for the second one. I would like to make sure that the second submission was the one that was officially submitted at the deadline.

Unfortunately, it was not.

Avoid Convolutions

What is going on here?

void display_results()

  unsigned MyCnt = 4294967295U;
  node* NodeCountMax = NULL;
  node* curNode = head;

  for i = 0; i < curNode->size; i++
   do
     if (curNode->count < MyCnt) && 
        not curNode->displayed
       NodeCountMax  =  curNode
       MyCnt = curNode->count
     curNode = curNode->next
   while curNode
   if (NodeCountMax) {
     cout << "whatever..."
     NodeCountMax->displayed = true
     // decrease MyCnt = 0
     MyCnt = 4294967295U
     curNode = head

Be Straightforward

Do one thing at a time.
- Order numbers, then print them.
- Later, maybe, combine them.
No magic numbers. (4294967295U?)

Boolean flags almost always indicate bad design.

void display_results(node * head)

  node dummy = { "", head }
  
  while dummy.next
    max = &dummy
    for n = max->next; n->next; n = n->next
      if (max->next->count < n->next->count)
        max = n

    // max is the node before the largest one
    out << max->next->count << max->next->word

    n = max->next
    max->next = max->next->next;
    delete n
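A compilable sketch of the same routine. The field types and the output-stream parameter are assumptions. As written it prints the largest count first; flipping the comparison gives increasing order.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

struct node {
    unsigned    count;
    std::string word;
    node       *next;
};

// Repeatedly unlink the node with the largest count, print it, delete it.
// max always points at the node *before* the current largest, so the
// dummy head gives the first real node a predecessor too.
void display_results(node *head, std::ostream &out) {
    node dummy = {0, "", head};

    while (dummy.next) {
        node *max = &dummy;
        for (node *n = dummy.next; n->next; n = n->next)
            if (max->next->count < n->next->count)
                max = n;

        out << max->next->count << " " << max->next->word << "\n";

        node *n = max->next;          // unlink and free the printed node
        max->next = n->next;
        delete n;
    }
}
```

No flag, no magic number, and the list is empty when the routine returns.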

Exploit Existing Features

There's too much of this

while (counter < sitedata.size())
  if (sitedata[counter] == '<')
    ++counter

    while (sitedata[counter] == ' ')
      ++counter

    while (sitedata[counter] != '>')
      ++counter
  ++counter

counter can be run off the end of the string.

And not enough of this.

while (i = data.find("<", i)) != npos
  i = data.find_first_not_of(" ", i + 1)
  if i == npos, break

  i = data.find(">", i + 1)
  if i == npos, break

See Section 3.4 in Nyhoff or Chapter 19 in Deitel & Deitel.
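The same find-based pattern, sketched as a complete function that drops tags from a string. The name strip_tags() and the decision to discard everything after an unclosed tag are assumptions made for illustration.

```cpp
#include <cassert>
#include <string>

// Copy the text that lies outside <...> tags. Every index comes from
// find(), so nothing can run off the end of the string.
// Assumption: an unclosed tag swallows the rest of the input.
std::string strip_tags(const std::string &data) {
    const std::string::size_type npos = std::string::npos;
    std::string text;
    std::string::size_type i = 0;

    while (i < data.size()) {
        std::string::size_type lt = data.find('<', i);
        // keep everything up to the next tag (or to the end)
        text.append(data, i, lt == npos ? npos : lt - i);
        if (lt == npos)
            break;

        std::string::size_type gt = data.find('>', lt + 1);
        if (gt == npos)
            break;                    // broken tag: nothing more to keep
        i = gt + 1;
    }
    return text;
}
```

Each test against npos takes the place of a bounds check the hand-written loop forgot.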

No, Really - Use Existing Features

I don't want to see code like this any more.

for int i = 0; document.data[i] != '\0'; i++
 if document.data[i] == '<'
   i++
   while (isspace(document.data[i])) i++
   if toupper(document.data[i]) == 'B'
     i++
     if toupper(document.data[i]) == 'O'
       i++
       if toupper(document.data[i]) == 'D'
         i++
         if toupper(document.data[i]) == 'Y'
           i++
           while document.data[i] != '>'
             i++
             if document.data[i] == '>'
               start = i + 1
         else
           i--
       else
         i--
     else
       i--
   else
     i--
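For contrast, a sketch of the same search written with std::string members. The name body_start() is an assumption. Unlike the code above, it also accepts any tag whose name merely begins with "body".

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Return the index just past the '>' that closes the first <body ...> tag,
// or npos if there is no such tag. Case is ignored by searching a
// lower-cased copy of the data.
std::string::size_type body_start(const std::string &data) {
    const std::string::size_type npos = std::string::npos;

    std::string lower(data);
    std::transform(lower.begin(), lower.end(), lower.begin(),
                   [](unsigned char c) {
                       return static_cast<char>(std::tolower(c));
                   });

    std::string::size_type i = 0;
    while ((i = lower.find('<', i)) != npos) {
        i = lower.find_first_not_of(" \t\r\n", i + 1);  // skip blanks after '<'
        if (i == npos)
            break;
        if (lower.compare(i, 4, "body") == 0) {
            std::string::size_type gt = lower.find('>', i + 4);
            return gt == npos ? npos : gt + 1;
        }
    }
    return npos;
}
```

There is one level of nesting and no index arithmetic to undo when a match fails.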

Write Without Echoes

Cut 'n' paste is not programming.

if ((data[i] == 'b' || data[i] == 'B') &&
    (data[i+1] == 'o' || data[i+1] == 'O') &&
    (data[i+2] == 'd' || data[i+2] == 'D') &&
    (data[i+3] == 'y' || data[i+3] == 'Y')) {
  // whatever
  }

// And sometime later...

if ((data[i] == '/') &&
    (data[i+1] == 'b' || data[i+1] == 'B') &&
    (data[i+2] == 'o' || data[i+2] == 'O') &&
    (data[i+3] == 'd' || data[i+3] == 'D') &&
    (data[i+4] == 'y' || data[i+4] == 'Y') &&
    (data[i+5] == '>')) {
  // whatever
  }

It bulks up code.
It's hard to fix errors or make changes.

Factor Duplicated Code

If you see duplicated code, put it in a subroutine.

bool
has_str(string str, int i, char * word)

  if str.size() < i + strlen(word)
    return false

  for j = 0; j < strlen(word); j++
    if str[i + j] != word[j]
      return false

  return true

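or, even better, let the library do the scanning too:

str.find(word, i);

find() returns the index of the first match at or after i, or npos if there is none.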
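A compilable sketch of has_str(). Ignoring case is an assumption added here so that one routine can replace both of the echoed tests; the parameter types are also changed to std::string.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// True when word occurs in str starting exactly at index i, ignoring case.
// Returns false, rather than reading past the end, when str is too short.
bool has_str(const std::string &str, std::string::size_type i,
             const std::string &word) {
    if (i > str.size() || str.size() - i < word.size())
        return false;

    for (std::string::size_type j = 0; j < word.size(); ++j) {
        unsigned char a = str[i + j];
        unsigned char b = word[j];
        if (std::tolower(a) != std::tolower(b))
            return false;
    }
    return true;
}
```

The two echoed tests then shrink to has_str(data, i, "body") and has_str(data, i, "/body>").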

Some Interesting Code

What do you think about this code?

item & list::
operator [](unsigned x)
  if (x < size) && (x >= 0)
    node* spot = head
    for (unsigned i = 0; i < x; i++)
      spot=spot->next
    return spot->data
  else
    cerr << x << " does not exist\n"
    exit(1);

Here's what I think:

item & list::
operator [] (unsigned x)
  node * spot = head
  while spot and x--
    spot = spot->next
  if spot
    return spot->data
  cerr << "bad list index"
  abort()

Some Further Thoughts

List indexing is an example of a pun: giving a common thing a different meaning.
- Using vector indexing on lists.
- Some people consider punning a bad practice.
  - It sows confusion and doubt.
The main problem with list indexing is that it's expensive.
```
for (i = 0; i < lst.size(); i++)
  if (lst[i] == data)
    return true
return false
```
- This is an O(n²) search.
- List searching should be O(n).
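Where the n² comes from: lst[i] walks i links from the head, so the loop follows 0 + 1 + ... + (n - 1) = n(n - 1)/2 links in the worst case. A sketch of the O(n) alternative, which follows each link once; the node layout and the name contains() are assumptions.

```cpp
#include <cassert>

struct node {
    int   data;
    node *next;
};

// Linear search: walk the list once, following each link at most one time.
bool contains(const node *head, int data) {
    for (const node *n = head; n; n = n->next)
        if (n->data == data)
            return true;
    return false;
}
```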

Points to Remember

The problem tells you what you should wish for.
- Make sure you understand the problem.
The answer to your wishes will tell you a lot.
- Make sure you're paying attention.
Write small code, write simple code.
- Make your meaning clear.
Know your tools, and use them.
- There's no excuse not to.

This page last modified on 10 November 2003.