CS 509, Advanced Programming II

Spring 2001 - Test 4


  1. A quotation is a sequence of zero or more characters delimited by either single-quote characters (') or a double-quote characters ("); the same delimiter character must appear on both ends of the quotation. Any characters may appear in a quotation; if the delimiter character appears in the quotation, it must be doubled (that is, appear as '' or "").

    Write an algorithm that accepts a string and finds a quotation in the string, stopping at the end of the first quotation it finds, or at the end of the string. The input string may or may not contain a quotation, and the quotation may or may not be syntactically correct.


    a string recognizing fsm
    This FSM treats a string with no quotation as an error; alternatively, you could draw the end arrow from Start to End. Most people who got close to the right answer tripped up over strings such as """".


  2. Explain how backwards reasoning is useful in improving code performance.


    Backwards reasoning traces from the symptoms causing distress to the problem causing the symptoms. For performance, distress is caused by excessive resource use; too much time or space, for example. After measuring the program to determine the symptoms, backwards reasoning, along with detailed trace measurements, would trace back to the sections of code that are causing the excessive resource use.

    Notice how well the 80-20 rule works with backward reasoning. The 80-20 rule says that excessive resource use is generally caused by a small portion of the code. This means that you usually have only a few specific locations in code to find using backwards reasoning.


  3. What was the spam-filter performance problem that Kernighan and Pike described in Chapter 7? How did they fix it? Did they change the worst-case asymptotic running-time of the spam filter?


    Their spam filter had a simple double loop implementation:

    for each message m
      for each spam pattern p
        if p found in m then
          do something.
    
    This algorithm has asymptotic running time italit(O(MP)), where M is the number of messages to check and P is the number of spam patterns for which to check. In a production environment with a high volume of mail, both M and P are large enough turn the spam filter into a bottleneck.

    The greatest performance improvement came from specializing the inner loop to cut down on the number of iterations. By sorting the patterns and indexing on the first letter, the spam filter could quickly and efficiently locate the patterns starting with a particular letter.

    The worst-case asymptotic running time remains at italit(O(MP)) for the improved spam filter. The average-case asymptotic running time improved, although not as much as expected, for reasons explained by Kernighan and Pike in Chapter 7.


  4. What is the 80-20 rule? Explain why is it important.


    The 80-20 rule is a heuristic (or rule of thumb) stating that 80% of a program's execution time is spent in 20% of the code.

    The 80-20 rule is important because it suggests that a program with poor performance may have one or two fairly small sections of hot-spot code that are causing the problem. The first step in improving the program's performance is to find the hot spots and fix them.



This page last modified on 28 February 2000.