Advanced Programming II, Spring 2001

Programming Assignment 6 - An Example Solution


Table of Contents

The Problem

The motivation for this problem came from Kernighan and Pike's section on programs that write programs. The idea of recognizing command prefixes is an old one, although it seems to have fallen by the wayside with the rise of GUIs. Tenex, an operating system for the DECSYSTEM-10 time-sharing computer, had a particularly powerful prefix-based recognition system that included command completion and full help capabilities. The code was part of the operating system (those were the days of monolithic operating systems), which meant that all commands provided the same level of support.

The Algorithm

A little bit of thought about this problem shows two things: the main algorithm is fairly simple, and the important algorithm is the one for the generated code.

The main algorithm has the standard read-evaluate-print structure, where read means input the commands, evaluate means drop the duplicates, and print means generate the code. All these parts are straightforward, particularly because you aren't allowed to play with files other than cin and cout.

The algorithm for the generated code is also pretty simple. The commands are stored in the static, global, constant array cmds; the matches to a particular command are stored in the array matches.

#include <string>
#include <cassert>
#include <cctype>

using std::string;

static const unsigned cmd_cnt = 3;

static const string cmds[] = {
  "red",
  "white",
  "blue",
  ""
  };

static string matches[cmd_cnt + 1];
In this example there are three commands: red, white, and blue. The last, empty string solves two trivial but annoying problems. First, it eliminates having to worry about the trailing-comma problem, where the last command in the list should not be followed by a comma. Second, it also eliminates having to worry about the no-command case, because there'll always be at least one string in the commandlist.

The code also neds a case-insensitive prefix match routine.

static bool prefix_match(const string & p, const string & s) {
  if (p.length() > s.length()) return false;
  for (int i = p.length() - 1; i >= 0; i--)
    if (tolower(p[i]) != tolower(s[i])) return false;
  return true;
  }
command() just looks for prefix-matches between the input command cmd and the command list. Matches are copied into the return array matches; although more expensive, copying is safer because it isolates the command list from external change.
const string * command(const string & cmd) {

  unsigned m = 0;

  for (unsigned i = 0; i < cmd_cnt; i++)
    if (prefix_match(cmd, cmds[i]))
      matches[m++] = cmds[i];

  assert(m <= cmd_cnt);

  matches[m] = "";

  return matches;
  }

The Program

Everything in one file this time. Because we know about STL vectors now, we'll use them linked lists to store the commands. typedefs are your friend when it comes to the STL because templated class name often become long and unwieldy.
<gen-cmds.cc>=

#include <stdlib.h>
#include <iostream>
#include <vector>
#include <algorithm>
#include <functional>
#include <string>

using std::vector;
using std::string;
using std::cin;
using std::cout;
using std::cerr;
using std::ostream;

typedef vector<string>       string_vec;
typedef string_vec::iterator svec_iter;

<the procedure generator>

<equal-string comparison>

<main()>
Defines gen-cmds.cc, string_vec, svec_iter (links are to index).

A standard main(): read the input, process it, and produce the output.

<main()>= (<-U)

int main(void) {

  string_vec commands;
  string cmd;

  while (cin >> cmd) {

    if (cin.bad()) {
      cerr << "Error during command input.\n";
      return EXIT_FAILURE;
      }

    const svec_iter e = commands.end();
    if (e == find_if(commands.begin(), e, bind1st(ptr_fun(finder), cmd)))
      commands.push_back(cmd);
    }
  
  assert(cin.eof());

  generate_cmds_proc(cout, commands);

  return EXIT_SUCCESS;
  }
Defines main (links are to index).

The find_if() call in main() checks the command vector to see if the new string cmd already exists in the vector. find_if(s, e, p) is an STL algorithm that searches the elements in the iterator range (s, e) for the first element i that causes the unary predicate p(i) to return true. Unfortunately, the code in main() needs a binary predicate to compare new and old strings. To be acceptable to find_if(), the binary predicate needs to be turned into a unary predicate. This is possible to do in this case because one of the predicate's two arguments is always the new string.

Appropriately enough, the STL has some routines that help solve this problem. The STL provides binder adaptors that construct unary functions from binary functions. The bind1st(f, a) adaptor constructs a unary function u() by binding the unchanging argument a as the binary function f()'s first (or left) parameter:

u(x) = f(a, x)
The bind2nd(f, a) adaptor binds a as f()'s second (or right) parameter:
u(x) = f(x, a)
In either case u() is a unary function suitable for use in STL algorithms.

But the problem isn't quite solved yet. The f() argument to bind1st() isn't a function pointer, but rather a function object, also called a functor. Function objects will be described in more detail below; for now, it's enough to know the STL adaptor ptr_fun() accepts a pointer to a function and returns the equivalent function object. Now the problem is solved, as indicated by the code in main().

finder() implements a case insensitive, full-string comparison.

<equal-string comparison>= (<-U)

static bool finder(const string oldstr, const string newstr) {
  if (oldstr.length() != newstr.length()) return false;
  for (int i = newstr.length() - 1; i >= 0; i--)
    if (tolower(newstr[i]) != tolower(oldstr[i])) return false;
  return true;
  }
Defines finder (links are to index).

generate_cmds_proc() writes to the output stream out a procedure for recognizing the commands given in cmds. The structure of the code written to out is discussed in the Algorithm Section.

<the procedure generator>= (<-U)

<the printer>

static void generate_cmds_proc(ostream & out, const string_vec & cmds) {

  const unsigned cmd_cnt = cmds.size();

  out << "#include <string>\n";
  out << "#include <cassert>\n";
  out << "#include <cctype>\n";
  out << "\n";
  out << "using std::string;\n";
  out << "\n";
  out << "static const unsigned cmd_cnt = " << cmd_cnt << ";\n";
  out << "\n";
  out << "static const string cmds[] = {\n";

  for_each(cmds.begin(), cmds.end(), printer(out));

  out << "  \"\"\n";
  out << "  };\n";
  out << "\n";
  out << "static string matches[cmd_cnt + 1];\n";
  out << "\n";
  out << "\n";
  out << "static bool prefix_match(const string & p, const string & s) {\n";
  out << "  if (p.length() > s.length()) return false;\n";
  out << "  for (int i = p.length() - 1; i >= 0; i--)\n";
  out << "    if (tolower(p[i]) != tolower(s[i])) return false;\n";
  out << "  return true;\n";
  out << "  }\n";
  out << "\n";
  out << "\n";
  out << "const string * command(const string & cmd) {\n";
  out << "\n";
  out << "  unsigned m = 0;\n";
  out << "\n";
  out << "  for (unsigned i = 0; i < cmd_cnt; i++)\n";
  out << "    if (prefix_match(cmd, cmds[i]))\n";
  out << "      matches[m++] = cmds[i];\n";
  out << "\n";
  out << "  assert(m <= cmd_cnt);\n";
  out << "\n";
  out << "  matches[m] = \"\";\n";
  out << "\n";
  out << "  return matches;\n";
  out << "  }\n";
  }
Defines generate_cmds_proc (links are to index).

generate_cmds_proc() uses for_each() to print the strings in the initializer list for the string array cmds. for_each(s, e, f) is an STL algorithm that applies the unary function f() to each element in the iterator range (s, e). The print function that outputs the initializer values takes as arguments the output stream to write and the string. Because the print routine is a binary function, it can't be used as an argument to for_each().

This problem is similar to the string-finding problem discussed above, and can be solved in the same way, using bind1st() and ptr_fun(). However, this problem can be solved more directly, albeit with a little more work, using function objects.

One way to turn a binary function into a unary function is to pass one of the arguments into the function in some way other than through the argument list:

static string newstr = new_string;
bool same(string oldstr) { return newstr == oldstr; }
This works, but the global variable makes this a messy solution. A better solution delimits the scope of the global string by wrapping everything in a class:
struct compr {
  const string new_string;
  compr(string s) : new_string(s) { }
  bool same(string s) { return s == new_string; }
  }
Unfortunately, creating a new class doesn't work because find_if() is expecting a function, not a class.

C++, however, does offer a work-around for this problem: overloading the calling operator (). If a class C defines operator(), then objects of type C can be called as if they were functions:

C c;
c();    // equivalent to c.operator()(), calls C::operator().
Objects for which operator() is defined are known as function objects or functors.

By redefining the class compr to have operator()

struct compr {
  const string new_string;
  compr(string s) : new_string(s) { }
  bool operator()(string s) { return s == new_string; }
  }
objects of type compr become function objects that respond to a function call having a single string argument; that is, they become acceptable as the third argument to find_if().

Defining function objects is a general C++ technique; it is not limited to use with STL, and it can turn a function at a given arity into a function at any smaller arity. The STL adaptors bind1st() and ptr_fun() (and others not discussed here) essentially construct function objects in the way just described. However, it doesn't make any difference to the STL algorithms how function objects are created (mostly, there are a few exceptions) as long as the function objects exhibit the proper syntactic and semantic structure.

printer is a function-object class that writes to the output stream out the string str as part of an initializer for a string array. In this case out is the unchanging argument; all initializers get written to the same output stream.

The escape() routine requires a bit of explaining. Commands may contain any characters, including double quote and backslash characters. Putting such characters into the initializer list unescaped will confuse the compiler; for example, the command he"o would become "he"o" and chaos would ensue. Escaping these dangerous characters keeps the syntax straight.

<the printer>= (U->)

class printer {

  public:

    printer(ostream & _out) : out(_out) { }

    void operator()(const string & str) {
      out << "  \"" << escape(str) << "\",\n";
      }

  private:

    ostream & out;

    const string & escape(const string & str) {
      static string cpy;
      cpy = str;
      unsigned i = 0;
      while ((i = cpy.find_first_of("\\\"", i)) != string::npos) {
        cpy.insert(i, "\\");
        i += 2;
        }
      return cpy;
      }
  };
Defines printer (links are to index).

Index


This page last modified on 16 April 2001.