Programming Assignment 2, Advanced Programming II

Advanced Programming II, Spring 2002

Programming Assignment 2 - An Example Solution

Introduction
Design
Implementation
Index

Introduction

This page gives a solution to the second programming assignment, which involved writing a simple constant-value optimizer for statement blocks.

Reading the problem statement a couple of times fairly well establishes the overall purpose of the program: repeatedly read a statement block, optimize the statement block, then write the statement block.

Assuming we understand what a statement block is, this translates directly into a main() routine:

<main()>= (U->)

int main() {

  statement_block sblock;
  std::string sep;

  // All statement blocks of interest are either still in std-in or have
  // been read from std-in, optimized, and written to std-out.

  while (sblock.input(std::cin)) {
    sblock.optimize();

    std::cout << sep;
    sblock.output(std::cout);
    sep = "\n";
    }

  // All statement blocks of interest are either still in std-in or have
  // been read from std-in, optimized, and written to std-out, and std-in is
  // empty. 

  }

Defines main (links are to index).

Almost by inspection this code is correct, assuming statement blocks are implemented correctly. But what are statement blocks? Reading the problem a few more times gives us some hints, and main() supplies the rest.

The problem tells us that a statement block is a sequence of one or more statements, and that statements are never deleted from a statement block (this last bit comes from understanding the optimizations). These facts suggest that a vector of statements might be a good representation for a statement block, assuming we know what a statement is.

The problem suggests what statement blocks might be; main() tells us what statement blocks must do: input, optimize, and output. Together, we have enough information to develop a first-cut at defining a statement block:

<statement block class definition>= (U->)

class statement_block {

  public:

    bool input(std::istream &);
    void output(std::ostream &) const;
    void optimize(void);

  private:

    typedef std::vector<statement> sblock;

    sblock block;
  };

Defines statement_block (links are to index).

We now have two sub-problems to solve: figuring out how statement-block operations are implemented, and figuring out what a statement is. Because statements are probably going to figure in how statement-block operations are implemented, it's a good idea to assume we know what statements are, and use them to implement the statement-block operations.

Unfortunately, while main() tells us what the statement-block operations are, it doesn't tell us how they're implemented. Again, however, the problem statement tells us (perhaps indirectly) what we need to know. A statement block is a sequence of statements delimited by blank lines; ergo, to input a statement block we need to input the statements between blank lines and store them in a block:

<statement block procedure definitions>= (U->) [D->]

bool statement_block::
input(std::istream & ins) {

  block.clear();

  std::string s;

  if (!read_next_nonblank_line(ins, s)) 
    return false;

  do block.push_back(statement(s));
  while (getline(ins, s) && !is_blank_line(s));

  return true;
  }

Defines statement_block::input (links are to index).

main() requires that input() return false when it has no more blocks to read. Also note that we assume that a statement can be created from a string containing a textual-representation of a statement.

Outputting statement blocks is even easier than inputing it, because we don't have to go searching around std-in to find them.

<statement block procedure definitions>+= (U->) [<-D->]

void statement_block::
output(std::ostream & outs) const {

  // All the statements between 0 (inclusive) and i (exclusive) have been
  // written to std-out, and i is 0.

  for (sblock::size_type i = 0; i != block.size(); i++)
    block[i].output(outs);

  // All the statements between 0 (inclusive) and i (exclusive) have been
  // written to std-out, and i is equal to the number of statements in the
  // block.
  }

Defines statement_block::output (links are to index).

Optimizing statement blocks first appears as if it could be implemented in the same way as outputting them: let each statement optimize itself. This approach is almost correct, but overlooks an important point about constant folding: to correctly constant fold a statement, the optimization needs to know which variables have been equated to constants. The statement block is the proper place to manage this information, because individual statements don't have the overall scope required.

With these refinements, the design of the statement block optimizer is

optimize()
  mapping m

  for i = 0 to block.size()
    block[i].optimize(m)

where mapping is some data structure that maintains the variable-constant correspondence.

That takes care of the design of statement blocks. We still have to design statements, and we've introduced a mapping as a new data structure. A mapping appears to be a simple dictionary-like data structure with the variable as the key and the constant as the entry, so we won't consider it further in the design.

Turning to statements, we finally have to do some serious design work. The statement block design set the statement's public interface, but left open the choice of internal representation. Using a string to represent a statement is one possibility, but a little thought shows it's not a good possibility for at least two reasons:

it's difficult to manipulate a string in the way that the optimizations require,
and strings ignore almost all the structure present in statements, which makes them harder to analyze.

If a string is too hard to deal with, how about breaking it up into a sequence of substrings, where each substring is a piece of the original statement? This makes it easier to manipulate the statement - substituting a constant for a variable, for example - and makes some analyses easier - making sure values appear on either side of an operator, for example.

A sequence of substrings is an improvement over a single string, but it still doesn't exploit all the structure in a statement. In particular, it's difficult to analyze a sequence to determine operator order as determined by precedence and associativity. One possibility is to exploit the fact that a statement is made up of binary operators and represent the statement as a binary tree (treating assignment as an operator for the moment). Not only does this make the operation ordering obvious, but it gets rid of the parentheses, and is a handy form for optimizing.

The only problem with binary trees is we have to do a lot of work to implement them; we have define the node class and use difficult and dangerous dynamic memory programming to create the trees. It would be great if we could have trees in a form supported by the STL. And fortunately, we can walk the tree to linearize it and store the nodes in a vector. An inorder walk brings us back to a sequence of substrings; a pre-order walk puts the working end of the tree at the wrong end of the vector (the root would be at element 0), leaving a post-order walk to create a postfix representation.

To summarize, we are thinking about going from a string to a tree to a postfix stack:

But, the transformation still requires that we build a binary tree, which is what we were trying to avoid in the first place. Is there some way to go directly from the string to the postfix stack without a binary tree? A trip back to the problem statement gives us the answer we need. After re-reading the description of an expression and fiddling around a bit with a paper and pencil, it should be clear that a statement has the following form:

value₁ op₁ value₂ op₂ value₃ . . . value_n op_n value_{n + 1}

where n is at least 1 and value is either a variable, a constant or an expression between parentheses (in fact, value₁ is a variable, and op₁ is an assignment operator, but we'll ignore this for a moment).

Now we're almost there. Because all operators are left-associative, a right to left scan of statement is enough to generate a postfix representation for the statement: get the right operand value_{i + 1} and operator op_i, then recursively translate the left operand to postfix. Pushing the left, then right operand followed by the operator completes the transformation. If one of the values is a parenthesized expression, then recursively translate the expression inside the parentheses to postfix.

The last bit we have to clean up is value₁ and op₁, the variable and assignment statement. The problem here is assignment is right associative, not left associative. However, that's easily dealt with by stopping the transformation when it reaches the assignment operator and inserting it "by hand" on the top of the stack; this puts the assignment in proper postfix order, and the whole statement has been translated.

Before going on, it would be wise to make sure that optimization and output can work with a statement represented in postfix form. Output is equivalent to converting the postfix stack into a string, which is easily done with a recursive procedure:

postfix2string(stk)
  opr = stk.pop
  right = postfix2string(stk)
  if is_expression(right)
    right = "( " + right + " )"
  left = postfix2string(stk)
  return left + " " + opr + " " +right

There is one subtlety with this code: if the right operand is an expression, it must be placed in parentheses to make sure the whole expression is evaluated before the operation takes place.

Optimization has a similar recursive structure: recursively optimize the left and right operands, then optimize the expression itself.

optimize(stk)
  opr = stk.pop
  right = optimize(stk)
  left = optimize(stk)
  if left and right are constant
    optimize
  else
    push back left, right, and opr

That more or less concludes the design of an statement block optimizer. The one main issue left is dealing with errors. There are two techniques for dealing with errors: freaking out or pretending that nothing's wrong. Freaking out corresponds to throwing an exception, while pretending nothing's wrong corresponds to defining a special error statement. The disadvantage to using a special error statement is the code becomes cluttered with checks for errors (an error statement doesn't get optimized, for example); the disadvantage to using exceptions is providing correct resource management in the face of arbitrary termination. Because ugly code is marginally better than broken code, we'll assume a special statement to handle errors.

There are many other, smaller details to be worked out, but now that we have the general outline of the program designed, we'll have enough guidance to handle them as they come up during implementation.

Implementation

Statement Blocks

Most of the statement-block code is simple enough to be described directly in the design section. Statement-block optimization had a bit of complexity to it, and the design only sketched out an implementation.

Statement-block optimization proceeds statement-by-statement, with each statement doing its own optimization. If an error occurs during optimization, the whole block is replaced with the error statement.

<statement block procedure definitions>+= (U->) [<-D]

void statement_block::
optimize(void) {

  mapping e;

  for (sblock::size_type i = 0; i != block.size(); i++)
    if (!block[i].optimize(e)) {
      statement s = block[i];
      block.clear();
      block.push_back(s);
      break;
      }
  }

Defines statement_block::optimize (links are to index).

<statement-block.cc>=

#include "statement-block.h"
#include "str-utils.h"

<statement block procedure definitions>

Defines statement-block.cc (links are to index).

<statement-block.h>=

#ifndef _statement_block_h_defined_
#define _statement_block_h_defined_

#include <fstream>
#include <vector>
#include "statement.h"

<statement block class definition>

#endif

Defines statement-block.h (links are to index).

Statements

The externally visible parts of the statement class are given by the statement block class: create a statement from a string, optimize a statement under a particular set of variable assignments, and output a statement to a stream.

Within the class, a statement is represented in postfix form in the word stack stmt. Linearly, the stack top appears to the right; if s is x = y + 1, stmt ends up being x y 1 + = with = at the top of the stack and x at the bottom.

<the statement class definition>= (U->)

class statement {

  public:

    statement(const std::string &);
    bool optimize(mapping &);
    void output(std::ostream &) const;

  private:

    word_stk stmt;
  };

Defines statement (links are to index).

Inside the class, things are a bit messier. The constructor statement() uses a couple of helper routines to check for syntax errors and to translate s from infix to postfix.

<internal statement procedure definitions>= (U->) [D->]

static word_stk parse_expr(word_stk &);
static word_stk parse_value(word_stk &);

statement::optimze() uses a couple of helper routines to optimize a statement.

<internal statement procedure definitions>+= (U->) [<-D->]

static bool optimize(word &, const word &, const word &, const word &);
static word_stk optimize_expr(word_stk &, const mapping &);

And even statement::output() uses a helper routine to produce the string to output.

<internal statement procedure definitions>+= (U->) [<-D]

static bool output_stmt(string &, word_stk &);

statement() constructs a statement instance based on the statement given in s; the word stack stmt contains the postfix version of s. stmt is initialized to contain s in infix order; the body of the statement() turns the order into postfix. The statement in s is not optimized here; it is just checked for syntax errors and turned from infix to postfix form.

If a syntax error is discovered during the translation of s, stmt contains a single error word describing the problem.

<statement procedure definitions>= (U->) [D->]

statement::
statement(const string & s) : stmt(s) {

  assert(stmt.size());

  if (stmt.is_error())
    return;

  if ((stmt.size() < 3) || !(stmt.size() & 1)) {
    stmt.error("syntax error in statement");
    return;
    }

  word_stk expr = parse_expr(stmt);
  if (expr.is_error()) {
    stmt = expr;
    return;
    }

  if (stmt.size() != 2) {
    stmt.error("malformed syntax");
    return;
    }

  word assign = stmt.pop();
  if (!assign.is_type(word::assign_word)) {
    stmt.error(string("assignment operator expected, not \"") + 
               assign.prep + "\"");
    return;
    }

  if (!stmt.top_type(word::variable_word)) {
    stmt.error(string("variable expected to the left of an assignment, not \"") + 
               stmt.top().prep + "\"");
    return;
    }

  stmt.push(stmt.pop());
  stmt.push(expr);
  stmt.push(assign);
  }

%def

Try to optimize the binary expression l o r; if the expression can be optimized, place the result in opt and return true; otherwise return false.

<statement procedure definitions>+= (U->) [<-D->]

static bool
optimize(word & opt, const word & l, const word & o, const word & r) {

  if (l.is_type(word::constant_word) && r.is_type(word::constant_word)) {
    <evaluate a constant expression>
    return true;
    }

  <optimize some common arithmetic cases>

  return false;
  }

Defines optimize (links are to index).

Both operands are constants; evaluate the constant expression and store there result in opt. The argument-value checking is minimal, but the idea should be clear.

<evaluate a constant expression>= (<-U)

const int lval = atoi(l.prep.c_str());
const int rval = atoi(r.prep.c_str());

if (o.prep == "+")
  opt = word(lval + rval);
else if (o.prep == "-")
  opt = word(lval - rval);
else if (o.prep == "*")
  opt = word(lval*rval);
else if (o.prep == "/")
  opt = word(rval ? lval/rval : 0);
else if (o.prep == "%")
  opt = word(rval ? lval % rval : 0);
else if (o.prep == "<<")
  opt = word(rval >= 0 ? lval << rval : 0);
else if (o.prep == ">>")
  opt = word(rval >= 0 ? lval >> rval : 0);
else
  assert(!"Unrecognized operator in evaluate()");

Even if both operands aren't integer values, it may still be possible to optimize the binary expression using some arithmetic properties, such as

x +
0 = 0 + x = x

<optimize some common arithmetic cases>= (<-U)

if (o.prep == "+") {
  if (r.prep == "0") {
    opt = l;
    return true;
    }
  if (l.prep == "0") {
    opt = r;
    return true;
    }
  }
else if (o.prep == "-") {
  if (r.prep == "0") {
    opt = l;
    return true;
    }
  if (l.prep == r.prep) {
    opt = word(0);
    return true;
    }
  }
else if (o.prep == "*") {
  if (r.prep == "1") {
    opt = l;
    return true;
    }
  if (l.prep == "1") {
    opt = r;
    return true;
    }
  if ((l.prep == "0") || (r.prep == "0")) {
    opt = word(0);
    return true;
    }
  }
else if (((o.prep == "/") || (o.prep == "%")) && (r.prep == "1")) {
  opt = l;
  return true;
  }
else if ((o.prep == "/") && (l.prep == r.prep)) {
  opt = word(1);
  return true;
  }
else if (((o.prep == ">>") || (o.prep == "<<")) && (r.prep == "0")) {
  opt = l;
  return true;
  }

To optimize a statement, pop the assignment operator and optimize the expression on the right-hand side. After the expression has been optimized, stmt should contain the assignment's left-hand side. If the statement is an error statement; don't do anything. Return true if and only if optimization went o.k.; return false on an error.

<statement procedure definitions>+= (U->) [<-D->]

bool statement::
optimize(mapping & m) {

  assert(stmt.size());

  if (stmt.is_error())
    return false;
  
  word assign = stmt.pop();
  assert(assign.is_type(word::assign_word));

  word_stk expr = optimize_expr(stmt, m);

  assert((stmt.size() == 1) && (stmt.top_type(word::variable_word)));

  if ((expr.size() == 1) && expr.top_type(word::constant_word))
    m.associate(stmt.top(), expr.top());
  else if (m.has_association(stmt.top())) 
    m.drop_association(stmt.top());

  stmt.push(expr);
  stmt.push(assign);

  return true;
  }

Defines optimize (links are to index).

Optimize the postfix expression given at the top of expr in the context of the mapping m; return the optimized postfix expression in a word stack.

If the expression is a constant, return the constant. If the expression is a variable and the variable has an associated integer, return the integer (that is, fold the constant); otherwise, return the variable. If the expression is a binary expression, optimize the expression's two operands, and then optimize the expression itself, if possible (that is, try to compile-time evaluate constant expressions). Return the optimized result if it exists, otherwise return the binary expression (in postfix form).

<statement procedure definitions>+= (U->) [<-D->]

static word_stk 
optimize_expr(word_stk & expr, const mapping & m) {

  assert(expr.size());

  if (expr.top_type(word::constant_word))
    return word_stk(expr.pop());

  if (expr.top_type(word::variable_word)) {
    word v = expr.pop();

    if (m.has_association(v))
      v = m.get_association(v);

    return word_stk(v);
    }

  word opr = expr.pop();
  word_stk right = optimize_expr(expr, m);
  word_stk left = optimize_expr(expr, m);

  word_stk opt_e;
  word o(0);

  if (opr.is_type(word::operator_word) && 
      (left.size() == 1) && (right.size() == 1) &&
      optimize(o, left.top(), opr, right.top())) {
    opt_e.push(o);
    return opt_e;
    }
  else {
    opt_e.push(left);
    opt_e.push(right);
    opt_e.push(opr);
    }

  return opt_e;
  }

Defines optimize_expr (links are to index).

Printing a statement is most easily done by breaking it up into two parts: creating the string the print and then printing the string. statement::output() prints the string.

<statement procedure definitions>+= (U->) [<-D->]

void statement::
output(std::ostream & outs) const {

  word_stk st(stmt);
  string s;

  output_stmt(s, st);
  outs << s << "\n";
  }

Defines output (links are to index).

Translate the postfix expression given in the word stack stmt into an infix expression represented as a string; the expression translated is popped off stmt.

Variables, constants and errors are returned as themselves. Given a binary expression (and at this point we can consider assignment to be another binary operation), convert the two operands into infix strings and catenate them with the operator to form the infix string.

The return value is true if and only if the returned string is an expression with binary operators. The return value is used to determine if right operands need be surrounded by parentheses to preserve evaluation order. For example, without parentheses the right expression in a + ( b + c ) would be a + b + c which, because of left associativity, is not the order of evaluation specified by the original expression. When the right operand to a binary expression is another binary expression, the right operand needs to be surrounded by parentheses to make sure it is evaluated properly.

<statement procedure definitions>+= (U->) [<-D->]

static bool 
output_stmt(string & s, word_stk & stmt) {

  assert(stmt.size());

  word w = stmt.pop();

  if (w.is_type(word::variable_word) || 
      w.is_type(word::constant_word) ||
      w.is_type(word::error_word)) {
    s = w.prep;
    return false;
    }    

  assert(w.is_type(word::assign_word) || w.is_type(word::operator_word));

  string right;
  if (output_stmt(right, stmt) && !w.is_type(word::assign_word))
    right = string("( ") + right + " )";

  string left;
  output_stmt(left, stmt); 

  s = left + " " + w.prep + " " + right;

  return true;
  }

Defines output_stmt (links are to index).

Parse the expression at the top of the unparsed word stack and return it in postfix form. Expressions are parsed from right-to-left from the word stack unparsed; the right-most part of an expression must be a value. The right value may be preceded by an operator (as in left + right), an open parens (as in ( right )) or by the assignment operator (as in x = 1. In the last two cases, the expression is complete and the right value is returned as the expression (the open parens or assignment operator is left on the stack to be dealt with by the caller of parse_expr()).

If the right operand is followed by an operator, then the left operand is an expression because the operators are all left-associative. If the left operand can be successfully parsed, the expression found at the top of unparsed is returned in a new word stack in postfix form: the left expression followed by the right value followed by the operator (as in left right +.

<statement procedure definitions>+= (U->) [<-D->]

static word_stk 
parse_expr(word_stk & unparsed) { 

  assert(unparsed.size() && !unparsed.is_error());

  word_stk right = parse_value(unparsed);
  assert(unparsed.size());

  if (right.is_error() || 
      unparsed.top_type(word::lparen_word) ||
      unparsed.top_type(word::assign_word))
    return right;

  if (!unparsed.top_type(word::operator_word)) {
    const string & b = unparsed.top().prep;
    unparsed.error(string("operator expected, not \"") + b + "\"");
    return unparsed;
    }

  word opr = unparsed.pop();

  if (!unparsed.size()) {
    unparsed.error("missing value");
    return unparsed;
    }

  word_stk left = parse_expr(unparsed);
  if (left.is_error())
    return left;
    
  word_stk parsed(left);
  parsed.push(right);
  parsed.push(opr);

  return parsed;
  }

Defines parse_expr (links are to index).

Parse the value at the top of the word stack unparsed and return it in postfix form. A value is either a constant, a variable, or an expression in parentheses. The parsed value is returned in a word stack because an expression may contain more than one word.

<statement procedure definitions>+= (U->) [<-D]

static word_stk 
parse_value(word_stk & unparsed) { 

  assert(unparsed.size() && !unparsed.is_error());

  word_stk value;

  if (unparsed.top_type(word::rparen_word)) {
    unparsed.pop();

    value = parse_expr(unparsed);
    if (value.is_error())
      return value;

    if (!unparsed.top_type(word::lparen_word)) {
      unparsed.error(string("left parenthesis expected, not \"") + 
                     unparsed.top().prep + "\"");
      return unparsed;
      }

    unparsed.pop();
    }

  else if (unparsed.top_type(word::variable_word) || 
           unparsed.top_type(word::constant_word))
    value.push(unparsed.pop());

  else
    value.error(string("value expected, not \"") + unparsed.top().prep + "\"");

  return value;
  }

Defines parse_value (links are to index).

<statement.h>=

#ifndef _statement_h_defined_
#define _statement_h_defined_

#include <fstream>
#include <string>
#include "mapping.h"
#include "word-stk.h"

<the statement class definition>

#endif

<statement.cc>=

#include "statement.h"
#include "str-utils.h"
#include <cstdlib>

using std::string;

<internal statement procedure definitions>

<statement procedure definitions>

Defines statement.cc, statement.h (links are to index).

Words

The components of a statement - variables, constants and the rest - need to have a representation, which will be called a word. The rest of the code doesn't demand much of words; a word needs to indicate what it looks like in text - its print representation or prep - and it needs to indicate what type of word it is.

<word class definition>= (U->)

struct word {

    enum word_type {
      assign_word,
      constant_word,
      error_word,
      lparen_word,
      operator_word,
      rparen_word,
      variable_word
      };

    std::string prep;
    word_type type;

    word(const std::string &);
    word(int);

    bool is_type(word_type) const;
  };

Defines word (links are to index).

The primary word constructor accepts a word's print representation and uses it to set the word's type. This constructor can return a word of any type; any print representation that doesn't match any legal category is considered to be an error.

<word procedure definitions>= (U->) [D->]

word::
word(const string & s) : prep(s) {

  const char * letters = 
    "_ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
  const char * lad = 
    "_ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

  if (((s.length() == 1) && (s.find_first_of("+-*/%") == 0)) ||
      ((s.length() == 2) && ((s == ">>") || (s == "<<"))))
    type = operator_word;

  else if (s == "=")
    type = assign_word;

  else if (s == "(")
    type = lparen_word;

  else if (s == ")")
    type = rparen_word;

  else if (is_integer(s))
    type = constant_word;

  else if ((s.find_first_of(letters) == 0) &&
           (s.find_last_not_of(lad) == string::npos))
    type = variable_word;

  else
    type = error_word;
  }

Defines word (links are to index).

It will be convenient to have a constructor that returns an constant word given an integer value.

<word procedure definitions>+= (U->) [<-D->]

word::
word(int i) {

  const unsigned i_size = 15;
  char iprep[15];
  snprintf(iprep, i_size, "%d", i);

  type = constant_word;
  prep = iprep;
  }

Defines word (links are to index).

A predicate to test a word's type.

<word procedure definitions>+= (U->) [<-D]

bool word::
is_type(word_type t) const {
  return type == t;
  }

Defines is_type (links are to index).

<word.h>=

#ifndef _word_h_defined_
#define _word_h_defined_

#include <string>

<word class definition>

#endif

Defines word.h (links are to index).

<word.cc>=

#include "word.h"
#include <cstdio>
#include "str-utils.h"

using std::string;

<word procedure definitions>

Defines word.cc (links are to index).

The Word Stack

Like the name suggests: a stack of words. Constructors can build an empty word stack, a word stack containing a single, given word, and a word stack derived from a string containing a statement; in the last case, the rightmost word in the statement appears at the top of the stack.

<word_stk public interface>= (U->) [D->]
word_stk();
word_stk(const word &);
word_stk(const std::string &);

It's a stack, so it needs stackish operations. pop() returns the word popped, unlike the STL pop routines. In addition to pushing a single word on a stack, push() can push one word stack on another, so that the top of the given stack becomes the top of the pushed stack; the given stack is unchanged.

<word_stk public interface>+= (U->) [<-D->]
word top(void) const;
bool top_type(word::word_type) const;
word pop(void);
void push(const word &);
void push(const word_stk &);

Should an error be found in the statement being held in a word stack, the stack is changed to an error stack, which contains a single error word having the given error message as its print representation.

<word_stk public interface>+= (U->) [<-D->]
void error(const std::string &);

size() returns the number of words in the stack; is_error() returns true if the stack contains only the error word.

<word_stk public interface>+= (U->) [<-D]
unsigned size(void) const;
bool is_error(void) const;

The word stack constructors.

<word_stk procedure definitions>= (U->) [D->]

word_stk::
word_stk(void) { 
  }

word_stk::
word_stk(const word & w) { 
  wdq.push_back(w);
  }

word_stk::
word_stk(const string & s) { 

  std::vector<string> pieces = split(s);

  for (unsigned i = 0; i < pieces.size(); i++) {
    word w = word(pieces[i]);
    if (w.type == word::error_word) {
      wdq.clear();
      w.prep = string("Unexpected word \"") + w.prep + "\"";
      wdq.push_back(w);
      return;
      }
    else
      wdq.push_back(w);
    }
  }

Defines word_stk (links are to index).

The stack operations. These would be much easier to implement if we know about the STL stack adaptors, but because we don't, we'll do it the hard way (or the more educational way) using vectors.

<word_stk procedure definitions>+= (U->) [<-D->]

word word_stk::
top(void) const {
  assert(wdq.size());
  return wdq.back();
  }

bool word_stk::
top_type(word::word_type t) const {
  const unsigned s = wdq.size();
  assert(s);
  return wdq[s - 1].is_type(t);
  }

word word_stk::
pop(void) {
  assert(wdq.size());
  word w = wdq.back();
  wdq.pop_back();
  return w;
  }

void word_stk::
push(const word & w) {
  wdq.push_back(w);
  }

void word_stk::
push(const word_stk & wdq2) {
  for (unsigned i = 0; i < wdq2.size(); i++)
    wdq.push_back(wdq2.wdq[i]);
  }

Defines pop, push, top, top_type (links are to index).

Turn the word stack into an error word stack. This code is a bit dodgy because it assumes word() will interpret emsg as an error message. It always will for this program, but in general this is bad programming because the code should insure an error word gets created, rather than relying on the good behavior of the callers to error().

<word_stk procedure definitions>+= (U->) [<-D->]

void word_stk::
error(const string & emsg) {
  wdq.clear();
  wdq.push_back(word(emsg));
  }

Defines error (links are to index).

Word stack query functions.

<word_stk procedure definitions>+= (U->) [<-D]

bool word_stk::
is_error(void) const {
  return (size() == 1) && wdq[0].is_type(word::error_word);
  }

unsigned word_stk::
size(void) const {
  return wdq.size();
  }

Defines is_error, size (links are to index).

<word-stk.h>=

#ifndef _word_stk_h_defined_
#define _word_stk_h_defined_

#include <fstream>
#include <vector>
#include "word.h"

class word_stk {
  public:
    <word_stk public interface>
  private:
    std::vector<word> wdq;
  };

#endif

Defines word-stk.h (links are to index).

<word-stk.cc>=

#include "word-stk.h"
#include "str-utils.h"

using std::string;

<word_stk procedure definitions>

Defines word-stk.cc (links are to index).

Mappings

The optimizer uses a mapping to keep track of which variables are assigned constant values.

A mapping supports four operations:

Associate an integer with a variable.
Determine if a variable has an associated integer.
Return the integer associated with a variable.
Remove an variable-integer association.

All these operations can be performed by an STL map, but because we haven't talked about maps yet, we'll do it the hard way and make our own data structure.

<mapping public interface declarations>= (U->)

void associate(const word &, const word &);
bool has_association(const word &) const;
const word & get_association(const word &) const;
void drop_association(const word &);

An association is just a pair of words; and a mapping is a vector of associations.

<mapping private interface declarations>= (U->)

struct association { 
  word var, val;
  association(word var, word val) : var(var), val(val) { }
  };

std::vector<association> env;

int lookup(const word &) const;

Add the (var, val) association to the mapping; if var already is associated with an integer, val replaces the previous value.

<mapping procedure definitions>= (U->) [D->]

void mapping::
associate(const word & var, const word & val) {
  const int i = lookup(var);
  if (i < 0)
    env.push_back(association(var, val));
  else
    env[i].val = val;
  }

Defines associate (links are to index).

Return true if and only if var is associated with an integer in the mapping.

<mapping procedure definitions>+= (U->) [<-D->]

bool mapping::
has_association(const word & var) const {
  return lookup(var) != -1;
  }

Defines has_association (links are to index).

Return the integer associated with var in the mapping; die if var doesn't have an association.

<mapping procedure definitions>+= (U->) [<-D->]

const word & mapping::
get_association(const word & var) const {
  const int i = lookup(var);
  assert(i > -1);
  return env[i].val;
  }

Defines get_association (links are to index).

Drop the association involving var from the mapping; die if var doesn't have an association. This will be much easier to implement once we learn about iterators.

<mapping procedure definitions>+= (U->) [<-D->]

void mapping::
drop_association(const word & var) {
  const int i = lookup(var);
  assert(i > -1);
  env[i] = env[env.size() - 1];
  env.pop_back();
  }

Defines drop_association (links are to index).

Return the mapping index for the association involving var or -1 if there's no such association.

<mapping procedure definitions>+= (U->) [<-D]

int mapping::
lookup(const word & var) const {
  for (unsigned i = 0; i < env.size(); i++)
    if (env[i].var.prep == var.prep)
      return i;
  return -1;
  }

Defines lookup (links are to index).

<mapping.h>=

#ifndef _mapping_h_defined_
#define _mapping_h_defined_

#include <vector>
#include "word.h"

class mapping {

  public:

    <mapping public interface declarations>

  private:

    <mapping private interface declarations>
  };

#endif

Defines mapping.h (links are to index).

<mapping.cc>=

#include "mapping.h"

using std::string;

<mapping procedure definitions>

#ifdef MAPPING_TESTING

// g++ -o mapping-test -ansi -pedantic -Wall -DMAPPING_TESTING mapping.cc && mapping-test

int main() {
  mapping m;
  string var = "joe";
  int val = 1;
  assert(!m.has_association(var));
  m.associate(var, val);
  assert(m.has_association(var));
  assert(m.get_association(var) == val);
  m.drop_association(var);
  assert(!m.has_association(var));
  }

#endif

Defines mapping.cc (links are to index).

`main()`

<main.cc>=

#include <iostream>
#include <string>
#include "statement-block.h"

<main()>

Defines main, main.cc (links are to index).

String Utilities

A convenient definition of the string-index type.

<string utility prototype definitions>= (U->) [D->]

typedef std::string::size_type str_index;

Defines str_index (links are to index).

Return true if and only if the given string is blank; that is, contains only space characters. Empty lines are considered blank.

<string utility prototype definitions>+= (U->) [<-D->]

extern bool is_blank_line(const std::string &);

What space characters are.

<string utility prototype definitions>+= (U->) [<-D->]

extern const char * space_characters;

Split the given string into pieces at the given delimiter set (default is the space characters); only the non-empty pieces are returned.

<string utility prototype definitions>+= (U->) [<-D->]

extern std::vector<std::string> 
split(const std::string &, const char * = space_characters);

Return true if and only if the given string contains a possibly signed integer.

<string utility prototype definitions>+= (U->) [<-D->]

extern bool
is_integer(const std::string &);

Read the next non-blank line from the given input stream into the given string; return true if and only if the given string has a non-blank line.

<string utility prototype definitions>+= (U->) [<-D]

extern bool
read_next_nonblank_line(std::istream &, std::string &);

<string utility procedure definitions>= (U->)

bool is_blank_line(const string & s) {
  return s.find_first_not_of(space_characters) == string::npos;
  }


bool read_next_nonblank_line(std::istream & ins, string & s) {

  while (getline(ins, s))
    if (!is_blank_line(s)) return true;

  return false;
  }


vector<string> split(const string & s, const char * delimiters) {

  vector<string> pieces;

  str_index b = 0;
  const str_index sl = s.length();
  const string delims = delimiters;

  while (b < sl) {
    while ((b != sl) && (delims.find(s[b]) != string::npos)) b++;
    if (b == sl) break;

    str_index e = b + 1;
    while ((e != sl) && (delims.find(s[e]) == string::npos)) e++;
    pieces.push_back(s.substr(b, e - b));
    b = e + 1;
    }

  return pieces;
  }


bool is_integer(const string & s) {
  const char * digits = "012345679";
  return (s.find_last_not_of(digits) == string::npos) ||
         ((s[0] == '-') && (s.find_last_not_of(digits) == 0));
  }

Defines is_blank_line, is_integer, read_next_nonblank_line, split (links are to index).

<str-utils.h>=

#ifndef _str_utils_h_defined_
#define _str_utils_h_defined_

#include <string>
#include <vector>
#include <fstream>

<string utility prototype definitions>

#endif

Defines string-utils.h (links are to index).

<str-utils.cc>=

#include "str-utils.h"
#include <iostream>

using std::string;
using std::vector;

const char * space_characters = " \t\n\v";

<string utility procedure definitions>

#ifdef STR_UTILS_TESTING

// g++ -o str-utils-test -g -DSTR_UTILS_TESTING str-utils.cc && str-utils-test

#include <cassert>
#include <iostream>
#include <cstring>

int main() {
  assert(!is_blank_line("hello"));
  assert(is_blank_line("\n \v\t"));

  assert(split(" \n\t \v").empty());
  vector<string> svec = split(string("red white blue"), " ");
  assert(svec.size() == 3);
  assert(svec[0] == "red");
  assert(svec[1] == "white");
  assert(svec[2] == "blue");

  svec = split("red white blue", "e");
  assert(svec.size() == 3);
  assert(svec[0] == "r");
  assert(svec[1] == "d whit");
  assert(svec[2] == " blu");
  }

#endif

Defines str-utils.cc (links are to index).

Index

<evaluate a constant expression>: U1, D2
<internal statement procedure definitions>: D1, D2, D3, U4
<main()>: D1, U2
<main.cc>: D1
<mapping private interface declarations>: D1, U2
<mapping procedure definitions>: D1, D2, D3, D4, D5, U6
<mapping public interface declarations>: D1, U2
<mapping.cc>: D1
<mapping.h>: D1
<optimize some common arithmetic cases>: U1, D2
<statement block class definition>: D1, U2
<statement block procedure definitions>: D1, D2, D3, U4
<statement procedure definitions>: D1, D2, D3, D4, D5, D6, D7, D8, U9
<statement-block.cc>: D1
<statement-block.h>: D1
<statement.cc>: D1
<statement.h>: D1
<string utility procedure definitions>: D1, U2
<string utility prototype definitions>: D1, D2, D3, D4, D5, D6, U7
<str-utils.cc>: D1
<str-utils.h>: D1
<the statement class definition>: D1, U2
<word class definition>: D1, U2
<word procedure definitions>: D1, D2, D3, U4
<word.cc>: D1
<word.h>: D1
<word_stk procedure definitions>: D1, D2, D3, D4, U5
<word_stk public interface>: D1, D2, D3, D4, U5
<word-stk.cc>: D1
<word-stk.h>: D1

associate: U1, U2, D3, U4
drop_association: U1, U2, D3, U4
error: U1, U2, U3, U4, D5
get_association: U1, U2, D3, U4
has_association: U1, U2, U3, D4, U5
is_blank_line: U1, U2, D3, U4
is_error: U1, U2, U3, U4, U5, D6
is_integer: U1, U2, D3
is_type: U1, U2, U3, U4, U5, U6, D7, U8, U9
lookup: U1, U2, U3, U4, U5, D6
main: D1, U2, D3, U4
main.cc: D1
mapping.cc: D1
mapping.h: U1, D2, U3
optimize: U1, U2, U3, U4, U5, D6, D7, U8
optimize_expr: U1, U2, D3
output: U1, U2, U3, U4, D5
output_stmt: U1, U2, D3
parse_expr: U1, U2, D3, U4
parse_value: U1, U2, D3
pop: U1, U2, U3, U4, U5, U6, U7, D8
push: U1, U2, U3, U4, U5, U6, D7
read_next_nonblank_line: U1, U2, D3
size: U1, U2, U3, U4, U5, U6, U7, U8, U9, U10, U11, D12, U13, U14, U15
split: U1, U2, D3, U4
statement: U1, U2, U3, U4, U5, U6, D7, U8, U9, U10, U11, U12
statement_block: U1, D2, U3, U4, U5
statement-block.cc: D1
statement-block.h: U1, D2, U3
statement_block::input: D1
statement_block::optimize: D1
statement_block::output: D1
statement.cc: D1
statement.h: U1, D2
str_index: D1, U2
string-utils.h: D1
str-utils.cc: D1
top: U1, U2, U3, U4, U5, U6, D7
top_type: U1, U2, U3, U4, U5, U6, D7
word: U1, U2, U3, U4, U5, U6, U7, U8, U9, U10, U11, D12, D13, D14, U15, U16, U17, U18, U19, U20, U21, U22, U23, U24, U25, U26, U27, U28, U29, U30, U31, U32
word.cc: D1
word.h: D1, U2, U3, U4
word_stk: U1, U2, U3, U4, U5, U6, U7, U8, U9, U10, U11, U12, U13, D14, U15, U16, U17, U18
word-stk.cc: D1
word-stk.h: U1, D2, U3

This page last modified on 26 February 2002.