Programming Assignment 3 - Pattern-Directed Peephole Optimization

Advanced Programming II, Spring 2001


Table of Contents

Due Date

This assignment is due on Tuesday, 27 February, no later than 2:00 p.m.

See the assignment turn-in page for instructions on turning in your assignment.

Background

compiler structure A compiler translates source code into object code, but only the simplest compilers for simple languages do the translation in one step. More common is a multi-step translation in which the compiler translates the source code into an intermediate code, and then translates the intermediate code into the object code; see the figure to the left.

There are a number of reasons for preferring a multi-step translation over a single step one. Intermediate code helps to decouple the parts of a compiler, making it easier to break the compiler into modules and simplifying the task of each module. Multi-step translation also improves compiler portability; only the code generator needs to be rewritten to move the compiler to a new system.

A compiler may process intermediate code in many ways; this assignment is interested in optimizing intermediate code. The optimization process takes intermediate code that is deficient in some way - usually its either too big or too slow, or both - and improves it by reducing or eliminating the deficiencies - by, for example, making the intermediate code smaller and faster.

In particular, this assignment is interested in peephole optimization, so named because it looks at and optimizes only small pieces of the overall intermediate code, much as if the optimizer was looking through a peephole at the intermediate code.

The intermediate code used in this assignment has a simple form:

instruction:memory = value [ op value ]
memory:m [ value ]
value:memory / constant
constant:[ - ] number
number:digit / digit number
digit:0 / 1 / 2 / 3 / 4 / 5 / 6 / 7 / 8 / 9
op:+ / - / * / / / << / . . .
where [ a ] means a is optional and a | b means either a or b. Each instruction appears in one line of text (that is, instructions are not split across lines of text) and only one instruction appears per line of text. Here's a sample of intermediate code:
m[1] = 3
m[2] = m[m[1]]
m[2] = m[2] + m[3]

The Problem

This assignment involves implementing a pattern-directed peephole optimizer for intermediate code. Your optimizer should read a set of patterns from a file and then apply those patterns to an intermediate-code program, optimizing the program as much as possible.

Patterns

A pattern has the following structure
pattern text
->
pattern text
The pattern text above the arrow is known as the match text; the pattern text below the arrow is known as the replacement text.

Pattern text looks just like intermediate code with the addition of pattern variables, which have the form

$number
$0, $1, and $349 are examples of pattern variables.

Pattern variables may replace any component of intermediate code except for the components of a constant (a pattern variable may, however, replace the whole constant). Here are some examples of pattern text containing pattern variables and the components the pattern variables are replacing:

$1instruction
$2 = m[3] + 3memory
m[$1] = m[$2] + 3value
m[3] = m[4] $1 3op
A pattern is well defined if every pattern variable appearing in the replacement text also appears at least once in the match text; A pattern is undefined if a pattern variable appears in the replacement text but not in the match text. The pattern
m[$1] = m[$2] + $3
->
m[$1] = $3 + m[$2]
is well defined, because every pattern variable in the replacement text also appears in the match text, while the pattern
m[$1] = m[$2] + $3
->
m[$1] = $4 + m[$2]
is undefined because the pattern variable $4 appears in the replacement text but not in the match text. A pattern variable appearing in the match text need not appear in the replacement text.

Pattern Files

The patterns are stored in a text file called the pattern file. Patterns appear one after the other in the file; consecutive patterns are separated by at least one blank line (that is, a line containing only whitespace and ending in a newline). Any number of blank lines may appear before the first pattern and after the last pattern in the file. A pattern file may be empty.

A pattern file may contain comments, which are indicated by a sharp character (#) at the start of line. The entire line containing a comment is ignored. Comments will only appear between patterns; they are never part of a pattern.

You can find example pattern files in /export/opt/cs-509/pa3.

Pattern Matching

The optimizer works by matching patterns against the intermediate code. A pattern matches against intermediate code when the pattern's match text matches some part of the intermediate code according to the following rules

  1. Each line of match text matches a line of intermediate code. Consecutive lines of match text must match consecutive lines of intermediate code. For example, the pattern text
    m[1] = 0
    m[2] = m[1] + $1
    
    can never match the intermediate code instruction
    m[1] = 0
    
    because the pattern text needs to match two lines of intermediate code, and there's only one line to match.

  2. Non-pattern variable parts of the match text must match exactly - that is, character by character - the intermediate code. For example, the pattern text
    m[m[$1]] = 0
    
    does not match the intermediate-code instruction
    m[1] = 0
    
    because the instruction does not contain a match for the second m[ in the pattern text.

  3. A pattern variable matches all of the corresponding text in the intermediate code; the matched text is said to be bound to the pattern variable.

  4. White space is ignored when matching, as is case. For example, the pattern text
    m[$1] = m[$2] + 0
    
    matches the intermediate code instruction
    m[m[3]]=m[2]+0
    
    binding m[3] to $1 and 2 to $2; the pattern text also matches the intermediate code instruction
    m  [  m  [  3  ]  ]  =  m  [  2  ]  +  0
    
    with the same pattern-variable bindings as before, and the pattern text also matches the intermediate code instruction
    M  [  M  [  3  ]  ]  =  M  [  2  ]  +  0
    
    with the same pattern variable bindings.

  5. Once a pattern variable is bound to matched text, other occurrences of the same pattern variable in the match text must match the same text.

When a pattern is matched, the optimizer deletes the intermediate instructions matched by the pattern and inserts the pattern's match text with the pattern variables in the replacement text replaced with their bound text. For example, the pattern

$1 = $2 + 0
->
$1 = $2
matches the intermediate code
m[1]=m[3]+0
with the pattern variable $1 bound to m[1] and $2 bound to m[3]. Matched intermediate code is deleted and the matched text
m[1]=m[3]
is inserted in its place.

The optimizer repeatedly applies all patterns to all the intermediate code and stops when it can find no more matches.

Optimizer Operation

Your optimizer should accept the name of a pattern file on the command line and then read intermediate code from std-in; optimized intermediate code should be written to std-out. Only those patterns given in the pattern file should be applied to the intermediate code.

You can find example pattern files in /export/opt/cs-509/pa3.


This page last modified on 13 February 2001.