CS 176, Introduction to Computer Science II

Summer 2001 - An Answer for Design Test 4


The problem is to design an algorithm that reads from an input stream all bytes upto and including a delimiter character and returns the bytes read in a memory block along with the number of bytes read. For concreteness, the prototype of the algorithm is

char * read_data(ifstream & ins, char delimiter, int & data_cnt);
where the return value is a pointer to the memory block and data_cnt contains the number of bytes read.

The point that makes the problem even slightly interesting is the fact that the total amount of data to be read is unknown. Ignoring, for a moment, this inconvenient fact, the reading algorithm is straightforward:

data_cnt = 0
while true

  // make sure there's enough space in data

  data[data_cnt] = ins.get()
  if !ins.good()
    return data

  if data[data_cnt++] == delimiter
    return data
This is a simple, "obviously correct" design, if perhaps a little inefficient due to the repeated calls to get(). I've chosen to fold the eof condition in with the other errors and treat them all the same: stop reading and return however much data's been read. This is an arguable decision, but is a minor issue when compared to the unknown data size; in addition, most other approaches for dealing with the eof condition are easily (but perhaps not simply) handled in this design.

Now for the space question. When data storage is full, a new, larger piece of data storage must be allocated, the contents of the old storage must be copied to the new storage, and the old storage must be deallocated. This too has a simple design:

if data_cnt >= data_capacity
  new_data_capacity = an increased value of data_capacity
  new_data = new char[new_data_capacity]

  memcpy(new_data, data, data_capacity)

  delete [] data_capacity
  data_capacity = new_data_capacity
data_capacity indicates the maximum number of bytes that can be stored in the memory block pointed to by data; memcpy() is a quasi-standard Unix way of copying bytes from one place to another.

The entire design has been reduced to the question of how to increase the value of data_capacity; this corresponds to determining the size of the new data block. The simplest approach to take is to increase it by one each trip through the loop:

new_data_capacity = data_capacity + 1
This design has the great virtue of being space efficient because new space isn't allocated unless it's needed (not quite, because no space is needed for eof but it's allocated anyway). Unfortunately, the design also has the great disadvantage of being time inefficient because each loop iteration allocates a new block, copies data from old to new blocks, and deallocates the old block. If you think about reading a couple of million bytes the problems here should be clear.

One way to deal with time inefficiency is to increase data_capacity by some value larger than 1:

new_data_capacity = data_capacity + capacity_increase
Now data are copied (and allocation-deallocation performed) approximately B/capacity_increase times during a call to read_data(), where B is the total number of bytes read. On the down side, there's the potential to waste up to capacity_increase bytes. In general, it's better to conserve time over space because it's easy and cheap to make more space (by adding more memory chips or disk) but hard and expensive to make more time (by adding a faster CPU and bus).


This page last modified on 27 August 2001.