The problem is to design an algorithm that reads from an input stream all bytes upto and including a delimiter character and returns the bytes read in a memory block along with the number of bytes read. For concreteness, the prototype of the algorithm is
char * read_data(ifstream & ins, char delimiter, int & data_cnt);
data_cnt
contains the number of bytes read.
The point that makes the problem even slightly interesting is the fact that the total amount of data to be read is unknown. Ignoring, for a moment, this inconvenient fact, the reading algorithm is straightforward:
data_cnt = 0 while true // make sure there's enough space in data data[data_cnt] = ins.get() if !ins.good() return data if data[data_cnt++] == delimiter return data
get()
. I've chosen to fold the
eof condition in with the other errors and treat them all the same: stop
reading and return however much data's been read. This is an arguable
decision, but is a minor issue when compared to the unknown data size; in
addition, most other approaches for dealing with the eof condition are easily
(but perhaps not simply) handled in this design.
Now for the space question. When data storage is full, a new, larger piece of data storage must be allocated, the contents of the old storage must be copied to the new storage, and the old storage must be deallocated. This too has a simple design:
if data_cnt >= data_capacity new_data_capacity = an increased value of data_capacity new_data = new char[new_data_capacity] memcpy(new_data, data, data_capacity) delete [] data_capacity data_capacity = new_data_capacity
data_capacity
indicates the maximum number of bytes that can be stored in
the memory block pointed to by data
; memcpy()
is a quasi-standard
Unix way of copying bytes from one place to another.
The entire design has been reduced to the question of how to increase the value of data_capacity; this corresponds to determining the size of the new data block. The simplest approach to take is to increase it by one each trip through the loop:
new_data_capacity = data_capacity + 1
One way to deal with time inefficiency is to increase data_capacity
by
some value larger than 1:
new_data_capacity = data_capacity + capacity_increase
capacity_increase
times during a call to read_data()
, where
B is the total number of bytes read. On the down side, there's the
potential to waste up to capacity_increase
bytes. In general, it's better
to conserve time over space because it's easy and cheap to make more space (by
adding more memory chips or disk) but hard and expensive to make more time (by
adding a faster CPU and bus).
This page last modified on 27 August 2001.