This problem involves keeping track of statistics on a file. Do the easiest part first - open a file and then close it:
ifstream ins(filename) if !ins couldn't open filename exit // process the file ins.close()
Now to process the file. The problem statement describes the statistics to be collected mostly in terms of lines (the largest, the smallest, and the total number), the remaining statistic (the total number of characters) can be determined from the lines.
The problem's orientation towards lines suggests that handling the input in
terms of lines might be a good approach to take. Using wishful thinking, the
algorithm designer's most powerful design tool, let's assume we know how to set
line_size
to the number of characters in the current input line, or to -1
if there's no more lines. The file processing algorithm follows immediately:
// process the file min_line_size = a big number max_line_size = 0 line_count = 0 char_count = 0 while true // find the line size if line_size < 0 break char_count += line_size line_count++ if min_line_size > line_size min_line_size = line_size if max_line_size < line_size max_line_size = line_size
// find the line size line_size = 0 while true if (ch = ins.get()) == EOF line_size = -1 break else if !ins.good() an error on read exit else line_size++ if ch == newline break
This end the design of the algorithm, or it would if the design was correct. Unfortunately, this design has an error; what is it?
The problem occurs in keeping track of the number of characters in the file. A line that doesn't end in a newline may not be a line, but it does contain characters which should be counted as part of the file.
Once the error is pointed out, it's easily fixed by adding the current line
size to the character count before wiping out the line size with -1
:
line_size = 0 while true if (ch = ins.get()) == EOF char_cnt += line_size line_size = -1 break else if !ins.good() an error on read exit else line_size++ if ch == newline break
With this correction we're done, but it's worth while taking a look at how
strings might help solve this problem. The most likely reason to use strings
is to simplify reading in a line, which can be done with one call to
getline()
:
getline(ins, str) line_size = str.length() if (ch = ins.get()) == EOF char_cnt += line_size line_size = -1 else if !ins.good() an error on read exit
getline()
does simplify the design by eliminating the while loop, but this
design has an error. What is the error?
The problem with getline()
is that it throws away the newline rather than
storing it in the string. The line size is off by one because it doesn't
include the newline. "Well", you may be saying to yourself,
"that's easy enough to fix, just add one to the line size to compensate
for the missing newline:"
getline(ins, str) line_size = str.length() + 1 if (ch = ins.get()) == EOF char_cnt += line_size line_size = -1 else if !ins.good() an error on read exit
Automatically adding one to the string size may over compensate because the last line in the file need not end in a newline. If the last line doesn't end in a newline, adding one produces incorrect statistics.
Fixing this new problem requires a deeper understanding of how getline()
works. getline()
expects each string to end in a newline. A string that
doesn't end in a newline is not formatted correctly, and getline()
produces a format error when it tries to read such a string. If getline()
produces a format error, then good()
will return false because an error
occurred. This suggests that line_size
should only be increased by one
when no format error occurred:
getline(ins, str) line_size = str.length() if ins.good() line_size++ if (ch = ins.get()) == EOF char_cnt += line_size line_size = -1 else if !ins.good() an error on read exit
good()
can return false
for a number of reasons (including eof), not only because a string didn't end
in a newline.
Although it is possible to fix-up the design so it works correctly with strings, we haven't covered the parts of C++ necessary to make the fix. Rather than battle on, we'll let the problem drop here. However, there are two important lessons you should take away from this attempt:
This page last modified on 27 July 2001.