CS 176, Introduction to Computer Science II

Lab Assignment 2 Answer


There were many problems with this lab assignment, starting with the most obvious one, found by any test set starting with white-space characters:

$ la2
Input a string:  **hello world!
hehello world!
Input a string:  
(Asterisks indicate leading white-space characters.) That's a problem, but what exactly is the problem? What appears to be going wrong? Fooling around with the number of leading white-space characters gives a clue:
$ la2
Input a string:  *hello world!
hhello world!
Input a string:  **hello world!
hehello world!
Input a string:  ***hello world!
helhello world!
Input a string:  ****hello world!
hellhello world!
Input a string:  
It seems as if, rather than shifting down the whole input string to cover the leading white space, only the first n characters following the white space are being shifted down, where n is the number of leading white-space characters. It appears the problem is in the trim() routine, and in particular, in the second loop:
static string trim(string s) {

  // Return a copy of s having all initial white space stripped.

  // Find the first non-white-space character in s.

     int start;
     for (start = 0; is_whitespace(s[start]); start++) { }
     
  // Slide the non-whitespace characters in s down over the initial white-space
  // characters.

     for (int i = 0; i < start; i++)
       s[i] = s[start + i];

  return s;
  }
Sure enough, the index of the second for loop ranges from 0 to start - 1, which are the indices of the white-space characters. The index needs to range over the characters following the white space, that is, from start to s.length() - 1:
// Slide the non-whitespace characters in s down over the initial white-space
// characters.

   for (int i = start; i < s.length(); i++)
     s[i - start] = s[i];
Running the test case through the modified code gives
$ la2ans
Input a string:  *hello world!
hello world!!
Input a string:  **hello world!
hello world!d!
Input a string:  ***hello world!
hello world!ld!
Input a string:  
Oops. The problem's been fixed by moving it from the beginning of the string to the end of the string. If the strings were character arrays, fixing this new problem would be simple: just copy the null byte too. However, because string objects keep track of their length instead of ending with a null byte, more work is required.

One possibility is to blank out the extra characters after the move:

// Blank out left over characters.

   for (int i = s.length() - start; i < s.length(); i++)
     s[i] = ' ';
This works
$ la2
Input a string:  *hello world!
hello world! 
Input a string:  **hello world!
hello world!  
Input a string:  ***hello world!
hello world!   
Input a string:  
but isn't a particularly good solution, for a couple of reasons. First, the documentation for trim() mentions stripping off leading white-space characters; it doesn't say anything about adding white-space characters to the end of the string. This point can be made clearer by changing the output statement to put double quotes around the program's output:
cout << '"' << trim(s) << "\"\n";
This results in
$ la2
Input a string:  *hello world!
"hello world! "
Input a string:  **hello world!
"hello world!  "
Input a string:  ***hello world!
"hello world!   "
Input a string:  

Second, calling trim() with a string containing leading white space should return a string shorter than the original string, because the returned string doesn't contain the leading white space. This trim() always returns a string the same size as the input string, whether or not the input string has leading white-space characters. Writing code that behaves counter to reasonable expectations is usually a bad idea.

One way to solve this problem is to copy the characters following the leading white space in s to a new string; the new string will contain no leading white space and will be shorter than s whenever s has leading white space:

  // Copy the non-whitespace characters to a new string.

     string new_s(s.length() - start, ' ');
     for (int i = start; i < s.length(); i++)
       new_s.at(i - start) = s.at(i);

  return new_s;
which gives
$ la2
Input a string:  *hello world!
"hello world!"
Input a string:  **hello world!
"hello world!"
Input a string:  ***hello world!
"hello world!"
Input a string:  
The new string is given a size equal to the length of the part of s that follows the leading white space, and is initialized to all blanks. It would be an error to forget to create enough space in the new string; in particular, replacing the declaration
string new_s(s.length() - start, ' ');
with the declaration
string new_s;
would wreak havoc: new_s would have size zero, so every index the copy loop writes to would be out of range.

There are a couple of more straightforward ways of copying from one string to another, descriptions of which you can find in Chapter 19 of Deitel and Deitel.

One test set, one error, one fix. What's next? The first test set contained initial white space; let's try a test set with no initial white space:

$ la2
Input a string:  hello world!
"hello world!"
Input a string:  red   white   and    blue
"red   white   and    blue"
Input a string:  ! @ # $ % ^ & * ( ) _ + | ~
"! @ # $ % ^ & * ( ) _ + | ~"
Input a string:
That looks OK. The first two test sets contained non-white-space characters; let's try a test set that contains only white-space characters:
$ la2ans
Input a string:  *
Abort(coredump)

$
Oops (asterisks indicate leading white-space characters). Let's try that again.
$ la2ans
Input a string:  ***a
"a"
Input a string:  ***
Abort(coredump)

$ 
It seems lines containing only white-space characters are causing trouble. Because the first loop in trim() deals with white-space characters, that might be a good place to start looking:
// Find the first non-white-space character in s.

   int start;
   for (start = 0; is_whitespace(s[start]); start++) { }
A little staring reveals the problem: there's no check for start running off the end of the string (remember, string accesses via square brackets are unchecked). Once recognized, the problem is easy to fix
for (start = 0; (start < s.length()) && is_whitespace(s[start]); start++) { }
as verified by the third test set:
$ la2ans
Input a string:  ***a
"a"
Input a string:  ***
""
Input a string:  


This page last modified on 30 May 2001.