SE 598, Data Structures and Algorithms

STL Vector Programming Assignment - An Example Solution


Table of Contents

Introduction

As is always the case, it's advisable to sit down for ten minutes or so and think things through before going to the keyboard to bash out some code.

The two most prominent things in the problem statement are the mailbox reading program readmbox and the mailbox. These are not the only things mentioned in the problem statement - there are readmbox commands and messages, for example - but all the other things can be associated with either readmbox or the mailbox. In terms of abstraction, it seems that readmbox and the mailbox are the most abstract things, in terms of encompassing and hiding the most details, and so represent a good place to start the design of the system.

Because the problem statement defines the structure of mailboxes, there isn't much design we have to do with respect to the mailbox itself. readmbox, on the other hand, is defined largely in terms of what the user can do with it: attach it to a mailbox, get the number of messages in the mailbox, and print the contents of messages in the mailbox. That's about all the problem statement says about readmbox, so these actions will serve as the starting point of readmbox's design.

(As a software engineering aside, the previous paragraphs are a simple example of the object-oriented design technique called noun-verb analysis. As you move along your software engineering career, you'll probably run across many, many versions of noun-verb analysis.)

Of the three operations - attach, count, and print - the most important one seems to be attach; attach must proceed count and print, and once attach is designed, count and print should follow along directly. How should readmbox attach to the mailbox? There are two possibilities: readmbox treats the mailbox as an external object and manipulates it as a file, or readmbox reads the contents into the mailbox and manipulates an internal representation.

This problem doesn't dictate the choice of which way to attach to the mailbox, and there's nothing in the problem that makes one way better than the other. This is supposed to be an exercise in vector programming, so it seems more appropriate to read the mailbox into readmbox as some kind of vector.

But, enough of this - ten minutes are up. Let's write some code!

Main

readmbox main() is straightforward: initialize the mailbox and go interact with the user.
<main()>= (U->)

int main(int argc, char * argv[]) {

  if (argc != 2) {
    cerr << "Command format is  \"" << argv[0] << " mailbox\".\n";
    return EXIT_FAILURE;
    }

  mailbox mbox;
  const string resp = mbox.initialize(argv[1]);
  if (resp != "") {
    cerr << resp << ".\n";
    return EXIT_FAILURE;
    }

  do_io(mbox);

  return EXIT_SUCCESS;
  }

User interaction is also straightforward: prompt for a command, read it, validate it, and, if valid, execute it. "Execute" in this case means print the message associated with the input number.

<user interaction>= (U->)

static void do_io(const mailbox & mbox) {

  const int mcnt = mbox.msgcnt();

  if (mcnt == 1)
    cout << "There is 1 message.\n";
  else
    cout << "There are " << mcnt << " messages.\n";

  while (true) { 
    cout << "* ";
    string resp;
    cin >> resp;

    if (cin.eof() || resp == "q" || resp == "Q") break;

    if (!is_number(resp)) {
      cout << "Please reply with a message number or 'q' to quit.\n";
      continue;
      }
    int mno = atoi(resp.c_str());
    if ((mno < 1) || (mno > mcnt)) {
      cout << "Message number " << mno 
           << " is out of range; the valid range is [1, " << mcnt << "].\n";
      continue;
      }

    mbox.output(cout, mno);
    }
  }

A message number is just a string of digits, and is_number() checks to make sure. The is_not_digit() circumlocution is not necessary because the STL provides the unary predicate adaptor not1(), but we haven't got to function adopters yet.

<number test>= (U->)

static bool is_not_digit(char c) {
  return !isdigit(c);
  }


static bool is_number(const string & str) {
  return find_if(str.begin(), str.end(), is_not_digit) == str.end();
  }

That's it for the readmbox program itself.

<readmbox.cc>=

#include <iostream>
#include <string>
#include <algorithm>
#include <cctype>
#include <stdlib.h>
#include "mailbox-char.h"

<number test>

<user interaction>

<main()>

The Mailbox Class Interface

From the discussion of the main routine above, the mailbox's public interface must provide functions for at least three operations: initialize the mailbox from the contents of a mailbox file,
<public mailbox functions>= (U->) [D->]

string initialize(const char * mbox_name);

return the number of messages in the mailbox,

<public mailbox functions>+= (U->) [<-D->]

unsigned msgcnt(void) const;

and print the contents of a message in the mailbox.

<public mailbox functions>+= (U->) [<-D]

void output(ostream & out, unsigned mno) const;

The private parts of the mailbox object begin to reveal how the mailbox is represented within readmbox.

The contents of the external mailbox is represented as a vector of characters.

<private mailbox elements>= (U->) [D->]

typedef vector<char> Mailbox;
Mailbox mbox;

Messages within the mailbox character vector mbox are represented by a vector of iterators into mbox. Message i begins at the character in mbox referenced by message_starts[i] and ends just before (to the left of) the character in mbox referenced by message_starts[i + 1]. For simplicity and consistency, message_starts also includes an iterator to a phantom next message so the last message in the mailbox can be treated the same as all the other messages in the mailbox; this representation is analogous to the usual iterator range representation used by the STL.

The vector of mailbox iterators header_ends keeps track of where each message's header ends. The header for message i ends just before (immediately to the left of) the character in mbox referenced by header_ends[i]. The intention is to have everything from header_ends[i] up to the next message start be the message content, but there's a small problem with this intention: this range includes the blank line between header and content as part of the content, which isn't correct. The proper thing to do is have separate vectors indicating end of header and beginning of content; maybe later (there's a similar problem at the other end with the blank line before the start of the next message).

<private mailbox elements>+= (U->) [<-D->]

typedef Mailbox::iterator   Mailbox_itr;
typedef vector<Mailbox_itr> Mailbox_itrs;

Mailbox_itrs message_starts;
Mailbox_itrs header_ends;

There are also some private functions to read the external mailbox; find the message starts, and find the header endings.

<private mailbox elements>+= (U->) [<-D]

bool read(const char * mbox_name); 
void find_message_starts(void); 
bool find_header_ends(void);

And that's it for the mailbox class interface.

<mailbox-char.h>=

#ifndef _mailbox_char_h_defined_ 
#define _mailbox_char_h_defined_

#include <vector>
#include <string>

class mailbox {

  public:

    <public mailbox functions>

  private:

    <private mailbox elements>

  };

#endif

The Mailbox Class Implementation

Now all that's left is to implement the mailbox class functions.
<mailbox-char.cc>= [D->]

#include <algorithm>
#include <fstream>
#include "mailbox-char.h"


The public member function initialize() reads in the external mailbox file, and then find the message starts and header ends. If anything goes wrong, return a string describing the problem, otherwise return the empty string. The assert() statement provides a sanity check: if everything's o.k., the number of message starts should equal the number of header ends.

<mailbox-char.cc>+= [<-D->]

string mailbox::initialize(const char * mbox_name) {

  if (!read(mbox_name)) 
    return string("Can't open ") + mbox_name;

  find_message_starts();

  if (!find_header_ends()) 
    return string("Mailbox appears to be corrupted: message header not ") +
           string("followed by a blank line");

  assert(msgcnt() == header_ends.size());

  return string(""); 
  }

The msgcnt() public member function is simple, as long as you don't forget the phantom message start at the end.

<mailbox-char.cc>+= [<-D->]

unsigned mailbox::msgcnt(void) const { 
  return message_starts.size() - 1; 
  }

The output() public member function is also simple, but with a few twists. First, the message number mno is between 1 and n, inclusive, and so needs to be knocked one to the left to be a sensible vector index. Also, don't forget to skip the blank line separating the header from the content.

This code can also be written using STL's output stream iterators, but we haven't studied them yet.

<mailbox-char.cc>+= [<-D->]

void mailbox::output(ostream & out, unsigned mno) const {
  assert((1 <= mno) && (mno <= msgcnt()));
  for (Mailbox_itr i = header_ends[mno - 1] + 1; i < message_starts[mno]; i++)
    out << *i;
  }

Reading the external mailbox is simple, but for a little trick: starting off mbox with a newline character. This trick simplifies some code in find_message_starts().

This code can be streamlined by using the STL's input-stream iterators, but we haven't studied them yet.

<mailbox-char.cc>+= [<-D->]

bool mailbox::read(const char * mbox_name) {

  ifstream mbox_file(mbox_name);
  if (!mbox_file)
    return false;

  mbox.push_back('\n');
  char c;
  while ((c = mbox_file.get()) != EOF)
    mbox.push_back(c);

  return true;
  }

Finding the message starting points is simple, thanks to the little trick in read(), which eliminates the need to treat the first occurence of "From " in the mailbox as a special case. Note that end is pushed into message_starts as the phantom message.

<mailbox-char.cc>+= [<-D->]

void mailbox::find_message_starts(void) {

  const char * marker = "\nFrom ";
  const unsigned mlen = strlen(marker);

  Mailbox_itr start = mbox.begin();
  const Mailbox_itr end = mbox.end();

  assert(mbox.size() > 0);

  while (true) {
    start = search(start, end, marker, marker + mlen);
    if (start == end) break;
    message_starts.push_back(start);
    start += mlen;
    }
  message_starts.push_back(end);
  }

Finding the end of a header works pretty much like finding the beginning of a message, except with different search text and a different end indicator. One difference is that RFC 822 requires that every header end in an blank line, even if the message has no content; failure to find the blank line indicates a trashed mailbox.

<mailbox-char.cc>+= [<-D]

bool mailbox::find_header_ends(void) {

  const char * marker = "\n\n";
  const int mlen = strlen(marker);

  for (unsigned i = 0; i < message_starts.size() - 1; i++) {
    Mailbox_itr j = search(message_starts[i], message_starts[i + 1],
                           marker, marker + mlen);
    if (j == message_starts[i + 1])
      return false;
    header_ends.push_back(j + 1);
    }

  return true;
  }

Index


This page last modified on 3 July 2000.