SE 598, Data Structures and Algorithms

STL List Programming Assignment - An Example Solution


Table of Contents

Introduction

This assignment adds another new command, and also changes the internal representation of messages.

The new command is the delete command. At first glance the delete command is similar in appearance to the order command, at least as far as the readmbox user can tell. In fact, the similarities between the delete and order commands go deep, allowing them to share a common implementation.

The internal change to messages involves a new kind of header field. Unlike adding a new command, which changes both the main program and the mailbox implementation, extending the message representation changes only the mailbox implementation.

Main

readmbox main() initializes the mailbox and then interacts with the user.
<main()>= (U->)

int main(int argc, char * argv[]) {

  if (argc != 2) {
    cerr << "Command format is  \"" << argv[0] << " mailbox\".\n";
    return EXIT_FAILURE;
    }

  mailbox mbox;
  const string resp = mbox.initialize(argv[1]);
  if (resp != "") {
    cerr << resp << ".\n";
    return EXIT_FAILURE;
    }

  do_io(mbox);

  return EXIT_SUCCESS;
  }
Defines main (links are to index).

The dequeue example solution pointed out that do_io() was getting large and shaggy. Because this assignment adds more code to do_io(), making it even larger and shaggier, it's time to do some clean-up.

<user interaction>= (U->) [D->]

typedef deque<string> command;

static bool get_command(command & cmd) {

  do {
    cout << "* ";
    string resp;
    getline(cin, resp);
    if (cin.eof()) 
      return false;

    cmd.erase(cmd.begin(), cmd.end());

    parse_cmd(resp, cmd);
    }
  while (cmd.size() == 0);

  return true;
  }
Defines command, get_command (links are to index).

Check an order command and, if syntactically correct, do the right thing; otherwise, print an error message.

<user interaction>+= (U->) [<-D->]

static void do_order(const command & cmd, mailbox & mbox) {

  if (cmd.size() == 0)
    mbox.restore_original_order();
  else if (cmd.size() == 2)
    mbox.order(cmd[0], cmd[1]);
  else
    cout << "Garbled command format for order.  Proper command format "
         << "is either \"order field word\" or \"order\".\n";
  }
Defines do_order (links are to index).

do_delete() is just a thinly veiled version of do_order().

<user interaction>+= (U->) [<-D->]

static void do_delete(const command & cmd, mailbox & mbox) {

  if (cmd.size() == 0)
    mbox.restore_deleted_msgs();
  else if (cmd.size() == 2)
    mbox.delete_msgs(cmd[0], cmd[1]);
  else
    cout << "Garbled command format for delete.  Proper command format "
         << "is either \"delete field word\" or \"delete\".\n";
  }
Defines do_delete (links are to index).

Print the contents of the requested message resp if the request is for an existing message; otherwise, print an error message.

<user interaction>+= (U->) [<-D->]

static void do_print(const string & resp, const mailbox & mbox) {

  assert(is_number(resp));
  const int mno = atoi(resp.c_str());
  const unsigned mcnt = mbox.msgcnt();

  if ((mno < 1) || (mno > mcnt)) {
    cout << "Message number " << mno << " is out of range; the ";
    if (mcnt == 0)
      cout << "message list is empty";
    else
      cout << "valid range is [1, " << mcnt << "]";
    cout << ".\n";
    }
  else
    mbox.output(cout, mno);
  }
Defines do_print (links are to index).

User interaction reads a string from the user, parses it, and executes a command based on the first word input from the user.

<user interaction>+= (U->) [<-D]

static void do_io(mailbox & mbox) {

  const int mcnt = mbox.msgcnt();

  if (mcnt == 1)
    cout << "There is 1 message.\n";
  else
    cout << "There are " << mcnt << " messages.\n";

  if (mcnt > 0) {
    command cmd;
    while (get_command(cmd)) { 
      command::value_type resp = cmd[0];
      cmd.pop_front();

      if (is_number(resp))
        do_print(resp, mbox);

      else if ((resp[0] == 'q') || (resp[0] == 'Q'))
        break;

      else if (resp == "order")
        do_order(cmd, mbox);

      else if (resp == "delete")
        do_delete(cmd, mbox);

      else {
        cout << "\""<< resp << "\" is an unknown command.  ";
        cout << "Known commands are\n";
        cout << "  n      - print message n.\n";
        cout << "  quit   - end the program.\n";
        cout << "  order  - re-order the message list.\n";
        cout << "  delete - delete messages from the message list.\n";
        }
      }
    }
  }
Defines do_io (links are to index).

Command parsing accepts a string input from the user and returns a dequeue of the words in the user's input string. In this case a word is a sequence of non-white-space characters.

Parsing itself skips over white space, which finds the start of a word, and then skips to white space, which finds the end of the word.

<parse user command>= (U->)

<white-space predicates>

static void parse_cmd(const string & str, deque<string> & cmd) {

  cmd.erase(cmd.begin(), cmd.end());

  string::const_iterator start = str.begin();
  const string::const_iterator end = str.end();

  while (start != end) {
    start = find_if(start, end, is_nonwhite_space);
    if (start == end) break;
    const string::const_iterator e = find_if(start, end, is_white_space);
    cmd.push_back(string(start, e));
    start = e;
    }
  }
Defines parse_cmd (links are to index).

The white-space and non-white-space predicates for find_if().

<white-space predicates>= (<-U)

static bool is_white_space(char ch) {
  return ((ch == ' ') || (ch == '\t'));
  }

static bool is_nonwhite_space(char ch) {
  return !is_white_space(ch);
  }
Defines is_nonwhite_space, is_white_space (links are to index).

A message number is just a string of digits, and is_number() checks to make sure. The is_not_digit() circumlocution is not necessary because the STL provides the unary predicate adaptor not1(), but we haven't got to function adopters yet.

<number test>= (U->)

static bool is_not_digit(char c) {
  return !isdigit(c);
  }


static bool is_number(const string & str) {
  return find_if(str.begin(), str.end(), is_not_digit) == str.end();
  }
Defines is_not_digit, is_number (links are to index).

That's it for the readmbox program itself.

<readmbox.cc>=

#include <iostream.h>
#include <string>
#include <deque>
#include <algorithm>
#include <ctype.h>
#include <stdlib.h>
#include <assert.h>
#include "mailbox-string.h"

<number test>

<parse user command>

<user interaction>

<main()>

The Mailbox Class Interface

The mailbox interface keeps the three public methods it had in the previous assignment.
<public mailbox methods>= (U->) [D->]

string initialize(const char * mbox_name);
unsigned msgcnt(void) const;
void output(ostream & out, unsigned mno) const;

void order(const string & field, const string & word);
void restore_original_order(void);

The delete commands have the same interface as the order commands.

<public mailbox methods>+= (U->) [<-D]

void delete_msgs(const string & field, const string & word);
void restore_deleted_msgs(void);

The private parts of the mailbox object begin to reveal how the mailbox is represented within readmbox.

The mailbox text is stored internally as a vector of strings.

<private mailbox elements>= (U->) [D->]

typedef vector<string> Mailbox;
typedef Mailbox::iterator Mailbox_itr;
typedef Mailbox::const_iterator Mailbox_citr;

Mailbox mbox;

Defines Mailbox, Mailbox_citr, Mailbox_itr (links are to index).

This assignment has changed messages by allowing folded header fields. One way to deal with this change is to preprocess mbox to unfold them. This is probably the best approach to take: it's simple and it eliminates the need to change any other code. However, rather than do the smart thing, we'll do something else to keep it interesting.

Folded header fields mean that a header field may be spread out over more than one line in mbox. This suggest representing each unfolded header field as an iterator range (s, e) where s points to the first mbox line in the header field and e points to the first mbox line after the header field.

<private mailbox elements>+= (U->) [<-D->]

typedef pair<Mailbox_itr, Mailbox_itr> Header_field;
typedef vector<Header_field>           Header;
typedef Header::iterator               Header_itr;

struct Message {
  Header header;
  Mailbox_itr content_start;    // iterator to the first line of the content
  Mailbox_itr content_end;      // iterator to the first line past the content
  unsigned original_order;
  bool matching;
  };
Defines Header, Header_field, Header_itr, Message (links are to index).

Msg_starts is an auxiliary data structure helpful when parsing mbox.

<private mailbox elements>+= (U->) [<-D->]

typedef vector<Mailbox_itr> Msg_starts;
Defines Msg_starts (links are to index).

Representing the message list as a list instead of a vector makes it easier and faster to move groups of messages into and out of the message list (at the cost of eliminating direct access to individual messages).

<private mailbox elements>+= (U->) [<-D->]

typedef list<Message> Message_list;
typedef Message_list::iterator Msglst_itr;
typedef Message_list::const_iterator Msglst_citr;

Message_list msg_list, deleted_msgs;

Defines Message_list, Msglst_citr, Msglst_itr (links are to index).

The private functions to set up the mailbox and message list.

<private mailbox elements>+= (U->) [<-D->]

bool read(const char * mbox_name); 
string find_message_starts(Msg_starts &); 
Msglst_itr partition_matches(const string & field, const string & word);

These funky private functions are needed to get around some scope C++ restrictions and to serve as functional parameters for STL generic algorithms.

<private mailbox elements>+= (U->) [<-D->]

static bool original_msg_order(const Message &, const Message &);
static bool reorder_msg(const Message &);
static void search_header(Message &, const string &, const string &);
static bool search_header_field(
  const Header_field &, const string &, const string &);

<private mailbox elements>+= (U->) [<-D]

string parse_mailbox(void);
string parse_message(Mailbox_itr &, Mailbox_itr &, Message &);
string parse_header(Mailbox_itr, Mailbox_itr, Header & );

And that's it for the mailbox class interface.

<mailbox-string.h>=

#ifndef _mailbox_char_h_defined_ 
#define _mailbox_char_h_defined_

#include <list>
#include <vector>
#include <string>

class mailbox {

  public:

    <public mailbox methods>

  private:

    <private mailbox elements>

  };

#endif
Defines mailbox (links are to index).

The Mailbox Class Implementation

Now all that's left is to implement the mailbox class functions.
<mailbox-string.cc>= [D->]

#include <algorithm>
#include <fstream.h>
#include <assert.h>
#include <strings.h>
#include <iterator>
#include "mailbox-string.h"

The public member function initialize() reads in the external mailbox file, and then find the message starts and header ends. If anything goes wrong, return a string describing the problem, otherwise return the empty string.

<mailbox-string.cc>+= [<-D->]

string mailbox::initialize(const char * mbox_name) {

  if (!read(mbox_name)) 
    return string("Can't open ") + mbox_name;

  return parse_mailbox();
  }
Defines initialize (links are to index).

The msgcnt() public member function returns the number of messages in the message list.

<mailbox-string.cc>+= [<-D->]

unsigned mailbox::msgcnt(void) const { 
  return msg_list.size(); 
  }
Defines msgcnt (links are to index).

Print the content for message mno; don't forget to move mno one to the left so it's a sensible vector index.

<mailbox-string.cc>+= [<-D->]

void mailbox::output(ostream & out, unsigned mno) const {

  assert((1 <= mno) && (mno <= msgcnt()));

  Msglst_citr mp = msg_list.begin();
  while (mno-- > 1) mp++;

  copy(mp->content_start, mp->content_end, ostream_iterator<string>(out, "\n"));
  }
Defines output (links are to index).

Read into the mailbox mbox the contents of the file named mbox_name. Return true if everything went ok, otherwise return false. mbox ends with a dummy message to simplify some of the code later on.

<mailbox-string.cc>+= [<-D->]

bool mailbox::read(const char * mbox_name) {

  ifstream mbox_file(mbox_name);
  if (!mbox_file) return false;

  string s;
  while (getline(mbox_file, s))
    mbox.push_back(s);

  mbox.push_back("");
  mbox.push_back("From ");
  mbox.push_back("");

  return true;
  }
Defines read (links are to index).

Parse the mailbox mbox to find the messages. First, locate the starts of all the messages (include a dummy message start at the end to make the next part easier). Second, parse each message and store the message information in the message list.

parse_mailbox() returns the empty string if everything went ok, otherwise, it returns a string describing the problem.

<mailbox-string.cc>+= [<-D->]

string mailbox::parse_mailbox(void) {

  Msg_starts starts;
  string resp = find_message_starts(starts);
  if (resp != "") return resp;

  const Msg_starts::const_iterator end = starts.end() - 1;
  unsigned msgcnt = 0;

  for (Msg_starts::iterator i = starts.begin(); i != end; i++) {
    Message msg;
    resp = parse_message(*i, *(i + 1), msg);
    if (resp != "") return resp;
    msg.original_order = msgcnt++;
    msg_list.push_back(msg);
    }

  return ""; 
  }
Defines parse_mailbox (links are to index).

find_message_starts() stores into starts iterators pointing to the first line of each message in mbox. The first line of a message is the line immediately after the line containing the start-of-message marker "From ". According to RFC 822, every message should contain at least one text-line.

<mailbox-string.cc>+= [<-D->]

static bool is_from(const string str) {
  return (str.find("From ") == 0);
  }


string mailbox::find_message_starts(Msg_starts & starts) {

  Mailbox_itr start = mbox.begin();
  const Mailbox_itr end = mbox.end();

  while (true) {
    start = find_if(start, end, is_from);
    if (start == end) break;
    starts.push_back(++start);
    if (start == end)
      return "Corrupted mailbox:  the start-of-message marker is "
             "not followed by a header line";
    }

  return "";
  }
Defines find_message_starts, is_from (links are to index).

Parse the message found in the mailbox iterator range (s, e), and store the parsed message information in msg.

The second iterator i in the iterator range indicating the last header field points to the blank line separating the header from the content; the content starts on the line following i.

The mailbox iterator e points to the start of the next message, which means the end of msg is two lines (the blank line and the "From ") before the line pointed to by e. As a sanity check, the message header must contain at least one field (RFC 822 requires that it contain at least three fields, but one's enough here).

<mailbox-string.cc>+= [<-D->]

string mailbox::parse_message(
  Mailbox_itr & s, Mailbox_itr & e, Message & msg) {

  string resp = parse_header(s, e, msg.header);
  if (resp != "") return resp;

  const unsigned hdrcnt = msg.header.size();
  assert(hdrcnt > 0);

  msg.content_start = msg.header[hdrcnt - 1].second + 1;
  msg.content_end = e - 2;

  return "";
  }
Defines parse_message (links are to index).

Given the message delimited by the iterator range (s, e), find the header and store it in hedr. Return the empty string if everything went well; otherwise return a string describing the problem.

A line is unfolded if it doesn't start with a space or a tab.

<mailbox-string.cc>+= [<-D->]

static bool is_empty(const string & str) {
  return (str.length() == 0);
  }

static bool is_unfolded(const string & str) {
  return is_empty(str) || !index(" \t", str[0]);
  }

string mailbox::parse_header(Mailbox_itr s, Mailbox_itr e, Header & hedr) {

  if (!is_unfolded(*s))
    return "Corrupted mailbox:  message with first header line folded";

  Mailbox_itr hdr_end = find_if(s, e, is_empty);
  if (hdr_end == e)
    return "Corrupted mailbox:  message with no blank line after header";
  if (hdr_end == s)
    return "Corrupted mailbox:  message with no header";

  do {
    e = find_if(s + 1, hdr_end, is_unfolded);
    hedr.push_back(Header_field(s, e));
    s = e;
    }
  while (s != hdr_end);

  return "";
  }
Defines is_empty, is_unfolded, parse_header (links are to index).

Ordering the messages is done in two passes over the message list. The first pass marks the messages that should be moved to the head of the message list; the second pass does the moving.

The idea is to move all matching messages before all non-matching message, which can be done by the partition() generic algorithm. The requirement that relative order be maintained can be handled by the stable_partition() generic algorithm.

Both order() and delete() need to partition the message list using the same criterion. delete() could call order() (or vice versa) to do the partitioning, but delete() needs to know the location of the partition, and order() can't return that information. partition_matches() solves this problem by doing the partition and returning the partition point; order() can ignore the return value and delete() can use it.

<mailbox-string.cc>+= [<-D->]

bool mailbox::reorder_msg(const Message & m) {
  return m.matching;
  }

mailbox::Msglst_itr mailbox::partition_matches(
  const string & field, const string & word) {

  for (Msglst_itr i = msg_list.begin(); i != msg_list.end(); i++) 
    search_header(*i, field, word);

  return stable_partition(msg_list.begin(), msg_list.end(), reorder_msg);
  }
Defines partition_matches, reorder_msg (links are to index).

Order the message list so that all messages having word in a header field labeled field appear before any message not having word in a header field labeled field.

<mailbox-string.cc>+= [<-D->]

void mailbox::order(const string & field, const string & word) {
  partition_matches(field, word);
  }
Defines order (links are to index).

search_header_field() returns true if the header field delimited by the iterator range (s, e) has the label label and contains the word word. If the header field field doesn't have the label label or doesn't contain the word word, search_header_field() returns false.

A little bit of care is needed to make sure the header-field label doesn't take part in the search for word; that's what start does.

<mailbox-string.cc>+= [<-D->]

bool mailbox::search_header_field(
  const Header_field & hfield, const string & label, const string & word) {

  Mailbox_citr s = hfield.first;
  const Mailbox_citr e = hfield.second;
  assert(s != e);

  if (s->find(label) != 0) return false;
  unsigned start = label.size();
  if ((*s)[start] != ':') return false;

  do {
    if (s->find(word, start) != string::npos) return true;    
    start = 0;
    }
  while (++s != e);

  return false;
  }
Defines search_header_field (links are to index).

If message msg's header contains a field labeled with label and containing the word word in its text field, then search_header() sets msg.matching to true to indicate that msg should be re-ordered in the message list. Otherwise msg.matching is set to false to indicate that msg should be left alone in the message list.

<mailbox-string.cc>+= [<-D->]

void mailbox::search_header(
  Message & msg, const string & label, const string & word) {

  assert(msg.header.size() > 0);
  for (Header_itr i = msg.header.begin(); i != msg.header.end(); i++)
    if (msg.matching = search_header_field(*i, label, word)) break;
  }
Defines search_header (links are to index).

Sorting the message list on the original message order field in the messages restores the original message order. The previous version of restore_original_order() used the sort() generic algorithm, but because sort() can't work with lists, this version uses the sort() member function for lists.

<mailbox-string.cc>+= [<-D->]

bool mailbox::original_msg_order(const Message & m1, const Message & m2) {
  return m1.original_order < m2.original_order;
  }

void mailbox::restore_original_order(void) {
  msg_list.sort(original_msg_order);
  }
Defines original_msg_order, restore_original_order (links are to index).

To delete messages, move the messages to be deleted to the front of the message list (that is, order them) and then splice them into the deleted message list.

<mailbox-string.cc>+= [<-D->]

void mailbox::delete_msgs(const string & field, const string & word) {
  Msglst_itr i = partition_matches(field, word);
  deleted_msgs.splice(deleted_msgs.end(), msg_list, msg_list.begin(), i);
  }
Defines delete_msgs (links are to index).

To restore deleted messages, if any, splice them onto the end of the message list, which also empties the deleted-message list.

<mailbox-string.cc>+= [<-D]

void mailbox::restore_deleted_msgs(void) {
  msg_list.splice(msg_list.end(), deleted_msgs);
  }
Defines restore_deleted_msgs (links are to index).

Index


This page last modified on 1 August 2000.